Chapter 3: Transport Layer
Computer Networking: A Top-Down Approach, 6th edition
Jim Kurose, Keith Ross
Addison-Wesley, March 2012
All material copyright 1996-2012 J.F. Kurose and K.W. Ross, All Rights Reserved
Transport Layer 3-1
Chapter 3: Transport Layer
our goals:
  understand principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  learn about Internet transport layer protocols:
    UDP: connectionless transport
    TCP: connection-oriented reliable transport
    TCP congestion control
Transport Layer 3-2
Chapter 3 outline
  3.1 transport-layer services
  3.2 multiplexing and demultiplexing
  3.3 connectionless transport: UDP
  3.4 principles of reliable data transfer
  3.5 connection-oriented transport: TCP
    segment structure
    reliable data transfer
    flow control
    connection management
  3.6 principles of congestion control
  3.7 TCP congestion control
Transport Layer 3-3
Transport services and protocols
  provide logical communication between app processes running on different hosts
  transport protocols run in end systems
    send side: breaks app messages into segments, passes to network layer
    rcv side: reassembles segments into messages, passes to app layer
  more than one transport protocol available to apps
    Internet: TCP and UDP
  [Figure: protocol stacks (application, transport, network, data link, physical) in the two end systems]
Transport Layer 3-4
Transport vs. network layer
  network layer: logical communication between hosts
  transport layer: logical communication between processes
    relies on, enhances, network layer services
  household analogy:
  12 kids in Ann's house sending letters to 12 kids in Bill's house:
    hosts = houses
    processes = kids
    app messages = letters in envelopes
    transport protocol = Ann and Bill, who demux to in-house siblings
    network-layer protocol = postal service
Transport Layer 3-5
Internet transport-layer protocols
  reliable, in-order delivery (TCP)
    congestion control
    flow control
    connection setup
  unreliable, unordered delivery: UDP
    no-frills extension of "best-effort" IP
  services not available:
    delay guarantees
    bandwidth guarantees
  [Figure: end systems run the full stack (application, transport, network, data link, physical); routers along the path run only network, data link, physical]
Transport Layer 3-6
rdt2.0: channel with bit errors
  underlying channel may flip bits in packet
    checksum to detect bit errors
  the question: how to recover from errors?
    (How do humans recover from "errors" during conversation?)
    acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
    negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
    sender retransmits pkt on receipt of NAK
  new mechanisms in rdt2.0 (beyond rdt1.0):
    error detection
    feedback: control msgs (ACK, NAK) from receiver to sender
Transport Layer 3-28
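The checksum / ACK / NAK loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8-bit additive checksum, the helper `flip_one_bit`, and the function name `rdt20_send` are hypothetical choices, not part of the slides, and (as in rdt2.0 itself) ACKs and NAKs are assumed to arrive uncorrupted.

```python
import random

def checksum(data: bytes) -> int:
    # Toy 8-bit additive checksum (an assumption; the slides do not fix one).
    return sum(data) % 256

def make_pkt(data: bytes) -> bytes:
    # Packet = data followed by one checksum byte.
    return data + bytes([checksum(data)])

def corrupt(pkt: bytes) -> bool:
    # True if the trailing checksum byte does not match the data.
    return checksum(pkt[:-1]) != pkt[-1]

def flip_one_bit(pkt: bytes, rng: random.Random) -> bytes:
    # Hypothetical channel error: flip a single random bit.
    i = rng.randrange(len(pkt))
    bit = 1 << rng.randrange(8)
    return pkt[:i] + bytes([pkt[i] ^ bit]) + pkt[i + 1:]

def rdt20_send(data: bytes, error_rate: float, rng: random.Random):
    """Stop-and-wait transfer of one message.

    Returns (data delivered to the receiver's app, number of transmissions).
    """
    sndpkt = make_pkt(data)
    transmissions = 0
    while True:
        transmissions += 1
        # udt_send(sndpkt): the channel may flip a bit on the way.
        rcvpkt = flip_one_bit(sndpkt, rng) if rng.random() < error_rate else sndpkt
        if corrupt(rcvpkt):
            continue  # receiver replies NAK, so the sender retransmits
        return rcvpkt[:-1], transmissions  # receiver replies ACK


delivered, sends = rdt20_send(b"hello", error_rate=0.5, rng=random.Random(0))
print(delivered, sends)
```

A single flipped bit always changes either the data sum or the checksum byte, so this toy checksum catches every error this particular channel model can produce; a real checksum makes no such guarantee for multi-bit errors.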
rdt2.0: FSM specification
  sender
    state: Wait for call from above
      rdt_send(data)
        sndpkt = make_pkt(data, checksum)
        udt_send(sndpkt)
    state: Wait for ACK or NAK
      rdt_rcv(rcvpkt) && isNAK(rcvpkt)
        udt_send(sndpkt)
      rdt_rcv(rcvpkt) && isACK(rcvpkt)
        (no action; return to Wait for call from above)
  receiver
    state: Wait for call from below
      rdt_rcv(rcvpkt) && corrupt(rcvpkt)
        udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N: sender
  k-bit seq # in pkt header
  "window" of up to N consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
    may receive duplicate ACKs (see receiver)
  timer for oldest in-flight pkt
  timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-47
Go-Back-N: receiver
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer): no receiver buffering!
    re-ACK pkt with highest in-order seq #
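A minimal Python sketch of the Go-Back-N receiver rule may help make "need only remember expectedseqnum" concrete. The function name `gbn_receiver` and the list-based interface are illustrative assumptions; sequence-number wraparound is ignored, and the sketch assumes no ACK is sent before the first in-order packet arrives.

```python
def gbn_receiver(arrivals):
    """Process correctly received packets, given by seq # in arrival order.

    Returns (delivered, acks): seq #s passed up to the application, and the
    cumulative ACK sent in response to each arrival.
    """
    expectedseqnum = 0  # the only state the receiver keeps
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expectedseqnum:
            delivered.append(seq)   # in-order: deliver to upper layer
            expectedseqnum += 1
        # out-of-order packets are discarded, never buffered
        if expectedseqnum > 0:
            # (re-)ACK the highest in-order seq # received so far
            acks.append(expectedseqnum - 1)
    return delivered, acks


print(gbn_receiver([0, 1, 3, 4, 2, 3]))
```

In the run above, packets 3 and 4 arrive before 2, so they are dropped and each triggers a duplicate ACK for 1; the sender would have to resend them after 2 gets through.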
Selective repeat
  receiver individually acknowledges all correctly received pkts
    buffers pkts, as needed, for eventual in-order delivery to upper layer
  sender only resends pkts for which ACK not received
    sender timer for each unACKed pkt
  sender window
    N consecutive seq #'s
    limits seq #s of sent, unACKed pkts
Transport Layer 3-50
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
  sender
    data from above:
      if next available seq # in window, send pkt
    timeout(n):
      resend pkt n, restart timer
    ACK(n) in [sendbase, sendbase+N-1]:
      mark pkt n as received
      if n smallest unACKed pkt, advance window base to next unACKed seq #
  receiver
    pkt n in [rcvbase, rcvbase+N-1]:
      send ACK(n)
      out-of-order: buffer
      in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
    pkt n in [rcvbase-N, rcvbase-1]:
      ACK(n)
    otherwise:
      ignore
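The receiver side of selective repeat can be sketched as follows. This is a simplified illustration, not a full implementation: `sr_receiver` is a hypothetical name, sequence numbers are treated as unbounded integers (no wraparound), and every arrival is assumed to have passed the checksum.

```python
def sr_receiver(arrivals, N):
    """Selective-repeat receiver with window size N.

    arrivals: seq #s of correctly received packets, in arrival order.
    Returns (delivered, acks).
    """
    rcvbase = 0
    buffered = set()          # out-of-order packets waiting for a gap to fill
    delivered, acks = [], []
    for n in arrivals:
        if rcvbase <= n <= rcvbase + N - 1:
            acks.append(n)    # individual ACK for this packet
            buffered.add(n)
            # deliver any in-order run starting at rcvbase, advancing the window
            while rcvbase in buffered:
                buffered.remove(rcvbase)
                delivered.append(rcvbase)
                rcvbase += 1
        elif rcvbase - N <= n <= rcvbase - 1:
            acks.append(n)    # already delivered; the earlier ACK may have been lost
        # otherwise: ignore
    return delivered, acks


print(sr_receiver([0, 2, 3, 1, 2], N=4))
```

Unlike Go-Back-N, packets 2 and 3 are kept while the receiver waits for 1, so a single retransmission of 1 releases all three to the application.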
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
  header fields (each row of the header is 32 bits wide):
    source port #, dest port #
    sequence number
    acknowledgement number
      (sequence and acknowledgement numbers count by bytes of data, not segments!)
    head len, not used, flag bits U A P R S F, receive window
      receive window: # bytes rcvr willing to accept
    checksum, Urg data pointer
      checksum: Internet checksum (as in UDP)
    options (variable length)
  application data (variable length)
  flag bits:
    URG: urgent data (generally not used)
    ACK: ACK # valid
    PSH: push data now (generally not used)
    RST, SYN, FIN: connection estab (setup, teardown commands)
Transport Layer 3-57
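To make the field layout concrete, here is a small Python sketch that unpacks the fixed 20-byte part of a TCP header with the standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative assumptions; options are skipped rather than decoded, and the checksum is not verified.

```python
import struct

# Flag bit positions within the flags byte (low six bits).
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header and split off the payload."""
    (src_port, dst_port, seq, ack, offset_byte, flags,
     rwnd, csum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4   # head len is given in 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if flags & bit},
        "rwnd": rwnd,
        "checksum": csum,
        "urg_ptr": urg_ptr,
        "data": segment[header_len:],     # options, if any, are skipped
    }


# Hand-built example segment: SYN+ACK, head len 5 words, one payload byte.
example = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x12, 4096, 0, 0) + b"C"
print(parse_tcp_header(example))
```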
TCP seq. numbers, ACKs
  sequence numbers:
    byte stream "number" of first byte in segment's data
  acknowledgements:
    seq # of next byte expected from other side
    cumulative ACK
  Q: how receiver handles out-of-order segments
    A: TCP spec doesn't say, up to implementor
  [Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
Transport Layer 3-58
TCP seq. numbers, ACKs
  simple telnet scenario (Host A, Host B):
    User types 'C'
      A to B: Seq=42, ACK=79, data = 'C'
    host ACKs receipt of 'C', echoes back 'C'
      B to A: Seq=79, ACK=43, data = 'C'
    host ACKs receipt of echoed 'C'
      A to B: Seq=43, ACK=80
Transport Layer 3-59
TCP round trip time, timeout
  Q: how to set TCP timeout value?
    longer than RTT
      but RTT varies
    too short: premature timeout, unnecessary retransmissions
    too long: slow reaction to segment loss
  Q: how to estimate RTT?
    SampleRTT: measured time from segment transmission until ACK receipt
      ignore retransmissions
    SampleRTT will vary, want estimated RTT "smoother"
      average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time, timeout
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
    exponential weighted moving average
    influence of past sample decreases exponentially fast
    typical value: $\alpha = 0.125$
  [Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds); curves: SampleRTT, EstimatedRTT]
Transport Layer 3-61
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
    (typically, $\beta = 0.25$)
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
    (the first term is the estimated RTT, the second the "safety margin")
Transport Layer 3-62
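The three formulas above fit in a small Python class. The sketch below is illustrative only: the class name `RttEstimator` is hypothetical, the initial values (EstimatedRTT set to the first sample, DevRTT to half of it) follow the initialization in RFC 6298 rather than anything stated on the slides, and updating DevRTT before EstimatedRTT is likewise an assumption about ordering.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, in the units of the samples."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Initialization is an assumption (RFC 6298 style), not from the slides.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample (assumed order).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
for sample in (100.0, 200.0):
    print(sample, est.update(sample))
```

A sample that jumps from 100 ms to 200 ms moves EstimatedRTT only to 112.5 ms, but it widens DevRTT enough that the timeout grows noticeably, which is the "larger safety margin" effect the slide describes.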
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
Transport Layer 3-64
TCP sender events:
  data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running
      think of timer as for oldest unacked segment
      expiration interval: TimeOutInterval
  timeout:
    retransmit segment that caused timeout
    restart timer
  ack rcvd:
    if ack acknowledges previously unacked segments
      update what is known to be ACKed
      start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
      start timer
TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
    TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
      likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
TCP fast retransmit
  [Figure: Host A / Host B timeline]
    A to B: Seq=92, 8 bytes of data
    A to B: Seq=100, 20 bytes of data (lost, marked X)
    B to A: ACK=100, repeated as later segments arrive (triple duplicate ACK)
    A to B: Seq=100, 20 bytes of data (resent before the timeout expires)
  fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71
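The sender-side trigger can be sketched as a duplicate-ACK counter. This is a minimal illustration, assuming ACK numbers never move backwards; `fast_retransmit_events` is a hypothetical helper name, and the sketch only reports which sequence numbers would be resent rather than sending anything.

```python
def fast_retransmit_events(acks):
    """Return the seq #s that a sender would fast-retransmit.

    acks: ACK numbers in the order the sender receives them.
    """
    last_ack = None
    dup_count = 0
    retransmitted = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:
                # third duplicate: resend the segment starting at this byte
                retransmitted.append(ack)
        else:
            last_ack = ack   # new ACK: remember it and reset the counter
            dup_count = 0
    return retransmitted


print(fast_retransmit_events([100, 100, 100, 100]))
```

Note that the first ACK=100 is not a duplicate; it takes three further copies (four ACKs in total for the same byte) before the segment is resent.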
TCP flow control
  application may remove data from TCP socket buffers ...
    ... slower than TCP receiver is delivering (sender is sending)
  flow control:
    receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
  [Figure: receiver protocol stack. Segments arrive from sender through IP code and TCP code into the TCP socket receiver buffers (OS); the application process reads from those buffers]
Transport Layer 3-73
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
  [Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data is passed to the application process]
Transport Layer 3-74
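The bookkeeping on both sides is simple arithmetic, sketched below in Python. The function names `advertised_rwnd` and `sendable_bytes` are hypothetical, and the sketch assumes the sender always has an up-to-date rwnd value; congestion control, which limits the sender further, is ignored here.

```python
def advertised_rwnd(rcv_buffer, buffered_bytes):
    """Free buffer space the receiver advertises (RcvBuffer minus buffered data)."""
    return rcv_buffer - buffered_bytes

def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_rwnd(rcv_buffer=4096, buffered_bytes=1000)
print(rwnd, sendable_bytes(rwnd, last_byte_sent=3000, last_byte_acked=1000))
```

With 1000 bytes still unread by the application, the receiver advertises 3096 bytes; a sender with 2000 bytes already in flight may therefore send at most 1096 more.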
Connection Management
  before exchanging data, sender/receiver "handshake":
    agree to establish connection (each knowing the other willing to establish connection)
    agree on connection parameters
  [Figure: client and server, each with an application above the network, each holding:]
    connection state: ESTAB
    connection variables:
      seq # client-to-server, server-to-client
      rcvBuffer size at server, client
  client: Socket clientSocket = newSocket("hostname", "port number");
  server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
  client state: LISTEN; server state: LISTEN
  client:
    choose init seq num, x
    send TCP SYN msg: SYNbit=1, Seq=x
    client state: SYNSENT
  server:
    choose init seq num, y
    send TCP SYNACK msg, acking SYN: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
    server state: SYN RCVD
  client:
    received SYNACK(x) indicates server is live
    send ACK for SYNACK: ACKbit=1, ACKnum=y+1
      this segment may contain client-to-server data
    client state: ESTAB
  server:
    received ACK(y) indicates client is live
    server state: ESTAB
Transport Layer 3-77
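The sequence/acknowledgement arithmetic of the handshake can be traced with a short Python sketch. It models only the three messages and the resulting states; the function name `three_way_handshake`, the dictionary representation of segments, and the explicit modulo for 32-bit sequence-number wraparound are illustrative assumptions, and no real sockets are involved.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits wide

def three_way_handshake(x, y):
    """Return the three handshake segments for client ISN x and server ISN y.

    Each entry is (sender, sender's state after sending, segment fields).
    """
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y,
              "ACKbit": 1, "ACKnum": (syn["Seq"] + 1) % SEQ_SPACE}
    ack = {"ACKbit": 1, "ACKnum": (synack["Seq"] + 1) % SEQ_SPACE}
    return [
        ("client", "SYNSENT", syn),
        ("server", "SYN RCVD", synack),
        ("client", "ESTAB", ack),   # server enters ESTAB on receiving this
    ]


for sender, state, segment in three_way_handshake(x=100, y=300):
    print(sender, state, segment)
```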
TCP 3-way handshake: FSM
  server side:
    closed -> listen
      Socket connectionSocket = welcomeSocket.accept();
    listen -> SYN rcvd
      on SYN(x): send SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
    SYN rcvd -> ESTAB
      on ACK(ACKnum=y+1)
  client side:
    closed -> SYN sent
      Socket clientSocket = newSocket("hostname", "port number");
      send SYN(seq=x)
    SYN sent -> ESTAB
      on SYNACK(seq=y, ACKnum=x+1): send ACK(ACKnum=y+1)
Transport Layer 3-78
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
  client state: ESTAB; server state: ESTAB
  client:
    clientSocket.close()
    send FINbit=1, seq=x
    client state: FIN_WAIT_1 (can no longer send but can receive data)
  server:
    send ACKbit=1, ACKnum=x+1
    server state: CLOSE_WAIT (can still send data)
    client state: FIN_WAIT_2 (wait for server close)
  server:
    send FINbit=1, seq=y
    server state: LAST_ACK (can no longer send data)
  client:
    send ACKbit=1, ACKnum=y+1
    client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
  server:
    server state: CLOSED
Transport Layer 3-80
Principles of congestion control
  congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control!
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  [Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ vs. $\lambda_{in}$ (both up to R/2) and delay vs. $\lambda_{in}$]
  maximum per-connection throughput: R/2
  large delays as arrival rate, $\lambda_{in}$, approaches capacity
Transport Layer 3-83
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
    transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
  [Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into finite shared output link buffers; Host B receives $\lambda_{out}$]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
  [Figure: plot of $\lambda_{out}$ vs. $\lambda_{in}$, both up to R/2. Host A keeps a copy and sends $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data plus retransmitted data) only when there is free buffer space in the finite shared output link buffers; Host B receives $\lambda_{out}$]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
  Idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
  [Figure: plot of $\lambda_{out}$ vs. $\lambda_{in}$, both up to R/2. Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once there is free buffer space; $\lambda_{in}$: original data, $\lambda'_{in}$: original data plus retransmitted data]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
  Realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt
  [Figure: plot of $\lambda_{out}$ vs. $\lambda_{in}$, both up to R/2. Host A times out and sends a second copy although the router has free buffer space; Host B receives both]
Transport Layer 3-89
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
  A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
  [Figure: Hosts A, B, C, D connected over multihop paths with finite shared output link buffers; $\lambda_{in}$: original data, $\lambda'_{in}$: original data plus retransmitted data, $\lambda_{out}$]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
  [Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to C/2]
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Transport Layer 3-91
Approaches towards congestion control
  two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at
Transport Layer 3-92
TCP Congestion Control: details
  sender limits transmission:
    $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
    $\text{rate} \approx \text{cwnd} / \text{RTT}$ bytes/sec
  [Figure: sender sequence number space. cwnd spans from last byte ACKed, over the sent, not-yet ACKed ("in-flight") bytes, to last byte sent]
Transport Layer 3-94
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
  [Figure: Host A / Host B timeline with one RTT marked; time runs downward]
Transport Layer 3-95
TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS;
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer 3-97
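Summary: TCP Congestion Control
  [Figure: congestion-control FSM; the transitions that survive in this transcript are listed below]
  new ACK (slow start):
    cwnd = cwnd + MSS
    dupACKcount = 0
    transmit new segment(s), as allowed
  cwnd > ssthresh:
    switch from slow start to congestion avoidance
  new ACK (congestion avoidance):
    cwnd = cwnd + MSS * (MSS/cwnd)
    dupACKcount = 0
    transmit new segment(s), as allowed
  duplicate ACK:
    cwnd = cwnd + MSS
    transmit new segment(s), as allowed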
TCP congestion control: additive increase multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
  AIMD saw tooth behavior: probing for bandwidth
    additively increase window size ... until loss occurs (then cut window in half)
  [Figure: cwnd (TCP sender congestion window size) vs. time, showing the saw tooth]
Transport Layer 3-99
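The interplay of slow start, congestion avoidance, and the two loss reactions can be traced with a small per-RTT simulation. The sketch below is deliberately coarse and rests on several assumptions: cwnd is measured in MSS and updated once per RTT rather than per ACK, slow-start growth is capped at ssthresh, fast recovery is omitted, and the function name `cwnd_trace` and its parameters are hypothetical.

```python
def cwnd_trace(num_rtts, dup_ack_loss=(), timeout_loss=(), ssthresh=8.0):
    """Return cwnd (in MSS) at the start of each RTT.

    dup_ack_loss / timeout_loss: RTT indices at which a loss is detected by
    3 duplicate ACKs (Reno-style halving) or by a timeout, respectively.
    """
    cwnd = 1.0
    trace = []
    for rtt in range(num_rtts):
        trace.append(cwnd)
        if rtt in timeout_loss:
            ssthresh = cwnd / 2   # remember half of the window at loss
            cwnd = 1.0            # timeout: restart from 1 MSS
        elif rtt in dup_ack_loss:
            ssthresh = cwnd / 2
            cwnd = cwnd / 2       # multiplicative decrease
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: double every RTT
        else:
            cwnd += 1.0           # congestion avoidance: +1 MSS per RTT
    return trace


print(cwnd_trace(10, dup_ack_loss={6}))
print(cwnd_trace(10, timeout_loss={6}))
```

With a triple-duplicate-ACK loss in RTT 6 the window drops from 11 to 5.5 MSS and resumes linear growth; with a timeout in the same RTT it falls back to 1 MSS and slow start begins again.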
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  $\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{\text{RTT}}$ bytes/sec
  [Figure: window size saw tooth oscillating between W/2 and W]
Transport Layer 3-100
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    $\text{TCP throughput} = \dfrac{1.22\cdot\text{MSS}}{\text{RTT}\cdot\sqrt{L}}$
  to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate!
  new versions of TCP for high-speed
Transport Layer 3-101
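Both numbers in this example can be checked with a few lines of Python. The function names below are hypothetical; the only inputs are the figures given above (1500-byte segments, 100 ms RTT, 10 Gbps), and the loss rate is obtained by solving the throughput formula for L.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS taken in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


W = required_window_segments(10e9, 0.100, 1500)
L = required_loss_rate(10e9, 0.100, 1500)
print(round(W), L)
```

The window comes out at about 83,333 segments and the loss rate at roughly 2.1e-10, which matches the rounded values quoted above.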
TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
  [Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Transport Layer 3-102
Why is TCP fair?
  two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
  [Figure: Connection 1 throughput on the horizontal axis (up to R), with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line]
Transport Layer 3-103
Fairness (more)
  Fairness and UDP
    multimedia apps often do not use TCP
      do not want rate throttled by congestion control
    instead use UDP:
      send audio/video at constant rate, tolerate packet loss
  Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
Transport Layer 3-104
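The parallel-connection arithmetic can be reproduced with a one-line helper. This sketch assumes the idealized fairness goal, that every TCP connection on the bottleneck gets an equal share; `new_app_share` is a hypothetical name, and the result is expressed as a fraction of R.

```python
from fractions import Fraction

def new_app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming all connections on the link share R equally."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(new_app_share(9, 1))    # one connection among ten
print(new_app_share(9, 11))   # eleven connections among twenty
```

The second case gives exactly 11/20 of R, which the example above rounds to R/2.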
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
  next:
    leaving the network "edge" (application, transport layers)
    into the network "core"
Transport Layer 3-105
Chapter 3 Transport LayerChapter 3 Transport Layerour goals our goals understand
principles behind learn about Internet
transport layer protocolsprinciples behind transport layer services
transport layer protocols UDP connectionless
transport multiplexing
demultiplexingli bl d f
TCP connection-oriented reliable transport TCP congestion control reliable data transfer
flow control congestion control
TCP congestion control
congestion control
Transport Layer 3-2
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-3
Transport services and protocolsTransport services and protocols provide logical communication
applicationtransportnetwork provide logical communication
between app processes running on different hosts
networkdata linkphysical
transport protocols run in end systems send side breaks app send side breaks app
messages into segments passes to network layer rcv side reassembles
segments into messages passes to app layer
applicationtransportnetworkdata link
h i lp pp y
more than one transport protocol available to apps
I TCP d UDP
physical
Transport Layer 3-4
Internet TCP and UDP
Transport vs network layerTransport vs network layer
network layer logical network layer logical communication between hosts 12 kids in Annrsquos house sending
l tt t 12 kid i Billrsquo
household analogy
between hosts transport layer
logical
letters to 12 kids in Bill s house
hosts = houses
gcommunication between processes relies on enhances
processes = kids app messages = letters in
envelopes relies on enhances
network layer services
p transport protocol = Ann
and Bill who demux to in-house siblings
network-layer protocol = postal service
Transport Layer 3-5
Internet transport-layer protocolsInternet transport layer protocols reliable in-order
applicationtransport
k reliable in order
delivery (TCP) congestion control
networkdata linkphysical
networkdata link
networkdata linkphysical
flow control connection setup
physicalnetworkdata linkphysical
network
unreliable unordered delivery UDP
f ill t i f
data linkphysical
networkdata linkphysical
network no-frills extension of ldquobest-effortrdquo IP
services not available
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
services not available delay guarantees bandwidth guarantees
p y
Transport Layer 3-6
g

rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
  (How do humans recover from "errors" during conversation?)
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender

rdt2.0: FSM specification
sender
  state "Wait for call from above"
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  state "Wait for ACK or NAK"
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in "Wait for ACK or NAK"
    rdt_rcv(rcvpkt) && isACK(rcvpkt): no action; go to "Wait for call from above"
receiver
  state "Wait for call from below"
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
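
The rdt2.0 mechanisms above (checksum, ACK/NAK feedback, retransmission on NAK) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the channel model `flip_one_bit`, the `error_rate` parameter, and the 2-byte checksum prefix are hypothetical choices, and ACK/NAK messages are assumed to arrive intact.

```python
import random


def checksum(data: bytes) -> int:
    """16-bit one's-complement sum, in the style of the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF


def make_pkt(data: bytes) -> bytes:
    """Packet layout (assumption): 2-byte checksum followed by the data."""
    return checksum(data).to_bytes(2, "big") + data


def corrupt(pkt: bytes) -> bool:
    return checksum(pkt[2:]) != int.from_bytes(pkt[:2], "big")


def flip_one_bit(pkt: bytes, rng: random.Random) -> bytes:
    """Hypothetical unreliable channel: flips one randomly chosen bit."""
    i = rng.randrange(len(pkt) * 8)
    out = bytearray(pkt)
    out[i // 8] ^= 1 << (i % 8)
    return bytes(out)


def rdt20_send(data: bytes, error_rate: float, rng: random.Random):
    """Stop-and-wait: resend the same packet until the receiver ACKs it."""
    sndpkt = make_pkt(data)
    attempts = 0
    while True:
        attempts += 1
        hit = rng.random() < error_rate
        rcvpkt = flip_one_bit(sndpkt, rng) if hit else sndpkt
        if corrupt(rcvpkt):
            continue                         # receiver sends NAK, sender retransmits
        return rcvpkt[2:], attempts          # receiver sends ACK, delivers data
```

Calling `rdt20_send(b"hello", 0.5, random.Random(0))` returns the delivered bytes together with the number of transmissions that were needed.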

Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet

Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window

Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
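
The Go-Back-N sender and receiver rules can be traced with a small round-based simulation. This is a simplified sketch, not the textbook FSM: the function name `gbn_transfer`, the `lost_first_try` input, and the assumption that ACKs are never lost are all hypothetical; each loop iteration stands for "send what the window allows, then either a cumulative ACK arrives or the timer expires".

```python
def gbn_transfer(num_pkts: int, window: int, lost_first_try: set) -> list:
    """Return the order in which the sender transmits sequence numbers.

    lost_first_try: seq #s whose first transmission is dropped by the channel.
    """
    base = 0          # oldest unACKed seq #
    nextseq = 0       # next seq # to send
    expected = 0      # receiver's expectedseqnum
    dropped = set()   # seq #s that have already been dropped once
    sent_log = []
    while base < num_pkts:
        # Sender: fill the window.
        in_flight = []
        while nextseq < base + window and nextseq < num_pkts:
            sent_log.append(nextseq)
            in_flight.append(nextseq)
            nextseq += 1
        # Channel and receiver.
        for seq in in_flight:
            if seq in lost_first_try and seq not in dropped:
                dropped.add(seq)
                continue                 # lost in the channel
            if seq == expected:
                expected += 1            # in-order: deliver and ACK
            # otherwise out-of-order: discard, re-ACK highest in-order pkt
        if expected > base:
            base = expected              # cumulative ACK slides the window
        else:
            nextseq = base               # timeout: go back and resend the window
    return sent_log
```

With `gbn_transfer(6, 4, {2})` the log is `[0, 1, 2, 3, 4, 5, 2, 3, 4, 5]`: packets 3, 4 and 5 arrive but are discarded because packet 2 is missing, so the sender resends all of them after the timeout.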

Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence number space]

Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed ...
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]: ...

TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
annotations:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
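
To make the segment layout concrete, the fixed 20-byte part of the header can be decoded with Python's standard `struct` module. This is a minimal sketch: the function name `parse_tcp_header` and the dictionary keys are illustrative choices, and the checksum is read but not verified.

```python
import struct

# Flag bits in the low six bits of the flags byte, from bit 0 upward.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header plus any options and data."""
    (src, dst, seq, ack, off_byte, flag_byte,
     rwnd, cksum, urg) = struct.unpack("!HHIIBBHHH", segment[:20])
    head_len = (off_byte >> 4) * 4           # header length field counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if flag_byte & (1 << bit)]
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "head_len": head_len, "flags": flags,
        "rwnd": rwnd, "checksum": cksum, "urg_ptr": urg,
        "options": segment[20:head_len],
        "data": segment[head_len:],
    }


# Example segment: header length 5 words, PSH and ACK set, one data byte.
example = struct.pack("!HHIIBBHHH", 12345, 23, 42, 79, 5 << 4, 0x18, 4096, 0, 0) + b"C"
parsed = parse_tcp_header(example)
```

The format string `"!HHIIBBHHH"` reads the fields in network byte order: two 16-bit ports, two 32-bit numbers, the header-length byte, the flags byte, then the 16-bit receive window, checksum and urgent pointer.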

TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]

TCP seq. numbers, ACKs: simple telnet scenario
Host A -> Host B: Seq=42, ACK=79, data = 'C'   (user types 'C')
Host B -> Host A: Seq=79, ACK=43, data = 'C'   (host ACKs receipt of 'C', echoes back 'C')
Host A -> Host B: Seq=43, ACK=80               (host ACKs receipt of echoed 'C')
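
The numbers in the telnet scenario follow from two counters per host: the sequence number of the next byte to send and the next byte expected from the peer. A minimal sketch, assuming a hypothetical `Endpoint` class and segments represented as plain dictionaries:

```python
class Endpoint:
    """Tracks the two byte counters one side of a TCP connection needs."""

    def __init__(self, next_seq: int, next_expected: int):
        self.next_seq = next_seq              # seq # of the next byte we send
        self.next_expected = next_expected    # next byte expected from the peer

    def send(self, data: bytes = b"") -> dict:
        seg = {"seq": self.next_seq, "ack": self.next_expected, "data": data}
        self.next_seq += len(data)            # seq numbers count bytes, not segments
        return seg

    def receive(self, seg: dict) -> None:
        if seg["seq"] == self.next_expected:  # in-order data advances the ACK point
            self.next_expected += len(seg["data"])


host_a = Endpoint(next_seq=42, next_expected=79)
host_b = Endpoint(next_seq=79, next_expected=42)

s1 = host_a.send(b"C")   # user types 'C'
host_b.receive(s1)
s2 = host_b.send(b"C")   # B ACKs 'C' and echoes it back
host_a.receive(s2)
s3 = host_a.send()       # A ACKs the echoed 'C'
host_b.receive(s3)
```

The three segments carry (Seq, ACK) = (42, 79), (79, 43) and (43, 80), matching the scenario above.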

TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds versus time in seconds]

timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  (estimated RTT plus "safety margin")
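
The three formulas above translate directly into a small estimator. This is a sketch under assumptions that the slides do not state: the class name `RttEstimator` is hypothetical, the first sample initializes EstimatedRTT with DevRTT set to half of it (the convention in RFC 6298), and DevRTT is updated before EstimatedRTT so that it uses the previous estimate.

```python
class RttEstimator:
    """EWMA-based RTT and timeout estimation (times in milliseconds)."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: initial value as in RFC 6298

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample (assumed ordering).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)
timeout = est.update(120.0)
```

After a first sample of 100 ms and a second of 120 ms, EstimatedRTT is 102.5 ms, DevRTT is 42.5 ms, and TimeoutInterval is 272.5 ms. Per the slide, samples from retransmitted segments should not be passed to `update`.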

TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #; gap detected
  receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  receiver action: immediate send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X); Host B answers with repeated ACK=100; before the timeout expires, Host A resends Seq=100, 20 bytes of data]
fast retransmit after sender receipt of triple duplicate ACK
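
The triple-duplicate-ACK rule amounts to a counter on the sender side. A minimal sketch, assuming a hypothetical helper `fast_retransmit_points` that scans a list of received ACK numbers; the threshold of 3 duplicates comes from the slide.

```python
def fast_retransmit_points(acks, dup_threshold: int = 3) -> list:
    """Return the ACK numbers at which a fast retransmit would fire."""
    last_ack = None
    dup_count = 0
    retransmits = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                # Resend the unACKed segment that starts at byte `ack`.
                retransmits.append(ack)
        else:
            last_ack = ack       # new data ACKed: reset the duplicate counter
            dup_count = 0
    return retransmits
```

For the ACK stream `[100, 100, 100, 100]` the first ACK is the original and the next three are duplicates, so the function returns `[100]`: the segment starting at byte 100 is resent without waiting for the timer.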

TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process above the OS boundary; TCP socket receiver buffers, TCP code and IP code below it; segments arrive from sender]

receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
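
A short sketch of the rwnd bookkeeping described above. The class `Receiver` and the helper `bytes_sender_may_send` are hypothetical names; the 4096-byte default is the typical RcvBuffer size mentioned on the slide.

```python
class Receiver:
    """Receiver-side buffer that advertises rwnd, its free space."""

    def __init__(self, rcv_buffer: int = 4096):
        self.rcv_buffer = rcv_buffer   # RcvBuffer size in bytes
        self.buffered = 0              # bytes waiting for the application

    @property
    def rwnd(self) -> int:
        return self.rcv_buffer - self.buffered

    def segment_arrives(self, nbytes: int) -> None:
        if nbytes > self.rwnd:
            raise OverflowError("receive buffer overflow")
        self.buffered += nbytes

    def app_reads(self, nbytes: int) -> int:
        nbytes = min(nbytes, self.buffered)
        self.buffered -= nbytes
        return nbytes


def bytes_sender_may_send(in_flight: int, rwnd: int) -> int:
    """The sender keeps unACKed (in-flight) data within the advertised rwnd."""
    return max(0, rwnd - in_flight)


rcv = Receiver()
first_burst = bytes_sender_may_send(0, rcv.rwnd)   # the whole buffer is free
rcv.segment_arrives(first_burst)
blocked = bytes_sender_may_send(0, rcv.rwnd)       # buffer is full, nothing may be sent
rcv.app_reads(1000)
reopened = rcv.rwnd                                # application read frees space
```

As long as the sender respects `bytes_sender_may_send`, `segment_arrives` never raises, which is the guarantee stated on the slide.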

Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application and network layers; each side holds connection state: ESTAB, and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
     SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     ACKbit=1, ACKnum=y+1
4. server: received ACK(y) indicates client is live

TCP 3-way handshake: FSM
client: closed -> SYN sent -> ESTAB
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: receive SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)
server: closed -> listen -> SYN rcvd -> ESTAB
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)
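
The handshake can be walked through as three dictionaries exchanged between two state variables. This is an illustrative sketch only: the function `three_way_handshake`, the dictionary field names and the state spellings are assumptions, and no real sockets are involved.

```python
def three_way_handshake(x: int, y: int):
    """Simulate the SYN, SYNACK, ACK exchange with initial seq numbers x and y."""
    client, server = "CLOSED", "LISTEN"

    # Step 1: client chooses x and sends SYN.
    syn = {"SYN": 1, "seq": x}
    client = "SYN_SENT"

    # Step 2: server chooses y and sends SYNACK, acking the SYN.
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": syn["seq"] + 1}
    server = "SYN_RCVD"

    # Step 3: client checks the SYNACK, which shows the server is live, and ACKs it.
    ack = None
    if synack["acknum"] == x + 1:
        ack = {"ACK": 1, "acknum": synack["seq"] + 1}
        client = "ESTAB"

    # Step 4: server checks the ACK, which shows the client is live.
    if ack is not None and ack["acknum"] == y + 1:
        server = "ESTAB"

    return client, server, [syn, synack, ack]


client_state, server_state, segments = three_way_handshake(x=1000, y=5000)
```

With x = 1000 and y = 5000 the SYNACK carries ACKnum 1001, the final ACK carries ACKnum 5001, and both sides end in ESTAB.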

TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
2. server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
4. client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$ and delay versus $\lambda_{in}$, each with $\lambda_{in}$ up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]

idealization: perfect knowledge
  sender sends only when router buffers available
  [Figure: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2; sender keeps a copy and transmits when there is free buffer space]

idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
  [Figure: $\lambda_{out}$ versus $\lambda'_{in}$, both up to R/2; packet dropped when no buffer space, copy resent when buffer space is free]

realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
  [Figure: $\lambda_{out}$ versus $\lambda'_{in}$, both up to R/2; timeout triggers a second copy while buffer space is free]

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
[Figure: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  $\text{rate} \approx \text{cwnd}/\text{RTT}$ bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd spans the in-flight region]

TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B timeline over successive RTTs]

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
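
The slow start, congestion avoidance and loss-reaction rules can be combined into a per-RTT trace of cwnd. This is a coarse sketch, not a TCP implementation: the function `cwnd_trace`, the `loss_events` input format and the initial `ssthresh` of 8 MSS are hypothetical, cwnd is counted in whole MSS, and slow start is capped at ssthresh so the switch to linear growth happens exactly at the threshold.

```python
def cwnd_trace(rounds: int, loss_events: dict, ssthresh: int = 8,
               variant: str = "reno") -> list:
    """Return cwnd (in MSS) at the start of each RTT round.

    loss_events maps a round index to "timeout" or "3dup".
    """
    cwnd = 1
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        loss = loss_events.get(r)
        if loss is None:
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
            else:
                cwnd += 1                        # congestion avoidance: +1 MSS per RTT
        else:
            ssthresh = max(cwnd // 2, 1)         # half of cwnd just before the loss
            if loss == "timeout" or variant == "tahoe":
                cwnd = 1                         # restart from slow start
            else:
                cwnd = ssthresh                  # Reno on 3 dup ACKs: cut in half
    return trace


reno = cwnd_trace(10, {5: "3dup"}, variant="reno")
tahoe = cwnd_trace(10, {5: "3dup"}, variant="tahoe")
```

With a triple duplicate ACK in round 5, Reno gives `[1, 2, 4, 8, 9, 10, 5, 6, 7, 8]` while Tahoe gives `[1, 2, 4, 8, 9, 10, 1, 2, 4, 5]`: Reno resumes linear growth from half the window, Tahoe goes back through slow start up to the new threshold.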
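
Summary: TCP Congestion Control
[Figure: FSM with slow start, congestion avoidance and fast recovery; the transitions recoverable from the transcript are listed below]
slow start, new ACK:
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK:
  cwnd = cwnd + MSS (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
fast recovery, duplicate ACK:
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed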

TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; additively increase window size ... until loss occurs (then cut window in half)]

TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \dfrac{3}{4}\cdot\dfrac{W}{\text{RTT}}$ bytes/sec
[Figure: sawtooth window size oscillating between W/2 and W]

TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22\cdot\text{MSS}}{\text{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
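
The two numbers in the example can be checked by plugging the slide's values into the formulas. A short sketch; the function names are hypothetical, and throughput is expressed in bits per second, so the MSS is converted from bytes to bits.

```python
import math


def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain the throughput: rate * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_prob: float) -> float:
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the throughput formula for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = window_segments(10e9, 0.100, 1500)
L = required_loss_rate(10e9, 0.100, 1500)
```

`W` comes out at about 83,333 segments and `L` at about 2.1e-10, consistent with the rounded figure of 2·10^-10 above.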

TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, each axis from 0 to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
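
The convergence argument can be checked numerically: additive increase leaves the gap between the two rates unchanged, and each multiplicative decrease halves it. A minimal sketch, assuming a hypothetical function `aimd_two_flows`, synchronized losses for both sessions, and rates in arbitrary units.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Two AIMD sessions sharing one bottleneck; both see loss when it is overloaded."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: additive increase
    return x1, x2


rate1, rate2 = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=500)
```

Starting from a very unequal split (1 versus 80), the two rates end up almost identical after 500 rounds, which is the movement toward the equal bandwidth share line described above.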

Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
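
The parallel-connection example is per-connection fair sharing applied to connection counts. A tiny sketch, assuming every TCP connection on the link gets an equal share; `app_share` is a hypothetical helper.

```python
from fractions import Fraction


def app_share(existing_conns: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


one_conn = app_share(9, 1)       # 1 of 10 connections
eleven_conns = app_share(9, 11)  # 11 of 20 connections
```

With 9 existing connections, one new connection gets 1/10 of R, while eleven new connections get 11/20 of R, slightly more than the R/2 quoted on the slide.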

Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-3
Transport services and protocolsTransport services and protocols provide logical communication
applicationtransportnetwork provide logical communication
between app processes running on different hosts
networkdata linkphysical
transport protocols run in end systems send side breaks app send side breaks app
messages into segments passes to network layer rcv side reassembles
segments into messages passes to app layer
applicationtransportnetworkdata link
h i lp pp y
more than one transport protocol available to apps
I TCP d UDP
physical
Transport Layer 3-4
Internet TCP and UDP
Transport vs network layerTransport vs network layer
network layer logical network layer logical communication between hosts 12 kids in Annrsquos house sending
l tt t 12 kid i Billrsquo
household analogy
between hosts transport layer
logical
letters to 12 kids in Bill s house
hosts = houses
gcommunication between processes relies on enhances
processes = kids app messages = letters in
envelopes relies on enhances
network layer services
p transport protocol = Ann
and Bill who demux to in-house siblings
network-layer protocol = postal service
Transport Layer 3-5
Internet transport-layer protocolsInternet transport layer protocols reliable in-order
applicationtransport
k reliable in order
delivery (TCP) congestion control
networkdata linkphysical
networkdata link
networkdata linkphysical
flow control connection setup
physicalnetworkdata linkphysical
network
unreliable unordered delivery UDP
f ill t i f
data linkphysical
networkdata linkphysical
network no-frills extension of ldquobest-effortrdquo IP
services not available
applicationtransportnetworkdata linkphysical
networkdata linkphysical
networkdata linkphysical
services not available delay guarantees bandwidth guarantees
p y
Transport Layer 3-6
g
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors
k l d (ACK ) i li i l ll d acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK negative acknowledgements (NAKs) receiver explicitly tells g g ( ) p y
sender that pkt had errors sender retransmits pkt on receipt of NAK
new mechanisms in rdt2 0 (beyond rdt1 0)How do humans recover from ldquoerrorsrdquo
new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-
during conversationg ( )
gtsender
Transport Layer 3-27
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells sender
that pkt received OKti k l d t (NAK ) i li itl t ll negative acknowledgements (NAKs) receiver explicitly tells
sender that pkt had errors sender retransmits pkt on receipt of NAKp p
new mechanisms in rdt20 (beyond rdt10) error detection feedback control msgs (ACKNAK) from receiver to
sender
Transport Layer 3-28
rdt20 FSM specificationp
sndpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
sndpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
header layout, 32 bits per row:
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
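To make the layout concrete, the following sketch decodes the fixed 20-byte part of a TCP header with Python's standard `struct` module. The function name `parse_tcp_header`, the dictionary keys, and the hand-built sample segment are assumptions for illustration; options are skipped rather than parsed.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    (src, dst, seq, ack, off_byte, flags, rwnd, checksum, urg) = struct.unpack(
        "!HHIIBBHHH", segment[:20]
    )
    header_len = (off_byte >> 4) * 4       # head len field counts 32-bit words
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg,
        "payload": segment[header_len:],   # everything after header and options
    }


# Hand-built sample: port 12345 -> 80, Seq=42, ACK=79, ACK flag set, 1 data byte
raw = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x10, 4096, 0, 0) + b"C"
hdr = parse_tcp_header(raw)
```

The checksum is left at zero in the sample; verifying it would also need the IP pseudo-header, which is outside this sketch.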
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs: simple telnet scenario
Host A to Host B: user types 'C'
  Seq=42, ACK=79, data = 'C'
Host B to Host A: host ACKs receipt of 'C', echoes back 'C'
  Seq=79, ACK=43, data = 'C'
Host A to Host B: host ACKs receipt of echoed 'C'
  Seq=43, ACK=80
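The numbers in the telnet scenario follow mechanically from the byte-counting rule. A minimal sketch, assuming segments are represented as plain dictionaries and using a hypothetical helper `reply`:

```python
def reply(segment, data=b""):
    """Build the reply to a received segment.

    seq: continues from what the peer last ACKed (its ACK field).
    ack: next byte expected from the peer (its seq plus the bytes it sent).
    """
    return {
        "seq": segment["ack"],
        "ack": segment["seq"] + len(segment["data"]),
        "data": data,
    }


a_to_b = {"seq": 42, "ack": 79, "data": b"C"}   # user types 'C'
b_to_a = reply(a_to_b, data=b"C")               # B ACKs 'C' and echoes it
a_ack = reply(b_to_a)                           # A ACKs the echoed 'C'
```

Each side advances its ACK number by the number of data bytes received, which is why one byte of data moves 42 to 43 and 79 to 80.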
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds); curves: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
(typically, $\beta = 0.25$)
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
  first term: estimated RTT; second term: "safety margin"
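The estimator and the timeout rule fit in a few lines. This is a sketch under assumptions: the typical weights 0.125 and 0.25 are used as defaults, the function name `update_rtt` is made up, and the deviation is computed against the freshly updated EstimatedRTT (RFC 6298 instead updates the deviation first, using the old estimate).

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """One update step; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # exponential weighted moving average of the samples
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    # smoothed deviation of the sample from the estimate
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    # estimate plus a safety margin of four deviations
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout


# values in milliseconds: previous estimate 100, deviation 5, new sample 120
est, dev, timeout = update_rtt(100.0, 5.0, 120.0)
```

With these inputs the estimate moves only from 100 to 102.5 ms, while the deviation term grows, so the timeout widens more than the estimate itself.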
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
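The three sender events (data from the application, timeout, ACK received) can be sketched together as a small class. It is only an illustration: the timer is a boolean flag rather than a real clock, "passing to IP" is modelled by appending to a list, and names such as `SimplifiedTcpSender`, `wire`, and `unacked` are assumptions, not part of TCP.

```python
class SimplifiedTcpSender:
    """Sketch of the simplified sender: one timer, cumulative ACKs, no windows."""

    def __init__(self, initial_seq=0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq   # oldest unACKed byte
        self.unacked = {}              # seq number -> data, awaiting ACK
        self.timer_running = False
        self.wire = []                 # (seq, data) pairs handed to IP

    def on_data_from_app(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))          # pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True          # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)            # unACKed segment with smallest seq
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True          # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:           # ACKs previously unACKed data
            self.send_base = ack_num
            done = [s for s in self.unacked if s + len(self.unacked[s]) <= ack_num]
            for seq in done:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


s = SimplifiedTcpSender(initial_seq=92)
s.on_data_from_app(b"x" * 8)     # Seq=92, 8 bytes
s.on_data_from_app(b"y" * 20)    # Seq=100, 20 bytes
s.on_ack(100)                    # first segment acknowledged
s.on_timeout()                   # second segment is retransmitted
```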
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[figure: Host A sends to Host B
  Seq=92, 8 bytes of data
  Seq=100, 20 bytes of data (lost, marked X)
  Host B returns ACK=100 repeatedly (the original ACK and its duplicates)
  Host A resends Seq=100, 20 bytes of data, before the timeout expires]
fast retransmit after sender receipt of triple duplicate ACK
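The duplicate-ACK trigger amounts to a counter that resets whenever a new ACK number arrives. A minimal sketch, with the class name `DupAckDetector` chosen only for this example:

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals fast retransmit on the third one."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the seq number to fast-retransmit, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                return ack_num         # resend the segment starting at this byte
        else:
            self.last_ack = ack_num    # new ACK: start counting again
            self.dup_count = 0
        return None


detector = DupAckDetector()
# ACK=100 arrives four times: one original plus three duplicates
actions = [detector.on_ack(100) for _ in range(4)]
```

Only the fourth arrival returns a sequence number, because the first ACK=100 is the original and the next three are the duplicates.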
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: segments arrive from sender, pass through IP code and TCP code in the OS into the TCP socket receiver buffers, and are read by the application process]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
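A short sketch of the arithmetic behind rwnd may help. The byte counters `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked` are bookkeeping variables assumed for this example (they do not appear on the slide), and both function names are hypothetical.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # data waiting for the application
    return rcv_buffer - buffered


def sender_can_send(last_byte_sent, last_byte_acked, rwnd):
    """Bytes the sender may still put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
allowed = sender_can_send(last_byte_sent=3500, last_byte_acked=3000, rwnd=rwnd)
```

Here 2000 bytes are still buffered at the receiver, so it advertises 2096 bytes; with 500 bytes already in flight the sender may add at most 1596 more.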
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: on each side, an application above the network, holding
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg
  segment: SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  segment: ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB
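The field values exchanged in the handshake can be written out as a small sketch. Segments are modelled as dictionaries, the initial sequence numbers are arbitrary, 32-bit wraparound is ignored, and the function names are assumptions for this example.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments for initial seq numbers x and y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return syn, synack, ack


def handshake_ok(syn, synack, ack):
    """Each side checks that the peer acknowledged its own initial seq number."""
    client_ok = synack["acknum"] == syn["seq"] + 1    # server is live
    server_ok = ack["acknum"] == synack["seq"] + 1    # client is live
    return client_ok and server_ok


syn, synack, ack = three_way_handshake(x=1000, y=5000)
established = handshake_ok(syn, synack, ack)
```

The SYN consumes one sequence number even though it carries no data, which is why the acknowledgement numbers are x+1 and y+1.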
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close()
  segment: FINbit=1, seq=x
  client state: FIN_WAIT_1 (can no longer send but can receive data)
server:
  segment: ACKbit=1, ACKnum=x+1
  server state: CLOSE_WAIT (can still send data)
client state: FIN_WAIT_2 (wait for server close)
server:
  segment: FINbit=1, seq=y
  server state: LAST_ACK (can no longer send data)
client:
  segment: ACKbit=1, ACKnum=y+1
  client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; throughput: $\lambda_{out}$]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
[plots: $\lambda_{out}$ versus $\lambda_{in}$, axes up to R/2; delay versus $\lambda_{in}$, axis up to R/2]
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[figure: Host A, Host B, finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[plot: $\lambda_{out}$ versus $\lambda_{in}$, axes up to R/2]
[figure: Host A keeps a copy of each packet; free buffer space at the router; finite shared output link buffers; Host B]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[plot: $\lambda_{out}$ versus offered load, axes up to R/2]
[figure: Host A keeps a copy of each packet; the packet is dropped when there is no buffer space at the router and the copy is resent once buffer space is free; Host B]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[plot: $\lambda_{out}$ versus offered load, axes up to R/2]
[figure: a timeout at Host A triggers a second copy while the router still has free buffer space; Host B]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Host A, Host B, Host C, Host D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 3
[plot: $\lambda_{out}$ versus $\lambda'_{in}$, axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  $\text{rate} \approx \dfrac{\mathrm{cwnd}}{\mathrm{RTT}}$ bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; the in-flight span is bounded by cwnd]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A sends one, then two, then four segments to Host B in successive RTTs; vertical axis: time]
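The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be checked with a short simulation. It is a sketch under simplifying assumptions: no loss, cwnd counted in whole segments (MSS = 1), every segment ACKed within the same RTT, and a hypothetical function name `slow_start`.

```python
MSS = 1  # count cwnd in segments for readability


def slow_start(rtts, cwnd=1 * MSS):
    """Return cwnd at the start of each RTT, assuming no loss."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // MSS        # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += MSS           # +1 MSS per ACK doubles cwnd per RTT
    return history


growth = slow_start(5)
```

Because each of the cwnd segments sent in one RTT produces one ACK, cwnd gains cwnd segments per RTT, which is exactly a doubling.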
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
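The two reactions to loss can be compared side by side in a small sketch. The MSS value of 1460 bytes, the event and variant labels, and the function name `on_loss` are assumptions for illustration; the Reno branch follows the slide's "cut in half" rule and leaves out the extra window inflation that real fast recovery adds.

```python
MSS = 1460  # bytes; an assumed segment size


def on_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event: "timeout" or "3dupacks"; variant: "reno" or "tahoe".
    """
    ssthresh = cwnd // 2                  # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1 * MSS, ssthresh          # restart from 1 MSS, slow start to ssthresh
    return ssthresh, ssthresh             # Reno on 3 dup ACKs: halve, then grow linearly


after_timeout = on_loss(16 * MSS, "timeout")
after_dupacks_reno = on_loss(16 * MSS, "3dupacks", variant="reno")
after_dupacks_tahoe = on_loss(16 * MSS, "3dupacks", variant="tahoe")
```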
Summary: TCP Congestion Control
[FSM figure; only the following transition labels survive]
new ACK (slow start):
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
cwnd > ssthresh: switch from slow start to congestion avoidance
new ACK (congestion avoidance):
  cwnd = cwnd + MSS * (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
duplicate ACK (fast recovery):
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[figure: cwnd (TCP sender congestion window size) versus time; additively increase window size ... until loss occurs (then cut window in half)]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \dfrac{3}{4}\,\dfrac{W}{\mathrm{RTT}}$ bytes/sec
[figure: window size sawtooth between W/2 and W]
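The 3/4 factor comes from averaging a window that ramps linearly from W/2 up to W. A minimal sketch of that arithmetic, with a hypothetical function name and made-up sample values:

```python
def avg_tcp_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec for a sawtooth between W/2 and W."""
    avg_window = (w_bytes / 2 + w_bytes) / 2   # midpoint of the ramp = 3/4 W
    return avg_window / rtt_s


# assumed sample values: W = 100,000 bytes, RTT = 0.25 s
thruput = avg_tcp_throughput(100_000, 0.25)
```

The midpoint shortcut is valid only because additive increase makes the window grow linearly between losses.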
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \dfrac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
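The figures in this example can be reproduced by inverting the throughput formula. The sketch below assumes MSS is converted to bits so that throughput comes out in bits per second; the function names are made up for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput in bits/sec: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(mss_bytes, rtt_s, target_bps):
    """Loss probability L that yields target_bps under the same formula."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


def required_window_segments(mss_bytes, rtt_s, target_bps):
    """Segments that must be in flight to fill the pipe: rate * RTT / MSS."""
    return target_bps * rtt_s / (mss_bytes * 8)


L = required_loss(1500, 0.100, 10e9)             # roughly 2e-10
W = required_window_segments(1500, 0.100, 10e9)  # roughly 83,333 segments
```

A loss probability near 2e-10 corresponds to about one lost segment in five billion, which is why the slide calls it a very small loss rate.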
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: throughput of the two connections plotted against each other, each axis up to R, with Connection 1 throughput on the horizontal axis; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", converging on the equal bandwidth share line]
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
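The parallel-connection example is a per-connection share calculation. A minimal sketch, assuming every TCP connection on the link ends up with an equal share of R; the function name `share` is made up.

```python
from fractions import Fraction


def share(new_conns, existing_conns):
    """Fraction of link rate R the new app gets with equal per-connection shares."""
    return Fraction(new_conns, new_conns + existing_conns)


one = share(1, 9)       # 1 of 10 connections
eleven = share(11, 9)   # 11 of 20 connections
```

With 11 connections the new application holds 11/20 of R, slightly more than the R/2 quoted as a round figure.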
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender events
data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running
  - think of timer as for oldest unacked segment
  - expiration interval: TimeOutInterval
timeout:
- retransmit segment that caused timeout
- restart timer
ack rcvd:
- if ack acknowledges previously unacked segments
  - update what is known to be ACKed
  - start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
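The three sender events (data from app, timeout, ACK received) can be sketched as a small class. This is an illustrative model only: the class and attribute names are hypothetical, the `wire` list stands in for handing segments to IP, the timer is a boolean rather than a real clock, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified sender: no dup ACKs, no flow/congestion control."""

    def __init__(self, initial_seq=0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq      # oldest unacked byte
        self.unacked = {}                 # seq # -> payload awaiting an ACK
        self.timer_running = False
        self.wire = []                    # log of (seq #, payload) passed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))     # "send"
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer

    def on_timeout(self):
        if not self.unacked:
            self.timer_running = False
            return
        oldest = min(self.unacked)        # segment that caused the timeout
        self.wire.append((oldest, self.unacked[oldest]))
        self.timer_running = True         # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:      # acknowledges previously unacked data
            self.send_base = ack_num
            covered = [s for s in self.unacked
                       if ack_num >= s + len(self.unacked[s])]
            for seq in covered:
                del self.unacked[seq]
            # keep the timer only while unacked segments remain
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"A" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"B" * 20)   # Seq=100, 20 bytes
sender.on_ack(100)              # first segment acknowledged
sender.on_timeout()             # resends the oldest unacked segment (Seq=100)
```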
event at receiver -> TCP receiver action
- arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
- arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
- arrival of out-of-order segment, higher-than-expect seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
- arrival of segment that partially or completely fills gap
  -> immediate send ACK, provided that segment starts at lower end of gap
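The four rows of the table can be read as a decision function. The sketch below is one possible encoding under assumptions: the function name and its flags are hypothetical, and the last branch (a segment below the expected seq #) is not covered by the table, so re-ACKing it is an assumed behavior.

```python
def ack_action(seg_seq, expected_seq, ack_pending=False, gap_exists=False):
    """Choose the receiver's response to one arriving segment."""
    if seg_seq > expected_seq:
        # Out-of-order arrival: a gap is detected.
        return "send duplicate ACK immediately"
    if seg_seq == expected_seq and gap_exists:
        # Segment starts at the lower end of the gap.
        return "send ACK immediately"
    if seg_seq == expected_seq and ack_pending:
        # One ACK covers both in-order segments.
        return "send cumulative ACK immediately"
    if seg_seq == expected_seq:
        return "delay ACK up to 500 ms"
    # Assumption: already-received data is simply re-ACKed.
    return "send duplicate ACK immediately"
```

For example, `ack_action(120, 100)` models a segment arriving beyond a hole at byte 100, which triggers a duplicate ACK for byte 100.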
TCP fast retransmit
- time-out period often relatively long:
  - long delay before resending lost packet
- detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  - likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); the Seq=100 segment is lost (X). Host B keeps returning ACK=100; Host A then resends Seq=100 (20 bytes of data) while the timeout is still pending.]
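The duplicate-ACK rule can be sketched as a counter on the sender side. The class name is hypothetical, and the ACK sequence fed to it below is one reading of the figure (one new ACK followed by three duplicates).

```python
class DupAckCounter:
    """Counts duplicate ACKs and signals when fast retransmit should fire."""

    def __init__(self, send_base):
        self.send_base = send_base   # oldest unacked byte
        self.dup_acks = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit now, or None."""
        if ack_num > self.send_base:
            # New data acknowledged: advance and reset the counter.
            self.send_base = ack_num
            self.dup_acks = 0
            return None
        if ack_num == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Third duplicate: resend the unacked segment with smallest seq #.
                return self.send_base
        return None


counter = DupAckCounter(send_base=92)
decisions = [counter.on_ack(100) for _ in range(4)]
```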
TCP flow control
- application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
- flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code (OS) into the TCP socket receiver buffers, which the application process reads]
TCP flow control
- receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- sender limits amount of unacked ("in-flight") data to receiver's rwnd value
- guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which is split into buffered data and free buffer space (rwnd); buffered data is passed to the application process]
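A minimal sketch of both sides of flow control. The function names are hypothetical, and `last_byte_rcvd` / `last_byte_read` are assumed bookkeeping variables (not shown on the slide) used to express how much of RcvBuffer is occupied.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: rwnd is whatever part of RcvBuffer is still free."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """Sender side: keep in-flight data within the receiver's rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return rwnd >= in_flight + nbytes


# 4096-byte buffer with 2000 bytes waiting for the application.
rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```

With 1000 bytes already in flight, the sender could add another 1000 bytes but not 1200, since that would exceed the advertised 2096 bytes.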
Connection Management
before exchanging data, sender/receiver "handshake":
- agree to establish connection (each knowing the other willing to establish connection)
- agree on connection parameters
[Figure: client and server, each with application above network, each holding connection state: ESTAB and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
Socket clientSocket = newSocket("hostname", "port number");
Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
[Figure: message exchange with client state and server state columns, both starting in LISTEN]
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state: SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state: SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state: ESTAB
4. server: received ACK(y) indicates client is live; server state: ESTAB
TCP 3-way handshake: FSM
[Figure: finite state machine, both sides starting in closed]
server side:
- closed -> listen: Socket connectionSocket = welcomeSocket.accept();
- listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
- SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
- closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
- SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
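The handshake can be traced with a short simulation that tracks both endpoints' states and the numbers carried in each message. Everything here is a sketch: the function name, the dictionary message format, and the state labels are hypothetical, and real TCP does this inside the operating system.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit values


def three_way_handshake(x, y):
    """Trace SYN, SYNACK, ACK for initial sequence numbers x (client), y (server)."""
    client, server = "CLOSED", "LISTEN"
    log = []

    syn = {"SYN": 1, "seq": x}
    client = "SYN_SENT"
    log.append(("client->server", syn))

    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    server = "SYN_RCVD"
    log.append(("server->client", synack))

    if synack["acknum"] != (x + 1) % SEQ_SPACE:
        raise ValueError("SYNACK does not acknowledge our SYN")
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    client = "ESTAB"                      # SYNACK shows the server is live
    log.append(("client->server", ack))

    if ack["acknum"] == (y + 1) % SEQ_SPACE:
        server = "ESTAB"                  # ACK shows the client is live
    return client, server, log


client_state, server_state, messages = three_way_handshake(x=1000, y=5000)
```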
TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled
TCP: closing a connection
[Figure: message exchange with client state and server state columns, both starting in ESTAB]
1. client: clientSocket.close(); sends FINbit=1, seq=x; client state: FIN_WAIT_1 (can no longer send but can receive data)
2. server: sends ACKbit=1, ACKnum=x+1; server state: CLOSE_WAIT (can still send data); client state: FIN_WAIT_2 (wait for server close)
3. server: sends FINbit=1, seq=y; server state: LAST_ACK (can no longer send data)
4. client: sends ACKbit=1, ACKnum=y+1; client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server state: CLOSED
Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- output link capacity: R
- no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Two plots against $\lambda_{in}$ (up to $R/2$): $\lambda_{out}$ and delay]
- maximum per-connection throughput: $R/2$
- large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  - transport-layer input includes retransmissions: $\lambda'_{in} \geq \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) into finite shared output link buffers; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
- sender sends only when router buffers available
[Figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both up to $R/2$. Host A keeps a copy of each packet and sends $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data, plus retransmitted data) into finite shared output link buffers with free buffer space; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 2
Idealization: known loss
- packets can be lost, dropped at router due to full buffers
- sender only resends if packet known to be lost
- when sending at $R/2$, some packets are retransmissions but asymptotic goodput is still $R/2$ (why?)
[Figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both up to $R/2$. Host A keeps a copy and sends $\lambda_{in}$ (original data), $\lambda'_{in}$ (original data, plus retransmitted data); the router first has no buffer space, later free buffer space; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 2
Realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at $R/2$, some packets are retransmissions, including duplicated that are delivered!
[Figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both axes marked $R/2$. Host A times out and resends its copy although the router has free buffer space; Host B receives both]
"costs" of congestion:
- unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D connected over multihop paths through finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked $C/2$]
another "cost" of congestion:
- when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at
TCP Congestion Control: details
- sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
- rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space. cwnd spans the bytes sent but not-yet ACKed ("in-flight"), from last byte ACKed to last byte sent]
TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B with one RTT marked; time runs downward]
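A short sketch of why a per-ACK increment doubles cwnd every RTT. The MSS value, the function name, and the idealized model (every segment sent in a round is ACKed within that round, no loss) are assumptions.

```python
MSS = 1460  # assumed segment size in bytes


def slow_start_rounds(rounds, mss=MSS):
    """Return cwnd (bytes) at the start of each RTT during slow start."""
    cwnd = mss                       # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss           # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss              # increment cwnd for every ACK received
        history.append(cwnd)
    return history


cwnd_in_segments = [c // MSS for c in slow_start_rounds(4)]
```

Each RTT, the number of ACKs equals the number of segments in the window, so adding one MSS per ACK adds a full window: 1, 2, 4, 8, 16 segments.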
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half; window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
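The loss reactions can be compared side by side. This sketch follows the slide's description only: the function name and event labels are hypothetical, and setting the threshold to half of cwnd is taken from the ssthresh rule on the slow-start switching slide.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) after a loss event."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd / 2                   # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh              # restart from 1 MSS
    return cwnd / 2, ssthresh             # Reno on 3 dup ACKs: cut cwnd in half


reno_timeout = on_loss(16000, 1000, "timeout")
reno_dup_acks = on_loss(16000, 1000, "triple_dup_ack")
tahoe_dup_acks = on_loss(16000, 1000, "triple_dup_ack", variant="tahoe")
```

Only Reno's response to duplicate ACKs keeps the window at half its previous size; the other two cases fall back to 1 MSS and grow exponentially up to the threshold.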
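Summary: TCP Congestion Control
[Figure: congestion-control FSM; only these transition labels survive]
- new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- cwnd > ssthresh: leave slow start for congestion avoidance
- new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed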
TCP congestion control: additive increase multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. y-axis: cwnd, TCP sender congestion window size; x-axis: time; additively increase window size ... until loss occurs (then cut window in half)]
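The saw tooth can be reproduced with a toy loop. The model is an assumption made for illustration: cwnd is counted in whole MSS units, and a loss is declared whenever cwnd reaches a fixed `loss_at` value, which stands in for the unknown available bandwidth.

```python
def aimd_trace(rounds, loss_at, start, mss=1):
    """Return cwnd (in MSS units) at the start of each RTT."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease after loss
        else:
            cwnd += mss                  # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=25, loss_at=20, start=10)
```

The trace climbs from 10 to 20 one MSS at a time, drops back to 10, and repeats, which is the saw tooth shape in the figure.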
TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
avg TCP thruput = $\frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W]
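The 3/4 factor follows from the saw tooth, assuming the window grows linearly from W/2 (just after a loss) to W (just before the next loss), so its time average is the midpoint of the two values:

```latex
\[
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{W}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
\]
```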
TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = $\dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
- new versions of TCP for high-speed
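The two numbers on this slide can be checked directly. The function names are hypothetical; the only formulas used are rate = W * MSS / RTT and the throughput expression above, solved for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


segments = window_segments(rate_bps=10e9, rtt_s=0.1, mss_bytes=1500)
loss = required_loss_rate(rate_bps=10e9, rtt_s=0.1, mss_bytes=1500)
```

With these inputs, `segments` is about 83,333 and `loss` is about 2.1e-10, matching the slide's rounded figures.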
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
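The convergence argument can be tried numerically. The sketch assumes an idealized setting not stated on the slide: both sessions have the same RTT, both see every loss at the same moment, and a loss occurs exactly when their combined rate exceeds the link capacity. The function name and starting rates are made up.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two competing AIMD rates and return their final values."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase keeps the gap unchanged
    return x1, x2


rate1, rate2 = two_flow_aimd(10.0, 70.0, capacity=100.0, rounds=500)
```

The gap between the two rates starts at 60 and is halved at every loss event, so the pair drifts toward the equal bandwidth share line.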
Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2
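The parallel-connection example is simple arithmetic, assuming the link is split equally per connection. The function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


one_connection = app_share(9, 1)        # 1 of 10 connections
eleven_connections = app_share(9, 11)   # 11 of 20 connections
```

Under equal per-connection sharing, 11 connections out of 20 give 11/20 of R, which the slide rounds to R/2.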
Chapter 3: summary
- principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation, implementation in the Internet
  - UDP
  - TCP
next:
- leaving the network "edge" (application, transport layers)
- into the network "core"
Transport vs. network layer
- network layer: logical communication between hosts
- transport layer: logical communication between processes
  - relies on, enhances, network layer services
household analogy:
12 kids in Ann's house sending letters to 12 kids in Bill's house:
- hosts = houses
- processes = kids
- app messages = letters in envelopes
- transport protocol = Ann and Bill who demux to in-house siblings
- network-layer protocol = postal service
Internet transport-layer protocols
- reliable, in-order delivery (TCP)
  - congestion control
  - flow control
  - connection setup
- unreliable, unordered delivery: UDP
  - no-frills extension of "best-effort" IP
- services not available:
  - delay guarantees
  - bandwidth guarantees
[Figure: protocol stacks along a path. The two end systems run application, transport, network, data link, physical; the nodes in between run only network, data link, physical]
rdt2.0: channel with bit errors
- underlying channel may flip bits in packet
  - checksum to detect bit errors
- the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
- new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - feedback: control msgs (ACK,NAK) from receiver to sender
How do humans recover from "errors" during conversation?
rdt2.0: FSM specification
sender:
- state "Wait for call from above":
  rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
- state "Wait for ACK or NAK":
  rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && isACK(rcvpkt): go to "Wait for call from above"
receiver:
- rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
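A compact stop-and-wait sketch of the rdt2.0 idea: the sender keeps resending until the receiver's checksum test passes. Several parts are assumptions made for illustration: the one-byte additive checksum, the `flip_bit` channel model that damages the first few attempts, and the premise that ACK/NAK feedback itself always arrives intact.

```python
def checksum(data):
    """Toy checksum: sum of the bytes modulo 256 (assumed, not the Internet checksum)."""
    return sum(data) % 256


def make_pkt(data):
    return {"data": data, "checksum": checksum(data)}


def corrupt(pkt):
    return checksum(pkt["data"]) != pkt["checksum"]


def flip_bit(pkt):
    """Channel error: flip the lowest bit of the first payload byte."""
    damaged = bytearray(pkt["data"])
    damaged[0] ^= 0x01
    return {"data": bytes(damaged), "checksum": pkt["checksum"]}


def rdt20_send(data, corrupt_first_n=0):
    """Deliver one non-empty message; the first corrupt_first_n attempts are damaged."""
    sndpkt = make_pkt(data)
    attempts = 0
    while True:
        attempts += 1
        rcvpkt = flip_bit(sndpkt) if corrupt_first_n >= attempts else sndpkt
        # Receiver: NAK a corrupt packet, otherwise deliver and ACK.
        feedback = "NAK" if corrupt(rcvpkt) else "ACK"
        # Sender: retransmit on NAK, finish on ACK.
        if feedback == "ACK":
            return rcvpkt["data"], attempts


delivered, attempts = rdt20_send(b"hello", corrupt_first_n=2)
```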
Go-back-N:
- sender can have up to N unacked packets in pipeline
- receiver only sends cumulative ack
  - doesn't ack packet if there's a gap
- sender has timer for oldest unacked packet
  - when timer expires, retransmit all unacked packets
Selective Repeat:
- sender can have up to N unack'ed packets in pipeline
- rcvr sends individual ack for each packet
- sender maintains timer for each unacked packet
  - when timer expires, retransmit only that unacked packet
Go-Back-N: sender
- k-bit seq # in pkt header
- "window" of up to N, consecutive unack'ed pkts allowed
- ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  - may receive duplicate ACKs (see receiver)
- timer for oldest in-flight pkt
- timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
- ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- out-of-order pkt:
  - discard (don't buffer): no receiver buffering!
  - re-ACK pkt with highest in-order seq #
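The receiver rule and the sender's timeout rule are small enough to sketch directly. The function names are hypothetical, sequence numbers are plain integers (the k-bit wraparound is ignored), and an ACK value of -1 is an assumed convention for "nothing received in order yet".

```python
def gbn_receiver(arrivals, expected=0):
    """Return the ACK number sent in response to each arriving seq #."""
    acks = []
    for seq in arrivals:
        if seq == expected:
            expected += 1            # in-order pkt: deliver it
        # out-of-order pkts are discarded, not buffered
        acks.append(expected - 1)    # (re-)ACK highest in-order seq #
    return acks


def gbn_timeout_resend(base, next_seq):
    """Seq #s retransmitted when the timer for pkt `base` expires."""
    return list(range(base, next_seq))


acks = gbn_receiver([0, 1, 3, 4, 2])
resent = gbn_timeout_resend(base=2, next_seq=5)
```

Packets 3 and 4 arrive while 2 is missing, so they are dropped and ACK 1 is repeated; on timeout the sender resends 2, 3 and 4.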
Selective repeat
- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #'s
  - limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure: sender and receiver windows]
Selective repeat
sender:
- data from above:
  - if next available seq # in window, send pkt
- timeout(n):
  - resend pkt n, restart timer
- ACK(n) in [sendbase,sendbase+N]:
  - mark pkt n as received
  - if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
- pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n in [rcvbase-N, rcvbase-1]:
  - ACK(n)
- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream:
  - no "message boundaries"
- pipelined:
  - TCP congestion and flow control set window size
- full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) inits sender, receiver state before data exchange
- flow controlled:
  - sender will not overwhelm receiver
TCP segment structure
[Figure: segment layout, 32 bits wide]
- source port #, dest port #
- sequence number
- acknowledgement number
- head len, not used, flag bits U A P R S F, receive window
- checksum, Urg data pointer
- options (variable length)
- application data (variable length)
field notes:
- sequence number, acknowledgement number: counting by bytes of data (not segments!)
- receive window: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
TCP seq. numbers, ACKs
sequence numbers:
- byte stream "number" of first byte in segment's data
acknowledgements:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
[Figure: simple telnet scenario between Host A and Host B]
1. User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
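The numbers in the telnet exchange follow one rule: the reply's seq # is the ACK number just received, and its ACK number is the received seq # plus the number of data bytes received. A minimal sketch, with a hypothetical function name and wraparound ignored:

```python
def reply_numbers(rcvd_seq, rcvd_ack, rcvd_len):
    """Return (seq, ack) for the segment sent in response to a received one."""
    seq = rcvd_ack                  # continue our own byte stream where the peer expects it
    ack = rcvd_seq + rcvd_len       # next byte we expect from the peer
    return seq, ack


host_b_reply = reply_numbers(rcvd_seq=42, rcvd_ack=79, rcvd_len=1)
host_a_reply = reply_numbers(rcvd_seq=79, rcvd_ack=43, rcvd_len=1)
```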
TCP round trip time, timeout
Q: how to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshake
  client state: LISTEN, SYNSENT, ESTAB
  server state: LISTEN, SYN RCVD, ESTAB
  1. client chooses init seq num, x, and sends TCP SYN msg: SYNbit=1, Seq=x
  2. server chooses init seq num, y, and sends TCP SYNACK msg, acking SYN: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (this segment may contain client-to-server data): ACKbit=1, ACKnum=y+1
  4. server: received ACK(y) indicates client is live
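The sequence and acknowledgement numbers exchanged in the handshake can be traced with a small Python sketch. It only models the header fields named on the slide; the function name and dictionary layout are hypothetical, and the modulo arithmetic reflects the fact that TCP sequence numbers are 32 bits wide.

```python
import random


def three_way_handshake(x=None, y=None):
    """Return the three segments of a TCP handshake as dictionaries."""
    x = random.randrange(2**32) if x is None else x  # client initial seq #
    y = random.randrange(2**32) if y is None else y  # server initial seq #
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (syn["seq"] + 1) % 2**32}
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % 2**32}
    return [syn, synack, ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both learn that the peer is live.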
TCP 3-way handshake: FSM
  server side:
    closed -> listen: Socket connectionSocket = welcomeSocket.accept();
    listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
    SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
  client side:
    closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
    SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
  client state:
    ESTAB: clientSocket.close() sends FINbit=1, seq=x
    FIN_WAIT_1: can no longer send but can receive data; server replies ACKbit=1, ACKnum=x+1
    FIN_WAIT_2: wait for server close; server sends FINbit=1, seq=y
    TIMED_WAIT: client replies ACKbit=1, ACKnum=y+1, then timed wait for 2*max segment lifetime
    CLOSED
  server state:
    ESTAB
    CLOSE_WAIT: can still send data
    LAST_ACK: can no longer send data
    CLOSED
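The closing sequence can be written as two small transition tables. This is a sketch in Python under stated assumptions: the event labels ("close()", "2*MSL timer") and the run helper are hypothetical names, and the server-side close() event stands in for the server application deciding to close, which the slide implies but does not name.

```python
# (state, event) -> (next state, segment sent in response, or None)
CLIENT = {
    ("ESTAB", "close()"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "FIN"): ("TIMED_WAIT", "ACK"),
    ("TIMED_WAIT", "2*MSL timer"): ("CLOSED", None),
}
SERVER = {
    ("ESTAB", "FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "close()"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "ACK"): ("CLOSED", None),
}


def run(table, events, state="ESTAB"):
    """Feed events to a transition table; return the list of visited states."""
    visited = [state]
    for event in events:
        state, _sent = table[(state, event)]
        visited.append(state)
    return visited


print(run(CLIENT, ["close()", "ACK", "FIN", "2*MSL timer"]))
print(run(SERVER, ["FIN", "close()", "ACK"]))
```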
Principles of congestion control
  congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control!
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem!
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  [Figure: Host A and Host B send original data at rate λin into unlimited shared output link buffers; throughput: λout. Plots: λout versus λin, levelling off at R/2; delay versus λin.]
  maximum per-connection throughput: R/2
  large delays as arrival rate, λin, approaches capacity
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λin = λout
    transport-layer input includes retransmissions: λ'in ≥ λin
  [Figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) into finite shared output link buffers; Host B receives λout.]
Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
  [Figure: Host A keeps a copy and sends only when there is free buffer space in the finite shared output link buffers; plot of λout versus λin, axes marked R/2.]
Causes/costs of congestion: scenario 2
  idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
  [Figure: a packet that arrives when there is no buffer space is dropped; Host A resends its copy once there is free buffer space. Plot of λout, axes marked R/2.]
Causes/costs of congestion: scenario 2
  realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt
  [Figure: Host A times out while its packet is still queued and sends a second copy; both reach Host B. Plot of λout, axes marked R/2.]
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as λin and λ'in increase?
  A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
  [Figure: Hosts A, B, C and D; λin: original data; λ'in: original data, plus retransmitted data; λout; finite shared output link buffers.]
Causes/costs of congestion: scenario 3
  [Figure: plot of λout versus λ'in; axes marked C/2.]
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
  two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at
TCP congestion control: details
  sender limits transmission:
    LastByteSent - LastByteAcked ≤ cwnd
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    rate ≈ cwnd/RTT bytes/sec
  [Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd spans the in-flight bytes.]
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
  [Figure: Host A sends to Host B over successive RTTs, with the number of segments per RTT doubling; time runs downward.]
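The doubling falls out of the per-ACK increment, as a short Python sketch shows. It assumes no loss, one ACK per segment, and a hypothetical function name; with mss=1 the window is counted in MSS units.

```python
def slow_start(rtts, mss=1):
    """cwnd at the start of each RTT during slow start, assuming no loss."""
    cwnd = 1 * mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss   # one ACK comes back per segment sent this RTT
        cwnd += acks * mss   # +1 MSS per ACK, so cwnd doubles every RTT
    return history


print(slow_start(5))
```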
TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
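The two reactions can be compared in a small Python sketch. It follows the slide's simplified description (Reno halves cwnd on three duplicate ACKs) rather than the full fast-recovery rules; the function name and the string labels for events and variants are hypothetical.

```python
def on_loss(cwnd, event, variant="reno", mss=1):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event is "timeout" or "3dupack"; variant is "reno" or "tahoe".
    """
    ssthresh = cwnd / 2            # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1 * mss, ssthresh   # restart from 1 MSS, as in slow start
    return ssthresh, ssthresh      # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16, "timeout"))
print(on_loss(16, "3dupack", variant="reno"))
print(on_loss(16, "3dupack", variant="tahoe"))
```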
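Summary: TCP Congestion Control
  [Figure: FSM with states slow start, congestion avoidance and fast recovery. Recoverable transitions:]
    slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
    slow start -> congestion avoidance: cwnd > ssthresh
    congestion avoidance, new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
    fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed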
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
  [Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs, then cut window in half.]
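The saw tooth can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: cwnd is counted in MSS, time advances one RTT per step, and the RTTs in which loss is detected are supplied by the caller (loss_at is a hypothetical parameter, not something TCP knows in advance).

```python
def aimd(rtts, loss_at, cwnd=1.0):
    """cwnd (in MSS) at the start of each RTT under AIMD."""
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = cwnd / 2   # multiplicative decrease
        else:
            cwnd = cwnd + 1   # additive increase: +1 MSS per RTT
    return trace


print(aimd(8, loss_at={3, 6}, cwnd=4.0))
```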
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  avg TCP thruput = (3/4) · W/RTT bytes/sec
  [Figure: window size oscillating between W/2 and W.]
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 · MSS / (RTT · √L)
  to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate!
  new versions of TCP for high-speed
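The two figures on this slide can be checked numerically. The Python sketch below plugs the slide's numbers into the window calculation and into the throughput formula solved for L; the variable and function names are hypothetical, and MSS is converted to bits so that throughput comes out in bits/sec.

```python
import math

MSS_BITS = 1500 * 8   # 1500-byte segments
RTT = 0.100           # seconds
TARGET = 10e9         # bits/sec

# Window (in segments) needed to keep TARGET bits/sec in flight for one RTT.
window_segments = TARGET * RTT / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2


def mathis_throughput(mss_bits, rtt, loss):
    """Throughput in bits/sec predicted by the formula above."""
    return 1.22 * mss_bits / (rtt * math.sqrt(loss))


print(round(window_segments), loss_rate)
```

The computed loss rate is about 2.1·10^-10, which the slide rounds to 2·10^-10.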
TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
  [Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
  two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
  [Figure: Connection 1 throughput on one axis (up to R), the competing connection on the other, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
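The convergence argument can be simulated directly. The Python sketch below assumes two flows with equal RTTs that both see a loss whenever their combined rate exceeds the link capacity; these synchronization assumptions, the function name and the numbers are illustrative only.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one link; both halve when the link overflows."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # congestion avoidance: additive increase
    return x1, x2


a, b = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(a - b))
```

Additive increase leaves the gap between the two rates unchanged, while every loss halves it, so the flows drift toward the equal-share line.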
Fairness (more)
  Fairness and UDP
    multimedia apps often do not use TCP
      do not want rate throttled by congestion control
    instead use UDP:
      send audio/video at constant rate, tolerate packet loss
  Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
  next:
    leaving the network "edge" (application, transport layers)
    into the network "core"
Internet transport-layer protocols
  reliable, in-order delivery (TCP)
    congestion control
    flow control
    connection setup
  unreliable, unordered delivery: UDP
    no-frills extension of "best-effort" IP
  services not available:
    delay guarantees
    bandwidth guarantees
  [Figure: end hosts run the full stack (application, transport, network, data link, physical); routers along the path run only network, data link, physical.]
rdt2.0: channel with bit errors
  underlying channel may flip bits in packet
    checksum to detect bit errors
  the question: how to recover from errors:
    acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
    negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
    sender retransmits pkt on receipt of NAK
  How do humans recover from "errors" during conversation?
  new mechanisms in rdt2.0 (beyond rdt1.0):
    error detection
    feedback: control msgs (ACK,NAK) from receiver to sender
rdt2.0: FSM specification
  sender:
    Wait for call from above: rdt_send(data)
      sndpkt = make_pkt(data, checksum)
      udt_send(sndpkt)
    Wait for ACK or NAK: rdt_rcv(rcvpkt) && isNAK(rcvpkt)
      udt_send(sndpkt)
    Wait for ACK or NAK: rdt_rcv(rcvpkt) && isACK(rcvpkt)
      return to Wait for call from above
  receiver:
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      udt_send(NAK)
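The sender's retransmit-on-NAK loop can be simulated in Python. This is a toy sketch, not the rdt2.0 FSM itself: the one-byte checksum is a stand-in for a real checksum, the channel corrupts only data packets (rdt2.0 likewise assumes ACK/NAK arrive intact), and all names are hypothetical.

```python
import random


def checksum(data):
    """Toy checksum: byte sum modulo 256 (not the Internet checksum)."""
    return sum(data) % 256


def rdt20_send(data, flip_prob, rng):
    """Stop-and-wait sender: retransmit until the receiver answers ACK.

    Returns (delivered_data, transmissions). data must be non-empty.
    """
    sndpkt = (data, checksum(data))
    transmissions = 0
    while True:
        transmissions += 1
        payload, csum = sndpkt
        if rng.random() < flip_prob:                  # channel flips one bit
            payload = bytes([payload[0] ^ 0x01]) + payload[1:]
        if checksum(payload) == csum:                 # receiver: not corrupt
            return payload, transmissions             # deliver data, reply ACK
        # receiver: corrupt, reply NAK; sender loops and retransmits sndpkt


delivered, sent = rdt20_send(b"hello", flip_prob=0.5, rng=random.Random(1))
print(delivered, sent)
```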
Go-back-N, Selective Repeat
  Go-back-N:
    sender can have up to N unacked packets in pipeline
    receiver only sends cumulative ack
      doesn't ack packet if there's a gap
    sender has timer for oldest unacked packet
      when timer expires, retransmit all unacked packets
  Selective Repeat:
    sender can have up to N unack'ed packets in pipeline
    rcvr sends individual ack for each packet
    sender maintains timer for each unacked packet
      when timer expires, retransmit only that unacked packet
Go-Back-N: sender
  k-bit seq # in pkt header
  "window" of up to N, consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
    may receive duplicate ACKs (see receiver)
  timer for oldest in-flight pkt
  timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer): no receiver buffering!
    re-ACK pkt with highest in-order seq #
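The receiver rules above amount to very little state, as this Python sketch suggests. It assumes sequence numbers start at 0 and never wrap, and that the input lists only packets that arrived uncorrupted; the function name is hypothetical.

```python
def gbn_receiver(arrivals):
    """Go-Back-N receiver; returns (delivered seq #s, ACKs sent)."""
    expectedseqnum = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expectedseqnum:
            delivered.append(seq)            # in order: deliver to upper layer
            expectedseqnum += 1
        # out of order: discard, no buffering
        if expectedseqnum > 0:
            acks.append(expectedseqnum - 1)  # (re-)ACK highest in-order seq #
    return delivered, acks


# pkt 2 is lost at first; pkts 3 and 4 arrive out of order and are discarded,
# then the sender times out and resends 2, 3, 4.
print(gbn_receiver([0, 1, 3, 4, 2, 3, 4]))
```

The duplicate ACKs for packet 1 are exactly the "may generate duplicate ACKs" case noted on the slide.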
Selective repeat
  receiver individually acknowledges all correctly received pkts
    buffers pkts, as needed, for eventual in-order delivery to upper layer
  sender only resends pkts for which ACK not received
    sender timer for each unACKed pkt
  sender window
    N consecutive seq #'s
    limits seq #s of sent, unACKed pkts
  [Figure: Selective repeat: sender, receiver windows]
Selective repeat
  sender
    data from above:
      if next available seq # in window, send pkt
    timeout(n):
      resend pkt n, restart timer
    ACK(n) in [sendbase,sendbase+N]:
      mark pkt n as received
      if n smallest unACKed pkt, advance window base to next unACKed seq #
  receiver
    pkt n in [rcvbase, rcvbase+N-1]:
      send ACK(n)
      out-of-order: buffer
      in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
    pkt n in [rcvbase-N, rcvbase-1]:
      ACK(n)
    otherwise:
      ignore
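The receiver side of selective repeat can be sketched in Python as follows. The sketch assumes an unbounded sequence-number space (no wrap-around) and uncorrupted arrivals; the function name is hypothetical.

```python
def sr_receiver(arrivals, N):
    """Selective-repeat receiver; returns (delivered seq #s, ACKs sent)."""
    rcvbase = 0
    buffered = set()
    delivered, acks = [], []
    for n in arrivals:
        if rcvbase <= n <= rcvbase + N - 1:
            acks.append(n)                   # individual ACK for this pkt
            buffered.add(n)
            while rcvbase in buffered:       # deliver any in-order run
                buffered.remove(rcvbase)
                delivered.append(rcvbase)
                rcvbase += 1                 # advance window
        elif rcvbase - N <= n <= rcvbase - 1:
            acks.append(n)                   # already delivered: re-ACK
        # otherwise: ignore
    return delivered, acks


# pkt 2 arrives late; pkts 3 and 4 are buffered until it fills the gap.
print(sr_receiver([0, 1, 3, 4, 2, 1], N=4))
```

Unlike Go-Back-N, packets 3 and 4 are acknowledged and kept, so only packet 2 needs to be resent.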
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
TCP segment structure
  segment layout (each row 32 bits wide):
    source port # | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | receive window
    checksum | Urg data pointer
    options (variable length)
    application data (variable length)
  field notes:
    sequence number, acknowledgement number: counting by bytes of data (not segments!)
    URG: urgent data (generally not used)
    ACK: ACK # valid
    PSH: push data now (generally not used)
    RST, SYN, FIN: connection estab (setup, teardown commands)
    receive window: # bytes rcvr willing to accept
    checksum: Internet checksum (as in UDP)
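The fixed part of the header can be decoded with Python's struct module. This is an illustrative sketch: the function name and dictionary keys are hypothetical, options are not parsed, and the example segment is built by hand rather than captured from a network.

```python
import struct


def parse_tcp_header(segment):
    """Decode the fixed 20-byte TCP header (options are not parsed)."""
    (src, dst, seq, ack, off_flags, rwnd, csum, urg) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    flags = off_flags & 0x3F                  # low six bits: U A P R S F
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "head_len": (off_flags >> 12) * 4,    # header length in bytes
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": csum, "urg_ptr": urg,
    }


# A hand-built SYN: ports 12345 -> 80, seq 42, header length 5 words, rwnd 4096.
syn = struct.pack("!HHIIHHHH", 12345, 80, 42, 0, (5 << 12) | 0x02, 4096, 0, 0)
print(parse_tcp_header(syn))
```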
TCP seq. numbers, ACKs
  sequence numbers:
    byte stream "number" of first byte in segment's data
  acknowledgements:
    seq # of next byte expected from other side
    cumulative ACK
  Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
  [Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space, window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq. numbers, ACKs
  simple telnet scenario:
    Host A: user types 'C'; sends Seq=42, ACK=79, data = 'C'
    Host B: host ACKs receipt of 'C', echoes back 'C'; sends Seq=79, ACK=43, data = 'C'
    Host A: host ACKs receipt of echoed 'C'; sends Seq=43, ACK=80
TCP round trip time, timeout
  Q: how to set TCP timeout value?
    longer than RTT
      but RTT varies
    too short: premature timeout, unnecessary retransmissions
    too long: slow reaction to segment loss
  Q: how to estimate RTT?
    SampleRTT: measured time from segment transmission until ACK receipt
      ignore retransmissions
    SampleRTT will vary, want estimated RTT "smoother"
      average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
  EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
  [Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) versus time (seconds); series: SampleRTT, EstimatedRTT.]
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
  TimeoutInterval = EstimatedRTT + 4 · DevRTT
    (estimated RTT plus "safety margin")
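The two moving averages and the resulting timeout can be combined into one update step. The Python sketch below uses the typical α and β values; updating DevRTT before EstimatedRTT is an assumption here (it matches the order used in RFC 6298, but the slide does not specify one), and the function name and starting values are hypothetical.

```python
def update_timeout(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """One EWMA update; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # Deviation is measured against the previous EstimatedRTT (assumed order).
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    return estimated_rtt, dev_rtt, estimated_rtt + 4 * dev_rtt


est, dev = 100.0, 10.0   # milliseconds; arbitrary starting values
for sample in [120.0, 90.0, 200.0]:
    est, dev, timeout = update_timeout(est, dev, sample)
    print(round(est, 2), round(dev, 2), round(timeout, 2))
```

A single outlier such as the 200 ms sample moves EstimatedRTT only slightly but widens the safety margin noticeably.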
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
TCP sender events
  data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running
      think of timer as for oldest unacked segment
      expiration interval: TimeOutInterval
  timeout:
    retransmit segment that caused timeout
    restart timer
  ack rcvd:
    if ack acknowledges previously unacked segments
      update what is known to be ACKed
      start timer if there are still unacked segments
TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running) start timer
TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    receiver action: delayed ACK. Wait up to 500ms for next segment; if no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
    receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
    receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    receiver action: immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
      likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
  [Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B answers ACK=100 four times. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
  fast retransmit after sender receipt of triple duplicate ACK
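The duplicate-ACK counting behind fast retransmit can be sketched in Python. The sketch looks only at the stream of ACK numbers arriving at the sender and ignores timers and window management; the function name is hypothetical.

```python
def fast_retransmit_events(acks):
    """Return the seq #s a sender would fast-retransmit for a stream of ACKs."""
    retransmits = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:            # triple duplicate ACK
                retransmits.append(ack)   # resend segment starting at this seq #
        else:
            last_ack, dup_count = ack, 0  # new ACK: reset the duplicate counter
    return retransmits


# Scenario from the figure: one original ACK=100 followed by three duplicates.
print(fast_retransmit_events([100, 100, 100, 100]))
```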
TCP flow control
  application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
  [Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process reads.]
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx \frac{cwnd}{RTT}$ bytes/sec
[Figure: sender sequence number space: last byte ACKed, then bytes sent but not-yet ACKed ("in-flight", spanning cwnd), then last byte sent.]

TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B timeline, one round per RTT, time running downward.]

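The doubling rule on the slow start slide can be sketched in a few lines of Python. This is a minimal illustration rather than real TCP behavior: `limit` is a hypothetical stand-in for the window size at which the first loss would occur, and every segment sent in a round is assumed to be ACKed.

```python
def slow_start_rounds(mss: int, limit: int) -> list[int]:
    """Return cwnd (in bytes) at the start of each RTT until cwnd reaches limit."""
    cwnd = mss                      # initially cwnd = 1 MSS
    history = [cwnd]
    while cwnd < limit:
        acks = cwnd // mss          # one ACK per segment sent in this RTT
        cwnd += acks * mss          # +1 MSS per ACK, so cwnd doubles each RTT
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(slow_start_rounds(mss=1460, limit=16 * 1460))
```

Incrementing per ACK, rather than once per RTT, is what produces the exponential growth: each round returns as many ACKs as there were segments in flight.
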
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

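The two loss reactions can be written as one small function. This is a sketch that follows the slide's simplified rules only (real Reno implementations add fast-recovery details that are not modeled here); the function name and the event strings are hypothetical.

```python
def react_to_loss(cwnd: int, mss: int, event: str, variant: str = "reno") -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) in bytes after a loss event.

    event:   "timeout" or "triple_dup_ack"
    variant: "reno" or "tahoe"
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd // 2            # ssthresh = 1/2 of cwnd just before loss
    if event == "triple_dup_ack" and variant == "reno":
        return ssthresh, ssthresh   # Reno: cut cwnd in half, then grow linearly
    return mss, ssthresh            # timeout, or Tahoe on any loss: back to 1 MSS


if __name__ == "__main__":
    print(react_to_loss(16000, 1000, "timeout"))
    print(react_to_loss(16000, 1000, "triple_dup_ack"))
    print(react_to_loss(16000, 1000, "triple_dup_ack", variant="tahoe"))
```
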
Summary: TCP Congestion Control
[Figure: congestion control FSM. Recoverable transition labels:
  new ACK (exponential growth phase): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  cwnd > ssthresh: switch from exponential to linear growth
  new ACK (linear growth phase): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]

TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half.]

TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is $\frac{3}{4} W$
  avg. thruput is $\frac{3}{4} W$ per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W.]

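The 3/4 factor comes from averaging the sawtooth between W/2 and W. The following sketch checks that numerically under the slide's assumptions (no slow start, always data to send, one segment of growth per RTT); both function names are illustrative.

```python
def sawtooth_windows(w_max: int) -> list[int]:
    """Window sizes, in segments, over one AIMD cycle: from W/2 up to W."""
    return list(range(w_max // 2, w_max + 1))   # additive increase: +1 per RTT


def average_throughput(w_max_bytes: float, rtt_s: float) -> float:
    """Slide formula: average throughput = 3/4 * W / RTT, in bytes/sec."""
    return 0.75 * w_max_bytes / rtt_s


if __name__ == "__main__":
    windows = sawtooth_windows(100)
    print(sum(windows) / len(windows))           # mean window over one cycle
    print(average_throughput(100 * 1500, 0.1))   # W = 100 segments of 1500 bytes
```
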
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed

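The numbers in this example can be reproduced directly from the two relations on the slide. The sketch below assumes MSS is converted to bits (1500 bytes = 12000 bits) so that throughput is in bits/sec; the function names are illustrative.

```python
import math


def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the throughput formula for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))
    print(required_loss_rate(10e9, 0.1, 1500))
```

Solving for L gives roughly 2.1e-10, which the slide rounds to 2 * 10^-10.
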
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other, with Connection 1 throughput on the horizontal axis and both axes running to R. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal bandwidth share line.]

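The convergence argument can be checked with a toy simulation. The sketch below assumes both flows see loss at the same moment, whenever their combined rate exceeds the link capacity, and that both have the same RTT; rates are in arbitrary units and the function name is hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int) -> tuple[float, float]:
    """Run synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both decrease by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: slope of 1
    return x1, x2


if __name__ == "__main__":
    print(aimd_two_flows(80.0, 10.0, capacity=100.0, steps=500))
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates drift toward the equal-share line.
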
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

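The parallel-connection arithmetic is a one-line calculation, sketched below under the assumption that every TCP connection on the bottleneck receives an equal share. The function name is illustrative.

```python
from fractions import Fraction


def app_share(existing: int, opened: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens `opened` connections."""
    return Fraction(opened, existing + opened)


if __name__ == "__main__":
    print(app_share(9, 1))    # one new connection among ten
    print(app_share(9, 11))   # eleven new connections among twenty
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.
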
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
How do humans recover from "errors" during conversation?
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK,NAK) from receiver to sender

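The ACK/NAK loop can be sketched as a stop-and-wait exchange. This is a simplified illustration, not the FSM from the slides: the checksum is a toy byte sum rather than the Internet checksum, the feedback message is assumed to arrive intact, and `corrupt_first_n` is a hypothetical knob that flips one bit in the first few transmissions.

```python
def checksum(data: bytes) -> int:
    """Toy 16-bit checksum (sum of bytes); stands in for a real checksum."""
    return sum(data) & 0xFFFF


def make_pkt(data: bytes) -> tuple[bytes, int]:
    return data, checksum(data)


def receiver(pkt: tuple[bytes, int]) -> str:
    """Reply ACK if the packet passes the checksum, NAK otherwise."""
    data, csum = pkt
    return "ACK" if checksum(data) == csum else "NAK"


def flip_one_bit(pkt: tuple[bytes, int]) -> tuple[bytes, int]:
    """Model a channel error: flip the lowest bit of the first data byte."""
    data, csum = pkt
    return bytes([data[0] ^ 0x01]) + data[1:], csum


def rdt20_send(data: bytes, corrupt_first_n: int = 0) -> int:
    """Send one non-empty message; return the number of transmissions needed."""
    sndpkt = make_pkt(data)
    attempts = 0
    while True:
        attempts += 1
        on_wire = flip_one_bit(sndpkt) if attempts <= corrupt_first_n else sndpkt
        if receiver(on_wire) == "ACK":
            return attempts          # ACK: done with this packet
        # NAK: loop around and retransmit the same sndpkt


if __name__ == "__main__":
    print(rdt20_send(b"hello", corrupt_first_n=2))
```
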
rdt2.0: FSM specification
[Figure: sender and receiver FSMs. Recoverable labels:
  sender, "Wait for call from above": rdt_send(data) triggers sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isNAK(rcvpkt) triggers udt_send(sndpkt)
  sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isACK(rcvpkt) returns to "Wait for call from above"
  receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt) triggers udt_send(NAK)]

Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet

Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window

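The sender rules above can be sketched as a small class. This is a minimal model under stated assumptions: sequence numbers are unbounded integers (no k-bit wraparound), the timer is represented by an explicit `on_timeout()` call, and the class and attribute names are illustrative.

```python
class GBNSender:
    """Go-Back-N sender state: base, nextseqnum, and a window of size N."""

    def __init__(self, window_size: int) -> None:
        self.N = window_size
        self.base = 0            # oldest unacked seq #
        self.nextseqnum = 0      # next seq # to use
        self.sent: list[int] = []  # log of every transmission, in order

    def send(self) -> bool:
        """Try to send one new packet; refuse if the window is full."""
        if self.nextseqnum >= self.base + self.N:
            return False
        self.sent.append(self.nextseqnum)
        self.nextseqnum += 1
        return True

    def on_ack(self, n: int) -> None:
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if self.base <= n < self.nextseqnum:
            self.base = n + 1
        # otherwise it is a duplicate (or bogus) ACK and is ignored

    def on_timeout(self) -> list[int]:
        """Retransmit all unacked packets currently in the window."""
        resent = list(range(self.base, self.nextseqnum))
        self.sent.extend(resent)
        return resent


if __name__ == "__main__":
    s = GBNSender(window_size=4)
    print([s.send() for _ in range(5)])
    s.on_ack(1)
    print(s.on_timeout())
```
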
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #

Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N,rcvbase-1]
    ACK(n)

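The receiver side of selective repeat can be sketched as follows. This is an illustration under simplifying assumptions: sequence numbers are unbounded integers rather than k-bit values, packets are assumed uncorrupted, and the class name and return convention (True when ACK(n) is sent) are hypothetical.

```python
class SRReceiver:
    """Selective repeat receiver: buffer out-of-order pkts, deliver in order."""

    def __init__(self, window_size: int) -> None:
        self.N = window_size
        self.rcvbase = 0
        self.buffer: dict[int, str] = {}   # out-of-order pkts awaiting delivery
        self.delivered: list[str] = []     # data passed to the upper layer

    def on_pkt(self, n: int, data: str) -> bool:
        """Handle packet n; return True if ACK(n) is sent, False if ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # deliver any in-order run starting at rcvbase, advancing the window
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return True
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return True    # already delivered: re-ACK so the sender can advance
        return False       # outside both ranges: ignore


if __name__ == "__main__":
    r = SRReceiver(window_size=4)
    r.on_pkt(1, "b")       # out of order: buffered
    r.on_pkt(0, "a")       # fills the gap: both delivered
    print(r.delivered, r.rcvbase)
```
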
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

TCP segment structure
[Figure: TCP segment layout, 32 bits wide. Fields, top to bottom:]
source port #, dest port #
sequence number
acknowledgement number
  sequence and acknowledgement numbers count by bytes of data (not segments)
head len, not used, flag bits U A P R S F, receive window
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
checksum, Urg data pointer
  Internet checksum (as in UDP)
options (variable length)
application data (variable length)

TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]

TCP seq. numbers, ACKs
simple telnet scenario:
  Host A: user types 'C'
    A to B: Seq=42, ACK=79, data = 'C'
  Host B: host ACKs receipt of 'C', echoes back 'C'
    B to A: Seq=79, ACK=43, data = 'C'
  Host A: host ACKs receipt of echoed 'C'
    A to B: Seq=43, ACK=80

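The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the next byte expected. A minimal sketch, with hypothetical function names and segments represented as (seq, ack, data) tuples:

```python
def next_expected(seq: int, payload: bytes) -> int:
    """ACK number to return for a segment starting at seq carrying payload."""
    return seq + len(payload)


def telnet_echo(a_seq: int, b_seq: int, char: bytes) -> list[tuple[int, int, bytes]]:
    """Return the three segments of the echo exchange as (seq, ack, data)."""
    first = (a_seq, b_seq, char)                             # A sends the typed char
    second = (b_seq, next_expected(a_seq, char), char)       # B ACKs and echoes
    third = (second[1], next_expected(b_seq, char), b"")     # A ACKs the echo
    return [first, second, third]


if __name__ == "__main__":
    for segment in telnet_echo(42, 79, b"C"):
        print(segment)
```
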
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$EstimatedRTT = (1-\alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: "RTT: gaia.cs.umass.edu to fantasia.eurecom.fr". RTT (milliseconds, axis from 100 to 350) versus time (seconds); series: SampleRTT and EstimatedRTT.]

TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
(typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
(estimated RTT plus "safety margin")

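The three formulas combine into a small estimator. The sketch below makes two assumptions that the slides do not state: DevRTT starts at 0, and DevRTT is updated using the EstimatedRTT value from before the current sample. The class name is illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout interval, in arbitrary time units."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0               # assumption: no deviation known yet

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # deviation uses the previous estimate (ordering is an assumption)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        """EstimatedRTT plus a safety margin of 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RttEstimator(first_sample=100.0)
    for sample in (120.0, 90.0, 110.0):
        print(round(est.update(sample), 3))
```

A sample far from the current estimate raises DevRTT, which widens the timeout far more than it moves EstimatedRTT.
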
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)

TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  TCP receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #; gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B keeps returning ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK

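The duplicate-ACK trigger can be sketched as a counter on the sender side. This is a minimal illustration that tracks only ACK numbers; the class name is hypothetical, and timers, windows, and the retransmission itself are left out.

```python
from typing import Optional


class FastRetransmitDetector:
    """Count duplicate ACKs and signal a retransmission on the third one."""

    def __init__(self) -> None:
        self.last_ack: Optional[int] = None
        self.dup_count = 0

    def on_ack(self, ack_num: int) -> Optional[int]:
        """Return the seq # to retransmit, or None if no action is needed."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                # three duplicates: the segment starting at ack_num is
                # the unacked segment with the smallest seq #
                return ack_num
            return None
        self.last_ack = ack_num      # new ACK: reset the duplicate counter
        self.dup_count = 0
        return None


if __name__ == "__main__":
    d = FastRetransmitDetector()
    print([d.on_ack(a) for a in (100, 100, 100, 100)])
```
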
TCP flow control
application may remove data from TCP socket buffers slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process on top, TCP socket receiver buffers, then TCP code and IP code in the OS, with segments arriving from sender.]

TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd), and drain to application process.]

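The flow-control bookkeeping can be sketched with two small functions. The slide gives only the rule that in-flight data must not exceed rwnd; the free-space calculation and the variable names `last_byte_rcvd` and `last_byte_read` follow the usual textbook presentation and are assumptions here.

```python
def advertised_rwnd(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free buffer space at the receiver: buffer size minus unread buffered data."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """How many more bytes the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


if __name__ == "__main__":
    rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(rwnd, sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```
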
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: two hosts, each with application above network. Each side holds connection state: ESTAB, and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client.]
Socket clientSocket = newSocket("hostname","port number");
Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live

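The field values exchanged in the handshake follow mechanically from the two initial sequence numbers. A minimal sketch, with segments modeled as plain dictionaries (a hypothetical representation, not a wire format):

```python
def three_way_handshake(x: int, y: int) -> list[dict[str, int]]:
    """Return the SYN, SYNACK, and ACK segments for client ISN x and server ISN y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": syn["seq"] + 1}   # acks the SYN
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}                       # acks the SYNACK
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=1000, y=5000):
        print(segment)
```
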
TCP 3-way handshake: FSM
[Figure: FSM. Recoverable transitions:
  server, closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  server, listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  server, SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
  client, closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  client, SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)]

TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
client: clientSocket.close(); sends FINbit=1, seq=x; can no longer send but can receive data (FIN_WAIT_1)
server: replies ACKbit=1; ACKnum=x+1; can still send data (CLOSE_WAIT)
client: wait for server close (FIN_WAIT_2)
server: sends FINbit=1, seq=y; can no longer send data (LAST_ACK)
client: replies ACKbit=1; ACKnum=y+1; timed wait for 2 * max segment lifetime (TIMED_WAIT), then CLOSED
server: CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
[Figure: Host A and Host B send original data $\lambda_{in}$ through a router with unlimited shared output link buffers; throughput $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$ and delay versus $\lambda_{in}$, with $\lambda_{in}$ up to R/2.]

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with finite shared output link buffers; $\lambda_{out}$ reaches Host B.]

Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: Host A keeps a copy of each packet and sends into a router with free buffer space in its finite shared output link buffers; $\lambda_{out}$ reaches Host B. Plot: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2.]

Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; the router has no buffer space, so the arriving packet is dropped before reaching Host B.]

rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  How do humans recover from "errors" during conversation?
Transport Layer 3-27
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
Transport Layer 3-28
rdt2.0: FSM specification
sender
  state "Wait for call from above":
    rdt_send(data) / sndpkt = make_pkt(data, checksum); udt_send(sndpkt)   -> "Wait for ACK or NAK"
  state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt) / udt_send(sndpkt)   (stay in state)
    rdt_rcv(rcvpkt) && isACK(rcvpkt) / no action   -> "Wait for call from above"
receiver
  state "Wait for call from below":
    rdt_rcv(rcvpkt) && corrupt(rcvpkt) / udt_send(NAK)
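A minimal sketch of the rdt2.0 stop-and-wait loop, under stated assumptions: the checksum is a toy byte sum rather than the Internet checksum, the channel flips at most one payload bit, and ACK/NAK messages themselves are never corrupted (the assumption rdt2.0 makes). The helper names mirror the FSM labels but are otherwise hypothetical.

```python
import random


def checksum(data: bytes) -> int:
    """Toy checksum (assumption): byte sum modulo 2**16."""
    return sum(data) % 65536


def make_pkt(data: bytes):
    return (data, checksum(data))


def corrupt(pkt) -> bool:
    data, csum = pkt
    return checksum(data) != csum


def channel(pkt, flip_prob, rng):
    """Unreliable channel: with probability flip_prob, flip one payload bit."""
    data, csum = pkt
    if data and rng.random() < flip_prob:
        damaged = bytearray(data)
        damaged[0] ^= 0x01
        return (bytes(damaged), csum)
    return pkt


def rdt20_send(data: bytes, flip_prob=0.3, seed=1):
    """Send one packet; return (delivered data, number of transmissions)."""
    rng = random.Random(seed)
    sndpkt = make_pkt(data)
    transmissions = 0
    while True:
        transmissions += 1
        rcvpkt = channel(sndpkt, flip_prob, rng)   # udt_send(sndpkt)
        if corrupt(rcvpkt):
            continue        # receiver answers NAK, sender retransmits sndpkt
        return rcvpkt[0], transmissions   # receiver answers ACK, data delivered


delivered, tries = rdt20_send(b"hello", flip_prob=0.5, seed=7)
print(delivered, tries)
```

The sender stays in "Wait for ACK or NAK" for as long as the loop keeps repeating, which is the stop-and-wait behavior the FSM describes.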
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-47
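The Go-Back-N sender rules can be sketched as a small state holder. This is an illustration under assumptions, not the slides' FSM: sequence numbers are unbounded integers (no k-bit wraparound), the timer is represented only by an `on_timeout` call, and the class and method names are hypothetical.

```python
class GBNSender:
    def __init__(self, window_size):
        self.N = window_size
        self.base = 0          # seq # of oldest unACKed pkt
        self.nextseqnum = 0    # next seq # to assign
        self.sent = {}         # seq # -> data kept for retransmission

    def send(self, data):
        """Return the seq # used, or None if the window is full."""
        if self.nextseqnum >= self.base + self.N:
            return None
        seq = self.nextseqnum
        self.sent[seq] = data
        self.nextseqnum += 1
        return seq

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n >= self.base:
            for seq in range(self.base, n + 1):
                self.sent.pop(seq, None)
            self.base = n + 1
        # n < base means a duplicate ACK, which is ignored here

    def on_timeout(self):
        """Seq #s to retransmit: the oldest unACKed pkt and all later ones."""
        return list(range(self.base, self.nextseqnum))


s = GBNSender(window_size=4)
print([s.send(b"x") for _ in range(5)])  # the fifth send is refused
s.on_ack(1)
print(s.on_timeout())
```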
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Transport Layer 3-50
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n is the smallest unACKed pkt, advance the window base
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
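The receiver side of selective repeat can be sketched as follows. The sketch assumes unbounded integer sequence numbers and that corrupted packets have already been filtered out; `SRReceiver` and its fields are hypothetical names.

```python
class SRReceiver:
    def __init__(self, window_size):
        self.N = window_size
        self.rcv_base = 0      # lowest seq # not yet received
        self.buffer = {}       # out-of-order pkts waiting for delivery
        self.delivered = []    # data handed to the upper layer, in order

    def on_pkt(self, n, data):
        """Return the seq # to ACK, or None if the pkt is ignored."""
        if self.rcv_base <= n < self.rcv_base + self.N:
            self.buffer[n] = data
            # deliver pkt n and any buffered pkts that are now in order
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n < self.rcv_base:
            return n           # already delivered: re-ACK so the sender can advance
        return None


r = SRReceiver(window_size=4)
print(r.on_pkt(1, "b"), r.delivered)   # buffered, nothing delivered yet
print(r.on_pkt(0, "a"), r.delivered)   # gap filled, both delivered
```

Re-ACKing packets below the window matters because the sender's window only moves when it sees the ACK; if that ACK was lost, the sender would otherwise retransmit forever.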
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
annotations:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
Transport Layer 3-57
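To make the layout concrete, here is a hedged sketch that decodes the fixed 20-byte part of a TCP header with Python's standard `struct` module. The field order follows the figure; the function name, the dictionary keys, and the sample values are illustrative assumptions, and options are skipped rather than parsed.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flags = offset_flags & 0x3F             # low six bits: U A P R S F
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "header_len": header_len,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
        "payload": segment[header_len:],    # data starts after header and options
    }


# Sample segment (made-up values): 5-word header, ACK flag set, one data byte.
sample = struct.pack("!HHIIHHHH", 12345, 23, 42, 79,
                     (5 << 12) | 0x10, 4096, 0, 0) + b"C"
fields = parse_tcp_header(sample)
print(fields["seq"], fields["ack"], fields["ACK"], fields["payload"])
```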
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say - up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N, divided into: sent, ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
Transport Layer 3-58
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C' at Host A
  A -> B: Seq=42, ACK=79, data = 'C'
  host B ACKs receipt of 'C', echoes back 'C'
  B -> A: Seq=79, ACK=43, data = 'C'
  host A ACKs receipt of echoed 'C'
  A -> B: Seq=43, ACK=80
Transport Layer 3-59
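The numbers in the telnet scenario follow one rule: a reply's sequence number is the ACK number it just received, and its ACK number is the received sequence number plus the number of data bytes received. A short sketch (the `Segment` type and `reply` function are hypothetical names):

```python
from typing import NamedTuple


class Segment(NamedTuple):
    seq: int
    ack: int
    data: bytes


def reply(received: Segment, data: bytes = b"") -> Segment:
    """Build the next segment sent in response to `received`."""
    return Segment(seq=received.ack,
                   ack=received.seq + len(received.data),
                   data=data)


a_to_b = Segment(seq=42, ack=79, data=b"C")   # user types 'C'
b_to_a = reply(a_to_b, data=b"C")             # B ACKs 'C' and echoes it back
a_ack = reply(b_to_a)                         # A ACKs the echoed 'C'
print(b_to_a, a_ack)
```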
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time, timeout
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, EstimatedRTT]
Transport Layer 3-61
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
  (first term: estimated RTT; second term: "safety margin")
Transport Layer 3-62
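The three formulas combine into a small estimator. Two details below are assumptions the slides do not state: the initial values (EstimatedRTT set to the first sample, DevRTT to half of it) and the update order (DevRTT is updated first, using the previous EstimatedRTT). Both follow common practice but should be read as illustrative choices.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = first_sample / 2     # assumption: seed with half of it

    def update(self, sample_rtt):
        """Fold one SampleRTT in and return the new TimeoutInterval."""
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
print(est.update(100.0))
print(est.update(200.0))
```

A single slow sample (200 ms after a steady 100 ms) raises EstimatedRTT only slightly but widens the safety margin noticeably, which is the behavior the slide describes.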
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control
Transport Layer 3-63
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
Transport Layer 3-64
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
Transport Layer 3-65
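The three sender events map onto three handlers. The sketch below is a simplified model under assumptions: no flow or congestion control, duplicate ACKs ignored, the timer reduced to a boolean, and "passing to IP" modeled as appending to a list. All names are hypothetical.

```python
class SimpleTcpSender:
    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq    # seq # of next new byte
        self.unacked = []              # (seq, data) pairs, oldest first
        self.timer_running = False
        self.wire = []                 # segments handed to IP, in order

    def on_data(self, data: bytes):
        segment = (self.next_seq, data)    # seq # = number of first data byte
        self.wire.append(segment)
        self.unacked.append(segment)
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True      # start timer if not already running

    def on_timeout(self):
        if self.unacked:
            self.wire.append(self.unacked[0])  # retransmit oldest unACKed segment
            self.timer_running = True          # restart timer

    def on_ack(self, ack: int):
        if ack > self.send_base:               # ACKs previously unACKed data
            self.send_base = ack
            self.unacked = [(s, d) for s, d in self.unacked if s + len(d) > ack]
            self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq=92)
sender.on_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_ack(100)           # first segment acknowledged
sender.on_timeout()          # second segment retransmitted
print([seq for seq, _ in sender.wire])
```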
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
TCP ACK generation
event at receiver -> TCP receiver action
  arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    -> delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
  arrival of in-order segment with expected seq #; one other segment has ACK pending
    -> immediately send single cumulative ACK, ACKing both in-order segments
  arrival of out-of-order segment, higher-than-expect seq. #; gap detected
    -> immediately send duplicate ACK, indicating seq. # of next expected byte
  arrival of segment that partially or completely fills gap
    -> immediately send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
TCP fast retransmit
[Figure: Host A, Host B timeline]
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data (lost, marked X)
  B -> A: ACK=100 (repeated: duplicate ACKs)
  A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71
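Duplicate-ACK counting can be sketched in a few lines. The sketch assumes the usual reading of "triple duplicate ACK": the first ACK for a given number is the original, and the third repeat after it triggers the retransmission. The function name is hypothetical.

```python
def fast_retransmit_points(acks):
    """Return the ACK numbers at which a fast retransmit would be triggered."""
    last_ack = None
    dup_count = 0
    triggers = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:          # third duplicate of the same ACK
                triggers.append(ack)    # resend segment starting at byte `ack`
        else:
            last_ack = ack              # new ACK: reset the duplicate counter
            dup_count = 0
    return triggers


# Seq=100 was lost: the receiver keeps answering ACK=100.
print(fast_retransmit_points([100, 100, 100, 100]))
```

The trigger fires once per loss episode: further duplicates beyond the third do not add another retransmission in this sketch.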
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control:
  receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack; segments from sender pass up through IP code and TCP code in the OS into the TCP socket receiver buffers, which the application process reads]
Transport Layer 3-73
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering; TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process]
Transport Layer 3-74
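The bookkeeping behind rwnd can be sketched with byte counters. The counter names (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) are assumptions introduced for illustration; the slide itself only states that rwnd is the free buffer space and that in-flight data is limited to rwnd.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space at the receiver: buffer size minus buffered data."""
    buffered = last_byte_rcvd - last_byte_read
    return rcv_buffer - buffered


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                                   # free space advertised
print(sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```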
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each hold, between application and network: connection state: ESTAB; connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg
  client -> server: SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  server -> client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  client -> server: ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
Transport Layer 3-77
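The sequence and ACK numbers of the three handshake segments depend only on the two initial sequence numbers x and y. A minimal sketch (dictionary keys and the function name are illustrative, not a real socket API):

```python
def three_way_handshake(x, y):
    """Return the three segments exchanged, given initial seq numbers x and y."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    synack = {"from": "server", "SYN": 1, "ACK": 1,
              "seq": y, "acknum": syn["seq"] + 1}        # acks the SYN
    ack = {"from": "client", "ACK": 1,
           "acknum": synack["seq"] + 1}                  # acks the SYNACK
    return [syn, synack, ack]


for segment in three_way_handshake(x=1000, y=5000):
    print(segment)
```

The SYN carries no data yet is acknowledged with x+1, not x: the SYN itself counts as one unit in the sequence space, and the same holds for the server's SYN.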
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: SYN(x) received / send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: ACK(ACKnum=y+1) received
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); / send SYN(seq=x)
  SYN sent -> ESTAB: SYNACK(seq=y, ACKnum=x+1) received / send ACK(ACKnum=y+1)
Transport Layer 3-78
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
clientSocket.close()
  client -> server: FINbit=1, seq=x
    client in FIN_WAIT_1: can no longer send but can receive data
  server -> client: ACKbit=1; ACKnum=x+1
    server in CLOSE_WAIT: can still send data
    client in FIN_WAIT_2: wait for server close
  server -> client: FINbit=1, seq=y
    server in LAST_ACK: can no longer send data
  client -> server: ACKbit=1; ACKnum=y+1
    client in TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED
    server: CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$]
[Plots: $\lambda_{out}$ vs. $\lambda_{in}$, axes up to R/2; delay vs. $\lambda_{in}$, up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Transport Layer 3-83
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) into finite shared output link buffers toward Host B; output rate $\lambda_{out}$]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]
[Figure: Host A keeps a copy of each packet and sends only when there is free buffer space in the finite shared output link buffers toward Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output $\lambda_{out}$]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output $\lambda_{out}$ toward Host B]
Transport Layer 3-86
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: Host A resends into free buffer space; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output $\lambda_{out}$ toward Host B]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
[Figure: Host A keeps a copy and its timeout fires while there is free buffer space; $\lambda_{in}$, $\lambda'_{in}$, $\lambda_{out}$; Host B]
Transport Layer 3-88
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2]
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D on multihop paths through finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output $\lambda_{out}$]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
[Plot: $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Transport Layer 3-92
TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space; cwnd spans from last byte ACKed, across sent, not-yet ACKed ("in-flight") bytes, to last byte sent]
Transport Layer 3-94
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A, Host B timeline over successive RTTs]
Transport Layer 3-95
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS;
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer 3-97
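The slow start, congestion avoidance, and loss reactions can be traced round by round. The sketch is a coarse model under stated assumptions: cwnd is measured in MSS, each "ack" event stands for one loss-free RTT, slow start is capped at ssthresh when doubling would overshoot it, and Reno's fast recovery is reduced to "cut cwnd in half" as on the slide. The function and event names are hypothetical.

```python
def simulate_cwnd(events, ssthresh=8.0, reno=True):
    """Return cwnd (in MSS) after each event: 'ack', 'timeout', or '3dup'."""
    cwnd = 1.0                          # initially cwnd = 1 MSS
    trace = []
    for event in events:
        if event == "timeout" or (event == "3dup" and not reno):
            ssthresh = cwnd / 2         # half of cwnd just before the loss
            cwnd = 1.0                  # timeout (or any loss in Tahoe): back to 1 MSS
        elif event == "3dup":
            ssthresh = cwnd / 2
            cwnd = ssthresh             # Reno: cut the window in half
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
        else:
            cwnd += 1.0                 # congestion avoidance: +1 MSS every RTT
        trace.append(cwnd)
    return trace


rounds = ["ack"] * 5 + ["3dup", "ack"]
print(simulate_cwnd(rounds, reno=True))    # Reno halves cwnd on 3 dup ACKs
print(simulate_cwnd(rounds, reno=False))   # Tahoe drops cwnd to 1 MSS
```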
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM, partially recovered]
new ACK (slow start):
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
cwnd > ssthresh: switch to congestion avoidance
new ACK (congestion avoidance):
  cwnd = cwnd + MSS * (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
duplicate ACK:
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd: TCP sender congestion window size vs. time; additively increase window size ... until loss occurs (then cut window in half)]
Transport Layer 3-99
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{\text{RTT}}$ bytes/sec
[Figure: window size oscillating between W/2 and W]
Transport Layer 3-100
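The factor 3/4 follows from the sawtooth shape. Assuming the window grows linearly from W/2 (just after a loss) to W (just before the next loss), its time average is the midpoint of the two values:

```latex
\[
  \overline{\text{window}}
  = \frac{1}{2}\left(\frac{W}{2} + W\right)
  = \frac{3}{4}\,W,
  \qquad
  \text{avg TCP thruput}
  = \frac{\overline{\text{window}}}{\text{RTT}}
  = \frac{3}{4}\cdot\frac{W}{\text{RTT}} \ \text{bytes/sec}.
\]
```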
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$ - a very small loss rate!
new versions of TCP for high-speed
Transport Layer 3-101
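Both numbers on this slide can be reproduced directly from the stated example. The sketch below solves the throughput formula for L; the function names are hypothetical, and rates are taken in bits per second with MSS converted from bytes to bits.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


rate = 10e9    # 10 Gbps
rtt = 0.100    # 100 ms
mss = 1500     # bytes per segment

print(window_segments(rate, rtt, mss))     # about 83,333 segments
print(required_loss_rate(rate, rtt, mss))  # about 2e-10
```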
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Transport Layer 3-102
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
How demultiplexing works
host receives IP datagrams
  each datagram has source IP address, destination IP address
  each datagram carries one transport-layer segment
  each segment has source, destination port number
host uses IP addresses & port numbers to direct segment to appropriate socket
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (payload)]
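Demultiplexing amounts to a table lookup keyed on header fields. The sketch below assumes the simplest case, where a socket is identified by destination IP address and destination port only; a connection-oriented lookup would use more fields. The `Host` class and the datagram dictionary layout are hypothetical.

```python
class Host:
    def __init__(self):
        # (destination IP, destination port) -> payloads delivered to that socket
        self.sockets = {}

    def bind(self, ip, port):
        """Create a socket that receives segments addressed to (ip, port)."""
        self.sockets[(ip, port)] = []

    def receive(self, datagram):
        """Direct the segment to the matching socket; return False if none exists."""
        key = (datagram["dst_ip"], datagram["dst_port"])
        if key not in self.sockets:
            return False
        self.sockets[key].append(datagram["payload"])
        return True


host = Host()
host.bind("10.0.0.1", 53)
print(host.receive({"dst_ip": "10.0.0.1", "dst_port": 53, "payload": b"query"}))
print(host.receive({"dst_ip": "10.0.0.1", "dst_port": 80, "payload": b"GET"}))
```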
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors
k l d (ACK ) i li i l ll d acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK negative acknowledgements (NAKs) receiver explicitly tells g g ( ) p y
sender that pkt had errors sender retransmits pkt on receipt of NAK
new mechanisms in rdt2 0 (beyond rdt1 0)How do humans recover from ldquoerrorsrdquo
new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-
during conversationg ( )
gtsender
Transport Layer 3-27
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells sender
that pkt received OKti k l d t (NAK ) i li itl t ll negative acknowledgements (NAKs) receiver explicitly tells
sender that pkt had errors sender retransmits pkt on receipt of NAKp p
new mechanisms in rdt20 (beyond rdt10) error detection feedback control msgs (ACKNAK) from receiver to
sender
Transport Layer 3-28
rdt20 FSM specificationp
sndpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
sndpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time, timeout

timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
(typically, $\beta = 0.25$)

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
(estimated RTT plus "safety margin")
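The three update rules above can be combined into a small estimator. The following Python sketch is one possible arrangement, not a definitive implementation. Two details are assumptions that the slides do not specify: the estimator is seeded with the first sample (with DevRTT set to half of it, in the style of RFC 6298), and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate. The class and method names are hypothetical.

```python
class RttEstimator:
    """EWMA-based RTT estimator with the typical alpha and beta values."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumption: seed from the first measurement (RFC 6298 style).
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        # Assumption: deviation is taken against the old EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus the "safety margin" of 4 * DevRTT.
        return self.estimated_rtt + 4 * self.dev_rtt
```

For example, starting from a first sample of 100 ms and feeding a second sample of 120 ms gives EstimatedRTT = 102.5 ms, DevRTT = 42.5 ms, and a timeout interval of 272.5 ms. Samples from retransmitted segments should not be passed to `update`, as the earlier slide notes.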
TCP reliable data transfer

TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks

let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events

data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval

timeout:
  retransmit segment that caused timeout
  restart timer

ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)

event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
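The three sender events (data from the application, timeout, ACK received) can be sketched in Python as follows. This is a simplified model under stated assumptions: the timer is represented by a boolean flag rather than a real clock, segments "passed to IP" are just appended to a log, and the names `SimplifiedTcpSender`, `sent`, and `unacked` are hypothetical. Like the simplified sender on the slides, it ignores duplicate ACKs, flow control, and congestion control.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq=0):
        self.send_base = initial_seq      # oldest unacked byte
        self.next_seq_num = initial_seq   # NextSeqNum on the slide
        self.unacked = {}                 # seq # -> data not yet ACKed
        self.timer_running = False        # stand-in for a real timer
        self.sent = []                    # log of (seq, data) passed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))     # pass segment to IP ("send")
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # segment with smallest seq #
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True     # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:      # acknowledges new data
            self.send_base = ack_num
            done = [s for s, d in self.unacked.items()
                    if s + len(d) <= ack_num]
            for seq in done:
                del self.unacked[seq]
            # Keep the timer only if segments are still unacked.
            self.timer_running = bool(self.unacked)
```

A short trace: after sending 8 bytes at seq 92 and 20 bytes at seq 100, `on_ack(100)` leaves only the second segment unacked with the timer running, and a subsequent `on_timeout()` retransmits the segment with seq 100.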
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte

event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit

time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit

[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B answers with repeated ACK=100 segments. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK
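The duplicate-ACK rule can be expressed as a small counter on the sender side. The Python sketch below is illustrative only: the class name `FastRetransmitDetector` and the `retransmitted` log are hypothetical, and actual retransmission and timer handling are left out.

```python
class FastRetransmitDetector:
    def __init__(self, send_base):
        self.send_base = send_base   # smallest unacked seq #
        self.dup_acks = 0            # duplicate ACKs seen for send_base
        self.retransmitted = []      # seq #s chosen for fast retransmit

    def on_ack(self, ack_num):
        if ack_num > self.send_base:
            # New data acknowledged: advance and reset the counter.
            self.send_base = ack_num
            self.dup_acks = 0
        elif ack_num == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Triple duplicate ACK: resend the unacked segment with
                # the smallest seq # without waiting for the timeout.
                self.retransmitted.append(self.send_base)
```

In the scenario of the figure, the first ACK=100 acknowledges the 8 bytes sent at Seq=92, and the next three ACK=100 segments are duplicates, so the segment with Seq=100 is selected for retransmission exactly once.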
TCP flow control

application may remove data from TCP socket buffers ...
... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process removes data.]
TCP flow control

receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow

[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process.]
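A short sketch of the two sides of this rule may help. The receiver computes rwnd as the free space left in RcvBuffer, and the sender checks its in-flight data against the advertised value. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked` are assumptions (they follow common textbook bookkeeping and do not appear on the slide), and both function names are hypothetical.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

With the typical default RcvBuffer of 4096 bytes, a receiver that has received byte 3000 while the application has read up to byte 1000 advertises rwnd = 2096. A sender with 1000 bytes in flight may then send another 1000 bytes but not another 1200.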
Connection Management

before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters

[Figure: on each side, the application sits above the network. Each endpoint holds:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client]

client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake

client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB

  1. client: choose init seq num, x; send TCP SYN msg
       SYNbit=1, Seq=x
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
       SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
       ACKbit=1, ACKnum=y+1
  4. server: received ACK(y) indicates client is live
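The exchange above can be traced with a few lines of Python. This is a sketch under simplifying assumptions: no loss, no timers, and both sides are simulated inside one function. The `Segment` fields and the function name `three_way_handshake` are hypothetical. The final ACK carries Seq=x+1 because the SYN itself consumes one sequence number, which the slide does not show explicitly.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(x, y):
    """Return the three segments and the final (client, server) states."""
    # Step 1: client chooses x and sends SYN.
    syn = Segment(syn=True, seq=x)
    client_state, server_state = "SYNSENT", "LISTEN"

    # Step 2: server chooses y and sends SYNACK, acking the SYN.
    synack = Segment(syn=True, ack=True, seq=y, ack_num=syn.seq + 1)
    server_state = "SYN RCVD"

    # Step 3: client checks that its SYN was acked, then ACKs the SYNACK.
    if synack.ack_num != x + 1:
        raise ValueError("SYNACK does not acknowledge our SYN")
    final_ack = Segment(ack=True, seq=x + 1, ack_num=synack.seq + 1)
    client_state = "ESTAB"

    # Step 4: server sees ACK(y+1), so the client is live.
    if final_ack.ack_num == y + 1:
        server_state = "ESTAB"

    return [syn, synack, final_ack], client_state, server_state
```

For instance, `three_way_handshake(1000, 5000)` produces a SYNACK with ACKnum=1001 and a final ACK with ACKnum=5001, after which both sides are in ESTAB.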
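TCP 3-way handshake: FSM

[Figure: handshake finite state machine.
  server: closed -> listen on Socket connectionSocket = welcomeSocket.accept();
  server: listen -> SYN rcvd on receiving SYN(x): send SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
  server: SYN rcvd -> ESTAB on receiving ACK(ACKnum=y+1)
  client: closed -> SYN sent on Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
  client: SYN sent -> ESTAB on receiving SYNACK(seq=y,ACKnum=x+1): send ACK(ACKnum=y+1)]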
TCP: closing a connection

client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection

client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED

  1. client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
  2. server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
  3. server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
  4. client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control

congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1

  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission

[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; throughput is λ_out. Plots of λ_out against λ_in and of delay against λ_in, with λ_in ranging up to R/2.]

  maximum per-connection throughput: R/2
  large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2

  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λ_in = λ_out
    transport-layer input includes retransmissions: λ'_in ≥ λ_in

[Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into a router with finite shared output link buffers; Host B receives λ_out.]
Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  sender sends only when router buffers available

[Figure: Host A keeps a copy of each packet and sends λ_in (original data) and λ'_in (original data, plus retransmitted data) when there is free buffer space in the finite shared output link buffers; Host B receives λ_out. Plot of λ_out against λ_in, each axis up to R/2.]
Causes/costs of congestion: scenario 2

Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost

[Figure: Host A keeps a copy of each packet; a packet that arrives when there is no buffer space is dropped and is resent once there is free buffer space. λ_in: original data; λ'_in: original data, plus retransmitted data. Plot of λ_out against the sending rate, each axis up to R/2.]

  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2

Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A times out and sends a copy of a packet while free buffer space is available at the router; Host B receives both. Plot of λ_out against the sending rate, each axis up to R/2.]

  when sending at R/2, some packets are retransmissions, including duplicated that are delivered

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3

  four senders
  multihop paths
  timeout/retransmit

Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0

[Figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out.]
Causes/costs of congestion: scenario 3

[Figure: plot of λ_out against λ'_in, each axis up to C/2.]

another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details

sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx cwnd / RTT$ bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight"), and last byte sent; the in-flight span is bounded by cwnd.]
TCP Slow Start

when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B; the number of segments sent per RTT grows over time.]
TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss

loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
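The rules for slow start, the switch to linear growth, and the Tahoe and Reno loss reactions can be traced with a small per-RTT model. The Python sketch below makes several simplifying assumptions that are not on the slides: cwnd and ssthresh are measured in MSS units, growth is applied once per RTT rather than per ACK, the initial ssthresh of 8 MSS is arbitrary, and slow-start growth is capped at ssthresh. The class name `CwndModel` and the helper `trace` are hypothetical.

```python
class CwndModel:
    """Per-RTT model of cwnd; all quantities are in MSS units."""

    def __init__(self, ssthresh=8.0, reno=True):
        self.cwnd = 1.0            # initially cwnd = 1 MSS
        self.ssthresh = ssthresh   # assumed starting threshold
        self.reno = reno           # False models TCP Tahoe

    def on_rtt_all_acked(self):
        if self.cwnd < self.ssthresh:
            # Slow start: double every RTT, capped at the threshold.
            self.cwnd = min(2 * self.cwnd, self.ssthresh)
        else:
            # Congestion avoidance: grow by 1 MSS every RTT.
            self.cwnd += 1.0

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        # Reno cuts cwnd in half; Tahoe always goes back to 1 MSS.
        self.cwnd = self.ssthresh if self.reno else 1.0


def trace(model, rtts):
    """Advance the model by a number of loss-free RTTs and record cwnd."""
    history = []
    for _ in range(rtts):
        model.on_rtt_all_acked()
        history.append(model.cwnd)
    return history
```

Starting from cwnd = 1, five loss-free RTTs give 2, 4, 8, 9, 10. A triple duplicate ACK at cwnd = 10 then sets ssthresh to 5; Reno continues linearly from 5, while Tahoe restarts from 1 and grows 2, 4, 5, 6.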
Summary: TCP Congestion Control

[Figure: FSM of TCP congestion control. Recoverable transitions:
  on new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  when cwnd > ssthresh: switch to congestion avoidance
  on new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  on duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]
TCP congestion control: additive increase, multiplicative decrease

approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) against time; additively increase window size until loss occurs (then cut window in half).]
TCP throughput

avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec

[Figure: window size over time, oscillating between W/2 and W.]
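The factor 3/4 is the mean of a window that ramps linearly from W/2 up to W and then drops back. A brief numerical check in Python is sketched below; the function names and the sampling resolution are arbitrary choices for illustration.

```python
def average_sawtooth_window(w, steps=1000):
    """Mean window over one linear ramp from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_seconds):
    """Average TCP throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_seconds
```

For a window of W = 100,000 bytes, the sampled mean comes out at 75,000 bytes, and with an RTT of 100 ms the average throughput is about 750,000 bytes/sec.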
TCP Futures: TCP over "long, fat pipes"

example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
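Both numbers in this example can be reproduced from the slide's formula. The Python sketch below solves the throughput formula for L and computes the window needed to fill the pipe; the function names are hypothetical, and MSS is converted from bytes to bits so that it matches a throughput given in bits per second.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2
```

With 1500-byte segments, a 100 ms RTT, and a 10 Gbps target, the window works out to roughly 83,333 segments and the loss rate to roughly 2.1e-10, consistent with the rounded values on the slide.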
TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?

two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput against Connection 1 throughput, each axis up to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
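The convergence argument can be checked with a tiny simulation: additive increase keeps the difference between the two rates constant, while each multiplicative decrease halves it. The Python sketch below is an idealized model under stated assumptions: both sessions have the same RTT, both detect loss in the same round, and loss occurs exactly when the combined rate exceeds the capacity. The function name and the parameter values are hypothetical.

```python
def simulate_two_aimd_flows(x1, x2, capacity, rounds=300):
    """Return the two rates after a number of synchronized AIMD rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both sessions decrease window by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both increase additively.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from very unequal rates, for example 10 and 70 on a link of capacity 100, the gap of 60 shrinks by half at every loss event, so after a few hundred rounds the two rates differ by less than one unit.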
Fairness (more)

Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
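The arithmetic behind this example is a per-connection equal share of the link. A short Python sketch, assuming every connection gets exactly the same rate (the function name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by the app opening new connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, one new connection gets 1/10 of R. Eleven new connections get 11/20 of R, which the slide rounds to R/2.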
Chapter 3: summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control

next:
  leaving the network "edge"
rdt2.0: channel with bit errors

underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
  (How do humans recover from "errors" during conversation?)
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification

[Figure: rdt2.0 sender and receiver FSMs. Recoverable transitions:
  sender, "Wait for call from above": on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  sender, "Wait for ACK or NAK": on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
  sender, "Wait for ACK or NAK": on rdt_rcv(rcvpkt) && isACK(rcvpkt)
  receiver: on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)]
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets

Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender

k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed

ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
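The sender rules above (window of up to N unacked packets, cumulative ACKs, retransmit everything in the window on timeout) can be sketched in Python as follows. This is a simplified model under stated assumptions: sequence numbers are unbounded integers rather than k-bit values that wrap around, the timer itself is not modeled, and transmissions are recorded in a list instead of being sent. The class name `GoBackNSender` and its attributes are hypothetical.

```python
class GoBackNSender:
    def __init__(self, window_size):
        self.window_size = window_size  # N on the slide
        self.base = 0                   # oldest unacked seq #
        self.next_seq = 0               # next seq # to use
        self.buffer = {}                # seq # -> data awaiting ACK
        self.transmitted = []           # log of seq #s put on the wire

    def send(self, data):
        """Send data if the window has room; return False if it is full."""
        if self.next_seq >= self.base + self.window_size:
            return False
        self.buffer[self.next_seq] = data
        self.transmitted.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is ACKed."""
        if self.base <= n < self.next_seq:
            for seq in range(self.base, n + 1):
                self.buffer.pop(seq, None)
            self.base = n + 1
        # Otherwise it is a duplicate or stale ACK and is ignored.

    def on_timeout(self):
        """Retransmit the oldest unacked packet and all later ones."""
        for seq in range(self.base, self.next_seq):
            self.transmitted.append(seq)
```

With N = 4, a fifth call to `send` is refused until an ACK slides the window. After `on_ack(1)`, a timeout retransmits packets 2 and 3 only.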
Go-Back-N: receiver

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering
  re-ACK pkt with highest in-order seq #
Selective repeat

receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

[Figure: sender and receiver windows.]
Selective repeat

sender
data from above:
  if next available seq # in window, send pkt
timeout(n):
  resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
  mark pkt n as received
  if n smallest unACKed

receiver
pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure

[Figure: TCP segment layout, 32 bits wide:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)]

  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
TCP seq. numbers, ACKs

sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK

[Figure: outgoing segment from sender, with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, and urg pointer fields; sender sequence number space with window size N.]
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIMED_WAIT, CLOSED
server state: ESTAB, CLOSE_WAIT, LAST_ACK, CLOSED

client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: replies ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: on that ACK, enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: replies ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: on that ACK, enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission

[Figure: Host A and Host B send original data λ_in through a router with unlimited shared output link buffers; throughput is λ_out. Two plots: λ_out vs. λ_in and delay vs. λ_in, each with λ_in up to R/2.]

maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in ≥ λ_in

[Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into a router with finite shared output link buffers; λ_out is delivered to Host B.]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available

[Figure: λ_out vs. λ_in, both axes up to R/2. Host A keeps a copy and sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers that have free buffer space; λ_out goes to Host B.]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost

[Figure: Host A keeps a copy of each packet. When the router has no buffer space the packet is dropped; Host A resends it once free buffer space is available. λ_in: original data; λ'_in: original data, plus retransmitted data.]

[Figure: λ_out vs. λ_in, both axes up to R/2.]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered

[Figure: λ_out vs. λ_in, both axes up to R/2. Host A times out and resends its copy even though the router has free buffer space.]
when sending at R/2, some packets are retransmissions including duplicated that are delivered

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit

Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0

[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out.]
Causes/costs of congestion: scenario 3
[Figure: λ_out vs. λ'_in, both axes up to C/2.]

another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:

end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight", spanning cwnd), and last byte sent.]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B over successive RTTs.]
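Why incrementing cwnd once per ACK doubles it every RTT can be seen in a small sketch. It assumes no loss, one ACK per segment, and a full window sent each RTT; `slow_start` is an illustrative name.

```python
from typing import List


def slow_start(mss: int, rtts: int) -> List[int]:
    """Return cwnd (in bytes) at the start of each RTT, beginning with 1 MSS."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd // mss      # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += mss                  # increment cwnd for every ACK received
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(slow_start(mss=1460, rtts=4))
```

With cwnd/MSS segments in flight, one RTT delivers cwnd/MSS ACKs, so the per-ACK increment adds a full cwnd and the window doubles.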
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
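The two loss reactions can be compared side by side in a minimal sketch. The function name, the event strings, and the `variant` parameter are illustrative assumptions; the Reno branch follows the slide's "cut in half" description and leaves out further fast-recovery details.

```python
from typing import Tuple


def react_to_loss(cwnd: int, mss: int, event: str, variant: str = "reno") -> Tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) in bytes after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd // 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh             # restart from 1 MSS (slow start)
    return ssthresh, ssthresh            # Reno on 3 dup ACKs: cwnd cut in half


if __name__ == "__main__":
    print(react_to_loss(16000, 1000, "timeout"))
    print(react_to_loss(16000, 1000, "3dupacks", "reno"))
    print(react_to_loss(16000, 1000, "3dupacks", "tahoe"))
```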
Summary: TCP Congestion Control
[Figure: FSM. Recoverable transitions:]
  slow start, on new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: when cwnd > ssthresh
  congestion avoidance, on new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, on duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss

[Figure: cwnd (TCP sender congestion window size) vs. time. AIMD saw tooth behavior, probing for bandwidth: additively increase window size until loss occurs (then cut window in half).]
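The saw tooth can be reproduced with a toy simulation. It assumes, purely for illustration, that a loss occurs whenever cwnd reaches a fixed `capacity_mss`; real losses depend on the network, and `aimd` is a hypothetical name.

```python
from typing import List


def aimd(cwnd_mss: float, capacity_mss: float, rtts: int) -> List[float]:
    """Return cwnd (in MSS) observed at each RTT under AIMD."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd_mss)
        if cwnd_mss >= capacity_mss:
            cwnd_mss = cwnd_mss / 2      # multiplicative decrease after loss
        else:
            cwnd_mss = cwnd_mss + 1      # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd(cwnd_mss=4, capacity_mss=8, rtts=10))
```

The trace climbs by one MSS per RTT and then drops to half, giving the repeating tooth between W/2 and W.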
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec

[Figure: saw tooth window size oscillating between W/2 and W.]
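The 3/4 factor is the mean of a window that ramps linearly from W/2 to W. A quick numerical check, with illustrative function names and an evenly sampled ramp as the stated assumption:

```python
def average_window(w_bytes: float, steps: int = 1000) -> float:
    """Average a window that grows linearly from W/2 to W (one saw tooth)."""
    samples = [w_bytes / 2 + (w_bytes / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_bytes: float, rtt_s: float) -> float:
    """avg TCP thruput = 3/4 * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_s


if __name__ == "__main__":
    print(average_window(100.0))
    print(average_throughput(100_000, 0.1))
```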
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate
new versions of TCP for high-speed
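Both numbers in the example can be reproduced from the stated inputs. The sketch below inverts the throughput formula for L; the function names are illustrative, and MSS is converted to bits so the units match a throughput in bits/sec.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain throughput_bps over one RTT."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print(window_segments(10e9, 0.1, 1500))      # about 83,333 segments
    print(required_loss_rate(10e9, 0.1, 1500))   # about 2e-10
```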
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with Connection 1 throughput on one axis (up to R) and the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
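The convergence argument can be checked with a toy model of two synchronized flows. It assumes both flows see every loss at the same moment and that a loss happens whenever their combined rate exceeds the link capacity; `two_flow_aimd` is an illustrative name.

```python
from typing import Tuple


def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int) -> Tuple[float, float]:
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase, slope of 1
    return x1, x2


if __name__ == "__main__":
    print(two_flow_aimd(1.0, 80.0, capacity=100.0, rounds=500))
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal-share line.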
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
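The arithmetic behind the parallel-connection example, assuming every TCP connection gets an equal share of the link (`app_share` is an illustrative name):

```python
from fractions import Fraction


def app_share(new_conns: int, existing_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, new_conns + existing_conns)


if __name__ == "__main__":
    print(app_share(1, 9))     # share with a single connection
    print(app_share(11, 9))    # share with eleven parallel connections
```

With 11 connections the app holds 11 of 20 equal shares, slightly more than the R/2 quoted on the slide.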
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
next:
  leaving the network "edge"
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
  (How do humans recover from "errors" during conversation?)
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK,NAK) from receiver to sender
rdt2.0: FSM specification
sender:
  Wait for call from above:
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
  Wait for ACK or NAK:
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): go to Wait for call from above
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
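A rough sketch of the rdt2.0 idea: the sender keeps resending the same packet until the receiver's checksum test passes. The one-byte additive checksum, the `channel` callback, and the helper `flip_first_n` are assumptions made for this illustration, and ACK/NAK messages are assumed to arrive uncorrupted.

```python
from typing import Callable, Tuple


def checksum(data: bytes) -> int:
    """Toy one-byte checksum (sum of bytes mod 256); a stand-in, not the Internet checksum."""
    return sum(data) % 256


def make_pkt(data: bytes) -> bytes:
    return data + bytes([checksum(data)])


def corrupt(pkt: bytes) -> bool:
    return checksum(pkt[:-1]) != pkt[-1]


def rdt20_send(data: bytes, channel: Callable[[bytes], bytes]) -> Tuple[bytes, int]:
    """Send until the receiver would ACK; return (delivered data, transmissions used)."""
    sndpkt = make_pkt(data)
    attempts = 0
    while True:
        attempts += 1
        rcvpkt = channel(sndpkt)         # udt_send over the unreliable channel
        if corrupt(rcvpkt):
            continue                     # receiver replies NAK: retransmit sndpkt
        return rcvpkt[:-1], attempts     # receiver replies ACK: done


def flip_first_n(n: int) -> Callable[[bytes], bytes]:
    """Hypothetical channel that flips one bit in each of the first n packets."""
    state = {"left": n}

    def channel(pkt: bytes) -> bytes:
        if state["left"] > 0:
            state["left"] -= 1
            return bytes([pkt[0] ^ 0x01]) + pkt[1:]
        return pkt

    return channel


if __name__ == "__main__":
    print(rdt20_send(b"hello", flip_first_n(2)))
```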
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets

Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
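The sender rules above can be sketched as a small class. The class and method names are illustrative; the sketch ignores the k-bit wrap-around of sequence numbers and models the network only as a log of transmitted seq #s.

```python
class GoBackNSender:
    """Minimal Go-Back-N sender bookkeeping (no real timers or sockets)."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.base = 0              # oldest unACKed seq #
        self.next_seq = 0          # next seq # to assign
        self.transmitted = []      # every seq # put on the wire, in order

    def send(self) -> bool:
        """Send one new packet if the window allows it; return False if window is full."""
        if self.next_seq >= self.base + self.window_size:
            return False
        self.transmitted.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n: int) -> None:
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if self.base <= n < self.next_seq:
            self.base = n + 1      # duplicate or stale ACKs fall outside and are ignored

    def on_timeout(self) -> None:
        """Retransmit the oldest unACKed packet and all higher seq # pkts in window."""
        for seq in range(self.base, self.next_seq):
            self.transmitted.append(seq)


if __name__ == "__main__":
    sender = GoBackNSender(window_size=4)
    print([sender.send() for _ in range(5)])
    sender.on_ack(1)
    sender.on_timeout()
    print(sender.transmitted)
```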
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering
  re-ACK pkt with highest in-order seq #

Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence number space.]

Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N,rcvbase-1]
    ACK(n)
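The receiver side of selective repeat can be sketched as follows. The class name and the return convention (the seq # to ACK, or None) are assumptions for illustration; packets in the previous window are re-ACKed without being delivered again.

```python
from typing import Dict, List, Optional


class SelectiveRepeatReceiver:
    """Minimal selective-repeat receiver: buffer out-of-order, deliver in order."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.rcv_base = 0                      # next not-yet-received seq #
        self.buffer: Dict[int, str] = {}       # out-of-order pkts awaiting delivery
        self.delivered: List[str] = []         # data handed to the upper layer

    def on_packet(self, n: int, data: str) -> Optional[int]:
        """Process pkt n; return the seq # to ACK, or None if the pkt is ignored."""
        if self.rcv_base <= n < self.rcv_base + self.window_size:
            self.buffer[n] = data
            # Deliver every buffered, in-order pkt and advance the window.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.window_size <= n < self.rcv_base:
            return n                           # already delivered: re-ACK only
        return None


if __name__ == "__main__":
    rcvr = SelectiveRepeatReceiver(window_size=4)
    print(rcvr.on_packet(1, "b"), rcvr.delivered)
    print(rcvr.on_packet(0, "a"), rcvr.delivered)
```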
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment header is 32 bits wide; fields in order:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)

field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
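The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. The function name and the dictionary keys are illustrative; options and application data are not parsed in this sketch.

```python
import struct
from typing import Dict


def parse_tcp_header(header: bytes) -> Dict[str, object]:
    """Unpack the first 20 bytes of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "head_len_bytes": (offset_flags >> 12) * 4,   # head len counts 32-bit words
        "URG": bool(offset_flags & 0x20),
        "ACK": bool(offset_flags & 0x10),
        "PSH": bool(offset_flags & 0x08),
        "RST": bool(offset_flags & 0x04),
        "SYN": bool(offset_flags & 0x02),
        "FIN": bool(offset_flags & 0x01),
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


if __name__ == "__main__":
    # Hand-built SYNACK header: head len 5 words, SYN and ACK bits set.
    raw = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 4096, 0, 0)
    print(parse_tcp_header(raw))
```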
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor

[Figure: outgoing segment from sender and incoming segment to sender, each showing the header fields source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C'
  Host A to Host B: Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'
  Host B to Host A: Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'
  Host A to Host B: Seq=43, ACK=80
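The Seq and ACK values in the telnet scenario follow from counting bytes. A small sketch, with `echo_exchange` as an illustrative name and dictionaries standing in for segments:

```python
from typing import Dict, List


def echo_exchange(seq_a: int, seq_b: int, payload: bytes) -> List[Dict[str, object]]:
    """Segments for: A sends payload, B ACKs and echoes it, A ACKs the echo."""
    n = len(payload)
    a_to_b = {"seq": seq_a, "ack": seq_b, "data": payload}
    # B has received n bytes starting at seq_a, so it expects byte seq_a + n next.
    b_to_a = {"seq": seq_b, "ack": seq_a + n, "data": payload}
    # A's next byte is seq_a + n; it has received n echoed bytes from B.
    a_ack = {"seq": seq_a + n, "ack": seq_b + n, "data": b""}
    return [a_to_b, b_to_a, a_ack]


if __name__ == "__main__":
    for segment in echo_exchange(42, 79, b"C"):
        print(segment)
```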
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout

$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$

  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$

[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds), 100 to 350, vs. time (seconds); curves: SampleRTT and EstimatedRTT.]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:

$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
(typically, $\beta = 0.25$)

$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
(estimated RTT plus "safety margin")
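The two moving averages and the timeout can be combined in a small estimator. The class name is illustrative. Two choices here are assumptions not stated on the slides and borrowed from RFC 6298: DevRTT starts at half the first sample, and DevRTT is updated before EstimatedRTT.

```python
class RttEstimator:
    """EWMA-based RTT estimator following the EstimatedRTT/DevRTT formulas."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: RFC 6298 style initial value

    def update(self, sample_rtt: float) -> None:
        """Fold one SampleRTT (not taken from a retransmission) into the estimates."""
        # Assumption: deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
            self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
            self.alpha * sample_rtt

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RttEstimator(first_sample=100.0)   # milliseconds
    est.update(120.0)
    print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```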
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

TCP ACK generation (event at receiver: TCP receiver action)
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed:
  delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending:
  immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment, higher-than-expect seq. #; gap detected:
  immediately send duplicate ACK, indicating seq. # of next expected byte
arrival of segment that partially or completely fills gap:
  immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit:
if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A and Host B timeline.]
  Host A sends Seq=92, 8 bytes of data
  Host A sends Seq=100, 20 bytes of data (lost, marked X)
  Host B replies ACK=100 repeatedly (duplicate ACKs)
  Host A resends Seq=100, 20 bytes of data, before the timeout expires
fast retransmit after sender receipt of triple duplicate ACK
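The duplicate-ACK counting behind fast retransmit can be sketched as below. The class name and the `retransmitted` log are illustrative; timers and the actual resend are left out.

```python
class FastRetransmitSender:
    """Count duplicate ACKs and record when a fast retransmit would fire."""

    def __init__(self, send_base: int) -> None:
        self.send_base = send_base     # smallest unACKed seq #
        self.dup_acks = 0
        self.retransmitted = []        # seq #s resent via fast retransmit

    def on_ack(self, acknum: int) -> None:
        if acknum > self.send_base:
            self.send_base = acknum    # new data ACKed: reset duplicate count
            self.dup_acks = 0
        elif acknum == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:     # triple duplicate ACK
                self.retransmitted.append(self.send_base)


if __name__ == "__main__":
    sender = FastRetransmitSender(send_base=92)
    for acknum in [100, 100, 100, 100]:    # one new ACK, then three duplicates
        sender.on_ack(acknum)
    print(sender.retransmitted)
```

In the figure's scenario the first ACK=100 acknowledges the 8 bytes starting at Seq=92, and the three that follow are the duplicates that trigger the resend of Seq=100.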
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

[Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers inside the OS; the application process reads from those buffers.]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow

[Figure: receiver-side buffering. RcvBuffer holds buffered data and free buffer space (rwnd); TCP segment payloads arrive from below and data goes up to application process.]
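A minimal sketch of both sides of flow control. The slides only say that rwnd is the free buffer space; the LastByteRcvd and LastByteRead bookkeeping used here is the usual textbook way to compute it and is an assumption of this sketch, as are the function names.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """rwnd = free space in RcvBuffer = buffer size minus buffered, unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unacked ("in-flight") data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


if __name__ == "__main__":
    rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(rwnd)
    print(sender_may_send(5000, 4000, rwnd, 1000))
    print(sender_may_send(5000, 4000, rwnd, 1200))
```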
to achieve 10 Gbps throughput need a loss rate of L = 210-10 a very small loss rate
L
= 210-10 ndash a very small loss rate new versions of TCP for high-speed
Transport Layer 3-101
TCP Fairnessfairness goal if K TCP sessions share same
TCP Fairnessfairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it Rcapacity RTCP connection 2
Transport Layer 3-102
Why is TCP fairWhy is TCP fairtwo competing sessionstwo competing sessions additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally p g p p p y
R equal bandwidth share
loss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC ti 1 th h t
Transport Layer 3-103
RConnection 1 throughput
Fairness (more)( )Fairness and UDP Fairness parallel TCP
i multimedia apps often do not use TCP
d
connections application can open
l i l ll l do not want rate throttled by congestion control
multiple parallel connections between two hosts
instead use UDP send audiovideo at
hosts web browsers do this
li k f R i h 9 constant rate tolerate packet loss
eg link of rate R with 9 existing connections new app asks for 1 TCP gets rate new app asks for 1 TCP gets rate
R10 new app asks for 11 TCPs gets R2
Transport Layer 3-104
Chapter 3 summaryChapter 3 summary principles behind p p
transport layer servicesmultiplexing
nextp g
demultiplexing reliable data transfer
next leaving the
network ldquoedgerdquo flow control congestion control
rdt2.0: channel with bit errors
- underlying channel may flip bits in packet
  - checksum to detect bit errors
- the question: how to recover from errors?
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
- new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - feedback: control msgs (ACK, NAK) from receiver to sender
- How do humans recover from "errors" during conversation?

rdt2.0: FSM specification
sender, state "Wait for call from above":
- rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
sender, state "Wait for ACK or NAK":
- rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
- rdt_rcv(rcvpkt) && isACK(rcvpkt): no action, return to "Wait for call from above"
receiver:
- rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)

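The rdt2.0 sender and receiver behaviour can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CRC32 from `zlib` stands in for the slide's checksum, `flip_first_n` is a hypothetical lossy channel invented for the demo, and ACK/NAK messages are assumed to arrive uncorrupted (which is exactly the case rdt2.0 does not handle).

```python
import zlib

def make_pkt(data: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 (stand-in for the checksum)."""
    return zlib.crc32(data).to_bytes(4, "big") + data

def corrupt(pkt: bytes) -> bool:
    """True if the stored checksum does not match the payload."""
    return int.from_bytes(pkt[:4], "big") != zlib.crc32(pkt[4:])

def receiver(pkt: bytes, delivered: list) -> str:
    """Receiver: NAK a corrupted packet, otherwise deliver it and ACK."""
    if corrupt(pkt):
        return "NAK"
    delivered.append(pkt[4:])
    return "ACK"

def rdt_send(data: bytes, channel, delivered: list) -> int:
    """Sender: transmit, then retransmit on every NAK.
    Returns the number of transmissions that were needed."""
    sndpkt = make_pkt(data)
    sends = 0
    while True:
        sends += 1
        if receiver(channel(sndpkt), delivered) == "ACK":
            return sends

def flip_first_n(n: int):
    """Hypothetical channel: flips one bit in the first n packets it carries."""
    state = {"left": n}
    def channel(pkt: bytes) -> bytes:
        if state["left"] > 0:
            state["left"] -= 1
            return pkt[:-1] + bytes([pkt[-1] ^ 0x01])
        return pkt
    return channel
```

With a channel that damages the first two copies, `rdt_send(b"hello", flip_first_n(2), delivered)` needs three transmissions before the payload is delivered once.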
Go-back-N and Selective Repeat
Go-back-N:
- sender can have up to N unacked packets in pipeline
- receiver only sends cumulative ack
  - doesn't ack packet if there's a gap
- sender has timer for oldest unacked packet
  - when timer expires, retransmit all unacked packets
Selective Repeat:
- sender can have up to N unack'ed packets in pipeline
- rcvr sends individual ack for each packet
- sender maintains timer for each unacked packet
  - when timer expires, retransmit only that unacked packet

Go-Back-N: sender
- k-bit seq # in pkt header
- "window" of up to N consecutive unack'ed pkts allowed
- ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  - may receive duplicate ACKs (see receiver)
- timer for oldest in-flight pkt
- timeout(n): retransmit packet n and all higher seq # pkts in window

Go-Back-N: receiver
- ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- out-of-order pkt:
  - discard (don't buffer): no receiver buffering!
  - re-ACK pkt with highest in-order seq #

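A minimal sketch of the Go-Back-N receiver rule, assuming sequence numbers start at 0 and never wrap (the k-bit wraparound is ignored here). The class name `GBNReceiver` and the `None` return value for "nothing in order yet" are choices made for this illustration only.

```python
class GBNReceiver:
    """Go-Back-N receiver: remembers only expectedseqnum, never buffers."""

    def __init__(self):
        self.expectedseqnum = 0
        self.delivered = []

    def on_packet(self, seq, data):
        """Handle one correctly received packet; return the seq # that is ACKed."""
        if seq == self.expectedseqnum:
            # in-order packet: deliver to upper layer and advance
            self.delivered.append(data)
            self.expectedseqnum += 1
        # out-of-order packet: discarded; either way re-ACK highest in-order seq #
        last_in_order = self.expectedseqnum - 1
        return last_in_order if last_in_order >= 0 else None
```

Feeding it packets 0, 1, 3, 4, 2 yields ACKs 0, 1, 1, 1, 2: the two ACKs repeated for packets 3 and 4 are the duplicate ACKs mentioned above, and packets 3 and 4 would have to be sent again.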
Selective repeat
- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #'s
  - limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[figure: sender and receiver views of the sequence number space]

Selective repeat
sender:
- data from above: if next available seq # in window, send pkt
- timeout(n): resend pkt n, restart timer
- ACK(n) in [sendbase, sendbase+N]:
  - mark pkt n as received
  - if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
- pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
- otherwise: ignore

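The receiver side of selective repeat can be sketched as follows. This is an illustration only: sequence numbers are assumed not to wrap, and `SRReceiver` with its dictionary buffer is a hypothetical structure, not something the slides prescribe.

```python
class SRReceiver:
    """Selective-repeat receiver with window [rcvbase, rcvbase+N-1]."""

    def __init__(self, N):
        self.N = N
        self.rcvbase = 0
        self.buffer = {}      # out-of-order pkts waiting for the gap to fill
        self.delivered = []   # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return n if ACK(n) is sent, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # deliver every buffered pkt that is now in order, sliding the window
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n          # already delivered: re-ACK so the sender can advance
        return None           # otherwise: ignore
```

With N = 4 and arrivals 0, 2, 3, 1, packets 2 and 3 are buffered until 1 arrives, at which point 1, 2 and 3 are delivered together and the window base moves to 4.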
TCP: overview
- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream:
  - no "message boundaries"
- pipelined:
  - TCP congestion and flow control set window size
- full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) inits sender, receiver state before data exchange
- flow controlled:
  - sender will not overwhelm receiver

TCP segment structure
header fields, 32 bits per row:
- source port #, dest port #
- sequence number
- acknowledgement number
- head len, not used, flag bits (U A P R S F), receive window
- checksum, Urg data pointer
- options (variable length)
- application data (variable length)
notes:
- sequence and acknowledgement numbers: counting by bytes of data (not segments)
- receive window: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)

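To make the header layout concrete, the fixed 20-byte part of the segment (no options) can be packed and unpacked with Python's standard `struct` module. This is a sketch only: the helper names `pack_header` and `unpack_header` are invented for the illustration, and the checksum is left at 0 rather than computed.

```python
import struct

# ports (2x16 bits), seq and ack (2x32 bits), offset byte, flag byte,
# receive window, checksum, urgent pointer: 20 bytes in network byte order
TCP_HEADER = struct.Struct("!HHIIBBHHH")
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def pack_header(src, dst, seq, ack, flags, rwnd, checksum=0, urg_ptr=0):
    """Build a 20-byte TCP header; flags is an iterable of flag names."""
    offset = (TCP_HEADER.size // 4) << 4   # head len in 32-bit words, upper 4 bits
    bits = 0
    for name in flags:
        bits |= FLAG_BITS[name]
    return TCP_HEADER.pack(src, dst, seq, ack, offset, bits, rwnd, checksum, urg_ptr)

def unpack_header(raw):
    """Parse the first 20 bytes of a segment into a dict of header fields."""
    src, dst, seq, ack, offset, bits, rwnd, checksum, urg_ptr = TCP_HEADER.unpack(raw[:20])
    return {
        "src": src, "dst": dst, "seq": seq, "ack": ack,
        "head_len": (offset >> 4) * 4,     # header length in bytes
        "flags": {n for n, b in FLAG_BITS.items() if bits & b},
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
    }
```

For example, `unpack_header(pack_header(9157, 80, 42, 79, ["ACK"], 4096))` gives back the same ports, sequence and acknowledgement numbers, with `head_len` equal to 20.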
TCP seq. numbers, ACKs
sequence numbers:
- byte stream "number" of first byte in segment's data
acknowledgements:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N, divided into: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]

TCP seq. numbers, ACKs: simple telnet scenario
- User at Host A types 'C': A sends Seq=42, ACK=79, data = 'C'
- Host B ACKs receipt of 'C', echoes back 'C': B sends Seq=79, ACK=43, data = 'C'
- Host A ACKs receipt of echoed 'C': A sends Seq=43, ACK=80

TCP round trip time, timeout
Q: how to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
- exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: $\alpha = 0.125$
[figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; y-axis: RTT (milliseconds), 100 to 350; x-axis: time (seconds); curves: SampleRTT, EstimatedRTT]

TCP round trip time, timeout
- timeout interval: EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
- estimate SampleRTT deviation from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT} - \mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
- $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
  - first term: estimated RTT; second term: "safety margin"

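The two moving averages and the timeout formula translate directly into code. The sketch below makes assumptions the slides do not state: the estimate is seeded with the first sample, DevRTT starts at 0, and EstimatedRTT is updated before DevRTT. Real TCP stacks choose these details differently.

```python
class RTTEstimator:
    """EWMA-based RTT estimate and timeout interval (times in any one unit)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = 0.0                  # assumption: no deviation known yet

    def update(self, sample_rtt):
        """Fold one new SampleRTT (never from a retransmission) into the estimates."""
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))

    @property
    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from 100 ms, a single 180 ms sample moves EstimatedRTT only to 110 ms, while the deviation term pushes the timeout interval to 180 ms: the estimate stays smooth and the margin absorbs the variation.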
TCP reliable data transfer
- TCP creates rdt service on top of IP's unreliable service
  - pipelined segments
  - cumulative acks
  - single retransmission timer
- retransmissions triggered by:
  - timeout events
  - duplicate acks
let's initially consider simplified TCP sender:
- ignore duplicate acks
- ignore flow control, congestion control

TCP sender events
data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running
  - think of timer as for oldest unacked segment
  - expiration interval: TimeOutInterval
timeout:
- retransmit segment that caused timeout
- restart timer
ack rcvd:
- if ack acknowledges previously unacked segments
  - update what is known to be ACKed
  - start timer if there are still unacked segments

TCP sender (simplified)
data received from application above:
- create segment, seq. #: NextSeqNum
- pass segment to IP (i.e., "send")
- NextSeqNum = NextSeqNum + length(data)
- if (timer currently not running), start timer

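The three sender events (data from the application, timeout, ACK received) can be put together in a small sketch. It is a simplification under stated assumptions: the timer is a boolean flag rather than a real clock, "passing a segment to IP" just appends to a log, and ACKs are assumed to fall on segment boundaries. `SimpleTCPSender` and its attribute names are hypothetical.

```python
class SimpleTCPSender:
    """Simplified TCP sender: no duplicate-ACK handling, no flow or congestion control."""

    def __init__(self, initial_seq=0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq # -> data of segments not yet ACKed
        self.timer_running = False
        self.sent = []               # log of (seq #, data) passed to IP

    def on_app_data(self, data):
        """Data received from application above."""
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        """Retransmit the oldest unacked segment and restart the timer."""
        if not self.unacked:
            return
        seq = min(self.unacked)
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y):
        """Cumulative ACK: y is the next byte the receiver expects."""
        if y > self.send_base:
            self.send_base = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            # keep the timer only while segments are still outstanding
            self.timer_running = bool(self.unacked)
```

Sending 8 bytes at seq 92 and then 20 bytes at seq 100 leaves `next_seq_num` at 120; a timeout resends the segment with seq 92, and the timer stops once ACK 120 arrives.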
TCP ACK generation
event at receiver -> TCP receiver action
- arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
- arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
- arrival of out-of-order segment, higher-than-expect seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
- arrival of segment that partially or completely fills gap
  -> immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
- time-out period often relatively long:
  - long delay before resending lost packet
- detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
- if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  - likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit
[figure: Host A, Host B timeline]
- A sends Seq=92, 8 bytes of data
- A sends Seq=100, 20 bytes of data (lost, marked X)
- B returns ACK=100 repeatedly (duplicate ACKs), before A's timeout expires
- A resends Seq=100, 20 bytes of data
caption: fast retransmit after sender receipt of triple duplicate ACK

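The duplicate-ACK rule can be sketched as a small counter on the sender side. The class name `DupAckCounter` is invented for this illustration; a real sender would combine this with the retransmission timer and the congestion window.

```python
class DupAckCounter:
    """Tracks duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, send_base):
        self.send_base = send_base   # smallest unacked seq #
        self.dup_acks = 0

    def on_ack(self, ack):
        """Return the seq # to retransmit right away, or None."""
        if ack > self.send_base:
            # new data acknowledged: slide forward and reset the count
            self.send_base = ack
            self.dup_acks = 0
            return None
        if ack == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                # triple duplicate ACK: resend unacked segment with smallest seq #
                return self.send_base
        return None
```

In the scenario above, the first ACK=100 acknowledges the 8 bytes at seq 92; the third repetition of ACK=100 after that makes `on_ack` return 100, the segment to resend without waiting for the timeout.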
TCP flow control
- application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
- flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process (above the OS boundary), TCP socket receiver buffers, TCP code, IP code; segments arrive from sender]

TCP flow control
- receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- sender limits amount of unacked ("in-flight") data to receiver's rwnd value
- guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]

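A minimal sketch of the flow-control bookkeeping, assuming rwnd is simply RcvBuffer minus the bytes currently buffered. The names `ReceiveBuffer` and `sendable` are hypothetical, and real stacks track byte positions rather than a single counter.

```python
class ReceiveBuffer:
    """Receiver side: advertises free buffer space as rwnd."""

    def __init__(self, rcv_buffer=4096):
        self.rcv_buffer = rcv_buffer   # total buffer size in bytes
        self.buffered = 0              # bytes received but not yet read by the app

    @property
    def rwnd(self):
        """Free buffer space, the value placed in the TCP header."""
        return self.rcv_buffer - self.buffered

    def on_segment(self, nbytes):
        """Store an arriving payload; a well-behaved sender never exceeds rwnd."""
        if nbytes > self.rwnd:
            raise OverflowError("sender ignored the advertised window")
        self.buffered += nbytes

    def app_read(self, nbytes):
        """Application removes data from the socket buffer."""
        n = min(nbytes, self.buffered)
        self.buffered -= n
        return n

def sendable(rwnd, last_byte_sent, last_byte_acked):
    """Sender side: bytes that may still be sent without exceeding rwnd."""
    return max(0, rwnd - (last_byte_sent - last_byte_acked))
```

With the default 4096-byte buffer, receiving 3000 bytes shrinks rwnd to 1096; once the application reads 2000 of them, rwnd grows back to 3096.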
Connection Management
before exchanging data, sender/receiver "handshake":
- agree to establish connection (each knowing the other willing to establish connection)
- agree on connection parameters
state held at each end (application above network):
- connection state: ESTAB
- connection variables:
  - seq # client-to-server, server-to-client
  - rcvBuffer size at server, client
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
   - segment: SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
   - segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
   - segment: ACKbit=1, ACKnum=y+1
4. server: received ACK(y) indicates client is live

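The exchange can be traced with a short function that builds the three segments and records the state changes on each side. This is a sketch under assumptions: segments are plain dictionaries, nothing is lost, and the function name `three_way_handshake` is made up for the illustration.

```python
def three_way_handshake(x, y):
    """Trace the handshake for client initial seq # x and server initial seq # y.
    Returns (segments, client_states, server_states)."""
    client_states = ["LISTEN"]
    server_states = ["LISTEN"]

    # 1. client sends SYN carrying its initial sequence number
    syn = {"SYNbit": 1, "Seq": x}
    client_states.append("SYNSENT")

    # 2. server answers with SYNACK: its own seq #, plus an ACK of the SYN
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    server_states.append("SYN RCVD")

    # 3. client checks the SYNACK, then ACKs it (this segment may carry data)
    if synack["ACKnum"] != x + 1:
        raise ValueError("SYNACK does not acknowledge our SYN")
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    client_states.append("ESTAB")

    # 4. server receives the ACK and the connection is established
    if ack["ACKnum"] != y + 1:
        raise ValueError("ACK does not acknowledge the SYNACK")
    server_states.append("ESTAB")

    return [syn, synack, ack], client_states, server_states
```

Calling `three_way_handshake(42, 79)` produces ACK numbers 43 and 80, and both state lists end in ESTAB.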
TCP 3-way handshake: FSM
client:
- closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
- SYN sent -> ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)
server:
- closed -> listen: Socket connectionSocket = welcomeSocket.accept();
- listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
- SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)

TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled

TCP: closing a connection
client state / server state: both start in ESTAB
1. client: clientSocket.close(); sends FINbit=1, seq=x
   - client enters FIN_WAIT_1: can no longer send but can receive data
2. server: sends ACKbit=1, ACKnum=x+1
   - server enters CLOSE_WAIT: can still send data
   - client enters FIN_WAIT_2: wait for server close
3. server: sends FINbit=1, seq=y
   - server enters LAST_ACK: can no longer send data
4. client: sends ACKbit=1, ACKnum=y+1
   - client enters TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED
   - server enters CLOSED

Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!

Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- output link capacity: R
- no retransmission
- maximum per-connection throughput: R/2
- large delays as arrival rate, $\lambda_{in}$, approaches capacity
[figure: Host A and Host B send original data at rate $\lambda_{in}$ through unlimited shared output link buffers; throughput $\lambda_{out}$; plots of $\lambda_{out}$ versus $\lambda_{in}$ and of delay versus $\lambda_{in}$, both up to R/2]

Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  - transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]

idealization: perfect knowledge
- sender sends only when router buffers available (free buffer space)
[figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2]

idealization: known loss
- packets can be lost, dropped at router due to full buffers (no buffer space)
- sender only resends if packet known to be lost
- when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)

realistic: duplicates
- packets can be lost, dropped at router due to full buffers
- sender times out prematurely, sending two copies, both of which are delivered
- when sending at R/2, some packets are retransmissions, including duplicates that are delivered!

"costs" of congestion:
- unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C, D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]

Causes/costs of congestion: scenario 3
[figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, both up to C/2]
another "cost" of congestion:
- when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP
network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at

TCP Congestion Control: details
- sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\mathrm{rate} \approx \mathrm{cwnd}/\mathrm{RTT}$ bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; cwnd spans the in-flight bytes]

TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
[figure: Host A, Host B timeline, one RTT per round]

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half, window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Summary: TCP Congestion Control
[FSM figure; recoverable transition labels]
- slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- switch to congestion avoidance when cwnd > ssthresh
- congestion avoidance, new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed

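The window rules for slow start, congestion avoidance and loss can be combined into a per-RTT model. This is a coarse sketch, not the per-ACK state machine: cwnd is counted in whole MSS units, slow start is capped at ssthresh, and the event names "ok", "3dup" and "timeout" are invented for the illustration.

```python
def next_cwnd(cwnd, ssthresh, event, flavor="reno"):
    """Advance (cwnd, ssthresh) by one RTT; both are in units of MSS."""
    if event == "ok":
        if cwnd < ssthresh:
            # slow start: double every RTT (capped at the threshold)
            return min(cwnd * 2, ssthresh), ssthresh
        # congestion avoidance: one MSS per RTT
        return cwnd + 1, ssthresh
    # loss event: threshold becomes half of cwnd just before the loss
    ssthresh = max(cwnd // 2, 1)
    if event == "timeout" or flavor == "tahoe":
        return 1, ssthresh          # restart from 1 MSS
    return ssthresh, ssthresh       # Reno on 3 dup ACKs: cut cwnd in half

def trace(events, cwnd=1, ssthresh=8, flavor="reno"):
    """Return the list of cwnd values, one per RTT, for a sequence of events."""
    out = [cwnd]
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event, flavor)
        out.append(cwnd)
    return out
```

For example, `trace(["ok"] * 5 + ["3dup", "ok", "timeout", "ok", "ok", "ok"])` gives 1, 2, 4, 8, 9, 10, 5, 6, 1, 2, 3, 4: exponential growth up to the threshold, linear growth after it, a halving on triple duplicate ACKs and a restart from 1 MSS on timeout.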
TCP congestion control: additive increase, multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
[figure: AIMD saw tooth behavior: probing for bandwidth; y-axis: cwnd, TCP sender congestion window size; x-axis: time; additively increase window size ... until loss occurs (then cut window in half)]

TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
  $\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}$ bytes/sec
[figure: window sawtooth between W/2 and W]

TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22\cdot\mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot 10^{-10}$, a very small loss rate!
- new versions of TCP for high-speed

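The numbers in this example can be checked with a few lines of arithmetic. The sketch assumes throughput is measured in bits per second and segment sizes in bytes; the function names are made up for the illustration.

```python
def avg_throughput(W, rtt):
    """Average throughput (bytes/sec) for a sawtooth between W/2 and W bytes."""
    return 0.75 * W / rtt

def window_segments(target_bps, rtt, segment_bytes):
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt / (8 * segment_bytes)

def mathis_loss(target_bps, rtt, mss_bytes):
    """Loss probability L that solves throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt * target_bps)) ** 2
```

For 1500-byte segments, a 100 ms RTT and a 10 Gbps target, `window_segments(10e9, 0.1, 1500)` comes to about 83,333 segments and `mathis_loss(10e9, 0.1, 1500)` to roughly 2.1e-10, in line with the figures quoted above.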
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[figure: throughput of the two connections plotted against each other, axes up to R (x-axis: Connection 1 throughput); equal bandwidth share line; trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

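The convergence argument can be tried numerically. The sketch below assumes two flows with equal RTT that see loss at the same moment, whenever their combined rate exceeds R; these synchronization assumptions, and the function name `aimd_two_flows`, are simplifications for illustration.

```python
def aimd_two_flows(x1, x2, R, steps):
    """Evolve two AIMD flows sharing a bottleneck of capacity R.
    Each step is one RTT; returns the final pair of rates."""
    for _ in range(steps):
        if x1 + x2 > R:
            # loss: both flows decrease window by factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both flows add one unit
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Additive increase leaves the difference between the two rates unchanged, while every halving cuts it in half, so starting from rates 10 and 80 on a link with R = 100 the two rates end up nearly equal after a few hundred steps.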
Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2

Chapter 3: summary
- principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation, implementation in the Internet
  - UDP
  - TCP
next:
- leaving the network "edge" (application, transport layers)
- into the network "core"

Connection-oriented demux: example
[figure: server (IP address B) and hosts with IP addresses A and C, each with an application / transport / network / link / physical stack; application processes P2 to P6]
segments:
- host A to server: source IP,port: A,9157; dest IP,port: B,80
- host C to server: source IP,port: C,5775; dest IP,port: B,80
- host C to server: source IP,port: C,9157; dest IP,port: B,80
- server to host A: source IP,port: B,80; dest IP,port: A,9157
three segments, all destined to IP address: B, dest port: 80, are demultiplexed to different sockets

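Demultiplexing on the full 4-tuple can be sketched with a dictionary lookup. This is an illustration, not how an operating system stores its sockets: the table, the `register` and `demux` helpers and the socket labels are all hypothetical.

```python
# socket table keyed by (source IP, source port, dest IP, dest port)
sockets = {}

def register(src_ip, src_port, dst_ip, dst_port, label):
    """Record the connection socket created for one client connection."""
    sockets[(src_ip, src_port, dst_ip, dst_port)] = label

def demux(segment):
    """Pick the socket for an arriving segment, or None if no connection matches."""
    key = (segment["src_ip"], segment["src_port"],
           segment["dst_ip"], segment["dst_port"])
    return sockets.get(key)

# the three connections to server B, port 80, from the example
register("A", 9157, "B", 80, "socket-1")
register("C", 5775, "B", 80, "socket-2")
register("C", 9157, "B", 80, "socket-3")
```

A segment from C, port 9157 and one from A, port 9157 share the destination and even the source port, yet `demux` returns different sockets for them because the source IP addresses differ.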
Connection-oriented demux: example (threaded server)
[figure: same hosts and stacks as the previous example, with a threaded server at IP address B]
segments:
- host A to server: source IP,port: A,9157; dest IP,port: B,80
- host C to server: source IP,port: C,5775; dest IP,port: B,80
- host C to server: source IP,port: C,9157; dest IP,port: B,80
- server to host A: source IP,port: B,80; dest IP,port: A,9157

UDP: User Datagram Protocol [RFC 768]
- "no frills," "bare bones" Internet transport protocol
- "best effort" service, UDP segments may be:
  - lost
  - delivered out-of-order to app
- connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
- UDP use:
  - streaming multimedia apps (loss tolerant, rate sensitive)
  - DNS
  - SNMP
- reliable transfer over UDP:
  - add reliability at application layer
  - application-specific error recovery!

UDP: segment header
UDP segment format, 32 bits per row:
- source port #, dest port #
- length, checksum
- application data (payload)
length: in bytes of UDP segment, including header
why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender, receiver
- small header size
- no congestion control: UDP can blast away as fast as desired

UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
sender:
- treat segment contents, including header fields, as sequence of 16-bit integers
- checksum: addition (one's complement sum) of segment contents
- sender puts checksum value into UDP checksum field
receiver:
- compute checksum of received segment
- check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later ...

Internet checksum: example
example: add two 16-bit integers

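The checksum computation can be sketched as below. The two 16-bit words used in the usage note are illustrative values chosen here, not necessarily the ones on the original slide, and the function names are made up for the example.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # wraparound carry
    return total

def internet_checksum(words):
    """Checksum = one's complement (bit inversion) of the one's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF

def verify(words, checksum):
    """Receiver check: words plus checksum must sum to all ones."""
    return ones_complement_sum(list(words) + [checksum]) == 0xFFFF
```

Adding 0xE666 (1110011001100110) and 0xD555 (1101010101010101) produces a carry that wraps around, giving the sum 0xBBBC and the checksum 0x4443; flipping a single bit in either word makes `verify` fail.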
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  How do humans recover from "errors" during conversation?
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
sender
  state "Wait for call from above"
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  state "Wait for ACK or NAK"
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): no action; go to "Wait for call from above"
receiver
  state "Wait for call from below"
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
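The sender rules above can be sketched in Python. This is a minimal sketch under stated assumptions: the class and attribute names are hypothetical, sequence numbers are unbounded integers (the k-bit wraparound is ignored), and a list named `sent` stands in for the unreliable channel.

```python
class GoBackNSender:
    """Illustrative Go-Back-N sender; not a full protocol implementation."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 0        # seq # of the oldest unACKed pkt
        self.next_seq = 0    # seq # to assign to the next new pkt
        self.sent = []       # log of transmitted seq #s (stand-in for the channel)

    def send(self):
        """Data from above: transmit only if the window is not full."""
        if self.next_seq >= self.base + self.N:
            return False     # window full, data refused
        self.sent.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n):
        """Cumulative ACK(n): all pkts up to and including n are ACKed."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        """Timer for oldest in-flight pkt expired: resend the whole window."""
        for seq in range(self.base, self.next_seq):
            self.sent.append(seq)


s = GoBackNSender(window_size=4)
accepted = [s.send() for _ in range(5)]  # fifth call is refused
s.on_ack(1)                              # pkts 0 and 1 ACKed
s.on_timeout()                           # pkts 2 and 3 go out again
print(accepted, s.sent)
```

A duplicate ACK (one with n below `base`) is simply ignored here, which matches the cumulative-ACK behaviour described above.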
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #

Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[figure: sender and receiver views of the sequence number space]
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
  otherwise:
    ignore
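The receiver side of selective repeat can be sketched as follows. This is a minimal Python sketch with hypothetical names; it assumes unbounded integer sequence numbers and returns the seq # to ACK instead of sending anything.

```python
class SelectiveRepeatReceiver:
    """Illustrative selective-repeat receiver with out-of-order buffering."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcv_base = 0     # next not-yet-received seq #
        self.buffer = {}      # out-of-order pkts, keyed by seq #
        self.delivered = []   # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Return the seq # to ACK, or None if the pkt is ignored."""
        if self.rcv_base <= n < self.rcv_base + self.N:
            self.buffer[n] = data
            # Deliver the in-order run starting at rcv_base, advancing the window.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n < self.rcv_base:
            return n          # already delivered: re-ACK so the sender can advance
        return None           # outside both ranges: ignore


r = SelectiveRepeatReceiver(window_size=4)
print(r.on_packet(1, "b"), r.delivered)   # buffered, nothing delivered yet
print(r.on_packet(0, "a"), r.delivered)   # gap filled, both delivered
```

The second branch matters: without the re-ACK of already-delivered packets, a lost ACK would leave the sender's window stuck.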
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment layout (each row 32 bits wide):
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
notes on fields:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
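To make the field layout concrete, here is a minimal Python sketch that decodes the 20-byte fixed part of a TCP header with the standard `struct` module. The function name, the dictionary keys and the sample values are assumptions for illustration; options and payload are not parsed.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags, rwnd, checksum, urg_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    flags = off_flags & 0x3F                       # low six bits: U A P R S F
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_len": (off_flags >> 12) * 4,       # head len counts 32-bit words
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
    }


# Assumed sample: ports 9157 -> 80, Seq=42, ACK=79, ACK flag set, rwnd=4096.
raw = struct.pack("!HHIIHHHH", 9157, 80, 42, 79, (5 << 12) | 0x10, 4096, 0, 0)
print(parse_tcp_header(raw))
```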
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[figure: sender sequence number space, window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario, Host A and Host B:
  Host A: user types 'C'
    A to B: Seq=42, ACK=79, data = 'C'
  Host B: host ACKs receipt of 'C', echoes back 'C'
    B to A: Seq=79, ACK=43, data = 'C'
  Host A: host ACKs receipt of echoed 'C'
    A to B: Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; y-axis: RTT (milliseconds); x-axis: time (seconds); curves: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
  (estimated RTT plus "safety margin")
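The three update rules can be combined into one step per new SampleRTT. The following is a minimal Python sketch; the function name is hypothetical, and it assumes DevRTT is updated using the EstimatedRTT from before the current update (the slides do not fix the order).

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """One EWMA step; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # Assumption: the deviation uses the previous EstimatedRTT.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


# Illustrative values in milliseconds: estimate 100, deviation 10, new sample 120.
print(update_rtt(100.0, 10.0, 120.0))  # (102.5, 12.5, 152.5)
```

A single large sample moves EstimatedRTT only slightly (100 to 102.5 ms) but widens the safety margin noticeably, which is the intended effect of the 4·DevRTT term.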
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
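The three sender events can be sketched as methods of a small class. This is a minimal Python sketch under stated assumptions: all names are hypothetical, the timer is modelled as a boolean flag, ACK numbers are assumed to fall on segment boundaries, and a list named `wire` stands in for passing segments to IP.

```python
class SimpleTcpSender:
    """Illustrative simplified TCP sender: no dup-ACK, flow or congestion control."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq    # seq # of the next new byte
        self.unacked = {}              # seq # -> data, kept for retransmission
        self.timer_running = False
        self.wire = []                 # (seq, data) pairs handed to IP

    def on_data(self, data: bytes):
        """Data rcvd from app: create segment, send it, start timer if needed."""
        self.wire.append((self.next_seq, data))
        self.unacked[self.next_seq] = data
        self.next_seq += len(data)
        self.timer_running = True

    def on_timeout(self):
        """Timeout: retransmit the oldest unACKed segment, restart timer."""
        if not self.unacked:
            return
        seq = min(self.unacked)
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, ack_num):
        """ACK rcvd: cumulative, so everything below ack_num is ACKed."""
        if ack_num > self.send_base:
            self.send_base = ack_num
            for seq in [s for s in self.unacked if s < ack_num]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


s = SimpleTcpSender(initial_seq=92)
s.on_data(b"8 bytes!")          # Seq=92, 8 bytes
s.on_data(b"x" * 20)            # Seq=100, 20 bytes
s.on_ack(100)                   # first segment ACKed, timer keeps running
print(s.send_base, s.timer_running, sorted(s.unacked))
```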
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
fast retransmit after sender receipt of triple duplicate ACK, Host A and Host B:
  A to B: Seq=92, 8 bytes of data
  A to B: Seq=100, 20 bytes of data (lost, marked X)
  B to A: ACK=100
  B to A: ACK=100, repeated as three duplicate ACKs
  A to B: Seq=100, 20 bytes of data (resent before the timeout expires)
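The duplicate-ACK rule can be sketched as a small counter. This is a minimal Python sketch with hypothetical names; it reports which seq # to resend and leaves the actual retransmission and timer handling out.

```python
class DupAckDetector:
    """Illustrative triple-duplicate-ACK detector for fast retransmit."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the seq # to fast-retransmit, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:        # third duplicate of the same ACK
                return ack_num             # resend segment starting at this byte
        else:
            self.last_ack = ack_num        # new ACK: reset the duplicate count
            self.dup_count = 0
        return None


d = DupAckDetector()
print([d.on_ack(100) for _ in range(4)])   # [None, None, None, 100]
```

The first ACK=100 is not a duplicate, so the retransmission fires on the fourth ACK=100 overall, matching the scenario above.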
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process above TCP socket receiver buffers; OS with TCP code and IP code; segments arriving from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
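A minimal Python sketch of the two sides of this rule follows. The function names are hypothetical, and the receiver-side variables `last_byte_rcvd` and `last_byte_read` are assumptions introduced for illustration (they do not appear on the slide); the sender-side check uses the in-flight amount described above.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: free buffer space = RcvBuffer minus data not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """Sender: in-flight data after sending nbytes must not exceed rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


# Assumed numbers: 4096-byte buffer, 2000 bytes received but not yet read.
rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sender_may_send(5000, 4000, rwnd, 1000), sender_may_send(5000, 4000, rwnd, 1200))
```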
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: on each side, application above network, holding
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client side: Socket clientSocket = newSocket("hostname","port number");
server side: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg
  client to server: SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  server to client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  client to server: ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB
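The exchange can be traced with a short Python sketch. This is only an illustration under stated assumptions: segments are plain dictionaries, the function name is hypothetical, and no real sockets or loss handling are involved.

```python
def three_way_handshake(x, y):
    """Return the three segments exchanged and the final (client, server) states."""
    # Step 1: client chooses init seq num x and sends SYN.
    syn = {"SYN": 1, "seq": x}
    client = "SYNSENT"
    # Step 2: server chooses init seq num y and sends SYNACK, acking the SYN.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    server = "SYN RCVD"
    # Step 3: client ACKs the SYNACK (this segment may carry data).
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    client = "ESTAB"
    # Server receives the ACK: connection established on both sides.
    server = "ESTAB"
    return [syn, synack, ack], (client, server)


segments, states = three_way_handshake(x=1000, y=5000)
for seg in segments:
    print(seg)
print(states)
```

Each ACKnum is the other side's initial sequence number plus one, because the SYN itself consumes one sequence number.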
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close()
  client to server: FINbit=1, seq=x
  client state: FIN_WAIT_1 (can no longer send but can receive data)
server to client: ACKbit=1, ACKnum=x+1
  server state: CLOSE_WAIT (can still send data)
  client state: FIN_WAIT_2 (wait for server close)
server to client: FINbit=1, seq=y
  server state: LAST_ACK (can no longer send data)
client to server: ACKbit=1, ACKnum=y+1
  client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
  server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput λ_out]
[plots: λ_out versus λ_in, levelling off at R/2; delay versus λ_in]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[figure: Host A and Host B share finite output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[plot: λ_out versus λ_in, both axes up to R/2]
[figure: Host A keeps a copy and sends when there is free buffer space in the finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[plot: λ_out versus λ_in, both axes up to R/2]
[figure: Host A keeps a copy of each packet; a packet is dropped when there is no buffer space and resent when there is free buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
[plot: λ_out versus λ_in, both axes up to R/2]
[figure: Host A keeps a copy, times out, and resends while there is free buffer space; λ_in, λ'_in, λ_out]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[figure: Hosts A, B, C, D connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 3
[plot: λ_out versus λ'_in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight"), spanning cwnd | last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A and Host B timeline, segments sent per RTT doubling over time]
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
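The window rules from the slow start, switching and loss slides can be sketched as three small functions. This is a minimal Python sketch with hypothetical names, working in units of MSS; it deliberately simplifies Reno to "cut cwnd in half" as stated above and omits the fast-recovery window inflation of a full implementation.

```python
MSS = 1  # work in units of one MSS


def on_new_ack(cwnd, ssthresh):
    """Per-ACK growth: exponential below ssthresh, linear at or above it."""
    if cwnd < ssthresh:
        return cwnd + MSS                  # slow start: doubles cwnd every RTT
    return cwnd + MSS * (MSS / cwnd)       # congestion avoidance: +1 MSS per RTT


def on_timeout(cwnd):
    """Timeout: returns (new cwnd, new ssthresh)."""
    return 1 * MSS, cwnd / 2


def on_triple_dup_ack(cwnd, reno=True):
    """3 duplicate ACKs: Reno halves cwnd, Tahoe resets it to 1 MSS."""
    ssthresh = cwnd / 2
    return (ssthresh if reno else 1 * MSS), ssthresh


print(on_new_ack(4, 8), on_new_ack(8, 8))
print(on_timeout(16), on_triple_dup_ack(16), on_triple_dup_ack(16, reno=False))
```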
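Summary: TCP Congestion Control
[figure: FSM with slow start, congestion avoidance and fast recovery states; recoverable transitions listed below]
slow start, new ACK:
  cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
slow start -> congestion avoidance:
  when cwnd > ssthresh
congestion avoidance, new ACK:
  cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
fast recovery, duplicate ACK:
  cwnd = cwnd + MSS; transmit new segment(s), as allowed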
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
  additively increase window size ... until loss occurs (then cut window in half)
[figure: cwnd (TCP sender congestion window size) versus time, saw tooth shape]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \dfrac{3}{4}\cdot\dfrac{W}{\mathrm{RTT}}$ bytes/sec
[figure: window size oscillating between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22\cdot\mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
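The two numbers in this example can be rechecked with a few lines of Python. This is a sketch with hypothetical function names; it assumes throughput is expressed in bits per second and MSS in bytes, so a factor of 8 converts between them.

```python
import math


def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to fill the pipe: throughput * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert the formula above for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = required_window_segments(10e9, 0.100, 1500)
L = required_loss_rate(10e9, 0.100, 1500)
print(round(W), L)   # about 83333 segments, L close to 2e-10
```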
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fairWhy is TCP fairtwo competing sessionstwo competing sessions additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally p g p p p y
R equal bandwidth share
loss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC ti 1 th h t
Transport Layer 3-103
RConnection 1 throughput
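The convergence argument can be checked numerically. This is a minimal Python sketch under idealized assumptions that are not part of the slide: both sessions see loss at the same instant whenever their combined rate exceeds the link capacity, and both increase by one unit per step. The function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Evolve two AIMD flows sharing one bottleneck; returns final rates."""
    for _ in range(steps):
        if x1 + x2 > capacity:          # loss: multiplicative decrease for both
            x1, x2 = x1 / 2, x2 / 2
        else:                           # congestion avoidance: additive increase
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: 10 versus 70 on a link of capacity 100.
x1, x2 = aimd_two_flows(10, 70, capacity=100, steps=200)
print(x1, x2, abs(x1 - x2))
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so the pair drifts toward the equal bandwidth share line.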
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Connection-oriented demux: example (threaded server)
[figure: hosts A and C and server B, each with application, transport, network, link, physical layers; processes P2, P3, P4]
host: IP address A
server: IP address B
host: IP address C
segments shown:
  source IP,port: B,80; dest IP,port: A,9157
  source IP,port: A,9157; dest IP,port: B,80
  source IP,port: C,5775; dest IP,port: B,80
  source IP,port: C,9157; dest IP,port: B,80
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-15
UDP User Datagram Protocol [RFC 768]g [ ]
ldquono frillsrdquo ldquobare bonesrdquoI
UDP useInternet transport protocol
ldquobest effortrdquo service
streaming multimedia apps (loss tolerant rate sensitive) best effort service
UDP segments may be lost
sensitive) DNS SNMP
delivered out-of-order to app
connectionless
reliable transfer over UDP
connectionless no handshaking
between UDP sender
add reliability at application layer application specific error receiver
each UDP segment handled independently
application-specific error recovery
Transport Layer 3-16
handled independently of others
UDP segment headerUDP segment header
32 bitslength in bytes of
UDP t
source port dest port
32 bits
length checksum
UDP segment including header
li ti
length checksum
no connection why is there a UDP
applicationdata
(payload)
establishment (which can add delay)
simple no connection simple no connection state at sender receiver
small header size
UDP segment format no congestion control UDP can blast away as fast as desired
Transport Layer 3-17
fast as desired
UDP checksumUDP checksum
Goal detect ldquoerrorsrdquo (e g flipped bits) in transmitted Goal detect errors (eg flipped bits) in transmitted segment
sender treat segment contents
including header fields
receiver compute checksum of
i d tincluding header fields as sequence of 16-bit integersh k dd
received segment check if computed
checksum equals checksum checksum addition
(onersquos complement sum) of segment
qfield value NO - error detected
contents sender puts checksum
value into UDP
YES - no error detected But maybe errors nonetheless More later
Transport Layer 3-18
value into UDP checksum field hellip
Internet checksum examplep
example add two 16 bit integersexample add two 16-bit integers
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors
k l d (ACK ) i li i l ll d acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK negative acknowledgements (NAKs) receiver explicitly tells g g ( ) p y
sender that pkt had errors sender retransmits pkt on receipt of NAK
new mechanisms in rdt2 0 (beyond rdt1 0)How do humans recover from ldquoerrorsrdquo
new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-
during conversationg ( )
gtsender
Transport Layer 3-27
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells sender
that pkt received OKti k l d t (NAK ) i li itl t ll negative acknowledgements (NAKs) receiver explicitly tells
sender that pkt had errors sender retransmits pkt on receipt of NAKp p
new mechanisms in rdt20 (beyond rdt10) error detection feedback control msgs (ACKNAK) from receiver to
sender
Transport Layer 3-28
rdt20 FSM specificationp
sndpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
sndpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
TCP: Overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment layout (32 bits wide), in order:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say - up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[figure: sender sequence number space with window size N, divided into: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario
Host A: user types 'C'
  A -> B: Seq=42, ACK=79, data = 'C'
Host B: host ACKs receipt of 'C', echoes back 'C'
  B -> A: Seq=79, ACK=43, data = 'C'
Host A: host ACKs receipt of echoed 'C'
  A -> B: Seq=43, ACK=80
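The telnet exchange can be traced with a small model of how each side derives its Seq and ACK fields: Seq is the number of the first byte in the segment's data, and ACK is the number of the next byte expected from the other side. The `Endpoint` and `Segment` names below are illustrative assumptions; only in-order delivery is modelled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int           # byte-stream number of first data byte
    ack: int           # next byte expected from the peer
    data: bytes = b""


class Endpoint:
    def __init__(self, next_seq, next_expected):
        self.next_seq = next_seq            # seq # of the next byte this side sends
        self.next_expected = next_expected  # seq # of the next byte expected from peer

    def send(self, data=b""):
        seg = Segment(self.next_seq, self.next_expected, data)
        self.next_seq += len(data)          # sequence numbers count bytes, not segments
        return seg

    def receive(self, seg):
        if seg.seq == self.next_expected:   # in-order data only in this sketch
            self.next_expected += len(seg.data)


def telnet_echo():
    """Replay the scenario: A sends 'C', B echoes it, A ACKs the echo."""
    a = Endpoint(next_seq=42, next_expected=79)
    b = Endpoint(next_seq=79, next_expected=42)
    s1 = a.send(b"C")
    b.receive(s1)
    s2 = b.send(s1.data)
    a.receive(s2)
    s3 = a.send()                           # pure ACK, no data
    b.receive(s3)
    return [(s.seq, s.ack) for s in (s1, s2, s3)]
```

Calling `telnet_echo()` yields the three (Seq, ACK) pairs of the scenario: (42, 79), (79, 43), (43, 80).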
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: α = 0.125
[figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (axis 100 to 350) versus time in seconds]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
  (EstimatedRTT: estimated RTT; 4 * DevRTT: "safety margin")
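As a rough illustration of the RTT estimation and timeout formulas, the sketch below applies the two moving averages with the typical weights 0.125 and 0.25. Two things are assumptions, since the slides do not specify them: seeding the estimate with the first sample and zero deviation, and updating EstimatedRTT before DevRTT.

```python
class RttEstimator:
    """EWMA-based RTT estimator; times are in any consistent unit (e.g. ms)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumption: start from the first sample with zero deviation.
        self.estimated_rtt = float(first_sample)
        self.dev_rtt = 0.0

    def update(self, sample_rtt):
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = (
            (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        )
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        # Assumption: the deviation is measured against the updated estimate.
        self.dev_rtt = (
            (1 - self.beta) * self.dev_rtt
            + self.beta * abs(sample_rtt - self.estimated_rtt)
        )

    @property
    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt
```

For example, starting from a 100 ms estimate, a single 180 ms sample moves EstimatedRTT to 110 ms and DevRTT to 17.5 ms, so TimeoutInterval becomes 180 ms: one outlier widens the safety margin much more than it shifts the estimate.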
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
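The three events of the simplified sender (data from the application, timeout, ACK received) might be sketched as below. This is an illustrative model under stated assumptions rather than a real TCP implementation: `send_fn` is a hypothetical hook for handing a segment to IP, a boolean stands in for the retransmission timer, and duplicate ACKs, flow control, and congestion control are ignored, as in the simplified sender.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq, send_fn):
        self.send_fn = send_fn            # hypothetical hook: send_fn(seq, data)
        self.send_base = initial_seq      # seq # of oldest unACKed byte
        self.next_seq_num = initial_seq
        self.unacked = {}                 # seq # -> data of segments not yet ACKed
        self.timer_running = False        # stand-in for a real timer

    def on_app_data(self, data):
        seq = self.next_seq_num           # seq # = number of first data byte
        self.unacked[seq] = data
        self.send_fn(seq, data)           # pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer if not already running

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # oldest unacked segment caused the timeout
            self.send_fn(seq, self.unacked[seq])
            self.timer_running = True     # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:      # acknowledges previously unacked bytes
            self.send_base = ack_num
            done = [s for s, d in self.unacked.items() if s + len(d) <= ack_num]
            for seq in done:
                del self.unacked[seq]
            # keep the timer only if segments are still outstanding
            self.timer_running = bool(self.unacked)
```

Because ACKs are cumulative, a single ACK number can release several buffered segments at once, which is why `on_ack` compares each segment's end byte with `ack_num`.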
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
Host A -> Host B: Seq=92, 8 bytes of data
Host A -> Host B: Seq=100, 20 bytes of data (lost, X)
Host B -> Host A: ACK=100, repeated (duplicate ACKs arrive before the timeout)
Host A -> Host B: Seq=100, 20 bytes of data (retransmitted)
fast retransmit after sender receipt of triple duplicate ACK
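The duplicate-ACK rule can be sketched as a small counter on the sender side. The class and method names are assumptions made for illustration; the sketch only decides when to retransmit and leaves the actual resend to the caller.

```python
class FastRetransmitDetector:
    def __init__(self, send_base):
        self.send_base = send_base   # seq # of oldest unACKed byte
        self.dup_ack_count = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit immediately, or None."""
        if ack_num > self.send_base:
            # New data acknowledged: slide forward and reset the counter.
            self.send_base = ack_num
            self.dup_ack_count = 0
            return None
        if ack_num == self.send_base:
            self.dup_ack_count += 1
            if self.dup_ack_count == 3:
                # Triple duplicate ACK: resend unacked segment with smallest seq #.
                return self.send_base
        return None
```

In the scenario above, the first ACK=100 acknowledges the 8 bytes starting at Seq=92; the third duplicate of ACK=100 after that triggers a resend of the segment starting at byte 100 without waiting for the timeout.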
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack; segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering; RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from below, data leaves to application process]
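A rough sketch of the bookkeeping behind rwnd follows. The slides only say that rwnd is the free buffer space; the `last_byte_rcvd` and `last_byte_read` counters used to compute it here are an assumption in the style of the textbook's variables, and all names are illustrative.

```python
class ReceiveBuffer:
    def __init__(self, rcv_buffer=4096):
        self.rcv_buffer = rcv_buffer   # RcvBuffer size in bytes
        self.last_byte_rcvd = 0        # bytes placed in the buffer so far
        self.last_byte_read = 0        # bytes the application has read so far

    @property
    def rwnd(self):
        # free buffer space = RcvBuffer - buffered data
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def on_segment(self, nbytes):
        """Buffer arriving payload; returns how many bytes fit."""
        accepted = min(nbytes, self.rwnd)
        self.last_byte_rcvd += accepted
        return accepted

    def app_read(self, nbytes):
        """Application removes data from the socket buffer."""
        nread = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += nread
        return nread


def sendable_bytes(last_byte_sent, last_byte_acked, rwnd):
    """Sender side: keep unacked ("in-flight") data within the advertised rwnd."""
    return max(0, rwnd - (last_byte_sent - last_byte_acked))
```

When the application reads slowly, `rwnd` shrinks and `sendable_bytes` drops toward zero, which is how the receiver throttles the sender.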
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
at both client and server (application above network):
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg
  client -> server: SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  server -> client: SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  client -> server: ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close()
  client -> server: FINbit=1, seq=x
  client state: FIN_WAIT_1 (can no longer send but can receive data)
server:
  server -> client: ACKbit=1; ACKnum=x+1
  server state: CLOSE_WAIT (can still send data)
  client state: FIN_WAIT_2 (wait for server close)
server:
  server -> client: FINbit=1, seq=y
  server state: LAST_ACK (can no longer send data)
client:
  client -> server: ACKbit=1; ACKnum=y+1
  client state: TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
  server state: CLOSED
Principles of congestion control
congestion:
informally: "too many sources sending too much data too fast for network to handle"
different from flow control!
manifestations:
  lost packets (buffer overflow at routers)
  long delays (queueing in router buffers)
a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate λin into unlimited shared output link buffers; throughput λout]
[plots: λout versus λin and delay versus λin, both with λin up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λin, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λin = λout
  transport-layer input includes retransmissions: λ'in >= λin
[figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) into finite shared output link buffers; λout delivered to Host B]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[plot: λout versus λin, both axes to R/2]
[figure: Host A keeps a copy of each packet; λin: original data; λ'in: original data, plus retransmitted data; free buffer space in the finite shared output link buffers; λout to Host B]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[plot: λout versus λin, both axes to R/2]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: Host A keeps a copy of each packet; λin: original data; λ'in: original data, plus retransmitted data; router shown first with no buffer space, then with free buffer space; λout to Host B]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[plot: λout versus λin, both axes to R/2]
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
[figure: Host A times out and resends its copy while the router has free buffer space; both copies reach Host B]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C, D; λin: original data; λ'in: original data, plus retransmitted data; λout; finite shared output link buffers]
Causes/costs of congestion: scenario 3
[plot: λout versus λ'in, both axes to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
[figure: sender sequence number space; cwnd spans from last byte ACKed, over bytes sent, not-yet ACKed ("in-flight"), to last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A to Host B segment exchange over time, one RTT per round]
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
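The window rules described so far (slow start, the switch to congestion avoidance at ssthresh, and the Reno and Tahoe reactions to loss) can be combined into one small sketch. It is a simplification under stated assumptions: the MSS value and initial ssthresh are arbitrary, cwnd is tracked in bytes, and Reno's fast recovery phase is reduced to "cut cwnd in half" as described above.

```python
MSS = 1460  # bytes; illustrative value, not taken from the slides


class CongestionWindow:
    def __init__(self, ssthresh=64 * 1024, reno=True):
        self.cwnd = MSS            # initially cwnd = 1 MSS
        self.ssthresh = ssthresh   # assumed starting threshold
        self.reno = reno           # False models TCP Tahoe

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            # slow start: +1 MSS per ACK, which doubles cwnd every RTT
            self.cwnd += MSS
        else:
            # congestion avoidance: about +1 MSS per RTT in total
            self.cwnd += MSS * MSS / self.cwnd

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2   # half of cwnd just before the loss
        self.cwnd = MSS

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        # Reno: cut cwnd in half; Tahoe: back to 1 MSS
        self.cwnd = self.ssthresh if self.reno else MSS
```

Feeding the object one `on_new_ack()` per acknowledged segment reproduces the exponential-then-linear growth, and the two loss handlers show why a timeout is treated as a stronger congestion signal than duplicate ACKs.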
Summary: TCP Congestion Control
[FSM figure, only partly recovered]
slow start, on new ACK:
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, on new ACK:
  cwnd = cwnd + MSS * (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
on duplicate ACK:
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[figure: cwnd (TCP sender congestion window size) versus time; additively increase window size ... until loss occurs (then cut window in half)]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) * W / RTT bytes/sec
[figure: window size saw tooth between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10 - a very small loss rate!
new versions of TCP for high-speed
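The numbers in this example can be checked with a few lines of arithmetic. The sketch below takes MSS in bytes and converts to bits to match a throughput given in bits per second; the function names are illustrative.

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Solve the same formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    w = window_segments(10e9, 0.1, 1500)
    loss = required_loss(10e9, 0.1, 1500)
    print(f"W = {w:.0f} segments, L = {loss:.2e}")
```

With 1500-byte segments, a 100 ms RTT, and a 10 Gbps target, the window works out to about 83,333 segments and the tolerable loss probability to roughly 2 * 10^-10, in line with the figures above.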
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: Connection 2 throughput versus Connection 1 throughput, both axes to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
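The parallel-connection example is simple proportional arithmetic, assuming (as the fairness goal does) that every TCP connection on the bottleneck gets an equal share. The helper name below is illustrative.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`,
    assuming each TCP connection receives an equal share of the link."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which is the "R/2" quoted above, rounded.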
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
UDP: User Datagram Protocol [RFC 768]
"no frills," "bare bones" Internet transport protocol
"best effort" service, UDP segments may be:
  lost
  delivered out-of-order to app
connectionless:
  no handshaking between UDP sender, receiver
  each UDP segment handled independently of others
UDP use:
  streaming multimedia apps (loss tolerant, rate sensitive)
  DNS
  SNMP
reliable transfer over UDP:
  add reliability at application layer
  application-specific error recovery!
UDP: segment header
UDP segment format (32 bits wide):
  source port #, dest port #
  length, checksum
  application data (payload)
length: in bytes of UDP segment, including header
why is there a UDP?
  no connection establishment (which can add delay)
  simple: no connection state at sender, receiver
  small header size
  no congestion control: UDP can blast away as fast as desired
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
sender:
  treat segment contents, including header fields, as sequence of 16-bit integers
  checksum: addition (one's complement sum) of segment contents
  sender puts checksum value into UDP checksum field
receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ...
Internet checksum: example
example: add two 16-bit integers
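Adding two 16-bit integers with carry wraparound can be sketched as below. The two input words are chosen here for illustration and are not claimed to be the slide's own values, and a real UDP checksum also covers fields this sketch leaves out (it only shows the one's complement arithmetic).

```python
def internet_checksum(words):
    """One's complement of the one's complement sum of 16-bit integers."""
    total = 0
    for w in words:
        total += w & 0xFFFF
        # wrap any carry out of bit 15 back into the low-order bit
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def receiver_check(words, checksum_field):
    """Receiver recomputes the checksum and compares it with the field value."""
    return internet_checksum(words) == checksum_field
```

For the words 0xE666 and 0xD555 the raw sum is 0x1BBBB; wrapping the carry gives 0xBBBC, and complementing that yields the checksum 0x4443. Flipping a single bit in either word makes `receiver_check` fail, although some multi-bit errors can cancel out and go undetected.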
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
How do humans recover from "errors" during conversation?
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK,NAK) from receiver to sender
rdt2.0: FSM specification
sender:
  state "Wait for call from above":
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): return to "Wait for call from above"
receiver:
  state "Wait for call from below":
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
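The three sender events (data from the application, timeout, ACK received) can be sketched as a small Python class. This is an illustrative sketch under stated assumptions, not the authors' code: the class and method names are hypothetical, the timer is modelled as a boolean flag, "passing a segment to IP" is modelled as appending to a list, and duplicate ACKs, flow control, and congestion control are ignored as on the slide.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified TCP sender (hypothetical names)."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq     # seq # of oldest unACKed byte
        self.next_seq_num = initial_seq  # seq # of next byte to send
        self.unacked = {}                # seq # -> payload of unACKed segments
        self.timer_running = False
        self.sent = []                   # log of (seq #, payload) passed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))    # pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True    # start timer

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)          # oldest unACKed segment
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True        # restart timer

    def on_ack(self, ack_num):
        if ack_num <= self.send_base:
            return                       # duplicate ACK: ignored here
        self.send_base = ack_num         # cumulative ACK
        for seq in list(self.unacked):
            if seq + len(self.unacked[seq]) <= ack_num:
                del self.unacked[seq]
        # Keep the timer only while unACKed segments remain.
        self.timer_running = bool(self.unacked)
```

Sending 8 bytes at seq # 92 and then 20 bytes at seq # 100 leaves NextSeqNum at 120; a timeout resends the segment starting at 92, and ACK 100 leaves only the second segment outstanding.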
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expected seq #; gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK
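The duplicate-ACK trigger can be sketched as a counter over the stream of incoming ACK numbers. This is a minimal sketch: the function name and the list-based interface are hypothetical, and a real sender would combine this check with its retransmission timer.

```python
def fast_retransmit_events(acks, dup_threshold=3):
    """Return the seq #s a sender would fast-retransmit.

    `acks` is the sequence of ACK numbers as they arrive. A segment is
    resent once, at the moment the `dup_threshold`-th duplicate of the
    same ACK number arrives (the original ACK is not a duplicate).
    """
    resend = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                resend.append(ack)   # resend segment starting at this seq #
        else:
            last_ack = ack           # new ACK number: reset the counter
            dup_count = 0
    return resend
```

With the ACK stream from the figure, the first ACK=100 is an ordinary ACK and the third duplicate after it triggers the resend of the segment starting at byte 100.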
TCP flow control
application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process above the OS; TCP socket receiver buffers; TCP code; IP code; segments arrive from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to application process]
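The receiver's advertisement and the sender's limit can be sketched as two small functions. This is a sketch under assumptions: the function names are hypothetical, and the bookkeeping variables last_byte_rcvd and last_byte_read are assumed names for "last byte placed in the buffer" and "last byte read by the application"; they do not appear on the slides.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space (rwnd) the receiver would advertise."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the app
    return rcv_buffer - buffered


def bytes_sender_may_send(rwnd, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight under rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)
```

With a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes already in flight may then send at most 1096 more.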
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application above network, each holding:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
     SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
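The sequence and acknowledgement numbers carried by the three handshake segments can be sketched as follows. This is an illustrative sketch: the function name and the dictionary layout are hypothetical, and the modulo reflects the 32-bit width of TCP sequence numbers.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return the SYN, SYNACK, and ACK segments for initial seq nums x, y."""
    syn = {"SYN": 1, "seq": x}
    # The SYN consumes one sequence number, so the server ACKs x + 1.
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (x + 1) % SEQ_SPACE}
    ack = {"ACK": 1, "acknum": (y + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```

For x = 1000 and y = 5000 the SYNACK carries ACKnum = 1001 and the final ACK carries ACKnum = 5001.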
TCP 3-way handshake: FSM
[Figure: finite state machine.
  server: closed -> listen on Socket connectionSocket = welcomeSocket.accept();
          listen -> SYN rcvd on SYN(x), sending SYNACK(seq=y,ACKnum=x+1) and creating new socket for communication back to client;
          SYN rcvd -> ESTAB on ACK(ACKnum=y+1)
  client: closed -> SYN sent on Socket clientSocket = new Socket("hostname", "port number"), sending SYN(seq=x);
          SYN sent -> ESTAB on SYNACK(seq=y,ACKnum=x+1), sending ACK(ACKnum=y+1)]
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
2. server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
4. client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out. Plots: λ_out versus λ_in, levelling off at R/2; delay versus λ_in, growing as λ_in nears R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; λ_out is delivered to Host B]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: λ_out versus λ_in, both axes up to R/2; Host A keeps a copy of each packet; router has free buffer space; finite shared output link buffers; Host B]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: λ_out versus λ_in, both axes up to R/2; Host A keeps a copy of each packet and resends it when the router has no buffer space; Host B]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
[Figure: λ_out versus λ_in, both axes up to R/2; Host A's timeout fires while its copy is still in flight; router has free buffer space; Host B]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D; λ_in: original data; λ'_in: original data, plus retransmitted data; finite shared output link buffers; λ_out]
Causes/costs of congestion: scenario 3
[Figure: λ_out versus λ'_in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
[Figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; the in-flight span is bounded by cwnd]
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B exchanging segments and ACKs, one RTT per round, along a time axis]
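Why "incrementing cwnd for every ACK" doubles cwnd every RTT can be seen in a short simulation. This is a minimal sketch under idealized assumptions: no loss, one ACK per segment, and every segment of a round is ACKed before the next round starts. The function name is hypothetical.

```python
def slow_start_rounds(mss, rounds):
    """Return cwnd (in bytes) at the start of each RTT round."""
    cwnd = mss                    # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss        # one ACK per segment sent this round
        for _ in range(acks):
            cwnd += mss           # increment cwnd for every ACK received
        history.append(cwnd)
    return history
```

Each round receives as many ACKs as there were segments in the window, so the per-ACK increment of 1 MSS adds up to a doubling per RTT.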
TCP: switching from slow start to CA (congestion avoidance)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
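The two loss reactions can be sketched as one function. This is a simplified sketch that follows the slide text only: the function name, the event labels, and the variant labels are hypothetical, and Reno's fast-recovery window inflation is left out.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event: "timeout" or "triple_dup_ack"; variant: "reno" or "tahoe".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss event: %r" % (event,))
    ssthresh = cwnd / 2            # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh       # back to 1 MSS, then slow start
    return ssthresh, ssthresh      # Reno: cut cwnd in half, grow linearly
```

With cwnd = 16 MSS, a timeout drops cwnd to 1 MSS with ssthresh = 8 MSS, while three duplicate ACKs under Reno leave cwnd at 8 MSS.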
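Summary: TCP Congestion Control
[Figure: FSM with slow start, congestion avoidance, and fast recovery states. Recoverable transition labels:
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start -> congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]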
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time: additively increase window size … until loss occurs (then cut window in half)]
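The saw tooth can be reproduced with a few lines of simulation. This is a toy sketch: the function name is hypothetical, and the loss model (a loss occurs whenever cwnd exceeds a fixed capacity) is an assumption made only to trigger the multiplicative decrease.

```python
def aimd_trace(mss, capacity, rtts, cwnd=None):
    """Return cwnd at the start of each RTT under AIMD."""
    cwnd = mss if cwnd is None else cwnd
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:            # assumed loss model
            cwnd = max(mss, cwnd / 2)  # multiplicative decrease
        else:
            cwnd += mss                # additive increase: 1 MSS per RTT
    return trace
```

With MSS = 1 and capacity = 4, the window climbs 1, 2, 3, 4, 5, is halved to 2.5, and climbs again, which is the saw tooth shown in the figure.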
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) * W / RTT bytes/sec
[Figure: saw tooth window oscillating between W/2 and W]
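The 3/4 W figure is the mean of a window that ramps linearly from W/2 up to W. A short numeric check, with hypothetical function names and an evenly sampled ramp as the assumed model of one saw tooth:

```python
def average_sawtooth_window(w, steps=1000):
    """Mean window over one linear ramp from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_s):
    """Average TCP throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s
```

For W = 100 the sampled mean comes out at 75, matching (W/2 + W) / 2 = 3/4 W.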
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate!
new versions of TCP for high-speed
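The two numbers in the example can be checked directly from the formulas. This is a small sketch with hypothetical function names; MSS is converted from bytes to bits so that throughput is in bits per second.

```python
import math


def window_in_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve the throughput formula for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

With 1500-byte segments, a 100 ms RTT, and a 10 Gbps target, the window works out to about 83,333 segments and the required loss rate to roughly 2.1 * 10^-10, consistent with the slide's rounded value.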
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other, each axis up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
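The convergence argument can be tried out numerically. This is a toy sketch, assuming both sessions see every loss at the same time and have equal RTTs; the function name and the loss rule (loss when the combined rate exceeds the capacity) are assumptions for illustration.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Run two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows decrease by a factor of 2, which also
            # halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow by 1, the gap stays the same.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from very unequal rates, the gap is halved at every loss and never widened in between, so the two flows approach the equal bandwidth share.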
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
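The arithmetic behind the parallel-connection example, assuming each TCP connection gets an equal share of the link. The function name is hypothetical.

```python
from fractions import Fraction


def app_share(app_connections, other_connections):
    """Fraction of link rate R obtained by an app's connections."""
    total = app_connections + other_connections
    return Fraction(app_connections, total)
```

One new connection among 9 existing ones gets 1/10 of R; eleven new connections get 11/20 of R, a little more than R/2.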
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
UDP: User Datagram Protocol [RFC 768]
"no frills," "bare bones" Internet transport protocol
"best effort" service, UDP segments may be:
  lost
  delivered out-of-order to app
connectionless:
  no handshaking between UDP sender, receiver
  each UDP segment handled independently of others
UDP use:
  streaming multimedia apps (loss tolerant, rate sensitive)
  DNS
  SNMP
reliable transfer over UDP:
  add reliability at application layer
  application-specific error recovery!
UDP: segment header
[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (payload)]
length: in bytes of UDP segment, including header
why is there a UDP?
  no connection establishment (which can add delay)
  simple: no connection state at sender, receiver
  small header size
  no congestion control: UDP can blast away as fast as desired
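The segment format can be made concrete with Python's struct module. This is a minimal sketch: the function names are hypothetical, the checksum is left at 0 rather than computed, and the four 16-bit header fields are packed in network byte order.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, dest port, length, checksum


def build_udp_segment(src_port, dst_port, payload, checksum=0):
    """Return header + payload; length covers the whole segment."""
    length = UDP_HEADER.size + len(payload)  # 8-byte header plus data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_header(segment):
    """Return (source port, dest port, length, checksum)."""
    return UDP_HEADER.unpack(segment[:UDP_HEADER.size])
```

A 5-byte payload yields a 13-byte segment whose length field reads 13, since the length counts the header as well as the data.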
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
sender:
  treat segment contents, including header fields, as sequence of 16-bit integers
  checksum: addition (one's complement sum) of segment contents
  sender puts checksum value into UDP checksum field
receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later …
Internet checksum: example
example: add two 16-bit integers
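The one's complement sum with carry wraparound can be sketched in Python. This is an illustrative sketch: the function names are hypothetical, the two input words in the usage note are example values chosen here, and only the 16-bit words passed in are covered.

```python
def ones_complement_sum16(words):
    """One's complement sum of 16-bit integers (carry wraps around)."""
    total = 0
    for word in words:
        total += word
        # Fold any carry out of bit 15 back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """Checksum = one's complement of the one's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF
```

Adding 0xE666 and 0xD555 overflows 16 bits; wrapping the carry gives 0xBBBC, and the checksum is its complement, 0x4443. A receiver that sums the data words together with the checksum gets 0xFFFF when no error is detected.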
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
  (How do humans recover from "errors" during conversation?)
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
[Figure: sender and receiver FSMs. Recoverable transitions:
  sender, "Wait for call from above": rdt_send(data) -> sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isNAK(rcvpkt) -> udt_send(sndpkt)
  sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isACK(rcvpkt) -> back to "Wait for call from above"
  receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt) -> udt_send(NAK)]
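The ACK/NAK loop can be sketched as a tiny stop-and-wait simulation. This is a toy sketch under loud assumptions: all names are hypothetical, the checksum is a simple byte sum rather than the Internet checksum, the channel corrupts the first n transmissions by flipping one bit, and ACK/NAK messages themselves are never corrupted.

```python
def toy_checksum(data):
    """Toy checksum (sum of bytes mod 256); an assumption for this sketch."""
    return sum(data) % 256


def receiver(pkt):
    """Reply ACK if the checksum matches the data, else NAK."""
    data, checksum = pkt
    return "ACK" if toy_checksum(data) == checksum else "NAK"


def send_until_acked(data, corrupt_first_n):
    """Send `data` (non-empty) until ACKed; return number of transmissions."""
    transmissions = 0
    while True:
        pkt = (data, toy_checksum(data))          # make_pkt(data, checksum)
        if transmissions < corrupt_first_n:
            flipped = bytes([data[0] ^ 0x01]) + data[1:]
            pkt = (flipped, pkt[1])               # channel flips one bit
        transmissions += 1
        if receiver(pkt) == "ACK":
            return transmissions
        # NAK received: loop around and retransmit the packet.
```

If the channel corrupts the first two transmissions, the sender receives two NAKs and succeeds on the third attempt.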
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
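The sender and receiver rules above can be sketched as two small classes. This is a minimal sketch under assumptions: class and method names are hypothetical, sequence numbers are unbounded integers (the k-bit wraparound is ignored), packets carry no payload, and the timer itself is not modelled, only the reaction to its expiry.

```python
class GbnSender:
    def __init__(self, window_size):
        self.n = window_size
        self.base = 0        # oldest unACKed seq #
        self.next_seq = 0    # next seq # to use

    def send(self):
        """Send one new packet if the window allows; return success."""
        if self.next_seq < self.base + self.n:
            self.next_seq += 1
            return True
        return False         # window full: refuse data

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is ACKed."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        """Return the seq #s retransmitted: all unACKed pkts in window."""
        return list(range(self.base, self.next_seq))


class GbnReceiver:
    def __init__(self):
        self.expected = 0    # expectedseqnum

    def on_pkt(self, seq):
        """Return the ACK number sent (-1 if nothing in order yet)."""
        if seq == self.expected:
            self.expected += 1   # in-order: deliver
        # Out-of-order pkts are discarded; re-ACK highest in-order seq #.
        return self.expected - 1
```

If packet 1 is lost while packets 0, 2, and 3 arrive, the receiver keeps answering ACK 0, and a sender timeout resends packets 1, 2, and 3.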
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[Figure: sender and receiver windows over the sequence number space]
Selective repeat
sender:
data from above:
  if next available seq # in window, send pkt
timeout(n):
  resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
  mark pkt n as received
  if n smallest unACKed …
receiver:
pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
  …
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment format, 32 bits wide:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)]
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
checksum: Internet checksum (as in UDP)
sequence number, acknowledgement number: counting by bytes of data (not segments!)
receive window: # bytes rcvr willing to accept
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space, with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario
1. User at Host A types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
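The numbers in the telnet scenario follow mechanically from "seq # = first byte of this segment's data" and "ACK # = next byte expected from the other side". A minimal sketch, with a hypothetical function name and dictionary layout:

```python
def echo_exchange(client_seq, server_seq, data):
    """Return the three segments of a one-message echo exchange."""
    n = len(data)
    # Client sends the data; it expects the server's next byte.
    seg1 = {"seq": client_seq, "ack": server_seq, "data": data}
    # Server ACKs the client's bytes and echoes the data back.
    seg2 = {"seq": server_seq, "ack": client_seq + n, "data": data}
    # Client ACKs the echoed bytes; this segment carries no data.
    seg3 = {"seq": client_seq + n, "ack": server_seq + n, "data": b""}
    return [seg1, seg2, seg3]
```

Calling echo_exchange(42, 79, b"C") reproduces the Seq/ACK pairs 42/79, 79/43, and 43/80 from the scenario.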
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
exponential weighted moving average
influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client enters SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server enters SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client enters ESTAB
4. server: received ACK(y) indicates client is live; server enters ESTAB
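The three segments of the handshake can be traced with a small simulation. This is a sketch under stated assumptions: the `Segment` class and the function name are hypothetical, and the sequence number of the final ACK (x+1) is an assumption, since the slide only gives its ACKnum.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    acknum: int = 0


def three_way_handshake(x, y):
    """Trace the handshake; x and y are the client and server initial seq numbers."""
    client = server = "LISTEN"  # starting states as labelled on the slide

    syn = Segment(syn=True, seq=x)  # client -> server
    client = "SYNSENT"

    synack = Segment(syn=True, ack=True, seq=y, acknum=syn.seq + 1)  # server -> client
    server = "SYN RCVD"

    # Client sees its SYN acknowledged, so the server is live.
    assert synack.acknum == x + 1
    ack = Segment(ack=True, seq=x + 1, acknum=synack.seq + 1)  # seq=x+1 is assumed
    client = "ESTAB"

    # Server sees its SYNACK acknowledged, so the client is live.
    if ack.acknum == y + 1:
        server = "ESTAB"
    return client, server, [syn, synack, ack]
```

Calling `three_way_handshake(100, 300)` leaves both sides in ESTAB, with ACKnum 101 on the SYNACK and ACKnum 301 on the final ACK.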
TCP 3-way handshake: FSM
[figure: FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB]
closed -> listen: Socket connectionSocket = welcomeSocket.accept();
listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection: send TCP segment with FIN bit = 1
respond to received FIN with ACK; on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); send FINbit=1, seq=x; client enters FIN_WAIT_1 (can no longer send but can receive data)
2. server: send ACKbit=1, ACKnum=x+1; server enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server: send FINbit=1, seq=y; server enters LAST_ACK (can no longer send data)
4. client: send ACKbit=1, ACKnum=y+1; client enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control
congestion:
informally: "too many sources sending too much data too fast for network to handle"
different from flow control!
manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
[figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput $\lambda_{out}$; plots of $\lambda_{out}$ and of delay against $\lambda_{in}$, with axes marked at R/2]
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) to finite shared output link buffers; $\lambda_{out}$ is delivered at Host B]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
sender sends only when router buffers available
[figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both axes marked at R/2; Host A keeps a copy and sends when there is free buffer space in the finite shared output link buffers; Host B receives]
Causes/costs of congestion: scenario 2
Idealization: known loss
packets can be lost, dropped at router due to full buffers
sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both axes marked at R/2; Host A keeps a copy of each packet and resends it once buffer space is free; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data]
Causes/costs of congestion: scenario 2
Realistic: duplicates
packets can be lost, dropped at router due to full buffers
sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
"costs" of congestion: unneeded retransmissions; link carries multiple copies of pkt
[figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both axes marked at R/2; Host A keeps a copy and retransmits on timeout although the router has free buffer space]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C and D connected over multihop paths through finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked at C/2]
another "cost" of congestion: when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate: roughly, send cwnd bytes, wait RTT for ACKs, then send more bytes
rate $\approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[figure: sender sequence number space; cwnd spans from last byte ACKed, across sent, not-yet ACKed ("in-flight") bytes, to last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A sends bursts of segments to Host B, one burst per RTT, each burst twice the size of the last; time axis]
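The doubling follows directly from the per-ACK increment, as the following sketch shows. It assumes no loss, one ACK per full-sized segment, and cwnd counted in bytes; the function name is hypothetical.

```python
def slow_start(mss, rtts):
    """Return cwnd (bytes) at the start of each RTT during loss-free slow start."""
    cwnd = mss  # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss  # one ACK per segment sent during this RTT (assumed)
        cwnd += acks * mss  # cwnd grows by 1 MSS for every ACK received
        history.append(cwnd)
    return history
```

For `mss=1000`, three RTTs give 1000, 2000, 4000, 8000 bytes: incrementing per ACK doubles the window every RTT.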
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
variable ssthresh
on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
cwnd set to 1 MSS
window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
dup ACKs indicate network capable of delivering some segments
cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
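The two reactions to loss can be written as one small function. This is a sketch of the simplified rules stated on the slide only (it leaves out fast-recovery window inflation); the function name and the event strings are assumptions.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) in bytes after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    ssthresh = cwnd // 2  # half of cwnd just before the loss event
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh  # restart from 1 MSS, slow start up to ssthresh
    # Reno with 3 duplicate ACKs: cut cwnd in half, then grow linearly
    return ssthresh, ssthresh
```

For a 16000-byte window and 1000-byte MSS, a timeout yields cwnd 1000 and ssthresh 8000, whereas three duplicate ACKs under Reno yield cwnd 8000.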
Summary: TCP Congestion Control
[figure: FSM of TCP congestion control; legible fragments follow]
slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
additive increase: increase cwnd by 1 MSS every RTT until loss detected
multiplicative decrease: cut cwnd in half after loss
[figure: AIMD saw tooth behavior: probing for bandwidth; cwnd: TCP sender congestion window size against time; additively increase window size ... until loss occurs (then cut window in half)]
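The saw tooth can be reproduced with a few lines. This is a sketch under simplifying assumptions: cwnd is counted in whole MSS units, and a loss is assumed to occur exactly when the window reaches a fixed size `loss_at`; both names are hypothetical.

```python
def aimd(cwnd, loss_at, rtts):
    """Return the window size (in MSS) used in each of `rtts` round trips."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = cwnd // 2  # multiplicative decrease: cut window in half
        else:
            cwnd += 1  # additive increase: 1 MSS per RTT
    return trace
```

Starting at 4 MSS with loss at 8 MSS, the window cycles 4, 5, 6, 7, 8, 4, 5, ... and averages 6 MSS, which is 3/4 of the loss window.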
TCP throughput
avg. TCP thruput as function of window size, RTT?
ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
avg. window size (# in-flight bytes) is $\frac{3}{4} W$
avg. thruput is $\frac{3}{4} W$ per RTT
avg TCP thruput $= \frac{3}{4} \frac{W}{RTT}$ bytes/sec
[figure: saw tooth of window size oscillating between W/2 and W]
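The factor 3/4 comes from averaging the saw tooth. As a short worked step, assuming the window climbs linearly from $W/2$ to $W$ within each cycle:

```latex
\[
\overline{W} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} \;=\; \frac{\overline{W}}{RTT} \;=\; \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}.
\]
```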
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
TCP throughput $= \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
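Both numbers on this slide can be checked by direct arithmetic. The sketch below solves the throughput formula for L; the function names are hypothetical, and the MSS is taken as the full 1500-byte segment, as in the example.

```python
def segments_in_flight(throughput_bps, rtt_s, segment_bytes):
    """Window (in segments) needed to sustain the throughput over one RTT."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


W = segments_in_flight(10e9, 0.100, 1500)  # about 83,333 segments
L = required_loss_rate(10e9, 0.100, 1500)  # about 2.1e-10
```

The computed loss rate is about $2.1 \cdot 10^{-10}$, matching the rounded figure on the slide.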
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
[figure: Connection 2 throughput against Connection 1 throughput, each axis up to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
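The convergence argument can be tried out numerically. This is a sketch under idealized assumptions: both sessions have the same RTT, both see a loss whenever their combined rate exceeds the link capacity, and rates are in arbitrary units; the function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1  # congestion avoidance: additive increase
    return x1, x2


a, b = two_flow_aimd(80.0, 10.0, capacity=100.0, rounds=500)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the initial gap of 70 shrinks towards zero and the rates approach the equal-share line.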
Fairness (more)
Fairness and UDP
multimedia apps often do not use TCP: do not want rate throttled by congestion control
instead use UDP: send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
application can open multiple parallel connections between two hosts
web browsers do this
e.g., link of rate R with 9 existing connections:
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets R/2
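The two figures in the example follow from per-connection fairness. A minimal sketch, assuming every connection on the link gets an equal share of R (the function name is hypothetical):

```python
from fractions import Fraction


def app_share(app_connections, existing_connections):
    """Fraction of link rate R that an app obtains with its parallel connections."""
    total = app_connections + existing_connections
    return Fraction(app_connections, total)


one = app_share(1, 9)      # 1/10 of R
eleven = app_share(11, 9)  # 11/20 of R, i.e. roughly R/2
```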
Chapter 3: summary
principles behind transport layer services: multiplexing, demultiplexing; reliable data transfer; flow control; congestion control
instantiation, implementation in the Internet: UDP, TCP
next: leaving the network "edge" (application, transport layers), into the network "core"
UDP: segment header
UDP segment format (32 bits wide): source port #, dest port #; length, checksum; application data (payload)
length: in bytes of UDP segment, including header
why is there a UDP?
no connection establishment (which can add delay)
simple: no connection state at sender, receiver
small header size
no congestion control: UDP can blast away as fast as desired
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
sender:
treat segment contents, including header fields, as sequence of 16-bit integers
checksum: addition (one's complement sum) of segment contents
sender puts checksum value into UDP checksum field
receiver:
compute checksum of received segment
check if computed checksum equals checksum field value:
NO - error detected
YES - no error detected. But maybe errors nonetheless? More later ...
Internet checksum: example
example: add two 16-bit integers
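Since the worked numbers of this example did not survive, here is a minimal sketch of the computation with two illustrative 16-bit values (assumed for this sketch, not recovered from the slide). Carries out of the top bit are wrapped around and added back in, and the checksum is the one's complement of the resulting sum; all function names are hypothetical.

```python
def ones_complement_sum16(words):
    """One's complement sum of 16-bit integers, wrapping carries around."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # add the carry back in
    return total


def internet_checksum(words):
    """Checksum the sender places in the checksum field."""
    return ~ones_complement_sum16(words) & 0xFFFF


def receiver_ok(words, checksum):
    """Receiver check: data words plus checksum must sum to all ones."""
    return ones_complement_sum16(list(words) + [checksum]) == 0xFFFF


data = [0b1110011001100110, 0b1101010101010101]  # illustrative values
csum = internet_checksum(data)
```

A single flipped bit in either word makes `receiver_ok` return False; as the previous slide warns, some multi-bit errors can still cancel out and go undetected.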
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
checksum to detect bit errors
the question: how to recover from errors?
How do humans recover from "errors" during conversation?
acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
error detection
feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
sender:
Wait for call from above: on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
Wait for ACK or NAK: on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
Wait for ACK or NAK: on rdt_rcv(rcvpkt) && isACK(rcvpkt): go to Wait for call from above
receiver:
on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
sender can have up to N unacked packets in pipeline
receiver only sends cumulative ack
doesn't ack packet if there's a gap
sender has timer for oldest unacked packet
when timer expires, retransmit all unacked packets
Selective Repeat:
sender can have up to N unack'ed packets in pipeline
rcvr sends individual ack for each packet
sender maintains timer for each unacked packet
when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
may generate duplicate ACKs
need only remember expectedseqnum
out-of-order pkt: discard (don't buffer): no receiver buffering!
re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
sender timer for each unACKed pkt
sender window: N consecutive seq #'s; limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[figure: sender and receiver views of the sequence number space]
Selective repeat
sender:
data from above: if next available seq # in window, send pkt
timeout(n): resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]: mark pkt n as received; if n smallest unACKed ...
receiver:
pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1] ...
TCP: overview
point-to-point: one sender, one receiver
reliable, in-order byte stream: no "message boundaries"
pipelined: TCP congestion and flow control set window size
full duplex data: bi-directional data flow in same connection; MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver
TCP segment structure
[figure: TCP segment, 32 bits wide]
fields: source port #, dest port #; sequence number; acknowledgement number; head len, not used, flag bits U A P R S F, receive window; checksum, Urg data pointer; options (variable length); application data (variable length)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
sequence number, acknowledgement number: counting by bytes of data (not segments!)
receive window: # bytes rcvr willing to accept
checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers: byte stream "number" of first byte in segment's data
acknowledgements: seq # of next byte expected from other side; cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N, divided into: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
TCP seq. numbers, ACKs
simple telnet scenario:
Host A: user types 'C'; sends Seq=42, ACK=79, data = 'C'
Host B: host ACKs receipt of 'C', echoes back 'C'; sends Seq=79, ACK=43, data = 'C'
Host A: host ACKs receipt of echoed 'C'; sends Seq=43, ACK=80
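The numbers in the telnet scenario follow mechanically from two rules: a segment's Seq is the number of its first data byte, and its ACK is the number of the next byte expected from the other side. A minimal sketch, with a hypothetical function name and segments modelled as plain dictionaries:

```python
def telnet_echo(seq_a=42, seq_b=79, char="C"):
    """Return the three segments exchanged when A types one character."""
    # A -> B: carries the typed character
    seg1 = {"seq": seq_a, "ack": seq_b, "data": char}
    # B -> A: ACKs the character and echoes it back
    seg2 = {"seq": seg1["ack"],
            "ack": seg1["seq"] + len(seg1["data"]),
            "data": seg1["data"]}
    # A -> B: ACKs the echoed character, carries no data
    seg3 = {"seq": seg2["ack"],
            "ack": seg2["seq"] + len(seg2["data"]),
            "data": ""}
    return [seg1, seg2, seg3]
```

With the default arguments this reproduces the (Seq, ACK) pairs of the slide: (42, 79), (79, 43), (43, 80).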
TCP round trip time, timeout
Q: how to set TCP timeout value?
longer than RTT, but RTT varies
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss
Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions
SampleRTT will vary, want estimated RTT "smoother": average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds), axis from 100 to 350, against time (seconds); curves: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$ (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
(estimated RTT plus "safety margin")
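The three formulas combine into a small estimator. This is a sketch, not a full implementation: the class name is hypothetical, the initial values (EstimatedRTT set to the first sample, DevRTT set to 0) are assumptions the slides do not specify, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, with times in milliseconds."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample  # assumed initialisation
        self.dev_rtt = 0.0                 # assumed initialisation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample, a single 200 ms sample moves EstimatedRTT only to 112.5 ms but raises DevRTT to 25 ms, so the timeout grows to 212.5 ms: the safety margin reacts to variation much faster than the smoothed estimate does.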
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service:
pipelined segments
cumulative acks
single retransmission timer
retransmissions triggered by: timeout events; duplicate acks
let's initially consider simplified TCP sender: ignore duplicate acks; ignore flow control, congestion control
TCP sender events:
data rcvd from app:
create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running
think of timer as for oldest unacked segment
expiration interval: TimeOutInterval
timeout:
retransmit segment that caused timeout
restart timer
ack rcvd:
if ack acknowledges previously unacked segments: update what is known to be ACKed; start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
create segment, seq. #: NextSeqNum
pass segment to IP (i.e., "send")
NextSeqNum = NextSeqNum + length(data)
if (timer currently not running) ...
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long: long delay before resending lost packet
detect lost segments via duplicate ACKs
sender often sends many segments back-to-back
if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
Host A: sends Seq=92, 8 bytes of data
Host A: sends Seq=100, 20 bytes of data (lost, X)
Host B: replies ACK=100 repeatedly
Host A: resends Seq=100, 20 bytes of data, before the timeout expires
fast retransmit after sender receipt of triple duplicate ACK
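The trigger is a duplicate-ACK counter on the sender side. A minimal sketch, assuming cumulative ACK numbers and a hypothetical class that only records which sequence number would be resent:

```python
class FastRetransmitSender:
    """Tracks duplicate ACKs and notes when a fast retransmit would fire."""

    def __init__(self, send_base):
        self.send_base = send_base  # smallest unACKed seq #
        self.dup_acks = 0
        self.retransmitted = []     # seq #s resent without waiting for timeout

    def on_ack(self, acknum):
        if acknum > self.send_base:
            # new data ACKed: slide the base forward, reset the counter
            self.send_base = acknum
            self.dup_acks = 0
        elif acknum == self.send_base:
            # duplicate ACK for data that was already ACKed
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(self.send_base)
```

Replaying the scenario above: the first ACK=100 acknowledges the 8 bytes sent with Seq=92, and the third duplicate after it triggers the resend of the segment with Seq=100.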
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process above the OS; TCP socket receiver buffers; TCP code; IP code; segments arrive from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
RcvBuffer size set via socket options (typical default is 4096 bytes)
many operating systems autoadjust RcvBuffer
sender limits amount of unACKed ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering; RcvBuffer holds buffered data and free buffer space (rwnd); TCP segment payloads arrive from the network and are passed to the application process]
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space; cwnd spans from the last byte ACKed to the last byte sent; bytes in between are sent, not-yet ACKed ("in-flight").]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B timeline, marked in RTTs.]
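The per-ACK increment of slow start can be traced with a minimal sketch. It assumes no loss and one ACK per segment (no delayed ACKs); the function name `slow_start_cwnd` and its parameters are illustrative and do not come from the slides.

```python
def slow_start_cwnd(mss: int, rtts: int) -> list[int]:
    """Return cwnd (in bytes) at the start of each of the first `rtts` round trips."""
    cwnd = mss  # initially cwnd = 1 MSS
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss  # assumption: every in-flight segment is ACKed once
        for _ in range(acks):
            cwnd += mss  # increment cwnd by 1 MSS for every ACK received
    return history


if __name__ == "__main__":
    print(slow_start_cwnd(mss=1000, rtts=4))
```

Adding one MSS per ACK is what doubles cwnd each RTT: a window of n segments yields n ACKs, hence n extra MSS.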
TCP: switching from slow start to CA (congestion avoidance)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
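A sketch of these loss reactions, following the slide's simplified description (real Reno implementations add fast-recovery details that are not modeled here). `CongestionState`, `on_loss`, and the event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: int      # congestion window, bytes
    ssthresh: int  # slow start threshold, bytes


def on_loss(state: CongestionState, mss: int, event: str,
            variant: str = "reno") -> CongestionState:
    """Return the new state after a loss event ('timeout' or '3dupacks')."""
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = state.cwnd // 2  # half of cwnd just before the loss event
    if event == "3dupacks" and variant == "reno":
        # Reno: cwnd is cut in half, window then grows linearly
        return CongestionState(cwnd=ssthresh, ssthresh=ssthresh)
    # timeout (either variant), or Tahoe on 3 duplicate ACKs: back to 1 MSS
    return CongestionState(cwnd=mss, ssthresh=ssthresh)
```

Returning a new state instead of mutating in place keeps the before and after values easy to compare.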
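Summary: TCP Congestion Control
[Figure: FSM with slow start, congestion avoidance, and fast recovery states. Recoverable transition labels:
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]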
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs (then cut window in half).]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Figure: sawtooth window oscillating between W/2 and W.]
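The 3/4 W figure can be checked numerically with a small sketch. It assumes an idealized sawtooth that climbs from W/2 to W by one unit per RTT (the window is counted in segments here for simplicity, whereas the slide measures W in bytes); `average_sawtooth_window` is an illustrative name.

```python
def average_sawtooth_window(w: int) -> float:
    """Average window over one AIMD cycle that climbs from w/2 to w, +1 per RTT."""
    window = w // 2  # window right after a multiplicative decrease
    samples = []
    while window <= w:
        samples.append(window)
        window += 1  # additive increase: one unit per RTT
    return sum(samples) / len(samples)


if __name__ == "__main__":
    print(average_sawtooth_window(100))  # compare with 3/4 * 100
```

Because the window rises linearly, its mean is the midpoint of W/2 and W, which is 3/4 W.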
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
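The two numbers on this slide can be reproduced with a short calculation. The function names are illustrative; the only formula used is the [Mathis 1997] throughput expression from the slide, solved for L.

```python
def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))  # about 83,333 segments
    print(required_loss_rate(10e9, 0.1, 1500))        # about 2e-10
```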
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: plot of the two sessions' throughputs (Connection 1 throughput on one axis, up to R) with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
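The convergence argument can be sketched as a simulation. It assumes both sessions have the same RTT, increase by one unit per round, and see every loss at the same time (synchronized losses); these simplifications and the name `aimd_two_flows` are assumptions for illustration.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease window by factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: additive increase (slope of 1)
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    print(aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=1000))
```

Additive increase leaves the gap between the two rates unchanged while each halving cuts it in half, so the rates drift toward the equal-share line.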
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
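The arithmetic behind the parallel-connection example, as a sketch that assumes the link is split equally per connection; `app_share` is an illustrative name.

```python
from fractions import Fraction


def app_share(existing_conns: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


if __name__ == "__main__":
    print(app_share(9, 1))   # 1/10 of R
    print(app_share(9, 11))  # 11/20 of R, roughly R/2
```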
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment
sender:
  treat segment contents, including header fields, as sequence of 16-bit integers
  checksum: addition (one's complement sum) of segment contents
  sender puts checksum value into UDP checksum field
receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ....
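A minimal sketch of the one's complement sum described above. It checksums an arbitrary byte string only; the pseudo-header that real UDP also covers is left out, and the sample bytes are illustrative values, not taken from the slides.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # next 16-bit integer
        total = (total & 0xFFFF) + (total >> 16)  # wrap carry back around
    return ~total & 0xFFFF


if __name__ == "__main__":
    words = bytes.fromhex("e666d555")
    checksum = internet_checksum(words)
    # Receiver side: recomputing over data plus checksum gives 0 when no error is detected.
    print(hex(checksum), internet_checksum(words + checksum.to_bytes(2, "big")))
```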
Internet checksum: example
example: add two 16-bit integers
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
How do humans recover from "errors" during conversation?
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
[Figure: sender and receiver FSMs. Recoverable transition labels:
  sender, "Wait for call from above": rdt_send(data) / sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isNAK(rcvpkt) / udt_send(sndpkt)
  sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isACK(rcvpkt)
  receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt) / udt_send(NAK)]
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
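The Go-Back-N receiver rules can be sketched as a tiny class. It ignores the wraparound of k-bit sequence numbers and models corruption with a boolean flag; the class and method names are hypothetical.

```python
class GbnReceiver:
    """Go-Back-N receiver: the only protocol state kept is expected_seq_num."""

    def __init__(self) -> None:
        self.expected_seq_num = 0
        self.delivered: list[str] = []

    def on_packet(self, seq: int, payload: str, corrupt: bool = False):
        """Handle one arriving packet; return the seq # being ACKed (None if none yet)."""
        if not corrupt and seq == self.expected_seq_num:
            self.delivered.append(payload)  # in-order: deliver to upper layer
            self.expected_seq_num += 1
        # Otherwise discard (no buffering) and re-ACK the highest in-order seq #.
        last_in_order = self.expected_seq_num - 1
        return last_in_order if last_in_order >= 0 else None
```

Feeding packets 0, 2, 1, 2 in that order yields ACKs 0, 0, 1, 2: the out-of-order packet 2 is discarded and produces a duplicate ACK.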
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure not recovered in extraction.]
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed [remainder of slide missing]
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    [remainder of slide missing]
TCP: Overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment, 32 bits wide. Fields, in order:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)]
annotations:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N, divided into: sent, ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C'
  A to B: Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'
  B to A: Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'
  A to B: Seq=43, ACK=80
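The numbers in the telnet scenario follow one rule: a reply's sequence number is the ACK number it just received, and its ACK number is the received sequence number plus the number of data bytes received. A minimal sketch, with the hypothetical helper `reply_numbers`:

```python
def reply_numbers(rcvd_seq: int, rcvd_ack: int, rcvd_len: int) -> tuple[int, int]:
    """Return (Seq, ACK) for the segment sent in reply to a received segment."""
    seq = rcvd_ack             # next byte the peer expects from us
    ack = rcvd_seq + rcvd_len  # next byte we expect from the peer (cumulative ACK)
    return seq, ack


if __name__ == "__main__":
    print(reply_numbers(42, 79, 1))  # Host B's echo
    print(reply_numbers(79, 43, 1))  # Host A's ACK of the echo
```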
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds), axis from 100 to 350, versus time (seconds); series: SampleRTT, EstimatedRTT.]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  (EstimatedRTT is the estimated RTT; $4\cdot\text{DevRTT}$ is the "safety margin")
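A sketch of the estimator that applies the EstimatedRTT, DevRTT, and TimeoutInterval formulas in sequence. Two details are assumptions not given on the slides: DevRTT starts at 0, and the deviation is computed against the already-updated EstimatedRTT. `RttEstimator` is an illustrative name.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout interval (times in milliseconds)."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0  # assumption: no deviation known yet

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt  # estimate plus safety margin
```

Samples taken from retransmitted segments should not be passed to `update`, matching the "ignore retransmissions" rule.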
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
[Figure: sender FSM. Recoverable transition:
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running) [remainder of figure missing]]
TCP ACK generation (event at receiver, then receiver action):
  arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
  arrival of in-order segment with expected seq #; one other segment has ACK pending
    action: immediately send single cumulative ACK, ACKing both in-order segments
  arrival of out-of-order segment, higher-than-expect seq #; gap detected
    action: immediately send duplicate ACK, indicating seq # of next expected byte
  arrival of segment that partially or completely fills gap
    action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A, Host B timeline:
  A to B: Seq=92, 8 bytes of data
  A to B: Seq=100, 20 bytes of data (lost, marked X)
  B to A: ACK=100, repeated (duplicate ACKs)
  A to B: Seq=100, 20 bytes of data (resent before the timeout expires)]
fast retransmit after sender receipt of triple duplicate ACK
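The triple-duplicate-ACK trigger can be sketched as a small counter. It tracks only the most recent ACK number and leaves the actual retransmission to the caller; `FastRetransmitDetector` is a hypothetical name.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals a fast retransmit on the third duplicate."""

    def __init__(self) -> None:
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num: int):
        """Return the seq # to resend now, or None if no fast retransmit is due."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                # The unacked segment with the smallest seq # starts at ack_num.
                return ack_num
        else:
            self.last_ack = ack_num  # new data ACKed: reset the duplicate count
            self.dup_count = 0
        return None
```

With four ACK=100 segments in a row (one original plus three duplicates), the fourth call returns 100, as in the figure.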
TCP flow control
application may remove data from TCP socket buffers .... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process; OS containing TCP socket receiver buffers, TCP code, IP code; segments arrive from sender.]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads enter, data leaves to application process.]
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application and network layers, each holding:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN, SYNSENT, ESTAB
server state: LISTEN, SYN RCVD, ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
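The field values exchanged in the handshake can be written out as a sketch. Segments are modeled as plain dictionaries, and 32-bit sequence number wraparound is ignored; `three_way_handshake` is an illustrative name.

```python
def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the SYN, SYNACK, and ACK segments for initial seq nums x (client), y (server)."""
    syn = {"SYNbit": 1, "Seq": x}
    # Server acknowledges the SYN by asking for byte x+1 next.
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    # Client acknowledges the SYNACK by asking for byte y+1 next.
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=1000, y=5000):
        print(segment)
```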
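TCP 3-way handshake: FSM
[Figure: FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB. Recoverable transition labels:
  server: Socket connectionSocket = welcomeSocket.accept(); closed to listen
  server: on SYN(x), send SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client; listen to SYN rcvd
  server: on ACK(ACKnum=y+1); SYN rcvd to ESTAB
  client: Socket clientSocket = newSocket("hostname", "port number"), send SYN(seq=x); closed to SYN sent
  client: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1); SYN sent to ESTAB]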
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
[Figure: timeline of client state and server state, both starting in ESTAB:
  client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
  server: sends ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
  client: enters FIN_WAIT_2 (wait for server close)
  server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
  client: sends ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
  server: enters CLOSED]
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
[Figure: Host A sends original data at $\lambda_{in}$; throughput $\lambda_{out}$ at Host B; unlimited shared output link buffers. Plots: $\lambda_{out}$ versus $\lambda_{in}$ (axes marked R/2) and delay versus $\lambda_{in}$.]
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers.]
idealization: perfect knowledge
  sender sends only when router buffers available
  [Figure: copy kept at sender; free buffer space at router. Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked R/2.]
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
  [Figure: copy kept at sender; no buffer space at router.]
Internet checksum examplep
example add two 16 bit integersexample add two 16-bit integers
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  (How do humans recover from “errors” during conversation?)
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
sender
  state “Wait for call from above”
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  state “Wait for ACK or NAK”
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): return to “Wait for call from above”
receiver
  state “Wait for call from ...”
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
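The stop-and-wait behaviour of the rdt2.0 sender can be sketched in a few lines. This is a toy illustration under stated assumptions: the checksum is a simple byte sum rather than a real checksum, ACK and NAK messages are never corrupted (as rdt2.0 itself assumes), the receiver answers ACK for an uncorrupted packet, and `flaky_channel` is a hypothetical helper that corrupts the first few transmissions.

```python
def checksum(data: bytes) -> int:
    """Toy 16-bit checksum (sum of bytes); stands in for a real checksum."""
    return sum(data) & 0xFFFF


def make_pkt(data: bytes):
    return (data, checksum(data))


def corrupt(pkt) -> bool:
    data, chk = pkt
    return checksum(data) != chk


def flaky_channel(corrupt_first: int):
    """Hypothetical unreliable channel: flips one bit in the first
    `corrupt_first` packets it carries, then delivers packets intact."""
    remaining = [corrupt_first]

    def udt_send(pkt):
        data, chk = pkt
        if remaining[0] > 0:
            remaining[0] -= 1
            data = bytes([data[0] ^ 0x01]) + data[1:]
        return (data, chk)

    return udt_send


def receiver(rcvpkt) -> str:
    """Reply NAK for a corrupted packet, ACK otherwise."""
    return "NAK" if corrupt(rcvpkt) else "ACK"


def rdt_send(data: bytes, udt_send) -> int:
    """Send one non-empty message; retransmit on NAK.
    Returns the number of transmissions that were needed."""
    sndpkt = make_pkt(data)
    transmissions = 0
    while True:
        transmissions += 1
        if receiver(udt_send(sndpkt)) == "ACK":
            return transmissions


if __name__ == "__main__":
    print(rdt_send(b"hello", flaky_channel(2)))
```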
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn’t ack packet if there’s a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack’ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
“window” of up to N consecutive unack’ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
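The sender rules above can be sketched as a small class. This is a simplified illustration, not a full protocol implementation: sequence numbers are unbounded integers (no k-bit wraparound), no real timer is modelled, and the names `GBNSender`, `base` and `nextseq` are assumptions made for the example.

```python
class GBNSender:
    """Bookkeeping for a Go-Back-N sender with window size N."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 0      # oldest unacked sequence number
        self.nextseq = 0   # next sequence number to use

    def send(self):
        """Return the sequence number sent, or None if the window is full."""
        if self.nextseq >= self.base + self.N:
            return None
        seq = self.nextseq
        self.nextseq += 1
        return seq

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged.
        A duplicate ACK (n below base) changes nothing."""
        if n >= self.base:
            self.base = n + 1

    def on_timeout(self):
        """Return every in-flight sequence number, all of which are resent."""
        return list(range(self.base, self.nextseq))


if __name__ == "__main__":
    s = GBNSender(4)
    print([s.send() for _ in range(5)])
    s.on_ack(1)
    print(s.on_timeout())
```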
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don’t buffer): no receiver buffering
  re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #’s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed
receiver
  pkt n in [rcvbase, rcvbase+N-1]
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]
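The receiver side of selective repeat can be sketched as follows. It is a simplified illustration with unbounded integer sequence numbers; re-sending ACK(n) for a packet in [rcvbase-N, rcvbase-1] follows the usual textbook rule and is an assumption here, since that action is not spelled out above. The class and attribute names are hypothetical.

```python
class SRReceiver:
    """Selective-repeat receiver: buffer out-of-order packets,
    deliver in order, ACK each packet individually."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}      # seq -> data, received but not yet delivered
        self.delivered = []   # data handed to the upper layer, in order

    def on_pkt(self, n, data):
        """Return the sequence number to ACK, or None to ignore the packet."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer[n] = data
            # deliver any in-order run and advance the window
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n < self.rcvbase:
            # already delivered: ACK again so the sender can move on
            return n
        return None


if __name__ == "__main__":
    r = SRReceiver(4)
    print(r.on_pkt(1, "b"), r.delivered)
    print(r.on_pkt(0, "a"), r.delivered)
```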
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no “message boundaries”
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment layout (each row is 32 bits):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
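The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. The sketch below assumes the standard field widths (16-bit ports, 32-bit sequence and acknowledgement numbers, 4-bit header length counted in 32-bit words, 16-bit receive window, checksum and urgent pointer) and ignores options; the function name and dictionary keys are hypothetical.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the first 20 bytes of a TCP segment (network byte order)."""
    (src, dst, seq, ack, offset_byte, flags,
     rwnd, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_byte >> 4) * 4,  # 32-bit words -> bytes
        "URG": bool(flags & 0x20),
        "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02),
        "FIN": bool(flags & 0x01),
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


if __name__ == "__main__":
    # Build an illustrative SYN+ACK header, then decode it again.
    raw = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x12, 4096, 0, 0)
    print(parse_tcp_header(raw))
```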
TCP seq. numbers, ACKs
sequence numbers:
  byte stream “number” of first byte in segment’s data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn’t say - up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N, divided into: sent ACKed; sent, not-yet ACKed (“in-flight”); usable but not yet sent; not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  user at Host A types ‘C’
    A to B: Seq=42, ACK=79, data = ‘C’
  host B ACKs receipt of ‘C’, echoes back ‘C’
    B to A: Seq=79, ACK=43, data = ‘C’
  host A ACKs receipt of echoed ‘C’
    A to B: Seq=43, ACK=80
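The numbers in the telnet exchange follow one rule: the reply's sequence number is the ACK number it just received, and its ACK number is the received sequence number plus the number of data bytes received. A minimal sketch, with a hypothetical function name:

```python
def reply_numbers(rcvd_seq, rcvd_ack, rcvd_data_len):
    """Return (Seq, ACK) for the segment sent in reply to a received one."""
    seq = rcvd_ack                  # continue our own byte stream
    ack = rcvd_seq + rcvd_data_len  # next byte expected from the other side
    return seq, ack


if __name__ == "__main__":
    # Host B replies to A's segment Seq=42, ACK=79 carrying 1 byte ('C').
    print(reply_numbers(42, 79, 1))
    # Host A replies to B's echo Seq=79, ACK=43 carrying 1 byte.
    print(reply_numbers(79, 43, 1))
```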
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT “smoother”
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds), showing SampleRTT and EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus “safety margin”
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
  (first term: estimated RTT; second term: “safety margin”)
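The three update rules can be combined into a small estimator. This is a sketch under stated assumptions: EstimatedRTT is updated first and the new value is used in the DevRTT update, and the initial values are supplied by the caller. Real TCP stacks differ in both details, and the class name is hypothetical.

```python
class RTTEstimator:
    """Track EstimatedRTT and DevRTT and derive TimeoutInterval."""

    def __init__(self, estimated_rtt, dev_rtt=0.0, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.alpha = alpha
        self.beta = beta

    def update(self, sample_rtt):
        """Fold one SampleRTT into the averages; return TimeoutInterval."""
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RTTEstimator(estimated_rtt=100.0)  # milliseconds, illustrative
    for sample in (180.0, 120.0, 110.0):
        print(round(est.update(sample), 2))
```

A single outlier sample moves EstimatedRTT only slightly but widens the safety margin, which is the intended effect of tracking the deviation separately.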
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control
TCP reliable data transfer
TCP creates rdt service on top of IP’s unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let’s initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., “send”)
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
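The three sender events can be sketched as a small class. It is a simplified illustration in the spirit of the slides: duplicate ACKs, flow control and congestion control are ignored, the timer is reduced to a boolean flag, and the names `SimpleTCPSender`, `send_base` and `unacked` are assumptions made for the example.

```python
class SimpleTCPSender:
    """Bookkeeping for a simplified TCP sender with a single timer."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # oldest unacknowledged byte
        self.next_seq = initial_seq    # NextSeqNum
        self.unacked = {}              # seq # -> segment data
        self.timer_running = False

    def send(self, data: bytes):
        """Data received from the application: create and 'send' a segment."""
        seq = self.next_seq
        self.unacked[seq] = data
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True
        return seq

    def on_timeout(self):
        """Return the seq # of the oldest unacked segment to retransmit."""
        if not self.unacked:
            return None
        self.timer_running = True      # restart timer
        return min(self.unacked)

    def on_ack(self, ack):
        """Cumulative ACK: bytes below `ack` have been received."""
        if ack > self.send_base:
            self.send_base = ack
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > ack}
            # keep the timer only while something is still outstanding
            self.timer_running = bool(self.unacked)


if __name__ == "__main__":
    s = SimpleTCPSender(initial_seq=92)
    print(s.send(b"x" * 8), s.send(b"y" * 20), s.next_seq)
    print(s.on_timeout())
```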
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don’t wait for timeout
TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B keeps returning ACK=100; before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
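Duplicate-ACK counting can be sketched as below. The sketch assumes the common interpretation in which the first ACK for a value is the original and the retransmission fires on the third duplicate after it (four identical ACKs in total); the class name is hypothetical.

```python
class FastRetransmitDetector:
    """Count duplicate ACKs and signal when to retransmit early."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the seq # to retransmit on the third duplicate, else None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                # the unacked segment with the smallest seq # starts at `ack`
                return ack
        else:
            self.last_ack = ack
            self.dup_count = 0
        return None


if __name__ == "__main__":
    d = FastRetransmitDetector()
    print([d.on_ack(a) for a in (100, 100, 100, 100)])
```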
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers inside the OS; the application process reads from those buffers.]
TCP flow control
receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from the network and data leaves to the application process.]
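The window arithmetic can be sketched in two small functions. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` follow a common textbook convention and are assumptions here, since the text above only names rwnd and RcvBuffer.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises: buffer size minus
    the bytes received but not yet read by the application."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight without
    exceeding the receiver's advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


if __name__ == "__main__":
    rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(rwnd)
    print(sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```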
Connection Management
before exchanging data, sender/receiver “handshake”:
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
at each end (application above network):
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN, SYNSENT, ESTAB
server state: LISTEN, SYN RCVD, ESTAB
client: choose init seq num, x; send TCP SYN msg
  segment: SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  segment: ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
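The three segments can be written out as data to make the numbering explicit. This is a minimal sketch; the dictionary layout and function name are hypothetical, and the third segment's sequence number x+1 reflects the standard rule that a SYN consumes one sequence number, which is not stated in the text above.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments for initial seq numbers x and y."""
    return [
        {"from": "client", "SYN": 1, "ACK": 0, "seq": x, "acknum": None},
        {"from": "server", "SYN": 1, "ACK": 1, "seq": y, "acknum": x + 1},
        {"from": "client", "SYN": 0, "ACK": 1, "seq": x + 1, "acknum": y + 1},
    ]


if __name__ == "__main__":
    for segment in three_way_handshake(x=100, y=300):
        print(segment)
```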
TCP 3-way handshake: FSM
server side:
  closed to listen: Socket connectionSocket = welcomeSocket.accept();
  listen to SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd to ESTAB: on ACK(ACKnum=y+1)
client side:
  closed to SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent to ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIMED_WAIT, CLOSED
server state: ESTAB, CLOSE_WAIT, LAST_ACK, CLOSED
client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: replies ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: replies ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: enters CLOSED
Principles of congestion control
congestion:
  informally: “too many sources sending too much data too fast for network to handle”
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Two plots against $\lambda_{in}$ (up to R/2): $\lambda_{out}$ (up to R/2) and delay.]
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) into finite shared output link buffers; Host B receives $\lambda_{out}$]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2. Host A keeps a copy and sends when there is free buffer space in the finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data.]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Figure: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2. Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once there is free buffer space.]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered
“costs” of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[Figure: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2. Host A keeps a copy, times out, and resends while the router still has free buffer space.]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[Figure: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2]
another “cost” of congestion:
  when packet dropped, any “upstream” transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  $\mathrm{rate} \approx \mathrm{cwnd}/\mathrm{RTT}$ bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed (“in-flight”); last byte sent; cwnd]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B timeline with one RTT marked; time runs downward]
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
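The window rules can be sketched as a per-RTT update. This is a coarse illustration under stated assumptions: cwnd and ssthresh are counted in MSS, one call represents one RTT, slow-start doubling is capped at ssthresh, and Reno's fast-recovery inflation of the window is left out so that cwnd is simply halved as described above. The function name and event strings are hypothetical.

```python
def next_cwnd(cwnd, ssthresh, event=None, variant="reno"):
    """Return (cwnd, ssthresh) in MSS after one RTT.

    event: None (no loss), "timeout", or "3dupacks".
    variant: "reno" or "tahoe".
    """
    if event == "timeout" or (event == "3dupacks" and variant == "tahoe"):
        # back to slow start from 1 MSS; remember half the old window
        return 1, cwnd / 2
    if event == "3dupacks":
        # Reno: cut the window in half and continue growing linearly
        return cwnd / 2, cwnd / 2
    if cwnd < ssthresh:
        # slow start: double every RTT, up to the threshold
        return min(cwnd * 2, ssthresh), ssthresh
    # congestion avoidance: add 1 MSS per RTT
    return cwnd + 1, ssthresh


if __name__ == "__main__":
    cwnd, ssthresh = 1, 8
    trace = [cwnd]
    for _ in range(5):
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh)
        trace.append(cwnd)
    print(trace)
    print(next_cwnd(cwnd, ssthresh, "3dupacks", "reno"))
    print(next_cwnd(cwnd, ssthresh, "3dupacks", "tahoe"))
```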
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM. Transitions shown include:
  new ACK in slow start: cwnd = cwnd+MSS; dupACKcount = 0; transmit new segment(s), as allowed
  cwnd > ssthresh: move from slow start to congestion avoidance
  new ACK in congestion avoidance: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK in fast recovery: cwnd = cwnd + MSS; transmit new segment(s), as allowed]
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs (then cut window in half)]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{RTT}$ bytes/sec
[Figure: window size oscillating between W/2 and W]
TCP Futures: TCP over “long, fat pipes”
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
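Both figures in the example can be reproduced from the formulas. A minimal sketch, assuming throughput is expressed in bits per second and the MSS in bytes (so a factor of 8 converts between them); the function names are hypothetical.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    sqrt_loss = 1.22 * mss_bytes * 8 / (rtt_s * rate_bps)
    return sqrt_loss ** 2


if __name__ == "__main__":
    rate, rtt, mss = 10e9, 0.100, 1500
    print(round(window_segments(rate, rtt, mss)))
    print(f"{required_loss_rate(rate, rtt, mss):.2e}")
```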
Principles of reliable data transfer
important in application, transport, link layers
  top-10 list of important networking topics
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
send side:
  rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
  udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
  rdt_rcv(): called when packet arrives on rcv-side of channel
  deliver_data(): called by rdt to deliver data to upper
Reliable data transfer: getting started
we’ll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow on both directions
  use finite state machines (FSM) to specify sender, receiver
state: when in this “state” next state uniquely determined by next event
[Figure: FSM notation. An arrow from state 1 to state 2 is labelled with the event causing state transition and the actions taken on state transition.]
rdt1.0: reliable transfer over a reliable channel
underlying channel perfectly reliable
  no bit errors
  no loss of packets
separate FSMs for sender, receiver:
  sender sends data into underlying channel
  receiver reads data from underlying channel
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors
k l d (ACK ) i li i l ll d acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK negative acknowledgements (NAKs) receiver explicitly tells g g ( ) p y
sender that pkt had errors sender retransmits pkt on receipt of NAK
new mechanisms in rdt2 0 (beyond rdt1 0)How do humans recover from ldquoerrorsrdquo
new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-
during conversationg ( )
gtsender
Transport Layer 3-27
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells sender
that pkt received OKti k l d t (NAK ) i li itl t ll negative acknowledgements (NAKs) receiver explicitly tells
sender that pkt had errors sender retransmits pkt on receipt of NAKp p
new mechanisms in rdt20 (beyond rdt10) error detection feedback control msgs (ACKNAK) from receiver to
sender
Transport Layer 3-28
rdt20 FSM specificationp
sndpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
sndpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structure
[figure: segment layout, 32 bits wide; fields listed row by row]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say - up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[figure: sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario
  Host A -> Host B: user types 'C': Seq=42, ACK=79, data = 'C'
  Host B -> Host A: host ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  Host A -> Host B: host ACKs receipt of echoed 'C': Seq=43, ACK=80
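The numbers in the telnet scenario follow from two rules: the reply's sequence number is whatever the other side said it expects next, and the reply's ACK number is the received sequence number plus the number of data bytes received. A minimal sketch, with the helper name `reply_numbers` being an assumption for illustration:

```python
def reply_numbers(seq, ack, data_len):
    """Given a received segment (Seq=seq, ACK=ack, data_len bytes of data),
    return (Seq, ACK) for the segment sent in reply."""
    reply_seq = ack             # send the byte the peer is expecting next
    reply_ack = seq + data_len  # cumulative ACK: next byte expected from the peer
    return reply_seq, reply_ack
```

Applied to the scenario, `reply_numbers(42, 79, 1)` gives Host B's echo (Seq=79, ACK=43), and `reply_numbers(79, 43, 1)` gives Host A's final ACK (Seq=43, ACK=80).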
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds); series: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  (estimated RTT plus "safety margin")
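The two moving averages and the timeout rule can be sketched as a small estimator: EstimatedRTT = (1-α)·EstimatedRTT + α·SampleRTT, DevRTT = (1-β)·DevRTT + β·|SampleRTT - EstimatedRTT|, TimeoutInterval = EstimatedRTT + 4·DevRTT. The class name, seeding EstimatedRTT with the first sample, starting DevRTT at 0, and updating DevRTT before EstimatedRTT are all assumptions made for this illustration; the slides do not fix them.

```python
class RttEstimator:
    """EWMA-based RTT estimator sketch (times in any consistent unit, e.g. ms)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample  # assumption: seed with the first sample
        self.dev_rtt = 0.0                 # assumption: no deviation observed yet

    def update(self, sample_rtt):
        """Fold one SampleRTT (from a non-retransmitted segment) into the estimates."""
        # assumption: deviation is measured against the estimate before this sample
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt

    def timeout_interval(self):
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample and then observing 200 ms, the estimate moves only to 112.5 ms, while the deviation term widens the timeout to 212.5 ms.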
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
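The three sender events (data from the application, timeout, ACK received) can be sketched as a small class. This is an illustrative model only: the names `SimpleTcpSender`, `send_base`, and the `sent` log are assumptions, the timer is reduced to a boolean flag, and duplicate ACKs, flow control, and congestion control are ignored, as in the simplified sender.

```python
class SimpleTcpSender:
    """Simplified TCP sender sketch: one timer, cumulative ACKs, no dup-ACK handling."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq_num = initial_seq   # seq # of the next new byte to send
        self.unacked = {}                 # seq # -> data of segments awaiting ACK
        self.timer_running = False
        self.sent = []                    # log of (seq #, data) passed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))     # "send": pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # oldest unACKed segment caused the timeout
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True     # restart timer

    def on_ack(self, ack):
        if ack > self.send_base:          # ACK covers previously unACKed data
            self.send_base = ack
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= ack]:
                del self.unacked[seq]
            # keep the timer only while segments are still outstanding
            self.timer_running = bool(self.unacked)
```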
TCP ACK generation
(event at receiver: TCP receiver action)
arrival of in-order segment with expected seq #, all data up to expected seq # already ACKed:
  delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
arrival of in-order segment with expected seq #, one other segment has ACK pending:
  immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment, higher-than-expect seq #, gap detected:
  immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap:
  immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data; the Seq=100 segment is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data]
fast retransmit after sender receipt of triple duplicate ACK
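The trigger for fast retransmit can be sketched as a duplicate-ACK counter. This sketch assumes "triple duplicate ACK" means three duplicates after the first ACK for the same data (four identical ACKs in total); the class name and return convention are hypothetical.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals when fast retransmit should fire."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # number of duplicates that triggers retransmit
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the seq # to retransmit immediately, or None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return ack           # resend the unACKed segment starting at this byte
            return None
        # a new ACK value resets the duplicate count
        self.last_ack = ack
        self.dup_count = 0
        return None
```

With repeated ACK=100 arrivals, the fourth one (the third duplicate) returns 100, the sequence number of the segment to resend without waiting for the timer.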
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers, from which the application process reads; the OS sits below the application]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: RcvBuffer is split into buffered data and free buffer space (rwnd); TCP segment payloads arrive from below, data leaves to application process]
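The receive-window bookkeeping can be sketched with two helper functions. The variable names `last_byte_rcvd` and `last_byte_read` do not appear on the slide; they are assumptions borrowed from the common textbook formulation rwnd = RcvBuffer - (LastByteRcvd - LastByteRead).

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises in the rwnd header field."""
    buffered = last_byte_rcvd - last_byte_read  # data the application has not read yet
    return rcv_buffer - buffered


def sender_may_send(nbytes, last_byte_sent, last_byte_acked, rwnd):
    """True if sending nbytes more keeps in-flight data within the receiver's rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

For a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may send another 1000, but one with 1500 in flight may not.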
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: client and server each hold, between application and network:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enter SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enter ESTAB
4. server: received ACK(y) indicates client is live; enter ESTAB
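The sequence and acknowledgement numbers in the three handshake segments can be generated mechanically from the two initial sequence numbers. This is a sketch: the `Segment` dataclass and the function name are hypothetical, and the third segment's Seq=x+1 is standard TCP behaviour rather than something the slide shows.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    acknum: int = 0


def three_way_handshake(x, y):
    """Return [SYN, SYNACK, ACK] for client initial seq # x and server initial seq # y."""
    syn = Segment(syn=True, seq=x)
    # the server acknowledges the SYN: the SYN consumes one sequence number
    synack = Segment(syn=True, ack=True, seq=y, acknum=syn.seq + 1)
    # the client acknowledges the SYNACK (Seq=x+1 is an assumption, see lead-in)
    ack = Segment(ack=True, seq=x + 1, acknum=synack.seq + 1)
    return [syn, synack, ack]
```

For x = 100 and y = 300 the exchange is SYN(Seq=100), SYNACK(Seq=300, ACKnum=101), ACK(ACKnum=301).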
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
2. server: send ACKbit=1, ACKnum=x+1; enter CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
4. client: send ACKbit=1, ACKnum=y+1; enter TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput: λ_out]
[plots: λ_out vs λ_in, levelling off at R/2; delay vs λ_in, rising steeply as λ_in nears R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[figure: Host A sends into finite shared output link buffers towards Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[figure: Host A keeps a copy and sends into finite shared output link buffers with free buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
[plot: λ_out vs λ_in, both axes up to R/2]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[figure: Host A keeps a copy; packet dropped when there is no buffer space, resent when buffer space is free; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
[plot: λ_out vs λ'_in, both axes up to R/2]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[figure: Host A times out and sends a second copy while there is free buffer space; λ_in, λ'_in, λ_out as before]
[plot: λ_out vs λ'_in, both axes up to R/2]
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C, D connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 3
[plot: λ_out vs λ'_in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight"), spanning cwnd | last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A / Host B timeline of segments sent per RTT; axis: time]
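A short sketch shows why incrementing cwnd by one MSS per ACK doubles it every RTT. It assumes an idealised connection (no loss, one ACK per full-sized segment, all ACKs for a window arriving within the RTT); the function name is hypothetical.

```python
def slow_start(mss, rtts):
    """Return cwnd in bytes at the start of each RTT, beginning at 1 MSS."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss       # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss          # increment cwnd by 1 MSS for every ACK received
        history.append(cwnd)
    return history
```

With MSS = 1000 bytes, four RTTs take cwnd through 1000, 2000, 4000, 8000, 16000 bytes.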
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
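The two reactions to loss can be put side by side in a sketch. It follows the slide-level description only (cwnd halved on triple duplicate ACK for Reno, reset to 1 MSS otherwise) and leaves out the fast-recovery window inflation that real Reno implementations add; the function name and the event strings are assumptions.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) in bytes after a loss event.

    event is "timeout" or "3dupack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd // 2                  # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh              # restart from 1 MSS (slow start)
    return ssthresh, ssthresh             # Reno, 3 dup ACKs: cut cwnd in half
```

With cwnd = 16000 bytes and MSS = 1000, a timeout gives (1000, 8000), while a triple duplicate ACK gives (8000, 8000) under Reno and (1000, 8000) under Tahoe.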
Summary: TCP Congestion Control
[figure: congestion control FSM; only these transition labels are recoverable]
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start -> congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[figure: AIMD saw tooth behavior: probing for bandwidth; y-axis: cwnd, TCP sender congestion window size; x-axis: time; additively increase window size ... until loss occurs (then cut window in half)]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = $\frac{3}{4}\frac{W}{RTT}$ bytes/sec
[figure: saw tooth window oscillating between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = $\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
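The two numbers in the long-fat-pipe example can be checked with a few lines of arithmetic. The window is throughput times RTT divided by segment size, and the loss rate comes from solving throughput = 1.22·MSS/(RTT·√L) for L. The function names are hypothetical, and MSS is taken as the full 1500-byte segment, as the example does.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over a path with the given RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Loss probability L that yields rate_bps under throughput = 1.22*MSS/(RTT*sqrt(L))."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2
```

For 10 Gbps, 100 ms, and 1500-byte segments this gives about 83,333 segments in flight and a loss rate of roughly 2.1e-10, matching the rounded figure of 2·10^-10.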
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: throughput of one connection plotted against Connection 1 throughput, each axis up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
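The convergence argument can be checked numerically. In the sketch below, two flows each add one unit per round while the link has room and both halve when their combined rate exceeds the capacity. Synchronised losses, equal RTTs, and the function name are simplifying assumptions made for illustration.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease by a factor of 2, which halves their difference
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both increase additively, difference unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting far from the fair share, for example rates 1 and 80 on a link of capacity 100, the gap between the flows halves at every loss event, so after many rounds the two rates are nearly equal.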
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Principles of reliable data transfer
important in application, transport, link layers
  top-10 list of important networking topics!
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
send side:
  rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
  udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
  rdt_rcv(): called when packet arrives on rcv-side of channel
  deliver_data(): called by rdt to deliver data to upper layer
Reliable data transfer: getting started
we'll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow on both directions!
  use finite state machines (FSM) to specify sender, receiver
[figure: FSM notation: state 1 -> state 2, arrow labelled event / actions]
  event: event causing state transition
  actions: actions taken on state transition
  state: when in this "state", next state uniquely determined by next event
rdt1.0: reliable transfer over a reliable channel
underlying channel perfectly reliable
  no bit errors
  no loss of packets
separate FSMs for sender, receiver:
  sender sends data into underlying channel
  receiver reads data from underlying channel
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  (How do humans recover from "errors" during conversation?)
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
sender:
  state "Wait for call from above":
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): go to "Wait for call from above"
receiver:
  state "Wait for call from below":
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
event at receiver -> TCP receiver action
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap
  -> immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
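A duplicate-ACK counter is enough to illustrate the trigger. The sketch below is a hypothetical helper, not code from any TCP stack; it assumes the first ACK for a given number is the original and that the third repeat of it fires the retransmission.

```python
class DupAckDetector:
    """Counts repeated ACK numbers and flags the triple duplicate."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num: int) -> bool:
        """Return True when this ACK should trigger a fast retransmit."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        self.last_ack = ack_num   # new ACK number: reset the counter
        self.dup_count = 0
        return False


detector = DupAckDetector()
results = [detector.on_ack(100) for _ in range(5)]
```

Returning True only on exactly the third duplicate keeps later duplicates from triggering the same retransmission again.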
[Figure: TCP fast retransmit. Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); the second segment is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data. Caption: fast retransmit after sender receipt of triple duplicate ACK]
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments arrive from sender, pass through IP code and TCP code in the OS into the TCP socket receiver buffers, and are read by the application process]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); buffered data leaves to the application process]
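The bookkeeping on both sides can be sketched as follows. This is an illustrative model under stated assumptions: the names `ReceiveBuffer` and `sendable` are hypothetical, and rwnd is computed as RcvBuffer minus the bytes received but not yet read, which matches the figure's "free buffer space" but is not spelled out as a formula on the slide.

```python
class ReceiveBuffer:
    """Receiver side: tracks buffered bytes and advertises the free space as rwnd."""

    def __init__(self, rcv_buffer: int = 4096):
        self.rcv_buffer = rcv_buffer
        self.buffered = 0  # received but not yet read by the application

    def rwnd(self) -> int:
        return self.rcv_buffer - self.buffered

    def on_segment(self, nbytes: int) -> None:
        if nbytes > self.rwnd():
            raise OverflowError("sender exceeded the advertised window")
        self.buffered += nbytes

    def app_read(self, nbytes: int) -> int:
        n = min(nbytes, self.buffered)
        self.buffered -= n
        return n


def sendable(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Sender side: bytes that may still be put in flight without exceeding rwnd."""
    return max(0, rwnd - (last_byte_sent - last_byte_acked))


rb = ReceiveBuffer()
rb.on_segment(1000)          # payload arrives, free space shrinks
window_after_arrival = rb.rwnd()
rb.app_read(400)             # application drains part of the buffer
window_after_read = rb.rwnd()
```

As long as the sender keeps its in-flight data at or below the most recent rwnd, `on_segment` never raises, which is the overflow guarantee stated above.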
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each show an application above the network, with connection state: ESTAB and connection variables: seq # client-to-server and server-to-client; rcvBuffer size at server, client]
Socket clientSocket = new Socket("hostname", "port number");
Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enter SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enter ESTAB
4. server: received ACK(y) indicates client is live; enter ESTAB
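The numbering in the three segments can be traced with a tiny Python function. It is only a sketch of the header fields involved: the function name and the dictionary keys are hypothetical, and no real sockets are opened.

```python
def three_way_handshake(x: int, y: int):
    """Return the three handshake segments as (sender, header-fields) pairs."""
    syn = {"SYN": 1, "seq": x}
    # The SYNACK acknowledges the client's SYN by asking for byte x+1.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    # The final ACK acknowledges the server's SYN by asking for byte y+1.
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [("client", syn), ("server", synack), ("client", ack)]


segments = three_way_handshake(x=1000, y=5000)
for sender, fields in segments:
    print(sender, fields)
```

Each side learns the other's initial sequence number and confirms it by acknowledging that number plus one.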
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: SYN(x) / SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); / SYN(seq=x)
  SYN sent -> ESTAB: SYNACK(seq=y,ACKnum=x+1) / ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
2. server: send ACKbit=1, ACKnum=x+1; enter CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
4. client: send ACKbit=1, ACKnum=y+1; enter TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$, levelling off at R/2; delay versus $\lambda_{in}$, rising sharply near R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) into a router with finite shared output link buffers; Host B receives $\lambda_{out}$]
idealization: perfect knowledge
  sender sends only when router buffers available
  [Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
  [Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered
  [Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D connected over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$ at the receiving host. Plot: $\lambda_{out}$ versus $\lambda'_{in}$, axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  $\mathrm{rate} \approx \mathrm{cwnd}/\mathrm{RTT}$ bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight"), and last byte sent; cwnd spans the in-flight region]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B timeline over successive RTTs; vertical axis: time]
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
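The window rules above can be sketched as two pure functions. This is an illustration that follows the slides rather than a full Reno implementation: the function names and the MSS value are assumptions, Reno's reaction to duplicate ACKs is modelled as a plain halving (fast-recovery window inflation is left out), and slow start is taken to end once cwnd reaches ssthresh.

```python
MSS = 1460  # bytes; an assumed segment size, not taken from the slides


def on_new_ack(cwnd: float, ssthresh: float) -> float:
    """Grow cwnd for one new ACK: exponential below ssthresh, linear above."""
    if cwnd < ssthresh:
        return cwnd + MSS                 # slow start: +1 MSS per ACK
    return cwnd + MSS * (MSS / cwnd)      # congestion avoidance: about +1 MSS per RTT


def on_loss(cwnd: float, event: str, variant: str = "reno"):
    """Return (new_cwnd, new_ssthresh) after a "timeout" or "3dupacks" event."""
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd / 2                   # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return float(MSS), ssthresh       # restart from 1 MSS
    return ssthresh, ssthresh             # Reno on 3 dup ACKs: cut cwnd in half


reno = on_loss(16 * MSS, "3dupacks", "reno")
tahoe = on_loss(16 * MSS, "3dupacks", "tahoe")
```

The only difference between the two variants in this sketch is where cwnd restarts after three duplicate ACKs; both set ssthresh to the same value.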
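Summary: TCP Congestion Control
[Figure: FSM of TCP congestion control; only the following transitions are recoverable]
slow start, new ACK:
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK:
  cwnd = cwnd + MSS * (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
duplicate ACK (after a triple duplicate ACK):
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed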
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. Vertical axis: cwnd, TCP sender congestion window size; horizontal axis: time. The window increases additively until loss occurs, then is cut in half]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{\mathrm{RTT}}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W]
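The factor 3/4 follows from the saw tooth shape. As a short worked step, assuming the window climbs linearly from W/2 to W between losses:

```latex
\[
\overline{\text{window}} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} \;=\; \frac{\overline{\text{window}}}{\mathrm{RTT}}
\;=\; \frac{3}{4}\,\frac{W}{\mathrm{RTT}} \ \text{bytes/sec}.
\]
```

The average of a linear ramp is the midpoint of its endpoints, which is why no integration is needed.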
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
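Both numbers in the example can be reproduced with a few lines of arithmetic. The function names below are hypothetical; the only inputs are the figures on the slide and the throughput formula solved for L.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to fill the pipe: throughput * RTT / segment size."""
    return throughput_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS converted to bits)."""
    return (1.22 * 8 * mss_bytes / (rtt_s * throughput_bps)) ** 2


W = window_segments(10e9, 0.100, 1500)
L = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The loss rate comes out slightly above 2e-10, which the slide rounds to one significant digit.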
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other, with Connection 1 throughput on the horizontal axis from 0 to R and the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
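The convergence argument can be checked with a small simulation. This is a sketch under simplifying assumptions that are not in the slides: both sessions have the same RTT, both detect loss in the same round whenever their combined rate exceeds the link capacity, and rates are treated as continuous numbers.

```python
def simulate_aimd(x1: float, x2: float, capacity: float,
                  rounds: int = 1000, step: float = 1.0):
    """Two AIMD flows sharing one bottleneck; returns their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves both rates,
            # and therefore also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase moves both up by the same amount,
            # leaving the gap unchanged.
            x1, x2 = x1 + step, x2 + step
    return x1, x2


final1, final2 = simulate_aimd(80.0, 10.0, capacity=100.0)
```

Additive steps never change the difference between the two rates, while every loss halves it, so the flows drift toward the equal-share line regardless of where they start.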
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Principles of reliable data transfer
important in application, transport, link layers
  top-10 list of important networking topics!
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
send side:
  rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
  udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
  rdt_rcv(): called when packet arrives on rcv-side of channel
  deliver_data(): called by rdt to deliver data to upper layer
Reliable data transfer: getting started
we'll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow on both directions!
  use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. An arrow from state 1 to state 2 is labelled "event / actions": the event causing the state transition, and the actions taken on the state transition]
state: when in this "state", next state uniquely determined by next event
rdt1.0: reliable transfer over a reliable channel
underlying channel perfectly reliable
  no bit errors
  no loss of packets
separate FSMs for sender, receiver:
  sender sends data into underlying channel
  receiver reads data from underlying channel
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  (How do humans recover from "errors" during conversation?)
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
sender
  state "Wait for call from above":
    rdt_send(data) / sndpkt = make_pkt(data, checksum); udt_send(sndpkt)  -> "Wait for ACK or NAK"
  state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt) / udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt)  -> "Wait for call from above"
receiver
  rdt_rcv(rcvpkt) && corrupt(rcvpkt) / udt_send(NAK)
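The ACK/NAK loop can be sketched as a stop-and-wait simulation in Python. Several parts are assumptions made for illustration: the one-byte additive checksum is a toy stand-in for a real checksum, the channel damages a packet by flipping a single bit, and ACKs and NAKs themselves are assumed to arrive intact. The helper names mirror the FSM labels, but the bodies are hypothetical.

```python
import random


def checksum(data: bytes) -> int:
    return sum(data) % 256                 # toy checksum, for illustration only


def make_pkt(data: bytes):
    return (data, checksum(data))


def corrupt(pkt) -> bool:
    data, cks = pkt
    return checksum(data) != cks


def flip_bit(pkt, rng: random.Random):
    """Channel error: flip the lowest bit of one randomly chosen payload byte."""
    data, cks = pkt
    damaged = bytearray(data)
    damaged[rng.randrange(len(damaged))] ^= 0x01
    return (bytes(damaged), cks)


def rdt20_send(data: bytes, error_prob: float, rng: random.Random):
    """Send one non-empty packet; return (delivered data, number of transmissions)."""
    sndpkt = make_pkt(data)
    transmissions = 0
    while True:
        transmissions += 1                 # udt_send(sndpkt)
        rcvpkt = flip_bit(sndpkt, rng) if rng.random() < error_prob else sndpkt
        if corrupt(rcvpkt):
            continue                       # receiver sends NAK, sender resends
        return rcvpkt[0], transmissions    # receiver delivers data and sends ACK


rng = random.Random(0)
delivered, attempts = rdt20_send(b"hello", error_prob=0.5, rng=rng)
```

The sender stays in its wait state until an ACK arrives, so the data delivered is always the data sent, at the cost of a variable number of transmissions.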
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
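The sender's window logic can be sketched as a small class. It is an illustration only: the class and attribute names are hypothetical, sequence numbers are unbounded integers rather than k-bit values that wrap around, and transmissions are recorded in a list instead of being sent over a channel.

```python
class GbnSender:
    """Go-Back-N sender with a window of N unacked packets and one timer."""

    def __init__(self, window_size: int):
        self.N = window_size
        self.base = 0          # oldest unacked seq #
        self.next_seq = 0      # next seq # to assign
        self.buffer = {}       # seq # -> data, kept for retransmission
        self.sent_log = []     # every transmission, in order

    def send(self, data) -> bool:
        if self.next_seq >= self.base + self.N:
            return False       # window full: refuse data
        self.buffer[self.next_seq] = data
        self.sent_log.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n: int) -> None:
        # Cumulative ACK: everything up to and including n is acknowledged.
        if self.base <= n < self.next_seq:
            for seq in range(self.base, n + 1):
                self.buffer.pop(seq, None)
            self.base = n + 1

    def on_timeout(self) -> None:
        # Retransmit every packet that is still unacked.
        for seq in range(self.base, self.next_seq):
            self.sent_log.append(seq)


gbn = GbnSender(window_size=4)
accepted = [gbn.send(f"pkt{i}") for i in range(5)]
gbn.on_ack(1)
gbn.on_timeout()
```

A single lost packet makes the sender resend the whole outstanding window, which is the cost Go-Back-N pays for keeping the receiver simple.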
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
[Figure: Selective repeat: sender, receiver windows]
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
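The receiver rules can be sketched as a small class. It is a minimal illustration with hypothetical names; sequence numbers are unbounded integers rather than a wrapping sequence space, and packets below the current window are re-ACKed on the assumption, standard for selective repeat, that the original ACK may have been lost.

```python
class SrReceiver:
    """Selective-repeat receiver: buffers out-of-order packets, delivers in order."""

    def __init__(self, window_size: int):
        self.N = window_size
        self.rcv_base = 0      # next seq # expected for in-order delivery
        self.buffered = {}     # seq # -> data received but not yet delivered
        self.delivered = []    # data handed to the upper layer, in order

    def on_pkt(self, n: int, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= n <= self.rcv_base + self.N - 1:
            self.buffered[n] = data
            # Deliver any run of consecutive packets starting at rcv_base.
            while self.rcv_base in self.buffered:
                self.delivered.append(self.buffered.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n <= self.rcv_base - 1:
            return n           # already delivered: ACK again
        return None


rcv = SrReceiver(window_size=4)
ack_for_1 = rcv.on_pkt(1, "b")     # out of order: buffered, not delivered
ack_for_0 = rcv.on_pkt(0, "a")     # fills the gap: "a" and "b" are delivered
```

Buffering means the sender only has to resend the missing packet, unlike Go-Back-N where everything after the gap is retransmitted.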
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
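The fixed part of the header can be packed and parsed with Python's `struct` module. This is a sketch, assuming a 20-byte header with no options: the helper names are hypothetical, the flag bit positions follow the TCP specification, and the checksum is left at zero rather than computed.

```python
import struct

# Network byte order: two 16-bit ports, 32-bit seq and ack numbers,
# 16 bits of header length plus flags, then window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def pack_header(src_port, dst_port, seq, ack, flags, rwnd, checksum=0, urg_ptr=0):
    header_len_words = 5                       # 5 * 4 bytes = 20 bytes, no options
    offset_and_flags = (header_len_words << 12) | flags
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_and_flags, rwnd, checksum, urg_ptr)


def unpack_header(raw: bytes) -> dict:
    src, dst, seq, ack, off_flags, rwnd, checksum, urg = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # in bytes
        "flags": {name for name, bit in FLAGS.items() if off_flags & bit},
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg,
    }


raw = pack_header(12345, 80, seq=42, ack=79,
                  flags=FLAGS["ACK"] | FLAGS["PSH"], rwnd=4096)
header = unpack_header(raw)
```

The header length is stored in 32-bit words, which is why the parser multiplies the 4-bit field by four.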
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments?
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
TCP seq. numbers, ACKs
simple telnet scenario between Host A and Host B:
1. User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. Vertical axis: RTT (milliseconds), 100 to 350; horizontal axis: time (seconds); series: SampleRTT, EstimatedRTT]
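Unrolling the recursion shows why the influence of old samples decays exponentially. As a worked step, write $E_n$ for EstimatedRTT after the $n$-th sample $S_n$, with $E_0$ the initial estimate (these subscripted symbols are introduced here for illustration only):

```latex
\[
E_n = (1-\alpha)\,E_{n-1} + \alpha\,S_n
\quad\Longrightarrow\quad
E_n = \alpha \sum_{k=0}^{n-1} (1-\alpha)^{k}\, S_{n-k} \;+\; (1-\alpha)^{n} E_0 .
\]
```

With $\alpha = 0.125$, a sample that is $k$ measurements old carries weight $0.125 \cdot 0.875^{k}$, so each older sample counts for 12.5% less than the one after it.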
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake: FSM
client:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: receive SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)
server:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
2. server: send ACKbit=1, ACKnum=x+1; enter CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
4. client: send ACKbit=1, ACKnum=y+1; enter TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
5. server: on receiving the ACK, enter CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput: λ_out]
[plots: λ_out vs. λ_in and delay vs. λ_in, with λ_in up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; Host B receives λ_out]

idealization: perfect knowledge
  sender sends only when router buffers available
  [plot: λ_out vs. λ_in, both up to R/2]

Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
  [plot: λ_out vs. λ'_in, both axes up to R/2]
  when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)

Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  [plot: λ_out vs. λ'_in, both axes up to R/2]
  when sending at R/2, some packets are retransmissions including duplicated that are delivered!

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C, D; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out; finite shared output link buffers]
[plot: λ_out vs. λ'_in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; cwnd spans the in-flight bytes]
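As a rough illustration of the two relations on this slide, the sketch below checks the cwnd limit before sending and computes the approximate rate. The function names are hypothetical, and flow control (rwnd) is deliberately ignored here.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight_after = (last_byte_sent + nbytes) - last_byte_acked
    return in_flight_after <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Approximate sending rate in bytes/sec: one cwnd per RTT."""
    return cwnd_bytes / rtt_seconds
```

For example, with cwnd = 14600 bytes and RTT = 0.5 s, `approx_rate` gives 29200 bytes/sec.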
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A / Host B timeline, one round of segments per RTT]
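A minimal sketch of why "increment cwnd for every ACK" doubles cwnd every RTT. It assumes no loss, one ACK per segment (no delayed ACKs), and a full window sent each round; `slow_start_rounds` is a hypothetical helper.

```python
def slow_start_rounds(mss, rounds):
    """Return cwnd (in bytes) at the start of each RTT round during slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss          # one ACK per in-flight segment (assumption)
        for _ in range(acks):
            cwnd += mss             # per-ACK increment of 1 MSS
        history.append(cwnd)
    return history
```

With `mss=1460` and three rounds, cwnd goes 1460, 2920, 5840, 11680: each round doubles the previous value.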
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
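The two loss reactions can be compared in a short sketch. It follows the slide's simplified rule (Reno halves cwnd on 3 duplicate ACKs); actual Reno implementations also inflate the window during fast recovery, which is left out here. `on_loss` and its string arguments are hypothetical, and cwnd is counted in MSS units.

```python
def on_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, ssthresh) in MSS units after a loss event.

    event:   "timeout" or "3dupack"
    variant: "reno" or "tahoe"
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError("unknown loss event")
    ssthresh = max(cwnd // 2, 1)        # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh              # restart from 1 MSS (slow start)
    return ssthresh, ssthresh           # Reno on 3 dup ACKs: cut cwnd in half
```

Starting from cwnd = 16 MSS, a timeout gives (1, 8) for either variant, while three duplicate ACKs give (8, 8) under Reno and (1, 8) under Tahoe.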
Summary: TCP Congestion Control
[FSM figure, partially recovered]
slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
additively increase window size ... until loss occurs (then cut window in half)
[plot: cwnd (TCP sender congestion window size) vs. time; AIMD saw tooth behavior: probing for bandwidth]
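The saw tooth can be reproduced with a toy simulation. This is a sketch under a strong assumption: loss is detected exactly when cwnd reaches a fixed size `loss_at` (a hypothetical parameter standing in for the network's capacity); cwnd is in MSS units and each loop step is one RTT.

```python
def aimd(cwnd, rounds, loss_at):
    """Return the cwnd value (in MSS) used in each RTT round."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease after loss
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace
```

`aimd(4, 10, 8)` climbs 4, 5, 6, 7, 8, is halved back to 4, and repeats the same ramp, which is the saw tooth pattern in the plot.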
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{\text{RTT}}$ bytes/sec
[plot: window size oscillating between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
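The numbers on this slide can be checked directly. The sketch below evaluates the window size and inverts the [Mathis 1997] formula for L; the function names are hypothetical, and MSS is converted from bytes to bits so that throughput is in bits/sec.

```python
import math


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(throughput_bps, rtt_s, mss_bytes):
    """Loss probability L that yields the target throughput (formula inverted)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 1500-byte segments, 100 ms RTT and 10 Gbps, `window_segments` gives about 83,333 segments and `required_loss` gives roughly 2.1e-10, in line with the slide's rounded 2e-10.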
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[plot: throughput of the two connections, each axis up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
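The parallel-connection example is simple arithmetic once one assumes that every TCP connection on the bottleneck gets an equal share. A sketch (`app_share` is a hypothetical helper; the result is a fraction of R):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming all connections share the bottleneck equally."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

`app_share(9, 1)` is 1/10, and `app_share(9, 11)` is 11/20, slightly more than the R/2 the slide rounds to.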
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Principles of reliable data transfer
important in application, transport, link layers
  top-10 list of important networking topics!
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
[figure: send side, receive side]
rdt_send(): called from above (e.g., by app). Passed data to deliver to receiver upper layer
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on rcv-side of channel
deliver_data(): called by rdt to deliver data to upper layer
Reliable data transfer: getting started
we'll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow on both directions!
  use finite state machines (FSM) to specify sender, receiver
[FSM notation: state 1 -> state 2, arrow labeled "event / actions"
  event: event causing state transition
  actions: actions taken on state transition]
state: when in this "state", next state uniquely determined by next event
rdt1.0: reliable transfer over a reliable channel
underlying channel perfectly reliable
  no bit errors
  no loss of packets
separate FSMs for sender, receiver:
  sender sends data into underlying channel
  receiver reads data from underlying channel
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors?
  (How do humans recover from "errors" during conversation?)
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
sender:
  Wait for call from above:
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
  Wait for ACK or NAK:
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): go to "Wait for call from above"
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
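The sender rules above can be sketched as a small class. This is an illustration only: sequence numbers are unbounded integers (the k-bit wrap-around is ignored), the timer itself is not modeled, and `GBNSender` with its method names is hypothetical.

```python
class GBNSender:
    def __init__(self, window_size):
        self.N = window_size
        self.base = 0          # oldest unacked seq #
        self.nextseqnum = 0    # next seq # to use
        self.sent = []         # log of every transmission, in order

    def send(self):
        """Send a new packet if the window has room; return True if sent."""
        if self.nextseqnum < self.base + self.N:
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False           # window full: caller must wait

    def on_ack(self, n):
        """Cumulative ACK: everything up to and including n is acked."""
        if self.base <= n < self.nextseqnum:
            self.base = n + 1  # duplicate/old ACKs fall outside and are ignored

    def on_timeout(self):
        """Retransmit all unacked packets in the window."""
        resend = list(range(self.base, self.nextseqnum))
        self.sent.extend(resend)
        return resend
```

With N = 4, four sends succeed and the fifth is refused; after ACK(1) the window slides to base 2, and a timeout resends packets 2 and 3.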
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #

Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

[figure: Selective repeat: sender, receiver windows]
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed ...
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]
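A sketch of the receiver side, assuming unbounded sequence numbers. The action for a packet in [rcvbase-N, rcvbase-1] is not legible in the text above; the code re-ACKs such packets, which is the usual selective repeat behavior and is an assumption here. `SRReceiver` and its fields are hypothetical names.

```python
class SRReceiver:
    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}       # out-of-order packets waiting for delivery
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Process packet n; return the seq # to ACK, or None if ignored."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer[n] = data
            # deliver this and any buffered in-order packets, advance window
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n < self.rcvbase:
            return n           # already delivered: re-ACK (assumption)
        return None            # outside both ranges: ignore
```

If packet 1 arrives before packet 0, it is ACKed and buffered; when packet 0 arrives, both are delivered in order and rcvbase moves to 2.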
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment layout (each row 32 bits):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[figure: sender sequence number space, window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
1. User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
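The numbers in the telnet scenario follow from two rules: a reply's Seq is the ACK number just received, and its ACK is the received Seq plus the number of data bytes received. A minimal sketch (`reply_numbers` is a hypothetical helper, and no loss or reordering is assumed):

```python
def reply_numbers(seq_in, ack_in, data_len_in):
    """Given a received segment, return (Seq, ACK) for the reply segment."""
    seq_out = ack_in                  # continue our own byte stream
    ack_out = seq_in + data_len_in    # next byte expected from the other side
    return seq_out, ack_out
```

Host B receives (Seq=42, ACK=79, 1 byte) and replies with (79, 43); Host A receives (Seq=79, ACK=43, 1 byte) and replies with (43, 80).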
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[plot: RTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr; series: SampleRTT, EstimatedRTT]
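To make "influence of past sample decreases exponentially fast" concrete, the sketch below applies one EWMA update and computes the weight a sample carries k updates later. Both function names are hypothetical.

```python
def update_estimated_rtt(estimated, sample, alpha=0.125):
    """One EWMA step: EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT."""
    return (1 - alpha) * estimated + alpha * sample


def sample_weight(k, alpha=0.125):
    """Weight of a sample taken k updates ago in the current estimate."""
    return alpha * (1 - alpha) ** k
```

With EstimatedRTT = 100 ms and a new SampleRTT of 200 ms, the estimate moves only to 112.5 ms; the newest sample has weight 0.125, the one before it 0.109375, and so on, shrinking by a factor of 0.875 per step.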
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$ (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  EstimatedRTT: estimated RTT; 4*DevRTT: "safety margin"
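Putting the two estimators together gives a timeout calculation along these lines. The slide does not say in which order the two updates are applied; this sketch updates DevRTT first, using the old EstimatedRTT, which is an assumption. `update_rtt` is a hypothetical name.

```python
def update_rtt(estimated, dev, sample, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new SampleRTT."""
    # deviation first, measured against the previous estimate (assumed order)
    dev = (1 - beta) * dev + beta * abs(sample - estimated)
    estimated = (1 - alpha) * estimated + alpha * sample
    timeout = estimated + 4 * dev      # estimate plus safety margin
    return estimated, dev, timeout
```

Starting from EstimatedRTT = 100 ms and DevRTT = 10 ms, a SampleRTT of 200 ms gives EstimatedRTT = 112.5 ms, DevRTT = 32.5 ms and TimeoutInterval = 242.5 ms: one outlier widens the safety margin far more than it shifts the estimate.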
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
event at receiver -> TCP receiver action
arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  -> delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
arrival of in-order segment with expected seq #. One other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment higher-than-expect seq #. Gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap
  -> immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit (Host A, Host B)
1. Host A sends Seq=92, 8 bytes of data
2. Host A sends Seq=100, 20 bytes of data: lost (X)
3. Host B answers each arriving segment with ACK=100, so Host A sees repeated ACK=100
4. before the timeout expires, Host A resends Seq=100, 20 bytes of data
fast retransmit after sender receipt of triple duplicate ACK
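The trigger can be expressed as a duplicate-ACK counter. A minimal sketch, assuming cumulative ACK numbers are passed in as integers and that retransmission itself is handled elsewhere; `FastRetransmit` is a hypothetical name.

```python
class FastRetransmit:
    def __init__(self):
        self.last_ack = None   # most recent ACK number seen
        self.dup_count = 0     # duplicates of last_ack seen so far

    def on_ack(self, acknum):
        """Return the seq # to retransmit on the third duplicate ACK, else None."""
        if acknum == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                return acknum  # resend unacked segment with smallest seq #
        else:
            self.last_ack = acknum
            self.dup_count = 0
        return None
```

In the scenario above, the first ACK=100 is a normal ACK and the next three are duplicates, so the fourth ACK=100 makes `on_ack` return 100 and the sender resends the segment starting at byte 100.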
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process (application) above TCP socket receiver buffers, TCP code and IP code (OS); segments arrive from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
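A sketch of the bookkeeping behind rwnd. The slide only shows RcvBuffer split into buffered data and free space; the names `last_byte_rcvd` and `last_byte_read` used to measure the buffered data are assumptions introduced for illustration, as are the function names.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises: RcvBuffer minus buffered data."""
    buffered = last_byte_rcvd - last_byte_read
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, nbytes, rwnd_value):
    """True if nbytes more keeps unacked (in-flight) data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd_value
```

With RcvBuffer = 4096 bytes and 2000 bytes received but not yet read by the application, the receiver advertises rwnd = 2096; a sender with 1000 bytes in flight may then send at most 1096 more.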
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
  sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
  cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent. cwnd spans the in-flight bytes.]
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with one RTT marked against time.]
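
The doubling described in the slow start slide can be sketched in a few lines. This is only an illustration under simplifying assumptions: no loss, every segment ACKed within the same RTT, and an arbitrary `MSS` of 1460 bytes. The function name `slow_start_cwnd` is hypothetical.

```python
MSS = 1460  # bytes; an assumed, illustrative segment size

def slow_start_cwnd(rtts, mss=MSS):
    """Return cwnd (in bytes) at the start of each RTT, beginning at 1 MSS."""
    cwnd = mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss      # one ACK per segment sent during this RTT
        cwnd += acks * mss      # +1 MSS per ACK, which doubles cwnd each RTT
    return history

print([c // MSS for c in slow_start_cwnd(5)])  # cwnd in MSS units
```

Incrementing by one MSS per ACK, rather than doubling once per RTT, is what makes the growth self-clocked by the returning ACKs.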
TCP: switching from slow start to congestion avoidance (CA)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
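
The loss reactions listed for Tahoe and Reno can be written as one small function. This is a sketch, not a full implementation: the name `on_loss`, the string event labels, and the 1 MSS floor on ssthresh are assumptions, and Reno's temporary window inflation during fast recovery is deliberately left out to match the slide's "cut in half" description.

```python
def on_loss(cwnd, mss, event, flavor="reno"):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event  : "timeout" or "3dupacks"
    flavor : "reno" or "tahoe"
    """
    # ssthresh becomes half of cwnd just before the loss (floor of 1 MSS is assumed)
    ssthresh = max(cwnd // 2, mss)
    if event == "timeout" or flavor == "tahoe":
        return mss, ssthresh        # restart from 1 MSS, then slow start
    return ssthresh, ssthresh       # Reno on 3 dup ACKs: cwnd cut in half

print(on_loss(16000, 1000, "timeout"))
print(on_loss(16000, 1000, "3dupacks"))
print(on_loss(16000, 1000, "3dupacks", flavor="tahoe"))
```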
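Summary: TCP Congestion Control
[Figure: FSM with slow start, congestion avoidance, and fast recovery states. Recoverable transition labels:
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]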
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half.]
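
The saw tooth can be reproduced with a toy loop. The sketch below assumes cwnd is counted in MSS units and that a loss is detected in any RTT where cwnd exceeds a fixed `capacity`; both the function name `aimd` and that loss model are illustrative assumptions, not part of the slides.

```python
def aimd(rtts, capacity, cwnd=1):
    """Trace cwnd (in MSS units) over a number of RTTs under AIMD."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:              # assumed loss signal
            cwnd = max(cwnd // 2, 1)     # multiplicative decrease
        else:
            cwnd += 1                    # additive increase: +1 MSS per RTT
    return trace

print(aimd(12, capacity=8, cwnd=4))
```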
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  $\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W.]
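
One way to see where the 3/4 factor comes from, assuming the idealized saw tooth in which the window ramps linearly from W/2 up to W between losses, is to take the midpoint of that ramp (the symbol $\overline{W}$ for the average window is introduced here for illustration):

```latex
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{W}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
```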
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    $\text{TCP throughput} = \dfrac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
  to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
  new versions of TCP for high-speed
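
The two numbers on this slide can be checked directly from the formula. The helper names below are hypothetical; the only inputs are the slide's own figures (1500-byte segments, 100 ms RTT, 10 Gbps target), with MSS converted to bits.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput in bits/sec from the 1.22 * MSS / (RTT * sqrt(L)) formula."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))

def required_loss(mss_bytes, rtt_s, target_bps):
    """Loss probability L needed to sustain target_bps (formula solved for L)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

def window_segments(mss_bytes, rtt_s, target_bps):
    """In-flight segments needed to fill the pipe: rate * RTT / segment size."""
    return target_bps * rtt_s / (mss_bytes * 8)

L = required_loss(1500, 0.1, 10e9)
W = window_segments(1500, 0.1, 10e9)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The computed loss rate is close to 2.1e-10, which the slide rounds to 2·10^-10.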
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other, with the Connection 1 throughput axis running up to R and an "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
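
The convergence argument can be tried numerically. The sketch below assumes both sessions see every loss at the same time and have equal RTTs, which is the idealization behind the figure; `two_flow_aimd` and its parameters are illustrative names.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:            # link overloaded: both flows see loss
            x1, x2 = x1 / 2, x2 / 2       # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1       # additive increase keeps the gap unchanged
    return x1, x2

x1, x2 = two_flow_aimd(80, 10, capacity=100, rounds=500)
print(round(x1, 3), round(x2, 3))
```

Because the gap between the flows is untouched by additive increase and halved by every loss, the rates drift toward the equal-share line regardless of the starting point.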
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
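
The parallel-connection arithmetic is easy to verify, assuming every connection on the link gets an equal share. `app_share` is a hypothetical helper written for this check.

```python
from fractions import Fraction

def app_share(existing, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)

print(app_share(9, 1))    # one new connection among ten
print(app_share(9, 11))   # eleven new connections among twenty
```

With 11 connections the exact share is 11/20 of R, slightly above the R/2 quoted on the slide, which is a rounded figure.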
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Reliable data transfer: getting started
  rdt_send(): called from above (e.g., by app). Passed data to deliver to receiver upper layer
  udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
  rdt_rcv(): called when packet arrives on rcv-side of channel
  deliver_data(): called by rdt to deliver data to upper layer
[Figure: send side and receive side of the reliable data transfer protocol.]
Reliable data transfer: getting started
we'll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow on both directions!
  use finite state machines (FSM) to specify sender, receiver
    state: when in this "state", next state uniquely determined by next event
[Figure: FSM notation: an arrow from state 1 to state 2, labeled with the event causing state transition over the actions taken on state transition (event / actions).]
rdt1.0: reliable transfer over a reliable channel
  underlying channel perfectly reliable
    no bit errors
    no loss of packets
  separate FSMs for sender, receiver:
    sender sends data into underlying channel
    receiver reads data from underlying channel
rdt2.0: channel with bit errors
  underlying channel may flip bits in packet
    checksum to detect bit errors
  the question: how to recover from errors?
    (How do humans recover from "errors" during conversation?)
    acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
    negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
    sender retransmits pkt on receipt of NAK
  new mechanisms in rdt2.0 (beyond rdt1.0):
    error detection
    feedback: control msgs (ACK, NAK) from receiver to sender
rdt2.0: FSM specification
[Figure: sender and receiver FSMs. Recoverable transition labels:
  sender, state "Wait for call from above":
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  sender, state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): return to waiting for call from above
  receiver:
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
  The remaining receiver transition is cut off in this transcript.]
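
The ACK/NAK loop of rdt2.0 can be exercised with a toy simulation. Everything below is an assumption made for illustration: the one-byte additive checksum, the `flip_bit` channel model, and the driver `rdt20_transfer`, which damages the first transmission of selected packets. As in rdt2.0 itself, ACKs and NAKs are assumed to arrive uncorrupted.

```python
def checksum(data: bytes) -> int:
    return sum(data) % 256                 # toy checksum, for illustration only

def make_pkt(data: bytes):
    return (data, checksum(data))

def corrupt(pkt) -> bool:
    data, cks = pkt
    return checksum(data) != cks

def flip_bit(pkt):
    """Model a channel error by flipping the low bit of the first data byte."""
    data, cks = pkt
    return (bytes([data[0] ^ 0x01]) + data[1:], cks)

def rdt20_transfer(messages, corrupt_first_try):
    """Stop-and-wait transfer; returns (delivered data, number of transmissions)."""
    delivered, transmissions = [], 0
    for i, data in enumerate(messages):
        sndpkt = make_pkt(data)
        attempt = 0
        while True:
            damaged = i in corrupt_first_try and attempt == 0
            rcvpkt = flip_bit(sndpkt) if damaged else sndpkt
            transmissions += 1
            attempt += 1
            if corrupt(rcvpkt):
                continue                   # receiver sends NAK, sender resends sndpkt
            delivered.append(rcvpkt[0])    # receiver delivers data, sends ACK
            break
    return delivered, transmissions

msgs = [b"a", b"bc", b"d"]
print(rdt20_transfer(msgs, corrupt_first_try={1}))
```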
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
  k-bit seq # in pkt header
  "window" of up to N, consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
    may receive duplicate ACKs (see receiver)
  timer for oldest in-flight pkt
  timeout(n): retransmit packet n and all higher seq # pkts in window
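
The sender rules above map onto a small class. This is a sketch under simplifying assumptions: sequence numbers grow without the k-bit wraparound, the timer itself is not modeled (the caller invokes `timeout()`), and the class and attribute names are hypothetical.

```python
class GBNSender:
    def __init__(self, window):
        self.N = window
        self.base = 0        # oldest unacked seq #
        self.nextseq = 0     # next seq # to assign
        self.sent = []       # log of every transmitted seq #

    def send(self):
        """Send one new packet; return False if the window is full."""
        if self.nextseq >= self.base + self.N:
            return False
        self.sent.append(self.nextseq)
        self.nextseq += 1
        return True

    def ack(self, n):
        """Cumulative ACK: all packets up to and including n are acknowledged."""
        if n >= self.base:
            self.base = n + 1

    def timeout(self):
        """Retransmit every unacked packet currently in the window."""
        resend = list(range(self.base, self.nextseq))
        self.sent.extend(resend)
        return resend

s = GBNSender(window=4)
print([s.send() for _ in range(5)])
s.ack(1)
print(s.timeout())
```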
Go-Back-N: receiver
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer): no receiver buffering!
    re-ACK pkt with highest in-order seq #
Selective repeat
  receiver individually acknowledges all correctly received pkts
    buffers pkts, as needed, for eventual in-order delivery to upper layer
  sender only resends pkts for which ACK not received
    sender timer for each unACKed pkt
  sender window
    N consecutive seq #'s
    limits seq #s of sent, unACKed pkts
[Figure: Selective repeat: sender, receiver windows.]
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N,rcvbase-1]
    ACK(n)
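
The receiver side of selective repeat can be sketched as follows. The class name, the dictionary used as a buffer, and the choice to return the ACK number (or `None` when a packet is ignored) are illustrative assumptions; sequence-number wraparound is not modeled.

```python
class SRReceiver:
    def __init__(self, window):
        self.N = window
        self.rcvbase = 0
        self.buffer = {}        # out-of-order packets, keyed by seq #
        self.delivered = []     # data handed to the upper layer, in order

    def receive(self, n, data):
        """Handle packet n; return the ACK number to send, or None if ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            while self.rcvbase in self.buffer:     # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n                               # already delivered: re-ACK
        return None

r = SRReceiver(window=4)
print(r.receive(1, "b"), r.delivered)
print(r.receive(0, "a"), r.delivered)
```

Re-ACKing packets below the window matters because the sender may not have seen the original ACK and would otherwise keep retransmitting.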
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
TCP segment structure
header rows, each 32 bits wide:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
annotations:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
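
The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch: `parse_tcp_header` and the dictionary keys are hypothetical names, options are skipped rather than decoded, and the checksum is read but not verified.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed TCP header fields from a raw segment."""
    (src, dst, seq, ack, off_flags, rwnd, cksum, urg) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4     # head len field counts 32-bit words
    flags = off_flags & 0x3F               # low six bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "rwnd": rwnd, "checksum": cksum, "urg_ptr": urg,
        "data": segment[header_len:],
    }

# Build a sample segment: header length 5 words, PSH+ACK set, one data byte.
sample = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"C"
print(parse_tcp_header(sample))
```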
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C'
    A -> B: Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'
    B -> A: Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'
    A -> B: Seq=43, ACK=80
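
The numbers in the telnet scenario follow from one rule: the reply's sequence number is the peer's ACK number, and the reply's ACK number is the peer's sequence number plus the bytes just received. A minimal sketch, with a hypothetical `reply` helper and segments modeled as dictionaries:

```python
def reply(seg, data=b""):
    """Build the segment the receiver of seg sends back."""
    return {
        "seq": seg["ack"],                      # continue where the peer expects us
        "ack": seg["seq"] + len(seg["data"]),   # next byte expected from the peer
        "data": data,
    }

typed = {"seq": 42, "ack": 79, "data": b"C"}    # A -> B: user types 'C'
echo = reply(typed, b"C")                       # B -> A: ACK plus echoed 'C'
final = reply(echo)                             # A -> B: ACK of the echo
print(echo, final)
```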
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT in milliseconds (axis from 100 to 350) versus time in seconds (1 to 106).]
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
    (typically, $\beta = 0.25$)
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
    (estimated RTT plus "safety margin")
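
The estimator and timeout formulas combine into a short class. Several details are assumptions rather than anything stated on the slides: the class name `RttEstimator`, seeding EstimatedRTT from the first sample with DevRTT at zero, and computing the deviation against the estimate from before the current sample.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated = first_sample   # assumed initialisation
        self.dev = 0.0                  # assumed initialisation

    def update(self, sample):
        """Fold in a new SampleRTT and return the new TimeoutInterval."""
        # Deviation uses the estimate from before this sample (assumed ordering).
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.estimated)
        self.estimated = (1 - self.alpha) * self.estimated + self.alpha * sample
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated + 4 * self.dev

est = RttEstimator(first_sample=100.0)      # milliseconds
print(est.update(200.0), est.estimated, est.dev)
```

A single 200 ms outlier moves the estimate only to 112.5 ms but widens the timeout substantially through DevRTT, which is the intended "safety margin" behaviour.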
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
      start timer
TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
    TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK
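
Counting duplicate ACKs takes only a few lines of sender state. The sketch below is illustrative: `DupAckDetector` and its fields are hypothetical names, and it only reports which sequence number to resend rather than performing any transmission.

```python
class DupAckDetector:
    def __init__(self, send_base):
        self.send_base = send_base    # seq # of oldest unacked byte
        self.dup_acks = 0

    def on_ack(self, ack_num):
        """Return the seq # to fast-retransmit, or None if nothing to resend."""
        if ack_num > self.send_base:          # ACK covers new data
            self.send_base = ack_num
            self.dup_acks = 0
            return None
        self.dup_acks += 1                    # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return self.send_base             # unacked segment with smallest seq #
        return None

d = DupAckDetector(send_base=92)
print([d.on_ack(100) for _ in range(4)])
```

With the figure's numbers, the first ACK=100 acknowledges the 8 bytes starting at 92, and the third duplicate after it triggers the resend of the segment starting at byte 100.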
TCP flow control
  application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads.]
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads arrive into RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process.]
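
The advertised window and the sender-side check can be sketched as two functions. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` follow the usual textbook bookkeeping and are assumptions here, since the slides only name RcvBuffer and rwnd.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises."""
    buffered = last_byte_rcvd - last_byte_read    # data not yet read by the app
    return rcv_buffer - buffered

def can_send(last_byte_sent, last_byte_acked, nbytes, rwnd_value):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd_value

window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, can_send(5000, 4000, 1000, window), can_send(5000, 4000, 1200, window))
```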
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each hold, between application and network:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
  1. client: choose init seq num, x; send TCP SYN msg
       SYNbit=1, Seq=x
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
       SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
       ACKbit=1, ACKnum=y+1
  4. server: received ACK(y) indicates client is live
TCP 3-way handshake: FSM
[Figure: FSM.
  server: closed -> listen on Socket connectionSocket = welcomeSocket.accept();
          listen -> SYN rcvd on SYN(x): send SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
          SYN rcvd -> ESTAB on ACK(ACKnum=y+1)
  client: closed -> SYN sent on Socket clientSocket = newSocket("hostname","port number"): send SYN(seq=x)
          SYN sent -> ESTAB on SYNACK(seq=y,ACKnum=x+1): send ACK(ACKnum=y+1)]
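
The handshake FSM can be expressed as two transition tables and driven by passing each side's output to the other. The table layout, the event names `connect` and `accept`, and the `handshake` driver are assumptions made for this sketch; sequence numbers and real sockets are left out.

```python
# (state, event) -> (next state, message to send or None)
CLIENT = {
    ("CLOSED", "connect"): ("SYN_SENT", "SYN"),
    ("SYN_SENT", "SYNACK"): ("ESTAB", "ACK"),
}
SERVER = {
    ("CLOSED", "accept"): ("LISTEN", None),
    ("LISTEN", "SYN"): ("SYN_RCVD", "SYNACK"),
    ("SYN_RCVD", "ACK"): ("ESTAB", None),
}

def step(table, state, event):
    return table[(state, event)]

def handshake():
    """Run the three-way handshake and return the final (client, server) states."""
    c, s = "CLOSED", "CLOSED"
    s, _ = step(SERVER, s, "accept")       # server starts listening
    c, msg = step(CLIENT, c, "connect")    # client sends SYN
    s, msg = step(SERVER, s, msg)          # server replies SYNACK
    c, msg = step(CLIENT, c, msg)          # client replies ACK
    s, _ = step(SERVER, s, msg)            # server sees ACK
    return c, s

print(handshake())
```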
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
  1. client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
  2. server: replies ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
  3. server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
  4. client: replies ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  maximum per-connection throughput: $R/2$
  large delays as arrival rate, $\lambda_{in}$, approaches capacity
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$, both up to $R/2$; delay versus $\lambda_{in}$, up to $R/2$.]
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
    transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends $\lambda_{in}$: original data and $\lambda'_{in}$: original data, plus retransmitted data into finite shared output link buffers; Host B receives $\lambda_{out}$.]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes up to $R/2$. Host A keeps a copy and sends $\lambda_{in}$: original data and $\lambda'_{in}$: original data, plus retransmitted data into finite shared output link buffers with free buffer space; Host B receives $\lambda_{out}$.]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy and sends $\lambda_{in}$: original data and $\lambda'_{in}$: original data, plus retransmitted data; the router has no buffer space; Host B receives $\lambda_{out}$.]
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion controlApproaches towards congestion control
two broad approaches towards congestion controltwo broad approaches towards congestion control
end end congestion network assisted end-end congestion control
no explicit feedback
network-assisted congestion control
routers provide no explicit feedback from network
congestion inferred f d
routers provide feedback to end systems single bit indicating
(SNA from end-system observed loss delay
approach taken by
congestion (SNA DECbit TCPIP ECN ATM) approach taken by
TCP)
explicit rate for sender to send at
Transport Layer 3-92
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-93
TCP Congestion Control detailsg
TCP sending ratedsender sequence number space TCP sending rate
roughly send cwnd bytes wait RTT for
cwnd
bytes wait RTT for ACKS then send more bytes
last byteACKed sent not-
yet ACKed
last byte sent
sender limits transmission
y(ldquoin-flightrdquo)
rate ~~cwndRTT
bytessec
LastByteSent-LastByteAcked
lt cwnd
cwnd is dynamic function of perceived network
tiTransport Layer 3-94
congestion
TCP Slow Start TCP Slow Start
when connection begins Host A Host B
when connection begins increase rate exponentially until first l Tloss event initially cwnd = 1 MSS
d bl d RTT
RTT
double cwnd every RTT done by incrementing cwnd for every ACK yreceived
summary initial rate is slow but ramps up exponentially fast time
Transport Layer 3-95
TCP switching from slow start to CAQ when should the
exponential
TCP switching from slow start to CA
exponential increase switch to linear
A when cwnd gets to 12 of its value before timeoutbefore timeout
Implementationp e e tat o variable ssthresh on loss event ssthresh
is set to 12 of cwnd just before loss event
Transport Layer 3-96
TCP detecting reacting to lossTCP detecting reacting to loss
loss indicated by timeout loss indicated by timeout cwnd set to 1 MSS
i d h i ll ( i l ) window then grows exponentially (as in slow start) to threshold then grows linearly
l i di d b 3 d li ACK TCP RENO loss indicated by 3 duplicate ACKs TCP RENO dup ACKs indicate network capable of delivering
some segments cwnd is cut in half window then grows linearlyC ( 3 TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)
Transport Layer 3-97
Summary TCP Congestion Controly g
cwnd = cwnd + MSS (MSScwnd)new ACKduplicate ACK
NewACK
NewACK
cwnd gt ssthresh
cwnd cwnd + MSS (MSScwnd)dupACKcount = 0
transmit new segment(s) as allowedcwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
cwnd = cwnd + MSStransmit new segment(s) as allowed
p
TCP congestion control additive increase multiplicative decrease
approach sender increases transmission rate (window pp (size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every y y
RTT until loss detectedmultiplicative decrease cut cwnd in half after loss
zeadditively increase window size helliphellip until loss occurs (then cut window in half)
CP
send
er
n w
indo
w s
iz
AIMD saw toothbehavior probing
cwnd
Tco
nges
tionfor bandwidth
Transport Layer 3-99
time
TCP throughputTCP throughput avg TCP thruput as function of window size RTTg p ignore slow start assume always data to send
W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT
Reliable data transfer: getting started
we'll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow in both directions!
  use finite state machines (FSM) to specify sender, receiver
FSM notation:
  state: when in this "state", next state uniquely determined by next event
  each transition (state 1 -> state 2) is labelled event / actions:
    event causing state transition
    actions taken on state transition
Transport Layer 3-25
rdt1.0: reliable transfer over a reliable channel
  underlying channel perfectly reliable
    no bit errors
    no loss of packets
  separate FSMs for sender, receiver:
    sender sends data into underlying channel
    receiver reads data from underlying channel
rdt2.0: channel with bit errors
  underlying channel may flip bits in packet
    checksum to detect bit errors
  the question: how to recover from errors?
    (How do humans recover from "errors" during conversation?)
    acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
    negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
    sender retransmits pkt on receipt of NAK
  new mechanisms in rdt2.0 (beyond rdt1.0):
    error detection
    feedback: control msgs (ACK, NAK) from receiver to sender
Transport Layer 3-28
rdt2.0: FSM specification
sender:
  state "Wait for call from above"
    rdt_send(data):
      sndpkt = make_pkt(data, checksum)
      udt_send(sndpkt)
      -> "Wait for ACK or NAK"
  state "Wait for ACK or NAK"
    rdt_rcv(rcvpkt) && isNAK(rcvpkt):
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt):
      -> "Wait for call from above"
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt):
    udt_send(NAK)
  [figure truncated here]
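The stop-and-wait behaviour of the rdt2.0 FSM can be sketched as a small simulation. This is only an illustration under stated assumptions: the one-byte additive checksum, the `flip_bit` channel model, and the function name `rdt20_transfer` are hypothetical, and ACK/NAK messages are assumed to arrive uncorrupted (which is exactly the case rdt2.0 does not handle).

```python
import random

def checksum(data: bytes) -> int:
    # Toy 8-bit additive checksum (an assumption; rdt2.0 does not fix the algorithm).
    return sum(data) % 256

def make_pkt(data: bytes) -> bytes:
    # Packet layout assumed here: one checksum byte followed by the payload.
    return bytes([checksum(data)]) + data

def corrupt(pkt: bytes) -> bool:
    return len(pkt) < 1 or checksum(pkt[1:]) != pkt[0]

def flip_bit(pkt: bytes, rng: random.Random) -> bytes:
    # Channel model: flip exactly one randomly chosen bit.
    i = rng.randrange(len(pkt))
    bit = 1 << rng.randrange(8)
    return pkt[:i] + bytes([pkt[i] ^ bit]) + pkt[i + 1:]

def rdt20_transfer(messages, error_rate=0.3, seed=1):
    rng = random.Random(seed)
    delivered, transmissions = [], 0
    for data in messages:                  # sender: wait for call from above
        sndpkt = make_pkt(data)
        while True:                        # sender: wait for ACK or NAK
            transmissions += 1
            rcvpkt = flip_bit(sndpkt, rng) if rng.random() < error_rate else sndpkt
            if corrupt(rcvpkt):
                continue                   # receiver sends NAK, sender retransmits
            delivered.append(rcvpkt[1:])   # receiver delivers data, sends ACK
            break
    return delivered, transmissions
```

Every retransmission here is triggered by a NAK, so the number of transmissions is at least the number of messages and grows with the error rate.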
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N: sender
  k-bit seq # in pkt header
  "window" of up to N consecutive unack'ed pkts allowed
  ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
    may receive duplicate ACKs (see receiver)
  timer for oldest in-flight pkt
  timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-47
Go-Back-N: receiver
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer): no receiver buffering!
    re-ACK pkt with highest in-order seq #
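The receiver rules above fit in a few lines. The following is a minimal sketch, not the textbook's FSM: the class name `GBNReceiver` is hypothetical, sequence numbers are assumed to start at 0 and never wrap, and packets are assumed to arrive uncorrupted.

```python
class GBNReceiver:
    """Go-Back-N receiver: keeps only expectedseqnum, never buffers."""

    def __init__(self):
        self.expectedseqnum = 0   # assumption: numbering starts at 0, no wraparound
        self.delivered = []

    def receive(self, seq, data):
        """Handle one packet; return the ACK number sent (None if nothing is in order yet)."""
        if seq == self.expectedseqnum:
            self.delivered.append(data)   # in-order: deliver to upper layer
            self.expectedseqnum += 1
        # Out-of-order packets are discarded; either way we (re-)ACK the
        # highest in-order sequence number received so far.
        last_in_order = self.expectedseqnum - 1
        return last_in_order if last_in_order >= 0 else None
```

Feeding it packets 0, 1, 3, 4, 2 produces the ACKs 0, 1, 1, 1, 2: packets 3 and 4 are dropped and generate duplicate ACKs for 1, which is why the sender must go back and resend them.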
Selective repeat
  receiver individually acknowledges all correctly received pkts
    buffers pkts, as needed, for eventual in-order delivery to upper layer
  sender only resends pkts for which ACK not received
    sender timer for each unACKed pkt
  sender window
    N consecutive seq #'s
    limits seq #s of sent, unACKed pkts
Transport Layer 3-50
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
  otherwise:
    ignore
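The receiver side of these rules can be sketched as follows. The class name `SRReceiver` and the dictionary used as the reorder buffer are illustrative assumptions; sequence numbers are treated as unbounded integers, so the wraparound issues of a finite sequence space are not modelled.

```python
class SRReceiver:
    """Selective-repeat receiver with window [rcvbase, rcvbase+N-1]."""

    def __init__(self, N):
        self.N = N
        self.rcvbase = 0
        self.buffer = {}      # out-of-order packets awaiting delivery
        self.delivered = []

    def receive(self, n, data):
        """Handle packet n; return the ACK number sent, or None if ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # Deliver the in-order run starting at rcvbase and slide the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n          # already delivered: re-ACK so the sender can advance
        return None           # otherwise: ignore
```

The second branch matters: if an earlier ACK was lost, the sender will retransmit a packet the receiver has already delivered, and only a repeated ACK lets the sender's window move.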
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
fields (each row 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
Transport Layer 3-57
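To make the field layout concrete, here is a small sketch that unpacks the fixed 20-byte part of a TCP header with Python's standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative choices; options and payload are not interpreted.

```python
import struct

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("need at least 20 bytes for a TCP header")
    (src, dst, seq, ack, off_res, flags,
     rwnd, csum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (off_res >> 4) * 4          # head len is counted in 32-bit words
    return {
        "source_port": src,
        "dest_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if flags & bit},
        "receive_window": rwnd,
        "checksum": csum,
        "urg_pointer": urg_ptr,
        "payload": segment[header_len:],
    }
```

For example, a header packed with `struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x18, 4096, 0, 0)` parses back to sequence number 42, acknowledgement number 79, and the flag set `{"PSH", "ACK"}`.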
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments?
  A: TCP spec doesn't say; up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[figure: sender sequence number space with window size N: sent, ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
Transport Layer 3-58
TCP seq. numbers, ACKs: simple telnet scenario
  Host A: user types 'C'
    A -> B: Seq=42, ACK=79, data = 'C'
  Host B: host ACKs receipt of 'C', echoes back 'C'
    B -> A: Seq=79, ACK=43, data = 'C'
  Host A: host ACKs receipt of echoed 'C'
    A -> B: Seq=43, ACK=80
Transport Layer 3-59
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time, timeout
  EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
[figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds); series: SampleRTT, EstimatedRTT]
Transport Layer 3-61
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
  TimeoutInterval = EstimatedRTT + 4 * DevRTT
    (estimated RTT plus "safety margin")
Transport Layer 3-62
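The two moving averages and the timeout formula can be combined into a small estimator. This is a sketch under assumptions the slides leave open: the class name `RTTEstimator` is hypothetical, DevRTT is initialised to 0, and DevRTT is updated before EstimatedRTT (so the deviation is measured against the previous estimate).

```python
class RTTEstimator:
    """EWMA-based RTT estimate and retransmission timeout."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25, initial_dev=0.0):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = initial_dev        # assumption: slides give no initial value

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumed ordering: deviation is taken against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms, a new sample of 120 ms gives DevRTT = 5 ms, EstimatedRTT = 102.5 ms, and TimeoutInterval = 122.5 ms; samples from retransmitted segments should not be fed in, per the slide.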
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
Transport Layer 3-64
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
TCP ACK generation
event at receiver -> TCP receiver action
  arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
  arrival of in-order segment with expected seq #; one other segment has ACK pending
    -> immediately send single cumulative ACK, ACKing both in-order segments
  arrival of out-of-order segment, higher-than-expect seq #; gap detected
    -> immediately send duplicate ACK, indicating seq # of next expected byte
  arrival of segment that partially or completely fills gap
    -> immediately send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
TCP fast retransmit
  Host A -> Host B: Seq=92, 8 bytes of data
  Host A -> Host B: Seq=100, 20 bytes of data (lost, marked X)
  Host B -> Host A: ACK=100, then duplicate ACK=100 for each later segment that arrives
  Host A -> Host B: Seq=100, 20 bytes of data (resent before the timeout expires)
  caption: fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71
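The duplicate-ACK rule amounts to a counter on the sender. The sketch below is illustrative only: the class name `FastRetransmitDetector` is hypothetical and the retransmission itself is left to the caller.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals a fast retransmit on the third duplicate."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit now, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:       # triple duplicate ACK
                return ack_num            # ACK number = seq # of the missing segment
        else:
            self.last_ack = ack_num       # new ACK: reset the duplicate counter
            self.dup_count = 0
        return None
```

With the scenario in the figure, the ACK sequence 100, 100, 100, 100 triggers a retransmission of the segment starting at byte 100 on the fourth ACK (the third duplicate); further duplicates do not trigger it again.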
TCP flow control
  application may remove data from TCP socket buffers ...
    ... slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: segments arrive from sender, pass through IP code and TCP code (OS) into TCP socket receiver buffers, and are read by the application process]
Transport Layer 3-73
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
Transport Layer 3-74
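A short sketch of the bookkeeping on both sides. The variable names `last_byte_rcvd` and `last_byte_read` follow the usual textbook treatment and are an assumption here, since the slide only names `rwnd` and `RcvBuffer`; the function names are hypothetical.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the application
    return rcv_buffer - buffered

def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

With a 4096-byte RcvBuffer, 3000 bytes received and 1000 read, the receiver advertises rwnd = 2096; a sender with 1000 bytes in flight may send another 1000 bytes but not another 1200.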
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: client and server applications, each above the network, each holding:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
  client state: LISTEN -> SYNSENT -> ESTAB
  server state: LISTEN -> SYN RCVD -> ESTAB
  1. client: choose init seq num, x; send TCP SYN msg
       SYNbit=1, Seq=x
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
       SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
       ACKbit=1, ACKnum=y+1
  4. server: received ACK(y) indicates client is live
Transport Layer 3-77
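The three segments of the handshake can be written out as data. This is a toy illustration, not a socket implementation: the function name and dictionary keys are hypothetical, and the only detail added beyond the slide is that sequence numbers wrap at 2^32, which follows from the 32-bit sequence number field.

```python
SEQ_SPACE = 2 ** 32   # sequence and acknowledgement numbers are 32-bit values

def three_way_handshake(x, y):
    """Return the (sender, segment) pairs exchanged for initial seq nums x and y."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": (x + 1) % SEQ_SPACE}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_SPACE}
    return [("client", syn), ("server", synack), ("client", ack)]
```

Calling `three_way_handshake(42, 79)` yields a SYN with Seq=42, a SYNACK with Seq=79 and ACKnum=43, and a final ACK with ACKnum=80.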
TCP 3-way handshake: FSM
server:
  closed -> listen
    on: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd
    on SYN(x): send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB
    on ACK(ACKnum=y+1)
client:
  closed -> SYN sent
    on: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB
    on SYNACK(seq=y,ACKnum=x+1): send ACK(ACKnum=y+1)
Transport Layer 3-78
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
  both sides start in ESTAB
  client: clientSocket.close(); sends FINbit=1, seq=x
    client state: FIN_WAIT_1 (can no longer send but can receive data)
  server: sends ACKbit=1; ACKnum=x+1
    server state: CLOSE_WAIT (can still send data)
    client state: FIN_WAIT_2 (wait for server close)
  server: sends FINbit=1, seq=y
    server state: LAST_ACK (can no longer send data)
  client: sends ACKbit=1; ACKnum=y+1
    client state: TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
    server state: CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
[figure: Host A and Host B send original data at rate λin into unlimited shared output link buffers; throughput: λout]
[plots: λout vs λin and delay vs λin, with λin up to R/2]
  maximum per-connection throughput: R/2
  large delays as arrival rate, λin, approaches capacity
Transport Layer 3-83
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λin = λout
    transport-layer input includes retransmissions: λ'in >= λin
[figure: Host A, Host B; λin: original data; λ'in: original data, plus retransmitted data; λout; finite shared output link buffers]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[plot: λout vs λin, both up to R/2]
[figure: Host A keeps a copy of each packet; free buffer space at router; λin: original data; λ'in: original data, plus retransmitted data; λout; finite shared output link buffers; Host B]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[plot: λout vs λ'in, both up to R/2]
  when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: Host A keeps a copy of each packet and resends it when no buffer space was available at the router; λin: original data; λ'in: original data, plus retransmitted data; λout; Host B]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[plot: λout vs λ'in, both up to R/2]
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
[figure: Host A keeps a copy and resends on timeout even though free buffer space was available; λin, λ'in, λout; Host B]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C, D; λin: original data; λ'in: original data, plus retransmitted data; λout; finite shared output link buffers]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
[plot: λout vs λ'in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Transport Layer 3-92
TCP Congestion Control: details
  sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
  cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
    rate ≈ cwnd / RTT bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight"), spanning cwnd | last byte sent]
Transport Layer 3-94
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
[figure: Host A and Host B exchanging segments over time, with the number of segments per RTT doubling]
Transport Layer 3-95
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer 3-97
Summary: TCP Congestion Control
[FSM figure, partially recovered]
  slow start, new ACK:
    cwnd = cwnd + MSS
    dupACKcount = 0
    transmit new segment(s), as allowed
  slow start -> congestion avoidance when cwnd > ssthresh
  congestion avoidance, new ACK:
    cwnd = cwnd + MSS * (MSS/cwnd)
    dupACKcount = 0
    transmit new segment(s), as allowed
  fast recovery, duplicate ACK:
    cwnd = cwnd + MSS
    transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
[figure: AIMD saw tooth behavior: probing for bandwidth; x-axis: time; y-axis: cwnd, TCP sender congestion window size; additively increase window size ... until loss occurs (then cut window in half)]
Transport Layer 3-99
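The window rules from the last few slides (slow start, switch to linear growth at ssthresh, Reno vs. Tahoe reaction to loss) can be traced with a per-RTT simulation. This is a coarse sketch under assumptions: one event per RTT, cwnd measured in MSS units, growth capped at ssthresh when doubling would overshoot it, and no modelling of fast recovery's temporary window inflation. The function name and event labels are hypothetical.

```python
def simulate_cwnd(events, mss=1, ssthresh=8, reno=True):
    """Trace cwnd across RTTs; events are "ack", "timeout" or "3dup"."""
    cwnd = mss
    trace = [cwnd]
    for ev in events:
        if ev == "ack":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
            else:
                cwnd = cwnd + mss                # congestion avoidance: +1 MSS per RTT
        elif ev == "timeout":
            ssthresh = cwnd / 2                  # half of cwnd just before loss
            cwnd = mss
        elif ev == "3dup":
            ssthresh = cwnd / 2
            cwnd = ssthresh if reno else mss     # Reno halves, Tahoe restarts at 1 MSS
        else:
            raise ValueError("unknown event: %r" % ev)
        trace.append(cwnd)
    return trace
```

For the event list five ACK rounds, a triple duplicate ACK, two ACK rounds, a timeout, then three ACK rounds, the Reno trace is 1, 2, 4, 8, 9, 10, 5, 6, 7, 1, 2, 3.5, 4.5, which shows the saw-tooth shape described above.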
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  avg TCP thruput = (3/4) * W / RTT bytes/sec
[figure: window size oscillating between W/2 and W]
Transport Layer 3-100
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
  to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate!
  new versions of TCP for high-speed
Transport Layer 3-101
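The two numbers on this slide can be checked directly. The sketch below assumes MSS is converted to bits so that the Mathis formula yields bits per second; the function names are hypothetical.

```python
import math

def window_in_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))

def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Invert the Mathis formula for L at a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

W = window_in_segments(10e9, 0.100, 1500)   # about 83,333 segments
L = required_loss_rate(10e9, 0.100, 1500)   # about 2.1e-10
```

Plugging `L` back into `mathis_throughput_bps(1500, 0.100, L)` returns the 10 Gbps target, which confirms the inversion.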
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: throughput of the two connections plotted against each other, x-axis: Connection 1 throughput, from 0 to R; equal bandwidth share line; alternating steps labelled "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Transport Layer 3-103
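The convergence argument can be checked numerically. In this sketch both flows are assumed to see every loss event at the same time and to have equal RTTs; the function name, the step size, and the starting rates are illustrative.

```python
def aimd_two_flows(x1, x2, R, rounds=1000, step=1.0):
    """Two AIMD flows sharing a link of capacity R; returns final rates."""
    for _ in range(rounds):
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2          # loss: both halve (multiplicative decrease)
        else:
            x1, x2 = x1 + step, x2 + step    # no loss: both add (additive increase)
    return x1, x2

a, b = aimd_two_flows(10.0, 80.0, R=100.0)
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in two, so the rates approach the equal-share line regardless of where they start.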
Fairness (more)
Fairness and UDP:
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
Transport Layer 3-104
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
rdt10 reliable transfer over a reliable channelt 0 e ab e t a s e ove a e ab e c a e
underlying channel perfectly reliabley g p y no bit errors no loss of packets
separate FSMs for sender receiver sender sends data into underlying channel receiver reads data from underlying channel
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors
k l d (ACK ) i li i l ll d acknowledgements (ACKs) receiver explicitly tells sender that pkt received OK negative acknowledgements (NAKs) receiver explicitly tells g g ( ) p y
sender that pkt had errors sender retransmits pkt on receipt of NAK
new mechanisms in rdt2 0 (beyond rdt1 0)How do humans recover from ldquoerrorsrdquo
new mechanisms in rdt20 (beyond rdt10) error detection receiver feedback control msgs (ACKNAK) rcvr-
during conversationg ( )
gtsender
Transport Layer 3-27
rdt20 channel with bit errors underlying channel may flip bits in packet
rdt20 channel with bit errorsy g y p p
checksum to detect bit errors the question how to recover from errors acknowledgements (ACKs) receiver explicitly tells sender
that pkt received OKti k l d t (NAK ) i li itl t ll negative acknowledgements (NAKs) receiver explicitly tells
sender that pkt had errors sender retransmits pkt on receipt of NAKp p
new mechanisms in rdt20 (beyond rdt10) error detection feedback control msgs (ACKNAK) from receiver to
sender
Transport Layer 3-28
rdt20 FSM specificationp
sndpkt = make pkt(data checksum) receiverrdt_send(data)
Wait for
sndpkt make_pkt(data checksum)udt_send(sndpkt)
rdt_rcv(rcvpkt) ampampisNAK(rcvpkt)
rdt rcv(rcvpkt) ampampWait for
receiver
call from above
udt_send(sndpkt)
udt_send(NAK)
rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)ACK or
NAK
rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)Wait for call from
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs: simple telnet scenario
Host A -> Host B: Seq=42, ACK=79, data = 'C' (user types 'C')
Host B -> Host A: Seq=79, ACK=43, data = 'C' (host ACKs receipt of 'C', echoes back 'C')
Host A -> Host B: Seq=43, ACK=80 (host ACKs receipt of echoed 'C')
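The numbers in the telnet scenario follow from two rules: the seq # names the first data byte in the segment, and the ACK # names the next byte expected from the other side. A small sketch of that arithmetic, using a hypothetical helper `telnet_echo` and (seq, ack, data) tuples chosen only for illustration:

```python
def telnet_echo(client_seq: int, server_seq: int, char: bytes = b"C") -> list:
    """Return the three segments of the echo exchange as (seq, ack, data) tuples."""
    # Client sends the typed character.
    seg1 = (client_seq, server_seq, char)
    # Server ACKs the character (next expected byte) and echoes it back.
    seg2 = (server_seq, client_seq + len(char), char)
    # Client ACKs the echoed character; this segment carries no data.
    seg3 = (client_seq + len(char), server_seq + len(char), b"")
    return [seg1, seg2, seg3]


exchange = telnet_echo(42, 79)
```

Sequence number wraparound is ignored here to keep the sketch short.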
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT, but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
[figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (axis 100 to 350) against time in seconds]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
  (EstimatedRTT is the estimated RTT, 4 * DevRTT is the "safety margin")
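The EstimatedRTT, DevRTT and TimeoutInterval updates can be sketched as a small class. This is a sketch under stated assumptions, not how any particular stack does it: the class name `RttEstimator` is hypothetical, the initial DevRTT of half the first sample and the choice to update DevRTT before EstimatedRTT follow the convention of RFC 6298 rather than anything stated on the slides, and the weights are the typical values 0.125 and 0.25.

```python
class RttEstimator:
    """EWMA-based RTT estimator (hypothetical helper; times in milliseconds)."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        # Assumption: start the deviation at half the first sample.
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (not from a retransmission); return the new timeout."""
        # Assumption: deviation is updated first, against the old EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)
timeout_after_spike = estimator.update(140.0)
```

A single slow sample moves EstimatedRTT only slightly but widens the safety margin, which is the behaviour the two formulas are designed to give.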
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
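The three sender events (data from the application, timeout, ACK received) can be sketched as a small event-driven class. This is an illustrative model only: `SimplifiedTcpSender`, its attribute names and the `wire` list standing in for IP are hypothetical, the timer is reduced to a boolean, and duplicate ACKs, flow control, congestion control and sequence number wraparound are all ignored.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified TCP sender (hypothetical names throughout)."""

    def __init__(self, initial_seq: int = 0):
        self.send_base = initial_seq      # oldest unacked byte
        self.next_seq_num = initial_seq   # seq # for the next new segment
        self.unacked = {}                 # seq # -> payload still awaiting an ACK
        self.timer_running = False
        self.wire = []                    # (seq #, payload) pairs handed to IP

    def data_from_app(self, data: bytes) -> None:
        # Create a segment whose seq # is the byte-stream number of its first byte.
        self.unacked[self.next_seq_num] = data
        self.wire.append((self.next_seq_num, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer

    def timeout(self) -> None:
        # Retransmit only the oldest unacked segment, then restart the timer.
        if self.unacked:
            oldest = min(self.unacked)
            self.wire.append((oldest, self.unacked[oldest]))
            self.timer_running = True

    def ack_received(self, ack_num: int) -> None:
        # A cumulative ACK covers every byte below ack_num.
        if ack_num > self.send_base:
            self.send_base = ack_num
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= ack_num]:
                del self.unacked[seq]
            # Keep the timer running only if something is still unacked.
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.data_from_app(b"A" * 8)    # Seq=92, 8 bytes
sender.data_from_app(b"B" * 20)   # Seq=100, 20 bytes
sender.ack_received(100)          # first segment ACKed
sender.timeout()                  # second segment retransmitted
```

Keeping one timer for the oldest unacked segment, rather than one per segment, is what lets a single cumulative ACK clear several segments at once.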
TCP ACK generation (event at receiver -> TCP receiver action)
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment, higher-than-expected seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap
  -> immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit: fast retransmit after sender receipt of triple duplicate ACK
Host A -> Host B: Seq=92, 8 bytes of data
Host A -> Host B: Seq=100, 20 bytes of data (lost, marked X)
Host B -> Host A: ACK=100, repeated as duplicate ACKs for the same data
Host A -> Host B: Seq=100, 20 bytes of data (resent before the timeout expires)
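The trigger for fast retransmit is a count of duplicate ACKs. A minimal sketch, assuming the sender sees a plain list of ACK numbers; the function name `fast_retransmit_points` is hypothetical, and "3 ACKs for same data" is read here as three duplicates following the original ACK:

```python
def fast_retransmit_points(acks: list) -> list:
    """Return the ACK numbers at which a fast retransmit would fire."""
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            # Fire exactly once, on the third duplicate of the same ACK number.
            if dup_count == 3:
                triggers.append(ack)
        else:
            # A new ACK number resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return triggers


# One original ACK=100 followed by three duplicates triggers a resend of Seq=100.
fired = fast_retransmit_points([100, 100, 100, 100])
```

Further duplicates beyond the third do not trigger additional resends in this sketch.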
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process removes data]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from below, data leaves to the application process above]
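The two sides of flow control reduce to two small calculations: the receiver computes its free buffer space, and the sender checks its in-flight data against it. A sketch with hypothetical function and parameter names; the byte counters (last byte received, last byte read, last byte sent, last byte ACKed) are illustrative bookkeeping and are not named on these slides:

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: rwnd is the free space left in RcvBuffer."""
    buffered = last_byte_rcvd - last_byte_read  # data the application has not read yet
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rwnd: int, nbytes: int) -> bool:
    """Sender side: keep unacked ("in-flight") data within the receiver's rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


# A 4096-byte buffer holding 2000 unread bytes advertises 2096 bytes.
rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = sender_may_send(last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd, nbytes=1000)
ok_large = sender_may_send(last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd, nbytes=1200)
```

Because the sender never has more than rwnd bytes outstanding, the bytes that can still arrive always fit in the free space the receiver advertised.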
Connection management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: client and server stacks (application above network), each holding connection state: ESTAB and connection variables: seq # client-to-server, seq # server-to-client, rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state -> SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state -> SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state -> ESTAB
server: received ACK(y) indicates client is live; server state -> ESTAB
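The three segments and the state changes they cause can be traced with a short simulation. This is a sketch only: `three_way_handshake`, the dictionary keys and the in-process "exchange" are hypothetical stand-ins for real segments on a network, and ACK numbers are taken modulo 2^32 because the sequence number field is 32 bits wide.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit values


def three_way_handshake(x: int, y: int):
    """Simulate SYN, SYNACK, ACK; return the segments and the final (client, server) states."""
    client, server = "CLOSED", "LISTEN"

    # 1. Client picks initial seq num x and sends SYN.
    syn = {"SYNbit": 1, "Seq": x}
    client = "SYNSENT"

    # 2. Server picks initial seq num y and sends SYNACK, acking the SYN.
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1,
              "ACKnum": (syn["Seq"] + 1) % SEQ_SPACE}
    server = "SYN RCVD"

    # 3. Client checks that its SYN was acknowledged, then ACKs the SYNACK.
    if synack["ACKnum"] != (x + 1) % SEQ_SPACE:
        raise ValueError("SYNACK does not acknowledge our SYN")
    ack = {"ACKbit": 1, "ACKnum": (synack["Seq"] + 1) % SEQ_SPACE}
    client = "ESTAB"

    # 4. Server sees its SYN acknowledged and enters ESTAB as well.
    if ack["ACKnum"] == (y + 1) % SEQ_SPACE:
        server = "ESTAB"
    return [syn, synack, ack], (client, server)


segments, states = three_way_handshake(x=41, y=78)
```

Each side learns the other is live only when its own SYN has been acknowledged, which is why two messages are not enough.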
TCP 3-way handshake: FSM
server: closed -> listen on Socket connectionSocket = welcomeSocket.accept();
server: listen -> SYN rcvd on receiving SYN(x); send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
server: SYN rcvd -> ESTAB on receiving ACK(ACKnum=y+1)
client: closed -> SYN sent on Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
client: SYN sent -> ESTAB on receiving SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close(); send FINbit=1, seq=x; client state -> FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1; server state -> CLOSE_WAIT (can still send data)
client: on ACK, client state -> FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; server state -> LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1; client state -> TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
server: on ACK, server state -> CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput λ_out; plots of λ_out and of delay against λ_in, axes up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; λ_out delivered to Host B]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[figure: plot of λ_out against λ_in, axes up to R/2; Host A keeps a copy of each packet and sends while the finite shared output link buffers have free buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data]
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: plot of λ_out against λ_in, axes up to R/2; Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once there is free buffer space; λ_in: original data; λ'_in: original data, plus retransmitted data]
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[figure: plot of λ_out against λ_in, axes up to R/2; Host A times out and sends a second copy while the router still has free buffer space]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C and D; λ_in: original data; λ'_in: original data, plus retransmitted data; finite shared output link buffers; λ_out]
Causes/costs of congestion: scenario 3
[figure: plot of λ_out against λ'_in, axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP congestion control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight"), spanning cwnd | last byte sent]
TCP slow start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A and Host B timeline, one RTT per round, time running downward]
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
Summary: TCP congestion control
[figure: FSM, partial]
new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
cwnd > ssthresh: move from slow start to congestion avoidance
new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
  additively increase window size ... until loss occurs (then cut window in half)
[figure: cwnd: TCP sender congestion window size, plotted against time]
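Slow start, the ssthresh switch, and the two loss reactions can be combined in a round-by-round trace of cwnd. This is a coarse sketch, assuming cwnd is counted in MSS and updated once per RTT; the function name `reno_cwnd_trace` and the event labels are hypothetical, slow-start growth is capped at ssthresh as a simplification, and the extra window inflation of fast recovery is left out.

```python
def reno_cwnd_trace(events: list, ssthresh: float = 8.0) -> list:
    """Trace cwnd (in MSS) over events: "rtt" (loss-free RTT), "3dup", "timeout"."""
    cwnd = 1.0                     # initially cwnd = 1 MSS
    trace = [cwnd]
    for event in events:
        if event == "rtt":
            if cwnd < ssthresh:
                # Slow start: double every RTT (capped at ssthresh in this sketch).
                cwnd = min(cwnd * 2, ssthresh)
            else:
                # Congestion avoidance: additive increase of 1 MSS per RTT.
                cwnd += 1
        elif event == "3dup":
            # Reno-style reaction: remember half the window, then cut cwnd in half.
            ssthresh = cwnd / 2
            cwnd = ssthresh
        elif event == "timeout":
            # Timeout: remember half the window, restart from 1 MSS.
            ssthresh = cwnd / 2
            cwnd = 1.0
        else:
            raise ValueError(f"unknown event: {event}")
        trace.append(cwnd)
    return trace


trace = reno_cwnd_trace(["rtt"] * 5 + ["3dup", "rtt", "timeout", "rtt", "rtt", "rtt"])
```

The resulting trace shows the saw tooth: exponential ramp, linear probing, a halving on triple duplicate ACKs, and a restart from 1 MSS after a timeout.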
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) * W / RTT bytes/sec
[figure: saw tooth of window size between W/2 and W]
TCP futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate
new versions of TCP for high-speed
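The figures in the "long, fat pipes" example can be reproduced from the throughput formula. A short sketch with hypothetical function names, taking MSS in bytes, RTT in seconds and throughput in bits per second:

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Throughput from the loss-based formula: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(mss_bytes: float, rtt_s: float, target_bps: float) -> float:
    """Loss probability L needed to sustain target_bps (the formula solved for L)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


def window_segments(mss_bytes: float, rtt_s: float, target_bps: float) -> float:
    """In-flight segments needed to fill the pipe: throughput * RTT / segment size."""
    return target_bps * rtt_s / (mss_bytes * 8)


# 1500-byte segments, 100 ms RTT, 10 Gbps target.
w = window_segments(1500, 0.1, 10e9)
loss = required_loss(1500, 0.1, 10e9)
```

With these inputs the window comes out at about 83,333 segments and the loss rate at about 2.1 * 10^-10, in line with the rounded values quoted above.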
TCP fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: throughput of one connection plotted against Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
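The fairness argument can be checked numerically: additive increase keeps the gap between two flows constant, while every multiplicative decrease halves it. A minimal sketch, assuming both flows see loss at the same moment whenever their combined rate exceeds the link capacity; `aimd_two_flows` and its units are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Run synchronized AIMD for two flows sharing one bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows decrease their rate by a factor of 2.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both flows add one unit per round.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the equal-share line: 1 versus 60 on a link of capacity 100.
rate1, rate2 = aimd_two_flows(1.0, 60.0, capacity=100.0, rounds=200)
```

After a handful of loss events the two rates are nearly equal, which is the movement toward the equal bandwidth share line.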
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
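The parallel-connection example is simple arithmetic once each connection is assumed to get an equal share of the link. A sketch with a hypothetical helper `app_share` that returns the fraction of R an application obtains:

```python
from fractions import Fraction


def app_share(existing_conns: int, new_conns: int) -> Fraction:
    """Fraction of link rate R won by an app opening new_conns connections."""
    # Assumption: every TCP connection on the link gets an equal share.
    return Fraction(new_conns, existing_conns + new_conns)


one_conn = app_share(existing_conns=9, new_conns=1)      # 1 of 10 connections
eleven_conns = app_share(existing_conns=9, new_conns=11)  # 11 of 20 connections
```

With 11 of the 20 connections the application gets 11/20 of R, slightly more than the R/2 quoted as a round figure.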
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
figure: at each end host (application above network):
  connection state: ESTAB
  connection variables: seq # client-to-server and server-to-client; rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
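The same client/server socket calls can be tried on one machine. The sketch below uses Python's standard `socket` module over the loopback interface instead of the Java calls on the slide; the function name `loopback_exchange` and the test payload are assumptions made for illustration. The operating system performs the handshake inside `connect()` and `accept()`.

```python
import socket


def loopback_exchange(payload: bytes = b"hello") -> bytes:
    """Open a TCP connection to ourselves and send one short message."""
    welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcome.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    welcome.listen(1)                   # server is now willing to accept
    port = welcome.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # kernel exchanges SYN, SYNACK, ACK
    connection, _addr = welcome.accept() # new socket dedicated to this client

    client.sendall(payload)
    received = b""
    while len(received) < len(payload):  # TCP is a byte stream: read until complete
        chunk = connection.recv(1024)
        if not chunk:
            break
        received += chunk

    for s in (client, connection, welcome):
        s.close()
    return received


if __name__ == "__main__":
    print(loopback_exchange())
```

Note that `accept()` returns a second socket, matching the slide's distinction between the welcoming socket and the connection socket.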
TCP 3-way handshake
client state: LISTEN, SYNSENT, ESTAB
server state: LISTEN, SYN RCVD, ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
TCP 3-way handshake: FSM
server side:
  closed to listen: Socket connectionSocket = welcomeSocket.accept();
  listen to SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd to ESTAB: on ACK(ACKnum=y+1)
client side:
  closed to SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent to ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
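The handshake FSM can be written down as a transition table. This is a small sketch under assumptions: the event names ("connect", "accept", "rcv SYN", and so on) and the `run` helper are invented for illustration, and the state names follow the slide.

```python
# (current state, event) -> (next state, segment sent or None)
CLIENT_FSM = {
    ("CLOSED", "connect"): ("SYN_SENT", "SYN(seq=x)"),
    ("SYN_SENT", "rcv SYNACK"): ("ESTAB", "ACK(ACKnum=y+1)"),
}

SERVER_FSM = {
    ("CLOSED", "accept"): ("LISTEN", None),
    ("LISTEN", "rcv SYN"): ("SYN_RCVD", "SYNACK(seq=y,ACKnum=x+1)"),
    ("SYN_RCVD", "rcv ACK"): ("ESTAB", None),
}


def run(fsm, events, state="CLOSED"):
    """Feed events to an FSM; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, segment = fsm[(state, event)]   # KeyError means an illegal event
        if segment is not None:
            sent.append(segment)
    return state, sent


print(run(CLIENT_FSM, ["connect", "rcv SYNACK"]))
print(run(SERVER_FSM, ["accept", "rcv SYN", "rcv ACK"]))
```

A table keeps illegal transitions visible: any (state, event) pair missing from the dictionary is simply not part of the protocol.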
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIMED_WAIT, CLOSED
server state: ESTAB, CLOSE_WAIT, LAST_ACK, CLOSED
client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
figure: Host A sends original data at rate λ_in into unlimited shared output link buffers; Host B receives throughput λ_out; plots of λ_out versus λ_in and of delay versus λ_in, each up to R/2
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
figure: Host A offers λ_in (original data) and λ'_in (original data, plus retransmitted data) to finite shared output link buffers; Host B receives λ_out
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
figure: plot of λ_out versus λ_in, both up to R/2; Host A keeps a copy of each packet and sends only when there is free buffer space in the finite shared output link buffers
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
figure: plot of λ_out versus λ'_in, both up to R/2; Host A keeps a copy, finds no buffer space, and resends once free buffer space is available
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
figure: plot of λ_out versus λ'_in, both up to R/2; Host A keeps a copy, times out, and resends although there is free buffer space
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
figure: Hosts A, B, C, D; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out; finite shared output link buffers
Causes/costs of congestion: scenario 3
figure: plot of λ_out versus λ'_in, both up to C/2
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
figure: sender sequence number space (last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent), with cwnd spanning the in-flight bytes
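As a quick worked example of the window limit and the rate approximation, the sketch below uses assumed numbers (a 14600-byte cwnd and a 250 ms RTT); the function names are illustrative only.

```python
def sending_rate(cwnd_bytes: float, rtt_seconds: float) -> float:
    """Approximate TCP sending rate: one window of bytes per round trip."""
    return cwnd_bytes / rtt_seconds


def may_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps the in-flight data within cwnd."""
    return (last_byte_sent + nbytes) - last_byte_acked <= cwnd


# Assumed example: cwnd of ten 1460-byte segments, RTT of 250 ms.
print(sending_rate(14600, 0.25))          # 58400.0 bytes/sec
print(may_send(10000, 4000, 14600, 1460)) # True: 7460 bytes would be in flight
```

Halving the RTT or doubling cwnd doubles the rate, which is why congestion control acts on cwnd.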
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
figure: Host A / Host B timeline with one RTT marked
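The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be checked with a short simulation. This sketch assumes no loss, one ACK per segment, and cwnd counted in whole MSS units; `slow_start` is a hypothetical helper name.

```python
def slow_start(rtts: int, cwnd: int = 1) -> list:
    """Return cwnd (in MSS) at the start of each RTT, assuming no loss."""
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd             # every segment sent this RTT is ACKed once
        for _ in range(acks):
            cwnd += 1           # per-ACK increment of 1 MSS
        history.append(cwnd)
    return history


print(slow_start(4))   # [1, 2, 4, 8, 16]
```

Because the number of ACKs per RTT equals the current window, adding 1 MSS per ACK doubles the window each round trip.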
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Summary: TCP Congestion Control
figure: FSM (only these fragments survive in the transcript)
  slow start, on new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, on new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, on duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
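The FSM fragments and the loss reactions above can be combined into one event-driven sketch. The per-ACK updates follow the slide text; the fast-recovery entry and exit (ssthresh + 3 MSS on the third duplicate ACK, cwnd = ssthresh on the next new ACK) are assumptions taken from the usual description of TCP Reno, and the class name `RenoSender` is hypothetical.

```python
class RenoSender:
    """Sketch of Reno-style cwnd updates; cwnd and ssthresh are in bytes."""

    def __init__(self, mss=1460, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss                 # initially cwnd = 1 MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh   # assumed: deflate window, resume CA
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss       # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss       # each extra dup ACK inflates the window
            return
        self.dup_acks += 1
        if self.dup_acks == 3:          # triple duplicate ACK: halve, do not restart
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss       # assumed Reno detail
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2   # remember half the window at loss
        self.cwnd = self.mss            # timeout: back to 1 MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

A Tahoe-style sender would, per the slide, treat the third duplicate ACK exactly like `on_timeout`.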
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
figure: AIMD saw tooth behavior: probing for bandwidth; cwnd (TCP sender congestion window size) versus time; additively increase window size ... until loss occurs (then cut window in half)
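The saw tooth can be reproduced with a toy model. The sketch below assumes a fixed capacity (in MSS) above which a loss is detected, and one window update per RTT; `aimd` and `capacity` are illustrative names, not taken from the slides.

```python
def aimd(rtts: int, capacity: int, cwnd: int = 1) -> list:
    """Trace cwnd (in MSS) per RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:            # loss detected during this RTT
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: cut in half
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace


print(aimd(12, capacity=8, cwnd=6))
# [6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4, 5]
```

The window climbs linearly, overshoots the capacity by one segment, halves, and repeats, which is the saw tooth in the figure.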
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) W/RTT bytes/sec
figure: window size saw tooth oscillating between W/2 and W
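The 3/4 factor follows from the saw tooth shape. As a short derivation, assuming the window grows linearly from W/2 to W between losses (so its time average is the midpoint of the two):

```latex
\[
\overline{w} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} \;=\; \frac{\overline{w}}{\mathrm{RTT}}
\;=\; \frac{3}{4}\,\frac{W}{\mathrm{RTT}} \ \text{bytes/sec}.
\]
```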
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2*10^-10, a very small loss rate!
new versions of TCP for high-speed
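Both numbers on this slide can be reproduced directly. The sketch assumes MSS is converted to bits (1500 bytes = 12000 bits) so that it matches a rate given in bits per second; the function names are illustrative.

```python
def window_in_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_in_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(round(w))          # 83333
print(f"{loss:.1e}")     # 2.1e-10
```

The result agrees with the slide's rounded value of 2*10^-10: roughly one lost segment in five billion.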
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
figure: throughput of one connection plotted against Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"
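The convergence argument can be checked numerically. In this sketch two flows share one link, both see a loss whenever their combined rate exceeds the capacity, and both react in the same round; these synchronized losses and the name `two_flow_aimd` are simplifying assumptions.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int):
    """Evolve two AIMD flows sharing a link; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: the gap is unchanged
    return x1, x2


a, b = two_flow_aimd(1.0, 80.0, capacity=100.0, rounds=500)
print(abs(a - b) < 0.01)   # True: the flows end up with nearly equal rates
```

Additive increase moves the pair along a slope-1 line, and each halving pulls it toward the origin, so the difference between the flows shrinks geometrically.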
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
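The parallel-connection arithmetic is a simple ratio, sketched below under the assumption that every TCP connection on the link gets an equal share; `app_share` is a hypothetical helper.

```python
from fractions import Fraction


def app_share(existing: int, opened: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens `opened` connections."""
    return Fraction(opened, existing + opened)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, which the slide rounds to R/2
```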
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
the question: how to recover from errors:
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits pkt on receipt of NAK
new mechanisms in rdt2.0 (beyond rdt1.0):
  error detection
  feedback: control msgs (ACK,NAK) from receiver to sender
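The ACK/NAK loop can be simulated in a few lines. This is a sketch under assumptions: the one-byte additive checksum, the single-bit error model in `channel`, and the function `rdt20_send` are all invented for illustration, and ACK/NAK messages themselves are assumed to arrive intact (as rdt2.0 assumes).

```python
import random


def checksum(data: bytes) -> int:
    return sum(data) % 256                 # toy checksum, not the Internet checksum


def make_pkt(data: bytes) -> bytes:
    return data + bytes([checksum(data)])  # payload followed by checksum byte


def corrupt(pkt: bytes) -> bool:
    return checksum(pkt[:-1]) != pkt[-1]


def channel(pkt: bytes, flip_prob: float, rng: random.Random) -> bytes:
    """Unreliable channel: with probability flip_prob, flip one bit."""
    if rng.random() < flip_prob:
        damaged = bytearray(pkt)
        damaged[0] ^= 0x01
        return bytes(damaged)
    return pkt


def rdt20_send(data: bytes, flip_prob: float = 0.3, seed: int = 1):
    """Stop-and-wait: resend the same packet until the receiver ACKs it."""
    rng = random.Random(seed)
    sndpkt = make_pkt(data)
    attempts = 0
    while True:
        attempts += 1
        rcvpkt = channel(sndpkt, flip_prob, rng)
        if corrupt(rcvpkt):
            continue                       # receiver sends NAK: sender retransmits
        return rcvpkt[:-1], attempts       # receiver sends ACK: data delivered


delivered, attempts = rdt20_send(b"hello")
print(delivered, attempts)
```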
rdt2.0: FSM specification
sender:
  Wait for call from above: rdt_send(data)
    sndpkt = make_pkt(data, checksum)
    udt_send(sndpkt)
  Wait for ACK or NAK: rdt_rcv(rcvpkt) && isNAK(rcvpkt)
    udt_send(sndpkt)
  Wait for ACK or NAK: rdt_rcv(rcvpkt) && isACK(rcvpkt)
    return to Wait for call from above
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
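The sender rules above translate into three event handlers. This sketch ignores the k-bit wraparound of sequence numbers and models the timer only as an explicit `on_timeout` call; the class `GBNSender` and its attribute names are assumptions.

```python
class GBNSender:
    """Go-Back-N sender with window size N (sequence numbers never wrap here)."""

    def __init__(self, window_size: int):
        self.N = window_size
        self.base = 0            # oldest unACKed sequence number
        self.nextseqnum = 0      # next sequence number to use
        self.sent = []           # log of every transmitted sequence number

    def send(self) -> bool:
        """Send one new packet if the window allows; False means 'refuse data'."""
        if self.nextseqnum < self.base + self.N:
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False

    def on_ack(self, n: int) -> None:
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n >= self.base:       # duplicate ACKs (n < base) change nothing
            self.base = n + 1

    def on_timeout(self) -> list:
        """Retransmit every packet that is sent but not yet ACKed."""
        resend = list(range(self.base, self.nextseqnum))
        self.sent.extend(resend)
        return resend


s = GBNSender(4)
print([s.send() for _ in range(5)])   # [True, True, True, True, False]
s.on_ack(1)
print(s.on_timeout())                 # [2, 3]
```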
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
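The receiver side needs only one variable. In this sketch, `GBNReceiver` is a hypothetical class, sequence numbers start at 0 and never wrap, and an ACK of -1 stands for "nothing received in order yet" (an assumption made so the function always returns a number).

```python
class GBNReceiver:
    """Go-Back-N receiver: accepts only the next expected packet."""

    def __init__(self):
        self.expectedseqnum = 0
        self.delivered = []      # data handed to the upper layer, in order

    def on_packet(self, seq: int, data) -> int:
        """Process one correctly received packet; return the ACK number sent."""
        if seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Otherwise the packet is out of order and is discarded, not buffered.
        return self.expectedseqnum - 1   # highest in-order seq # (duplicate if no progress)


r = GBNReceiver()
print(r.on_packet(0, "a"))   # 0
print(r.on_packet(2, "c"))   # 0 again: duplicate ACK, packet 2 discarded
print(r.on_packet(1, "b"))   # 1
```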
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender
data from above:
  if next available seq # in window, send pkt
timeout(n):
  resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
  mark pkt n as received
  if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
  ACK(n)
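The receiver rules can be sketched as a single handler. `SRReceiver` is a hypothetical class; sequence numbers are plain integers that never wrap, and returning `None` for packets outside both ranges (ignore) is an assumption about the case the transcript does not show.

```python
class SRReceiver:
    """Selective-repeat receiver with window size N."""

    def __init__(self, window_size: int):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}         # out-of-order packets waiting for delivery
        self.delivered = []      # data handed to the upper layer, in order

    def on_packet(self, n: int, data):
        """Return the ACK number sent for packet n, or None if it is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver the in-order prefix and slide the window forward.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n             # already delivered: re-ACK so the sender can advance
        return None


r = SRReceiver(4)
print(r.on_packet(1, "b"), r.delivered)   # 1 []
print(r.on_packet(0, "a"), r.delivered)   # 0 ['a', 'b']
```

Re-ACKing old packets matters: if the first ACK was lost, the sender's window cannot move until it sees one.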
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
header layout (each row 32 bits):
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
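The fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch: the function name `parse_tcp_header` and the dictionary keys are assumptions, and options (any header bytes beyond the first 20) are not decoded.

```python
import struct

FLAG_BITS = (("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
             ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01))


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,   # head len is counted in 32-bit words
        "flags": {name: bool(offset_flags & bit) for name, bit in FLAG_BITS},
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Example: a hand-built SYNACK header (head len = 5 words, SYN and ACK set).
raw = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 4096, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr["header_len"], hdr["flags"]["SYN"], hdr["flags"]["ACK"])   # 20 True True
```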
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, - up to implementor
figure: outgoing segment from sender and incoming segment to sender (fields: source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer)
figure: sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C': Host B sends Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C': Host A sends Seq=43, ACK=80
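The numbers in the telnet scenario follow one rule: the reply's sequence number is the peer's ACK number, and the reply's ACK number is the peer's sequence number plus the bytes of data received. A minimal sketch, with `make_segment` and `respond` as hypothetical helpers:

```python
def make_segment(seq: int, ack: int, data: bytes = b"") -> dict:
    return {"seq": seq, "ack": ack, "data": data}


def respond(received: dict, data: bytes = b"") -> dict:
    """Build the reply to a received segment."""
    return make_segment(
        seq=received["ack"],                             # the byte the peer expects next
        ack=received["seq"] + len(received["data"]),     # next byte we expect from the peer
        data=data,
    )


a1 = make_segment(42, 79, b"C")      # Host A: user types 'C'
b1 = respond(a1, b"C")               # Host B: ACKs 'C', echoes it back
a2 = respond(b1)                     # Host A: ACKs the echoed 'C'
print(b1["seq"], b1["ack"])          # 79 43
print(a2["seq"], a2["ack"])          # 43 80
```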
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds), 100 to 350, versus time (seconds); curves: SampleRTT, EstimatedRTT
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
  (estimated RTT plus "safety margin")
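The three formulas fit in one small class. Two details are assumptions not fixed by the slides and borrowed from RFC 6298: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate. `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, following the slide formulas."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial deviation (RFC 6298 style)

    def update(self, sample_rtt: float) -> None:
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())   # 102.5 42.5 272.5
```

With α = 0.125, a single 120 ms sample moves the 100 ms estimate by only 2.5 ms, which is the smoothing the slide asks for.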
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
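The three sender events can be sketched as methods on one object. This follows the simplified sender described above (no duplicate-ACK handling, no flow or congestion control); the class `SimpleTcpSender`, the `wire` log standing in for "pass segment to IP", and the boolean timer are assumptions.

```python
class SimpleTcpSender:
    """Simplified TCP sender: one timer, cumulative ACKs, retransmit on timeout."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq     # oldest unACKed byte
        self.unacked = {}                # seq # -> data of segments not yet ACKed
        self.timer_running = False
        self.wire = []                   # (seq, data) tuples handed to IP

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq_num] = data
        self.wire.append((self.next_seq_num, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True    # timer covers the oldest unACKed segment

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)          # segment with the smallest seq #
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True        # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:           # ACK covers previously unACKed bytes
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


s = SimpleTcpSender(initial_seq=92)
s.on_app_data(b"12345678")      # Seq=92, 8 bytes
s.on_app_data(b"x" * 20)        # Seq=100, 20 bytes
s.on_ack(100)
s.on_timeout()
print(s.wire[-1][0])            # 100: only the oldest unACKed segment is resent
```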
TCP ACK generation
receiver event: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
receiver event: arrival of in-order segment with expected seq #. One other segment has ACK pending
  receiver action: immediately send single cumulative ACK, ACKing both in-order segments
receiver event: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
receiver event: arrival of segment that partially or completely fills gap
  receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs.
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
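The trigger condition is just a counter over arriving ACK numbers. In this sketch the first ACK for a value is the original and each repeat counts as one duplicate, so the retransmit fires on the third repeat; `fast_retransmit_triggers` is a hypothetical helper.

```python
def fast_retransmit_triggers(acks, dup_threshold: int = 3) -> list:
    """Return the ACK numbers for which a fast retransmit would be sent.

    acks: ACK numbers in the order they arrive at the sender.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:     # triple duplicate ACK
                triggers.append(ack)           # resend segment starting at this byte
        else:
            last_ack = ack
            dup_count = 0
    return triggers


# One original ACK=100 followed by three duplicates triggers a resend of Seq=100.
print(fast_retransmit_triggers([100, 100, 100, 100]))   # [100]
```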
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion controlApproaches towards congestion control
two broad approaches towards congestion controltwo broad approaches towards congestion control
end end congestion network assisted end-end congestion control
no explicit feedback
network-assisted congestion control
routers provide no explicit feedback from network
congestion inferred f d
routers provide feedback to end systems single bit indicating
(SNA from end-system observed loss delay
approach taken by
congestion (SNA DECbit TCPIP ECN ATM) approach taken by
TCP)
explicit rate for sender to send at
Transport Layer 3-92
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
    segment structure
    reliable data transfer
    flow control
    connection management
3.6 principles of congestion control
3.7 TCP congestion control
TCP Congestion Control: details
sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    rate ≈ cwnd/RTT bytes/sec
[figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"), spanning cwnd; last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A and Host B exchanging segments one RTT apart, along a time axis]
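The doubling described above follows from adding one MSS per ACK: every segment sent in a round is ACKed, so the window grows by as many MSS as it held. A minimal sketch, assuming no loss, one ACK per segment, and an illustrative MSS of 1460 bytes (the slide gives no value):

```python
MSS = 1460  # bytes; assumed value for illustration only


def slow_start_rounds(rounds, mss=MSS):
    """Return cwnd (in bytes) at the start of each RTT round of slow start."""
    cwnd = mss                      # initially cwnd = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd // mss          # one ACK per segment sent this round
        for _ in range(acks):
            cwnd += mss             # increment cwnd by 1 MSS for every ACK
    return history


print([c // MSS for c in slow_start_rounds(5)])  # [1, 2, 4, 8, 16]
```

The per-ACK increment is what the sender actually does; the per-RTT doubling is only its visible effect.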
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
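The two loss reactions can be written as one small function. This is a sketch of the slide's simplified rules only, with window sizes in units of MSS; the function name, the event labels and the 1-MSS floor on ssthresh are assumptions made for illustration, and real Reno additionally inflates the window during fast recovery.

```python
def on_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh), in MSS units, after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = max(cwnd / 2, 1)     # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh          # restart from 1 MSS, slow start to ssthresh
    return ssthresh, ssthresh       # Reno on 3 dup ACKs: cut cwnd in half
```

For example, `on_loss(16, "3dupacks")` halves the window to 8 MSS, while `on_loss(16, "timeout")` drops it to 1 MSS with ssthresh set to 8.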
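Summary: TCP Congestion Control
[figure: FSM with slow start, congestion avoidance and fast recovery states; recoverable transition labels:]
    slow start, new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
    slow start to congestion avoidance: cwnd > ssthresh
    congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
    fast recovery, duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed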
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
[plot: cwnd (TCP sender congestion window size) versus time; AIMD saw tooth behavior: probing for bandwidth; additively increase window size ... until loss occurs (then cut window in half)]
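The saw tooth can be reproduced with a few lines. This is a toy model, not a TCP implementation: it assumes a loss occurs exactly when the window reaches a fixed size `loss_at` (a hypothetical parameter standing in for the path capacity) and counts cwnd in whole MSS.

```python
def aimd(rounds, loss_at, cwnd=1):
    """Return cwnd (in MSS) for each RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = cwnd // 2        # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1               # additive increase: 1 MSS per RTT
    return trace


print(aimd(10, loss_at=8, cwnd=4))  # [4, 5, 6, 7, 8, 4, 5, 6, 7, 8]
```

The window oscillates between `loss_at / 2` and `loss_at`, which is the shape the plot describes.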
TCP throughput
avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) * W/RTT bytes/sec
[plot: window size oscillating between W/2 and W]
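The 3/4 factor comes from averaging the saw tooth: under the stated assumptions (slow start ignored, always data to send) the window climbs linearly from W/2 to W and then halves, so its time average is the midpoint of those two values. Written out, with \bar{w} as an illustrative symbol for the average window:

```latex
\bar{w} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\bar{w}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
```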
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate!
new versions of TCP for high-speed
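Both numbers on this slide can be checked by direct arithmetic. The sketch below assumes MSS is expressed in bits so that throughput comes out in bits per second; the function names are illustrative.

```python
import math

MSS_BITS = 1500 * 8     # 1500-byte segments
RTT = 0.100             # 100 ms
TARGET_BPS = 10e9       # 10 Gbps


def window_segments(rate_bps, rtt, mss_bits):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt / mss_bits


def mathis_throughput(loss, rtt, mss_bits):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bits / (rtt * math.sqrt(loss))


def required_loss(rate_bps, rtt, mss_bits):
    """Invert the formula above for the loss probability L."""
    return (1.22 * mss_bits / (rtt * rate_bps)) ** 2


print(int(window_segments(TARGET_BPS, RTT, MSS_BITS)))      # 83333
print(f"{required_loss(TARGET_BPS, RTT, MSS_BITS):.1e}")    # 2.1e-10
```

The loss rate of about 2.1 * 10^-10 is what the slide rounds to 2 * 10^-10.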
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
[plot: throughput of the two connections against each other (Connection 1 throughput on one axis, up to R), with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Fairness (more)
Fairness and UDP
    multimedia apps often do not use TCP
        do not want rate throttled by congestion control
    instead use UDP:
        send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
        new app asks for 1 TCP, gets rate R/10
        new app asks for 11 TCPs, gets R/2
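The parallel-connection example is a per-connection share calculation. The sketch assumes the idealized fairness goal holds exactly, so every connection on the link gets the same rate; `app_share` is a hypothetical helper name.

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R an app gets by opening new_conns connections
    alongside `existing` connections, assuming equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

With 11 of the 20 connections, the app receives 11/20 of R, which the slide rounds to R/2.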
Chapter 3 summary
principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
instantiation, implementation in the Internet
    UDP
    TCP
next:
    leaving the network "edge" (application, transport layers)
    into the network "core"
rdt2.0: FSM specification
sender
    state "Wait for call from above":
        rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
    state "Wait for ACK or NAK":
        rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
        rdt_rcv(rcvpkt) && isACK(rcvpkt): return to "Wait for call from above"
receiver
    rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
    sender can have up to N unacked packets in pipeline
    receiver only sends cumulative ack
        doesn't ack packet if there's a gap
    sender has timer for oldest unacked packet
        when timer expires, retransmit all unacked packets
Selective Repeat:
    sender can have up to N unack'ed packets in pipeline
    rcvr sends individual ack for each packet
    sender maintains timer for each unacked packet
        when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
    may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
out-of-order pkt:
    discard (don't buffer): no receiver buffering!
    re-ACK pkt with highest in-order seq #
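The Go-Back-N receiver rules are compact enough to express directly. In this minimal sketch the only protocol state is `expectedseqnum`; the class name, the `delivered` list standing in for the upper layer, and the absence of sequence-number wraparound are simplifying assumptions.

```python
class GBNReceiver:
    """Go-Back-N receiver: no buffering, cumulative ACKs only."""

    def __init__(self):
        self.expectedseqnum = 0
        self.delivered = []         # stand-in for delivery to the upper layer

    def receive(self, seq, data, corrupt=False):
        """Handle one arriving packet; return the seq # to ACK, or None."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)
            self.expectedseqnum += 1
        # Out-of-order or corrupt packets are discarded, not buffered.
        last_in_order = self.expectedseqnum - 1
        return last_in_order if last_in_order >= 0 else None


rcvr = GBNReceiver()
acks = [rcvr.receive(s, f"p{s}") for s in [0, 1, 3, 4, 2]]
print(acks)  # [0, 1, 1, 1, 2]
```

Packets 3 and 4 arrive before 2, so each is dropped and triggers a duplicate ACK for 1; the sender must later resend them.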
Selective repeat
receiver individually acknowledges all correctly received pkts
    buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
    sender timer for each unACKed pkt
sender window
    N consecutive seq #'s
    limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[figure: sender and receiver views of their sequence-number windows]
Selective repeat
sender
data from above:
    if next available seq # in window, send pkt
timeout(n):
    resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]:
    ACK(n)
TCP: overview
point-to-point:
    one sender, one receiver
reliable, in-order byte stream:
    no "message boundaries"
pipelined:
    TCP congestion and flow control set window size
full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
    sender will not overwhelm receiver
TCP segment structure
[figure: segment layout, 32 bits wide, top to bottom:]
    source port #, dest port #
    sequence number
    acknowledgement number
    head len, not used, flag bits U A P R S F, receive window
    checksum, Urg data pointer
    options (variable length)
    application data (variable length)
annotations:
    URG: urgent data (generally not used)
    ACK: ACK # valid
    PSH: push data now (generally not used)
    RST, SYN, FIN: connection estab (setup, teardown commands)
    sequence and acknowledgement numbers: counting by bytes of data (not segments!)
    receive window: # bytes rcvr willing to accept
    checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
    byte stream "number" of first byte in segment's data
acknowledgements:
    seq # of next byte expected from other side
    cumulative ACK
Q: how receiver handles out-of-order segments
    A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
    User types 'C'
    A -> B: Seq=42, ACK=79, data = 'C'
    host ACKs receipt of 'C', echoes back 'C'
    B -> A: Seq=79, ACK=43, data = 'C'
    host ACKs receipt of echoed 'C'
    A -> B: Seq=43, ACK=80
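The numbers in the telnet exchange follow mechanically from two rules: a segment's sequence number is the byte the peer said it expects next, and its ACK number is the peer's sequence number plus the bytes just received. A small sketch, using dictionaries as stand-ins for segments (`reply` is a hypothetical helper, not part of any TCP API):

```python
def reply(seg, payload=b""):
    """Build the segment that answers `seg`."""
    return {
        "seq": seg["ack"],                      # what the peer expects next
        "ack": seg["seq"] + len(seg["data"]),   # next byte expected from peer
        "data": payload,
    }


a1 = {"seq": 42, "ack": 79, "data": b"C"}   # Host A: user types 'C'
b1 = reply(a1, b"C")                        # Host B: ACKs 'C', echoes it back
a2 = reply(b1)                              # Host A: ACKs the echoed 'C'
print(b1["seq"], b1["ack"])   # 79 43
print(a2["seq"], a2["ack"])   # 43 80
```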
TCP round trip time, timeout
Q: how to set TCP timeout value?
    longer than RTT
        but RTT varies
    too short: premature timeout, unnecessary retransmissions
    too long: slow reaction to segment loss
Q: how to estimate RTT?
    SampleRTT: measured time from segment transmission until ACK receipt
        ignore retransmissions
    SampleRTT will vary, want estimated RTT "smoother"
        average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
    exponential weighted moving average
    influence of past sample decreases exponentially fast
    typical value: α = 0.125
[plot: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds), 100 to 350, versus time (seconds); series: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
    (EstimatedRTT: estimated RTT; 4 * DevRTT: "safety margin")
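The three formulas combine into a small estimator. The slides do not say how to initialize the averages or in which order to update them; this sketch assumes the RFC 6298 conventions (first sample seeds EstimatedRTT, DevRTT starts at half of it, and DevRTT is updated using the previous EstimatedRTT). The class name is illustrative.

```python
class RttEstimator:
    """EWMA-based RTT and timeout estimation (times in any consistent unit)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample     # assumed initialization
        self.dev_rtt = first_sample / 2       # assumed initialization

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never taken from a retransmitted segment)."""
        # DevRTT first, against the previous EstimatedRTT (assumed ordering).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())  # 102.5 42.5 272.5
```

A single 120 ms sample moves the estimate only from 100 to 102.5 ms, which is the smoothing the slide asks for.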
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
retransmissions triggered by:
    timeout events
    duplicate acks
let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
TCP sender events:
data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running
        think of timer as for oldest unacked segment
        expiration interval: TimeOutInterval
timeout:
    retransmit segment that caused timeout
    restart timer
ack rcvd:
    if ack acknowledges previously unacked segments
        update what is known to be ACKed
        start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
    create segment, seq #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
    TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq #. Gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
    long delay before resending lost packet
detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
        likely that unacked segment lost, so don't wait for timeout
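The duplicate-ACK rule needs only a counter on the sender side. This sketch assumes the common convention that the first ACK for a byte counts as new and the three that follow it count as duplicates; the class name and fields are illustrative.

```python
class DupAckDetector:
    """Tracks cumulative ACKs and signals when fast retransmit should fire."""

    def __init__(self, send_base=0):
        self.send_base = send_base    # seq # of oldest unacked byte
        self.dup_count = 0

    def on_ack(self, ack):
        """Process one ACK; return the seq # to resend now, or None."""
        if ack > self.send_base:      # ACK covers new data
            self.send_base = ack
            self.dup_count = 0
            return None
        if ack == self.send_base:     # duplicate of the last cumulative ACK
            self.dup_count += 1
            if self.dup_count == 3:
                return self.send_base   # resend segment with smallest seq #
        return None


# Seq=92 (8 bytes) arrives, Seq=100 is lost, later segments trigger ACK=100 again.
det = DupAckDetector(send_base=92)
print([det.on_ack(100) for _ in range(4)])  # [None, None, None, 100]
```

The retransmission is triggered on the third duplicate, well before a typical timeout would expire.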
TCP fast retransmit
[figure: fast retransmit after sender receipt of triple duplicate ACK]
    Host A -> Host B: Seq=92, 8 bytes of data
    Host A -> Host B: Seq=100, 20 bytes of data (lost, marked X)
    Host B -> Host A: ACK=100, repeated; the duplicates arrive before the timeout expires
    Host A -> Host B: Seq=100, 20 bytes of data (retransmitted)
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process; TCP socket receiver buffers; OS with TCP code and IP code; segments arriving from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to application process]
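Both sides of flow control reduce to a subtraction. The sketch below uses the textbook's variable names (LastByteRcvd, LastByteRead, LastByteSent, LastByteAcked), which do not appear on these slides and are assumed here, and it ignores the congestion window.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: free buffer space to advertise in the TCP header."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the app
    return rcv_buffer - buffered


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """Sender side: bytes that may still be sent without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                   # 2096
print(sendable_bytes(rwnd, 5000, 4000))       # 1096
```

Because the sender never lets in-flight data exceed the advertised value, the receive buffer cannot overflow.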
Connection Management
before exchanging data, sender/receiver "handshake":
    agree to establish connection (each knowing the other willing to establish connection)
    agree on connection parameters
[figure: on each side, application above network, holding:]
    connection state: ESTAB
    connection variables:
        seq # client-to-server, server-to-client
        rcvBuffer size at server, client
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg
    SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
    SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
    ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
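The field values of the three handshake segments depend only on the two initial sequence numbers. The sketch below builds them as plain dictionaries; the function name and key names are illustrative, and no real sockets are involved.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for client ISN x and server ISN y."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(x=1000, y=5000):
    print(segment)
```

Each side proves it received the other's SYN by acknowledging the initial sequence number plus one.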
TCP 3-way handshake: FSM
[figure: FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB]
    closed -> listen: Socket connectionSocket = welcomeSocket.accept();
    closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
    listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
    SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
    SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
    send TCP segment with FIN bit = 1
respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
client: clientSocket.close(); sends FINbit=1, seq=x; FIN_WAIT_1: can no longer send but can receive data
server: sends ACKbit=1, ACKnum=x+1; CLOSE_WAIT: can still send data
client: FIN_WAIT_2: wait for server close
server: sends FINbit=1, seq=y; LAST_ACK: can no longer send data
client: sends ACKbit=1, ACKnum=y+1; TIMED_WAIT: timed wait for 2 * max segment lifetime, then CLOSED
server: CLOSED
Principles of congestion control
congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control!
    manifestations:
        lost packets (buffer overflow at routers)
        long delays (queueing in router buffers)
    a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B; original data: λin; throughput: λout; unlimited shared output link buffers]
[plots: λout versus λin, levelling off at R/2; delay versus λin, growing as λin approaches R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λin, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
    application-layer input = application-layer output: λin = λout
    transport-layer input includes retransmissions: λin' >= λin
[figure: Host A, Host B; λin: original data; λin': original data, plus retransmitted data; λout; finite shared output link buffers]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
    sender sends only when router buffers available
[plot: λout versus λin, both axes up to R/2]
[figure: Host A (keeps copy), Host B; λin: original data; λin': original data, plus retransmitted data; λout; finite shared output link buffers with free buffer space]
Causes/costs of congestion: scenario 2
Idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
[figure: Host A (keeps copy), Host B; λin: original data; λin': original data, plus retransmitted data; λout; no buffer space at the router]
Causes/costs of congestion: scenario 2
Idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
[plot: λout versus λin, both axes up to R/2; when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)]
[figure: Host A, Host B; λin: original data; λin': original data, plus retransmitted data; λout; free buffer space at the router]
rdt2.0: operation with no errors
[figure: the rdt2.0 FSM with the error-free path; recoverable labels:]
    sender, "Wait for call from above": rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
    sender, "Wait for ACK or NAK": rdt_rcv(rcvpkt) && isACK(rcvpkt): return to "Wait for call from above"
    (not taken when there are no errors) rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
TCP segment structure
fields, top to bottom (each row 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
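To make the field layout concrete, here is a small sketch that decodes the fixed 20-byte part of a header with Python's standard struct module. The function name and the returned dictionary keys are hypothetical; options are skipped rather than parsed, and the checksum is not verified.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header; everything after the header is data."""
    src, dst, seq, ack, off_flags, rwnd, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20])          # network byte order, 20 bytes total
    header_len = (off_flags >> 12) * 4      # head len is counted in 32-bit words
    flag_bits = off_flags & 0x3F            # low six bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 upward
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flag_bits & (1 << i)],
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg,
        "data": segment[header_len:],
    }
```

For a header without options the head len field is 5, so the data starts at byte 20.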
TCP seq. numbers, ACKs
  sequence numbers:
    byte stream "number" of first byte in segment's data
  acknowledgements:
    seq # of next byte expected from other side
    cumulative ACK
  Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  Host A: user types 'C'
    A to B: Seq=42, ACK=79, data = 'C'
  Host B: host ACKs receipt of 'C', echoes back 'C'
    B to A: Seq=79, ACK=43, data = 'C'
  Host A: host ACKs receipt of echoed 'C'
    A to B: Seq=43, ACK=80
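The numbers in the telnet exchange follow from two rules: a reply's sequence number is the ACK number it just received, and its ACK number is the received sequence number plus the number of data bytes received. A minimal sketch, assuming in-order arrival and no loss (the function name is hypothetical):

```python
def reply_numbers(rcvd_seq, rcvd_ack, rcvd_data_len):
    """Return (seq, ack) for the segment sent in reply to a received segment."""
    seq = rcvd_ack                   # next byte the peer expects from us
    ack = rcvd_seq + rcvd_data_len   # next byte we expect from the peer
    return seq, ack

# Host B replies to A's segment (Seq=42, ACK=79, one byte 'C'):
b_seq, b_ack = reply_numbers(42, 79, 1)
# Host A replies to B's echo (one byte 'C'):
a_seq, a_ack = reply_numbers(b_seq, b_ack, 1)
```

With these inputs Host B sends Seq=79, ACK=43 and Host A answers with Seq=43, ACK=80, matching the scenario.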
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) vs. time (seconds); curves: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
    (typically, $\beta = 0.25$)
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
    (estimated RTT plus "safety margin")
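The two moving averages and the timeout formula can be combined into a small estimator. This is a sketch, not a full retransmission-timer implementation. Two details are assumptions borrowed from RFC 6298 because the slides do not specify them: DevRTT is initialized to half the first sample, and DevRTT is updated before EstimatedRTT (so it uses the previous estimate). The class name is hypothetical.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, following the slide formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: initial value as in RFC 6298

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never from a retransmitted segment); return the timeout."""
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms, a second sample of 120 ms moves EstimatedRTT to 102.5 ms and DevRTT to 42.5 ms, for a TimeoutInterval of 272.5 ms.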
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
  event: data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
      start timer
TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
    TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment, higher-than-expect seq #. Gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
fast retransmit after sender receipt of triple duplicate ACK (Host A, Host B):
  A to B: Seq=92, 8 bytes of data
  A to B: Seq=100, 20 bytes of data (lost, marked X)
  B to A: ACK=100, repeated as duplicate ACKs while A's timeout is still running
  A to B: Seq=100, 20 bytes of data (retransmitted before the timeout expires)
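The duplicate-ACK rule amounts to a small counter on the sender. A minimal sketch (class and method names are hypothetical; timers, the retransmission itself, and congestion-window changes are left out):

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals when the third duplicate arrives."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit on the third duplicate ACK, else None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                # The ACK number names the first missing byte, i.e. the
                # unacked segment with the smallest sequence number.
                return ack_num
        else:
            self.last_ack = ack_num   # new cumulative ACK: reset the counter
            self.dup_count = 0
        return None
```

In the scenario above, the first ACK=100 is an ordinary ACK for the 8 bytes at Seq=92; the next three are duplicates, and the third of them triggers the resend of Seq=100.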
TCP flow control
  application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process; OS containing TCP socket receiver buffers, TCP code, IP code; segments arriving from sender]
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to application process]
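The bookkeeping behind rwnd can be written as two one-line formulas. This is a sketch under assumptions: the byte counters LastByteRcvd and LastByteRead on the receiver are names used here for illustration (they do not appear on these slides), and both functions are hypothetical helpers.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: free buffer space to advertise as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the application
    return rcv_buffer - buffered

def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """Sender side: how much more data may be put in flight without overflowing the receiver."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)

# Receiver with the typical 4096-byte buffer holding 2000 unread bytes:
rwnd = advertised_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
# Sender with 1000 bytes already in flight:
room = sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000)
```

Because the sender never lets in-flight data exceed rwnd, the unread data plus anything still arriving always fits in RcvBuffer.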
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: application and network layers at both ends; each end holds connection state: ESTAB and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
  client state: LISTEN, SYNSENT, ESTAB
  server state: LISTEN, SYN RCVD, ESTAB
  1. client: choose init seq num, x; send TCP SYN msg
       client to server: SYNbit=1, Seq=x
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
       server to client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
       client to server: ACKbit=1, ACKnum=y+1
  4. server: received ACK(y) indicates client is live
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
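The sequence and ACK numbers exchanged in the handshake can be generated mechanically. A minimal sketch: the function name and the dictionary representation of a segment are hypothetical, and the wraparound at 2^32 reflects the 32-bit sequence number field rather than anything stated on these slides.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit values

def three_way_handshake(x, y):
    """Return the three handshake segments for initial seq numbers x (client) and y (server)."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    synack = {"from": "server", "SYN": 1, "seq": y,
              "ACK": 1, "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"from": "client", "ACK": 1,
           "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]

segments = three_way_handshake(x=1000, y=5000)
```

Each side acknowledges the other's initial sequence number plus one, which is how the SYN "consumes" one sequence number even though it carries no data.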
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
  client state ESTAB, server state ESTAB
  client: clientSocket.close(); enters FIN_WAIT_1 (can no longer send but can receive data)
    client to server: FINbit=1, seq=x
  server: enters CLOSE_WAIT (can still send data)
    server to client: ACKbit=1, ACKnum=x+1
  client: enters FIN_WAIT_2 (wait for server close)
  server: enters LAST_ACK (can no longer send data)
    server to client: FINbit=1, seq=y
  client: enters TIMED_WAIT
    client to server: ACKbit=1, ACKnum=y+1
  server: CLOSED
  client: timed wait for 2*max segment lifetime, then CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  maximum per-connection throughput: R/2
  large delays as arrival rate, $\lambda_{in}$, approaches capacity
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput $\lambda_{out}$; plots of $\lambda_{out}$ vs. $\lambda_{in}$ and of delay vs. $\lambda_{in}$, both with $\lambda_{in}$ up to R/2]
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
    transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
[Figure: plot of $\lambda_{out}$ vs. $\lambda_{in}$, both up to R/2; Host A keeps a copy of each packet and sends when free buffer space is available; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; finite shared output link buffers]
Causes/costs of congestion: scenario 2
  idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both up to R/2; Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once free buffer space is available]
Causes/costs of congestion: scenario 2
  realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt
[Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both up to R/2; Host A times out and resends its copy although free buffer space was available]
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
  A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to C/2]
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at
TCP Congestion Control: details
  sender limits transmission:
    $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    $\text{rate} \approx \text{cwnd}/\text{RTT}$ bytes/sec
[Figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; the in-flight span is bounded by cwnd]
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A, Host B, time axis; the number of segments sent per RTT doubles each round]
TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
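The slow start, congestion avoidance, and loss reactions described above can be collected into one small state holder. This is a simplified sketch, not a faithful TCP Reno implementation: the class name is hypothetical, the window is counted in MSS units, the initial ssthresh of 64 is an arbitrary illustrative value, and the fast-recovery window inflation that real Reno performs is left out (the window is simply cut in half, as the slide states).

```python
MSS = 1   # assumption: count the window in segments to keep the numbers readable

class RenoWindow:
    """Tracks cwnd and ssthresh across ACKs and loss events (simplified)."""

    def __init__(self, ssthresh=64):
        self.cwnd = 1 * MSS          # slow start begins at 1 MSS
        self.ssthresh = ssthresh

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                     # slow start: doubles cwnd each RTT
        else:
            self.cwnd += MSS * MSS / self.cwnd   # congestion avoidance: about 1 MSS per RTT

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2   # half of cwnd just before the loss event
        self.cwnd = 1 * MSS             # restart from slow start

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh       # Reno: cut in half, then grow linearly
```

A Tahoe-style sender would call on_timeout for both kinds of loss event, since it always falls back to a window of 1 MSS.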
Summary: TCP Congestion Control
[Figure: FSM; only these labels survive in the transcript]
  new ACK: cwnd = cwnd+MSS; dupACKcount = 0; transmit new segment(s), as allowed
  cwnd > ssthresh
  new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth; cwnd: TCP sender congestion window size vs. time; additively increase window size ... until loss occurs (then cut window in half)]
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  $\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{\text{RTT}}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    $\text{TCP throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\sqrt{L}}$
  to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$: a very small loss rate!
  new versions of TCP for high-speed
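The figures on this slide can be checked numerically. The sketch below assumes the MSS is the full 1500-byte segment and that throughput is expressed in bits per second; the function names are hypothetical.

```python
import math

def window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)

def mathis_throughput(mss_bytes, rtt_s, loss):
    """Throughput in bits/sec for loss probability `loss`: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))

def required_loss(mss_bytes, rtt_s, target_bps):
    """Loss probability that yields target_bps, obtained by solving the formula for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

w = window_segments(10e9, 0.100, 1500)    # about 83,333 segments
loss = required_loss(1500, 0.100, 10e9)   # about 2.1e-10
```

Both values agree with the slide: roughly 83,333 segments in flight, and a tolerable loss rate on the order of 2·10^-10, about one lost segment in five billion.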
TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
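The parallel-connection example is a matter of counting connections, under the idealized assumption that every TCP connection on the bottleneck gets an equal share. A minimal sketch (the function name is hypothetical):

```python
from fractions import Fraction

def app_share(existing_connections, new_app_connections):
    """Fraction of link rate R obtained by an app, assuming equal per-connection shares."""
    total = existing_connections + new_app_connections
    return Fraction(new_app_connections, total)

one_conn = app_share(9, 1)      # Fraction(1, 10): rate R/10
many_conns = app_share(9, 11)   # Fraction(11, 20): slightly more than R/2
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.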
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
rdt2.0: error scenario
sender:
  state "Wait for call from above":
    rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
  state "Wait for ACK or NAK":
    rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt): return to "Wait for call from above"
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data is passed to the application process.]

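A small sketch of the bookkeeping behind rwnd. The slide only says that rwnd is the free buffer space and that the sender keeps in-flight data within it; the variable names `last_byte_rcvd` and `last_byte_read` are assumptions borrowed from the usual textbook presentation, and the helper functions are hypothetical.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver would advertise as rwnd (in bytes)."""
    buffered = last_byte_rcvd - last_byte_read  # data waiting for the application
    return rcv_buffer - buffered


def sender_may_send(nbytes, last_byte_sent, last_byte_acked, rwnd):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                         # 2096
print(sender_may_send(1000, 5000, 4000, rwnd))      # True  (2000 bytes in flight)
print(sender_may_send(1200, 5000, 4000, rwnd))      # False (2200 bytes in flight)
```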
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each sit between application and network and hold: connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
Socket clientSocket = new Socket("hostname", "port number");
Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN, SYNSENT, ESTAB
server state: LISTEN, SYN RCVD, ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live

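The field values exchanged in the handshake follow mechanically from the two initial sequence numbers. The sketch below only computes those header fields; the `Segment` dataclass and `three_way_handshake` function are hypothetical names, and no real sockets are involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    syn_bit: int = 0
    ack_bit: int = 0
    seq: Optional[int] = None
    ack_num: Optional[int] = None


def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x (client), y (server)."""
    syn = Segment(syn_bit=1, seq=x)
    # The server acknowledges the SYN by asking for byte x+1 next.
    synack = Segment(syn_bit=1, ack_bit=1, seq=y, ack_num=syn.seq + 1)
    # The client acknowledges the SYNACK by asking for byte y+1 next.
    ack = Segment(ack_bit=1, ack_num=synack.seq + 1)
    return syn, synack, ack


syn, synack, ack = three_way_handshake(x=1000, y=5000)
print(synack.ack_num, ack.ack_num)  # 1001 5001
```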
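TCP 3-way handshake: FSM
[Figure: FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB.
 server: closed to listen on Socket connectionSocket = welcomeSocket.accept(); listen to SYN rcvd on receiving SYN(x), sending SYNACK(seq=y, ACKnum=x+1) and creating a new socket for communication back to client; SYN rcvd to ESTAB on receiving ACK(ACKnum=y+1).
 client: closed to SYN sent on Socket clientSocket = new Socket("hostname", "port number"), sending SYN(seq=x); SYN sent to ESTAB on receiving SYNACK(seq=y, ACKnum=x+1), sending ACK(ACKnum=y+1).]
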
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection
client state: ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIMED_WAIT, CLOSED
server state: ESTAB, CLOSE_WAIT, LAST_ACK, CLOSED
client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
server: enters CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out.]
[Plots: λ_out against λ_in, levelling off at R/2; delay against λ_in, growing steeply as λ_in nears R/2.]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; Host B receives λ_out.]

Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Plot: λ_out against λ_in, both axes marked R/2.]
[Figure: sender A keeps a copy of each packet and sends when there is free buffer space.]

Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Plot: λ_out against λ'_in, both axes marked R/2.]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Figure: a packet arriving when there is no buffer space is dropped; sender A resends its copy when there is free buffer space.]

Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Plot: λ_out against λ'_in, both axes marked R/2.]
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
[Figure: sender A times out and sends a second copy while the first is still queued; free buffer space.]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Host A, Host B, Host C, Host D; λ_in: original data; λ'_in: original data, plus retransmitted data; finite shared output link buffers; λ_out.]

Causes/costs of congestion: scenario 3
[Plot: λ_out against λ'_in, both axes marked C/2.]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
[Figure: sender sequence number space. A window of size cwnd runs from the last byte ACKed, across bytes sent but not-yet ACKed ("in-flight"), to the last byte sent.]

TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B timeline, one RTT per round.]

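To see why a per-ACK increment doubles the window each RTT, here is a minimal round-based sketch. It assumes one ACK per segment and no loss; the MSS value and the function name `slow_start` are illustrative choices, not taken from the slides.

```python
MSS = 1460  # bytes; an assumed value for illustration


def slow_start(rtts, mss=MSS):
    """Return cwnd (in bytes) at the start of each RTT during loss-free slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd // mss      # one ACK per segment sent this RTT
        for _ in range(acks_this_rtt):
            cwnd += mss                  # increment cwnd by 1 MSS for every ACK
        history.append(cwnd)
    return history


print([c // MSS for c in slow_start(4)])  # [1, 2, 4, 8, 16]
```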
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

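The two reactions to loss can be written as one small function. This is a sketch of the rules exactly as stated on the slide, with cwnd measured in MSS units; real Reno implementations also inflate the window during fast recovery, which is deliberately left out. The function name and event strings are assumptions.

```python
def on_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh), both in MSS units.

    event is "timeout" or "triple_dup_ack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd / 2                  # half of cwnd just before the loss event
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh               # restart from 1 MSS, slow start up to ssthresh
    return ssthresh, ssthresh            # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16, "timeout"))                   # (1, 8.0)
print(on_loss(16, "triple_dup_ack"))            # (8.0, 8.0)
print(on_loss(16, "triple_dup_ack", "tahoe"))   # (1, 8.0)
```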
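Summary: TCP Congestion Control
[Figure: congestion control FSM. Recoverable transitions: in slow start, a new ACK sets cwnd = cwnd + MSS, dupACKcount = 0, and transmits new segment(s) as allowed; when cwnd > ssthresh the sender moves to congestion avoidance, where a new ACK sets cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, and transmits new segment(s) as allowed; in fast recovery, a duplicate ACK sets cwnd = cwnd + MSS and transmits new segment(s) as allowed.]
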
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. Plot of cwnd (TCP sender congestion window size) against time: additively increase window size ... until loss occurs (then cut window in half).]

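The saw tooth falls out of a very small loop. The sketch below works in MSS units and uses an assumed, simplified loss model (a loss occurs whenever the window reaches a fixed `capacity`); the function name `aimd` is hypothetical.

```python
def aimd(rounds, capacity, cwnd=1):
    """Return cwnd (in MSS) at the start of each RTT under AIMD."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= capacity:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: cut window in half
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace


print(aimd(12, capacity=8, cwnd=4))
# [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```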
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Figure: window size oscillating between W/2 and W.]

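The 3/4 factor follows from the saw tooth: if the window grows linearly from W/2 (just after a loss) back up to W (the next loss), its time average is the midpoint of the two. A short derivation, assuming that idealised linear ramp (the symbol $\overline{w}$ for the average window is introduced here only for illustration):

```latex
\[
\overline{w} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} \;=\; \frac{\overline{w}}{RTT} \;=\; \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
\]
```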
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed

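The two numbers in this example can be checked directly. The sketch below plugs the slide's values into window = throughput * RTT / segment size and into the throughput formula solved for L; the variable names are illustrative.

```python
MSS_BITS = 1500 * 8      # 1500-byte segments, in bits
RTT = 0.100              # 100 ms, in seconds
TARGET_BPS = 10e9        # 10 Gbps

# Segments that must be in flight to fill the pipe for one RTT.
window_segments = TARGET_BPS * RTT / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L.
loss = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(round(window_segments))   # 83333
print(f"{loss:.2e}")            # 2.14e-10
```

The result is about 2 * 10^-10, matching the loss rate quoted on the slide.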
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with Connection 1 throughput on the horizontal axis (up to R) and the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line.]

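The convergence argument can be checked numerically. In the sketch below both sessions add 1 unit per round and both halve when their combined rate exceeds the link capacity; this synchronised-loss model and the function name `compete` are simplifying assumptions for illustration. Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward an equal share.

```python
def compete(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: additive increase
    return x1, x2


x1, x2 = compete(70.0, 10.0, capacity=100, rounds=500)
print(abs(x1 - x2) < 0.01)  # True: the initial gap of 60 has almost vanished
```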
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets more than R/2 (11 of 20 connections)

Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

rdt2.0 has a fatal flaw!
what happens if ACK/NAK corrupted?
  sender doesn't know what happened at receiver!
handling duplicates:
  sender retransmits current pkt if ACK/NAK corrupted

Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet

Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window

Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #

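A compact way to see Go-Back-N's cost is to log every transmission. The sketch below is round-based rather than event-driven: the sender transmits its whole window, then moves its base to whatever the receiver has accepted in order, which stands in for cumulative ACKs plus a timeout. The channel model (each listed packet is dropped once, ACKs are never lost) and the name `go_back_n` are assumptions for illustration.

```python
def go_back_n(num_pkts, window, lose_once):
    """Return the sequence numbers transmitted, in order, until all pkts are delivered."""
    base = 0                      # sender: oldest unacked seq #
    expected = 0                  # receiver: expectedseqnum
    to_drop = set(lose_once)      # each of these is lost on its first transmission
    sent_log = []
    while base < num_pkts:
        for seq in range(base, min(base + window, num_pkts)):
            sent_log.append(seq)
            if seq in to_drop:
                to_drop.discard(seq)      # lost in the channel this time only
                continue
            if seq == expected:
                expected += 1             # in-order: deliver and ACK
            # out-of-order: discard (no buffering); receiver re-ACKs expected - 1
        base = expected                   # cumulative ACK, or timeout resends from base
    return sent_log


print(go_back_n(num_pkts=5, window=3, lose_once=[1]))
# [0, 1, 2, 1, 2, 3, 4]
```

Packet 2 is sent twice even though it arrived intact the first time, because the receiver discarded it while waiting for packet 1.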
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence number space; details not recoverable from the extracted text.]

Selective repeat
sender
data from above:
  if next available seq # in window, send pkt
timeout(n):
  resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
  mark pkt n as received
  if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]
  ACK(n)

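The receiver side of selective repeat can be sketched as a small class. Sequence numbers are treated as unbounded integers here (no k-bit wraparound), and the class and method names are hypothetical. Packets below the window are re-ACKed, following the usual selective-repeat rule, so that a sender whose earlier ACK was lost can still advance.

```python
class SRReceiver:
    """Selective-repeat receiver: buffers out-of-order pkts, delivers in order."""

    def __init__(self, window):
        self.window = window
        self.rcv_base = 0
        self.buffer = {}        # seq -> data for pkts received out of order
        self.delivered = []     # data passed to the upper layer, in order

    def on_packet(self, seq, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seq < self.rcv_base + self.window:
            self.buffer[seq] = data
            # Deliver any run of consecutive pkts starting at rcv_base.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.window <= seq < self.rcv_base:
            return seq          # already delivered: re-ACK
        return None


rcvr = SRReceiver(window=4)
acks = [rcvr.on_packet(s, f"p{s}") for s in [0, 2, 3, 1, 1]]
print(acks)             # [0, 2, 3, 1, 1]
print(rcvr.delivered)   # ['p0', 'p1', 'p2', 'p3']
```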
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

TCP segment structure
[Figure: TCP segment layout, 32 bits wide; fields listed below in order.]
source port #, dest port #
sequence number: counting by bytes of data (not segments!)
acknowledgement number: counting by bytes of data (not segments!)
head len, not used, flag bits U A P R S F, receive window
  receive window: # bytes rcvr willing to accept
checksum, Urg data pointer
  checksum: Internet checksum (as in UDP)
options (variable length)
application data (variable length)
flag bits:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)

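The fixed 20-byte part of this layout maps directly onto Python's `struct` module. The sketch below packs and parses a header without options; the helper name `parse_tcp_header` and the sample field values are illustrative, and the checksum is left at zero rather than computed.

```python
import struct

# Network byte order: src port, dst port, seq #, ack #,
# (header length + unused bits + flags), receive window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")   # 20 bytes, no options

FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw):
    """Decode the first 20 bytes of a TCP segment into a dict."""
    src, dst, seq, ack, off_flags, window, checksum, urg = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # head len field counts 32-bit words
        "flags": {name for name, bit in FLAGS.items() if off_flags & bit},
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg,
    }


raw = TCP_HEADER.pack(12345, 80, 42, 79, (5 << 12) | FLAGS["ACK"] | FLAGS["PSH"], 4096, 0, 0)
hdr = parse_tcp_header(raw)
print(hdr["seq"], hdr["ack"], hdr["header_len"], sorted(hdr["flags"]))
# 42 79 20 ['ACK', 'PSH']
```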
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]

TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
  host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  host A ACKs receipt of echoed 'C': Seq=43, ACK=80

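The numbers in the telnet exchange follow from two counters per host: the next byte it will send and the next byte it expects. A minimal sketch, with a hypothetical `Endpoint` class and segments represented as `(seq, ack, data)` tuples:

```python
class Endpoint:
    def __init__(self, next_seq, next_ack):
        self.next_seq = next_seq   # seq # of the next byte this host will send
        self.next_ack = next_ack   # seq # of the next byte expected from the peer

    def send(self, data=b""):
        segment = (self.next_seq, self.next_ack, data)
        self.next_seq += len(data)         # seq #s count bytes, not segments
        return segment

    def receive(self, segment):
        seq, _ack, data = segment
        if seq == self.next_ack:           # in-order data: expect the byte after it
            self.next_ack += len(data)


host_a = Endpoint(next_seq=42, next_ack=79)
host_b = Endpoint(next_seq=79, next_ack=42)

s1 = host_a.send(b"C"); host_b.receive(s1)   # user types 'C'
s2 = host_b.send(b"C"); host_a.receive(s2)   # B ACKs and echoes 'C'
s3 = host_a.send();     host_b.receive(s3)   # A ACKs the echo
print(s1, s2, s3)
# (42, 79, b'C') (79, 43, b'C') (43, 80, b'')
```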
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$EstimatedRTT = (1-\alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: "RTT: gaia.cs.umass.edu to fantasia.eurecom.fr". RTT (milliseconds), axis from 100 to 350, against time (seconds); two curves: SampleRTT and EstimatedRTT.]

TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  (estimated RTT plus "safety margin")

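The three update rules fit in a small class. Two details are not specified on the slides and are assumptions here: the initial values (first sample for EstimatedRTT, half of it for DevRTT) and the update order (DevRTT is updated using the EstimatedRTT from before the new sample); both follow the convention in RFC 6298. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, yielding a timeout interval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial value

    def update(self, sample_rtt):
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)       # milliseconds
print(est.update(100.0))   # 250.0
print(est.update(200.0))   # 325.0
```

A single outlier sample of 200 ms moves EstimatedRTT only to 112.5 ms, but widens the safety margin enough to push the timeout to 325 ms.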
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

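The three sender events (data from the application, timeout, ACK received) can be sketched as methods on one class. This is a simplified model under stated assumptions: the timer is just a boolean, "passing a segment to IP" appends to a list called `wire`, ACKs are assumed to fall on segment boundaries, and duplicate ACKs, flow control and congestion control are ignored. All names are hypothetical.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq=0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked = {}             # seq # -> data for not-yet-ACKed segments
        self.timer_running = False
        self.wire = []                # (seq #, data) pairs handed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))             # pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True             # start timer

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)                   # oldest unacked segment
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True                 # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:              # ACKs previously unacked data
            self.send_base = ack_num
            for seq in [s for s in self.unacked if s < ack_num]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"12345678")        # Seq=92, 8 bytes
sender.on_app_data(b"x" * 20)          # Seq=100, 20 bytes
sender.on_timeout()                    # resends Seq=92 only
sender.on_ack(120)                     # cumulative ACK covers both segments
print([seq for seq, _ in sender.wire], sender.timer_running)
# [92, 100, 92] False
```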
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK

event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

event at receiver: arrival of out-of-order segment, higher-than-expect seq. #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte

event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit
if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout

Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-93
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space, with cwnd spanning the bytes from "last byte ACKed" to "last byte sent"; these are sent, not-yet ACKed ("in-flight")]
Transport Layer 3-94
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: segment exchange between Host A and Host B over successive RTTs; vertical axis: time]
Transport Layer 3-95
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer 3-97
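The loss reactions above can be sketched as a toy per-RTT model. This is a minimal sketch, not an implementation of any real TCP stack: the class name, the initial ssthresh of 64, counting the window in units of 1 MSS, and capping slow-start growth at ssthresh are all assumptions made for illustration.

```python
from dataclasses import dataclass

MSS = 1  # assumption: count the window in segments (1 unit = 1 MSS)


@dataclass
class CongestionWindow:
    """Toy per-RTT model of cwnd; `reno` selects Reno vs. Tahoe loss handling."""
    reno: bool = True
    cwnd: float = 1 * MSS
    ssthresh: float = 64 * MSS  # assumed initial threshold, not given on the slides

    def on_rtt_without_loss(self) -> None:
        if self.cwnd < self.ssthresh:
            # slow start: double every RTT (capped at the threshold, an assumption)
            self.cwnd = min(2 * self.cwnd, self.ssthresh)
        else:
            # congestion avoidance: grow linearly, 1 MSS per RTT
            self.cwnd += MSS

    def on_timeout(self) -> None:
        # ssthresh becomes half of cwnd just before the loss; restart from 1 MSS
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS

    def on_triple_dup_ack(self) -> None:
        self.ssthresh = self.cwnd / 2
        # Reno halves the window; Tahoe restarts from 1 MSS
        self.cwnd = self.ssthresh if self.reno else 1 * MSS
```

Calling `on_rtt_without_loss()` repeatedly and then one of the loss handlers reproduces the shapes described above: exponential growth to the threshold, linear growth after it, and either a halving (Reno) or a restart from 1 MSS (Tahoe, or any timeout).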
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM; recoverable transition labels:]
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
  additively increase window size ... until loss occurs (then cut window in half)
[Figure: cwnd (TCP sender congestion window size) versus time]
Transport Layer 3-99
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: window size oscillating between W/2 and W]
Transport Layer 3-100
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
Transport Layer 3-101
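The two numbers in the "long, fat pipes" example can be checked with a few lines of arithmetic. A minimal sketch; the function names are hypothetical, and the second function simply solves the [Mathis 1997] formula above for L, with MSS converted to bits.

```python
def window_in_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps for one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for_throughput(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


w = window_in_segments(10e9, 0.100, 1500)
loss = loss_rate_for_throughput(10e9, 0.100, 1500)
print(f"W = {w:.0f} segments, L = {loss:.2e}")  # W = 83333 segments, L = 2.14e-10
```

The result agrees with the slide: about 83,333 segments in flight, and a tolerable loss probability of roughly 2·10^-10.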
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with Connection 1 throughput on the horizontal axis, both axes from 0 to R, and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Transport Layer 3-103
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets roughly R/2 (11 of 20 connections)
Transport Layer 3-104
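The parallel-connection example is simple proportional sharing. A minimal sketch, assuming every TCP connection on the link ends up with an equal share; the function name is hypothetical.

```python
from fractions import Fraction


def bandwidth_share(existing: int, opened: int) -> Fraction:
    """Fraction of R one app gets when it opens `opened` connections
    next to `existing` others, assuming equal per-connection shares."""
    return Fraction(opened, existing + opened)


print(bandwidth_share(9, 1))   # 1/10
print(bandwidth_share(9, 11))  # 11/20
```

With 11 connections the app holds 11 of the 20 connections on the link, slightly more than half of R.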
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
rdt2.1: sender, handles garbled ACK/NAKs
[Figure: sender FSM with four states; transitions listed per state]
Wait for call 0 from above:
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK 0
Wait for ACK or NAK 0:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): go to Wait for call 1 from above
Wait for call 1 from above:
  rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK 1
Wait for ACK or NAK 1:
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): go to Wait for call 0 from above
Transport Layer 3-33
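The four-state sender FSM can be sketched in code by folding the two "wait for call" states and the two "wait for ACK or NAK" states into a sequence bit plus an outstanding-packet slot. This is an illustrative sketch only: the `Reply` type, the tuple packet layout, and the stand-in `checksum` function are assumptions, not part of rdt2.1 itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Packet = Tuple[int, bytes, int]  # (seq, data, checksum)


def checksum(seq: int, data: bytes) -> int:
    """Stand-in checksum (hypothetical): byte sum folded to 16 bits."""
    return (seq + sum(data)) & 0xFFFF


@dataclass
class Reply:
    is_ack: bool   # True for ACK, False for NAK
    corrupt: bool  # True if the reply itself arrived garbled


class Rdt21Sender:
    def __init__(self) -> None:
        self.seq = 0                          # sequence bit for the next new packet
        self.sndpkt: Optional[Packet] = None  # packet awaiting ACK, if any

    def rdt_send(self, data: bytes) -> Packet:
        if self.sndpkt is not None:
            raise RuntimeError("still waiting for ACK or NAK")
        self.sndpkt = (self.seq, data, checksum(self.seq, data))
        return self.sndpkt                    # caller passes this to udt_send

    def rdt_rcv(self, reply: Reply) -> Optional[Packet]:
        if self.sndpkt is None:
            return None                       # nothing outstanding: ignore
        if reply.corrupt or not reply.is_ack:
            return self.sndpkt                # garbled reply or NAK: retransmit
        self.sndpkt = None                    # clean ACK: wait for the next call
        self.seq ^= 1                         # alternate between 0 and 1
        return None
```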
rdt2.1: receiver, handles garbled ACK/NAKs
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-47
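The sender rules above can be sketched as window bookkeeping. A minimal sketch under stated assumptions: sequence numbers are unbounded integers here (a real header has k bits, so numbers wrap modulo 2^k), the timer is reduced to a boolean flag, and the class and method names are hypothetical.

```python
from typing import List


class GbnSender:
    """Go-Back-N sender bookkeeping with a single timer flag."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.base = 0            # oldest unACKed sequence number
        self.nextseqnum = 0      # next sequence number to assign
        self.timer_running = False

    def send(self) -> bool:
        """Try to send one new packet; refuse when the window is full."""
        if self.nextseqnum >= self.base + self.window:
            return False
        if self.base == self.nextseqnum:
            self.timer_running = True   # first in-flight packet starts the timer
        self.nextseqnum += 1
        return True

    def on_ack(self, n: int) -> None:
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n < self.base:
            return                      # duplicate ACK, nothing new
        self.base = min(n + 1, self.nextseqnum)
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self) -> List[int]:
        """Sequence numbers to retransmit: every packet still in flight."""
        self.timer_running = self.base != self.nextseqnum
        return list(range(self.base, self.nextseqnum))
```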
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Transport Layer 3-50
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
sender
data from above:
  if next available seq # in window, send pkt
timeout(n):
  resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
  mark pkt n as received
  if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
  ACK(n)
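The receiver side of selective repeat can be sketched as follows. This is an illustrative sketch: sequence numbers are unbounded integers (no wraparound), packets outside both ranges are silently ignored, and the class name and return convention are assumptions.

```python
from typing import Dict, List, Optional, Tuple


class SrReceiver:
    def __init__(self, window: int) -> None:
        self.window = window
        self.rcvbase = 0
        self.buffer: Dict[int, bytes] = {}  # out-of-order packets awaiting delivery

    def on_packet(self, n: int, data: bytes) -> Tuple[Optional[int], List[bytes]]:
        """Return (sequence number to ACK or None, data delivered in order)."""
        delivered: List[bytes] = []
        if self.rcvbase <= n < self.rcvbase + self.window:
            self.buffer.setdefault(n, data)
            # deliver the in-order run starting at rcvbase, advancing the window
            while self.rcvbase in self.buffer:
                delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n, delivered
        if self.rcvbase - self.window <= n < self.rcvbase:
            return n, delivered             # already delivered: re-ACK only
        return None, delivered              # outside both ranges: ignore
```

Re-ACKing packets below rcvbase matters because the sender's window only advances once it sees the ACK; if the original ACK was lost, the receiver must send it again.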
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
[Figure: TCP segment, 32 bits wide, fields from top to bottom]
source port #, dest port #
sequence number
acknowledgement number
head len, not used, flag bits U A P R S F, receive window
checksum, Urg data pointer
options (variable length)
application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
Transport Layer 3-57
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
Transport Layer 3-58
TCP seq. numbers, ACKs
simple telnet scenario
[Figure: segment exchange between Host A and Host B]
User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
host ACKs receipt of 'C', echoes back 'C': Host B sends Seq=79, ACK=43, data = 'C'
host ACKs receipt of echoed 'C': Host A sends Seq=43, ACK=80
Transport Layer 3-59
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time, timeout
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: "RTT: gaia.cs.umass.edu to fantasia.eurecom.fr"; RTT (milliseconds), 100 to 350, versus time (seconds); series: SampleRTT, EstimatedRTT]
Transport Layer 3-61
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
(typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  EstimatedRTT: estimated RTT; 4·DevRTT: "safety margin"
Transport Layer 3-62
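The estimator and timeout formulas can be sketched in a few lines. A minimal sketch: the class name is hypothetical, and two details are assumptions borrowed from RFC 6298 rather than taken from the slides, namely the initialization on the first sample (EstimatedRTT = SampleRTT, DevRTT = SampleRTT/2) and updating DevRTT before EstimatedRTT.

```python
from typing import Optional


class RttEstimator:
    ALPHA = 0.125  # weight of the newest sample in EstimatedRTT
    BETA = 0.25    # weight of the newest deviation in DevRTT

    def __init__(self) -> None:
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt: Optional[float] = None

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (seconds) and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # first sample: initialization per RFC 6298 (assumption, not on the slides)
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # deviation is measured against the estimate from before this sample
            self.dev_rtt = ((1 - self.BETA) * self.dev_rtt
                            + self.BETA * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                                  + self.ALPHA * sample_rtt)
        return self.estimated_rtt + 4 * self.dev_rtt
```

For example, samples of 100 ms and then 200 ms give EstimatedRTT = 112.5 ms and DevRTT = 62.5 ms, so the timeout interval becomes 362.5 ms.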
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
Transport Layer 3-64
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)

event: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event: arrival of in-order segment with expected seq #. One other segment has ACK pending
  action: immediately send single cumulative ACK, ACKing both in-order segments
event: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  action: immediately send duplicate ACK, indicating seq. # of next expected byte
event: arrival of segment that partially or completely fills gap
  action: immediate send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
TCP fast retransmit
[Figure: segment exchange between Host A and Host B, with Host A's timeout interval marked]
Host A sends Seq=92, 8 bytes of data
Host A sends Seq=100, 20 bytes of data (lost, marked X)
Host B replies with repeated ACK=100
Host A resends Seq=100, 20 bytes of data
fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71
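Duplicate-ACK detection at the sender can be sketched as a small counter. A minimal sketch; the class name is hypothetical, and it assumes the common textbook convention that the trigger is the third duplicate, i.e. the third ACK repeating an acknowledgement number that has already been seen once.

```python
from typing import Optional


class DupAckDetector:
    def __init__(self, send_base: int) -> None:
        self.send_base = send_base  # oldest unACKed byte
        self.dup_acks = 0           # duplicates seen for send_base

    def on_ack(self, ack: int) -> Optional[int]:
        """Return the sequence number to fast-retransmit, or None."""
        if ack > self.send_base:
            self.send_base = ack    # new data acknowledged
            self.dup_acks = 0
            return None
        if ack == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                return self.send_base  # resend segment with smallest unACKed seq #
        return None
```

In the scenario above, the first ACK=100 acknowledges the segment with Seq=92; the third further ACK=100 makes `on_ack` return 100, the segment to resend.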
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: segments from sender pass through IP code and TCP code into the TCP socket receiver buffers (in the OS), then to the application process]
Transport Layer 3-73
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process]
Transport Layer 3-74
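The bookkeeping behind rwnd can be sketched as follows. A minimal sketch: the class and function names are hypothetical, and the relation rwnd = RcvBuffer - (LastByteRcvd - LastByteRead) is the standard textbook formulation rather than something spelled out on these slides.

```python
class ReceiveBuffer:
    def __init__(self, rcv_buffer: int = 4096) -> None:
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0
        self.last_byte_read = 0

    @property
    def rwnd(self) -> int:
        """Free buffer space, advertised to the sender in each segment."""
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def on_segment(self, nbytes: int) -> None:
        if nbytes > self.rwnd:
            raise OverflowError("sender exceeded advertised window")
        self.last_byte_rcvd += nbytes

    def app_read(self, nbytes: int) -> int:
        """Application removes up to nbytes from the buffer."""
        n = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += n
        return n


def sendable(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still put in flight without overflowing the receiver."""
    return max(0, rwnd - (last_byte_sent - last_byte_acked))
```

As long as the sender keeps its in-flight data within the last advertised rwnd, `on_segment` never raises, which is the guarantee stated above.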
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server applications, each above the network, each holding:]
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
Socket clientSocket = newSocket("hostname","port number");
Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
[Figure: client and server state timelines]
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x)
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1)
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data
server: received ACK(y) indicates client is live
Transport Layer 3-77
TCP 3-way handshake: FSM
[Figure: handshake FSM; transitions listed per side]
server:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
Transport Layer 3-78
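The three segments of the handshake and the checks each side performs can be sketched with plain dictionaries. An illustrative sketch only: the function names and the dictionary layout are hypothetical, and 32-bit sequence number wraparound is ignored.

```python
from typing import Dict

Segment = Dict[str, int]


def client_syn(x: int) -> Segment:
    """Client picks initial sequence number x and sends SYN (enters SYNSENT)."""
    return {"SYNbit": 1, "Seq": x}


def server_synack(syn: Segment, y: int) -> Segment:
    """Server picks y and ACKs the SYN with ACKnum = x + 1 (enters SYN RCVD)."""
    if syn.get("SYNbit") != 1:
        raise ValueError("not a SYN segment")
    return {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}


def client_ack(x: int, synack: Segment) -> Segment:
    """Client checks that the SYNACK answers its own SYN, then ACKs (enters ESTAB)."""
    if synack.get("ACKbit") != 1 or synack.get("ACKnum") != x + 1:
        raise ValueError("SYNACK does not acknowledge our SYN")
    return {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}


def server_accepts(y: int, ack: Segment) -> bool:
    """Server enters ESTAB only if the ACK covers its own SYN (ACKnum = y + 1)."""
    return ack.get("ACKbit") == 1 and ack.get("ACKnum") == y + 1
```

Each side learns that the other is live only by seeing its own initial sequence number, plus one, echoed back in ACKnum.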
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
[Figure: client and server state timelines; both start in ESTAB]
client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1; can no longer send but can receive data
server: replies ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT; can still send data
client: enters FIN_WAIT_2; wait for server close
server: sends FINbit=1, seq=y; enters LAST_ACK; can no longer send data
client: replies ACKbit=1, ACKnum=y+1; enters TIMED_WAIT; timed wait for 2*max segment lifetime, then CLOSED
server: enters CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput: λ_out]
[Figure: λ_out versus λ_in, levelling off at R/2; delay versus λ_in, growing as λ_in approaches R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Transport Layer 3-83
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A sends into finite shared output link buffers toward Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: λ_out versus λ_in, both axes up to R/2]
[Figure: Host A keeps a copy and sends into finite shared output link buffers with free buffer space, toward Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; no buffer space at the router toward Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Transport Layer 3-86
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Figure: λ_out versus λ_in, both axes up to R/2]
[Figure: Host A sends into buffers with free buffer space, toward Host B; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
[Figure: λ_out versus λ_in, both axes up to R/2]
[Figure: Host A keeps a copy and retransmits on timeout; router has free buffer space, toward Host B; λ_in, λ'_in, λ_out]
Transport Layer 3-88
rdt21 receiver handles garbled ACKNAKs
rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has seq0(rcvpkt)
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

TCP segment structure
header rows, each 32 bits wide:
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)

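To make the field layout concrete, the sketch below unpacks the fixed 20-byte part of a TCP header with Python's standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative choices, not part of the slides; options are skipped rather than decoded.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header; options are skipped, not parsed."""
    if len(segment) < 20:
        raise ValueError("segment shorter than the fixed TCP header")
    # Network byte order: two 16-bit ports, two 32-bit numbers, four 16-bit fields.
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # "head len" counts 32-bit words
    flags = offset_flags & 0x3F             # low six bits: U A P R S F
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "header_len": header_len,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
        "data": segment[header_len:],       # application data after the header
    }
```

Because the header length is carried in the segment itself, the payload offset is computed from "head len" rather than assumed to be 20.
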
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N, divided into: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]

TCP seq. numbers, ACKs: simple telnet scenario
Host A: user types 'C'
A -> B: Seq=42, ACK=79, data = 'C'
Host B: host ACKs receipt of 'C', echoes back 'C'
B -> A: Seq=79, ACK=43, data = 'C'
Host A: host ACKs receipt of echoed 'C'
A -> B: Seq=43, ACK=80

TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) vs. time (seconds); curves: SampleRTT, EstimatedRTT]

TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$ (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  (estimated RTT plus "safety margin")

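The two moving averages and the timeout formula can be combined into a small estimator. This is a sketch under assumptions the slides do not state: the first sample seeds EstimatedRTT with DevRTT starting at 0, and DevRTT is updated before EstimatedRTT on each new sample. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA-based RTT estimator using the slide's alpha and beta weights."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None  # no sample seen yet
        self.dev_rtt = 0.0

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumption: seed the estimate with the first sample.
            self.estimated_rtt = sample_rtt
        else:
            # Assumption: deviation uses the estimate from before this sample.
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt
```

With samples of 100 ms then 200 ms, the estimate moves only to 112.5 ms while the deviation term jumps to 25 ms, so the timeout widens to 212.5 ms: a single outlier mostly enlarges the safety margin rather than the estimate.
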
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

TCP sender (simplified)
event: data received from application above
  create segment, seq #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

TCP ACK generation
event at receiver -> TCP receiver action
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap
  -> immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit: Host A, Host B
A -> B: Seq=92, 8 bytes of data
A -> B: Seq=100, 20 bytes of data (lost, marked X)
B -> A: ACK=100, then duplicate ACK=100s
A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK

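A sender-side duplicate ACK counter is enough to illustrate the trigger. This sketch assumes the common reading of "triple duplicate ACKs": three further ACKs carrying the same number after the first one. The class name `FastRetransmitDetector` and its return convention are hypothetical.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, dup_threshold=3):
        self.dup_threshold = dup_threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit now, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.dup_threshold:
                # ACK number = next byte expected = start of the missing segment.
                return ack_num
        else:
            # A different ACK number resets the duplicate counter.
            self.last_ack = ack_num
            self.dup_count = 0
        return None
```

In the scenario above, the first ACK=100 is an ordinary ACK; the third duplicate after it triggers retransmission of the segment starting at byte 100. Later duplicates do not trigger a second retransmission in this sketch.
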
TCP flow control
application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process above the OS; inside the OS, TCP socket receiver buffers, TCP code, IP code; segments arrive from sender]

TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads fill RcvBuffer, which holds buffered data (on its way to application process) and free buffer space (rwnd)]

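The bookkeeping behind rwnd can be sketched with two small functions. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked` are assumptions borrowed from common textbook notation rather than from this slide, and both function names are hypothetical.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises, in bytes."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the app
    return rcv_buffer - buffered

def sender_usable_window(last_byte_sent, last_byte_acked, rwnd):
    """Bytes the sender may still transmit without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked  # unacked data
    return max(0, rwnd - in_flight)
```

For example, with a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may then send 1096 more.
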
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: client and server, each with application above network, each holding connection state: ESTAB and connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
   client -> server: SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
   server -> client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
   client -> server: ACKbit=1, ACKnum=y+1
4. server: received ACK(y) indicates client is live

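The numbering in the three handshake segments follows mechanically from the two initial sequence numbers. The sketch below only builds the three segments as dictionaries to show that relationship; the function name and the dictionary keys are illustrative, and no real sockets are involved.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"SYN": 1, "seq": x}
    # Server acknowledges the SYN by asking for byte x+1 next.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    # Client acknowledges the server's SYN the same way.
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]
```

Calling `three_way_handshake(100, 300)` gives ACK numbers 101 and 301: each SYN consumes one sequence number even though it carries no data.
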
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)

TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection
client state / server state: both start in ESTAB
1. client calls clientSocket.close(); client -> server: FINbit=1, seq=x
   client enters FIN_WAIT_1: can no longer send but can receive data
2. server -> client: ACKbit=1, ACKnum=x+1
   server enters CLOSE_WAIT: can still send data
   client enters FIN_WAIT_2: wait for server close
3. server -> client: FINbit=1, seq=y
   server enters LAST_ACK: can no longer send data
4. client -> server: ACKbit=1, ACKnum=y+1
   client enters TIMED_WAIT: timed wait for 2 * max segment lifetime, then CLOSED
   server enters CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput λ_out; plots of λ_out vs. λ_in (levels off at R/2) and delay vs. λ_in (grows as λ_in approaches R/2)]

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in ≥ λ_in
[figure: Host A offers λ_in (original data) and λ'_in (original data, plus retransmitted data) to finite shared output link buffers; Host B receives λ_out]

Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[figure: plot of λ_out vs. λ_in, both axes marked R/2; Host A keeps a copy of each packet and sends when there is free buffer space in the finite shared output link buffers]

Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[figure: plot of λ_out vs. λ_in, both axes marked R/2; Host A keeps a copy of each packet and resends it when the router has no buffer space and drops the original]

Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[figure: plot of λ_out vs. λ_in, both axes marked R/2; Host A times out and sends a second copy while the router still has free buffer space]

Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C, D connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3
[figure: plot of λ_out vs. λ'_in, both axes marked C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked ≤ cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; cwnd spans the in-flight bytes]

TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A / Host B timeline; the number of segments sent doubles each RTT]

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

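The loss reactions and the slow-start to linear switch can be sketched per RTT, in units of MSS. This is a simplified model of the rules as stated on the slides, not a full Reno implementation: fast recovery is omitted, growth is applied once per RTT rather than per ACK, and capping slow-start growth at ssthresh is an assumption. Both function names are hypothetical.

```python
def next_cwnd(cwnd, ssthresh):
    """cwnd (in MSS) one loss-free RTT later."""
    if cwnd < ssthresh:
        # Slow start: double, but do not overshoot the threshold (assumption).
        return min(cwnd * 2, ssthresh)
    # Congestion avoidance: additive increase of 1 MSS per RTT.
    return cwnd + 1

def react_to_loss(cwnd, event, variant="reno"):
    """Return (new cwnd, new ssthresh) after a 'timeout' or '3dupack' event."""
    if event not in ("timeout", "3dupack"):
        raise ValueError("event must be 'timeout' or '3dupack'")
    ssthresh = cwnd / 2  # half of cwnd just before the loss event
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh      # back to 1 MSS, then slow start
    return ssthresh, ssthresh   # Reno on 3 dup ACKs: cwnd cut in half
```

Starting from cwnd = 12, a triple duplicate ACK leaves Reno at cwnd = 6 growing linearly, while Tahoe (or any timeout) restarts from 1 and climbs exponentially back to 6 first.
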
Summary: TCP Congestion Control
[FSM fragment; only these transitions survive in the transcript]
slow start, new ACK:
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK:
  cwnd = cwnd + MSS · (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
duplicate ACK (fast recovery):
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed

TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[figure: cwnd (TCP sender congestion window size) vs. time: additively increase window size … until loss occurs (then cut window in half)]

TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{\text{RTT}}$ bytes/sec
[figure: window size sawtooth between W/2 and W]

TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \dfrac{1.22\cdot\text{MSS}}{\text{RTT}\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot 10^{-10}$: a very small loss rate!
new versions of TCP for high-speed

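Both numbers in the example can be checked by rearranging the formulas. The sketch below assumes throughput in bits per second, RTT in seconds, and MSS in bytes; the function names are hypothetical.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def mathis_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss probability L solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

w = window_segments(10e9, 0.1, 1500)         # about 83,333 segments
loss = mathis_loss_rate(10e9, 0.1, 1500)     # about 2.1e-10
```

The required loss rate falls with the square of the target throughput, which is why a tenfold faster pipe needs a hundredfold lower loss rate.
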
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: phase plot of the two connections' throughputs, axis "Connection 1 throughput" from 0 to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

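The parallel-connection example is a one-line calculation if every TCP connection gets an equal share of the bottleneck, which is the idealized assumption here. The function name `app_share` is hypothetical.

```python
from fractions import Fraction

def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming every connection gets an equal share of the bottleneck."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, `app_share(9, 1)` is 1/10 and `app_share(9, 11)` is 11/20, which the slide rounds to R/2.
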
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

rdt2.1: discussion
sender:
  seq # added to pkt
  two seq. #'s (0,1) will suffice. Why?
  must check if received ACK/NAK corrupted
  twice as many states
    state must "remember" whether "expected" pkt should have seq # of 0 or 1
receiver:
  must check if received packet is duplicate
    state indicates whether 0 or 1 is expected pkt seq #
  note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol
same functionality as rdt2.1, using ACKs only
instead of NAK, receiver sends ACK for last pkt received OK
  receiver must explicitly include seq # of pkt being ACKed
duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments
rdt_send(data)
  sndpkt = make_pkt(0, data, checksum)
  udt_send(sndpkt)
Wait for
rdt_rcv(rcvpkt) &&

Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet

Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
(figure: receiver protocol stack. Segments arrive from sender, pass through IP code and TCP code in the OS into the TCP socket receiver buffers, and are read by the application process.)
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
(figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data on its way to the application process plus free buffer space, rwnd.)
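A minimal sketch of the bookkeeping behind rwnd, assuming the usual textbook variables `LastByteRcvd` and `LastByteRead` (these names do not appear on the slide). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReceiverBuffer:
    """Receive-side bookkeeping; field names are assumptions for illustration."""
    rcv_buffer: int = 4096      # RcvBuffer size in bytes (the slide's typical default)
    last_byte_rcvd: int = 0     # highest byte number placed in the buffer
    last_byte_read: int = 0     # highest byte number the application has read

    def rwnd(self) -> int:
        # Free space = buffer size minus bytes received but not yet read.
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    nbytes: int, rwnd: int) -> bool:
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rx = ReceiverBuffer()
rx.last_byte_rcvd = 3000        # 3000 bytes have arrived
rx.last_byte_read = 1000        # the application has consumed 1000 of them
print(rx.rwnd())                                       # 2096
print(sender_may_send(3000, 3000, 1460, rx.rwnd()))    # True
print(sender_may_send(3000, 3000, 2500, rx.rwnd()))    # False
```

With 2000 bytes still buffered, the receiver advertises 4096 - 2000 = 2096 bytes, so a 1460-byte segment fits and a 2500-byte burst does not.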
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
(figure: client and server each show an application above the network, holding
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client)
client: Socket clientSocket = new Socket("hostname", portNumber);
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state: ESTAB
server: received ACK(y) indicates client is live; server state: ESTAB
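The exchange above can be traced with a short simulation. It is only a sketch: segments are plain dictionaries, the function name `three_way_handshake` is hypothetical, and sequence numbers are assumed to wrap modulo 2^32.

```python
import random


def three_way_handshake(x=None, y=None):
    """Return the three segments exchanged and the final states.

    x and y are the client's and server's initial sequence numbers;
    they are chosen at random when not supplied.
    """
    x = random.randrange(2**32) if x is None else x
    y = random.randrange(2**32) if y is None else y

    # Step 1: client -> server, SYN carrying the client's initial seq number.
    syn = {"SYN": 1, "seq": x}
    client_state, server_state = "SYNSENT", "LISTEN"

    # Step 2: server -> client, SYNACK acknowledging x by asking for x+1.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": (syn["seq"] + 1) % 2**32}
    server_state = "SYN RCVD"

    # Step 3: client -> server, ACK acknowledging y by asking for y+1.
    assert synack["acknum"] == (x + 1) % 2**32   # tells the client the server is live
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % 2**32}
    client_state = "ESTAB"

    # Server receives the ACK: the client is live too.
    assert ack["acknum"] == (y + 1) % 2**32
    server_state = "ESTAB"
    return [syn, synack, ack], client_state, server_state


segments, client, server = three_way_handshake(x=100, y=300)
for seg in segments:
    print(seg)
print(client, server)   # ESTAB ESTAB
```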
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", portNumber); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close(); send FINbit=1, seq=x; client state: FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1; server state: CLOSE_WAIT (can still send data)
client: on ACK, client state: FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; server state: LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1; client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: on ACK, server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
(figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$)
(plots: $\lambda_{out}$ versus $\lambda_{in}$, and delay versus $\lambda_{in}$, with axes marked at R/2)
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
(figure: Host A sends $\lambda_{in}$: original data, and $\lambda'_{in}$: original data plus retransmitted data, into finite shared output link buffers; Host B receives $\lambda_{out}$)
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
(plot: $\lambda_{out}$ versus input rate, both axes marked at R/2)
(figure: Host A keeps a copy of each packet and sends into finite shared output link buffers that have free buffer space; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; Host B receives $\lambda_{out}$)
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
(plot: $\lambda_{out}$ versus input rate, both axes marked at R/2)
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
(figure: Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once free buffer space is available; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; Host B receives $\lambda_{out}$)
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
(plot: $\lambda_{out}$ versus input rate, both axes marked at R/2)
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
(figure: Host A times out and sends a second copy while the router still has free buffer space; Host B receives both)
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
(figure: Hosts A, B, C, D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$)
Causes/costs of congestion: scenario 3
(plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at C/2)
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
$$\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
$$\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}\ \text{bytes/sec}$$
(figure: sender sequence number space, showing last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd spans the in-flight bytes)
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
(figure: Host A and Host B timeline, one RTT marked, time running downward)
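A short sketch of why "one MSS per ACK" doubles cwnd each RTT. It assumes no loss, one ACK per segment, and an MSS of 1460 bytes; the value of `MSS` and the function name `slow_start_rounds` are illustrative assumptions.

```python
MSS = 1460  # bytes; an assumed, typical value


def slow_start_rounds(rounds: int, mss: int = MSS) -> list:
    """cwnd (bytes) at the start of each RTT when no loss occurs."""
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd // mss          # one ACK per full segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # +1 MSS for every ACK received
    return history


print([c // MSS for c in slow_start_rounds(5)])   # [1, 2, 4, 8, 16]
```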
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
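The two reactions can be written as one function. This is a sketch of the rules as stated on the slide, not of any particular TCP stack: the function name `on_loss`, the 1460-byte `MSS`, and the floor of one MSS on ssthresh are assumptions made for the example.

```python
MSS = 1460  # bytes; assumed value


def on_loss(cwnd: int, event: str, variant: str = "reno", mss: int = MSS):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown event: {event}")
    ssthresh = max(cwnd // 2, mss)   # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh         # restart from 1 MSS (slow start)
    return ssthresh, ssthresh        # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16 * MSS, "3dupacks", "reno"))    # (11680, 11680)
print(on_loss(16 * MSS, "3dupacks", "tahoe"))   # (1460, 11680)
print(on_loss(16 * MSS, "timeout", "reno"))     # (1460, 11680)
```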
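Summary: TCP Congestion Control
(figure: congestion-control FSM; only the following transition labels are recoverable)
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed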
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
(plot: cwnd, the TCP sender congestion window size, versus time; additively increase window size ... until loss occurs (then cut window in half))
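The saw tooth can be reproduced with a few lines. The loss model here is an assumption made purely for illustration: a loss is declared whenever cwnd, counted in MSS, exceeds a fixed `capacity`. The function name `aimd` is hypothetical.

```python
def aimd(rounds: int, capacity: int, start: int = 1) -> list:
    """cwnd in MSS at each RTT; loss is assumed whenever cwnd exceeds capacity."""
    cwnd, trace = start, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


print(aimd(12, capacity=8, start=6))
# [6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4, 5]
```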
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{\text{RTT}}\ \text{bytes/sec}$$
(plot: window size oscillating between W/2 and W)
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$$\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
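The two numbers in this example can be checked directly from the formula. The variable names below are chosen for the sketch; the inputs are the slide's own.

```python
MSS_BYTES = 1500          # segment size from the slide's example
RTT_S = 0.100             # 100 ms
TARGET_BPS = 10e9         # 10 Gbps

# Window needed so that W segments per RTT carry the target rate.
w_segments = TARGET_BPS * RTT_S / (MSS_BYTES * 8)

# Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L.
sqrt_l = 1.22 * MSS_BYTES * 8 / (RTT_S * TARGET_BPS)
loss = sqrt_l ** 2

print(round(w_segments))      # 83333
print(f"{loss:.2e}")          # 2.14e-10
```

The computed loss probability, about 2.1 * 10^-10, matches the slide's rounded figure of 2 * 10^-10.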
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
(figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R)
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
(plot: throughput plane with Connection 1 throughput on one axis, both axes marked at R, and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2")
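A small simulation of the argument above, under simplifying assumptions: both flows have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name `share_link` is hypothetical.

```python
def share_link(x1: float, x2: float, capacity: float, rounds: int):
    """Evolve two AIMD flows; both see loss when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase, slope 1
    return x1, x2


a, b = share_link(10.0, 70.0, capacity=100.0, rounds=500)
print(abs(a - b) < 1.0)   # True
```

Additive increase leaves the gap between the two flows unchanged, while every loss halves it, so the flows drift toward the equal-share line regardless of where they start.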
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
rdt2.2: a NAK-free protocol
same functionality as rdt2.1, using ACKs only
instead of NAK, receiver sends ACK for last pkt received OK
  receiver must explicitly include seq # of pkt being ACKed
duplicate ACK at sender results in same action as NAK: retransmit current pkt
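rdt2.2: sender, receiver fragments
(figure: sender FSM fragment, truncated in the source. Recoverable transition: on rdt_send(data), sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt). The next state name, starting "Wait for", and the next event, starting "rdt_rcv(rcvpkt) &&", are cut off.)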
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
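The sender rules above can be sketched as plain bookkeeping. This is a simplified illustration: there is no real network or timer, sequence numbers grow without the k-bit wraparound, and the class name `GBNSender` and its method names are hypothetical.

```python
class GBNSender:
    """Go-Back-N sender bookkeeping (no real network or timer)."""

    def __init__(self, window: int):
        self.N = window
        self.base = 0           # oldest unACKed seq number
        self.nextseq = 0        # next seq number to use
        self.sent = []          # log of transmitted seq numbers

    def send(self) -> bool:
        """Send one new packet if the window allows it."""
        if self.nextseq >= self.base + self.N:
            return False        # window full: refuse data from above
        self.sent.append(self.nextseq)
        self.nextseq += 1
        return True

    def on_ack(self, n: int) -> None:
        """Cumulative ACK: everything up to and including n is ACKed."""
        if self.base <= n < self.nextseq:
            self.base = n + 1   # duplicate or out-of-range ACKs are ignored

    def on_timeout(self) -> list:
        """Retransmit every packet that is still unACKed."""
        resent = list(range(self.base, self.nextseq))
        self.sent.extend(resent)
        return resent


s = GBNSender(window=4)
while s.send():
    pass                        # sends 0, 1, 2, 3, then the window is full
s.on_ack(1)                     # packets 0 and 1 are ACKed
print(s.on_timeout())           # [2, 3]
```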
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
(figure not recoverable from the source)
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
TCP: Overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
(figure: segment layout, 32 bits wide)
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
notes on the fields:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
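The layout above maps onto a 20-byte fixed header that can be decoded with Python's standard `struct` module. A minimal sketch: the function name `parse_tcp_header` and the dictionary keys are hypothetical, options are skipped rather than parsed, and the sample segment is built by hand rather than captured from a network.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src, dst, seq, ack, off_flags, rwnd,
     checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4          # head len is in 32-bit words
    flag_names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # low bit first
    flags = [n for i, n in enumerate(flag_names) if off_flags & (1 << i)]
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "header_len": header_len, "flags": flags, "rwnd": rwnd,
            "checksum": checksum, "urg_ptr": urg_ptr,
            "data": segment[header_len:]}


# Build a sample segment by hand (ACK flag set, 1 byte of data) and decode it.
raw = struct.pack("!HHIIHHHH", 23, 50000, 79, 43, (5 << 12) | 0x10,
                  4096, 0, 0) + b"C"
hdr = parse_tcp_header(raw)
print(hdr["seq"], hdr["ack"], hdr["flags"], hdr["header_len"], hdr["data"])
# 79 43 ['ACK'] 20 b'C'
```

The checksum is left at zero in the sample because computing it also needs the IP pseudo-header, which is outside this sketch.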
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
(figure: an outgoing segment from sender and an incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer)
(figure: sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable)
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C': Host B sends Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C': Host A sends Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
(plot: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds), 100 to 350, versus time (seconds); series: SampleRTT, EstimatedRTT)
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$$
(typically, $\beta = 0.25$)
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
  EstimatedRTT is the estimated RTT; 4*DevRTT is the "safety margin"
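The estimator can be sketched as a small class using the typical constants quoted on the slides. Two choices below are assumptions, since the slides do not fix them: the initial values (first sample, and half of it for the deviation) and the update order (DevRTT is updated using the EstimatedRTT from before the new sample). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA-based timeout calculation using the slides' typical constants."""

    ALPHA = 0.125
    BETA = 0.25

    def __init__(self, first_sample: float):
        # Initialisation is an assumption: start from the first sample.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample is applied.
        self.dev_rtt = ((1 - self.BETA) * self.dev_rtt
                        + self.BETA * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                              + self.ALPHA * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)     # milliseconds
print(est.update(120.0))                   # 272.5
```

After one 120 ms sample, EstimatedRTT moves from 100 to 102.5 ms and DevRTT from 50 to 42.5 ms, giving a timeout of 102.5 + 4 * 42.5 = 272.5 ms.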
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
(figure: sender FSM, truncated in the source; recoverable transition)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running) start timer
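The three sender events can be put together in a compact sketch. It follows the simplified sender only: no duplicate-ACK handling, no flow or congestion control, and the timer is a boolean flag instead of a real clock. The class name `SimpleTcpSender` and the `wire` log are hypothetical helpers for the example.

```python
class SimpleTcpSender:
    """Simplified sender: timeouts and cumulative ACKs only."""

    def __init__(self, initial_seq: int = 0):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq = initial_seq       # NextSeqNum
        self.unacked = {}                 # seq -> data for in-flight segments
        self.timer_running = False
        self.wire = []                    # (seq, data) pairs handed to "IP"

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True     # timer covers oldest unACKed segment

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)       # segment with smallest seq number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True     # restart timer

    def on_ack(self, acknum: int) -> None:
        if acknum > self.send_base:       # ACKs previously unACKed data
            self.send_base = acknum
            done = [s for s, d in self.unacked.items() if s + len(d) <= acknum]
            for seq in done:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


tx = SimpleTcpSender(initial_seq=92)
tx.on_app_data(b"12345678")               # Seq=92, 8 bytes
tx.on_app_data(b"x" * 20)                 # Seq=100, 20 bytes
tx.on_ack(100)                            # first segment ACKed
tx.on_timeout()                           # resend Seq=100
print([seq for seq, _ in tx.wire])        # [92, 100, 100]
```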
(table: event at receiver -> TCP receiver action)
arrival of in-order segment with expected seq #, all data up to expected seq # already ACKed -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #, one other segment has ACK pending -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment with higher-than-expected seq #, gap detected -> immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap -> immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \text{cwnd} / \text{RTT}$ bytes/sec
[figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd spans the in-flight bytes]

TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A and Host B exchanging one, then two, then four segments per RTT over time]
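
The doubling described for slow start can be sketched in a few lines. This is a minimal illustration, not TCP's actual implementation: the function name `slow_start_rounds` and the idea of stepping once per RTT are assumptions made for the example, and loss is ignored.

```python
def slow_start_rounds(mss: int, ssthresh: int) -> list[int]:
    """Return cwnd (bytes) at the start of each RTT while in slow start.

    Hypothetical helper: assumes every segment sent in an RTT is ACKed
    by the end of that RTT and that no loss occurs.
    """
    cwnd = mss                      # initially cwnd = 1 MSS
    history = [cwnd]
    while cwnd < ssthresh:
        acks = cwnd // mss          # one ACK per full segment sent this RTT
        cwnd += acks * mss          # +1 MSS per ACK, so cwnd doubles per RTT
        history.append(cwnd)
    return history


print(slow_start_rounds(mss=1460, ssthresh=16 * 1460))
```

Incrementing by one MSS per ACK is what produces the per-RTT doubling: a window of k segments yields k ACKs, and therefore k extra segments of window.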

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
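
The two loss reactions can be written as small state updates. The sketch below follows the simplified rules on the slide (halve on triple duplicate ACK for Reno, reset to 1 MSS on timeout or for Tahoe); it leaves out fast recovery's temporary window inflation, and the names `CongestionState`, `on_timeout` and `on_triple_dup_ack` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: int       # congestion window, bytes
    ssthresh: int   # slow start threshold, bytes
    mss: int        # maximum segment size, bytes


def on_timeout(s: CongestionState) -> None:
    """Timeout: remember half the window, restart from 1 MSS."""
    s.ssthresh = s.cwnd // 2
    s.cwnd = s.mss


def on_triple_dup_ack(s: CongestionState, flavor: str = "reno") -> None:
    """3 duplicate ACKs: Reno halves cwnd, Tahoe drops to 1 MSS."""
    s.ssthresh = s.cwnd // 2
    s.cwnd = s.mss if flavor == "tahoe" else s.ssthresh


state = CongestionState(cwnd=16000, ssthresh=64000, mss=1000)
on_triple_dup_ack(state)
print(state)
```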

Summary: TCP Congestion Control
[FSM figure; recoverable transitions:]
slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
cwnd > ssthresh: move from slow start to congestion avoidance
congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed

TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
  additively increase window size ... until loss occurs (then cut window in half)
[figure: cwnd (TCP sender congestion window size) against time]

TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[figure: window size oscillating between W/2 and W]

TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
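
The two numbers in this example can be checked directly. The sketch below simply rearranges the formulas on the slide; the function names are made up for the illustration, and 10 Gbps is taken as 10^10 bits per second.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain a throughput over one RTT."""
    return throughput_bps * rtt_s / (8 * mss_bytes)


def mathis_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * throughput_bps)) ** 2


W = window_segments(10e9, 0.100, 1500)
L = mathis_loss_rate(10e9, 0.100, 1500)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The loss rate comes out slightly above 2e-10, which the slide rounds to $2 \cdot 10^{-10}$.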

TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: Connection 1 throughput against Connection 2 throughput, both axes to R; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal bandwidth share line]
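
The convergence argument can be reproduced numerically. This is a toy model under strong assumptions that are not in the slides: both flows have the same RTT, see loss in the same round whenever their combined rate exceeds capacity, and rates are in arbitrary units. The name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Simulate two AIMD flows sharing one bottleneck, one step per RTT."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: multiplicative decrease halves both rates (and their gap)
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: additive increase keeps the gap unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = aimd_two_flows(10, 70, capacity=100, rounds=500)
print(f"gap after 500 rounds: {abs(a - b):.6f}")
```

Additive increase never changes the difference between the two rates, while every loss event halves it, so the rates drift toward the equal-share line.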

Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

rdt2.2: sender, receiver fragments
[FSM figure; recoverable fragments:]
rdt_send(data)
  sndpkt = make_pkt(0, data, checksum)
  udt_send(sndpkt)
rdt_rcv(rcvpkt) && ...

Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet

Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
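
A compact sketch of the sender rules above may help. It is only an illustration: the class name `GoBackNSender`, the `sent` log standing in for the channel, and the absence of a real timer are all assumptions, and sequence numbers are plain integers rather than k-bit values that wrap around.

```python
class GoBackNSender:
    def __init__(self, window_size: int):
        self.N = window_size
        self.base = 0        # oldest unACKed seq #
        self.next_seq = 0    # next seq # to assign
        self.sent = []       # log of transmitted seq #s (stand-in for the channel)

    def send(self) -> bool:
        """Send one new packet if the window is not full."""
        if self.next_seq >= self.base + self.N:
            return False     # window full: refuse data from above
        self.sent.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, n: int) -> None:
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n >= self.base:
            self.base = n + 1   # duplicate or old ACKs leave base unchanged

    def on_timeout(self) -> list[int]:
        """Retransmit every sent-but-unACKed packet, oldest first."""
        resend = list(range(self.base, self.next_seq))
        self.sent.extend(resend)
        return resend


s = GoBackNSender(window_size=4)
print([s.send() for _ in range(5)])   # fifth call is refused: window full
s.on_ack(1)
print(s.on_timeout())
```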

Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering
  re-ACK pkt with highest in-order seq #

Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[figure: sender and receiver views of the sequence number space]

Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]
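
The receiver side of selective repeat is mostly bookkeeping, as the sketch below suggests. The class name `SelectiveRepeatReceiver` and its fields are hypothetical. Re-ACKing packets in [rcvbase-N, rcvbase-1] and ignoring everything else follows the usual textbook description of the protocol, since the action for that case is cut off in the text above; sequence-number wraparound is not modeled.

```python
class SelectiveRepeatReceiver:
    def __init__(self, window_size: int):
        self.N = window_size
        self.rcv_base = 0      # next seq # expected in order
        self.buffer = {}       # out-of-order packets: seq # -> data
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, n: int, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= n < self.rcv_base + self.N:
            self.buffer[n] = data
            # deliver any run of in-order packets and slide the window
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return n
        if self.rcv_base - self.N <= n < self.rcv_base:
            return n           # already delivered: re-ACK so the sender can advance
        return None


r = SelectiveRepeatReceiver(window_size=4)
print(r.on_packet(1, "b"), r.delivered)   # buffered, nothing delivered yet
print(r.on_packet(0, "a"), r.delivered)   # gap filled, both delivered
```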

point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

TCP segment structure
[figure: segment layout, 32 bits wide]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
checksum: Internet checksum (as in UDP)
sequence and acknowledgement numbers: counting by bytes of data (not segments)
receive window: # bytes rcvr willing to accept
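
To make the field layout concrete, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. This is a sketch: the function name `parse_tcp_header` and the dictionary keys are invented for the example, options are not parsed, and the six flag bits are read in the order shown in the figure (URG, ACK, PSH, RST, SYN, FIN).

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags,
     rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # head len is in 32-bit words
        "URG": bool(off_flags & 0x20),
        "ACK": bool(off_flags & 0x10),
        "PSH": bool(off_flags & 0x08),
        "RST": bool(off_flags & 0x04),
        "SYN": bool(off_flags & 0x02),
        "FIN": bool(off_flags & 0x01),
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Build a SYNACK-style header by hand and decode it again.
raw = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 4096, 0, 0)
print(parse_tcp_header(raw))
```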

TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, checksum, rwnd, urg pointer; sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]

TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C'; Host A sends Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'; Host B sends Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'; Host A sends Seq=43, ACK=80

TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds); series: SampleRTT, EstimatedRTT]

TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
  (estimated RTT plus "safety margin")
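
The three formulas combine into a small running estimator. The sketch below is one possible arrangement, not the definitive one: the class name `RttEstimator` is hypothetical, the initial DevRTT of half the first sample and the choice to update DevRTT before EstimatedRTT are assumptions borrowed from common practice rather than from the slides.

```python
class RttEstimator:
    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2    # assumed starting value

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # deviation uses the estimate from before this sample (assumed ordering)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
print(est.update(200.0))
```

A single outlier sample moves EstimatedRTT only by one eighth of the difference, but it widens the safety margin noticeably through DevRTT.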

TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

TCP sender (simplified)
[FSM figure; recoverable transition:]
data received from application above:
  create segment, seq #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq #; gap detected
  receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  receiver action: immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit
[figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data (lost, marked X); Host B returns ACK=100 four times; before the timeout expires Host A resends Seq=100, 20 bytes of data]
fast retransmit after sender receipt of triple duplicate ACK
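
Counting duplicate ACKs takes only a couple of fields of sender state. In this sketch the name `FastRetransmitDetector` is hypothetical, and "triple duplicate" is read as three ACKs that repeat an earlier one, which matches the four ACK=100 arrows in the figure.

```python
class FastRetransmitDetector:
    def __init__(self):
        self.last_ack = None    # most recent ACK number seen
        self.dup_count = 0      # duplicates of last_ack seen so far

    def on_ack(self, ack_num: int):
        """Return the seq # to retransmit now, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                return ack_num  # third duplicate: resend segment starting here
        else:
            self.last_ack = ack_num
            self.dup_count = 0
        return None


d = FastRetransmitDetector()
print([d.on_ack(100) for _ in range(4)])
```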

TCP flow control
application may remove data from TCP socket buffers ...
... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process; TCP socket receiver buffers; OS; TCP code; IP code; segments arrive from sender]

TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
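
The bookkeeping behind rwnd fits in two small functions. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are assumptions for this sketch (they do not appear on the slide), as are the function names.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free receive-buffer space that the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # received, not yet read by the app
    return rcv_buffer - buffered


def sendable_bytes(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """How much more the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```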

Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: client and server each hold, between application and network:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
Socket clientSocket = newSocket("hostname","port number");
Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
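
The sequence and acknowledgement numbers exchanged in the handshake can be traced with a small function. This only models the header fields listed above; the function name `three_way_handshake` and the dictionary representation of a segment are assumptions for the example.

```python
def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the three handshake segments for initial seq numbers x and y."""
    syn = {"SYN": 1, "seq": x}                                   # client -> server
    synack = {"SYN": 1, "ACK": 1, "seq": y,
              "acknum": syn["seq"] + 1}                          # server -> client
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}                # client -> server
    return [syn, synack, ack]


for segment in three_way_handshake(x=1000, y=5000):
    print(segment)
```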

TCP 3-way handshake: FSM
[FSM figure; states: closed, listen, SYN rcvd, SYN sent, ESTAB]
server: Socket connectionSocket = welcomeSocket.accept(); closed -> listen
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client: Socket clientSocket = newSocket("hostname","port number"); closed -> SYN sent: send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)

TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection
[figure: client and server both start in ESTAB]
client: clientSocket.close(); sends FINbit=1, seq=x
  client state FIN_WAIT_1: can no longer send but can receive data
server: sends ACKbit=1, ACKnum=x+1
  server state CLOSE_WAIT: can still send data
  client state FIN_WAIT_2: wait for server close
server: sends FINbit=1, seq=y
  server state LAST_ACK: can no longer send data
client: sends ACKbit=1, ACKnum=y+1
  client state TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED
  server state CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B; original data: $\lambda_{in}$; throughput: $\lambda_{out}$; unlimited shared output link buffers; plots of $\lambda_{out}$ and of delay against $\lambda_{in}$, axes to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]

Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[figure: Host A keeps a copy of each packet; router has free buffer space; finite shared output link buffers; Host B; plot of $\lambda_{out}$ against $\lambda_{in}$, axes to R/2]

Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[figure: Host A keeps a copy of each packet; router has no buffer space; Host B]

Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: Host A, router with free buffer space, Host B; plot of $\lambda_{out}$ against $\lambda_{in}$, axes to R/2]

rdt3.0: channels with errors and loss
new assumption: underlying channel can also lose packets (data, ACKs)
  checksum, seq #, ACKs, retransmissions will be of help ... but not enough
approach: sender waits "reasonable" amount of time for ACK
  retransmits if no ACK received in this time
  if pkt (or ACK) just delayed (not lost):
    retransmission will be duplicate, but seq #'s already handles this
    receiver must specify seq # of pkt being ACKed
  requires countdown timer

rdt3.0 sender
[FSM figure; recoverable action: sndpkt = make_pkt(0, data, checksum)]
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N-1]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
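A minimal Python sketch of the selective repeat receiver may help make the buffering rule concrete. The name `sr_receiver` and the packet representation are assumptions for illustration; sequence numbers are unbounded integers, and the re-ACK of packets in [rcvbase-N, rcvbase-1] follows the usual textbook description of the protocol.

```python
def sr_receiver(arrivals, window_size):
    """Process (seq, data) packets in arrival order.

    Returns (delivered, acks): data handed up in order, and the
    individual ACKs sent back to the sender.
    """
    rcv_base = 0
    buffered = {}               # seq # -> data, for out-of-order packets
    delivered, acks = [], []
    for seq, data in arrivals:
        if rcv_base <= seq < rcv_base + window_size:
            acks.append(seq)    # individual ACK for this packet
            buffered[seq] = data
            # Deliver any run of in-order packets and slide the window.
            while rcv_base in buffered:
                delivered.append(buffered.pop(rcv_base))
                rcv_base += 1
        elif rcv_base - window_size <= seq < rcv_base:
            acks.append(seq)    # already delivered: re-ACK only
        # anything else is ignored
    return delivered, acks


delivered, acks = sr_receiver(
    [(0, "a"), (2, "c"), (3, "d"), (1, "b"), (0, "a")], window_size=4
)
```

Unlike Go-Back-N, packets 2 and 3 are kept while the receiver waits for packet 1, so only packet 1 needs to be resent.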
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
TCP segment structure
  header rows, each 32 bits wide:
    source port # | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | receive window
    checksum | Urg data pointer
    options (variable length)
    application data (variable length)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
  sequence numbers:
    byte stream "number" of first byte in segment's data
  acknowledgements:
    seq # of next byte expected from other side
    cumulative ACK
  Q: how receiver handles out-of-order segments
    A: TCP spec doesn't say - up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space, window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs: simple telnet scenario
  Host A -> Host B: Seq=42, ACK=79, data = 'C'   (user types 'C')
  Host B -> Host A: Seq=79, ACK=43, data = 'C'   (host ACKs receipt of 'C', echoes back 'C')
  Host A -> Host B: Seq=43, ACK=80               (host ACKs receipt of echoed 'C')
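The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the next byte expected from the other side. A small sketch of that arithmetic, where the helper name `next_ack` is an assumption made for illustration:

```python
def next_ack(seq, data):
    """ACK number for a segment that starts at byte `seq` and carries `data`."""
    return seq + len(data)


# Host A starts at byte 42, Host B at byte 79; each sends the single byte 'C'.
a_seq, b_seq = 42, 79
ack_sent_by_b = next_ack(a_seq, b"C")   # B has byte 42, expects byte 43 next
ack_sent_by_a = next_ack(b_seq, b"C")   # A has byte 79, expects byte 80 next
```

Because numbering is per byte, a segment carrying 8 bytes starting at 92 would be acknowledged with 100, not 93.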
TCP round trip time, timeout
  Q: how to set TCP timeout value?
    longer than RTT
      but RTT varies
    too short: premature timeout, unnecessary retransmissions
    too long: slow reaction to segment loss
  Q: how to estimate RTT?
    SampleRTT: measured time from segment transmission until ACK receipt
      ignore retransmissions
    SampleRTT will vary, want estimated RTT "smoother"
      average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
  EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) vs. time (seconds); curves: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
  TimeoutInterval = EstimatedRTT + 4 * DevRTT
    (EstimatedRTT: estimated RTT; 4 * DevRTT: "safety margin")
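The estimator can be sketched as a small Python class. This is a sketch under stated assumptions rather than a reference implementation: the class name `RttEstimator` is hypothetical, the first sample initialises EstimatedRTT with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT on each sample. The slides do not specify either the initial values or the update order.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, in the same time unit as the samples."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumed initial value

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # EstimatedRTT plus a "safety margin" of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(120.0)
```

With α = 0.125 a single 120 ms sample moves the estimate only from 100 ms to 102.5 ms, which is the smoothing the EWMA is meant to provide.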
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
TCP sender events
  data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running
      think of timer as for oldest unacked segment
      expiration interval: TimeOutInterval
  timeout:
    retransmit segment that caused timeout
    restart timer
  ack rcvd:
    if ack acknowledges previously unacked segments
      update what is known to be ACKed
      start timer if there are still unacked segments
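The three sender events can be sketched as methods on a small Python class. This is a simplified illustration, not the slides' own code: the class name, the `send_base` and `unacked` fields, and the `sent` log that stands in for passing a segment to IP are all assumptions, and the timer is modelled as a boolean flag rather than a real countdown.

```python
class SimplifiedTcpSender:
    """Simplified sender: one timer, cumulative ACKs, no flow/congestion control."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq_num = initial_seq   # seq # for the next new segment
        self.unacked = {}                 # seq # -> data, sent but not ACKed
        self.timer_running = False
        self.sent = []                    # log of (seq, data) "passed to IP"

    def on_data_from_app(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True     # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)       # oldest unACKed segment
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True     # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:      # ACKs previously unACKed data
            self.send_base = ack_num
            for seq in [s for s, d in self.unacked.items()
                        if s + len(d) <= ack_num]:
                del self.unacked[seq]
            # Keep the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_data_from_app(b"8 bytes!")           # Seq=92, 8 bytes
sender.on_data_from_app(b"x" * 20)             # Seq=100, 20 bytes
sender.on_ack(100)                             # first segment ACKed
sender.on_timeout()                            # resend Seq=100
sender.on_ack(120)                             # everything ACKed
```

Keeping a single timer for the oldest unACKed segment is what lets a timeout retransmit just one segment rather than the whole window.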
TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
    TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
  Host A -> Host B: Seq=92, 8 bytes of data
  Host A -> Host B: Seq=100, 20 bytes of data   (X: lost)
  Host B -> Host A: ACK=100, repeated as duplicate ACKs
  Host A -> Host B: Seq=100, 20 bytes of data   (resent without waiting for timeout)
  fast retransmit after sender receipt of triple duplicate ACK
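Detecting a triple duplicate ACK amounts to counting repeats of the same ACK number. The sketch below is illustrative: the function name is hypothetical, and it assumes "triple duplicate" means three duplicates arriving after the original ACK (four identical ACKs in total), which is one common reading of the rule.

```python
def fast_retransmit_triggers(acks, threshold=3):
    """Return the ACK numbers at which fast retransmit would fire.

    `acks` is the sequence of ACK numbers seen by the sender, in order.
    """
    last_ack = None
    dup_count = 0
    triggers = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                # Resend the unACKed segment with the smallest seq #,
                # which starts at byte `ack`.
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers


# The segment starting at byte 100 is lost; later segments keep arriving.
triggers = fast_retransmit_triggers([100, 100, 100, 100])
```

Firing only when the count reaches the threshold, rather than on every further duplicate, avoids resending the same segment repeatedly.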
TCP flow control
  application may remove data from TCP socket buffers ...
    ... slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: segments from sender pass through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads]
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
[Figure: receiver-side buffering: RcvBuffer holds buffered data (TCP segment payloads, on their way to the application process) plus free buffer space, rwnd]
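The two sides of flow control can be written as two small calculations. The function names and the byte counters `last_byte_sent` and `last_byte_acked` are assumptions chosen for illustration; the sketch treats rwnd simply as RcvBuffer minus the data still buffered.

```python
def advertised_rwnd(rcv_buffer, buffered_bytes):
    """Receiver side: free buffer space to advertise in the TCP header."""
    return rcv_buffer - buffered_bytes


def sendable_bytes(last_byte_sent, last_byte_acked, rwnd):
    """Sender side: how much more may be sent without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


# 4096-byte receive buffer with 1096 bytes not yet read by the application.
rwnd = advertised_rwnd(rcv_buffer=4096, buffered_bytes=1096)
room = sendable_bytes(last_byte_sent=5000, last_byte_acked=3000, rwnd=rwnd)
```

When the application stops reading, `buffered_bytes` grows, rwnd shrinks toward zero, and the sender is held back accordingly.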
Connection Management
  before exchanging data, sender/receiver "handshake":
    agree to establish connection (each knowing the other willing to establish connection)
    agree on connection parameters
[Figure: client and server each hold, between application and network: connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
  client: Socket clientSocket = newSocket("hostname","port number");
  server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
  client state: LISTEN -> SYNSENT -> ESTAB
  server state: LISTEN -> SYN RCVD -> ESTAB
  client: choose init seq num, x; send TCP SYN msg
    SYNbit=1, Seq=x
  server: choose init seq num, y; send TCP SYNACK msg, acking SYN
    SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
    ACKbit=1, ACKnum=y+1
  server: received ACK(y) indicates client is live
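The three handshake segments and the checks each side performs can be sketched as follows. The function names and the dictionary representation of a segment are assumptions for illustration only; real TCP segments carry many more fields.

```python
def three_way_handshake(x, y):
    """Build the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    return syn, synack, ack


def client_accepts(synack, x):
    """Client moves SYNSENT -> ESTAB only if its SYN was acknowledged."""
    return (synack.get("SYNbit") == 1 and synack.get("ACKbit") == 1
            and synack.get("ACKnum") == x + 1)


def server_accepts(ack, y):
    """Server moves SYN RCVD -> ESTAB only if its SYNACK was acknowledged."""
    return ack.get("ACKbit") == 1 and ack.get("ACKnum") == y + 1


syn, synack, ack = three_way_handshake(x=1000, y=5000)
```

Each side acknowledges the other's initial sequence number plus one, which is how both learn that the peer is live and agree on where the byte streams start.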
TCP 3-way handshake: FSM
  server: closed -> listen
    Socket connectionSocket = welcomeSocket.accept();
  server: listen -> SYN rcvd
    on SYN(x): send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  server: SYN rcvd -> ESTAB
    on ACK(ACKnum=y+1)
  client: closed -> SYN sent
    Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  client: SYN sent -> ESTAB
    on SYNACK(seq=y,ACKnum=x+1): send ACK(ACKnum=y+1)
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
  client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
  server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
  client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
  server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
  client: enters FIN_WAIT_2 (wait for server close)
  server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
  client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
  server: enters CLOSED
Principles of congestion control
  congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control!
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem!
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  maximum per-connection throughput: R/2
  large delays as arrival rate, λ_in, approaches capacity
[Figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput λ_out. Plots: λ_out vs. λ_in and delay vs. λ_in, with R/2 marked on the axes]
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λ_in = λ_out
    transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into finite shared output link buffers; Host B receives λ_out]
Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
[Figure: λ_out vs. λ_in, with R/2 marked on both axes; Host A keeps a copy and sends when the finite shared output link buffers have free buffer space]
Causes/costs of congestion: scenario 2
  idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Figure: λ_out vs. λ_in, with R/2 marked on both axes; Host A keeps a copy of each packet and resends it when the router has no buffer space]
Causes/costs of congestion: scenario 2
  realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt
[Figure: λ_out vs. λ_in, with R/2 marked on both axes; Host A times out and sends a second copy although the router has free buffer space]
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as λ_in and λ'_in increase?
  A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 3
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: λ_out vs. λ'_in, with C/2 marked on both axes]
Approaches towards congestion control
  two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at
TCP Congestion Control: details
  sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
    rate ~ cwnd / RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight"), spanning cwnd | last byte sent]
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B segment timeline, one RTT marked, time axis]
TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
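The loss reactions and the per-RTT growth rule can be sketched in Python, with the window measured in units of MSS. This is a coarse, RTT-granularity sketch: the function names are hypothetical, growth is applied once per RTT rather than per ACK, capping slow-start growth at ssthresh is an assumption, and Reno's window is simply halved as described above.

```python
def react_to_loss(cwnd, event, variant="reno", mss=1.0):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd / 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh            # back to 1 MSS, slow start again
    return ssthresh, ssthresh           # Reno on 3 dup ACKs: cut in half


def grow_one_rtt(cwnd, ssthresh, mss=1.0):
    """Window after one loss-free RTT."""
    if cwnd < ssthresh:
        return min(2 * cwnd, ssthresh)  # slow start: exponential, to threshold
    return cwnd + mss                   # congestion avoidance: linear


# Reno: loss at cwnd = 12 MSS detected by 3 duplicate ACKs.
cwnd, ssthresh = react_to_loss(12.0, "3dupacks", variant="reno")

# Timeout at cwnd = 12 MSS, then five loss-free RTTs.
w, th = react_to_loss(12.0, "timeout")
trace = []
for _ in range(5):
    w = grow_one_rtt(w, th)
    trace.append(w)
```

The trace shows the window doubling until it reaches the threshold of 6 MSS and then growing by 1 MSS per RTT.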
Summary: TCP Congestion Control
[Figure: congestion control FSM; recoverable transitions listed below]
  slow start, new ACK:
    cwnd = cwnd + MSS
    dupACKcount = 0
    transmit new segment(s), as allowed
  slow start -> congestion avoidance:
    cwnd > ssthresh
  congestion avoidance, new ACK:
    cwnd = cwnd + MSS * (MSS/cwnd)
    dupACKcount = 0
    transmit new segment(s), as allowed
  fast recovery, duplicate ACK:
    cwnd = cwnd + MSS
    transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth; cwnd: TCP sender congestion window size vs. time; additively increase window size ... until loss occurs (then cut window in half)]
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  avg TCP thruput = (3/4) * W / RTT bytes/sec
[Figure: window size saw tooth between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
  to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10 - a very small loss rate!
  new versions of TCP for high-speed
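The two figures in this example can be checked numerically. The function names below are hypothetical; the sketch assumes MSS is converted from bytes to bits so that throughput is in bits per second, and it solves the throughput formula for L.

```python
import math


def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve the throughput formula for L at a target rate."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = required_window_segments(rate_bps=10e9, rtt_s=0.1, mss_bytes=1500)
loss = required_loss_rate(rate_bps=10e9, rtt_s=0.1, mss_bytes=1500)
```

Here `w` comes out near 83,333 segments and `loss` near 2 * 10^-10, matching the values quoted in the example.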
TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
  two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with R marked on the axes, Connection 1 throughput on the horizontal axis, and an equal bandwidth share line; steps alternate between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Fairness (more)
  Fairness and UDP
    multimedia apps often do not use TCP
      do not want rate throttled by congestion control
    instead use UDP:
      send audio/video at constant rate, tolerate packet loss
  Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
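The parallel-connection example is simple arithmetic once each TCP connection is assumed to get an equal share of the bottleneck. A sketch of that calculation, with the hypothetical helper `app_share` returning the fraction of R obtained by the new application:

```python
from fractions import Fraction


def app_share(app_connections, existing_connections):
    """Fraction of link rate R obtained by an app, assuming equal per-connection shares."""
    total = app_connections + existing_connections
    return Fraction(app_connections, total)


one_conn = app_share(1, 9)      # 1 of 10 connections
eleven_conn = app_share(11, 9)  # 11 of 20 connections
```

With 11 connections the application holds 11 of the 20 connections, or 11/20 of R, which is where the "gets R/2" figure comes from.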
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
  next:
    leaving the network "edge" (application, transport layers)
    into the network "core"
rdt30 senderrdt30 sendersndpkt = make_pkt(0 data checksum)
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads arrive into RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data is passed to the application process.]
Transport Layer 3-74
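A small sketch of the bookkeeping behind rwnd may help. The slide gives only RcvBuffer and rwnd; the names `last_byte_rcvd` and `last_byte_read`, and both helper functions, are assumptions introduced here for illustration.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver would advertise as rwnd (bytes)."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                     # free space in a 4096-byte buffer
print(sender_may_send(1500, 1000, rwnd, 1500))  # fits within rwnd
print(sender_may_send(1500, 1000, rwnd, 1600))  # would exceed rwnd
```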
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application and network layers. Each side holds connection state: ESTAB, and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client.]
Socket clientSocket = newSocket("hostname","port number");
Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state: ESTAB
server: received ACK(y) indicates client is live; server state: ESTAB
Transport Layer 3-77
TCP 3-way handshake: FSM
[Figure: finite state machine.
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)]
Transport Layer 3-78
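The sequence and acknowledgement numbers exchanged in the handshake can be traced with a short sketch. It models only the three segments and the resulting states; the function `three_way_handshake` and the dictionary field names are hypothetical, and no real sockets are involved.

```python
def three_way_handshake(x, y):
    """Trace the three handshake segments for initial seq numbers x and y."""
    segments = []

    # 1. client picks x and sends SYN
    syn = {"from": "client", "SYN": 1, "seq": x}
    segments.append(syn)
    client_state = "SYNSENT"

    # 2. server picks y, sends SYNACK that acknowledges the SYN
    synack = {"from": "server", "SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    segments.append(synack)
    server_state = "SYN RCVD"

    # 3. client acknowledges the SYNACK; server then knows client is live
    ack = {"from": "client", "ACK": 1, "acknum": synack["seq"] + 1}
    segments.append(ack)
    client_state = "ESTAB"
    server_state = "ESTAB"

    return segments, client_state, server_state


segs, client, server = three_way_handshake(x=100, y=300)
for s in segs:
    print(s)
print(client, server)
```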
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close(); send FINbit=1, seq=x; client state: FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1; server state: CLOSE_WAIT (can still send data)
client state: FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; server state: LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1; client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server state: CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; throughput is λ_out. Plots: λ_out versus λ_in, levelling off at R/2; delay versus λ_in, growing as λ_in nears R/2.]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Transport Layer 3-83
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) into a router with finite shared output link buffers; Host B receives λ_out.]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: plot of λ_out versus λ_in, both axes marked at R/2. Host A keeps a copy of each packet and sends λ_in (original data) and λ'_in (original data, plus retransmitted data) to a router with free buffer space and finite shared output link buffers; Host B receives λ_out.]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet and sends λ_in (original data) and λ'_in (original data, plus retransmitted data) to a router with no buffer space; Host B receives λ_out.]
Transport Layer 3-86
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: plot of λ_out versus λ_in, both axes marked at R/2: when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?). Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) to a router with free buffer space; Host B receives λ_out.]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: plot of λ_out versus λ_in, both axes marked at R/2: when sending at R/2, some packets are retransmissions, including duplicates that are delivered! Host A keeps a copy and, on timeout, resends into a router with free buffer space; Host B receives λ_out.]
Transport Layer 3-88
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: plot of λ_out versus λ_in, both axes marked at R/2: when sending at R/2, some packets are retransmissions, including duplicates that are delivered!]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers. Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data); λ_out is measured at Host B.]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
[Figure: plot of λ_out versus λ'_in, with both axes marked at C/2.]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Transport Layer 3-92
TCP Congestion Control: details
sender limits transmission: LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight") spanning cwnd, and last byte sent.]
Transport Layer 3-94
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with one RTT marked on the time axis.]
Transport Layer 3-95
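The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be checked with a short sketch. It assumes no loss, one ACK per segment, and cwnd counted in units of MSS; the function name `slow_start` is hypothetical.

```python
def slow_start(rounds, mss=1):
    """Return cwnd at the start of each RTT, assuming no loss."""
    cwnd = 1 * mss                  # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss          # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # increment cwnd for every ACK received
        history.append(cwnd)
    return history


print(slow_start(4))
```

Each round returns as many ACKs as segments were sent, so the per-ACK increment adds up to a doubling per RTT.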
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer 3-97
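The loss reactions listed above can be written as one small function. This is a sketch of the slide-level rules only, with cwnd in units of MSS; the helper `on_loss` and its arguments are hypothetical, and actual Reno implementations add fast-recovery details not shown on the slide.

```python
def on_loss(cwnd, event, variant="reno", mss=1):
    """Return (new_cwnd, ssthresh) after a loss event.

    event: "timeout" or "3dupacks"; variant: "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd / 2             # half of cwnd just before the loss event
    if event == "timeout" or variant == "tahoe":
        return 1 * mss, ssthresh    # back to 1 MSS, then slow start to ssthresh
    return cwnd / 2, ssthresh       # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16, "timeout"))
print(on_loss(16, "3dupacks"))
print(on_loss(16, "3dupacks", variant="tahoe"))
```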
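Summary: TCP Congestion Control
[Figure: finite state machine with slow start, congestion avoidance and fast recovery states. Recoverable transition labels:
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start -> congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]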
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: cwnd (TCP sender congestion window size) versus time. AIMD saw tooth behavior, probing for bandwidth: additively increase window size … until loss occurs (then cut window in half).]
Transport Layer 3-99
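The saw tooth can be reproduced with a toy trace. The sketch assumes cwnd is measured in MSS, changes once per RTT, and that losses occur at RTT indices chosen by the caller; `aimd` and `loss_at` are hypothetical names.

```python
def aimd(rtts, loss_at, start=1.0):
    """Return cwnd (in MSS) at each RTT for an AIMD sender.

    loss_at: set of RTT indices at which a loss is detected.
    """
    cwnd = start
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = cwnd / 2         # multiplicative decrease
        else:
            cwnd = cwnd + 1         # additive increase: 1 MSS per RTT
    return trace


print(aimd(8, loss_at={3, 6}, start=4))
```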
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: window size oscillating between W/2 and W.]
Transport Layer 3-100
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
Transport Layer 3-101
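The two numbers on this slide can be rechecked directly from the stated formula. The function names are hypothetical; rates are in bits/sec, so MSS is converted from bytes to bits.

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput from the slide's formula: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def loss_for_rate(rate_bps, rtt_s, mss_bytes):
    """Loss probability L at which the formula yields rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(window_segments(10e9, 0.100, 1500))   # in-flight segments for 10 Gbps
print(loss_for_rate(10e9, 0.100, 1500))     # required loss probability
```

Both results agree with the slide's rounded values of 83,333 segments and a loss rate near 2·10^-10.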
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, each axis running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
Transport Layer 3-103
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
Transport Layer 3-104
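The parallel-connection example is simple arithmetic, sketched below under the assumption that every TCP connection on the link gets an equal share. The helper `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing, new_conns):
    """Fraction of link rate R obtained by an app that opens new_conns
    connections next to `existing` other connections (equal per-connection share)."""
    return Fraction(new_conns, existing + new_conns)


print(app_share(9, 1))    # one connection among ten
print(app_share(9, 11))   # eleven connections among twenty
```

With 11 connections the share is 11/20 of R, which the slide rounds to R/2.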
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
rdt3.0 in action
[Figure: sender/receiver timelines.
  (a) no loss: sender sends pkt0; receiver rcv pkt0, sends ack0; sender rcv ack0, sends pkt1; receiver rcv pkt1, sends ack1; sender rcv ack1, sends pkt0; receiver rcv pkt0, sends ack0.
  (b) packet loss: sender sends pkt0; receiver rcv pkt0, sends ack0; sender rcv ack0, sends pkt1, which is lost (X); sender times out and resends pkt1; receiver rcv pkt1, sends ack1; sender rcv ack1, sends pkt0; receiver rcv pkt0, sends ack0.]
Transport Layer 3-40
rdt3.0 in action
[Figure: sender/receiver timelines.
  (c) ACK loss: sender sends pkt0; receiver rcv pkt0, sends ack0; sender rcv ack0, sends pkt1; receiver rcv pkt1, sends ack1, which is lost (X); sender times out and resends pkt1; receiver rcv pkt1 (detect duplicate), sends ack1; sender rcv ack1, sends pkt0; receiver rcv pkt0, sends ack0.
  (d) premature timeout/delayed ACK: sender sends pkt0; receiver rcv pkt0, sends ack0; sender rcv ack0, sends pkt1; receiver rcv pkt1, sends ack1; sender times out before ack1 arrives and resends pkt1; sender rcv ack1, sends pkt0; receiver rcv pkt1 (detect duplicate), sends ack1; receiver rcv pkt0, sends ack0.]
Transport Layer 3-41
Performance of rdt3.0
rdt3.0 is correct, but performance stinks
e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $D_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^9\ \text{bits/sec}} = 8$ microsecs
U_sender: utilization, fraction of time sender busy sending
  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{.008}{30.008} = 0.00027$
if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
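The utilization figure follows from the two formulas above. A short check, with hypothetical function names and the RTT taken as twice the 15 ms propagation delay:

```python
def sender_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time a stop-and-wait sender is busy sending."""
    d_trans = packet_bits / link_bps            # transmission delay L/R
    return d_trans / (rtt_s + d_trans)


def stop_and_wait_throughput(packet_bits, link_bps, rtt_s):
    """Effective throughput in bytes/sec: one packet per (RTT + L/R)."""
    d_trans = packet_bits / link_bps
    return (packet_bits / 8) / (rtt_s + d_trans)


u = sender_utilization(8000, 1e9, 0.030)
print(round(u, 5))                                    # utilization
print(stop_and_wait_throughput(8000, 1e9, 0.030))     # bytes/sec
```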
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-47
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
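The receiver rules above fit in a few lines. This sketch ignores sequence-number wraparound and packet corruption, and sends no ACK if nothing has been received in order yet; `gbn_receiver` is a hypothetical name.

```python
def gbn_receiver(arrivals):
    """Process arriving seq numbers; return (delivered, acks_sent)."""
    expectedseqnum = 0               # the only state the receiver keeps
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expectedseqnum:
            delivered.append(seq)    # in-order: deliver to upper layer
            expectedseqnum += 1
        # out-of-order packets are discarded, not buffered
        if expectedseqnum > 0:
            acks.append(expectedseqnum - 1)   # (re-)ACK highest in-order seq #
    return delivered, acks


print(gbn_receiver([0, 1, 3, 4, 2]))
```

Packets 3 and 4 arrive out of order, so they are dropped and each triggers a duplicate ACK for packet 1.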
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Transport Layer 3-50
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N,rcvbase-1]:
    ACK(n)
point-to-point: one sender, one receiver
reliable, in-order byte stream: no "message boundaries"
pipelined: TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)]
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
sequence number, acknowledgement number: counting by bytes of data (not segments!)
receive window: # bytes rcvr willing to accept
checksum: Internet checksum (as in UDP)
Transport Layer 3-57
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
Transport Layer 3-58
TCP seq. numbers, ACKs
[Figure: simple telnet scenario between Host A and Host B.
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  Host A ACKs receipt of echoed 'C': Seq=43, ACK=80]
simple telnet scenario
Transport Layer 3-59
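The numbers in the telnet scenario follow one rule: a reply's sequence number is the ACK number it received, and its ACK number is the received sequence number plus the received data length. A minimal sketch, with the hypothetical helper `reply`:

```python
def reply(seg_seq, seg_ack, seg_len, reply_len):
    """Header fields of the reply to a segment carrying seg_len data bytes."""
    return {
        "seq": seg_ack,              # next byte the other side expects from us
        "ack": seg_seq + seg_len,    # next byte we expect from the other side
        "len": reply_len,
    }


first = {"seq": 42, "ack": 79, "len": 1}                      # Host A sends 'C'
echo = reply(first["seq"], first["ack"], first["len"], 1)     # Host B echoes 'C'
final = reply(echo["seq"], echo["ack"], echo["len"], 0)       # Host A ACKs the echo
print(echo)
print(final)
```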
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time, timeout
$EstimatedRTT = (1-\alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds), axis from 100 to 350, versus time (seconds), axis from 1 to 106; two series: SampleRTT and EstimatedRTT.]
Transport Layer 3-61
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  (EstimatedRTT is the estimated RTT; 4*DevRTT is the "safety margin")
Transport Layer 3-62
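The three formulas can be combined into a small estimator. Two choices below are assumptions not stated on the slides: DevRTT starts at 0, and DevRTT is updated before EstimatedRTT (so it uses the previous estimate). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA-based RTT and timeout estimator following the slide formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated = first_sample
        self.dev = 0.0               # assumption: no deviation known yet

    def update(self, sample):
        """Fold in a new SampleRTT and return the new TimeoutInterval."""
        # assumption: deviation is measured against the previous estimate
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.estimated)
        self.estimated = (1 - self.alpha) * self.estimated + self.alpha * sample
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.dev


est = RttEstimator(first_sample=100.0)    # milliseconds
print(est.update(120.0))                  # new TimeoutInterval after one sample
print(est.estimated, est.dev)
```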
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
Transport Layer 3-64
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
event at receiver -> TCP receiver action
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment, higher-than-expect seq. #; gap detected -> immediately send duplicate ACK, indicating seq. # of next expected byte
arrival of segment that partially or completely fills gap -> immediate send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
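The four receiver cases can be expressed as a decision function. This is an illustrative sketch: `receiver_action` and its boolean arguments are hypothetical, and a segment below the expected seq # is not covered by the table, so the sketch only flags it.

```python
def receiver_action(seg_seq, expected_seq, ack_pending=False, gap_exists=False):
    """Pick the receiver action for an arriving segment.

    ack_pending: an earlier in-order segment is still waiting for its delayed ACK.
    gap_exists: out-of-order data is already buffered above expected_seq.
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK for expected_seq"      # gap detected
    if seg_seq == expected_seq:
        if gap_exists:
            return "send ACK immediately"                  # fills lower end of gap
        if ack_pending:
            return "send single cumulative ACK"            # ACKs both segments
        return "delayed ACK: wait up to 500 ms"
    return "not covered by the table"


print(receiver_action(100, 100))
print(receiver_action(100, 100, ack_pending=True))
print(receiver_action(200, 100))
print(receiver_action(100, 100, gap_exists=True))
```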
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$.]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to $C/2$.]
another "cost" of congestion:
  when packet dropped, any upstream transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \mathrm{cwnd} / \mathrm{RTT}$ bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; the in-flight span is bounded by cwnd.]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B timeline, one RTT per round, time running downward.]
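The doubling described on the slow start slide can be sketched in a few lines of Python. This is only an illustration: the MSS value of 1460 bytes and the function name `slow_start` are assumptions, not part of the slides, and loss is not modelled.

```python
MSS = 1460  # bytes; assumed value for illustration only


def slow_start(rounds, mss=MSS):
    """Return cwnd (in bytes) at the start of each RTT round of slow start."""
    cwnd = mss                  # initially cwnd = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd // mss      # one ACK comes back per segment sent this round
        cwnd += acks * mss      # +1 MSS per ACK, so cwnd doubles every RTT
    return history


print([c // MSS for c in slow_start(5)])  # cwnd per round, in units of MSS
```

Incrementing per ACK rather than multiplying by two once per round mirrors how the slide says the doubling is achieved.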
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
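A minimal sketch of the loss reactions listed above, working in units of MSS. The function name `on_loss` and the event labels are hypothetical, and the Reno branch follows the slide's simplified "cut in half" rule rather than a full fast-recovery implementation.

```python
def on_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) in MSS units after a loss event.

    event is "timeout" or "3dupack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd / 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh              # restart from 1 MSS (slow start)
    return ssthresh, ssthresh           # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16, "timeout"))
print(on_loss(16, "3dupack", "reno"))
print(on_loss(16, "3dupack", "tahoe"))
```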
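Summary: TCP Congestion Control
[FSM figure; only the following transition labels are recoverable:]
  new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  cwnd > ssthresh: switch from slow start to congestion avoidance
  new ACK (congestion avoidance): $\mathrm{cwnd} = \mathrm{cwnd} + \mathrm{MSS} \cdot (\mathrm{MSS}/\mathrm{cwnd})$; dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed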
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half.]
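The saw tooth can be reproduced with a toy model. The sketch below assumes, purely for illustration, that a loss happens whenever cwnd exceeds a fixed `capacity` (in MSS); real loss patterns are not this regular, and the function name `aimd` is hypothetical.

```python
def aimd(rounds, capacity, cwnd=1):
    """Return cwnd per RTT (in MSS) under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:            # assumed loss model: window overshoots capacity
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease: cut in half
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


print(aimd(12, capacity=8, cwnd=6))
```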
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is $\frac{3}{4} W$
  avg. thruput is $\frac{3}{4} W$ per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{\mathrm{RTT}}$ bytes/sec
[Figure: window size oscillating between $W/2$ and $W$.]
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT} \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
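The two numbers in the long-fat-pipe example can be checked directly from the formula above. The function names below are hypothetical; the only inputs are the slide's own figures (1500-byte segments, 100 ms RTT, 10 Gbps).

```python
import math


def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput in bits/sec from 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def mathis_loss(rate_bps, rtt_s, mss_bytes):
    """Loss probability L at which the formula above yields rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))   # about 83,333 segments
print(mathis_loss(10e9, 0.1, 1500))       # about 2e-10
```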
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with Connection 1 throughput on the horizontal axis, both axes running from 0 to R, and an "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
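The convergence argument can be checked numerically. The sketch below assumes two flows with equal RTTs that see every loss at the same time, which is an idealization; the function name `two_flow_aimd` is hypothetical. Additive increase leaves the gap between the flows unchanged, and each halving cuts the gap in half, so the rates drift toward the equal-share line.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Run two synchronized AIMD flows over a shared link; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:         # assumed: both flows detect the same loss
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap constant
    return x1, x2


a, b = two_flow_aimd(1, 61, capacity=100, rounds=1000)
print(abs(a - b))   # the initial gap of 60 has shrunk to nearly zero
```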
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
rdt3.0 in action
(c) ACK loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1
  ack1 is lost (X)
  sender: timeout, resend pkt1
  receiver: rcv pkt1 (detect duplicate), send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0
(d) premature timeout/delayed ACK
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1 (ack1 is delayed)
  sender: timeout, resend pkt1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt1 (detect duplicate), send ack1
  receiver: rcv pkt0, send ack0
  [the figure also shows the receiver detecting a duplicate pkt0 and re-sending ack0 later in the exchange]
Performance of rdt3.0
rdt3.0 is correct, but performance stinks
e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
  $D_{trans} = \dfrac{L}{R} = \dfrac{8000 \text{ bits}}{10^{9} \text{ bits/sec}} = 8$ microsecs
$U_{sender}$: utilization, fraction of time sender busy sending
  $U_{sender} = \dfrac{L/R}{\mathrm{RTT} + L/R} = \dfrac{0.008}{30.008} = 0.00027$
if RTT = 30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
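The utilization figure follows directly from the formula; a short check in Python, with hypothetical function names and the slide's numbers as inputs:

```python
def sender_utilization(packet_bits, rate_bps, rtt_s):
    """Fraction of time a stop-and-wait sender is busy: (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / rate_bps
    return d_trans / (rtt_s + d_trans)


def stop_and_wait_throughput_bps(packet_bits, rate_bps, rtt_s):
    """One packet delivered per (RTT + L/R) seconds."""
    return packet_bits / (rtt_s + packet_bits / rate_bps)


u = sender_utilization(8000, 1e9, 0.030)
print(round(u, 5))                                                # 0.00027
print(stop_and_wait_throughput_bps(8000, 1e9, 0.030) / 8 / 1000)  # about 33 kB/sec
```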
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering
  re-ACK pkt with highest in-order seq #
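The Go-Back-N sender and receiver rules above can be sketched as two small classes. This is a simplified illustration, not a full protocol: sequence numbers are unbounded integers instead of k-bit values, there is no real timer or channel, and the class and method names are hypothetical.

```python
class GBNReceiver:
    def __init__(self):
        self.expected = 0        # the only state needed: expectedseqnum
        self.delivered = []

    def receive(self, seq, data):
        """Return the cumulative ACK number sent back (None if nothing in order yet)."""
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1
        # Out-of-order packets are discarded; re-ACK the highest in-order seq #.
        return self.expected - 1 if self.expected > 0 else None


class GBNSender:
    def __init__(self, window):
        self.N = window
        self.base = 0            # oldest unACKed seq #
        self.nextseq = 0

    def can_send(self):
        return self.nextseq < self.base + self.N

    def send(self):
        seq = self.nextseq
        self.nextseq += 1
        return seq

    def on_ack(self, n):
        if n >= self.base:       # cumulative: ACK(n) covers everything up to n
            self.base = n + 1

    def on_timeout(self):
        """Seq #s to retransmit: packet base and all higher ones already sent."""
        return list(range(self.base, self.nextseq))


rcv = GBNReceiver()
print(rcv.receive(0, "a"), rcv.receive(2, "c"), rcv.receive(1, "b"))
```

Keeping a single `expected` counter on the receiver side is what makes Go-Back-N cheap to implement, at the cost of resending packets that may already have arrived.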
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure not recoverable from the transcript.]
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
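The receiver side of selective repeat can be sketched as follows. As with any short example, this is a simplification: sequence numbers are unbounded integers rather than a finite sequence space, and the class name `SRReceiver` is hypothetical.

```python
class SRReceiver:
    def __init__(self, window):
        self.N = window
        self.base = 0            # rcvbase: next not-yet-received seq #
        self.buffer = {}         # out-of-order packets waiting for delivery
        self.delivered = []

    def receive(self, n, data):
        """Return n if ACK(n) is sent, or None if the packet is ignored."""
        if self.base <= n < self.base + self.N:
            self.buffer[n] = data
            # Deliver any in-order run that starts at the window base.
            while self.base in self.buffer:
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1
            return n
        if self.base - self.N <= n < self.base:
            return n             # already delivered: re-ACK so the sender can advance
        return None


rcv = SRReceiver(4)
print(rcv.receive(1, "b"), rcv.delivered)   # buffered, nothing delivered yet
print(rcv.receive(0, "a"), rcv.delivered)   # gap filled, both delivered in order
```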
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
[Figure: segment layout, 32 bits wide]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
annotations:
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
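To make the layout concrete, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. The function name `parse_tcp_header` and the dictionary keys are hypothetical; options and payload are ignored in this sketch.

```python
import struct


def parse_tcp_header(segment):
    """Decode the fixed 20-byte TCP header (options and data are not parsed)."""
    (src, dst, seq, ack, off_flags, rwnd, checksum, urg) = struct.unpack(
        "!HHIIHHHH", segment[:20]     # network byte order, 2+2+4+4+2+2+2+2 bytes
    )
    flags = off_flags & 0x3F          # low six bits: U A P R S F
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # head len is counted in 32-bit words
        "URG": bool(flags & 0x20),
        "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02),
        "FIN": bool(flags & 0x01),
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg,
    }


# A hand-built SYN+ACK header, used only to exercise the parser.
raw = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 4096, 0, 0)
print(parse_tcp_header(raw))
```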
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  host A ACKs receipt of echoed 'C': Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1 - \alpha) \cdot \mathrm{EstimatedRTT} + \alpha \cdot \mathrm{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106); curves: SampleRTT and EstimatedRTT.]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\mathrm{DevRTT} = (1 - \beta) \cdot \mathrm{DevRTT} + \beta \cdot |\mathrm{SampleRTT} - \mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4 \cdot \mathrm{DevRTT}$
  (estimated RTT plus "safety margin")
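The EstimatedRTT, DevRTT and TimeoutInterval formulas combine into a small estimator. In this sketch the class name `RttEstimator`, the initial DevRTT of 0 and the choice to update DevRTT using the previous EstimatedRTT are assumptions made for illustration; the slides do not specify them.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated = first_sample   # EstimatedRTT seeded with the first SampleRTT
        self.dev = 0.0                  # DevRTT; initial value is an assumption

    def update(self, sample):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample (assumed ordering).
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.estimated)
        self.estimated = (1 - self.alpha) * self.estimated + self.alpha * sample
        return self.timeout()

    def timeout(self):
        return self.estimated + 4 * self.dev


est = RttEstimator(100.0)        # milliseconds
print(est.update(120.0))         # EstimatedRTT 102.5, DevRTT 5.0
```

Samples taken from retransmitted segments should not be passed to `update`, in line with the "ignore retransmissions" rule for SampleRTT.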
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
TCP ACK generation (event at receiver, then TCP receiver action)
event: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event: arrival of in-order segment with expected seq #; one other segment has ACK pending
  action: immediately send single cumulative ACK, ACKing both in-order segments
event: arrival of out-of-order segment, higher-than-expect seq. #; gap detected
  action: immediately send duplicate ACK, indicating seq. # of next expected byte
event: arrival of segment that partially or completely fills gap
  action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK
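The trigger for fast retransmit is just a duplicate-ACK counter. A minimal sketch, with the class name `DupAckDetector` chosen for illustration:

```python
class DupAckDetector:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return True exactly when the third duplicate ACK for the same data arrives."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            return self.dup_count == 3    # fire once, on the triple duplicate
        self.last_ack = ack_num           # new ACK number: reset the counter
        self.dup_count = 0
        return False


det = DupAckDetector()
print([det.on_ack(a) for a in [100, 100, 100, 100, 100]])
```

The first ACK=100 is the original; the next three are the duplicates, so the fourth call is the one that signals a retransmission of the segment starting at byte 100.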
TCP flow control
application may remove data from TCP socket buffers slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments arrive from sender, pass through IP code and TCP code in the OS into the TCP socket receiver buffers, and are read by the application process.]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process.]
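A sketch of the bookkeeping behind rwnd. The variable names `last_byte_rcvd` and `last_byte_read` are assumptions borrowed from the common textbook formulation and do not appear on these slides; the sender-side check reuses the LastByteSent and LastByteAcked names used earlier for cwnd.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space: buffer size minus data received but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within the receiver's rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                      # 2096 bytes free
print(sender_may_send(5000, 4000, rwnd, 1000))   # 2000 bytes in flight fits
print(sender_may_send(5000, 4000, rwnd, 1200))   # 2200 bytes would overflow
```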
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with an application above the network layer and the same connection state:]
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN, then SYNSENT, then ESTAB
server state: LISTEN, then SYN RCVD, then ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
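The sequence and acknowledgement numbers in the three handshake segments can be written out explicitly. This is only a sketch of the arithmetic: the function name `three_way_handshake` and the dictionary representation are hypothetical, and no sockets are involved. The modulo reflects the 32-bit width of the sequence number field.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x and y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(42, 79):
    print(segment)
```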
TCP 3-way handshake: FSM
[FSM figure; recoverable states and transitions:]
client side:
  closed to SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent to ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)
server side:
  closed to listen: Socket connectionSocket = welcomeSocket.accept();
  listen to SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd to ESTAB: receive ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state and server state both start in ESTAB
client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: replies ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: replies ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A sends original data at $\lambda_{in}$; throughput $\lambda_{out}$; Host B; unlimited shared output link buffers.]
[Plots: $\lambda_{out}$ versus $\lambda_{in}$, both axes up to $R/2$; delay versus $\lambda_{in}$, up to $R/2$.]
maximum per-connection throughput: $R/2$
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; finite shared output link buffers; $\lambda_{out}$; Host B.]
Performance of rdt3 0Performance of rdt30
rdt3 0 is correct but performance stinks rdt30 is correct but performance stinks eg 1 Gbps link 15 ms prop delay 8000 bit packet
L 8000 bitDtrans = LR
8000 bits109 bitssec= = 8 microsecs
U sender utilization ndash fraction of time sender busy sending
U 008L RU sender =
00830008
= 000027L R
RTT + L R =
if RTT=30 msec 1KB pkt every 30 msec 33kBsec thruput if RTT=30 msec 1KB pkt every 30 msec 33kBsec thruput over 1 Gbps link
Go-Back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet

Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window

Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #

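The receiver rules above can be sketched as a small function. It is a simplified illustration, assuming unbounded integer sequence numbers starting at 0 and no corrupted packets; `gbn_receiver` is a hypothetical name.

```python
def gbn_receiver(arrivals):
    """Return the ACK number sent for each arriving packet (None if nothing is in order yet)."""
    expected_seqnum = 0          # the only state the receiver keeps
    acks = []
    for seq in arrivals:
        if seq == expected_seqnum:
            expected_seqnum += 1  # in-order: deliver to the upper layer
        # out-of-order packets are discarded, never buffered;
        # either way, (re-)ACK the highest in-order seq # seen so far
        acks.append(expected_seqnum - 1 if expected_seqnum > 0 else None)
    return acks


print(gbn_receiver([0, 1, 3, 4, 2]))
```

In this run packets 3 and 4 arrive before packet 2, so they are dropped and each triggers a duplicate ACK for packet 1.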
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence-number windows]

Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
  otherwise:
    ignore

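A minimal sketch of the selective repeat receiver events follows. It assumes unbounded integer sequence numbers (no wraparound) and error-free packets; `sr_receiver` and its return shape are hypothetical choices for illustration.

```python
def sr_receiver(arrivals, window):
    """Return (acks sent, packets delivered in order) for a list of arriving seq #s."""
    rcv_base = 0
    buffered = set()
    acks, delivered = [], []
    for n in arrivals:
        if rcv_base <= n < rcv_base + window:
            acks.append(n)                 # individual ACK for pkt n
            buffered.add(n)                # buffer it, in order or not
            while rcv_base in buffered:    # deliver any in-order run
                buffered.remove(rcv_base)
                delivered.append(rcv_base)
                rcv_base += 1              # advance window
        elif rcv_base - window <= n < rcv_base:
            acks.append(n)                 # already delivered: re-ACK only
        # otherwise: ignore the packet
    return acks, delivered


print(sr_receiver([0, 2, 3, 1, 1], window=4))
```

Unlike Go-Back-N, packets 2 and 3 are kept while the receiver waits for packet 1, and the late duplicate of packet 1 is still acknowledged.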
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver

TCP segment structure
header rows (each 32 bits wide):
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
annotations:
  sequence, acknowledgement numbers: counting by bytes of data (not segments)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)

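The fixed part of the header described above can be decoded with Python's standard `struct` module. This is a sketch under stated assumptions: the function name `parse_tcp_header` and the returned dictionary keys are illustrative, options are skipped rather than parsed, and the checksum is not verified.

```python
import struct

FLAG_BITS = (("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04),
             ("PSH", 0x08), ("ACK", 0x10), ("URG", 0x20))


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header; options are skipped, not parsed."""
    (src_port, dst_port, seq, ack, offset_byte,
     flag_byte, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4    # head len counts 32-bit words
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "flags": {name for name, bit in FLAG_BITS if flag_byte & bit},
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
        "data": segment[header_len:],
    }


# Hand-built example segment: Seq=42, ACK=79, PSH+ACK set, one byte of data.
example = struct.pack("!HHIIBBHHH", 1234, 23, 42, 79, 5 << 4, 0x18, 4096, 0, 0) + b"C"
print(parse_tcp_header(example))
```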
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say; up to implementor
[Figure: an outgoing segment from sender and an incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent, ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable.]

TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  user types 'C' at Host A
  A -> B: Seq=42, ACK=79, data = 'C'
  host B ACKs receipt of 'C', echoes back 'C'
  B -> A: Seq=79, ACK=43, data = 'C'
  host A ACKs receipt of echoed 'C'
  A -> B: Seq=43, ACK=80

TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106); curves: SampleRTT and EstimatedRTT.]

TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  (EstimatedRTT is the estimated RTT; $4 \cdot DevRTT$ is the "safety margin")

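The two moving averages and the timeout formula can be sketched as a small class. Two choices here are assumptions not stated on the slides: the estimator is seeded from the first sample (EstimatedRTT = sample, DevRTT = sample / 2, as RFC 6298 does), and DevRTT is updated using the EstimatedRTT from before the current sample. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, following the formulas above."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # assumption: seed from the first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # assumption: deviation uses the estimate from before this sample
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
for sample_ms in (100, 120):
    print(sample_ms, est.update(sample_ms))
```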
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

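The three sender events can be sketched as a small class. This is an illustrative simplification: there is no real timer or network, `wire` is a hypothetical list standing in for "pass segment to IP", and duplicate ACKs, flow control and congestion control are ignored as in the simplified sender.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified TCP sender (no real timer or network)."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}            # seq # -> segment data, oldest first
        self.timer_running = False
        self.wire = []               # (seq #, data) handed to IP

    def on_app_data(self, data):
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)   # seq # counts bytes, not segments
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)      # oldest unacked segment
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True    # restart timer

    def on_ack(self, ack):
        if ack > self.send_base:     # acknowledges previously unacked data
            self.send_base = ack
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= ack]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

A short run with two segments (8 and 20 bytes starting at seq # 92) shows the cumulative ACK moving `send_base` and the timer stopping once nothing is outstanding.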
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  TCP receiver action: delayed ACK; wait up to 500 ms for next segment; if no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit
[Figure: Host A to Host B timeline]
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data (lost, marked X)
  B -> A: ACK=100, repeated as duplicate ACKs
  A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK

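The duplicate-ACK trigger can be sketched as a counter over the incoming ACK stream. This sketch assumes "triple duplicate" means three duplicates after the first ACK for that data; `fast_retransmit_points` is a hypothetical name, and only the trigger is modeled, not the retransmission itself.

```python
def fast_retransmit_points(acks):
    """Return the ACK numbers at which a fast retransmit would be triggered."""
    last_ack = None
    dup_count = 0
    fired = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:        # third duplicate of the same ACK
                fired.append(ack)     # resend segment starting at this seq #
        else:
            last_ack, dup_count = ack, 0
    return fired


print(fast_retransmit_points([100, 100, 100, 100, 120]))
```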
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process reads.]

TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which is split into buffered data and free buffer space (rwnd); buffered data leaves to the application process.]

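The bookkeeping on both sides can be sketched in two small functions. The helper names and argument names are hypothetical; the sketch assumes rwnd is simply the free part of RcvBuffer, as in the figure.

```python
def receive_window(rcv_buffer, buffered_bytes):
    """rwnd advertised by the receiver: the free space left in RcvBuffer."""
    return rcv_buffer - buffered_bytes


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How much more the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)


rwnd = receive_window(rcv_buffer=4096, buffered_bytes=1000)
print(rwnd, sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=3000))
```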
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application and network layers, holding:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
     client -> server: SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     server -> client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     client -> server: ACKbit=1, ACKnum=y+1
4. server: received ACK(y) indicates client is live

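The sequence and acknowledgement numbers in the handshake can be traced with a tiny sketch. The dictionaries stand in for segments and the function name `three_way_handshake` is hypothetical; no sockets are involved.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x (client), y (server)."""
    syn = {"SYNbit": 1, "Seq": x}
    # server acknowledges the SYN by asking for byte x+1
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    # client acknowledges the SYNACK by asking for byte y+1
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}
    return syn, synack, ack


for segment in three_way_handshake(x=42, y=79):
    print(segment)
```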
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)

TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection
client state: ESTAB; server state: ESTAB
1. client calls clientSocket.close()
     client -> server: FINbit=1, seq=x
     client enters FIN_WAIT_1 (can no longer send, but can receive data)
2. server -> client: ACKbit=1, ACKnum=x+1
     server enters CLOSE_WAIT (can still send data)
     client enters FIN_WAIT_2 (wait for server close)
3. server -> client: FINbit=1, seq=y
     server enters LAST_ACK (can no longer send data)
4. client -> server: ACKbit=1, ACKnum=y+1
     client enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
     server enters CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$ (both up to R/2) and delay versus $\lambda_{in}$ (up to R/2).]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) to finite shared output link buffers; Host B receives $\lambda_{out}$.]

Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2. Host A keeps a copy of each packet and sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) into finite shared output link buffers with free buffer space; Host B receives $\lambda_{out}$.]

Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2: when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?). Host A keeps a copy and resends when the router has no buffer space; Host B receives $\lambda_{out}$.]

Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2: when sending at R/2, some packets are retransmissions, including duplicates that are delivered! Host A's timeout fires while its copy is still queued, although the router has free buffer space.]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$ at the receiving host.]

Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2.]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx \frac{cwnd}{RTT}$ bytes/sec
[Figure: sender sequence number space. From last byte ACKed to last byte sent lies the sent, not-yet ACKed ("in-flight") data, spanning cwnd.]

TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sending to Host B over successive RTTs along a time axis.]

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

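The slow start, congestion avoidance and loss reactions above can be sketched as a per-RTT simulation. This is a coarse illustration under stated assumptions: cwnd is tracked in whole MSS units, one update happens per RTT, loss events are supplied by the caller, and Reno's fast-recovery window inflation is left out. The function name `simulate_cwnd` and its parameters are hypothetical.

```python
def simulate_cwnd(rounds, loss_events, ssthresh=8, reno=True):
    """Return cwnd (in MSS) at the start of each RTT round.

    loss_events maps a round index to "timeout" or "3dup".
    """
    cwnd = 1
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        event = loss_events.get(r)
        if event is not None:
            ssthresh = max(cwnd // 2, 1)     # half of cwnd just before loss
            if event == "3dup" and reno:
                cwnd = ssthresh              # Reno: cut in half, grow linearly
            else:
                cwnd = 1                     # timeout, or Tahoe on any loss
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return trace


print("Reno: ", simulate_cwnd(8, {5: "3dup"}))
print("Tahoe:", simulate_cwnd(8, {5: "3dup"}, reno=False))
```

Both variants follow the same path until the loss in round 5; afterwards Reno resumes from half the window while Tahoe restarts slow start from 1 MSS.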
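Summary: TCP Congestion Control
[Figure: FSM of TCP congestion control, only partly recovered. Surviving transition labels:
  new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  slow start -> congestion avoidance: cwnd $\ge$ ssthresh
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]
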
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time: additively increase window size ... until loss occurs (then cut window in half).]

TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: window size oscillating between W/2 and W.]

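The 3/4 W average can be checked numerically. This sketch assumes an idealized sawtooth in which the window ramps linearly from W/2 to W between losses; both function names are hypothetical.

```python
def average_sawtooth_window(w, steps=1000):
    """Average of a window ramping linearly from w/2 to w, sampled at evenly spaced points."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_tcp_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


print(average_sawtooth_window(1000))
print(avg_tcp_throughput(1000, 0.1))
```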
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed

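The numbers in this example can be reproduced from the formula. This sketch expresses MSS in bits so that throughput comes out in bits/sec; the function names are hypothetical.

```python
import math


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain a throughput over one RTT."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS converted to bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss probability L obtained by solving the formula above for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(int(window_segments(10e9, 0.1, 1500)))
print(required_loss_rate(10e9, 0.1, 1500))
```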
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmit
[Figure: Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data) to Host B; the Seq=100 segment is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100 (20 bytes of data).]
fast retransmit after sender receipt of triple duplicate ACK
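A minimal sketch of the duplicate-ACK counting behind fast retransmit. The function name and its list-based interface are hypothetical; the point is only that the first ACK for a given number is a new ACK, and the third duplicate after it triggers the resend.

```python
def fast_retransmit_trace(acks, send_base):
    """Return the seq #s that would be resent for a stream of ACK numbers."""
    dup_count = 0
    retransmits = []
    for ack in acks:
        if ack > send_base:
            # new ACK: advance the base and reset the duplicate counter
            send_base = ack
            dup_count = 0
        elif ack == send_base:
            dup_count += 1
            if dup_count == 3:
                # triple duplicate ACK: resend the unACKed segment with smallest seq #
                retransmits.append(send_base)
    return retransmits
```

With the figure's numbers, an ACK=100 followed by three more ACK=100 yields a single resend of the segment starting at byte 100.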
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code into the TCP socket receiver buffers in the OS; the application process removes data from those buffers.]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data goes to the application process.]
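The bookkeeping on both sides can be sketched as follows. This assumes, as the figure suggests, that rwnd is RcvBuffer minus the data still buffered; the function names are hypothetical, and LastByteSent / LastByteAcked are the sender variables used elsewhere in these slides.

```python
def advertised_rwnd(rcv_buffer: int, buffered: int) -> int:
    """Free buffer space the receiver would advertise (never negative)."""
    return max(rcv_buffer - buffered, 0)


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, rwnd: int) -> int:
    """How many more bytes the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(rwnd - in_flight, 0)
```

For instance, with the typical 4096-byte RcvBuffer and 1000 bytes not yet read by the application, rwnd is 3096; a sender with 2000 bytes in flight may send 1096 more.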
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each show application above network, with
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
     SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     ACKbit=1, ACKnum=y+1
4. server: received ACK(y) indicates client is live
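The numbering in the three segments can be reproduced with a short sketch. The function name and the dictionary layout are hypothetical; the modulo reflects the 32-bit sequence number field, and random initial values stand in for whatever a real stack would choose.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence and ACK numbers are 32-bit fields


def three_way_handshake(x=None, y=None):
    """Return the SYN, SYNACK and ACK segments as dictionaries (illustrative)."""
    x = random.randrange(SEQ_SPACE) if x is None else x   # client's init seq num
    y = random.randrange(SEQ_SPACE) if y is None else y   # server's init seq num
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": (x + 1) % SEQ_SPACE}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_SPACE}
    return syn, synack, ack
```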
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); sends FINbit=1, seq=x; can no longer send but can receive data (FIN_WAIT_1)
2. server: replies ACKbit=1, ACKnum=x+1; can still send data (CLOSE_WAIT); client waits for server close (FIN_WAIT_2)
3. server: sends FINbit=1, seq=y; can no longer send data (LAST_ACK)
4. client: replies ACKbit=1, ACKnum=y+1; timed wait for 2*max segment lifetime (TIMED_WAIT), then CLOSED
5. server: on receiving the ACK, CLOSED
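The state sequences on this slide can be written as two transition tables. This is a sketch of the client-initiated close only; the event labels ("close", "recv ACK", "timer expired") and the helper run_fsm are hypothetical names, and simultaneous closes are not modelled.

```python
# (state, event) -> (next state, segment to send or None)
CLIENT_CLOSE = {
    ("ESTAB", "close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIMED_WAIT", "send ACK"),
    ("TIMED_WAIT", "timer expired"): ("CLOSED", None),  # after 2 * max segment lifetime
}
SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run_fsm(table, state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, action = table[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent
```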
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Two plots against $\lambda_{in}$ (up to $R/2$): $\lambda_{out}$ and delay.]
maximum per-connection throughput: $R/2$
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers.]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both up to $R/2$. Host A keeps a copy and sends when there is free buffer space in the finite shared output link buffers; Host B receives.]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped, and the copy is resent once there is free buffer space. Plot of $\lambda_{out}$ against $\lambda_{in}$, both up to $R/2$.]
when sending at $R/2$, some packets are retransmissions but asymptotic goodput is still $R/2$ (why?)
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: Host A times out and resends its copy while the original is still queued (free buffer space available). Plot of $\lambda_{out}$ against $\lambda_{in}$, both up to $R/2$.]
when sending at $R/2$, some packets are retransmissions, including duplicates that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers.]
Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes marked at $C/2$.]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx cwnd / RTT$ bytes/sec
[Figure: sender sequence number space. cwnd spans the bytes between last byte ACKed and last byte sent, i.e. those sent but not yet ACKed ("in-flight").]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B exchanging segments and ACKs over successive RTTs, time running downward.]
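The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be checked with a small loop. This sketch assumes one ACK per segment (no delayed ACKs), no loss, and cwnd measured in units of MSS; the function name is hypothetical.

```python
def slow_start_rounds(rounds, mss=1):
    """Return cwnd at the start of each RTT during slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks_this_rtt = cwnd // mss      # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += mss                  # increment cwnd by 1 MSS per ACK
        history.append(cwnd)
    return history
```

Each round adds as many MSS as there were segments in flight, which is why the window doubles.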
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS;
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
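The difference between Tahoe and Reno shows up in a per-RTT trace of cwnd. The sketch below works in units of MSS and follows the slide's simplified rules (Reno halves cwnd on three duplicate ACKs; a full implementation also has a fast recovery phase that is not modelled here). The function name, the event labels and the initial ssthresh of 8 are assumptions for illustration.

```python
def cwnd_trace(events, variant="reno", ssthresh=8.0):
    """Return cwnd (in MSS) after each event: 'rtt', 'timeout' or '3dup'."""
    cwnd = 1.0
    trace = [cwnd]
    for event in events:
        if event == "rtt":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)   # slow start, up to the threshold
            else:
                cwnd += 1                        # congestion avoidance: linear growth
        elif event == "timeout":
            ssthresh = cwnd / 2                  # half of cwnd just before the loss
            cwnd = 1.0
        elif event == "3dup":
            ssthresh = cwnd / 2
            cwnd = ssthresh if variant == "reno" else 1.0
        trace.append(cwnd)
    return trace
```

With the same event sequence, Reno resumes linear growth from half the old window, while Tahoe restarts from 1 MSS and slow-starts back up to the new threshold.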
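Summary: TCP Congestion Control
[Figure: FSM; only the following transitions survive in this transcript]
slow start, on new ACK:
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
slow start -> congestion avoidance: when cwnd > ssthresh
congestion avoidance, on new ACK:
  cwnd = cwnd + MSS * (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
on duplicate ACK (fast recovery state of the FSM):
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed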
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) against time: additively increase window size until loss occurs, then cut window in half.]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is $\frac{3}{4} W$
  avg. thruput is $\frac{3}{4} W$ per RTT
$$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT} \ \text{bytes/sec}$$
[Figure: sawtooth window oscillating between $W/2$ and $W$.]
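The 3/4 factor is simply the mean of a window that ramps linearly from W/2 up to W. A quick numerical check, with hypothetical function names and an arbitrary number of sample points:

```python
def average_sawtooth_window(w, steps=1000):
    """Mean window size over one linear ramp from w/2 to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_tcp_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s
```

For W = 100 the sampled mean comes out at 75, matching the closed form.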
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
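Both numbers on this slide follow from the stated example values. The sketch below recomputes them; the function names are hypothetical, and MSS is converted to bits so that it matches a throughput given in bits per second.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the throughput: rate * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)   # about 83,333 segments
loss = required_loss_rate(10e9, 0.100, 1500)      # about 2e-10
```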
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput against Connection 1 throughput, both axes up to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
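The geometric argument can be replayed numerically. In this sketch two flows add one unit per round while the link is not overloaded and both halve when it is; synchronized losses, equal RTTs and the function name are simplifying assumptions. The gap between the two rates stays constant during additive increase and halves at every loss, so the flows drift toward the equal-share line.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Return the two flow rates after a number of AIMD rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both decrease by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: additive increase
    return x1, x2
```

Starting from rates 1 and 81 on a link of capacity 100, the initial gap of 80 shrinks to well under 1 within 200 rounds.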
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
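The example is plain arithmetic if every connection gets an equal share of R. A sketch with a hypothetical helper, using exact fractions:

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, one new connection gets exactly R/10, while 11 new connections get 11/20 of R, which the slide rounds to R/2.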
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Pipelined protocols
pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
two generic forms of pipelined protocols: go-Back-N, selective repeat
Pipelining: increased utilization
[Figure: sender/receiver timeline
  first packet bit transmitted, t = 0
  last bit transmitted, t = L / R
  first packet bit arrives
  last packet bit arrives, send ACK
  last bit of 2nd packet arrives, send ACK
  last bit of 3rd packet arrives, send ACK
  ACK arrives, send next packet, t = RTT + L / R]
3-packet pipelining increases utilization by a factor of 3!
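The factor of 3 follows from the timeline: the sender is busy for n * L/R out of every RTT + L/R seconds. The sketch below computes that ratio; the function name is hypothetical, and the sample values (8000-bit packets, a 1 Gbps link, 30 ms RTT) are assumptions chosen for illustration, not taken from this slide. The formula only holds while n * L/R does not exceed RTT + L/R.

```python
def sender_utilization(n_pkts, pkt_bits, rate_bps, rtt_s):
    """Fraction of time the sender is transmitting with n_pkts in the pipeline."""
    tx_time = pkt_bits / rate_bps            # L / R
    return (n_pkts * tx_time) / (rtt_s + tx_time)


u1 = sender_utilization(1, 8000, 1e9, 0.030)   # stop-and-wait
u3 = sender_utilization(3, 8000, 1e9, 0.030)   # 3-packet pipelining
```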
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
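The receiver rules fit in a few lines. In this sketch sequence numbers start at 0 and do not wrap, packets are assumed uncorrupted, and an ACK value of -1 is a hypothetical placeholder meaning that nothing has been received in order yet; none of these details come from the slide.

```python
def gbn_receiver(arrivals):
    """Process arriving seq #s; return (delivered in order, ACKs sent)."""
    expectedseqnum = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expectedseqnum:
            delivered.append(seq)        # in-order: deliver to upper layer
            expectedseqnum += 1
        # out-of-order packets are discarded, not buffered
        acks.append(expectedseqnum - 1)  # (re-)ACK highest in-order seq #
    return delivered, acks
```

If packet 2 is delayed behind packets 3 and 4, the receiver keeps re-ACKing 1 and discards 3 and 4, so the sender must eventually resend them.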
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure: sender and receiver windows over the sequence number space.]
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
(each row is 32 bits wide)
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable.]
TCP seq. numbers, ACKs
[Figure: simple telnet scenario between Host A and Host B
  Host A: user types 'C'; sends Seq=42, ACK=79, data = 'C'
  Host B: host ACKs receipt of 'C', echoes back 'C'; sends Seq=79, ACK=43, data = 'C'
  Host A: host ACKs receipt of echoed 'C'; sends Seq=43, ACK=80]
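The numbers in the telnet scenario follow one rule: each side's ACK is the other side's Seq plus the number of data bytes received. A sketch with a hypothetical function name and dictionary layout:

```python
def echo_exchange(client_seq, server_seq, char=b"C"):
    """Return the three segments of a one-character telnet echo."""
    # Host A sends the typed character
    seg1 = {"Seq": client_seq, "ACK": server_seq, "data": char}
    # Host B ACKs it and echoes the character back
    seg2 = {"Seq": seg1["ACK"], "ACK": seg1["Seq"] + len(seg1["data"]), "data": char}
    # Host A ACKs the echo; this segment carries no data
    seg3 = {"Seq": seg2["ACK"], "ACK": seg2["Seq"] + len(seg2["data"]), "data": b""}
    return seg1, seg2, seg3
```

Calling echo_exchange(42, 79) reproduces the Seq and ACK values shown in the figure.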
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. y-axis: RTT (milliseconds), 100 to 350; x-axis: time (seconds); series: SampleRTT, EstimatedRTT.]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$$
(typically, $\beta = 0.25$)
$$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$$
(estimated RTT plus "safety margin")
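The two moving averages and the timeout can be combined into one update step. This is a sketch: the function name is hypothetical, and updating DevRTT with the old EstimatedRTT before refreshing EstimatedRTT is an assumption about ordering (it is the order used in RFC 6298), since the slides do not specify it.

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one SampleRTT."""
    # deviation uses the previous estimate (ordering is an assumption, see above)
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval
```

For example, with EstimatedRTT = 100 ms, DevRTT = 5 ms and a new SampleRTT of 120 ms, the estimate moves to 102.5 ms, the deviation to 8.75 ms, and the timeout to 137.5 ms.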
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[Figure: Host A keeps a copy of each packet and resends it on timeout even though free buffer space exists at the router; Host B receives $\lambda_{out}$; plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked ≤ cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{cwnd}{RTT}$ bytes/sec
[Figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; cwnd spans the in-flight bytes]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B exchanging segments over time, one RTT marked]
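The doubling follows from the per-ACK increment: each segment sent in one RTT returns one ACK, and each ACK adds 1 MSS. A minimal sketch, assuming no loss and counting cwnd in units of MSS (the function name `slow_start` is made up for illustration):

```python
MSS = 1  # count cwnd in units of MSS for readability

def slow_start(rtts, cwnd=1 * MSS):
    """Return cwnd at the start of each RTT during loss-free slow start."""
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // MSS       # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += MSS          # increment cwnd by 1 MSS for every ACK
        history.append(cwnd)
    return history

print(slow_start(4))  # [1, 2, 4, 8, 16]
```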
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
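The two reactions can be compared side by side. This is a sketch of the rules as stated on the slide, not of any real TCP stack: cwnd is counted in MSS, `on_loss` is a hypothetical helper, and the floor of 1 MSS on ssthresh is an assumption added so the value never reaches zero.

```python
def on_loss(cwnd, event, variant="reno", mss=1):
    """Return (new_cwnd, ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    ssthresh = max(cwnd // 2, mss)      # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh            # restart from 1 MSS (slow start)
    return ssthresh, ssthresh           # Reno, 3 dup ACKs: cut cwnd in half

print(on_loss(16, "timeout"))            # (1, 8)
print(on_loss(16, "3dupacks"))           # (8, 8)
print(on_loss(16, "3dupacks", "tahoe"))  # (1, 8)
```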
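Summary: TCP Congestion Control
[Figure: FSM with slow start, congestion avoidance, and fast recovery states; recoverable transition labels:]
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed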
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD sawtooth behavior, probing for bandwidth; cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs, then cut window in half]
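The sawtooth can be reproduced with a few lines. A minimal sketch, assuming a loss occurs exactly when cwnd reaches a fixed ceiling `w_max` (a simplification; real losses depend on the network) and counting cwnd in MSS:

```python
def aimd_trace(w_max, rtts, cwnd=1):
    """cwnd (in MSS) at the start of each RTT under idealized AIMD."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= w_max:
            cwnd = cwnd / 2     # multiplicative decrease after loss
        else:
            cwnd = cwnd + 1     # additive increase: 1 MSS per RTT
    return trace

trace = aimd_trace(w_max=8, rtts=12, cwnd=4)
print(trace)
# One full tooth runs from W/2 to W; its mean is 3/4 of W.
print(sum(trace[:5]) / 5)
```

With W = 8 the window cycles through 4, 5, 6, 7, 8, whose mean is 6, that is 3/4 of W, which is where the average-window figure used for TCP throughput comes from.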
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth window oscillating between W/2 and W]
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
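Both numbers on this slide can be checked by direct arithmetic. The sketch below assumes MSS is the full 1500-byte segment and inverts the [Mathis 1997] formula for L; the variable names are illustrative only.

```python
MSS_BITS = 1500 * 8      # 1500-byte segments, in bits
RTT = 0.100              # seconds
TARGET = 10e9            # desired throughput: 10 Gbps

# Window needed to keep the pipe full: throughput * RTT, in segments.
w_segments = TARGET * RTT / MSS_BITS

# Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L.
loss = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2

print(round(w_segments))   # about 83,333 segments in flight
print(f"{loss:.1e}")       # about 2e-10
```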
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with Connection 1 throughput on the horizontal axis, both axes up to R, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
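The geometric argument can be checked numerically: additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal-share line. A minimal sketch, assuming both flows have the same RTT and see every loss at the same time (an idealization; `two_flow_aimd` is a made-up name):

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(steps):
        if x1 + x2 > capacity:          # link overloaded: both flows see loss
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase, slope 1
    return x1, x2

x1, x2 = two_flow_aimd(1, 70, capacity=100, steps=500)
print(abs(x1 - x2))   # the initial gap of 69 has shrunk to nearly zero
```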
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
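The arithmetic behind the example, assuming every TCP connection on the link receives an equal share (the helper `app_share` is hypothetical):

```python
from fractions import Fraction

def app_share(existing, opened):
    """Fraction of link rate R won by an app that opens `opened` connections
    on a link already carrying `existing` connections."""
    return Fraction(opened, existing + opened)

print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

The exact share with 11 connections is 11/20 of R, which the slide rounds to R/2.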
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Pipelining: increased utilization
[Figure: sender/receiver timeline]
  first packet bit transmitted, t = 0
  last bit transmitted, t = L / R
  first packet bit arrives
  last packet bit arrives, send ACK
  last bit of 2nd packet arrives, send ACK
  last bit of 3rd packet arrives, send ACK
  ACK arrives, send next packet, t = RTT + L / R
3-packet pipelining increases utilization by a factor of 3!
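The factor of 3 follows from the timeline: the sender is busy for n · L/R out of every RTT + L/R. A small sketch, where the packet size, link rate, and RTT are assumed example values rather than figures from this slide:

```python
def utilization(n_pkts, L_bits, R_bps, rtt_s):
    """Fraction of time the sender is transmitting when it sends
    n_pkts packets back-to-back every RTT + L/R."""
    t_tx = L_bits / R_bps               # time to push one packet onto the link
    return n_pkts * t_tx / (rtt_s + t_tx)

# Assumed example: 8000-bit packets, 1 Gbps link, 30 ms RTT.
u1 = utilization(1, 8000, 1e9, 0.030)
u3 = utilization(3, 8000, 1e9, 0.030)
print(u1, u3, u3 / u1)
```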
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn't ack packet if there's a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack'ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
"window" of up to N, consecutive unack'ed pkts allowed
ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  may receive duplicate ACKs (see receiver)
timer for oldest in-flight pkt
timeout(n): retransmit packet n and all higher seq # pkts in window
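The sender rules above can be sketched as a small class. This is an illustration under stated assumptions, not a full protocol: sequence numbers are unbounded integers (no k-bit wraparound), the timer is represented only by an explicit `timeout()` call, and the `sent` list stands in for the network.

```python
class GBNSender:
    def __init__(self, window):
        self.N = window
        self.base = 0        # oldest unACKed seq #
        self.nextseq = 0     # next seq # to use
        self.sent = []       # log of transmitted seq #s

    def send(self):
        """Send one new packet if the window allows; return True if sent."""
        if self.nextseq >= self.base + self.N:
            return False     # window full: refuse data from above
        self.sent.append(self.nextseq)
        self.nextseq += 1
        return True

    def ack(self, n):
        """Cumulative ACK: everything up to and including n is acknowledged."""
        if n >= self.base:   # duplicate (old) ACKs are ignored
            self.base = n + 1

    def timeout(self):
        """Retransmit every packet in the window that is still unACKed."""
        self.sent.extend(range(self.base, self.nextseq))

s = GBNSender(window=4)
print([s.send() for _ in range(5)])  # fifth send is refused
s.ack(1)
s.timeout()
print(s.sent)                        # packets 2 and 3 are resent
```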
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
[Figure: sender and receiver views of the sequence number space]
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
  otherwise:
    ignore
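The receiver side of selective repeat can be sketched as follows. The class and field names are hypothetical, sequence numbers are treated as unbounded integers, and packets below the window are re-ACKed on the assumption that the earlier ACK may have been lost.

```python
class SRReceiver:
    def __init__(self, window):
        self.N = window
        self.base = 0         # rcvbase: next in-order seq # expected
        self.buffer = {}      # out-of-order packets, keyed by seq #
        self.delivered = []   # data handed to the upper layer, in order

    def receive(self, n, data):
        """Handle packet n; return the seq # to ACK, or None to ignore."""
        if self.base <= n < self.base + self.N:
            self.buffer[n] = data
            while self.base in self.buffer:    # deliver any in-order run
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1                 # advance the window
            return n
        if self.base - self.N <= n < self.base:
            return n                           # already delivered: re-ACK
        return None

r = SRReceiver(window=4)
print(r.receive(1, "b"), r.delivered)  # buffered, nothing delivered yet
print(r.receive(0, "a"), r.delivered)  # gap filled, both delivered
```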
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
annotations:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
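To make the layout concrete, the fixed 20-byte part of the header can be decoded with Python's standard `struct` module. This is a sketch: `parse_tcp_header` is a made-up helper, options are ignored, and the sample header is built by hand rather than captured from a network.

```python
import struct

def parse_tcp_header(raw):
    """Decode the fixed 20-byte part of a TCP header (options ignored)."""
    (src, dst, seq, ack, off_res, flags,
     rwnd, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", raw[:20])
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # lowest bit first
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "head_len_bytes": (off_res >> 4) * 4,   # head len counts 32-bit words
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
    }

# Hand-built sample: SYN and ACK bits set (0x12), head len of 5 words.
hdr = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x12, 4096, 0, 0)
print(parse_tcp_header(hdr))
```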
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
[Figure: simple telnet scenario between Host A and Host B]
  User types 'C'
  Host A → Host B: Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'
  Host B → Host A: Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'
  Host A → Host B: Seq=43, ACK=80
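The numbers in the telnet scenario follow one rule: the reply's sequence number is the other side's ACK number, and the reply's ACK number is the received sequence number plus the bytes of data received. A minimal sketch with segments modelled as dictionaries (`reply` is a hypothetical helper):

```python
def reply(seg, data=b""):
    """Build the segment answering `seg`: ACK every byte it carried."""
    return {
        "seq": seg["ack"],                      # next byte we will send
        "ack": seg["seq"] + len(seg["data"]),   # next byte we expect
        "data": data,
    }

a1 = {"seq": 42, "ack": 79, "data": b"C"}   # user types 'C'
b1 = reply(a1, b"C")                         # Host B ACKs and echoes 'C'
a2 = reply(b1)                               # Host A ACKs the echo
print(b1["seq"], b1["ack"])   # 79 43
print(a2["seq"], a2["ack"])   # 43 80
```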
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1 - \alpha) \cdot \mathrm{EstimatedRTT} + \alpha \cdot \mathrm{SampleRTT}$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT plotted as RTT (milliseconds), 100 to 350, versus time (seconds), 1 to 106]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\mathrm{DevRTT} = (1 - \beta) \cdot \mathrm{DevRTT} + \beta \cdot |\mathrm{SampleRTT} - \mathrm{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4 \cdot \mathrm{DevRTT}$
  (estimated RTT plus "safety margin")
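The three formulas combine into a small estimator. This is a sketch under assumptions: the class name and starting values are made up, and the deviation is computed against the previous EstimatedRTT before that estimate is updated, an ordering the slides do not specify.

```python
class RttEstimator:
    def __init__(self, estimated_rtt, dev_rtt, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.alpha = alpha
        self.beta = beta

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimates; return TimeoutInterval."""
        # Assumption: deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(estimated_rtt=100.0, dev_rtt=5.0)  # ms, made-up start
print(est.update(120.0))   # 137.5
```

With these starting values, one 120 ms sample moves EstimatedRTT to 102.5 ms and DevRTT to 8.75 ms, so the timeout becomes 102.5 + 4 · 8.75 = 137.5 ms.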
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
[Figure: FSM; recoverable transition]
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
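The three sender events can be sketched as one class. This is an illustration, not the FSM from the figure: the timer is reduced to a boolean flag, the `wire` list stands in for IP, and ACKs are assumed to fall on segment boundaries. All names other than NextSeqNum's counterpart `next_seq` are hypothetical.

```python
class SimpleTCPSender:
    def __init__(self, isn=0):
        self.send_base = isn        # oldest unACKed byte
        self.next_seq = isn         # NextSeqNum
        self.unacked = {}           # seq # -> data still awaiting an ACK
        self.timer_running = False
        self.wire = []              # (seq, data) pairs handed to IP

    def on_data(self, data):
        """Data received from application above."""
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True      # start timer

    def on_timeout(self):
        """Retransmit the oldest unACKed segment and restart the timer."""
        if self.unacked:
            seq = min(self.unacked)
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        """Cumulative ACK y: every byte before y has been received."""
        if y > self.send_base:
            self.send_base = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)  # run only if data outstanding

s = SimpleTCPSender(isn=92)
s.on_data(b"x" * 8)
s.on_data(b"y" * 20)
s.on_ack(100)
s.on_timeout()
print(s.wire[-1][0], s.send_base, s.next_seq)   # 100 100 120
```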
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment higher-than-expect seq. #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
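Counting duplicates is the whole mechanism on the sender side. A minimal sketch, assuming the first ACK for a value is the original and the next three are the duplicates (`DupAckDetector` is a made-up name):

```python
class DupAckDetector:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the seq # to fast-retransmit, or None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:    # triple duplicate ACK
                return ack             # resend the segment starting at `ack`
        else:
            self.last_ack, self.dup_count = ack, 0
        return None

d = DupAckDetector()
print([d.on_ack(100) for _ in range(4)])   # [None, None, None, 100]
```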
TCP fast retransmit
[Figure: Host A and Host B timeline]
  Host A → Host B: Seq=92, 8 bytes of data
  Host A → Host B: Seq=100, 20 bytes of data (lost, marked X)
  Host B → Host A: ACK=100, repeated as later segments arrive
  Host A → Host B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process, TCP socket receiver buffers, OS, TCP code, IP code; segments arrive from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data is delivered to application process]
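Both sides of flow control reduce to simple bookkeeping. The sketch below assumes the usual byte-counter formulation (LastByteRcvd, LastByteRead on the receiver; LastByteSent, LastByteAcked on the sender), which this slide does not spell out; the function names are hypothetical.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space the receiver would advertise."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the app
    return rcv_buffer - buffered

def can_send(nbytes, last_byte_sent, last_byte_acked, advertised):
    """Sender-side check: keep in-flight data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised

window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                   # 2096
print(can_send(1000, 5000, 4000, window))       # True
print(can_send(1500, 5000, 4000, window))       # False
```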
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application and network layers; each side holds connection state: ESTAB and connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
[Figure: client and server timeline]
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg
  client → server: SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  server → client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  client → server: ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB
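The sequence and acknowledgement numbers in the handshake can be traced with a short sketch. Segments are modelled as dictionaries and `three_way_handshake` is a hypothetical helper; no real sockets are involved.

```python
def three_way_handshake(x, y):
    """Return the three segments exchanged for client ISN x and server ISN y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]

for seg in three_way_handshake(x=100, y=300):
    print(seg)
```

Each side acknowledges the other's initial sequence number plus one, so the SYNACK carries ACKnum 101 and the final ACK carries ACKnum 301 in this example.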
TCP 3-way handshake: FSM
[Figure: FSM; recoverable states and transitions]
client: closed → SYN sent: Socket clientSocket = newSocket("hostname","port number"); send SYN(seq=x)
client: SYN sent → ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)
server: closed → listen: Socket connectionSocket = welcomeSocket.accept();
server: listen → SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
server: SYN rcvd → ESTAB: receive ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
[Figure: client and server timeline]
client state: ESTAB; server state: ESTAB
client: clientSocket.close()
  client → server: FINbit=1, seq=x
  client state: FIN_WAIT_1 (can no longer send but can receive data)
  server → client: ACKbit=1; ACKnum=x+1
  server state: CLOSE_WAIT (can still send data)
  client state: FIN_WAIT_2 (wait for server close)
  server → client: FINbit=1, seq=y
  server state: LAST_ACK (can no longer send data)
  client → server: ACKbit=1; ACKnum=y+1
  client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
  server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; throughput $\lambda_{out}$; plots of $\lambda_{out}$ versus $\lambda_{in}$ (both up to R/2) and delay versus $\lambda_{in}$]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Go-back-N Selective RepeatGo back N sender can have up to
N unacked packets in i li
Selective Repeat sender can have up to N
unackrsquoed packets in i lipipeline
receiver only sends cumulative ack
pipeline rcvr sends individual ack
for each packetcumulative ack doesnrsquot ack packet if
therersquos a gap
for each packet
sender has timer for oldest unacked packet
h ti i
sender maintains timer for each unacked packet when timer expires when timer expires
retransmit all unacked packets
when timer expires retransmit only that unacked packet
Transport Layer 3-46
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N: receiver
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  may generate duplicate ACKs
  need only remember expectedseqnum
out-of-order pkt:
  discard (don't buffer): no receiver buffering!
  re-ACK pkt with highest in-order seq #
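The Go-Back-N rules above can be sketched in a few lines. This is only an illustration under simplifying assumptions: sequence numbers are unbounded integers (no k-bit wraparound), packets are never corrupted, and the names `GbnReceiver`, `on_packet` and `gbn_timeout` are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class GbnReceiver:
    """Go-Back-N receiver: remembers only expected_seq, never buffers."""
    expected_seq: int = 0
    delivered: list = field(default_factory=list)

    def on_packet(self, seq: int, data: str) -> int:
        """Handle one uncorrupted packet; return the cumulative ACK number."""
        if seq == self.expected_seq:
            self.delivered.append(data)  # in-order: deliver to upper layer
            self.expected_seq += 1
        # Out-of-order packets are discarded. Either way, (re-)ACK the
        # highest in-order seq # received so far (-1 if none yet).
        return self.expected_seq - 1


def gbn_timeout(base: int, next_seq: int) -> list:
    """On timeout the sender retransmits every sent-but-unACKed packet."""
    return list(range(base, next_seq))
```

Feeding the receiver packets 0, 1, 3, 4, 2 produces ACKs 0, 1, 1, 1, 2: packets 3 and 4 are thrown away and only trigger duplicate ACKs for packet 1.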
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n is the smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
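As a rough illustration of the selective-repeat receiver events, the sketch below buffers out-of-order packets and delivers in-order runs. It assumes unbounded integer sequence numbers and uncorrupted packets; `SrReceiver` and its method names are hypothetical.

```python
class SrReceiver:
    """Selective-repeat receiver with a window of N sequence numbers."""

    def __init__(self, window: int):
        self.window = window   # N
        self.rcv_base = 0      # lowest not-yet-received seq #
        self.buffer = {}       # seq -> data for out-of-order packets
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, seq, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.rcv_base <= seq < self.rcv_base + self.window:
            self.buffer[seq] = data
            # Deliver any in-order run that now starts at rcv_base.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.window <= seq < self.rcv_base:
            return seq         # already delivered: re-ACK so sender can advance
        return None
```

Unlike Go-Back-N, each packet is ACKed individually, so an early loss does not force later packets to be resent.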
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
segment layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how does receiver handle out-of-order segments?
  A: TCP spec doesn't say; up to implementor
[figure: outgoing segment from sender and incoming segment to sender, each with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer; sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  user types 'C'
  A -> B: Seq=42, ACK=79, data = 'C'
  host B ACKs receipt of 'C', echoes back 'C'
  B -> A: Seq=79, ACK=43, data = 'C'
  host A ACKs receipt of echoed 'C'
  A -> B: Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$EstimatedRTT = (1-\alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
[figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) vs. time (seconds); curves: SampleRTT, EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  (EstimatedRTT is the estimated RTT; $4 \cdot DevRTT$ is the "safety margin")
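The EstimatedRTT, DevRTT and TimeoutInterval formulas can be combined into a small estimator. This is a sketch, not the behaviour of any particular TCP stack: the class name `RttEstimator` is hypothetical, the initial deviation of half the first sample and the choice to update DevRTT before EstimatedRTT are assumptions (they follow common practice, but the slides do not specify either).

```python
class RttEstimator:
    """EWMA-based RTT estimate and retransmission timeout (all times in ms)."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumed starting deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        """EstimatedRTT plus a safety margin of four deviations."""
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample, one 120 ms sample moves the estimate only to 102.5 ms, which is the smoothing effect the slide describes.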
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
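The three sender events (data from application, timeout, ACK received) can be sketched as a small class. This is a simplified model under stated assumptions: no duplicate-ACK handling, no flow or congestion control, ACKs always fall on segment boundaries, and the timer is just a boolean flag. The names `SimpleTcpSender`, `unacked` and `sent` are hypothetical.

```python
class SimpleTcpSender:
    """Simplified TCP sender: one timer, cumulative ACKs, no fast retransmit."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq   # NextSeqNum
        self.send_base = initial_seq  # oldest unACKed byte
        self.unacked = {}             # seq # -> segment data
        self.timer_running = False
        self.sent = []                # log of (seq, data) passed to IP

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq
        self.unacked[seq] = data
        self.sent.append((seq, data))   # "send": pass segment to IP
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True   # timer covers oldest unACKed segment

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)         # segment that caused the timeout
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True       # restart timer

    def on_ack(self, ack_num: int) -> None:
        if ack_num > self.send_base:    # ACKs previously unACKed data
            self.send_base = ack_num
            for seq in [s for s in self.unacked if s < ack_num]:
                del self.unacked[seq]
            # Keep the timer only if something is still outstanding.
            self.timer_running = bool(self.unacked)
```

With an initial sequence number of 92, sending 8 and then 20 bytes leaves NextSeqNum at 120, and a timeout resends only the segment starting at 92.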
event: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event: arrival of in-order segment with expected seq #. One other segment has ACK pending
  action: immediately send single cumulative ACK, ACKing both in-order segments
event: arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  action: immediately send duplicate ACK, indicating seq # of next expected byte
event: arrival of segment that partially or completely fills gap
  action: immediately send ACK, provided that segment starts at lower end of gap
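The four receiver cases can be written as a single decision function. This is a hedged sketch: `ack_action` and its parameters are hypothetical names, and the handling of a segment below the expected sequence number (an old duplicate) is an assumption, since the table does not cover that case.

```python
def ack_action(seq: int, expected_seq: int,
               ack_pending: bool = False, gap_buffered: bool = False) -> str:
    """Pick the receiver's ACK behaviour for one arriving segment.

    ack_pending:  an earlier in-order segment is still waiting on a delayed ACK
    gap_buffered: out-of-order data above expected_seq is already buffered
    """
    if seq > expected_seq:
        return "duplicate ACK now"    # gap detected
    if seq < expected_seq:
        return "duplicate ACK now"    # assumption: re-ACK old duplicates
    if gap_buffered:
        return "ACK now"              # segment starts at lower end of gap
    if ack_pending:
        return "cumulative ACK now"   # ACKs both in-order segments
    return "delayed ACK"              # wait up to 500 ms for next segment
```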
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[figure: Host A, Host B timeline]
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data (lost: X)
  B -> A: ACK=100, repeated as duplicate ACKs
  A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK
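A duplicate-ACK counter is enough to illustrate the fast-retransmit trigger. The sketch assumes cumulative ACK numbers that never wrap around; `DupAckCounter` is a hypothetical name.

```python
class DupAckCounter:
    """Tracks duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, send_base: int):
        self.send_base = send_base  # oldest unACKed byte
        self.dup_acks = 0

    def on_ack(self, ack_num: int) -> bool:
        """Return True exactly when the third duplicate ACK arrives."""
        if ack_num > self.send_base:
            self.send_base = ack_num  # new data ACKed: reset the count
            self.dup_acks = 0
            return False
        self.dup_acks += 1
        return self.dup_acks == 3     # resend segment starting at send_base
```

In the scenario from the figure, the first ACK=100 acknowledges the 8 bytes at Seq=92; the next three ACK=100 are duplicates, and the third one triggers the resend of the segment at Seq=100.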
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process (application) above TCP socket receiver buffers, TCP code and IP code (OS); segments arrive from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to application process]
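The advertised window and the sender-side limit can be expressed as two small functions. The slides only say that rwnd is the free buffer space; the byte counters `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` used here are assumptions introduced for illustration, as are the function names.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """rwnd = free space = RcvBuffer minus data buffered but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rwnd: int) -> int:
    """Bytes the sender may still put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)
```

With a 4096-byte buffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes already in flight may send 1096 more.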
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: client and server each hold, between application and network:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg
  client -> server: SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  server -> client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  client -> server: ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB
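The sequence and acknowledgement numbers exchanged in the handshake can be traced with a short function. It is only a model of the three segments, not a network implementation; the dictionary layout and the name `three_way_handshake` are hypothetical, and the arithmetic is done modulo 2^32 because TCP sequence numbers are 32 bits wide.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x: int, y: int) -> list:
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x and y."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    synack = {"from": "server", "SYN": 1, "seq": y,
              "ACK": 1, "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"from": "client", "ACK": 1,
           "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```

Each side acknowledges the other's initial sequence number plus one, so the SYN consumes one sequence number even though it carries no data.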
TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close()
  client -> server: FINbit=1, seq=x
  client state: FIN_WAIT_1 (can no longer send but can receive data)
server:
  server -> client: ACKbit=1, ACKnum=x+1
  server state: CLOSE_WAIT (can still send data)
  client state: FIN_WAIT_2 (wait for server close)
server:
  server -> client: FINbit=1, seq=y
  server state: LAST_ACK (can no longer send data)
client:
  client -> server: ACKbit=1, ACKnum=y+1
  client state: TIMED_WAIT (timed wait for 2 * max segment lifetime)
client state: CLOSED; server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data $\lambda_{in}$ through unlimited shared output link buffers; throughput $\lambda_{out}$; plots of $\lambda_{out}$ and delay vs. $\lambda_{in}$, with axes marked at R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[figure: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2; sender keeps a copy and sends when there is free buffer space; Host A, Host B; finite shared output link buffers]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2; sender keeps a copy; packet dropped when there is no buffer space, resent when there is free buffer space; Host A, Host B]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[figure: $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2; sender keeps a copy and resends on timeout; free buffer space; Host A, Host B]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Host A, Host B, Host C, Host D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 3
[figure: $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx cwnd / RTT$ bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; cwnd spans the in-flight region]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A, Host B timeline; one, two, then four segments sent in successive RTTs]
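To see why incrementing cwnd once per ACK doubles it every RTT, the sketch below steps through a few loss-free rounds. It assumes one ACK per segment (no delayed ACKs) and measures cwnd in units of MSS; `slow_start_rounds` is a hypothetical helper.

```python
def slow_start_rounds(rounds: int, mss: int = 1) -> list:
    """Return cwnd at the start of each RTT during loss-free slow start."""
    cwnd = mss                    # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss        # one ACK comes back per segment in flight
        for _ in range(acks):
            cwnd += mss           # increment cwnd for every ACK received
        history.append(cwnd)
    return history
```

`slow_start_rounds(4)` yields 1, 2, 4, 8, 16 MSS: every ACKed segment adds one MSS, so a full window of ACKs doubles the window.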
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
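The difference between the Tahoe and Reno reactions can be captured in one function. This follows the simplified description on the slides (Reno halves cwnd on three duplicate ACKs); real implementations add further detail during fast recovery. The function name `on_loss` and the event labels are hypothetical.

```python
def on_loss(cwnd: float, mss: float, event: str, variant: str = "reno") -> tuple:
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd / 2            # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh       # restart from 1 MSS, slow start to ssthresh
    return cwnd / 2, ssthresh      # Reno on 3 dup ACKs: cut cwnd in half
```

For cwnd = 16 MSS, a timeout gives (1, 8) under either variant, while three duplicate ACKs give (8, 8) under Reno and (1, 8) under Tahoe.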
Summary: TCP Congestion Control
[figure: congestion-control FSM; recoverable transition labels:]
  new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  cwnd > ssthresh: switch from slow start to congestion avoidance
  new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
  additively increase window size ... until loss occurs (then cut window in half)
[figure: cwnd (TCP sender congestion window size) vs. time, showing the saw tooth]
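The saw tooth can be reproduced with a toy loop. The model is an assumption made for illustration: cwnd is counted in whole MSS, and a loss is declared in any RTT where cwnd has reached a fixed `capacity`; `aimd_trace` is a hypothetical name.

```python
def aimd_trace(start_cwnd: int, capacity: int, rtts: int) -> list:
    """Return cwnd (in MSS) at the start of each RTT under AIMD."""
    cwnd = start_cwnd
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= capacity:          # assumed loss signal for this toy model
            cwnd = max(1, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += 1                 # additive increase: 1 MSS per RTT
    return trace
```

`aimd_trace(4, 8, 10)` gives 4, 5, 6, 7, 8, 4, 5, 6, 7, 8: the window climbs linearly to the loss point and is then halved, over and over.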
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[figure: window size oscillating between W/2 and W]
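The 3/4 factor follows from the saw tooth shape. Assuming the window grows linearly from W/2 back up to W between losses (a reading of the figure, with slow start ignored as stated), the time-average is the midpoint of the two extremes:

```latex
\[
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{W}}{RTT}
                       = \frac{3}{4}\cdot\frac{W}{RTT}\ \text{bytes/sec}
\]
```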
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
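Both numbers in this example can be checked directly from the formula. The helper names below are hypothetical; the only inputs are the values given on the slide (1500-byte segments, 100 ms RTT, 10 Gbps).

```python
import math


def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput in bits/sec: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert the throughput formula to get the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

`window_segments(10e9, 0.1, 1500)` is about 83,333 and `required_loss_rate(10e9, 0.1, 1500)` is about 2.1e-10, matching the slide's figures.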
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to R; equal bandwidth share line; trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
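The convergence argument can be checked numerically. In this toy model (an assumption, not part of the slides) both flows see every loss at the same time, which happens whenever their combined rate exceeds the link capacity; `two_flow_aimd` is a hypothetical name.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, steps: int) -> tuple:
    """Simulate two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: slope of 1
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged and every halving cuts it in half, so flows that start at 1 and 61 end up with nearly equal shares.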
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
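The arithmetic behind the example is a simple ratio, assuming every TCP connection on the link gets an equal share. `app_share` is a hypothetical helper that returns the new application's fraction of R.

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which the slide rounds to R/2.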
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Go-Back-N sender k-bit seq in pkt header ldquowindowrdquo of up to N consecutive unackrsquoed pkts allowed
ACK( ) ACK ll kt t i l di ldquo l ti ACK(n) ACKs all pkts up to including seq n - cumulative ACKrdquo may receive duplicate ACKs (see receiver)y p ( )
timer for oldest in-flight pkt timeout(n) retransmit packet n and all higher seq pkts in
d
Transport Layer 3-47
window
Go-Back-N receiver
ACK-only always send ACK for correctly-received pkt with highest in-order seq pkt with highest in order seq may generate duplicate ACKs need only remember expectedseqnumy p q
out-of-order pkt discard (donrsquot buffer) no receiver buffering( ) g re-ACK pkt with highest in-order seq
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
  [Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which is split into buffered data and free buffer space (rwnd); buffered data leaves to the application process.]
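The bookkeeping behind rwnd can be sketched as follows. The names `last_byte_rcvd` and `last_byte_read` are assumptions introduced for this example (the slide names only RcvBuffer and rwnd), and the sketch ignores window scaling and segment overheads.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space (rwnd), in bytes."""
    buffered = last_byte_rcvd - last_byte_read
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                       # 2096
print(sender_may_send(5000, 4000, rwnd, 1000))    # True
print(sender_may_send(5000, 4000, rwnd, 1500))    # False
```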
Connection Management
  before exchanging data, sender/receiver "handshake":
    agree to establish connection (each knowing the other willing to establish connection)
    agree on connection parameters
  [Figure: client and server each hold, above the network, connection state: ESTAB and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client.]
  client: Socket clientSocket = new Socket("hostname", "port number");
  server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
  client state: LISTEN; server state: LISTEN
  client: choose init seq num, x; send TCP SYN msg
    segment: SYNbit=1, Seq=x
    client state: SYNSENT
  server: choose init seq num, y; send TCP SYNACK msg, acking SYN
    segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
    server state: SYN RCVD
  client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
    segment: ACKbit=1, ACKnum=y+1
    client state: ESTAB
  server: received ACK(y) indicates client is live
    server state: ESTAB
TCP 3-way handshake: FSM
  [Figure: finite state machine]
  server side:
    closed -> listen, on Socket connectionSocket = welcomeSocket.accept();
    listen -> SYN rcvd, on SYN(x): send SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
    SYN rcvd -> ESTAB, on ACK(ACKnum=y+1)
  client side:
    closed -> SYN sent, on Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
    SYN sent -> ESTAB, on SYNACK(seq=y, ACKnum=x+1): send ACK(ACKnum=y+1)
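The handshake can be replayed in a few lines to check the sequence and acknowledgement arithmetic. This is a sketch only: segments are modeled as plain dictionaries, the state names are strings chosen for the example, and no real sockets are involved.

```python
def three_way_handshake(x, y):
    """Replay the segment exchange; x and y are the initial sequence numbers."""
    client, server = "CLOSED", "LISTEN"
    log = []

    # Client sends SYN and moves to SYN_SENT.
    syn = {"SYN": 1, "seq": x}
    client = "SYN_SENT"
    log.append(("client->server", syn))

    # Server answers with SYNACK (its own SYN plus an ACK of the client's SYN).
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    server = "SYN_RCVD"
    log.append(("server->client", synack))

    # Client checks that the SYNACK acknowledges its SYN, then ACKs it.
    assert synack["acknum"] == x + 1
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    client = "ESTAB"
    log.append(("client->server", ack))

    # Server checks the final ACK and completes the handshake.
    assert ack["acknum"] == y + 1
    server = "ESTAB"
    return client, server, log


client, server, log = three_way_handshake(x=42, y=79)
print(client, server)                           # ESTAB ESTAB
print(log[1][1]["acknum"], log[2][1]["acknum"])  # 43 80
```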
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
  client state: ESTAB; server state: ESTAB
  client: clientSocket.close()
    segment: FINbit=1, seq=x
    client state: FIN_WAIT_1 (can no longer send but can receive data)
  server: ACKs the FIN
    segment: ACKbit=1, ACKnum=x+1
    server state: CLOSE_WAIT (can still send data)
    client state: FIN_WAIT_2 (wait for server close)
  server: closes its side
    segment: FINbit=1, seq=y
    server state: LAST_ACK (can no longer send data)
  client: ACKs the FIN
    segment: ACKbit=1, ACKnum=y+1
    client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
    server state: CLOSED
Principles of congestion control
  congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control!
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem!
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  [Figure: Host A sends original data at rate λin into a router with unlimited shared output link buffers; throughput toward Host B is λout. Plots: λout versus λin, levelling off at R/2; delay versus λin, growing sharply near R/2.]
  maximum per-connection throughput: R/2
  large delays as arrival rate, λin, approaches capacity
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λin = λout
    transport-layer input includes retransmissions: λ'in ≥ λin
  [Figure: Host A offers λin (original data) and λ'in (original data plus retransmitted data) to a router with finite shared output link buffers; λout is delivered to Host B.]
Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
  [Figure: plot of λout versus λin, both axes marked at R/2. Host A keeps a copy of each packet and sends only when there is free buffer space in the finite shared output link buffers toward Host B.]
Causes/costs of congestion: scenario 2
  idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  [Figure: Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once free buffer space is available. Plot of λout versus λ'in, both axes marked at R/2.]
  when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2
  realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  [Figure: Host A times out while the original copy is still queued in the router and sends a second copy. Plot of λout versus λ'in, both axes marked at R/2.]
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as λin and λ'in increase?
  A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
  [Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout: throughput.]
Causes/costs of congestion: scenario 3
  [Figure: plot of λout versus λ'in, both axes marked at C/2.]
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
  two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at
TCP Congestion Control: details
  sender limits transmission:
    LastByteSent - LastByteAcked ≤ cwnd
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    rate ≈ cwnd / RTT bytes/sec
  [Figure: sender sequence number space. A window of cwnd bytes starts after the last byte ACKed and covers the bytes sent but not-yet ACKed ("in-flight"), up to the last byte sent.]
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
  [Figure: Host A / Host B timeline. Host A sends one segment, then two segments, then four segments, one RTT apart.]
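Why incrementing cwnd by one MSS per ACK doubles it every RTT can be seen by counting ACKs. The sketch below assumes no loss, one ACK per segment, and a window that is always fully used; the function name is hypothetical.

```python
def slow_start(mss, rtts):
    """cwnd (bytes) at the start of each RTT, assuming no loss."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd // mss     # one ACK per segment sent this RTT
        for _ in range(acks_this_rtt):
            cwnd += mss                 # +1 MSS for every ACK received
        history.append(cwnd)
    return history


print([c // 1000 for c in slow_start(mss=1000, rtts=4)])  # [1, 2, 4, 8, 16]
```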
TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
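The two reactions to loss can be written as one small function. This sketch follows the simplified rules on the slide (Reno halves cwnd on three duplicate ACKs) and leaves out fast-recovery details such as temporary window inflation; the function and argument names are assumptions.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event: "timeout" or "3dupacks"; variant: "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown event: {event}")
    ssthresh = cwnd / 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh            # restart from 1 MSS (slow start)
    return ssthresh, ssthresh           # Reno, 3 dup ACKs: cut cwnd in half


print(on_loss(16000, 1000, "timeout"))             # (1000, 8000.0)
print(on_loss(16000, 1000, "3dupacks"))            # (8000.0, 8000.0)
print(on_loss(16000, 1000, "3dupacks", "tahoe"))   # (1000, 8000.0)
```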
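Summary: TCP Congestion Control
  [Figure: FSM with states slow start, congestion avoidance and fast recovery. Recoverable transition labels:]
    slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
    slow start -> congestion avoidance: cwnd > ssthresh
    congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
    fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed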
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
  AIMD saw tooth behavior: probing for bandwidth
  [Figure: cwnd (TCP sender congestion window size) versus time: additively increase window size ... until loss occurs (then cut window in half).]
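The sawtooth can be reproduced with a few lines of simulation. The sketch measures cwnd in MSS and assumes, purely for illustration, that a loss occurs exactly when cwnd reaches a fixed value `w_max`; real losses depend on the network.

```python
def aimd(w_max, rtts, start=1):
    """cwnd (in MSS) per RTT; a loss is assumed whenever cwnd reaches w_max."""
    cwnd = start
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= w_max:
            cwnd = max(1, cwnd // 2)    # multiplicative decrease
        else:
            cwnd += 1                   # additive increase: +1 MSS per RTT
    return trace


trace = aimd(w_max=8, rtts=12, start=4)
print(trace)  # [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```

Over one full tooth (4, 5, 6, 7, 8) the average window is 6 MSS, which is three quarters of the window at which loss occurs.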
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  avg TCP thruput = (3/4) * W / RTT bytes/sec
  [Figure: window size sawtooth oscillating between W/2 and W.]
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
  to achieve 10 Gbps throughput, need a loss rate of L = 2*10^-10, a very small loss rate!
  new versions of TCP for high-speed
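Both numbers in the example can be checked by direct arithmetic. The sketch below inverts the throughput formula for L; the function names are hypothetical, and rates are taken in bits per second.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))         # 83333
print(f"{required_loss_rate(10e9, 0.1, 1500):.1e}")    # 2.1e-10
```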
TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
  [Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.]
Why is TCP fair?
  two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
  [Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line.]
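The convergence argument can be checked numerically. The sketch assumes both sessions see every loss at the same moment and have equal RTTs, which is an idealization; the function name and the starting rates are made up for the example.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Evolve two throughputs sharing a link of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # loss: both cut by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase: slope of 1
    return x1, x2


x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2) < 1.0)  # True
```

Additive increase leaves the gap between the two rates unchanged, while each loss halves it, so the gap shrinks toward zero and both sessions approach an equal share.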
Fairness (more)
  Fairness and UDP
    multimedia apps often do not use TCP
      do not want rate throttled by congestion control
    instead use UDP:
      send audio/video at constant rate, tolerate packet loss
  Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
  next:
    leaving the network "edge" (application, transport layers)
    into the network "core"
Go-Back-N: receiver
  ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    may generate duplicate ACKs
    need only remember expectedseqnum
  out-of-order pkt:
    discard (don't buffer): no receiver buffering!
    re-ACK pkt with highest in-order seq #
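The receiver rules fit in a short loop. The sketch assumes sequence numbers start at 0 and are unbounded, that every arriving packet is uncorrupted, and that no ACK is sent before packet 0 has arrived; these are simplifications for illustration.

```python
def gbn_receiver(arrivals):
    """Process arriving seq numbers; return (delivered, acks_sent)."""
    expected = 0                        # expectedseqnum: the only state kept
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
        # Out-of-order packets are discarded, not buffered.
        if expected > 0:
            acks.append(expected - 1)   # (re-)ACK highest in-order seq #
    return delivered, acks


print(gbn_receiver([0, 1, 3, 4, 2]))  # ([0, 1, 2], [0, 1, 1, 1, 2])
```

Packets 3 and 4 arrive while 2 is missing, so each is dropped and answered with a duplicate ACK for packet 1.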
Selective repeat
  receiver individually acknowledges all correctly received pkts
    buffers pkts, as needed, for eventual in-order delivery to upper layer
  sender only resends pkts for which ACK not received
    sender timer for each unACKed pkt
  sender window
    N consecutive seq #'s
    limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
  [Figure: sender and receiver windows over the sequence number space.]
Selective repeat
  sender
    data from above:
      if next available seq # in window, send pkt
    timeout(n):
      resend pkt n, restart timer
    ACK(n) in [sendbase, sendbase+N]:
      mark pkt n as received
      if n smallest unACKed pkt, advance window base to next unACKed seq #
  receiver
    pkt n in [rcvbase, rcvbase+N-1]:
      send ACK(n)
      out-of-order: buffer
      in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
    pkt n in [rcvbase-N, rcvbase-1]:
      ACK(n)
    otherwise:
      ignore
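A selective-repeat receiver can be sketched with a set used as the out-of-order buffer. The example assumes an unbounded sequence-number space (no wrap-around) and uncorrupted packets; re-ACKing packets just below the window follows the standard selective-repeat rule.

```python
def sr_receiver(arrivals, window):
    """Return (delivered, acks) for a selective-repeat receiver."""
    rcv_base = 0
    buffered = set()
    delivered, acks = [], []
    for n in arrivals:
        if rcv_base <= n <= rcv_base + window - 1:
            acks.append(n)              # individually ACK packet n
            buffered.add(n)
            # Deliver any in-order run and advance the window.
            while rcv_base in buffered:
                buffered.remove(rcv_base)
                delivered.append(rcv_base)
                rcv_base += 1
        elif rcv_base - window <= n <= rcv_base - 1:
            acks.append(n)              # already delivered: ACK again
        # otherwise: ignore
    return delivered, acks


print(sr_receiver([0, 2, 3, 1, 0], window=4))
# ([0, 1, 2, 3], [0, 2, 3, 1, 0])
```

Packets 2 and 3 are buffered until 1 arrives, then all three are delivered in order; the late duplicate of packet 0 is ACKed again but not redelivered.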
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
TCP segment structure
  [Figure: segment layout, 32 bits wide]
    source port # | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | receive window
    checksum | Urg data pointer
    options (variable length)
    application data (variable length)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
TCP seq. numbers, ACKs
  sequence numbers:
    byte stream "number" of first byte in segment's data
  acknowledgements:
    seq # of next byte expected from other side
    cumulative ACK
  Q: how receiver handles out-of-order segments
    A: TCP spec doesn't say, up to implementor
  [Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable.]
TCP seq. numbers, ACKs
  [Figure: simple telnet scenario between Host A and Host B]
    User types 'C'
      A -> B: Seq=42, ACK=79, data = 'C'
    host ACKs receipt of 'C', echoes back 'C'
      B -> A: Seq=79, ACK=43, data = 'C'
    host ACKs receipt of echoed 'C'
      A -> B: Seq=43, ACK=80
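The numbers in the telnet scenario follow from two rules: a reply's sequence number is the ACK number it received, and its ACK number is the received sequence number plus the number of data bytes received. A small sketch, with segments modeled as dictionaries and helper names chosen for the example:

```python
def make_segment(seq, ack, data=b""):
    return {"seq": seq, "ack": ack, "data": data}


def reply(received, data=b""):
    """Build the reply to a received segment.

    seq: the next byte the peer expects from us (the peer's ACK number).
    ack: the next byte we expect (received seq plus received payload length).
    """
    return make_segment(
        seq=received["ack"],
        ack=received["seq"] + len(received["data"]),
        data=data,
    )


a1 = make_segment(seq=42, ack=79, data=b"C")   # user types 'C'
b1 = reply(a1, data=b"C")                      # B ACKs 'C' and echoes it
a2 = reply(b1)                                 # A ACKs the echoed 'C'
print(b1["seq"], b1["ack"], a2["seq"], a2["ack"])  # 79 43 43 80
```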
TCP round trip time, timeout
  Q: how to set TCP timeout value?
    longer than RTT
      but RTT varies
    too short: premature timeout, unnecessary retransmissions
    too long: slow reaction to segment loss
  Q: how to estimate RTT?
    SampleRTT: measured time from segment transmission until ACK receipt
      ignore retransmissions
    SampleRTT will vary, want estimated RTT "smoother"
      average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
  EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
  [Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds), from 100 to 350, versus time (seconds); curves: SampleRTT and EstimatedRTT.]
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
    (typically, β = 0.25)
  TimeoutInterval = EstimatedRTT + 4 * DevRTT
    (estimated RTT plus "safety margin")
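One update step of the estimator can be written directly from the formulas. The slide does not fix the order of the two updates; this sketch assumes DevRTT is updated with the EstimatedRTT from before the new sample, and the function name is hypothetical.

```python
def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """One update step; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # Assumption of this sketch: DevRTT uses the pre-update EstimatedRTT.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


est, dev, timeout = update_rtt(estimated_rtt=100.0, dev_rtt=10.0, sample_rtt=140.0)
print(est, dev, timeout)  # 105.0 17.5 175.0
```

A single sample 40 ms above the estimate moves EstimatedRTT by only 5 ms but widens the safety margin noticeably, which is the intended effect.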
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control
TCP sender events:
  data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running
      think of timer as for oldest unacked segment
      expiration interval: TimeOutInterval
  timeout:
    retransmit segment that caused timeout
    restart timer
  ack rcvd:
    if ack acknowledges previously unacked segments
      update what is known to be ACKed
      start timer if there are still unacked segments
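The three sender events map onto three handlers. This is a sketch of the simplified sender only: the timer is a boolean flag rather than a real clock, sending is recorded in a list instead of being passed to IP, and the class and attribute names are assumptions.

```python
class SimpleTcpSender:
    """Event handlers only; no duplicate ACKs, flow or congestion control."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq # -> data of unacked segments
        self.timer_running = False
        self.sent = []               # (seq, data) handed to IP, for inspection

    def on_data(self, data):
        seq = self.next_seq          # seq # = number of first data byte
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)      # oldest unacked segment
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True    # restart timer

    def on_ack(self, acknum):
        if acknum > self.send_base:  # acknowledges previously unacked data
            self.send_base = acknum
            for seq in [s for s in self.unacked if s < acknum]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq=92)
sender.on_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_timeout()          # resend oldest unacked segment, Seq=92
sender.on_ack(120)           # cumulative ACK covers both segments
print([seq for seq, _ in sender.sent], sender.timer_running)
# [92, 100, 92] False
```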
TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
      start timer
TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    TCP receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment, higher-than-expected seq #; gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
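The four rows can be read as a decision function. The sketch below returns a description of the action instead of sending anything; the argument names are hypothetical, and the final branch (a segment below the expected seq #) is not covered by the rules above and is an assumption of this example.

```python
def ack_action(seq, expected, ack_pending, fills_gap_at_lower_end=False):
    """Pick the receiver action for an arriving segment.

    seq: first byte of the arriving segment; expected: next byte expected.
    ack_pending: an earlier in-order segment is still waiting for its ACK.
    fills_gap_at_lower_end: out-of-order data is already buffered above
    `expected`, and this segment starts at the lower end of that gap.
    """
    if seq > expected:
        return "send duplicate ACK now"     # gap detected
    if seq == expected:
        if fills_gap_at_lower_end:
            return "send ACK now"
        if ack_pending:
            return "send cumulative ACK now"
        return "delay ACK up to 500 ms"
    return "send duplicate ACK now"         # assumption: old data is re-ACKed


print(ack_action(seq=100, expected=100, ack_pending=False))  # delay ACK up to 500 ms
print(ack_action(seq=120, expected=100, ack_pending=False))  # send duplicate ACK now
```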
TCP Futures TCP over ldquolong fat pipesrdquoTCP Futures TCP over long fat pipes
l 1500 b 100 RTT example 1500 byte segments 100ms RTT want 10 Gbps throughput
i W = 83 333 i fli ht m t requires W = 83333 in-flight segments throughput in terms of segment loss probability L
[Mathis 1997][Mathis 1997]
TCP throughput = 122 MSSRTT L
to achieve 10 Gbps throughput need a loss rate of L = 210-10 a very small loss rate
L
= 210-10 ndash a very small loss rate new versions of TCP for high-speed
Transport Layer 3-101
TCP Fairnessfairness goal if K TCP sessions share same
TCP Fairnessfairness goal if K TCP sessions share same
bottleneck link of bandwidth R each should have average rate of RK
TCP connection 1
bottleneckrouter
it Rcapacity RTCP connection 2
Transport Layer 3-102
Why is TCP fairWhy is TCP fairtwo competing sessionstwo competing sessions additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput proportionally p g p p p y
R equal bandwidth share
loss decrease window by factor of 2
congestion avoidance additive increaseloss decrease window by factor of 2
congestion avoidance additive increase
RC ti 1 th h t
Transport Layer 3-103
RConnection 1 throughput
Fairness (more)( )Fairness and UDP Fairness parallel TCP
i multimedia apps often do not use TCP
d
connections application can open
l i l ll l do not want rate throttled by congestion control
multiple parallel connections between two hosts
instead use UDP send audiovideo at
hosts web browsers do this
li k f R i h 9 constant rate tolerate packet loss
eg link of rate R with 9 existing connections new app asks for 1 TCP gets rate new app asks for 1 TCP gets rate
R10 new app asks for 11 TCPs gets R2
Transport Layer 3-104
Chapter 3 summaryChapter 3 summary principles behind p p
transport layer servicesmultiplexing
nextp g
demultiplexing reliable data transfer
next leaving the
network ldquoedgerdquo flow control congestion control
Selective repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #'s
  limits seq #s of sent, unACKed pkts
Transport Layer 3-50
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
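The receiver rules above can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the class name, the `delivered` list, and the choice to ignore sequence-number wraparound are all assumptions.

```python
class SelectiveRepeatReceiver:
    """Receiver side of selective repeat with window size N (illustrative sketch)."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0       # smallest not-yet-received seq #
        self.buffer = {}       # out-of-order pkts, keyed by seq #
        self.delivered = []    # data handed to the upper layer, in order

    def on_packet(self, n, data):
        """Handle pkt n; return the seq # to ACK, or None to ignore the pkt."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer[n] = data
            # deliver the in-order run that starts at rcvbase, if any
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n < self.rcvbase:
            return n           # already delivered: ACK again so the sender can advance
        return None            # assumption: anything else is ignored
```

Feeding pkt 1 before pkt 0 leaves pkt 1 buffered; once pkt 0 arrives, both are delivered in order and the window base moves to 2.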
TCP: Overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom:]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
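As an illustration of the layout above, the fixed 20-byte part of a TCP header can be unpacked with Python's `struct` module. The function name and the returned dictionary keys are assumptions made for this sketch; options and data are not parsed.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "head len / not used / flags" field.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment):
    """Unpack the first 20 bytes of a TCP segment (network byte order)."""
    (src_port, dst_port, seq, ack_num,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack_num": ack_num,
        "header_len": (offset_flags >> 12) * 4,   # head len is counted in 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if offset_flags & bit},
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }
```

The checksum is only extracted here, not verified.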
Transport Layer 3-57
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
Transport Layer 3-58
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  host A ACKs receipt of echoed 'C': Seq=43, ACK=80
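The numbers in the telnet scenario follow one rule: the reply's Seq is the received ACK, and the reply's ACK is the received Seq plus the number of data bytes received. A small sketch (the function name is an assumption):

```python
def reply_numbers(seq, ack, data_len):
    """Given a received segment's Seq, ACK and data length in bytes,
    return the (Seq, ACK) pair carried by the reply segment."""
    return ack, seq + data_len

# Host B replying to A's Seq=42, ACK=79, one byte of data:
b_reply = reply_numbers(42, 79, 1)
# Host A replying to B's echo, which also carries one byte:
a_reply = reply_numbers(b_reply[0], b_reply[1], 1)
```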
Transport Layer 3-59
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
Transport Layer 3-60
TCP round trip time, timeout
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
[Figure: "RTT: gaia.cs.umass.edu to fantasia.eurecom.fr"; y-axis: RTT (milliseconds), 100 to 350; x-axis: time (seconds); curves: SampleRTT, EstimatedRTT]
Transport Layer 3-61
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
  (typically, β = 0.25)
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
  EstimatedRTT is the estimated RTT; 4·DevRTT is the "safety margin"
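The three formulas can be combined into a small estimator. This is a sketch under stated assumptions: the slides do not say how the state is initialised or in which order the two averages are updated, so the first sample seeds EstimatedRTT, DevRTT starts at half of it, and DevRTT is updated before EstimatedRTT. The class and attribute names are illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout (sketch; initialisation is an assumption)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: not given on the slides

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

With a first sample of 100 ms followed by a 200 ms sample, the estimate moves only to 112.5 ms while the safety margin grows, which is the smoothing the slides describe.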
Transport Layer 3-62
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
Transport Layer 3-64
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
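The three sender events can be sketched as methods on one object. This is an illustration only: the class name, the `sent` log standing in for "pass segment to IP", and the boolean standing in for a real timer are assumptions, and duplicate ACKs, flow control and congestion control are ignored as in the simplified sender.

```python
class SimplifiedTcpSender:
    """Simplified TCP sender: one timer, cumulative ACKs (illustrative sketch)."""

    def __init__(self, initial_seq=0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq   # seq # of oldest unACKed byte
        self.unacked = {}              # seq # -> data of segments not yet ACKed
        self.timer_running = False     # stands in for a real retransmission timer
        self.sent = []                 # log of (seq #, data) passed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)        # oldest unACKed segment caused the timeout
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True      # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:   # ACK covers previously unACKed data
            self.send_base = ack_num
            covered = [s for s, d in self.unacked.items() if s + len(d) <= ack_num]
            for s in covered:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```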
Transport Layer 3-65
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
TCP fast retransmit
[Figure: timeline between Host A and Host B. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B returns ACK=100 repeatedly. Host A resends Seq=100, 20 bytes of data, before the timeout expires]
fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71
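The trigger can be sketched as a duplicate-ACK counter. One assumption is made explicit here: the count covers duplicates that arrive after the first ACK for a given number, so the fourth ACK=100 in a row fires the retransmission. The function name and the `threshold` parameter are illustrative.

```python
def fast_retransmit_points(acks, threshold=3):
    """Scan a stream of ACK numbers and return the seq #s to fast-retransmit.

    A retransmission is triggered once, when the duplicate count for an
    ACK number reaches the threshold.
    """
    retransmit = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmit.append(ack)   # resend segment starting at this seq #
        else:
            last_ack = ack
            dup_count = 0
    return retransmit
```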
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process, OS, TCP socket receiver buffers, TCP code, IP code; segments arrive from sender]
Transport Layer 3-73
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to application process]
Transport Layer 3-74
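A minimal sketch of both sides of the rule. The names `last_byte_rcvd` and `last_byte_read` are assumptions introduced for illustration (the slide only names rwnd and RcvBuffer); they stand for the last byte placed in the buffer and the last byte the application has removed.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space that the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # data waiting for the application
    return rcv_buffer - buffered

def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps unacked (in-flight) data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

With a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises 2096 bytes; a sender with 1000 bytes in flight may send another 1000 but not another 1200.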
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with an application layer above the network, each holding:]
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
client: Socket clientSocket = new Socket(hostname, portNumber);
server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
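The field values of the three segments follow directly from x and y. A small sketch, with dictionaries standing in for segments (the function name and the keys are assumptions); the increments are taken modulo 2^32 because TCP sequence numbers are 32-bit.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit

def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq #s x (client), y (server)."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]
```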
Transport Layer 3-77
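TCP 3-way handshake: FSM
[Figure: FSM with states closed, listen, SYN sent, SYN rcvd, ESTAB. Transition labels:]
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  closed -> SYN sent: Socket clientSocket = new Socket(hostname, portNumber); send SYN(seq=x)
  listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN sent -> ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)
  SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)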
Transport Layer 3-78
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
client state and server state both start in ESTAB
client: clientSocket.close(); sends FINbit=1, seq=x
  client enters FIN_WAIT_1: can no longer send but can receive data
server: sends ACKbit=1; ACKnum=x+1
  server enters CLOSE_WAIT: can still send data
  client enters FIN_WAIT_2: wait for server close
server: sends FINbit=1, seq=y
  server enters LAST_ACK: can no longer send data
client: sends ACKbit=1; ACKnum=y+1
  client enters TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED
  server enters CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, λin, approaches capacity
[Figure: Host A and Host B send original data (λin) through a router with unlimited shared output link buffers; throughput: λout. Plots: λout versus λin and delay versus λin, with λin up to R/2]
Transport Layer 3-83
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λin = λout
  transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) into finite shared output link buffers; λout is delivered to Host B]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: plot of λout versus λin, both up to R/2. Host A keeps a copy and sends while free buffer space is available in the finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout to Host B]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy; no buffer space at the router; λin: original data; λ'in: original data, plus retransmitted data; λout to Host B]
Transport Layer 3-86
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: plot of λout versus λin, both up to R/2. Host A with free buffer space at the router; λin: original data; λ'in: original data, plus retransmitted data; λout to Host B]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
[Figure: plot of λout versus λin, both up to R/2. Host A: copy, timeout; free buffer space at the router; λin, λ'in, λout; Host B]
Transport Layer 3-88
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[Figure: plot of λout versus λin, both up to R/2]
Transport Layer 3-89
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C, D; λin: original data; λ'in: original data, plus retransmitted data; λout; finite shared output link buffers]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
[Figure: plot of λout versus λ'in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Transport Layer 3-92
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; the in-flight span is bounded by cwnd]
Transport Layer 3-94
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, one RTT marked; vertical axis: time]
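Incrementing cwnd by one MSS per ACK is what produces the doubling per RTT. A small sketch, assuming no loss, one ACK per segment, and cwnd counted in whole segments (the function name and the `history` list are illustrative):

```python
def slow_start(rounds, mss=1):
    """Return cwnd at the start of each RTT, beginning with cwnd = 1 MSS."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss          # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # increment cwnd for every ACK received
        history.append(cwnd)
    return history
```

Four RTTs take the window from 1 to 16 segments.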
Transport Layer 3-95
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
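The two reactions differ only in the new cwnd; ssthresh is halved in both. The sketch below follows the slide's simplified description ("cwnd is cut in half") and does not model fast recovery. The function name, the event strings and the `variant` parameter are assumptions.

```python
def on_loss(cwnd, event, variant="reno", mss=1):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + repr(event))
    ssthresh = cwnd / 2                  # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh             # restart from 1 MSS, slow start to ssthresh
    return cwnd / 2, ssthresh            # Reno on 3 dup ACKs: halve, then grow linearly
```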
Transport Layer 3-97
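Summary: TCP Congestion Control
[Figure: FSM of TCP congestion control. Transition labels recoverable from the transcript:]
  slow start, new ACK: cwnd = cwnd+MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed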
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half]
Transport Layer 3-99
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W]
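The 3/4 factor is the mean of a window that ramps linearly from W/2 to W. The sketch below checks that numerically and applies the formula; the function names and the sampling approach are assumptions for illustration.

```python
def average_window(W, steps=1000):
    """Mean of a window that grows linearly from W/2 to W (one sawtooth)."""
    samples = [W / 2 + (W / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)

def avg_throughput(W, rtt):
    """Average TCP throughput in bytes/sec for loss window W (bytes) and RTT (sec)."""
    return 0.75 * W / rtt
```

For W = 1000 bytes the average window comes out at 750 bytes, so with a 100 ms RTT the average throughput is about 7500 bytes/sec.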
Transport Layer 3-100
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22\cdot\mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate!
new versions of TCP for high-speed
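The numbers in the example can be reproduced from the formula. In this sketch MSS is given in bytes and converted to bits so the result is in bits/sec; the function names are assumptions.

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss):
    """TCP throughput in bits/sec: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))

def required_loss(mss_bytes, rtt_s, target_bps):
    """Loss probability L at which the formula yields target_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

def window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)
```

With 1500-byte segments, a 100 ms RTT and a 10 Gbps target, `required_loss` gives roughly 2.1·10^-10 and `window_segments` gives roughly 83,333, in line with the slide.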
Transport Layer 3-101
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with Connection 1 throughput on the horizontal axis, up to R, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Transport Layer 3-103
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
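The example assumes every connection gets an equal share of the link, so an application's share is its connection count divided by the total. A short sketch (the function name is an assumption):

```python
from fractions import Fraction

def app_share(existing, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections
    parallel TCP connections alongside `existing` other connections,
    assuming every connection gets an equal share."""
    return Fraction(new_connections, existing + new_connections)
```

With 9 existing connections, one new connection gets exactly R/10, while 11 new connections get 11/20 of R, which the slide rounds to R/2.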
Transport Layer 3-104
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
Selective repeatSelective repeat
receiver individually acknowledges all correctly receiver individually acknowledges all correctly received pkts buffers pkts as needed for eventual in-order delivery p y
to upper layer sender only resends pkts for which ACK not
dreceived sender timer for each unACKed pkt
d i d sender window N consecutive seq rsquos limits seq s of sent unACKed pkts limits seq s of sent unACKed pkts
Transport Layer 3-50
Selective repeat sender receiver windowsp
Transport Layer 3-51
Selective repeatSelective repeat
d f bsender
kt i receiver
data from above if next available seq in
window send pkt
pkt n in [rcvbase rcvbase+N-1]
send ACK(n) out of order bufferwindow send pkt
timeout(n) resend pkt n restart
out-of-order buffer in-order deliver (also
deliver buffered in-order resend pkt n restart timer
ACK(n) in [sendbasesendbase+N]
pkts) advance window to next not-yet-received pkt
pkt n in [ b N b 1] mark pkt n as received if n smallest unACKed
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)
[FSM figure, only the first transition recovered]
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
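The three sender events (data from the application, timeout, ACK received) can be sketched as a small Python class. This is a minimal model under stated assumptions, not a real TCP: the timer is a boolean flag rather than a clock, `wire` is a hypothetical list standing in for "pass segment to IP", and all names are illustrative.

```python
class SimpleTcpSender:
    """Simplified sender: one timer, cumulative ACKs, no flow or congestion control."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq     # oldest unacknowledged byte
        self.next_seq_num = initial_seq  # seq # for the next new byte
        self.unacked = {}                # seq # -> payload awaiting an ACK
        self.timer_running = False
        self.wire = []                   # (seq, payload) pairs handed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True    # timer covers the oldest unacked segment

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)          # segment that caused the timeout
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True        # restart timer

    def on_ack(self, ack_num):
        if ack_num <= self.send_base:
            return                       # duplicate ACKs are ignored in this model
        self.send_base = ack_num
        for seq in [s for s, d in self.unacked.items() if s + len(d) <= ack_num]:
            del self.unacked[seq]
        # keep the timer only while something is still unacknowledged
        self.timer_running = bool(self.unacked)
```

For example, sending 8 bytes at seq 92 and then 20 bytes leaves `next_seq_num` at 120, and an ACK of 100 clears only the first segment.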
[Table: arriving segment at receiver -> receiver action]
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment, higher-than-expect seq. #; gap detected
  -> immediately send duplicate ACK, indicating seq. # of next expected byte
arrival of segment that partially or completely fills gap
  -> immediate send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
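The four rows of the table map onto a short decision function. The sketch below is illustrative: the function name, parameter names and returned strings are made up for this example, and the branch for a segment below the expected sequence number is an assumption, since the table does not cover that case.

```python
def ack_action(seg_seq, expected_seq, ack_pending=False, gap_buffered=False):
    """Pick the receiver's ACK behaviour for one arriving segment.

    ack_pending:  an earlier in-order segment is still waiting for its delayed ACK
    gap_buffered: out-of-order data is buffered above expected_seq (a gap exists)
    """
    if seg_seq > expected_seq:
        # Out-of-order arrival: a gap is detected.
        return "send duplicate ACK immediately"
    if seg_seq < expected_seq:
        # Not in the table. Assumption: already-received data is simply re-ACKed.
        return "send duplicate ACK immediately"
    if gap_buffered:
        # Segment starts at the lower end of the gap.
        return "send ACK immediately"
    if ack_pending:
        return "send cumulative ACK immediately"
    return "delay ACK up to 500 ms"
```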
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
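TCP fast retransmit
[Figure: message sequence between Host A and Host B]
Host A -> Host B: Seq=92, 8 bytes of data
Host A -> Host B: Seq=100, 20 bytes of data (lost, marked X)
Host B -> Host A: ACK=100, four times, while Host A's timeout is still running
Host A -> Host B: Seq=100, 20 bytes of data (retransmitted before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71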
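The duplicate-ACK rule amounts to a small counter on the sender side. A minimal sketch, with illustrative class and method names; it only decides when to retransmit and leaves the actual resend to the caller.

```python
class DupAckCounter:
    """Counts duplicate ACKs and signals a fast retransmit on the third one."""

    def __init__(self, send_base):
        self.send_base = send_base  # oldest unacknowledged byte
        self.dup_acks = 0

    def on_ack(self, ack_num):
        """Return the seq # to retransmit now, or None."""
        if ack_num > self.send_base:
            # New data acknowledged: slide the base and reset the counter.
            self.send_base = ack_num
            self.dup_acks = 0
            return None
        if ack_num == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                return self.send_base  # unacked segment with smallest seq #
        return None


# Replay of the figure: the first ACK=100 is new, the next three are duplicates.
counter = DupAckCounter(send_base=92)
decisions = [counter.on_ack(100) for _ in range(4)]
```

Here `decisions` ends with 100: the segment starting at byte 100 is resent on the third duplicate, without waiting for the timer.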
TCP flow control
application may remove data from TCP socket buffers ...
... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments arrive from sender, pass through IP code and TCP code (in the OS) into the TCP socket receiver buffers, and are read by the application process]
Transport Layer 3-73
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to the application process]
Transport Layer 3-74
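Both sides of the rwnd rule fit in two small functions. The receiver-side bookkeeping variables (`last_byte_rcvd`, `last_byte_read`) are not on the slides; they are assumed here, following the usual textbook presentation, and the function names are illustrative.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if nbytes more can go out without exceeding rwnd in flight."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


# With the typical 4096-byte buffer and 2000 bytes still unread:
rwnd = advertised_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
```

In this example `rwnd` is 2096, so a sender with 1000 bytes in flight may send another 1000 bytes but not another 1500.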
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server hosts, each with application above network, each holding:
  connection state: ESTAB
  connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
[Figure: message sequence, client state | server state]
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live
Transport Layer 3-77
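The exchange can be traced with a few lines of Python. This is a sketch of the message contents only, with no sockets or timers; `Segment` and `three_way_handshake` are illustrative names. Setting the final ACK's sequence number to x+1 reflects the fact that a SYN consumes one sequence number, a detail the figure leaves out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(x, y):
    """Return the three segments and the final (client, server) states."""
    client, server = "LISTEN", "LISTEN"  # starting states as drawn on the slide

    syn = Segment(syn=True, seq=x)                       # client picks x
    client = "SYNSENT"

    synack = Segment(syn=True, ack=True, seq=y,          # server picks y
                     ack_num=syn.seq + 1)
    server = "SYN RCVD"

    if synack.ack_num != x + 1:
        raise ValueError("SYNACK does not acknowledge the SYN")
    final_ack = Segment(ack=True, seq=x + 1,             # SYN used up seq x
                        ack_num=synack.seq + 1)
    client = "ESTAB"                                     # server is live

    if final_ack.ack_num == y + 1:
        server = "ESTAB"                                 # client is live

    return [syn, synack, final_ack], (client, server)
```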
TCP 3-way handshake: FSM
[FSM figure; transitions listed as event / action]
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: SYN(x) / SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
  SYN rcvd -> ESTAB: ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); / SYN(seq=x)
  SYN sent -> ESTAB: SYNACK(seq=y,ACKnum=x+1) / ACK(ACKnum=y+1)
Transport Layer 3-78
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
[Figure: message sequence, client state | server state]
client and server both start in ESTAB
client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: replies ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: replies ACKbit=1; ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: enters CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through a router with unlimited shared output link buffers; throughput is $\lambda_{out}$]
[Plots: $\lambda_{out}$ against $\lambda_{in}$, levelling off at R/2; delay against $\lambda_{in}$, growing as $\lambda_{in}$ nears R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Transport Layer 3-83
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A, Host B, finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Transport Layer 3-84
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Plot: $\lambda_{out}$ against $\lambda_{in}$, both axes up to R/2]
[Figure: Host A keeps a copy of each packet and sends into free buffer space; Host B; finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Transport Layer 3-85
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; the router has no buffer space!; Host B. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Transport Layer 3-86
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Plot: $\lambda_{out}$ against $\lambda'_{in}$, both axes up to R/2]
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Figure: Host A sends into free buffer space; Host B. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Plot: $\lambda_{out}$ against $\lambda'_{in}$, both axes up to R/2]
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
[Figure: Host A holds a copy and times out while the router still has free buffer space; Host B. $\lambda_{in}$, $\lambda'_{in}$, $\lambda_{out}$]
Transport Layer 3-88
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Plot: $\lambda_{out}$ against $\lambda'_{in}$, both axes up to R/2]
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Host A, Host B, Host C, Host D connected through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
[Plot: $\lambda_{out}$ against $\lambda'_{in}$, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Transport Layer 3-92
TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space. last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; the in-flight span is bounded by cwnd]
Transport Layer 3-94
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends to Host B; the number of segments sent doubles each RTT over time]
Transport Layer 3-95
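Incrementing cwnd by one MSS per ACK is what produces the doubling per RTT. A minimal sketch, counting cwnd in units of MSS and assuming no loss and one ACK per segment (no delayed ACKs); the function name is illustrative.

```python
def slow_start(rtts):
    """cwnd (in MSS) at the start of each RTT during loss-free slow start."""
    cwnd = 1                 # initially cwnd = 1 MSS
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd          # assumption: every segment sent this RTT is ACKed
        cwnd += acks         # +1 MSS per ACK, so cwnd doubles each RTT
    return history


growth = slow_start(5)       # [1, 2, 4, 8, 16]
```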
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
Transport Layer 3-97
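Summary: TCP Congestion Control
[FSM figure, only partly recovered; transitions listed as event: actions]
slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
slow start -> congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK: $\text{cwnd} = \text{cwnd} + \text{MSS} \cdot (\text{MSS}/\text{cwnd})$; dupACKcount = 0; transmit new segment(s), as allowed
fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed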
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. x-axis: time; y-axis: cwnd, TCP sender congestion window size. Additively increase window size ... until loss occurs (then cut window in half)]
Transport Layer 3-99
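The saw tooth can be reproduced with a toy simulation. The loss model is an assumption made only for this sketch: a loss is declared whenever cwnd exceeds a fixed `capacity`, both measured in MSS. Real losses depend on router queues and competing traffic.

```python
def aimd(rtts, capacity, cwnd=1):
    """cwnd (in MSS) per RTT under additive increase, multiplicative decrease."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd > capacity:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease after loss
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return history


trace = aimd(12, capacity=8, cwnd=6)
# [6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4, 5]
```

The window climbs by one segment per RTT, overshoots the assumed capacity, halves, and starts climbing again.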
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W]
Transport Layer 3-100
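The factor 3/4 follows from the shape of the saw tooth. A short derivation, assuming the window grows linearly from W/2 back to W between losses; $\bar{w}$ is shorthand introduced here for the average window.

```latex
\bar{w} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\bar{w}}{RTT} = \frac{3}{4}\cdot\frac{W}{RTT}\ \text{bytes/sec}
```

A linear ramp averages to the midpoint of its end values, which is why only W/2 and W enter the result.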
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
-> to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
Transport Layer 3-101
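Both numbers on the slide can be checked directly. The sketch below uses illustrative function names; it converts MSS to bits and solves the throughput formula for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)       # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)  # about 2.1e-10
```

The loss rate works out to roughly one lost segment in five billion, which is the motivation for high-speed TCP variants.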
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: the two connections' throughputs plotted against each other, each axis running to R (horizontal axis: Connection 1 throughput), with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line]
Transport Layer 3-103
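The convergence argument can be checked numerically. The sketch assumes both sessions have the same RTT, see every loss at the same time, and lose whenever their combined rate exceeds the link capacity; these are simplifications made for illustration, and the function name is made up.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Rates of two synchronized AIMD flows after the given number of rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: gap unchanged
    return x1, x2


a, b = two_flow_aimd(1.0, 80.0, capacity=100.0, rounds=500)
gap = abs(a - b)                      # starts at 79, shrinks toward 0
```

Additive increase never changes the difference between the two rates, while each halving cuts it in two, so the rates drift toward equal shares.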
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
Transport Layer 3-104
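The arithmetic behind the example, assuming the link is split equally per connection; the function name is illustrative.

```python
from fractions import Fraction


def app_share(existing, opened):
    """Fraction of link rate R won by an app opening `opened` connections."""
    return Fraction(opened, existing + opened)


one = app_share(9, 1)      # Fraction(1, 10)
eleven = app_share(9, 11)  # Fraction(11, 20), a little more than half
```

Per-connection fairness therefore rewards the application that opens more connections.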
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
Selective repeat: sender, receiver windows
Transport Layer 3-51
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed
receiver
  pkt n in [rcvbase, rcvbase+N-1]
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
Transport Layer 3-56
TCP segment structure
header layout, 32 bits per row:
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  receive window: # bytes rcvr willing to accept
Transport Layer 3-57
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N. Regions: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
Transport Layer 3-58
TCP seq. numbers, ACKs
[Figure: simple telnet scenario between Host A and Host B]
User types 'C'
  Host A -> Host B: Seq=42, ACK=79, data = 'C'
host ACKs receipt of 'C', echoes back 'C'
  Host B -> Host A: Seq=79, ACK=43, data = 'C'
host ACKs receipt of echoed 'C'
  Host A -> Host B: Seq=43, ACK=80
Transport Layer 3-59
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
Transport Layer 3-60
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost (dropped at router) due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of a pkt
[Figure: Host A keeps a copy and a timeout per packet; the router has free buffer space, yet the premature timeout puts a second copy on the link toward Host B. Plot: λ_out versus λ_in, both axes up to R/2.]
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C and D send over multihop paths through routers with finite shared output link buffers; λ_in is original data, λ'_in is original data plus retransmitted data, λ_out is the delivered throughput. Plot: λ_out versus λ'_in, both axes up to C/2.]
another "cost" of congestion:
  when a packet is dropped, any upstream transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP congestion control: details
  sender limits transmission:
      LastByteSent - LastByteAcked ≤ cwnd
  cwnd is a dynamic function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
      rate ≈ cwnd / RTT bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight", spanning cwnd), and last byte sent.]
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline over successive RTTs.]
TCP: switching from slow start to CA (congestion avoidance)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
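The loss reactions above can be traced with a small per-RTT model. This is a rough sketch, not a real TCP stack: the function name `simulate_cwnd`, the event names `"ack"`, `"timeout"` and `"3dup"`, the default `ssthresh` of 8, and the choice to cap slow-start doubling at `ssthresh` are all assumptions made for illustration. Windows are counted in whole MSS units.

```python
def simulate_cwnd(events, ssthresh=8, variant="reno"):
    """Return cwnd (in MSS units) after each per-RTT event.

    events: iterable of "ack" (a full window was ACKed this RTT),
            "timeout", or "3dup" (three duplicate ACKs).
    variant: "reno" or "tahoe" (hypothetical labels for this sketch).
    """
    if variant not in ("reno", "tahoe"):
        raise ValueError("variant must be 'reno' or 'tahoe'")
    cwnd = 1  # slow start begins at 1 MSS
    trace = []
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                # slow start: double per RTT (capped at ssthresh, a simplification)
                cwnd = min(2 * cwnd, ssthresh)
            else:
                # congestion avoidance: add 1 MSS per RTT
                cwnd += 1
        elif event in ("timeout", "3dup"):
            # ssthresh becomes half of cwnd just before the loss event
            ssthresh = max(cwnd // 2, 1)
            if event == "3dup" and variant == "reno":
                cwnd = ssthresh  # Reno: cut window in half
            else:
                cwnd = 1         # timeout, or Tahoe on any loss
        else:
            raise ValueError(f"unknown event: {event!r}")
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    events = ["ack"] * 4 + ["3dup", "ack"]
    print("reno :", simulate_cwnd(events, variant="reno"))
    print("tahoe:", simulate_cwnd(events, variant="tahoe"))
```

Running both variants on the same event list shows the only difference the slide describes: after three duplicate ACKs Reno resumes from half the window, while Tahoe restarts from 1 MSS.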
Summary: TCP Congestion Control
[Figure: TCP congestion control FSM. Recoverable transition labels:
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  cwnd > ssthresh]
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD sawtooth behavior, probing for bandwidth. Axes: cwnd (TCP sender congestion window size) versus time. Window size is additively increased until loss occurs, then cut in half.]
TCP throughput
  avg. TCP throughput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. throughput is 3/4 W per RTT
      avg TCP throughput = (3/4) · W / RTT bytes/sec
[Figure: sawtooth window oscillating between W/2 and W.]
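The 3/4 W figure follows from averaging a window that ramps linearly from W/2 up to W. A quick numerical check, as a sketch: the function names and the sample count are illustrative choices, and the model assumes an idealized sawtooth with no slow start.

```python
def average_sawtooth_window(w, steps=1000):
    """Average of a window that grows linearly from w/2 to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_seconds):
    """Slide formula: avg throughput = (3/4) * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_seconds


if __name__ == "__main__":
    print(average_sawtooth_window(100.0))   # close to 75, i.e. 3/4 of W
    print(average_throughput(100_000, 0.1)) # bytes/sec for W = 100 kB, RTT = 100 ms
```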
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
      TCP throughput = (1.22 · MSS) / (RTT · √L)
  to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10, a very small loss rate!
  new versions of TCP for high-speed
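The numbers in this example can be reproduced directly from the two formulas. The sketch below assumes MSS is given in bytes and throughput in bits per second (hence the factor of 8); the function names are illustrative.

```python
import math


def window_segments(target_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain target_bps: rate * RTT / MSS."""
    return target_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(mss_bytes, rtt_s, target_bps):
    """Invert the formula above for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


if __name__ == "__main__":
    print(window_segments(10e9, 0.1, 1500))  # about 83,333 segments
    print(required_loss(1500, 0.1, 10e9))    # about 2e-10
```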
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (0 to R) plotted against the competing connection's throughput (0 to R), with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
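The convergence argument can be checked numerically: additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it. The sketch below is a deliberately crude model; it assumes both sessions see loss at the same instant, have the same RTT, and increase by one unit per round, and `aimd_two_flows` is a hypothetical name.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows cut their rate by a factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both flows add the same increment
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    # Start far from the fair share: 80 versus 10 on a link of capacity 100.
    print(aimd_two_flows(80.0, 10.0, 100.0, 2000))
```

After enough rounds the two rates are practically equal, whatever the starting split.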
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
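The R/10 and R/2 figures come from assuming every TCP connection on the link gets an equal share. A minimal sketch of that arithmetic, where `app_share` is an illustrative name and equal per-connection sharing is the stated assumption:

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(app_share(9, 1))   # 1/10 of R
    print(app_share(9, 11))  # 11/20 of R, roughly R/2
```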
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Selective repeat
sender
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
TCP: overview
  point-to-point:
    one sender, one receiver
  reliable, in-order byte stream:
    no "message boundaries"
  pipelined:
    TCP congestion and flow control set window size
  full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
  connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  flow controlled:
    sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom:
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)]
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
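To make the layout concrete, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. This is a sketch under the standard TCP header layout; `parse_tcp_header`, the dictionary keys, and the sample values are illustrative, and options and checksum verification are left out.

```python
import struct

# Flag bits in the low byte of the 16-bit "head len / flags" word
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header at the start of segment."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Network byte order: two 16-bit ports, two 32-bit numbers, four 16-bit words
    (src, dst, seq, ack, off_flags,
     window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "source_port": src,
        "dest_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # head len counts 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "receive_window": window,
        "checksum": checksum,
        "urg_pointer": urg_ptr,
    }


if __name__ == "__main__":
    # Hand-built SYN+ACK header with made-up ports and numbers
    raw = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x12, 4096, 0, 0)
    print(parse_tcp_header(raw))
```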
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum and urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  1. User types 'C'. Host A sends: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
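The numbers in the telnet exchange follow one rule: the ACK field is the peer's sequence number plus the bytes received, and the next sequence number is whatever the peer last ACKed. A small sketch, assuming nothing else is in flight; `Segment` and `reply` are illustrative names.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    seq: int
    ack: int
    data: bytes


def reply(received: Segment, data: bytes = b"") -> Segment:
    """Build the segment answering `received` (valid when no other data is in flight)."""
    return Segment(
        seq=received.ack,                      # continue where the peer expects us
        ack=received.seq + len(received.data), # next byte we expect from the peer
        data=data,
    )


if __name__ == "__main__":
    typed = Segment(seq=42, ack=79, data=b"C")  # Host A: user types 'C'
    echo = reply(typed, b"C")                   # Host B: ACKs and echoes 'C'
    final = reply(echo)                         # Host A: ACKs the echo
    print(typed, echo, final, sep="\n")
```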
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
      EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: α = 0.125
[Figure: "RTT: gaia.cs.umass.edu to fantasia.eurecom.fr". SampleRTT and EstimatedRTT plotted as RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106).]
TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
      DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
      (typically, β = 0.25)
      TimeoutInterval = EstimatedRTT + 4 · DevRTT
      (estimated RTT plus "safety margin")
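The EstimatedRTT, DevRTT and TimeoutInterval formulas combine into a small estimator. The sketch below is one possible reading: the class name `RttEstimator`, seeding from the first sample with DevRTT set to half of it, and updating DevRTT before EstimatedRTT are assumptions borrowed from common practice (RFC 6298), not something the slides specify.

```python
class RttEstimator:
    """EWMA round-trip-time estimator with a deviation-based safety margin."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumed initialisation: seed from the first measurement
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold in one SampleRTT (never taken from a retransmitted segment)."""
        # DevRTT = (1 - beta) * DevRTT + beta * |SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        """TimeoutInterval = EstimatedRTT + 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt


if __name__ == "__main__":
    est = RttEstimator(first_sample=100.0)  # milliseconds
    for sample in (100.0, 200.0, 110.0, 105.0):
        est.update(sample)
        print(round(est.estimated_rtt, 2), round(est.timeout_interval(), 2))
```

A single outlier (the 200 ms sample) moves EstimatedRTT only slightly but widens the safety margin, which is the behavior the slide motivates.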
TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
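The three sender events map naturally onto three methods. The following is a minimal sketch of the simplified sender only (no duplicate-ACK handling, flow control or congestion control); the class and attribute names are made up for illustration, the timer is modeled as a boolean, and "passing a segment to IP" is modeled as appending to a list.

```python
class SimplifiedTcpSender:
    """Event-driven model of the simplified TCP sender."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq  # oldest unacked byte
        self.next_seq = initial_seq   # seq # for the next new segment
        self.unacked = {}             # seq # -> payload still awaiting an ACK
        self.timer_running = False
        self.sent = []                # log of (seq #, payload) handed to IP

    def data_from_app(self, data: bytes):
        seq = self.next_seq           # byte-stream number of first data byte
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:    # timer covers the oldest unacked segment
            self.timer_running = True

    def timeout(self):
        if self.unacked:
            seq = min(self.unacked)   # retransmit the oldest unacked segment
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True # restart timer

    def ack_received(self, acknum: int):
        if acknum > self.send_base:   # cumulative ACK covers new data
            self.send_base = acknum
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= acknum]:
                del self.unacked[seq]
            # keep the timer only while unacked segments remain
            self.timer_running = bool(self.unacked)


if __name__ == "__main__":
    sender = SimplifiedTcpSender(initial_seq=92)
    sender.data_from_app(b"a" * 8)
    sender.data_from_app(b"b" * 20)
    sender.timeout()                  # resends the segment with seq # 92
    sender.ack_received(120)          # cumulative ACK clears both segments
    print(sender.sent, sender.timer_running)
```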
TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running) start timer
TCP ACK generation (event at receiver -> TCP receiver action)
  arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
  arrival of in-order segment with expected seq #; one other segment has ACK pending
    -> immediately send single cumulative ACK, ACKing both in-order segments
  arrival of out-of-order segment with higher-than-expected seq #; gap detected
    -> immediately send duplicate ACK, indicating seq # of next expected byte
  arrival of segment that partially or completely fills gap
    -> immediately send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B keeps returning ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK
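Counting duplicates is the only state fast retransmit needs. A minimal sketch, in which `DupAckDetector` and `on_ack` are hypothetical names and "3 duplicate ACKs" is read as three ACKs repeating an earlier one (four identical ACKs in total):

```python
class DupAckDetector:
    """Signals fast retransmit on the third duplicate of the same ACK number."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, acknum: int) -> bool:
        """Return True exactly when the third duplicate ACK arrives."""
        if acknum == self.last_ack:
            self.dup_count += 1
            return self.dup_count == 3
        # A new ACK number resets the duplicate counter
        self.last_ack = acknum
        self.dup_count = 0
        return False


if __name__ == "__main__":
    detector = DupAckDetector()
    for ack in (100, 100, 100, 100):
        if detector.on_ack(ack):
            print("fast retransmit: resend segment starting at byte", ack)
```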
TCP flow control
  application may remove data from TCP socket buffers slower than TCP receiver is delivering (sender is sending)
flow control:
  receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments arrive from sender and pass through IP code and TCP code into the TCP socket receiver buffers in the OS; the application process reads from those buffers.]
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to the application process.]
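The advertised window and the sender-side check can be written in a few lines. In this sketch the variables `last_byte_rcvd` and `last_byte_read` do not appear on the slide; they follow the usual textbook bookkeeping and are assumptions here, as are the function names.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space: RcvBuffer minus data received but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, rwnd):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


if __name__ == "__main__":
    rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(rwnd)                              # free space the receiver advertises
    print(can_send(5000, 4000, 1000, rwnd))  # fits within the window
    print(can_send(5000, 4000, 1200, rwnd))  # would overflow the receiver
```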
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each an application over the network, each holding:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client]
client:  Socket clientSocket = new Socket("hostname", "port number");
server:  Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
  client state: LISTEN; server state: LISTEN
  1. client: choose init seq num, x; send TCP SYN msg
       SYNbit=1, Seq=x
     client state: SYNSENT
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
       SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
     server state: SYN RCVD
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
       ACKbit=1, ACKnum=y+1
     client state: ESTAB
  4. server: received ACK(y) indicates client is live
     server state: ESTAB
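The handshake fields depend only on the two initial sequence numbers. A sketch that lists the three segments and the state each sender ends up in; the function name and the tuple layout are illustrative, and the modulo reflects that TCP sequence numbers are 32-bit values that wrap around.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return (sender, sender's state after sending, segment fields) per step."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": (x + 1) % SEQ_MODULUS}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_MODULUS}
    return [
        ("client", "SYNSENT", syn),
        ("server", "SYN RCVD", synack),
        ("client", "ESTAB", ack),
    ]


if __name__ == "__main__":
    for sender, state, fields in three_way_handshake(x=100, y=300):
        print(f"{sender:6} -> {state:8} {fields}")
```

The server reaches ESTAB only after it receives the third segment, which is why that step is not listed as something the server sends.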
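TCP 3-way handshake: FSM
[Figure: handshake FSM with states closed, listen, SYN sent, SYN rcvd and ESTAB. Transition labels:
  server, closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  server, listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  server, SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)
  client, closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  client, SYN sent -> ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)]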
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
  client state: ESTAB; server state: ESTAB
  1. client calls clientSocket.close(); sends FINbit=1, seq=x
     client state: FIN_WAIT_1 (can no longer send but can receive data)
  2. server replies ACKbit=1, ACKnum=x+1
     server state: CLOSE_WAIT (can still send data)
     client state: FIN_WAIT_2 (wait for server close)
  3. server sends FINbit=1, seq=y
     server state: LAST_ACK (can no longer send data)
  4. client replies ACKbit=1, ACKnum=y+1
     client state: TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
     server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  maximum per-connection throughput: R/2
  large delays as arrival rate, λ_in, approaches capacity
[Figure: Host A and Host B each send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out. Plots: λ_out versus λ_in and delay versus λ_in, with λ_in ranging up to R/2.]
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λ_in = λ_out
    transport-layer input includes retransmissions: λ'_in ≥ λ_in
[Figure: Host A offers λ_in (original data) and λ'_in (original data plus retransmitted data) to a router with finite shared output link buffers; Host B receives λ_out.]
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  (first term: estimated RTT; second term: "safety margin")
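The two estimators and the timeout formula fit in a few lines. The sketch below is one possible arrangement, not a description of any particular TCP stack: the class name, the initial values, and the choice to update DevRTT before EstimatedRTT are all assumptions.

```python
class RttEstimator:
    """Hypothetical helper tracking EstimatedRTT, DevRTT and TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = first_sample / 2     # assumption: initial deviation

    def update(self, sample_rtt):
        # Assumption: the deviation is measured against the estimate as it
        # stood before this sample is folded in.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (
            (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        )

    @property
    def timeout_interval(self):
        # EstimatedRTT plus the "safety margin" of 4 * DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)   # milliseconds
estimator.update(120.0)
```

A noisy path inflates `dev_rtt` and therefore the margin, while a steady path lets the timeout settle close to the estimated RTT.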
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
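The three sender events (data from the application, timeout, ACK received) can be sketched as a small class. This is an illustration under the same simplifications as the slides (no duplicate-ACK handling, no flow or congestion control); `SimplifiedTcpSender`, `send_base`, `unacked` and `wire` are hypothetical names, and the timer is modelled as a boolean rather than a real clock.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq=0):
        self.send_base = initial_seq     # seq # of oldest unACKed byte
        self.next_seq_num = initial_seq  # NextSeqNum in the pseudocode
        self.unacked = {}                # seq # -> data, not yet ACKed
        self.timer_running = False
        self.wire = []                   # (seq, data) pairs "passed to IP"

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))    # pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True    # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)      # oldest unACKed segment
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True    # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:     # ACKs previously unACKed data
            self.send_base = ack_num
            done = [s for s, d in self.unacked.items() if s + len(d) <= ack_num]
            for seq in done:
                del self.unacked[seq]
            # keep the timer only while segments are still outstanding
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"x" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)   # Seq=100, 20 bytes
```

Because ACKs are cumulative, a single ACK number can clear several outstanding segments at once.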
event at receiver -> TCP receiver action
  arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
  arrival of in-order segment with expected seq #; one other segment has ACK pending
    -> immediately send single cumulative ACK, ACKing both in-order segments
  arrival of out-of-order segment with higher-than-expected seq #; gap detected
    -> immediately send duplicate ACK, indicating seq # of next expected byte
  arrival of segment that partially or completely fills gap
    -> immediately send ACK, provided that segment starts at lower end of gap
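The four rows above amount to a small decision procedure at the receiver. The sketch below encodes them as a function returning a description of the action; the function and parameter names are hypothetical, and the handling of a segment below the expected seq # is an assumption, since the table does not cover it.

```python
def receiver_ack_action(seg_seq, expected_seq, ack_pending=False, gap_open=False):
    """Pick the receiver's ACK action for an arriving segment.

    ack_pending: an earlier in-order segment is still waiting for its ACK.
    gap_open:    out-of-order data beyond expected_seq is already buffered.
    """
    if seg_seq > expected_seq:
        # out-of-order segment: gap detected
        return "send duplicate ACK for expected_seq"
    if seg_seq < expected_seq:
        # assumption: already-received data is simply re-ACKed
        return "send duplicate ACK for expected_seq"
    if gap_open:
        # segment starts at the lower end of the gap
        return "send ACK immediately"
    if ack_pending:
        return "send single cumulative ACK immediately"
    return "start delayed-ACK timer (up to 500 ms)"
```

For example, `receiver_ack_action(100, 92)` reports a duplicate ACK, which is the signal the sender can later use to detect loss without waiting for a timeout.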
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); the Seq=100 segment is lost (X). Host B answers with repeated ACK=100. A timeout interval is marked at Host A, which resends Seq=100 (20 bytes of data) on receiving the duplicate ACKs.]
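Counting duplicate ACKs needs only two pieces of state at the sender. The following is a minimal sketch with hypothetical names; it reports which seq # to resend and leaves the actual retransmission, timers and congestion response out.

```python
class FastRetransmitDetector:
    DUP_ACK_THRESHOLD = 3   # "triple duplicate ACKs"

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the seq # to resend now, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.DUP_ACK_THRESHOLD:
                # With cumulative ACKs, the repeated ACK number is the seq #
                # of the unACKed segment with the smallest seq #.
                return ack_num
            return None
        self.last_ack = ack_num   # new ACK: reset the duplicate counter
        self.dup_count = 0
        return None


detector = FastRetransmitDetector()
results = [detector.on_ack(100) for _ in range(4)]   # 1 original + 3 duplicates
```

The first ACK=100 is not a duplicate; the retransmission fires on the third repeat, and further repeats do not trigger it again.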
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process removes data.]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from below and data leaves to the application process.]
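The relationship in the figure (RcvBuffer = buffered data + free space, with rwnd being the free space) can be sketched as follows. The class and function names are hypothetical, and the model counts bytes only, without any real socket.

```python
class ReceiverBuffer:
    """Hypothetical model of receiver-side buffering."""

    def __init__(self, rcv_buffer=4096):   # typical default from the slide
        self.rcv_buffer = rcv_buffer
        self.buffered = 0                  # bytes not yet read by the app

    @property
    def rwnd(self):
        return self.rcv_buffer - self.buffered   # advertised free space

    def deliver(self, nbytes):
        if nbytes > self.rwnd:
            raise OverflowError("sender exceeded advertised rwnd")
        self.buffered += nbytes

    def app_read(self, nbytes):
        taken = min(nbytes, self.buffered)
        self.buffered -= taken
        return taken


def sendable(in_flight, rwnd):
    """Bytes the sender may still transmit without exceeding rwnd."""
    return max(0, rwnd - in_flight)


buf = ReceiverBuffer()
buf.deliver(3000)            # data arrives faster than the app reads
window_after_burst = buf.rwnd
buf.app_read(2000)           # the application catches up
window_after_read = buf.rwnd
```

As long as the sender keeps its in-flight data within the latest advertised `rwnd`, `deliver` never raises.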
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each hold, between application and network: connection state: ESTAB; connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
  1. client: choose init seq num, x; send TCP SYN msg: SYNbit=1, Seq=x. Client state: SYNSENT.
  2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1. Server state: SYN RCVD.
  3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK: ACKbit=1, ACKnum=y+1; this segment may contain client-to-server data. Client state: ESTAB.
  4. server: received ACK(y) indicates client is live. Server state: ESTAB.
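The numbers exchanged in the handshake can be traced with a short simulation. This is a sketch only: segments are plain dictionaries with made-up keys, the random choice of initial sequence numbers stands in for whatever a real stack does, and the modulo reflects that TCP sequence numbers are 32-bit values.

```python
import random

SEQ_SPACE = 2 ** 32   # sequence numbers wrap around at 32 bits


def three_way_handshake(x=None, y=None):
    """Return the (SYN, SYNACK, ACK) segments for initial seq nums x and y."""
    x = random.randrange(SEQ_SPACE) if x is None else x   # client's choice
    y = random.randrange(SEQ_SPACE) if y is None else y   # server's choice
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y,
              "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return syn, synack, ack


syn, synack, ack = three_way_handshake(x=1000, y=5000)
```

The SYN consumes one sequence number even though it carries no data, which is why each side acknowledges the other's initial number plus one.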
TCP 3-way handshake: FSM
client side:
  closed -> SYN sent: on Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)
server side:
  closed -> listen: on Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x); send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
  1. client: clientSocket.close(); send FINbit=1, seq=x. Client state FIN_WAIT_1: can no longer send but can receive data.
  2. server: send ACKbit=1, ACKnum=x+1. Server state CLOSE_WAIT: can still send data. Client state FIN_WAIT_2: wait for server close.
  3. server: send FINbit=1, seq=y. Server state LAST_ACK: can no longer send data.
  4. client: send ACKbit=1, ACKnum=y+1. Client state TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED. Server state: CLOSED.
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ against $\lambda_{in}$, levelling off at R/2; delay against $\lambda_{in}$, growing as $\lambda_{in}$ nears R/2.]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \geq \lambda_{in}$
[Figure: Host A and Host B send into finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output $\lambda_{out}$]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both axes up to R/2. Host A keeps a copy of each packet and sends when there is free buffer space in the finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data.]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: plot of $\lambda_{out}$ against sending rate, both axes up to R/2. Host A keeps a copy of each packet; a packet arriving when the router has no buffer space is dropped and resent once there is free buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data.]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered
[Figure: plot of $\lambda_{out}$ against sending rate, both axes up to R/2. Host A times out and sends a copy of a packet while the router still has free buffer space.]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; output $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \leq \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \dfrac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight") spanning cwnd, and last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline, with one RTT marked on the time axis]
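Incrementing cwnd by one MSS per ACK is what produces the doubling per RTT. The sketch below illustrates this under idealized assumptions that are not from the slides: no loss, one ACK per full segment, and a hypothetical MSS of 1460 bytes.

```python
MSS = 1460   # bytes; illustrative value


def slow_start(rtts, mss=MSS):
    """Return cwnd (in bytes) at the start and after each of `rtts` round trips."""
    cwnd = mss                      # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss          # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # increment cwnd for every ACK received
        history.append(cwnd)
    return history


cwnd_in_segments = [c // MSS for c in slow_start(4)]
```

Here `cwnd_in_segments` is `[1, 2, 4, 8, 16]`: a window of n segments earns n ACKs, each adding one MSS, so the window doubles every round trip.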
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
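The difference between the two reactions is easiest to see side by side. This sketch follows the rules as stated on the slide and leaves out fast recovery's temporary window inflation; the function name, the event labels, and the one-MSS floor on ssthresh are assumptions.

```python
MSS = 1460   # bytes; illustrative value


def on_loss(cwnd, event, variant="reno", mss=MSS):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event:   "timeout" or "3dupacks"
    variant: "reno" or "tahoe"
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    # ssthresh = 1/2 of cwnd just before the loss (floor of 1 MSS assumed)
    ssthresh = max(cwnd // 2, mss)
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh        # restart from 1 MSS, slow start to ssthresh
    return ssthresh, ssthresh       # Reno on 3 dup ACKs: cut cwnd in half


reno_timeout = on_loss(16 * MSS, "timeout")
reno_dupacks = on_loss(16 * MSS, "3dupacks")
tahoe_dupacks = on_loss(16 * MSS, "3dupacks", variant="tahoe")
```

Only Reno's reaction to triple duplicate ACKs keeps the window at half its previous size; every other case falls back to 1 MSS.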
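Summary: TCP Congestion Control
[Figure: FSM of TCP congestion control; recoverable transition labels:]
  slow start, on new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start -> congestion avoidance: when cwnd > ssthresh
  congestion avoidance, on new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, on duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed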
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. Plot of cwnd (TCP sender congestion window size) against time; additively increase window size ... until loss occurs (then cut window in half).]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \dfrac{3}{4}\,\dfrac{W}{\text{RTT}}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W]
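The factor 3/4 follows from the shape of the sawtooth. Assuming the window climbs linearly from W/2 (just after a loss) to W (just before the next one), its time average is the midpoint of the two, which gives:

```latex
\[
\overline{W}
  = \frac{1}{2}\left(\frac{W}{2} + W\right)
  = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput}
  = \frac{\overline{W}}{\text{RTT}}
  = \frac{3}{4}\,\frac{W}{\text{RTT}} \ \text{bytes/sec}.
\]
```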
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2\cdot10^{-10}$, a very small loss rate
new versions of TCP for high-speed
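Both numbers in the example can be checked with a few lines of arithmetic. The sketch below treats the 1500-byte segment as the MSS and converts bytes to bits; the function names are illustrative.

```python
import math


def window_segments(throughput_bps, rtt_s, segment_bytes):
    """In-flight segments needed to sustain a throughput over one RTT."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(throughput_bps, rtt_s, mss_bytes):
    """Invert the formula above for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = required_loss(10e9, 0.100, 1500)
```

`w` comes out at about 83,333 segments and `loss` at about 2.1e-10, in line with the rounded figures on the slide.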
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (horizontal axis, 0 to R) plotted against the other connection's throughput (0 to R), with the "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
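The convergence toward the equal-share line can be reproduced numerically. This sketch makes strong simplifying assumptions that go beyond the slide: both sessions have the same RTT, both see every loss at the same moment, and rates are in arbitrary units with an additive step of 1.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease by a factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both increase additively
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: flow 1 has 80, flow 2 has 10, capacity 100.
rate1, rate2 = aimd_two_flows(80.0, 10.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates end up essentially equal.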
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
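The parallel-connection example is per-connection fairness applied to a larger connection count. A minimal sketch, assuming every connection gets an equal share of R (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening `new_connections`."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


one_connection = app_share(9, 1)       # 1 of 10 connections
eleven_connections = app_share(9, 11)  # 11 of 20 connections
```

With 11 connections the app holds 11 of 20 equal shares, i.e. 11/20 of R, which the slide rounds to R/2.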
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
next:
  leaving the network "edge"
TCP: overview
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom:]
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say; it is up to the implementor
[Figure: outgoing segment from sender and incoming segment to sender, each with header fields source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space, window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable.]
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicated packets that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, with $\lambda_{in}$ up to R/2. Host A keeps a copy, times out, and resends while there is free buffer space; Host B receives both copies]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 3
[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, both up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate $\approx cwnd / RTT$ bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), and last byte sent; cwnd spans the in-flight region]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B timeline, with RTT marked on the time axis]
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
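The loss reactions above can be traced with a small per-RTT model. This is a minimal sketch, not a TCP implementation: it works in whole round trips, measures the window in units of one MSS, caps slow-start doubling at ssthresh, and leaves out fast recovery. The names `next_cwnd` and `trace` and the starting values are assumptions chosen for illustration.

```python
MSS = 1  # assumption: measure the window in units of one MSS


def next_cwnd(cwnd, ssthresh, event, variant="reno"):
    """Return (cwnd, ssthresh) after one RTT that ended with `event`.

    event is "ack" (no loss this RTT), "3dup" (triple duplicate ACK)
    or "timeout". variant is "reno" or "tahoe".
    """
    if event == "ack":
        if cwnd < ssthresh:
            # slow start: double every RTT, up to the threshold
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # congestion avoidance: grow by 1 MSS every RTT
            cwnd = cwnd + MSS
        return cwnd, ssthresh

    # Loss event: ssthresh becomes half of cwnd just before the loss.
    ssthresh = max(cwnd / 2, MSS)
    if event == "timeout" or variant == "tahoe":
        return MSS, ssthresh      # back to 1 MSS, slow start again
    return ssthresh, ssthresh     # Reno on 3 dup ACKs: window cut in half


def trace(events, cwnd=1 * MSS, ssthresh=8 * MSS, variant="reno"):
    """Return the list of cwnd values seen while replaying `events`."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event, variant)
        history.append(cwnd)
    return history


events = ["ack"] * 5 + ["3dup", "ack"]
print("reno :", trace(events, variant="reno"))
print("tahoe:", trace(events, variant="tahoe"))
```

Replaying the same events for both variants shows the difference described above: after the triple duplicate ACK, the Reno trace continues linearly from half the window, while the Tahoe trace restarts from 1 MSS.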
Summary: TCP Congestion Control
[Figure: congestion-control state diagram. Recoverable transition labels:
  new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  new ACK: cwnd = cwnd + MSS (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  cwnd > ssthresh]
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs, then cut window in half]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Figure: sawtooth window oscillating between W/2 and W]
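The 3/4 factor follows from the sawtooth shape. As a short worked step, assuming the window climbs linearly from W/2 to W between losses (so its time average is the midpoint of the two):

```latex
\[
\overline{W} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} \;=\; \frac{\overline{W}}{RTT} \;=\; \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
\]
```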
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
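The numbers in this example can be checked directly. The sketch below plugs the slide's values into the window relation and the throughput formula above; the function names are made up for illustration, and MSS is expressed in bits so that the result is in bits per second.

```python
import math

MSS_BITS = 1500 * 8   # 1500-byte segments, expressed in bits
RTT = 0.100           # 100 ms, in seconds
TARGET_BPS = 10e9     # 10 Gbps


def window_in_segments(throughput_bps, rtt, mss_bits):
    """In-flight segments W such that W * MSS / RTT equals the target rate."""
    return throughput_bps * rtt / mss_bits


def mathis_throughput(mss_bits, rtt, loss):
    """Throughput in bits/sec: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bits / (rtt * math.sqrt(loss))


def loss_for_throughput(throughput_bps, rtt, mss_bits):
    """Solve the throughput formula for the loss probability L."""
    return (1.22 * mss_bits / (rtt * throughput_bps)) ** 2


W = window_in_segments(TARGET_BPS, RTT, MSS_BITS)
L = loss_for_throughput(TARGET_BPS, RTT, MSS_BITS)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The computed loss probability comes out slightly above 2e-10, which the slide rounds to $2 \cdot 10^{-10}$.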
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: the two connections' throughputs plotted against each other, with Connection 1 throughput on the horizontal axis up to R and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
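The convergence argument can be tried numerically. This is a minimal sketch under simplifying assumptions that go beyond the slide: both sessions have the same RTT, both see every loss at the same time, and a loss happens exactly when their combined rate exceeds the link capacity. The function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD sessions sharing one bottleneck link.

    Each round both sessions either add 1 (additive increase) or,
    if their combined rate exceeded the capacity, halve their rate
    (multiplicative decrease).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1   # congestion avoidance: slope 1
    return x1, x2


# Start far from the fair share: 10 versus 70 on a link of capacity 100.
x1, x2 = aimd_two_flows(10.0, 70.0, 100.0, 1000)
print(f"x1 = {x1:.3f}, x2 = {x2:.3f}, gap = {abs(x1 - x2):.2e}")
```

Additive increase leaves the gap between the two rates unchanged, and every multiplicative decrease halves it, so the pair drifts toward the equal bandwidth share line.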
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
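The arithmetic behind the parallel-connection example can be written out, assuming the link is split equally per connection (the fairness goal stated earlier). The helper name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing_conns, new_conns):
    """Fraction of the link rate R obtained by an app opening `new_conns`
    connections next to `existing_conns` others, assuming every
    connection gets an equal share."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(9, 1))    # one connection among 10
print(app_share(9, 11))   # eleven connections among 20
```

With 11 of 20 connections the app gets 11/20 of R, which the slide rounds to R/2.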
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
point-to-point:
  one sender, one receiver
reliable, in-order byte stream:
  no "message boundaries"
pipelined:
  TCP congestion and flow control set window size
full duplex data:
  bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented:
  handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
  sender will not overwhelm receiver
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, top to bottom:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)]
field notes:
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  checksum: Internet checksum (as in UDP)
  sequence number, acknowledgement number: counting by bytes of data (not segments)
  receive window: # bytes rcvr willing to accept
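As an illustration of the layout above, the fixed 20-byte part of the header can be unpacked with Python's standard `struct` module. This is a sketch under stated assumptions: options are ignored, only the six flag bits named on the slide are decoded, and the sample header is made up (ports and numbers are hypothetical, the checksum is left at 0 rather than computed).

```python
import struct

# Fixed 20-byte part of the header, network byte order:
# src port, dest port, seq #, ack #, (head len + unused bits),
# flag bits, receive window, checksum, urgent data pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(raw):
    """Decode the fixed part of a TCP header from `raw` bytes."""
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urg_ptr) = TCP_HEADER.unpack(raw[:TCP_HEADER.size])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # head len is carried in the top 4 bits, in 32-bit words
        "header_len_bytes": (offset_byte >> 4) * 4,
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if flag_byte & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Hypothetical SYNACK header: all values are made up for the example.
sample = TCP_HEADER.pack(80, 50000, 79, 43, 5 << 4, 0x12, 4096, 0, 0)
header = parse_tcp_header(sample)
print(header)
```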
TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N, divided into: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C'
  A to B: Seq=42, ACK=79, data = 'C'
  host ACKs receipt of 'C', echoes back 'C'
  B to A: Seq=79, ACK=43, data = 'C'
  host ACKs receipt of echoed 'C'
  A to B: Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds), showing SampleRTT and EstimatedRTT]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  (estimated RTT plus "safety margin")
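The three formulas above fit in a small estimator. This is a sketch, not a definitive implementation: the class name, the choice to seed EstimatedRTT with the first sample and DevRTT with zero, and the choice to update DevRTT before EstimatedRTT are all assumptions that the slides do not specify.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout interval (times in seconds)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample  # assumption: seed with first sample
        self.dev_rtt = 0.0                 # assumption: no deviation seen yet

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimates; return TimeoutInterval."""
        # Assumption: the deviation is measured against the estimate
        # from before this sample was folded in.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        """EstimatedRTT plus the 4 * DevRTT safety margin."""
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=0.100)
for sample in (0.200, 0.120, 0.110):
    timeout = estimator.update(sample)
    print(f"sample={sample:.3f} est={estimator.estimated_rtt:.4f} "
          f"dev={estimator.dev_rtt:.4f} timeout={timeout:.4f}")
```

A single outlier sample moves EstimatedRTT only a little but widens the safety margin noticeably, which is the behaviour the slide asks for.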
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
data received from application above:
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
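The three sender events can be sketched as handlers on one object. This is an illustrative model only: the timer is a boolean flag rather than a real clock, "send" just appends to a log, sequence-number wraparound is ignored, and every name other than NextSeqNum (for example `send_base` and `unacked`) is an assumption introduced for the example.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender (no flow/congestion control)."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq     # seq # of oldest unACKed byte
        self.next_seq_num = initial_seq  # NextSeqNum
        self.unacked = {}                # seq # -> data, segments in flight
        self.timer_running = False
        self.sent = []                   # log of (seq #, data) passed to IP

    def _send(self, seq, data):
        self.sent.append((seq, data))

    def on_app_data(self, data):
        """data received from application above"""
        seq = self.next_seq_num
        self.unacked[seq] = data
        self._send(seq, data)
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True    # start timer

    def on_timeout(self):
        """timeout: retransmit the oldest unACKed segment, restart timer"""
        if not self.unacked:
            self.timer_running = False
            return
        seq = min(self.unacked)
        self._send(seq, self.unacked[seq])
        self.timer_running = True

    def on_ack(self, ack_num):
        """ack rcvd: cumulative ACK for all bytes below ack_num"""
        if ack_num > self.send_base:
            self.send_base = ack_num
            fully_acked = [s for s, d in self.unacked.items()
                           if s + len(d) <= ack_num]
            for seq in fully_acked:
                del self.unacked[seq]
            # keep the timer only while segments are still unACKed
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"x" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)   # Seq=100, 20 bytes
sender.on_timeout()             # resends Seq=92
sender.on_ack(100)              # first segment ACKed
print(sender.sent, sender.send_base, sender.timer_running)
```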
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq. #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B answers with repeated ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data]
fast retransmit after sender receipt of triple duplicate ACK
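The duplicate-ACK rule can be sketched as a counter kept by the sender. This is a minimal illustration under assumptions: the class and attribute names are hypothetical, ACKs older than the current base are simply ignored, and the actual resend is left to the caller.

```python
class DupAckCounter:
    """Tracks duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, send_base):
        self.send_base = send_base  # seq # of oldest unACKed byte
        self.dup_acks = 0

    def on_ack(self, ack_num):
        """Return the seq # to resend now, or None."""
        if ack_num > self.send_base:
            # ACK for new data: advance and reset the duplicate count.
            self.send_base = ack_num
            self.dup_acks = 0
            return None
        if ack_num == self.send_base:
            self.dup_acks += 1
            if self.dup_acks == 3:
                # triple duplicate ACK: resend unACKed segment
                # with smallest seq #, do not wait for the timeout
                return self.send_base
        return None


# Scenario from the figure: Seq=92 is delivered, Seq=100 is lost.
counter = DupAckCounter(send_base=92)
actions = [counter.on_ack(100) for _ in range(4)]
print(actions)
```

The first ACK=100 acknowledges new data; the next three are duplicates, and the third of them triggers the resend of the segment starting at byte 100.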
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads enter, data leaves to application process]
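The bookkeeping behind rwnd can be sketched in a few lines. The slide gives only the picture (RcvBuffer = buffered data + free buffer space); the variable names LastByteRcvd, LastByteRead, LastByteSent and LastByteAcked follow the usual textbook formulation and are assumptions here, as are the function names.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises as rwnd (bytes)."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000.
rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)

# Sender: 1000 bytes already in flight.
print(sender_may_send(5000, 4000, rwnd, nbytes=1000))
print(sender_may_send(5000, 4000, rwnd, nbytes=1200))
```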
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with an application above the network, each holding:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN, server state: LISTEN
client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB
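TCP 3-way handshake: FSM
[Figure: handshake FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB]
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)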
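The exchange can be walked through with plain dictionaries standing in for segments. This is a sketch of the message sequence only, with no sockets or timers; the function name, the dictionary keys and the state spellings are assumptions, and the client is started in CLOSED as in the FSM.

```python
def three_way_handshake(x, y):
    """Replay the handshake for initial seq nums x (client) and y (server).

    Returns (segments, client_state, server_state).
    """
    client, server = "CLOSED", "LISTEN"
    segments = []

    # Client chooses x and sends SYN.
    syn = {"SYN": 1, "seq": x}
    segments.append(syn)
    client = "SYN_SENT"

    # Server chooses y and sends SYNACK, acking the SYN.
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": syn["seq"] + 1}
    segments.append(synack)
    server = "SYN_RCVD"

    # Client checks that its SYN was acknowledged, then ACKs the SYNACK.
    if synack["acknum"] == x + 1:
        ack = {"ACK": 1, "acknum": synack["seq"] + 1}
        segments.append(ack)
        client = "ESTAB"

        # Server checks that its SYNACK was acknowledged.
        if ack["acknum"] == y + 1:
            server = "ESTAB"

    return segments, client, server


segments, client_state, server_state = three_way_handshake(x=42, y=79)
for segment in segments:
    print(segment)
print(client_state, server_state)
```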
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB, server state: ESTAB
client: clientSocket.close(); sends FINbit=1, seq=x
  client state FIN_WAIT_1: can no longer send but can receive data
server: sends ACKbit=1; ACKnum=x+1
  server state CLOSE_WAIT: can still send data
  client state FIN_WAIT_2: wait for server close
server: sends FINbit=1, seq=y
  server state LAST_ACK: can no longer send data
client: sends ACKbit=1; ACKnum=y+1
  client state TIMED_WAIT: timed wait for 2*max segment lifetime, then CLOSED
  server state: CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data $\lambda_{in}$ through a router with unlimited shared output link buffers; throughput $\lambda_{out}$. Plots of $\lambda_{out}$ versus $\lambda_{in}$ and of delay versus $\lambda_{in}$, with $\lambda_{in}$ up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
f ll d le data point to point full duplex data bi-directional data flow
in same connection
point-to-point one sender one receiver
reliable in order byte in same connection MSS maximum segment
size
reliable in-order byte steam no ldquomessage
connection-oriented handshaking (exchange
of control msgs) inits
no message boundariesrdquo
pipelinedof control msgs) inits sender receiver state before data exchange
TCP congestion and flow control set window size
flow controlled sender will not
h l i
size
Transport Layer 3-56
overwhelm receiver
TCP segment structureg
t d t t
32 bitsURG urgent data countingsource port dest port
sequence numberacknowledgement number
g(generally not used)
ACK ACK lid
countingby bytes of data(not segments)acknowledgement number
receive window
Urg data pointerchecksumFSRPAUhead
lennot
used
valid
PSH push data now(generally not used) bytes
illi
(not segments)
Urg data pointer
options (variable length)RST SYN FINconnection estab( t t d
rcvr willingto accept
applicationdata
(setup teardowncommands)
Internet data (variable length)
Internetchecksum
(as in UDP)
Transport Layer 3-57
TCP seq numbers ACKsq sequence numbers source port dest port
sequence number
outgoing segment from sender
byte stream ldquonumberrdquo of first byte in segmentrsquos data
sequence numberacknowledgement number
checksum
rwndurg pointer
dataacknowledgementsseq of next byte
window sizeN
q yexpected from other sidecumulative ACK
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically, $\beta = 0.25$)
$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  EstimatedRTT is the estimated RTT; $4 \cdot DevRTT$ is the "safety margin"
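The two estimators combine into a timeout calculator roughly as follows. This is a sketch under stated assumptions: DevRTT starts at 0, and the deviation is measured against EstimatedRTT before the new sample is folded in. The slides fix neither choice, and the class name is hypothetical.

```python
class TimeoutEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = float(first_sample)
        self.dev_rtt = 0.0  # assumption: no deviation known yet

    def update(self, sample_rtt):
        # Assumption: deviation uses the estimate from before this sample.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


est = TimeoutEstimator(first_sample=100)
print(est.update(140))  # 145.0
```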
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
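The three sender events (data from the application, timeout, ACK received) can be sketched as a small class. It is a teaching model only: it ignores flow control, congestion control, duplicate ACKs and sequence-number wraparound, and the class, attribute, and method names are assumptions introduced here.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified TCP sender's three events."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # oldest byte not yet ACKed
        self.next_seq = initial_seq    # NextSeqNum
        self.unacked = {}              # seq # -> payload still in flight
        self.timer_running = False
        self.sent = []                 # log of (seq #, payload) handed to "IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        self.sent.append((seq, data))  # pass segment to IP
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True  # timer covers the oldest unACKed segment

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)        # segment that caused the timeout
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True      # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:   # ACKs previously unACKed data
            self.send_base = ack_num
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > ack_num}
            # keep the timer only while something is still unACKed
            self.timer_running = bool(self.unacked)
```

Because ACKs are cumulative, a single `on_ack(120)` would clear every segment whose bytes end at or before 120.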
event at receiver -> TCP receiver action
arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
arrival of in-order segment with expected seq #; one other segment has ACK pending
  -> immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment with higher-than-expected seq #; gap detected
  -> immediately send duplicate ACK, indicating seq # of next expected byte
arrival of segment that partially or completely fills gap
  -> immediately send ACK, provided that segment starts at lower end of gap
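The four receiver cases can be written as one decision function. This is a hedged sketch: the parameter names, the returned labels, and the handling of segments below the expected sequence number (which the table does not cover) are assumptions.

```python
def ack_action(seg_seq, expected_seq, ack_pending=False, gap_buffered=False):
    """Classify an arriving segment according to the four receiver cases.

    gap_buffered: out-of-order data is already buffered above expected_seq.
    ack_pending:  an earlier in-order segment is still waiting for its delayed ACK.
    """
    if seg_seq > expected_seq:
        return "duplicate"      # gap detected: re-ACK the next expected byte now
    if seg_seq < expected_seq:
        return "not covered"    # assumption: old data, outside the table
    if gap_buffered:
        return "immediate"      # segment starts at the lower end of the gap
    if ack_pending:
        return "cumulative"     # one ACK covers both in-order segments
    return "delayed"            # wait up to 500 ms for another segment


print(ack_action(120, 100))  # duplicate
```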
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: Host A, Host B, with the timeout interval marked on Host A's timeline]
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data (lost, marked X)
  B -> A: ACK=100, followed by three duplicate ACK=100
  A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)
fast retransmit after sender receipt of triple duplicate ACK
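Duplicate-ACK counting can be sketched as below. The function name and the choice to return the list of ACK numbers that trigger a resend are illustrative assumptions; the threshold of 3 comes from the slides.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the ACK numbers at which a fast retransmit would fire."""
    last_ack = None
    dup_count = 0
    triggers = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:   # third duplicate of the same ACK
                triggers.append(ack)     # resend segment starting at this seq #
        else:
            last_ack = ack
            dup_count = 0
    return triggers


print(fast_retransmit_points([100, 100, 100, 100]))  # [100]
```

The first ACK=100 is not a duplicate, so four identical ACKs in total are needed before the segment with Seq=100 is resent.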
TCP flow control
application may remove data from TCP socket buffers ...
  ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process above the OS boundary; TCP socket receiver buffers; TCP code; IP code; segments arriving from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: RcvBuffer holds buffered data (TCP segment payloads, on their way to the application process) and free buffer space, rwnd]
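A sketch of the bookkeeping behind rwnd. The variable names LastByteRcvd, LastByteRead, LastByteSent and LastByteAcked follow the usual textbook presentation and are assumptions here, since this slide does not define them.

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free receive-buffer space the receiver would advertise as rwnd."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(n_bytes, last_byte_sent, last_byte_acked, rwnd):
    """True if sending n_bytes keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= rwnd


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)  # 2096
```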
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server, each with application above network, each holding]
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
client: Socket clientSocket = new Socket(hostname, portNumber);
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg
     client -> server: SYNbit=1, Seq=x
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     server -> client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     client -> server: ACKbit=1, ACKnum=y+1
   server: received ACK(y) indicates client is live
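The field values exchanged in the handshake can be generated mechanically from the two initial sequence numbers. The dictionary layout and function name below are illustrative assumptions, not a wire format.

```python
def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x and y."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    # The server acknowledges the SYN by asking for byte x+1.
    synack = {"from": "server", "SYN": 1, "seq": y,
              "ACK": 1, "acknum": syn["seq"] + 1}
    # The client acknowledges the SYNACK by asking for byte y+1.
    ack = {"from": "client", "ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(x=1000, y=5000):
    print(segment)
```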
TCP 3-way handshake: FSM
server side:
  closed -> listen
    on: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd
    on: SYN(x); action: SYNACK(seq=y, ACKnum=x+1), create new socket for communication back to client
  SYN rcvd -> ESTAB
    on: ACK(ACKnum=y+1)
client side:
  closed -> SYN sent
    on: Socket clientSocket = new Socket(hostname, portNumber); action: SYN(seq=x)
  SYN sent -> ESTAB
    on: SYNACK(seq=y, ACKnum=x+1); action: ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
1. client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send, but can receive data)
2. server: replies ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
   client: enters FIN_WAIT_2 (wait for server close)
3. server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
4. client: replies ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
   server: enters CLOSED
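The state sequences on each side can be encoded as transition tables. The event labels ("close()", "recv ACK", and so on) are names chosen for this sketch; only the state names come from the slide.

```python
CLIENT_CLOSE = {
    ("ESTAB", "close()"): "FIN_WAIT_1",          # client sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",    # wait for server close
    ("FIN_WAIT_2", "recv FIN"): "TIMED_WAIT",    # client sends final ACK
    ("TIMED_WAIT", "2*MSL elapsed"): "CLOSED",
}

SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",         # server ACKs, can still send data
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",      # server can no longer send data
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table, state, events):
    """Apply events in order; a KeyError signals a transition the table lacks."""
    for event in events:
        state = table[(state, event)]
    return state


print(run(SERVER_CLOSE, "ESTAB", ["recv FIN", "send FIN", "recv ACK"]))  # CLOSED
```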
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput: $\lambda_{out}$]
[Plots: $\lambda_{out}$ versus $\lambda_{in}$, levelling off at R/2; delay versus $\lambda_{in}$, growing steeply near R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A, Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at R/2]
[Figure: Host A keeps a copy of each packet; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; free buffer space in the finite shared output link buffers; Host B]
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and is resent once there is free buffer space; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; Host B]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at R/2]
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: Host A times out and sends a copy while the original is still queued; free buffer space at the router; $\lambda_{in}$, $\lambda'_{in}$, $\lambda_{out}$; Host B]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at R/2]
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C, D; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 3
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $LastByteSent - LastByteAcked \le cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx cwnd / RTT$ bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight") bytes spanning cwnd; last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A, Host B; segments exchanged per RTT, time running downward]
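A short simulation shows why adding one MSS per ACK doubles cwnd each RTT. It assumes no loss, one ACK per segment, and cwnd counted in whole segments; the function name is illustrative.

```python
def slow_start_rounds(rounds, mss=1):
    """Return cwnd (in units of mss) at the start of each RTT during slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss        # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss           # increment cwnd by 1 MSS for every ACK
        history.append(cwnd)
    return history


print(slow_start_rounds(4))  # [1, 2, 4, 8, 16]
```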
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
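The difference between the Tahoe and Reno reactions can be sketched as a single function. It follows the slide text literally (Reno halves cwnd on three duplicate ACKs); the event labels, the function name, and the return convention are assumptions made for illustration.

```python
def react_to_loss(cwnd, event, variant="reno", mss=1):
    """Return (new_cwnd, ssthresh) after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: %r" % (event,))
    ssthresh = cwnd / 2              # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh         # restart from 1 MSS, slow start to ssthresh
    return ssthresh, ssthresh        # Reno, 3 dup ACKs: cut cwnd in half


print(react_to_loss(16, "3dupacks", variant="reno"))   # (8.0, 8.0)
print(react_to_loss(16, "3dupacks", variant="tahoe"))  # (1, 8.0)
```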
Summary: TCP Congestion Control
[FSM, partially recovered from the figure]
slow start, on new ACK:
  cwnd = cwnd + MSS
  dupACKcount = 0
  transmit new segment(s), as allowed
slow start -> congestion avoidance: when cwnd > ssthresh
congestion avoidance, on new ACK:
  cwnd = cwnd + MSS * (MSS/cwnd)
  dupACKcount = 0
  transmit new segment(s), as allowed
fast recovery, on duplicate ACK:
  cwnd = cwnd + MSS
  transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth; x-axis: time; y-axis: cwnd, TCP sender congestion window size; additively increase window size ... until loss occurs (then cut window in half)]
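The saw tooth can be reproduced with a toy trace. The loss schedule is supplied by the caller rather than derived from a network model, and the floor of one MSS after a decrease is an assumption of this sketch.

```python
def aimd_trace(rtts, loss_at, cwnd=1.0, mss=1.0):
    """Return cwnd at each RTT; loss_at holds the RTT indices where loss is detected."""
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            cwnd = max(cwnd / 2, mss)   # multiplicative decrease
        else:
            cwnd += mss                 # additive increase: 1 MSS per RTT
    return trace


print(aimd_trace(6, loss_at={3}, cwnd=4.0))  # [4.0, 5.0, 6.0, 7.0, 3.5, 4.5]
```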
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: window size oscillating between W/2 and W]
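The factor 3/4 follows from the saw tooth shape. Assuming the window climbs linearly from W/2 back up to W between losses, its time average is the midpoint of those two values:

```latex
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{W}}{RTT} = \frac{3}{4}\cdot\frac{W}{RTT}\ \text{bytes/sec}
```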
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
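Both numbers in the example can be checked directly. The sketch below inverts the throughput formula for L; the function names are illustrative, and MSS is converted to bits so that the units match a rate in bits per second.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(int(w))          # 83333
print("%.1e" % loss)   # 2.1e-10
```

The computed value of about 2.1e-10 matches the slide's rounded figure of 2 * 10^-10.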
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Plot: Connection 1 throughput on the x-axis, up to R; line of equal bandwidth share; trajectory alternating between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
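A toy model of two AIMD flows illustrates the convergence argument. It assumes both flows see loss at the same moment whenever their combined rate exceeds the link capacity, which is a simplification made here for illustration.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Evolve two flow rates under synchronised AIMD and return the final pair."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap unchanged
    return x1, x2


a, b = two_flow_aimd(1, 61, capacity=100, steps=200)
print(abs(a - b) < 1)  # True
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the rates drift toward the equal-share line.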
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
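The parallel-connection example is simple arithmetic, assuming every TCP connection on the link gets an equal share. The function name is illustrative.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))   # 1/10
print(app_share(9, 11))  # 11/20
```

With 11 connections the app holds 11 of 20 equal shares, slightly more than the R/2 quoted as a round figure.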
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Q h i h dl sent sent not- usable not
sender sequence number space
Q how receiver handles out-of-order segmentsA TCP spec doesnrsquot say incoming segment to sender
sent ACKed
sent not-yet ACKed(ldquoin-flightrdquo)
usablebut not yet sent
not usable
A TCP spec doesn t say - up to implementor source port dest port
sequence numberacknowledgement number
rwnd
incoming segment to sender
A
Transport Layer 3-58
checksum
rwndurg pointer
TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[Figure: plot of λout versus λin, both axes up to R/2. Diagram: Host A holds a copy and a timeout; the router has free buffer space; Host B receives λout]

Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C and D; λin: original data; λ'in: original data plus retransmitted data; λout; finite shared output link buffers]

Causes/costs of congestion: scenario 3
[Figure: plot of λout versus λ'in, both axes up to C/2]
another "cost" of congestion:
  when packet dropped, any upstream transmission capacity used for that packet was wasted

Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked ≤ cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"), spanning cwnd; last byte sent]

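The two rules on this slide can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the counters are plain byte offsets named after the slide's variables.

```python
def can_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps the in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd: int, rtt_seconds: float) -> float:
    """Rough sending rate in bytes/sec: one cwnd of data per round trip."""
    return cwnd / rtt_seconds
```

For example, with 2000 bytes in flight and cwnd = 4000, a further 1000 bytes may be sent but a further 2500 may not.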
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B exchanging segments, one RTT marked, time running downward]

TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

Summary: TCP Congestion Control
[Figure: FSM of TCP congestion control; only the transitions below survive in this transcript]
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed

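The window rules from the last few slides can be put together in a small sketch. The class name, the MSS value and the default ssthresh are assumptions for illustration. The sketch follows the slides' simplified description of Reno (cut in half on 3 duplicate ACKs) and leaves out the window inflation that real implementations apply during fast recovery.

```python
MSS = 1460  # bytes; an assumed segment size, not taken from the slides


class RenoWindow:
    """Tracks cwnd and ssthresh (in bytes) for a simplified TCP Reno sender."""

    def __init__(self, ssthresh: int = 64 * 1024):
        self.cwnd = MSS          # initially cwnd = 1 MSS
        self.ssthresh = ssthresh

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            # slow start: +1 MSS per ACK, so cwnd doubles every RTT
            self.cwnd += MSS
        else:
            # congestion avoidance: +MSS*(MSS/cwnd) per ACK, about 1 MSS per RTT
            self.cwnd += MSS * MSS // self.cwnd

    def on_timeout(self) -> None:
        # loss indicated by timeout: remember half the window, restart at 1 MSS
        self.ssthresh = max(self.cwnd // 2, MSS)
        self.cwnd = MSS

    def on_triple_dup_ack(self) -> None:
        # loss indicated by 3 duplicate ACKs: cut cwnd in half, then grow linearly
        self.ssthresh = max(self.cwnd // 2, MSS)
        self.cwnd = self.ssthresh
```

A TCP Tahoe variant would call on_timeout() for both kinds of loss event.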
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs (then cut window in half)]

TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: window size sawtooth oscillating between W/2 and W]

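The 3/4 W figure is just the mean of a window that climbs linearly from W/2 to W. A small sketch, assuming one sample per RTT and an even W (the function names are hypothetical):

```python
def average_window(w: int) -> float:
    """Mean of a window that ramps linearly from w/2 up to w, one step per RTT."""
    samples = range(w // 2, w + 1)
    return sum(samples) / len(samples)


def average_throughput(w_bytes: int, rtt_seconds: float) -> float:
    """Average bytes/sec over the sawtooth: (3/4) * W / RTT."""
    return average_window(w_bytes) / rtt_seconds
```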
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed

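The two numbers on this slide can be reproduced directly. The sketch below only rearranges the slide's formula; the function and constant names are illustrative.

```python
MSS_BITS = 1500 * 8   # 1500-byte segments
RTT = 0.100           # seconds
TARGET_BPS = 10e9     # 10 Gbps


def window_in_segments(target_bps: float, rtt: float, mss_bits: int) -> float:
    """Segments that must be in flight to fill the pipe: rate * RTT / MSS."""
    return target_bps * rtt / mss_bits


def required_loss_rate(target_bps: float, rtt: float, mss_bits: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bits / (rtt * target_bps)) ** 2
```

With the slide's values this gives about 83,333 segments and a loss rate near 2.1e-10, which the slide rounds to 2e-10.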
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with Connection 1 throughput on one axis, both axes up to R, and an "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal-share line]

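The convergence argument can be checked with a toy simulation. It assumes both sessions see every loss at the same moment and add the same amount per round, which is a simplification of real TCP; the function name is hypothetical.

```python
def simulate_aimd(x1: float, x2: float, capacity: float, rounds: int):
    """Two flows share a link: both add 1 per round, both halve when the link overflows."""
    for _ in range(rounds):
        x1 += 1.0  # additive increase
        x2 += 1.0
        if x1 + x2 > capacity:  # loss seen by both flows (simplifying assumption)
            x1 /= 2.0           # multiplicative decrease
            x2 /= 2.0
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged, and each loss halves it, so the rates drift toward an equal share regardless of where they start.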
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

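The parallel-connection example is simple arithmetic if every connection on the link is assumed to get an equal share. A sketch with a hypothetical helper name:

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R won by an app opening new_connections,
    assuming every connection on the link gets an equal share."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

Under that assumption, 1 connection against 9 existing ones gets R/10, and 11 connections get 11R/20, which the slide rounds to R/2.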
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

TCP seq. numbers, ACKs
sequence numbers:
  byte stream "number" of first byte in segment's data
acknowledgements:
  sequence number of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing the header fields source port, dest port, sequence number, acknowledgement number, checksum, rwnd, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed ("in-flight"); usable but not yet sent; not usable]

TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
  User types 'C': Host A sends Seq=42, ACK=79, data = 'C'
  Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data = 'C'
  Host A ACKs receipt of echoed 'C': Seq=43, ACK=80

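The numbers in the telnet scenario follow from one rule: the ACK number is the sequence number of the next byte expected from the peer. A minimal sketch, where Segment and reply are names made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    ack: int
    data: bytes = b""


def reply(received: Segment, my_next_seq: int, data: bytes = b"") -> Segment:
    """Build a reply whose ACK number is the next byte expected from the peer."""
    return Segment(seq=my_next_seq, ack=received.seq + len(received.data), data=data)


first = Segment(seq=42, ack=79, data=b"C")              # Host A: user types 'C'
echo = reply(first, my_next_seq=first.ack, data=b"C")   # Host B: ACKs 'C', echoes it back
final = reply(echo, my_next_seq=echo.ack)               # Host A: ACKs the echoed 'C'
```

Each side's next sequence number equals the ACK number it last received, which is why Host B starts at 79 and Host A continues at 43.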
TCP round trip time, timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
$\text{EstimatedRTT} = (1-\alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
  exponential weighted moving average
  influence of past sample decreases exponentially fast
  typical value: $\alpha = 0.125$
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds); series: SampleRTT and EstimatedRTT]

TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  $\text{DevRTT} = (1-\beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\beta = 0.25$)
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
  (estimated RTT plus "safety margin")

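The three formulas above fit in a small estimator. The class name and the caller-supplied starting values are assumptions. The slides do not say which update runs first, so this sketch measures the deviation against the estimate held before the new sample, which is an ordering choice rather than something the slides specify.

```python
class RttEstimator:
    ALPHA = 0.125  # weight of the newest sample in EstimatedRTT
    BETA = 0.25    # weight of the newest deviation in DevRTT

    def __init__(self, estimated_rtt: float, dev_rtt: float = 0.0):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt

    def update(self, sample_rtt: float) -> None:
        # Assumption: deviation uses the estimate from before this sample.
        self.dev_rtt = ((1 - self.BETA) * self.dev_rtt
                        + self.BETA * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.ALPHA) * self.estimated_rtt
                              + self.ALPHA * sample_rtt)

    @property
    def timeout_interval(self) -> float:
        # EstimatedRTT plus a safety margin of four deviations
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from EstimatedRTT = 100 ms and DevRTT = 10 ms, one sample of 120 ms moves the estimate to 102.5 ms, the deviation to 12.5 ms and the timeout to 152.5 ms.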
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control

TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

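The three sender events can be sketched as methods on one object. Everything here is illustrative: the class and attribute names are hypothetical, the "timer" is only a flag, and segments handed to IP are appended to a list instead of being transmitted.

```python
class SimplifiedSender:
    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq
        self.send_base = initial_seq  # oldest unacked byte
        self.unacked = {}             # seq -> data still awaiting an ACK
        self.timer_running = False
        self.wire = []                # (seq, data) pairs passed down to IP

    def on_app_data(self, data: bytes) -> None:
        # create segment with seq # of its first data byte and send it
        self.wire.append((self.next_seq, data))
        self.unacked[self.next_seq] = data
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True  # timer covers the oldest unacked segment

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        oldest = min(self.unacked)
        self.wire.append((oldest, self.unacked[oldest]))  # retransmit
        self.timer_running = True                         # restart timer

    def on_ack(self, ack_num: int) -> None:
        if ack_num > self.send_base:  # acknowledges previously unacked data
            self.send_base = ack_num
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > ack_num}
            self.timer_running = bool(self.unacked)
```

Duplicate ACKs, flow control and congestion control are ignored here, matching the simplification stated on the preceding slide.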
TCP sender (simplified)
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer

event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment, higher-than-expect seq #; gap detected
  receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  receiver action: immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout

TCP fast retransmit
fast retransmit after sender receipt of triple duplicate ACK (Host A, Host B):
  Host A sends Seq=92, 8 bytes of data
  Host A sends Seq=100, 20 bytes of data (lost, marked X)
  Host B repeatedly replies ACK=100
  Host A resends Seq=100, 20 bytes of data, before the timeout expires

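The trigger for fast retransmit is only a counter on repeated ACK numbers. A minimal sketch with a hypothetical class name:

```python
class DupAckDetector:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num: int) -> bool:
        """Return True when this ACK is the third duplicate,
        meaning the segment starting at ack_num should be resent now."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            return self.dup_count == 3
        # a new ACK number resets the duplicate count
        self.last_ack = ack_num
        self.dup_count = 0
        return False
```

The first ACK=100 is an ordinary ACK; the retransmission fires on the fourth ACK=100 overall, that is, the third duplicate.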
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process above the application/OS boundary; TCP socket receiver buffers; TCP code; IP code; segments arriving from sender]

TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data passes on to application process]

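A sketch of both sides of flow control. The slide only says rwnd is the free buffer space; expressing the buffered amount as LastByteRcvd minus LastByteRead is an assumption borrowed from the usual textbook treatment, and the function names are hypothetical.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: rwnd is RcvBuffer minus data buffered but not yet read."""
    buffered = last_byte_rcvd - last_byte_read
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rwnd: int, nbytes: int) -> bool:
    """Sender side: keep unacked ("in-flight") data within the receiver's rwnd."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd
```

With a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises rwnd = 2096.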
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: two hosts, each with an application above the network and holding connection state: ESTAB, connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
client: Socket clientSocket = newSocket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
client state: LISTEN; server state: LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state: ESTAB
server: received ACK(y) indicates client is live; server state: ESTAB

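The header fields of the three segments can be written out as plain data. This is a sketch only: the dictionary keys mirror the slide's labels and the function name is hypothetical.

```python
def three_way_handshake(x: int, y: int):
    """Return the three handshake segments as dicts of header fields.

    x is the client's initial sequence number, y is the server's.
    """
    syn = {"SYNbit": 1, "Seq": x}                                  # client -> server
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}  # server -> client
    ack = {"ACKbit": 1, "ACKnum": synack["Seq"] + 1}               # client -> server
    return syn, synack, ack
```

Each ACKnum is the peer's initial sequence number plus one, because the SYN itself consumes one sequence number.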
TCP 3-way handshake: FSM
[Figure: FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB]
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
  closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)

TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection
client state: ESTAB; server state: ESTAB
client: clientSocket.close(); send FINbit=1, seq=x; client state: FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1; server state: CLOSE_WAIT (can still send data)
client: state FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; server state: LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1; client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: state CLOSED

Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
maximum per-connection throughput: R/2
large delays as arrival rate, λin, approaches capacity
[Figure: Host A sends original data λin; throughput λout; Host B; unlimited shared output link buffers. Plots: λout versus λin and delay versus λin, each with λin up to R/2]

TCP seq numbers ACKsq
Host BHost A
UsertypeslsquoCrsquo
host ACKsi t f
Seq=42 ACK=79 data = lsquoCrsquo
host ACKsreceipt
receipt oflsquoCrsquo echoesback lsquoCrsquoSeq=79 ACK=43 data = lsquoCrsquo
receipt of echoed
lsquoCrsquo Seq=43 ACK=80
simple telnet scenario
Transport Layer 3-59
TCP round trip time timeoutTCP round trip time timeout
Q how to set TCP Q how to estimate RTTQ how to set TCP timeout value
longer than RTT
Q how to estimate RTT SampleRTT measured
time from segment longer than RTT but RTT varies
too short premature
gtransmission until ACK receipt ignore retransmissions too short premature
timeout unnecessary retransmissions
ignore retransmissions SampleRTT will vary want
estimated RTT ldquosmootherrdquoretransmissions too long slow reaction
to segment loss
average several recentmeasurements not just current SampleRTTto segment loss current SampleRTT
Transport Layer 3-60
TCP round trip time timeoutEstimatedRTT = (1- )EstimatedRTT + SampleRTT
TCP round trip time timeout
exponential weighted moving average influence of past sample decreases exponentially fast
RTT gaiacsumassedu to fantasiaeurecomfr
350
typical value = 0125
RTT gaiacsumassedu to fantasiaeurecomfr
300
econ
ds)
200
250
RTT
(mill
iseco
nds)
RTT
(mill
ise
150
R
sampleRTTEstimatedRTT
Transport Layer 3-61
1001 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
SampleRTT Estimated RTTtime (seconds)
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmit

time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout

[Figure: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92 (8 bytes of data) and Seq=100 (20 bytes of data); the Seq=100 segment is lost (X); Host B replies with repeated ACK=100; Host A resends Seq=100 (20 bytes of data) without waiting for the timeout.]
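The duplicate-ACK rule above can be sketched as a small counter. This is a minimal illustration only: the function name and the list-of-ACK-numbers input are assumptions made for the example, not part of the slides.

```python
DUP_ACK_THRESHOLD = 3  # "triple duplicate ACKs" from the slide


def fast_retransmit_events(acks):
    """Return the ACK numbers at which a fast retransmit would fire.

    `acks` is the sequence of ACK numbers seen by the sender, in arrival
    order (a hypothetical input format chosen for this sketch).
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                # Third duplicate: resend the segment starting at byte `ack`.
                retransmits.append(ack)
        else:
            # A new ACK number resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return retransmits
```

With the figure's scenario, one original ACK=100 followed by three duplicates triggers a single resend of the segment starting at byte 100.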
TCP flow control

application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

[Figure: receiver protocol stack. Segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process removes data.]

TCP flow control

receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow

[Figure: receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from the network and data is passed to the application process.]
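A rough sketch of the bookkeeping behind rwnd follows. The slides only say that rwnd is the free buffer space; the byte counters `last_byte_rcvd` and `last_byte_read`, and both function names, are assumptions introduced for this illustration.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver would advertise as rwnd.

    Buffered data is what has arrived but has not yet been read by the
    application (assumed counters, not named on the slides).
    """
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffered data must fit inside RcvBuffer")
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

For example, with the 4096-byte default and 2000 bytes still waiting to be read, the receiver would advertise 2096 bytes, and a sender with 1000 bytes in flight could add at most 1096 more.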
Connection Management

before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters

[Figure: client and server each keep, between application and network: connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client.]

Socket clientSocket = newSocket("hostname", "port number");
Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake

client state: LISTEN; server state: LISTEN

client: choose init seq num, x; send TCP SYN msg
  segment: SYNbit=1, Seq=x
  client state: SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  server state: SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  segment: ACKbit=1, ACKnum=y+1
  client state: ESTAB
server: received ACK(y) indicates client is live
  server state: ESTAB

TCP 3-way handshake: FSM

server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
client side:
  closed -> SYN sent: Socket clientSocket = newSocket("hostname", "port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
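The sequence and acknowledgment numbers exchanged in the handshake can be traced with a short sketch. The `Segment` class and `three_way_handshake` function are hypothetical names for this illustration; no real sockets are involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Only the header fields that matter for the handshake."""
    syn: bool = False
    ack: bool = False
    seq: Optional[int] = None
    acknum: Optional[int] = None


def three_way_handshake(x, y):
    """Build the three handshake segments for initial seq nums x and y.

    Returns (segments, client_state, server_state).
    """
    syn = Segment(syn=True, seq=x)                                   # client -> server
    synack = Segment(syn=True, ack=True, seq=y, acknum=syn.seq + 1)  # server -> client
    ack = Segment(ack=True, acknum=synack.seq + 1)                   # client -> server

    # Each side reaches ESTAB only if its own initial seq num was acked.
    client_state = "ESTAB" if synack.acknum == x + 1 else "SYNSENT"
    server_state = "ESTAB" if ack.acknum == y + 1 else "SYN RCVD"
    return [syn, synack, ack], client_state, server_state
```

Calling `three_way_handshake(1000, 5000)` yields a SYNACK with ACKnum=1001 and a final ACK with ACKnum=5001, matching the x+1 and y+1 rules on the slide.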
TCP: closing a connection

client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection

client state: ESTAB; server state: ESTAB

client: clientSocket.close(); send FINbit=1, seq=x
  client state: FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1
  server state: CLOSE_WAIT (can still send data)
  client state: FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y
  server state: LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1
  client state: TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
  server state: CLOSED
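The state sequence on the closing slide can be written down as two small transition tables. The state names come from the slide; the event names (`close`, `recv_fin`, `recv_ack`, `timer_expired`) and the `run` helper are assumptions made for this sketch.

```python
# (current state, event) -> next state, for the side that closes first.
CLIENT_TRANSITIONS = {
    ("ESTAB", "close"): "FIN_WAIT_1",          # sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",  # waits for server close
    ("FIN_WAIT_2", "recv_fin"): "TIMED_WAIT",  # sends final ACK
    ("TIMED_WAIT", "timer_expired"): "CLOSED", # after 2 * max segment lifetime
}

# Same idea for the side that receives the first FIN.
SERVER_TRANSITIONS = {
    ("ESTAB", "recv_fin"): "CLOSE_WAIT",       # sends ACK, can still send data
    ("CLOSE_WAIT", "close"): "LAST_ACK",       # sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(transitions, events, state="ESTAB"):
    """Apply events in order; raises KeyError on an event the table lacks."""
    for event in events:
        state = transitions[(state, event)]
    return state
```

A table-driven form keeps the sketch close to the slide's diagram; it deliberately leaves out simultaneous FIN exchanges, which the slides mention but do not detail.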
Principles of congestion control

congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$. Two plots against $\lambda_{in}$ (up to R/2): $\lambda_{out}$ and delay.]

maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity
Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data, plus retransmitted data) into finite shared output link buffers; Host B receives $\lambda_{out}$.]
Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  sender sends only when router buffers available

[Figure: plot of $\lambda_{out}$ against $\lambda_{in}$, both up to R/2. Host A keeps a copy and sends into free buffer space in the finite shared output link buffers; Host B receives.]
Causes/costs of congestion: scenario 2

idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)

[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both up to R/2. Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once buffer space is free.]
Causes/costs of congestion: scenario 2

realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered

[Figure: plot of $\lambda_{out}$ against $\lambda'_{in}$, both up to R/2. Host A times out and sends a copy while the original is still queued; both reach Host B.]

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0

[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$. Plot of $\lambda_{out}$ against $\lambda'_{in}$, with C/2 marked on both axes.]

another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details

sender limits transmission:
  $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \mathrm{cwnd}$
cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\mathrm{rate} \approx \mathrm{cwnd} / \mathrm{RTT}$ bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight", spanning cwnd), and last byte sent.]
TCP Slow Start

when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B, with one RTT marked on the time axis.]
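A minimal sketch of why "increment cwnd for every ACK" doubles the window each RTT. It assumes every segment in a round is ACKed and that there is no loss; the function name and the per-round model are illustrative choices, not from the slides.

```python
def slow_start(rtts, mss=1):
    """cwnd at the start of each of `rtts` round trips, with no loss.

    cwnd is in the same unit as mss (use mss=1 to count in segments).
    """
    cwnd = mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss   # one ACK per MSS-sized segment sent this round
        cwnd += acks * mss   # cwnd grows by 1 MSS for every ACK received
    return history
```

Counting in segments, the window goes 1, 2, 4, 8, 16 over the first five round trips.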
TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP: detecting, reacting to loss

loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
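The loss reactions above can be sketched as two small functions. This follows the simplified wording of the slide (Reno halves cwnd on three duplicate ACKs) and does not model fast recovery; the function names, the event strings, and the choice to cap slow-start growth at ssthresh are assumptions of the sketch.

```python
def on_loss(cwnd, event, variant="reno", mss=1):
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event is "timeout" or "3dupack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError("unknown loss event: %r" % (event,))
    ssthresh = cwnd / 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh            # back to 1 MSS, then slow start
    return cwnd / 2, ssthresh           # Reno on 3 dup ACKs: cut in half


def on_rtt(cwnd, ssthresh, mss=1):
    """Window growth over one loss-free RTT."""
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh)  # exponential growth up to threshold
    return cwnd + mss                   # linear growth afterwards
```

Starting from cwnd = 16 segments, a timeout gives cwnd = 1 and ssthresh = 8, while three duplicate ACKs under Reno give cwnd = 8 and ssthresh = 8.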
Summary: TCP Congestion Control

[Figure: FSM of TCP congestion control. Recoverable labels: new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed; cwnd > ssthresh; new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed; duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed.]
TCP congestion control: additive increase, multiplicative decrease

approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) against time: additively increase window size ... until loss occurs (then cut window in half).]
TCP throughput

avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec

[Figure: saw tooth of window size oscillating between W/2 and W.]
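The 3/4 factor follows from the saw tooth: if the window climbs linearly from W/2 back up to W between losses, its average is halfway between the two. The sketch below checks this numerically; the function names and the sampling approach are assumptions made for illustration.

```python
def average_sawtooth_window(w, steps=1000):
    """Average of a window that ramps linearly from w/2 up to w (one cycle)."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_s):
    """Average throughput in bytes/sec per the slide: 3/4 * W / RTT."""
    return 0.75 * w_bytes / rtt_s
```

For instance, with W = 100,000 bytes and RTT = 0.5 s, the average throughput under this model is 150,000 bytes/sec.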
TCP Futures: TCP over "long, fat pipes"

example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
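The two numbers in the example can be reproduced with a few lines of arithmetic. The function names are hypothetical; the second function simply inverts the [Mathis 1997] expression quoted on the slide, with throughput converted to bytes/sec.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Loss probability L solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    rate_bytes = rate_bps / 8
    return (1.22 * mss_bytes / (rtt_s * rate_bytes)) ** 2
```

With 10 Gbps, 100 ms and 1500-byte segments, `window_segments` gives about 83,333 and `required_loss` gives roughly 2.1e-10, in line with the slide's rounded figures.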
TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: phase plot with Connection 1 throughput on one axis, R marked on the axes, and the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
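The convergence argument can be tried out with a toy simulation. It assumes both sessions see loss in the same round whenever their combined rate exceeds the capacity, and that both have the same RTT; these synchronisation assumptions and the function name are simplifications made for the sketch.

```python
def aimd_two_flows(x1, x2, capacity, rounds=200):
    """Evolve two AIMD flows sharing one bottleneck; returns final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate in half (multiplicative decrease),
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (slope of 1), gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Because additive steps preserve the gap and each loss halves it, two flows that start far apart (say 1 and 80 units on a link of capacity 100) end up with nearly equal rates.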
Fairness (more)

Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
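The parallel-connection example is simple arithmetic if every connection gets an equal share of the link. A small sketch, with a hypothetical function name and exact fractions to avoid rounding:

```python
from fractions import Fraction


def app_share(existing, new):
    """Fraction of link rate R obtained by an app opening `new` connections
    next to `existing` ones, assuming an equal per-connection share."""
    return Fraction(new, existing + new)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which the slide rounds to R/2.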
Chapter 3: summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP

next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
TCP round trip time, timeout

Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout, unnecessary retransmissions
  too long: slow reaction to segment loss

Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT "smoother"
    average several recent measurements, not just current SampleRTT
TCP round trip time, timeout

$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$

exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$

[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds) against time (seconds), showing SampleRTT and EstimatedRTT.]
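The moving average can be sketched in a few lines. The function names are illustrative; `sample_weight` is an added helper showing how quickly an old sample's influence shrinks.

```python
ALPHA = 0.125  # typical value from the slide


def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    """One EWMA step: blend the previous estimate with the new sample."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def sample_weight(age, alpha=ALPHA):
    """Weight of a sample that is `age` updates old in the current estimate."""
    return alpha * (1 - alpha) ** age
```

For example, an estimate of 100 ms followed by a 200 ms sample moves to 112.5 ms, and each further update multiplies that sample's weight by 0.875.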
TCP round trip time, timeout

timeout interval: EstimatedRTT plus "safety margin"
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:

$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
(typically, $\beta = 0.25$)

$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
(EstimatedRTT is the estimated RTT; $4\cdot\mathrm{DevRTT}$ is the "safety margin")
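A short sketch of the two formulas above. The function names are hypothetical, and the caller decides which EstimatedRTT value to pass in (before or after its own update), since the slide does not fix that order.

```python
BETA = 0.25  # typical value from the slide


def update_dev_rtt(dev_rtt, sample_rtt, estimated_rtt, beta=BETA):
    """One EWMA step for the deviation of SampleRTT from EstimatedRTT."""
    return (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)


def timeout_interval(estimated_rtt, dev_rtt):
    """EstimatedRTT plus a safety margin of four deviations."""
    return estimated_rtt + 4 * dev_rtt
```

With DevRTT = 10 ms, EstimatedRTT = 100 ms and a 140 ms sample, the deviation becomes 17.5 ms and the timeout interval 170 ms.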
TCP reliable data transfer

TCP creates rdt service on top of IP's unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks

let's initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:

data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
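The three sender events can be put together in a toy model. This is a sketch under simplifying assumptions: there is no real network or clock, the timer is just a boolean, and names such as `SimplifiedTcpSender`, `send_base` and `sent` are invented for the example.

```python
class SimplifiedTcpSender:
    """Toy model of the sender events: data from app, timeout, ACK received."""

    def __init__(self, initial_seq=0):
        self.next_seq_num = initial_seq  # NextSeqNum on the slides
        self.send_base = initial_seq     # oldest unacked byte (assumed name)
        self.unacked = {}                # seq # -> data not yet ACKed
        self.timer_running = False
        self.sent = []                   # log of (seq #, data) passed to "IP"

    def on_data_from_app(self, data):
        seq = self.next_seq_num          # seq # of first data byte in segment
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True    # start timer if not already running

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)          # timer is for the oldest unacked segment
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True        # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:     # acknowledges previously unacked data
            self.send_base = ack_num
            done = [s for s, d in self.unacked.items() if s + len(d) <= ack_num]
            for seq in done:
                del self.unacked[seq]
            # keep the timer only if there are still unacked segments
            self.timer_running = bool(self.unacked)
```

Using the byte numbers from the fast-retransmit figure, sending 8 bytes at seq 92 and 20 bytes at seq 100 leaves NextSeqNum at 120; a cumulative ACK of 100 clears only the first segment and keeps the timer running.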
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion controlApproaches towards congestion control
two broad approaches towards congestion controltwo broad approaches towards congestion control
end end congestion network assisted end-end congestion control
no explicit feedback
network-assisted congestion control
routers provide no explicit feedback from network
congestion inferred f d
routers provide feedback to end systems single bit indicating
(SNA from end-system observed loss delay
approach taken by
congestion (SNA DECbit TCPIP ECN ATM) approach taken by
TCP)
explicit rate for sender to send at
Transport Layer 3-92
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-93
TCP Congestion Control detailsg
TCP sending ratedsender sequence number space TCP sending rate
roughly send cwnd bytes wait RTT for
cwnd
bytes wait RTT for ACKS then send more bytes
last byteACKed sent not-
yet ACKed
last byte sent
sender limits transmission
y(ldquoin-flightrdquo)
rate ~~cwndRTT
bytessec
LastByteSent-LastByteAcked
lt cwnd
cwnd is dynamic function of perceived network
tiTransport Layer 3-94
congestion
TCP Slow Start TCP Slow Start
when connection begins Host A Host B
when connection begins increase rate exponentially until first l Tloss event initially cwnd = 1 MSS
d bl d RTT
RTT
double cwnd every RTT done by incrementing cwnd for every ACK yreceived
summary initial rate is slow but ramps up exponentially fast time
Transport Layer 3-95
TCP switching from slow start to CAQ when should the
exponential
TCP switching from slow start to CA
exponential increase switch to linear
A when cwnd gets to 12 of its value before timeoutbefore timeout
Implementationp e e tat o variable ssthresh on loss event ssthresh
is set to 12 of cwnd just before loss event
Transport Layer 3-96
TCP detecting reacting to lossTCP detecting reacting to loss
loss indicated by timeout loss indicated by timeout cwnd set to 1 MSS
i d h i ll ( i l ) window then grows exponentially (as in slow start) to threshold then grows linearly
l i di d b 3 d li ACK TCP RENO loss indicated by 3 duplicate ACKs TCP RENO dup ACKs indicate network capable of delivering
some segments cwnd is cut in half window then grows linearlyC ( 3 TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)
Transport Layer 3-97
Summary TCP Congestion Controly g
cwnd = cwnd + MSS (MSScwnd)new ACKduplicate ACK
NewACK
NewACK
cwnd gt ssthresh
cwnd cwnd + MSS (MSScwnd)dupACKcount = 0
transmit new segment(s) as allowedcwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
cwnd = cwnd + MSStransmit new segment(s) as allowed
p
TCP congestion control additive increase multiplicative decrease
approach sender increases transmission rate (window pp (size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every y y
RTT until loss detectedmultiplicative decrease cut cwnd in half after loss
zeadditively increase window size helliphellip until loss occurs (then cut window in half)
CP
send
er
n w
indo
w s
iz
AIMD saw toothbehavior probing
cwnd
Tco
nges
tionfor bandwidth
Transport Layer 3-99
time
TCP throughputTCP throughput avg TCP thruput as function of window size RTTg p ignore slow start assume always data to send
W window size (measured in bytes) where loss occurs avg window size ( in-flight bytes) is frac34 W avg thruput is 34W per RTT
avg TCP thruput = 34
WRTT bytessec
W
W2
Transport Layer 3-100

TCP Futures: TCP over "long, fat pipes"
  example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
  to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
  new versions of TCP for high-speed
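
The numbers in this example can be reproduced directly from the formula. A minimal sketch, with hypothetical function names; MSS is converted from bytes to bits so the result is in bits per second:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """Throughput in bits/sec from 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * sqrt(loss))

def required_loss(mss_bytes, rtt_s, target_bps):
    """Loss probability L that yields target_bps under the same formula."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

def segments_in_flight(mss_bytes, rtt_s, target_bps):
    """Window, in segments, needed to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)
```

With 1500-byte segments, a 100 ms RTT and a 10 Gbps target, `segments_in_flight` gives about 83,333 and `required_loss` gives roughly 2.1e-10, in line with the figures on the slide.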

TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
  [Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
  two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
  [Figure: throughput of one connection plotted against Connection 1 throughput, both axes from 0 to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
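
The convergence argument can be checked with a toy simulation. This sketch assumes both flows see every loss at the same time and have equal RTTs, which is a simplification; the function name and the rate units are illustrative.

```python
def aimd_two_flows(x1, x2, capacity, rounds=200):
    """Simulate two AIMD flows sharing one link; return the final (x1, x2)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate in half, which also halves
            # the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both add one unit, so the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from a very unequal split such as `aimd_two_flows(10, 80, 100)`, the gap between the two rates is halved at every loss event, so the rates drift toward the equal-share line.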

Fairness (more)
  Fairness and UDP
    multimedia apps often do not use TCP
      do not want rate throttled by congestion control
    instead use UDP:
      send audio/video at constant rate, tolerate packet loss
  Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
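
The arithmetic behind the parallel-connection example, assuming the link is split equally per connection (the function name is hypothetical):

```python
def app_share(existing, new):
    """Fraction of link rate R obtained by an app that opens `new`
    connections when `existing` connections already share the link."""
    return new / (existing + new)
```

Here `app_share(9, 1)` is 0.1 (R/10), and `app_share(9, 11)` is 0.55, slightly more than R/2.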

Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet:
    UDP
    TCP
  next:
    leaving the network "edge" (application, transport layers)
    into the network "core"

TCP round trip time, timeout
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
    exponential weighted moving average
    influence of past sample decreases exponentially fast
    typical value: $\alpha = 0.125$
  [Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT, EstimatedRTT]

TCP round trip time, timeout
  timeout interval: EstimatedRTT plus "safety margin"
    large variation in EstimatedRTT -> larger safety margin
  estimate SampleRTT deviation from EstimatedRTT:
    $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
    (typically, $\beta = 0.25$)
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
    (first term: estimated RTT; second term: "safety margin")
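
The two moving averages and the timeout formula can be combined into a small estimator. This is a sketch, not a definitive implementation: the class name is hypothetical, and two choices are assumptions not stated on the slides, namely the initial values (first sample, and half of it for the deviation) and updating DevRTT before EstimatedRTT. Both follow the convention used in RFC 6298.

```python
ALPHA = 0.125  # typical value given on the slides
BETA = 0.25    # typical value given on the slides

class RttEstimator:
    def __init__(self, first_sample):
        # Initialization is an assumption (borrowed from RFC 6298).
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimates; return TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample (assumption).
        self.dev_rtt = ((1 - BETA) * self.dev_rtt
                        + BETA * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - ALPHA) * self.estimated_rtt
                              + ALPHA * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus the "safety margin" of 4 * DevRTT.
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a first sample of 100 ms, a second sample of 120 ms moves EstimatedRTT to 102.5 ms and DevRTT to 42.5 ms, for a timeout of 272.5 ms.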

TCP reliable data transfer
  TCP creates rdt service on top of IP's unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
  retransmissions triggered by:
    timeout events
    duplicate acks
  let's initially consider simplified TCP sender:
    ignore duplicate acks
    ignore flow control, congestion control

TCP sender events
  data rcvd from app:
    create segment with seq #
    seq # is byte-stream number of first data byte in segment
    start timer if not already running
      think of timer as for oldest unacked segment
      expiration interval: TimeOutInterval
  timeout:
    retransmit segment that caused timeout
    restart timer
  ack rcvd:
    if ack acknowledges previously unacked segments
      update what is known to be ACKed
      start timer if there are still unacked segments
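
The three sender events map naturally onto three handlers. The sketch below models only the bookkeeping: no real timer or network is involved, and segments "passed to IP" are appended to a list. Apart from `NextSeqNum`, which the slides use, the class, attribute and method names are hypothetical.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq=0):
        self.send_base = initial_seq      # seq # of oldest unacked byte
        self.next_seq_num = initial_seq   # NextSeqNum on the slides
        self.unacked = {}                 # seq # -> payload, awaiting ACK
        self.timer_running = False
        self.sent = []                    # log of (seq #, payload) passed to IP

    def on_app_data(self, data):
        """Data received from the application: create and send a segment."""
        seq = self.next_seq_num           # seq # of first data byte in segment
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:        # timer tracks oldest unacked segment
            self.timer_running = True

    def on_timeout(self):
        """Timeout: retransmit the oldest unacked segment, restart the timer."""
        if self.unacked:
            seq = min(self.unacked)
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, ack_num):
        """Cumulative ACK: everything before ack_num is now acknowledged."""
        if ack_num > self.send_base:
            self.send_base = ack_num
            for seq in [s for s, d in self.unacked.items()
                        if s + len(d) <= ack_num]:
                del self.unacked[seq]
            # Keep the timer only if segments are still outstanding.
            self.timer_running = bool(self.unacked)
```

Duplicate ACKs, flow control and congestion control are ignored, matching the simplified sender described above.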

TCP sender (simplified)
  data received from application above:
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
      start timer

TCP ACK generation
  event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    TCP receiver action: delayed ACK; wait up to 500 ms for next segment; if no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
    TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
    TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap

TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don't wait for timeout
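
Duplicate-ACK counting can be sketched in a few lines. The class name is hypothetical, and the sketch assumes that "triple duplicate" means three ACKs repeating an ACK number already seen once (four identical ACKs in total).

```python
class DupAckDetector:
    def __init__(self):
        self.last_ack = None   # most recent ACK number seen
        self.dup_count = 0     # how many times it has been repeated

    def on_ack(self, ack_num):
        """Return the seq # to fast-retransmit, or None if no action is due."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                # The ACK number names the next expected byte, which is the
                # start of the unacked segment with the smallest seq #.
                return ack_num
        else:
            self.last_ack = ack_num
            self.dup_count = 0
        return None
```

Feeding it four ACKs with ACK number 100 returns `None` three times and then `100`, the segment to resend without waiting for the timeout.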

TCP fast retransmit
  [Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B keeps returning ACK=100. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
  fast retransmit after sender receipt of triple duplicate ACK

TCP flow control
  application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
  [Figure: receiver protocol stack. Segments arrive from sender and pass through IP code and TCP code in the OS into the TCP socket receiver buffers, from which the application process reads.]

TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unacked ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
  [Figure: receiver-side buffering. TCP segment payloads arrive into RcvBuffer, which holds buffered data plus free buffer space (rwnd); data is passed up to the application process.]
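
Setting the receive buffer through socket options can be illustrated with Python's standard `socket` module. This is a sketch: the helper name is hypothetical, and the value reported back is up to the operating system, which may round, double or cap the requested size.

```python
import socket

def receive_buffer_size(requested=None):
    """Optionally request a receive buffer size, then report what the OS uses."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if requested is not None:
            # SO_RCVBUF is the standard socket option for the receive buffer.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Calling `receive_buffer_size()` shows the system default, and `receive_buffer_size(65536)` shows the size actually granted for a 64 KB request.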

Connection Management
  before exchanging data, sender/receiver "handshake":
    agree to establish connection (each knowing the other willing to establish connection)
    agree on connection parameters
  [Figure: two hosts, each with an application above the network. Each side holds connection state: ESTAB, and connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client.]
  client side: Socket clientSocket = new Socket("hostname", portNumber);
  server side: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
  client state: LISTEN -> SYNSENT -> ESTAB
  server state: LISTEN -> SYN RCVD -> ESTAB
  client: choose init seq num, x; send TCP SYN msg
    segment: SYNbit=1, Seq=x
  server: choose init seq num, y; send TCP SYNACK msg, acking SYN
    segment: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
  client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
    segment: ACKbit=1, ACKnum=y+1
  server: received ACK(y) indicates client is live
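
The sequence and acknowledgment numbers in the three segments can be traced with a short sketch. Segments are modelled as plain dictionaries; the function and key names are illustrative, not a real TCP API.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments for initial seq nums x and y."""
    # Client -> server: SYN carrying the client's initial seq num.
    syn = {"SYN": 1, "seq": x}
    # Server -> client: SYNACK carrying the server's seq num, acking the SYN.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    # Client -> server: ACK for the SYNACK.
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}
    return [syn, synack, ack]
```

With `x = 100` and `y = 300`, the SYNACK carries ACKnum 101 and the final ACK carries ACKnum 301.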

TCP 3-way handshake: FSM
  [Figure: handshake FSM. Transitions recoverable from the figure:]
    closed -> listen (server): Socket connectionSocket = welcomeSocket.accept();
    closed -> SYN sent (client): Socket clientSocket = new Socket("hostname", portNumber); sends SYN(seq=x)
    listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
    SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
    SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)

TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled

TCP: closing a connection
  [Figure: client state and server state during close, both starting in ESTAB]
    client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
    server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
    client: enters FIN_WAIT_2 (wait for server close)
    server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
    client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
    server: on receiving the ACK, enters CLOSED

Principles of congestion control
  congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control!
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem!

Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  [Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$. Left plot: $\lambda_{out}$ vs. $\lambda_{in}$, levelling off at R/2. Right plot: delay vs. $\lambda_{in}$, growing sharply as $\lambda_{in}$ approaches R/2.]
  maximum per-connection throughput: R/2
  large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
    transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
  [Figure: Host A and Host B share finite output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output.]

Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
  [Figure: plot of $\lambda_{out}$ vs. $\lambda_{in}$, both axes up to R/2. Host A keeps a copy of each packet and sends into finite shared output link buffers with free buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data.]

Causes/costs of congestion: scenario 2
  Idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
  [Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to R/2. Host A keeps a copy of each packet; a packet arriving when there is no buffer space is dropped and resent once there is free buffer space. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data.]

Causes/costs of congestion: scenario 2
  Realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicated that are delivered!
  [Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to R/2. Host A's timeout fires while the original is still queued, so a copy is sent as well; router has free buffer space.]
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
  A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
  [Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$: output.]

Causes/costs of congestion: scenario 3
  [Figure: plot of $\lambda_{out}$ vs. $\lambda'_{in}$, both axes up to C/2]
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
  two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at

TCP Congestion Control: details
  sender limits transmission:
    $\text{LastByteSent} - \text{LastByteAcked} \le cwnd$
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    $\text{rate} \approx \frac{cwnd}{RTT}$ bytes/sec
  [Figure: sender sequence number space, showing last byte ACKed, then bytes sent but not-yet ACKed ("in-flight") spanning cwnd, up to last byte sent]
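
The window constraint and the rate approximation translate into two one-line helpers. This is a sketch with hypothetical function names; it ignores the receiver's rwnd, which a real sender would also respect.

```python
def usable_window(last_byte_sent, last_byte_acked, cwnd):
    """Bytes the sender may still put in flight without exceeding cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(cwnd - in_flight, 0)

def approx_rate(cwnd, rtt):
    """Rough sending rate in bytes/sec: one cwnd of data per RTT."""
    return cwnd / rtt
```

With 3000 bytes in flight and cwnd = 4000, the sender may send 1000 more bytes; a cwnd of 14600 bytes over a 100 ms RTT corresponds to roughly 146,000 bytes/sec.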

TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
  [Figure: Host A sends to Host B one segment, then two, then four in successive RTTs; vertical axis: time]
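
Why incrementing cwnd on every ACK doubles it every RTT can be seen in a short trace. The sketch assumes one ACK per segment, no delayed ACKs and no loss, and measures cwnd in units of MSS; the function name is hypothetical.

```python
def slow_start(rtts, mss=1):
    """Return cwnd at the start of each RTT when every ACK adds one MSS."""
    cwnd = mss
    trace = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss       # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss          # increment cwnd for every ACK received
        trace.append(cwnd)
    return trace
```

`slow_start(4)` yields `[1, 2, 4, 8, 16]`: each RTT brings back as many ACKs as segments were sent, so the window doubles.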

TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP round trip time timeout
i i l ldquo rdquo
TCP round trip time timeout
timeout interval EstimatedRTT plus ldquosafety marginrdquo large variation in EstimatedRTT -gt larger safety margin
i S l RTT d i i f E i dRTT estimate SampleRTT deviation from EstimatedRTT DevRTT = (1-)DevRTT +
|SampleRTT-EstimatedRTT| |SampleRTT EstimatedRTT|
(typically = 025)
TimeoutInterval = EstimatedRTT + 4DevRTT
estimated RTT ldquosafety marginrdquo
Transport Layer 3-62
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-63
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: throughput.]
Transport Layer 3-90
Causes/costs of congestion: scenario 3
- another "cost" of congestion: when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes marked at C/2.]
Transport Layer 3-91
Approaches towards congestion control
two broad approaches towards congestion control:
- end-end congestion control:
  - no explicit feedback from network
  - congestion inferred from end-system observed loss, delay
  - approach taken by TCP
- network-assisted congestion control:
  - routers provide feedback to end systems
    - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    - explicit rate for sender to send at
Transport Layer 3-92
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
3.6 principles of congestion control
3.7 TCP congestion control
Transport Layer 3-93
TCP Congestion Control: details
- sender limits transmission:
    $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
- cwnd is dynamic, function of perceived network congestion
TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space, showing last byte ACKed; bytes sent but not yet ACKed ("in-flight"); last byte sent; cwnd spanning the in-flight bytes.]
Transport Layer 3-94
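The window constraint and the rate estimate above can be sketched in a few lines of Python. This is only an illustration of the two formulas, not how any TCP stack is written; the function names `can_send` and `approx_rate` are hypothetical.

```python
def can_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked  # sent but not yet ACKed
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes: float, rtt_seconds: float) -> float:
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cwnd_bytes / rtt_seconds
```

For instance, with 600 bytes in flight and cwnd = 1000 bytes, a 400-byte segment fits but a 401-byte one does not.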
TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline; the number of segments sent doubles each RTT.]
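Why does adding one MSS per ACK double the window every RTT? A small simulation makes it visible. This is a sketch under simplifying assumptions that are not in the slides: no loss, every segment is a full MSS, and each segment is ACKed individually within one RTT. The name `slow_start` and the 1460-byte default MSS are illustrative choices.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


def slow_start(rtts: int, mss: int = MSS) -> list[int]:
    """Return cwnd (bytes) at the start of each RTT, assuming no loss."""
    cwnd = mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss       # one ACK per full segment sent this RTT
        for _ in range(acks):
            cwnd += mss          # +1 MSS per ACK, so cwnd doubles per RTT
    return history
```

With `mss=1` the trace reads directly in segments: 1, 2, 4, 8, 16.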
Transport Layer 3-95
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
Transport Layer 3-96
TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP RENO
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half, window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
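The two reactions to loss can be written as one small function. This is a minimal sketch that follows the slide's description literally, with cwnd and ssthresh counted in MSS units; the function name `on_loss`, the event strings, and the floor of 1 MSS on ssthresh are assumptions made for the example. Real Reno implementations add details (such as window inflation during fast recovery) that are omitted here.

```python
def on_loss(cwnd: int, event: str, variant: str = "reno") -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) in MSS units after a loss event.

    event: "timeout" or "3dupacks"; variant: "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = max(cwnd // 2, 1)    # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh          # restart from 1 MSS, as in slow start
    return ssthresh, ssthresh       # Reno on 3 dup ACKs: cut cwnd in half
```

Starting from cwnd = 16, a timeout yields (1, 8) for either variant, while three duplicate ACKs yield (8, 8) under Reno and (1, 8) under Tahoe.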
Transport Layer 3-97
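Summary: TCP Congestion Control
[Figure: TCP congestion-control state machine. Recoverable transition labels:]
- new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- cwnd > ssthresh: switch from slow start to congestion avoidance
- new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed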
TCP congestion control: additive increase, multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD sawtooth behavior, probing for bandwidth. Vertical axis: cwnd, TCP sender congestion window size; horizontal axis: time. Annotation: additively increase window size … until loss occurs (then cut window in half).]
Transport Layer 3-99
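The sawtooth can be reproduced with a toy model. The sketch below assumes, purely for illustration, that a loss happens whenever cwnd reaches a fixed value `loss_at`; real networks do not behave this regularly. The function name `aimd` and its parameters are hypothetical.

```python
def aimd(rtts: int, loss_at: int, start: int = 1) -> list[int]:
    """cwnd in MSS per RTT: +1 each RTT, halved once cwnd reaches loss_at."""
    cwnd, trace = start, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease after loss
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace
```

Calling `aimd(10, loss_at=8, start=4)` gives the repeating ramp 4, 5, 6, 7, 8, 4, 5, 6, 7, 8.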
TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT
    $\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Figure: sawtooth window oscillating between W/2 and W.]
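The 3/4 factor follows from the sawtooth shape. As a short derivation, assuming the window ramps linearly from $W/2$ up to $W$ and then halves (the symbol $\overline{W}$ for the average window is introduced here for convenience):

```latex
\[
  \overline{W} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
  \qquad
  \text{avg TCP thruput} \;=\; \frac{\overline{W}}{RTT} \;=\; \frac{3}{4}\,\frac{W}{RTT}
  \ \text{bytes/sec}
\]
```

The average of a linear ramp is the midpoint of its endpoints, which is why the idealized sawtooth spends its time at 3/4 of the peak window on average.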
Transport Layer 3-100
TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:
    $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$
- to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$: a very small loss rate!
- new versions of TCP for high-speed
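Both numbers in the example can be checked directly. The sketch below simply evaluates the bandwidth-delay product and inverts the throughput formula quoted above; the function names are hypothetical, and throughput is taken in bits per second with MSS in bytes.

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)         # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)   # about 2.1e-10
```

The computed loss probability is roughly 2.1e-10, which the slide rounds to 2 x 10^-10.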
Transport Layer 3-101
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
- additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput versus Connection 1 throughput, each axis running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
Transport Layer 3-103
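The convergence argument can be checked numerically. The sketch below is an idealized model, not a measurement: both flows see the same RTT, both detect loss in the same round whenever their combined rate exceeds the link capacity, and rates are unitless numbers. The function name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Run synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both cut by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: slope of 1
    return x1, x2


a, b = two_flow_aimd(1.0, 61.0, capacity=100.0, rounds=500)
```

Additive increase leaves the gap between the two rates unchanged and each halving cuts it in two, so the flows drift toward the equal-share line even from a very unequal start.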
Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP:
  - send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)
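The arithmetic behind the parallel-connection example is a simple ratio. The sketch assumes every TCP connection on the link ends up with an equal share of R, which is the idealized fairness goal rather than a guarantee; `app_share` is a hypothetical name.

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R the new app gets with equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, slightly more than half.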
Transport Layer 3-104
Chapter 3: summary
- principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- instantiation, implementation in the Internet
  - UDP
  - TCP
next:
- leaving the network "edge" (application, transport layers)
- into the network "core"
Transport Layer 3-105
TCP reliable data transfer
- TCP creates rdt service on top of IP's unreliable service
  - pipelined segments
  - cumulative acks
  - single retransmission timer
- retransmissions triggered by:
  - timeout events
  - duplicate acks
let's initially consider simplified TCP sender:
- ignore duplicate acks
- ignore flow control, congestion control
Transport Layer 3-64
TCP sender events:
data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running
  - think of timer as for oldest unacked segment
  - expiration interval: TimeOutInterval
timeout:
- retransmit segment that caused timeout
- restart timer
ack rcvd:
- if ack acknowledges previously unacked segments
  - update what is known to be ACKed
  - start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)
data received from application above:
- create segment, seq. #: NextSeqNum
- pass segment to IP (i.e., "send")
- NextSeqNum = NextSeqNum + length(data)
- if (timer currently not running) start timer
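The three sender events can be put together in a toy class. This is a sketch of the simplified sender only: no real timer, no network, no duplicate-ACK handling, no flow or congestion control. The class name, the `sent` log that stands in for "pass segment to IP", and the `send_base` field are all assumptions introduced for the example.

```python
class SimplifiedTcpSender:
    """Toy model of the simplified sender: one timer, cumulative ACKs."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq   # oldest unACKed byte (assumed name)
        self.unacked = {}              # seq # -> data, for sent-but-unACKed segments
        self.timer_running = False
        self.sent = []                 # log of (seq #, data) handed to "IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))          # pass segment to IP
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True          # start timer

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)                # oldest unACKed segment
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True              # restart timer

    def on_ack(self, ack_num: int) -> None:
        if ack_num > self.send_base:           # ACKs previously unACKed data
            self.send_base = ack_num
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > ack_num}
            self.timer_running = bool(self.unacked)
```

A dictionary keyed by sequence number keeps the example short; a real sender would track a byte stream and a single timer tied to the oldest unACKed segment.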
TCP ACK generation
- event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
- event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
- event at receiver: arrival of out-of-order segment, higher-than-expected seq #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
- event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
Transport Layer 3-69
TCP fast retransmit
- time-out period often relatively long:
  - long delay before resending lost packet
- detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
- likely that unacked segment lost, so don't wait for timeout
Transport Layer 3-70
TCP fast retransmit
[Figure: Host A / Host B timeline. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data (lost, marked X). Host B returns ACK=100 four times. Before the timeout expires, Host A resends Seq=100, 20 bytes of data.]
fast retransmit after sender receipt of triple duplicate ACK
Transport Layer 3-71
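Counting duplicate ACKs takes very little state. The sketch below is one possible way to detect the trigger, with a hypothetical class name `DupAckDetector`; it treats the first ACK for a given number as the original and fires on the third duplicate after it, which matches the four ACK=100 arrivals in the exchange above.

```python
class DupAckDetector:
    """Signal fast retransmit on the third duplicate of the same ACK number."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num: int) -> bool:
        """Return True when the sender should resend the oldest unACKed segment."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            return self.dup_count == 3
        self.last_ack = ack_num     # new ACK number: reset the counter
        self.dup_count = 0
        return False
```

Feeding it ACK=100 four times returns False, False, False, True, so the retransmission happens without waiting for the timer.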
TCP flow control
- application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
- flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments arrive from sender through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads.]
Transport Layer 3-73
TCP flow control
- receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- sender limits amount of unacked ("in-flight") data to receiver's rwnd value
- guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to the application process.]
Transport Layer 3-74
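The receiver and sender halves of flow control reduce to one subtraction and one comparison. The sketch below is illustrative only; the function names and the `buffered` parameter (bytes received but not yet read by the application) are assumptions for the example.

```python
def advertised_rwnd(rcv_buffer: int, buffered: int) -> int:
    """Free receive-buffer space that the receiver advertises as rwnd."""
    return rcv_buffer - buffered


def sender_may_send(in_flight: int, nbytes: int, rwnd: int) -> bool:
    """True if nbytes more keeps unACKed data within the receiver's rwnd."""
    return in_flight + nbytes <= rwnd
```

With a 4096-byte RcvBuffer holding 1000 unread bytes, the receiver advertises 3096; a sender with 3000 bytes in flight may then send at most 96 more.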
Connection Management
before exchanging data, sender/receiver "handshake":
- agree to establish connection (each knowing the other willing to establish connection)
- agree on connection parameters
[Figure: client and server stacks, each with application above network. Each side holds connection state: ESTAB, and connection variables: seq # client-to-server and server-to-client; rcvBuffer size at server, client.]
client: Socket clientSocket = newSocket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-76
TCP 3-way handshake
client state: LISTEN, SYNSENT, ESTAB
server state: LISTEN, SYN RCVD, ESTAB
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enter SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enter ESTAB
4. server: received ACK(y) indicates client is live; enter ESTAB
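The sequence and acknowledgment numbers in the three segments can be traced with a small function. This is a sketch that models segments as plain dictionaries; it does no networking, and the function name and dictionary keys are hypothetical.

```python
def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x and y."""
    syn = {"SYN": 1, "seq": x}                                   # client to server
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": syn["seq"] + 1}
    ack = {"ACK": 1, "acknum": synack["seq"] + 1}                # client to server
    return [syn, synack, ack]


segments = three_way_handshake(100, 300)
```

Each acknowledgment number is the peer's initial sequence number plus one, so with x = 100 and y = 300 the SYNACK carries ACKnum 101 and the final ACK carries ACKnum 301.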
Transport Layer 3-77
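TCP 3-way handshake: FSM
[Figure: finite state machine; transitions written as event / action.]
server side:
- closed to listen: Socket connectionSocket = welcomeSocket.accept();
- listen to SYN rcvd: SYN(x) / SYNACK(seq=y,ACKnum=x+1), create new socket for communication back to client
- SYN rcvd to ESTAB: ACK(ACKnum=y+1)
client side:
- closed to SYN sent: Socket clientSocket = newSocket("hostname","port number"); / SYN(seq=x)
- SYN sent to ESTAB: SYNACK(seq=y,ACKnum=x+1) / ACK(ACKnum=y+1)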
Transport Layer 3-78
TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
client state / server state (both start in ESTAB):
1. client calls clientSocket.close(), sends FINbit=1, seq=x, and enters FIN_WAIT_1 (can no longer send but can receive data)
2. server replies ACKbit=1, ACKnum=x+1 and enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server sends FINbit=1, seq=y and enters LAST_ACK (can no longer send data)
4. client replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED; server enters CLOSED
Transport Layer 3-80
Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
Transport Layer 3-82
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- output link capacity: R
- no retransmission
- maximum per-connection throughput: R/2
- large delays as arrival rate, $\lambda_{in}$, approaches capacity
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; throughput is $\lambda_{out}$. Plots: $\lambda_{out}$ versus $\lambda_{in}$, both axes marked at R/2; delay versus $\lambda_{in}$, axis marked at R/2.]
Transport Layer 3-83
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of timed-out packet
  - application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  - transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$
[Figure: Host A sends toward Host B. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: throughput.]
Transport Layer 3-84
TCP reliable data transferTCP reliable data transfer
TCP creates rdt service TCP creates rdt service on top of IPrsquos unreliable service pipelined segments cumulative acks letrsquos initially consider
i lifi d TCP d single retransmission timert i i
simplified TCP sender ignore duplicate acks ignore flow control retransmissions
triggered by timeout events
ignore flow control congestion control
timeout events duplicate acks
Transport Layer 3-64
TCP sender eventsdata rcvd from app
i h timeout
i create segment with seq
i b t t
retransmit segment that caused timeout
t t ti seq is byte-stream number of first data byte in segment
restart timerack rcvd
if k k l d byte in segment start timer if not
already running
if ack acknowledges previously unacked segmentsa ea y u g
think of timer as for oldest unacked
t
segments update what is known
to be ACKedsegment expiration interval TimeOutInterval
start timer if there are still unacked segments
Transport Layer 3-65
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations: lost packets (buffer overflow at routers), long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out]
[plots: λ_out against λ_in and delay against λ_in, with λ_in up to R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in >= λ_in
[figure: Host A offers λ_in (original data) and λ'_in (original data plus retransmitted data) to finite shared output link buffers; Host B receives λ_out]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[plot: λ_out against λ_in, both axes marked R/2]
[figure: Host A keeps a copy of each packet and sends only when the finite shared output link buffers have free buffer space; Host B receives λ_out]
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[figure: Host A keeps a copy of each packet; a packet that arrives when there is no buffer space is dropped and is resent once there is free buffer space]
[plot: λ_out against input rate, both axes marked R/2]
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[figure: Host A times out and sends a copy although the original is still queued; the router has free buffer space, so both reach Host B]
[plot: λ_out against input rate, both axes marked R/2]
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion: unneeded retransmissions; link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0
[figure: Hosts A, B, C and D; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out; finite shared output link buffers]
Causes/costs of congestion: scenario 3
[plot: λ_out against λ'_in, both axes marked C/2]
another "cost" of congestion: when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
  single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission: LastByteSent - LastByteAcked <= cwnd
cwnd is dynamic, function of perceived network congestion
[figure: sender sequence number space; cwnd spans from the last byte ACKed, across bytes sent but not yet ACKed ("in-flight"), to the last byte sent]
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
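The window check and the rate approximation can be expressed directly. This is a small sketch; the function names and the sample numbers are assumptions, and the sender is assumed to be limited by cwnd only (the receive window is ignored here).

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight_after = (last_byte_sent + nbytes) - last_byte_acked
    return in_flight_after <= cwnd


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cwnd_bytes / rtt_seconds


# Example (assumed numbers): 10 segments of 1460 bytes in flight, RTT = 0.5 s.
rate = approx_rate(10 * 1460, 0.5)   # 29200.0 bytes/sec
```

The sender here may have at most 14,600 bytes unacknowledged, so halving the RTT doubles the achievable rate for the same cwnd.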
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[figure: Host A to Host B timeline; the number of segments sent per RTT grows over time]
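A short sketch shows why adding 1 MSS per ACK doubles cwnd each RTT. It assumes no loss, one ACK per segment (no delayed ACKs), and measures cwnd in units of MSS; the function name `slow_start` is illustrative.

```python
def slow_start(rtts, mss=1):
    """Return cwnd at the start of each RTT, beginning with 1 MSS."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss          # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # increment cwnd by 1 MSS for every ACK
        history.append(cwnd)
    return history


print(slow_start(4))   # [1, 2, 4, 8, 16]
```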
TCP: switching from slow start to congestion avoidance (CA)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
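The loss reactions of Tahoe and Reno, together with the ssthresh update, fit in one function. This is a simplified sketch following the slide wording: the names `on_loss`, `"timeout"` and `"3dupacks"` are assumptions, and real Reno's temporary window inflation during fast recovery is deliberately left out.

```python
def on_loss(cwnd, mss, kind, variant="reno"):
    """Return (new_cwnd, ssthresh) after a loss event.

    kind: "timeout" or "3dupacks" (hypothetical labels).
    variant: "reno" or "tahoe".
    """
    ssthresh = cwnd / 2                 # half of cwnd just before the loss
    if kind == "timeout" or variant == "tahoe":
        new_cwnd = mss                  # restart from 1 MSS, then slow start to ssthresh
    else:
        new_cwnd = ssthresh             # Reno on 3 dup ACKs: cut in half, grow linearly
    return new_cwnd, ssthresh


reno_dup = on_loss(16, 1, "3dupacks", "reno")     # (8.0, 8.0)
reno_timeout = on_loss(16, 1, "timeout", "reno")  # (1, 8.0)
tahoe_dup = on_loss(16, 1, "3dupacks", "tahoe")   # (1, 8.0)
```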
Summary: TCP Congestion Control
[FSM figure; recoverable transitions:]
slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
slow start to congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[plot: cwnd (TCP sender congestion window size) against time; additively increase window size until loss occurs, then cut window in half]
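The sawtooth can be reproduced with a toy loop. The loss model is an assumption made only for this sketch: a loss is detected in any RTT where cwnd exceeds a fixed `capacity`, and cwnd is counted in whole MSS units.

```python
def aimd(rtts, capacity, cwnd=1):
    """Return cwnd (in MSS) observed at each RTT under AIMD."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:            # assumed loss signal
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace


print(aimd(12, capacity=8, cwnd=5))
# [5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4]
```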
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[plot: window size oscillates between W/2 and W]
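The 3/4 factor is the mean of a window that ramps linearly from W/2 to W. The sketch below checks that numerically; the function names and the sample W and RTT are assumptions for illustration.

```python
def avg_window(W):
    """Mean of a sawtooth that rises in unit steps from W/2 to W (W even)."""
    samples = range(W // 2, W + 1)
    return sum(samples) / len(samples)


def avg_throughput(W_bytes, rtt_seconds):
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * W_bytes / rtt_seconds


ratio = avg_window(100) / 100             # 0.75
thruput = avg_throughput(64_000, 0.5)     # 96000.0 bytes/sec
```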
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
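Both numbers in the example follow from the stated parameters. The sketch below recomputes them; the function names are illustrative, and MSS is converted to bits so that the rate can be given in bits per second.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_for_throughput(rate_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


W = window_segments(10e9, 0.1, 1500)       # about 83,333 segments
L = loss_for_throughput(10e9, 0.1, 1500)   # about 2.1e-10
```

The computed loss rate is roughly 2.1e-10, which the slide rounds to 2 * 10^-10.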
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[plot: Connection 2 throughput against Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
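The convergence argument can be checked numerically: additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half. The sketch assumes both sessions see every loss at the same time and have equal RTTs; `two_flows` and the starting rates are illustrative.

```python
def two_flows(x1, x2, R, steps):
    """Evolve two AIMD flows sharing a link of capacity R; return final rates."""
    for _ in range(steps):
        if x1 + x2 > R:               # link overloaded: both flows see loss
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase (slope 1)
    return x1, x2


a, b = two_flows(80, 10, R=100, steps=200)
gap = abs(a - b)   # starts at 70, shrinks toward 0
```

Starting from a very unequal split (80 versus 10), the gap drops below 1 within 200 steps, so the operating point moves toward the equal bandwidth share line.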
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP: do not want rate throttled by congestion control
  instead use UDP: send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of 20 connections)
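The parallel-connection example is simple arithmetic if every connection gets an equal share of the link, which is the assumption behind this sketch. The function name `share` is illustrative.

```python
from fractions import Fraction


def share(existing, new_conns):
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


one = share(9, 1)       # Fraction(1, 10)  -> R/10
eleven = share(9, 11)   # Fraction(11, 20) -> slightly more than R/2
```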
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet:
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
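The three sender events can be sketched as methods on a small class. This is an illustration under assumptions, not TCP itself: the class name `SimpleTcpSender`, the `sent` log standing in for "pass segment to IP", and the boolean timer are all hypothetical, and congestion and flow control are ignored.

```python
class SimpleTcpSender:
    def __init__(self, init_seq=0):
        self.next_seq = init_seq     # NextSeqNum
        self.send_base = init_seq    # oldest unacknowledged byte
        self.unacked = {}            # seq # -> data of unacked segments
        self.timer_running = False
        self.sent = []               # log of (seq, data) handed to IP

    def on_app_data(self, data):
        seq = self.next_seq          # seq # = byte-stream number of first byte
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:   # timer tracks the oldest unacked segment
            self.timer_running = True

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)      # segment that caused the timeout
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True    # restart timer

    def on_ack(self, acknum):
        if acknum > self.send_base:  # acknowledges previously unacked data
            self.send_base = acknum
            done = [s for s, d in self.unacked.items() if s + len(d) <= acknum]
            for s in done:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```

A usage example: after sending 8 bytes at seq 92 and 20 bytes at seq 100, an ACK of 100 removes the first segment and leaves the timer running for the second.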
TCP sender (simplified)
data received from application above:
  create segment, seq #: NextSeqNum
  pass segment to IP (i.e., "send")
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running) start timer
TCP ACK generation
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  TCP receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
  TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
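The receiver's ACK policy is a four-way case split, which the sketch below encodes. The function name, its parameters and the returned strings are assumptions for illustration; segments below the expected seq # are not covered by the table and are reported as such.

```python
def ack_action(seq, expected, ack_pending=False, gap_open=False):
    """Choose the receiver action for an arriving segment.

    seq: seq # of the arriving segment; expected: next expected byte.
    ack_pending: an earlier in-order segment is still waiting for its ACK.
    gap_open: out-of-order data above `expected` is already buffered.
    """
    if seq > expected:
        return f"immediate duplicate ACK {expected}"   # gap detected
    if seq == expected:
        if gap_open:
            return "immediate ACK"                     # fills lower end of gap
        if ack_pending:
            return "immediate cumulative ACK"          # covers both segments
        return "delayed ACK (wait up to 500ms)"
    return "not covered by the table"
```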
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X); Host B repeatedly returns ACK=100; Host A resends Seq=100, 20 bytes of data, before the timeout expires]
fast retransmit after sender receipt of triple duplicate ACK
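The duplicate-ACK rule reduces to a counter that resets whenever new data is acknowledged. The class below is a minimal sketch with assumed names (`DupAckCounter`, `on_ack`); it returns the seq # to resend when the third duplicate arrives and `None` otherwise.

```python
class DupAckCounter:
    def __init__(self, send_base):
        self.send_base = send_base   # smallest unacked seq #
        self.dup = 0                 # duplicate ACKs seen for send_base

    def on_ack(self, acknum):
        if acknum > self.send_base:  # new data acknowledged
            self.send_base = acknum
            self.dup = 0
            return None
        self.dup += 1                # duplicate ACK for the same data
        if self.dup == 3:
            return self.send_base    # fast retransmit this segment
        return None


counter = DupAckCounter(send_base=92)
results = [counter.on_ack(100) for _ in range(4)]
# [None, None, None, 100]
```

In the figure's scenario the first ACK=100 acknowledges the 8 bytes at Seq=92, and the three that follow are duplicates, so the segment at Seq=100 is resent on the fourth ACK.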
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[figure: receiver protocol stack; segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[figure: receiver-side buffering; TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd), and data leaves to the application process]
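The advertised window is the free part of the receive buffer. The sketch below uses the usual bookkeeping rwnd = RcvBuffer - (LastByteRcvd - LastByteRead); that formula and the variable names come from the standard textbook treatment and are an assumption here, since the slide text only names rwnd and RcvBuffer.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises, in bytes."""
    buffered = last_byte_rcvd - last_byte_read   # data waiting for the application
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd_value, nbytes):
    """True if nbytes more keeps unacked data within the advertised window."""
    return (last_byte_sent + nbytes) - last_byte_acked <= rwnd_value


window = rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)   # 2096
ok = sender_may_send(5000, 4000, window, 1000)                  # True: 2000 <= 2096
too_much = sender_may_send(5000, 4000, window, 1200)            # False: 2200 > 2096
```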
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[figure: on each side the application sits above the connection state (ESTAB) and connection variables (seq # client-to-server and server-to-client; rcvBuffer size at server, client), which sit above the network]
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state / server state: LISTEN / LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); enter SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1), this segment may contain client-to-server data; enter ESTAB
server: received ACK(y) indicates client is live; enter ESTAB
TCP sender (simplified)( p )
data received from application abovecreate segment seq NextSeqNumpass segment to IP (ie ldquosendrdquo)NextSeqNum = NextSeqNum + length(data) if (timer currently not running)
delayed ACK Wait up to 500msf t t If t texpected seq All data up to
expected seq already ACKed
i l f i d t ith
for next segment If no next segmentsend ACK
immediatel send single c m lati earrival of in-order segment withexpected seq One other segment has ACK pending
immediately send single cumulative ACK ACKing both in-order segments
arrival of out-of-order segmenthigher-than-expect seq Gap detected
immediately send duplicate ACKindicating seq of next expected byte
Gap detected
arrival of segment that partially or completely fills gap
immediate send ACK provided thatsegment starts at lower end of gap
Transport Layer 3-69
partially or completely fills gap segment starts at lower end of gap
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
client state: ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIMED_WAIT, CLOSED
server state: ESTAB, CLOSE_WAIT, LAST_ACK, CLOSED
1. client calls clientSocket.close() and sends FINbit=1, seq=x; client enters FIN_WAIT_1 (can no longer send but can receive data)
2. server replies ACKbit=1, ACKnum=x+1 and enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
3. server sends FINbit=1, seq=y and enters LAST_ACK (can no longer send data)
4. client replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED; server enters CLOSED
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out]
[Plots: λ_out against λ_in, and delay against λ_in, with both axes marked at R/2]
maximum per-connection throughput: R/2
large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λ_in = λ_out
  transport-layer input includes retransmissions: λ'_in ≥ λ_in
[Figure: Host A sends λ_in (original data) and λ'_in (original data plus retransmitted data) into finite shared output link buffers; Host B receives λ_out]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Plot: λ_out against λ_in, both axes marked at R/2]
[Figure: Host A keeps a copy and sends λ_in (original data) and λ'_in (original data plus retransmitted data) when there is free buffer space in the finite shared output link buffers; Host B receives λ_out]
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Plot: λ_out against the input rate, both axes marked at R/2]
[Figure: Host A keeps a copy of each packet; a packet that arrives when there is no buffer space is dropped, and the copy is resent when there is free buffer space; Host B receives λ_out]
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicated that are delivered
[Plot: λ_out against the input rate, both axes marked at R/2]
[Figure: Host A holds a copy and a timeout; free buffer space at the router; Host B receives λ_out]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Host A, Host B, Host C, Host D; λ_in: original data; λ'_in: original data plus retransmitted data; finite shared output link buffers; λ_out]
Causes/costs of congestion: scenario 3
[Plot: λ_out against λ'_in, both axes marked at C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked < cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
  rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not-yet ACKed ("in-flight") spanning cwnd, and last byte sent]
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with one RTT marked; time runs downward]
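Why incrementing cwnd on every ACK doubles it every RTT can be seen in a short simulation. This is a sketch under simplifying assumptions that are not on the slide: cwnd is counted in whole MSS units, every segment sent in a round is ACKed within that round, and no loss occurs. The class and method names are hypothetical.

```java
// Minimal slow-start simulation: cwnd measured in MSS, no loss, one ACK per segment.
final class SlowStartSketch {
    // Returns cwnd (in MSS) after the given number of round trips.
    static long cwndAfter(int rtts) {
        long cwnd = 1;                       // initially cwnd = 1 MSS
        for (int r = 0; r < rtts; r++) {
            long acksThisRound = cwnd;       // one ACK comes back per segment sent
            for (long a = 0; a < acksThisRound; a++) {
                cwnd += 1;                   // increment cwnd by 1 MSS for every ACK
            }
        }
        return cwnd;
    }

    public static void main(String[] args) {
        for (int r = 0; r <= 5; r++) {
            System.out.println("after " + r + " RTTs: cwnd = " + cwndAfter(r) + " MSS");
        }
    }
}
```

Because a window of cwnd segments produces cwnd ACKs, the per-ACK increment adds cwnd to the window each round, giving 1, 2, 4, 8, ... MSS.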
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
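The difference between the Tahoe and Reno reactions can be written as one small function. This is a sketch of the simplified rules on these slides only: cwnd and ssthresh are in MSS units, ssthresh is set to half of cwnd on any loss event, and the extra window inflation that real Reno applies during fast recovery is left out. All class, enum, and method names are hypothetical.

```java
// Simplified loss reaction for TCP Tahoe and Reno, in MSS units.
final class LossReactionSketch {
    enum Loss { TIMEOUT, TRIPLE_DUP_ACK }
    enum Variant { TAHOE, RENO }

    // Hypothetical holder for the two congestion variables after a loss event.
    static final class State {
        final double cwnd;
        final double ssthresh;

        State(double cwnd, double ssthresh) {
            this.cwnd = cwnd;
            this.ssthresh = ssthresh;
        }
    }

    // On loss: ssthresh = cwnd / 2. Reno halves cwnd on triple duplicate ACKs;
    // every other case (any timeout, or Tahoe on any loss) resets cwnd to 1 MSS.
    static State onLoss(Variant variant, Loss loss, double cwnd) {
        double ssthresh = cwnd / 2;
        if (variant == Variant.RENO && loss == Loss.TRIPLE_DUP_ACK) {
            return new State(cwnd / 2, ssthresh);
        }
        return new State(1, ssthresh);
    }

    // Growth over one RTT: exponential up to the threshold, then linear.
    static double growPerRtt(double cwnd, double ssthresh) {
        if (cwnd < ssthresh) {
            return Math.min(cwnd * 2, ssthresh);
        }
        return cwnd + 1;
    }

    public static void main(String[] args) {
        State reno = onLoss(Variant.RENO, Loss.TRIPLE_DUP_ACK, 16);
        State tahoe = onLoss(Variant.TAHOE, Loss.TRIPLE_DUP_ACK, 16);
        System.out.println("Reno:  cwnd=" + reno.cwnd + " ssthresh=" + reno.ssthresh);
        System.out.println("Tahoe: cwnd=" + tahoe.cwnd + " ssthresh=" + tahoe.ssthresh);
    }
}
```

With cwnd = 16 MSS before the loss, Reno continues from 8 MSS in linear growth, while Tahoe restarts from 1 MSS and slow-starts back up to the threshold of 8 MSS.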
Summary: TCP Congestion Control
[FSM figure; only some transition labels are recoverable:]
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Plot: cwnd (TCP sender congestion window size) against time; additively increase window size ... until loss occurs (then cut window in half)]
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[Plot: window oscillating between W/2 and W]
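The 3/4 factor follows from the sawtooth shape. As a short worked step, assuming the window climbs linearly from W/2 to W between consecutive loss events (the symbol $\overline{W}$ for the average window is introduced here for illustration):

```latex
\[
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{W}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
\]
```

The average of a quantity that rises linearly is the midpoint of its endpoints, which is all the first equality uses.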
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
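The two numbers in this example can be checked by direct arithmetic. The sketch below assumes throughput is expressed in bits per second and MSS in bytes (hence the factor of 8), and it inverts the Mathis formula to solve for L. The class and method names are hypothetical.

```java
// Checks the "long, fat pipe" example: window size and required loss rate.
final class LongFatPipeSketch {
    // In-flight segments needed to sustain rateBps over one RTT.
    static double windowSegments(double rateBps, double rttSec, int mssBytes) {
        return rateBps * rttSec / (mssBytes * 8.0);
    }

    // Mathis formula: throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second.
    static double mathisThroughputBps(int mssBytes, double rttSec, double loss) {
        return 1.22 * mssBytes * 8.0 / (rttSec * Math.sqrt(loss));
    }

    // The same formula solved for L: L = (1.22 * MSS / (RTT * throughput))^2.
    static double requiredLoss(double rateBps, double rttSec, int mssBytes) {
        double root = 1.22 * mssBytes * 8.0 / (rttSec * rateBps);
        return root * root;
    }

    public static void main(String[] args) {
        double rate = 10e9;   // 10 Gbps
        double rtt = 0.1;     // 100 ms
        int mss = 1500;       // bytes
        System.out.println("W (segments) = " + windowSegments(rate, rtt, mss));
        System.out.println("required L   = " + requiredLoss(rate, rtt, mss));
    }
}
```

The window comes out at about 83,333 segments and the loss rate at roughly 2.1e-10, which the slide rounds to 2 * 10^-10.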
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Plot: axes marked at R, horizontal axis Connection 1 throughput, with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
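The parallel-connection example is a one-line calculation once each connection is assumed to receive an equal share of the link, as the fairness goal R/K states. The sketch below returns the new application's share as a fraction of R; the class and method names are hypothetical.

```java
// Share of link rate R obtained by an app that opens several TCP connections,
// assuming every connection on the link gets an equal share.
final class ParallelConnectionsSketch {
    static double appShare(int existingConnections, int openedByApp) {
        return (double) openedByApp / (existingConnections + openedByApp);
    }

    public static void main(String[] args) {
        System.out.println("1 connection:   " + appShare(9, 1) + " of R");
        System.out.println("11 connections: " + appShare(9, 11) + " of R");
    }
}
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which the slide rounds to R/2.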
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
TCP: retransmission scenarios
lost ACK scenario (Host A, Host B):
  A sends Seq=92, 8 bytes of data (SendBase=92)
  B replies ACK=100, which is lost (X)
  A's timeout expires; A resends Seq=92, 8 bytes of data
  B replies ACK=100; SendBase=100
premature timeout (Host A, Host B):
  A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data (SendBase=92)
  B replies ACK=100, then ACK=120
  A's timeout expires before the ACKs arrive; A resends Seq=92, 8 bytes of data
  the ACKs arrive: SendBase=100, then SendBase=120
  B replies ACK=120; SendBase=120
event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK

event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

event at receiver: arrival of out-of-order segment, higher-than-expect seq #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediate send ACK, provided that segment starts at lower end of gap
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
  if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don't wait for timeout
TCP fast retransmit
[Figure: fast retransmit after sender receipt of triple duplicate ACK]
  Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X)
  Host B returns ACK=100 repeatedly
  before the timeout expires, Host A resends Seq=100, 20 bytes of data
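The sender-side trigger can be sketched as a duplicate-ACK counter. This is a minimal illustration, not a full TCP sender: the class and method names are hypothetical, an ACK number equal to the last one seen counts as a duplicate, a larger one counts as new, and stale (smaller) ACK numbers are ignored, which is an assumption added here for completeness.

```java
// Counts duplicate ACKs and signals when fast retransmit should fire.
final class FastRetransmitSketch {
    private long lastAck = -1;   // highest ACK number seen so far
    private int dupCount = 0;    // duplicates of lastAck seen so far

    // Returns true exactly when the third duplicate ACK arrives, meaning the
    // sender should resend the unacked segment that starts at ackNum.
    boolean onAck(long ackNum) {
        if (ackNum > lastAck) {          // new ACK: advance and reset the counter
            lastAck = ackNum;
            dupCount = 0;
            return false;
        }
        if (ackNum == lastAck) {         // duplicate ACK for the same data
            dupCount++;
            return dupCount == 3;
        }
        return false;                    // stale ACK (assumption: ignore it)
    }

    public static void main(String[] args) {
        FastRetransmitSketch sender = new FastRetransmitSketch();
        long[] acks = {100, 100, 100, 100};
        for (long a : acks) {
            System.out.println("ACK=" + a + " retransmit=" + sender.onAck(a));
        }
    }
}
```

In the run above the first ACK=100 is new and the next three are duplicates, so the fourth call is the one that triggers resending Seq=100.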
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack; segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering; TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd), and data leaves to the application process]
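Both halves of flow control reduce to simple bookkeeping. In this sketch the receiver computes rwnd as RcvBuffer minus the buffered data, matching the figure, and the sender computes how many more bytes it may put in flight. The class name, method names, and the byte counts in `main` are hypothetical, and congestion control (cwnd) is left out.

```java
// Receiver-advertised window and the sender-side limit it imposes.
final class FlowControlSketch {
    // Receiver: free buffer space to advertise in the TCP header.
    static int rwnd(int rcvBufferBytes, int bufferedBytes) {
        return rcvBufferBytes - bufferedBytes;
    }

    // Sender: bytes that may still be sent while keeping unacked data within rwnd.
    static long sendable(long lastByteSent, long lastByteAcked, int rwnd) {
        long inFlight = lastByteSent - lastByteAcked;
        return Math.max(0, rwnd - inFlight);
    }

    public static void main(String[] args) {
        int window = rwnd(4096, 1000);   // 4096-byte RcvBuffer with 1000 bytes unread
        System.out.println("rwnd = " + window);
        System.out.println("sendable = " + sendable(5000, 4000, window));
    }
}
```

When the application reads slowly, the buffered data grows, rwnd shrinks, and the sender's allowance drops to zero before the receive buffer can overflow.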
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake: FSM
  server side:
    closed -> listen: Socket connectionSocket = welcomeSocket.accept();
    listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y, ACKnum=x+1) and create new socket for communication back to client
    SYN rcvd -> ESTAB: on ACK(ACKnum=y+1)
  client side:
    closed -> SYN sent: Socket clientSocket = new Socket(hostname, portNumber); send SYN(seq=x)
    SYN sent -> ESTAB: on SYNACK(seq=y, ACKnum=x+1), send ACK(ACKnum=y+1)
TCP: closing a connection
  client, server each close their side of connection
    send TCP segment with FIN bit = 1
  respond to received FIN with ACK
    on receiving FIN, ACK can be combined with own FIN
  simultaneous FIN exchanges can be handled
TCP: closing a connection
  client state and server state both start in ESTAB
  1. client calls clientSocket.close() and sends FINbit=1, seq=x; client enters FIN_WAIT_1 (can no longer send but can receive data)
  2. server replies ACKbit=1, ACKnum=x+1 and enters CLOSE_WAIT (can still send data); client enters FIN_WAIT_2 (wait for server close)
  3. server sends FINbit=1, seq=y and enters LAST_ACK (can no longer send data)
  4. client replies ACKbit=1, ACKnum=y+1 and enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED; server enters CLOSED
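The state changes on each side of a client-initiated close can be written as a transition table. This is a minimal Python sketch; the event labels such as "recv ACK" and "2*MSL timer" are names chosen here for illustration, while the state names come from the slide.

```python
# (state, event) -> next state, following the client-initiated close
CLIENT = {
    ("ESTAB", "close()"): "FIN_WAIT_1",          # send FIN, seq=x
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",    # wait for server close
    ("FIN_WAIT_2", "recv FIN"): "TIMED_WAIT",    # send ACK, ACKnum=y+1
    ("TIMED_WAIT", "2*MSL timer"): "CLOSED",
}
SERVER = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",         # send ACK, ACKnum=x+1
    ("CLOSE_WAIT", "close()"): "LAST_ACK",       # send FIN, seq=y
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table, events, state="ESTAB"):
    """Apply events in order and return the list of states visited."""
    trace = [state]
    for event in events:
        state = table[(state, event)]   # KeyError marks an unexpected event
        trace.append(state)
    return trace


print(run(CLIENT, ["close()", "recv ACK", "recv FIN", "2*MSL timer"]))
print(run(SERVER, ["recv FIN", "close()", "recv ACK"]))
```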
Principles of congestion control
  congestion:
    informally: "too many sources sending too much data too fast for network to handle"
    different from flow control
    manifestations:
      lost packets (buffer overflow at routers)
      long delays (queueing in router buffers)
    a top-10 problem
Causes/costs of congestion: scenario 1
  two senders, two receivers
  one router, infinite buffers
  output link capacity: R
  no retransmission
  (figure: Host A and Host B send original data at rate λ_in into unlimited shared output link buffers; throughput is λ_out)
  (plots: λ_out versus λ_in, and delay versus λ_in, with λ_in ranging up to R/2)
  maximum per-connection throughput: R/2
  large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
  one router, finite buffers
  sender retransmission of timed-out packet
    application-layer input = application-layer output: λ_in = λ_out
    transport-layer input includes retransmissions: λ'_in >= λ_in
  (figure: Host A, Host B; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out; finite shared output link buffers)
Causes/costs of congestion: scenario 2
  idealization: perfect knowledge
    sender sends only when router buffers available
  (plot: λ_out versus λ_in, each axis up to R/2)
  (figure: Host A keeps a copy of each packet; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out; free buffer space; finite shared output link buffers; Host B)
Causes/costs of congestion: scenario 2
  Idealization: known loss
    packets can be lost, dropped at router due to full buffers
    sender only resends if packet known to be lost
  (plot: λ_out versus λ_in, each axis up to R/2) when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
  (figure: Host A keeps a copy of each packet; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out; router with no buffer space drops the packet and the copy is resent once buffer space is free; Host B)
Causes/costs of congestion: scenario 2
  Realistic: duplicates
    packets can be lost, dropped at router due to full buffers
    sender times out prematurely, sending two copies, both of which are delivered
  (plot: λ_out versus λ_in, each axis up to R/2) when sending at R/2, some packets are retransmissions, including duplicates that are delivered
  (figure: Host A keeps a copy and times out; λ_in; λ'_in; λ_out; free buffer space; Host B)
  "costs" of congestion:
    unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as λ_in and λ'_in increase?
  A: as red λ'_in increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
  (figure: Host A, Host B, Host C, Host D; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out; finite shared output link buffers)
Causes/costs of congestion: scenario 3
  (plot: λ_out versus λ'_in, each axis up to C/2)
  another "cost" of congestion:
    when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
  two broad approaches towards congestion control:
  end-end congestion control:
    no explicit feedback from network
    congestion inferred from end-system observed loss, delay
    approach taken by TCP
  network-assisted congestion control:
    routers provide feedback to end systems
      single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
      explicit rate for sender to send at
TCP Congestion Control: details
  sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
  cwnd is dynamic, function of perceived network congestion
  TCP sending rate:
    roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
    rate ≈ cwnd / RTT bytes/sec
  (figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"), spanning cwnd; last byte sent)
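The window limit and the rate approximation translate directly into two small functions. This is a sketch under the slide's simplification that only cwnd limits the sender; the function names are illustrative.

```python
def can_send(last_byte_sent, last_byte_acked, cwnd, nbytes):
    """True if sending nbytes more keeps the in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes, rtt_sec):
    """Rough sending rate in bytes/sec: one window of data per round trip."""
    return cwnd_bytes / rtt_sec


# 10,000 bytes of window and a 250 ms round trip give about 40,000 bytes/sec
print(approx_rate(10_000, 0.25))
print(can_send(last_byte_sent=9_000, last_byte_acked=2_000, cwnd=10_000, nbytes=1_460))
```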
TCP Slow Start
  when connection begins, increase rate exponentially until first loss event:
    initially cwnd = 1 MSS
    double cwnd every RTT
    done by incrementing cwnd for every ACK received
  summary: initial rate is slow but ramps up exponentially fast
  (figure: Host A and Host B exchange segments, one round per RTT, along the time axis)
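Why a per-ACK increment doubles the window each RTT can be seen in a short Python sketch. It assumes no loss and that every segment sent in one RTT is ACKed by the next; `slow_start_cwnd` is a name chosen for this illustration.

```python
def slow_start_cwnd(mss, rtts):
    """Return cwnd (in bytes) at the start of each of the first `rtts` round trips."""
    cwnd = mss                      # initially cwnd = 1 MSS
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss          # one ACK per segment in flight (no loss assumed)
        cwnd += acks * mss          # +1 MSS per ACK, so the window doubles
    return history


print(slow_start_cwnd(mss=1460, rtts=5))
# [1460, 2920, 5840, 11680, 23360]
```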
TCP: switching from slow start to CA
  Q: when should the exponential increase switch to linear?
  A: when cwnd gets to 1/2 of its value before timeout.
  Implementation:
    variable ssthresh
    on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
  loss indicated by timeout:
    cwnd set to 1 MSS
    window then grows exponentially (as in slow start) to threshold, then grows linearly
  loss indicated by 3 duplicate ACKs: TCP RENO
    dup ACKs indicate network capable of delivering some segments
    cwnd is cut in half, window then grows linearly
  TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate ACKs)
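The two loss reactions can be compared side by side in Python. This sketch follows the slide's simplified rules (it omits the extra window inflation that real Reno implementations apply during fast recovery), and the function name and string labels are illustrative.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) in bytes after a loss event.

    event:   "timeout" or "3dupacks"
    variant: "reno" or "tahoe"
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd // 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh             # restart from 1 MSS
    return ssthresh, ssthresh            # Reno on 3 dup ACKs: cut cwnd in half


print(on_loss(16_000, 1_000, "timeout"))
print(on_loss(16_000, 1_000, "3dupacks"))
print(on_loss(16_000, 1_000, "3dupacks", variant="tahoe"))
```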
Summary: TCP Congestion Control
  (FSM figure; surviving transition labels:)
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start -> congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase, multiplicative decrease
  approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    additive increase: increase cwnd by 1 MSS every RTT until loss detected
    multiplicative decrease: cut cwnd in half after loss
  AIMD saw tooth behavior: probing for bandwidth
  (plot: cwnd, the TCP sender congestion window size, versus time; additively increase window size ... until loss occurs (then cut window in half))
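The saw tooth can be reproduced with a few lines of Python. The loss model here is an assumption made for illustration: a loss is declared whenever the window reaches a fixed size `loss_at`, which stands in for the unknown available bandwidth. Window sizes are in units of MSS.

```python
def aimd(start, loss_at, rtts):
    """Return cwnd (in MSS) per RTT: +1 MSS each RTT, halved on loss."""
    cwnd = start
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:     # assumed loss model, not from the slides
            cwnd = cwnd // 2    # multiplicative decrease
        else:
            cwnd += 1           # additive increase
    return trace


print(aimd(start=4, loss_at=8, rtts=12))
# [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```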
TCP throughput
  avg. TCP thruput as function of window size, RTT?
    ignore slow start, assume always data to send
  W: window size (measured in bytes) where loss occurs
    avg. window size (# in-flight bytes) is 3/4 W
    avg. thruput is 3/4 W per RTT
  avg TCP thruput = (3/4) W / RTT bytes/sec
  (plot: window oscillates between W/2 and W)
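The 3/4 factor follows from the saw tooth shape: if the window climbs linearly from W/2 back up to W between losses, its time average is the midpoint of those two values. Written out, with \bar{w} (a symbol introduced here) for the average window:

```latex
\bar{w} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} \approx \frac{\bar{w}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
```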
TCP Futures: TCP over "long, fat pipes"
  example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
  throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
  to achieve 10 Gbps throughput, need a loss rate of L = 2 * 10^-10, a very small loss rate
  new versions of TCP for high-speed
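Both figures on this slide can be checked numerically. The sketch below computes the window as the bandwidth-delay product in segments and inverts the throughput formula for L; the function names are illustrative.

```python
def window_segments(rate_bps, rtt_sec, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_sec / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_sec, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * 8 * mss_bytes / (rtt_sec * rate_bps)) ** 2


print(round(window_segments(10e9, 0.1, 1500)))        # 83333
print(f"{required_loss_rate(10e9, 0.1, 1500):.1e}")   # 2.1e-10
```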
TCP Fairness
  fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
  (figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R)
Why is TCP fair?
  two competing sessions:
    additive increase gives slope of 1, as throughput increases
    multiplicative decrease decreases throughput proportionally
  (plot: the throughputs of the two connections, with Connection 1 throughput on one axis up to R and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2")
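The convergence argument can be simulated for two flows. This Python sketch assumes both flows see every loss at the same time and have the same RTT, which is an idealization; `two_flows` and its parameters are illustrative names.

```python
def two_flows(r, x1, x2, rounds, step=1.0):
    """Run two AIMD flows on a link of capacity r; return final (x1, x2)."""
    for _ in range(rounds):
        if x1 + x2 > r:
            # loss: both flows decrease by a factor of 2 (assumed synchronized)
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both increase by the same amount
            x1, x2 = x1 + step, x2 + step
    return x1, x2


x1, x2 = two_flows(r=100, x1=80, x2=10, rounds=1000)
print(abs(x1 - x2))
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates approach the equal bandwidth share.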
Fairness (more)
  Fairness and UDP
    multimedia apps often do not use TCP
      do not want rate throttled by congestion control
    instead use UDP:
      send audio/video at constant rate, tolerate packet loss
  Fairness, parallel TCP connections
    application can open multiple parallel connections between two hosts
    web browsers do this
    e.g., link of rate R with 9 existing connections:
      new app asks for 1 TCP, gets rate R/10
      new app asks for 11 TCPs, gets R/2
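The two figures in the example follow from an equal per-connection split of the link. A small Python check, with `app_share` as an illustrative name and the equal-split assumption stated in the code:

```python
from fractions import Fraction


def app_share(existing, new):
    """Fraction of link rate R obtained by an app opening `new` connections.

    Assumes every TCP connection on the link gets an equal share.
    """
    return Fraction(new, existing + new)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, slightly more than R/2
```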
  event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    receiver action: delayed ACK; wait up to 500ms for next segment; if no next segment, send ACK
  event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
    receiver action: immediately send single cumulative ACK, ACKing both in-order segments
  event at receiver: arrival of out-of-order segment, higher-than-expected seq #; gap detected
    receiver action: immediately send duplicate ACK, indicating seq # of next expected byte
  event at receiver: arrival of segment that partially or completely fills gap
    receiver action: immediately send ACK, provided that segment starts at lower end of gap
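The four rows above amount to a decision procedure, sketched here in Python. The parameter names (`ack_pending`, `gap_open`) and the returned strings are illustrative; segments below the expected sequence number are not covered by the table and are flagged as such.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_open):
    """Choose the ACK action for an arriving segment.

    ack_pending: an earlier in-order segment is still waiting for its delayed ACK
    gap_open:    out-of-order data is already buffered above expected_seq
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK now"       # out of order: gap detected
    if seg_seq == expected_seq:
        if gap_open:
            return "send ACK now"             # fills gap from its lower end
        if ack_pending:
            return "send cumulative ACK now"  # ACKs both in-order segments
        return "delay ACK up to 500 ms"
    return "not covered by the table"         # seg_seq < expected_seq


print(receiver_action(100, 100, ack_pending=False, gap_open=False))
print(receiver_action(120, 100, ack_pending=False, gap_open=False))
```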
TCP fast retransmit
  time-out period often relatively long:
    long delay before resending lost packet
  detect lost segments via duplicate ACKs
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs
  TCP fast retransmit:
    if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest seq #
    likely that unACKed segment lost, so don't wait for timeout
TCP fast retransmit
  (figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X); Host B keeps returning ACK=100; before the timeout expires, Host A resends Seq=100, 20 bytes of data)
  fast retransmit after sender receipt of triple duplicate ACK
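The sender-side bookkeeping for this rule is a duplicate-ACK counter. The Python sketch below is a minimal illustration with hypothetical names; it tracks only ACK numbers and leaves out timers and the actual retransmission.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and signals when to retransmit."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, acknum):
        """Return the seq # to retransmit now, or None."""
        if acknum == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:      # triple duplicate ACK
                return acknum            # unACKed segment with smallest seq #
        else:
            self.last_ack = acknum       # new ACK: reset the counter
            self.dup_count = 0
        return None


sender = FastRetransmitSender()
print([sender.on_ack(a) for a in [100, 100, 100, 100]])
# [None, None, None, 100]
```

The first ACK=100 is a new ACK; the retransmission is triggered by the third duplicate after it, matching the figure.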
TCP flow control
  application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
  flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
  (figure: receiver protocol stack: segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers inside the OS, from which the application process reads)
TCP flow control
  receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
  sender limits amount of unACKed ("in-flight") data to receiver's rwnd value
  guarantees receive buffer will not overflow
  (figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space, rwnd; data then passes to application process)
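The advertised window is the receive buffer size minus the data still waiting to be read. In the Python sketch below, the byte counters `last_byte_rcvd` and `last_byte_read` are assumed bookkeeping variables, not names taken from the slide.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises, in bytes."""
    buffered = last_byte_rcvd - last_byte_read   # data not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd_value, nbytes):
    """True if nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd_value


window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                     # 2096
print(sender_may_send(5000, 4000, window, 1000))  # True
print(sender_may_send(5000, 4000, window, 1200))  # False
```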
Chapter 3: summary
  principles behind transport layer services:
    multiplexing, demultiplexing
    reliable data transfer
    flow control
    congestion control
  instantiation, implementation in the Internet
    UDP
    TCP
  next:
    leaving the network "edge" (application, transport layers)
    into the network "core"
TCP fast retransmitTCP fast retransmit
time-out period often time out period often relatively long long delay before if sender receives 3
TCP fast retransmit
resending lost packet detect lost segments
ia d licate ACKs
ACKs for same data(ldquotriple duplicate ACKsrdquo)
d k d (ldquotriple duplicate ACKsrdquo)
via duplicate ACKs sender often sends
many segments back-
resend unacked segment with smallest seq many segments back
to-back if segment is lost there
ill lik l b
seq likely that unacked
segment lost so donrsquot will likely be many duplicate ACKs
gwait for timeout
Transport Layer 3-70
TCP fast retransmitHost BHost A
TCP fast retransmit
Seq=92 8 bytes of data
XSeq=100 20 bytes of data
ACK=100
meo
ut ACK=100
ACK=100ti ACK=100
ACK=100Seq=100 20 bytes of data
Transport Layer 3-71
fast retransmit after sender receipt of triple duplicate ACK
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-72
TCP flow controlTCP flow controlapplication
processapplication may remove data from
TCP socketreceiver buffers
application
OS
remove data from TCP socket buffers hellip
receiver buffers
TCP
hellip slower than TCP receiver is delivering(sender is sending)
code
IPIPcode
receiver controls sender so sender wonrsquot overflow
flow control
receiver protocol stack
from sender
sender won t overflow receiverrsquos buffer by transmitting too much too fast
Transport Layer 3-73
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP: closing a connection
[Figure: message sequence between client and server; both start in ESTAB]
client: clientSocket.close(); send FINbit=1, seq=x; client state: FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1, ACKnum=x+1; server state: CLOSE_WAIT (can still send data)
client: client state: FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y; server state: LAST_ACK (can no longer send data)
client: send ACKbit=1, ACKnum=y+1; client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
server: server state: CLOSED
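One way to read the closing sequence is as two small transition tables, one per side. The sketch below encodes the states named on the slide; the event labels (`"close()"`, `"recv ACK"`, and so on) and the `run` helper are assumptions made for illustration, and the server's own close call is inferred from the FIN it sends.

```python
# (state, event) -> next state, following the closing sequence on the slide.
CLIENT = {
    ("ESTAB", "close()"): "FIN_WAIT_1",         # sends FINbit=1, seq=x
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",   # waits for server close
    ("FIN_WAIT_2", "recv FIN"): "TIMED_WAIT",   # sends ACKbit=1, ACKnum=y+1
    ("TIMED_WAIT", "2*MSL timer"): "CLOSED",
}
SERVER = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",        # sends ACKbit=1, ACKnum=x+1
    ("CLOSE_WAIT", "close()"): "LAST_ACK",      # assumed event: sends FINbit=1, seq=y
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table, events, state="ESTAB"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = table[(state, event)]
        path.append(state)
    return path


print(run(CLIENT, ["close()", "recv ACK", "recv FIN", "2*MSL timer"]))
print(run(SERVER, ["recv FIN", "close()", "recv ACK"]))
```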
Principles of congestion control
congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ through one router with unlimited shared output link buffers; throughput: $\lambda_{out}$]
[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2] maximum per-connection throughput: R/2
[Plot: delay versus $\lambda_{in}$, up to R/2] large delays as arrival rate, $\lambda_{in}$, approaches capacity
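The two plots in scenario 1 can be mimicked numerically. The throughput cap at R/2 comes straight from the slide. The delay curve below is an assumption: it borrows the shape of a simple M/M/1-style formula (delay proportional to one over the spare capacity) only to illustrate how delay blows up near capacity, and the function names are hypothetical.

```python
import math


def per_connection_throughput(lambda_in: float, R: float) -> float:
    """Each of the two connections gets at most half the link capacity."""
    return min(lambda_in, R / 2)


def illustrative_delay(lambda_in: float, R: float) -> float:
    """ASSUMPTION: delay ~ 1 / spare capacity; not a formula from the slides."""
    spare = R / 2 - lambda_in
    return math.inf if spare <= 0 else 1.0 / spare


R = 1.0
for lam in (0.10, 0.25, 0.40, 0.49, 0.50):
    print(lam, per_connection_throughput(lam, R), illustrative_delay(lam, R))
```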
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \geq \lambda_{in}$
[Figure: Host A and Host B send through one router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2]
[Figure: Host A keeps a copy of each packet and sends only into free buffer space; finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
[Figure: Host A keeps a copy of each packet; a packet that finds no buffer space at the router is dropped and the copy is resent when there is free buffer space; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2] when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
[Figure: Host A's timeout fires early and a copy is sent into free buffer space at the router; $\lambda_{in}$, $\lambda'_{in}$, $\lambda_{out}$ as before]
[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2] when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
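The cost of duplicate deliveries can be made concrete with a small calculation. The sketch below assumes a hypothetical trace of packet ids arriving at the receiver, in which every packet is delivered twice because of premature timeouts; that trace is invented for illustration and is not data from the slides. Goodput is then the link rate in use times the fraction of deliveries that carry new data.

```python
def useful_fraction(delivered_ids):
    """Fraction of deliveries that are first copies rather than duplicates."""
    return len(set(delivered_ids)) / len(delivered_ids)


def goodput(rate_in_use: float, delivered_ids) -> float:
    """Rate of new data reaching the receiver, in the same units as rate_in_use."""
    return rate_in_use * useful_fraction(delivered_ids)


R = 1.0
# Hypothetical worst case: each packet arrives twice.
trace = [1, 1, 2, 2, 3, 3, 4, 4]
print(goodput(R / 2, trace))  # half of the R/2 carried on the link is duplicates
```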
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput goes to 0
[Figure: Hosts A, B, C and D connected through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Causes/costs of congestion: scenario 3
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
[Figure: sender sequence number space; cwnd spans from the last byte ACKed to the last byte sent; bytes in between are sent, not-yet ACKed ("in-flight")]
sender limits transmission:
  $LastByteSent - LastByteAcked \leq cwnd$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $rate \approx \frac{cwnd}{RTT}$ bytes/sec
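The window constraint and the rate estimate above translate directly into code. This is a minimal sketch with hypothetical function names; it ignores the receiver window, which the flow control slides treat separately.

```python
def can_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes: float, rtt_seconds: float) -> float:
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cwnd_bytes / rtt_seconds


print(can_send(last_byte_sent=5000, last_byte_acked=2000, cwnd=4000, nbytes=1000))
print(approx_rate(cwnd_bytes=14600, rtt_seconds=0.5))
```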
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B with one RTT marked]
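The link between "increment cwnd for every ACK" and "double cwnd every RTT" is easy to check with a short simulation. The sketch assumes no loss and one ACK per segment sent in each round trip; `slow_start` is a hypothetical helper and cwnd is counted in units of MSS.

```python
def slow_start(rtts: int, mss: int = 1):
    """Return cwnd at the start of each RTT, assuming no loss."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss          # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # per-ACK increment of 1 MSS
        history.append(cwnd)
    return history


print(slow_start(4))
```

Because every segment in a window is acknowledged, a window of n segments earns n increments, so the window doubles each round trip.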
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
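The Reno and Tahoe reactions described above differ only in how they treat three duplicate ACKs. The sketch below captures just that difference, together with the ssthresh rule from the previous slide. The function name and the event strings are assumptions, and details such as the extra window inflation during fast recovery are deliberately left out.

```python
def on_loss(cwnd: float, mss: float, event: str, variant: str = "reno"):
    """Return (new_cwnd, new_ssthresh) after a loss event."""
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd / 2                 # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh            # back to 1 MSS, then slow start
    return cwnd / 2, ssthresh           # Reno on 3 dup ACKs: cut in half


print(on_loss(16, 1, "timeout"))
print(on_loss(16, 1, "3dupacks", variant="reno"))
print(on_loss(16, 1, "3dupacks", variant="tahoe"))
```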
Summary: TCP Congestion Control
[Figure: FSM of TCP congestion control; only the following transition labels survive]
slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
slow start to congestion avoidance: cwnd > ssthresh
congestion avoidance, new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Plot: cwnd (TCP sender congestion window size) versus time. AIMD saw tooth behavior: probing for bandwidth; additively increase window size until loss occurs (then cut window in half)]
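The saw tooth can be reproduced with a few lines of simulation. The sketch assumes, purely for illustration, that a loss occurs whenever cwnd reaches a fixed value `loss_at`; real losses depend on the network, and `aimd` is a hypothetical helper with cwnd counted in MSS.

```python
def aimd(rtts: int, loss_at: float, cwnd: float = 1.0, mss: float = 1.0):
    """Return cwnd sampled once per RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:     # assumed loss once the window reaches loss_at
            cwnd = cwnd / 2     # multiplicative decrease
        else:
            cwnd += mss         # additive increase: 1 MSS per RTT
    return trace


print(aimd(rtts=10, loss_at=8, cwnd=4.0))
```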
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \frac{W}{RTT}$ bytes/sec
[Plot: window size oscillating between W/2 and W]
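The 3/4 factor is the average of a window that ramps linearly from W/2 up to W before being halved again. A quick numerical check, with hypothetical helper names and an evenly sampled ramp standing in for the saw tooth:

```python
def average_window(W: float, steps: int = 1000) -> float:
    """Average of a window that ramps linearly from W/2 to W."""
    samples = [W / 2 + (W / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(W: float, rtt_seconds: float) -> float:
    """avg TCP thruput = 3/4 * W / RTT, in bytes/sec."""
    return 0.75 * W / rtt_seconds


print(average_window(100.0))
print(average_throughput(W=100_000, rtt_seconds=0.5))
```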
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
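Both numbers in this example can be recomputed from the stated parameters. The sketch below treats MSS as 1500 bytes (8 bits per byte) and solves the throughput formula for L; the function names are hypothetical.

```python
import math

MSS_BYTES = 1500
RTT_S = 0.1
TARGET_BPS = 10e9  # 10 Gbps


def window_segments(target_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to fill the pipe: rate * RTT / segment size."""
    return target_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(target_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Invert the formula above for the loss probability L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


print(window_segments(TARGET_BPS, RTT_S, MSS_BYTES))
print(required_loss(TARGET_BPS, MSS_BYTES, RTT_S))
```

The window comes out just above 83,333 segments and the loss rate close to 2e-10, which matches the rounded figures on the slide.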
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Plot: throughput of the two connections, with Connection 1 throughput on an axis up to R and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
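The convergence argument can be checked with a two-flow simulation. It assumes both sessions see loss in the same round whenever their combined rate exceeds R, which is a simplification; `two_flow_aimd` is a hypothetical helper. Additive increase leaves the gap between the flows unchanged, and each halving cuts the gap in half, so the rates drift toward the equal share line.

```python
def two_flow_aimd(x1: float, x2: float, R: float, rounds: int):
    """Run synchronized AIMD for two flows sharing capacity R; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > R:
            x1, x2 = x1 / 2, x2 / 2     # loss: both decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1     # congestion avoidance: additive increase
    return x1, x2


a, b = two_flow_aimd(x1=1.0, x2=40.0, R=50.0, rounds=500)
print(a, b, abs(a - b))
```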
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
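The arithmetic behind the parallel-connection example follows from per-connection fairness: if every connection gets an equal share, an application's share is its number of connections divided by the total. A small sketch with a hypothetical helper, using exact fractions:

```python
from fractions import Fraction


def app_share(existing_connections: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by the app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))    # share of R with one connection
print(app_share(9, 11))   # share of R with eleven connections
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.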
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
TCP fast retransmit
[Figure: message sequence between Host A and Host B, with Host A's timeout interval marked]
Host A sends Seq=92, 8 bytes of data
Host A sends Seq=100, 20 bytes of data; this segment is lost (X)
Host B repeatedly returns ACK=100
Host A resends Seq=100, 20 bytes of data, before the timeout expires
fast retransmit after sender receipt of triple duplicate ACK
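The sender-side rule, resend after a triple duplicate ACK instead of waiting for the timer, can be sketched as a small counter over the incoming ACK numbers. The function name is hypothetical and the sketch only reports which sequence number to resend.

```python
def fast_retransmit_trigger(acks):
    """Return the ACK number seen three extra times in a row, or None."""
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == 3:      # triple duplicate ACK
                return ack          # resend the segment starting at this byte
        else:
            last_ack, dup_count = ack, 0
    return None


print(fast_retransmit_trigger([100, 100, 100, 100]))  # original ACK plus 3 duplicates
print(fast_retransmit_trigger([100, 100, 100]))       # only 2 duplicates so far
```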
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: segments from sender pass up through IP code and TCP code into the TCP socket receiver buffers in the OS, from which the application process reads]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to application process]
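The receiver-side figure suggests a simple calculation: rwnd is whatever part of RcvBuffer is not occupied by buffered data, and the sender keeps its unacked data within that value. The sketch below uses hypothetical function names and the 4096-byte default mentioned on the slide.

```python
def advertised_rwnd(rcv_buffer: int, buffered: int) -> int:
    """Free buffer space the receiver advertises in the TCP header."""
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    rwnd: int, nbytes: int) -> bool:
    """True if nbytes more keeps unacked (in-flight) data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_rwnd(rcv_buffer=4096, buffered=1000)
print(rwnd)
print(sender_may_send(last_byte_sent=2000, last_byte_acked=1000, rwnd=rwnd, nbytes=2096))
```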
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes marked up to R/2]
[Figure: Host A keeps a copy of each packet; with no buffer space at the router the packet is dropped, and the copy is resent when there is free buffer space; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$; Host B]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered!
[Plot: $\lambda_{out}$ vs. $\lambda_{in}$, both axes marked up to R/2]
[Figure: Host A times out and resends its copy although there is free buffer space; $\lambda_{in}$; $\lambda'_{in}$; $\lambda_{out}$; Host B]
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C and D on multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
[Plot: $\lambda_{out}$ vs. $\lambda'_{in}$, both axes marked up to C/2]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked $\le$ cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate $\approx$ cwnd/RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd spans the in-flight bytes]
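The window constraint and the rate estimate above can be written down directly. A minimal sketch; the function names `can_send` and `approx_rate` are illustrative, not part of any TCP API.

```python
def can_send(last_byte_sent: int, last_byte_acked: int,
             cwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd_bytes: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cwnd_bytes / rtt_s


if __name__ == "__main__":
    print(can_send(3000, 1000, 4000, 1000))  # 2000 in flight, room for 1000 more
    print(approx_rate(14600, 0.1))           # ten 1460-byte segments per 100 ms
```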
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: segment exchange between Host A and Host B along a time axis, with one RTT marked]
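Why "increment cwnd for every ACK" doubles the window each RTT can be seen in a few lines. A sketch under simplifying assumptions that are not on the slide: no loss, one ACK per segment (no delayed ACKs), and cwnd counted in units of MSS. The name `slow_start` is hypothetical.

```python
from typing import List


def slow_start(rtts: int) -> List[int]:
    """Return cwnd (in MSS) at the start of each RTT, with no loss."""
    cwnd = 1                      # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd               # assumption: one ACK per segment sent
        for _ in range(acks):
            cwnd += 1             # +1 MSS for every ACK received
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(slow_start(4))
```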
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)
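The loss reactions listed above, together with the ssthresh rule (set to half of cwnd just before the loss), fit in one small function. This is a simplified sketch: it follows the slide's description only, leaves out the details of fast recovery, and the name `on_loss` and its string arguments are invented for illustration.

```python
from typing import Tuple


def on_loss(cwnd: float, event: str, variant: str = "reno",
            mss: float = 1.0) -> Tuple[float, float]:
    """Return (new_cwnd, new_ssthresh) after a loss event.

    event:   "timeout" or "3dupack"
    variant: "reno" or "tahoe"
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError(f"unknown loss event: {event}")
    if variant not in ("reno", "tahoe"):
        raise ValueError(f"unknown TCP variant: {variant}")

    ssthresh = cwnd / 2            # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        new_cwnd = mss             # back to 1 MSS, then slow start
    else:
        new_cwnd = cwnd / 2        # Reno on 3 dup ACKs: cut in half
    return new_cwnd, ssthresh


if __name__ == "__main__":
    print(on_loss(16, "timeout"))
    print(on_loss(16, "3dupack"))
    print(on_loss(16, "3dupack", variant="tahoe"))
```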
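Summary: TCP Congestion Control
[Figure: finite-state machine for TCP congestion control; recoverable transition labels:]
  new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  cwnd > ssthresh: switch from slow start to congestion avoidance
  new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed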
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) vs. time; additively increase window size until loss occurs, then cut window in half]
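The saw tooth can be reproduced with a toy model. The sketch below assumes, purely for illustration, that a loss occurs whenever cwnd reaches a fixed value `loss_at`; real losses depend on the network, and the function name `aimd` is hypothetical.

```python
from typing import List


def aimd(rounds: int, loss_at: float, start: float = 1.0) -> List[float]:
    """Return cwnd (in MSS) observed in each RTT of an AIMD sender."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = cwnd / 2    # multiplicative decrease: halve after loss
        else:
            cwnd += 1          # additive increase: +1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd(12, loss_at=8, start=4))
```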
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = $\frac{3}{4}\,\frac{W}{\mathrm{RTT}}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W]
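The 3/4 factor follows from averaging a window that ramps linearly from W/2 up to W. A quick numerical check, with hypothetical helper names:

```python
def average_window(W: float, steps: int = 1000) -> float:
    """Average of a window that grows linearly from W/2 to W."""
    samples = [W / 2 + (W / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(W_bytes: float, rtt_s: float) -> float:
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * W_bytes / rtt_s


if __name__ == "__main__":
    print(average_window(100.0))          # close to 75, i.e. 3/4 of W
    print(avg_throughput(100_000, 0.1))   # bytes/sec for W = 100 kB, RTT = 100 ms
```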
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = $\frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of L = $2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
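Both numbers in this example can be checked by plugging the values into the formulas. The helper names below are made up for this sketch; rates are in bits/sec and MSS in bytes.

```python
import math


def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the same formula for L at a target rate."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(round(window_segments(10e9, 0.1, 1500)))   # about 83,333 segments
    print(required_loss(10e9, 0.1, 1500))            # on the order of 2e-10
```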
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other, with Connection 1 throughput on an axis marked up to R; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2", moving toward the equal bandwidth share line]
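The convergence argument can be simulated. The sketch below makes assumptions beyond the slide: both sessions have the same RTT, both see a loss in the same round whenever their combined rate exceeds R, and rates are in arbitrary units. The name `two_flow_aimd` is hypothetical. Additive steps keep the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates approach the equal share.

```python
from typing import Tuple


def two_flow_aimd(x1: float, x2: float, R: float,
                  rounds: int) -> Tuple[float, float]:
    """Run two AIMD flows over a shared bottleneck of capacity R."""
    for _ in range(rounds):
        if x1 + x2 > R:
            # assumed synchronized loss: both halve their rate
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: both add one unit per round
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    a, b = two_flow_aimd(10.0, 70.0, R=100.0, rounds=2000)
    print(a, b, abs(a - b))
```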
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
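The arithmetic behind the parallel-connection example, assuming every TCP connection on the link ends up with an equal share (the function name `app_share` is illustrative):

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_conns connections."""
    return Fraction(new_conns, existing + new_conns)


if __name__ == "__main__":
    print(app_share(9, 1))    # one connection among ten
    print(app_share(9, 11))   # eleven connections among twenty, roughly R/2
```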
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process above the OS, which holds the TCP socket receiver buffers, TCP code and IP code; segments arrive from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size set via socket options (typical default is 4096 bytes)
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads fill RcvBuffer, which is split into buffered data and free buffer space (rwnd); buffered data goes to the application process]
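The bookkeeping in the receiver-side buffering figure can be mimicked with a toy model. This is a sketch, not how any real stack is written: rwnd is taken to be the free space, RcvBuffer minus buffered data, as the figure suggests, and the class and function names are invented.

```python
class ReceiveBuffer:
    """Toy receive buffer that advertises its free space as rwnd."""

    def __init__(self, rcv_buffer: int = 4096) -> None:
        self.rcv_buffer = rcv_buffer   # total buffer size in bytes
        self.buffered = 0              # bytes not yet read by the app

    @property
    def rwnd(self) -> int:
        """Free buffer space, advertised to the sender."""
        return self.rcv_buffer - self.buffered

    def deliver(self, nbytes: int) -> None:
        """Segment payload arrives from the network."""
        if nbytes > self.rwnd:
            raise OverflowError("sender exceeded advertised rwnd")
        self.buffered += nbytes

    def app_read(self, nbytes: int) -> int:
        """Application removes up to nbytes from the buffer."""
        n = min(nbytes, self.buffered)
        self.buffered -= n
        return n


def sender_may_send(in_flight: int, rwnd: int) -> int:
    """Bytes the sender may still put in flight without exceeding rwnd."""
    return max(0, rwnd - in_flight)
```

As long as the sender keeps its unacked data within the advertised rwnd, `deliver` never raises, which is the guarantee stated above.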
Connection Management
before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
[Figure: client and server each run an application over the network and hold the same per-connection information]
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname", portNumber);
server: Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state becomes SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1); server state becomes SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state becomes ESTAB
4. server: received ACK(y) indicates client is live; server state becomes ESTAB
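The sequence and acknowledgment numbers exchanged in the handshake can be traced with a small simulation. A sketch only: segments are modelled as plain dictionaries, the function name `three_way_handshake` is hypothetical, and the wrap-around at 2**32 reflects the 32-bit TCP sequence number field.

```python
import random
from typing import List, Optional, Tuple

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x: Optional[int] = None,
                        y: Optional[int] = None
                        ) -> List[Tuple[str, str, Optional[dict]]]:
    """Return (who, new_state, segment_sent) for each handshake step."""
    x = random.randrange(SEQ_SPACE) if x is None else x   # client init seq
    y = random.randrange(SEQ_SPACE) if y is None else y   # server init seq

    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (x + 1) % SEQ_SPACE}
    ack = {"ACK": 1, "acknum": (y + 1) % SEQ_SPACE}

    return [
        ("client", "SYNSENT", syn),      # client sends SYN
        ("server", "SYN RCVD", synack),  # server answers with SYNACK
        ("client", "ESTAB", ack),        # client ACKs the SYNACK
        ("server", "ESTAB", None),       # server receives the ACK
    ]


if __name__ == "__main__":
    for who, state, segment in three_way_handshake(x=100, y=300):
        print(who, state, segment)
```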
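TCP 3-way handshake: FSM
[Figure: finite-state machine with states closed, listen, SYN rcvd, SYN sent and ESTAB; recoverable transition labels:]
  closed to listen (server): Socket connectionSocket = welcomeSocket.accept();
  closed to SYN sent (client): Socket clientSocket = new Socket("hostname", portNumber); send SYN(seq=x)
  listen to SYN rcvd: receive SYN(x); send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
  SYN sent to ESTAB: receive SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)
  SYN rcvd to ESTAB: receive ACK(ACKnum=y+1)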
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state: ESTAB; server state: ESTAB
1. client: clientSocket.close(); send FINbit=1, seq=x; client state becomes FIN_WAIT_1 (can no longer send but can receive data)
2. server: send ACKbit=1, ACKnum=x+1; server state becomes CLOSE_WAIT (can still send data); client state becomes FIN_WAIT_2 (wait for server close)
3. server: send FINbit=1, seq=y; server state becomes LAST_ACK (can no longer send data)
4. client: send ACKbit=1, ACKnum=y+1; client state becomes TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
5. server: on receiving the ACK, server state becomes CLOSED
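The state sequence on each side of the close can be written as a transition table and walked through. A sketch for the case shown here, where the client closes first; the event strings and the helper `walk` are made up for illustration.

```python
from typing import List, Tuple

Transition = Tuple[str, str, str]  # (from_state, event, to_state)

CLIENT_CLOSE: List[Transition] = [
    ("ESTAB", "close(): send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "recv ACK", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN: send ACK", "TIMED_WAIT"),
    ("TIMED_WAIT", "2 * max segment lifetime elapsed", "CLOSED"),
]

SERVER_CLOSE: List[Transition] = [
    ("ESTAB", "recv FIN: send ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK", "CLOSED"),
]


def walk(transitions: List[Transition], start: str = "ESTAB") -> List[str]:
    """Follow the transitions in order and return the states visited."""
    state = start
    path = [state]
    for src, _event, dst in transitions:
        if src != state:
            raise ValueError(f"transition from {src} not valid in {state}")
        state = dst
        path.append(state)
    return path


if __name__ == "__main__":
    print(walk(CLIENT_CLOSE))
    print(walk(SERVER_CLOSE))
```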
TCP flow controlTCP flow control
receiver ldquoadvertisesrdquo free to application process
receiver advertises free buffer space by including rwnd value in TCP header f i d
buffered dataRcvBufferof receiver-to-sender segments RcvBuffer size set via
free buffer spacerwndsocket options (typical default is 4096 bytes)
many operating systems TCP segment payloads
y p g yautoadjust RcvBuffer
sender limits amount of unacked (ldquoin-flightrdquo) data to i id b ff iunacked ( in flight ) data to receiverrsquos rwnd value
guarantees receive buffer ill fl
receiver-side buffering
Transport Layer 3-74
will not overflow
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-75
Connection Managementgbefore exchanging data senderreceiver ldquohandshakerdquo agree to establish connection (each knowing the other willing agree to establish connection (each knowing the other willing
to establish connection) agree on connection parameters
application application
connection state ESTABconnection variables
seq client-to-servert li t
connection state ESTABconnection Variables
seq client-to-serverliserver-to-client
rcvBuffer sizeat serverclient
network
server-to-clientrcvBuffer size
at serverclient
networknetwork network
Socket clientSocket = Socket connectionSocket =
Transport Layer 3-76
newSocket(hostnameport number)
Socket connectionSocket welcomeSocketaccept()
TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causescosts of congestion scenario 3
four senders Q what happens as in and inrsquo
increase
g
multihop paths timeoutretransmit
increase A as red in
rsquo increases all arriving blue pkts at upper queue are d d bl th h t 0
Host A out Host Bin original data original data plus
dropped blue throughput 0
finite shared output link buffers
in original data plusretransmitted data
link buffers
Host DHost C
Transport Layer 3-90
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion controlApproaches towards congestion control
two broad approaches towards congestion controltwo broad approaches towards congestion control
end end congestion network assisted end-end congestion control
no explicit feedback
network-assisted congestion control
routers provide no explicit feedback from network
congestion inferred f d
routers provide feedback to end systems single bit indicating
(SNA from end-system observed loss delay
approach taken by
congestion (SNA DECbit TCPIP ECN ATM) approach taken by
TCP)
explicit rate for sender to send at
Transport Layer 3-92
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-93
TCP Congestion Control detailsg
TCP sending ratedsender sequence number space TCP sending rate
roughly send cwnd bytes wait RTT for
cwnd
bytes wait RTT for ACKS then send more bytes
last byteACKed sent not-
yet ACKed
last byte sent
sender limits transmission
y(ldquoin-flightrdquo)
rate ~~cwndRTT
bytessec
LastByteSent-LastByteAcked
lt cwnd
cwnd is dynamic function of perceived network
tiTransport Layer 3-94
congestion
TCP Slow Start TCP Slow Start
when connection begins Host A Host B
when connection begins increase rate exponentially until first l Tloss event initially cwnd = 1 MSS
d bl d RTT
RTT
double cwnd every RTT done by incrementing cwnd for every ACK yreceived
summary initial rate is slow but ramps up exponentially fast time
Transport Layer 3-95
TCP switching from slow start to CAQ when should the
exponential
TCP switching from slow start to CA
exponential increase switch to linear
A when cwnd gets to 12 of its value before timeoutbefore timeout
Implementationp e e tat o variable ssthresh on loss event ssthresh
is set to 12 of cwnd just before loss event
Transport Layer 3-96
TCP detecting reacting to lossTCP detecting reacting to loss
loss indicated by timeout loss indicated by timeout cwnd set to 1 MSS
i d h i ll ( i l ) window then grows exponentially (as in slow start) to threshold then grows linearly
l i di d b 3 d li ACK TCP RENO loss indicated by 3 duplicate ACKs TCP RENO dup ACKs indicate network capable of delivering
some segments cwnd is cut in half window then grows linearlyC ( 3 TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)
Transport Layer 3-97
Summary: TCP Congestion Control

[Figure: TCP congestion control FSM. Recoverable labels:
  slow start, on new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  slow start to congestion avoidance: when cwnd > ssthresh
  congestion avoidance, on new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  on duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed]

TCP congestion control: additive increase, multiplicative decrease

approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size ... until loss occurs (then cut window in half)]

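A short simulation reproduces the sawtooth. It is a sketch under a strong assumption: a loss is taken to occur exactly when cwnd reaches a fixed value `loss_at`, which stands in for the unknown usable bandwidth. The function name `aimd` and its parameters are hypothetical.

```python
def aimd(rounds: int, loss_at: int, start: int = 1) -> list[int]:
    """cwnd in MSS for each RTT round; loss assumed whenever cwnd >= loss_at."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: halve cwnd
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace
```

Starting at 4 MSS with loss at 8 MSS, the trace climbs 4, 5, 6, 7, 8 and then drops back to 4, repeating.
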
TCP throughput

avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT

$\text{avg TCP thruput} = \dfrac{3}{4}\,\dfrac{W}{\text{RTT}}$ bytes/sec

[Figure: sawtooth window oscillating between W/2 and W]

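The 3/4 factor comes from the window ramping linearly from W/2 up to W, so its mean is the midpoint of the two. The sketch below checks this numerically and evaluates the formula; both function names are hypothetical.

```python
def sawtooth_mean(w: int) -> float:
    """Mean window size over one ramp from w/2 up to w (w assumed even)."""
    ramp = range(w // 2, w + 1)          # w/2, w/2 + 1, ..., w
    return sum(ramp) / len(ramp)


def avg_tcp_throughput(w_bytes: float, rtt_seconds: float) -> float:
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_seconds
```

For W = 100 the mean window is 75, that is 3/4 W.
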
TCP Futures: TCP over "long, fat pipes"

example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

$\text{TCP throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed

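The two numbers in this example can be re-derived from the formula. The sketch below assumes throughput is expressed in bits/sec and MSS in bytes (hence the factor of 8); the function names are hypothetical.

```python
import math


def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), converted to bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Loss probability L at which the formula yields rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

With 1500-byte segments, 100 ms RTT and a 10 Gbps target, `window_segments` gives about 83,333 and `required_loss` gives about 2.1e-10, in line with the figures on the slide.
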
TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?

two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with Connection 1 throughput on the horizontal axis (0 to R) and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

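The convergence argument can be checked with a toy simulation. It assumes both sessions have the same RTT and both detect loss in the same round whenever their combined rate exceeds the link capacity, which is a simplification; the name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, steps: int):
    """Throughputs of two AIMD flows sharing one link after `steps` rounds."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve (gap halves too)
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase: gap unchanged
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged and each halving cuts it in half, so an initial split such as (1, 80) on a link of capacity 100 drifts toward the equal-share line.
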
Fairness (more)

Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

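The arithmetic behind the parallel-connection example is easy to verify, assuming every connection on the link ends up with an equal share. The function name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_conns
    connections alongside `existing` ones, assuming equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections together get 11/20 of R, slightly more than the R/2 quoted on the slide.
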
Chapter 3: summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP

next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

Chapter 3 outline

3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control

Connection Management

before exchanging data, sender/receiver "handshake":
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters

[Figure: client and server, each with an application above the network, each holding:
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client]

client: Socket clientSocket = new Socket(hostname, portNumber);
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake

client state: LISTEN -> SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB

client: choose init seq num, x; send TCP SYN msg
  SYNbit=1, Seq=x
server: choose init seq num, y; send TCP SYNACK msg, acking SYN
  SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
  ACKbit=1, ACKnum=y+1
server: received ACK(y) indicates client is live

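The field values exchanged in the three messages can be sketched as plain dictionaries. This only models the sequence and acknowledgement numbers, not a real TCP stack; the function name and dictionary keys are hypothetical, and the modulo reflects that TCP sequence numbers are 32-bit values that wrap around.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    synack = {"from": "server", "SYN": 1, "seq": y,
              "ACK": 1, "acknum": (syn["seq"] + 1) % SEQ_MOD}
    ack = {"from": "client", "ACK": 1,
           "acknum": (synack["seq"] + 1) % SEQ_MOD}
    return [syn, synack, ack]
```

For x = 100 and y = 300, the SYNACK carries ACKnum = 101 and the final ACK carries ACKnum = 301.
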
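TCP 3-way handshake: FSM

[Figure: handshake FSM with states closed, listen, SYN rcvd, SYN sent, ESTAB. Recoverable transitions:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  closed -> SYN sent: Socket clientSocket = new Socket(hostname, portNumber); send SYN(seq=x)
  listen -> SYN rcvd: receive SYN(x); send SYNACK(seq=y, ACKnum=x+1); create new socket for communication back to client
  SYN sent -> ESTAB: receive SYNACK(seq=y, ACKnum=x+1); send ACK(ACKnum=y+1)
  SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1)]
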
TCP: closing a connection

client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled

TCP: closing a connection

client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIMED_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED

client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
server: sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
client: enters FIN_WAIT_2 (wait for server close)
server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
client: sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2 * max segment lifetime), then CLOSED
server: on receiving the ACK, enters CLOSED

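The close sequence can be expressed as two small transition tables, one per side. This is a sketch of the client-initiated close shown above only; the event labels and the helper `run` are hypothetical, and simultaneous closes are not modelled.

```python
CLIENT_CLOSE = {
    ("ESTAB", "close()"): "FIN_WAIT_1",        # client sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",  # wait for server close
    ("FIN_WAIT_2", "recv FIN"): "TIMED_WAIT",  # client sends final ACK
    ("TIMED_WAIT", "2*MSL timer"): "CLOSED",
}

SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",       # server ACKs, can still send
    ("CLOSE_WAIT", "close()"): "LAST_ACK",     # server sends its own FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table: dict, state: str, events: list[str]) -> str:
    """Apply events in order; a KeyError means the event is invalid there."""
    for event in events:
        state = table[(state, event)]
    return state
```

Feeding each table its events in order takes both sides from ESTAB to CLOSED.
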
Principles of congestion control

congestion:
  informally: "too many sources sending too much data too fast for network to handle"
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

Causes/costs of congestion: scenario 1

two senders, two receivers
one router, infinite buffers
output link capacity: R
no retransmission

[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into unlimited shared output link buffers; throughput is $\lambda_{out}$]
[Plots: $\lambda_{out}$ versus $\lambda_{in}$, and delay versus $\lambda_{in}$, each with $\lambda_{in}$ up to R/2]

maximum per-connection throughput: R/2
large delays as arrival rate, $\lambda_{in}$, approaches capacity

Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into finite shared output link buffers toward Host B; output is $\lambda_{out}$]

Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  sender sends only when router buffers available

[Plot: $\lambda_{out}$ versus $\lambda_{in}$, both up to R/2]
[Figure: Host A keeps a copy of each packet and sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) only when there is free buffer space in the finite shared output link buffers toward Host B]

Causes/costs of congestion: scenario 2

Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost

[Figure: Host A keeps a copy of each packet; a packet arriving when the router has no buffer space is dropped, and the copy is resent once there is free buffer space; $\lambda_{in}$: original data, $\lambda'_{in}$: original data plus retransmitted data, toward Host B]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both up to R/2: when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)]

Causes/costs of congestion: scenario 2

Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A times out and sends a copy while the router has free buffer space; both copies reach Host B]
[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, both up to R/2: when sending at R/2, some packets are retransmissions, including duplicates that are delivered!]

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput $\to$ 0

[Figure: Hosts A, B, C and D on multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 3

[Plot: $\lambda_{out}$ versus $\lambda'_{in}$, axes up to C/2]

another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP 3-way handshakey
li t t t t t
choose init seq num xsend TCP SYN msg
client state
LISTEN
server state
LISTEN
SYNbit=1 Seq=xsend TCP SYN msg
choose init seq num ysend TCP SYNACKmsg acking SYN
SYNSENT
SYN RCVD
SYNbit=1 Seq=yACKbit=1 ACKnum=x+1
msg acking SYN
received SYNACK(x) i di t i li
SYN RCVD
ACKbit=1 ACKnum=y+1
indicates server is livesend ACK for SYNACK
this segment may contain client-to-server data
received ACK(y)
ESTAB
ESTAB
(y)indicates client is live
Transport Layer 3-77
TCP 3-way handshake FSMy
closedclosed
Socket connectionSocket = welcomeSocketaccept()
Socket clientSocket =
newSocket(hostnameport number)
SYN(x)SYNACK(seq=yACKnum=x+1)
t k t f listen SYN(seq=x)create new socket for communication back to client
SYNrcvd
SYNsent
ESTABSYNACK(seq=yACKnum=x+1)
ACK(ACKnum=y+1)ACK(ACKnum=y+1)
Transport Layer 3-78
TCP closing a connectiong
client server each close their side of connection send TCP segment with FIN bit = 1
respond to received FIN with ACK on receiving FIN ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Transport Layer 3-79
TCP closing a connectiongclient state server state
FIN WAIT 1 FINbit=1 seq=xcan no longer
clientSocketclose()
ESTABESTAB
FIN WAIT 2
CLOSE_WAITACKbit=1 ACKnum=x+1
wait for servercan still
d d t
FIN_WAIT_1 FINbit=1 seq=xcan no longersend but canreceive data
FIN_WAIT_2
FINbit=1 seq=y
wait for serverclose
send data
LAST_ACKFINbit 1 seq y
ACKbit=1 ACKnum=y+1
can no longersend data
TIMED_WAIT
timed wait CLOSEDfor 2max
segment lifetime
Transport Layer 3-80
CLOSED
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causes/costs of congestion: scenario 2
idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, with R/2 marked on both axes. Host A holds a copy of each packet; a packet arriving when there is no buffer space is dropped and later resent once buffer space is free.]
Causes/costs of congestion: scenario 2
realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt
[Figure: plot of $\lambda_{out}$ versus $\lambda_{in}$, with R/2 marked on both axes. Host A times out and sends a second copy even though the router has free buffer space.]
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput $\to 0$
[Figure: Hosts A, B, C, and D connected through routers with finite shared output link buffers; $\lambda_{in}$ is original data, $\lambda'_{in}$ is original data plus retransmitted data, $\lambda_{out}$ is throughput.]
Causes/costs of congestion: scenario 3
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
[Figure: plot of $\lambda_{out}$ versus $\lambda'_{in}$, with C/2 marked on both axes.]
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \text{cwnd} / \text{RTT}$ bytes/sec
[Figure: sender sequence number space, marking last byte ACKed, bytes sent but not yet ACKed ("in-flight"), and last byte sent; the in-flight span is labeled cwnd.]
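As a rough numeric check of the window limit and the rate approximation on this slide, here is a minimal Python sketch. The function names and the example numbers are illustrative assumptions, not values from the slides.

```python
def approx_sending_rate(cwnd_bytes: float, rtt_seconds: float) -> float:
    """Approximate TCP sending rate in bytes/sec as cwnd / RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return cwnd_bytes / rtt_seconds


def can_send(last_byte_sent: int, last_byte_acked: int, cwnd_bytes: int) -> bool:
    """True while the in-flight byte count is still below cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight < cwnd_bytes


if __name__ == "__main__":
    # Hypothetical example: 14,600-byte window, 100 ms RTT.
    print(f"{approx_sending_rate(14_600, 0.1):.0f} bytes/sec")
```

The sketch ignores the receive window (flow control), which also bounds the amount of in-flight data in a real sender.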
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with one RTT marked on the time axis.]
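The link between "increment cwnd for every ACK" and "double cwnd every RTT" can be sketched as follows. This is a simplified model under stated assumptions: no loss, one ACK per segment (no delayed ACKs), and a full window sent each round. The function name is hypothetical.

```python
def slow_start_rounds(mss: int, rounds: int) -> list[int]:
    """Return cwnd (in bytes) at the start of each RTT round of slow start."""
    cwnd = mss  # initially cwnd = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_round = cwnd // mss  # assumed: one ACK per segment in flight
        for _ in range(acks_this_round):
            cwnd += mss  # increment cwnd by 1 MSS for every ACK received
    return history


if __name__ == "__main__":
    print(slow_start_rounds(mss=1460, rounds=5))
```

Because a window of n segments produces n ACKs, and each ACK adds one MSS, the window grows from n to 2n segments within one round.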
TCP: switching from slow start to congestion avoidance (CA)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP Reno
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
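The loss reactions described on this slide can be sketched as a single function. This follows the slide's simplified rules; real Reno implementations also inflate cwnd during fast recovery, which is omitted here. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: float      # congestion window, in bytes
    ssthresh: float  # slow start threshold, in bytes


def on_loss(state: CongestionState, mss: float, event: str,
            variant: str = "reno") -> CongestionState:
    """Return the new state after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    if variant not in ("reno", "tahoe"):
        raise ValueError(f"unknown variant: {variant}")
    half = state.cwnd / 2  # ssthresh = 1/2 of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        # Timeout (either variant) or any loss under Tahoe: back to 1 MSS.
        return CongestionState(cwnd=mss, ssthresh=half)
    # Reno with 3 duplicate ACKs: cut cwnd in half, then grow linearly.
    return CongestionState(cwnd=half, ssthresh=half)


if __name__ == "__main__":
    before = CongestionState(cwnd=16_000, ssthresh=64_000)
    print(on_loss(before, mss=1_000, event="3dupacks", variant="reno"))
    print(on_loss(before, mss=1_000, event="3dupacks", variant="tahoe"))
```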
Summary: TCP Congestion Control
[Figure: finite state machine for TCP congestion control. Transition labels recoverable from the diagram:
  new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  cwnd > ssthresh
  new ACK: $\text{cwnd} = \text{cwnd} + \text{MSS} \cdot (\text{MSS}/\text{cwnd})$, dupACKcount = 0, transmit new segment(s) as allowed
  duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed]
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; additively increase window size until loss occurs, then cut window in half.]
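The saw tooth can be reproduced with a few lines of Python. This is a sketch under a strong simplifying assumption: loss is modeled as happening exactly when cwnd reaches a fixed level (`loss_at`), which is a hypothetical parameter, not something the slides specify.

```python
def aimd_trace(mss: float, loss_at: float, start: float, rounds: int) -> list[float]:
    """Return cwnd at the start of each RTT under AIMD."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = cwnd / 2    # multiplicative decrease: cut cwnd in half
        else:
            cwnd = cwnd + mss  # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    # Window measured in segments (mss=1) for readability.
    print(aimd_trace(mss=1, loss_at=8, start=4, rounds=10))
```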
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
  $\text{avg TCP thruput} = \frac{3}{4} \frac{W}{\text{RTT}}$ bytes/sec
[Figure: window size versus time, oscillating between W/2 and W.]
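The 3/4 W figure follows from averaging the saw tooth, assuming the window grows linearly from W/2 to W between consecutive losses:

```latex
\[
\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{\text{window}}}{\text{RTT}}
  = \frac{3}{4}\,\frac{W}{\text{RTT}} \ \text{bytes/sec}
\]
```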
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
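The two numbers in this example can be checked with a short calculation. The function names are hypothetical; the formula is the one quoted on the slide, with MSS converted from bytes to bits.

```python
import math


def window_segments(target_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_prob: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits/sec."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_prob))


def required_loss_prob(target_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Solve the same formula for L given a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


if __name__ == "__main__":
    print(f"W = {window_segments(10e9, 0.1, 1500):.0f} segments")
    print(f"L = {required_loss_prob(10e9, 1500, 0.1):.2e}")
```

Solving for L gives roughly 2.1e-10, which the slide rounds to 2·10^-10.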
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plot for the two connections, with Connection 1 throughput on an axis up to R and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
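The convergence argument can be illustrated with a small simulation of two flows sharing one link. This is a sketch under simplifying assumptions that go beyond the slide: both flows have the same RTT, both see a loss in the same round whenever their combined rate exceeds capacity, and rates are in arbitrary units. The function name and parameters are hypothetical.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float,
                  step: float, rounds: int) -> tuple[float, float]:
    """Simulate two synchronized AIMD flows and return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: each flow halves, so the gap between them halves too.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, so the gap is unchanged.
            x1, x2 = x1 + step, x2 + step
    return x1, x2


if __name__ == "__main__":
    a, b = two_flow_aimd(80.0, 10.0, capacity=100.0, step=1.0, rounds=1000)
    print(f"flow 1: {a:.3f}, flow 2: {b:.3f}")
```

Since additive increase preserves the difference between the two rates and each halving cuts it in two, the rates approach the equal bandwidth share line.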
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
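The arithmetic behind the parallel-connection example is a simple ratio, assuming every TCP connection on the link ends up with an equal share. The function name is hypothetical.

```python
from fractions import Fraction


def share_of_link(existing: int, new_connections: int) -> Fraction:
    """Fraction of link rate R obtained by an app opening new_connections."""
    return Fraction(new_connections, existing + new_connections)


if __name__ == "__main__":
    print(share_of_link(9, 1))   # 1 of 10 connections
    print(share_of_link(9, 11))  # 11 of 20 connections, a bit over half
```

With 11 of 20 connections the new app gets 11/20 of R, which the slide rounds to R/2.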
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
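TCP 3-way handshake: FSM
[Figure: finite state machine for connection setup.
  client: closed to SYN sent, on Socket clientSocket = new Socket(hostname, port number); sends SYN(seq=x)
  server: closed to listen, on Socket connectionSocket = welcomeSocket.accept()
  server: listen to SYN rcvd, on receiving SYN(x); sends SYNACK(seq=y, ACKnum=x+1) and creates new socket for communication back to client
  client: SYN sent to ESTAB, on receiving SYNACK(seq=y, ACKnum=x+1); sends ACK(ACKnum=y+1)
  server: SYN rcvd to ESTAB, on receiving ACK(ACKnum=y+1)]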
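The sequence and acknowledgment numbers exchanged in the handshake can be sketched as plain data. The dictionary layout and function name are illustrative assumptions; only the seq/ACKnum relationships come from the slide. Sequence numbers are 32-bit, so the increments wrap modulo 2^32.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"from": "client", "SYN": 1, "seq": x}
    synack = {"from": "server", "SYN": 1, "ACK": 1, "seq": y,
              "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"from": "client", "ACK": 1,
           "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=100, y=300):
        print(segment)
```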
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
[Figure: client state and server state during connection close.
  client: ESTAB, calls clientSocket.close(), sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
  server: ESTAB, receives FIN, sends ACKbit=1, ACKnum=x+1; enters CLOSE_WAIT (can still send data)
  client: receives ACK; enters FIN_WAIT_2 (wait for server close)
  server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
  client: receives FIN, sends ACKbit=1, ACKnum=y+1; enters TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
  server: receives ACK; enters CLOSED]
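The close sequence can be written down as two small transition tables, one per side. This is a sketch of the client-initiated close only; the event names, and the assumption that the server's own close() triggers its FIN, are illustrative choices rather than details from the slide.

```python
# (state, event) -> (next state, action); action None means nothing is sent.
CLIENT_TRANSITIONS = {
    ("ESTAB", "close()"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIMED_WAIT", "send ACK"),
    ("TIMED_WAIT", "timer expires"): ("CLOSED", None),  # after 2 * max segment lifetime
}

SERVER_TRANSITIONS = {
    ("ESTAB", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close()"): ("LAST_ACK", "send FIN"),  # assumed trigger
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}


def run(transitions: dict, events: list[str], state: str = "ESTAB") -> str:
    """Apply events in order and return the final state."""
    for event in events:
        state, _action = transitions[(state, event)]
    return state


if __name__ == "__main__":
    print(run(CLIENT_TRANSITIONS,
              ["close()", "recv ACK", "recv FIN", "timer expires"]))
    print(run(SERVER_TRANSITIONS, ["recv FIN", "close()", "recv ACK"]))
```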
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-81
Principles of congestion control
i
Principles of congestion control
congestion informally ldquotoo many sources sending too much
d f f k h dl rdquodata too fast for network to handlerdquo different from flow control manifestations lost packets (buffer overflow at routers) long delays (queueing in router buffers)
a top-10 problem
Transport Layer 3-82
Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causes/costs of congestion: scenario 2

one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: $\lambda_{in} = \lambda_{out}$
  transport-layer input includes retransmissions: $\lambda'_{in} \ge \lambda_{in}$

[Figure: Host A sends $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data) into a router with finite shared output link buffers; Host B receives $\lambda_{out}$.]

Causes/costs of congestion: scenario 2

idealization: perfect knowledge
  sender sends only when router buffers available

[Figure: Host A keeps a copy of each packet and sends when the router has free buffer space; finite shared output link buffers; Host B receives. Plot of $\lambda_{out}$ versus $\lambda_{in}$, both axes up to R/2.]

Causes/costs of congestion: scenario 2

Idealization: known loss
  packets can be lost, dropped at router due to full buffers
  sender only resends if packet known to be lost

[Figure: Host A keeps a copy of each packet; a packet that finds no buffer space at the router is dropped, and the copy is resent when there is free buffer space. Plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to R/2.]

when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)

Causes/costs of congestion: scenario 2

Realistic: duplicates
  packets can be lost, dropped at router due to full buffers
  sender times out prematurely, sending two copies, both of which are delivered

[Figure: Host A times out and sends a copy even though the router has free buffer space; Host B receives both. Plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to R/2.]

when sending at R/2, some packets are retransmissions, including duplicates that are delivered!

"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3

four senders
multihop paths
timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
A: as red $\lambda'_{in}$ increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0

[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers; $\lambda_{in}$: original data, $\lambda'_{in}$: original data plus retransmitted data, $\lambda_{out}$: throughput. Plot of $\lambda_{out}$ versus $\lambda'_{in}$, both axes up to C/2.]

another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at

TCP Congestion Control: details

sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion

TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate $\approx \frac{\text{cwnd}}{RTT}$ bytes/sec

[Figure: sender sequence number space, showing last byte ACKed, bytes sent but not yet ACKed ("in-flight"), last byte sent, and the cwnd span.]

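A minimal sketch of the sender-side limit and the rough rate estimate. The function names are made up for illustration, and the receive window is ignored here, as it is on the slide.

```python
# Hypothetical helper names; byte counters follow the slide's LastByteSent / LastByteAcked.

def bytes_allowed(last_byte_sent, last_byte_acked, cwnd):
    """Bytes the sender may still put in flight without exceeding cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, cwnd - in_flight)


def approx_rate(cwnd, rtt):
    """Rough sending rate in bytes/sec: one window of cwnd bytes per RTT."""
    return cwnd / rtt


# 4000 bytes are in flight, so 6000 more fit under a 10000-byte cwnd.
print(bytes_allowed(last_byte_sent=6000, last_byte_acked=2000, cwnd=10000))
# A 10000-byte window over a 100 ms RTT is roughly 100000 bytes/sec.
print(approx_rate(cwnd=10000, rtt=0.1))
```
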
TCP Slow Start

when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline of segments between Host A and Host B, with one RTT marked on the time axis.]

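To see why one increment per ACK doubles the window each RTT, here is a small loss-free simulation. It is a sketch under simplifying assumptions: every segment sent in an RTT is ACKed within that RTT, and cwnd is counted in whole MSS units.

```python
def slow_start(rtts, mss=1):
    """cwnd at the start of each RTT during slow start, assuming no loss."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss   # one ACK comes back per segment sent this RTT
        cwnd += acks * mss   # cwnd grows by one MSS for every ACK received
        history.append(cwnd)
    return history


print(slow_start(4))  # cwnd in MSS over five successive RTTs
```
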
TCP: switching from slow start to CA

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event

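The switch can be sketched as a per-RTT growth rule. The helper names are hypothetical, cwnd is counted in whole MSS units, and the integer halving is a simplification for the example.

```python
def ssthresh_after_loss(cwnd):
    """ssthresh is set to half of cwnd just before the loss event (whole MSS units)."""
    return cwnd // 2


def next_cwnd(cwnd, ssthresh):
    """cwnd after one loss-free RTT: exponential below ssthresh, linear from there on."""
    return cwnd * 2 if cwnd < ssthresh else cwnd + 1


# Loss at cwnd = 16 MSS gives ssthresh = 8; after a timeout cwnd restarts at 1 MSS.
ssthresh = ssthresh_after_loss(16)
cwnd, trace = 1, [1]
for _ in range(6):
    cwnd = next_cwnd(cwnd, ssthresh)
    trace.append(cwnd)
print(trace)
```
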
TCP: detecting, reacting to loss

loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)

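The two reactions can be compared side by side in a short sketch. It follows the simplified description on this slide only; the function name and the event strings are assumptions for the example.

```python
def react_to_loss(cwnd, event, variant="reno"):
    """Return (new_cwnd, new_ssthresh) in MSS after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError(f"unknown loss event: {event}")
    ssthresh = cwnd / 2
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh        # restart from 1 MSS and slow-start up to ssthresh
    return ssthresh, ssthresh     # Reno on 3 duplicate ACKs: cwnd cut in half


print(react_to_loss(12, "timeout"))
print(react_to_loss(12, "3dupacks"))
print(react_to_loss(12, "3dupacks", variant="tahoe"))
```
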
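Summary: TCP Congestion Control

[Figure: finite-state machine for TCP congestion control. Labels recoverable from the diagram:
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: when cwnd > ssthresh
  congestion avoidance, new ACK: $\text{cwnd} = \text{cwnd} + MSS \cdot (MSS/\text{cwnd})$; dupACKcount = 0; transmit new segment(s), as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed]
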
TCP congestion control: additive increase, multiplicative decrease

approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss

[Figure: AIMD saw tooth behavior: probing for bandwidth. Plot of cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs (then cut window in half).]

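The saw tooth can be reproduced with a few lines of simulation. This is a sketch under an artificial assumption: a loss occurs exactly when cwnd reaches a fixed window, `loss_window`, which is a made-up parameter standing in for the usable bandwidth.

```python
def aimd_trace(start, loss_window, rtts):
    """cwnd (in MSS) per RTT under additive increase, multiplicative decrease."""
    cwnd = start
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd >= loss_window:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
        trace.append(cwnd)
    return trace


print(aimd_trace(start=4, loss_window=8, rtts=10))
```
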
TCP throughput

avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT

avg TCP thruput $= \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec

[Figure: saw tooth of window size over time, oscillating between W/2 and W.]

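The 3/4 factor can be checked with one line of arithmetic, assuming the window ramps linearly from W/2 (just after a loss) back up to W (the next loss), so that its time average is the midpoint of the two:

```latex
\[
\text{avg window} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
\]
```
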
TCP Futures: TCP over "long, fat pipes"

example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:

TCP throughput $= \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$

to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed

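Both numbers in the example can be reproduced directly from the formula. A small sketch, with made-up function names, working in bits so the units match the 10 Gbps target:

```python
MSS_BITS = 1500 * 8   # 1500-byte segments
RTT = 0.100           # 100 ms
TARGET_BPS = 10e9     # 10 Gbps


def window_segments(target_bps, rtt, mss_bits):
    """In-flight segments needed to sustain target_bps over one RTT."""
    return target_bps * rtt / mss_bits


def required_loss(target_bps, rtt, mss_bits):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bits / (rtt * target_bps)) ** 2


print(f"W = {window_segments(TARGET_BPS, RTT, MSS_BITS):,.0f} segments")
print(f"L = {required_loss(TARGET_BPS, RTT, MSS_BITS):.2e}")
```

The loss rate comes out slightly above 2e-10, which the slide rounds to $2 \cdot 10^{-10}$.
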
TCP Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?

two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with axes running up to R (horizontal axis: Connection 1 throughput) and the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

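The convergence argument can be tried out numerically. This sketch assumes two idealized flows with the same RTT that both see a loss whenever their combined rate exceeds the link capacity; the function name and the starting rates are arbitrary.

```python
def compete(x1, x2, capacity, steps):
    """Rates of two AIMD flows sharing one bottleneck after `steps` rounds."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both decrease by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1   # congestion avoidance: both add 1
    return x1, x2


x1, x2 = compete(10, 70, capacity=100, steps=300)
print(f"gap between the flows after 300 rounds: {abs(x1 - x2):.4f}")
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the flows drift toward the equal bandwidth share.
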
Fairness (more)

Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss

Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2

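The arithmetic behind the parallel-connection example, as a short sketch. It assumes every TCP connection on the link ends up with an equal share; the function name is made up.

```python
from fractions import Fraction


def share(new_conns, existing_conns):
    """Fraction of link rate R obtained by an app that opens new_conns connections."""
    return Fraction(new_conns, new_conns + existing_conns)


print(share(1, 9))    # one connection among ten
print(share(11, 9))   # eleven connections among twenty
```

With 11 connections the exact share is 11/20 of R, slightly more than the R/2 quoted on the slide.
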
Chapter 3: summary

principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP

next:
  leaving the network "edge" (application, transport layers)
  into the network "core"

Causescosts of congestion scenario 1g
two senders two original data in throughput out
two senders two receivers
one router infinite buffers
unlimited shared output link buffers
Host A
buffers output link capacity R no retransmission
p
Host B
R2R2
out
dela
y maximum per-connection
R2in R2
din
large delays as arrival rate i
Transport Layer 3-83
maximum per-connection throughput R2
large delays as arrival rate in approaches capacity
Causescosts of congestion scenario 2
one router finite buffers d t i i f ti d t k t
g
sender retransmission of timed-out packet application-layer input = application-layer output in = tout transport-layer input includes retransmissions in inlsquo
in original dataoutin original data plus
retransmitted data
Host A
retransmitted data
Transport Layer 3-84
finite shared output link buffersHost B
Causescosts of congestion scenario 2
idealization perfect R2
g
pknowledge
sender sends only when router buffers available
out
router buffers available R2in
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
free buffer spaceA
Transport Layer 3-85
finite shared output link buffersHost B
Causescosts of congestion scenario 2Idealization known loss
packets can be lost
g
packets can be lost dropped at router due to full buffers
d l d f sender only resends if packet known to be lost
in original dataoutin original data plus
retransmitted data
copy
retransmitted data
no buffer spaceA
Transport Layer 3-86
Host B
Causescosts of congestion scenario 2gIdealization known loss
packets can be lost R2
packets can be lost dropped at router due to full buffers
d l d f
out
when sending at R2 some packets are retransmissions but asymptotic goodput
sender only resends if packet known to be lost R2in
asymptotic goodput is still R2 (why)
in original dataoutin original data plus
retransmitted dataretransmitted data
free buffer spaceA
Transport Layer 3-87
Host B
Causescosts of congestion scenario 2
R2Realistic duplicates packets can be lost dropped
g
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f
R2in
including duplicated that are deliveredsending two copies both of
which are delivered
in outincopytimeout
A free buffer space
Transport Layer 3-88
Host B
Causescosts of congestion scenario 2
R2
gRealistic duplicates packets can be lost dropped
out
when sending at R2 some packets are retransmissions including duplicated
packets can be lost dropped at router due to full buffers
sender times out prematurely di i b h f including duplicated
that are delivered
R2in
sending two copies both of which are delivered
ldquocostsrdquo of congestion unneeded retransmissions link carries multiple copies of pkt
Transport Layer 3-89
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λin and λ'in increase?
A: as red λ'in increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[Figure: Hosts A, B, C and D connected over multihop paths through routers with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout.]
Causes/costs of congestion: scenario 3
[Figure: plot of λout versus λ'in, with C/2 marked on both axes.]
another "cost" of congestion:
  when packet dropped, any "upstream" transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control
TCP Congestion Control: details
sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[Figure: sender sequence number space, marking last byte ACKed, bytes sent but not-yet ACKed ("in-flight"), and last byte sent; cwnd spans the in-flight bytes.]
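The window limit and the rough rate estimate on this slide can be sketched in a few lines of Python. The function names and the `nbytes` parameter are hypothetical, introduced only for illustration; real TCP stacks also apply the receiver's flow-control window, which is ignored here.

```python
def can_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, nbytes: int) -> bool:
    """Return True if sending nbytes more keeps in-flight data within cwnd."""
    in_flight = last_byte_sent - last_byte_acked  # bytes sent but not yet ACKed
    return in_flight + nbytes <= cwnd


def approx_rate(cwnd: int, rtt: float) -> float:
    """Rough sending rate in bytes/sec: one window of data per round-trip time."""
    return cwnd / rtt
```

For example, with 3000 bytes in flight and cwnd = 4000 bytes, the sender may transmit at most 1000 more bytes before it has to wait for an ACK.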
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B; the number of segments sent doubles each RTT.]
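A minimal sketch of why incrementing cwnd once per ACK doubles it every RTT. It assumes no loss, one ACK per segment, and that all ACKs for a window arrive within one RTT; the function name `slow_start` is made up for this illustration.

```python
def slow_start(mss: int, rtts: int) -> list:
    """Return cwnd in bytes at the start of each RTT, assuming no loss."""
    cwnd = mss                  # initially cwnd = 1 MSS
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss      # one ACK per segment sent in this RTT
        for _ in range(acks):
            cwnd += mss         # increment cwnd by 1 MSS for every ACK
        history.append(cwnd)
    return history
```

With MSS = 1000 bytes, `slow_start(1000, 4)` yields 1000, 2000, 4000, 8000, 16000: each window of ACKs adds a full window's worth of bytes.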
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate ACKs)
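The loss reactions above can be written as one small function. This is a simplified sketch that follows the slide text only: the names `on_loss`, `event` and `variant` are hypothetical, and details of real implementations (such as the temporary window inflation during fast recovery) are left out.

```python
def on_loss(cwnd: int, mss: int, event: str, variant: str = "reno"):
    """Return (new_cwnd, new_ssthresh) in bytes after a loss event.

    event is "timeout" or "3dupacks"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "3dupacks"):
        raise ValueError("unknown loss event: " + event)
    ssthresh = cwnd // 2            # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, ssthresh        # restart from 1 MSS (slow start)
    return ssthresh, ssthresh       # Reno, 3 dup ACKs: cut cwnd in half
```

After a timeout at cwnd = 16000 bytes the sender restarts at 1 MSS with ssthresh = 8000; after three duplicate ACKs, Reno continues from 8000 and grows linearly.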
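Summary: TCP Congestion Control
[Figure: finite-state machine for TCP congestion control. Labels that survive extraction:
  new ACK (slow start): cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  new ACK (congestion avoidance): cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK (fast recovery): cwnd = cwnd + MSS; transmit new segment(s), as allowed
  cwnd > ssthresh: transition from slow start to congestion avoidance]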
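The two new-ACK updates from the state machine can be sketched as a single handler. The function name `on_new_ack` is hypothetical, and treating `cwnd < ssthresh` as the slow-start test is an assumption about the boundary case.

```python
def on_new_ack(cwnd: float, ssthresh: float, mss: int) -> float:
    """Return the updated cwnd (bytes) after one new ACK arrives."""
    if cwnd < ssthresh:
        return cwnd + mss               # slow start: +1 MSS per ACK
    return cwnd + mss * (mss / cwnd)    # congestion avoidance: about +1 MSS per RTT
```

In congestion avoidance a window of cwnd bytes produces roughly cwnd/MSS ACKs per RTT, so adding MSS * (MSS/cwnd) per ACK sums to about one MSS per RTT.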
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. Vertical axis: cwnd, TCP sender congestion window size; horizontal axis: time. Additively increase window size until loss occurs, then cut window in half.]
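The sawtooth can be reproduced with a toy model. It assumes, purely for illustration, that a loss occurs whenever cwnd exceeds a fixed `capacity` (in MSS units); the function name and parameters are not from the slides.

```python
def aimd(capacity: int, start: int, rtts: int) -> list:
    """Return cwnd (in MSS units) observed at each RTT under AIMD."""
    cwnd = start
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:             # assumed loss signal
            cwnd = max(cwnd // 2, 1)    # multiplicative decrease: halve
        else:
            cwnd += 1                   # additive increase: +1 MSS per RTT
    return trace
```

`aimd(8, 6, 8)` climbs 6, 7, 8, 9, drops to 4 after the loss, and starts climbing again.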
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W.]
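One way to see where the 3/4 factor comes from, assuming the idealized sawtooth in which the window climbs linearly from W/2 to W between losses (the symbol $\overline{W}$ for the average window is introduced here only for the derivation):

\[
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{avg TCP thruput} = \frac{\overline{W}}{RTT} = \frac{3}{4}\,\frac{W}{RTT}\ \text{bytes/sec}
\]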
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate
new versions of TCP for high-speed
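The two numbers in this example can be checked with a short calculation. The function names are hypothetical; the second one simply inverts the throughput formula quoted above for L.

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps over a path with this RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2
```

For 10 Gbps, 100 ms and 1500-byte segments this gives about 83,333 segments and a loss probability of roughly 2.1e-10, in line with the figures on the slide.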
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with R marked on the axes (horizontal axis: Connection 1 throughput) and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
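The convergence argument can be illustrated with a toy simulation of two flows. It assumes both flows have the same RTT and both detect loss in the same round whenever their combined rate exceeds the link capacity; the function name and parameters are illustrative only.

```python
def two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Simulate two synchronized AIMD flows; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:            # link overloaded: both see loss
            x1, x2 = x1 / 2, x2 / 2       # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1       # additive increase (slope of 1)
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates drift toward the equal-share line regardless of where they start.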
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
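The arithmetic behind this example, assuming every TCP connection on the link ends up with an equal share of R (the function name `app_share` is made up for this sketch):

```python
from fractions import Fraction


def app_share(existing: int, new: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens `new` connections."""
    return Fraction(new, existing + new)
```

With 9 existing connections, one new connection gets 1/10 of R, while 11 new connections get 11/20 of R, which the slide rounds to R/2.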
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
  application-layer input = application-layer output: λin = λout
  transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) through a router with finite shared output link buffers to Host B, which receives λout.]

Causes/costs of congestion: scenario 2
idealization: perfect knowledge
  sender sends only when router buffers available
[Figure: same topology, with a copy held at the sender and free buffer space at the router. Plot: λout versus λin, with R/2 marked on both axes.]
Causes/costs of congestion: scenario 2
Idealization: known loss
  packets can be lost (dropped at the router due to full buffers)
  sender only resends a packet if it is known to be lost
  when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)
[Figure: Host A, router with free buffer space, Host B. Labels: λ_in (original data), λ'_in (original data plus retransmitted data), λ_out. Plot axes marked R/2.]
Causes/costs of congestion: scenario 2
Realistic: duplicates
  packets can be lost (dropped at the router due to full buffers)
  sender times out prematurely, sending two copies, both of which are delivered
  when sending at R/2, some packets are retransmissions, including duplicates that are delivered
"costs" of congestion:
  unneeded retransmissions: link carries multiple copies of a packet
[Figure: Host A, router with free buffer space, Host B. Labels: λ_in, λ'_in, λ_out, copy, timeout. Plot axes marked R/2.]
Causes/costs of congestion: scenario 3
  four senders
  multihop paths
  timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
A: as red λ'_in increases, all arriving blue pkts at the upper queue are dropped, and blue throughput goes to 0
[Figure: Hosts A, B, C, D; finite shared output link buffers. Labels: λ_in (original data), λ'_in (original data plus retransmitted data), λ_out.]
Causes/costs of congestion: scenario 3
[Figure: plot of λ_out versus λ'_in, axes marked C/2.]
another "cost" of congestion:
  when a packet is dropped, any upstream transmission capacity used for that packet was wasted
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control
TCP Congestion Control: details
sender limits transmission:
  LastByteSent - LastByteAcked ≤ cwnd
cwnd is dynamic, a function of perceived network congestion
TCP sending rate:
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \frac{cwnd}{RTT}$ bytes/sec
[Figure: sender sequence number space. Labels: last byte ACKed; sent, not-yet ACKed ("in-flight"); last byte sent; cwnd.]
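The window limit and the rate approximation above can be sketched in a few lines. This is a minimal illustration only: the function names and the 1460-byte segment size are assumptions for the example, not part of the slides, and receiver flow control is ignored.

```python
def bytes_in_flight(last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes that have been sent but not yet acknowledged."""
    return last_byte_sent - last_byte_acked


def may_send(last_byte_sent: int, last_byte_acked: int, cwnd: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps the in-flight data within cwnd."""
    return bytes_in_flight(last_byte_sent, last_byte_acked) + nbytes <= cwnd


def approx_rate(cwnd_bytes: int, rtt_s: float) -> float:
    """Approximate sending rate in bytes/sec: about one cwnd per RTT."""
    return cwnd_bytes / rtt_s


if __name__ == "__main__":
    cwnd = 10 * 1460  # hypothetical window: ten 1460-byte segments
    print(may_send(20_000, 10_000, cwnd, 1460))
    print(approx_rate(cwnd, 0.1))
```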
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, one RTT per round; vertical axis: time.]
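To see why a per-ACK increment doubles cwnd every RTT, here is a small sketch. It assumes cwnd is counted in whole segments (MSS units), that every segment sent in a round is ACKed, and that no loss occurs; the function name is illustrative.

```python
def slow_start_rounds(rounds: int) -> list[int]:
    """Return cwnd (in MSS) at the start of each RTT round, assuming no loss."""
    cwnd = 1  # initially cwnd = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        # One ACK arrives for each segment sent this round;
        # each ACK increments cwnd by 1 MSS.
        for _ack in range(cwnd):
            cwnd += 1
    return history


if __name__ == "__main__":
    print(slow_start_rounds(5))
```

Each round sends cwnd segments and receives cwnd ACKs, so the window grows by its own size: 1, 2, 4, 8, 16.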
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
TCP: detecting, reacting to loss
loss indicated by timeout:
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
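The loss reactions above, together with the ssthresh rule, can be written as one small function. This is a slide-level sketch, not a full implementation: it works in MSS units, the names `on_loss`, `event`, and `variant` are made up for the example, and fast-recovery details are left out.

```python
def on_loss(cwnd: float, event: str, variant: str = "reno") -> tuple[float, float]:
    """Return (new_cwnd, new_ssthresh) in MSS after a loss event.

    event:   "timeout" or "3dupack"
    variant: "tahoe" or "reno"
    """
    if event not in ("timeout", "3dupack"):
        raise ValueError(f"unknown loss event: {event!r}")
    if variant not in ("tahoe", "reno"):
        raise ValueError(f"unknown variant: {variant!r}")

    ssthresh = cwnd / 2  # half of cwnd just before the loss event

    if event == "timeout" or variant == "tahoe":
        # Timeout (either variant), or any loss under Tahoe: restart from 1 MSS.
        return 1.0, ssthresh
    # Reno with 3 duplicate ACKs: cut the window in half, then grow linearly.
    return cwnd / 2, ssthresh


if __name__ == "__main__":
    print(on_loss(16, "timeout"))
    print(on_loss(16, "3dupack", "reno"))
    print(on_loss(16, "3dupack", "tahoe"))
```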
Summary: TCP Congestion Control
[Figure: state diagram of TCP congestion control. Recoverable transition labels:]
  slow start, new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
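The two new-ACK updates that survive from the diagram can be sketched as below. The 1460-byte MSS and the function name are assumptions for illustration, and the switch point follows the "cwnd > ssthresh" label; loss and duplicate-ACK transitions are not modeled here.

```python
MSS = 1460  # bytes; assumed segment size for illustration


def on_new_ack(cwnd: float, ssthresh: float) -> float:
    """Return the updated cwnd (bytes) after one new ACK."""
    if cwnd <= ssthresh:
        # Slow start: one full MSS per ACK (doubles cwnd each RTT).
        return cwnd + MSS
    # Congestion avoidance: MSS * (MSS / cwnd) per ACK,
    # which adds up to about one MSS per RTT.
    return cwnd + MSS * (MSS / cwnd)


if __name__ == "__main__":
    ssthresh = 8 * MSS
    print(on_new_ack(1 * MSS, ssthresh))   # slow-start step
    print(on_new_ack(10 * MSS, ssthresh))  # congestion-avoidance step
```

In congestion avoidance a window of cwnd bytes produces about cwnd/MSS ACKs per RTT, so the per-ACK increments sum to roughly one MSS per RTT.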
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
AIMD saw tooth behavior: probing for bandwidth
[Figure: cwnd (TCP sender congestion window size) versus time; window size is additively increased until loss occurs, then cut in half.]
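A toy trace makes the saw tooth visible. The sketch assumes cwnd is counted in MSS, one update per RTT, and a hypothetical fixed `capacity` above which a loss is detected; none of these names come from the slides.

```python
def aimd_trace(capacity: int, rounds: int, start: int = 1) -> list[int]:
    """Return cwnd (in MSS) per RTT; a loss is assumed whenever cwnd exceeds capacity."""
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: cut in half
        else:
            cwnd += 1                 # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(capacity=8, rounds=12, start=5))
```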
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
$\text{avg TCP thruput} = \frac{3}{4} \frac{W}{RTT}$ bytes/sec
[Figure: saw tooth of window size oscillating between W/2 and W.]
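The 3/4 factor is the mean of a window that climbs linearly from W/2 to W. A quick numerical check, under the assumption that the window takes each integer value from W/2 to W once per cycle (function names are illustrative):

```python
def sawtooth_mean(w: int) -> float:
    """Mean window over one saw-tooth cycle from w/2 up to w (w even, in MSS)."""
    samples = range(w // 2, w + 1)
    return sum(samples) / len(samples)


def avg_throughput(w_bytes: float, rtt_s: float) -> float:
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


if __name__ == "__main__":
    print(sawtooth_mean(100))            # mean window for W = 100
    print(avg_throughput(100_000, 0.1))  # W = 100 kB, RTT = 100 ms
```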
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  $\text{TCP throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
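The two numbers in this example can be reproduced directly. The helper names below are illustrative; the only inputs are the slide's values (1500-byte segments, 100 ms RTT, 10 Gbps) and the throughput formula with its 1.22 constant.

```python
import math


def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput in bits/sec: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Loss probability L that yields rate_bps, from inverting the formula."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))  # about 83,333 segments
    print(required_loss(10e9, 0.1, 1500))             # about 2e-10
```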
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R, horizontal axis "Connection 1 throughput"; line marked "equal bandwidth share"; trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
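The convergence argument can be checked with a toy model of two flows. The sketch assumes both flows have the same RTT, both add 1 unit per round, and both see a loss in the same round whenever their combined rate exceeds the link capacity; these simplifications and the function name are assumptions, not part of the slides.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Evolve two AIMD flows sharing one link; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, so the gap between them halves too.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, so the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    a, b = two_flow_aimd(1.0, 80.0, capacity=100.0, rounds=1000)
    print(a, b, abs(a - b))
```

Additive increase keeps the difference between the two rates constant while each loss halves it, so the rates drift toward the equal-share line.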
Fairness (more)
Fairness and UDP:
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections:
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets about R/2 (11 of 20 connections)
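The arithmetic behind the parallel-connection example, assuming the link rate is split equally among all connections (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing: int, new_conns: int) -> Fraction:
    """Fraction of link rate R obtained by an app that opens new_conns
    connections alongside `existing` others, with equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)


if __name__ == "__main__":
    print(app_share(9, 1))   # one connection among ten
    print(app_share(9, 11))  # eleven connections among twenty
```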
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet:
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Causescosts of congestion scenario 3g
C2ou
t o
C2inrsquo
another ldquocostrdquo of congestion when packet dropped any ldquoupstream when packet dropped any upstream
transmission capacity used for that packet was wasted
Transport Layer 3-91
Approaches towards congestion controlApproaches towards congestion control
two broad approaches towards congestion controltwo broad approaches towards congestion control
end end congestion network assisted end-end congestion control
no explicit feedback
network-assisted congestion control
routers provide no explicit feedback from network
congestion inferred f d
routers provide feedback to end systems single bit indicating
(SNA from end-system observed loss delay
approach taken by
congestion (SNA DECbit TCPIP ECN ATM) approach taken by
TCP)
explicit rate for sender to send at
Transport Layer 3-92
Chapter 3 outlineChapter 3 outline
3 1 l 3 5 i i d 31 transport-layer services
3 2 m lti l i d
35 connection-oriented transport TCP segment structure32 multiplexing and
demultiplexing3 3 connectionless
segment structure reliable data transfer flow control33 connectionless
transport UDP3 4 principles of reliable
flow control connection management
36 principles of congestion 34 principles of reliable data transfer
p p gcontrol
37 TCP congestion controlg
Transport Layer 3-93
TCP Congestion Control detailsg
TCP sending ratedsender sequence number space TCP sending rate
roughly send cwnd bytes wait RTT for
cwnd
bytes wait RTT for ACKS then send more bytes
last byteACKed sent not-
yet ACKed
last byte sent
sender limits transmission
y(ldquoin-flightrdquo)
rate ~~cwndRTT
bytessec
LastByteSent-LastByteAcked
lt cwnd
cwnd is dynamic function of perceived network
tiTransport Layer 3-94
congestion
TCP Slow Start TCP Slow Start
when connection begins Host A Host B
when connection begins increase rate exponentially until first l Tloss event initially cwnd = 1 MSS
d bl d RTT
RTT
double cwnd every RTT done by incrementing cwnd for every ACK yreceived
summary initial rate is slow but ramps up exponentially fast time
Transport Layer 3-95
TCP switching from slow start to CAQ when should the
exponential
TCP switching from slow start to CA
exponential increase switch to linear
A when cwnd gets to 12 of its value before timeoutbefore timeout
Implementationp e e tat o variable ssthresh on loss event ssthresh
is set to 12 of cwnd just before loss event
Transport Layer 3-96
TCP detecting reacting to lossTCP detecting reacting to loss
loss indicated by timeout loss indicated by timeout cwnd set to 1 MSS
i d h i ll ( i l ) window then grows exponentially (as in slow start) to threshold then grows linearly
l i di d b 3 d li ACK TCP RENO loss indicated by 3 duplicate ACKs TCP RENO dup ACKs indicate network capable of delivering
some segments cwnd is cut in half window then grows linearlyC ( 3 TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)
Transport Layer 3-97
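Summary: TCP Congestion Control
[Figure: finite-state machine for TCP congestion control (slow start, congestion avoidance, fast recovery). Recoverable transition labels:
  slow start, new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  slow start to congestion avoidance: cwnd > ssthresh
  congestion avoidance, new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  fast recovery, duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed]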
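The congestion-avoidance update cwnd = cwnd + MSS · (MSS/cwnd) is applied once per ACK, so over one RTT (about cwnd/MSS ACKs) the window grows by roughly one MSS. The sketch below checks this numerically; the function name and the MSS value are assumptions made for the example.

```python
MSS = 1460.0  # assumed segment size in bytes, for illustration only


def congestion_avoidance_rtt(cwnd: float, mss: float = MSS) -> float:
    """Apply the per-ACK congestion-avoidance update for one RTT worth of ACKs."""
    acks = int(cwnd // mss)            # one ACK per segment in the current window
    for _ in range(acks):
        cwnd += mss * (mss / cwnd)     # per-ACK additive increase
    return cwnd


start = 10 * MSS
growth = congestion_avoidance_rtt(start) - start
print(growth / MSS)  # slightly below 1: about one MSS per RTT
```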
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) plotted against time; additively increase window size until loss occurs, then cut window in half]
Transport Layer 3-99
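The saw tooth can be reproduced with a toy simulation. The sketch assumes, purely for illustration, that a loss is detected whenever cwnd reaches a fixed `capacity`; cwnd is counted in MSS units and the function name `aimd` is hypothetical.

```python
def aimd(rounds: int, capacity: int, cwnd: int = 1) -> list:
    """Return cwnd (in MSS) at each RTT under additive increase, multiplicative decrease."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= capacity:          # assumed loss condition for this sketch
            cwnd = max(cwnd // 2, 1)  # multiplicative decrease: cut in half
        else:
            cwnd += 1                 # additive increase: 1 MSS per RTT
    return trace


print(aimd(12, capacity=8, cwnd=4))  # [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```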
TCP throughput
avg. TCP thruput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. thruput is 3/4 W per RTT
avg TCP thruput = (3/4) · W / RTT bytes/sec
[Figure: window size oscillating between W/2 and W]
Transport Layer 3-100
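The 3/4 factor follows from averaging a window that ramps linearly from W/2 up to W. A small numeric check, with hypothetical function names and the window counted in whole units:

```python
def sawtooth_mean(w: int) -> float:
    """Mean window over one saw tooth that ramps linearly from w/2 up to w."""
    ramp = range(w // 2, w + 1)
    return sum(ramp) / len(ramp)


def avg_throughput(w_bytes: float, rtt_s: float) -> float:
    """Average throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


print(sawtooth_mean(100))  # 75.0, that is 3/4 of W
```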
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
  TCP throughput = 1.22 · MSS / (RTT · √L)
to achieve 10 Gbps throughput, need a loss rate of L = 2 · 10^-10, a very small loss rate!
new versions of TCP for high-speed
Transport Layer 3-101
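Both numbers on this slide can be re-derived from the stated example (1500-byte segments, 100 ms RTT, 10 Gbps). The sketch below uses hypothetical function names and expresses throughput in bits/sec.

```python
import math


def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to fill the pipe: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS converted to bits."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve the formula above for L at a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(round(required_window_segments(10e9, 0.1, 1500)))  # 83333
print(required_loss(10e9, 0.1, 1500))                    # about 2.1e-10
```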
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Transport Layer 3-102
Why is TCP fair?
two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with axes from 0 to R, horizontal axis labeled Connection 1 throughput, and a line marking the equal bandwidth share; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Transport Layer 3-103
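The convergence argument can be checked with a toy model of two AIMD flows. The sketch assumes both flows see every loss at the same time and have equal RTTs, which is an idealization; the function name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x1: float, x2: float, capacity: float, rounds: int) -> tuple:
    """Evolve two flows' rates under synchronized AIMD on a shared bottleneck."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both decrease by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1   # congestion avoidance: both add the same amount
    return x1, x2


# Start far from the fair share: 80 vs 10 on a link of capacity 100.
a, b = two_flow_aimd(80.0, 10.0, 100.0, 1000)
print(abs(a - b) < 1e-3)  # True
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so the rates drift toward the equal-share line.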
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
  application can open multiple parallel connections between two hosts
  web browsers do this
  e.g., link of rate R with 9 existing connections:
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2
Transport Layer 3-104
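The parallel-connection example is simple arithmetic under the assumption that every TCP connection receives an equal share of the link. A sketch with a hypothetical helper `app_share`, returning the fraction of R the new application obtains:

```python
from fractions import Fraction


def app_share(new_conns: int, existing_conns: int) -> Fraction:
    """Fraction of link rate R taken by an app opening new_conns connections."""
    return Fraction(new_conns, existing_conns + new_conns)


print(app_share(1, 9))   # 1/10
print(app_share(11, 9))  # 11/20
```

With 11 connections the exact share is 11/20 of R, which the slide rounds to R/2.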
Chapter 3: summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation, implementation in the Internet
  UDP
  TCP
next:
  leaving the network "edge" (application, transport layers)
  into the network "core"
Transport Layer 3-105
Approaches towards congestion control
two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate for sender to send at
Transport Layer 3-92
Summary TCP Congestion Controly g
cwnd = cwnd + MSS (MSScwnd)new ACKduplicate ACK
NewACK
NewACK
cwnd gt ssthresh
cwnd cwnd + MSS (MSScwnd)dupACKcount = 0
transmit new segment(s) as allowedcwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed
cwnd = cwnd + MSStransmit new segment(s) as allowed
p
TCP congestion control: additive increase, multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD sawtooth behavior, probing for bandwidth. cwnd (TCP sender congestion window size) plotted against time: additively increase window size ... until loss occurs (then cut window in half)]
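The sawtooth described above can be reproduced with a per-RTT sketch. It assumes a fixed, hypothetical capacity (in MSS) above which a loss is always detected, and it starts directly in additive increase; neither assumption comes from the slides.

```python
def aimd_trace(capacity, rtts, cwnd=1):
    """Return the congestion window (in MSS) at the start of each RTT.

    Loss is assumed to be detected in any RTT where cwnd exceeds `capacity`.
    """
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return trace


trace = aimd_trace(capacity=16, rtts=40)
print(trace)
```

After the first loss the window oscillates between roughly half the peak and the peak, which is the sawtooth shape in the figure.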
TCP throughput
avg. TCP throughput as function of window size, RTT?
  ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
  avg. window size (# in-flight bytes) is 3/4 W
  avg. throughput is 3/4 W per RTT
$\text{avg TCP throughput} = \frac{3}{4} \cdot \frac{W}{RTT}$ bytes/sec
[Figure: sawtooth of window size oscillating between W/2 and W]
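The 3/4 factor follows from averaging a window that ramps linearly from W/2 to W. The sketch below checks that numerically and evaluates the throughput formula; the function names and sample values are illustrative assumptions.

```python
def sawtooth_mean(w, steps=1000):
    """Mean of a window that ramps linearly from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_throughput(w_bytes, rtt_s):
    """Average TCP throughput in bytes/sec: (3/4) * W / RTT."""
    return 0.75 * w_bytes / rtt_s


w = 100_000   # bytes in flight when loss occurs (example value)
rtt = 0.1     # seconds (example value)
print(sawtooth_mean(w), avg_throughput(w, rtt))
```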
TCP Futures: TCP over "long, fat pipes"
example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
new versions of TCP for high-speed
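The numbers in the example above can be reproduced directly from the throughput formula. This is a minimal sketch; the function names are hypothetical, and MSS is converted from bytes to bits so that throughput is in bits per second.

```python
import math


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (8 * mss_bytes)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * 8 * mss_bytes / (rtt_s * math.sqrt(loss))


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert the formula above to get the loss probability L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * throughput_bps)) ** 2


segments = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(round(segments), loss)
```

The computed loss probability is about 2.1 · 10^-10, which the slide rounds to 2 · 10^-10.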