Operating Systems and Networks Network Lecture 8: Transport Layer Adrian Perrig Network Security Group ETH Zürich
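Where we are in the Course
• Starting the Transport Layer!
  – Builds on the network layer to deliver data across networks for applications with the desired reliability or quality
[Figure: protocol stack, bottom to top: Physical, Link, Network, Transport, Application]

Recall
• Transport layer provides end-to-end connectivity across the network
[Figure: two hosts, each running app over TCP over IP, one attached via 802.11 and the other via Ethernet, joined by a router that runs only IP over 802.11 and Ethernet]

Recall (2)
• Segments carry application data across the network
• Segments are carried within packets within frames
[Figure: nested headers 802.11 | IP | TCP | App (e.g., HTTP); the TCP part is the segment, the IP part the packet, the 802.11 part the frame]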
Transport Layer Services
• Provide different kinds of data delivery across the network to applications

              Unreliable         Reliable
Messages      Datagrams (UDP)
Bytestream                       Streams (TCP)

Comparison of Internet Transports
• TCP is full-featured, UDP is a glorified packet

TCP (Streams)                                      UDP (Datagrams)
Connections                                        Datagrams
Bytes are delivered once, reliably, and in order   Messages may be lost, reordered, duplicated
Arbitrary length content                           Limited message size
Flow control matches sender to receiver            Can send regardless of receiver state
Congestion control matches sender to network       Can send regardless of network state
Socket API
• Simple abstraction to use the network
  – The “network” API (really Transport service) used to write all Internet apps
  – Part of all major OSes and languages; originally Berkeley (Unix) ~1983
• Supports both Internet transport services (Streams and Datagrams)

Socket API (3)
• Same API used for Streams and Datagrams

Primitive        Meaning
SOCKET           Create a new communication endpoint
BIND             Associate a local address (port) with a socket
LISTEN           Announce willingness to accept connections
ACCEPT           Passively establish an incoming connection
CONNECT          Actively attempt to establish a connection
SEND(TO)         Send some data over the socket
RECEIVE(FROM)    Receive some data over the socket
CLOSE            Release the socket

• The connection primitives (LISTEN, ACCEPT, CONNECT) are only needed for Streams
• The To/From forms of SEND and RECEIVE are for Datagrams
Ports
• Application process is identified by the tuple IP address, protocol, and port
  – Ports are 16-bit integers representing local “mailboxes” that a process leases
• Servers often bind to “well-known ports”
  – <1024, require administrative privileges
• Clients often assigned “ephemeral” ports
  – Chosen by OS, used temporarily

Some Well-Known Ports

Port     Protocol   Use
20, 21   FTP        File transfer
22       SSH        Remote login, replacement for Telnet
25       SMTP       Email
80       HTTP       World Wide Web
110      POP-3      Remote email access
143      IMAP       Remote email access
443      HTTPS      Secure Web (HTTP over SSL/TLS)
554      RTSP       Media player control
631      IPP        Printer sharing
Topics
• Service models
  – Socket API and ports
  – Datagrams, Streams
• User Datagram Protocol (UDP)
• Connections (TCP)
• Sliding Window (TCP)
• Flow control (TCP)
• Retransmission timers (TCP)
• Congestion control (TCP)
(This time: everything up to retransmission timers. Later: congestion control.)
User Datagram Protocol (UDP) (§6.4)
• Sending messages with UDP
  – A shim layer on packets
[Cartoon: a host tells the network, “I just want to send a packet!”]

User Datagram Protocol (UDP)
• Used by apps that don’t want reliability or bytestreams
  – Voice-over-IP (unreliable)
  – DNS, RPC (message-oriented)
  – DHCP (bootstrapping)
(If an application wants reliability and messages then it has work to do!)
Datagram Sockets (2)
[Figure: time sequence between Client (host 1) and Server (host 2); * = call blocks]

Step   Client (host 1)        Server (host 2)
1      socket                 socket
2                             bind
3                             recvfrom*
4      sendto (request)
5      recvfrom*
6                             sendto (reply)
7      close                  close
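The call sequence above can be tried on a single machine. The following is a minimal sketch, assuming Python's standard `socket` module and the loopback address; the function name `udp_request_reply` and the upper-casing "service" are made up for illustration. To stay single-threaded, the server's `recvfrom` is called after the client's `sendto` (the datagram simply waits in the server's queue), whereas in the figure the server is already blocked in `recvfrom` when the request arrives.

```python
import socket


def udp_request_reply(request: bytes = b"ping") -> bytes:
    """Run one UDP request/reply exchange on the loopback interface."""
    # Server: 1: socket, 2: bind (port 0 lets the OS pick a free port).
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server_addr = server.getsockname()

    # Client: 1: socket (no bind; the OS assigns an ephemeral port on first send).
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # Timeouts so a lost datagram raises an error instead of blocking forever.
    server.settimeout(2.0)
    client.settimeout(2.0)
    try:
        # 4: client sendto (request).
        client.sendto(request, server_addr)
        # 3: server recvfrom returns the message and the sender's address.
        data, client_addr = server.recvfrom(2048)
        # 6: server sendto (reply); here the reply is the request in upper case.
        server.sendto(data.upper(), client_addr)
        # 5: client recvfrom returns the reply.
        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        # 7: close on both sides.
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_request_reply(b"hello"))
```

Note that neither side calls LISTEN, ACCEPT, or CONNECT: each datagram carries its destination address explicitly.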
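UDP Buffering
[Figure: several application processes, each attached to a port; below them a message queue per port and a port mux/demux stage that exchanges packets with the network (IP) layer. Layer labels in the figure: Application, Transport (TCP), Network (IP).]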
UDP Header
• Uses ports to identify sending and receiving application processes
• Datagram length up to 64K
• Checksum (16 bits) for reliability
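As an illustration of these fields, here is a minimal sketch that packs and parses a UDP header with Python's `struct` module. It assumes the standard 8-byte layout of four 16-bit fields in network byte order (source port, destination port, length, checksum); the helper names `build_udp_datagram` and `parse_udp_datagram` are hypothetical, and the checksum is left at zero here.

```python
import struct

# Four 16-bit big-endian fields: source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes,
                       checksum: int = 0) -> bytes:
    """Prepend a UDP header to payload; length counts header plus data."""
    length = UDP_HEADER.size + len(payload)
    if length > 0xFFFF:
        raise ValueError("datagram does not fit the 16-bit length field")
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes) -> dict:
    """Split a datagram into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }


if __name__ == "__main__":
    print(parse_udp_datagram(build_udp_datagram(5000, 53, b"hi")))
```

The 16-bit length field is what limits a datagram to 64K bytes, header included.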
UDP Pseudoheader
• Optional checksum covers UDP segment and IP pseudoheader
  – Checks key IP fields (addresses)
  – Value of zero means “no checksum”
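The checksum computation can be sketched as follows, assuming IPv4, where the pseudoheader consists of the source address, destination address, a zero byte, the protocol number (17 for UDP), and the UDP length. The function names are illustrative, and the segment passed in is assumed to have its checksum field set to zero.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum over the IPv4 pseudoheader plus the UDP header and data."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, len(udp_segment)))
    checksum = ~ones_complement_sum16(pseudo + udp_segment) & 0xFFFF
    # Zero is reserved for "no checksum", so a computed zero is sent as 0xFFFF.
    return checksum or 0xFFFF


if __name__ == "__main__":
    segment = struct.pack("!HHHH", 5000, 53, 10, 0) + b"hi"
    print(hex(udp_checksum("10.0.0.1", "10.0.0.2", segment)))
```

A receiver can verify a datagram by summing the pseudoheader and the segment including the transmitted checksum; a result of 0xFFFF means no error was detected.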
Connection Establishment (6.5.5, 6.5.7, 6.2.2)
• How to set up connections
  – We’ll see how TCP does it
[Cartoon: hosts calling “SYN!”, “SYN ACK!”, and “ACK!” to each other across the network]

Connection Establishment
• Both sender and receiver must be ready before we start the transfer of data
  – Need to agree on a set of parameters
  – e.g., the Maximum Segment Size (MSS)
• This is signaling
  – It sets up state at the endpoints
  – Like “dialing” for a telephone call
Three-Way Handshake
• Used in TCP; opens connection for data in both directions
• Each side probes the other with a fresh Initial Sequence Number (ISN)
  – Sends on a SYNchronize segment
  – Echo on an ACKnowledge segment
• Chosen to be robust even against delayed duplicates
[Figure: time-sequence diagram between the active party (client) and the passive party (server)]
Three-Way Handshake (2)
• Three steps:
  – Client sends SYN(x)
  – Server replies with SYN(y) ACK(x+1)
  – Client replies with ACK(y+1)
  – SYNs are retransmitted if lost
• Sequence and ack numbers carried on further segments
[Figure: time-sequence diagram between the active party (client) and the passive party (server), with the three segments numbered 1, 2, 3]
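The numbering in the three steps can be traced with a small simulation. This is a sketch only: the `Segment` class and `three_way_handshake` function are hypothetical names, no packets are sent, and retransmission of lost SYNs is not modeled. It assumes 32-bit sequence numbers, as TCP uses, so x+1 and y+1 wrap modulo 2^32.

```python
import random
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_no: int = 0


def three_way_handshake(x=None, y=None):
    """Return the three handshake segments for client ISN x and server ISN y."""
    x = random.randrange(MOD) if x is None else x
    y = random.randrange(MOD) if y is None else y

    # Step 1: client sends SYN(x).
    syn = Segment(syn=True, seq=x)

    # Step 2: server replies with SYN(y) ACK(x+1).
    syn_ack = Segment(syn=True, ack=True, seq=y, ack_no=(syn.seq + 1) % MOD)

    # The client accepts the reply only if it echoes the ISN it just chose.
    if syn_ack.ack_no != (x + 1) % MOD:
        raise ValueError("SYN+ACK does not acknowledge our SYN")

    # Step 3: client replies with ACK(y+1).
    ack = Segment(ack=True, seq=(x + 1) % MOD, ack_no=(syn_ack.seq + 1) % MOD)

    # The server accepts the final ACK only if it echoes the server's ISN.
    if ack.ack_no != (y + 1) % MOD:
        raise ValueError("ACK does not acknowledge the server's SYN")
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(x=100, y=300):
        print(segment)
```

The two checks are what make delayed duplicates harmless: an old SYN or ACK carries a stale number that does not match the fresh ISN.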
Three-Way Handshake (3)
• Suppose delayed, duplicate copies of the SYN and ACK arrive at the server!
  – Improbable, but anyhow…
[Figure: time-sequence diagram between the active party (client) and the passive party (server)]

Three-Way Handshake (4)
• Suppose delayed, duplicate copies of the SYN and ACK arrive at the server!
  – Improbable, but anyhow…
• Connection will be cleanly rejected on both sides
[Figure: same diagram; both duplicates are crossed out and each is answered with REJECT]
TCP Connection State Machine
• Captures the states (rectangles) and transitions (arrows)
  – A/B means event A triggers the transition, with action B
• Both parties run instances of this state machine
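TCP Connections (4)
• Again, with states…
[Figure: handshake time-sequence diagram with states. Active party (client): CLOSED → SYN_SENT (after sending segment 1) → ESTABLISHED (after receiving segment 2). Passive party (server): CLOSED → LISTEN → SYN_RCVD (after receiving segment 1) → ESTABLISHED (after receiving segment 3).]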
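The connection-setup part of the state machine can be written as a transition table in the A/B (event/action) style. The sketch below covers only the states named in the figure; the event and action strings are illustrative labels, not an API, and closing, resets, and simultaneous open are left out.

```python
# (state, event) -> (action, next state); None means "no segment is sent".
TRANSITIONS = {
    ("CLOSED", "active open"): ("send SYN", "SYN_SENT"),
    ("CLOSED", "passive open"): (None, "LISTEN"),
    ("LISTEN", "recv SYN"): ("send SYN+ACK", "SYN_RCVD"),
    ("SYN_SENT", "recv SYN+ACK"): ("send ACK", "ESTABLISHED"),
    ("SYN_RCVD", "recv ACK"): (None, "ESTABLISHED"),
}


def run(state, events):
    """Feed events to the state machine and return the list of states visited."""
    visited = [state]
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not handled in state {state}")
        _action, state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


if __name__ == "__main__":
    # Both parties run an instance of the same machine.
    print("client:", run("CLOSED", ["active open", "recv SYN+ACK"]))
    print("server:", run("CLOSED", ["passive open", "recv SYN", "recv ACK"]))
```

Writing the table out makes unhandled (state, event) pairs explicit, which is the point of specifying a protocol as a finite state machine.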
TCP Connections (5)
• Finite state machines are a useful tool to specify and check the handling of all cases that may occur
• TCP allows for simultaneous open
  – i.e., both sides open at once instead of the client-server pattern
  – Try at home to confirm it works

Connection Release (6.5.6-6.5.7, 6.2.3)
• How to release connections
  – We’ll see how TCP does it
[Cartoon: hosts calling “FIN!” to each other across the network]

Connection Release
• Orderly release by both parties when done
  – Delivers all pending data and “hangs up”
  – Cleans up state in sender and receiver
• Key problem is to provide reliability while releasing
  – TCP uses a “symmetric” close in which both sides shut down independently
TCP Connection Release
• Two steps:
  – Active party sends FIN(x), passive party sends ACK
  – Passive party sends FIN(y), active party sends ACK
  – FINs are retransmitted if lost
• Each FIN/ACK closes one direction of data transfer
[Figure: time-sequence diagram between the active party and the passive party, with the two FIN/ACK exchanges numbered 1 and 2]

TCP Release (3)
• Again, with states…
[Figure: release time-sequence diagram with states. Active party: ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → (timeout) → CLOSED. Passive party: ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED.]

TIME_WAIT State
• We wait a long time (two times the maximum segment lifetime of 60 seconds) after sending all segments and before completing the close
• Why?
  – ACK might have been lost, in which case FIN will be resent for an orderly close
  – Could otherwise interfere with a subsequent connection
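Sliding Windows (§3.4, §6.5.8)
• The sliding window algorithm
  – Pipelining and reliability
  – Building on Stop-and-Wait
[Cartoon: “Yeah!”]

Recall
• ARQ with one message at a time is Stop-and-Wait (normal case below)
[Figure: time-sequence diagram; the sender sends Frame 0, the receiver returns ACK 0, the sender sends Frame 1, the receiver returns ACK 1; the timeout interval is marked at the sender]

Limitation of Stop-and-Wait
• It allows only a single message to be outstanding from the sender:
  – Fine for LAN (only one frame fits)
  – Not efficient for network paths with BD >> 1 packet (BD = bandwidth-delay product)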
Limitation of Stop-and-Wait (2)
• Example: R = 1 Mbps, D = 50 ms
  – RTT (Round Trip Time) = 2D = 100 ms
  – How many packets/sec?
  – What if R = 10 Mbps?
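The questions in this example can be worked numerically. The sketch below assumes 1200-byte packets (the size used in the sliding window example of this lecture) and, unlike the RTT = 2D approximation, also counts the time to transmit the packet; the function name is made up for illustration.

```python
def stop_and_wait_rate(rate_bps: float, one_way_delay_s: float,
                       packet_bytes: int = 1200):
    """Return (packets per second, link utilization) for Stop-and-Wait."""
    rtt = 2 * one_way_delay_s
    transmit_time = packet_bytes * 8 / rate_bps
    # One packet per cycle: send it, then wait a full RTT for its ACK.
    cycle = transmit_time + rtt
    return 1 / cycle, transmit_time / cycle


if __name__ == "__main__":
    for rate in (1e6, 10e6):
        pps, utilization = stop_and_wait_rate(rate, 0.050)
        print(f"R = {rate / 1e6:.0f} Mbps: {pps:.1f} packets/sec, "
              f"{utilization:.1%} of the link used")
```

With RTT = 100 ms the sender manages a little under 10 packets/sec at either rate, so raising R from 1 to 10 Mbps barely changes the throughput and only lowers the utilization.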
Sliding Window
• Generalization of stop-and-wait
  – Allows W packets to be outstanding
  – Can send W packets per RTT (= 2D)
  – Pipelining improves performance
  – Need W = 2BD to fill network path
Sliding Window (3)
• Ex: R = 1 Mbps, D = 50 ms
  – $2BD = 10^{6}\ \text{b/sec} \times 100 \times 10^{-3}\ \text{sec} = 100\ \text{kbit}$
  – W = 2BD = 10 packets of 1200 bytes
• Ex: What if R = 10 Mbps?
  – 2BD = 1000 kbit
  – W = 2BD = 100 packets of 1200 bytes
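The same arithmetic as a short sketch, with B the link rate in bits per second and D the one-way delay; `window_for_path` is a hypothetical helper name. The exact quotient is about 10.4 and 104 packets, which the slide rounds to 10 and 100.

```python
def window_for_path(rate_bps: float, one_way_delay_s: float,
                    packet_bytes: int = 1200):
    """Return (2BD in bits, window W in packets) needed to fill the path."""
    two_bd_bits = rate_bps * 2 * one_way_delay_s
    return two_bd_bits, two_bd_bits / (packet_bytes * 8)


if __name__ == "__main__":
    for rate in (1e6, 10e6):
        bits, w = window_for_path(rate, 0.050)
        print(f"R = {rate / 1e6:.0f} Mbps: 2BD = {bits / 1e3:.0f} kbit, "
              f"W = {w:.1f} packets")
```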
Sliding Window Protocol
• Many variations, depending on how buffers, acknowledgements, and retransmissions are handled
• Go-Back-N
  – Simplest version, can be inefficient
• Selective Repeat
  – More complex, better performance
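Sliding Window – Sender
• Sender buffers up to W segments until they are acknowledged
  – LFS = LAST FRAME SENT, LAR = LAST ACK REC’D
  – Sends while LFS – LAR < W, so that LFS – LAR ≤ W always holds
[Figure: sequence-number line with a sliding window of W = 5. Numbers up to LAR are acked; LAR+1 through LFS are sent but unacked; the remaining slot inside the window is available; numbers beyond the window are unavailable.]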
Sliding Window – Sender (2)
• Transport accepts another segment of data from the Application...
  – Transport sends it (as LFS – LAR → 5)
[Figure: same sequence-number line with W = 5; LFS has advanced by one, the window is full of unacked segments, and nothing is available]

Sliding Window – Sender (3)
• Next higher ACK arrives from peer…
  – Window advances, buffer is freed
  – LFS – LAR → 4 (can send one more)
[Figure: same sequence-number line with W = 5; LAR has advanced by one, leaving four unacked segments and one available slot]
Sliding Window – Go-Back-N
• Receiver keeps only a single packet buffer for the next segment
  – State variable, LAS = LAST ACK SENT
• On receive:
  – If seq. number is LAS+1, accept and pass it to app, update LAS, send ACK
  – Otherwise discard (as out of order)
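The receive rule above translates almost line for line into code. This is a minimal sketch with a hypothetical class name; sequence numbers are plain integers that never wrap, and "pass it to app" is modeled by appending to a list.

```python
class GoBackNReceiver:
    """Go-Back-N receiver: accepts only the next in-order segment."""

    def __init__(self, las: int = 0):
        self.las = las        # LAS = LAST ACK SENT
        self.delivered = []   # data passed up to the application

    def on_receive(self, seq: int, data):
        """Return the ACK number to send, or None if the segment is discarded."""
        if seq == self.las + 1:
            self.delivered.append(data)  # accept and pass to app
            self.las = seq               # update LAS
            return self.las              # send ACK
        return None                      # out of order: discard


if __name__ == "__main__":
    receiver = GoBackNReceiver()
    for seq in (1, 3, 2, 3):
        print(seq, "->", receiver.on_receive(seq, f"segment {seq}"))
```

In the example run, segment 3 is discarded the first time it arrives (2 is still missing) and accepted only when it is resent after 2.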
Sliding Window – Selective Repeat
• Receiver passes data to app in order, and buffers out-of-order segments to reduce retransmissions
• ACK conveys highest in-order segment, plus hints about out-of-order segments
• TCP uses a selective repeat design; we’ll see the details later

Sliding Window – Selective Repeat (2)
• Buffers W segments, keeps state variable, LAS = LAST ACK SENT
• On receive:
  – Buffer segments [LAS+1, LAS+W]
  – Pass up to app in-order segments from LAS+1, and update LAS
  – Send ACK for LAS regardless
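A matching sketch of the Selective Repeat receive rule, under the same simplifying assumptions as before (hypothetical class name, integer sequence numbers without wraparound, a list standing in for the application). The hints about out-of-order segments that an ACK may carry are omitted.

```python
class SelectiveRepeatReceiver:
    """Selective Repeat receiver: buffers up to W out-of-order segments."""

    def __init__(self, window: int, las: int = 0):
        self.window = window  # W
        self.las = las        # LAS = LAST ACK SENT
        self.buffer = {}      # seq -> data for segments awaiting delivery
        self.delivered = []   # data passed up to the application, in order

    def on_receive(self, seq: int, data) -> int:
        """Process one segment and return the ACK number to send."""
        # Buffer segments in [LAS+1, LAS+W]; anything else is dropped.
        if self.las + 1 <= seq <= self.las + self.window:
            self.buffer[seq] = data
        # Pass up in-order segments starting from LAS+1, and update LAS.
        while self.las + 1 in self.buffer:
            self.las += 1
            self.delivered.append(self.buffer.pop(self.las))
        # Send ACK for LAS regardless of what happened to this segment.
        return self.las


if __name__ == "__main__":
    receiver = SelectiveRepeatReceiver(window=4)
    for seq in (2, 3, 1, 9, 4):
        print(seq, "-> ACK", receiver.on_receive(seq, f"segment {seq}"))
```

Compared with Go-Back-N, segments 2 and 3 are kept while 1 is missing, so only segment 1 has to be resent.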
Sliding Window – Retransmissions
• Go-Back-N sender uses a single timer to detect losses
  – On timeout, resends buffered packets starting at LAR+1
• Selective Repeat sender uses a timer per unacked segment to detect losses
  – On timeout for segment, resend it
  – Hope to resend fewer segments
Sequence Numbers
• Need more than 0/1 for Stop-and-Wait…
  – But how many?
• For Selective Repeat, need W numbers for packets, plus W for acks of earlier packets
  – 2W seq. numbers
  – Fewer for Go-Back-N (W+1)
• Typically implement seq. number with an N-bit counter that wraps around at $2^N - 1$
  – E.g., N = 8: …, 253, 254, 255, 0, 1, 2, 3, …
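A wrapping counter means window checks have to use modular arithmetic rather than plain comparisons. The sketch below shows one common way to do this; the function names are made up for illustration.

```python
def next_seq(seq: int, n_bits: int) -> int:
    """Advance an N-bit sequence number, wrapping from 2^N - 1 back to 0."""
    return (seq + 1) % (1 << n_bits)


def in_window(seq: int, base: int, window: int, n_bits: int) -> bool:
    """True if seq is one of the `window` numbers starting at base (mod 2^N)."""
    return (seq - base) % (1 << n_bits) < window


if __name__ == "__main__":
    seq = 253
    for _ in range(6):
        print(seq, end=" ")
        seq = next_seq(seq, 8)
    print()
    # A window of 5 starting at 254 covers 254, 255, 0, 1, 2.
    print([s for s in range(256) if in_window(s, 254, 5, 8)])
```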
Sequence Time Plot
[Figure: sequence number versus time, with one line for transmissions (at sender) and one for acks (at receiver); annotations mark the delay (= RTT/2) and the window size]
Flow Control (§6.5.8)
• Adding flow control to the sliding window algorithm
  – To slow the over-enthusiastic sender
[Cartoon: “Please slow down!”]

Problem
• Sliding window uses pipelining to keep the network busy
  – What if the receiver is overloaded?
[Cartoon: “Big Iron” streams video to “Wee Mobile”, which responds “Arg …”]
Sliding Window – Receiver
• Consider receiver with W buffers
  – LAS = LAST ACK SENT, app pulls in-order data from buffer with recv() call
[Figure: sequence-number line with a sliding window of W = 5. Numbers up to LAS are finished, the next five are acceptable, and higher numbers are too high.]

Sliding Window – Receiver (2)
• Suppose the next two segments arrive but app does not call recv()
[Figure: same sequence-number line, W = 5, before the segments arrive]

Sliding Window – Receiver (3)
• Suppose the next two segments arrive but app does not call recv()
  – LAS rises, but we can’t slide window!
[Figure: two acked segments occupy buffers at the start of the window; three slots remain acceptable]

Sliding Window – Receiver (4)
• If further segments arrive (even in order) we can fill the buffer
  – Must drop segments until app recvs!
[Figure: all five buffers hold acked segments; nothing is acceptable]

Sliding Window – Receiver (5)
• App recv() takes two segments
  – Window slides (phew)
[Figure: three acked segments remain buffered; two slots are acceptable again]
Flow Control
• Avoid loss at receiver by telling sender the available buffer space
  – WIN = #Acceptable, not W (from LAS)
[Figure: W = 5 with two acked segments still buffered after LAS, so three slots are acceptable]

Flow Control (2)
• Sender uses the lower of the sliding window and flow control window (WIN) as the effective window size
[Figure: same receiver state, with the window labeled WIN = 3]
Flow Control (3)
• TCP-style example
  – SEQ/ACK sliding window
  – Flow control with WIN
  – SEQ + length < ACK + WIN
  – 4 KB buffer at receiver
  – Circular buffer of bytes
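The receiver side of this example can be sketched as a byte buffer that reports ACK and WIN after every segment. The `ReceiveBuffer` class is hypothetical; it uses a `bytearray` instead of a true circular buffer, accepts only in-order segments, and drops anything that does not fit, which is a simplification of what TCP does.

```python
class ReceiveBuffer:
    """Byte-oriented receive buffer that advertises its free space as WIN."""

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity
        self.data = bytearray()  # bytes acked but not yet read by the app
        self.ack = 0             # next expected byte sequence number

    def window(self) -> int:
        """WIN: space still available, not the full buffer size."""
        return self.capacity - len(self.data)

    def on_segment(self, seq: int, payload: bytes):
        """Accept an in-order segment if it fits; return (ACK, WIN)."""
        if seq == self.ack and len(payload) <= self.window():
            self.data += payload
            self.ack += len(payload)
        return self.ack, self.window()

    def recv(self, n: int) -> bytes:
        """The application reads up to n bytes, which reopens the window."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out


if __name__ == "__main__":
    buf = ReceiveBuffer(4096)
    print(buf.on_segment(0, bytes(2048)))     # half the buffer is used
    print(buf.on_segment(2048, bytes(2048)))  # buffer full, WIN = 0
    print(buf.on_segment(4096, b"x"))         # dropped: no room
    buf.recv(2048)                            # app reads 2 KB
    print(buf.ack, buf.window())              # window reopens
```

A sender that respects WIN = 0 stops transmitting until a later ACK advertises a non-zero window.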
Retransmission Timeouts (§6.5.9)
• How to set the timeout for sending a retransmission
  – Adapting to the network path
[Cartoon: “Lost?”]

Retransmissions
• With sliding window, the strategy for detecting loss is the timeout
  – Set timer when a segment is sent
  – Cancel timer when ack is received
  – If timer fires, retransmit data as lost
[Cartoon: “Retransmit!”]

Timeout Problem
• Timeout should be “just right”
  – Too long wastes network capacity
  – Too short leads to spurious resends
  – But what is “just right”?
• Easy to set on a LAN (Link)
  – Short, fixed, predictable RTT
• Hard on the Internet (Transport)
  – Wide range, variable RTT
Example of RTTs
[Figure: round trip time (ms, 0 to 1000) versus time (seconds, 0 to 200) for the path BCN→SEA→BCN]

Example of RTTs (2)
[Figure: same plot, annotated. The baseline is the propagation (+ transmission) delay ≈ 2D; the variation above it is due to queuing at routers, changes in network paths, etc.]

Example of RTTs (3)
[Figure: same plot, annotated with one fixed timer that is too high and one that is too low]
• Need to adapt to the network conditions
Adaptive Timeout
• Keep smoothed estimates of the RTT (1) and variance in RTT (2)
  – Update estimates with a moving average
    1. $SRTT_{N+1} = 0.9 \cdot SRTT_N + 0.1 \cdot RTT_{N+1}$
    2. $Svar_{N+1} = 0.9 \cdot Svar_N + 0.1 \cdot |RTT_{N+1} - SRTT_{N+1}|$
• Set timeout to a multiple of estimates
  – To estimate the upper RTT in practice
  – $\text{TCP Timeout}_N = SRTT_N + 4 \cdot Svar_N$
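The two update rules and the timeout formula can be sketched directly. The class name and the initialization (SRTT set to the first sample, Svar set to zero) are assumptions made for this example; the slides do not say how the estimates start.

```python
class AdaptiveTimeout:
    """Moving-average RTT estimator with the 0.9 / 0.1 weights from the slide."""

    def __init__(self, first_rtt: float, gain: float = 0.1):
        self.gain = gain
        self.srtt = first_rtt  # assumed start: first sample as the estimate
        self.svar = 0.0        # assumed start: no variation seen yet

    def update(self, rtt: float) -> float:
        """Fold in a new RTT sample and return the new timeout."""
        g = self.gain
        self.srtt = (1 - g) * self.srtt + g * rtt
        # The deviation uses the freshly updated SRTT, as in rule 2.
        self.svar = (1 - g) * self.svar + g * abs(rtt - self.srtt)
        return self.timeout()

    def timeout(self) -> float:
        return self.srtt + 4 * self.svar


if __name__ == "__main__":
    estimator = AdaptiveTimeout(first_rtt=100.0)
    for sample in (100.0, 200.0, 100.0, 100.0, 400.0, 100.0):
        print(f"RTT {sample:5.0f} ms -> timeout {estimator.update(sample):6.1f} ms")
```

A single RTT spike raises both estimates, so the timeout gains headroom quickly and then decays as steady samples return.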
Example of Adaptive Timeout
[Figure: RTT (ms, 0 to 1000) versus time (seconds, 0 to 200), with the SRTT and Svar estimates overlaid]

Example of Adaptive Timeout (2)
[Figure: same plot with the timeout (SRTT + 4*Svar) overlaid; one early timeout is marked]

Adaptive Timeout (2)
• Simple to compute, does a good job of tracking actual RTT
  – Little “headroom” to lower
  – Yet very few early timeouts
• Turns out to be important for good performance and robustness
Transmission Control Protocol (TCP) (§6.5)
• How TCP works!
  – The transport protocol used for most content on the Internet
[Cartoon: hosts chanting “We love TCP/IP!” across the network]

TCP Features
• A reliable bytestream service
• Based on connections
• Sliding window for reliability
  – With adaptive timeout
• Flow control for slow receivers
• Congestion control to allocate network bandwidth
(This time: everything up to flow control. Next time: congestion control.)
Reliable Bytestream
• Message boundaries not preserved from send() to recv()
  – But reliable and ordered (receive bytes in same order as sent)
[Figure: the sender issues four segments, each with 512 bytes of data and carried in an IP packet; at the receiver, 2048 bytes of data are delivered to the app in a single recv() call]
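The loss of message boundaries is easy to observe on a loopback TCP connection. This is a minimal sketch assuming Python's standard `socket` module; `bytestream_demo` is a made-up name. How the 2048 bytes are split across recv() calls depends on timing and the OS, so the code loops until end of stream and records the sizes it happened to get.

```python
import socket


def bytestream_demo(chunks: int = 4, chunk_size: int = 512):
    """Send several chunks over TCP; return (all bytes received, recv sizes)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname(), timeout=2.0)
    receiver, _ = listener.accept()
    receiver.settimeout(2.0)
    try:
        # Four separate send calls, each with its own distinct byte value.
        for i in range(chunks):
            sender.sendall(bytes([i]) * chunk_size)
        sender.close()  # end of stream once all data has been delivered

        received = bytearray()
        recv_sizes = []
        while True:
            part = receiver.recv(4096)
            if not part:  # empty result: the peer closed the connection
                break
            received += part
            recv_sizes.append(len(part))
        return bytes(received), recv_sizes
    finally:
        sender.close()
        receiver.close()
        listener.close()


if __name__ == "__main__":
    data, sizes = bytestream_demo()
    print(len(data), "bytes received in recv() calls of sizes", sizes)
```

Whatever the sizes turn out to be, the bytes arrive complete and in the order sent; an application that needs message boundaries has to add its own framing.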
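Reliable Bytestream (2)
• Bidirectional data transfer
  – Control information (e.g., ACK) piggybacks on data segments in reverse direction
[Figure: hosts A and B. Segments sent by B carry data B→A together with the ACK for A→B; segments sent by A carry data A→B together with the ACK for B→A.]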
TCP Sliding Window – Receiver
• Cumulative ACK tells next expected byte sequence number (“LAS+1”)
• Optionally, selective ACKs (SACK) give hints for receiver buffer state
  – List up to 3 ranges of received bytes
[Figure: example acknowledgement “ACK up to 100 and 200-299”]
TCP Sliding Window – Sender
• Uses adaptive retransmission timeout to resend data from LAS+1
• Uses heuristics to infer loss quickly and resend to avoid timeouts
  – “Three duplicate ACKs” treated as loss
[Figure: the receiver returns ACK 100, then ACK 100 with 200-299, then ACK 100 with 200-399, then ACK 100 with 200-499; the sender decides 100-199 is lost]
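The duplicate-ACK heuristic amounts to counting repeats of the same cumulative ACK number. The sketch below uses a hypothetical function name and looks only at the cumulative ACK, ignoring the SACK ranges shown in the figure.

```python
def detect_duplicate_ack_loss(acks, threshold: int = 3):
    """Return (index, ack number) at which loss is inferred, or None.

    Loss is inferred once the same cumulative ACK has been repeated
    `threshold` times after its first arrival.
    """
    last = None
    duplicates = 0
    for index, ack in enumerate(acks):
        if ack == last:
            duplicates += 1
            if duplicates == threshold:
                return index, ack
        else:
            last, duplicates = ack, 0
    return None


if __name__ == "__main__":
    # ACK 100 arrives, followed by three duplicates as later data gets through.
    print(detect_duplicate_ack_loss([100, 100, 100, 100]))
    # ACKs that keep advancing indicate no loss.
    print(detect_duplicate_ack_loss([100, 200, 300, 400]))
```

In the figure's scenario the sender would resend the bytes starting at 100 on the third duplicate, without waiting for the retransmission timer.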