Chapter 5 End-to-End Protocols
Outline
5.1 UDP
5.2 TCP
5.3 Remote Procedure Call
End-to-End Protocols
• Underlying best-effort network
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– limits messages to some finite size
– delivers messages after an arbitrarily long delay
• Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– support arbitrarily large messages
– support synchronization
– allow the receiver to flow control the sender
– support multiple application processes on each host
5.1 Simple Demultiplexor (UDP)
• Unreliable and unordered datagram service
• Adds multiplexing
• No flow control
• Endpoints identified by ports
– servers have well-known ports
– see /etc/services on Unix
• Header format (bit offsets 0, 16, 31)
    SrcPort | DstPort
    Length  | Checksum
    Data
• Optional checksum
– pseudo header + UDP header + data
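The header layout above can be sketched with Python's struct module. This is a minimal illustration, not a full UDP implementation: the helper names build_datagram and parse_datagram are hypothetical, the field order follows RFC 768 (SrcPort, DstPort, Length, Checksum), and the checksum is left at 0, which in IPv4 means it was not computed.

```python
import struct

# Four 16-bit fields in network byte order: SrcPort, DstPort, Length, Checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend an 8-byte UDP header (hypothetical helper; checksum not computed)."""
    length = UDP_HEADER.size + len(payload)  # Length covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_datagram(datagram):
    """Split a datagram into its header fields and its data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    data = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, data

datagram = build_datagram(5353, 53, b"hello")
print(parse_datagram(datagram))
```

The destination port is what the receiving host uses to demultiplex the datagram to the right application process.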
5.2 Reliable Byte-Stream (TCP)
Outline
5.2.1 End-to-end Issues
5.2.2 Segment Format
5.2.3 Connection Establishment/Termination
5.2.4 Sliding Window Revisited
5.2.5 Triggering Transmission
    Silly Window Syndrome
    Nagle’s Algorithm
5.2.6 Adaptive Retransmission
    Original Algorithm
    Karn/Partridge Algorithm
    Jacobson/Karels Algorithm
5.2.7 Record Boundaries
5.2.8 TCP Extensions
TCP Overview
• Connection-oriented
• Byte-stream
– app writes bytes
– TCP sends segments
– app reads bytes
• Full duplex
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network
[Figure: the sending application process writes bytes into TCP's send buffer; TCP transmits segments; the receiving TCP places them in its receive buffer; the receiving application process reads bytes.]
Data Link Versus Transport
• Potentially connects many different hosts
– need explicit connection establishment and termination
• Potentially different RTT
– need adaptive timeout mechanism
• Potentially long delay in network
– need to be prepared for arrival of very old packets
• Potentially different capacity at destination
– need to accommodate different node capacity
• Potentially different network capacity
– need to be prepared for network congestion
5.2.2 Segment Format
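Header layout, one 32-bit row per line (bit offsets 0, 4, 10, 16, 31):
    SrcPort (bits 0-15) | DstPort (bits 16-31)
    SequenceNum
    Acknowledgment
    HdrLen (bits 0-3) | 0 (bits 4-9) | Flags (bits 10-15) | AdvertisedWindow (bits 16-31)
    Checksum (bits 0-15) | UrgPtr (bits 16-31)
    Options (variable)
    Data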
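As a rough sketch of how the fixed 20-byte part of this header could be decoded, the following uses Python's struct module. The function name parse_tcp_header and the dictionary keys are illustrative choices, not part of any library; flag names follow the slides (RESET, PUSH), and HdrLen is assumed to count 32-bit words, as in the TCP specification.

```python
import struct

# SrcPort, DstPort, SequenceNum, Acknowledgment,
# HdrLen/0/Flags word, AdvertisedWindow, Checksum, UrgPtr
TCP_FIXED = struct.Struct("!HHIIHHHH")

# Flag bits, least significant bit first.
FLAG_NAMES = ["FIN", "SYN", "RESET", "PUSH", "ACK", "URG"]

def parse_tcp_header(segment):
    """Decode the fixed TCP header fields (hypothetical helper, options ignored)."""
    src, dst, seq, ack, word, window, checksum, urg = TCP_FIXED.unpack_from(segment)
    hdr_len_bytes = (word >> 12) * 4          # top 4 bits, in 32-bit words
    flag_bits = word & 0x3F                   # low 6 bits
    flags = {n for i, n in enumerate(FLAG_NAMES) if flag_bits & (1 << i)}
    return {
        "SrcPort": src, "DstPort": dst,
        "SequenceNum": seq, "Acknowledgment": ack,
        "HdrLen": hdr_len_bytes, "Flags": flags,
        "AdvertisedWindow": window, "Checksum": checksum, "UrgPtr": urg,
    }

# A SYN segment with no options: HdrLen = 5 words, only the SYN bit set.
syn = TCP_FIXED.pack(1234, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header["HdrLen"], header["Flags"])
```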
Segment Format (cont)
• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
– Acknowledgment, SequenceNum, AdvertisedWindow
• Flags
– SYN, FIN, RESET, PUSH, URG, ACK
• Checksum
– pseudo header + TCP header + data
[Figure: Sender sends Data (SequenceNum) to Receiver; Receiver returns Acknowledgment + AdvertisedWindow.]
5.2.3 Connection Establishment and Termination
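Three segments are exchanged, in this order, between the active participant (client) and the passive participant (server):
• client to server: SYN, SequenceNum = x
• server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1
• client to server: ACK, Acknowledgment = y + 1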
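The exchange can be mimicked in a few lines of Python. This is only a sketch of the numbering rule (each side acknowledges the other's initial sequence number plus one); the function three_way_handshake and the dictionary representation of a segment are made up for illustration, and sequence arithmetic is taken modulo 2^32 because SequenceNum is a 32-bit field.

```python
SEQ_MOD = 2 ** 32  # SequenceNum is a 32-bit field

def three_way_handshake(x, y):
    """Return the three handshake segments for initial sequence numbers x and y."""
    syn = {"from": "client", "flags": {"SYN"}, "seq": x}
    syn_ack = {
        "from": "server",
        "flags": {"SYN", "ACK"},
        "seq": y,
        "ack": (syn["seq"] + 1) % SEQ_MOD,      # next byte expected from client
    }
    ack = {
        "from": "client",
        "flags": {"ACK"},
        "ack": (syn_ack["seq"] + 1) % SEQ_MOD,  # next byte expected from server
    }
    return [syn, syn_ack, ack]

for segment in three_way_handshake(x=100, y=300):
    print(segment)
```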
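State Transition Diagram
[Figure: TCP state transition diagram. Transitions are labeled event/action.]
• CLOSED to LISTEN: Passive open
• CLOSED to SYN_SENT: Active open/SYN
• LISTEN to SYN_RCVD: SYN/SYN + ACK
• LISTEN to SYN_SENT: Send/SYN
• LISTEN to CLOSED: Close
• SYN_SENT to SYN_RCVD: SYN/SYN + ACK
• SYN_SENT to ESTABLISHED: SYN + ACK/ACK
• SYN_SENT to CLOSED: Close
• SYN_RCVD to ESTABLISHED: ACK
• SYN_RCVD to FIN_WAIT_1: Close/FIN
• ESTABLISHED to FIN_WAIT_1: Close/FIN
• ESTABLISHED to CLOSE_WAIT: FIN/ACK
• CLOSE_WAIT to LAST_ACK: Close/FIN
• LAST_ACK to CLOSED: ACK
• FIN_WAIT_1 to FIN_WAIT_2: ACK
• FIN_WAIT_1 to CLOSING: FIN/ACK
• FIN_WAIT_1 to TIME_WAIT: ACK + FIN/ACK
• FIN_WAIT_2 to TIME_WAIT: FIN/ACK
• CLOSING to TIME_WAIT: ACK
• TIME_WAIT to CLOSED: Timeout after two segment lifetimes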
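One way to make the state transition diagram concrete is a lookup table keyed by (state, event). The sketch below is an assumption-laden illustration: the event strings and the run helper are invented here, only the transitions named in the diagram are included, and error cases (RESET, unexpected segments) are ignored.

```python
# (current state, event) -> (next state, segment sent in response or None)
TRANSITIONS = {
    ("CLOSED", "passive open"): ("LISTEN", None),
    ("CLOSED", "active open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("LISTEN", "send"): ("SYN_SENT", "SYN"),
    ("LISTEN", "close"): ("CLOSED", None),
    ("SYN_SENT", "SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_SENT", "close"): ("CLOSED", None),
    ("SYN_RCVD", "ACK"): ("ESTABLISHED", None),
    ("SYN_RCVD", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "ACK"): ("CLOSED", None),
    ("FIN_WAIT_1", "ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "FIN"): ("CLOSING", "ACK"),
    ("FIN_WAIT_1", "ACK+FIN"): ("TIME_WAIT", "ACK"),
    ("FIN_WAIT_2", "FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout"): ("CLOSED", None),
}

def run(state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, action = TRANSITIONS[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent

# Client that opens, then closes first; server that closes second.
client = run("CLOSED", ["active open", "SYN+ACK", "close", "ACK", "FIN", "timeout"])
server = run("CLOSED", ["passive open", "SYN", "ACK", "FIN", "close", "ACK"])
print(client)
print(server)
```

The side that closes first passes through TIME_WAIT; the side that closes second passes through CLOSE_WAIT and LAST_ACK.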
5.2.4 Sliding Window Revisited
• Sending side
– LastByteAcked <= LastByteSent
– LastByteSent <= LastByteWritten
– buffer bytes between LastByteAcked and LastByteWritten
• Receiving side
– LastByteRead < NextByteExpected
– NextByteExpected <= LastByteRcvd + 1
– buffer bytes between NextByteRead and LastByteRcvd (NextByteRead = LastByteRead + 1)
[Figure: (a) sending application and TCP send buffer, with pointers LastByteAcked, LastByteSent, LastByteWritten; (b) receiving application and TCP receive buffer, with pointers LastByteRead, NextByteExpected, LastByteRcvd.]
Flow Control
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
– LastByteRcvd - LastByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - (NextByteExpected - NextByteRead)
• Sending side
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application is trying to write
• Always send ACK in response to arriving data segment
• Persist when AdvertisedWindow = 0
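The window formulas above translate directly into code. The following is a sketch with hypothetical function names; it assumes NextByteRead means LastByteRead + 1, which makes the slide's AdvertisedWindow formula count exactly the bytes buffered but not yet read.

```python
def advertised_window(max_rcv_buffer, next_byte_expected, last_byte_read):
    """Free space the receiver can advertise."""
    next_byte_read = last_byte_read + 1  # assumption: NextByteRead = LastByteRead + 1
    return max_rcv_buffer - (next_byte_expected - next_byte_read)

def effective_window(advertised, last_byte_sent, last_byte_acked):
    """How much more the sender may transmit right now."""
    return advertised - (last_byte_sent - last_byte_acked)

def must_block_writer(last_byte_written, last_byte_acked, y, max_send_buffer):
    """True if writing y more bytes would overflow the send buffer."""
    return (last_byte_written - last_byte_acked) + y > max_send_buffer

# Receiver has read through byte 500 and expects byte 1001 next: 500 bytes buffered.
adv = advertised_window(max_rcv_buffer=4096, next_byte_expected=1001, last_byte_read=500)
eff = effective_window(adv, last_byte_sent=2000, last_byte_acked=1000)
blocked = must_block_writer(last_byte_written=3000, last_byte_acked=1000,
                            y=2500, max_send_buffer=4096)
print(adv, eff, blocked)
```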
Protection Against Wrap Around
• 32-bit SequenceNum
Bandwidth            Time Until Wrap Around
T1 (1.5 Mbps)        6.4 hours
Ethernet (10 Mbps)   57 minutes
T3 (45 Mbps)         13 minutes
FDDI (100 Mbps)      6 minutes
STS-3 (155 Mbps)     4 minutes
STS-12 (622 Mbps)    55 seconds
STS-24 (1.2 Gbps)    28 seconds
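The wrap-around times follow from dividing the 32-bit sequence space (2^32 bytes) by the link rate. A small sketch, with the function name chosen here for illustration and the link kept fully busy as an assumption:

```python
SEQUENCE_SPACE_BYTES = 2 ** 32  # 32-bit SequenceNum counts bytes

def seconds_until_wraparound(bandwidth_bps):
    """Time to send 2^32 bytes at the given rate, assuming the link never idles."""
    return SEQUENCE_SPACE_BYTES * 8 / bandwidth_bps

links = {"T1": 1.5e6, "Ethernet": 10e6, "T3": 45e6, "FDDI": 100e6,
         "STS-3": 155e6, "STS-12": 622e6, "STS-24": 1.2e9}
for name, bps in links.items():
    print(f"{name:10s} {seconds_until_wraparound(bps):10.1f} s")
```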
Keeping the Pipe Full
• 16-bit AdvertisedWindow
Bandwidth            Delay x Bandwidth Product (assuming 100 ms RTT)
T1 (1.5 Mbps)        18 KB
Ethernet (10 Mbps)   122 KB
T3 (45 Mbps)         549 KB
FDDI (100 Mbps)      1.2 MB
STS-3 (155 Mbps)     1.8 MB
STS-12 (622 Mbps)    7.4 MB
STS-24 (1.2 Gbps)    14.8 MB
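The products in the table can be checked against the largest value a 16-bit AdvertisedWindow can hold (65,535 bytes). The sketch below uses made-up function names and takes 1 KB as 1024 bytes, which reproduces the table's figures.

```python
MAX_ADVERTISED_WINDOW = 2 ** 16 - 1  # largest value of a 16-bit field, in bytes

def delay_bandwidth_bytes(bandwidth_bps, rtt_seconds=0.1):
    """Bytes in flight needed to keep the pipe full for one RTT."""
    return bandwidth_bps * rtt_seconds / 8

def window_fills_pipe(bandwidth_bps, rtt_seconds=0.1):
    """True if a 16-bit window is large enough to keep the link busy."""
    return MAX_ADVERTISED_WINDOW >= delay_bandwidth_bytes(bandwidth_bps, rtt_seconds)

for name, bps in {"T1": 1.5e6, "Ethernet": 10e6, "T3": 45e6}.items():
    kb = delay_bandwidth_bytes(bps) / 1024
    print(f"{name:10s} {kb:7.0f} KB  window sufficient: {window_fills_pipe(bps)}")
```

Under the 100 ms RTT assumption, only the T1 link can be kept full with an unscaled 16-bit window.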
5.2.5 Triggering Transmission
• How aggressively does sender exploit open window?
• If the sender aggressively fills an empty container as soon as it arrives, then any small container introduced into the system remains in the system indefinitely.
• Receiver-side solutions
– after advertising zero window, wait for space equal to a maximum segment size (MSS)
– delayed acknowledgements
[Figure: Silly Window Syndrome, showing sender and receiver]
Nagle’s Algorithm
• How long does sender delay sending data?
– too long: hurts interactive applications
– too short: poor network utilization
– strategies: timer-based vs self-clocking (ACK)
• When application generates additional data
– if it fills a max segment (and window is open): send it
– else
    • if there is unACKed data in transit: buffer it until ACK arrives
    • else: send it
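The decision rule can be written as a single predicate. This is a simplified sketch: the function name and parameters are illustrative, and "window open" is interpreted here as the window having room for at least one full segment.

```python
def nagle_should_send(buffered_bytes, mss, window_bytes, unacked_in_flight):
    """Decide whether to transmit now, following the rule on the slide."""
    # A full segment is available and the window can take it: send.
    if buffered_bytes >= mss and window_bytes >= mss:
        return True
    # Otherwise wait for the ACK of outstanding data (self-clocking).
    if unacked_in_flight:
        return False
    # Nothing in flight: send the small segment immediately.
    return True

print(nagle_should_send(1460, 1460, 8192, True))   # full segment
print(nagle_should_send(10, 1460, 8192, True))     # small, data in flight
print(nagle_should_send(10, 1460, 8192, False))    # small, nothing in flight
```

Because a small segment goes out at once when nothing is in flight, an interactive application sees at most about one RTT of added delay.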
5.2.6 Adaptive Retransmission (Original Algorithm)
• Measure SampleRTT for each segment/ACK pair
• Compute weighted average of RTT
– EstRTT = α x EstRTT + β x SampleRTT
– where α + β = 1, α between 0.8 and 0.9, β between 0.1 and 0.2
• Set timeout based on EstRTT
– TimeOut = 2 x EstRTT
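A minimal sketch of the original algorithm's update step follows. The function names are hypothetical, and the smoothing weight 0.875 is just one value inside the stated 0.8 to 0.9 range, with the other weight set to 1 minus it.

```python
def update_est_rtt(est_rtt, sample_rtt, alpha=0.875):
    """Weighted average: alpha weights history, (1 - alpha) weights the new sample."""
    beta = 1 - alpha
    return alpha * est_rtt + beta * sample_rtt

def timeout(est_rtt):
    """Original algorithm: timeout is twice the estimated RTT."""
    return 2 * est_rtt

est = update_est_rtt(est_rtt=100.0, sample_rtt=200.0)
print(est, timeout(est))  # times in milliseconds
```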
Karn/Partridge Algorithm
• Do not sample RTT when retransmitting
• Double timeout after each retransmission
[Figure, panels (a) and (b): sender/receiver timelines, each showing an original transmission, a retransmission, and an ACK. The ACK cannot be unambiguously matched to one of the two transmissions.]
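The two Karn/Partridge rules can be sketched as a small timer object. The class and method names are invented for illustration, and the estimator reuses the weighted average of the original algorithm with an assumed weight of 0.875.

```python
class KarnPartridgeTimer:
    """Sketch: skip ambiguous RTT samples, back off the timeout on retransmission."""

    def __init__(self, est_rtt, alpha=0.875):
        self.est_rtt = est_rtt
        self.alpha = alpha
        self.timeout = 2 * est_rtt

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return  # ACK could match either transmission: do not sample
        self.est_rtt = self.alpha * self.est_rtt + (1 - self.alpha) * sample_rtt
        self.timeout = 2 * self.est_rtt

    def on_retransmit(self):
        self.timeout *= 2  # exponential backoff

timer = KarnPartridgeTimer(est_rtt=100.0)
timer.on_retransmit()                            # timeout doubles
timer.on_ack(500.0, was_retransmitted=True)      # ignored
after_backoff = (timer.est_rtt, timer.timeout)
timer.on_ack(200.0, was_retransmitted=False)     # clean sample resets the timeout
print(after_backoff, (timer.est_rtt, timer.timeout))
```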
Jacobson/Karels Algorithm
• New calculations for average RTT
• Diff = SampleRTT - EstRTT
• EstRTT = EstRTT + (δ x Diff)
• Dev = Dev + δ(|Diff| - Dev)
– where δ is a factor between 0 and 1
• Consider variance when setting timeout value
• TimeOut = μ x EstRTT + φ x Dev
– where μ = 1 and φ = 4
• Notes
– algorithm only as good as granularity of clock (500ms on Unix)
– accurate timeout mechanism important to congestion control (later)
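The update equations above fit in one function. In this sketch the function name is hypothetical and δ = 0.125 is an assumed value; the slide only requires δ to lie between 0 and 1. The defaults μ = 1 and φ = 4 are the ones stated above.

```python
def jacobson_karels(est_rtt, dev, sample_rtt, delta=0.125, mu=1, phi=4):
    """One update step; returns the new (EstRTT, Dev, TimeOut)."""
    diff = sample_rtt - est_rtt
    est_rtt = est_rtt + delta * diff
    dev = dev + delta * (abs(diff) - dev)
    time_out = mu * est_rtt + phi * dev
    return est_rtt, dev, time_out

# Times in milliseconds: a sample well above the estimate widens the timeout.
result = jacobson_karels(est_rtt=100.0, dev=10.0, sample_rtt=140.0)
print(result)
```

When samples vary little, Dev shrinks and TimeOut stays close to EstRTT; when they vary a lot, the Dev term dominates.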
5.2.7 Record Boundaries
• TCP is a byte-stream protocol: the number of bytes written by the sender in one operation is not necessarily the same as the number of bytes read by the receiver in one operation.
• How to insert “record boundaries” into this byte stream?
• URG flag and UrgPtr field (out-of-band data)
• PUSH operation
5.2.8 TCP Extensions
• Mitigate some problems that TCP faces as the underlying network gets faster.
• Implemented as header options
– Store timestamp in outgoing segments to improve TCP’s timeout mechanism
– Extend sequence space with 32-bit timestamp (protects against wrap around)
– Shift (scale) advertised window to allow a larger window