TO 4-25-06 p. 1 Spring 2006 EE 5304/EETS 7304 Internet Protocols Tom Oh Dept of Electrical Engineering [email protected] Lecture 14 TCP-Part 1
Jan 17, 2016
Administrative Issues
(For distance learning students) If you are graduating this semester, you need to take the Final on May 9, 2006.
For in-class students, we will have the Final (Test #3) on May 9, 2006, 6:30PM.
The Final will cover Lectures 11-15.
The Final will consist of multiple choice, T/F, and short answer questions.
You are allowed to bring one 3½ x 5 card.
Outline (Comer, Ch. 25)
TCP
TCP header
TCP retransmissions
TCP duplicate detection
TCP connection set-up and close
TCP (Transmission Control Protocol)
TCP is the predominant transport layer protocol; it adds end-to-end reliability above IP
Designed for reliable sequential byte stream delivery with no duplicates, no loss
Views application data as continuous byte stream, breaks it into segments of 64-Kbyte max. length
Keeps track of each byte with a sequence number
Segments are prefixed with TCP header and encapsulated into IP packets
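The byte-stream view above can be sketched in a few lines of Python. This is only an illustration: the function name `segment_stream` and the values used for the first sequence number and the segment size are assumptions, not part of the lecture.

```python
from typing import Iterator, Tuple

def segment_stream(data: bytes, first_seq: int, max_segment: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence number, payload) pairs covering the byte stream in order."""
    for offset in range(0, len(data), max_segment):
        # The sequence number is the number of the first byte carried in the
        # segment, wrapped to 32 bits like the TCP header field.
        seq = (first_seq + offset) % 2**32
        yield seq, data[offset:offset + max_segment]

# Hypothetical example: 2500 bytes of data, numbering starts at byte 1000.
segments = list(segment_stream(b"x" * 2500, first_seq=1000, max_segment=1000))
seqs = [seq for seq, _ in segments]
sizes = [len(payload) for _, payload in segments]
```

Each payload would then be prefixed with a TCP header carrying its sequence number and handed to IP.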
TCP (cont)
[Figure: the sending application's data stream is broken into pieces; each piece is prefixed with a TCP header to form a TCP segment, and each segment is prefixed with an IP header to form an IP packet. The receiving application gets the data back from the TCP segments.]
TCP (cont)
Provides connection-oriented service between applications on different hosts
An application is identified to TCP by port address
Application is completely identified by 16-bit port address & 32-bit IP address
TCP connection is between two endpoints, source <host address, port> and destination <host address, port>
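As a small sketch of the endpoint idea, a connection can be modeled as a pair of <host address, port> endpoints and used as a lookup key. The type names, the `demultiplex` helper, and the addresses below are hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple

class Endpoint(NamedTuple):
    host: str   # 32-bit IP address, written in dotted form
    port: int   # 16-bit port address

class Connection(NamedTuple):
    source: Endpoint
    destination: Endpoint

# Two connections to the same server port stay distinct because their
# source endpoints differ.
server = Endpoint("192.0.2.10", 80)
table: Dict[Connection, str] = {
    Connection(Endpoint("198.51.100.7", 40000), server): "client 1",
    Connection(Endpoint("198.51.100.8", 40000), server): "client 2",
}

def demultiplex(src: Endpoint, dst: Endpoint) -> str:
    """Find which connection an arriving segment belongs to."""
    return table[Connection(src, dst)]
```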
TCP (cont)
[Figure: two hosts, each with applications above the transport layer; applications attach to TCP through ports (TCP ports 80, 25, 26, and 18 are shown).]
Reliable connection-oriented service with no duplicate, lost, misordered, or errored bytes
TCP (cont)
TCP assumes IP - a type C network - so it has all of the most complicated functions of a transport protocol
Error control detects missing, errored, non-sequential, and duplicate packets
Uses sequence numbers and piggybacked ACKs, adaptive retransmissions
Flow control using credits
Connection control: 3-way handshake
Also, TCP assumes responsibility for congestion avoidance because IP has no congestion control
TCP Header
Header layout (each row is 32 bits wide):
TCP source port (16) | TCP destination port (16)
sequence number (32)
acknowledgement number (32)
HLEN (4) | RES (6) | flags (6) | window (16)
checksum (16) | urgent pointer (16)
options (if any)
data
Source port (16 bits): identifies application at source host; allows replies to sender
Destination port (16 bits): identifies application at destination host
TCP Header (cont)
Checksum (16 bits): error detection over pseudoheader + TCP segment
TCP Header (cont)
Pseudoheader is constructed from IP packet header including IP source/destination addresses, protocol field (=6 for TCP), length of TCP segment
Ensures that IP addresses are correct
Like UDP, this violates layering principle of OSI model
Pseudoheader layout (each row is 32 bits wide):
source IP address (32)
destination IP address (32)
zero (8) | protocol (8) | TCP length (16)
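A minimal sketch of the checksum over pseudoheader + TCP segment, assuming the usual 16-bit one's complement Internet checksum. The function names and the example addresses are hypothetical; the segment's own checksum field is expected to be zero while the value is computed.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole number of words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudoheader + TCP segment (header and data)."""
    pseudoheader = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),           # source IP address
        socket.inet_aton(dst_ip),           # destination IP address
        0,                                  # zero byte
        6,                                  # protocol field (= 6 for TCP)
        len(segment),                       # length of TCP segment
    )
    return ~ones_complement_sum(pseudoheader + segment) & 0xFFFF

# Hypothetical segment: 20-byte header with checksum field zeroed, 2 data bytes.
header = struct.pack("!HHIIHHHH", 1234, 80, 400, 0, 5 << 12, 8192, 0, 0)
segment = header + b"hi"
checksum = tcp_checksum("192.0.2.1", "192.0.2.2", segment)
# Bytes 16-17 of the TCP header hold the checksum field.
filled = segment[:16] + struct.pack("!H", checksum) + segment[18:]
```

A receiver that recomputes the checksum over the filled-in segment with the same addresses gets zero; a segment delivered to the wrong IP address would not.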
TCP Header (cont)
Sequence number (32 bits): number of first data byte, except if SYN=1; data bytes are numbered sequentially, to reconstruct sender’s byte stream
TCP Header (cont)
[Figure: the sending application's bytes n, n+1, n+2, ... are carried as data behind the TCP header; number of first byte = sequence number. The receiving application gets bytes n, n+1, n+2, ... back in order.]
Sequence number tells where this segment belongs in reconstructed byte stream
TCP Header (cont)
Acknowledgement (32 bits): piggybacked ACK tells sender the next byte that is expected; ACKs are cumulative and refer to the end of contiguous received data; additional received data, if not contiguous, triggers a duplicate ACK
TCP Header (cont)
Example: receiver's buffer holds contiguous data up to byte 399; sending application sends segment A (SEQ = 400), segment B (SEQ = 600), and segment C (SEQ = 800)
Segment B received first: receiver returns ACK 400
Segment C received second: receiver returns duplicate ACK 400
Segment A received third: receiver's buffer now holds contiguous data up to byte 999; receiver returns ACK 1000
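The cumulative ACK example above can be replayed with a short sketch. It assumes 200-byte segments (implied by the sequence numbers 400, 600, 800 and the final ACK 1000) and that segments never overlap; the function name `receive` is hypothetical.

```python
def receive(next_expected: int, buffered: dict, seq: int, length: int) -> int:
    """Process one arriving segment and return the ACK number to send back."""
    buffered[seq] = length                 # hold the segment, even if out of order
    while next_expected in buffered:       # advance over contiguous data
        next_expected += buffered.pop(next_expected)
    return next_expected                   # next byte that is expected

next_expected, buffered, acks = 400, {}, []
for seq in (600, 800, 400):                # arrival order: B, C, A
    next_expected = receive(next_expected, buffered, seq, 200)
    acks.append(next_expected)
# acks is [400, 400, 1000]: one ACK, one duplicate ACK, then a jump to 1000.
```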
TCP Header (cont)
Header length (4 bits): in units of 4 bytes; header is 20 bytes (value = 5) + options (if any)
Reserved (6 bits): all zeros
TCP Header (cont)
Flags (6 bits):
URG: tells if Urgent pointer is used
ACK: tells if Acknowledgement field is used
PUSH: forces immediate transmission at sender
RST: tells receiver to abort and reset connection
SYN: segments for 3-way handshake to set up connection
FIN: segments for 3-way handshake to terminate connection
TCP Header (cont)
URG flag: tells if Urgent pointer is used
Urgent pointer (16 bits): used if URG=1
TCP Header (cont)
Urgent pointer (2 bytes): points to number of first byte after urgent data in segment
If URG flag =1, data up to urgent pointer is urgent data to be processed immediately; rest of data is regular (not urgent)
Allows "out of band" data (to be processed immediately, out of sequence)
[Figure: TCP segment with header followed by data; the urgent pointer marks the boundary between the urgent data (first) and the regular data that follows.]
TCP Header (cont)
Push function:
Normally, TCP accumulates data from sender before transmitting a segment
If sender issues a “push”, TCP will send the ready data, even if segment will be short (e.g., 1 byte of data)
TCP Header (cont)
Window (16 bits): piggybacked credit advertised by receiver; for flow control of sender
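To tie the header fields together, here is a hedged sketch that unpacks the fixed 20-byte part of a TCP header using the field sizes given on these slides. The function name `parse_tcp_header`, the dictionary keys, and the sample values are assumptions for illustration only.

```python
import struct

# Flag bits from least significant to most significant within the 6-bit field.
FLAG_NAMES = ["FIN", "SYN", "RST", "PUSH", "ACK", "URG"]

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (options are not decoded)."""
    (src_port, dst_port, seq, ack, hlen_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    hlen_words = hlen_flags >> 12          # header length in units of 4 bytes
    flag_bits = hlen_flags & 0x3F          # low 6 bits; reserved bits are skipped
    flags = {name for i, name in enumerate(FLAG_NAMES) if flag_bits & (1 << i)}
    return {
        "source_port": src_port, "destination_port": dst_port,
        "sequence": seq, "acknowledgement": ack,
        "header_bytes": hlen_words * 4, "flags": flags,
        "window": window, "checksum": checksum, "urgent_pointer": urgent,
    }

# Hypothetical header: HLEN = 5, SYN and ACK set, window of 8192 bytes.
sample = struct.pack("!HHIIHHHH", 1234, 80, 400, 1000, (5 << 12) | 0x12, 8192, 0, 0)
parsed = parse_tcp_header(sample)
```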
TCP Retransmissions
Sender waits for piggybacked acknowledgements
ACK is next expected byte (cumulative: acknowledges all previous bytes)
ACK does not acknowledge any additional non-contiguous data received
Sender will resend if retransmission timer expires
TCP tries to adjust time-out to just a little longer than estimated roundtrip time (RTT)
But the right time-out is very difficult to determine because RTT varies widely in the Internet
TCP Adaptive Retransmission Algorithm
Sender keeps track of returned ACKs as samples of RTT
Can continually update estimate of average roundtrip delay as weighted average of new measurement and old estimate, eg:
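The formula from the slide is not present in this text. As a stand-in, the sketch below assumes the classic form: the new estimate is α times the old estimate plus (1 - α) times the new sample, and the time-out is β times the estimate. The function names and the default values of `alpha` and `beta` are illustrative assumptions.

```python
def update_rtt(old_estimate: float, sample: float, alpha: float = 0.875) -> float:
    """Weighted average of the old estimate and the new RTT measurement."""
    return alpha * old_estimate + (1 - alpha) * sample

def timeout(estimate: float, beta: float = 2.0) -> float:
    """Retransmission time-out set a little longer than the estimated RTT."""
    return beta * estimate

# Hypothetical run: the estimate moves only part of the way toward each sample.
estimate = 100.0
for sample in (100.0, 100.0, 300.0):
    estimate = update_rtt(estimate, sample)
rto = timeout(estimate)
```

A larger α makes the estimate smoother but slower to follow real changes in RTT.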
TCP Adaptive Retransmission Algorithm (cont)
It was noticed that β should depend on variance of roundtrip samples
With a fixed β, estimate can’t keep up with widely varying samples, resulting in unnecessary retransmissions
Current algorithm adapts RTO based on mean and variance of RTT
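A sketch of an RTO that follows both the mean and the variability of RTT. It assumes the commonly used form RTO = smoothed RTT + 4 x mean deviation, with gains 1/8 and 1/4 as in RFC 6298; the class name is hypothetical and mean deviation is used as an inexpensive stand-in for standard deviation.

```python
class RtoEstimator:
    def __init__(self, first_sample: float):
        self.srtt = first_sample           # smoothed (mean) RTT
        self.rttvar = first_sample / 2     # smoothed mean deviation of RTT

    def update(self, sample: float) -> None:
        # Update the deviation first, using the old smoothed RTT.
        self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
        self.srtt = 0.875 * self.srtt + 0.125 * sample

    @property
    def rto(self) -> float:
        return self.srtt + 4 * self.rttvar

steady, jittery = RtoEstimator(100.0), RtoEstimator(100.0)
for i in range(20):
    steady.update(100.0)                          # RTT with small variance
    jittery.update(50.0 if i % 2 else 150.0)      # RTT with large variance
```

With steady samples the RTO settles just above the mean RTT; with widely varying samples it stays well above the mean, which avoids unnecessary retransmissions.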
TCP Adaptive Retransmission Algorithm (cont)
[Figure: two timelines of packets and ACKs, one for RTT with small variance and one for RTT with large variance; each is labeled with mean RTT, standard dev., and RTO.]
TCP Adaptive Retransmission Algorithm (cont)
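Problem: acknowledgement ambiguity
Suppose segment is transmitted twice, and then ACKed
Does the ACK refer to the first segment or the duplicate?
[Figure: two cases, each showing a packet, its duplicate, and one ACK; in one case the ACK answers the original packet, in the other it answers the duplicate.]
Sender cannot know which case is true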
TCP Adaptive Retransmission Algorithm (cont)
If ACK is assumed to be from the first transmission, RTT estimate could be too large → causes RTO to be too long
If ACK is assumed to be from the duplicate packet, RTT estimate could be too small → causes RTO to be too short and unnecessary retransmissions
TCP Adaptive Retransmission Algorithm (cont)
Karn's algorithm: timer backoff strategy
RTT estimate is adjusted only for unambiguous ACKs
If segment is sent twice due to time-out, ignore measured delay to get its ACK and instead increase next RTO
Rate of increase is implementation-dependent, usually increases by factor of 2
On next unambiguous ACK, recompute RTT estimate and reset RTO
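The rules of Karn's algorithm can be sketched as follows. The class name, the rule RTO = 2 x estimate, and the smoothing gain are assumptions made to keep the example small; only the handling of ambiguous ACKs and the backoff by a factor of 2 come from the slide.

```python
class KarnSender:
    def __init__(self, initial_rtt: float):
        self.rtt_estimate = initial_rtt
        self.rto = 2 * initial_rtt         # assumed rule: RTO = 2 x RTT estimate

    def on_timeout(self) -> None:
        """Segment timed out and is sent again: back off the timer."""
        self.rto *= 2

    def on_ack(self, measured_delay: float, retransmitted: bool) -> None:
        if retransmitted:
            return                         # ambiguous ACK: ignore the measurement
        # Unambiguous ACK: recompute the RTT estimate and reset the RTO.
        self.rtt_estimate = 0.875 * self.rtt_estimate + 0.125 * measured_delay
        self.rto = 2 * self.rtt_estimate

sender = KarnSender(initial_rtt=1.0)
sender.on_timeout()                                      # RTO doubles
rto_after_timeout = sender.rto
sender.on_ack(measured_delay=0.1, retransmitted=True)    # ignored
rto_after_ambiguous_ack = sender.rto
sender.on_ack(measured_delay=1.0, retransmitted=False)   # estimate updated, RTO reset
```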
TCP Duplicate Detection
Receiver can get duplicate segments caused by early time-outs, lost ACKs, or late ACKs
Should be no confusion because duplicates of TCP segment are identified by same sequence number
Large range of sequence numbers needed to avoid ambiguity
TCP uses 32 bits (4 billion) so sequence numbers will not wrap around in short time
Receiver will not be confused by duplicate segments with same number
TCP Duplicate Detection (cont)
For duplicate segments, receiver assumes first ACK was lost and will ACK the duplicate
Sender will not be confused by duplicate ACKs
Possible confusion: a duplicate TCP segment arrives after connection is closed and new connection is opened
[Figure: connection closes with an exchange of CLS (FIN=1) segments; connection opens with an exchange of RFC (SYN=1) segments; then an old duplicate TCP segment arrives.]
TCP Duplicate Detection (cont)
TCP segment from old connection could arrive during new connection and be mistaken for a valid TCP segment
TCP avoids this confusion by:
New connection starts with random initial sequence number
Duplicate segments arriving during new connection will probably have a sequence number outside of new range
Any duplicate segments received during this time are discarded
TCP Duplicate Detection (cont)
New TCP connection chooses initial byte number at random
[Figure: byte number line from 0 to 2^32, with the range of byte numbers used for this connection marked.]
An old segment from another connection will more likely fall outside of expected range when range is very big (as in TCP)
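A rough sketch of why a random starting point in a very big range helps. The function names are hypothetical, the window size in the example is an assumption, and real implementations may choose initial sequence numbers differently; the code only illustrates the argument on this slide.

```python
import secrets

SEQ_SPACE = 2**32                          # 32-bit sequence numbers

def choose_initial_sequence_number() -> int:
    """Pick the initial byte number for a new connection at random."""
    return secrets.randbelow(SEQ_SPACE)

def in_window(seq: int, window_start: int, window_size: int) -> bool:
    """True if seq falls in the expected range, allowing for wrap-around."""
    return (seq - window_start) % SEQ_SPACE < window_size

def chance_old_segment_accepted(window_size: int) -> float:
    """Chance that a uniformly random old byte number lands in the expected range."""
    return window_size / SEQ_SPACE

isn = choose_initial_sequence_number()
chance = chance_old_segment_accepted(65535)    # assumed 64-Kbyte window
```

With a 64-Kbyte window the chance is about 0.0015 percent, so an old segment from another connection will probably be rejected.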
TCP Duplicate Detection (cont)
Also, TCP keeps record of old connection for a timed Wait state after connection is closed
Time = 2 x Maximum Segment Lifetime (MSL = longest time a TCP segment might take to arrive)
Any duplicate segments received during this time are discarded
TCP Connection Set-up
TCP 3-way handshake between hosts A and B:
A → B: SYN=1, SEQ=x (connection request; first data byte will be x)
B → A: SYN=1, SEQ=y, ACK=x+1 (connection acknowledgement; first data byte will be y)
A → B: SYN=1, SEQ=x, ACK=y+1 (connection confirm; send data starting at byte x)
TCP Connection Set-up (cont)
As seen before, 3-way handshake works even if both initiate connection at same time
Use of retransmission timer may cause duplicate SYN segments but there is no confusion
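[Figure: three handshake scenarios between host A and host B.]
Normal: A sends SYN i; B replies SYN j, ACK i; A replies SEQ i, ACK j
Old SYN: an old SYN i reaches B; B replies SYN j, ACK i; connection is rejected by A with RST, ACK j
Delayed SYN/ACK: A sends SYN i; an old SYN k, ACK m reaches A and is rejected by A with RST, ACK k; B's SYN j, ACK i then arrives, A replies SEQ i, ACK j, and the new connection is accepted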
TCP Connection Close
3-way handshake, like procedure for connection set-up
Connection can be closed in one direction by a segment with FIN=1
No more data is accepted in this direction
Other end will immediately ACK to prevent getting duplicate FIN segments
Other end delays its own FIN until application is ready to close connection in reverse direction
Lecture 14
TCP-Part 2
Outline
TCP flow control
TCP congestion avoidance
Slow start
Fast retransmit and recovery
Flow Control vs Congestion Control
Flow control: destination can slow down source through feedback control
Destination may not be ready to receive data
Host-to-host control (network not involved)
Congestion control: network should not get overloaded with traffic
May be handled by hosts (e.g., TCP), the network (e.g., resource reservations), or both hosts and network cooperating together (e.g., congestion notification)
Flow Control
2 approaches to flow control:
Window-based control (typically sliding window): destination constrains how many packets (volume) can be in transit by slowing down ACKs or withholding credits
• Destination simply advertises the amount of its unused buffer space
• Inefficient for high-speed networks
Rate-based control: destination constrains the sender’s transmission rate (not volume)
• Suited for streaming type applications that need a minimum bandwidth
TCP Flow Control
TCP flow control operates in units of bytes (not segments)
Destination piggybacks ACK (4 bytes) and window advertisement (2 bytes) in data segments going to source
Advertised window = number of bytes it is ready to receive beyond last ACK’ed byte (i.e., a credit)
Example: <ACK n+1, window advertisement = m> gives the sender permission to send up to byte n+m
Window advertisement = 0 means stop sending
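The credit arithmetic above can be checked with a small sketch. The function names and the example numbers are hypothetical.

```python
def last_permitted_byte(ack: int, window: int) -> int:
    """ack = next byte expected by destination; window = advertised credit in bytes."""
    return ack + window - 1

def bytes_sender_may_send(ack: int, window: int, next_to_send: int) -> int:
    """How many more bytes the sender may put in transit right now."""
    return max(0, ack + window - next_to_send)

# <ACK n+1, window advertisement = m> with n = 1000 and m = 500:
limit = last_permitted_byte(ack=1001, window=500)                           # byte n+m
remaining = bytes_sender_may_send(ack=1001, window=500, next_to_send=1201)
stopped = bytes_sender_may_send(ack=1001, window=0, next_to_send=1001)
```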
TCP Flow Control (cont)
Possible deadlock if destination closes window, then opens window but this credit is lost
Destination is expecting data while sender thinks window is closed
Sender starts a persist timer when window is closed
If timer expires, sender will send a window probe (TCP segment with 1-byte data) to see if window has been increased
TCP Flow Control (cont)
[Figure: destination sends ACK=x, credit=0 and the sender waits; a later ACK=x, credit=m is lost, so both hosts are waiting; when the persist timer expires, the sender sends a probe with one byte of data, and the destination answers with ACK=x, credit=m.]
Probe should trigger duplicate of last credit or a new credit
Process continues until credit is received or connection is closed; persist timer doubles each time up to 60 sec
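The doubling of the persist timer can be sketched with a generator. The 60 sec ceiling comes from the slide; the starting value of 5 sec and the function name are assumptions.

```python
from itertools import islice
from typing import Iterator

def persist_intervals(initial: float = 5.0, ceiling: float = 60.0) -> Iterator[float]:
    """Yield the waiting time before each successive window probe."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * 2, ceiling)   # double each time, up to the ceiling

first_six = list(islice(persist_intervals(), 6))
```

The sender would stop drawing from the generator as soon as a nonzero credit arrives or the connection is closed.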
Congestion Control
Without congestion control, Internet would reach congestion collapse
Since IP is best effort, sender’s best strategy is to send as much data as possible to hog the network and increase its chances of successful delivery
Everyone following this strategy will increase load on network, pushing it into congestion
Increasing congestion will cause more retransmissions → higher load will increase congestion even more → congestion collapse: very long delays; network full of duplicate packets; few packets delivered
Congestion Control (cont)
[Figure: throughput versus offered load, with three curves: ideal, controlled, and uncontrolled (congestion collapse).]
Congestion Control (cont)
Congestion control can be:
Window-based
• Traditional sliding window is naturally responsive to congestion
• Congestion increases → RTT increases → ACKs slow down → sender slows down
Rate-based
• Better suited for streaming type applications
• Easier to think in terms of fair shares of bandwidth
TCP congestion control is window-based
Congestion Control (cont)
Congestion control can be:
Preventive: traffic is blocked from entering network to prevent congestion from occurring
• Need some type of admission control procedure or explicit congestion notification
Reactive: traffic is restricted after congestion occurs
• Can be implemented in hosts without complexity of admission control or congestion notification
• Congestion prevention is preferred when possible
TCP uses reactive congestion control because IP layer does nothing
Congestion Control (cont)
Closely related, congestion control can be:
Closed loop
• Continuous feedback during transmission allows sender to adapt its rate to current congestion state
Open loop
• Traffic is either admitted or blocked; once admitted, transmission is not controlled by feedback but source must conform to its specified rate
• Good for streaming type applications, if admission control is possible
TCP uses closed loop control (keeps routers simple)
Congestion Control (cont)
Closed loop control uses feedback that is either:
Explicit
• Congested routers send explicit congestion notification
• Sender can adapt its rate to current congestion state
Implicit
• Sender must adapt its rate by inferring the congestion state - typically from packet losses and RTT
• No information from routers
• Performance will not be as good as explicit feedback
TCP uses implicit feedback (keeps routers simple)
TCP Congestion Avoidance (cont)
TCP sender reacts to congestion in network by keeping an adaptive “congestion window”
Congestion window (cwnd) = amount of data that is appropriate for level of network congestion
Current sending window = min(window advertisement, congestion window)
Sender is constrained by either network congestion or the destination
Congestion avoidance algorithm: adapts congestion window by AIMD (additive increase, multiplicative decrease)
TCP Congestion Avoidance (cont)
Multiplicative decrease: idea is to back off senders quickly (exponentially) when congestion is detected
TCP assumes a lost segment (detected by retransmission timeout) is caused by congestion, and not because of error in RTO
If segment is lost (and retransmitted), decrease congestion window by half
If loss continues, congestion window keeps decreasing by half (down to one segment)
TCP Congestion Avoidance (cont)
[Figure: idealized cwnd versus time; cwnd grows by linear increase, and each retransmission timeout drops cwnd to half.]
TCP Congestion Avoidance (cont)
Why back off window exponentially?
Some believe queues build exponentially during congestion → sources should back off as quickly
Additive increase: when congestion abates (an ACK for new data), increase congestion window linearly (one more segment per RTT)
Why not increase multiplicatively?
Leads to instability and oscillations (easy to cause congestion, harder to recover)
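AIMD and the sending window rule can be sketched in a few lines. Windows are counted in whole segments and one step stands for one round trip; the function names and the starting window are assumptions.

```python
def aimd_step(cwnd: int, loss_detected: bool) -> int:
    """Return the congestion window (in segments) after one round trip."""
    if loss_detected:
        return max(1, cwnd // 2)   # multiplicative decrease, down to one segment
    return cwnd + 1                # additive increase: one more segment per RTT

def sending_window(window_advertisement: int, cwnd: int) -> int:
    """Sender is constrained by either the destination or the network."""
    return min(window_advertisement, cwnd)

cwnd, trace = 16, []
for loss in (False, False, True, True, False):
    cwnd = aimd_step(cwnd, loss)
    trace.append(cwnd)
# trace is [17, 18, 9, 4, 5]: slow climb, fast back-off.
```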
TCP Slow Start
Idea: if network is in equilibrium (running stably with full window in transit on each connection) when new connection starts or recovering from long period of congestion, sending a large initial window of segments might upset equilibrium and cause oscillations or congestion
Slow start: idea is to start congestion window at one segment and gradually increase rate
Increase congestion window by one segment for each ACK that is returned
Attempts to probe network for acceptable sending rate
TCP Slow Start (cont)
Slow in sense of starting with small window but rate of increase may not be slow
Window could increase exponentially: send 1 → get 1 ACK, increase window to 2 → get 2 ACKs, increase window to 4,...
This is actually fast rate of increase to allow sender to reach equilibrium point quickly (although gently)
Eventually, a segment will be lost
• Set “slow start threshold” SST = 1/2 current congestion window (the equilibrium point); then go into congestion avoidance
Slow Start and Congestion Avoidance
These are separate algorithms but implemented together because both are triggered by time-out and both change congestion window
New connection begins with congestion window = 1 segment, SST = 65,535 bytes
Go into slow start to search for acceptable window
Congestion is indicated by packet loss evidenced by timeout
Set SST = 1/2 current congestion window
Slow Start and Congestion Avoidance
If time-out occurred (this assumes that adaptive timer is accurate, so time-out means a lost segment), set congestion window = 1 segment and go into slow start
Slow start can continue until window reaches SST (half of window when congestion occurred)
Then go into congestion avoidance phase: congestion window can increase beyond SST but at more cautious rate (as it approaches the equilibrium point when congestion occurred)
Slow Start and Congestion Avoidance
In congestion avoidance phase, congestion window increases linearly as long as ACKs are returned
Whenever congestion window ≤ SST, it’s in slow start; if congestion window > SST, then it’s in congestion avoidance
[Figure: congestion window versus time, alternating between slow start and congestion avoidance phases.]
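The combined behavior can be sketched as a per-round-trip simulation. Windows are counted in segments rather than bytes, the initial SST of 16 segments and the round in which the time-out occurs are made up, and slow start is assumed to stop once the window reaches SST; the function names are hypothetical.

```python
from typing import Tuple

def next_cwnd(cwnd: int, sst: int) -> int:
    """Congestion window after one round trip in which all ACKs are returned."""
    if cwnd < sst:
        return min(cwnd * 2, sst)  # slow start: doubles each RTT, up to SST
    return cwnd + 1                # congestion avoidance: linear increase

def on_timeout(cwnd: int) -> Tuple[int, int]:
    """Return (new cwnd, new SST) after a retransmission time-out."""
    return 1, max(cwnd // 2, 1)    # restart at one segment, SST = 1/2 window

cwnd, sst, trace = 1, 16, []
for round_trip in range(12):
    if round_trip == 6:            # assumed: a segment is lost in this round
        cwnd, sst = on_timeout(cwnd)
    else:
        cwnd = next_cwnd(cwnd, sst)
    trace.append(cwnd)
```

The trace shows exponential growth up to SST, linear growth beyond it, then a restart from one segment with SST set to half of the window at the time of the loss.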
Fast Retransmit and Recovery Algorithm
Destination will send duplicate ACK whenever it gets out-of-order segment
Sender does not know if duplicate ACKs mean segment was lost or segments were received out of order
Fast retransmit algorithm:
Assumes that out-of-order segments will result in only 1 or 2 duplicate ACKs, and 3 or more duplicate ACKs means a segment was lost
Fast Retransmit and Recovery Algorithm (cont)
[Figure: receiver's buffer with one segment missing; after the first ACK, the out-of-order segments that follow will cause 3 duplicate ACKs → TCP assumes that missing segment is lost.]
Fast Retransmit and Recovery Algorithm
That lost segment is retransmitted immediately (even if retransmit timer hasn’t expired)
Fast recovery: do congestion avoidance but not slow start because duplicate ACKs indicate that some segments (after lost segment) were delivered, so congestion is not too bad
Set SST = 1/2 congestion window
Reduce congestion window to half + 3 segments (to allow for 3 segments already at dest.)
Expand congestion window linearly until next lost segment
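A simplified sketch of the sender side of fast retransmit and recovery. The class and attribute names are hypothetical, windows are counted in segments, and the later steps (window growth on further duplicate ACKs and the adjustment when the missing segment is finally ACKed) are left out.

```python
DUP_ACK_THRESHOLD = 3

class FastRetransmitSender:
    def __init__(self, cwnd: int, sst: int):
        self.cwnd, self.sst = cwnd, sst
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted = []    # first byte numbers of segments sent again

    def on_ack(self, ack: int) -> None:
        if ack != self.last_ack:   # ACK for new data
            self.last_ack, self.dup_acks = ack, 0
            return
        self.dup_acks += 1
        if self.dup_acks == DUP_ACK_THRESHOLD:
            # Resend the missing segment immediately, without waiting for the timer.
            self.retransmitted.append(ack)
            self.sst = max(self.cwnd // 2, 1)   # SST = 1/2 congestion window
            self.cwnd = self.sst + 3            # half + 3 segments already at dest.

sender = FastRetransmitSender(cwnd=16, sst=32)
for ack in (400, 400, 400, 400):   # first ACK followed by 3 duplicate ACKs
    sender.on_ack(ack)
```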
Fast Retransmit and Recovery Algorithm (cont)
[Figure: congestion window versus time.]
Retransmissions around time = 10, 14, and 21 sec
SST is set to 1/2 congestion window but window is allowed to increase with each duplicate ACK
When missing segment is ACKed, congestion window closes down to SST