COMS/CSEE 4140 Networking Laboratory Lecture 06 Salman Abdul Baset Spring 2008

Feb 01, 2016


Patrick jenge

Transcript
Page 1: COMS/CSEE 4140 Networking Laboratory Lecture 06

COMS/CSEE 4140 Networking Laboratory

Lecture 06

Salman Abdul Baset, Spring 2008

Page 2

Announcements
• Lab 4 (5-7) due next week before your lab slot
• Prelab 5 due next week. There will be Lab 5 next week.
• Midterm (March 10th, duration ~1.5 hours)
• Assignment 2 issues
  • aslookup compilation?
  • ISP name: nslookup or whois for IP address
• Lab 4 (count-to-infinity issues)

Page 3

Agenda
• Autonomous Systems (AS)
• Policy vs. distance-based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Page 4

Autonomous Systems Terminology
• Local traffic = traffic with source or destination in the AS
• Transit traffic = traffic that passes through the AS
• Stub AS = has a connection to only one AS; carries only local traffic
• Multihomed AS = has connections to more than one AS, but does not carry transit traffic
• Transit AS = has connections to more than one AS and carries transit traffic

Page 5

Stub and Transit Networks

• AS 1, AS 2, and AS 5 are stub networks
• AS 2 is a multi-homed stub network
• AS 3 and AS 4 are transit networks

[Figure: topology connecting AS 1 through AS 5]

Page 6

Selective Transit

Example: Transit AS 3 carries traffic between AS 1 and AS 4 and between AS 2 and AS 4, but AS 3 does not carry traffic between AS 1 and AS 2.

The example shows a routing policy.

[Figure: AS 1, AS 2, AS 3, and AS 4]

Page 7

Customer/Provider

• A stub network typically obtains access to the Internet through a transit network.
• A transit network that is a provider may itself be a customer of another network.
• The customer pays the provider for service.

[Figure: customer/provider relationships between ASes]

Page 8

Customer/Provider and Peers

• Transit networks can have a peer relationship
• Peers provide transit between their respective customers
• Peers do not provide transit between peers
• Peers normally do not pay each other for service

[Figure: customer/provider relationships between ASes, with peering links]

Page 9

Shortcuts through peering

• Note that peering reduces upstream traffic
• Delays can be reduced through peering
• But: peering may not generate revenue

[Figure: customer/provider relationships between ASes, with an additional peering link acting as a shortcut]

Page 10

ASNs already assigned

Source: http://www.potaroo.net/tools/asn32/

Private ASNs: 64512 – 65534
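A minimal Python sketch for telling private ASNs apart from publicly assigned ones. The ranges are an assumption taken from RFC 6996, which postdates this lecture: 64512 to 65534 for 16-bit private-use ASNs and 4200000000 to 4294967294 for 32-bit ones. The function name `is_private_asn` is hypothetical.

```python
# Private-use ASN ranges (assumption: as defined in RFC 6996; the 32-bit
# range did not exist when this lecture was written).
PRIVATE_ASN_RANGES = [
    (64512, 65534),                   # 16-bit private-use ASNs
    (4_200_000_000, 4_294_967_294),   # 32-bit private-use ASNs
]


def is_private_asn(asn: int) -> bool:
    """Return True if asn falls inside one of the private-use ranges."""
    if not 0 <= asn <= 2**32 - 1:
        raise ValueError(f"ASN out of range: {asn}")
    return any(lo <= asn <= hi for lo, hi in PRIVATE_ASN_RANGES)
```

For example, `is_private_asn(701)` (UUNet) is False, while `is_private_asn(64512)` is True; 65535 is reserved rather than private, so it returns False.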

Page 11

ASNs in use

Page 12

ASN projections

Page 13

ARDs versus ASes: Autonomous Routing Domains Don’t Always Need BGP or an ASN

[Figure: Yale University (130.132.0.0/16) attached to Qwest]

• Yale: nail up a default route 0.0.0.0/0 pointing to Qwest
• Qwest: nail up routes for 130.132.0.0/16 pointing to Yale

Static routing is the most common way of connecting an autonomous routing domain to the Internet. This helps explain why BGP is a mystery to many …
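Nailed-up static routes such as these are resolved by longest-prefix match: the most specific route covering a destination wins, and 0.0.0.0/0 catches everything else. A minimal sketch using Python's standard `ipaddress` module; for illustration only, one hypothetical table holds both routes from the slide, and the next-hop labels are made-up strings rather than real router configuration.

```python
import ipaddress

# Hypothetical table with the two nailed-up routes from the slide.
STATIC_ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "toward Qwest",       # default route
    ipaddress.ip_network("130.132.0.0/16"): "toward Yale",   # Yale's prefix
}


def lookup(dst: str, routes=STATIC_ROUTES) -> str:
    """Longest-prefix match: among all routes covering dst, use the most specific."""
    addr = ipaddress.ip_address(dst)
    candidates = [net for net in routes if addr in net]
    if not candidates:
        raise LookupError(f"no route to {dst}")
    best = max(candidates, key=lambda net: net.prefixlen)
    return routes[best]
```

A destination inside 130.132.0.0/16 matches both entries, and the /16 wins over the /0; any other address falls through to the default route.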

Page 14

ASNs Can Be “Shared” (RFC 2270)

ASN 7046 is assigned to UUNet. It is used by customers single-homed to UUNet but needing BGP for some reason (load balancing, etc.) [RFC 2270].

[Figure: AS 701 (UUNet) with customers Crestar Bank, NJIT, and Hood College, each shown as AS 7046; prefix 128.235.0.0/16]

Page 15

ARDs and ASes: Summary
• Most ARDs have no ASN (statically routed at the Internet edge)
• Some unrelated ARDs share the same ASN (RFC 2270)
• Some ARDs are implemented with multiple ASNs (example: Worldcom)
• ASes are just an implementation detail of inter-domain routing

Page 16

Agenda
• Autonomous Systems (AS)
• Policy vs. distance-based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Page 17

Why not minimize “AS hop count”?

[Figure: customers Cust1, Cust2, and Cust3 attached to Regional ISP1, ISP2, and ISP3, which connect to National ISP1 and National ISP2; one path is marked YES and another NO]

Shortest path routing is not compatible with commercial relations.

Page 18

Customer versus Provider

Customer pays provider for access to the Internet

[Figure: provider and customer, with IP traffic between them]

Page 19

The “Peering” Relationship
• Peers provide transit between their respective customers
• Peers do not provide transit between peers
• Peers (often) do not exchange $$$

[Figure: peer-peer and customer-provider links, with paths marked “traffic allowed” and “traffic NOT allowed”]
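The two peering rules above, together with the customer/provider rule, can be expressed as an export filter: routes learned from a paying customer may be announced to anyone, while routes learned from a peer or a provider are announced only to customers. A minimal sketch; the `Rel` enum and `should_export` function are hypothetical names, and real routers express this in vendor-specific policy languages.

```python
from enum import Enum


class Rel(Enum):
    """Relationship of a neighboring AS, as seen from our AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Rel, export_to: Rel) -> bool:
    """Decide whether to announce a route to a neighbor.

    Customer routes go to everyone (we carry transit for paying customers).
    Peer and provider routes go only to customers, so we never give free
    transit between two peers or between a peer and a provider.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER
```

With this filter, traffic between two peers' customers flows over the peering link, but neither peer becomes a transit path for the other's peers or providers.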

Page 20

Peering Provides Shortcuts

Peering also allows connectivity between the customers of “Tier 1” providers.

[Figure: peer-peer and customer-provider links]

Page 21

Peering Wars

Peer:
• Reduces upstream transit costs
• Can increase end-to-end performance
• May be the only way to connect your customers to some part of the Internet (“Tier 1”)

Don’t Peer:
• You would rather have customers
• Peers are usually your competition
• Peering relationships may require periodic renegotiation

Peering struggles are by far the most contentious issues in the ISP world!

Peering agreements are often confidential.

Page 22

Agenda
• Autonomous Systems (AS)
• Policy vs. distance-based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Page 23

The Gang of Four

        Link State       Vectoring
EGP                      BGP
IGP     IS-IS, OSPF      RIP

Page 24

BGP Overview
• BGP = Border Gateway Protocol version 4, RFC 1771 (~60 pages)
• Note: In the context of BGP, a gateway is nothing else but an IP router that connects autonomous systems.
• Interdomain routing protocol for routing between autonomous systems
• Uses TCP to establish a BGP session and to send routing messages over the BGP session
• Sends updates only for new (changed) routes
• BGP is a path vector protocol: routing messages in BGP contain complete routes
• Network administrators can specify routing policies

Page 25

BGP Policy-based Routing
• Each node is assigned an AS number (ASN)
• BGP’s goal is to find any AS-path (not an optimal one). Since the internals of the AS are never revealed, finding an optimal path is not feasible.
• The network administrator sets BGP’s policies to determine the best path to reach a destination network.

Page 26

The Border Gateway Protocol (BGP)

BGP = RFC 1771
+ “optional” extensions: RFC 1997 (communities), RFC 2439 (damping), RFC 2796 (reflection), RFC 3065 (confederation), …
+ routing policy configuration languages (vendor-specific)
+ Current Best Practices in management of Interdomain Routing

BGP was not DESIGNED. It EVOLVED.

Page 27

BGP Route Processing

Receive BGP Updates -> Apply Import Policies -> Best Route Selection (based on attribute values) -> Best Route Table -> Apply Export Policies -> Transmit BGP Updates

Best routes are also installed as forwarding entries in the IP Forwarding Table.

Apply Policy = filter routes & tweak attributes (at both import and export)

Open-ended programming, constrained only by the vendor configuration language.

Page 28

BGP Attributes

Value  Code                      Reference
-----  ------------------------  ---------
1      ORIGIN                    [RFC1771]
2      AS_PATH                   [RFC1771]
3      NEXT_HOP                  [RFC1771]
4      MULTI_EXIT_DISC           [RFC1771]
5      LOCAL_PREF                [RFC1771]
6      ATOMIC_AGGREGATE          [RFC1771]
7      AGGREGATOR                [RFC1771]
8      COMMUNITY                 [RFC1997]
9      ORIGINATOR_ID             [RFC2796]
10     CLUSTER_LIST              [RFC2796]
11     DPA                       [Chen]
12     ADVERTISER                [RFC1863]
13     RCID_PATH / CLUSTER_ID    [RFC1863]
14     MP_REACH_NLRI             [RFC2283]
15     MP_UNREACH_NLRI           [RFC2283]
16     EXTENDED COMMUNITIES      [Rosen]
...
255    reserved for development

From IANA: http://www.iana.org/assignments/bgp-parameters

Most important attributes

Not all attributes need to be present in every announcement

Page 29

LOCAL_PREF Attribute

Example: assigning a higher LOCAL_PREF to routes learned over the primary link forces outbound traffic to take the primary link, unless the link is down.

Page 30

NEXT_HOP Attribute

• EGP: IP address used to reach the advertising router
• IGP: next-hop address is carried into the local AS

Page 31

AS_PATH Attribute

Used to detect routing loops and find shortest paths
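Both uses of AS_PATH can be sketched in a few lines: on import, a router drops any route whose AS_PATH already contains its own ASN (a loop); on export, it puts its own ASN in front of the path, and a shorter path means fewer AS hops. A minimal sketch with hypothetical function names; the `prepend` parameter models the ASPATH padding discussed in this lecture, and real BGP tie-breaking involves more attributes than path length.

```python
def accept(my_asn: int, as_path: list) -> bool:
    """Import-side loop check: drop a route whose AS_PATH already contains our ASN."""
    return my_asn not in as_path


def announce(my_asn: int, as_path: list, prepend: int = 1) -> list:
    """Export side: put our ASN in front of the path (prepend > 1 pads the path)."""
    if prepend < 1:
        raise ValueError("must prepend own ASN at least once")
    return [my_asn] * prepend + list(as_path)


def shortest(paths: list) -> list:
    """Among candidate AS_PATHs, prefer the one with the fewest AS hops."""
    return min(paths, key=len)
```

For example, AS 2 originating a prefix announces `[2]`; AS 1 re-announces it as `[1, 2]`; if the route ever came back to AS 2, `accept(2, [1, 2])` would reject it.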

Page 32

Shedding Inbound Traffic with ASPATH Prepending

[Figure: customer AS 2 connects to provider AS 1 over a primary and a backup link; 192.0.2.0/24 is announced with ASPATH = 2 on the primary link and with ASPATH = 2 2 2 on the backup link]

Prepending will (usually) force inbound traffic from AS 1 to take the primary link.

Yes, this is a Glorious Hack …

Page 33

… But Padding Does Not Always Work

[Figure: customer AS 2 is connected to provider AS 1 over the primary link and to provider AS 3 over the backup link; 192.0.2.0/24 is announced with ASPATH = 2 toward AS 1 and with ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 toward AS 3]

AS 3 will send traffic on the “backup” link because it prefers customer routes, and local preference is considered before ASPATH length!

Padding in this way is often used as a form of load balancing.

Page 34

[Figure: customer AS 2 announces 192.0.2.0/24 with ASPATH = 2 to provider AS 1 over the primary link, and with ASPATH = 2 COMMUNITY = 3:70 to provider AS 3 over the backup link]

Customer import policy at AS 3:
• If 3:90 in COMMUNITY then set local preference to 90
• If 3:80 in COMMUNITY then set local preference to 80
• If 3:70 in COMMUNITY then set local preference to 70

AS 3: normal customer local pref is 100, peer local pref is 90

COMMUNITY Attribute to the Rescue!
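The padding failure and the community fix can be reproduced with a toy version of AS 3's import policy and decision process. This is a minimal sketch under stated assumptions: AS 3 learns 192.0.2.0/24 both directly from its customer AS 2 (backup link) and from AS 1 treated as a peer with path `[1, 2]`; route selection is simplified to "highest LOCAL_PREF, then shortest AS_PATH". The names `Route`, `import_policy`, and `best_route` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    prefix: str
    as_path: list
    learned_from: str                      # "customer" or "peer"
    communities: set = field(default_factory=set)
    local_pref: int = 0


# AS 3's import policy, taken from the slide.
DEFAULT_PREF = {"customer": 100, "peer": 90}
COMMUNITY_PREF = {"3:90": 90, "3:80": 80, "3:70": 70}


def import_policy(route: Route) -> Route:
    """Set LOCAL_PREF by relationship; a customer's community may lower it."""
    route.local_pref = DEFAULT_PREF[route.learned_from]
    if route.learned_from == "customer":
        for community, pref in COMMUNITY_PREF.items():
            if community in route.communities:   # first matching community wins
                route.local_pref = pref
                break
    return route


def best_route(routes: list) -> Route:
    """Higher LOCAL_PREF wins; AS_PATH length only breaks ties."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))
```

With a padded path and no community, `best_route` still picks the customer route (LOCAL_PREF 100 beats 90 despite 13 hops versus 2). Tagging the backup announcement with 3:70 drops its LOCAL_PREF below the peer route's 90, so AS 3 sends traffic toward AS 1 instead.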

Page 35

BGP Issues - What is a BGP Wedgie?
• BGP policies make sense locally
• Interaction of local policies allows multiple stable routings
• Some routings are consistent with intended policies, and some are not
• If an unintended routing is installed (BGP is “wedged”), then manual intervention is needed to change to an intended routing
• When an unintended routing is installed, no single group of network operators has enough knowledge to debug the problem

[Figure: examples labeled “¾ wedgie” and “Full wedgie”]

Page 36

YouTube blocking

Pakistan blocks YouTube. How? (according to BBC)
• Advertise a shorter route to reach YouTube
• The incorrect short route gets propagated
• Seen by two thirds of the Internet
• Traffic to YouTube goes through Pakistan
• Since Pakistan blocked YouTube, all traffic reaches a dead end!

Page 37

Dynamic Routing Protocols: Summary
• Dynamic routing protocols: RIP, OSPF, BGP
• RIP uses a distance vector algorithm and converges slowly (the count-to-infinity problem)
• OSPF uses a link state algorithm and converges fast. But it is more complicated than RIP.
• Both RIP and OSPF find the lowest-cost path.
• BGP uses a path vector algorithm; its path selection algorithm is complicated and is influenced by policies.
• BGP has its own problems; see WIDGI by Tim Griffin

Page 38

More Readings (Optional)
• BGP Wedgies: Bad Routing Policy Interactions that Cannot be Debugged
• JI’s Intro to interdomain routing.
• "Interdomain Setting of PlanetLab Nodes." PlanetLab Meeting, May 14, 2004.
• Understanding the Border Gateway Protocol (BGP), ICNP 2002 Tutorial Session

Page 39

Agenda
• Autonomous Systems (AS)
• Policy vs. distance-based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Page 40

Transmission Control Protocol (RFC)
• Reliable and in-order byte-stream service
• TCP format
• Connection establishment
• Flow control
• Reaction to congestion
• Packet corruption

Page 41

TCP Format

IP header (20 bytes) | TCP header (20 bytes) | TCP data

Bit  0                                 15 16                                31
     Source Port Number                  | Destination Port Number
     Sequence number (32 bits)
     Acknowledgement number (32 bits)
     header length | 0 | Flags           | window size
     TCP checksum                        | urgent pointer
     Options (if any)
     DATA

• TCP segments have a header of at least 20 bytes (more if options are present), followed by >= 0 bytes of data.

Page 42

TCP header fields: Sequence Number (SeqNo)
• Sequence number is 32 bits long, so the range of SeqNo is 0 <= SeqNo <= 2^32 - 1 ≈ 4.3 Gbyte
• Each sequence number identifies a byte in the byte stream
• Initial Sequence Number (ISN) of a connection is set during connection establishment
  Q: What are possible requirements for ISN?
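Because the ISN can be anywhere in the 32-bit space, sequence numbers eventually wrap past 2^32 - 1 back to 0, so "a comes before b" cannot be a plain integer comparison. A minimal sketch of modular sequence arithmetic, assuming the two numbers being compared are less than 2^31 apart; the helper names are hypothetical.

```python
MOD = 2**32  # sequence numbers live in a 32-bit circular space


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n bytes, wrapping at 2**32."""
    return (seq + n) % MOD


def seq_lt(a: int, b: int) -> bool:
    """True if a comes before b on the circle.

    Interprets the 32-bit difference a - b as a signed value: it is
    negative exactly when (a - b) mod 2**32 is 2**31 or larger.
    """
    return (a - b) % MOD >= 2**31
```

So 4294967295 (= 2^32 - 1) is treated as coming just before 0, and adding one byte to it wraps to 0.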

Page 43

TCP header fields: Acknowledgement Number (AckNo)
• Acknowledgements are piggybacked, i.e., a segment from A -> B can contain an acknowledgement for data sent in the B -> A direction
  Q: Why is piggybacking good?
• A host uses the AckNo field to send acknowledgements. (If a host sends an AckNo in a segment, it sets the “ACK flag”.)
• The AckNo contains the next SeqNo that a host wants to receive
  Example: The acknowledgement for a segment with sequence numbers 0-1500 is AckNo=1501

Page 44

TCP header fields: Acknowledgement Number (cont’d)
• TCP uses the sliding window flow protocol (see CS 457) to regulate the flow of traffic from sender to receiver
• TCP uses the following variation of sliding window: no NACKs (Negative ACKnowledgements), only cumulative ACKs
• Example: Assume the sender sends two segments with “1..1500” and “1501..3000”, but the receiver only gets the second segment. In this case, the receiver cannot acknowledge the second segment. It can only send AckNo=1.
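The cumulative-ACK rule in both examples above can be computed directly: starting from the next byte the receiver expects, walk the received byte ranges in order and stop at the first gap. A minimal sketch; `cumulative_ack` is a hypothetical helper, ranges are inclusive as in the slides, and sequence-number wraparound is ignored for simplicity.

```python
def cumulative_ack(next_expected: int, segments: list) -> int:
    """Return the AckNo a receiver can send.

    next_expected: first byte not yet acknowledged.
    segments: received (first_byte, last_byte) ranges, inclusive.
    Bytes after a gap cannot be acknowledged, even if they have arrived.
    """
    for first, last in sorted(segments):
        if first > next_expected:
            break                                  # gap: stop here
        next_expected = max(next_expected, last + 1)
    return next_expected
```

For instance, a segment covering bytes 0-1500 yields AckNo=1501; if only "1501..3000" arrives while "1..1500" is missing, the result stays at 1.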

Page 45

TCP header fields: Header Length (4 bits)
• Length of the header in 32-bit words
• Note that the TCP header has variable length (minimum 20 bytes; since 4 bits can count at most 15 words, the maximum is 60 bytes)

Page 46

TCP header fields: Flag bits
• URG: Urgent pointer is valid. If the bit is set, the following bytes contain an urgent message in the range: SeqNo <= urgent message <= SeqNo + urgent pointer
• ACK: Acknowledgement Number is valid
• PSH: PUSH flag
  • Notification from sender to the receiver that the receiver should pass all data that it has to the application
  • Normally set by the sender when the sender’s buffer is empty

Page 47

TCP header fields: Flag bits
• RST: Reset the connection
  • The flag causes the receiver to reset the connection
  • The receiver of a RST terminates the connection and notifies the higher-layer application about the reset
• SYN: Synchronize sequence numbers
  • Sent in the first packet when initiating a connection
• FIN: Sender is finished with sending
  • Used for closing a connection
  • Both sides of a connection must send a FIN
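The fixed 20-byte header, the 4-bit header length, and the flag bits described above can be decoded with Python's standard `struct` module. A minimal sketch: `parse_tcp_header` is a hypothetical name, only the six flags covered in this lecture are decoded (other bits in that byte are ignored), and the checksum is returned without being verified.

```python
import struct

# Flag bits in the 14th header byte, least significant bit first.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Split a raw TCP segment into header fields, options, and data."""
    if len(segment) < 20:
        raise ValueError("TCP header is at least 20 bytes")
    (src, dst, seq, ack, offset_byte, flags_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4   # header length field counts 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("bad header length")
    flags = {name for bit, name in enumerate(FLAG_NAMES) if flags_byte & (1 << bit)}
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags, "window": window,
        "checksum": checksum, "urgent": urgent,
        "options": segment[20:header_len], "data": segment[header_len:],
    }
```

Options start right after the fixed 20 bytes and end at `header_len`, which is why the header-length field is needed to find where the data begins.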

Page 48

TCP header fields
• Window Size:
  • Each side of the connection advertises the window size
  • Window size is the maximum number of bytes that a receiver can accept
  • Maximum window size is 2^16 - 1 = 65535 bytes
• TCP Checksum:
  • The TCP checksum covers both the TCP header and the TCP data (and also some parts of the IP header)
  • 16-bit one’s complement
• Urgent Pointer:
  • Only valid if the URG flag is set
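The 16-bit one's complement checksum can be sketched as follows: add the data as big-endian 16-bit words, fold any carries back into the low 16 bits, and complement the result. This is the Internet checksum as described in RFC 1071; building the IP pseudo-header that TCP also covers is left out of this sketch, and `internet_checksum` is a hypothetical name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can check a segment by running the same sum over the data with the checksum included: if nothing was corrupted, the result is 0.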

Page 49

TCP header fields: Options
• End of Options: kind=0 (1 byte)
• NOP (no operation): kind=1 (1 byte)
• Maximum Segment Size: kind=2 (1 byte), len=4 (1 byte), maximum segment size (2 bytes)
• Window Scale Factor: kind=3 (1 byte), len=3 (1 byte), shift count (1 byte)
• Timestamp: kind=8 (1 byte), len=10 (1 byte), timestamp value (4 bytes), timestamp echo reply (4 bytes)

Page 50

TCP header fields: Options
• NOP is used to pad the TCP header to a multiple of 4 bytes
• Maximum Segment Size
• Window Scale Option
  • Increases the TCP window from 16 to 32 bits, i.e., the window size is interpreted differently
    Q: What is the different interpretation?
  • This option can only be used in the SYN segment (first segment) during connection establishment
• Timestamp Option
  • Can be used for round-trip measurements
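The option layouts listed above (single-byte kinds 0 and 1, and kind/len/value for the rest) can be walked with a short loop. A minimal sketch: `parse_tcp_options` and `effective_window` are hypothetical names, unknown kinds are only recorded, and the scaling rule assumed here is the RFC 1323 one, where the advertised 16-bit window is shifted left by the negotiated shift count.

```python
import struct


def parse_tcp_options(opts: bytes) -> dict:
    """Decode MSS, window scale, and timestamp options from raw option bytes."""
    result = {}
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:                    # End of Options
            break
        if kind == 1:                    # NOP: one byte of padding
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]             # len counts kind and len bytes too
        if length < 2 or i + length > len(opts):
            raise ValueError("bad option length")
        body = opts[i + 2:i + length]
        if kind == 2 and length == 4:
            result["mss"] = struct.unpack("!H", body)[0]
        elif kind == 3 and length == 3:
            result["wscale"] = body[0]
        elif kind == 8 and length == 10:
            result["tsval"], result["tsecr"] = struct.unpack("!II", body)
        else:
            result.setdefault("unknown", []).append(kind)
        i += length
    return result


def effective_window(window_field: int, shift: int) -> int:
    """Window in bytes once a window scale shift count has been negotiated."""
    return window_field << shift
```

With a shift count of 7, for example, an advertised window field of 512 stands for 65536 bytes, already more than the unscaled maximum of 65535.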

Page 51

Three-Way Handshake

aida.poly.edu -> mng.poly.edu: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
aida.poly.edu -> mng.poly.edu: ack 172488587 win 17520

Page 52

Why is a Two-Way Handshake not enough?

[Figure: exchange between aida.poly.edu and mng.poly.edu]

S 15322112354:15322112354(0) win 16384 <mss 1460, ...>
S 172488586:172488586(0) win 8760 <mss 1460>
S 1031880193:1031880193(0) win 16384 <mss 1460, ...>

The red line is a delayed duplicate packet.

When aida initiates the data transfer (starting with SeqNo=15322112355), mng will reject all data.

Annotation in the figure: “Will be discarded as a duplicate SYN”.

Page 53

TCP Connection Termination

mng.poly.edu -> aida.poly.edu: F 172488734:172488734(0) ack 1031880221 win 8733
aida.poly.edu -> mng.poly.edu: . ack 172488735 win 17484
aida.poly.edu -> mng.poly.edu: F 1031880221:1031880221(0) ack 172488735 win 17520
mng.poly.edu -> aida.poly.edu: . ack 1031880222 win 8733

Page 54

Connection termination with tcpdump

[Figure: aida.poly.edu and mng.poly.edu; aida issues a "telnet mng"]

1 mng.poly.edu.telnet > aida.poly.edu.1121: F 172488734:172488734(0) ack 1031880221 win 8733
2 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488735 win 17484
3 aida.poly.edu.1121 > mng.poly.edu.telnet: F 1031880221:1031880221(0) ack 172488735 win 17520
4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack 1031880222 win 8733

Page 55

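TCP States in “Normal” Connection Lifetime

[Figure: message sequence between the active opener and the passive opener, annotated with the state each side enters]

1. Passive opener: LISTEN (passive open)
2. Active opener: SYN_SENT (active open), sends SYN (SeqNo = x)
3. Passive opener: SYN_RCVD, sends SYN (SeqNo = y, AckNo = x + 1)
4. Active opener: ESTABLISHED, sends (AckNo = y + 1); passive opener: ESTABLISHED
5. Active opener: FIN_WAIT_1 (active close), sends FIN (SeqNo = m)
6. Passive opener: CLOSE_WAIT (passive close), sends (AckNo = m + 1); active opener: FIN_WAIT_2
7. Passive opener: LAST_ACK, sends FIN (SeqNo = n)
8. Active opener: TIME_WAIT, sends (AckNo = n + 1); passive opener: CLOSED
9. Active opener: CLOSED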

Page 56

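TCP State Transition Diagram: Opening a Connection

• CLOSED -> LISTEN: passive open; send: ./.
• CLOSED -> SYN SENT: active open; send: SYN
• LISTEN -> SYN RCVD: recv: SYN; send: SYN, ACK
• LISTEN -> SYN SENT: application sends data; send: SYN
• SYN SENT -> ESTABLISHED: recv: SYN, ACK; send: ACK
• SYN SENT -> SYN RCVD: simultaneous open; recv: SYN; send: SYN, ACK
• SYN RCVD -> ESTABLISHED: recvd: ACK; send: ./.
• SYN RCVD -> LISTEN: recv: RST
• SYN SENT -> CLOSED: close or timeout
• Other labels in the figure: “recvd: FIN / send: FIN” and “send: FIN”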

Page 57

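TCP State Transition Diagram: Closing a Connection

• ESTABLISHED -> FIN_WAIT_1: active close (issue close()); send: FIN
• FIN_WAIT_1 -> FIN_WAIT_2: recv: ACK; send: ./.
• FIN_WAIT_1 -> CLOSING: recv: FIN; send: ACK
• FIN_WAIT_1 -> TIME_WAIT: recv: FIN, ACK; send: ACK
• FIN_WAIT_2 -> TIME_WAIT: recv: FIN; send: ACK
• CLOSING -> TIME_WAIT: recvd: ACK; send: ./.
• TIME_WAIT -> CLOSED: timeout (2 MSL)
• ESTABLISHED -> CLOSE_WAIT: passive close; recv: FIN; send: ACK
• CLOSE_WAIT -> LAST_ACK: application closes; send: FIN
• LAST_ACK -> CLOSED: recv: ACK; send: ./.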
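The transitions on the two state-diagram slides can be encoded as a lookup table and replayed, which reproduces the "normal" connection lifetime for both ends. A minimal sketch covering only the transitions listed on the slides (not every corner case of a real TCP); the event names such as `"recv_SYN+ACK"` and the function `run` are hypothetical.

```python
# (state, event) -> (next_state, segment_to_send); None means send nothing.
TRANSITIONS = {
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv_SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "recv_SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_SENT", "recv_SYN"): ("SYN_RCVD", "SYN+ACK"),     # simultaneous open
    ("SYN_RCVD", "recv_ACK"): ("ESTABLISHED", None),
    ("SYN_RCVD", "recv_RST"): ("LISTEN", None),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),       # active close
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),    # passive close
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv_FIN"): ("CLOSING", "ACK"),
    ("FIN_WAIT_1", "recv_FIN+ACK"): ("TIME_WAIT", "ACK"),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "recv_ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),
}


def run(events, state="CLOSED"):
    """Apply events in order; return (new_state, segment_sent) after each one."""
    history = []
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event}")
        state, sent = TRANSITIONS[(state, event)]
        history.append((state, sent))
    return history
```

Replaying the client's events (active open, SYN+ACK, close, ACK, FIN, 2MSL timeout) walks SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED; the server's side walks LISTEN through LAST_ACK to CLOSED.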

Page 58

2MSL Wait State
• 2MSL Wait State = TIME_WAIT
• When TCP does an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime.
  2MSL = 2 * Maximum Segment Lifetime
• Why? TCP is given a chance to resend the final ACK. (The server will time out after sending the FIN segment and resend the FIN.)
• The MSL is set to 2 minutes, 1 minute, or 30 seconds, depending on the implementation.

Page 59

Rules for sending Acknowledgments

TCP has rules that influence the transmission of acknowledgments.
• Rule 1: Delayed Acknowledgments
  • Goal: Avoid sending ACK segments that do not carry data
  • Implementation: Delay the transmission of (some) ACKs
• Rule 2: Nagle’s rule
  • Goal: Reduce transmission of small segments
  • Implementation: A sender cannot have more than one unacknowledged small segment (e.g., one with a 1-byte payload) outstanding; it must wait for an ACK before sending the next one

Page 60

Delayed Acknowledgement
• TCP delays transmission of ACKs for up to 200ms
• Goal: Avoid sending ACK packets that do not carry data
  • The hope is that, within the delay, the receiver will have data ready to be sent back to the sender. Then, the ACK can be piggybacked with a data segment.
• In the Telnet example:
  • Delayed ACK explains why the “ACK of character” and the “echo of character” are sent in the same segment
  • The duration of delayed ACKs can be observed when Argon sends ACKs
• Exceptions:
  • An ACK should be sent for every second full-sized segment
  • Delayed ACK is not used when packets arrive out of order

Page 61

Observing Delayed Acknowledgements

• Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets each character and sends the resulting output back to the client.
• For each character typed, you see three packets:
  1. Client -> Server: send typed character
  2. Server -> Client: echo of character (or user output) and acknowledgement for the first packet
  3. Client -> Server: acknowledgement for the second packet

[Figure: host with Telnet client and host with Telnet server: 1. send character, 2. interpret character, 3. send echo of character and/or output]

Page 62

Observing Delayed Acknowledgements

Telnet session from Argon to Neon

This is the output of typing 3 (three) characters:

Time 44.062449: Argon -> Neon: Push, SeqNo 0:1(1), AckNo 1
Time 44.063317: Neon -> Argon: Push, SeqNo 1:2(1), AckNo 1
Time 44.182705: Argon -> Neon: No Data, AckNo 2

Time 48.946471: Argon -> Neon: Push, SeqNo 1:2(1), AckNo 2
Time 48.947326: Neon -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 48.982786: Argon -> Neon: No Data, AckNo 3

Time 55.116581: Argon -> Neon: Push, SeqNo 2:3(1), AckNo 3
Time 55.117497: Neon -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 55.183694: Argon -> Neon: No Data, AckNo 4

Page 63

Why 3 segments per character?
• We would expect four segments per character:
  character; ACK of character; echo of character; ACK of echoed character
• But we only see three segments per character:
  character; ACK and echo of character; ACK of echoed character
• This is due to delayed acknowledgements

Page 64

Observing Nagle’s Rule

[Figure: argon.cs.virginia.edu and tenet.cs.berkeley.edu, 3000 miles apart]

Telnet session between argon.cs.virginia.edu and tenet.cs.berkeley.edu

This is the output of typing 7 characters:

Time 16.401963: Argon -> Tenet: Push, SeqNo 1:2(1), AckNo 2
Time 16.481929: Tenet -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 16.482154: Argon -> Tenet: Push, SeqNo 2:3(1), AckNo 3
Time 16.559447: Tenet -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 16.559684: Argon -> Tenet: Push, SeqNo 3:4(1), AckNo 4
Time 16.640508: Tenet -> Argon: Push, SeqNo 4:5(1), AckNo 4
Time 16.640761: Argon -> Tenet: Push, SeqNo 4:8(4), AckNo 5
Time 16.728402: Tenet -> Argon: Push, SeqNo 5:9(4), AckNo 8

Page 65

Observing Nagle’s Rule
• Observation: transmission of segments follows a different pattern, i.e., there are only two segments per character typed
• Delayed acknowledgment does not kick in at Argon
• The reason is that there is always data at Argon ready to be sent when the ACK arrives
• Why is Argon not sending the data (typed character) as soon as it is available?

[Figure: segments sent by Argon: char1, ACK + char2, ACK + char3, ACK + char4-7]
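The coalescing seen in the Argon-Tenet trace (several typed characters leaving in one segment, such as SeqNo 4:8(4)) follows from Nagle's rule: while one small segment is unacknowledged, new bytes are buffered and sent together when its ACK returns. A minimal simulation sketch under simplifying assumptions: each keystroke is one byte, every segment is ACKed exactly one RTT after it is sent, and delayed ACKs and losses are ignored. `simulate_nagle` is a hypothetical name.

```python
def simulate_nagle(keystrokes, rtt):
    """Return (send_time, nbytes) for each segment a Nagle-style sender transmits.

    keystrokes: sorted times at which the user types one byte.
    Assumes every segment is ACKed exactly rtt after it is sent.
    """
    sent = []
    buffered = 0      # typed bytes waiting because a small segment is in flight
    ack_time = None   # when the in-flight segment's ACK arrives; None = nothing in flight
    for t in keystrokes:
        # Process ACKs that arrive before this keystroke.
        while ack_time is not None and ack_time <= t:
            if buffered:
                sent.append((ack_time, buffered))   # flush everything accumulated
                buffered = 0
                ack_time += rtt
            else:
                ack_time = None
        if ack_time is None:
            sent.append((t, 1))   # nothing outstanding: send right away
            ack_time = t + rtt
        else:
            buffered += 1         # wait for the outstanding ACK
    if buffered:
        sent.append((ack_time, buffered))
    return sent
```

Fast typing over a long-RTT path (as between Virginia and Berkeley) produces one 1-byte segment followed by a multi-byte one; slow typing over a short RTT sends every character on its own, as in the Argon-Neon trace.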

Page 66

Resetting Connections
• Resetting connections is done by setting the RST flag
• When is the RST flag set?
  • A connection request arrives and no server process is waiting on the destination port
  • To abort (terminate) a connection. This causes the receiver to throw away buffered data. The receiver does not acknowledge the RST segment.

Page 67

TCP Congestion Control
• TCP has a mechanism for congestion control. The mechanism is implemented at the sender.
• The window size at the sender is set as follows:
  Send Window = MIN (flow control window, congestion window)
  where
  • flow control window is advertised by the receiver
  • congestion window is adjusted based on feedback from the network

Page 68

TCP Congestion Control
• TCP congestion control is governed by two parameters:
  • Congestion Window (cwnd)
  • Slow-start threshold value (ssthresh); initial value is 2^16 - 1
• Congestion control works in two modes:
  • slow start (cwnd < ssthresh)
  • congestion avoidance (cwnd ≥ ssthresh)

Page 69

Slow Start
• Initial value: set cwnd = 1
  Note: The unit is a segment size. TCP actually is based on bytes and increments by 1 MSS (maximum segment size).
• The receiver sends an acknowledgement (ACK) for each segment
  Note: Generally, a TCP receiver sends an ACK for every other segment.
• Each time an ACK is received by the sender, the congestion window is increased by 1 segment: cwnd = cwnd + 1
  • If an ACK acknowledges two segments, cwnd is still increased by only 1 segment.
  • Even if an ACK acknowledges a segment that is smaller than MSS bytes long, cwnd is increased by 1.
• Does slow start increment slowly? Not really. In fact, the increase of cwnd is exponential: with one ACK per segment, cwnd doubles every round-trip time.

Page 70

Slow Start Example
• The congestion window size grows very rapidly
• For every ACK, we increase cwnd by 1 irrespective of the number of segments ACK’ed
• TCP slows down the increase of cwnd when cwnd > ssthresh

[Figure: successive rounds of segments and ACKs with cwnd = 1, cwnd = 2, cwnd = 4, cwnd = 7]

Page 71

Congestion Avoidance
• The congestion avoidance phase is started if cwnd has reached the slow-start threshold value
• If cwnd ≥ ssthresh, then each time an ACK is received, increment cwnd as follows:
  cwnd = cwnd + 1/cwnd
• So cwnd is increased by one segment only after all cwnd segments of a window have been acknowledged, i.e., roughly once per round-trip time.

Page 72

Example of Slow Start/Congestion Avoidance

Assume that ssthresh = 8

[Figure: cwnd (in segments) versus roundtrip times (t=0 to t=6); cwnd = 1, 2, 4, 8, 9, 10 in successive round trips, with a horizontal line marking ssthresh]
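The example can be reproduced with a per-ACK simulation. Assumptions of this sketch: one ACK per segment, no losses, and the 1/cwnd increment of congestion avoidance approximated by an integer counter that adds one segment after cwnd ACKs (the "increase by one once all cwnd segments are acknowledged" reading). `cwnd_trace` is a hypothetical name.

```python
def cwnd_trace(ssthresh: int, rtts: int) -> list:
    """cwnd (in segments) at the start of each round trip, starting from 1."""
    cwnd = 1
    acked_in_ca = 0                   # ACKs counted toward the next +1 in congestion avoidance
    trace = [cwnd]
    for _ in range(rtts):
        for _ack in range(cwnd):      # one ACK per segment sent in this round trip
            if cwnd < ssthresh:
                cwnd += 1             # slow start: +1 segment per ACK
            else:
                acked_in_ca += 1      # congestion avoidance: +1 per cwnd ACKs
                if acked_in_ca >= cwnd:
                    cwnd += 1
                    acked_in_ca = 0
        trace.append(cwnd)
    return trace
```

With ssthresh = 8, five round trips give 1, 2, 4, 8, 9, 10: doubling up to the threshold, then one segment per round trip.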

Page 73

Responses to Congestion
• TCP assumes there is congestion if it detects a packet loss
• A TCP sender can detect lost packets via:
  • Timeout of a retransmission timer
  • Receipt of a duplicate ACK
• TCP interprets a timeout as a binary congestion signal. When a timeout occurs, the sender performs:
  • ssthresh is set to half the current size of the congestion window: ssthresh = cwnd / 2
  • cwnd is reset to one: cwnd = 1
  • and slow start is entered

Page 74

Fast Retransmit
• If three or more duplicate ACKs are received in a row, the TCP sender believes that a segment has been lost.
• Then TCP performs a retransmission of what seems to be the missing segment, without waiting for a timeout to happen.
• Enter slow start:
  ssthresh = cwnd/2
  cwnd = 1

[Figure: the sender receives a 1st, 2nd, and 3rd duplicate ACK]

Page 75

Fast Recovery
• Fast recovery avoids slow start after a fast retransmit
• Intuition: duplicate ACKs indicate that data is getting through
• After three duplicate ACKs, set:
  • Retransmit the packet that is presumed lost
  • ssthresh = cwnd/2
  • cwnd = cwnd+3 (note the order of operations)
  • Increment cwnd by one for each additional duplicate ACK
• When an ACK arrives that acknowledges “new data” (here: AckNo=6148), set:
  cwnd = ssthresh
  enter congestion avoidance

[Figure: the sender transmits 1K segments with SeqNo=0, 1024, 2048, 3072, 4096, and 5120; the receiver replies with AckNo=1024 followed by a 1st, 2nd, and 3rd duplicate AckNo=1024; the sender (cwnd=12, ssthresh=5) retransmits SeqNo=1024 and sets cwnd=15, ssthresh=6; the ACK for new data (AckNo=6148) sets cwnd=6, ssthresh=6]
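The congestion responses in this part of the lecture (timeout, fast retransmit on the third duplicate ACK, fast recovery until new data is acknowledged) can be collected into a small sender model that replays the figure's numbers. This is a minimal sketch in segments, not a full TCP: it follows the slide's order of operations, halving first and then setting cwnd = cwnd + 3, which yields cwnd = 15 from cwnd = 12. RFC 2581 instead sets cwnd = ssthresh + 3 (which would give 9 here), and it also floors ssthresh at two segments; both differences are noted in comments. `RenoSender` is a hypothetical name.

```python
class RenoSender:
    """Congestion state (in segments) following the steps on the slides."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dupacks = 0
        self.in_fast_recovery = False
        self.retransmissions = 0

    def on_duplicate_ack(self) -> None:
        self.dupacks += 1
        if self.in_fast_recovery:
            self.cwnd += 1                   # each extra duplicate ACK inflates cwnd
        elif self.dupacks == 3:
            self.retransmissions += 1        # fast retransmit of the presumed-lost segment
            self.ssthresh = self.cwnd // 2   # halve first ...
            self.cwnd = self.cwnd + 3        # ... then inflate (slide; RFC 2581: ssthresh + 3)
            self.in_fast_recovery = True

    def on_new_ack(self) -> None:
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh        # deflate and continue in congestion avoidance
            self.in_fast_recovery = False
        self.dupacks = 0

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd // 2       # computed from the old cwnd (RFC: at least 2)
        self.cwnd = 1                        # back to slow start
        self.dupacks = 0
        self.in_fast_recovery = False
```

Starting from cwnd=12, ssthresh=5, three duplicate ACKs give cwnd=15, ssthresh=6, and the ACK for new data brings cwnd down to 6, matching the figure.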

Page 76

Flavors of TCP Congestion Control
• TCP Tahoe (1988, 4.3BSD Tahoe): Slow Start, Congestion Avoidance, Fast Retransmit
• TCP Reno (1990, 4.3BSD Reno): Fast Recovery
• New Reno (1996)
• SACK (1996)
• RED (Floyd and Jacobson 1993)

Page 77

SACK
• SACK = Selective acknowledgment
• Issue: Reno and New Reno retransmit at most 1 lost packet per round-trip time
• Selective acknowledgments: the receiver can acknowledge non-contiguous blocks of data (SACK 0-1023, 1024-2047)
• Multiple blocks can be sent in a single segment.
• TCP SACK:
  • Enters fast recovery upon 3 duplicate ACKs
  • Sender keeps track of SACKs and infers if segments are lost
  • Sender retransmits the next segment from the list of segments that are deemed lost
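One way a SACK sender can infer what is missing is to compare the cumulative ACK with the SACKed blocks: any gap below the highest SACKed byte is a candidate for retransmission. A minimal sketch under a simplifying assumption: every such gap is treated as lost immediately, whereas RFC 6675 applies a stricter, duplicate-threshold-based rule. Blocks are (left edge, right edge) pairs with the right edge exclusive, as in the SACK option of RFC 2018; `sack_holes` is a hypothetical name.

```python
def sack_holes(cum_ack: int, sack_blocks: list) -> list:
    """Byte ranges [start, end) that are neither cumulatively ACKed nor SACKed.

    Only gaps below the highest SACKed byte are reported; bytes beyond it
    may simply still be in flight.
    """
    holes = []
    next_needed = cum_ack
    for left, right in sorted(sack_blocks):
        if left > next_needed:
            holes.append((next_needed, left))   # gap before this SACKed block
        next_needed = max(next_needed, right)
    return holes
```

For example, with AckNo=1024 and SACK blocks covering 2048-4095 and 5120-6143, the sender would retransmit bytes 1024-2047 and 4096-5119 without waiting for further round trips.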

Page 78

TCP in Linux
• Congestion control algorithm is pluggable: /proc/sys/net/ipv4/tcp_congestion_control
• TCP read and write buffer sizes: /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/ipv4/tcp_wmem

Page 79

Midterm questions
• ARP, ICMP, UDP, TCP, RIP, OSPF, BGP
• Compare and contrast design principles in protocols
• Fragmentation