For FAST Meeting, July 2: Towards Gigabit
David Wei, Netlab@Caltech

Dec 21, 2015

Potential Problems

- Hardware / driver / OS
- Protocol stack overhead
- Scalability of the protocol specifications
- TCP stability / utilization (new congestion control algorithm)
- Related experiments and measurements

Hardware / Drivers / OS

- NIC driver
- Device management (interrupts)
- Redundant copies
- Device polling (http://info.iet.unipi.it/~luigi/polling/)
- Zero-copy TCP
- …

www.cs.duke.edu/ari/publications/talks/freebsdcon

Device Polling

Current process for the NIC driver in FreeBSD:
1. A packet arrives at the NIC.
2. The NIC raises a hardware interrupt.
3. The CPU jumps to the interrupt handler for that NIC.
4. The MAC-layer process reads the data from the NIC into a queue.
5. Upper layers process the data in the queue (at lower priority).

Drawbacks:
- The CPU services the NIC for every packet, which causes context switching.
- High-speed devices interrupt very frequently.
- Live-lock: the CPU is so busy handling NIC interrupts that it never gets to process the data in the queue.

Device Polling

Device polling:
- Polling: the CPU checks the device when it has time.
- Scheduling: the user specifies the ratio of CPU time spent on devices versus non-device processing.

Advantages:
- Balances device service against non-device processing.
- Improves performance with fast devices.
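
The live-lock argument can be made concrete with a toy one-CPU budget model. This is a sketch under our own assumptions, not a measurement from the talk: every received packet costs a fixed `rx_cost` seconds of driver work and `proto_cost` seconds of upper-layer work, and the polling ratio caps the share of each second given to the device. The function and parameter names are hypothetical.

```python
def delivered_pps(arrival_pps: float, rx_cost: float, proto_cost: float,
                  device_share: float = 1.0) -> float:
    """Packets/s that finish upper-layer processing in a one-CPU toy model.

    device_share=1.0 approximates interrupt-driven receive (the NIC always
    preempts the upper layers); device_share<1.0 approximates polling, where
    a user-chosen ratio caps the CPU fraction spent servicing the device.
    """
    # Driver work: take packets off the NIC within the device budget.
    received = min(arrival_pps, device_share / rx_cost)
    cpu_left = max(0.0, 1.0 - received * rx_cost)
    # Upper layers run only in the CPU time the driver left over.
    return min(received, cpu_left / proto_cost)


if __name__ == "__main__":
    rx_cost, proto_cost = 5e-6, 10e-6   # assumed seconds of CPU per packet
    for rate in (20_000, 60_000, 300_000):
        interrupt = delivered_pps(rate, rx_cost, proto_cost)
        polling = delivered_pps(rate, rx_cost, proto_cost, device_share=0.3)
        print(f"{rate:>7} pkt/s offered: interrupts {interrupt:8.0f}, polling {polling:8.0f}")
```

With these assumed costs, interrupt-driven receive delivers almost nothing at 300,000 packets/s (the live-lock case), while capping the device at 30% of the CPU still delivers about 60,000 packets/s.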

Protocol Stack Overhead

Per-packet overhead:
- Ethernet header / checksum
- IP header / checksum
- TCP header / checksum
- Copying / interrupt processing

Solution: increase the packet size. Optimal packet size = min{MTU of each link along the path} (fragmentation also results in low performance).
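
To see why larger packets help, the sketch below computes packets per second and payload efficiency on a saturated 1 Gbps link for several MTUs. It assumes 40 bytes of TCP/IP headers without options and 38 bytes of Ethernet framing per packet (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap); these constants are our assumptions, not figures from the slides.

```python
TCP_IP_HEADERS = 40                  # 20-byte IP + 20-byte TCP header, no options
ETHERNET_FRAMING = 14 + 4 + 8 + 12   # header, FCS, preamble, inter-frame gap


def per_packet_stats(mtu: int, link_bps: float = 1e9) -> tuple[float, float]:
    """Return (packets per second, payload fraction) for a saturated link."""
    wire_bytes = mtu + ETHERNET_FRAMING
    pps = link_bps / (8 * wire_bytes)
    payload_fraction = (mtu - TCP_IP_HEADERS) / wire_bytes
    return pps, payload_fraction


if __name__ == "__main__":
    for mtu in (576, 1500, 9000):
        pps, eff = per_packet_stats(mtu)
        print(f"MTU {mtu:5d}: {pps:9.0f} packets/s, {eff:.1%} payload")
```

Going from a 1500-byte to a 9000-byte MTU cuts the number of packets, and so the per-packet header, checksum and interrupt work, by roughly a factor of six at the same bit rate.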

Path MTU Discovery (RFC 1191)

Current method:
- "Don't Fragment" bit (router: drop or fragment; host: test or enforce)
- MTU = min{576, first-hop MTU}
- MSS = MTU - 40
- MTU <= 65535 (architectural limit)
- MSS <= 65495 (IP sign-bit bugs…)
- Drawback: usually too small

Path MTU Discovery

How to discover the PMTU?

Current:
- Search (proportional decrease / binary search)
- Update (periodically increase: reset to the MTU of the first hop)

Proposed:
- Search/update using typical MTU values
- Routers suggest an MTU in the "Datagram Too Big" (DTB) message that reports the drop of a DF packet
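
The proposed search over typical MTU values can be sketched as a plateau table: when a DF packet is reported too big, the sender drops to the next plateau below the size that failed, unless the router's DTB message carries a next-hop MTU, in which case that value is used directly. The plateau list follows the table in RFC 1191; the function names and signatures are illustrative.

```python
from typing import Optional

# Plateau values from the MTU table in RFC 1191, largest first.
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)
MIN_MTU = 68  # IPv4 minimum MTU


def next_pmtu_estimate(failed_size: int, next_hop_mtu: Optional[int] = None) -> int:
    """New PMTU estimate after a 'Datagram Too Big' report for failed_size."""
    if next_hop_mtu is not None and MIN_MTU <= next_hop_mtu < failed_size:
        # The router suggested an MTU: use it instead of searching.
        return next_hop_mtu
    for plateau in PLATEAUS:
        if plateau < failed_size:
            return plateau
    return MIN_MTU


def mss_for(pmtu: int) -> int:
    """MSS = MTU - 40 (20-byte IP header + 20-byte TCP header)."""
    return pmtu - 40
```

Without router help, a failed 1500-byte packet falls to 1492 and then to 1006, so each step lands on a commonly deployed MTU instead of halving blindly.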

Path MTU Discovery: Implementation

Host:
- Packetization layer (TCP / connection over UDP): sets DF and the packet size
- IP: stores the PMTU for each known path (routing table)
- ICMP: "Datagram Too Big" message

Router:
- Sends an ICMP packet when a datagram is too big.

Implementation problems: RFC 2923
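
The host-side bookkeeping (IP keeping a PMTU per known path, lowered by ICMP "Datagram Too Big" messages and periodically reset to the first-hop MTU, as in the "Update" step above) might look like the following. The class, its methods, and the 600-second aging interval are assumptions for illustration; the injectable clock only makes the behavior testable.

```python
import time
from typing import Callable, Dict, Tuple


class PmtuCache:
    """Per-destination PMTU estimates, like the per-route PMTU kept by the IP layer."""

    def __init__(self, first_hop_mtu: int, aging_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.aging_s = aging_s     # assumed wait before probing larger sizes again
        self.clock = clock
        # destination -> (PMTU estimate, time it was last lowered)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def get(self, dest: str) -> int:
        """Current PMTU estimate for dest (first-hop MTU if nothing is known)."""
        entry = self._entries.get(dest)
        if entry is None:
            return self.first_hop_mtu
        pmtu, lowered_at = entry
        if self.clock() - lowered_at >= self.aging_s:
            # Periodic increase: drop the estimate and retry with the first-hop MTU.
            del self._entries[dest]
            return self.first_hop_mtu
        return pmtu

    def on_datagram_too_big(self, dest: str, reported_mtu: int) -> None:
        """Handle an ICMP 'Datagram Too Big' carrying the next-hop MTU."""
        if reported_mtu < self.get(dest):   # only ever lower the estimate
            self._entries[dest] = (reported_mtu, self.clock())
```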

Scalability of Protocol Specifications

- Window size space (<= 64 KB)
- Sequence number space (wraps around, <= 2 GB)
- Inadequate frequency of RTT sampling (1 sample per window)

[Figure: segments 1 2 3 4 5 at the sender and at the receiver.]
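
The 64 KB window ceiling is easy to quantify: a window-limited sender moves at most one window per RTT. The sketch computes that ceiling and the window a target rate would need; the 80 ms RTT in the example is an illustrative assumption.

```python
MAX_UNSCALED_WINDOW = 65535  # bytes, largest value of the 16-bit window field


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is sent per RTT."""
    return 8 * window_bytes / rtt_s


def window_needed_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain rate_bps."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    rtt = 0.080  # assumed round-trip time, seconds
    ceiling = window_limited_bps(MAX_UNSCALED_WINDOW, rtt)
    print(f"64 KB window at 80 ms: {ceiling / 1e6:.1f} Mbps")
    print(f"1 Gbps at 80 ms needs {window_needed_bytes(1e9, rtt) / 1e6:.0f} MB in flight")
```

At 80 ms the unscaled window caps a connection at roughly 6.6 Mbps, while 1 Gbps needs about 10 MB in flight.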

Sequence Number Space

[Figure, built up over several slides: sender and receiver timelines. Segments 1 2 3 4 5 are sent, and ACKs 1 and 6 return; transmission continues with 6, 7, … until the sequence numbers wrap to 0 (ACKs 7 and 6). A late ACK 1 then arrives, and its meaning is ambiguous ("??"). Annotation: accept when the delay <= Maximum Segment Life.]

- MSL (Maximum Segment Life) > variance of IP delay
- MSL < sequence number space / bandwidth

Sequence Number Space

- MSL (Maximum Segment Life) > variance in IP delay
- MSL < 8 × |sequence number space| / bandwidth
  - |SN space| = 2^31 bytes = 2 GB
  - Bandwidth = 1 Gbps
  - So MSL <= 16 sec, and the variance of IP delay must be <= 16 sec
- Current TCP: 3 min.
- Not scalable with bandwidth growth
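
The bound above is the time the sender needs to cycle through the sequence space; any segment older than that is ambiguous. A small sketch of the arithmetic, following the slide's |SN space| = 2^31 bytes and treating bandwidth as decimal bits per second (so 1 Gbps gives about 17 s; the slide's 16 s corresponds to a binary gigabit, 2^30 bits/s):

```python
SEQ_SPACE_BYTES = 2 ** 31  # usable sequence space, as on the slide


def wrap_time_s(bandwidth_bps: float, seq_space_bytes: int = SEQ_SPACE_BYTES) -> float:
    """Seconds until sequence numbers wrap: an upper bound on MSL."""
    return 8 * seq_space_bytes / bandwidth_bps


if __name__ == "__main__":
    for bw in (10e6, 100e6, 1e9, 10e9):
        print(f"{bw / 1e9:5.2f} Gbps -> MSL < {wrap_time_s(bw):8.1f} s")
```

Already at 100 Mbps the bound (about 172 s) is below the 3-minute figure quoted for current TCP, and at 10 Gbps it is under 2 s.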

TCP Extensions (RFC 1323)

- Window space: the 16-bit window field is scaled by a shift count S carried in the SYN: $\text{window} = \text{Win} \times 2^{S}$
- RTT measurement: a timestamp in each packet (generated by the sender, echoed by the receiver)
- PAWS (Protect Against Wrapped Sequence numbers): uses the timestamp to extend the sequence space (so the timestamp clock should be neither too fast nor too slow: 1 ms to 1 sec per tick)
- Header prediction: simplifies the processing
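
Two of these mechanisms reduce to a few lines. The sketch finds the smallest window-scale shift S that covers a given bandwidth-delay product (RFC 1323 caps S at 14), and applies a PAWS-style check that rejects a segment whose timestamp is older than the most recent one, using modulo-2^32 comparison. The function names are hypothetical; a real stack also handles timestamp aging on idle connections.

```python
MAX_WINDOW_FIELD = 0xFFFF   # 16-bit window field
MAX_SHIFT = 14              # largest shift count allowed by RFC 1323


def min_window_scale(bdp_bytes: int) -> int:
    """Smallest shift S with 65535 * 2**S >= bdp_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if (MAX_WINDOW_FIELD << shift) >= bdp_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scaled window")


def paws_accept(ts_val: int, ts_recent: int) -> bool:
    """Accept unless ts_val is 'before' ts_recent in 32-bit serial arithmetic."""
    diff = (ts_val - ts_recent) & 0xFFFFFFFF
    return diff < 2 ** 31
```

For example, 10 MB in flight (1 Gbps at an 80 ms RTT) needs S = 8, and a timestamp that has wrapped past 2^32 is still accepted as newer.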

High Speed TCP

Floyd '02. Goals:
- Achieve a large window with a realistic loss rate (make the AIMD parameters depend on the current window size)
- High speed in a single connection (10 Gbps)
- Easily achieve a high sending rate for a given loss rate. How to achieve TCP-friendliness?
- Incrementally deployable (no router support required)

High Speed TCP

Problem in steady state:
- TCP response function $w = \sqrt{1.5/p}$: a large congestion window requires a very low loss rate.

Problem in recovery:
- Congestion avoidance takes too long to recover (consecutive time-outs).

Consecutive Time-out

[Figure, built up over several slides: segment 1 times out ("Time Out: 1"); ss-threshold = cwnd/2 and slow start restarts with cwnd = 1. The retransmission R1 also times out: ss-threshold = 2, slow start with cwnd = 1. As soon as cwnd reaches 2, the sender is in congestion avoidance.]
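
The cost of the second time-out can be estimated by counting RTTs. This sketch uses a simplified per-RTT model of our own: slow start doubles cwnd up to ss-threshold, congestion avoidance adds one segment per RTT, and ss-threshold after a time-out is half the flight size but at least 2 segments. The 1000-segment starting window is an illustrative assumption.

```python
def after_timeout(flight_size: int) -> tuple[int, int]:
    """(cwnd, ssthresh) in segments after a retransmission time-out."""
    return 1, max(flight_size // 2, 2)


def rtts_to_reach(target: int, cwnd: int, ssthresh: int) -> int:
    """RTTs until cwnd >= target, one window of ACKs per RTT."""
    rtts = 0
    while cwnd < target:
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start, capped at ssthresh
        else:
            cwnd += 1                        # congestion avoidance
        rtts += 1
    return rtts


if __name__ == "__main__":
    window = 1000
    cwnd, ssthresh = after_timeout(window)   # first time-out, 1000 segments in flight
    print("one time-out:", rtts_to_reach(window, cwnd, ssthresh), "RTTs")
    cwnd, ssthresh = after_timeout(1)        # the retransmission R1 also times out
    print("two time-outs:", rtts_to_reach(window, cwnd, ssthresh), "RTTs")
```

Under these assumptions, losing the retransmission as well nearly doubles the time to get back to a 1000-segment window (999 RTTs instead of 509), about 80 s at an 80 ms RTT.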

High Speed TCP

Change the TCP response function:
- p is high (higher than the maximum P corresponding to the default cwnd size W): standard TCP.
- p is low (cwnd >= W): use a(w) and b(w) instead of the constants a and b when adjusting cwnd:
  ACK: $w \leftarrow w + a(w)/w$
  Drop: $w \leftarrow w - b(w)\,w$
- For a given loss rate $P_1$ and desired window size $W_1$ at $P_1$, derive a(w) and b(w), keeping the response function linear on a log-log scale ($\Delta \log W / \Delta \log P$ constant).
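
The per-ACK and per-drop rules can be written with a(w) and b(w) passed in as functions, so that standard TCP is just the special case a(w) = 1, b(w) = 0.5. The names below are illustrative, and windows are counted in segments.

```python
from typing import Callable

WindowFn = Callable[[float], float]


def on_ack(w: float, a: WindowFn) -> float:
    """Additive increase: w <- w + a(w)/w per ACK (about a(w) per RTT)."""
    return w + a(w) / w


def on_drop(w: float, b: WindowFn) -> float:
    """Multiplicative decrease: w <- w - b(w) * w on a loss."""
    return w - b(w) * w


def standard_a(w: float) -> float:
    return 1.0


def standard_b(w: float) -> float:
    return 0.5


if __name__ == "__main__":
    w = 100.0
    for _ in range(100):          # roughly one RTT's worth of ACKs at w ~ 100
        w = on_ack(w, standard_a)
    print(f"after one RTT of ACKs: {w:.2f}")
    print(f"after a drop: {on_drop(w, standard_b):.2f}")
```

High Speed TCP keeps the same two update rules and only swaps in window-dependent a(w) and b(w).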

Change TCP Function

Standard TCP:
$\log w = -\tfrac{1}{2}\log p + \tfrac{1}{2}\log 1.5$

[Figure, built up over several slides: log w versus log p. The standard TCP line has intercept $(\log 1.5)/2$ at $\log p = 0$; the point ($\log P$, $\log W$) is marked on it, and a second point ($\log P_1$, $\log W_1$) is added; the High Speed TCP response function joins these two points with a straight line.]
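
The two log-log lines can be evaluated directly. The standard line is $w = \sqrt{1.5/p}$; the High Speed line passes through (P, W) and (P_1, W_1). For concreteness the sketch assumes W = 38, W_1 = 83000 and P_1 = 10^-7, the values used in Floyd's HighSpeed TCP specification (RFC 3649); they are not given on these slides.

```python
import math

W = 38.0                 # assumed: window where High Speed TCP takes over
P = 1.5 / W ** 2         # loss rate at which standard TCP reaches W
W1 = 83000.0             # assumed: target window ...
P1 = 1e-7                # ... at this loss rate


def standard_window(p: float) -> float:
    """Standard TCP response function: log w = -(log p)/2 + (log 1.5)/2."""
    return math.sqrt(1.5 / p)


# Slope of the High Speed line on the log-log plot.
S = (math.log(W1) - math.log(W)) / (math.log(P1) - math.log(P))


def highspeed_window(p: float) -> float:
    """Standard TCP for p >= P; else the log-log line through (P, W) and (P1, W1)."""
    if p >= P:
        return standard_window(p)
    return W * (p / P) ** S


if __name__ == "__main__":
    print(f"slope S = {S:.3f}")
    for p in (1e-2, 1e-3, 1e-5, 1e-7):
        print(f"p={p:g}: standard {standard_window(p):9.0f}, "
              f"high speed {highspeed_window(p):9.0f}")
```

Under these assumptions the High Speed line has a slope of about -0.83 instead of -0.5, so the same low loss rate supports a much larger window.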

Expectations

- Achieve a large window with a realistic loss rate
- Relative fairness between standard TCP and High Speed TCP (acquired bandwidth ∝ cwnd)
- Moderate decrease instead of halving the window size when congestion is detected (b(w) = 0.33 at w = 1000)
- Pre-computed look-up table to implement a(w) and b(w)

[Figure: log w versus log p, with the standard TCP line $\log w = -\tfrac{1}{2}\log p + \tfrac{1}{2}\log 1.5$ and the points ($\log P$, $\log W$) and ($\log P_1$, $\log W_1$).]
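
A pre-computed look-up for a(w) and b(w) can be sketched as follows. It assumes the RFC 3649 formulas (b(w) interpolated linearly in log w from 0.5 at W down to an assumed 0.1 at W_1, and a(w) = w^2 p(w) 2b(w)/(2 - b(w)), where p(w) inverts the log-log response line) and the same assumed W = 38, W_1 = 83000, P_1 = 10^-7; the 65-row geometric table is an arbitrary choice. Under these assumptions b(1000) comes out near the 0.33 quoted above.

```python
import bisect
import math

W, W1, P1 = 38.0, 83000.0, 1e-7    # assumed parameters (RFC 3649 defaults)
P = 1.5 / W ** 2
B_AT_W1 = 0.1                      # assumed decrease factor at W1
S = (math.log(W1) - math.log(W)) / (math.log(P1) - math.log(P))


def b_of(w: float) -> float:
    """Decrease factor, linear in log w from 0.5 at W to B_AT_W1 at W1."""
    frac = (math.log(w) - math.log(W)) / (math.log(W1) - math.log(W))
    return 0.5 + (B_AT_W1 - 0.5) * frac


def a_of(w: float) -> float:
    """Per-RTT increase that keeps the log-log response line (RFC 3649 form)."""
    p = P * (w / W) ** (1.0 / S)       # loss rate at which the line yields w
    b = b_of(w)
    return w * w * p * 2.0 * b / (2.0 - b)


# Pre-computed table on a geometric grid of windows from W to W1.
TABLE_W = [W * (W1 / W) ** (i / 64) for i in range(65)]
TABLE_AB = [(a_of(w), b_of(w)) for w in TABLE_W]


def lookup(cwnd: float) -> tuple[float, float]:
    """(a, b): standard TCP below W, else the table row at or below cwnd."""
    if cwnd < W:
        return 1.0, 0.5
    i = bisect.bisect_right(TABLE_W, cwnd) - 1
    return TABLE_AB[i]
```

Per ACK, the sender then needs only a binary search in a small table instead of evaluating logarithms.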

Slow Start

Modification of slow start:
- Problem: doubling cwnd every RTT is too aggressive for a large cwnd.
- Proposal: limit Δcwnd per RTT in slow start.

[Figure: sending rate versus time t, with a loss.]

Limited Slow Start

For each ACK (cwnd and max_ssthresh counted in segments):
- cwnd <= max_ssthresh: $\Delta\mathrm{cwnd} = 1$ MSS (standard TCP slow start)
- cwnd > max_ssthresh: $\Delta\mathrm{cwnd} = 0.5\,\mathrm{max\_ssthresh}/\mathrm{cwnd}$ MSS (at most 0.5 max_ssthresh per RTT)

[Figure: rate versus time t, with the max_ssthresh level marked.]
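
The per-ACK rule translates directly into code. In this sketch cwnd and max_ssthresh are counted in segments, so standard slow start adds 1 per ACK; the function names and the 100-segment max_ssthresh in the example are illustrative.

```python
def limited_slow_start_ack(cwnd: float, max_ssthresh: float) -> float:
    """cwnd after one ACK during slow start (units: segments)."""
    if cwnd <= max_ssthresh:
        return cwnd + 1.0                        # standard slow start: +1 MSS per ACK
    return cwnd + 0.5 * max_ssthresh / cwnd      # at most max_ssthresh/2 per RTT


def one_rtt(cwnd: float, max_ssthresh: float) -> float:
    """Apply one window's worth of ACKs (one ACK per segment in flight)."""
    for _ in range(int(cwnd)):
        cwnd = limited_slow_start_ack(cwnd, max_ssthresh)
    return cwnd


if __name__ == "__main__":
    w = 1000.0
    print(f"standard slow start: {2 * w:.0f}, limited: {one_rtt(w, 100.0):.1f}")
```

Below max_ssthresh the window still doubles per RTT; at cwnd = 1000 with max_ssthresh = 100 it grows by just under 50 segments per RTT instead of 1000.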

Related Projects

- Cray Research ('92); CASA Testbed ('94)
- Duke ('99)
- Pittsburgh Supercomputing Center
- Portland State Univ. ('00)
- Internet2 ('01)
- Web100
- Net100 (built on Web100)

Cray Research '92

TCP/IP Performance at Cray Research (Dave Borman)

Configuration:
- HIPPI between two dedicated Y/MPs with Model E IOS and Unicos 8.0
- Memory-to-memory transfer

Results:
- Direct channel-to-channel: MTU 64K: 781 Mbps
- Through a HIPPI switch: MTU 33K: 416 Mbps; MTU 49K: 525 Mbps; MTU 64K: 605 Mbps

CASA Testbed '94

Applied Network Research, San Diego Supercomputer Center + UCSD
- Goal: delay and loss characteristics of a HIPPI-based gigabit testbed
- Link feature: blocking (HIPPI), a tradeoff between high loss rate and high delay
- Conclusion: avoiding packet loss is more important than reducing delay
- Performance (delay × bandwidth = 2 MB; RFC 1323 on; Cray machines): 500 Mbps sustained TCP throughput (TTCP/Netperf)

Trapeze/IP (Duke)

Goal:
- Which optimizations are most useful for reducing host overhead for fast TCP?
- How fast does TCP really go, and at what cost?

Approaches:
- Zero-copy
- Checksum offloading

Result:
- >900 Mbps for MTU > 8K

Trapeze/IP (Duke)

[Figure: zero-copy.]

www.cs.duke.edu/ari/publications/talks/freebsdcon

Enabling High Performance Data Transfers on Hosts

By the Pittsburgh Supercomputing Center:
- Enable RFC 1191 MTU discovery
- Enable RFC 1323 large windows
- OS kernel: large enough socket buffers
- Application: set its send and receive socket buffer sizes

Detailed methods to tune various OSes.
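
On the application side, "set its send and receive socket buffer sizes" amounts to two `setsockopt` calls, ideally sized to the bandwidth-delay product. A minimal sketch using Python's standard `socket` module; the 100 Mbps / 80 ms path is an assumed example. The kernel may clamp the request to its configured maximum (and Linux reports back double the requested value), so the effective size should be read back rather than assumed.

```python
import socket


def bdp_bytes(rate_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes."""
    return int(rate_bps * rtt_s / 8)


def make_tuned_socket(rate_bps: float, rtt_s: float) -> socket.socket:
    """TCP socket whose send/receive buffers are requested at the path's BDP.

    Set the buffers before connect()/listen(): the window-scale shift is
    chosen during the SYN exchange.
    """
    size = bdp_bytes(rate_bps, rtt_s)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock


if __name__ == "__main__":
    with make_tuned_socket(100e6, 0.080) as s:
        print("requested", bdp_bytes(100e6, 0.080),
              "snd", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
              "rcv", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```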

PSU Experiment

Goal:
- Round-trip delay and TCP throughput with different window sizes
- Influence of different devices (Cisco 3508/3524/5500) and different NICs

Environment:
- OS: FreeBSD 4.0/4.1 (without RFC 1323?), Linux, Solaris
- WAN: 155 Mbps OC-3 over a SONET MAN
- Measurement tools: ping + TTCP

PSU Experiment

- "Smaller" switches and low-level routers can easily muck things up.
- Bugs in Linux 2.2 kernels.
- Different NICs have different performance.
- A fast PCI bus (64 bits × 66 MHz) is necessary.
- Switch MTU size can make a difference (giant packets are better).
- Bigger TCP window sizes can help, but there seems to be a knee around 4 MB that is not remarked upon in the literature.

Internet-2 Experiment

Goal: a single TCP connection at 700-800 Mbps over the WAN; relations among window size, MTU and throughput

Back-to-back:
- OS: FreeBSD 4.3 release
- Architecture: 64-bit 66 MHz PCI + …
- Configuration: sendspace = recvspace = 1024000
- Setup: direct connection (back-to-back) and WAN
- WAN: symmetric path host1-Abilene-host2
- Measurement: ping + Iperf

Internet-2 Experiment

Back-to-back: no loss; found some bug in FreeBSD 4.3.

Back-to-back throughput (Mbps) by window size and MTU:

Window   4KB MTU   8KB MTU
512K     690       855-986
1M       658       986
2M       562       986
4M       217       987
8M       93        987
16M      86        985

WAN: <= 200 Mbps; asymmetry in different directions (cache of MTU…)

Web100

- Goal: make it easy for non-experts to achieve high bandwidth
- Method: get more information from TCP
- Software:
  - Measurement: embedded in the kernel TCP
  - Application layer: diagnostics / auto-tuning
- Proposal: RFC 2012 (MIB)

Net100

- Built on Web100
- Auto-tunes parameters for non-experts
- Network-aware OS
- Bulk file transport for ORNL
- Implementation of Floyd's High Speed TCP

Floyd's TCP Slow Start on Net100

www.csm.ornl.gov/~dunigan/net100/floyd.html

[Figure annotations: RTT 80 ms; 1 MB send window; 2 MB receive window; cwnd from Web100.]

Floyd's TCP AIMD on Net100

www.csm.ornl.gov/~dunigan/net100/floyd.html

[Figure annotations: RTT 87 ms; window 1000 segments; max_ssthresh 100 segments; slow start 1.8 sec; MD at cwnd 1000: *0.33 per timeout; AI at cwnd 700: +8 per RTT; old TCP: 45 sec recovery.]

Trend (Mathis: Oct 2001)

TCP over a long path:

Year   Wizard     Non-wizard   Ratio
1988   1 Mbps     300 kbps     3:1
1991   10 Mbps
1995   100 Mbps
1999   1 Gbps     3 Mbps       300:1

Related Tools

Measurement:
- Iperf
- tcpdump
- Web100

Emulation:
- Dummynet

NLANR Iperf

Features:
- Sends data from user space
- Supports IPv4/IPv6
- Supports TCP/UDP/multicast…
- Similar software: auto-tuning-enabled FTP client/server

Concern:
- Preemption by other processes in a gigabit test? (observed in the Internet2 experiment)

Dummynet

- Now embedded in FreeBSD
- Delay: delay in the IP layer
- Loss: random loss in the IP layer

Concerns:
- Overhead
- Pattern of packet loss
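
Dummynet itself is configured through ipfw; the Python sketch below only models what it does to packets, which makes the loss-pattern concern easy to explore: each packet is delayed by a fixed amount and dropped independently with probability `plr`. The class and its parameters are hypothetical. Independent (Bernoulli) drops are exactly the "random loss" pattern, which may not match the losses seen on a real congested path.

```python
import random
from typing import List, Optional


class DelayLossPipe:
    """Toy IP-layer pipe: constant delay plus independent random loss."""

    def __init__(self, delay_s: float, plr: float, seed: int = 0) -> None:
        if not 0.0 <= plr <= 1.0:
            raise ValueError("plr must be in [0, 1]")
        self.delay_s = delay_s
        self.plr = plr                       # packet loss rate
        self._rng = random.Random(seed)      # seeded for repeatable runs

    def transmit(self, send_times: List[float]) -> List[Optional[float]]:
        """Arrival time for each packet, or None if the pipe dropped it."""
        arrivals: List[Optional[float]] = []
        for t in send_times:
            if self._rng.random() < self.plr:
                arrivals.append(None)        # independent (Bernoulli) drop
            else:
                arrivals.append(t + self.delay_s)
        return arrivals
```

Reusing the same seed replays the same drop sequence, which helps when comparing TCP variants under identical conditions.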

Current Status in Netlab@Caltech

100 Mbps testbed in Netlab:

[Figure: two hosts connected by UTP cables through a 100M hub, plus a monitor. One host runs TCP / IP / driver; the other runs TCP / IP + dummynet / driver.]

Next Step

1 Gbps testbed in the lab:

[Figure: two hosts connected through splitters, plus a monitor. One host runs TCP / IP / driver; the other runs TCP / IP + dummynet / driver.]

Q&A