Page 1:

TCP/IP and Other Transports for High Bandwidth Applications
TCP/IP on High Performance Networks
Richard Hughes-Jones, University of Manchester
Summer School, Brasov, Romania, July 2005

Page 2:

The Bandwidth Challenge at SC2003

The peak aggregate bandwidth from the 3 booths was 23.21 Gbit/s
One-way link utilisations of >90%
6.6 TBytes in 48 minutes

Page 3:

Multi-Gigabit flows at SC2003 BW Challenge

Three server systems with 10 Gigabit Ethernet NICs
Used the DataTAG altAIMD stack, 9000 byte MTU
Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to:

Palo Alto PAIX – rtt 17 ms, window 30 MB; shared with the Caltech booth
4.37 Gbit/s HighSpeed TCP, I = 5%; then 2.87 Gbit/s, I = 16% – falls when 10 Gbit/s is on the link
3.3 Gbit/s Scalable TCP, I = 8%; tested 2 flows, sum 1.9 Gbit/s, I = 39%

Chicago Starlight – rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz
3.1 Gbit/s HighSpeed TCP, I = 1.6%

Amsterdam SARA – rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz
4.35 Gbit/s HighSpeed TCP, I = 6.9% – very stable
Both used Abilene to Chicago
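The window sizes quoted above are of the same order as the bandwidth-delay product of each path. As a rough cross-check (my own illustration, not part of the original slides; the 10 Gbit/s target rate is taken from the NICs above), a short sketch:

```python
# Bandwidth-delay product: the TCP window needed to keep a path full.
# Illustrative sketch only; the 10 Gbit/s rate and the rtt values come
# from the slide, everything else is an assumption.

def window_bytes(rate_bit_s: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return rate_bit_s * rtt_s / 8.0

if __name__ == "__main__":
    rate = 10e9  # 10 Gbit/s path, as in the SC2003 tests
    for name, rtt_ms in [("Palo Alto PAIX", 17), ("Chicago Starlight", 65),
                         ("Amsterdam SARA", 175)]:
        bdp = window_bytes(rate, rtt_ms / 1000.0)
        print(f"{name:18s} rtt {rtt_ms:3d} ms -> BDP ~ {bdp / 2**20:6.1f} MBytes")
```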

[Figure: 10 Gbit/s throughput from SC2003 to PAIX vs date & time (19 Nov 2003, 15:59–17:25); y-axis throughput 0–10 Gbit/s; traces: Router to LA/PAIX, Phoenix–PAIX HS-TCP, Phoenix–PAIX Scalable-TCP, Phoenix–PAIX Scalable-TCP #2]

[Figure: 10 Gbit/s throughput from SC2003 to Chicago & Amsterdam vs date & time (19 Nov 2003, 15:59–17:25); y-axis throughput 0–10 Gbit/s; traces: Router traffic to Abilene, Phoenix–Chicago, Phoenix–Amsterdam]

Page 4:

Collaboration at SC2004

SCinet: setting up the BW bunker

The BW Challenge at the SLAC Booth

Working with S2io, Sun, Chelsio

Page 5:

The Bandwidth Challenge – SC2004

The peak aggregate bandwidth from the booths was 101.13 Gbit/s – that is 3 full-length DVDs per second!
4 times greater than SC2003!
Saturated ten 10 Gigabit Ethernet waves
SLAC booth: Sunnyvale to Pittsburgh, LA to Pittsburgh, and Chicago to Pittsburgh (with UKLight)

Page 6:

Just a Well Engineered End-to-End Connection

End-to-End “no loss” environment

NO contention, NO sharing on the end-to-end path

Processor speed and system bus characteristics

TCP Configuration – window size and frame size (MTU)

Tuned PCI-X bus

Tuned Network Interface Card driver

A single TCP connection on the end-to-end path

Memory-to-Memory transfer

no disk system involved

No real user application (but did file transfers!!)

Not a typical User or Campus situation BUT …

So what’s the matter with TCP – Did we cheat?

[Diagram: a typical end-to-end path – Client → Campus → Regional → Internet → Regional → Campus → Server – compared with the same Client and Server connected via UKLight. From Robin Tasker]

Page 7:

TCP (Reno) – What's the problem?

TCP has 2 phases:
Slowstart – probe the network to estimate the available bandwidth; exponential growth of the window
Congestion Avoidance – the main data transfer phase; the transfer rate grows "slowly"

AIMD and high bandwidth, long distance networks
Poor performance of TCP in high bandwidth wide area networks is due in part to the TCP congestion control algorithm:
For each ACK in an RTT without loss: cwnd → cwnd + a/cwnd (Additive Increase, a = 1)
For each window experiencing loss: cwnd → cwnd − b·cwnd (Multiplicative Decrease, b = ½)

Packet loss is a killer!!
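To make the AIMD rule above concrete, here is a minimal per-RTT simulation of Reno congestion avoidance (an illustrative sketch, not code from the talk; the starting window and loss interval are arbitrary assumptions):

```python
# Minimal AIMD (TCP Reno congestion avoidance) sketch: cwnd in packets,
# a = 1 packet added per RTT, cwnd halved on a loss event (b = 1/2).
# Purely illustrative; the parameters are assumptions, not measurements.

def simulate_reno(rtts, cwnd=10.0, loss_every=500):
    history = []
    for t in range(rtts):
        if t and t % loss_every == 0:
            cwnd *= 0.5            # multiplicative decrease, b = 1/2
        else:
            cwnd += 1.0            # additive increase, a = 1 packet per RTT
        history.append(cwnd)
    return history

if __name__ == "__main__":
    trace = simulate_reno(rtts=2000)
    print(f"cwnd after 2000 RTTs: {trace[-1]:.0f} packets (peak {max(trace):.0f})")
```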

Page 8:

TCP (Reno) – Details

Time for TCP to recover its throughput from 1 lost packet is given by:

    τ = C · RTT² / (2 · MSS)

where C is the link capacity. For an rtt of ~200 ms: ~2 min.

[Figure: time to recover (seconds, log scale from 0.0001 to 100,000) vs rtt (0–200 ms) for 10 Mbit, 100 Mbit, 1 Gbit, 2.5 Gbit and 10 Gbit links; typical rtts marked: UK 6 ms, Europe 20 ms, USA 150 ms]
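A quick numerical check of the recovery-time formula above (my own sketch; the 1460-byte MSS is an assumption) shows why the plot spans seconds for slow links but hours at 10 Gbit/s:

```python
# Time for standard TCP to recover full throughput after a single loss:
#   tau = C * rtt^2 / (2 * MSS), with C in bit/s, rtt in s, MSS in bits.
# Illustrative only; the MSS of 1460 bytes is an assumption.

MSS_BITS = 1460 * 8

def recovery_time_s(capacity_bit_s: float, rtt_s: float) -> float:
    return capacity_bit_s * rtt_s ** 2 / (2 * MSS_BITS)

if __name__ == "__main__":
    rtt = 0.200  # ~200 ms, as on the slide
    for label, c in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6),
                     ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
        t = recovery_time_s(c, rtt)
        print(f"{label:>10s}: {t:10.1f} s  (~{t / 60:.1f} min)")
```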

Page 9:

Investigation of new TCP Stacks

The AIMD algorithm – standard TCP (Reno):
For each ACK in an RTT without loss: cwnd → cwnd + a/cwnd (Additive Increase, a = 1)
For each window experiencing loss: cwnd → cwnd − b·cwnd (Multiplicative Decrease, b = ½)

High Speed TCP
a and b vary depending on the current cwnd, using a table
a increases more rapidly with larger cwnd – returns to the 'optimal' cwnd size sooner for the network path
b decreases less aggressively and, as a consequence, so does the cwnd; the effect is that there is not such a decrease in throughput

Scalable TCP
a and b are fixed adjustments for the increase and decrease of cwnd
a = 1/100 – the increase is greater than for TCP Reno
b = 1/8 – the decrease on loss is less than for TCP Reno
Scalable over any link speed

Fast TCP
Uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput

Others: HSTCP-LP, H-TCP, BiC-TCP
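To illustrate how the Scalable constants above differ from Reno, the following sketch applies both per-ACK rules (my own illustration; HighSpeed TCP's table-driven a and b and FAST's delay-based control are not modelled):

```python
# Per-ACK congestion-window updates for Reno and Scalable TCP,
# using the constants quoted above (a = 1/100, b = 1/8 for Scalable).
# Sketch only; HighSpeed TCP and FAST are not modelled here.

def reno_on_ack(cwnd: float) -> float:
    return cwnd + 1.0 / cwnd          # +1 packet per RTT overall

def reno_on_loss(cwnd: float) -> float:
    return cwnd * 0.5                 # b = 1/2

def scalable_on_ack(cwnd: float) -> float:
    return cwnd + 0.01                # a = 1/100 per ACK, ~1% growth per RTT

def scalable_on_loss(cwnd: float) -> float:
    return cwnd * (1 - 0.125)         # b = 1/8

if __name__ == "__main__":
    # Growth over one RTT (~5000 ACKs) from a window of 5000 packets:
    reno, scal = 5000.0, 5000.0
    for _ in range(5000):
        reno, scal = reno_on_ack(reno), scalable_on_ack(scal)
    print(f"after one RTT: Reno {reno:.0f} pkts, Scalable {scal:.0f} pkts")
```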

Page 10:

Packet Loss with new TCP Stacks – TCP Response Function

Throughput vs loss rate – the further to the right a stack's curve extends, the faster its recovery
Packets were dropped in the kernel
MB-NG rtt 6 ms; DataTAG rtt 120 ms
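The 'theory' curve for standard TCP in response-function plots like these is usually taken to be the Mathis formula, throughput ≈ (MSS/RTT)·√(3/(2p)). A small sketch (mine; the 1500-byte segment size is an assumption) evaluates it at the two rtts quoted above:

```python
# Mathis et al. response function for standard TCP:
#   throughput ~ (MSS / RTT) * sqrt(3 / (2 * p))
# where p is the packet-loss probability.  Sketch only; the 1500-byte
# segment size is an assumption.
import math

MSS_BYTES = 1500

def mathis_throughput_mbit(rtt_s: float, p: float) -> float:
    bytes_per_s = (MSS_BYTES / rtt_s) * math.sqrt(3.0 / (2.0 * p))
    return bytes_per_s * 8 / 1e6

if __name__ == "__main__":
    for rtt_ms in (6, 120):                       # MB-NG and DataTAG rtts
        for n in (1_000, 100_000, 10_000_000):    # drop rate "1 in n"
            bw = mathis_throughput_mbit(rtt_ms / 1000.0, 1.0 / n)
            print(f"rtt {rtt_ms:3d} ms, drop 1 in {n:>10,d}: ~{bw:8.1f} Mbit/s")
```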

Page 11:

Packet Loss and new TCP Stacks – TCP Response Function

UKLight London–Chicago–London, rtt 177 ms, 2.6.6 kernel
Agreement with theory is good

[Figures: sculcc1-chi-2 iperf 13Jan05 – TCP achievable throughput (Mbit/s) vs packet drop rate 1 in n, shown with log and linear throughput scales; curves: A0 1500, A1 HSTCP, A2 Scalable, A3 HTCP, A5 BICTCP, A8 Westwood, A7 Vegas, plus A0 Theory and Scalable Theory]

Page 12:

Topology of the MB-NG Network

[Diagram: MB-NG network topology – Manchester, UCL and RAL domains, each with Cisco 7609 edge/boundary routers; hosts man01–man03, lon01–lon03, ral01 and ral02 (with HW RAID); domains linked by 2.5 Gbit PoS across the UKERNA development network; Gigabit Ethernet access; MPLS admin domains]

Page 13:

SC2004 UKLIGHT Overview

[Diagram: SC2004 UKLight setup – MB-NG 7600 OSR at Manchester, ULCC UKLight, UCL HEP and the UCL network; UKLight 10G (four 1GE channels) to Chicago Starlight; SURFnet/EuroLink 10G (two 1GE channels) to Amsterdam; NLR lambda NLR-PITT-STAR-10GE-16 to the SC2004 show floor; SLAC booth Cisco 6509 and Caltech booth (UltraLight IP, Caltech 7600)]

Page 14:

High Throughput Demonstrations

[Diagram: man03 at Manchester (Geneva) and lon01 at London (Chicago), each a dual 2.2 GHz Xeon host, connected by 1 GEth through Cisco 7609s and Cisco GSRs across the 2.5 Gbit SDH MB-NG core]

Send data with TCP; drop packets; monitor TCP with Web100

Page 15:

High Performance TCP – MB-NG

Drop 1 in 25,000; rtt 6.2 ms; recovery in 1.6 s
[Web100 traces for Standard, HighSpeed and Scalable TCP]

Page 16:

High Performance TCP – DataTAG

Different TCP stacks tested on the DataTAG network; rtt 128 ms; drop 1 in 10⁶
High-Speed: rapid recovery
Scalable: very fast recovery
Standard: recovery would take ~20 mins

Page 17:

FAST demo via OMNInet and DataTAG
J. Mambretti, F. Yeh (Northwestern); A. Adriaanse, C. Jin, D. Wei (Caltech); S. Ravot (Caltech/CERN); FAST demo: Cheng Jin, David Wei (Caltech)

[Diagram: workstations at NU-E (Leverone) connect via 2×GE through Nortel Passport 8600s and a photonic switch on OMNInet to StarLight, Chicago (Caltech Cisco 7609); 10GE/OC-48 via Alcatel 1670s across DataTAG (~7,000 km) to the CERN Cisco 7609 and workstations at CERN, Geneva; layer 2 and layer 2/3 paths shown; FAST display in San Diego]

Page 18:

FAST TCP vs newReno

Channel #1: newReno – utilization 70%
Channel #2: FAST – utilization 90%

Page 19:

Is TCP fair?

a look at

Round Trip Times & Max Transfer Unit

Page 20:

MTU and Fairness

Two TCP streams share a 1 Gbit/s bottleneck, RTT = 117 ms
MTU = 3000 bytes: average throughput over a period of 7000 s = 243 Mbit/s
MTU = 9000 bytes: average throughput over a period of 7000 s = 464 Mbit/s
Link utilisation: 70.7%

[Diagram: Host #1 and Host #2 at CERN (GVA) connect via 1 GE and a GbE switch to a router, across a 2.5 Gbps PoS link to Starlight (Chi), then via 1 GE to Host #1 and Host #2 – the 1 GE links form the bottleneck]

[Figure: throughput (Mbit/s, 0–1000) of two streams with different MTU sizes sharing a 1 Gbps bottleneck, vs time (0–6000 s); traces for MTU = 3000 bytes and MTU = 9000 bytes, each with its average over the life of the connection. Sylvain Ravot, DataTAG 2003]
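The MTU dependence follows directly from AIMD: each flow adds one segment per RTT, so the 9000-byte flow grows its window three times faster in bytes than the 3000-byte flow. The toy model below (my own, assuming synchronised losses whenever the bottleneck is exceeded; it is not the measurement shown above) reproduces the direction of the effect:

```python
# Toy model of two Reno flows with different MTUs sharing one bottleneck.
# Both add one segment per RTT and halve on loss; losses are assumed to
# hit both flows whenever the offered load exceeds the bottleneck.
# Illustrative only -- not the measurement shown on the slide.

BOTTLENECK_BIT_S = 1e9
RTT_S = 0.117

def simulate(mss1: int, mss2: int, rtts: int = 50_000):
    w1 = w2 = 10.0                      # windows in segments
    tot1 = tot2 = 0.0                   # bytes delivered
    for _ in range(rtts):
        rate1 = w1 * mss1 * 8 / RTT_S
        rate2 = w2 * mss2 * 8 / RTT_S
        if rate1 + rate2 > BOTTLENECK_BIT_S:
            w1, w2 = w1 / 2, w2 / 2     # synchronised multiplicative decrease
        else:
            w1, w2 = w1 + 1, w2 + 1     # additive increase: one segment per RTT
        tot1 += w1 * mss1
        tot2 += w2 * mss2
    secs = rtts * RTT_S
    return tot1 * 8 / secs / 1e6, tot2 * 8 / secs / 1e6

if __name__ == "__main__":
    r3000, r9000 = simulate(3000, 9000)
    print(f"MTU 3000: ~{r3000:.0f} Mbit/s   MTU 9000: ~{r9000:.0f} Mbit/s")
```

With equal windows counted in segments, the larger-MTU flow carries more bytes per round trip, which is the unequal sharing seen in the measurement.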

Page 21:

RTT and Fairness

[Diagram: Host #1 at Sunnyvale (via 10GE and a 10 Gb/s PoS link) and Host #2 at Starlight (Chi) (via 1 GE and a GbE switch) both cross a 2.5 Gb/s PoS link to CERN (GVA), where 1 GE links to Host #1 and Host #2 form the shared 1 Gb/s bottleneck]

Two TCP streams share a 1 Gbit/s bottleneck
CERN ↔ Sunnyvale, RTT = 181 ms: average throughput over a period of 7000 s = 202 Mbit/s
CERN ↔ Starlight, RTT = 117 ms: average throughput over a period of 7000 s = 514 Mbit/s
MTU = 9000 bytes; link utilisation = 71.6%

[Figure: throughput (Mbit/s, 0–1000) of two streams with different RTTs sharing a 1 Gbps bottleneck, vs time (0–7000 s); traces for RTT = 181 ms and RTT = 117 ms, each with its average over the life of the connection. Sylvain Ravot, DataTAG 2003]
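The RTT dependence has the same origin: a flow opens its window by one segment per round trip, so the shorter-RTT flow recovers faster after every loss. A toy model along the same lines as the MTU sketch above (again my own, with synchronised losses assumed; not the measurement shown):

```python
# Toy model of two Reno flows with different RTTs sharing one bottleneck.
# Each adds one MSS per *its own* RTT; on congestion both halve.
# Illustrative only -- not the measurement shown on the slide.

BOTTLENECK_BIT_S = 1e9
MSS = 9000                              # bytes, as on the slide
STEP_S = 0.01                           # 10 ms simulation step

def simulate(rtt1_s: float, rtt2_s: float, duration_s: float = 7000.0):
    w1 = w2 = 10.0                      # windows in segments
    tot1 = tot2 = 0.0                   # bits delivered
    for _ in range(int(duration_s / STEP_S)):
        rate1 = w1 * MSS * 8 / rtt1_s
        rate2 = w2 * MSS * 8 / rtt2_s
        if rate1 + rate2 > BOTTLENECK_BIT_S:
            w1, w2 = w1 / 2, w2 / 2     # synchronised multiplicative decrease
        else:
            w1 += STEP_S / rtt1_s       # +1 segment per rtt1
            w2 += STEP_S / rtt2_s       # +1 segment per rtt2
        tot1 += rate1 * STEP_S
        tot2 += rate2 * STEP_S
    return tot1 / duration_s / 1e6, tot2 / duration_s / 1e6

if __name__ == "__main__":
    r181, r117 = simulate(0.181, 0.117)
    print(f"RTT 181 ms: ~{r181:.0f} Mbit/s   RTT 117 ms: ~{r117:.0f} Mbit/s")
```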

Page 22:

Is TCP fair?

Do TCP Flows Share the Bandwidth?

Page 23:

Test of TCP Sharing: Methodology (1 Gbit/s)

Chose 3 paths from SLAC (California): Caltech (10 ms), Univ Florida (80 ms), CERN (180 ms)
Used iperf/TCP and UDT/UDP to generate traffic
Each run was 16 minutes, in 7 regions

[Diagram: iperf or UDT traffic plus 1/s ICMP ping from SLAC, across the TCP/UDP bottleneck, to iperf at Caltech/UFL/CERN; timeline marked at 2 mins and 4 mins]

Les Cottrell, PFLDnet 2005

Page 24:

TCP Reno single stream

Low performance on fast long-distance paths
AIMD: add a = 1 packet to cwnd per RTT; decrease cwnd by factor b = 0.5 on congestion
Net effect: recovers slowly, does not effectively use the available bandwidth, so poor throughput; unequal sharing

SLAC to CERN observations:
Congestion has a dramatic effect and recovery is slow
RTT increases when the flow achieves its best throughput
Remaining flows do not take up the slack when a flow is removed
Increase recovery rate

Les Cottrell, PFLDnet 2005

Page 25:

FAST

As well as packet loss, FAST uses RTT to detect congestion
RTT is very stable: σ(RTT) ~ 9 ms vs 37±0.14 ms for the others
SLAC–CERN observations: big drops in throughput which take several seconds to recover from; the 2nd flow never gets an equal share of the bandwidth

Page 26:

Hamilton TCP

One of the best performers:
Throughput is high
Big effects on RTT when it achieves best throughput
Flows share equally
Appears to need >1 flow to achieve best throughput – two flows share equally (SLAC–CERN)
>2 flows appears less stable

Page 27:

SC2004 & Transfers with UKLight

A Taster for Lambda & Packet Switched Hybrid Networks

Page 28:

Transatlantic Ethernet: TCP Throughput Tests

Supermicro X5DPE-G2 PCs, dual 2.9 GHz Xeon CPU, FSB 533 MHz
1500 byte MTU, 2.6.6 Linux kernel
Memory-to-memory TCP throughput, standard TCP
Wire-rate throughput of 940 Mbit/s

Work in progress to study: implementation detail, advanced stacks, effect of packet loss, sharing

[Web100 plots: achieved TCP throughput (Mbit/s) and Cwnd vs time – the full run (0–140 s) and the first 10 s; traces: InstantaneousBW, AveBW, CurCwnd]
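For anyone wanting to reproduce a memory-to-memory test of this kind, the sketch below is a minimal iperf-style sender/receiver (my own illustration, not the tool used for these measurements); the port number, duration and the large socket-buffer request are arbitrary choices:

```python
# Minimal memory-to-memory TCP throughput test (iperf-style).
# Illustration only -- not the tool used for the measurements above.
# Port, duration and the 22 MB socket-buffer request are arbitrary choices.
import socket, sys, time

PORT = 5001
BUF = 22 * 1024 * 1024           # large socket buffers for a long fat path
CHUNK = bytes(64 * 1024)
DURATION_S = 10

def server():
    srv = socket.create_server(("", PORT))
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
    conn, _ = srv.accept()
    got, start = 0, time.time()
    while True:
        data = conn.recv(1 << 20)
        if not data:
            break
        got += len(data)
    secs = time.time() - start
    print(f"received {got / 1e6:.0f} MB -> {got * 8 / secs / 1e6:.0f} Mbit/s")

def client(host):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)   # request a big window
    s.connect((host, PORT))
    end = time.time() + DURATION_S
    while time.time() < end:
        s.sendall(CHUNK)
    s.close()

if __name__ == "__main__":
    server() if len(sys.argv) == 1 else client(sys.argv[1])
```

Run the script with no argument on the receiver and with the receiver's hostname on the sender; the explicit buffer request stands in for the "set the TCP buffer sizes" advice in the summary slide.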

Page 29:

SC2004 Disk-Disk bbftp (work in progress)

bbftp file transfer program uses TCP/IP
UKLight path: London–Chicago–London; PCs: Supermicro + 3Ware RAID0
MTU 1500 bytes; socket size 22 MBytes; rtt 177 ms; SACK off
Moved a 2 GByte file; Web100 plots:
Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s with ~4.5 s of overhead)

Disk-TCP-Disk at 1 Gbit/s is here!

[Web100 plots: achieved TCP throughput (Mbit/s, 0–2500) and Cwnd vs time (0–20 s) for the Standard and Scalable TCP transfers; traces: InstantaneousBW, AveBW, CurCwnd]

Page 30:

Summary, Conclusions & Thanks

The Super Computing Bandwidth Challenge gives the opportunity to make world-wide high-performance tests.
The Land Speed Record shows what can be achieved with state-of-the-art kit.
Standard TCP is not optimum for high-throughput, long-distance links – packet loss is a killer for TCP.
Check on campus links & equipment, and access links to backbones; users need to collaborate with the campus network teams – Dante PERT.
New stacks are stable and give better response & performance.
Still need to set the TCP buffer sizes! Check other kernel settings, e.g. window-scale maximum. Watch for "TCP stack implementation enhancements".
The host is critical: think server quality, not supermarket PC – motherboards, NICs, RAID controllers and disks matter.
The NIC should use 64 bit 133 MHz PCI-X; 66 MHz PCI can be OK, but 32 bit 33 MHz is too slow for Gigabit rates.
Worry about the CPU–memory bandwidth as well as the PCI bandwidth – data crosses the memory bus at least 3 times.
Separate the data transfers – use motherboards with multiple 64 bit PCI-X buses.
Choose a modern high-throughput RAID controller; consider SW RAID0 of RAID5 HW controllers.
Users are now able to perform sustained 1 Gbit/s transfers.

Page 31:

More Information – Some URLs

UKLight web site: http://www.uklight.ac.uk
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + write-up: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt and http://datatag.web.cern.ch/datatag/pfldnet2003/
"Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS Special Issue 2004: http://www.hep.man.ac.uk/~rich/
TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html and http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing, 2004
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002

Page 32:

Any Questions?

Page 33:

Backup Slides

Page 34:

10 Gigabit Ethernet: UDP Throughput Tests

1500 byte MTU gives ~2 Gbit/s; used a 16144 byte MTU, max user length 16080
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes – wire-rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes – wire rate of 5.7 Gbit/s
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes – wire rate of 5.4 Gbit/s

[Figure: "an-al 10GE Xsum 512kbuf MTU16114 27Oct03" – received wire rate (Mbit/s, 0–6000) vs spacing between frames (0–40 µs) for packet sizes from 1472 to 16080 bytes]
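Plots like this are produced by sending fixed-size UDP datagrams with a controlled inter-frame spacing and recording the achieved rate. The sketch below is a much-simplified sender in the spirit of UDPmon (my own illustration, not the real UDPmon; the destination address, port and counts are placeholders, and the real tool also measures the received wire rate at the far end):

```python
# Send fixed-size UDP datagrams with a chosen inter-frame spacing, in the
# spirit of the UDPmon tests above.  Simplified sketch only.
import socket, time

def send_stream(host: str, port: int, size: int, spacing_us: float, count: int):
    payload = bytes(size)
    gap = spacing_us / 1e6
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.perf_counter()
    nxt = start
    for _ in range(count):
        sock.sendto(payload, (host, port))
        nxt += gap
        while time.perf_counter() < nxt:   # busy-wait for microsecond gaps
            pass
    secs = time.perf_counter() - start
    print(f"{size} B every {spacing_us} us -> "
          f"offered ~{count * size * 8 / secs / 1e6:.0f} Mbit/s")

if __name__ == "__main__":
    # Sweep the spacing as in the plot; destination and port are placeholders.
    for us in (10, 20, 40):
        send_stream("192.0.2.1", 14144, 8000, us, 10_000)
```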

Page 35:

10 Gigabit Ethernet: Tuning PCI-X

16080 byte packets every 200 µs; Intel PRO/10GbE LR adapter
PCI-X bus occupancy vs mmrbc: measured times compared with times based on PCI-X timings from the logic analyser
Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s

[Figures: PCI-X transfer time (µs) and transfer rate (Gbit/s) vs Max Memory Read Byte Count (mmrbc 512–4096 bytes), measured and expected, with the PCI-X maximum throughput shown; 5.7 Gbit/s reached at mmrbc 4096 bytes (kernel 2.6.1#17, HP Itanium, Intel 10GE, Feb 04). Logic-analyser timeline: CSR Access, PCI-X Sequence, Data Transfer, Interrupt & CSR Update]

Page 36:

10 Gigabit Ethernet: SC2004 TCP Tests

Sun AMD Opteron v20z compute servers, Chelsio TOE; tests between Linux 2.6.6 hosts
10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale: two 2.4 GHz AMD 64-bit Opteron processors with 4 GB of RAM at SC2004, 1500 B MTU, all Linux 2.6.6; in one direction 9.43 Gbit/s (9.07 Gbit/s goodput) and in the reverse direction 5.65 Gbit/s (5.44 Gbit/s goodput) – a total of 15+ Gbit/s on the wire
10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale: one 2.4 GHz AMD 64-bit Opteron at each end, 2 MByte window, 16 streams, 1500 B MTU, all Linux 2.6.6; in one direction 7.72 Gbit/s (7.42 Gbit/s goodput) over 120 mins (6.6 Tbits shipped)
S2io NICs with Solaris 10 in a 4×2.2 GHz Opteron v40z to one or more S2io or Chelsio NICs with Linux 2.6.5 or 2.6.6 in 2×2.4 GHz v20zs: LAN 1 – S2io NICs back to back, 7.46 Gbit/s; LAN 2 – S2io in the v40z to 2 v20zs, each NIC ~6 Gbit/s, total 12.08 Gbit/s
S2io NICs with Solaris 10 in 4*2.2GHz Opteron cpu v40z to one or more S2io or Chelsio NICs with Linux 2.6.5 or 2.6.6 in 2*2.4GHz V20Zs LAN 1 S2io NIC back to back: 7.46 Gbit/s LAN 2 S2io in V40z to 2 V20z : each NIC ~6 Gbit/s total 12.08 Gbit/s