Slide 1: Investigating the interaction between high-performance network and disk sub-systems. Richard Hughes-Jones, Stephen Dallison, The University of Manchester. PFLDnet2005, Lyon, Feb 2005. MB-NG
Slide 2
Introduction
AIMD and high-bandwidth, long-distance networks: the assumption that packet loss means congestion is well known.
Focus:
- Data-moving applications with different TCP stacks and network environments
- The interaction between network hardware, protocol stack and disk sub-system
- Almost a user view
We studied:
- TCP stacks: standard, HSTCP, Scalable, H-TCP, BIC, Westwood
- Applications: bbftp, bbcp, Apache, GridFTP
- Networks: MB-NG, SuperJANET4, UKLight
- RAID0 and RAID5 controllers
Slide 3
Topology of the MB-NG Network
[Diagram: three MPLS admin domains (Manchester, UCL, RAL) joined over the UKERNA Development Network. Each domain has Cisco 7609 edge/boundary routers; hosts man01-03, lon01-03, ral01-02; HW RAID at the end hosts. Key: Gigabit Ethernet; 2.5 Gbit POS access.]
Slide 4
Topology of the Production Network
[Diagram: Manchester domain (man01, HW RAID) to RAL domain (ral01, HW RAID); the path crosses 3 routers and 2 switches. Key: Gigabit Ethernet; 2.5 Gbit POS access; 10 Gbit POS.]
Slide 5
SC2004 UKLight Overview
[Diagram: MB-NG 7600 OSR in Manchester and UCL HEP / UCL network to ULCC UKLight; UKLight 10G (four 1GE channels) to Chicago Starlight; Surfnet/EuroLink 10G (two 1GE channels) to Amsterdam; NLR lambda NLR-PITT-STAR-10GE-16 to the SC2004 floor, with the SLAC booth (Cisco 6509) and the Caltech booth (UltraLight IP, Caltech 7600).]
Slide 6
Packet Loss with new TCP Stacks: TCP Response Function
Throughput vs loss rate; the further a curve sits to the right, the faster the recovery. Packets are dropped in the kernel.
MB-NG rtt 6 ms; DataTAG rtt 120 ms.
Slide 7
Packet Loss and new TCP Stacks: TCP Response Function
UKLight London-Chicago-London, rtt 177 ms, 2.6.6 kernel. Agreement with theory is good.
[Charts: sculcc1-chi-2 iperf, 13 Jan 05. TCP achievable throughput (Mbit/s) vs packet drop rate (1 in n), on log and linear scales. Series: A0 1500 (Standard), A1 HSTCP, A2 Scalable, A3 HTCP, A5 BICTCP, A8 Westwood, A7 Vegas, plus Standard and Scalable theory curves.]
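The theory curves on these plots follow the TCP response function. A minimal sketch of the standard-TCP (Mathis) form, assuming 1500-byte packets, shows why the 177 ms path needs a very low loss rate:

```python
import math

def tcp_response_mbit(loss_rate, rtt_s, mss_bytes=1500):
    """Standard TCP response function (Mathis form):
    throughput ~ (MSS / rtt) * sqrt(3/2) / sqrt(p), returned in Mbit/s."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate) / 1e6

# Achievable throughput on the 177 ms London-Chicago-London path
for p in (1e-5, 1e-7, 1e-9):
    print(p, round(tcp_response_mbit(p, 0.177), 1))
```

Faster-recovery stacks shift this curve to the right: they sustain the same throughput at a higher loss rate, which is exactly what the measured series show.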
Slide 8
iperf Throughput + Web100
- SuperMicro on MB-NG network, HighSpeed TCP: line speed 940 Mbit/s; DupACKs <10 (expect ~400)
- BaBar on production network, Standard TCP: 425 Mbit/s; DupACKs 350-400, with re-transmits
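The iperf memory-to-memory tests can be reproduced with commands like these (a sketch: the host name and 60 s duration are illustrative, not from the slides):

```shell
# receiver: TCP window large enough to cover the bandwidth-delay product
iperf -s -w 22M

# sender: 60 s test, reporting throughput every second
iperf -c receiver.example.org -w 22M -t 60 -i 1
```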
Slide 9
End Systems: NICs & Disks
Slide 10
End Hosts & NICs: SuperMicro P4DP6
Use UDP packets to characterise the host & NIC: latency, throughput, and bus activity.
SuperMicro P4DP6 motherboard: dual Xeon 2.2 GHz CPUs, 400 MHz system bus, 66 MHz 64-bit PCI bus.
[Charts: gig6-7 Intel PCI 66 MHz, 27 Nov 02. Receive wire rate (Mbit/s) vs transmit time per frame (us) for packet sizes 50-1472 bytes. Latency histograms N(t) for 64, 512, 1024 and 1400 byte packets (Intel 64-bit 66 MHz). Latency (us) vs message length (bytes) with fits y = 0.0093x + 194.67 and y = 0.0149x + 201.75. PCI activity traces: send PCI, receive PCI, 1400 bytes to NIC, 1400 bytes to memory, PCI STOP asserted.]
Slide 11
RAID Controller Performance
RAID5 (striped with redundancy) controllers tested: 3Ware 7506 parallel 66 MHz; 3Ware 7505 parallel 33 MHz; 3Ware 8506 Serial ATA 66 MHz; ICP Serial ATA 33/66 MHz.
Tested on a dual 2.2 GHz Xeon Supermicro P4DP8-G2 motherboard. Disks: Maxtor 160 GB 7200 rpm, 8 MB cache.
Read-ahead kernel tuning: /proc/sys/vm/max-readahead = 512
Rates for the same PC with RAID0 (striped): read 1040 Mbit/s, write 800 Mbit/s.
[Charts: disk-to-memory read speeds; memory-to-disk write speeds.]
Slide 12
SC2004 RAID Controller Performance
Supermicro X5DPE-G2 motherboards loaned from Boston Ltd.; dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory. 3Ware 8506-8 controller on a 133 MHz PCI-X bus, configured as RAID0 with a 64 kbyte stripe size; six 74.3 GByte Western Digital Raptor WD740 SATA disks.
75 Mbyte/s disk-to-buffer; 150 Mbyte/s buffer-to-memory. Scientific Linux with 2.6.6 kernel + altAIMD patch (Yee) + packet loss patch.
Read-ahead kernel tuning: /sbin/blockdev --setra 16384 /dev/sda
RAID0 (striped), 2 GByte file: read 1500 Mbit/s, write 1725 Mbit/s.
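The two read-ahead tunings quoted in these slides, written out as commands (a sketch: values and /dev/sda are from the slides; root is required):

```shell
# 2.4-series kernels: global VM read-ahead, in pages
echo 512 > /proc/sys/vm/max-readahead

# 2.6-series kernels: per-device read-ahead, in 512-byte sectors
# (16384 sectors = 8 Mbytes of read-ahead)
/sbin/blockdev --setra 16384 /dev/sda
/sbin/blockdev --getra /dev/sda   # verify the setting
```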
[Charts: disk-to-memory read and memory-to-disk write throughput (Mbit/s) vs buffer size (kbytes), RAID0 6 disks on the 3w8506-8, for 0.5, 1 and 2 GByte files.]
Slide 13
Data Transfer Applications
Slide 14
bbftp: Host & Network Effects
2 GByte file, RAID5 disks: 1200 Mbit/s read, 600 Mbit/s write. Scalable TCP.
- BaBar + SuperJANET: instantaneous 220-625 Mbit/s
- SuperMicro + SuperJANET: instantaneous 400-665 Mbit/s for 6 s, then 0-480 Mbit/s
- SuperMicro + MB-NG: instantaneous 880-950 Mbit/s for 1.3 s, then 215-625 Mbit/s
Slide 15
bbftp: What else is going on? Scalable TCP
BaBar + SuperJANET; SuperMicro + SuperJANET.
Congestion window and dupACK traces: is the variation not TCP related? Candidates: disk speed / bus transfer, or the application itself.
Slide 16
Applications: Throughput Mbit/s
HighSpeed TCP, 2 GByte file, RAID5, SuperMicro + SuperJANET: bbcp, bbftp, Apache, GridFTP.
Previous work used RAID0 (not disk limited).
Slide 17
Average Transfer Rates (Mbit/s)

App      TCP Stack   SuperMicro   SuperMicro       BaBar on       SC2004 on
                     on MB-NG     on SuperJANET4   SuperJANET4    UKLight
iperf    Standard    940          350-370          425            940
iperf    HighSpeed   940          510              570            940
iperf    Scalable    940          580-650          605            940
bbcp     Standard    434          290-310          290
bbcp     HighSpeed   435          385              360
bbcp     Scalable    432          400-430          380
bbftp    Standard    400-410      325              320            825
bbftp    HighSpeed   370-390                       380
bbftp    Scalable    430          345-532          380            875
apache   Standard    425          260              300-360
apache   HighSpeed   430          370              315
apache   Scalable    428          400              317
GridFTP  Standard    405          240
GridFTP  HighSpeed   320
GridFTP  Scalable    335

New stacks give more throughput. The rate decreases moving from MB-NG across the production networks.
Slide 18
SC2004 & Transfers with UKLight
Slide 19
SC2004 Disk-Disk bbftp (work in progress)
The bbftp file transfer program uses TCP/IP. UKLight path: London-Chicago-London. PCs: Supermicro + 3Ware RAID0. MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off. Move a 2 GByte file. Web100 plots:
- Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
- Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
Disk-TCP-Disk at 1 Gbit/s.
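The 22 Mbyte socket size is not arbitrary: it matches the bandwidth-delay product of a 1 Gbit/s path with a 177 ms rtt. A quick check:

```python
def bdp_bytes(rate_bit_s, rtt_s):
    """Bandwidth-delay product: the bytes that must be in flight
    to keep the pipe full."""
    return rate_bit_s * rtt_s / 8

# 1 Gbit/s over 177 ms needs ~22 Mbytes of socket buffer
print(bdp_bytes(1e9, 0.177) / 1e6, "Mbytes")
```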
[Charts: TCP achieved throughput (Mbit/s) and Cwnd vs time (ms) over ~20 s for the two transfers, showing InstantaneousBW, AveBW and CurCwnd.]
Slide 20
SC2004 Disk-Disk bbftp (work in progress)
UKLight path: London-Chicago-London. PCs: Supermicro + 3Ware RAID0. MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off. Move a 2 GByte file. Web100 plots: HS TCP.
We don't believe this is a protocol problem!
[Charts vs time (ms) over ~45 s, each overlaid with CurCwnd: TCP achieved throughput (Mbit/s); DupAcksIn (delta); Timeouts (delta); OtherReductions (delta, log scale).]
Slide 21
[Chart: RAID0 6 disks, 1 GByte write, 64k, 3w8506-8: throughput (Mbit/s) vs trial number.]
Network & Disk Interactions (work in progress)
Hosts: Supermicro X5DPE-G2 motherboards; dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory; 3Ware 8506-8 controller on a 133 MHz PCI-X bus configured as RAID0; six 74.3 GByte Western Digital Raptor WD740 SATA disks; 64 kbyte stripe size.
Measure memory-to-RAID0 transfer rates with & without UDP traffic.
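The memory-to-RAID0 rate measurement can be sketched as a timed buffered write (an illustration of the method, not the actual test code; the path and sizes are parameters):

```python
import os
import time

def write_rate_mbit(path, total_bytes, buf_bytes):
    """Write total_bytes in buf_bytes chunks, fsync, return Mbit/s."""
    buf = b"\0" * buf_bytes
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_bytes // buf_bytes):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())          # include the time to reach the disk
    elapsed = time.time() - start
    return total_bytes * 8 / elapsed / 1e6
```

Running this repeatedly (the "trial number" axis on the plots) with and without a concurrent UDP stream reproduces the comparison made on this slide.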
[Charts: disk write throughput (Mbit/s) vs trial number with 1500-byte and 9000-byte MTU UDP traffic; % CPU system mode on CPUs 3+4 vs CPUs 1+2 for 8k and 64k writes, with fits y = -1.017x + 178.32 and y = -1.0479x + 174.44, approximately y = 178 - 1.05x.]
- Disk write: 1735 Mbit/s
- Disk write + 1500 MTU UDP: 1218 Mbit/s, a drop of 30%
- Disk write + 9000 MTU UDP: 1400 Mbit/s, a drop of 19%
(The scatter plots show % CPU in kernel mode.)
Slide 22
Network & Disk Interactions
- Disk write, mem-to-disk: 1735 Mbit/s; the work tends to stay on one die
- Disk write + UDP 1500, mem-to-disk: 1218 Mbit/s; both dies at ~80%
- Disk write + CPU memory load, mem-to-disk: 1341 Mbit/s; one CPU at ~60%, the other at 20%; large user-mode usage; below the cut = high BW; high BW = die 1 used
- Disk write + CPU load, mem-to-disk: 1334 Mbit/s; one CPU at ~60%, the other at 20%; all CPUs saturated in user mode
[Charts: % CPU system mode on CPUs 3+4 vs CPUs 1+2, kernel and total CPU load, for 8k and 64k writes: disk write alone (fits y = -1.0215x + 215.63 and y = -1.0529x + 206.46), with UDP traffic, with memory-bandwidth load, and with CPU load, each compared against y = 178 - 1.05x and the cut equations; plus write throughput (Mbit/s) vs trial number with the L3+L4 < cut trials highlighted.]
Slide 23
Summary, Conclusions & Thanks
- The host is critical: motherboards, NICs, RAID controllers and disks all matter. The NICs should be well designed: use 64-bit 133 MHz PCI-X (66 MHz PCI can be OK); NIC/drivers need efficient CSR access, clean buffer management and good interrupt handling.
- Worry about the CPU-memory bandwidth as well as the PCI bandwidth: data crosses the memory bus at least 3 times.
- Separate the data transfers: use motherboards with multiple 64-bit PCI-X buses. 32-bit 33 MHz is too slow for Gigabit rates; 64-bit 33 MHz is >80% used.
- Choose a modern high-throughput RAID controller; consider SW RAID0 of RAID5 HW controllers.
- Plenty of CPU power is needed for sustained 1 Gbit/s transfers.
- Packet loss is a killer: check campus links & equipment, and the access links to the backbones.
- The new stacks are stable and give better response & performance. Still need to set the TCP buffer sizes! Check other kernel settings, e.g. window scaling.
- Application architecture & implementation is also important. The interaction between hardware, protocol processing and the disk sub-system is complex.
MB-NG
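"Still need to set the TCP buffer sizes" translates into sysctl settings like these (a sketch for 2.4/2.6-era Linux; the 32 Mbyte ceiling is illustrative, sized from the bandwidth-delay product rather than taken from the slides):

```shell
sysctl -w net.ipv4.tcp_window_scaling=1            # windows beyond 64 kbytes
sysctl -w net.core.rmem_max=33554432               # max receive buffer (bytes)
sysctl -w net.core.wmem_max=33554432               # max send buffer (bytes)
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"  # min / default / max
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
```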
Slide 24
More Information: Some URLs
- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/ ; "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue, 2004
- TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing, 2004
Slide 25
Backup Slides
Slide 26
High Throughput Demonstrations
[Diagram: Manchester (Geneva) host man03 to London (Chicago) host lon01 across the 2.5 Gbit SDH MB-NG core; 1 GEth at each end through Cisco 7609s and Cisco GSRs; dual Xeon 2.2 GHz hosts at both ends. Send data with TCP, drop packets, monitor TCP with Web100.]
Slide 27
High Performance TCP – MB-NG
Drop 1 in 25,000; rtt 6.2 ms; recovery in 1.6 s.
[Charts: Standard, HighSpeed and Scalable TCP.]
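The 1.6 s recovery is what simple AIMD arithmetic predicts: after a loss, standard TCP halves cwnd and regains one MSS per rtt, so recovery takes about (W/2) rtts, where W is the pipe size in packets. A sketch, assuming 1500-byte packets at 1 Gbit/s:

```python
def recovery_time_s(rate_bit_s, rtt_s, mss_bytes=1500):
    """Time for standard TCP to regrow cwnd from W/2 back to W."""
    w_packets = rate_bit_s * rtt_s / 8 / mss_bytes  # pipe size in packets
    return (w_packets / 2) * rtt_s

print(round(recovery_time_s(1e9, 0.0062), 1))  # MB-NG: rtt 6.2 ms -> ~1.6 s
```

On the 128 ms DataTAG path the same formula gives over ten minutes, the same order as the "recovery would take ~20 mins" on the DataTAG slide.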
Slide 28
High Performance TCP – DataTAG
Different TCP stacks tested on the DataTAG network; rtt 128 ms; drop 1 in 10^6.
- HighSpeed: rapid recovery
- Scalable: very fast recovery
- Standard: recovery would take ~20 mins
Slide 29
SC2004 RAID Controller Performance
Supermicro X5DPE-G2 motherboards; dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory. 3Ware 8506-8 controller on a 133 MHz PCI-X bus, configured as RAID0 with a 64 kbyte stripe size; six 74.3 GByte Western Digital Raptor WD740 SATA disks.
75 Mbyte/s disk-to-buffer; 150 Mbyte/s buffer-to-memory. Scientific Linux with 2.4.20 kernel + altAIMD patch (Yee) + packet loss patch.
Read-ahead kernel tuning: /proc/sys/vm/max-readahead = 512
RAID0 (striped), 2 GByte file: read 1460 Mbit/s, write 1320 Mbit/s.
[Charts: RAID0 6-disk read and write throughput (Mbit/s) vs file size (Mbytes) on the 3w8506-8, 16 Oct 04, for 64-64, 512-512 and 2048-2048 buffer settings.]
Slide 30
The performance of the end host / disks. BaBar Case Study: RAID BW & PCI Activity
3Ware 7500-8 RAID5 controller, parallel EIDE; the 3Ware card forces the PCI bus to 33 MHz. Transfers from the BaBar Tyan host to the MB-NG SuperMicro host.
Network mem-to-mem: 619 Mbit/s. Disk-to-disk throughput with bbcp: 40-45 Mbytes/s (320-360 Mbit/s); the PCI bus is effectively full! User throughput ~250 Mbit/s.
[Charts: PCI activity reading from and writing to the RAID5 disks.]
Slide 31
GridFTP Throughput + Web100
RAID0 disks: 960 Mbit/s read, 800 Mbit/s write.
Throughput alternates between 600-800 Mbit/s and zero; data rate 520 Mbit/s. Cwnd is smooth; no dup ACKs, send stalls or timeouts.
Slide 32
HTTP data transfers: HighSpeed TCP
Same hardware, RAID0 disks. Bulk data moved by web servers: Apache web server, out of the box! Prototype client using the curl HTTP library; 1 Mbyte TCP buffers; 2 GByte file. Throughput ~720 Mbit/s. Cwnd shows some variation; no dup ACKs, send stalls or timeouts.
Slide 33
bbcp & GridFTP Throughput
RAID5, 4 disks; Manchester to RAL; 2 GByte file transferred.
- bbcp: mean 710 Mbit/s
- GridFTP: many zeros seen; mean ~710, mean ~620
DataTAG altAIMD kernel in BaBar & ATLAS.