Bandwidth Challenge, Land Speed Record, TCP/IP and You
Richard Hughes-Jones, Manchester
Networkshop, March 2005
Bandwidth Lust at SC2003
The SC Network
Working with S2io, Cisco & folks
At the SLAC Booth, running the BW Challenge
The Bandwidth Challenge at SC2003
The peak aggregate bandwidth from the 3 booths was 23.21 Gbit/s
1-way link utilisations of >90%
6.6 TBytes transferred in 48 minutes
Multi-Gigabit flows at the SC2003 BW Challenge
Three server systems with 10 Gigabit Ethernet NICs
Used the DataTAG altAIMD stack, 9000 byte MTU
Sent memory-to-memory iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
Palo Alto PAIX: rtt 17 ms, window 30 MB; shared with the Caltech booth; 4.37 Gbit/s HighSpeed TCP (I=5%), then 2.87 Gbit/s (I=16%), falling when 10 Gbit was on the link; 3.3 Gbit/s Scalable TCP (I=8%); tested 2 flows, sum 1.9 Gbit/s (I=39%)
Chicago Starlight: rtt 65 ms, window 60 MB; Phoenix CPU 2.2 GHz; 3.1 Gbit/s HighSpeed TCP (I=1.6%)
Amsterdam SARA: rtt 175 ms, window 200 MB; Phoenix CPU 2.2 GHz; 4.35 Gbit/s HighSpeed TCP (I=6.9%); very stable
Both the Chicago and Amsterdam flows used Abilene to Chicago
(The window sizes roughly match the bandwidth-delay product of each path; see the sketch below.)
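To see where windows of roughly 30 MB, 60 MB and 200 MB come from, here is a minimal sketch of the bandwidth-delay product for the three paths; the 10 Gbit/s line rate is the NIC speed quoted above, and the exact headroom added to each window is an assumption:

```python
# Hedged sketch: bandwidth-delay product (BDP) sizing for the SC2003 paths.

def bdp_bytes(rate_bit_per_s: float, rtt_s: float) -> float:
    """TCP window (bytes) needed to keep a path of this rate and RTT full."""
    return rate_bit_per_s * rtt_s / 8.0

paths = {"Palo Alto PAIX": 0.017, "Chicago Starlight": 0.065, "Amsterdam SARA": 0.175}
for name, rtt in paths.items():
    mb = bdp_bytes(10e9, rtt) / 2**20
    print(f"{name}: rtt {rtt*1e3:.0f} ms -> BDP at 10 Gbit/s ~ {mb:.0f} MB")
```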
[Figure: 10 Gbit/s throughput from SC2003 to PAIX. Throughput (Gbit/s, 0-10) vs date & time, 19 Nov 2003 15:59-17:25. Series: Router to LA/PAIX; Phoenix-PAIX HS-TCP; Phoenix-PAIX Scalable-TCP; Phoenix-PAIX Scalable-TCP #2.]
[Figure: 10 Gbit/s throughput from SC2003 to Chicago & Amsterdam. Throughput (Gbit/s, 0-10) vs date & time, 19 Nov 2003 15:59-17:25. Series: Router traffic to Abilene; Phoenix-Chicago; Phoenix-Amsterdam.]
SCINet
Collaboration at SC2004: setting up the BW Bunker
The BW Challenge at the SLAC Booth
Working with S2io, Sun, Chelsio
UKLight & ESLEA at SC2004
UK e-Science researchers from Manchester, UCL & ULCC were involved in the Bandwidth Challenge
They collaborated with scientists & engineers from Caltech, CERN, FERMI, SLAC, Starlight, UKERNA & U. of Florida
Networks used by the SLAC/UK team:
10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale
10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale
10 Gbit Ethernet link from SC2004 to Chicago and on to UKLight
UKLight focused on Gigabit disk-to-disk transfers between UK sites and Pittsburgh
The UK had generous support from Boston Ltd, who loaned the servers
The BWC collaboration had support from S2io (NICs), Chelsio (TOE) and Sun, who loaned servers
Essential support from Boston, Sun & Cisco
The Bandwidth Challenge – SC2004
The peak aggregate bandwidth from the booths was 101.13 Gbit/s – that is 3 full-length DVDs per second!
4 times greater than SC2003!
Saturated TEN 10 Gigabit Ethernet waves
SLAC booth: Sunnyvale to Pittsburgh, LA to Pittsburgh, and Chicago to Pittsburgh (with UKLight)
Land Speed Record – SC2004: Pittsburgh-Tokyo-CERN, single-stream TCP
LSR metric = distance × speed; categories for single stream, multiple stream, IPv4 and IPv6, standard TCP
Current single-stream IPv4 record: University of Tokyo, Fujitsu & WIDE, 9 Nov 2004
20,645 km connection, SC2004 booth - CERN via Tokyo; latency 433 ms RTT
10 Gbit Chelsio TOE card
7.21 Gbit/s (TCP payload) with 1500 B MTU, sustained for about 10 min
148,850 Terabit-metres/second (Internet2 LSR approved record)
A full DVD in 5 s
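As a quick check of the quoted figure, the LSR metric is just distance times throughput:

```python
# Quick check of the quoted LSR figure: distance x throughput.
distance_m = 20_645e3          # 20,645 km path, Pittsburgh-Tokyo-CERN
rate_bps = 7.21e9              # 7.21 Gbit/s TCP payload rate
lsr = distance_m * rate_bps    # bit-metres per second
print(f"{lsr/1e12:,.0f} Terabit-metres/s")   # ~148,850, matching the slide
```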
Just a Well Engineered End-to-End Connection
End-to-End “no loss” environment
NO contention, NO sharing on the end-to-end path
Processor speed and system bus characteristics
TCP Configuration – window size and frame size (MTU)
Tuned PCI-X bus
Tuned Network Interface Card driver
A single TCP connection on the end-to-end path
Memory-to-Memory transfer
no disk system involved
No real user application (but did file transfers!!)
Not a typical User or Campus situation BUT …
So what’s the matter with TCP – Did we cheat?
[Diagram: a typical end-to-end path - Client - Campus - Regional network - Internet core - Regional network - Campus - Server - contrasted with a dedicated UKLight path between the same end sites. From Robin Tasker.]
TCP (Reno) – What's the problem?
TCP has 2 phases:
Slowstart – probe the network to estimate the available bandwidth; exponential growth of cwnd
Congestion Avoidance – the main data-transfer phase; the transfer rate grows "slowly"
AIMD and high-bandwidth, long-distance networks:
Poor performance of TCP in high-bandwidth wide-area networks is due in part to the TCP congestion control algorithm.
For each ACK in an RTT without loss:
cwnd -> cwnd + a / cwnd   (Additive Increase, a = 1)
For each window experiencing loss:
cwnd -> cwnd - b * cwnd   (Multiplicative Decrease, b = 1/2)
Packet loss is a killer!!
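A minimal sketch of the AIMD rule above, at per-RTT granularity; the loss pattern below is purely illustrative, not taken from any measurement:

```python
# Toy AIMD (TCP Reno) congestion-window evolution, cwnd in segments.
# Grow by a=1 segment per RTT without loss, cut by b=1/2 on a loss event.

def aimd(rtts: int, loss_every: int, a: float = 1.0, b: float = 0.5) -> list[float]:
    cwnd, trace = 2.0, []
    for t in range(rtts):
        if loss_every and t > 0 and t % loss_every == 0:
            cwnd = max(1.0, cwnd * (1 - b))   # multiplicative decrease on loss
        else:
            cwnd += a                          # a/cwnd per ACK ~ a segments per RTT
        trace.append(cwnd)
    return trace

print(aimd(rtts=20, loss_every=8))  # the familiar saw-tooth
```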
TCP (Reno) – Details
The time for TCP to recover its throughput after 1 lost packet, on a link of capacity C, is given by:
    t_recover = C * RTT^2 / (2 * MSS)
For an rtt of ~200 ms the recovery time is of the order of minutes (~2 min).
[Figure: time to recover (s, log scale) vs rtt (0-200 ms) for line rates of 10 Mbit/s, 100 Mbit/s, 1 Gbit/s, 2.5 Gbit/s and 10 Gbit/s; markers for UK ~6 ms, Europe ~20 ms, USA ~150 ms.]
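A small sketch evaluating the recovery-time formula above; the 1500-byte MSS and the 1 Gbit/s line rate are assumptions for illustration (jumbo frames shorten the recovery):

```python
# Recovery time after a single loss for standard TCP: t = C * RTT^2 / (2 * MSS).

def recovery_time_s(capacity_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    return capacity_bps * rtt_s**2 / (2 * mss_bytes * 8)

for rtt_ms in (6, 20, 150, 200):                 # UK, Europe, USA, transatlantic-ish
    t = recovery_time_s(1e9, rtt_ms / 1e3)       # 1 Gbit/s line rate
    print(f"rtt {rtt_ms:3d} ms -> recovery ~ {t:8.1f} s")
```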
Investigation of new TCP Stacks
The AIMD algorithm – standard TCP (Reno):
For each ACK in an RTT without loss: cwnd -> cwnd + a / cwnd   (Additive Increase, a = 1)
For each window experiencing loss: cwnd -> cwnd - b * cwnd   (Multiplicative Decrease, b = 1/2)
HighSpeed TCP:
a and b vary depending on the current cwnd, using a table
a increases more rapidly with larger cwnd – returns to the 'optimal' cwnd size sooner for the network path
b decreases less aggressively and, as a consequence, so does the cwnd; the effect is that there is not such a decrease in throughput
Scalable TCP:
a and b are fixed adjustments for the increase and decrease of cwnd
a = 1/100 – the increase is greater than TCP Reno
b = 1/8 – the decrease on loss is less than TCP Reno
Scalable over any link speed
Fast TCP:
Uses round-trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput
Others: HSTCP-LP, H-TCP, BiC-TCP
(A sketch contrasting the Reno and Scalable update rules follows this list.)
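A minimal sketch, assuming per-ACK updates, contrasting the Reno and Scalable TCP rules described above; HighSpeed TCP's table of a, b values is omitted, and this is not a faithful model of either kernel implementation:

```python
# Per-ACK congestion-window updates for Reno and Scalable TCP (cwnd in segments).

def reno_on_ack(cwnd: float, a: float = 1.0) -> float:
    return cwnd + a / cwnd            # ~ +1 segment per RTT

def reno_on_loss(cwnd: float, b: float = 0.5) -> float:
    return cwnd * (1 - b)             # halve the window

def scalable_on_ack(cwnd: float, a: float = 0.01) -> float:
    return cwnd + a                   # +0.01 per ACK ~ +cwnd/100 per RTT

def scalable_on_loss(cwnd: float, b: float = 0.125) -> float:
    return cwnd * (1 - b)             # back off by only 1/8

cwnd = 10000.0                        # a large window, typical of a long fat pipe
print("Reno after loss:", reno_on_loss(cwnd), " Scalable after loss:", scalable_on_loss(cwnd))
```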
Packet Loss with new TCP Stacks – the TCP response function
Throughput vs loss rate – the further a curve sits to the right, the faster the recovery
Packets were dropped in the kernel
MB-NG rtt 6 ms; DataTAG rtt 120 ms
Packet Loss and new TCP Stacks – the TCP response function
UKLight London-Chicago-London, rtt 177 ms, 2.6.6 kernel
Agreement with theory is good
[Figure: two panels (sculcc1-chi-2, iperf, 13 Jan 05) of TCP achievable throughput (Mbit/s) vs packet drop rate (1 in n, 100 to 10^8) – one on a log throughput scale, one linear 0-1000 Mbit/s. Curves: A0 1500, A1 HSTCP, A2 Scalable, A3 HTCP, A5 BICTCP, A8 Westwood, A7 Vegas, plus the standard-TCP and Scalable theory lines.]
High Throughput Demonstrations
[Diagram: Manchester (Geneva) host man03 - 1 GEth - Cisco 7609 - Cisco GSR - 2.5 Gbit SDH MB-NG core - Cisco GSR - Cisco 7609 - 1 GEth - London (Chicago) host lon01; dual Xeon 2.2 GHz hosts at both ends.]
Send data with TCP; drop packets; monitor TCP with Web100
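The "drop packets" step was done in the kernel; a rough stand-in, shown here only as a hedged sketch, is to add loss with Linux netem on the sending host (interface name and loss rate are placeholders, and the commands need root):

```python
# Emulate kernel-level packet drops with tc/netem while a TCP test runs.
import subprocess

DEV = "eth0"          # placeholder interface name

def set_loss(percent: float) -> None:
    subprocess.run(["tc", "qdisc", "add", "dev", DEV, "root",
                    "netem", "loss", f"{percent}%"], check=True)

def clear_loss() -> None:
    subprocess.run(["tc", "qdisc", "del", "dev", DEV, "root"], check=True)

if __name__ == "__main__":
    set_loss(0.01)     # drop roughly 1 packet in 10,000
    try:
        pass           # run the TCP transfer here and record Web100 / kernel stats
    finally:
        clear_loss()
```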
High Performance TCP – DataTAG
Different TCP stacks tested on the DataTAG network, rtt 128 ms, drop rate 1 in 10^6:
High-Speed – rapid recovery
Scalable – very fast recovery
Standard – recovery would take ~20 mins
Is TCP fair?
TCP Flows – Sharing the Bandwidth
Test of TCP Sharing: Methodology (1 Gbit/s)
Chose 3 paths from SLAC (California): Caltech (10 ms), Univ. Florida (80 ms), CERN (180 ms)
Used iperf/TCP and UDT/UDP to generate traffic
Each run was 16 minutes, in 7 regions (2 min / 4 min intervals)
[Diagram: SLAC sends iperf (or UDT) traffic plus a 1/s ping across the TCP/UDP bottleneck to Caltech / UFL / CERN; the ICMP/ping traffic is monitored alongside the data flow.]
Les Cottrell, PFLDnet 2005
(A sketch of driving iperf alongside the 1/s ping follows.)
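A hedged sketch of the measurement pattern above: a 1 Hz ping run alongside a 16-minute iperf TCP flow. The host name and log files are placeholders, the flags are the usual iperf2/ping ones, and the UDT runs used a separate tool:

```python
# Run ping (1/s) in the background while an iperf TCP flow loads the path.
import subprocess

REMOTE = "remote.example.net"   # placeholder for the Caltech / UFL / CERN host

ping = subprocess.Popen(["ping", "-i", "1", REMOTE],
                        stdout=open("ping.log", "w"))
subprocess.run(["iperf", "-c", REMOTE, "-t", "960", "-i", "1"],   # 16-minute run
               stdout=open("iperf.log", "w"))
ping.terminate()                 # stop the RTT probe once the flow finishes
```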
TCP Reno, single stream, SLAC to CERN
Low performance on fast long-distance paths:
AIMD (add a = 1 packet to cwnd per RTT; decrease cwnd by factor b = 0.5 on congestion)
Net effect: recovers slowly and does not effectively use the available bandwidth, so poor throughput; unequal sharing
Congestion has a dramatic effect and recovery is slow – the recovery rate needs to be increased
RTT increases when a flow achieves its best throughput
Remaining flows do not take up the slack when a flow is removed
Les Cottrell, PFLDnet 2005
UK Transfers MB-NG and SuperJANET4
Throughput for real users
iperf Throughput + Web100
SuperMicro on the MB-NG network, HighSpeed TCP: line speed 940 Mbit/s; DupACKs <10 (expect ~400)
BaBar host on the production network, standard TCP: 425 Mbit/s; DupACKs 350-400 – re-transmits
Applications: Throughput in Mbit/s
HighSpeed TCP, 2 GByte file, RAID5, SuperMicro + SuperJANET
Applications tested: bbcp, bbftp, Apache, GridFTP
Previous work used RAID0 (not disk limited)
bbftp: What else is going on? (Scalable TCP)
BaBar + SuperJANET; SuperMicro + SuperJANET
Congestion window – duplicate ACKs
Is the variation not TCP related? Disk speed / bus transfer / the application?
SC2004 & Transfers with UKLight
A Taster for Lambda & Packet Switched Hybrid Networks
Transatlantic Ethernet: TCP Throughput Tests
Supermicro X5DPE-G2 PCs, dual 2.9 GHz Xeon CPUs, FSB 533 MHz
1500 byte MTU, 2.6.6 Linux kernel
Memory-to-memory TCP throughput, standard TCP
Wire-rate throughput of 940 Mbit/s
Work in progress to study: implementation detail, advanced stacks, effect of packet loss, sharing
[Figure: Web100 traces of instantaneous bandwidth, average bandwidth and CurCwnd – TCP achieved rate (Mbit/s, 0-2000) and cwnd (bytes) vs time (ms); one panel for the full ~140 s run and one for the first 10 s.]
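The memory-to-memory test above is essentially an iperf run; here is a minimal sketch of the same idea in plain sockets, with host, port and block size as placeholders (run receive() on one host, then send() from the other):

```python
# Minimal iperf-like memory-to-memory TCP throughput probe (sketch).
import socket, time

PORT, BLOCK = 5001, 64 * 1024

def receive() -> None:
    srv = socket.create_server(("", PORT))
    conn, _ = srv.accept()
    while conn.recv(BLOCK):        # discard data until the sender closes
        pass
    conn.close()

def send(host: str, seconds: int = 10) -> None:
    s = socket.create_connection((host, PORT))
    buf, sent, t0 = b"\0" * BLOCK, 0, time.time()
    while time.time() - t0 < seconds:
        s.sendall(buf)
        sent += len(buf)
    dt = time.time() - t0
    print(f"{sent * 8 / dt / 1e6:.0f} Mbit/s memory-to-memory")
    s.close()
```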
SC2004 Disk-Disk bbftp (work in progress)
The bbftp file transfer program uses TCP/IP
UKLight path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
MTU 1500 bytes; socket size 22 MBytes; rtt 177 ms; SACK off
Moved a 2 GByte file; Web100 plots:
Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
Disk-TCP-Disk at 1 Gbit/s is here!
[Figure: Web100 traces for the two transfers – TCP achieved rate (Mbit/s, 0-2500) and CurCwnd vs time (0-20 s), showing instantaneous bandwidth, average bandwidth and cwnd.]
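A minimal sketch of requesting the ~22 MByte socket buffers used on this 177 ms path; the kernel clamps such requests to net.core.rmem_max / wmem_max, so those limits must allow it too (see the kernel-settings sketch in the summary):

```python
# Request large socket buffers before connecting; the kernel reports what it granted.
import socket

WINDOW = 22 * 1024 * 1024   # 22 MB, matching the socket size quoted above

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WINDOW)
print("granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
      s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```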
Summary, Conclusions & Thanks
The Super Computing Bandwidth Challenge gives the opportunity to make world-wide high-performance tests
The Land Speed Record shows what can be achieved with state-of-the-art kit
Standard TCP is not optimum for high-throughput, long-distance links; packet loss is a killer for TCP
Check campus links & equipment, and the access links to backbones; users need to collaborate with the Campus Network Teams and the Dante PERT
New stacks are stable and give better response & performance
Still need to set the TCP buffer sizes! Check other kernel settings, e.g. the window-scale maximum (a sketch for checking these follows); watch for "TCP stack implementation enhancements"
The host is critical: think server quality, not supermarket PC – motherboards, NICs, RAID controllers and disks matter
The NIC should use 64 bit 133 MHz PCI-X; 66 MHz PCI can be OK, but 32 bit 33 MHz is too slow for Gigabit rates
Worry about the CPU-memory bandwidth as well as the PCI bandwidth – data crosses the memory bus at least 3 times
Separate the data transfers – use motherboards with multiple 64 bit PCI-X buses
Choose a modern high-throughput RAID controller; consider SW RAID0 over RAID5 HW controllers
Users are now able to perform sustained 1 Gbit/s transfers
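A minimal sketch of checking the buffer-size and window-scaling settings mentioned above, using the standard Linux /proc paths; it only reads the values, nothing is tuned automatically:

```python
# Read the kernel settings the summary points at.
from pathlib import Path

settings = [
    "net/ipv4/tcp_window_scaling",   # must be 1 for windows > 64 KB
    "net/ipv4/tcp_rmem",             # min / default / max receive buffer
    "net/ipv4/tcp_wmem",             # min / default / max send buffer
    "net/core/rmem_max",             # hard cap on SO_RCVBUF requests
    "net/core/wmem_max",             # hard cap on SO_SNDBUF requests
]
for name in settings:
    path = Path("/proc/sys") / name
    print(f"{name}: {path.read_text().strip() if path.exists() else 'not present'}")
```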
More Information – Some URLs
UKLight web site: http://www.uklight.ac.uk
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + write-up: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt and http://datatag.web.cern.ch/datatag/pfldnet2003/
"Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue 2004, http://www.hep.man.ac.uk/~rich/
TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html and http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing, 2004
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
Any Questions?
Backup Slides
Topology of the MB-NG Network
[Diagram: key – Gigabit Ethernet, 2.5 Gbit POS access, MPLS admin domains. UCL Domain (hosts lon01, lon02, lon03), Manchester Domain (hosts man01, man02, man03) and RAL Domain (hosts ral01, ral02), with HW RAID on the end hosts; each domain behind a Cisco 7609 boundary router, plus a Cisco 7609 edge router, interconnected across the UKERNA Development Network.]
Topology of the Production Network
[Diagram: key – Gigabit Ethernet, 2.5 Gbit POS access, 10 Gbit POS. Manchester Domain (man01, HW RAID) and RAL Domain (ral01, HW RAID) connected across the production network via 3 routers and 2 switches.]
SC2004 UKLIGHT Overview
[Diagram: MB-NG 7600 OSR in Manchester - ULCC UKLight - UCL HEP and the UCL network; UKLight 10G (four 1GE channels) to Chicago Starlight; Amsterdam via SURFnet/EuroLink 10G (two 1GE channels); NLR lambda NLR-PITT-STAR-10GE-16 to the SC2004 show floor, where the SLAC booth (Cisco 6509) and the Caltech booth (UltraLight IP, Caltech 7600) connect.]
High Performance TCP – MB-NG
Drop 1 in 25,000; rtt 6.2 ms; recover in 1.6 s
Stacks shown: Standard, HighSpeed, Scalable
bbftp: Host & Network Effects
2 GByte file; RAID5 disks: 1200 Mbit/s read, 600 Mbit/s write
Scalable TCP
BaBar + SuperJANET: instantaneous rate 220-625 Mbit/s
SuperMicro + SuperJANET: instantaneous rate 400-665 Mbit/s for 6 s, then 0-480 Mbit/s
SuperMicro + MB-NG: instantaneous rate 880-950 Mbit/s for 1.3 s, then 215-625 Mbit/s
Average Transfer Rates (Mbit/s)

App      TCP Stack   SuperMicro on MB-NG   SuperMicro on SuperJANET4   BaBar on SuperJANET4   SC2004 on UKLight
iperf    Standard    940                   350-370                     425                    940
iperf    HighSpeed   940                   510                         570                    940
iperf    Scalable    940                   580-650                     605                    940
bbcp     Standard    434                   290-310                     290
bbcp     HighSpeed   435                   385                         360
bbcp     Scalable    432                   400-430                     380
bbftp    Standard    400-410               325                         320                    825
bbftp    HighSpeed   370-390               380
bbftp    Scalable    430                   345-532                     380                    875
apache   Standard    425                   260                         300-360
apache   HighSpeed   430                   370                         315
apache   Scalable    428                   400                         317
Gridftp  Standard    405                   240
Gridftp  HighSpeed   320
Gridftp  Scalable    335

New stacks give more throughput; the rate decreases on the longer-rtt production paths.
UKLight and ESLEA: a collaboration is forming for SC2005
Caltech, CERN, FERMI, SLAC, Starlight, UKLight, …
Current proposals include:
Bandwidth Challenge with even faster disk-to-disk transfers between UK sites and SC2005
Radio astronomy demo at 512 Mbit/s or 1 Gbit/s of user data – Japan, Haystack (MIT), Jodrell Bank, JIVE
High-bandwidth link-up between UK and US HPC systems; 10 Gig NLR wave to Seattle
Set up a 10 Gigabit Ethernet test bench – experiments (CALICE) need to investigate >25 Gbit/s to the processor
ESLEA/UKLight need resources to study: new protocols and congestion / sharing; the interaction between protocol processing, applications and storage; monitoring L1/L2 behaviour in hybrid networks
10 Gigabit Ethernet: UDP Throughput Tests
A 1500 byte MTU gives ~2 Gbit/s; used a 16144 byte MTU, max user length 16080 bytes
DataTAG Supermicro PCs, dual 2.2 GHz Xeon CPUs, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire-rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs, dual 1.0 GHz 64 bit Itanium CPUs, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire rate of 5.7 Gbit/s
SLAC Dell PCs, dual 3.0 GHz Xeon CPUs, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s
[Figure: an-al 10GE, checksum offload, 512k buffer, MTU 16114, 27 Oct 03 – received wire rate (Mbit/s, 0-6000) vs spacing between frames (µs, 0-40) for packet sizes from 1472 to 16080 bytes.]
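These curves come from a UDPmon-style paced sender: fixed-size datagrams transmitted with a chosen inter-frame spacing while the receiver counts frames and bytes. A minimal sketch of the sending side, with host, port, sizes and spacing as placeholders:

```python
# Paced UDP sender: one datagram every `spacing_us` microseconds.
import socket, time

def paced_send(host: str, port: int, size: int = 8972, spacing_us: float = 20.0,
               count: int = 100000) -> None:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\0" * size                # e.g. the payload of a 9000-byte MTU frame
    gap = spacing_us / 1e6
    nxt = time.perf_counter()
    for _ in range(count):
        s.sendto(payload, (host, port))
        nxt += gap
        while time.perf_counter() < nxt:  # crude busy-wait pacing
            pass
```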
10 Gigabit Ethernet: Tuning PCI-X
16080 byte packets sent every 200 µs; Intel PRO/10GbE LR adapter; PCI-X bus occupancy vs mmrbc
Measured times, and times based on PCI-X timings from the logic analyser
Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Figure: PCI-X transfer time (µs) and transfer rate (Gbit/s) vs max memory read byte count (mmrbc) – measured PCI-X transfer time, expected time, rate from expected time and max PCI-X throughput; kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04. mmrbc values of 512, 1024, 2048 and 4096 bytes shown, reaching 5.7 Gbit/s at 4096 bytes. Logic-analyser trace of a PCI-X sequence: CSR access, data transfer, interrupt & CSR update.]
10 Gigabit Ethernet: SC2004 TCP Tests
Sun AMD Opteron compute servers (v20z), Chelsio TOE; tests between Linux 2.6.6 hosts
10 Gbit Ethernet link from SC2004 to the CENIC/NLR/Level(3) PoP in Sunnyvale: two 2.4 GHz AMD 64 bit Opteron processors with 4 GB of RAM at SC2004, 1500 B MTU, all Linux 2.6.6; 9.43 Gbit/s in one direction (9.07 Gbit/s goodput) and 5.65 Gbit/s in the reverse direction (5.44 Gbit/s goodput) – a total of 15+ Gbit/s on the wire
10 Gbit Ethernet link from SC2004 to the ESnet/QWest PoP in Sunnyvale: one 2.4 GHz AMD 64 bit Opteron at each end, 2 MByte window, 16 streams, 1500 B MTU, all Linux 2.6.6; 7.72 Gbit/s in one direction (7.42 Gbit/s goodput) for 120 mins (6.6 TBytes shipped)
S2io NICs with Solaris 10 in a 4 × 2.2 GHz Opteron v40z to one or more S2io or Chelsio NICs with Linux 2.6.5 or 2.6.6 in 2 × 2.4 GHz v20zs: LAN 1, S2io NIC back-to-back, 7.46 Gbit/s; LAN 2, S2io in the v40z to 2 v20zs, each NIC ~6 Gbit/s, total 12.08 Gbit/s
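The wire-rate vs goodput pairs above can be reproduced with simple header arithmetic; a plausible reading (assuming no TCP options) is that each 1518-byte Ethernet frame carries 1460 bytes of TCP payload:

```python
# Worked check: goodput ~= wire rate x payload fraction for a 1500-byte MTU.
# 1518 B frame = 14 B Ethernet header + 1500 B IP packet + 4 B FCS;
# 1460 B TCP payload = 1500 - 20 (IP) - 20 (TCP, no options).
FRACTION = 1460 / 1518

for wire in (9.43, 5.65, 7.72):          # Gbit/s figures quoted above
    print(f"{wire:.2f} Gbit/s on the wire -> ~{wire * FRACTION:.2f} Gbit/s goodput")
```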
Transatlantic Ethernet: Disk-to-Disk Tests
Supermicro X5DPE-G2 PCs, dual 2.9 GHz Xeon CPUs, FSB 533 MHz, 1500 byte MTU, 2.6.6 Linux kernel, RAID0 (6 SATA disks)
bbftp (disk-to-disk) throughput, standard TCP: throughput of 436 Mbit/s
Work in progress to study: throughput limitations; helping real users
[Figure: Web100 traces of the disk-to-disk transfer – TCP achieved rate (Mbit/s, 0-2000) and CurCwnd vs time, for the full ~20 s run and the first 10 s, showing instantaneous bandwidth, average bandwidth and cwnd. A further plot (sculcc1-chi-2) shows TCP achievable throughput (Mbit/s, 0-1000) vs TCP buffer size (0-30 MBytes) for the iperf sender and for bbftp.]
SC2004 Disk-Disk bbftp (work in progress)
UKLight path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0; MTU 1500 bytes; socket size 22 MBytes; rtt 177 ms; SACK off; moved a 2 GByte file
Web100 plots for HS-TCP:
Don't believe this is a protocol problem!
[Figure: Web100 traces for the HS-TCP transfer over ~45 s – TCP achieved rate (Mbit/s, 0-2500) with CurCwnd; number of duplicate ACKs with CurCwnd; number of timeouts with CurCwnd; and number of "other reductions" (log scale) with CurCwnd, all vs time (ms).]