Apparent Networks / Research Rutherford
9k APP Project
Network Power Users: The Case for Jumbo Packets with WestGrid Examples
Jan 01, 2016
What are Jumbo Frames & What is 9k MTU?
[Figure: Ethernet frame layout. Fields in order: PRE, MAC/LLC, IP Header, TCP Header, Payload Data, FCS, IFG]
OSI layers: 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical
• Maximum Segment Size (MSS) = Payload Data (1460 bytes)
• Maximum Transmission Unit (MTU) = Packet = IP Header + TCP Header + Payload Data (1500 bytes)
• Frame = MAC/LLC + Packet + FCS (1518 bytes)
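The sizes on this slide follow from fixed header lengths, and the same arithmetic gives the 9k case. A minimal sketch, assuming IPv4 and TCP headers without options (20 bytes each) and an untagged Ethernet header (14 bytes) plus FCS (4 bytes); the function names are illustrative and not from the presentation:

```python
# Header sizes are assumptions: IPv4 and TCP without options, untagged Ethernet.
IP_HEADER = 20
TCP_HEADER = 20
ETH_HEADER = 14   # destination MAC, source MAC, EtherType
ETH_FCS = 4       # frame check sequence
PREAMBLE = 8      # preamble plus start-of-frame delimiter
IFG = 12          # minimum inter-frame gap, in byte times


def mss_for_mtu(mtu: int) -> int:
    """TCP payload bytes that fit in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def frame_for_mtu(mtu: int) -> int:
    """Ethernet frame size (header + packet + FCS) for the given MTU."""
    return mtu + ETH_HEADER + ETH_FCS


def wire_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire byte times that carry TCP payload."""
    return mss_for_mtu(mtu) / (frame_for_mtu(mtu) + PREAMBLE + IFG)


for mtu in (1500, 9000):
    print(mtu, mss_for_mtu(mtu), frame_for_mtu(mtu), round(wire_efficiency(mtu), 4))
```

Under these assumptions a 1500-byte MTU gives the 1460-byte MSS and 1518-byte frame shown on the slide, and a 9000-byte MTU gives an 8960-byte MSS and a 9018-byte frame.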
9k MTU - APP - DDS Project Evolution…
9k MTU Project (4 yrs)
• Core R&E
• Router Troubleshooting
• Large Xfer Measurements
• International Links
• Internet 2 Sponsored
• Physics Participants
• Manual Lightpath
• TRIUMF to CERN Test
• Few 9k Taps onto Core
• Main Tap SDSC
9k APP Project (0.5 yrs)
• Lightpath R&E
• Non-Routed… UCLP
• Build Dedicated 9k Links
• ZX GBICs/quad gigE/campus MM
• CANARIE NETERA BCNET
• 3 Universities
• Only 9k Link on Campus
• Physics & Biochem Participants
• HEPnet - WestGrid - TRIUMF
• 9k Node - HPC - 9k Node
• Viz HPC 9k Tests
• NFS HPC 9k Tests
9k DDS Project (2 yrs)
• Lightpath R&E
• Clone-Tune 9k APP… Handoff
• Provide 9k Switch Biochem
• UVic - SFU - UofA
• Prelim 9k SOA Framework
• Drug Discovery System Flow
• Tune 9k SOA Framework
• Integrate 9k SOA Framework
• Demo 9k SOA-DDS Based Collab
• 9k SOA-DDS Software Dev Examples
• Complete Handoff … ;-)
9K MTU Project - Results
[Figure: GigE 2-way bandwidth vs. MTU, from Kansas City to various universities. X axis: MTU size (bytes), 0 to 10000. Y axis: 2-way bandwidth (Mbps), 0 to 2000. Labelled MTU values: 512, standard 1500, 2048, 3072, 4096, 5120, 6144, 7168, 8192, 9000.]
How do Jumbo Packets Affect Bandwidth?
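• If TCP window size and network capacity are not rate-limiting factors, then (roughly):
\[ \text{e2e throughput} < \frac{0.7 \times \text{MSS}}{\text{RTT} \times \sqrt{\text{loss}}} \]
  where MSS is the maximum segment size (set by the MTU), RTT is the round trip time (latency), and loss is the packet loss rate (M. Mathis et al.)
• Double the MSS, double the throughput
• Effect of slow start?
• Effect of irregular flow?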
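The bound can be evaluated directly to see how it scales with segment size. A minimal sketch, assuming MSS values of 1460 and 8960 bytes (1500 and 9000 byte MTU minus 40 bytes of IPv4 and TCP headers); the 40 ms round trip time and 0.01% loss rate are illustrative inputs, not measurements from the project, and the function name is hypothetical:

```python
import math


def mathis_bound_bps(mss_bytes: float, rtt_s: float, loss: float, c: float = 0.7) -> float:
    """Upper bound on TCP throughput in bits/s: c * MSS / (RTT * sqrt(loss))."""
    if mss_bytes <= 0 or rtt_s <= 0 or not 0 < loss <= 1:
        raise ValueError("mss_bytes and rtt_s must be positive, loss in (0, 1]")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


# Illustrative inputs only (assumptions, not project data).
RTT_S = 0.040
LOSS = 1e-4

standard = mathis_bound_bps(1460, RTT_S, LOSS)
jumbo = mathis_bound_bps(8960, RTT_S, LOSS)

print(f"1500 MTU bound: {standard / 1e6:.1f} Mbps")
print(f"9000 MTU bound: {jumbo / 1e6:.1f} Mbps")
print(f"ratio: {jumbo / standard:.2f}")
```

Because the bound is linear in MSS, the ratio between the two cases depends only on the segment sizes (8960/1460, about 6.1), whatever RTT and loss are chosen.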
9k APP – Approaching Application Performance
• Need metrics for defining network impact on dependent applications
• Best current example – MOS as indication of VoIP performance
• Models of network dependence required
• Applicable to QoS/SLA
9k APP – User Experience
• User expectations of applications
• Examples:
  • Interaction with 3D models
  • Collaboration with multiple models/data/voice/video
  • Massive data set manipulation
  • Collaborative HPC simulations
9k APP - General Models of Network Dependency
• Near Real Time (nrt)
  • Congestion, Drop, MTU & Transit time sensitive
• Transactional (tr)
  • Congestion, MTU & Drop sensitive
• Data Transfer (dt)
  • Congestion, Drop, MTU & Transit time sensitive
• Best Effort (be)
  • Not sensitive
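The four models can be written down as a small lookup table so that a tool can ask which models care about a given network factor. A minimal sketch; the sensitivities are transcribed from the slide, while the dictionary layout, key spellings, and function name are assumptions made for illustration:

```python
from typing import Dict, List, Set

# Sensitivities transcribed from the slide; the layout and key names are assumptions.
SENSITIVITY: Dict[str, Set[str]] = {
    "nrt": {"congestion", "drop", "mtu", "transit_time"},  # Near Real Time
    "tr": {"congestion", "mtu", "drop"},                   # Transactional
    "dt": {"congestion", "drop", "mtu", "transit_time"},   # Data Transfer
    "be": set(),                                           # Best Effort
}


def models_sensitive_to(factor: str) -> List[str]:
    """Return the model abbreviations that list the given factor, sorted by name."""
    return sorted(model for model, factors in SENSITIVITY.items() if factor in factors)


print(models_sensitive_to("mtu"))
print(models_sensitive_to("transit_time"))
```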
9k APP – Real-Time Applications
• Examples – IPTV and Voice-over-IP
• Requirement – support human interaction through highly subjective perceptive processes
• Nature – asynchronous, constant, low-rate, non-TCP streams
• Dependencies
  • Highly sensitive to bursty loss
  • Sensitive to latency, particularly in conversation context
  • Robust to jitter, up to some limit
9k APP – Synchronous/Transactional
• Examples - interactive collaborative systems; distributed file systems
• Requirement - maintain some form of state at two or more remote locations
• Nature - intensive, bursty, synchronous traffic; varies from very small amounts of data to huge exchanges
• Dependencies – requires high but irregular transfer rates; highly sensitive to latency; intolerant of slow-start
9k APP - Data Transfer
• Example – FTP; data backup; emergency recovery
• Requirement – transfer massive amounts of data, as quickly as possible
• Nature - sustained one-way flows at maximum rates of transfer
• Dependencies – sensitive to the characteristics of the end-host transmission protocols (i.e. TCP); high capacity; high impact on other flows
9k APP - Best Effort
• Examples - e-mail, Web browsing, and remote login
• Requirement – sufficient resource to maintain minimal state or connection
• Nature - largely stateless; low rates of data transfer required; not gated by human response
• Dependencies – no critical requirements for network responsiveness
9k APP - Choice of Application Performance
• Key performance factor – jumbo packets
• Previous 9k MTU testing applicable to simple data transfer use case
• Make the case for a more demanding 9k application performance category
• High performance:
  • Interactive/collaborative visualization
    • WestGrid visualization as exemplar
  • Distributed file systems
    • WestGrid Gridstore as exemplar
Application Performance in Context
[Figure: end-host data path on a 64 bit symmetric multiprocessor. Software elements: application and application buffer, socket and socket buffer, kernel daemon and per-CPU kernel buffer, NIC driver and driver buffer, programmable filter. Hardware elements: 64 bit parallel data bus (~ 2000 megabytes/sec), dual port 10 x gigabit Ethernet NIC with rx and tx FIFO buffers, SM fibres (10 km, 1310 nm, ~ 1000 megabytes/sec per 10 gigE port), switch/router, VLAN. Packet paths are labelled 9k and 1.5k.]
9k APP Project Components
[Figure: 9k APP lightpath network. UCLP STS-24c lightpaths over SONET between CANARIE ONS15454 nodes at Vancouver (BCNET), Calgary (NETERA), Edmonton (NETERA) and Victoria (BCNET), dropped on gigE to campus equipment, with end nodes on a 172.31 VLAN.]
End nodes:
• lightpath.phys.ualberta.ca (Linux, TRIUMF, UofA Physics): Routed IPv4 = 129.128.241.113, Lightpath IPv4 = 172.31.241.113, Reference IPv4 = 10.128.241.113
• phys02.comp.uvic.ca (Linux, HEPnet, UVic Network Services): Routed IPv4 = 142.104.21.13, Lightpath IPv4 = 172.31.21.13, Reference IPv4 = 10.104.21.13
• gridstore.westgrid.ca (IBM p650 AIX, WestGrid, SFU, dedicated gigE port): Routed IPv4 = 206.12.24.65, Lightpath IPv4 = 172.31.24.65, Reference IPv4 = 10.12.24.65
• vizserver.westgrid.ca (SGI ONYX 3000, IRMACS, SFU): Routed IPv4 = 206.12.24.8, Lightpath IPv4 = 172.31.24.8, Reference IPv4 = 10.12.24.8
Equipment labels (bytes): Cisco ONS 10,000; Dell L3 9,018; Cisco 6509 9,216; Enterasys ER16 64,000; a further label of 10,239 (device not identifiable from the extracted text)
Other equipment and link labels: Enterasys N7; SM 22 km with long range ZX GBICs from CANARIE (Port_3 ZX); MM fibre and gigE on campus; dual port end nodes; campus routers
Notes on the diagram:
• route all 172.31.0.0/16 (172.31) to one VLAN of 6 on 4 port gigE link aggregator
• possibly use unrouted black hole range 10.12.24.0/22 to shadow routed IPv4 from WestGrid 206.12.24.0/22 for lightpaths ... ? ... use 172.31 due to conflicts ...
• phase 1: direct path from ER16 to Cisco 6509 (ZX GBIC slot not available on 6509)
• phase 2: reconfigure N7 in series between ER16 and Cisco 6509
• Three networks: Netera, CA*net4, BCNET
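The four node labels in the components diagram share one addressing pattern: the lightpath address keeps the last two octets of the routed address under 172.31, and the reference address keeps the last three octets under 10. A minimal sketch of that mapping; the pattern is inferred from the listed addresses rather than stated as a rule in the presentation, and the function name is hypothetical:

```python
import ipaddress
from typing import Dict


def derive_addresses(routed: str) -> Dict[str, str]:
    """Derive lightpath and reference addresses from a routed IPv4 address.

    Assumed pattern (inferred from the diagram labels):
      routed    a.b.c.d
      lightpath 172.31.c.d
      reference 10.b.c.d
    """
    _, b, c, d = ipaddress.IPv4Address(routed).packed
    return {
        "routed": routed,
        "lightpath": f"172.31.{c}.{d}",
        "reference": f"10.{b}.{c}.{d}",
    }


for host, routed in [
    ("lightpath.phys.ualberta.ca", "129.128.241.113"),
    ("phys02.comp.uvic.ca", "142.104.21.13"),
    ("gridstore.westgrid.ca", "206.12.24.65"),
    ("vizserver.westgrid.ca", "206.12.24.8"),
]:
    print(host, derive_addresses(routed))
```

Two routed addresses that differ only in their first two octets would collide under 172.31, so the sketch only fits a small, known set of nodes like the one in the diagram.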
9k APP Project Sites & Nodes
• Three sites: UofA, SFU, UVic
• Four nodes
  • TRIUMF
  • HEPnet
  • Vizserver
  • Gridstore
• Why these?
  • Physics expertise
  • 9k lightpath ready
  • WestGrid applications
    • vizserver
    • gridstore
9k APP Project - Define 9K paths
• Two phases: Viz & Grid
• Phase 1 Viz
  • TRIUMF
  • HEPnet
  • Vizserver
• Phase 2 Grid
  • TRIUMF
  • HEPnet
  • Gridstore
9k APP Project - Phase 1 - Viz
• Vizserver session – optional compression
  • IRIX OS – render pipe
  • TCP: vary MTU 68 – 9000 at server
  • Measure performance vs. MTU
• Vizclient session – wraps local OpenGL
  • Fully interactive local X session
  • Refresh requests in render pipe
• VMD: Visual Molecular Dynamics
  • X server & graphics calls to render pipe
  • Collaborative VMD sessions
9k APP Project - Network Build
• 3 Lightpaths - UCLP - SONET
  • Each path STS-24c dropped on gigE
• New quad gigE blade BCNET ONS
  • 2 new ZX GBICs
  • SM fibre run to SFU lit
• SFU Campus Network
  • ZX GBIC installed in Enterasys ER16
  • Phase 1 - Viz
    • MM fibre run direct to Cisco 6509
  • Phase 2 - Grid
    • MM fibre run via Enterasys N7
9k APP Project - Short Circuit Routing
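[Figure: short circuit routing between two dual port machines. Machine 0 (APP-0) and Machine 1 (APP-1) each have NIC-0 and NIC-1, with addresses IPv4-0-0, IPv4-0-1 and IPv4-1-0, IPv4-1-1. Two paths join the machines: Routed and Lightpath Short Circuit. Path state labels: always on, on/off. Packet size labels: 1.5k, 9k. Stack labels: APP-Socket-0-1, VM/OS-Socket-0-1, Handler TCP-Session-0-1, TCP-Socket/Port-0-1, LLC-Socket/Port/ARP-0-1, MAC-0-1.]
Note: local map unrouted black hole IPv4 to start ... try routed switchover later ... ?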
9k APP Project – Testing Procedure - pMTU
[Figure: pMTU testing topology. CANARIE ONS15454 nodes at Vancouver (BCNET), Calgary (NETERA), Edmonton (NETERA) and Victoria (BCNET) joined by OC24 links, dropped on gigE to campus equipment, with four Apparent Networks MTU Probes.]
End nodes:
• vizserver.westgrid.ca (SGI, IRIX, IRMACS, SFU): Routed IPv4 = 206.12.24.8, Lightpath IPv4 = 10.12.24.8
• lightpath.phys.ualberta.ca (Linux, TRIUMF, UofA Physics): Routed IPv4 = 129.128.241.113, Lightpath IPv4 = 10.128.241.113
• phys02.comp.uvic.ca (Linux, HEPnet, UVic): Routed IPv4 = 142.104.21.13, Lightpath IPv4 = 10.104.21.13
• Linux node (gigE): Routed IPv4 = 206.12.24-27.XXX, Lightpath IPv4 = 10.12.24-27.XXX
Equipment labels (bytes): Cisco ONS 10,000; Dell L3 9,018; Cisco 6509 9,216; Enterasys ER16 64,000
Probe MTU sizes (bytes): 68, 576, 1000, 1500, 3000, 4000, 5000, 6000, 7000, 8000, 9000
Probe system: Sequencers, Analysis Server, Application Server, GUI, Probe Reports, Global Academic Probe Database
Accumulated data from 9k MTU and 9k APP Projects
9k APP Project – pMTU Issues
• As MTU increases, and increasingly varies between hops:
  • determine optimal pMTU before application use, and track changes during it
  • locate problem hops with unusual behavior
• Is a larger effective lower-layer pMTU actually better from the application perspective?
• Are packets actually sized appropriately by the packetization layer, given a larger pMTU?
• What are some effects of larger pMTU under congestion conditions?
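The path MTU is the smallest MTU of any hop, so determining it amounts to finding the largest packet size that still gets through end to end. A minimal sketch using a binary search over probe sizes between 68 and 9000 bytes; the probe is a hypothetical callable (here simulated from a list of hop MTUs), not the Apparent Networks probe, and the hop values in the example are illustrative:

```python
from typing import Callable, Sequence


def discover_pmtu(probe: Callable[[int], bool], low: int = 68, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes probe is monotonic: if a size passes, every smaller size passes.
    """
    if not probe(low):
        raise RuntimeError(f"path does not pass even {low}-byte packets")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def simulated_path(hop_mtus: Sequence[int]) -> Callable[[int], bool]:
    """Build a fake probe: a packet passes only if it fits the smallest hop MTU."""
    limit = min(hop_mtus)
    return lambda size: size <= limit


# Illustrative hop MTUs (assumptions): one standard 1500-byte hop on a 9k path.
print(discover_pmtu(simulated_path([9000, 9000, 1500, 9000])))
print(discover_pmtu(simulated_path([9000, 9000, 9000])))
```

A single hop left at 1500 bytes caps the whole path, which is why locating the limiting hop matters as much as measuring the end-to-end value.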
9k APP Project - Phase 2 - Grid
• Distributed file system
• Possible candidates
  • CXFS, GPFS, NFSv4
• Possible clients/servers
  • SGI IRIX, IBM AIX, Linux
• Preliminary model
  • NFSv4 on Linux & IBM AIX (gridstore)
• Primary use cases
  • File sharing – massive data sets
    • Physics
    • Bioinformatics
9k APP Project - Grid NFS Application
• NFS session
  • AIX NFSv4 server
  • Linux NFSv4 clients
  • over TCP not UDP
• NFS server – wraps local HPC file system
  • Fully interactive local file system session
  • Fast metadata updates for directory browsing
9k APP Project – Grid Network & Testing
• Network reconfigured with intermediate link
  • Viz performance rechecked
• Gridstore VLAN setup
  • Fractional quad gigE
• Test via NFSTest suite (open source)
  • TCP: vary MTU 68 – 9000 at server
  • Probe network after MTU alteration
  • Measure performance vs. MTU
  • Linux NFSv4 to AIX NFSv4
  • Probe while test in progress
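The procedure above (set the MTU at the server, probe the network, then time the test) can be sketched as a small sweep loop. This is a hedged outline only: `set_mtu`, `probe_path`, and `run_test` are hypothetical callables standing in for the server configuration step, the MTU probe, and the NFS test suite, and the MTU list is the set of probe sizes shown in the pMTU testing diagram:

```python
import time
from typing import Callable, Dict, Iterable, Optional

# Probe sizes taken from the pMTU testing diagram (bytes).
PROBE_MTUS = (68, 576, 1000, 1500, 3000, 4000, 5000, 6000, 7000, 8000, 9000)


def sweep_mtu(
    mtus: Iterable[int],
    set_mtu: Callable[[int], None],
    probe_path: Callable[[int], bool],
    run_test: Callable[[], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, Optional[float]]:
    """Time run_test at each MTU; record None where the path probe fails."""
    results: Dict[int, Optional[float]] = {}
    for mtu in mtus:
        set_mtu(mtu)                 # alter the MTU at the server
        if not probe_path(mtu):      # probe the network after the alteration
            results[mtu] = None
            continue
        start = clock()
        run_test()                   # time to complete at this MTU
        results[mtu] = clock() - start
    return results


# Dry run with stand-in callables (no real network or NFS access).
applied = []
demo = sweep_mtu(PROBE_MTUS, applied.append, lambda mtu: True, lambda: None)
print(sorted(demo))
```

Passing the three steps in as callables keeps the loop independent of the platform, since the slides name IRIX, AIX, and Linux hosts that each set the MTU differently.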
9k APP Project - Grid - NFS - Tuning
• Objectives derived from WestGrid gridstore
• Baseline performance – simple NFS client
  • Session
    • TCP - NFSv4
    • NFStest suite
  • Time to complete vs. MTU
  • Individual test performance vs. MTU
• Blocksize and other tuning considerations
  • NFS filesystem mount options
    • Block size = 8192 bytes
  • Fragmentation factors
  • Native filesystem block size
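One fragmentation factor can be estimated directly: how many TCP segments an 8192-byte block needs at a given MTU. A rough sketch, assuming 40 bytes of IPv4 and TCP headers per packet and ignoring RPC/NFS header overhead and the fact that TCP may coalesce consecutive blocks into one stream; the function name is illustrative:

```python
import math

TCP_IP_HEADERS = 40  # assumption: IPv4 + TCP headers without options


def segments_per_block(block_bytes: int, mtu: int) -> int:
    """Minimum number of TCP segments needed to carry one block at this MTU."""
    mss = mtu - TCP_IP_HEADERS
    if mss <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(block_bytes / mss)


BLOCK_SIZE = 8192  # NFS block size from the slide

for mtu in (1500, 9000):
    print(mtu, segments_per_block(BLOCK_SIZE, mtu))
```

Under these assumptions an 8192-byte block spans 6 segments at a 1500-byte MTU and fits in a single segment at 9000 bytes.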
9k APP Project
• Bill Rutherford (Rutherford Research/RRX – Project Coordinator)
• Loki Jorgenson (Apparent Networks/SFU – Project Coordinator)
• Thomas Tam (CANARIE/CA*net4 – CANARIE/UCLP Coordinator)
• Bryan Caron (TRIUMF/UofAlberta – TRIUMF/UCLP Coordinator)
• Randy Sobie (HEPnet/UVic – HEPnet President/Grid Integration)
• Brian Corrie (WestGrid/IRMACS/SFU - IRMACS Coordinator)
• Rob Ballantyne (IRMACS/SFU - IRMACS Network Coordinator)
• Martin Siegert (WestGrid/SFU – WestGrid/GridStore Coordinator)
• Dave Bickle (HEPnet/UVic – HEPnet Coordinator/Grid Integration)
• Ken Howard (Network Services/UVic – Network Coordinator)
• Peter van Epp (Network Services/SFU – Network Coordinator)
9k DDS Project – Drug Discovery System
• Based on 9k APP Project
• Combined Physics, Grid, Bioinformatics
• Joint development of network & software
  • Share network expertise
  • Help develop preliminary software
• SOA approach
  • Collaborative viz
  • Distributed file systems
  • Instrument interfaces
  • Grid integration
  • Lightpath integration
9k DDS Project – Network Overview
[Figure: 9k DDS network. UCLP STS-24c lightpaths over SONET between CANARIE ONS15454 nodes at Vancouver (BCNET), Calgary (NETERA), Edmonton (NETERA) and Victoria (BCNET), dropped on gigE to campus equipment, with end nodes on a 10/8 VLAN.]
End nodes:
• gridstore.westgrid.ca (IBM p650 AIX, WestGrid, SFU, dedicated gigE port): Routed IPv4 = 206.12.24.65, Lightpath IPv4 = 10.12.24.65
• vizserver.westgrid.ca (SGI ONYX 3000, IRMACS, SFU): Routed IPv4 = 206.12.24.8, Lightpath IPv4 = 10.12.24.8
• lightpath.phys.ualberta.ca (Linux, TRIUMF, UofA Physics): Routed IPv4 = 129.128.241.113, Lightpath IPv4 = 10.128.241.113
• lightpath.pence.ca (Linux, PENCE, UofA Biochemistry): Routed IPv4 = 129.128.139.2XX, Lightpath IPv4 = 10.128.139.2XX
• phys02.comp.uvic.ca (Linux, HEPnet, UVic Network Services): Routed IPv4 = 142.104.21.13, Lightpath IPv4 = 10.104.21.13
• lightpath.bioc.uvic.ca (Linux, TVBR, UVic Biochemistry & Microbiology): Routed IPv4 = 142.104.33.XXX, Lightpath IPv4 = 10.104.33.XXX
• lightpath.mbb.sfu.ca (Linux, MBB, SFU Molecular Biology and Biochemistry): Routed IPv4 = 142.58.213.XXX, Lightpath IPv4 = 10.58.213.XXX
Equipment labels (bytes): Cisco ONS 10,000; Dell L3 9,018; Cisco 6509 9,216; HEPnet 9,216; Enterasys ER16 64,000; a further label of 10,239 (device not identifiable from the extracted text)
Other equipment and link labels: Enterasys N7; SM 22 km with long range ZX GBICs from CANARIE (Port_3 ZX); MM fibre and gigE on campus; campus routers
Supply annotations:
• UofA Biochemistry: dual port supplied by Pence; 9k Switch supplied and set up by TRIUMF, reimbursed by 9k DDS Project
• UVic Biochemistry & Microbiology: dual port supplied by TVBR; 9k Switch supplied and set up by HEPnet, reimbursed by 9k DDS Project
• SFU Molecular Biology and Biochemistry: dual port supplied by MBB; 9k Switch supplied and set up by WestGrid, reimbursed by 9k DDS Project
Phase annotations:
• 9k DDS phase 1: tap TRIUMF gigE port from ONS for setup; tap HEPnet gigE port from ONS for setup; tap WestGrid-IRMACS gigE port from ONS for setup
• 9k DDS phase 2: reconfigure to dedicated gigE port from ONS (shown at three sites)
• 9k APP phase 1: direct path from ER16 to Cisco 6509 (ZX GBIC slot not available on 6509)
• 9k APP phase 2: reconfigure N7 in series between ER16 and Cisco 6509
Notes on the diagram:
• available bandwidth on "lightpath" e2e is dependent on configuration of ONS15454 and activity of ports
• route all 10.0.0.0/8 (10/8) to dedicated port of 8 port gigE link aggregator
• route all 10.0.0.0/8 (10/8) to one VLAN of 6 on 4 port gigE link aggregator
• possibly use unrouted black hole range 10.12.24.0/22 to shadow routed IPv4 from WestGrid 206.12.24.0/22 for lightpaths ... ?
Future – Performance Profiles by Application
• APP Network Performance Profile
  • Build up statistical APP profiles
  • Use APP profiles to optimize context
• Next Generation Router Design
  • Use APP Profiles
    • Allocate resources
    • Design microflow queues
    • Identify MTU issues
    • Dynamically configure path mechanics
End of Presentation
Note: 9k APP Project Meeting in Room 1535 at 3:00
Outline
• Short overview of previous 9K work
  • What is 9k?
  • 9k XXX Project snapshots?
  • Bandwidth value … example data?
  • How is bandwidth affected? – equations?
• Application performance – definition
  • User experience … limits?
  • Near Real-time (nrt)
  • Transactional (tr)
  • Bulk transfer (bt)
  • Best-effort (be)
  • Value of MOS … expand? to the VoIP industry
• Project objectives
  • Isolate a single simple performance factor – packet size?
  • Identify prospective applications
    • Interactive collaborative visualization (nrt)
    • Distributed file system (tr + bt + nrt[metadata])
  • Characterize application performance in context …
    • stack mechanics … grid integration?
Outline – cont.
• Project components
  • Three sites: UofA, SFU, UVic
    • Why these sites … only 9k available + RTT factor?
  • Three networks: BCNET, Netera, CA*net4
  • WestGrid applications
    • Brian’s WestGrid vizserver … IRMACS
    • Martin’s WestGrid grid storage
• Define 9K paths
• Application profiling
  • Visualization server – phase 1
    • Identify system
    • Define primary use case … why VMD … why collab?
    • Define network profile … describe nw build?
    • Identify testing procedure … pMTU tests … issues
Outline – cont.
• Distributed file system – phase 2
  • Identify candidates – CXFS, GPFS, NFSv3-4
  • Identify possible clients/servers
  • Define primary use cases … physics … DDS file sharing
  • Define network profile
  • Identify testing procedure (NFStest)
    • Basic types of test … ?
• Futures
  • 9k DDS … trend to separate purpose nw … key role of UCLP
  • Performance profiles by application … ???
    • Ng rtr … mtu issues … app microflow queues?
• Credits
• Meeting Reminder
BCNET ANC 9k APP Project Meeting Agenda – April 26 3-4pm
• Review of current status
  • Lightpaths
  • IRMACS
  • Gridstore
  • HEPnet
  • TRIUMF
• Viz test
  • Network
  • Probes & schedule
  • Tests
  • Demos (incl special medicine collab demo UVic – UofA)
• Gridstore test
  • Network
  • Probes & schedule
  • Tests
  • Demos
• Follow on
  • 9k DDS preliminary
    • UCLP integration … ?
    • SOA ideas … ?