Scalable Integrated Performance Analysis of Multi-Gigabit Networks
Ezra Kissel, U. Delaware
Ahmed El-Hassany, Guilherme Fernandes, Martin Swany, Indiana U.
Dan Gunter, Taghrid Samak, LBNL
Jen Schopf, WHOI
Feb 24, 2016
What I hope you learn
1. Why we care about bulk data transfer at multi-gigabit rates
2. Why and how detailed monitoring is helpful
3. How dynamic control of monitoring is related to Session Layer protocols
4/16/12 2
Bulk data transfer needs
• Some domains of interest:
– Climate simulation (Earth System Grid)
– Genomics (JGI)
– High-energy physics (Large Hadron Collider)
– Astronomy (Large Synoptic Survey Telescope)
– Astrophysics (FLASH)
[Figure labels: huge data; analysis sites]
Multi-gigabit rates
• Networks connecting national labs and universities have 10Gb/s and soon 100Gb/s capability. One PB ≈ one day at 100Gb/s
• Rarely achieved due to bottlenecks:
– Host: application or disks
– Campus/local networks
– Wide area networks
• Hard to tell why, where, or even if there is a problem
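The "one PB in one day at 100Gb/s" figure can be checked with a short calculation. This is a back-of-the-envelope sketch that assumes decimal units (1 PB = 10^15 bytes) and a fully utilized link with no protocol overhead; the function name is illustrative.

```python
def transfer_days(num_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to move num_bytes over a link running at full rate."""
    seconds = num_bytes * 8 / link_bits_per_sec
    return seconds / 86_400  # seconds per day

PB = 1e15  # decimal petabyte (assumption)
days_100g = transfer_days(PB, 100e9)
days_10g = transfer_days(PB, 10e9)
print(f"{days_100g:.2f} days at 100Gb/s, {days_10g:.2f} days at 10Gb/s")
```

At 100Gb/s the result is a little under one day; at 10Gb/s the same petabyte takes more than nine days, which is why sustained rates matter.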
Solution
Monitor all the time
Analyze all the time
... but much more when something interesting is happening
Use analysis results as feedback
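One way to picture "much more when something interesting is happening" is a sampling interval that shrinks while analysis flags something and relaxes otherwise. The sketch below is illustrative only: `next_interval`, the halving/doubling policy, and the bounds are assumptions, not details from the system described here.

```python
def next_interval(current: float, interesting: bool,
                  min_s: float = 0.1, max_s: float = 10.0) -> float:
    """Return the next sampling interval in seconds.

    Halve the interval (monitor more often) while analysis reports
    something interesting; otherwise double it back toward max_s.
    """
    proposed = current / 2 if interesting else current * 2
    return max(min_s, min(max_s, proposed))

interval = 10.0
history = []
# Hypothetical sequence of analysis verdicts fed back to the monitor.
for flag in [False, True, True, True, False]:
    interval = next_interval(interval, flag)
    history.append(interval)
print(history)
```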
System components
• eXtensible Session Protocol (XSP)
– Associate multiple TCP connections, L2 circuits, as a "session"
– Provide channels for bi-directional metadata
• NL-Calipers
– Summarize in situ timings of every read/write
• BLiPP
– Host and TCP stack info. using XSP channels
• perfSONAR
– Standard information formats and exchange protocols
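The idea of timing every read/write in place can be sketched with a thin wrapper around a file-like object. This is only an illustration of the technique: `TimedReader` is a hypothetical class and does not reflect the NL-Calipers API.

```python
import io
import time

class TimedReader:
    """Wrap a file-like object and record (bytes, seconds) for each read()."""

    def __init__(self, raw):
        self.raw = raw
        self.samples = []  # list of (num_bytes, elapsed_seconds)

    def read(self, size: int = -1) -> bytes:
        start = time.perf_counter()
        data = self.raw.read(size)
        elapsed = time.perf_counter() - start
        self.samples.append((len(data), elapsed))
        return data

# Read an in-memory buffer in 4096-byte chunks until read() returns b"".
reader = TimedReader(io.BytesIO(b"x" * 10_000))
while reader.read(4096):
    pass
total = sum(n for n, _ in reader.samples)
print(len(reader.samples), "reads,", total, "bytes")
```

Because the wrapper sits in the data path, the samples describe the transfer itself rather than a separate probe.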
Dynamic Session Monitoring
User
(1) Start xfer
(2) Open session
(3) NL-Calipers data
(4) Signal TCP
(5) data
Network engineer: look at the performance
Bottleneck detection
Triangles give "instantaneous" throughput
On fixed intervals, summarize all measurements into mean, min, max, variance for both rate and #bytes
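The fixed-interval summary can be sketched as follows. The sketch assumes each measurement is a `(timestamp, nbytes, seconds)` tuple; that layout, the function names, and the use of population variance are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pvariance

def stats(xs):
    """Mean, min, max, and (population) variance of a list of numbers."""
    return {"mean": mean(xs), "min": min(xs), "max": max(xs),
            "variance": pvariance(xs)}

def summarize(samples, interval=1.0):
    """Group samples into fixed intervals; summarize rate (bytes/s) and bytes."""
    bins = defaultdict(list)
    for ts, nbytes, secs in samples:
        bins[int(ts // interval)].append((nbytes / secs, nbytes))
    return {
        idx: {"rate": stats([r for r, _ in vals]),
              "bytes": stats([b for _, b in vals])}
        for idx, vals in sorted(bins.items())
    }

# Hypothetical measurements: (timestamp, bytes moved, seconds spent in the call)
samples = [(0.1, 1000, 0.5), (0.6, 3000, 0.5), (1.2, 2000, 0.25)]
summary = summarize(samples)
for idx, s in summary.items():
    print(idx, s["rate"]["mean"], s["bytes"]["mean"])
```

Keeping only these four statistics per interval bounds the monitoring data volume regardless of how many individual reads and writes occur.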
Instrumentation
Analysis: pick lowest mean value as bottleneck, apply t-test
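A minimal sketch of this analysis step: pick the component with the lowest mean throughput, then compare it with the runner-up using a t statistic. The slides do not say which t-test variant is used, so Welch's unequal-variance form is an assumption here, as are the component names and sample values.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def find_bottleneck(rates):
    """Return (candidate, runner_up, t), where candidate has the lowest mean."""
    ranked = sorted(rates, key=lambda name: mean(rates[name]))
    lowest, runner_up = ranked[0], ranked[1]
    return lowest, runner_up, welch_t(rates[lowest], rates[runner_up])

rates = {  # hypothetical throughput samples in Gb/s
    "disk_read": [4.0, 4.2, 3.8, 4.0],
    "network": [9.0, 9.4, 8.6, 9.0],
    "disk_write": [6.0, 6.5, 5.5, 6.0],
}
name, other, t = find_bottleneck(rates)
print(name, other, round(t, 2))
```

A strongly negative t suggests the lowest-mean component really is slower than the next one rather than differing by noise. Turning t into a p-value requires the t distribution, which is omitted here.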
TCP throughput
Time series of throughput* for representative TCP experiments: (a) 1 stream memory-to-disk with 100ms latency, (b) 1 stream memory-to-memory with no latency, (c) 1 stream disk-to-disk with no latency, (d) 4 streams memory-to-disk with 100ms latency and 1% loss added at 60 seconds.
UDT throughput
Time series of throughput* for representative UDT experiments: (a) 4 streams memory-to-disk with 100ms latency, (b) 4 streams memory-to-disk with 100ms latency and 1% loss added at 60 seconds, (c) 4 streams disk-to-disk with 100ms latency, (d) 4 streams memory-to-memory with 100ms latency.
Wait, what?
Half as many read()s. Others return zero, not counted
Variance
Less work being done
Review
• Why we care about bulk data transfer at multi-gigabit rates
• Why and how detailed monitoring is helpful
• How monitoring is related to Session Layer protocols
– and how that might integrate with a management framework
• Questions?
Related projects
• NetLogger netlogger.lbl.gov
• perfSONAR perfsonar.org
• XSP damsl.cis.udel.edu/
• GENI geni.net
• CEDPS cedps-scidac.org
Topology-aware Monitoring