End-To-End Provisioned Optical Network Testbed for Large-Scale eScience Applications
Nagi Rao, Bill Wing
Computer Science and Mathematics Division, Oak Ridge National Laboratory
[email protected], [email protected]
Tony Mezzacappa
Physics Division, Oak Ridge National Laboratory
[email protected]
Nov 12, 2003
Project Kick-off Meeting, University of Virginia
Sponsored by NSF Experimental Infrastructure Networks Program
Types of Networking Channels
Control and Steering Channels:
• Interactive visualization – human response time
• Computational steering – respond to “inertia” of computation
Coordinated Channels:
• Coordinated visualization, steering, and archival
• Multiple visualization and steering nodes
On the Internet, these channels can be supported only in a limited way:
– It is difficult to sustain large data rates in a fair manner
– Unpredictability of transport dynamics makes it very difficult to achieve stability
UT-BATTELLE, U.S. Department of Energy, Oak Ridge National Laboratory
Data Transfers Over Dedicated Channels
Several Candidate Protocols (to be tested):
UDP-based data transport:
• UDT (SABUL), tsunami, hurricane, RBUDP, IQ-RUDP, and others
• Advantages: application-level implementations and conceptually simple methods
• Disadvantages: unstable code and hard-to-configure parameters
Tuned TCP methods:
• net100: tune flow windows large enough to avoid self-created losses
• Advantages: known mechanisms and tested kernel code
• Disadvantages: physical losses are problematic
  – TCP interprets physical losses as congestion and reduces throughput
Host Issues for 1-10 Gbps Rates: impedance match issues
– Buffering in NIC, kernel and application; disk speeds
– Zero-copy kernel patch and ST
– OS bypass, RDMA
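The net100 item above comes down to sizing the TCP flow window to at least the bandwidth-delay product of the channel, so that the sender is never stalled by its own window. A minimal sketch of that arithmetic follows. The function name and the 20 ms round-trip time are illustrative assumptions, not values from the slides.

```python
def bdp_window_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Smallest window (bytes) that keeps a loss-free channel full.

    bandwidth_bps: channel rate in bits per second.
    rtt_ms: round-trip time in milliseconds (assumed; measure it in practice).
    """
    # Bits in flight = rate * RTT. Divide by 8 for bytes and by 1000 for ms.
    return bandwidth_bps * rtt_ms // 8000


if __name__ == "__main__":
    # Hypothetical example: a 1 Gbps channel with a 20 ms round trip.
    print(bdp_window_bytes(1_000_000_000, 20))  # 2500000
```

A window smaller than this value caps throughput below the channel rate even when no packets are lost.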
Multiple Streams Over Dedicated Channels
Example:
• Monitor computation through a visualization channel
• Interactive visualization – rotate, project different subspaces
• Computational steering – specify parameters on the fly
• Archive/load the data – store the interesting data
[Figure labels: visualization stream, steering, visualization control, data stream, high-performance storage]
Option 1: Dedicated channels for each stream
• 4 NICs – 4 MSPP slots
Option 2: Share dedicated channels
• Single NIC and MSPP slot
• Realize sharing at protocol or application level
Option 3: Visualization streams on one channel; data and steering streams on another channel
• Two NICs and MSPP slots
• Realize sharing at protocol or application level
Terminology Review
• Connection: logical, host site to host site
• Circuit or Channel or Bandwidth Pipe: physical, NIC-NIC
• Stream: logical, application to application
[Figure labels: connection, visualization stream, control stream, data stream]
Dedicated NIC-NIC Channels
Advantages: no other traffic on the channel
• Simpler protocols:
  – Rate controllers with loss recovery mechanisms would suffice for
    • data transfers and
    • control channels for host-host connections
• Coordination between the streams can be handled at application/middleware level
Disadvantages:
• Scaling problems:
  – A single connection requires 4 NIC-NIC pairs and 4 channels in the example
  – A main computation site supporting 5 users requires
    • a host with 20 NICs and 20 channels
    • an MSPP with at least 20 slots (e.g., 5 blades each with 4 GigE slots)
• Utilization problems:
  – Even a small control stream needs an entire channel (with minimum resolution)
    • E.g., a 10 Mbps control stream on a GigE channel
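The scaling and utilization figures above follow from simple arithmetic. The sketch below reproduces them. The function names are hypothetical, and the inputs are the slide's own numbers (5 users, 4 streams per connection, 4 GigE slots per blade, a 10 Mbps stream on a 1000 Mbps channel).

```python
import math


def dedicated_channel_cost(users: int, streams_per_connection: int,
                           slots_per_blade: int) -> tuple[int, int]:
    """Return (NICs and channels needed at the main site, MSPP blades needed)."""
    nics = users * streams_per_connection      # one NIC and channel per stream
    blades = math.ceil(nics / slots_per_blade)
    return nics, blades


def utilization(stream_mbps: float, channel_mbps: float) -> float:
    """Fraction of a dedicated channel that one stream actually uses."""
    return stream_mbps / channel_mbps


if __name__ == "__main__":
    print(dedicated_channel_cost(5, 4, 4))   # (20, 5)
    print(f"{utilization(10, 1000):.0%}")    # 1%
```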
Multiple Streams on Single NIC-NIC Channel
Streams interact and affect each other:
• Packets may be “pooled” at the source and destination nodes:
  – NIC – interrupt coalescing and buffer clearing
  – NIC-kernel transfers through buffers
  – Kernel-application transfers
• Processor load determines interrupt response time at finer levels
Two important consequences:
– Protocols or applications need to “share” the channel
  • Need protocols that allow for appropriate bandwidth sharing
  • TCP-like paradigm but a more structured problem
    – Total bandwidth is known
    – Competing traffic is host generated
– Protocol interaction could generate complicated dynamics
  • Need protocols that stabilize the dynamics for control channels
  • Very few protocols exist that protect against “underflow”
• Need a combination of existing and newer protocols
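Because the total bandwidth is known and all competing traffic is host generated, the sharing problem can in principle be settled by explicit allocation rather than probing. The sketch below is one hypothetical policy, not a protocol from the slides: control streams get a reserved rate first, and bulk streams split the remainder by weight. All stream names and numbers are illustrative assumptions.

```python
def allocate(capacity_mbps: float,
             reserved: dict[str, float],
             weights: dict[str, float]) -> dict[str, float]:
    """Partition a known channel capacity among streams.

    reserved: fixed rates for control streams (guards against "underflow").
    weights:  relative shares for bulk streams over the remaining capacity.
    """
    reserved_total = sum(reserved.values())
    if reserved_total > capacity_mbps:
        raise ValueError("reserved rates exceed channel capacity")
    remaining = capacity_mbps - reserved_total
    total_weight = sum(weights.values())
    shares = dict(reserved)
    for name, weight in weights.items():
        shares[name] = remaining * weight / total_weight if total_weight else 0.0
    return shares


if __name__ == "__main__":
    # Hypothetical GigE channel carrying two control and two bulk streams.
    print(allocate(1000.0,
                   reserved={"steering": 10.0, "viz_control": 10.0},
                   weights={"data": 3.0, "visualization": 1.0}))
```

An allocation like this only sets targets. Holding each stream at its target still needs the rate control discussed on the later slides.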
TSI Application interfaces and networking modules
[Figure: layered diagram. Layer labels: applications, interfaces, middleware, protocols, channels. Component labels: Application Modules 1-3; computational steering, dynamics visualization, data transfers; stabilization modules, control modules; bulk transport modules, streaming protocols; dedicated provisioned channels]
Interfacing with visualization modules
Overall Approach: Separate the steering and display components:
– Steering module – connect it to the visualization control channel
– Display module – separate rendering and display sub-modules and locate them at hosts
– Connect sub-modules over data channels
• Candidates under consideration – all need hooks to use dedicated channels
  – OpenGL, VTK codes
    • code needs to be modified with appropriate calls – non-trivial
  – EnSight
    • can operate across IP networks without firewalls
    • high cost and no access to source code
  – Paraview
    • stability problems and hard to use
  – Aspect (?)
    • Developed at ORNL
    • Has functionality similar to Paraview with additional analysis modules
    • Developers are willing to incorporate CHEETAH modules
  – On-line streaming
  – Large datasets
Optimizing visualization pipeline on a network
Decomposition of visualization pipeline:
– “links” have different bandwidths
  • Geometry could be larger than data
  • Display bandwidth can be much smaller – human consumption
– tasks require different computational power
  • Large datasets require a cluster to compute the geometry
  • Rendering can be done on graphics-enabled machines
  • Display can be transferred to an X-enabled machine
Pipeline can be realized over the network and the display can be forwarded to the user host
[Figure: pipeline stages data storage, geometry computation, rendering, display; host node]
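One simple way to think about mapping the pipeline onto a network is to pick the stage boundary at which data crosses the dedicated channel. The sketch below brute-forces that cut for a linear pipeline. It is an illustrative model only: the `Stage` type, the per-stage sizes, and the compute times are hypothetical numbers chosen to mirror the observations above (geometry larger than data, display output small).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    output_mbits: float    # size of this stage's output (hypothetical)
    remote_seconds: float  # compute time on the remote cluster (hypothetical)
    local_seconds: float   # compute time on the user's host (hypothetical)


def best_cut(stages: list[Stage], link_mbps: float) -> tuple[int, float]:
    """Return (cut, seconds): stages[:cut] run remotely, stages[cut:] locally.

    The last stage (display) always stays on the user's host.
    """
    best = None
    for cut in range(1, len(stages)):
        remote = sum(s.remote_seconds for s in stages[:cut])
        local = sum(s.local_seconds for s in stages[cut:])
        transfer = stages[cut - 1].output_mbits / link_mbps
        total = remote + transfer + local
        if best is None or total < best[1]:
            best = (cut, total)
    return best


PIPELINE = [
    Stage("data storage", 8000.0, 0.1, 0.1),
    Stage("geometry computation", 12000.0, 2.0, 60.0),
    Stage("rendering", 24.0, 0.5, 1.0),
    Stage("display", 0.0, 0.0, 0.01),
]

if __name__ == "__main__":
    cut, seconds = best_cut(PIPELINE, link_mbps=1000.0)
    print(PIPELINE[cut - 1].name, round(seconds, 3))
```

With these assumed numbers the cheapest cut is after rendering, so only the small rendered image crosses the channel.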
Protocols for dedicated channels – multiple data streams
The problem is simpler than on the Internet:
• Total available channel bandwidth is known
• All traffic is generated by the nodes and is “known”
• Fairness issues are simpler – nodes can allocate bandwidth among streams
TCP addresses these problems over the Internet with:
• slow-start to figure out available bandwidth
• packet loss and time-out to conclude traffic levels
• AIMD to adjust the flow rate
Bandwidth partitioning among data streams might require closed-loop control:
• Simple (open-loop) control of data rates at application level does not always work
• Example: NIC has higher capacity than the provisioned channel:
  1. packets might be combined and sent out at a higher rate by the NIC, causing losses at the MSPP
  2. packets can be coalesced at the receiver NIC, resulting in rates different from the sending rates
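To make the AIMD item concrete, the sketch below iterates a toy additive-increase, multiplicative-decrease rate against a fixed capacity. It is a deliberately simplified model (one stream, loss declared whenever the rate exceeds capacity, arbitrary units) and is not TCP itself. The function name and default parameters are assumptions for illustration.

```python
def aimd_trace(capacity: float, steps: int, increase: float = 1.0,
               decrease: float = 0.5, start: float = 1.0) -> list[float]:
    """Iterate a toy AIMD rate controller and return the rate after each step."""
    rate = start
    trace = []
    for _ in range(steps):
        if rate > capacity:
            rate *= decrease   # "loss" seen: multiplicative decrease
        else:
            rate += increase   # no loss: additive increase
        trace.append(rate)
    return trace


if __name__ == "__main__":
    trace = aimd_trace(capacity=10.0, steps=200)
    delivered = [min(r, 10.0) for r in trace]
    print("mean delivered rate:", sum(delivered) / len(delivered))
```

Even on this idealized link the rate keeps sawtoothing below the known capacity, which is why a channel with known bandwidth invites a controller that targets the rate directly.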
Protocols for dedicated channels – multiple data and control streams
The problem is to maintain “steady” dynamics for the control streams between applications
• Not just between NICs or at the line
Complicated end-to-end dynamics can be caused by various factors:
• Channel losses:
  – Physical losses
  – Losses due to the sum of streams exceeding the capacity
• Impedance mismatch between
  – NIC and line
  – NIC and kernel
  – kernel and application
On the Internet:
• Only a probabilistic solution is possible over the Internet because of complicated cross-traffic dynamics – our solutions are based on stochastic approximation
TCP does not solve the problem:
• Multiple TCP/UDP streams generate chaos-like dynamics
• A single TCP stream on the dedicated channel has an underflow problem
  – Tune the flow window at the desired level and adjust AIMD not to kick in
  – A burst of losses can kill the stream – TCP interprets them as congestion
This problem is still simpler than on the Internet:
• Here cross-traffic is generated by the nodes and is “known”
Channels must be explicitly stabilized using application-level closed-loop control
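As an illustration of application-level closed-loop control based on stochastic approximation, the sketch below uses a generic Robbins-Monro style update that nudges the source rate toward a target goodput with a decaying gain. The slides do not give the actual update rule, so the form of the update, the gain parameters `a` and `alpha`, and the noisy channel model are all assumptions made for this example.

```python
import random


def stabilize_rate(measure_goodput, target: float, steps: int = 200,
                   a: float = 0.8, alpha: float = 0.8,
                   start: float = 1.0) -> float:
    """Adjust the source rate until measured goodput settles near `target`.

    measure_goodput: callable mapping a source rate to an observed goodput.
    a, alpha: assumed gain parameters; step n uses gain a / n**alpha.
    """
    rate = start
    for n in range(1, steps + 1):
        gain = a / n ** alpha
        rate += gain * (target - measure_goodput(rate))
        rate = max(rate, 0.0)  # a sending rate cannot be negative
    return rate


def make_noisy_channel(delivery_fraction: float = 0.9, noise: float = 0.05,
                       seed: int = 0):
    """Hypothetical channel: delivers a fixed fraction of the rate plus noise."""
    rng = random.Random(seed)

    def measure(rate: float) -> float:
        return delivery_fraction * rate + rng.uniform(-noise, noise)

    return measure


if __name__ == "__main__":
    rate = stabilize_rate(make_noisy_channel(), target=2.0)
    print("settled source rate (Mbps):", round(rate, 3))
```

The decaying gain is what damps the response to measurement noise. A constant gain would keep reacting to every fluctuation in the measured goodput.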
Complicated Dynamics of Interacting Streams
Simulation Results: TCP-AIMD exhibits chaos-like trajectories
• TCP streams competing with each other on a dedicated link (Veres and Boda 2000)
• TCP competing with UDP on a dedicated link (Rao and Chua 2002)
Analytical Results (Rao and Chua 2002): TCP-AIMD has chaotic regimes
• Competing with steady UDP streams on a dedicated link
• State space analysis and Poincaré maps
Internet Measurements (2003, last few weeks): TCP-AIMD traces are a complicated mixture of stochastic and chaotic components
• Note: on dedicated links we expect less or no chaotic component
Question: How relevant are the simulation and analytical results on chaotic trajectories?
Answer: Only partially.
Internet (net100) traces show that TCP-AIMD dynamics are a complicated mixture of chaotic and stochastic regimes.
[Figures: datagram acknowledging time (s) vs. source rate (Mbps) and goodput (Mbps)]
• Case 4. Target goodput = 2.0 Mbps, rate control through sleep time, a = 0.8
• Case 5. Target goodput = 2.0 Mbps, rate control through sleep time, a = 0.9
Throughput Stabilization: ORNL-GaTech
. Desired goodput level = 20.0 Mbps, a = 0.8, , adjustment made on congestion window
8.0
Desired goodput level = 2.0 Mbps, a = 0.8, , adjustment made on sleep time
8.0
Experiments with tsunami
firebird.ccs.ornl.gov – ccil.cc.gatech.edu
• Network transport control settings:
  – NIC speed and path bandwidth: 1 Gbps
  – Transferred file size: 204,800,000 bytes
  – Using default_block_size: 32768 bytes
• Transmission statistics from Tsunami:
  – Average sending rate: 296.05 Mbps
  – Loss rate: 64.32%
  – Transfer time: 17.51 sec
  – Throughput: 93.6 Mbps
  – Sending time and receiving time vs. block sequence number (figure on next slide)
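The reported throughput can be cross-checked from the file size and transfer time listed above. The short sketch below does that arithmetic. The constant and function names are illustrative, and the inputs are the slide's own figures.

```python
FILE_BYTES = 204_800_000      # transferred file size from the slide
BLOCK_BYTES = 32_768          # default_block_size from the slide
TRANSFER_SECONDS = 17.51      # transfer time from the slide


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Average throughput in megabits per second (1 Mbps = 10**6 bit/s)."""
    return num_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    print("blocks:", FILE_BYTES // BLOCK_BYTES)  # 6250
    print("throughput (Mbps):",
          round(throughput_mbps(FILE_BYTES, TRANSFER_SECONDS), 1))  # 93.6
```

The result agrees with the 93.6 Mbps reported by Tsunami, well below the 296.05 Mbps average sending rate.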
• Path bandwidth: 10 Mbps
• Using datagram size: 1400 bytes (the default one doesn’t work)
• File size: 10,240,000 bytes
• Case 1: Only Tsunami running
  – Throughput: 9.47 Mbps (receiver, client)
  – Goodput: 4.20 Mbps (sender, server)
  – Sending time and receiving time vs. datagram sequence number (figure at right)
• Case 2: Only ONTCOU (throughput maximization SA) running
• Source goodput: 3.5 Mbps
• Sending time and acknowledging time vs. datagram sequence number
• Sending rate vs. source goodput
• Case 3: Tsunami and ONTCOU running simultaneously with the same datagram size
  – Tsunami
ORNL Year 1 Tasks
Design and test transport protocols for dedicated channels:
1. Single data streams – collaboration with UVa
2. One data and two control streams
Testing on ORNL-ATL-ORNL GigE-SONET link
Interfaces with visualization software:
• Simple supernova computation at ORNL hosts on dedicated link
• Developing interfaces to Aspect visualization modules and testing
• Test Paraview and EnSight
[Figure labels: ORNL host 1 (Linux), Juniper M160 router, OC 192, SOX router, ORNL, Atlanta]
ORNL Year 2 Tasks
Design and test transport protocols for dedicated channels:
• Multiple data, visualization and control streams
• Testing on CHEETAH testbed
Interface with visualization:
• Interfacing supernova visualization modules over CHEETAH
• Developing interfaces to Aspect visualization modules with TSI dataset
ORNL Year 3 Tasks
Design and test transport protocols for dedicated channels:
• Collaborating multiple data, visualization and control streams
• Testing on CHEETAH testbed
Interface with visualization:
• Interfacing supernova visualization and computation modules over CHEETAH
• Developing interfaces to Aspect visualization modules with TSI on-line computations
• Optimizing mapping of visualization pipeline
Feedback and Corrections
Dynamics of visualization control and steering streams must be stabilized from application to application
– Not enough to stabilize lower transport levels
– NIC-to-line transfers may not be smooth
– Application-to-kernel transfers depend on the processor load
• Provide a user interface for steering and connect it to transport modules