Networks for Data Intensive Science Environments
Eli Dart, Network Engineer
ESnet Network Engineering Group
2012 BES Neutron & Photon Detector Workshop
Gaithersburg, MD
August 1-3, 2012
Lawrence Berkeley National Laboratory U.S. Department of Energy | Office of Science
Outline
• Intro
• Things I know
• Things I think I know
• Open questions
Intro
I am not a detector designer or a beamline scientist – I do networks
• Architecture
• Performance
• Requirements analysis
ESnet is:
• The high-performance networking facility of DOE/SC
• A nationwide network connecting the National Laboratories to each other and to the rest of the world
− High-performance science connectivity – 100 Gbps shortly
− General-purpose business connectivity
• An active partner in several large science experiments that rely on networking for success
[Figure: ESnet5 DRAFT map, November 2012 – nationwide topology showing ESnet-managed 100G and 10G routers, site-managed routers, ESnet PoP/hub locations, and ESnet optical transport/optical node locations (only some are shown). Link legend: routed IP 100 Gb/s, routed IP n x 10 Gb/s, 3rd party 10 Gb/s, express/metro 100 Gb/s and 10 Gb/s, express multi-path 10G, lab-supplied links, other links, and tail circuits. Major Office of Science (SC) sites (e.g. LBNL) and major non-SC DOE sites (e.g. LLNL) are marked. Geography is only representational.]
Things I Know
• Network architecture for high performance
• Successful practices from other experiments
• Collaboration scale and systems expertise
Network Architecture for High Performance
Common pain point – data mobility (the movement of data to analysis, storage, visualization, etc.)
• This is a common problem for multiple scientific disciplines in multiple environments
• Reliability/correctness, consistency, performance (in that order)
• Reasonable solutions exist – to a first order the issues are with deployment, not with development
Network architecture and system tuning make all the difference
• System and network configuration defaults are typically wrong
• Network architecture enables performance and makes troubleshooting far easier
• http://fasterdata.es.net/science-dmz/science-dmz-architecture/
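One concrete reason defaults are typically wrong: stock TCP buffer sizes are sized for LANs, while a wide area transfer needs buffers at least as large as the bandwidth-delay product (BDP) of the path. A minimal sketch of the arithmetic follows; the 10 Gb/s / 80 ms figures are illustrative assumptions, not measurements from this talk.

```python
# Sketch: why default TCP buffers fail on long, fast paths.
# BDP = path bandwidth x round-trip time; a single TCP connection
# needs at least this many bytes of buffer to keep the pipe full.

def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes in flight needed to fill the path."""
    return bandwidth_bps * rtt_seconds / 8  # bits -> bytes

# Assumed coast-to-coast example: 10 Gb/s path, 80 ms RTT.
needed = bdp_bytes(10e9, 0.080)          # ~100 MB of buffer required

# A stock ~64 KB receive buffer caps a single stream at roughly:
default_buffer = 64 * 1024
capped_bps = default_buffer * 8 / 0.080  # ~6.6 Mb/s on a 10 Gb/s link

print(f"BDP: {needed/1e6:.0f} MB; with 64 KB buffer: {capped_bps/1e6:.1f} Mb/s")
```

This is why host tuning (see fasterdata.es.net) matters as much as the wide area link itself: the network can be flawless and the transfer still crawls.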
What Is “The Network” Anyway?
From the users’ perspective, “The Network” is not a bunch of routers, switches, fiber, and so on
From the users’ perspective, “The Network” is the thing that is broken when remote data transfers are hard (That’s fun to say, but what does it really mean?)
• “The Network” is the set of devices and applications involved in the use of a remote resource
• The primary user interface to “The Network” is a data transfer tool or other user-agent application that talks to a tool or application on a different host
This means that without well-configured end systems, science networks do not exist as effective scientific tools
Therefore, the utility of “The Network” depends on the existence and availability of well-configured end systems
Success Stories from Other Experiments
Large Hadron Collider experiments – data and service challenges
• Data distribution and analysis system was extensively tested before LHC operation
• Regular tests at increasing scale
• Networking architecture hardened, system configs worked out, etc.
• When the LHC began operation, the data mobility and data analysis infrastructure worked
Dedicated resources for data transfer
• This has been applied in many environments (LHC, supercomputer centers, tokamaks, beamlines)
• Data Transfer Nodes (dedicated hosts) configured specifically for data transfer
• Connected to the network correctly (see Science DMZ content on fasterdata.es.net)
Light source community
• Science DMZ principles (dedicated systems, network architecture) applied at the LBNL ALS
• Data transfers from data transfer node to supercomputer center (NERSC)
• 300 MB/sec disk-to-disk performance over the network
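To put the ALS-to-NERSC figure above in wall-clock terms, a quick sketch of transfer times at a sustained 300 MB/s; the data set sizes are illustrative assumptions.

```python
# Sketch: what sustained 300 MB/s disk-to-disk means in practice.

def transfer_time_seconds(size_bytes: float, rate_bytes_per_sec: float) -> float:
    """Wall-clock time to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_sec

RATE = 300e6  # 300 MB/s sustained

for label, size in [("100 GB", 100e9), ("1 TB", 1e12), ("10 TB", 1e13)]:
    t = transfer_time_seconds(size, RATE)
    print(f"{label}: {t/60:.1f} minutes")
# 100 GB -> ~5.6 minutes, 1 TB -> ~56 minutes, 10 TB -> ~9.3 hours
```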
[Figure: Science Community Classifications – number of collaborations and scientists per collaboration (low to high) versus approximate data set size (10 GB to 100 PB)]
• Small scale instrument science (e.g. Light Sources, Nanoscience centers, Microscopy)
• Supercomputer simulations (e.g. Climate, Fusion, Life Sciences)
• Large scale instrument science (e.g. HEP, NP)
A few large collaborations have their own internal software and networking groups
Collaboration scale and systems expertise
The LHC experiments have a significant advantage in networking over most other scientific collaborations
• The human scale is sufficient that the collaborations can support internal specialization
• The LHC experiments have their own network engineering, system administration, and large-scale software development groups
• This allows the fusion of domain-specific knowledge and specialized networking and systems expertise
Most other collaborations do not have the human scale to permit the same level of internal specialization
• Therefore, most collaborations must import systems and networking expertise from somewhere
• ESnet maintains a knowledge base for this purpose: http://fasterdata.es.net/
• Send your system and network administrators there, and/or have them contact ESnet – we’re happy to help
Things I Think I Know
Common data mobility toolset can be applied to different data acquisition systems
• Data sets must be moved between filesystems
• Data mobility tools are file format agnostic
• Therefore, it should be possible to generalize wide area data movement
• A framework for workflows should be able to include this
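The format-agnostic point above is worth making concrete: data mobility tools move opaque bytes and verify integrity, so the same machinery works for any detector's file format. A minimal sketch follows; real tools (GridFTP and similar) add parallelism, restart, and remote endpoints, and the function names here are illustrative.

```python
# Minimal sketch of format-agnostic data movement: copy bytes,
# then verify with a checksum. No knowledge of the file format needed.

import hashlib
import shutil

def sha256_of(path: str) -> str:
    """Checksum a file in 1 MB chunks without loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def move_dataset(src: str, dst: str) -> None:
    """Copy src to dst and verify the bytes arrived intact."""
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch copying {src} -> {dst}")
```

Because nothing here inspects the payload, a workflow framework can wrap this one step and reuse it across beamlines and experiment types.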
Import/adoption of tools and techniques from other communities should be straightforward
• Leverages existing investments
• Some things will not apply well – we need to run into these quickly so that areas of need can be identified
Open Questions
What will the integration of data acquisition systems, data transfer systems, and large-scale analysis look like?
• Data acquisition is going to become more complex (e.g. multi-stage trigger farms, as in LHC experiments)
• Need for analysis of multiple types in multiple locations
• Several ongoing discussions about workflow
What does real-time analysis actually look like from a networking and computational systems perspective?
• Can’t put a petaflop at the detector
• What is the right decomposition of analysis and workflow?
• Real-time analysis must be effective without costing so much that it has to be done on a distant supercomputer
• What is “matching” (things you can easily specify a priori) and what is “analysis” (stuff that is hard to specify a priori)?
Open Questions (2)
How can remote operation and remote collaboration be effectively integrated?
How do needs for real-time analysis fit into remote operation?
• If real-time or near-real-time analysis is done at the beamline, the difficulty increases for remote collaboration tools (need remote operation of the analysis system too)
• Do we just make the analysis tool into a videoconferencing client?
Several of these questions point toward the need for larger frameworks that leverage commonalities
Discussion
• Data transfer is not the only problem (or the most important problem), but if data transfer doesn’t work then many other things are not possible
• It is very helpful to get the systems and networks for high data rate experiments deployed early – ESnet can help
• My naïve assumption is that there are common data mobility design patterns that could support many experiment types
• There is a lot of network bandwidth available – if you can’t use it, something is broken. Call someone.
• We have network services with machine-consumable interfaces
− Heavily used by LHC experiments
− Available for use in frameworks in this space
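As a sense of what "machine-consumable" means here: a workflow framework could programmatically request a guaranteed-bandwidth circuit ahead of a scheduled transfer, in the spirit of ESnet's OSCARS virtual circuit service. The sketch below is hypothetical throughout: the field names, hostnames, and function are invented for illustration and do not document any real API.

```python
# Hypothetical sketch only: composing a bandwidth-reservation request
# that a scheduler could submit to a circuit service. All field names
# and endpoint hostnames below are invented for illustration.

import json

def build_circuit_request(src: str, dst: str, mbps: int,
                          start: str, end: str) -> str:
    """Serialize a reservation request as JSON (illustrative schema)."""
    return json.dumps({
        "source": src,              # hypothetical DTN hostname
        "destination": dst,         # hypothetical DTN hostname
        "bandwidth_mbps": mbps,
        "start_time": start,
        "end_time": end,
    })

req = build_circuit_request("dtn-a.example.gov", "dtn-b.example.gov",
                            5000, "2012-08-01T20:00Z", "2012-08-02T02:00Z")
```

The point is not the schema but the pattern: because the interface is machine-consumable, reserving network capacity can become one more step in an automated experiment workflow rather than a phone call.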
Other slides – Science DMZ
Common Denominator – Data Mobility
Data produced at one facility, analyzed elsewhere
• Scientist has allocation at facility A, data at facility B
• Transactional and workflow issues
− Experiment, data collection, analysis, results, interpretation, action
− Short duty cycle workflows between distant facilities
The inability to move data hinders science
• Instruments are run at lower resolution so data sets are tractable
• Grad students often assigned to data movement rather than research
Large data movement doesn’t happen by accident; it requires:
• Properly tuned systems and networks – default settings do not work
• Networks, systems, and tools infrastructure working together cohesively
The Network As The Foundation
High-performance, feature-rich wide area science network, engineered for large data flows
• Low loss, high capacity, well instrumented
• Expert staff to consult, troubleshoot, build, advance
• Collaboration with global research networks
Site networks
• Connect site/lab/campus resources to WAN services
• Local science infrastructure matched to WAN services
Data mobility layer rests on the network foundation
• Data transfer nodes run mobility tools, e.g. workflow, data transfer, service on-ramps
• Toolset is the user interface to the network – the two must be well-matched
Network Performance – What’s Important?
Ease of use – reduce complexity
• Reduce the number of devices in the path (fewer things to troubleshoot, configure, etc. – more on this in a minute)
• Dedicated infrastructure for data movement
Zero packet loss
• Again, reduce device count
• Use appropriate network devices (e.g. with deep output queues) – this means eliminating LAN devices from the WAN data path
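Why zero loss is the goal rather than merely low loss: the classic Mathis model bounds single-stream TCP throughput at roughly (MSS/RTT) x (C/sqrt(p)), with C about 1.22, so even tiny loss probabilities collapse performance on a high-RTT path. A quick sketch with illustrative numbers (1460-byte MSS, 80 ms RTT):

```python
# Sketch: Mathis-model bound on single-stream TCP throughput,
#   rate <= (MSS / RTT) * (C / sqrt(p)),  C ~= 1.22.
# The MSS and RTT values below are illustrative assumptions.

import math

def mathis_bps(mss_bytes: int, rtt_s: float, loss_prob: float) -> float:
    """Upper bound on single-stream TCP throughput in bits/sec."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss_prob))

MSS, RTT = 1460, 0.080  # bytes, seconds
for p in (1e-6, 1e-4, 1e-2):
    print(f"loss {p:g}: {mathis_bps(MSS, RTT, p)/1e6:.1f} Mb/s")
# At one-in-a-million loss a stream can approach ~178 Mb/s;
# at 1% loss the same path yields under 2 Mb/s.
```

This is why a single deep-buffered, loss-free path through the site beats a chain of LAN devices that each drop an occasional packet.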
Test and measurement
• Need a well-defined location for well-configured test and measurement gear (e.g. perfSONAR)
• Locate test and measurement devices near data transfer systems
Security – decouple science and business security policy and control points
Traditional DMZ
DMZ – “Demilitarized Zone”
• Network segment near the site perimeter with a different security policy
• Commonly used architectural element for deploying WAN-facing services (e.g. email, DNS, web)
Traffic for WAN-facing services does not traverse the LAN
• WAN flows are isolated from LAN traffic
• Infrastructure for WAN services is specifically configured for the WAN
Separation of security policy improves both LAN and WAN
• No conflation of security policy between LAN hosts and WAN services
• DMZ hosts provide specific services
• LAN hosts must traverse the same ACLs as WAN hosts to access the DMZ
The Data Transfer Trifecta: The “Science DMZ” Model

Dedicated Systems for Data Transfer – the Data Transfer Node
• High performance
• Configured for data transfer
• Proper tools

Network Architecture – the Science DMZ
• Dedicated location for DTN
• Easy to deploy – no need to redesign the whole network
• Additional info: http://fasterdata.es.net/

Performance Testing & Measurement – perfSONAR
• Enables fault isolation
• Verify correct operation
• Widely deployed in ESnet and other networks, as well as sites and facilities
Science DMZ Takes Many Forms
There are a lot of ways to combine these things – it all depends on what you need to do
• Small installation for a project or two
• Facility inside a larger institution
• Institutional capability serving multiple departments/divisions
• Science capability that consumes a majority of the infrastructure
Some of these are straightforward, others are less obvious
Key point of concentration: High-latency path for TCP
Science DMZ - Simple