Data Transfer Nodes (DTNs)
Improving large-scale data transfer performance
www.geant.org
Tim Chown, Jisc | GÉANT GN4-3 project, WP6 joint WP leader
DeiC Conference, 31 October 2019
Introduction
• A little about the GÉANT GN4-3 project
• The data transfer problem / challenge
• Science DMZ architecture and DTNs
• Using perfSONAR to test the benefits of Science DMZ and DTNs
• DTN-related projects
• NREN DTN survey
• Transfers to/from commercial cloud providers
• Commercial vs R&E networks
• Closing thoughts
The GÉANT GN4-3 project
• Collaborative project between the European NRENs
• Runs the GÉANT network and associated services
  • https://www.geant.org/Projects/GEANT_Project_GN4-3
• Approximately €70m over 4 years (Jan 2019 – Dec 2022)
• Parallel GN4-3N project restructuring the backbone network
• WP6 is Network Technologies and Services Development
  • Evaluating new technologies and building new services
  • Task 1 – Network Technology Evolution (includes DTNs in Year 1)
  • Task 2 – Network Services Evolution and Development
  • Task 3 – Management and Monitoring
  • Co-led by Ivana Golub (PSNC) and me
The problem / challenge
• Growing interest in the R&E community in moving large volumes of research data
  • From point of capture or generation to a remote computing facility
  • For remote data visualisation
  • Data replication, distributed storage and backups
  • To or from cloud providers
• Data set volumes are increasing
  • 10 TB data sets are not unusual
  • 100 TB is no longer very large
[Images: www.diamond.ac.uk, www.skatelescope.org]
Researcher network expectations?
https://community.jisc.ac.uk/groups/janet-end-end-performance-initiative/document/network-expectations-data-intensive-science
The Science DMZ and DTNs – optimising data transfers
• ESnet published the Science DMZ model in 2012/13:
  • https://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
• Three key elements:
  • Design an appropriate campus network architecture, avoiding local bottlenecks and causes of packet loss, especially generic campus firewalls
  • Deploy persistent network performance measurement (i.e., perfSONAR)
  • Optimise data transfer node (DTN) design and configuration / tuning
• Apply security policy without negatively impacting performance
  • Streamlined filters, not complex deep packet inspection
• Differential handling of day-to-day and science traffic
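To make "streamlined filters" concrete, here is a toy sketch of a short, stateless allow-list keyed on (source network, destination port), in contrast to stateful deep packet inspection. The subnets and rules are invented for illustration; a real deployment would express this as router ACLs or host firewall rules.

```python
# Toy "streamlined filter": a short stateless allow-list, default deny.
# The subnet and ports below are hypothetical examples, not real policy.
import ipaddress

ACL = [
    (ipaddress.ip_network("192.0.2.0/24"), 2811),  # e.g. GridFTP control
    (ipaddress.ip_network("192.0.2.0/24"), 443),   # e.g. Globus HTTPS
]

def permitted(src_ip: str, dst_port: int) -> bool:
    """Return True if the flow matches an allow rule; otherwise deny."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net and dst_port == port for net, port in ACL)

print(permitted("192.0.2.10", 2811))   # True  – collaborator subnet
print(permitted("198.51.100.7", 2811)) # False – everything else denied
```

The point of the sketch: the whole policy is a handful of 5-tuple matches that a router can apply at line rate, rather than per-packet payload inspection.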
Example of a Science DMZ architecture
[Diagram (fasterdata.es.net): a border router connects the WAN to dedicated Science DMZ switch/routers, separate from the enterprise border router/firewall and the site/campus LAN. Per-project DTNs (Project A DTN in building A, Facility B DTN in building B, Cluster DTN in building C) attach over 10GE / dark fibre, each with per-project security policy and perfSONAR nodes alongside.]
Design aims to minimise packet loss and thus maximise TCP throughput, especially for higher-RTT (international) paths.
Security via efficient ACLs and host firewalls, not a generic campus firewall.
Using perfSONAR to test the benefit of Science DMZ and DTNs
• Jisc is encouraging Janet-connected sites to deploy perfSONAR, so we set up our own servers against which they can run tests
• When working with communities who want to move data, we can also host perfSONAR meshes for them
  • Jisc provides MaDDash on a VM platform
  • Allows an at-a-glance view of network performance across a community
  • Offering central archiving of measurement data soon
• Our nodes:
  • London PoP near GÉANT, 10G: https://ps-londhx1.ja.net/toolkit/
  • Slough shared DC, 10G: http://ps-slough-10g.ja.net/toolkit/
• Nodes are open for remote tests, and available over IPv4 or IPv6
• Also have perfSONAR small nodes (PMP-like) available for loan
Case study – University of Southampton
• An example of data that was being moved by physical media
  • Southampton µ-VIS X-Ray Imaging Centre
  • Taking samples to Diamond Light Source about six times a year
  • Might gather 10-40 TB of experimental result data per visit
  • One data set typically a ~50 GB file, plus up to 5,000 8-25 MB files
  • Tried using the network and rsync; obtained ~30 MB/s (240 Mbit/s)
  • Using physical media the full transfer process took around 3 weeks
• We ought to be able to do better…
  • Diamond end has already deployed a Science DMZ
  • Southampton has a 10 Gbit/s campus link to Janet
  • A target of ~2 Gbit/s would allow ~1 TB per hour
www.diamond.ac.uk
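The rates quoted in this case study can be sanity-checked with simple arithmetic (decimal units assumed, 1 TB = 10¹² bytes):

```python
# Back-of-envelope transfer times for the case-study figures.
# Decimal units assumed (1 TB = 1e12 bytes); arithmetic only, not measurements.

def hours_to_transfer(tb: float, gbit_per_s: float) -> float:
    bits = tb * 1e12 * 8
    return bits / (gbit_per_s * 1e9) / 3600

# rsync at ~240 Mbit/s for a 10 TB visit's worth of data:
print(round(hours_to_transfer(10, 0.24), 1))  # ≈ 92.6 hours (nearly 4 days)
# the ~2 Gbit/s target is roughly 1 TB per hour:
print(round(hours_to_transfer(1, 2.0), 2))    # ≈ 1.11 hours per TB
# the 12 TB data set at the achieved 2-4 Gbit/s:
print(round(hours_to_transfer(12, 2.0), 1))   # ≈ 13.3 h
print(round(hours_to_transfer(12, 4.0), 1))   # ≈ 6.7 h
```

This is consistent with the "6-12 hours (i.e., overnight)" outcome reported on the next slide.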
Working with the computing service and researchers
• Met with Diamond and campus IT & research staff
• Agreed a phased plan of action:
  1. Change to using Globus transfer tools
  2. Deploy perfSONAR to measure network characteristics
  3. Engineer a 10 Gbit/s link to the research file store, internal to the campus firewall
  4. Pilot a 10 Gbit/s Science DMZ DTN at the campus edge
• Outcome:
  • External data transfers achieving 2-4 Gbit/s
  • Able to transfer their most recent 12 TB data set in 6-12 hours (i.e., overnight)
[Diagram: Southampton campus topology — 10G links to Janet, campus core switches, an external DTN and external perfSONAR node at the campus edge, and an internal DTN and internal perfSONAR node by the file store (nodes labelled perfsonar-b5-mgt em1, perfsonar-b5-data p1p1, perfsonar-ext p1p1).]
perfSONAR network measurements
• We set up a perfSONAR mesh for the Southampton case study (running on a Jisc VM)
• Used measurement points at Diamond, Janet (London), and two at Southampton (by the internal file store, and by the DTN at the campus edge)
Jisc London pS node to Southampton internal filestore pS node
[Graph annotations: throughput peaks and troughs every day/night; small (<1%) loss notably impacts performance; packet loss when the firewall is loaded; Christmas vacation period visible.]
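The observation that even small (<1%) loss notably impacts performance follows from the classic Mathis et al. throughput bound for loss-based TCP, rate ≈ (MSS/RTT) · √(3/2) / √p. A sketch with assumed MSS and RTT values (not taken from these graphs):

```python
# Mathis et al. model: steady-state TCP Reno throughput bound.
# MSS and RTT below are illustrative assumptions, not measured values.
from math import sqrt

def mathis_mbps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """rate ≈ (MSS/RTT) * sqrt(3/2) / sqrt(p), returned in Mbit/s."""
    return (mss_bytes * 8 / rtt_s) * sqrt(1.5) / sqrt(loss) / 1e6

# 1460-byte MSS, 20 ms RTT:
for p in (1e-2, 1e-4, 1e-6):
    print(f"loss {p:.0e}: ~{mathis_mbps(1460, 0.020, p):.1f} Mbit/s")
# roughly 7 Mbit/s at 1% loss, ~72 Mbit/s at 0.01%, ~715 Mbit/s at 0.0001%
```

Each 100× reduction in loss only buys a 10× throughput increase, which is why eliminating loss (rather than adding bandwidth) is the core of the Science DMZ design.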
Jisc London pS node to Southampton campus DTN pS node
[Graph annotations: more consistent throughput to the DTN; no packet loss.]
Examples of DTN-related projects
• AENEAS – https://www.aeneas2020.eu
  • A federated European Science Data Center (ESDC) to support the astronomy community in achieving the goals of the Square Kilometre Array (SKA)
• PRP – https://prp.ucsd.edu
  • Effort to improve data transfer performance between the DoE ASCR HPC facilities at ANL, LBNL, ORNL, NCSA
  • FIONA DTNs have SSD but also support additional GPU compute
• PROCESS – https://www.process-project.eu
  • Creating data applications for collaborative research: exascale learning on medical image data, and many other research applications
• Data Mover Challenge 2020
  • https://www.sc-asia.org/data-mover-challenge-2020/
  • Seven teams testing their software on a worldwide DTN network
  • Includes DTNs in Europe, Asia and the US
Jisc’s backbone / reference DTN deployments
• Useful to be able to let our members run disk-to-disk tests, to try different data transfer software and to test disk I/O and tuning
• We host two 10G Data Transfer Nodes (DTNs) at our Slough DC
• Production DTN:
  • Hosts a Globus endpoint; can read/write at 10 Gbit/s
  • Allows direct iperf tests by prior arrangement
• Experimental DTN:
  • Runs alternative TCP congestion control algorithms, e.g., TCP BBR
  • Allows alternative transfer tools to be evaluated: WDT, QUIC, …
• Not built as staging DTNs
• Challenge: federated access to the systems – OAuth? eduGAIN?
• We’re running 100G DTN / transfer tests in a private testbed
  • Our first university, Imperial College, recently connected to Janet at 100G
Aside – quick iperf tests
• I spotted that DeiC supports ad-hoc iperf tests
  • https://www.deic.dk/en/node/759
• Can be useful – we support this on our Slough DTN, and our NOC runs a server for its internal use
• But it doesn’t provide data over time
  • And measurements may conflict with other tests
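In the spirit of an ad-hoc iperf run, a toy memory-to-memory throughput test over loopback (a sketch, not iperf itself — a real test would of course run between two hosts):

```python
# Minimal memory-to-memory throughput test over loopback: one thread
# discards whatever it receives, the main thread times a bulk send.
import socket
import threading
import time

def sink(server: socket.socket) -> int:
    """Accept one connection and count/discard everything it sends."""
    conn, _ = server.accept()
    total = 0
    while True:
        data = conn.recv(1 << 16)
        if not data:
            break
        total += len(data)
    conn.close()
    return total

server = socket.create_server(("127.0.0.1", 0))  # OS picks a free port
port = server.getsockname()[1]
received = []
t = threading.Thread(target=lambda: received.append(sink(server)))
t.start()

payload = b"\x00" * (1 << 20)                    # 1 MiB per send
client = socket.create_connection(("127.0.0.1", port))
start = time.perf_counter()
for _ in range(256):                             # 256 MiB in total
    client.sendall(payload)
client.close()
t.join()
elapsed = time.perf_counter() - start
server.close()

print(f"{received[0] >> 20} MiB in {elapsed:.2f} s "
      f"({received[0] * 8 / elapsed / 1e9:.1f} Gbit/s)")
```

As the slide notes, a one-off number like this is of limited value compared with regular perfSONAR measurements: it gives no view over time, and may conflict with other traffic on the path.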
NREN DTN survey
• Run during October 2019
• 29 NREN responses (from GÉANT APM contacts)
• User groups mentioned that move data intensively thanks to the NREN/GÉANT networks:
  • Physics (HEP): LHCONE | Astrophysics: LOFAR | HPC: PRACE | Astronomy | Biology: human brain, ELIXIR, … | Environment and climate research: CMIP6, Copernicus
• Some NRENs see their role purely as transport capacity providers
• Results presented at STF18
  • https://wiki.geant.org/pages/viewpage.action?spaceKey=APM&title=18th+STF+-+Copenhagen%2C+22-23+Oct+2019
NREN DTN Survey – reported issues
• Network
  • Long-distance transfers, firewalls, last-mile networking, connection capacity
  • Poor network performance and difficulty troubleshooting it
  • Tuning campus, LAN and local systems
• Science DMZ perceived to be difficult to implement securely
• IaaS usage without coordination with the NREN
• Low user expectations – researchers transport large volumes of data using hard drives
NREN DTN Survey – measuring transfer performance?
[Chart] “Other” includes internal probes, iperf, NetMinder, Cacti, HawkEye, and in-house tools.
NREN DTN survey – ways to support large-scale data transfers?
[Chart] “Other” includes remote support, help with system tuning, Aspera, dedicated links, LHCONE, running head nodes, consultancy.
NREN DTN Survey – other assistance?
[Chart] “Other” includes optimizing the TCP stack, talks at events, help with Globus, annual meetings, bandwidth checks, engaging with research communities.
GÉANT DTN work – beyond the survey
• The DTN survey showed there is (currently) no clear demand for specific software or hardware development to support improved data transfer performance
  • Good tools and practices exist
  • Part of the issue is research engagement and dissemination
  • Many NRENs are doing this well, some less so
• We are thus setting up a focus group in WP6T2 to take the data transfer infrastructure work area forward
  • 11 NRENs are interested in working with the project
  • Will identify priorities, and consider providing a best practice wiki
  • One suggestion is to explore DTN-as-a-Service
  • An example to look at is AARNet’s CloudStor service
Bear in mind researchers need simple tools – Globus example
• Globus Connect
  • https://www.globus.org/globus-connect
  • Run an endpoint on the DTN
• Presents a GUI to the researcher
  • Just “drag and drop” data to transfer
  • Can selectively transfer files
• Base transfer tool free to use
  • Subscription for advanced features
• Uses GridFTP under the hood
  • Parallel TCP data transfer
  • Typically four data streams
  • Adds resilience to packet loss
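GridFTP moves a file over several parallel TCP streams, typically four, so a single lossy stream does not stall the whole transfer. A toy sketch of how a file might be split into contiguous byte ranges, one per stream (illustrative only — not Globus's actual implementation):

```python
# Toy byte-range split for parallel-stream transfer: divide [0, size)
# into N contiguous (offset, length) ranges, one per TCP stream.

def stream_ranges(size: int, streams: int = 4) -> list[tuple[int, int]]:
    """Return (offset, length) per stream, together covering [0, size)."""
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread any remainder
        ranges.append((offset, length))
        offset += length
    return ranges

# A ~50 GB file (as in the Southampton case study) over four streams:
print(stream_ranges(50 * 10**9))
# → four 12.5 GB ranges: [(0, 12500000000), (12500000000, 12500000000), ...]
```

Each stream then transfers its own range independently, which is what gives the resilience to packet loss mentioned above: loss on one stream throttles only a quarter of the transfer.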
What about transfers to/from commercial cloud providers?
• Excellent paper to be presented at CHEP2019
  • “Characterising network paths in and out of the clouds”
  • https://indico.cern.ch/event/773049/contributions/3473824/
• Cloud computing increasingly important for many science disciplines
  • Details of networking to/from the cloud not well documented
• Intra-cloud throughput of many 100s of Gbps
  • Free to transfer data into a cloud provider
• Data export is the interesting part
  • Tests show 20-30 Gbps export possible
  • Ballpark movement costs – $70/TB egress, $20/TB between regions
  • Provider may waive network costs up to 15% of the total bill
  • So if you use a lot of compute, the data movement may be “free”
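The egress economics above can be made concrete with a quick calculation (illustrative only — real provider pricing is tiered and changes over time):

```python
# Ballpark egress economics from the figures quoted on this slide:
# $70/TB egress, network costs waived up to 15% of the total bill.

EGRESS_PER_TB = 70.0
WAIVER_FRACTION = 0.15

def egress_cost(tb: float) -> float:
    return tb * EGRESS_PER_TB

def min_bill_for_waiver(tb: float) -> float:
    """Total bill at which this much egress falls inside the 15% waiver."""
    return egress_cost(tb) / WAIVER_FRACTION

print(egress_cost(100))          # $7,000 to export 100 TB
print(min_bill_for_waiver(100))  # waived only if total spend ≥ ~$46,667
```

This is the sense in which heavy compute users may get data movement "free": the egress charge disappears only once it is a small fraction of a much larger bill.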
Commercial vs R&E networks
• Do NREN R&E networks deliver better large-scale data transfer performance?
  • Are we better at supporting our researchers and scientists?
• Interesting comparison done in 2017
  • https://connect.geant.org/2017/05/15/taking-it-to-the-limit-testing-the-performance-of-re-networking
• Europe (GÉANT) to Australia (AARNet), between DTNs
  • 9.27 Gbps over the R&E network, with 400 MB TCP buffers
  • Commercial provider 1 – 0.9 Gbps with a 200 MB buffer
  • Commercial provider 2 – 1.72 Gbps with a 300 MB buffer, dropped to zero after ~30 seconds – anti-DoS kicking in?
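The buffer sizes quoted above track the bandwidth-delay product (BDP = rate × RTT), the amount of data that must be in flight to keep a long path full. Assuming a ~280 ms Europe-to-Australia RTT (my assumption — the slide does not state the RTT):

```python
# Bandwidth-delay product: bytes in flight needed to sustain a given rate.
# The 280 ms RTT is an assumed round-trip time for Europe <-> Australia.

def bdp_mbytes(gbit_per_s: float, rtt_s: float) -> float:
    """Buffer (MB) needed to keep the pipe full: rate * RTT, in bytes."""
    return gbit_per_s * 1e9 * rtt_s / 8 / 1e6

print(round(bdp_mbytes(10, 0.280)))   # ≈ 350 MB to fill 10 Gbit/s
print(round(bdp_mbytes(0.9, 0.280)))  # ≈ 32 MB — the 0.9 Gbps commercial
                                      # result was not buffer-limited
```

So the 400 MB buffers on the R&E path are about right for a full 10 Gbit/s, while the commercial providers' much lower rates with 200-300 MB buffers point to loss, shaping, or middlebox behaviour rather than window size as the bottleneck.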
Closing thoughts
• You almost certainly have users or research communities wanting to move significant amounts of data
  • Do they know what is available to them to support that?
  • Research engagement is really, really important!
• Science DMZ principles, including DTNs, should be adopted
  • Differentiate handling of day-to-day and science traffic
• Being able to measure performance is really useful
  • perfSONAR for network throughput, but consider disk-to-disk too
• Deploying an NREN backbone DTN can be helpful for users
  • But not many NRENs are doing this at present
  • Perhaps there’s a potential DTN-as-a-Service offering to be built?
• Need to think cloud for the future
Thank you
Any questions?
Contact: [email protected]
© GÉANT Association on behalf of the GN4 Phase 3 project (GN4-3). The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 856726 (GN4-3).