
CASE STUDY: Caltech and 2CRSI Data Transfer Nodes
Intel® SSDs Help Power the World’s Biggest Supernetwork

Scientists at research universities and labs around the world are working to understand the human genome, share detailed images of the cosmos, and explain the fundamental particles that make up the universe. This observational science and research generates massive amounts of data: multiple petabytes in some cases. Research facilities face a growing challenge in getting that data to the labs where researchers can work with it. Transferring a petabyte across current networks can take over 24 hours, and the data generated by a single experiment can take weeks to move from lab to lab, slowing down important research and maxing out a network’s capacity.

To address this infrastructure challenge, the National Science Foundation (NSF) awarded a five-year, $5 million grant for a pilot program, led by the University of California, San Diego and the University of California, Berkeley, to create the Pacific Research Platform (PRP). The program connects more than a dozen universities, supercomputer centers, national labs, and other facilities, including institutions beyond California, over a 100 gigabit-per-second (Gbps) network. Technology from Intel and 2CRSI, a manufacturer of storage systems, high-performance computing servers, and customized IT appliances, makes it possible to utilize this bandwidth through high-performance data transfer nodes (DTNs). By 2017 the network will transfer data 1,000x faster than existing networks, enabling terabytes of data to be transferred within a few hours.1
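The arithmetic behind those figures is simple to sketch. The example below uses illustrative sizes and line rates (not measurements from the PRP) to show why a petabyte takes roughly a day even on a fast link, and why a 100 Gbps network brings multi-terabyte transfers down to a few hours.

```python
# Back-of-the-envelope transfer-time estimate (illustrative assumptions only).
# Decimal units are used (1 PB = 1e15 bytes); protocol overhead is ignored.

def transfer_hours(data_bytes: float, link_gbps: float) -> float:
    """Hours needed to move data_bytes over a link running at link_gbps."""
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9) / 3600

PETABYTE = 1e15  # bytes
TERABYTE = 1e12  # bytes

print(f"1 PB at 10 Gbps:    {transfer_hours(PETABYTE, 10):6.1f} hours")        # ~222 hours (days)
print(f"1 PB at 100 Gbps:   {transfer_hours(PETABYTE, 100):6.1f} hours")       # ~22 hours (about a day)
print(f"100 TB at 100 Gbps: {transfer_hours(100 * TERABYTE, 100):6.1f} hours") # ~2.2 hours
```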

Getting the Necessary Technology in Place

To transfer the data gathered from experiments in the lab, petabyte-sized workloads are moved from a variety of mass storage systems to 2U-24 NVMe* DTNs designed by 2CRSI. These devices are powered by Intel® Xeon® E5-2600 v4 processors operating at up to 160W and contain up to 24 Intel® PCIe*-based SSDs. The extremely low latency of Intel® SSDs allows the network to transfer data at speeds of 100-200 gigabits per second (Gbps).2 On the receiving end of the network are similarly configured DTNs (Figure 2). Together, these DTNs serve as buffers that allow the network to be saturated with a burst of data. Bursting the data across the network as fast as possible frees the network for other uses and applications, but to do this effectively, DTN performance has to be similar on both ends of the link. This is where the Intel and 2CRSI DTN solution comes into play.
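As a rough check on why a chassis full of NVMe drives can keep a 100-200 Gbps link busy, the sketch below assumes on the order of 2 GB/s of sequential throughput per PCIe/NVMe SSD (a ballpark figure for drives of this class, not a number taken from the case study) and compares the aggregate against the link rate. In practice, PCIe lane counts, CPU utilization, and software overhead cap the usable total well below this raw sum, as the Caltech scaling results later in this study show.

```python
# Ballpark comparison of raw flash bandwidth vs. network line rate.
# PER_DRIVE_GBPS is an assumed figure, not a measurement from this study.

PER_DRIVE_GBPS = 2.0 * 8   # ~2 GB/s sequential per NVMe SSD, expressed in Gbps
DRIVES_PER_DTN = 24        # 2CRSI 2U-24 NVMe chassis

aggregate_gbps = PER_DRIVE_GBPS * DRIVES_PER_DTN
for link_gbps in (100, 200):
    headroom = aggregate_gbps / link_gbps
    print(f"{DRIVES_PER_DTN} drives -> ~{aggregate_gbps:.0f} Gbps raw flash bandwidth "
          f"vs. a {link_gbps} Gbps link ({headroom:.1f}x headroom)")
```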


Figure 1: Pacific Research Platform: UC San Diego and UC Berkeley lead creation of the West Coast big data freeway system.


The PCIe-based SSDs overcome SAS/SATA SSD performance limitations by optimizing hardware and software to take full advantage of PCIe/NVMe technology, delivering extreme data throughput directly to the Intel® Xeon® processors with up to 6x faster data transfer speeds than 6 Gbps SAS/SATA SSDs.3
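The “up to 6x” comparison in footnote 3 is based on data-sheet sequential throughput. The figures below are approximate published numbers used only to illustrate where a ratio in that range comes from; they are not new measurements.

```python
# Approximate data-sheet sequential-read figures (illustrative only).
nvme_seq_read_mb_s = 2800  # Intel SSD DC P3700 class, 128K sequential read (approx.)
sas_sata_6g_mb_s = 500     # ~6 Gbps interface after encoding and protocol overhead (approx.)

ratio = nvme_seq_read_mb_s / sas_sata_6g_mb_s
print(f"PCIe/NVMe vs. 6 Gbps SAS/SATA: roughly {ratio:.1f}x higher sequential throughput")
```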

“When transferring data across a network, [one challenge is] latency, and getting the CPU closer to the storage is the way to solve latency,” said John Graham, University of California, San Diego and Pacific Research Platform technical advisor. Focusing on high-performance storage, networking, and computing, French company 2CRSI builds the low-latency DTN devices the PRP is using in its network. In 2015, members of the PRP from the University of California, San Diego approached 2CRSI about its 2U-24 NVMe drive servers. These were designed to handle 100 Gb network traffic, and with the Intel® SSD Data Center Family for PCIe with NVMe, which provides improved scalability over traditional storage along with low latency, they promised a very low annual failure rate. Equipped with Intel® SSD DC P3700 Series drives and Intel Xeon processors, 2CRSI’s sample machines were provided to the University of California, San Diego for testing.

“The physical setup of the 2CRSI servers started in January,” explained Harry Parks, 2CRSI North American Sales Manager. “Phase One was to test the storage side, to make sure that the storage was operating at the 100 Gbps line rate, which it did, beautifully.” The 2CRSI servers used Intel Xeon E5-2600 v4 chips running at up to 160 watts, with storage provided by the Intel® SSD DC P3700 Series. The SSDs delivered up to 100-200 Gbps disk-to-disk performance between DTNs. These high-performance servers can load data into an all-flash storage array and then push it point-to-point across the network at previously unachievable speeds.
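As a minimal sketch of what a “Phase One” storage check looks like in principle, the snippet below times a large sequential read from a single NVMe device and reports the achieved rate. The device path and sizes are placeholders, the run needs appropriate privileges, and a single buffered Python read loop will not itself reach 100 Gbps; saturating the line rate required many drives read in parallel with purpose-built benchmarking tools. The sketch only illustrates how the Gbps figure is derived.

```python
# Time a large sequential read from one NVMe device and report the rate.
# DEVICE is a placeholder; run against a scratch device only, with root
# privileges. Repeat runs may be inflated by the page cache since this
# simple sketch does not use O_DIRECT.

import os
import time

DEVICE = "/dev/nvme0n1"        # hypothetical device path
CHUNK = 8 * 1024 * 1024        # 8 MiB per read
TOTAL = 8 * 1024 ** 3          # stop after 8 GiB

fd = os.open(DEVICE, os.O_RDONLY)
read_bytes = 0
start = time.monotonic()
while read_bytes < TOTAL:
    buf = os.read(fd, CHUNK)
    if not buf:                # end of device
        break
    read_bytes += len(buf)
elapsed = time.monotonic() - start
os.close(fd)

gbps = read_bytes * 8 / elapsed / 1e9
print(f"Read {read_bytes / 1024 ** 3:.1f} GiB in {elapsed:.1f} s -> {gbps:.1f} Gbps")
```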

Figure 2: Science DMZ / Data Transfer Nodes - Requirements 2016-2020.

Figure 3: 2CRSI Intel® NVMe 12-drive benchmark, results achieved with Intel® SSDs. Testing of 4, 8, and 12 Intel® SSDs at Caltech showed performance increasing with the addition of drives.


“Phase Two was to do the line-rate testing over the long distance LAN. This was a little trickier because the LANs themselves aren’t always operating as they should. Usually it’s not an issue with the [servers] themselves but between various points on the LAN across the network,” Parks continued.

Scaling to the Demands of Science

Azher Mughal is a network engineer at Caltech, where he manages the US CMS (Compact Muon Solenoid) Tier2 data center. He and his team have been involved in the PRP network design and in testing the scalability and performance of the DTNs. His tests have shown incredible performance gains as Intel SSD DC P3700 devices are added to the 2CRSI chassis: “With four drives, the chassis performs at between six and seven gigabytes per second (GB/s), but when you add four more drives, the performance doubles. With four more drives you’ll see another increase. As you add drives, the performance increases.” (See Figures 3 and 4.)
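A toy model helps connect Mughal’s observation to the plateau shown later in Figure 6: assume each added drive contributes a fixed amount of throughput until a platform-level ceiling is reached. The per-drive figure and the ceiling below are assumptions chosen to be roughly consistent with the reported numbers, not measured values, and the real curve bends earlier because of processor utilization and application overhead.

```python
# Toy scaling model: linear growth per drive up to a platform-level ceiling.
# Both constants are assumptions, not measurements from this case study.

PER_DRIVE_GB_S = 1.65     # effective GB/s contributed per drive in this workload (assumed)
PLATFORM_CAP_GB_S = 16.0  # roughly one PCIe 3.0 x16 path: 16 lanes x 8 GT/s ~ 128 Gbps raw

for drives in (4, 8, 12, 14, 24):
    modeled = min(drives * PER_DRIVE_GB_S, PLATFORM_CAP_GB_S)
    print(f"{drives:2d} drives -> ~{modeled:4.1f} GB/s")
```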

Mughal tested various systems and found that others just didn’t scale the way the 2CRSI chassis did when additional drives were added. “To get this level of performance you need to do some system designing. Some servers just don’t scale,” he said. “You can’t just buy any NVMe and pop it into a system and expect that it will give the same performance.”

In a high-performance server environment, heat is a formidable challenge. In the test environment, Mughal reported a nearly 20% heat-related reduction in performance (see Figure 5). The 2CRSI 2U-24 NVMe design mitigates this by locating the fans at the front of the system, where they draw cool air across the drives first and then back through the system, allowing the NVMe drives to work at peak efficiency. With the efficient 2CRSI design, Mughal reported not needing additional fans.

“I’m a big fan of the Intel SSD DC P3700 drives. I have been using them for the past three years and so far all of the drives are running [as expected]. I’m not seeing any physical or technical issues with them at all,” Mughal said. “They are really rock-solid.” Even in the demanding test environment where the SSDs were moved from system to system, he hasn’t experienced any issues.

Partners and a Commitment to Sharing Information

While the technical challenges of building a high-performance data transfer system have proven formidable, another challenge to the program was cultural. The partners in the PRP project include research institutions and universities that traditionally compete with one another for the government funding that enables them to continue their research, and this competition has historically limited collaboration. To bridge the gap, Larry Smarr, director of the California Institute for Telecommunications and Information Technology (Calit2) and leader of the PRP, met with the participating organizational leaders (the CIOs, the heads of networking, and the university presidents) to get them to sign letters of commitment and agree to share networking expertise and solutions to issues. Smarr was able to get highly competitive researchers to work hand in hand and share the information and resources they needed for the supernetwork to be successful.

Figure 4: Source: 2CRSI. Hardware configuration: 2U rackmount using Intel® Xeon® processors E5-2600 v4, Intel® Server Board S2600WT with dual 10 GbE, and 12 Intel® SSD DC P3700 Series drives at 2TB each.

Figure 5: Heat-related performance reduction observed in testing.




“There’s never been this transition area where the scientists and the networking teams talked. Now we’re nimble and very social,” said Graham. “If we have issues, we get on our Friday call and bang them out. This is really valuable to the smaller universities in the group, like University of California, Merced and University of California, Riverside, that don’t have the [resources] of University of California, Los Angeles or Stanford.” This collaboration and cooperation have made problem solving more efficient, helped resolve bugs quickly, and fostered deeper relationships and trust across disciplines and organizations. “As a community, we’re helping debug their internal networks, not just to their door, but all the way into their scientists’ offices. CENIC, ESnet, University of California, San Diego, Caltech: the whole group comes together. [Between all of us] we know the answers, it’s just that they’re stuck in a bunch of different people’s minds,” Graham said.

The Outcome and the Future

As the network continues to grow in efficiency and move out of test mode to become a fully operational network, Graham looks to the not-too-distant future. “The most value that has come out of this project is the social network that formed around the data science pipeline. We’ve got people from NASA, we’ve got people from all over the world participating. The Pacific Research Platform has quickly morphed into the Global Research Platform, which will probably get funded in the next year or so. We now have international groups participating; researchers all around the world want to [participate] because it’s the biggest 100 gigabit test network on the planet. And it’s beyond just a test network. It’s a production network. We can now do real science across this,” Graham said. The openness that started with collaboration on shared infrastructure could lead to deeper collaboration among scientists and researchers in solving the most challenging problems facing us today.

Parks continued, “Data is growing exponentially. The size is growing out of control. Right now we’re worrying about how to move terabytes. Very soon, within two or three years, we’ll be moving petabytes at a time. By 2020 it will be multiple petabytes. Intel and 2CRSI are helping create the architecture for these data-transfer machines that will eventually, 10 years from now or something, be moving exabyte-scale data.” To transfer data more quickly, the DTNs need a higher-capacity buffer, which can be achieved by increasing the capacity and number of SSDs.

Figure 6: Maximum throughput is reached at 14 drives (7 drives per processor), a limit set by the combination of a single PCIe x16 bus (128 Gbps), processor utilization, and application overheads.4 Source: Caltech.

1. Comments made by the Pacific Research Platform project, http://citris-uc.org/connected-communities/project/pacific-research-platform-uc-san-diego-uc-berkeley-lead-creation-of-west-coast-big-data-freeway-system/
2. Tests were run by Caltech. Intel does not control or audit third-party benchmark data.
3. Results measured by Intel based on the following configurations. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Configurations: performance claims obtained from data sheets, sequential read/write at 128K block size for PCIe/NVMe and SATA, 64K for SAS. Intel® SSD DC P3700 Series 2TB, SAS Ultrastar® SSD1600MM.
4. 2CRSI hardware configuration: 2U rackmount using Intel® Xeon® processors E5-2600 v4, Intel® Server Board S2600WT with dual 10 GbE, 12 Intel® SSD DC P3700 Series drives at 2TB each.