Huawei HPC Solution for the Education Industry Promotional Theme Slides
Contents
1 Trends and Challenges
2 Huawei Solution
3 Success Stories
The HPC market has grown steadily since 2009. The market is estimated to reach US$23.6 billion in 2014 and US$30.2 billion in 2017, with a compound annual growth rate (CAGR) of 8.3% over the next five years. Global demand for HPC servers stems primarily from government supercomputing, education and scientific research (over 40%), biological sciences, and CAE emulation.
Source: IDC
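As a rough cross-check of the growth figures above, the following sketch computes the compound annual growth rate implied by two market-size data points. The function name `cagr` is illustrative. Applying it to the 2014 and 2017 estimates is an assumption: the 8.3% quoted from IDC covers a five-year window whose endpoints are not given here, so the two rates need not match exactly.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market estimates quoted above, in billions of US dollars.
rate_2014_2017 = cagr(23.6, 30.2, 2017 - 2014)
print(f"Implied CAGR 2014-2017: {rate_2014_2017:.1%}")
```

The three-year rate implied by the two quoted estimates comes out in the same range as the five-year figure.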
HPC Market Development Trend
HPC market potential
Share of TOP500 systems by technology (data source: TOP500):
• CPU: Intel x86 86%, AMD x86 8%, others 6%
• OS: Linux 97%, others 3%
• GPU: CPU only 88%, CPU+GPGPU 12%
• Interconnect network: IB 44%, GE 40%, others 16%
• System architecture: cluster 86%, MPP 14%
Cluster, x86, Linux, and high-speed networks are the dominant technology trends, and the use of GPGPUs for acceleration is an emerging trend.
HPC Technology Development Trend
• Meteorological science
• Satellite mapping
• Energy exploration
• Aerospace
• Basic science research
• Life sciences
• Automotive electronics
• Molecular dynamics
HPC Applications
HPC Resource Requirements of Typical Applications
IT resource consumption varies by HPC application. Configure the resources of an HPC system for its application scenario:
• Computing-intensive applications require high-frequency CPUs.
• Memory-intensive applications require high-bandwidth, large-capacity memory.
• Network-intensive applications require IB networks.
• I/O-intensive applications require high-bandwidth, large-capacity parallel storage systems.
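The four sizing rules above can be captured as a simple lookup. This is a minimal sketch: the dictionary keys, the function name `recommend`, and the example workload are hypothetical names chosen for illustration, not part of any Huawei tool.

```python
# Hardware emphasis per workload class, taken from the four rules above.
RESOURCE_PRIORITIES = {
    "computing-intensive": "high-frequency CPUs",
    "memory-intensive": "high-bandwidth, large-capacity memory",
    "network-intensive": "IB (InfiniBand) network",
    "io-intensive": "high-bandwidth, large-capacity parallel storage",
}

def recommend(workload_classes):
    """Return the hardware emphasis for each workload class, in input order."""
    unknown = [w for w in workload_classes if w not in RESOURCE_PRIORITIES]
    if unknown:
        raise KeyError(f"unknown workload class(es): {unknown}")
    return [RESOURCE_PRIORITIES[w] for w in workload_classes]

# Hypothetical application that stresses both memory and the interconnect.
for item in recommend(["memory-intensive", "network-intensive"]):
    print(item)
```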
[Figure: resource demand profiles (scalability, network, storage, memory bandwidth, memory capacity) of typical applications: computational chemistry, computational physics, material science, drug design, biological information, molecular dynamics, environmental sciences, and fluid mechanics.]
HPC resource requirements are classified into the following types by hardware:
• CPU (computing-intensive)
• Memory (memory-intensive)
• Storage (I/O-intensive)
• Interconnect (network-intensive)
Meeting diverse computing requirements and building a high-scalability HPC system
Challenges Facing HPC Development
Deployment and expansion are complex. It is difficult to reuse the existing infrastructure.
How to reduce the power consumption of high-performance computers has become a top-priority issue.
The electricity consumption of top computers equals the daily electricity consumption of a medium city. Only 167 of the top 500 computers are listed on Green Top 500.
"The previous supercomputing platform was constructed in several phases. System deployment and capacity expansion are complex and time-consuming."
— Director of the supercomputing center in a European university
Management is difficult. An intuitive management and process definition tool is required.
Snowballing application data volumes demand ever-higher computing performance.
"Gene sequencing has complex procedures with many branches. A new computing task is time-consuming."
— Research staff from a Chinese biological gene company
"Our rendering services are growing rapidly, so we require more and more computing resources."
— Executive from an American media production company
Challenges
Huawei Solution
Huawei HPC Strategy
• Unified management: converged service management platform (application integration, service scheduling, converged management)
• High energy efficiency: energy-saving products and solutions (liquid cooling, energy-saving server)
• Rapid delivery: all-in-one solution (all-in-rack, all-in-room)
• High I/O performance: SSD card, GPU
Huawei is dedicated to providing all-in-one HPC solutions that offer high performance, high efficiency, easy management, and high scalability.
Huawei HPC Solution Portfolio
Industry applications: CAE/CFD, life sciences, aerospace, animation rendering, meteorological environment, physics & chemistry
Service platforms and cluster management: FusionCluster, CHESS, PBS Works, Platform, JH Scheduler, cloud rendering, Bright
System environment:
• OS: Windows, Linux
• Computing environment: parallel file system, parallel environment, system deployment, customized development, backup and restoration, compiling and development environment
Hardware resources:
• Computing: blade server, rack server, GPGPU, Phi
• Storage: cabinet storage, rack storage, solid-state storage
• Network: GE switch, IB/10GE switch
Infrastructure: container data center, modular data center
Cabinets: computing cabinet, storage cabinet, GPU cabinet, fat node cabinet, network cabinet, input PDF, output PDF, UPS
Physical Structure of the HPC Solution
Diagram components: user, login nodes, management node, compute nodes, parallel file system, GE switch, FDR IB switch
Networks: management network, computing network, storage network
Computing Solution: E9000 Blade Server
• Simple: full convergence of computing, storage, and network resources; rapid deployment (the deployment cycle is shortened from weeks to days)
• Efficient: improves computing density by 66%
• Cost-effective: supports three generations of processors and network evolution over the next 10 years
Node types: fat node, storage node, compute node, GPU node
E9000 compute node models: CH121, CH140, CH220, CH221, CH222, CH240, CH242
Switch modules:
• CX110 (GE switch module): uplink ports: 12 x GE and 4 x 10GE; downlink ports: 32 x GE
• CX116 (GE pass-through module): uplink ports: 32 x GE; downlink ports: 32 x GE
• CX310 (10GE converged switch module): uplink ports: 16 x 10GE; downlink ports: 32 x 10GE
• CX311 (10GE/FCoE converged switch module): uplink ports: 16 x 10GE and 8 x 8G FC; downlink ports: 32 x 10GE
• CX317 (10GE pass-through module): uplink ports: 32 x 10GE; downlink ports: 32 x 10GE
• CX610 (QDR/FDR switch module): uplink ports: 18 x QDR/FDR IB; downlink ports: 16 x QDR/FDR IB
• CX911 (10GE/FC multi-plane switch module): uplink ports: 16 x 10GE and 8 x 8G FC; downlink ports: 32 x 10GE and 16 x 8G FC
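The uplink and downlink port counts above determine how heavily a switch module is oversubscribed. The sketch below computes the ratio of downlink to uplink bandwidth for two of the listed port configurations, counting Ethernet ports only. The function names and the nominal 1 Gbit/s and 10 Gbit/s port speeds are assumptions made for illustration, not figures from the slides.

```python
from fractions import Fraction

# Nominal Ethernet port speeds in Gbit/s (assumed line rates).
PORT_SPEED_GBPS = {"GE": 1, "10GE": 10}

def total_bandwidth(ports: dict) -> int:
    """Sum nominal bandwidth for a mapping of port type -> port count."""
    return sum(PORT_SPEED_GBPS[kind] * count for kind, count in ports.items())

def oversubscription(downlink: dict, uplink: dict) -> Fraction:
    """Downlink-to-uplink bandwidth ratio; 1 or less means no oversubscription."""
    return Fraction(total_bandwidth(downlink), total_bandwidth(uplink))

# 10GE converged switch module: 32 x 10GE down, 16 x 10GE up.
print(oversubscription({"10GE": 32}, {"10GE": 16}))          # 2
# GE switch module: 32 x GE down, 12 x GE and 4 x 10GE up.
print(oversubscription({"GE": 32}, {"GE": 12, "10GE": 4}))   # 8/13
```

A ratio of 2 means the downlinks can offer twice the traffic the uplinks can carry. The pass-through modules, with equal port counts on both sides, come out at exactly 1.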
E9000 Features
• Computing: 7 types of compute nodes with E5-2600/E5-4600/E7-4800 CPUs
• Storage: cost-effective distributed storage solution, ideal for big data applications
• Network: 9 types of switch modules: GE/10GE/FCoE/FC/IB QDR/FDR
• Management: eSight provides unified management, improving management efficiency by 70%.
1. Industry-leading computing density
Supports 64 Romley EP 130 W CPUs per chassis, the highest computing density in the industry. Delivers computing performance of up to 16.5 TFLOPS per chassis.
2. Industry-leading storage density
Supports 15 x 2.5-inch hard disks per CH222 (full-width node) and 120 x 2.5-inch internal hard disks per chassis, the largest internal storage capacity per chassis.
3. Industry-leading switching capability
E9000 switch modules use industry-leading Huawei DC switch technologies. Supports a backplane switching capability of 15.6 Tbit/s and network evolution from 10GE to 40GE and 100GE. Supports up to 128 x 10GE uplink ports per chassis. Supports various port types, such as Ethernet, IB, and FC.
4. Industry-leading reliability
Operates stably at 40°C (104°F). Has a failure rate 15% lower than that of competitors' products.
5. Industry-leading energy efficiency
Ranked No. 1 in competitive tests for China Mobile and China Unicom projects. Has passed the China Energy Conservation and Environmentally-Friendly Certification.
6. Simplified O&M
Supports plug-and-play for boards and cards and requires no configuration for new devices. Supports automatic parameter configuration and transfer.
Storage Solution: Distributed, Parallel Storage System
Solution 1: Distributed NAS
Diagram components: compute node cluster; IB/10GE/GE network; OceanStor 9000 (storage node cluster); distributed file system
• Excellent performance: 800 MB/s bandwidth per node, up to 200 GB/s bandwidth per system (288 nodes), linear performance enhancement
• Large capacity: 40 PB capacity per file system
• High reliability: N+M data protection mechanism, tolerating the failure of up to four nodes
• Elastic scalability: on-demand deployment and expansion; node addition within 1 minute
Solution 2: SAN + parallel file system (Lustre)
Diagram components: compute node cluster; IB/10GE network; I/O node cluster (MDS and OSS nodes); Lustre parallel distributed file system; SAN (FC, IP/IB); T series storage
• Excellent performance: T series unified storage and Lustre file system; up to 96 GB/s bandwidth per cabinet
• High reliability: rapid rebuilding with the RAID 2.0+ virtualization technology; end-to-end reliability design
• Elastic scalability: scalable to 16 controllers; capacity expansion and performance improvement based on service growth
Solution Highlights
• High performance: 600 MB/s bandwidth per node and 200 GB/s total bandwidth
• Large capacity: linear expansion from 3 to 288 nodes, with up to 40 PB storage capacity per file system
• Easy to use: easy to manage and to integrate with the legacy network because the storage system supports both IB and 10GE networks
• On-demand allocation: dynamically allocates storage resources to different computing clusters
Key Technologies
• A fully symmetrical distributed architecture without any individual metadata node eliminates the performance bottleneck.
• Up to 55 TB global cache effectively improves performance and shortens response time.
• Automated storage tiering and intelligent load balancing maximize resource utilization.
• Space quota management implements flexible space allocation.
• The index search technology rapidly retrieves and analyzes numerous files.
Diagram components: compute head node; compute node queues 1 to 3; core switch (10GE or IB switch); GE switch; 10GE or IB switch; OceanStor 9000 with a high-performance storage resource pool and a large-capacity storage resource pool; data collection; user access and query
OceanStor 9000 Features
Network Solution: External Switches
Flagship core switches: CE12816, CE12812, CE12808, CE12804
High-performance TOR switches: CE6850-48S4Q-EI, CE6850-48T4Q-EI, CE5850-48T4S2Q-EI, CE5810-24T4S-EI, CE5810-48T4S-EI
Networks: 10GE network, IB network
12800 Switch Features
Elastic cloud engine
• High-speed line cards: 12 x 100GE and 24 x 40GE
• 64 TB/s total capacity and 4 TB/s bandwidth per slot
• Connected to up to 18,000+ 10GE servers
• 18 GB cache for handling surging traffic
Agile cloud engine
• OPS provides open APIs to program networks.
• ENP is supported for in-depth programming of the forwarding plane.
• nCenter supports agile VM deployment.
• ZTP uses Python to implement zero-touch network deployment and configuration.
Virtual cloud engine
• VS: 1-to-16 virtualization (industry-leading) and core multiplexing
• CSS/SVF: many-to-one virtualization for simplified O&M
• TRILL: large L2 network for easy VM migration
• EVN: cross-DC network virtualization
Quality cloud engine
• Industry-leading orthogonal architecture
• Patented front-to-rear air channels and separation of hot and cold air channels
• Low latency (2 μs) and efficient forwarding
• Hot standby of five hardware systems
Core switches: CE12816, CE12812, CE12808, CE12804
10GE TOR switches: CE6850-48S4Q-EI, CE6850-48T4Q-EI, CE6810-48S4Q-EI
GE TOR switches: CE5850-48T4S2Q-EI, CE5850-48T4S2Q-HI, CE5810-24T4S-EI, CE5810-48T4S-EI
40GE switch: CE7850-32Q-EI
Accelerates agile innovation of cloud services.
Supports four generations of servers in a 10-year lifecycle.
Improves ICT resource utilization.
Provides high-value data center services.
Typical 10GE Network Solution
Nodes:
• Thin nodes: 32 CH140s
• Thin nodes: 16 CH121s
• GPU nodes: 8 CH221s
• Fat nodes: 8 CH240s
• Storage nodes: 8 CH222s
• Management/login/I/O node: CH121
• Storage: OceanStor 9000 (frontend IB); N9000 internal frontend 10GE interface module
Networks:
• Computing network: 10GE switch; internal switch CX310 or pass-through module CX317
• System management network: S5748
• Hardware management network: S5748
Design:
• The computing network and data network are integrated, and 10GE networking is adopted.
• Internal pass-through modules or 10GE switches and external 10GE switches form a high-speed non-blocking network.
Infrastructure Solution: Modular Equipment Room
Three core modules + two auxiliary systems
Equipment room sizes:
• Mini (< 20 m2): small-sized enterprises; deployment scale: 1-5 server cabinets
• Small (30-100 m2): small- and medium-sized enterprises; deployment scale: 6-28 server cabinets
• Medium and large (100-2000+ m2): medium- and large-sized enterprises; deployment scale: 28-1000+ server cabinets
Benefits:
• Higher power density: up to 21 kW/cabinet
• Energy-saving data center: PUE of 1.25-1.6 (optional)
• Simplified construction, rapid deployment: standard construction takes only 8-12 weeks
• Replicable, modular design and expansion: investment upon service growth
• Lower construction cost: reduces initial investment by 25%
• Refined management, green operation, maximizing ROI
Modular design, flexible combination, and on-demand deployment
Components: management system, cooling system, cabinet system, power supply and distribution system, cabling system, fire extinguishing system, end door, skylight
• Adaptation to various scenarios, maximum space utilization: The equipment room supports single-row or dual-row, open or contained, hot or cold aisles. A single module supports up to 36 IT cabinets and requires a minimum height of 2.6 m. Maximum nameplate power per cabinet: 21 kW; maximum nameplate power per module: 200 kW.
• All-in-room design: Onsite installation can be completed within 1 week, improving deployment efficiency by 50%.
• Modular architecture and high energy efficiency: The component-level modular architecture allows on-demand, elastic expansion, cutting initial investment by 30%. In-row air conditioners, modular UPSs, aisle containment, and integrated PDFs reduce PUE to less than 1.5.
• Intelligent management system for real-time monitoring: The NetEco intelligent management system implements real-time monitoring.
Modular Equipment Room Features
Overall Software Architecture
Modular design, high scalability, and ease of upgrades
• Access: web portal/CLI (HTTP/SSH), web service interface (HTTP)
• Functional modules: job scheduling, cluster management, cluster monitoring, alarm management, power consumption management, user management, cluster security
• Tools and libraries: installation, parallel library, math library, compiler, commissioning tool
• Operating systems: SLES 11.x, CentOS 5.x/6.x, RHEL 5.x/6.x
• Industry applications: biological pharmacy, CAE emulation, animation rendering, oil exploration
Pre-integrated industry applications and centralized scheduling management
Simplified maintenance
Cluster Management Software: BCM Features
• Easy to manage: supports rapid installation within 1 hour, provides a GUI for management and monitoring, and integrates with various parallel application environments.
• Easy to expand: uses a single lightweight management process to support over 1000 nodes per cluster and integrated management of multiple clusters.
• Comprehensive functions: provides all functions required by users or administrators, and supports GPU management, job scheduling integration, and cloud computing.
Typical Solution 1: All-in-Chassis
Component                        Product Model   Quantity
Chassis                          E9000           1
Management node & compute node   CH140           8
GPGPU node                       CH221           2
Fat node                         CH240           1
Storage node                     CH222           1
• All-in-chassis solution: high convergence of high-density computing, large-capacity storage, and high-bandwidth network resources
• Main features: computing capability: 10.6 TFLOPS; storage capacity: 13.5 TB; maximum power consumption: 10 kW
• Main scenario: small-scale parallel computing for key labs of specific majors in universities
Typical Solution 2: All-in-Rack
Component                              Product Model   Quantity
Rack                                   Standard rack   1
Computing device                       E9000           2
                                       CH140           32
Management/monitoring/storage server   RH1288 V2       6 (management: 1, monitoring: 1, storage: 4)
Storage device                         S2600T          1
• All-in-rack HPC solution: high integration of infrastructure and IT devices in a rack, with plug-and-play deployment
• Main features: computing capability: 33 TFLOPS; storage capacity: 144 TB; maximum power consumption: 25 kW
• Main scenario: medium-scale parallel computing for specific majors of colleges in universities
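To compare the all-in-chassis and all-in-rack configurations in energy terms, one can divide the quoted computing capability by the quoted maximum power consumption. The sketch below does this with the figures given for Typical Solutions 1 and 2. The function name is illustrative, and because both inputs are peak or maximum values, the result is only a rough indicator rather than a measured efficiency.

```python
def gflops_per_watt(tflops: float, max_power_kw: float) -> float:
    """Peak GFLOPS per watt from peak TFLOPS and maximum power in kW."""
    if max_power_kw <= 0:
        raise ValueError("max_power_kw must be positive")
    # 1 TFLOPS = 1000 GFLOPS and 1 kW = 1000 W, so the factors cancel.
    return tflops / max_power_kw

solutions = {
    "all-in-chassis": (10.6, 10.0),   # TFLOPS, kW (Typical Solution 1)
    "all-in-rack": (33.0, 25.0),      # TFLOPS, kW (Typical Solution 2)
}
for name, (tflops, kw) in solutions.items():
    print(f"{name}: {gflops_per_watt(tflops, kw):.2f} GFLOPS/W")
```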
Typical Solution 3: All-in-Room
Medium-sized modular equipment room
• 22 42U standard cabinets, scalable to 28 cabinets
• Dual-row contained cold aisles, with a minimum power usage effectiveness (PUE) of 1.3
• Total IT nameplate power consumption of the equipment room: 133 kW
[Floor plan: input PDF, output PDF, battery, in-row air conditioners, computing cabinets, storage cabinet, GPU cabinet, fat node cabinet, and network cabinet arranged along a contained cold aisle.]
• All-in-room solution: modular design for rapid construction, on-demand deployment, high energy efficiency, and smart management
• Main features: computing capability: 187 TFLOPS @ 322 nodes (6 computing cabinets, 1 GPU cabinet, and 1 fat node cabinet); storage capacity: 1 PB; bandwidth: over 10 GB/s (16 OceanStor 9000 nodes in 2 cabinets)
• Main scenario: large-scale parallel computing for comprehensive majors of the campus computer center
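PUE is the ratio of total facility power to IT equipment power, so the figures quoted for this equipment room imply an estimate of total facility draw. The sketch below applies that definition to the 133 kW IT nameplate power and the minimum PUE of 1.3 stated for this solution. The function names are illustrative, and nameplate power overstates typical operating load, so the result is an estimate only.

```python
def total_facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power implied by IT power and PUE (PUE = total / IT)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_kw * pue

def overhead_power_kw(it_power_kw: float, pue: float) -> float:
    """Power used by cooling, power distribution, and other non-IT loads."""
    return total_facility_power_kw(it_power_kw, pue) - it_power_kw

it_kw, pue = 133.0, 1.3   # figures quoted for the all-in-room solution
print(f"Total facility power: {total_facility_power_kw(it_kw, pue):.1f} kW")
print(f"Non-IT overhead: {overhead_power_kw(it_kw, pue):.1f} kW")
```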
[Floor plan, continued: in-row air conditioner, UPS, computing cabinet; room dimensions: 6600 mm x 3600 mm.]
Huawei HPC Solution Highlights
• Computing: computing capability per chassis of 16.5 TFLOPS
• Storage: throughput of 200 GB/s
• Network: high-speed interconnect over IB FDR
• Management: unified management and evolution to the cloud platform
• Advanced converged architecture provides optimal performance.
• Board-level to system-level energy conservation reduces power consumption by 40%.
• The management GUI improves management efficiency by 50%.
• Modular design enables on-demand, elastic expansion.
Success Stories
• University of Nebraska-Lincoln
• University of Tennessee, Knoxville
• Newcastle University in the U.K. (Phases I and II)
• Universidad de Burgos in Spain
• Deltares Institute in the Netherlands
• Utility Association Eilenburg-Wurzen in Germany
• School of Computer Science and Engineering, Beihang University
• Beijing Jiaotong University
• Beijing Post and Telecommunications Research Institute
• Capital Medical University
• Institute of Disaster Prevention Science and Technology in Hebei
• School of Psychology, Southwest University
• Istanbul Technical University in Turkey
• Yildiz Technical University
• Turkish Academic Network and Information Center
• Mackenzie University in Brazil
Extensive Deployment and Service Experience for HPC Solutions (Education Industry)
ULAKBIM: HPC Platform
Challenges
• With the explosive growth of computing data, the existing HPC platform could not meet user requirements.
• Low network bandwidth decreased computing efficiency and degraded service quality.
• The previous data center had poor heat dissipation and power supply capabilities and could not be easily expanded.
Huawei Solution
• Huawei RH1288 V2 rack servers expanded the capacity of the legacy HPC platform.
• Compute nodes were interconnected over a 56G IB network to set up a non-blocking, high-speed computing network.
• The legacy heat dissipation and power supply resources were reused for easier capacity expansion.
Customer Benefits
• Improves computing capabilities by four times and boosts service capabilities, addressing ever-increasing user demand.
• Improves computing network capabilities, enhancing computing and user efficiency.
• Fully utilizes existing resources, minimizing capacity expansion costs.
Institute of Disaster Prevention for China's State Seismological Bureau: HPC Solution
Challenges
• Computing data had increased rapidly, and the institute's existing computing capability could not meet requirements.
• Data access was so slow that overall computing performance suffered.
• Computing and storage requirements were increasing constantly, and the system capacity was difficult to expand.
Huawei Solution
• Huawei RH2288 servers were used as compute nodes, providing a peak data processing capability of 20 TFLOPS.
• Huawei's OceanStor was deployed as the hierarchical storage system, with a storage capacity of up to 500 TB.
• The Huawei all-in-one HPC solution supported modular deployment, meeting requirements for flexible capacity expansion.
Customer Benefits
• Improves overall performance by 80%.
• Increases data access speed by 70%.
• Meets capacity expansion requirements for the next 10 years.
"Huawei provides us with a complete end-to-end HPC solution, which features high cost-effectiveness, low power consumption, and high scalability. This solution meets our requirements for real-time acquisition of, rapid access to, and large-scale computing of earthquake warning sign monitoring data."
— Institute of Disaster Prevention
China's National Meteorological Information Center: Integrated, Multi-Service R&D Platform
Challenges
• Long system construction cycle and complex management
• Large IT equipment footprint and high O&M management costs
• Low IT equipment utilization rate and high TCO
Huawei Solution
• A cloud platform was built to satisfy the requirements of multiple business departments in diverse application scenarios.
• Huawei E9000 converged infrastructure blade servers integrated computing, storage, and network resources.
• Huawei FusionSphere centrally monitored and scheduled resources, accelerated deployment, and enabled resource sharing.
Customer Benefits
• Simplifies IT construction and management and accelerates IT deployment, which helps the customer focus on core services.
• Reduces the IT equipment footprint by 60% and overall power consumption by 40%.
• Enables resource sharing to improve the resource utilization rate and cut IT costs by 50%.
Newcastle University in U.K.: HPC System
Application areas: bioinformatics, medicine R&D
[Diagram: Huawei HPC solution integrating computing, storage, and network]
Challenges
• Memory capacity of a single compute node: > 512 GB; overall computing performance: > 88 TFLOPS
• I/O bandwidth: > 1 GB/s
• Multi-phase deployment and easy system expansion
Huawei Solution
• Huawei provided an HPC solution that incorporated the E9000 blade server, S5500T storage device, full 10GE high-speed network, Huawei FusionCluster, and Lustre high-performance parallel storage.
• The all-in-one HPC modular solution will meet future expansion requirements.
Customer Benefits
• Rapid deployment: Integrated computing, network, and storage resources reduce service deployment time to three days.
• Simplified management: Unified management minimizes maintenance workloads.
• Easy capacity expansion: The expansion capability of Huawei's HPC solution will reduce expansion costs and meet expansion requirements for the next five years.
"We are impressed by Huawei's HPC solution that integrates computing, storage, and network resources. Huawei's HPC solution provides us with an easy-to-expand and cost-effective HPC cluster. This is the solution we need."— Jason Bain, Deputy Dean of the IT Infrastructure and Architecture Department, Newcastle University
Molecular dynamics simulation
Turkey YTU: HPC Platform
Yildiz Technical University (YTU), founded in 1911, is one of the seven public universities in Istanbul and one of the city's oldest and most prominent universities. Its predecessor, Yildiz University, was founded in 1882. YTU is dedicated to engineering sciences and has three campuses and more than 17,000 students. YTU wanted to deploy an HPC platform to strengthen its scientific research and to provide HPC resources for enterprises in its science park.
Challenges
• The previous standalone serial computing mode had poor computing capabilities. Simulations were time-consuming and complex computing tasks could not be executed, which hindered the university's research progress.
• Computing and storage resources could not be shared, which resulted in a low resource utilization rate.
Huawei Solution
• 256 RH2288 V2 2-socket rack servers were deployed as computing servers, providing a maximum computing capability of 90 TFLOPS. 10 RH5885H V3 servers and 40 NVIDIA K20X GPGPUs were used as acceleration nodes, providing a maximum computing capability of 57.5 TFLOPS.
• One OceanStor 18000 was deployed as the storage system, providing 300 TB of storage capacity. Six RH2288 V2 servers were deployed to run the commercial Lustre software, providing a high-performance parallel file system.
• A 40 Gbit/s QDR IB network was used to ensure high-speed data communication.
Customer Benefits
• Huawei's HPC platform provides superb computing performance, improving scientific research efficiency by 80%.
• End-to-end provisioning and unified management reduce maintenance costs by 30%.
Istanbul Technical University in Turkey: HPC Platform
Istanbul Technical University (ITU) in Turkey, founded in 1773, is one of the world's oldest universities dedicated to engineering sciences and one of the most prominent educational institutions in Turkey. The university has 25,000 students and 2200 teachers (including 430 professors). It also has five campuses, 12 colleges, five research institutes, and two training institutes in Istanbul. The university undertook an agricultural IT program for Turkey's Ministry of Agriculture. In this program, the university needed to collect and analyze temperature, humidity, and geographical information from different regions using satellites, provide agricultural indicators for farmers, and use an HPC platform to generate reports and predictive information from the massive volume of collected data. To accomplish this, the university needed an efficient, scalable computing and storage platform to meet ever-increasing service demand.
Challenges
• The previous serial computing could not satisfy ever-increasing service demand.
• The previous storage products had poor reliability, performance, and scalability, which resulted in severe data loss.
• The previous manufacturer service was not responsive.
Huawei Solution
• An HPC system with 800 cores was built from RH1288 rack servers and 10GE switches to replace serial computing with parallel computing.
• A high-end storage solution with remote disaster recovery (DR) capability was deployed to fulfill performance and reliability requirements. In this solution, a four-controller OceanStor 18800 was deployed at the primary site in Istanbul, and a dual-controller OceanStor 18800 was deployed at the DR site in Ankara. Data could be remotely replicated between the sites, and hierarchical storage enhanced system performance.
• Huawei provided high-quality local manufacturer service.
Customer Benefits
• Parallel computing improves computing performance by 8 times.
• High performance and scalability meet the storage requirements of 200,000 to 2,500,000 users, and support expansion to 16 controllers to meet service requirements for the next five years.
• The one-stop solution with high data reliability significantly reduces TCO.
University of Tennessee, Knoxville: HPC Platform Storage System
Challenges
• The bandwidth of the existing storage system could not meet HPC system requirements as compute nodes and computing tasks increased constantly.
• 4 GB/s bandwidth was required to support 64 compute nodes and 256 processes.
• The existing file system capacity could not meet HPC application requirements.
Huawei Solution
• Provided the N8300 cluster NAS, consisting of 10 nodes.
• Used an active-active architecture to support parallel access by compute nodes and implement system load balancing; adopted a redundant design to enhance system reliability.
• Deployed two S5600T storage units to provide 1 PB of available space.
Customer Benefits
• Provides up to 4 GB/s bandwidth and 1 PB capacity to satisfy HPC application requirements.
• Supports future online performance improvement and capacity expansion.
• Provides non-stop services around the clock.
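The bandwidth requirement stated in this case (4 GB/s for 64 compute nodes and 256 processes) can be broken down into a per-node and per-process share. The sketch below performs that division. It assumes decimal units (1 GB = 1000 MB) and an even spread of load across nodes and processes, neither of which is stated in the source.

```python
def per_client_bandwidth_mb_s(total_gb_s: float, clients: int) -> float:
    """Even share of aggregate bandwidth per client, in MB/s (1 GB = 1000 MB)."""
    if clients <= 0:
        raise ValueError("clients must be positive")
    return total_gb_s * 1000.0 / clients

total_gb_s = 4.0          # required aggregate bandwidth
nodes, processes = 64, 256

print(f"Per node: {per_client_bandwidth_mb_s(total_gb_s, nodes):.1f} MB/s")
print(f"Per process: {per_client_bandwidth_mb_s(total_gb_s, processes):.3f} MB/s")
```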
The University of Tennessee, Knoxville (UTK), founded in 1794, is a public university funded by the government. This university has comprehensive, high-performance. In its 2012 ranking of universities, U.S. News & World Report ranked UTK 101st among all national universities and 46th among public institutions of higher learning. It covers an area of 3500 hectares, provides 300+ degree programs, has 1400+ employees, and hosts 28,000+ students from 100+ countries around the world.
"We are impressed by Huawei's one-stop solution. It helps resolve various difficulties arising from IT construction. This cost-effective solution not only protects our old buildings but also meets our service demand. This is the solution that we need."--- Brito Pereira, director of Mackenzie University's information center
Mackenzie University in Brazil: Container Data Center
Challenges
• PB-level storage capacity was needed for Mackenzie Television to store 2 million hours of video.
• A unified data storage service was needed for over 30 scientific research and administrative departments to eliminate information silos.
• A container data center was needed to protect old buildings.
Huawei Solution
• Huawei provided a one-stop solution that consisted of the Huawei IDS1000 container data center and Huawei N8000 cluster NAS storage devices.
• Two N8000 devices were deployed. One was used as the video and image storage system, and the other was used as the file sharing center.
Customer Benefits
• The one-stop solution reduces the initial investment by 40%.
• The container data center minimizes equipment footprint, is easy to deploy, and protects old buildings.
• The PUE value decreases to 1.55 and the overall power consumption decreases by 30% compared with the previous data center.
• Centralized data storage and unified management cut TCO by over 20% compared with the previous storage system.
Copyright © 2014 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
HUAWEI ENTERPRISE ICT SOLUTIONS A BETTER WAY