The New Era of Coprocessors in Supercomputing. Marc XAB, M.A. (J. F. Oberlin University Graduate School), Country Manager. 5/07/2013 @ BAH! Oil & Gas, Rio de Janeiro, Brazil. Super Micro Computer Inc., Rua Funchal, 418, São Paulo – SP. www.supermicro.com/brazil. Networking in Rio. Company Overview.
Revenues: FY10 $721M, FY11 $942M, FY12 $1B
Global Footprint: >70 countries, 700 customers, 6,800 SKUs
Production: US, EU, and Asia production facilities
Engineering: 70% of workforce in engineering; SSI member
Market Share: #1 server channel
Corporate Focus: leader in energy-efficient, HPC, and application-optimized systems
San Jose (Headquarters)
Fortune 2012 100 Fastest-Growing Companies
COPROCESSOR: A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by a coprocessor may include floating-point arithmetic, graphics, signal processing, string processing, encryption, or I/O interfacing with peripheral devices.
Math coprocessor – a computer chip that handles floating-point operations and mathematical computations in a computer.
Graphics Processing Unit (GPU) – a separate card that handles graphics rendering and can improve performance in graphics-intensive applications, such as games.
Secure cryptoprocessor – a dedicated computer-on-a-chip or microprocessor for carrying out cryptographic operations, embedded in packaging with multiple physical security measures that give it a degree of tamper resistance.
“Submerged Supermicro Servers Accelerated by GPUs”
Supermicro 1U (single CPU) with two coprocessors. No requirement for room-level cooling. Operates at PUE ~ 1.12. 25 kilowatts per rack is the breakpoint between regular air cooling and submerged liquid cooling.
Case Study – Submerged Liquid Cooling
Cost Efficiency (kW per rack): air cooling vs. submerged liquid cooling, with ~25 kW per rack as the breakpoint.
Removed fans and heat sinks. Used SSDs and an updated BIOS. Reversed the handles.
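The quoted PUE of ~1.12 follows from the standard definition: total facility power divided by IT equipment power. The sketch below checks the figure with illustrative wattages (the 3 kW cooling overhead is an assumption for the example, not a measurement from the deck).

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

# Hypothetical rack: 25 kW of IT load plus an assumed 3 kW of cooling overhead
print(round(pue(25.0 + 3.0, 25.0), 2))  # 1.12
```

A PUE of 1.12 means only 12% of the facility's power goes to anything other than the computing equipment itself, which is why submerged cooling can be attractive above the ~25 kW/rack breakpoint.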
Tesla: 2-3x Faster Every 2 Years (chart: DP GFLOPS per watt, 2008-2014, for T10, Fermi (512 cores), Kepler, and Maxwell (thousands of cores)).
GPU Supercomputer Momentum (chart: number of GPU-accelerated systems in the Top500, 2008-2013; first double-precision GPU and Tesla Fermi launch marked; roughly 4x growth to 52 systems in the June 2012 Top500).
Case Study – PNNL
Expects the supercomputer to rank among the world's 20 fastest machines.
Research for climate and environmental science, chemical processes, biology-based fuels that can replace fossil fuels, new materials for energy applications, etc.
Supermicro FatTwin™ with 2x Intel Xeon Phi ("MIC") 5110P per node
Theoretical peak processing speed of 3.4 petaflops
42 racks / 195,840 cores
1,440 compute nodes with conventional processors and Intel Xeon Phi "MIC" accelerators
128 GB memory per node
FDR InfiniBand network
2.7-petabyte shared parallel file system (60 gigabytes per second read/write)
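The 3.4-petaflop figure is roughly consistent with the node count above. The sketch below estimates it, assuming ~1.011 TFLOPS double-precision peak per Xeon Phi 5110P and a ballpark per-node contribution from the host CPUs (both per-device figures are assumptions, not numbers from the deck).

```python
nodes = 1440
phi_per_node = 2
phi_dp_tflops = 1.011   # assumed DP peak of one Xeon Phi 5110P
cpu_dp_tflops = 0.33    # assumed combined DP peak of the host CPUs per node

phi_total = nodes * phi_per_node * phi_dp_tflops   # ~2.91 PFLOPS from the Phis
cpu_total = nodes * cpu_dp_tflops                  # ~0.48 PFLOPS from the CPUs
peak_pflops = (phi_total + cpu_total) / 1000.0
print(f"{peak_pflops:.2f} PFLOPS")  # close to the quoted 3.4 petaflops
```

Under these assumptions, most of the machine's theoretical peak comes from the coprocessors rather than the conventional processors.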
Programming Paradigm
The Xeon Phi programming model and its optimizations are shared with the standard Intel Xeon processor family, so code tuned for one largely carries over to the other.
CUDA (Compute Unified Device Architecture) – a parallel computing platform and programming model created by NVIDIA. CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA-capable GPUs.
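As a rough illustration of the CUDA programming model, the sketch below emulates its one-thread-per-element kernel pattern in plain Python (real CUDA kernels are C/C++ functions launched over a grid of threads; the SAXPY function and data here are illustrative, not from the deck).

```python
# Plain-Python sketch of the CUDA execution model: a "kernel" runs once per
# thread index, and each thread handles one array element (SAXPY: y = a*x + y).
def saxpy_kernel(i, a, x, y):
    # body that the CUDA thread with index i would execute
    y[i] = a * x[i] + y[i]

a = 2.0
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
for i in range(len(x)):          # stands in for the grid of GPU threads
    saxpy_kernel(i, a, x, y)
print(y)  # [12.0, 24.0, 36.0]
```

The point of the pattern is that each element's update is independent, so on a GPU all the "loop iterations" execute in parallel across thousands of cores.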
The model used by existing CPU-GPU heterogeneous architectures for GPU-to-GPU communication: data travels via the CPU and an InfiniBand (IB) Host Channel Adapter (HCA) and switch, or another proprietary interconnect.
In a TCA cluster, data transfer between cooperating GPUs in separate nodes is enabled by the PEACH2 chip.
Schematic of the PEARL network within a CPU/GPU cluster