Page 1

HPC Achievement and Impact – 2012: a personal perspective

Thomas Sterling

Professor of Informatics and Computing

Chief Scientist and Associate Director, Center for Research in Extreme Scale Technologies (CREST)

Pervasive Technology Institute at Indiana University

Fellow, Sandia National Laboratories CSRI

June 20, 2012

Page 2

HPC Year in Review

• A continuing tradition at ISC
  – (9th year, and still going at it)

• As always, a personal perspective – how I’ve seen it

– Highlights – the big picture
  • But not all the nitty-gritty details, sorry

– Necessarily biased, but not intentionally so

– Iron-oriented, but software too
– Trends and implications for the future

• And a continuing predictor: The Canonical HEC Computer

• Previous Years’ Themes:
  – 2004: “Constructive Continuity”
  – 2005: “High Density Computing”
  – 2006: “Multicore to Petaflops”
  – 2007: “Multicore: the Next Moore’s Law”
  – 2008: “Run-up to Petaflops”
  – 2009: “Year 1 A.P. (After Petaflops)”
  – 2010: “Igniting Exaflops”
  – 2011: “Petaflops: the New Norm”

• This Year’s Theme: ?

Page 3

Trends in Highlight - 2012

• International across the high end
• Reversal of power growth – green screams!
• GPU increases in HPC community
  – Everybody is doing software (apps or systems) for them
• But … resurgence of multi-core
  – IBM BG
  – Intel MIC (oops, I mean Xeon Phi)
• Exaflops takes hold of international planning
• Big data becomes a driving demand
• Commodity Linux clusters still the norm for the common computer

Page 4

HPC Year in Review

• A continuing tradition at ISC
  – (9th year, and still going at it)

• As always, a personal perspective – how I’ve seen it

– Highlights – the big picture
  • But not all the nitty-gritty details, sorry

– Necessarily biased, but not intentionally so

– Iron-oriented, but software too
– Trends and implications for the future

• And a continuing predictor: The Canonical HEC Computer

• Previous Years’ Themes:
  – 2004: “Constructive Continuity”
  – 2005: “High Density Computing”
  – 2006: “Multicore to Petaflops”
  – 2007: “Multicore: the Next Moore’s Law”
  – 2008: “Run-up to Petaflops”
  – 2009: “Year 1 A.P. (After Petaflops)”
  – 2010: “Igniting Exaflops”
  – 2011: “Petaflops: the New Norm”

• This Year’s Theme:
  – 2012: “The Million Core Computer”

Page 5

I. ADVANCEMENTS IN PROCESSOR ARCHITECTURES


Page 6

Processors: Intel

• Intel’s 22 nm processor: Ivy Bridge, a die shrink of the Sandy Bridge microarchitecture, based on tri-gate (3D) transistors

• Die size of 160 mm2 with 1.4 billion transistors

• Intel Xeon Processor E7-8870:
  • 10 cores, 20 threads
  • 112 GFLOPS max turbo single core
  • 2.4 GHz
  • 30 MB cache
  • 130 W max TDP
  • 32 nm
  • Supports up to 4 TB of DDR3 memory

• Next generation 14nm Intel processors are planned for 2013


http://download.intel.com/newsroom/kits/22nm/pdfs/22nm-Details_Presentation.pdf
http://www.intel.com/content/www/us/en/processor-comparison/processor-specifications.html?proc=53580

http://download.intel.com/support/processors/xeon/sb/xeon_E7-8800.pdf

Page 7

Processors: AMD

• AMD’s 32 nm processor: Trinity
  • Die size of 246 mm2 with 1.3 billion transistors

• AMD Opteron 6200 Series:
  • 16 cores
  • 2.7 GHz
  • 32 MB cache
  • 140 W max TDP
  • 32 nm
  • 239.1 GFLOPS Linpack benchmark with 2 × AMD Opteron 6276 processors (2P)


http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope

http://www.amd.com/us/Documents/Opteron_6000_QRG.pdf

Page 8

Processors: IBM


• IBM Power7
  – Released in 2010
  – The main processor in the PERCS system

• Power7 Processor
  – 45 nm SOI process, 567 mm2, 1.2 billion transistors
  – 3.0-4.25 GHz clock speed, 4 chips per quad module
  – 4, 6, or 8 cores per chip; 64 kB L1 and 256 kB L2 per core; 32 MB shared L3 cache
  – Max 33.12 GFLOPS per core, 264.96 GFLOPS per chip

• IBM Power8
  – Successor to Power7, currently under development
  – Anticipated improvements: SMT, reliability, larger cache sizes, more cores
  – Will be built on a 22 nm process

http://www-03.ibm.com/systems/power/hardware/710/specs.html
http://arstechnica.com/gadgets/2009/09/ibms-8-core-power7-twice-the-muscle-half-the-transistors/

Page 9

Processors: Fujitsu SPARC64 IXfx

• Introduced in November 2011
• Used in the PRIMEHPC FX10 (an HPC system that is a follow-up to Fujitsu’s K computer)
• Architecture:
  • SPARC V9 ISA extended for HPC, with an increased number of registers
  • 16 cores
  • 236.5 GFLOPS peak performance (see the cross-check below)
  • 12 MB shared L2 cache
  • 1.85 GHz
  • 115 W TDP
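A rough cross-check of the peak figure, assuming 8 double-precision floating-point operations per core per cycle (an assumption on my part, not stated above): 16 cores × 1.85 GHz × 8 FLOPs/cycle ≈ 236.8 GFLOPS, in line with the 236.5 GFLOPS quoted.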


http://img.jp.fujitsu.com/downloads/jp/jhpc/primehpc/primehpc-fx10-catalog-en.pdf

Page 10

Many Integrated Core (MIC) Architecture – Xeon Phi

• Many-core processor architecture

• Prototype products (named Knights Ferry) were announced and released in 2010 to European Organization for Nuclear Research, Korea Institute of Science and Technology Information, and Leibniz Supercomputing Center among others

• Knights Ferry prototype:
  • 45 nm process
  • 32 in-order cores at up to 1.2 GHz, with 4 threads per core
  • 2 GB GDDR5 memory, 8 MB L2 cache
  • Single-board performance exceeds 750 GFLOPS

• The commercial release (codenamed Knights Corner), to be built on a 22 nm process, is planned for 2012-2013

• Texas Advanced Computing Center (TACC) will use Knights Corner cards in the 10 PFLOPS “Stampede” supercomputer


http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html
http://inside.hlrs.de/htm/Edition_02_10/article_14.html

Page 11

Processors: ShenWei SW1600

• Third-generation CPU by the Jiāngnán Computing Research Lab
• Specs:
  • Frequency: 1.1 GHz
  • 140 GFLOPS
  • 16-core RISC architecture
  • Up to 8 TB virtual memory and 1 TB physical memory supported
  • L1 cache: 8 KB; L2 cache: 96 KB
  • 128-bit system bus


http://laotsao.wordpress.com/2011/10/29/sunway-bluelight-mpp-%E7%A5%9E%E5%A8%81%E8%93%9D%E5%85%89/

Page 12

Processors: ARM

• Industry-leading provider of 32-bit embedded microprocessors

• A wide portfolio of more than 20 processors is available

• Applications processors are capable of executing a wide range of OSs: Linux, Android/Chrome, Microsoft Windows CE, Symbian and others

• NVIDIA: The CUDA on ARM Development Kit
  • A high-performance, energy-efficient kit featuring the NVIDIA Tegra 3 quad-core ARM Cortex-A9 CPU
  • CPU memory: 2 GB
  • 270 GFLOPS single-precision performance


Page 13

II. ADVANCES IN GPUs


Page 14

GPUs: NVIDIA


• Current: Kepler architecture – GK110
  • 7.1 billion transistors
  • 192 CUDA cores per SMX unit
  • 32 special function units per SMX
  • 32 load/store units per SMX
  • More than 1 TFLOPS (double precision)

• Upcoming: Maxwell architecture – 2013
  • ARM instruction set
  • 20 nm fab process (expected)
  • 14-16 GFLOPS double precision per watt

• HPC systems with NVIDIA GPUs:
  • Tianhe-1A (2nd on TOP500): 7,168 NVIDIA Tesla M2050 general-purpose GPUs
  • Titan (currently 3rd on TOP500): 960 NVIDIA GPUs
  • Nebulae (4th on TOP500): 4,640 NVIDIA Tesla C2050 GPUs
  • Tsubame 2.0 (5th on TOP500): 4,224 NVIDIA Tesla M2050, 34 NVIDIA Tesla S1070

http://www.xbitlabs.com/news/cpu/display/20110119204601_Nvidia_Maxwell_Graphics_Processors_to_Have_Integrated_ARM_General_Purpose_Cores.html

Page 15

GPUs: AMD

• Current: AMD Radeon HD 7970 (Southern Islands HD 7xxx series)
  • Released in January 2012
  • 28 nm fab process
  • 352 mm2 die size with 4.3 billion transistors
  • Up to 925 MHz engine clock
  • 947 GFLOPS double-precision compute power
  • 230 W TDP

• Latest AMD architecture – Graphics Core Next (GCN):
  • 28 nm GPU architecture
  • Designed for both graphics and general computing
  • 32 compute units (2,048 stream processors)
  • Can take on compute workloads traditionally handled by the CPU


Page 16

GPU Programming Models

• Current: CUDA 4.1
  • Share GPUs across multiple threads
  • Unified Virtual Addressing
  • Use all GPUs from a single host thread
  • Peer-to-peer communication

• Coming in CUDA 5
  • Direct communication between GPUs and other PCIe devices
  • Easier acceleration of nested parallel loops, starting with the Tesla K20 Kepler GPU

• Current: OpenCL 1.2
  • Open, royalty-free standard for cross-platform parallel computing
  • Latest version released in November 2011
  • Host-thread safety, enabling OpenCL commands to be enqueued from multiple host threads
  • Improved OpenGL interoperability by linking OpenCL event objects to OpenGL

• OpenACC
  • Programming standard developed by Cray, NVIDIA, CAPS, and PGI
  • Designed to simplify parallel programming of heterogeneous CPU/GPU systems (a minimal directive sketch follows after this list)
  • Programming is done through compiler directives (pragmas) and API functions
  • Planned supported compilers: Cray, PGI, and CAPS


http://developer.nvidia.com/cuda-toolkit

Page 17

III. HPC SYSTEMS around the World


Page 18

IBM Blue Gene/Q

• IBM PowerPC A2 CPU @ 1.6 GHz
• 16 cores per node
• 16 GB SDRAM-DDR3 per node
• 90% water cooling, 10% air cooling
• 209 TFLOPS peak performance per rack
• 80 kW per rack power consumption
• HPC systems powered by Blue Gene/Q:
  • Mira, Argonne National Lab
    • 786,432 cores
    • 8.162 PFLOPS Rmax, 10.06633 PFLOPS Rpeak
    • 3rd on TOP500 (June 2012)
  • SuperMUC, Leibniz Rechenzentrum
    • 147,456 cores
    • 2.897 PFLOPS Rmax, 3.185 PFLOPS Rpeak
    • 4th on TOP500 (June 2012)

http://www-03.ibm.com/systems/technicalcomputing/solutions/bluegene/
http://www.alcf.anl.gov/mira

Page 19

USA: Sequoia

• First on the TOP500 as of June 2012
• Fully deployed in June 2012
• 16.32475 PFLOPS Rmax and 20.13266 PFLOPS Rpeak
• 7.89 MW power consumption (see the efficiency note below)
• IBM Blue Gene/Q
• 98,304 compute nodes
• 1.6 million processor cores
• 1.6 PB of memory
• 8- or 16-core Power Architecture processors built with a 45 nm fabrication process
• Lustre parallel file system
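A quick note on power efficiency, using only the figures above: 16.32475 PFLOPS Rmax / 7.89 MW ≈ 2.07 GFLOPS per watt, roughly two and a half times the 824.6 GFLOPS/kW quoted for the K computer on the next slide.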

Page 20

Japan: K computer

• First on the TOP500 as of November 2011
• Located at the RIKEN Advanced Institute for Computational Science in Kobe, Japan
• Distributed-memory architecture
• Manufactured by Fujitsu
• Performance: 10.51 PFLOPS Rmax and 11.2804 PFLOPS Rpeak
• 12.65989 MW power consumption, 824.6 GFLOPS/kW
• Annual running costs: $10 million
• 864 cabinets:
  • 88,128 8-core SPARC64 VIIIfx processors @ 2.0 GHz
  • 705,024 cores
  • 96 computing nodes + 6 I/O nodes per cabinet
• Water cooling system minimizes failure rate and power consumption

http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html

Page 21

China: Tianhe-1A

• Tianhe-1A:
  • #5 on TOP500
  • 2.566 PFLOPS Rmax and 4.701 PFLOPS Rpeak
  • 112 computer cabinets, 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets:
    • 14,336 Xeon X5670 processors
    • 7,168 NVIDIA Tesla M2050 general-purpose GPUs
    • 2,048 FeiTeng-1000 SPARC-based processors
  • 4.04 MW
  • Chinese custom-designed NUDT proprietary high-speed interconnect, called Arch (160 Gbit/s)
  • Purpose: petroleum exploration, aircraft design
  • $88 million to build, $20 million to maintain annually, operated by about 200 workers

Page 22

China: Sunway Blue Light

• Sunway Blue Light:
  • 26th on the TOP500
  • 795.90 TFLOPS Rmax and 1.07016 PFLOPS Rpeak
  • 8,704 ShenWei SW1600 processors @ 975 MHz
  • 150 TB main memory
  • 2 PB external storage
  • Total power consumption 1.074 MW
  • 9-rack water-cooled system
  • InfiniBand interconnect, Linux OS

Page 23

USA: Intrepid

• Located at Argonne National Laboratory
• IBM Blue Gene/P
• 23rd on the TOP500 list
• 40 racks
• 1,024 nodes per rack
• 850 MHz quad-core processors
• 640 I/O nodes
• 458 TFLOPS Rmax and 557 TFLOPS Rpeak
• 1,260 kW

Page 24

Russia: Lomonosov

• Located at the Moscow State University Research Computer Center
• Most powerful HPC system in Eastern Europe
• 1.373 PFLOPS Rpeak and 674.11 TFLOPS Rmax (TOP500)
• 1.7 PFLOPS Rpeak and 901.9 TFLOPS Rmax (according to the system’s website)
• 22nd on the TOP500
• 5,104 CPU nodes (T-Platforms T-Blade2 system)
• 1,065 GPU nodes (T-Platforms TB2-TL)
• Applications: regional and climate research, nanoscience, protein simulations

http://www.t-platforms.com/solutions/lomonosov-supercomputer.html

http://www.msu.ru

Page 25

Cray Systems

• Cray XK6
  • Cray’s Gemini interconnect, AMD’s multicore scalar processors, and NVIDIA’s GPU processors
  • Scalable up to 50 PFLOPS of combined performance
  • 70 TFLOPS per cabinet
  • 16-core AMD Opteron 6200 (96 per cabinet), 1,536 cores per cabinet
  • NVIDIA Tesla X2090 (96 per cabinet)
  • OS: Cray Linux Environment

• Cray XE6
  • Cray’s Gemini interconnect, AMD Opteron 6100 processors
  • 20.2 TFLOPS per cabinet
  • 12-core AMD Opteron 6100 (192 per cabinet), 2,304 cores per cabinet
  • 3D torus interconnect


Page 26

ORNL’s “Titan” System

• Upgrade of Jaguar from Cray XT5 to XK6
• Cray Linux Environment operating system
• Gemini interconnect
  • 3-D torus
  • Globally addressable memory
  • Advanced synchronization features
• AMD Opteron 6274 processors (Interlagos)
• New accelerated node design using NVIDIA multi-core accelerators
  • 2011: 960 NVIDIA X2090 “Fermi” GPUs
  • 2012: 14,592 NVIDIA “Kepler” GPUs
• 20+ PFLOPS peak system performance (see the note after the specs table)
• 600 TB DDR3 memory + 88 TB GDDR5 memory

Titan Specs
  Compute nodes: 18,688
  Login & I/O nodes: 512
  Memory per node: 32 GB + 6 GB
  Number of Fermi chips (2012): 960
  Number of NVIDIA “Kepler” chips (2013): 14,592
  Total system memory: 688 TB
  Total system peak performance: 20+ PFLOPS
  Liquid cooling at the cabinet level: Cray ECOphlex
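A back-of-the-envelope consistency check against the GK110 slide earlier: 20+ PFLOPS spread across 14,592 “Kepler” GPUs works out to roughly 1.3-1.4 TFLOPS per accelerator (ignoring the Opteron contribution), which matches the “more than 1 TFLOPS” per-GPU figure quoted for GK110.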


Page 27

USA: Blue Waters

• Petascale supercomputer to be deployed at NCSA at UIUC
• IBM terminated the contract in August 2011
• Cray was chosen to deploy the system
• Specifications:
  • Over 300 cabinets (Cray XE6 and XK6)
  • Over 25,000 compute nodes
  • 11.5 PFLOPS peak performance
  • Over 49,000 AMD processors (380,000 cores)
  • Over 3,000 NVIDIA GPUs
  • Over 1.5 PB overall system memory (4 GB per core)

Page 28

UK: HECToR

• Located at the University of Edinburgh in Scotland
• 32nd on the TOP500 list
• 660.24 TFLOPS Rmax and 829.03 TFLOPS Rpeak
• Currently 30 cabinets
  • 704 compute blades
  • Cray XE6 system
  • 16-core Interlagos Opteron @ 2.3 GHz
  • 90,112 cores total

http://www.epcc.ed.ac.uk/news/hector-leads-the-way-the-world%E2%80%99s-first-production-cray-xt6-system
http://www.hector.ac.uk/

Page 29

Germany: HERMIT

• Located at the High Performance Computing Center Stuttgart (HLRS)
• 24th on the TOP500 list
• 831.40 TFLOPS Rmax and 1,043.94 TFLOPS Rpeak
• Currently 38 cabinets
  • Cray XE6
  • 3,552 compute nodes, 113,664 compute cores
  • Dual-socket AMD Interlagos @ 2.3 GHz (16 cores)
  • 2 MW max power consumption
• Research at HLRS:
  • OpenMP Validation Suite, MPI-Start, Molecular Dynamics, Biomechanical simulations, Microsoft Parallel Visualization, Virtual Reality, Augmented Reality

http://www.hlrs.de/

Page 30

NEC SX-9

• Supercomputer, an SMP system
• Single-chip vector processor:
  • 3.2 GHz
  • 8-way replicated vector pipes (each has 2 multiply and 2 addition units)
  • 102.4 GFLOPS peak performance
  • 65 nm CMOS technology
  • Up to 64 GB of memory


http://www.nec.com/en/de/en/prod/solutions/hpc-solutions/sx-9/index.html

Page 31

Passing of the Torch

IV. In Memoriam

Page 32

Alan Turing - Centennial

Page 33

Alan Turing - Centennial

• Father of computability with the abstract “Turing Machine”
• Father of Colossus
  – Arguably the first digital electronic supercomputer
  – Broke German Enigma codes
  – Saved tens of thousands of lives

Page 34

Passing of the Torch

V. The Flame Burns On

Page 35

Key HPC Awards

• Seymour Cray Award: Charles L. Seitz
  • For innovation in high-performance message-passing architectures and networks
  • A member of the National Academy of Engineering
  • President of Myricom, Inc. until last year
  • Projects he worked on include VLSI design, programming and packet-switching techniques for 2nd-generation multicomputers, and Myrinet high-performance interconnects

• Sidney Fernbach Award: Cleve Moler
  • For fundamental contributions to linear algebra, mathematical software, and enabling tools for computational science
  • Professor at the University of Michigan, Stanford University, and the University of New Mexico
  • Co-founder of MathWorks, Inc. (in 1984)
  • One of the authors of LINPACK and EISPACK
  • Member of the National Academy of Engineering
  • Past president of the Society for Industrial and Applied Mathematics


Page 36

Key Awards

• Ken Kennedy Award: Susan L. Graham
  • For foundational compilation algorithms and programming tools; research and discipline leadership; and exceptional mentoring
  • Fellow: Association for Computing Machinery, American Association for the Advancement of Science, American Academy of Arts and Sciences, National Academy of Engineering
  • Awards: IEEE John von Neumann Medal in 2009
  • Research includes work on Harmonia, a language-based framework for interactive software development, and Titanium, a Java-based parallel programming language, compiler, and runtime system

• ACM Turing Award: Judea Pearl
  • For fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning
  • Professor of Computer Science at UCLA
  • Pioneered probabilistic and causal reasoning
  • Created the computational foundation for processing information under uncertainty
  • Awards: IEEE Intelligent Systems, ACM Allen Newell Award, and many others
  • Member: AAAI, IEEE, National Academy of Engineering, and others


Page 37

VII. EXASCALE


Page 38

Picking up the Exascale Gauntlet

• International Exascale Software Project
  – 7th meeting in Cologne, Germany
  – 8th meeting in Kobe, Japan

• European Exascale Software Initiative
  – EESI initial study completed
  – EESI-2 inaugurated

• US DOE X-Stack Program
  – 4 teams selected to develop the system software stack
  – To be announced

• Plans in formulation by many nations

Page 39

Strategic Exascale Issues

• Next-generation execution model(s)
• Runtime systems
• Numeric parallel algorithms
• Programming models and languages
• Transition of legacy codes and methods
• Computer architecture
  – Highly controversial
  – Must be COTS, but can’t be the same
• Managing power and reliability
• Good fast machines, or just machines fast?

Page 40

IX. CANONICAL HPC SYSTEM


Page 41

Canonical Machine: Selection Process

• Calculate histograms for selected non-numeric technical parameters
  • Architecture: Cluster – 410, MPP – 89, Constellations – 1
  • Processor technology: Intel Nehalem – 196, other Intel EM64T – 141, AMD x86_64 – 63, Power – 49, etc.
  • Interconnect family: GigE – 224, IB – 209, Custom – 29, etc.
  • OS: Linux – 427, AIX – 28, CNK/SLES 9 – 11, etc.
• Find the largest intersection including a single modality for each parameter
  • (Cluster, Intel Nehalem, GigE, Linux): 110 machines matching
  • Note: this does not always include the largest histogram buckets. For example, in 2003 the dominant system architecture was cluster, but the largest parameter subset is generated for (Constellations, PA-RISC, Myrinet, HP-UX)
• In the computed subset, find the centroid for selected numerical parameters
  • Rmax
  • Number of compute cores
  • Processor frequency
  • Another useful metric would be power, but it is not available for all entries in the list
• Order the machines in the subset by increasing distance (least-square error) from the centroid
  • The parameter values have to be normalized due to the different units used
• The canonical machine is found at the head of the list (a minimal sketch of this procedure appears below)
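For concreteness, a minimal C sketch of this selection procedure follows. The five machines, their names, and their parameter values are hypothetical placeholders of mine, not real TOP500 entries; the actual analysis was run over the full June 2012 list.

    /* canonical.c - toy sketch of the canonical-machine selection:
       pick the modal value of each categorical parameter, keep only machines
       matching all of them, then rank that subset by distance from the centroid
       of the normalized numeric parameters. All data below is made up. */
    #include <stdio.h>
    #include <string.h>
    #include <math.h>

    #define NMACH 5   /* machines in this toy list */
    #define NCAT  4   /* categorical parameters: architecture, processor, interconnect, OS */
    #define NNUM  3   /* numeric parameters: Rmax (TFLOPS), core count, clock (GHz) */

    static const char *name[NMACH] = { "sys-A", "sys-B", "sys-C", "sys-D", "sys-E" };
    static const char *cat[NMACH][NCAT] = {
        { "Cluster", "Nehalem", "GigE",   "Linux" },
        { "Cluster", "Nehalem", "GigE",   "Linux" },
        { "MPP",     "Power",   "Custom", "Linux" },
        { "Cluster", "Opteron", "IB",     "Linux" },
        { "Cluster", "Nehalem", "GigE",   "Linux" },
    };
    static const double num[NMACH][NNUM] = {
        {  62.3, 11016, 2.93 },
        {  55.1,  9800, 2.80 },
        { 210.0, 40960, 3.80 },
        {  88.9, 15552, 2.20 },
        {  71.4, 12288, 2.67 },
    };

    int main(void)
    {
        /* 1. Histogram each categorical parameter and keep its modal value. */
        const char *mode[NCAT];
        for (int p = 0; p < NCAT; ++p) {
            int best = 0;
            for (int i = 0; i < NMACH; ++i) {
                int count = 0;
                for (int j = 0; j < NMACH; ++j)
                    if (strcmp(cat[i][p], cat[j][p]) == 0) ++count;
                if (count > best) { best = count; mode[p] = cat[i][p]; }
            }
        }

        /* 2. Subset: machines that match the modal value of every parameter. */
        int in[NMACH], members = 0;
        for (int i = 0; i < NMACH; ++i) {
            in[i] = 1;
            for (int p = 0; p < NCAT; ++p)
                if (strcmp(cat[i][p], mode[p]) != 0) in[i] = 0;
            members += in[i];
        }

        /* 3. Normalize each numeric parameter (here simply by its maximum) and
              compute the centroid of the subset in normalized space. */
        double scale[NNUM] = { 0 }, centroid[NNUM] = { 0 };
        for (int k = 0; k < NNUM; ++k)
            for (int i = 0; i < NMACH; ++i)
                if (num[i][k] > scale[k]) scale[k] = num[i][k];
        for (int i = 0; i < NMACH; ++i)
            if (in[i])
                for (int k = 0; k < NNUM; ++k)
                    centroid[k] += num[i][k] / scale[k] / members;

        /* 4. The canonical machine is the subset member with the smallest
              (least-squares) distance from the centroid. */
        int canon = -1;
        double bestd = 1e300;
        for (int i = 0; i < NMACH; ++i) {
            if (!in[i]) continue;
            double d = 0.0;
            for (int k = 0; k < NNUM; ++k) {
                double t = num[i][k] / scale[k] - centroid[k];
                d += t * t;
            }
            if (d < bestd) { bestd = d; canon = i; }
        }
        printf("canonical machine: %s (distance %.4f from centroid)\n",
               name[canon], sqrt(bestd));
        return 0;
    }

With the toy data above, the (Cluster, Nehalem, GigE, Linux) subset has three members, and the program prints whichever of them lies closest to their normalized centroid. Sorting the whole subset by that distance, rather than taking only the minimum, would reproduce the ranked list on the next slide. Note that this sketch simply intersects the per-parameter modes; as the bullet above points out, the full procedure looks for the largest joint subset, which does not always coincide with the individual histogram maxima.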

Page 42

Canonical Subset with Machine Ranks

Page 43

Centroid Placement in Parameter Space

Centroid coordinates: 10,896 cores; Rmax 64.3 TFLOPS; processor frequency 2.82 GHz

Page 44

The Canonical HPC System – 2012

• Architecture: commodity cluster
  • Dominant class of HEC system
• Processor: 64-bit Intel Xeon Westmere-EP
  • E56xx (32 nm die shrink of E55xx) in most deployed machines
  • Older E55xx Nehalem in second place by machine count
  • Minimal changes since last year: strong Intel dominance, with AMD- and POWER-based systems in distant second and third place
  • Intel maintains a strong lead in total number of cores deployed (4,337,423), followed by AMD Opteron (2,038,956) and IBM POWER (1,434,544)
  • Canonical example: #315 on the TOP500
    • Rmax of 62.3 TFLOPS
    • Rpeak of 129 TFLOPS
    • 11,016 cores
• System node
  • Based on the HS22 BladeCenter (3rd most popular system family)
  • Equipped with Intel Xeon X5670 6C clocked at 2.93 GHz (6 cores / 12 threads per processor)
• Homogeneous
  • Only 39 accelerated machines in the list; accelerator cores constitute 5.7% of all cores in the TOP500
• IBM systems integrator
  • Remains the most popular vendor (225 machines in the list)
• Gigabit Ethernet interconnect
  • InfiniBand still has not surpassed GigE in popularity
• Linux
  • By far the #1 OS
• Power consumption: 309 kW, 201.6 MFLOPS/W
• Industry owned and operated (telecommunication company)

Page 45

X. CLOSING REMARKS


Page 46

Next Year (if invited)

• Leipzig
• 10th Anniversary of Retrospective Keynotes
• Focus on Impact and Accomplishments across an entire Decade

• Application oriented: science and industry

• System Software looking forward
