Page 1

Addressing Future HPC Demand with Multi-core Processors

Stephen S. Pawlowski
Intel Senior Fellow
GM, Architecture and Planning
CTO, Digital Enterprise Group

September 5, 2007

Page 2

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.

Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

The processors, the chipsets, or the ICH components may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available upon request.

All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2007, Intel Corporation. All rights reserved.

Page 3

Intel Is Listening…

With multi-core processors, however, we finally get to a scheme where the HEP execution profile shines. Thanks to the fact that our jobs are embarrassingly parallel (each physics event is independent of the others) we can launch as many processes (or jobs) as there are cores on a die. However, this requires that the memory size is increased to accommodate the extra processes (making the computers more expensive). As long as the memory traffic does not become a bottleneck, we see a practically linear increase in the throughput on such a system. A ROOT/PROOF analysis demo elegantly demonstrated this during the launch of the quad-core chip at CERN.

Source: “Processors size up for physics at the LHC”, Sverre Jarp, CERN Courier, March 28, 2007

HEP – High Energy Physics
LHC – Large Hadron Collider
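
A minimal sketch of the scheme the quote describes (not from the talk; Python, with a hypothetical process_event standing in for per-event physics code): launch one worker process per core and let each process handle events independently.

import multiprocessing as mp

def process_event(event_id):
    # Stand-in for independent per-event physics reconstruction/analysis.
    return sum(i * i for i in range(event_id % 1000))

if __name__ == "__main__":
    events = range(10_000)
    # One worker per core: throughput scales roughly linearly as long as
    # memory traffic does not become the bottleneck.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(process_event, events)
    print(f"processed {len(results)} events on {mp.cpu_count()} cores")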

Page 4

Accelerating Multi- and Many-core
Performance Through Parallelism

Power delivery and management

High bandwidth memory

Reconfigurable cache

Scalable fabric

Fixed-function units

[Diagram: a die combining big cores with an array of small cores]

• Big cores for Single Thread Performance
• Small cores for Multi-Thread Performance
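
A rough model of why this mix helps (illustrative only; the 2:1 big-to-small core performance ratio and the core count are assumptions, not Intel figures): the serial part of a job runs fastest on a big core, while the parallel part scales across many small cores.

def speedup(serial_fraction, n_small_cores, big_core_perf=2.0, small_core_perf=1.0):
    # Serial phase runs on the big core; parallel phase spreads over the small cores.
    serial_time = serial_fraction / big_core_perf
    parallel_time = (1.0 - serial_fraction) / (n_small_cores * small_core_perf)
    return 1.0 / (serial_time + parallel_time)

for f in (0.05, 0.20):
    print(f"serial fraction {f:.0%}: "
          f"{speedup(f, n_small_cores=32):.1f}x over one small core")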

Page 5

Memory Bandwidth Demand
Computational Fluid Dynamics (CFD) – Splash

[Chart: computation (GFlops) and memory bandwidth (GB/sec) demanded by the Splash CFD workload at four problem sizes, compared with the bandwidth delivered by 2–64 DDR3 channels]

Problem size (grid, frame rate) – Computation (GFlops) – Memory bandwidth (GB/sec):
• 75x50x50, 10 fps – 547 GFlops – 92 GB/sec
• 75x50x50, 30 fps – 1,642 GFlops – 277 GB/sec
• 150x100x100, 10 fps – 4,380 GFlops – 738 GB/sec
• 150x100x100, 30 fps – 13,140 GFlops – 2,215 GB/sec

Chart annotation: 1 TFLOP or 1 TB/s – 0.17 Byte/Flop

DDR3 bandwidth (assuming frequency: 2133 MHz) for 2, 4, 8, 16, 32, 64 DDR3 channels: 34, 68, 136, 272, 544, 1088 GB/sec

Source: Intel Labs
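
A short sketch of the arithmetic behind the DDR3 reference lines (standard DDR3 parameters: a 64-bit channel transfers 8 bytes per transfer; the byte/flop ratio is read off the slide's data points):

TRANSFER_RATE_MT_S = 2133        # the slide's "assuming frequency: 2133 MHz", in MT/s
BYTES_PER_TRANSFER = 8           # 64-bit DDR3 channel

per_channel_gb_s = TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER / 1000.0   # ~17 GB/s per channel

for channels in (2, 4, 8, 16, 32, 64):
    print(f"{channels:2d} DDR3 channels: {channels * per_channel_gb_s:6.0f} GB/s")

# The Splash data points imply roughly 0.17 byte of memory traffic per flop,
# so sustaining 13,140 GFlops needs on the order of 2.2 TB/s:
bytes_per_flop = 0.17
gflops = 13140
print(f"{gflops} GFlops -> ~{gflops * bytes_per_flop:.0f} GB/s of memory bandwidth")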

Page 6

Increasing Signaling Rate
More Bandwidth & Less Power

Differential Copper Interconnect & Silicon Photonics

[Chart: power (mW/Gbps) versus signaling rate (MHz) for single-ended interconnect – state-of-the-art DDR from 1066 to 2800 MHz, with a projected future point]

[Chart: power (mW/Gbps) versus signaling rate (Gbit/sec) – state of the art with FBD, future copper interconnect, optical today, and the silicon photonics vision]

State of the Art with FBD (@ 25 mW/Gbps & 5 Gb/s): 100 GB/sec ≈ 1 Tb/sec = 1,000 Gb/sec; at 25 mW/Gb/sec that is 25 Watts; bus width = 1,000 Gb/sec / 5 = 200 lanes, about 400 pins (differential)
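
The FBD example worked through in code (numbers taken from the slide; the ~1 Tb/s figure already folds link overhead on top of the raw 8 bits per byte):

link_rate_gbps = 5        # Gb/s per differential lane (FBD-class signaling)
power_mw_per_gbps = 25    # mW per Gb/s of signaling
wire_rate_gbps = 1000     # ~1 Tb/s on the wire for 100 GB/s of delivered data

power_watts = wire_rate_gbps * power_mw_per_gbps / 1000    # 25 W of I/O power
lanes = wire_rate_gbps / link_rate_gbps                    # 200 lanes
pins = lanes * 2                                           # ~400 pins (differential)
print(f"{power_watts:.0f} W, {lanes:.0f} lanes, {pins:.0f} pins")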

Page 7

Addressing Memory Bandwidth
Bringing Memory Closer to the Cores

Memory on Package: [Diagram – CPU, last-level cache, and fast DRAM together on the package, under the heat-sink]

3D Memory Stacking: [Diagram – silicon chips stacked on the package]

*Future vision, does not represent real Intel product

Page 8

Photonics For Memory BW and Capacity
High Performance with Remote Memory

Integrated Tb/s Optical Link on a Single Chip: [Diagram – integrated silicon photonic chip with Tx and Rx paths. The indium phosphide emits the light into the silicon waveguide; the silicon acts as the laser cavity, where light bounces back and forth and gets amplified by the InP-based material.]

Remote Memory Blade integrated into a system

Page 9

Increasing Ethernet Bandwidth

[Chart: projected network bandwidth demand of standard server benchmarks (TPC-C, TPC-H, SPECweb05, SPECjApps) from 2008 to 2014, climbing from roughly 10 Gb/s toward 80 Gb/s]

Source: Intel, 2006. TPC & SPEC are standard server application benchmarks

Today server I/O is fragmented:
• 1GbE performance doesn't meet current server I/O requirements
• 10GbE is still too expensive
• 2/4G Fibre Channel & 10G InfiniBand are being deployed to meet the growing I/O demands for storage & HPC clusters

In 2–4 years, convergence on 10GbE looks promising:
• 10GBASE-T availability will drive costs down
• 10GbE will offer a compelling value proposition for LAN, SAN, & HPC cluster connections

In 7–10 years:
• Look beyond to 40GbE and 100GbE

~2x performance gain every 2 years
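
A small sketch of that trend (assuming a 10 Gb/s starting point in 2008, matching the chart's lowest mark):

def projected_demand_gbps(year, base_year=2008, base_gbps=10):
    # "~2x performance gain every 2 years"
    return base_gbps * 2 ** ((year - base_year) / 2)

for year in (2008, 2010, 2012, 2014):
    print(year, f"~{projected_demand_gbps(year):.0f} Gb/s")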

Page 10

Outside the Box:
HPC Networking with Optical Cables

Benefit Over Copper:
• Scalable BW, throughput
• Longer distance – today up to 100m
• Higher reliability: 10⁻¹⁵ Bit Error Rate (BER) or lower (see the sketch below)
• Smaller & lighter form factor

[Diagram: electrical socket – optical transceiver in plug – optical cable, up to 100m – optical transceiver in plug – electrical socket]

[Chart: data rate rising from 20 Gb/s in 2007 toward 120 Gb/s by 2013]
Doubling the Data Rate Every 2 Years
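
A quick illustration of what a 10⁻¹⁵ BER means in practice (illustrative arithmetic only; the 100 Gb/s link rate is an assumption, not from the slide):

ber = 1e-15
link_gbps = 100                       # assumed link rate in Gb/s
bits_per_day = link_gbps * 1e9 * 86400
print(f"expected bit errors per day at {link_gbps} Gb/s: {ber * bits_per_day:.1f}")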

Page 11

Power Aware Everywhere

[Diagram: power awareness at every level – silicon, packages, heat sinks, systems, facilities]

Silicon:
Moore's law, strained silicon, transistor leakage control techniques, Hi-K dielectric, clock gating, etc.

Processor and System Power:
Multi-core, Integrated Voltage Regulators (IVR), fine grain power management, etc.

Facilities:
Efficient Power Conversion and Cooling

Page 12

Reliability Challenges & Vision

[Diagram: soft-error mechanism – an ion path through a biased junction, with charge movement by drift and diffusion around the depletion region. Source: Intel]

An exponential growth in FIT/chip:
• FIT/bit of memory cell: expected to be roughly constant, but
• Moore's law: increasing the bit count exponentially, 2x every 2 years

Source: Intel
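
A sketch of that argument in numbers (FIT = failures per billion device-hours; the starting FIT/bit and bit count below are assumed, purely illustrative): constant FIT/bit times an exponentially growing bit count gives exponentially growing FIT/chip.

fit_per_bit = 1e-3      # assumed, illustrative FIT per memory bit
bits_2007 = 100e6       # assumed on-die memory bits in the 2007 baseline

for year in range(2007, 2018, 2):
    bits = bits_2007 * 2 ** ((year - 2007) / 2)   # bit count doubles every 2 years
    print(year, f"~{fit_per_bit * bits:,.0f} FIT/chip")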

Soft Error: one of many reliability challenges

Process Techniques: Device Param Tuning, Rad-hard Cell Creation
Circuit Techniques
Architectural Techniques

Micro Solutions: Parity, SECDED ECC, π bit
Macro Solutions: Lockstepping, Redundant multithreading (RMT), Redundant multi-core CPU

State-of-Art Processes
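
A minimal sketch of the simplest micro solution listed above, even parity, which detects (but cannot correct) a single flipped bit; SECDED ECC extends the same idea to correct single-bit errors and detect double-bit errors.

def parity_bit(word: int) -> int:
    # Even parity over a 64-bit word.
    return bin(word & (2**64 - 1)).count("1") & 1

stored_word = 0xDEADBEEFCAFEF00D
stored_parity = parity_bit(stored_word)

flipped = stored_word ^ (1 << 17)        # simulate a single-bit soft error
print("error detected:", parity_bit(flipped) != stored_parity)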

Page 13

What Can We Expect?!

[Chart: Top500 performance trend, 1993–2029 – the #1 system and the sum of the Top500, from 100 MFlops up toward 1 ZFlops – Beyond Petascale. Background picture from CERN OpenLab, Intel HPC Roundtable '06]

CERN: 115th of Top500 in June 2007 (source: CERN Courier, Aug. 20, 2007)

Page 14

What can Intel do for you?