9/24/12 NSF-ME-08-2012 1 1

Overview of HPC Computer Architecture:

A Long March Toward Exa-Scale Computing and Beyond

August 16, 2012

Guang R. Gao

ACM Fellow and IEEE Fellow

Distinguished Professor, Dept. of ECE University of Delaware


Toward A Codelet Based Execution Model and Its Memory Semantics

-- For Future Extreme-Scale Computing Systems


Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
–  The EARTH project (1994 – 2004)
–  IBM Cyclops-64 project (2004 – 2010+): The TNT Experience
–  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Memory semantics of the codelet model
•  Conclusions and Future Directions

K (“KEI”) Computer
•  “K” draws upon the Japanese word “Kei” for 10^16
•  3 times faster than the Chinese Tianhe-1A
•  8.162 Pflops Rmax, 8.777 Pflops Rpeak
•  80,000 8-core 2 GHz SPARC64 VIIIfx processors, delivering a total of more than 640,000 processing cores
•  1 PB memory
•  4th most energy-efficient system in the Top500, with a performance-per-watt rating of 825 megaflops per watt
•  Tofu: a 6D mesh/torus interconnect

Tianhe-1A 2.566 Petaflops Rmax

DEPARTMENT OF COMPUTER SCIENCE @ LOUISIANA STATE UNIVERSITY 5

Current Big Themes in Supercomputing

•  Multi-core, many-core
–  Exa-Scale is on the horizon
•  Heterogeneity and Accelerators
•  Data-Intensive (big-data)
•  Others?

Challenges
•  Big-compute (performance demand on massive parallelism)
•  Big-data (massive, irregular, unstructured data need big analytics)
•  Big chips with architecture heterogeneity
•  Energy efficiency and resiliency

A Fundamental Challenge - Parallel Program Execution Models


A Quiz: Have you heard the following terms?
•  Actors (dataflow)?
•  strand?
•  fiber?
•  codelet?

What is a Program Execution Model?

[Figure: the PXM (API) sits between user code and the system.]
•  User Code: Application Code, Software Packages, Program Libraries, Compilers, Utility Applications
•  (API) PXM
•  System: Hardware, Runtime Code and Libraries, Operating System

Courtesy: J. B. Dennis, PEM-2, 4/7/2011

Coarse-Grain vs. Fine-Grain Multithreading

[Figure: two panels, "Coarse-Grain thread - The family home model" and "Fine-Grain non-preemptive thread - The “hotel” model". Labels in each panel: CPU, Memory, Thread Unit, Executor Locus. Other labels: A Single Thread, A Pool Thread.]

[Gao: invited talk at Fran Allen’s Retirement Workshop, 07/2002]

Execution Model and Abstract Machines

[Figure: layered diagram. Labels: Users, Programming Models, Programming Environment Platforms, Execution Model API, Abstract Machine Models. Side label: Execution Model.]


Abstract Machine Models May Be Heterogeneous!

Execution Model and Abstract Machines

[Figure: layered stack. Labels, top to bottom: Users; High-Level Programming API (MPI, OpenMP, CnC, X10, Chapel, etc.); Programming Models/Environment (Software packages, Program libraries, Utility applications, Compilers, Tools/SDK); Execution Model (API, Runtime System, Abstract Machine); Hardware Architecture.]

EARTH Architecture

[Figure: several nodes attached to an interconnect network. Labels inside a node: PE, EU, SU, Local Memory, Memory Bus, RQ, EQ, "From RQ", "To EQ".]

The EARTH Multithreaded Execution Model (1993 – 200x)

Two Levels of Fine-Grain Threads:
-  threaded procedures
-  fibers

[Figure: fibers and the two levels of threads. Legend: fiber within a frame; async. function invocation; a sync operation; invoke a threaded function; signal token. Each sync slot shows two counts: total # signals and arrived # signals.]
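The two counts in the figure (total and arrived signals) suggest a simple counting rule: a fiber becomes ready once the number of arrived signals reaches the total its sync slot expects. The following is a minimal Python sketch under that assumption. The names `SyncSlot`, `signal`, and `run` are hypothetical and are not the EARTH API, and resetting the counter after the fiber is enabled is also an assumption.

```python
from collections import deque


class SyncSlot:
    """Hypothetical sync slot: counts arrived signals against an expected total."""

    def __init__(self, total, fiber):
        self.total = total      # total # signals the fiber waits for
        self.arrived = 0        # arrived # signals so far
        self.fiber = fiber      # callable to enable when the count is reached

    def signal(self, ready_queue):
        """Record one signal; enable the fiber when all signals have arrived."""
        self.arrived += 1
        if self.arrived == self.total:
            self.arrived = 0    # assumption: the slot resets for reuse
            ready_queue.append(self.fiber)


def run(ready_queue):
    """Run every ready fiber to completion, one at a time (no preemption)."""
    log = []
    while ready_queue:
        fiber = ready_queue.popleft()
        log.append(fiber())
    return log


ready_queue = deque()
slot = SyncSlot(total=2, fiber=lambda: "fiber ran")

slot.signal(ready_queue)
after_first = len(ready_queue)    # fiber not yet enabled

slot.signal(ready_queue)
after_second = len(ready_queue)   # fiber enabled by the second signal

log = run(ready_queue)
```

The fiber is never polled: it costs nothing until the last signal arrives, which is what makes very fine-grain threads affordable in this style of model.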


Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
–  The EARTH project (1994 – 2004)
–  IBM Cyclops-64 project (2004 – 2010+): The TNT Experience
–  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
–  DOE X-Stack (2012 – 2015): Continue the codelet path
•  Semantics of Codelet Models
•  Conclusions and Future Directions

The Codelet: A Fine-Grain Piece of Computing

[Figure: labels Data Objects, Codelet, Result Object.]

Supports Massively Parallel Computation!

This Looks Like Data Flow!!

Courtesy: Prof. Jack Dennis, 2001

Concept of Codelet (Feb. 4th, 2011)

-  Codelets are the principal scheduling quantum under our codelet based execution model. A codelet, once allocated and scheduled, will be kept usefully busy, since it is non-preemptive.

-  The underlying hardware architecture and system software (e.g. compiler, etc.) are optimized to ensure such non-preemption features can be productively utilized.
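The codelet concept can be illustrated with a small dataflow-style scheduler: a codelet is enabled only when all of its input data objects exist, and once started it runs to completion without preemption. This is a minimal Python sketch, not the SWARM or Runnemede runtime. The `Codelet` class, the `run_codelets` function, and the convention that a codelet's result object is stored under the codelet's own name are all assumptions made for illustration.

```python
class Codelet:
    """Hypothetical codelet: a function plus the names of the data objects it needs."""

    def __init__(self, name, func, inputs):
        self.name = name        # also used as the name of its result object
        self.func = func
        self.inputs = inputs    # names of the required data objects


def run_codelets(codelets, initial_objects):
    """Fire each codelet once all of its input data objects are available."""
    objects = dict(initial_objects)
    pending = list(codelets)
    order = []
    progress = True
    while pending and progress:
        progress = False
        for codelet in list(pending):
            if all(name in objects for name in codelet.inputs):
                # A plain function call: the codelet runs to completion,
                # nothing can preempt it halfway through.
                args = [objects[name] for name in codelet.inputs]
                objects[codelet.name] = codelet.func(*args)
                order.append(codelet.name)
                pending.remove(codelet)
                progress = True
    return objects, order


# Listed in "wrong" order on purpose: firing follows data availability,
# not program order.
scaled = Codelet("scaled", lambda s: s * 10, ["sum"])
total = Codelet("sum", lambda a, b: a + b, ["x", "y"])

objects, order = run_codelets([scaled, total], {"x": 2, "y": 3})
```

Because a codelet never starts before its inputs exist, it never has to block in the middle of its body, which is the property that makes non-preemption productive rather than wasteful.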


What is A Shared Memory Execution Model?

Execution Model (The Thread Abstract Machine):
•  Thread Model: a set of rules for creating, destroying and managing threads
•  Memory Model: dictates the ordering of memory operations
•  Synchronization Model: provides a set of mechanisms to protect against data races

“Memory Coherence”: A Basic Assumption of SC-Derived Memory Models

“…All writes to the same location are serialized in some order and are performed in that order with respect to any processor…”

[Gharachorloo et al. 90]
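One way to read the quoted condition: for a single memory location, there must exist one total order of the writes that agrees with the order in which every processor observed them. The Python sketch below checks that by brute force. It is an illustration only, under two assumptions: every write stores a distinct value, and each processor's observation is given as the sequence of values it saw at that location. The function name `is_coherent` is hypothetical.

```python
from itertools import permutations


def is_coherent(observations):
    """Return True if one serialization of the writes fits every processor's view.

    observations: one sequence per processor, listing the (distinct) written
    values that processor saw at a single location, in the order it saw them.
    """
    writes = sorted({value for seq in observations for value in seq})
    for order in permutations(writes):
        position = {value: i for i, value in enumerate(order)}
        # Every processor must see the writes in an order compatible
        # with this candidate serialization.
        if all(position[a] < position[b]
               for seq in observations
               for a, b in zip(seq, seq[1:])):
            return True
    return False


same_order = is_coherent([[1, 2], [1, 2]])       # both saw 1 then 2
opposite_order = is_coherent([[1, 2], [2, 1]])   # processors disagree
partial_views = is_coherent([[1], [2]])          # each saw only one write
```

The second case is the one coherence forbids: two processors that disagree on the order of writes to the same location. The search is exponential in the number of writes and is only meant for tiny examples.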

Can We Break The Memory Coherence Barrier?

No? Yes?

Four Key Questions on Memory Models

•  What happens when two (or more) concurrent load/store operations arrive at the same memory location?

•  Answers?

A Conjecture

The LC (Location Consistency) memory model belongs to the group of memory models that are the weakest while still not violating the causality constraint!


DOE X-Stack Project: Traleika Glacier (July 2012 – June 2015)

•  Team Lead: Intel
•  Universities: UIUC, UD, UCSD, Rice U
•  Other Industries: ETI, Reservoir
•  DOE Labs: PNNL, Sandia, ORNL, ..

Acknowledgements

•  Our Sponsors
•  Members of CAPSL
•  Members of ETI
•  Other Collaborators (T. Sterling, V. Sarkar, etc.)
•  My Mentor - Prof. Jack B. Dennis
•  My Host
