CPEG421-2001-F-Topic-3-II 1 Topic 3 -- II: System Software Fundamentals: Multithreaded Execution Models, Virtual Machines and Memory Models Guang R. Gao ACM Fellow and IEEE Fellow Endowed Distinguished Professor Electrical & Computer Engineering University of Delaware [email protected]

Dec 18, 2015


CPEG421-2001-F-Topic-3-II 1

Topic 3 -- II: System Software Fundamentals:

Multithreaded Execution Models, Virtual Machines

and Memory Models

Guang R. Gao

ACM Fellow and IEEE Fellow

Endowed Distinguished Professor

Electrical & Computer Engineering

University of Delaware

[email protected]


CPEG421-2001-F-Topic-3-II 2

Outline

• An introduction to parallel program execution models

• Coarse-grain vs. fine-grain multithreading

• Evolution of fine-grain multithreaded program execution models

• Memory and synchronization models

• Fine-grain multithreaded execution and virtual machine models for peta-scale computing: a case study on HTMT/EARTH


CPEG421-2001-F-Topic-3-II 3

Terminology Clarification

• Parallel Model of Computation
  – Parallel Models for Algorithm Designers
  – Parallel Models for System Designers

• Parallel Programming Models

• Parallel Execution Models

• Parallel Architecture Models


CPEG421-2001-F-Topic-3-II 4

System Characterization

Questions:

Q1: What characteristics of a computational system are required …

Q2: The diversity of existing and potential multi-core architectures…

Response:

R1: One important characteristic of such a compiler, at both the chip level and the system level, is a program execution model that includes at least the specification and the API.

Gao, ECCD Workshop, Washington D.C., Nov. 2007


CPEG421-2001-F-Topic-3-II 5

What Does Program Execution Model (PXM) Mean?

• The notion of PXM: The program execution model (PXM) is the basic low-level abstraction of the underlying system architecture upon which our programming model, compilation strategy, runtime system, and other software components are developed.

• The PXM (and its API) serves as an interface between the architecture and the software.


CPEG421-2001-F-Topic-3-II 6

Program Execution Model (PXM) – Cont’d

Unlike an instruction set architecture (ISA) specification, which usually focuses on lower-level details (such as instruction encoding and organization of registers for a specific processor), the PXM refers to machine organization at a higher level for a whole class of high-end machines as viewed by the users.

Gao et al., 2000


CPEG421-2001-F-Topic-3-II 7

What is your “Favorite” Program Execution Model?


CPEG421-2001-F-Topic-3-II 8

A Generic MIMD Architecture

[Figure: node diagram; labels: Memory, NIC, Communication Assist, $, P, IC]

Node: Processor(s), Memory System plus Communication assist (Network Interface & Communication Controller)

Full Feature Interconnect Networks. Packet Switching Fabrics. Key: Scalable Network

Objective: Make efficient use of scarce communication resources, providing high-bandwidth, low-latency communication between nodes at minimum cost and energy.


CPEG421-2001-F-Topic-3-II 9

Programming Models for Multi-Processor Systems

• Message Passing Model
  – Multiple address spaces
  – Communication can only be achieved through “messages”

• Shared Memory Model
  – Memory address space is accessible to all
  – Communication is achieved through memory

[Figure, two panels. Message passing: processors, each with a local memory, connected by messages. Shared memory: processors connected to a global memory.]
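The contrast between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python threads always share one address space, so the message-passing variant merely emulates separate address spaces by letting workers communicate through a queue and nothing else. The function names `sum_with_messages` and `sum_with_shared_memory` are hypothetical and do not come from the slides.

```python
import queue
import threading


def sum_with_messages(chunks):
    """Message passing: workers keep private state and send results as messages."""
    mailbox = queue.Queue()  # the only channel between workers and the collector

    def worker(chunk):
        partial = sum(chunk)  # private to this worker
        mailbox.put(partial)  # communicate by sending a message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One message per worker is waiting in the mailbox at this point.
    return sum(mailbox.get() for _ in chunks)


def sum_with_shared_memory(chunks):
    """Shared memory: workers update one location that all of them can address."""
    shared = {"total": 0}
    lock = threading.Lock()  # explicit synchronization guards the shared location

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            shared["total"] += partial  # communicate through memory

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

In the first variant the message carries both the data and the synchronization; in the second the data moves implicitly, and the lock has to be added separately.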


CPEG421-2001-F-Topic-3-II 10

Comparison

Message Passing

+ Less contention
+ Highly scalable
+ Simplified synchronization
  – Message passing: sync + comm.
  – But does not mean highly programmable
- Load balancing
- Deadlock prone
- Overhead of small messages

Shared Memory

+ Global shared address space
+ Easy to program (?)
+ No (explicit) message passing (e.g. communication through memory put/get operations)
- Synchronization (memory consistency models, cache models)
- Scalability


CPEG421-2001-F-Topic-3-II 11

What is a Shared Memory Execution Model?

• Thread Model: a set of rules for creating, destroying and managing threads

• Memory Model: dictates the ordering of memory operations

• Synchronization Model: provides a set of mechanisms to protect from data races

Execution Model

The Thread Virtual Machine
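As a rough illustration of how the three sub-models show up in practice, the sketch below maps each one onto Python's standard `threading` module. The mapping is an assumption made for this example, not something the slides prescribe, and `run_counter` is a hypothetical name.

```python
import threading


def run_counter(num_threads=4, increments=1000):
    """Increment one shared counter from several threads and return its final value."""
    counter = 0
    lock = threading.Lock()

    def body():
        nonlocal counter
        for _ in range(increments):
            # Synchronization model: the lock makes the read-modify-write
            # indivisible, so concurrent increments cannot be lost.
            with lock:
                counter += 1

    # Thread model: threads are created explicitly ...
    threads = [threading.Thread(target=body) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # ... and their termination is observed explicitly with join().
    for t in threads:
        t.join()

    # Memory model: join() acts as the ordering point here. The counter is
    # read only after every writer has finished.
    return counter
```

Removing the lock turns the increment into an unprotected read-modify-write, which is the kind of data race the synchronization model exists to prevent.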


CPEG421-2001-F-Topic-3-II 12

Essential Aspects in User-Level Shared Memory Support?

• Shared address space support and management

• Access control and management

- Memory consistency model (MCM)

- Cache management mechanism


CPEG421-2001-F-Topic-3-II 13

Grand Challenge Problems

• How to build a shared-memory multiprocessor that is scalable both within a chip (multi-core/many-core) and across a system with many chips?

• How to program and optimize application programs?

Our view: One major obstacle in solving these problems is the memory coherence assumption in today’s hardware-centric memory consistency model.


A Parallel Execution Model

CPEG421-2001-F-Topic-3-II 14

Application Programming Interface (API)

Execution / Architecture Model

Thread Model

Memory Model

Synchronization Model


A Parallel Execution Model

CPEG421-2001-F-Topic-3-II 15

Application Programming Interface (API)

With Dataflow Origins

Execution / Architecture Model

Fine Grained Multithreaded Model

Memory Adaptive / Aware Model

Fine Grained Synchronization Model

Our Model


CPEG421-2001-F-Topic-3-II 16

Comment on OS impact?

• Should the compiler be OS-aware too? If so, how?

• Or other alternatives? Compiler-controlled runtime, or compiler-aware kernels, etc.

• Example: software pipelining …

Gao, ECCD Workshop, Washington D.C., Nov. 2007


CPEG421-2001-F-Topic-3-II 17

Outline

• An introduction to multithreaded program execution models

• Coarse-grain vs. fine-grain parallel execution models – a historical overview

• Fine-grain multithreaded program execution models.

• Memory and synchronization models

• Fine-grain multithreaded execution and virtual machine models for extreme-scale machines: a case study on HTMT/EARTH


CPEG421-2001-F-Topic-3-II 18

Coarse-Grain Execution Models

• The Single Instruction Multiple Data (SIMD) Model

• The Single Program Multiple Data (SPMD) Model

• The Data Parallel Model

[Figure labels: Pipelined Vector Unit or Array of Processors; Program / Processor (four pairs); Task (four); Data Structure]


CPEG421-2001-F-Topic-3-II 19

Data Parallel Model

Limitations

• Difficult to write unstructured programs: convenient only for problems with regular structured parallelism.

• Limited composability! An inherent limitation of coarse-grain multi-threading.

[Figure: alternating Compute and Communication phases, ending in “?”]


CPEG421-2001-F-Topic-3-II 20 to 24

Dataflow Model of Computation

[Figure, animated over five slides: a dataflow graph with inputs a, b, c, d, e and operator nodes labeled + and *. Token values shown in successive frames: 1, 3, 4, 3; then 4, 3, 4; then 7, 4; then 28. The last frame shows 1, 3, 4, 3 and 28 together under the label “Dataflow Software Pipelining”.]
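The firing rule behind these frames (a node fires as soon as all of its operands are present) can be sketched as a tiny interpreter. The graph below, ((a + b) + c) * d with inputs 1, 3, 3, 4, is only one hypothetical reading of the slide's figure that reproduces the intermediate values 4, 7 and 28; the exact wiring of the original graph cannot be recovered from the transcript. As a simplification, tokens are kept once produced rather than consumed.

```python
import operator


def run_dataflow(nodes, tokens):
    """Fire nodes whose operands are all present until no node can fire.

    nodes:  {node name: (function, [names of input arcs])}
    tokens: {arc name: value} holding the initial input tokens
    Returns the final token dictionary and the order in which nodes fired.
    """
    tokens = dict(tokens)
    fired = []
    progress = True
    while progress:
        progress = False
        for name, (fn, inputs) in nodes.items():
            # Firing rule: enabled when every input arc holds a token.
            if name not in tokens and all(arc in tokens for arc in inputs):
                tokens[name] = fn(*(tokens[arc] for arc in inputs))
                fired.append(name)
                progress = True
    return tokens, fired


# Hypothetical reading of the figure: ((a + b) + c) * d.
# n3 is listed first on purpose: textual order does not drive execution.
GRAPH = {
    "n3": (operator.mul, ["n2", "d"]),
    "n1": (operator.add, ["a", "b"]),
    "n2": (operator.add, ["n1", "c"]),
}
INPUTS = {"a": 1, "b": 3, "c": 3, "d": 4}
```

Calling `run_dataflow(GRAPH, INPUTS)` fires n1, then n2, then n3, even though n3 appears first in the dictionary: availability of data, not program order, decides what runs next.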


CPEG421-2001-F-Topic-3-II 25

Outline

• An introduction to multithreaded program execution models

• Coarse-grain vs. fine-grain parallel execution models – A Historical Overview

• Fine-grain multithreaded program execution models.

• Memory and synchronization models

• Fine-grain multithreaded execution and virtual machine models for peta-scale machines: a case study on HTMT/EARTH


CPEG421-2001-F-Topic-3-II 26

Coarse-Grain vs. Fine-Grain Multithreading

[Figure, two panels, each with CPU, Memory, Thread Unit, and Executor Locus. Panel captions: “Coarse-Grain thread: The family home model” and “Fine-Grain non-preemptive thread: The “hotel” model”. Further labels: “A Single Thread”, “A Pool Thread”.]

[Gao: invited talk at Fran Allen’s Retirement Workshop, 07/2002]


CPEG421-2001-F-Topic-3-II 27

Evolution of Multithreaded Execution and Architecture Models

[Figure: timeline chart with two tracks, “Non-dataflow based” and “Dataflow model inspired”. Entries, in extraction order:]

• CDC 6600, 1964
• MASA, Halstead, 1986
• HEP, B. Smith, 1978
• Cosmic Cube, Seitz, 1985
• J-Machine, Dally, 1988-93
• M-Machine, Dally, 1994-98
• MIT TTDA, Arvind, 1980
• Manchester, Gurd & Watson, 1982
• *T/Start-NG, MIT/Motorola, 1991-
• SIGMA-I, Shimada, 1988
• Monsoon, Papadopoulos & Culler, 1988
• P-RISC, Nikhil & Arvind, 1989
• EM-5/4/X, RWC-1, 1992-97
• Iannucci’s, 1988-92
• Others: Multiscalar (1994), SMT (1995), etc.
• Flynn’s Processor, 1969
• CHoPP ’77, CHoPP ’87
• TAM, Culler, 1990
• Tera, B. Smith, 1990-
• Alewife, Agarwal, 1989-96
• Cilk, Leiserson
• LAU, Syre, 1976
• Eldorado
• CASCADE
• Static Dataflow, Dennis, 1972, MIT
• Arg-Fetching Dataflow, Dennis, Gao, 1987-88
• MDFA, Gao, 1989-93
• MTA, Hum, Theobald, Gao, ’94
• EARTH, CARE: PACT ’95, ISCA ’96, Theobald ’99
• Marquez ’04


CPEG421-2001-F-Topic-3-II 28

The Von Neumann-type Processing

[Figure labels: Source Code (begin for i = 1 … … endfor end); Compiler; Sequential Machine Representation; Load; Processor; CPU]


A Multithreaded Architecture

CPEG421-2001-F-Topic-3-II 29

To Other PE’s

One PE


CPEG421-2001-F-Topic-3-II 30

McGill Dataflow Architecture Model (MDFA)


CPEG421-2001-F-Topic-3-II 31

[Figure: two graphs over nodes n1, n2, n3 with store and fetch labels, captioned “Argument-flow Principle” and “Argument-fetching Principle”]


CPEG421-2001-F-Topic-3-II 32

A Dataflow Program Tuple

Program Tuple = { P-Code . S-Code }

P-Code

N1: x = a + b;
N2: y = c - d;
N3: z = x * y;

S-Code

[Figure: three node entries, each annotated with 2 and 3: n1 with operands a, b; n2 with operands c, d; and a third entry (extracted as n1). Unit labels: IPU, ISU.]
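A minimal sketch of how such a tuple might be represented and executed follows. The P-Code entries mirror N1 to N3 above; the S-Code layout (a list of nodes to signal plus an enable count per node) and the function `run_tuple` are assumptions made for illustration, since the encoding in the slide's figure is not recoverable from the transcript.

```python
import operator
from collections import deque

# P-Code: what each instruction computes. Operands are fetched from memory
# by name, and the result is stored back to memory.
P_CODE = {
    "N1": ("x", operator.add, "a", "b"),  # x = a + b
    "N2": ("y", operator.sub, "c", "d"),  # y = c - d
    "N3": ("z", operator.mul, "x", "y"),  # z = x * y
}

# S-Code (assumed layout): whom to signal on completion, and how many
# signals each node must receive before it is enabled.
S_CODE = {
    "N1": {"signals": ["N3"], "count": 0},
    "N2": {"signals": ["N3"], "count": 0},
    "N3": {"signals": [], "count": 2},
}


def run_tuple(p_code, s_code, memory):
    """Execute a program tuple; return the final memory and the firing order."""
    memory = dict(memory)
    pending = {node: entry["count"] for node, entry in s_code.items()}
    enabled = deque(node for node, count in pending.items() if count == 0)
    order = []
    while enabled:
        node = enabled.popleft()  # scheduling side: pick an enabled node
        dest, op, lhs, rhs = p_code[node]
        memory[dest] = op(memory[lhs], memory[rhs])  # fetch operands, store result
        order.append(node)
        for successor in s_code[node]["signals"]:  # completion: deliver signals
            pending[successor] -= 1
            if pending[successor] == 0:
                enabled.append(successor)
    return memory, order
```

Only signals travel between instructions; the data itself stays in memory, which is the point of splitting the program into P-Code and S-Code.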


CPEG421-2001-F-Topic-3-II 33

The McGill Dataflow Architecture Model

[Figure labels: Pipelined Instruction Processing Unit (PIPU); Dataflow Instruction Scheduling Unit (DISU); Enable Memory & Controller; Signal Processing; Fire; Done]


CPEG421-2001-F-Topic-3-II 34

The McGill Dataflow Architecture Model

[Figure labels: Pipelined Instruction Processing Unit (PIPU); Dataflow Instruction Scheduling Unit (DISU); Fire; Done; Waiting Instructions; Enabled Instructions = PC]

Important Features

Pipeline can be kept fully utilized provided that the program has sufficient parallelism.


CPEG421-2001-F-Topic-3-II 35

The Scheduling Memory (Enable)

[Figure: the Dataflow Instruction Scheduling Unit (DISU), with labels Controller, Signal Processing, Fire, Done, and Count Signal(s), next to an enable memory drawn as a grid of bits. Legend: 0 = waiting instructions, 1 = enabled instructions.]


CPEG421-2001-F-Topic-3-II 36

Advantages of the McGill Dataflow Architecture Model

• Eliminate unnecessary token copying and transmission overhead

• Instruction scheduling is separated from the main datapath of the processor (e.g. asynchronous, decoupled)


CPEG421-2001-F-Topic-3-II 37

Von Neumann Threads as Macro Dataflow Nodes

[Figure: a macro-dataflow node containing instructions labeled 1, 2, 3, and k]

A sequence of instructions is “packed” into a macro-dataflow node

Synchronization is done at the macro-node level


CPEG421-2001-F-Topic-3-II 38

Hybrid Evaluation: “Von Neumann Style Instruction Execution” on the McGill Dataflow Architecture

• Group a “sequence” of dataflow instructions into a “thread” or a macro dataflow node.

• Data-driven synchronization among threads.

• “Von Neumann style sequencing” within a thread.

Advantage: Preserves the parallelism among threads but avoids unnecessary fine-grain synchronization between instructions within a sequential thread.
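The three bullets can be illustrated with a small sequential simulation. Each thread is an ordinary Python function whose statements run in program order with no synchronization between them, and signals are exchanged only when a whole thread completes. The names `run_threads`, `THREADS`, and the thread bodies are hypothetical, and the single ready queue stands in for hardware that the slides describe only at the block-diagram level.

```python
from collections import deque


def run_threads(threads, frame):
    """threads: {name: (sync_count, body, successors)}

    A thread becomes ready when its sync count reaches zero. Its body then
    runs from start to finish, and its successors are signaled afterwards.
    Returns the order in which threads ran.
    """
    pending = {name: spec[0] for name, spec in threads.items()}
    ready = deque(name for name, count in pending.items() if count == 0)
    trace = []
    while ready:
        name = ready.popleft()
        _, body, successors = threads[name]
        body(frame)  # Von Neumann style sequencing within the thread
        trace.append(name)
        for successor in successors:  # data-driven synchronization among threads
            pending[successor] -= 1
            if pending[successor] == 0:
                ready.append(successor)
    return trace


def t1(frame):
    # Three instructions packed into one macro node: no signals between them.
    t = frame["a"] + frame["b"]
    t = t * 2
    frame["x"] = t - 1


def t2(frame):
    frame["y"] = frame["c"] - frame["d"]


def t3(frame):
    frame["z"] = frame["x"] * frame["y"]


# T1 and T2 are independent; T3 waits for one signal from each of them.
THREADS = {"T1": (0, t1, ["T3"]), "T2": (0, t2, ["T3"]), "T3": (2, t3, [])}
```

Compared with scheduling every instruction separately, the three statements in `t1` cost one synchronization event instead of three, while T1 and T2 remain free to run in either order.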


CPEG421-2001-F-Topic-3-II 39

What Do We Get?

• A hybrid architecture model without sacrificing the advantage of fine-grain parallelism! (latency-hiding, pipelining support)


CPEG421-2001-F-Topic-3-II 40

A Realization of the Hybrid Evaluation

[Figure labels: Pipelined Instruction Processing Unit (PIPU); Dataflow Instruction Scheduling Unit (DISU); Fire; Done; Shortcut; instructions 1, 2, k; Von Neumann bit]