LecILP

7/31/2019 LecILP


slide 1

Outline

Classification

ILP Architectures

Data Parallel Architectures

Process level Parallel Architectures

Issues in parallel architectures

Cache coherence problem

Interconnection networks


slide 2

Outline

Classification

ILP Architectures





Flynn's [66]

Feng's [72]

Händler's [77]

Modern (Sima, Fountain & Kacsuk)


slide 3

Flynn's Classification

Architecture Categories

SISD SIMD MISD MIMD
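The four categories follow from counting instruction streams and data streams. A minimal sketch of that mapping, where the function name `flynn_category` is hypothetical and chosen only for illustration:

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts (each at least 1)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for multiple streams.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# One example per category: (instruction streams, data streams).
for streams in [(1, 1), (1, 8), (4, 1), (4, 4)]:
    print(streams, flynn_category(*streams))
```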


slide 4

SISD

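[Figure: control unit (C), processor (P), and memory (M), connected by an instruction stream (IS) and a data stream (DS)]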


slide 6

MISD

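[Figure: several control units (C) and processors (P) sharing a memory (M), with instruction stream (IS) and data stream (DS) labels]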


slide 7

MIMD

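[Figure: several control units (C) and processors (P) sharing a memory (M), with instruction stream (IS) and data stream (DS) labels]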


slide 8

Feng's Classification

[Figure: machines plotted by word length (axis ticks 1, 16, 32, 64) against bit slice length (axis ticks 1, 16, 64, 256, 16K): MPP, STARAN, C.mmP, PDP11, PEPE, IBM370, IlliacIV, CRAY-1]
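In Feng's scheme a machine is described by its word length n and bit slice length m, and the maximum degree of parallelism is their product. The sketch below assumes that reading. The (n, m) pairs are values commonly quoted for these machines, not read off the slide, and the function names are hypothetical.

```python
def max_parallelism(word_length: int, bit_slice_length: int) -> int:
    """Maximum number of bits processed per unit time: n * m."""
    return word_length * bit_slice_length


def processing_mode(word_length: int, bit_slice_length: int) -> str:
    """Name the word/bit processing mode: serial (S) or parallel (P)."""
    word = "P" if bit_slice_length > 1 else "S"  # many words handled at once?
    bit = "P" if word_length > 1 else "S"        # many bits of a word at once?
    return f"W{word}B{bit}"


# Assumed (word length, bit slice length) pairs, for illustration only.
machines = {
    "PDP11": (16, 1),
    "CRAY-1": (64, 1),
    "IlliacIV": (64, 64),
    "STARAN": (1, 256),
    "MPP": (1, 16384),
}

for name, (n, m) in machines.items():
    print(name, processing_mode(n, m), max_parallelism(n, m))
```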


slide 9

Händler's Classification

< K x K', D x D', W x W' >

K: control, D: data, W: word

dash (') denotes the degree of pipelining

TI - ASC

CDC 6600 x (I/O)

C.mmP + +

PEPE

Cray-1


slide 10

Modern Classification

Parallel architectures

Data-parallel architectures

Function-parallel architectures


slide 11


Data-parallel architectures

Vector architectures

Associative and neural architectures

SIMDs

Systolic architectures


slide 12

Function Parallel Architectures

Function-parallel architectures

Instruction-level parallel architectures (ILPs): pipelined processors, VLIWs, superscalar processors

Thread-level parallel architectures

Process-level parallel architectures (MIMDs): distributed memory MIMD, shared memory MIMD


slide 13

Outline

Classification

ILP Architectures





Pipelining

VLIW

Superscalar


slide 14

Pipelining

IF D RF EX/AG M WB

faster throughput with pipelining

resource sharing across cycles

not all instructions take the same number of cycles
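The throughput gain can be made concrete with a cycle count. This is a minimal sketch assuming an ideal pipeline: every stage takes exactly one cycle and there are no stalls. The six stages match the IF, D, RF, EX/AG, M, WB sequence above, and the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline, then one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


n, stages = 100, 6
print("unpipelined:", unpipelined_cycles(n, stages))
print("pipelined:  ", pipelined_cycles(n, stages))
```

As the instruction count grows, the ratio of the two counts approaches the number of stages, which is the ideal speedup before hazards are taken into account.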


slide 15

Hazards in Pipelining

Procedural dependencies => Control hazards

conditional and unconditional branches, calls/returns

Data dependencies => Data hazards

RAW (read after write)

WAR (write after read)

WAW (write after write)

Resource conflicts => Structural hazards

use of same resource in different stages
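The three data hazards can be detected by comparing the registers two instructions read and write. A minimal sketch, assuming each instruction has one destination register and a tuple of source registers. The `Instr` type and `data_hazards` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def data_hazards(first: Instr, second: Instr) -> list:
    """List the data dependencies of `second` on the earlier `first`."""
    hazards = []
    if first.dest in second.srcs:
        hazards.append("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        hazards.append("WAR")  # second overwrites what first still reads
    if first.dest == second.dest:
        hazards.append("WAW")  # both write the same register
    return hazards


add = Instr(dest="r1", srcs=("r2", "r3"))   # r1 = r2 + r3
sub = Instr(dest="r4", srcs=("r1", "r5"))   # r4 = r1 - r5
mov = Instr(dest="r2", srcs=("r6",))        # r2 = r6
print(data_hazards(add, sub))
print(data_hazards(add, mov))
```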


slide 16

Pipeline Performance

CPI = 1 + (S - 1) * b

Time = CPI * T / S

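S: number of pipeline stages

T: total time for one instruction to pass through all S stages (T / S is the cycle time)

b: frequency of interruptions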
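The two formulas can be evaluated directly to see how deeper pipelines trade a shorter cycle for a higher CPI. A small sketch, taking S as the number of stages, b as the frequency of interruptions, and T as the total time through all stages. The sample values b = 0.1 and T = 16 are assumptions chosen only for illustration.

```python
def cpi(stages: int, b: float) -> float:
    """CPI = 1 + (S - 1) * b."""
    return 1 + (stages - 1) * b


def time_per_instruction(stages: int, b: float, total_time: float) -> float:
    """Time = CPI * T / S."""
    return cpi(stages, b) * total_time / stages


b, total_time = 0.1, 16.0  # assumed sample values
for stages in (1, 2, 4, 8, 16):
    print(stages, cpi(stages, b), time_per_instruction(stages, b, total_time))
```

With b greater than zero, the time per instruction levels off toward b * T as S grows, so adding stages gives diminishing returns.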


slide 17

ILP in VLIW processors

[Figure: cache/memory feeds a fetch unit, which delivers a single multi-operation instruction to several functional units (FU) that share a register file]


slide 18

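ILP in Superscalar processors

[Figure: cache/memory feeds a fetch unit with a sequential stream of instructions; the decode and issue unit passes multiple instructions to the functional units (FU), which share a register file. Legend: instruction/control paths and data paths. FU: functional unit]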
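In a superscalar processor the hardware, not the compiler, decides which instructions from the sequential stream go out together. The sketch below is a deliberately simplified in-order issue model, assumed for illustration: a group closes when it reaches the issue width or when the next instruction reads or writes a register written earlier in the same group. It ignores latencies, WAR dependencies, and structural limits, and `issue_groups` is a hypothetical name.

```python
def issue_groups(program, width):
    """Split a sequential stream of (dest, srcs) pairs into in-order issue groups."""
    groups, current, written = [], [], set()
    for dest, srcs in program:
        # Dependence on a result produced inside the current group.
        depends = dest in written or any(s in written for s in srcs)
        if current and (len(current) == width or depends):
            groups.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5", "r6")),
    ("r7", ("r1", "r4")),  # needs both earlier results
    ("r8", ("r9",)),
]
for cycle, group in enumerate(issue_groups(program, width=2)):
    print(cycle, [dest for dest, _ in group])
```

In a VLIW machine the same grouping would be done ahead of time and encoded in each multi-operation instruction.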


slide 19

Why are superscalars popular?

Binary code compatibility among scalar and superscalar processors of the same family

The same compiler works for all processors (scalar and superscalar) of the same family

Assembly programming of VLIWs is tedious

Code density in VLIWs is very poor - instruction encoding schemes


slide 20

Issues in VLIW Architecture

[Figure: functional units (FU) sharing a register file]

Instruction encoding

Scalability: access time, area, and power consumption increase sharply with the number of register ports


slide 21

Tasks of superscalar processing

Parallel decoding

Superscalar instruction issue

Parallel instruction execution

Preserving the sequential consistency of execution

Preserving the sequential consistency of exception processing


slide 22

Outline

Classification

ILP Architectures





SIMD Processors

Vector Processors

Associative Processors

Systolic Arrays


slide 23


SIMD Processors

Multiple processing elements driven by a single instruction stream

Vector Processors

Uni-processors with vector instructions

Associative Processors

SIMD-like processors with associative memory

Systolic Arrays

Application-specific VLSI structures
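The SIMD idea of one instruction stream driving many processing elements can be sketched in a few lines. Here each list position stands for the local data of one processing element, and `simd_step` is a hypothetical name for broadcasting one operation to all of them. A real SIMD machine would run the elements in lockstep hardware rather than in a loop.

```python
import operator


def simd_step(op, local_a, local_b):
    """Apply the same operation on every processing element's local operands."""
    return [op(a, b) for a, b in zip(local_a, local_b)]


a = [1, 2, 3, 4]       # one value per processing element
b = [10, 20, 30, 40]
total = simd_step(operator.add, a, b)       # a single "add" instruction
scaled = simd_step(operator.mul, total, b)  # a single "mul" instruction
print(total, scaled)
```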


slide 24

Systolic Arrays [H.T. Kung 1978]

Simplicity, Regularity, Concurrency, Communication

Example :

Band matrix multiplication

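C = A B, where A and B are band matrices with the same structure, shown here as 6 x 6:

\[
A = \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\
a_{31} & a_{32} & a_{33} & a_{34} & 0 & 0 \\
0 & a_{42} & a_{43} & a_{44} & a_{45} & 0 \\
0 & 0 & a_{53} & a_{54} & a_{55} & a_{56} \\
0 & 0 & 0 & a_{64} & a_{65} & a_{66}
\end{pmatrix}
\]

B has the same band structure, with entries b_{ij} in place of a_{ij}.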


[Figure: systolic array at T=0, with elements B11, B12, B21, B31 and A11, A12, A21, A22, A31, A23 entering the array]
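The arithmetic behind the band matrix multiplication example can be sketched in software. This is not a simulation of the systolic array's timing or data movement. It only shows which multiply-accumulate steps are needed when zeros outside the band are skipped. The band shape (two sub-diagonals, one super-diagonal) is an assumption read from the index pattern in the example, and all function names are hypothetical.

```python
def band_matmul(a, b, lower, upper):
    """Multiply two n x n band matrices, touching only entries inside the band."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        # The product has 2*lower sub-diagonals and 2*upper super-diagonals.
        for j in range(max(0, i - 2 * lower), min(n, i + 2 * upper + 1)):
            # a[i][k] can be nonzero only for i - lower <= k <= i + upper;
            # b[k][j] can be nonzero only for j - upper <= k <= j + lower.
            k_lo = max(0, i - lower, j - upper)
            k_hi = min(n - 1, i + upper, j + lower)
            for k in range(k_lo, k_hi + 1):
                c[i][j] += a[i][k] * b[k][j]  # one multiply-accumulate step
    return c


def make_band(n, lower, upper):
    """Band matrix whose entry (i, j) is the two-digit index, e.g. 31 for row 3, column 1."""
    return [[10 * (i + 1) + (j + 1) if -upper <= i - j <= lower else 0
             for j in range(n)] for i in range(n)]


def dense_matmul(a, b):
    """Plain triple-loop product, used only to check the band version."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = make_band(6, lower=2, upper=1)
B = make_band(6, lower=2, upper=1)
C = band_matmul(A, B, lower=2, upper=1)
print(C == dense_matmul(A, B))
```

In a systolic array, each inner multiply-accumulate step would be performed by one cell as the A, B, and C values stream past it.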


slide 26

Outline

Classification

ILP Architectures





MIMD Processors

- Shared Memory

- Distributed Memory


slide 28

MIMD Architectures

Design Space

Extent of address space sharing

Location of memory modules

Uniformity of memory access


slide 29

Outline

Classification

ILP Architectures





User's perspective

Architect's perspective


slide 30

Issues from user's perspective

Specification / Program design

explicit parallelism or implicit parallelism + parallelizing compiler

Partitioning / mapping to processors

Scheduling / mapping to time instants

static or dynamic

Communication and Synchronization
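The difference between static and dynamic scheduling can be illustrated with a toy model. The sketch assumes independent tasks with known durations. Static mapping fixes the processor of each task in advance (round-robin here), while dynamic mapping hands each task to whichever processor becomes free first. The task durations and function names are assumptions made for illustration.

```python
import heapq


def static_makespan(durations, processors):
    """Static mapping: task i is assigned to processor i % processors up front."""
    loads = [0] * processors
    for i, d in enumerate(durations):
        loads[i % processors] += d
    return max(loads)


def dynamic_makespan(durations, processors):
    """Dynamic mapping: each task goes to the processor that frees up first."""
    free_at = [0] * processors
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available processor
        heapq.heappush(free_at, start + d)  # busy until the task finishes
    return max(free_at)


durations = [8, 1, 8, 1, 1, 1]  # assumed task lengths
print("static: ", static_makespan(durations, 2))
print("dynamic:", dynamic_makespan(durations, 2))
```

The sketch leaves out communication and synchronization costs, which in practice can offset the better load balance of dynamic scheduling.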


slide 31

Parallel programming models

Concurrent control flow

Functional or logic program

Vector/array operations

Concurrent tasks/processes/threads/objects

With shared variables or message passing

Relationship between programming model and architecture?
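The two communication styles for concurrent tasks, shared variables and message passing, can be contrasted on the same small job. This sketch uses Python's standard `threading` and `queue` modules. The function names and the summing task are assumptions chosen only for illustration.

```python
import queue
import threading


def sum_shared(chunks):
    """Workers update one shared variable, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # synchronization around the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_messages(chunks):
    """Workers share nothing and send their partial sums as messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # communication by message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sum_shared(chunks), sum_messages(chunks))
```

The shared-variable version maps naturally onto shared memory MIMD machines and the message-passing version onto distributed memory ones, although either model can be implemented on either architecture.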


slide 32

Issues from architect's perspective

Coherence problem in shared memory with caches

Efficient interconnection networks


slide 33

Outline

Classification

ILP Architectures





Coherence Protocols

- Bus or directory based

- Invalidate or update

- Definition of states


slide 34

Cache Coherence Problem

Multiple copies of data may exist

Problem of cache coherence

Options for coherence protocols

What action is taken?

Invalidate or Update

Which processors/caches communicate?

Snoopy (broadcast) or directory based

Status of each block?
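One point in this design space can be sketched as a toy model: a snoopy, write-invalidate protocol with only two block states (valid or invalid) and write-through to memory. These choices are assumptions made to keep the example short. Real protocols define more states, and the `Bus` and `Cache` classes are hypothetical.

```python
class Bus:
    """Shared bus that every cache snoops; also holds main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []


class Cache:
    def __init__(self, bus):
        self.lines = {}  # address -> value; a present entry is a valid copy
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the block from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast on the bus: every other cache invalidates its copy.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through keeps memory current


bus = Bus(memory={0: 5})
c0, c1 = Cache(bus), Cache(bus)
print(c0.read(0), c1.read(0))  # both caches now hold a copy
c0.write(0, 9)                 # c1's copy is invalidated
print(c1.read(0))              # c1 misses and fetches the new value
```

An update protocol would instead push the new value into the other caches, and a directory-based protocol would notify only the caches recorded as holding the block.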


slide 36

Interconnection Networks

Architectural Variations:

Topology

Direct or Indirect (through switches)

Static (fixed connections) or Dynamic (connections established as required)

Routing type (store and forward / wormhole)

Efficiency:

Delay

Bandwidth

Cost
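The effect of routing type on delay can be estimated with a common first-order model, sketched below under stated assumptions: no contention, no router delay, a fixed link bandwidth, and `hops` counted as the number of links traversed. The formulas and the sample numbers are illustrative and not taken from the slide.

```python
def store_and_forward_latency(packet_bits, bandwidth, hops):
    """Each node receives the whole packet before forwarding it."""
    return hops * packet_bits / bandwidth


def wormhole_latency(packet_bits, flit_bits, bandwidth, hops):
    """The header flit advances hop by hop; the remaining flits follow in a pipeline."""
    return (hops - 1) * flit_bits / bandwidth + packet_bits / bandwidth


packet, flit, bw = 1024, 32, 1.0  # assumed sizes in bits, bandwidth in bits per time unit
for hops in (1, 2, 4, 8):
    print(hops,
          store_and_forward_latency(packet, bw, hops),
          wormhole_latency(packet, flit, bw, hops))
```

Under this model store-and-forward delay grows with hops times packet length, while wormhole delay grows with hops times flit length, which is why the latter is much less sensitive to distance.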


slide 37

Books

D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.

M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.

D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2002.

K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.

H.G. Cragon, "Memory Systems and Pipelined Processors", Narosa Publishing House / Jones and Bartlett, 1998.

D.E. Culler, J.P. Singh, A. Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Harcourt Asia / Morgan Kaufmann Publishers, 2000.
