Page 1: Ceg4131 models

Parallel Computer Models

CEG 4131 Computer Architecture III

Miodrag Bolic

Page 2: Ceg4131 models

Overview

• Flynn’s taxonomy
• Classification based on the memory arrangement
• Classification based on communication
• Classification based on the kind of parallelism
  – Data-parallel
  – Function-parallel

Page 3: Ceg4131 models

Flynn’s Taxonomy

– The most widely accepted method of classifying computer systems

– Published in the Proceedings of the IEEE in 1966

– Any computer can be placed in one of four broad categories

» SISD: Single instruction stream, single data stream

» SIMD: Single instruction stream, multiple data streams

» MIMD: Multiple instruction streams, multiple data streams

» MISD: Multiple instruction streams, single data stream
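The four categories follow mechanically from how many instruction streams and data streams a machine has. A minimal sketch of that mapping, assuming a hypothetical helper `classify` and enum `FlynnClass` (neither name comes from the slides):

```python
from enum import Enum


class FlynnClass(Enum):
    """The four Flynn categories, named as on the slide."""
    SISD = "single instruction stream, single data stream"
    SIMD = "single instruction stream, multiple data streams"
    MISD = "multiple instruction streams, single data stream"
    MIMD = "multiple instruction streams, multiple data streams"


def classify(instruction_streams: int, data_streams: int) -> FlynnClass:
    """Map stream counts to a Flynn category (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    single_instruction = instruction_streams == 1
    single_data = data_streams == 1
    if single_instruction and single_data:
        return FlynnClass.SISD
    if single_instruction:
        return FlynnClass.SIMD
    if single_data:
        return FlynnClass.MISD
    return FlynnClass.MIMD
```

For example, `classify(1, 64)` returns `FlynnClass.SIMD`, while `classify(8, 8)` returns `FlynnClass.MIMD`.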

Page 4: Ceg4131 models

SISD

[Figure: SISD block diagram. Blocks: control unit, processing element (PE), main memory (M). Arrows: instruction stream (IS) and data stream (DS).]

Page 5: Ceg4131 models

SIMD

Applications:
• Image processing
• Matrix manipulations
• Sorting
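The SIMD idea of one instruction stream applied to many data elements can be modeled in a few lines. This is only a semantic sketch: the Python loop runs sequentially, whereas real SIMD hardware would update all processing elements in the same step. The function `simd_step` and the pixel values are hypothetical.

```python
from typing import Callable, List


def simd_step(instruction: Callable[[int], int], lanes: List[int]) -> List[int]:
    """Apply one instruction to every PE's data element (models a lockstep step)."""
    return [instruction(x) for x in lanes]


# Hypothetical 8-pixel image row; each PE holds one pixel.
pixels = [12, 200, 34, 90, 255, 0, 128, 77]

# A single instruction ("threshold at 128") is broadcast to all PEs.
thresholded = simd_step(lambda p: 255 if p >= 128 else 0, pixels)
```

Image thresholding is used here because image processing is one of the applications listed for SIMD machines.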

Page 6: Ceg4131 models

SIMD Architectures

• Fine-grained
  – Image processing application
  – Large number of PEs
  – Minimum complexity PEs
  – Programming language is a simple extension of a sequential language

• Coarse-grained
  – Each PE is of higher complexity and is usually built with commercial devices
  – Each PE has local memory

Page 7: Ceg4131 models

MIMD

Page 8: Ceg4131 models

MISD

Applications:
• Classification
• Robot vision

Page 9: Ceg4131 models

Flynn taxonomy

– Advantages of Flynn

» Universally accepted

» Compact Notation

» Easy to classify a system (?)

– Disadvantages of Flynn

» Very coarse-grain differentiation among machine systems

» Comparison of different systems is limited

» Interconnections, I/O, memory not considered in the scheme

Page 10: Ceg4131 models

Classification based on memory arrangement

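[Figure: two organizations. Panel "Shared memory - multiprocessors": processors PE1 ... PEn and I/O1 ... I/On, an interconnection network, and a shared memory. Panel "Message passing - multicomputers": nodes PE1 ... PEn, each containing a processor (P1 ... Pn) and a memory (M1 ... Mn), joined by an interconnection network.]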

Page 11: Ceg4131 models

Shared-memory multiprocessors

• Uniform Memory Access (UMA)
• Non-Uniform Memory Access (NUMA)
• Cache-only Memory Architecture (COMA)

• Memory is common to all the processors.
• Processors easily communicate by means of shared variables.
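Communication through shared variables can be illustrated with threads that update one counter. This is a sketch of the programming style only, assuming Python's standard `threading` module stands in for a multiprocessor; the names `counter`, `worker`, and `run` are hypothetical.

```python
import threading

counter = 0                      # shared variable visible to every worker
counter_lock = threading.Lock()  # serializes updates to the shared variable


def worker(increments: int) -> None:
    """Each worker communicates its progress by updating the shared counter."""
    global counter
    for _ in range(increments):
        with counter_lock:
            counter += 1


def run(num_workers: int = 4, increments: int = 1000) -> int:
    """Start the workers, wait for them, and return the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The lock is the price of sharing: without it, concurrent updates to the same variable could be lost.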

Page 12: Ceg4131 models

The UMA Model

• Tightly-coupled systems (high degree of resource sharing)

• Suitable for general-purpose and time-sharing applications by multiple users.

[Figure: UMA organization. Processors P1 ... Pn, each with a cache ($), share memory modules (Mem) through an interconnection network.]

Page 13: Ceg4131 models

Symmetric and asymmetric multiprocessors

• Symmetric:
  - all processors have equal access to all peripheral devices.
  - all processors are identical.

• Asymmetric:
  - one processor (master) executes the operating system.
  - other processors may be of different types and may be dedicated to special tasks.

Page 14: Ceg4131 models

The NUMA Model

• The access time varies with the location of the memory word.
• Shared memory is distributed to local memories.
• All local memories form a global address space accessible by all processors.

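[Figure: Distributed memory (NUMA). Processors P1 ... Pn, each with a cache ($) and a local memory (Mem), connected by an interconnection network.]

Access time, from fastest to slowest: cache, local memory, remote memory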

COMA - Cache-only Memory Architecture
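Because NUMA access time depends on where a word lives, the average access time is a weighted sum over the three levels (cache, local memory, remote memory). A rough sketch under stated assumptions: the function name `average_access_time` and the default latencies (1, 10, and 100 time units) are illustrative placeholders, not figures from the slides.

```python
def average_access_time(cache_fraction: float,
                        local_fraction: float,
                        remote_fraction: float,
                        t_cache: float = 1.0,     # assumed latency, arbitrary units
                        t_local: float = 10.0,    # assumed latency, arbitrary units
                        t_remote: float = 100.0,  # assumed latency, arbitrary units
                        ) -> float:
    """Weighted average access time over the three NUMA levels."""
    total = cache_fraction + local_fraction + remote_fraction
    if abs(total - 1.0) > 1e-9:
        raise ValueError("the three fractions must sum to 1")
    return (cache_fraction * t_cache
            + local_fraction * t_local
            + remote_fraction * t_remote)
```

With these assumed latencies, even a small share of remote accesses dominates the average, which is why data placement matters on NUMA machines.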

Page 15: Ceg4131 models

Distributed memory multicomputers

• Multiple computers (nodes)

• Message-passing network

• Each node's local memory is private and holds its own program and data

• There is no memory contention, so the number of processors can be very large

• The processors are connected by communication lines, and the precise way in which the lines are connected is called the topology of the multicomputer.

• A typical program consists of subtasks residing in all the memories.

[Figure: multicomputer. Several nodes, each a processing element (PE) with its own memory (M), connected by an interconnection network.]
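The multicomputer model, with private memories and explicit messages, can be sketched with threads that share nothing except a queue. This is an illustration only: `queue.Queue` from the standard library stands in for the message-passing network, and the names `node` and `run_multicomputer` are hypothetical.

```python
import queue
import threading
from typing import List


def node(node_id: int, local_data: List[int], outbox: queue.Queue) -> None:
    """A node works only on its private data, then sends one result message."""
    partial = sum(local_data)
    outbox.put((node_id, partial))


def run_multicomputer(chunks: List[List[int]]) -> int:
    """Give each node one private chunk and combine the received messages."""
    inbox = queue.Queue()  # stands in for the message-passing network
    threads = [threading.Thread(target=node, args=(i, chunk, inbox))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = 0
    for _ in chunks:
        _node_id, partial = inbox.get()
        total += partial
    return total
```

Each chunk plays the role of a subtask residing in one node's memory; no node ever reads another node's data directly.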

Page 16: Ceg4131 models

Classification based on type of interconnections

• Static networks

• Dynamic networks

Page 17: Ceg4131 models

Interconnection Network [1]

• Mode of Operation (Synchronous vs. Asynchronous)

• Control Strategy (Centralized vs. Decentralized)

• Switching Techniques (Packet switching vs. Circuit switching)

• Topology (Static vs. Dynamic)

Page 18: Ceg4131 models

Classification based on the kind of parallelism [3]

Parallel architectures (PAs)

• Data-parallel architectures (DPs)
  – Vector architecture
  – Associative and neural architecture
  – SIMDs
  – Systolic architecture

• Function-parallel architectures
  – Instruction-level PAs (ILPs)
    » Pipelined processors
    » VLIWs
    » Superscalar processors
  – Thread-level PAs
  – Process-level PAs (MIMDs)
    » Distributed memory MIMD (multi-computer)
    » Shared memory MIMD (multi-processors)

Page 19: Ceg4131 models

References

1. Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El-Barr, John Wiley and Sons, 2005.

2. Advanced Computer Architecture: Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill, 1993.

3. Advanced Computer Architectures: A Design Space Approach, by Dezso Sima, Terence Fountain and Peter Kacsuk, Pearson, 1997.

Page 20: Ceg4131 models

Speedup

• S = Speed(new) / Speed(old)

• S = (Work/time)(new) / (Work/time)(old)

• S = time(old) / time(new)

• S = time(before improvement) / time(after improvement)

Page 21: Ceg4131 models

Speedup

• Time (one CPU): T(1)

• Time (n CPUs): T(n)

• Speedup: S

• S = T(1)/T(n)
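The definition S = T(1)/T(n) translates directly into a small helper. The function name `speedup` and the timings in the example are hypothetical, chosen only to show the arithmetic.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup S = T(1) / T(n) from two measured run times."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


# Hypothetical measurement: 120 s on one CPU, 40 s on n CPUs.
example = speedup(120.0, 40.0)
```

Here `example` is 3.0: the parallel run is three times faster than the single-CPU run.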

Page 22: Ceg4131 models

Amdahl’s Law

The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used

Page 23: Ceg4131 models

Example

A to B: 200 miles by the chosen mode, plus a segment that must be walked and takes 20 hours.

• Walk: 4 miles/hour
• Bike: 10 miles/hour
• Car-1: 50 miles/hour
• Car-2: 120 miles/hour
• Car-3: 600 miles/hour

Page 24: Ceg4131 models

Example

A to B: 200 miles by the chosen mode, plus a segment that must be walked and takes 20 hours.

Mode    Speed             Total time                   Speedup
Walk    4 miles/hour      50 + 20 = 70 hours           S = 1
Bike    10 miles/hour     20 + 20 = 40 hours           S = 1.8
Car-1   50 miles/hour     4 + 20 = 24 hours            S = 2.9
Car-2   120 miles/hour    1.67 + 20 = 21.67 hours      S = 3.2
Car-3   600 miles/hour    0.33 + 20 = 20.33 hours      S = 3.4
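The figures in this example can be reproduced with a few lines of arithmetic. The sketch below uses the distances and speeds from the slide; the names `trip_hours`, `SPEEDS_MPH`, and `speedups` are hypothetical.

```python
WALK_ONLY_HOURS = 20.0   # segment that must be walked, from the slide
DISTANCE_MILES = 200.0   # segment covered by the chosen mode, from the slide

SPEEDS_MPH = {"Walk": 4, "Bike": 10, "Car-1": 50, "Car-2": 120, "Car-3": 600}


def trip_hours(speed_mph: float) -> float:
    """Total trip time: the 200-mile segment plus the fixed walking segment."""
    return DISTANCE_MILES / speed_mph + WALK_ONLY_HOURS


# Speedup of each mode relative to walking the whole way.
baseline = trip_hours(SPEEDS_MPH["Walk"])
speedups = {mode: baseline / trip_hours(mph) for mode, mph in SPEEDS_MPH.items()}

for mode, s in speedups.items():
    print(f"{mode:6s} {trip_hours(SPEEDS_MPH[mode]):6.2f} hours  S = {s:.1f}")
```

However fast the vehicle, the 20-hour walk remains, so the speedup can never exceed 70/20 = 3.5.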

Page 25: Ceg4131 models

Amdahl’s Law (1967)

• $\alpha$: The fraction of the program that is naturally serial

• $(1-\alpha)$: The fraction of the program that is naturally parallel

Page 26: Ceg4131 models

$$S = \frac{T(1)}{T(N)}$$

$$T(N) = \alpha\,T(1) + \frac{T(1)\,(1-\alpha)}{N}$$

$$S = \frac{1}{\alpha + \dfrac{1-\alpha}{N}} = \frac{N}{\alpha N + (1-\alpha)}$$
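The speedup formula can be evaluated numerically to see how quickly it saturates. This is a minimal sketch, assuming the serial fraction is written as `alpha` and the processor count as `n`; the function names `amdahl_speedup` and `amdahl_limit` are hypothetical.

```python
import math


def amdahl_speedup(alpha: float, n: int) -> float:
    """Speedup on n processors when a fraction alpha of the program is serial."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / n)


def amdahl_limit(alpha: float) -> float:
    """Upper bound on speedup as n grows without limit: 1 / alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return math.inf if alpha == 0.0 else 1.0 / alpha


# Illustrative values: 10% serial code on a growing number of processors.
for n in (1, 2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.1, n), 2))
```

With a 10% serial fraction the speedup approaches, but never reaches, 10, no matter how many processors are added.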

Page 27: Ceg4131 models

Amdahl’s Law