Introduction to Parallel Processing (Ch. 12, Pg. 514-526), CS147, Louis Huie
Transcript
Page 1: Introduction to Parallel Processing, Ch. 12, Pg. 514-526, CS147, Louis Huie.


Page 2: Topics Covered

- An Overview of Parallel Processing
- Parallelism in Uniprocessor Systems
- Organization of Multiprocessor Systems
- Flynn's Classification
- System Topologies
- MIMD System Architectures

Page 3: An Overview of Parallel Processing

What is parallel processing? Parallel processing is a method of improving computer system performance by executing two or more instructions simultaneously.

The goals of parallel processing: one goal is to reduce the "wall-clock" time, that is, the amount of real time you must wait for a problem to be solved. Another goal is to solve bigger problems that might not fit in the limited memory of a single CPU.

Page 4: An Analogy of Parallelism

The task of ordering a shuffled deck of cards by suit and then by rank can be done faster if it is carried out by two or more people. By splitting up the deck, working on the parts simultaneously, and combining the partial solutions at the end, you have performed parallel processing.

Page 5: Another Analogy of Parallelism

Another analogy is having several students grade quizzes simultaneously. Quizzes are distributed to a few students and different problems are graded by each student at the same time. After they are completed, the graded quizzes are then gathered and the scores are recorded.

Page 6: Parallelism in Uniprocessor Systems

It is possible to achieve parallelism in a uniprocessor system. Some examples are the instruction pipeline, the arithmetic pipeline, and the I/O processor.

Note that a system that performs different operations on the same instruction is not considered parallel; only if the system processes two different instructions simultaneously can it be considered parallel.

Page 7: Parallelism in a Uniprocessor System

A reconfigurable arithmetic pipeline is an example of parallelism in a uniprocessor system.

Each stage of a reconfigurable arithmetic pipeline has a multiplexer at its input. The multiplexer may pass input data, or the data output from other stages, to the stage inputs. The control unit of the CPU sets the select signals of the multiplexer to control the flow of data, thus configuring the pipeline.
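The configuration process described above can be sketched in software. The following is a minimal Python model, with hypothetical stage and signal names (the text does not give an API): each stage's "multiplexer" is a tuple of select values naming either a data input or an earlier stage's output.

```python
# A toy model of a reconfigurable arithmetic pipeline (hypothetical design).
# The control unit "sets the select signals" by choosing, for each stage,
# whether its operands come from the data inputs or from another stage.

def run_pipeline(selects, data):
    """Evaluate one pipeline configuration. 'selects' maps each stage to
    the two sources its input multiplexer should pass through."""
    def resolve(src, outputs):
        # A select value names either a previous stage or a raw data input.
        return outputs[src] if src in outputs else data[src]

    outputs = {}
    # Stage functions are fixed (a multiplier, then an adder); only the
    # routing of their inputs is configurable.
    a, b = (resolve(s, outputs) for s in selects["mul"])
    outputs["mul"] = a * b
    c, d = (resolve(s, outputs) for s in selects["add"])
    outputs["add"] = c + d
    return outputs["add"]

# Configure the pipeline for A = B * C + D: the adder's multiplexer
# selects the multiplier's output and the data input D.
config = {"mul": ("B", "C"), "add": ("mul", "D")}
print(run_pipeline(config, {"B": 2, "C": 3, "D": 4}))  # prints 10
```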

Page 8: A Reconfigurable Pipeline with Data Flow for the Computation A[i] = B[i] * C[i] + D[i]

[Figure: four pipeline stages, each fed through a 4-input multiplexer with select signals S1 S0; the functional stages are a multiply latch, an add latch, and a pass-through latch whose output goes to memory and registers. The select settings shown (00, xx, 01, 11) route the data inputs so the pipeline computes A[i] = B[i] * C[i] + D[i].]

Page 9:

Although arithmetic pipelines can perform many iterations of the same operation in parallel, they cannot perform different operations simultaneously. To perform different arithmetic operations in parallel, a CPU may include a vectored arithmetic unit.

Page 10: Vector Arithmetic Unit

A vector arithmetic unit contains multiple functional units that perform addition, subtraction, and other functions. The control unit routes input values to the different functional units to allow the CPU to execute multiple instructions simultaneously.

For the operations A = B + C and D = E - F, the CPU would route B and C to an adder and E and F to a subtractor, allowing both operations to execute simultaneously.
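The routing just described can be sketched as follows; the operation encoding and function names are assumptions for illustration, and real functional units would of course run concurrently in hardware rather than in a loop.

```python
# A toy model of a vectored arithmetic unit: the "control unit" routes
# each pending operation to a matching functional unit.
import operator

# The functional units from the figure: multiply, add, subtract, modulo.
FUNCTIONAL_UNITS = {"+": operator.add, "-": operator.sub,
                    "*": operator.mul, "%": operator.mod}

def dispatch(operations):
    """Route (op, x, y) tuples to functional units. In hardware these
    execute simultaneously; here we only model the routing decision."""
    return [FUNCTIONAL_UNITS[op](x, y) for op, x, y in operations]

# A = B + C and D = E - F issued together, e.g. B=5, C=2, E=9, F=4:
print(dispatch([("+", 5, 2), ("-", 9, 4)]))  # prints [7, 5]
```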

Page 11: A Vectored Arithmetic Unit

[Figure: data inputs routed through input connections to four functional units (*, +, -, %); the outputs shown compute A = B + C and D = E - F.]

Page 12: Organization of Multiprocessor Systems

Flynn's Classification was proposed by researcher Michael J. Flynn in 1966. It is the most commonly accepted taxonomy of computer organization. In this classification, computers are classified by whether they process a single instruction at a time or multiple instructions simultaneously, and by whether they operate on one data set or multiple data sets.

Page 13: Taxonomy of Computer Architectures

Flynn's classification groups multiprocessor systems into four categories by their instruction and data streams.

[Figure: simple diagrammatic representation of the four categories.]

Page 14: Single Instruction, Single Data (SISD)

SISD machines execute a single instruction on individual data values using a single processor.

Based on the traditional von Neumann uniprocessor architecture, instructions are executed sequentially, one after the next.

Until recently, most computers were of the SISD type.

Page 15: SISD

[Figure: simple diagrammatic representation of SISD.]

Page 16: Single Instruction, Multiple Data (SIMD)

An SIMD machine executes a single instruction on multiple data values simultaneously using many processors.

Since there is only one instruction, each processor does not have to fetch and decode each instruction. Instead, a single control unit does the fetch and decoding for all processors.

SIMD architectures include array processors.
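As a software analogy (not from the text), the single fetch-and-decode followed by broadcast execution can be modeled as one decoded instruction applied across every data lane; the instruction names here are invented for illustration.

```python
# A toy SIMD model: one control unit decodes a single instruction, and
# every processing element applies it to its own data value ("lane").

def simd_execute(instruction, lanes):
    """Decode once, then broadcast the operation to all lanes."""
    op = {"add1": lambda x: x + 1, "double": lambda x: x * 2}[instruction]
    # The loop stands in for processors operating simultaneously.
    return [op(x) for x in lanes]

print(simd_execute("double", [1, 2, 3, 4]))  # prints [2, 4, 6, 8]
```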

Page 17: SIMD

[Figure: simple diagrammatic representation of SIMD.]

Page 18: Multiple Instruction, Multiple Data (MIMD)

MIMD machines are usually referred to as multiprocessors or multicomputers.

Unlike SIMD machines, they may execute multiple different instructions simultaneously.

Each processor must include its own control unit; the processors can be assigned parts of a single task or entirely separate tasks.

MIMD has two subclasses: shared memory and distributed memory.
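A rough software analogy for the shared-memory case (again, not from the text): each "processor" is a thread running its own instruction stream on its own data, with results combined through shared memory.

```python
# A toy MIMD model: two threads execute different tasks simultaneously
# and deposit results in a shared dictionary (the "shared memory").
import threading

results = {}  # shared memory

def worker(name, task, data):
    # Each thread has its own "control unit": a different task entirely.
    results[name] = task(data)

t1 = threading.Thread(target=worker, args=("sum", sum, [1, 2, 3]))
t2 = threading.Thread(target=worker, args=("max", max, [4, 9, 5]))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # {'sum': 6, 'max': 9}
```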

Page 19: MIMD

[Figures: simple diagrammatic representations of shared-memory and distributed-memory MIMD.]

Page 20: Multiple Instruction, Single Data (MISD)

This category does not actually exist; it was included in the taxonomy for the sake of completeness.

Page 21: Analogy of Flynn's Classifications

An analogy for Flynn's classification is the check-in desk at an airport:
- SISD: a single desk
- SIMD: many desks and a supervisor with a megaphone giving instructions that every desk obeys
- MIMD: many desks working at their own pace, synchronized through a central database

Page 22: System Topologies

A system may also be classified by its topology. A topology is the pattern of connections between processors. The cost-performance tradeoff determines which topology to use for a multiprocessor system.

Page 23: Topology Classification

A topology is characterized by its diameter, total bandwidth, and bisection bandwidth:
- Diameter: the maximum distance between two processors in the computer system.
- Total bandwidth: the capacity of a communications link multiplied by the number of such links in the system.
- Bisection bandwidth: the maximum data transfer that could occur at the bottleneck in the topology.
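To make the three metrics concrete, here is a small sketch that computes them for a ring of n processors; the ring-specific counts (n links, worst-case distance n/2, two links cut by any bisection) are standard properties assumed here, not taken from the text.

```python
# Metrics for a ring topology of n processors, each link of capacity bw.

def ring_metrics(n, bw):
    diameter = n // 2             # farthest pair, going around the ring
    total_bandwidth = n * bw      # n links, each of capacity bw
    bisection_bandwidth = 2 * bw  # halving a ring severs exactly 2 links
    return diameter, total_bandwidth, bisection_bandwidth

print(ring_metrics(8, 100))  # prints (4, 800, 200)
```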

Page 24: System Topologies

Shared Bus Topology
- Processors communicate with each other via a single bus that can handle only one data transmission at a time.
- In most shared buses, processors directly communicate with their own local memory.

[Figure: processors P, each with a local memory M, connected by a shared bus to a global memory.]

Page 25: System Topologies

Ring Topology
- Uses direct connections between processors instead of a shared bus.
- Allows communication links to be active simultaneously, but data may have to travel through several processors to reach its destination.

[Figure: six processors connected in a ring.]

Page 26: System Topologies

Tree Topology
- Uses direct connections between processors, each processor having up to three connections.
- There is only one unique path between any pair of processors.

[Figure: a binary tree of seven processors.]

Page 27: System Topologies

Mesh Topology
- In the mesh topology, every processor connects to the processors above and below it, and to its right and left.

[Figure: a 3 x 3 mesh of processors.]

Page 28: System Topologies

Hypercube Topology
- A multiple mesh topology.
- Each processor connects to all other processors whose binary addresses differ by one bit. For example, processor 0 (0000) connects to processors 1 (0001) and 2 (0010).

[Figure: a four-dimensional hypercube of 16 processors.]
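The one-bit-difference rule makes hypercube neighbors easy to compute: XOR-ing a processor's address with each power of two flips one bit at a time. This is a standard property of hypercubes; the code is an illustrative sketch.

```python
# Neighbors of a node in a d-dimensional hypercube: flip each of the
# d address bits in turn.

def hypercube_neighbors(node, dimensions):
    return sorted(node ^ (1 << bit) for bit in range(dimensions))

# Processor 0 (0000) in a 4-dimensional (16-processor) hypercube:
print(hypercube_neighbors(0, 4))  # prints [1, 2, 4, 8]
```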

Page 29: System Topologies

Completely Connected Topology
- Every processor has n-1 connections, one to each of the other processors.
- Complexity increases as the system grows, but this topology offers maximum communication capability.

[Figure: eight completely connected processors.]

Page 30: MIMD System Architectures

Finally, the architecture of an MIMD system, in contrast to its topology, refers to its connections to system memory.

Systems may also be classified by their architectures. Two of these are uniform memory access (UMA) and nonuniform memory access (NUMA).

Page 31: Uniform Memory Access (UMA)

A UMA machine is a type of symmetric multiprocessor, or SMP, that has two or more processors performing symmetric functions. UMA gives all CPUs equal (uniform) access to all locations in shared memory. The CPUs interact with shared memory through some communications mechanism, such as a simple bus or a complex multistage interconnection network.

Page 32: Uniform Memory Access (UMA) Architecture

[Figure: processors 1 through n connected by a communications mechanism to a single shared memory.]

Page 33: Nonuniform Memory Access (NUMA)

NUMA architectures, unlike UMA architectures, do not allow uniform access to all shared memory locations. All processors can still access every shared memory location, but in a nonuniform way: each processor can access its local shared memory more quickly than memory modules attached to other processors.
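The local-versus-remote distinction can be captured in a tiny cost model; the cycle counts below are invented purely for illustration, not measurements from any real machine.

```python
# A toy NUMA access-cost model: a processor reaches its own local memory
# module faster than a module attached to another processor.
LOCAL_COST, REMOTE_COST = 1, 5  # hypothetical cycle counts

def access_cost(processor, memory_module):
    """Nonuniform: the cost depends on whether the module is local."""
    return LOCAL_COST if processor == memory_module else REMOTE_COST

print(access_cost(0, 0))  # local access  -> prints 1
print(access_cost(0, 2))  # remote access -> prints 5
```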

Page 34: Nonuniform Memory Access (NUMA) Architecture

[Figure: each processor 1 through n paired with its own local memory, all connected through a communications mechanism.]

Page 35:

THE END