Lecture 10 Outline Material from Chapter 2 Interconnection networks Processor arrays Multiprocessors Multicomputers Flynn’s taxonomy

Dec 21, 2015

Lecture 10 Outline

Material from Chapter 2
  Interconnection networks
  Processor arrays
  Multiprocessors
  Multicomputers
  Flynn’s taxonomy

Interconnection Networks

Uses of interconnection networks
  Connect processors to shared memory
  Connect processors to each other

Interconnection media types
  Shared medium
  Switched medium

Shared versus Switched Media

Shared Medium

Allows only one message at a time
Messages are broadcast
Each processor “listens” to every message
Arbitration is decentralized
Collisions require resending of messages
Ethernet is an example

Switched Medium

Supports point-to-point messages between pairs of processors
Each processor has its own path to the switch
Advantages over shared media
  Allows multiple messages to be sent simultaneously
  Allows scaling of network to accommodate increase in processors

Switch Network Topologies

View switched network as a graph
  Vertices = processors or switches
  Edges = communication paths

Two kinds of topologies
  Direct
  Indirect

Direct Topology

Ratio of switch nodes to processor nodes is 1:1
Every switch node is connected to
  1 processor node
  At least 1 other switch node

Indirect Topology

Ratio of switch nodes to processor nodes is greater than 1:1

Some switches simply connect other switches

Evaluating Switch Topologies

Diameter: largest distance between any two switch nodes
Bisection width: minimum number of edges that must be removed to split the network into two halves
Number of edges / node (degree)
Constant edge length? (yes/no)
Layout area

2-D Mesh Network

Direct topology
Switches arranged into a 2-D lattice
Communication allowed only between neighboring switches
Variants allow wraparound connections between switches on edge of mesh

2-D Meshes

Evaluating 2-D Meshes

Diameter: $\Theta(n^{1/2})$

Bisection width: $\Theta(n^{1/2})$

Number of edges per switch: 4

Constant edge length? Yes
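
The mesh figures above can be checked on a small case. The following is a minimal Python sketch, assuming a square q x q mesh without wraparound links (so n = q^2); the function names are illustrative and do not come from the lecture. It measures the diameter by breadth-first search and counts the edges that cross the cut between the left and right halves.

```python
from collections import deque

def mesh_neighbors(r, c, q):
    """Yield the lattice neighbours of switch (r, c) in a q x q mesh (no wraparound)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < q and 0 <= nc < q:
            yield nr, nc

def eccentricity(start, q):
    """Largest hop count from `start` to any other switch, found by BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(*node, q):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

def mesh_diameter(q):
    """Maximum eccentricity over all switches."""
    return max(eccentricity((r, c), q) for r in range(q) for c in range(q))

def middle_cut_edges(q):
    """Count edges joining the left half of the columns to the right half (q even)."""
    half = q // 2
    return sum(1
               for r in range(q) for c in range(q)
               for _, nc in mesh_neighbors(r, c, q)
               if c < half <= nc)

print(mesh_diameter(4), middle_cut_edges(4))
```

For q = 4 (n = 16) the diameter is 2(q - 1) = 6 and the middle cut has q = 4 edges; both grow in proportion to n^(1/2).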

Binary Tree Network

Indirect topology
$n = 2^d$ processor nodes, $n - 1$ switches

Evaluating Binary Tree Network

Diameter: 2 log n

Bisection width: 1

Edges / node: 3

Constant edge length? No

Hypertree Network

Indirect topology
Shares low diameter of binary tree
Greatly improves bisection width
From “front” looks like k-ary tree of height d
From “side” looks like upside down binary tree of height d

Hypertree Network

Evaluating 4-ary Hypertree

Diameter: log n

Bisection width: n / 2

Edges / node: 6

Constant edge length? No

Butterfly Network

Indirect topology
$n = 2^d$ processor nodes connected by $n(\log n + 1)$ switching nodes

[Figure: butterfly network for n = 8. Processor nodes 0 through 7 connect to switching nodes arranged in ranks 0 through 3, eight per rank, each labeled (rank, column).]

Butterfly Network Routing

Evaluating Butterfly Network

Diameter: log n

Bisection width: n / 2

Edges per node: 4

Constant edge length? No

Hypercube

Direct topology
2 x 2 x … x 2 mesh
Number of nodes a power of 2
Node addresses 0, 1, …, $2^k - 1$
Node i connected to k nodes whose addresses differ from i in exactly one bit position
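
The addressing rule can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture; the function names are hypothetical. Flipping each of the k address bits in turn gives the k neighbours of a node, and the number of differing bits between two addresses gives their distance.

```python
def hypercube_neighbors(i, k):
    """Addresses that differ from node i in exactly one of its k bit positions."""
    return [i ^ (1 << bit) for bit in range(k)]

def hypercube_distance(a, b):
    """Hops between nodes a and b: the number of bit positions in which they differ."""
    return bin(a ^ b).count("1")

k = 4
print([format(j, "04b") for j in hypercube_neighbors(0b0101, k)])
print(max(hypercube_distance(0, j) for j in range(2 ** k)))
```

With k = 4, node 0101 is adjacent to 0100, 0111, 0001, and 1101, and the farthest node from 0000 is k = log n hops away.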

Hypercube Addressing

[Figure: 16-node hypercube whose nodes are labeled with the 4-bit addresses 0000 through 1111.]

Hypercubes Illustrated

Evaluating Hypercube Network

Diameter: log n

Bisection width: n / 2

Edges per node: log n

Constant edge length? No

Shuffle-exchange

Direct topology
Number of nodes a power of 2
Nodes have addresses 0, 1, …, $2^k - 1$
Two outgoing links from node i
  Shuffle link to node LeftCycle(i)
  Exchange link to node [xor (i, 1)]
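
The two link functions can be written out directly. A minimal Python sketch, assuming LeftCycle means a one-bit left rotation of the k-bit address; the function names are illustrative.

```python
def left_cycle(i, k):
    """Rotate the k-bit address i left by one bit (the shuffle link)."""
    mask = (1 << k) - 1
    return ((i << 1) & mask) | (i >> (k - 1))

def exchange(i):
    """Flip the least significant bit of i (the exchange link)."""
    return i ^ 1

k = 3
for node in range(2 ** k):
    print(node, "shuffle ->", left_cycle(node, k), " exchange ->", exchange(node))
```

For 8 nodes the shuffle links send 0, 1, ..., 7 to 0, 2, 4, 6, 1, 3, 5, 7; nodes 0 and 7 shuffle to themselves.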

Shuffle-exchange Illustrated

[Figure: shuffle-exchange network with eight nodes, numbered 0 through 7.]

Shuffle-exchange Addressing

[Figure: 16-node shuffle-exchange network whose nodes are labeled with the 4-bit addresses 0000 through 1111.]

Evaluating Shuffle-exchange

Diameter: 2 log n - 1

Bisection width: n / log n

Edges per node: 2

Constant edge length? No

Comparing Networks

All have logarithmic diameter except 2-D mesh

Hypertree, butterfly, and hypercube have bisection width n / 2

All have constant edges per node except hypercube

Only 2-D mesh keeps edge lengths constant as network size increases
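
To make the comparison concrete, the sketch below evaluates the formulas from the evaluation slides for one value of n. It is a hedged illustration: it assumes n is a power of 4 so every topology is defined, the 2-D mesh entries use n^(1/2) as an order-of-growth stand-in rather than an exact count, and the function name is hypothetical.

```python
import math

def topology_metrics(n):
    """Return {topology: (diameter, bisection width, edges per node)} for n processors."""
    log_n = n.bit_length() - 1      # log2(n), valid when n is a power of 2
    root_n = math.isqrt(n)          # n^(1/2)
    return {
        "2-D mesh":         (root_n, root_n, 4),            # order of growth only
        "binary tree":      (2 * log_n, 1, 3),
        "4-ary hypertree":  (log_n, n // 2, 6),
        "butterfly":        (log_n, n // 2, 4),
        "hypercube":        (log_n, n // 2, log_n),
        "shuffle-exchange": (2 * log_n - 1, n / log_n, 2),
    }

for name, (diameter, bisection, degree) in topology_metrics(16).items():
    print(f"{name:17s} diameter={diameter} bisection={bisection} degree={degree}")
```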

Vector Computers

Vector computer: instruction set includes operations on vectors as well as scalars

Two ways to implement vector computers
  Pipelined vector processor: streams data through pipelined arithmetic units
  Processor array: many identical, synchronized arithmetic processing elements

Why Processor Arrays?

Historically, high cost of a control unit
Scientific applications have data parallelism

Processor Array

Data/instruction Storage

Front-end computer
  Program
  Data manipulated sequentially

Processor array
  Data manipulated in parallel

Processor Array Performance

Performance: work done per time unit
Performance of processor array depends on
  Speed of processing elements
  Utilization of processing elements

Performance Example 1

1024 processors
Each adds a pair of integers in 1 μsec
What is performance when adding two 1024-element vectors (one per processor)?

$$\text{Performance} = \frac{1024\ \text{operations}}{1\ \mu\text{sec}} = 1.024 \times 10^{9}\ \text{ops/sec}$$

Performance Example 2

512 processors
Each adds two integers in 1 μsec
Performance adding two vectors of length 600?

With 600 elements on 512 processors, the addition takes two steps (512 elements, then the remaining 88), so 2 μsec in total.

$$\text{Performance} = \frac{600\ \text{operations}}{2\ \mu\text{sec}} = 3 \times 10^{8}\ \text{ops/sec}$$
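
Both examples follow the same arithmetic, sketched below in Python. The sketch assumes each addition takes 1 μsec and that a vector longer than the processor count is handled in ceil(length / processors) lock-step passes; the function name is hypothetical.

```python
import math

def processor_array_performance(num_processors, vector_length, add_time_sec):
    """Operations per second when adding two vectors element by element."""
    # Each pass adds at most one element pair per processor.
    passes = math.ceil(vector_length / num_processors)
    return vector_length / (passes * add_time_sec)

print(processor_array_performance(1024, 1024, 1e-6))  # Example 1
print(processor_array_performance(512, 600, 1e-6))    # Example 2
```

In the second case only 88 of the 512 processors do useful work in the second pass, which is why performance falls well below the 5.12 x 10^8 ops/sec peak of a 512-processor array.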

2-D Processor Interconnection Network

Each VLSI chip has 16 processing elements

if (COND) then A else B

Processor Array Shortcomings

Not all problems are data-parallel
Speed drops for conditionally executed code
Don’t adapt to multiple users well
Do not scale down well to “starter” systems
Rely on custom VLSI for processors
Expense of control units has dropped
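
The slowdown for conditional code can be illustrated with a small simulation of `if (COND) then A else B`. This is a sketch under stated assumptions, not the lecture's own code: one data element per processing element (PE), a single control unit that issues A and then B in separate passes, and a per-PE mask that decides which PEs are active in each pass. The function name is hypothetical.

```python
def simd_if_else(cond, then_op, else_op, data):
    """Simulate `if (COND) then A else B` with one element per processing element."""
    mask = [cond(x) for x in data]                 # every PE evaluates COND
    # Pass 1: PEs whose mask is true execute A; the others are idle.
    after_then = [then_op(x) if m else x for x, m in zip(data, mask)]
    # Pass 2: PEs whose mask is false execute B; the others are idle.
    result = [x if m else else_op(x) for x, m in zip(after_then, mask)]
    active_per_pass = (sum(mask), len(mask) - sum(mask))
    return result, active_per_pass

values, active = simd_if_else(lambda x: x % 2 == 0,   # COND
                              lambda x: x // 2,        # A
                              lambda x: 3 * x + 1,     # B
                              [1, 2, 3, 4])
print(values, active)
```

Every PE is active in exactly one of the two passes, so if A and B take equal time, utilization over the conditional is 50% however the mask splits.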

Multiprocessors

Multiprocessor: multiple-CPU computer with a shared memory

Same address on two different CPUs refers to the same memory location

Avoid three problems of processor arrays
  Can be built from commodity CPUs
  Naturally support multiple users
  Maintain efficiency in conditional code

Centralized Multiprocessor

Straightforward extension of uniprocessor
Add CPUs to bus
All processors share same primary memory
Memory access time same for all CPUs
  Uniform memory access (UMA) multiprocessor
  Symmetrical multiprocessor (SMP)

Centralized Multiprocessor

Private and Shared Data

Private data: items used only by a single processor

Shared data: values used by multiple processors

In a multiprocessor, processors communicate via shared data values

Problems Associated with Shared Data

Cache coherence
  Replicating data across multiple caches reduces contention
  How to ensure different processors have same value for same address?
Synchronization
  Mutual exclusion
  Barrier
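
The two synchronization primitives can be sketched with Python's standard `threading` module, using threads to stand in for processors. This is an illustration only; the function and variable names are hypothetical. A lock gives mutual exclusion on a shared counter, and a barrier holds every thread until all have finished updating it.

```python
import threading

def run_workers(num_threads=4, increments=1000):
    counter = 0
    lock = threading.Lock()                    # mutual exclusion
    barrier = threading.Barrier(num_threads)   # barrier synchronization
    seen_after_barrier = []

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:                         # one thread at a time updates the shared value
                counter += 1
        barrier.wait()                         # nobody continues until all updates are done
        with lock:
            seen_after_barrier.append(counter)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, seen_after_barrier

total, seen = run_workers()
print(total, seen)
```

Because each thread reads the counter only after passing the barrier, every thread observes the final total.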

Cache-coherence Problem

[Figure, four frames: CPU A and CPU B each have a cache and share one memory, where X = 7. At first neither cache holds X; then one cache holds 7; then both caches hold 7; finally memory holds X = 2 while the caches hold 7 and 2, so one CPU still sees the stale value.]

Write Invalidate Protocol

[Figure, four frames: both caches hold X = 7 and each cache controller monitors the bus; one CPU broadcasts “Intent to write X”; the other cache’s copy is invalidated, leaving a single cached 7; the writer then changes X to 2, and memory also shows X = 2.]
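
The write invalidate idea can be sketched as a small Python simulation. It is a simplified model under stated assumptions, not the lecture's implementation: one shared bus, write-through to memory, and hypothetical class names `Bus` and `SnoopingCache`.

```python
class Bus:
    """Shared bus: holds main memory and the list of caches that snoop on it."""
    def __init__(self, memory):
        self.memory = dict(memory)
        self.caches = []

class SnoopingCache:
    def __init__(self, bus):
        self.lines = {}          # address -> cached value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast the intent to write; every other cache drops its copy.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value
        self.bus.memory[addr] = value            # write-through (an assumption of this sketch)

bus = Bus({"X": 7})
cpu_a, cpu_b = SnoopingCache(bus), SnoopingCache(bus)
cpu_a.read("X"); cpu_b.read("X")                 # both caches now hold 7
cpu_a.write("X", 2)                              # CPU B's copy is invalidated
print(cpu_b.lines, cpu_b.read("X"))
```

After the write, CPU B no longer holds a copy, so its next read misses and fetches the current value instead of the stale 7.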

Distributed Multiprocessor

Distribute primary memory among processors
Increase aggregate memory bandwidth and lower average memory access time
Allow greater number of processors
Also called non-uniform memory access (NUMA) multiprocessor

Distributed Multiprocessor

Cache Coherence

Some NUMA multiprocessors do not support it in hardware
  Only instructions, private data in cache
  Large memory access time variance

Implementation more difficult
  No shared memory bus to “snoop”
  Directory-based protocol needed

Directory-based Protocol

Distributed directory contains information about cacheable memory blocks
One directory entry for each cache block
Each entry has
  Sharing status
  Which processors have copies

Sharing Status

Uncached
  Block not in any processor’s cache
Shared
  Cached by one or more processors
  Read only
Exclusive
  Cached by exactly one processor
  Processor has written block
  Copy in memory is obsolete

Directory-based Protocol

[Figure: CPU 0, CPU 1, and CPU 2 each have a cache, a local memory, and a directory, all joined by an interconnection network.]

Directory-based Protocol

[Figure: initial state. Memory holds X = 7, no cache holds X, and the directory entry for X is U 0 0 0 (status Uncached followed by the bit vector, one bit per CPU).]

CPU 0 Reads X

[Figure, three frames: CPU 0 sends a read miss (entry U 0 0 0); the entry becomes S 1 0 0; CPU 0’s cache receives X = 7. Memory still holds X = 7.]

CPU 2 Reads X

[Figure, three frames: CPU 2 sends a read miss (entry S 1 0 0); the entry becomes S 1 0 1; the caches of CPU 0 and CPU 2 both hold X = 7.]

CPU 0 Writes 6 to X

[Figure, three frames: CPU 0 sends a write miss (entry S 1 0 1); an invalidate message removes the other cached copy; the entry becomes E 1 0 0 and CPU 0’s cache holds X = 6, while memory still holds the obsolete X = 7.]

CPU 1 Reads X

[Figure, four frames: CPU 1 sends a read miss (entry E 1 0 0); a “switch to shared” message goes to the owner; memory is updated to X = 6; the entry becomes S 1 1 0 and the caches of CPU 0 and CPU 1 both hold X = 6.]

CPU 2 Writes 5 to X

[Figure, three frames: CPU 2 sends a write miss (entry S 1 1 0); invalidate messages remove the cached copies; the entry becomes E 0 0 1 and CPU 2’s cache holds X = 5, while memory still holds X = 6.]

CPU 0 Writes 4 to X

[Figure, six frames: CPU 0 sends a write miss (entry E 0 0 1); a “take away” message goes to the current owner; the owner’s value is written back, so memory holds X = 5; the entry becomes E 1 0 0; CPU 0’s cache receives X = 5 and then holds the new value X = 4, while memory still holds X = 5.]

CPU 0 Writes Back X Block

[Figure, two frames: CPU 0 sends a data write back carrying X = 4 (entry E 1 0 0); memory then holds X = 4 and the entry returns to U 0 0 0.]
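
The walkthrough above can be replayed with a small Python model of one directory entry. This is a simplified sketch under stated assumptions: a single block X, three CPUs, messages modeled as direct method calls, and a hypothetical class name `DirectoryEntry`. It tracks the sharing status (U, S, E), the bit vector, the cached copies, and the value in memory.

```python
class DirectoryEntry:
    """One memory block with its directory entry and per-CPU cached copies."""
    def __init__(self, num_cpus, value):
        self.memory = value
        self.status = "U"                    # U = uncached, S = shared, E = exclusive
        self.bits = [0] * num_cpus           # which CPUs hold a copy
        self.caches = [None] * num_cpus

    def _recall_from_owner(self, keep_copy):
        """The exclusive owner holds the only valid value: copy it back to memory."""
        owner = self.bits.index(1)
        self.memory = self.caches[owner]
        if not keep_copy:                    # "take away": the owner gives up its copy
            self.caches[owner] = None
            self.bits[owner] = 0

    def read(self, cpu):
        if self.caches[cpu] is None:         # read miss
            if self.status == "E":
                self._recall_from_owner(keep_copy=True)   # "switch to shared"
            self.status = "S"
            self.bits[cpu] = 1
            self.caches[cpu] = self.memory
        return self.caches[cpu]

    def write(self, cpu, value):
        if self.status == "E" and not self.bits[cpu]:
            self._recall_from_owner(keep_copy=False)
        elif self.status == "S":
            for other in range(len(self.bits)):           # invalidate other copies
                if other != cpu:
                    self.caches[other] = None
                    self.bits[other] = 0
        self.status = "E"
        self.bits[cpu] = 1
        self.caches[cpu] = value             # memory copy is now obsolete

    def write_back(self, cpu):
        self.memory = self.caches[cpu]
        self.caches[cpu] = None
        self.bits[cpu] = 0
        self.status = "U"

    def entry(self):
        return self.status + " " + " ".join(map(str, self.bits))

x = DirectoryEntry(3, 7)
x.read(0); x.read(2); x.write(0, 6); x.read(1)
x.write(2, 5); x.write(0, 4); x.write_back(0)
print(x.entry(), x.memory)
```

Calling `entry()` after each step reproduces the sequence of directory entries shown in the frames: S 1 0 0, S 1 0 1, E 1 0 0, S 1 1 0, E 0 0 1, E 1 0 0, and finally U 0 0 0.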

Multicomputer

Distributed memory multiple-CPU computer
Same address on different processors refers to different physical memory locations
Processors interact through message passing
Commercial multicomputers
Commodity clusters

Asymmetrical Multicomputer

Asymmetrical MC Advantages

Back-end processors dedicated to parallel computations
  Easier to understand, model, tune performance
Only a simple back-end operating system needed
  Easy for a vendor to create

Asymmetrical MC Disadvantages

Front-end computer is a single point of failure
Single front-end computer limits scalability of system
Primitive operating system in back-end processors makes debugging difficult
Every application requires development of both front-end and back-end program

Symmetrical Multicomputer

Symmetrical MC Advantages

Alleviate performance bottleneck caused by single front-end computer
Better support for debugging
Every processor executes same program

Symmetrical MC Disadvantages

More difficult to maintain illusion of single “parallel computer”

No simple way to balance program development workload among processors

More difficult to achieve high performance when multiple processes on each processor

ParPar Cluster, A Mixed Model

Commodity Cluster

Co-located computers
Dedicated to running parallel jobs
No keyboards or displays
Identical operating system
Identical local disk images
Administered as an entity

Network of Workstations

Dispersed computers
First priority: person at keyboard
Parallel jobs run in background
Different operating systems
Different local images
Checkpointing and restarting important

Flynn’s Taxonomy

Instruction stream
Data stream
Single vs. multiple
Four combinations
  SISD
  SIMD
  MISD
  MIMD
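
The four combinations come from two independent yes/no questions, which a short Python sketch makes explicit. The function name is hypothetical; it only encodes the naming scheme.

```python
def flynn_class(instruction_streams, data_streams):
    """Name the Flynn category from the number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction stream and one data stream")
    i = "S" if instruction_streams == 1 else "M"   # single vs. multiple instruction
    d = "S" if data_streams == 1 else "M"          # single vs. multiple data
    return f"{i}I{d}D"

for streams in [(1, 1), (1, 8), (4, 1), (4, 4)]:
    print(streams, flynn_class(*streams))
```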

SISD

Single Instruction, Single Data
Single-CPU systems
Note: co-processors don’t count
  Functional
  I/O
Example: PCs

SIMD

Single Instruction, Multiple Data
Two architectures fit this category
  Pipelined vector processor (e.g., Cray-1)
  Processor array (e.g., Connection Machine)

MISD

Multiple Instruction, Single Data
Example: systolic array

MIMD

Multiple Instruction, Multiple Data
Multiple-CPU computers
  Multiprocessors
  Multicomputers

Summary

Commercial parallel computers appeared in 1980s
Multiple-CPU computers now dominate
Small-scale: Centralized multiprocessors
Large-scale: Distributed memory architectures (multiprocessors or multicomputers)