Computer Organization Douglas Comer Computer Science Department Purdue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purdue.edu/people/comer Copyright 2006. All rights reserved. This document may not be reproduced by any means without written consent of the author.


XVII

Parallelism

CS250 -- Chapt. 17 1 2006


Two Fundamental Hardware Techniques Used To Increase Performance

• Parallelism
• Pipelining


Parallelism

• Multiple copies of hardware unit used
• All copies can operate simultaneously
• Occurs at many levels of architecture
• Term parallel computer applied when parallelism dominates entire architecture


Characterizations Of Parallelism

• Microscopic vs. macroscopic
• Symmetric vs. asymmetric
• Fine-grain vs. coarse-grain
• Explicit vs. implicit


Microscopic Vs. Macroscopic Parallelism

Parallelism is so fundamental that virtually all computer systems contain some form of parallel hardware. We use the term microscopic parallelism to characterize parallel facilities that are present, but not especially visible.


Examples Of Microscopic Parallelism

• Parallel operations in an ALU
• Parallel access to general-purpose registers
• Parallel data transfer to/from physical memory
• Parallel transfer across an I/O bus


Examples Of Macroscopic Parallelism

• Symmetric parallelism
  – Refers to multiple, identical processors
  – Example: dual processor PC
• Asymmetric parallelism
  – Refers to multiple, dissimilar processors
  – Example: PC with a graphics processor


Level Of Parallelism

• Fine-grain parallelism
  – Parallelism among individual instructions or data elements
• Coarse-grain parallelism
  – Parallelism among programs or large blocks of data


Explicit And Implicit Parallelism

• Explicit
  – Visible to programmer
  – Requires programmer to initiate and control parallel activities
• Implicit
  – Invisible to programmer
  – Hardware runs multiple copies of program automatically


Parallel Architectures

• Design in which computer has reasonably large number of processors
• Intended for scaling
• Example: computer with thirty-two processors
• Not generally classified as a parallel computer
  – Dual processor computer
  – Quad processor computer


Types Of Parallel Architectures

Name   Meaning
----   -------------------------------------------
SISD   Single Instruction Single Data stream
SIMD   Single Instruction Multiple Data streams
MIMD   Multiple Instructions Multiple Data streams

• Known as the Flynn classification


Conventional (Nonparallel) Architecture

• Known as Single Instruction Single Data
• Other terms include
  – Sequential architecture
  – Uniprocessor


Single Instruction Multiple Data (SIMD)

• Each instruction specifies a single operation
• Hardware applies operation to multiple data items


Vector Processor

• Uses SIMD architecture
• Applies a single floating point operation to an entire array of values
• Example use: normalize values in a set


Normalization On A Conventional Computer

    for i from 1 to N {
        V[i] ← V[i] × Q;
    }


Normalization On A Vector Processor

    V ← V × Q;

• Trivial amount of code
• Special instruction called vector instruction
• If vector V larger than hardware capacity, multiple steps are required
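
The contrast between the two normalization slides can be sketched in Python. This is an illustration only, not real vector hardware: `HARDWARE_CAPACITY` and `vector_multiply` are hypothetical names, and the list comprehension merely stands in for one vector instruction. The sketch shows why a vector longer than the hardware capacity needs multiple steps.

```python
# Hypothetical: number of elements one vector instruction can process at once.
HARDWARE_CAPACITY = 4

def normalize_scalar(v, q):
    """Conventional computer: one multiplication per loop iteration."""
    result = list(v)
    for i in range(len(result)):
        result[i] = result[i] * q
    return result

def vector_multiply(chunk, q):
    """Stand-in for a single vector instruction (at most HARDWARE_CAPACITY items)."""
    assert len(chunk) <= HARDWARE_CAPACITY
    return [x * q for x in chunk]

def normalize_vector(v, q):
    """Vector processor: V <- V x Q, issued in several steps if V is too long."""
    result = []
    steps = 0
    for start in range(0, len(v), HARDWARE_CAPACITY):
        result.extend(vector_multiply(v[start:start + HARDWARE_CAPACITY], q))
        steps += 1
    return result, steps

values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
scaled, steps = normalize_vector(values, 0.5)
print(scaled, steps)
```

With ten values and an assumed capacity of four, the vector version issues three steps where the scalar loop performs ten iterations.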


Graphics Processors

• Graphics hardware uses sequential bytes in memory to store pixels
• To move a window, software copies bytes
• SIMD architecture allows copies in parallel
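
A minimal sketch of moving a window by copying bytes, assuming a tiny frame buffer with one byte per pixel. The names `WIDTH`, `HEIGHT`, and `move_window` are made up for illustration. Each slice assignment copies a whole row of pixels as one bulk operation, which is only an analogy for the parallel copy that SIMD hardware would perform.

```python
WIDTH, HEIGHT = 8, 4  # hypothetical screen size, one byte per pixel

def move_window(frame, src_x, src_y, dst_x, dst_y, w, h):
    """Copy a w-by-h block of pixels to a new position, one row at a time."""
    # Snapshot the source rows first so overlapping moves stay correct.
    rows = []
    for r in range(h):
        start = (src_y + r) * WIDTH + src_x
        rows.append(bytes(frame[start:start + w]))
    # Write each row to its destination with a single bulk copy.
    for r, row in enumerate(rows):
        start = (dst_y + r) * WIDTH + dst_x
        frame[start:start + w] = row
    return frame

frame = bytearray(WIDTH * HEIGHT)       # pixels stored in sequential bytes
frame[0:2] = b"\x01\x02"                # row 0 of a 2x2 window
frame[WIDTH:WIDTH + 2] = b"\x03\x04"    # row 1 of the window
move_window(frame, 0, 0, 4, 2, 2, 2)    # copy the window to column 4, row 2
```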


Multiple Instructions Multiple Data (MIMD)

• Parallel architecture with separate processors
• Each processor runs an independent program
• Processors visible to programmer


Two Popular Categories Of Multiprocessors

• Symmetric
• Asymmetric


Symmetric Multiprocessor (SMP)

• Most well-known MIMD architecture
• Set of N identical processors
• Examples of groups that built SMP computers
  – Carnegie Mellon University (C.mmp)
  – Sequent Corporation (Balance 8000 and 21000)
  – Encore Corporation (Multimax)


Illustration Of A Symmetric Multiprocessor

[Figure: processors P1, P2, ..., PN shown together with main memory (various modules) and devices]


Asymmetric Multiprocessor (AMP)

• Set of N processors
• Multiple types of processors
• Processors optimized for specific tasks
• Often use master-slave paradigm


Example AMP Architectures

• Math (or graphics) coprocessor
  – Special-purpose processor
  – Handles floating point (or graphics) operations
  – Called by main processor as needed
• I/O Processor
  – Optimized for handling interrupts
  – Programmable


Examples Of Programmable I/O Processors

• Channel (IBM mainframe)
• Peripheral Processor (CDC mainframe)


Multiprocessor Overhead

• Having many processors is not always a clear win
• Overhead arises from
  – Communication
  – Coordination
  – Contention


Communication

• Needed
  – Among processors
  – Between processors and I/O devices
• Can become a bottleneck


Coordination

• Needed when processors work together
• May require one processor to coordinate others


Contention

• Processors contend for resources
  – Memory
  – I/O devices
• Speed of resources can limit overall performance
  – Example: N – 1 processors wait while one processor accesses memory


Performance Of Multiprocessors

• Disappointing
• Bottlenecks
  – Contention for operating system (only one copy of OS can run)
  – Contention for memory and I/O
• Another problem: either need
  – One centralized cache (contention problems)
  – Coordinated caches (complex interaction)
• Many applications are I/O bound


According To John Harper

“Building multiprocessor systems that scale while correctly synchronising the use of shared resources is very tricky, whence the principle: with careful design and attention to detail, an N-processor system can be made to perform nearly as well as a single-processor system. (Not nearly N times better, nearly as good in total performance as you were getting from a single processor). You have to be very good -- and have the right problem with the right decomposability -- to do better than this.”

http://www.john-a-harper.com/principles.htm


Definition Of Speedup

• Defined relative to a single processor

\[ \text{Speedup} = \frac{\tau_1}{\tau_N} \]

• $\tau_1$ denotes the execution time on a single processor
• $\tau_N$ denotes the execution time on a multiprocessor
• Goal: speedup is linear in the number of processors
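
A small worked example of the speedup ratio, the single-processor time divided by the multiprocessor time. The timings below are invented for illustration and are not measurements from any real system. The function and variable names are likewise assumptions.

```python
def speedup(tau_1, tau_n):
    """Execution time on one processor divided by execution time on N processors."""
    return tau_1 / tau_n

tau_1 = 120.0                           # made-up single-processor time, in seconds
timings = {2: 66.0, 4: 40.0, 8: 30.0}   # made-up times for N = 2, 4, 8 processors

for n, tau_n in timings.items():
    s = speedup(tau_1, tau_n)
    print(f"N={n}: speedup {s:.2f} (linear speedup would be {n})")
```

In this invented data set, eight processors yield a speedup of only 4, half of the linear goal.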


Ideal And Typical Speedup

[Figure: speedup plotted against number of processors (N), both axes running from 1 to 16, with one curve labeled "ideal" and one labeled "actual"]


Speedup For N >> 1 Processors

[Figure: speedup plotted against number of processors (N), both axes running from 1 to 32, with one curve labeled "ideal" and one labeled "actual"]


Summary Of Speedup

When used for general-purpose computing, a multiprocessor may not perform well. In some cases, added overhead means performance decreases as more processors are added.


Consequences For Programmers

• Writing code for multiprocessors is difficult
  – Need to handle mutual exclusion for shared items
  – Typical mechanism: locks


The Need For Locking

• Consider an assignment

    x = x + 1;

• Typical code is

    load   x, R5
    incr   R5
    store  R5, x


Example Of Problem With Parallel Access

• Consider two processors incrementing item x
  – Processor 1 loads x into its register 5
  – Processor 1 increments its register 5
  – Processor 2 loads x into its register 5
  – Processor 1 stores its register 5 into x
  – Processor 2 increments its register 5
  – Processor 2 stores its register 5 into x
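
The interleaving listed above can be replayed deterministically in a small simulation. The `Processor` class and the `memory` dictionary are hypothetical stand-ins for two processors sharing one memory location. Each processor has its own register 5, and the six steps run in exactly the order given.

```python
memory = {"x": 0}  # shared item x, initially 0

class Processor:
    """Toy processor with a private register 5 and access to shared memory."""
    def __init__(self):
        self.r5 = 0

    def load(self):
        self.r5 = memory["x"]      # load x, R5

    def incr(self):
        self.r5 += 1               # incr R5

    def store(self):
        memory["x"] = self.r5      # store R5, x

p1, p2 = Processor(), Processor()

# Replay the six steps in the order listed.
p1.load()
p1.incr()
p2.load()    # reads the old value because processor 1 has not stored yet
p1.store()
p2.incr()
p2.store()   # overwrites processor 1's update

print(memory["x"])
```

Two increments were executed, yet x ends at 1 rather than 2: one update is lost because processor 2 loaded x before processor 1 stored its result.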


Hardware Locks

• Prevent simultaneous access
• Separate lock assigned to each item
• Code is

    lock    17
    load    x, R5
    incr    R5
    store   R5, x
    release 17
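
The same lock-then-update pattern can be sketched in software. The example below uses Python threads and `threading.Lock` as a software analogue of a hardware lock; it is not the hardware mechanism itself. The name `lock_17` simply echoes the lock number in the code above, and `shared` and `increment_many` are illustrative names.

```python
import threading

shared = {"x": 0}               # the shared item
lock_17 = threading.Lock()      # one lock assigned to the item x

def increment_many(count):
    """Increment x `count` times, holding the lock around each update."""
    for _ in range(count):
        with lock_17:           # acquire: lock 17
            shared["x"] = shared["x"] + 1
        # leaving the with-block releases the lock: release 17

threads = [threading.Thread(target=increment_many, args=(10000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["x"])
```

Because each load, increment, and store happens while the lock is held, no update is lost and the final value is 20000.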


Programming Parallel Computers

• Implicit parallelism
  – Programmer writes sequential code
  – Hardware runs many copies automatically
• Explicit parallelism
  – Programmer writes code for parallel architecture
  – Code must use locks to prevent interference


The Point About Parallel Programming

From a programmer’s point of view, a system that uses explicit parallelism is significantly more complex to program than a system that uses implicit parallelism.


Programming Symmetric And Asymmetric Multiprocessors

• Both types can be difficult to program
• Symmetric has two advantages
  – One instruction set
  – Programmer does not need to choose processor type for each task


Redundant Parallel Architectures

• Used to increase reliability
• Do not improve performance
• Multiple copies of hardware perform same function
• Can be used to
  – Test whether hardware is performing correctly
  – Serve as backup in case of hardware failure
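
The two uses of redundancy can be illustrated with ordinary functions standing in for hardware copies. Every name here (`square`, `faulty_square`, `broken_unit`, `check_copies`, `run_with_backup`) is hypothetical, and a real redundant design would compare results in hardware rather than in software.

```python
def square(n):
    """A correctly working copy of the unit."""
    return n * n

def faulty_square(n):
    """A copy that produces a wrong answer."""
    return n * n + 1

def broken_unit(n):
    """A copy that has failed outright."""
    raise RuntimeError("hardware failure")

def check_copies(units, value):
    """Run every copy on the same input and report whether all results agree."""
    results = [unit(value) for unit in units]
    return results[0], all(r == results[0] for r in results)

def run_with_backup(primary, backup, value):
    """Use the primary copy; switch to the backup if the primary fails."""
    try:
        return primary(value)
    except RuntimeError:
        return backup(value)

print(check_copies([square, square], 3))
print(check_copies([square, faulty_square], 3))
print(run_with_backup(broken_unit, square, 3))
```

Note that neither function makes the computation faster: the copies repeat the same work, which matches the point that redundancy improves reliability rather than performance.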


Loose And Tight Coupling

• Tightly coupled multiprocessor
  – Multiple processors in single computer
  – Buses or switching fabrics used to interconnect processors, memory, and I/O
  – Usually one operating system
• Loosely coupled multiprocessor
  – Multiple, independent computer systems
  – Computer networks used to interconnect systems
  – Each computer runs its own operating system
  – Known as distributed computing


Cluster Computer

• Distributed computer system
• All computers work on a single problem
• Works best if problem can be partitioned into pieces
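
A sketch of partitioning one problem into pieces, assuming the problem is summing a list of numbers. Worker threads from `concurrent.futures.ThreadPoolExecutor` stand in for the computers of a cluster; a real cluster would send the pieces over a network. The helper names `partition` and `cluster_sum` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, pieces):
    """Split data into at most `pieces` contiguous chunks of nearly equal size."""
    size = max(1, (len(data) + pieces - 1) // pieces)
    return [data[i:i + size] for i in range(0, len(data), size)]

def cluster_sum(data, nodes):
    """Each simulated node sums one chunk; the partial sums are then combined."""
    chunks = partition(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)

total = cluster_sum(list(range(1, 101)), 4)
print(total)
```

Summation partitions cleanly because each piece can be computed without knowing anything about the others; problems whose pieces depend on one another fit this approach less well.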


Grid Computing

• Form of loosely-coupled distributed computing
• Uses computers on the Internet
• Popular for large, scientific computations


Summary

• Parallelism is a fundamental optimization
• Computers classified as
  – SISD (e.g., conventional uniprocessor)
  – SIMD (e.g., vector computer)
  – MIMD (e.g., multiprocessor)
• Multiprocessor speedup usually less than linear


Summary (continued)

• Multiprocessors can be
  – Symmetric or asymmetric
  – Explicitly or implicitly parallel
• Programming multiprocessors is usually difficult
  – Locks needed for shared items
• Parallel systems can be
  – Tightly-coupled (single computer)
  – Loosely-coupled (computers connected by a network)


Questions?