
ADPTF: Automated Distributed Performance Testing Frameworkiraicu/teaching/EECS495-DIC/lecture10.pdf · –Coordinating concurrent tasks –Parallelizing algorithms –Lack of standard

Jun 23, 2020


dariahiddleston

• Moore’s Law

– The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every two years (the popular 18-month figure is usually attributed to a prediction about chip performance).

– Self-fulfilling prophecy

• Computer architect goal

• Software developer assumption
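The doubling rule can be written as N(t) = N0 · 2^(t/T), where T is the doubling period. The Python sketch below illustrates it under stated assumptions. The function name `projected_transistors` is hypothetical. T defaults to two years (pass 1.5 for an 18-month rule). The starting count is an arbitrary example, not measured data.

```python
def projected_transistors(n0: float, years: float, doubling_period: float = 2.0) -> float:
    """Project a transistor count forward, assuming steady exponential doubling.

    n0: starting transistor count (illustrative only)
    years: elapsed time in years
    doubling_period: assumed years per doubling
    """
    return n0 * 2 ** (years / doubling_period)


# Ten years at a two-year doubling period is five doublings, a 32x increase.
print(projected_transistors(1_000_000, 10))  # 32000000.0
```

Changing `doubling_period` from 2.0 to 1.5 shows how sensitive long-range projections are to the assumed rate.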


• Impediments to Moore’s Law

– Theoretical Limit

– What to do with all that die space?

– Design complexity

– How do you meet the expected performance increase?


• von Neumann model

– Execute a stream of instructions (machine code)

– Instructions can specify

• Arithmetic operations

• Data addresses

• Next instruction to execute

– Complexity

• Track billions of data locations and millions of instructions

• Manage with:

– Modular design

– High-level programming languages
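To make the model concrete, here is a toy accumulator machine in Python. The instruction set (LOAD, ADD, SUB, STORE, JNZ, HALT) and the names `run` and `SUM_TO_N` are hypothetical. They were chosen only to show instructions that specify an arithmetic operation, a data address, or the next instruction to execute. For readability the program is held in a separate list; in a true von Neumann machine, instructions and data share one memory.

```python
from typing import List, Tuple

# Hypothetical toy instruction set. Each instruction is (opcode, operand):
# LOAD/ADD/SUB/STORE name a data address; JNZ names the next instruction.
Instruction = Tuple[str, int]


def run(program: List[Instruction], memory: List[int]) -> List[int]:
    """Execute a stream of instructions against memory and return the memory."""
    acc = 0  # accumulator register
    pc = 0   # program counter: index of the next instruction to execute
    while True:
        op, arg = program[pc]
        pc += 1  # by default, fall through to the following instruction
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "SUB":
            acc -= memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JNZ":
            if acc != 0:
                pc = arg  # the instruction itself selects the next instruction
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")


# memory[0] = n (assumed >= 1), memory[1] = running total, memory[2] = constant 1
SUM_TO_N: List[Instruction] = [
    ("LOAD", 1), ("ADD", 0), ("STORE", 1),  # total = total + n
    ("LOAD", 0), ("SUB", 2), ("STORE", 0),  # n = n - 1
    ("JNZ", 0),                             # repeat while n != 0
    ("HALT", 0),
]

print(run(SUM_TO_N, [4, 0, 1]))  # [0, 10, 1]
```

Even this eight-instruction program requires tracking several memory locations and a program counter. That bookkeeping is the complexity that modular design and high-level languages are meant to hide.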


• Parallelism

– Continue to increase performance via parallelism.

[Figure: Number of Cores and Manufacturing Process plotted by year, 2004 to 2018]


• From a software point of view, we need to solve demanding problems

– Engineering Simulations

– Scientific Applications

– Commercial Applications

• We need the performance and resource gains afforded by parallelism


• Engineering Simulations

– Aerodynamics

– Engine efficiency


• Scientific Applications

– Bioinformatics

– Thermonuclear processes

– Weather modeling


• Commercial Applications

– Financial transaction processing

– Data mining

– Web Indexing


• Unfortunately, parallelism greatly increases coding complexity

– Coordinating concurrent tasks

– Parallelizing algorithms

– Lack of standard environments and support


• The challenge

– Provide the abstractions, programming paradigms, and algorithms needed to effectively design, implement, and maintain applications that exploit the parallelism provided by the underlying hardware in order to solve modern problems.


• Standard sequential architecture

[Figure: CPU and RAM connected by a BUS, annotated "Bottlenecks"]


• Use multiple

– Datapaths

– Memory units

– Processing units


• SIMD

– Single instruction stream, multiple data stream

[Figure: a single Control Unit connected through an Interconnect to multiple Processing Units]


• SIMD

– Advantages

• Performs vector/matrix operations well

– EX: Intel’s MMX chip

– Disadvantages

• Too dependent on type of computation

– EX: Graphics

• Performance/resource utilization suffers if computations aren’t “embarrassingly parallel”.
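Below is a minimal pure-Python model of SIMD execution. It assumes that lists stand in for the per-lane data held by each processing unit. Real SIMD hardware such as MMX operates on packed registers, and this sketch runs sequentially. The names `simd_step` and `simd_select` are hypothetical. The example shows both sides of the trade-off. Regular vector work maps onto one broadcast instruction. A per-lane conditional forces every lane to evaluate both branches before a mask keeps one result.

```python
import operator
from typing import Callable, List


def simd_step(op: Callable[[int, int], int], a: List[int], b: List[int]) -> List[int]:
    """Broadcast ONE instruction (op) to every lane; each lane uses its own data."""
    if len(a) != len(b):
        raise ValueError("each processing unit needs one operand from each vector")
    return [op(x, y) for x, y in zip(a, b)]


def simd_select(mask: List[int], if_true: List[int], if_false: List[int]) -> List[int]:
    """Per-lane choice between two precomputed results, driven by a mask."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


a = [5, 1, 7, 2]
b = [3, 4, 6, 8]

# Regular vector work suits SIMD: one add instruction covers all four lanes.
print(simd_step(operator.add, a, b))  # [8, 5, 13, 10]

# Divergent work does not: "a + b if a > b else a - b" makes every lane
# compute BOTH branches, then a mask keeps one result and discards the other.
mask = simd_step(lambda x, y: int(x > y), a, b)  # [1, 0, 1, 0]
sums = simd_step(operator.add, a, b)
diffs = simd_step(operator.sub, a, b)
print(simd_select(mask, sums, diffs))  # [8, -3, 13, -6]
```

In the divergent case, half of the computed values are thrown away. This is the utilization loss that the disadvantages above describe.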


• MIMD

– Multiple instruction stream, multiple data stream

[Figure: multiple Processing/Control Units connected through an Interconnect]


• MIMD

– Advantages

• Can be built with off-the-shelf components

• Better suited to irregular data access patterns

– Disadvantages

• Requires more hardware (no shared control unit)

• Must store the program/OS at each processor

• Ex: Typical commodity SMP machines we see today.
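The following Python sketch models the MIMD structure: each unit runs its own program on its own data. Threads from `concurrent.futures` stand in for processing/control units. Under CPython's global interpreter lock, pure-Python threads interleave rather than run truly in parallel, so the example shows the organization, not the speedup. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


# Three different "instruction streams"; each unit runs its own on its own data.
def total(data: Sequence[int]) -> int:
    return sum(data)


def largest(data: Sequence[int]) -> int:
    return max(data)


def count_even(data: Sequence[int]) -> int:
    return sum(1 for x in data if x % 2 == 0)


Task = Tuple[Callable[[Sequence[int]], int], Sequence[int]]


def mimd_run(tasks: List[Task]) -> List[int]:
    """Run each (program, data) pair independently; results keep task order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(program, data) for program, data in tasks]
        return [f.result() for f in futures]


print(mimd_run([
    (total, [1, 2, 3]),
    (largest, [7, 4, 9]),
    (count_even, [2, 5, 8, 10]),
]))  # [6, 9, 3]
```

In contrast to the SIMD model, no lane is forced to follow another lane's instruction stream. Each task branches and loops independently.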


• Task Communication

– Shared address space

• Use common memory to exchange data

• Communication and replication are implicit

– Message passing

• Use send()/receive() primitives to exchange data

• Communication and replication are explicit
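The two communication styles can be contrasted in a small Python sketch. Threads in one process stand in for processing units, and `queue.Queue.put`/`get` play the roles of send()/receive(). Real message-passing systems such as MPI move data between separate address spaces. The function names are hypothetical.

```python
import queue
import threading
from typing import List


def shared_memory_sum(chunks: List[List[int]]) -> int:
    """Shared address space: workers communicate implicitly through `total`."""
    total = [0]              # one memory location every thread can see
    lock = threading.Lock()  # coordinates the read-modify-write on `total`

    def worker(chunk: List[int]) -> None:
        partial = sum(chunk)
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(chunks: List[List[int]]) -> int:
    """Message passing: workers keep results private and send them explicitly."""
    mailbox: "queue.Queue[int]" = queue.Queue()

    def worker(chunk: List[int]) -> None:
        mailbox.put(sum(chunk))  # explicit send()

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    result = sum(mailbox.get() for _ in chunks)  # explicit receive(), one per worker
    for t in threads:
        t.join()
    return result


chunks = [[1, 2, 3], [4, 5], [6]]
print(shared_memory_sum(chunks), message_passing_sum(chunks))  # 21 21
```

In the first version, the data exchange is invisible in the code, but a lock is needed to keep updates from colliding. In the second, every exchange is a visible put/get pair, and no shared variable is written.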


• Shared address space

– Uniform memory access (UMA)

• Access to a memory location is independent of which processing unit makes the request.

– Non-uniform memory access (NUMA)

• Access to a memory location depends on the location of the processing unit relative to the memory accessed.


• Message passing

– Each processing unit has its own private memory

– Exchange of messages used to pass data

– APIs

• Message Passing Interface (MPI)

• Parallel Virtual Machine (PVM)


• Algorithm

– A finite sequence of instructions, often used for calculation and data processing.

• Parallel Algorithm

– An algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.
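The definition above can be sketched as a parallel sum in Python: split the input into pieces, process the pieces concurrently, and combine the partial results at the end. The names `split` and `parallel_sum` are hypothetical. Threads are used for simplicity, and under CPython's global interpreter lock they will not speed up pure-Python arithmetic. `concurrent.futures.ProcessPoolExecutor` is the usual swap when real CPU parallelism is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], pieces: int) -> List[Sequence[int]]:
    """Divide data into `pieces` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data: Sequence[int], workers: int = 4) -> int:
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))  # each piece handled by a worker
    return sum(partials)                         # put back together at the end


print(parallel_sum(range(1, 101)))  # 5050
```

The combining step works here because addition is associative. Algorithms whose pieces cannot be recombined this simply are harder to parallelize.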


• Challenges

– Identifying work that can be done concurrently.

– Mapping work to processing units.

– Distributing the work

– Managing access to shared data

– Synchronizing various stages of execution.
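A small Python sketch touches each challenge in turn. The example normalizes several chunks by the global maximum:

1. The work is identified as a per-chunk maximum and mapped to one thread per chunk.
2. Shared data is managed by giving each worker its own slot.
3. The two stages are synchronized with `threading.Barrier`.

The function name `two_stage` is hypothetical, and the sketch assumes non-empty chunks with a positive overall maximum.

```python
import threading
from typing import List


def two_stage(data: List[List[float]]) -> List[float]:
    """Scale every value by the global maximum, one worker thread per chunk."""
    n = len(data)
    local_max = [0.0] * n  # shared data: each worker writes only its own slot
    out: List[List[float]] = [[] for _ in range(n)]
    barrier = threading.Barrier(n)

    def worker(i: int) -> None:
        local_max[i] = max(data[i])  # stage 1: local work on this worker's chunk
        barrier.wait()               # no worker proceeds until all maxima exist
        global_max = max(local_max)  # stage 2: safe to read every slot now
        out[i] = [x / global_max for x in data[i]]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for chunk in out for x in chunk]


print(two_stage([[1.0, 2.0], [4.0], [3.0, 8.0]]))  # [0.125, 0.25, 0.5, 0.375, 1.0]
```

Without the barrier, a fast worker could read `local_max` before a slower one had written its slot, and it would then scale by the wrong maximum.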
