15-447 Computer Architecture Fall 2008 © September 8, 2008 Majd F. Sakr [email protected] www.qatar.cmu.edu/~msakr/15447-f08/ CS-447– Computer Architecture M,W 2-3:50pm Lecture 7 Performance


Today

°Lecture & Discussion

°Next Lecture: Review

°Read the chapters & slides.

°Practice the performance examples in the Patterson book.

Done by now


Assessing & Understanding Performance

This chapter discusses how to measure, report, and summarize the performance of a computer.


Motivation

It is often helpful to have some yardstick by which to compare systems:

• During development to evaluate different algorithms or optimizations

• During purchasing to compare between product offerings

• …


Performance

° Measure, Report, and Summarize

° Make intelligent choices

° See through the marketing hype

° Key to understanding underlying organizational motivation


Performance

Why is some hardware better than others for different programs?

What factors of system performance are hardware-related? (e.g., Do we need a new machine or a new operating system?)

How does the machine's instruction set affect performance?


Which of these airplanes has the best performance?

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100      101          630          598
Boeing 747          470          4150         610
BAC/Sud Concorde    132          4000         1350
Douglas DC-8-50     146          8720         544

°How much faster is the Concorde compared to the 747?

°How much bigger is the 747 than the Douglas DC-8?
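A short Python sketch of the two comparisons, using the figures from the airplane table. Reading "bigger" as passenger capacity is an assumption, and the passenger-throughput column (passengers x mph) is just one illustrative way to combine capacity and speed; the helper names are hypothetical.

```python
# Figures from the airplane table: (passengers, range in miles, speed in mph)
AIRPLANES = {
    "Boeing 737-100": (101, 630, 598),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}

def ratio(a: str, b: str, column: int) -> float:
    """How many times larger airplane a's value is than airplane b's in one column."""
    return AIRPLANES[a][column] / AIRPLANES[b][column]

def passenger_throughput(name: str) -> int:
    """Passengers x mph: one way to fold capacity and speed into a single number."""
    passengers, _range_mi, speed_mph = AIRPLANES[name]
    return passengers * speed_mph

if __name__ == "__main__":
    print(f"Concorde vs 747 speed: {ratio('BAC/Sud Concorde', 'Boeing 747', 2):.2f}x")
    print(f"747 vs DC-8 passengers: {ratio('Boeing 747', 'Douglas DC-8-50', 0):.2f}x")
    for name in AIRPLANES:
        print(f"{name}: {passenger_throughput(name):,} passenger-mph")
```

By speed the Concorde is about 2.2 times faster than the 747, yet the 747 moves more passenger-miles per hour (286,700 vs 178,200), so the "best" airplane depends on which metric you pick.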


Computer Performance

°Response Time (latency)
• How long does it take for my job to run?
• How long does it take to execute a job?
• How long must I wait for the database query?

°Throughput
• How many jobs can the machine run at once?
• What is the average execution rate?
• How much work is getting done?


Execution Time

°Elapsed Time

•counts everything (disk and memory accesses, I/O, etc.)

•a useful number, but often not good for comparison purposes


Execution Time

°CPU time

•doesn't count I/O or time spent running other programs

•can be broken up into system time and user time

•Our focus: user CPU time

•time spent executing the lines of code that are "in" our program
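The difference between elapsed time and CPU time can be observed from Python's standard library: time.perf_counter() gives elapsed (wall-clock) time, time.process_time() gives this process's user + system CPU time, and os.times() splits CPU time into user and system parts. This is a minimal sketch; the functions busy_work, sleepy, and measure are hypothetical stand-ins for compute-bound work, waiting (like I/O), and a timing harness.

```python
import os
import time

def busy_work(n: int) -> int:
    # Pure computation: nearly all of its time is user CPU time
    return sum(i * i for i in range(n))

def sleepy(seconds: float) -> None:
    # Waiting (similar to blocking on I/O) adds elapsed time but almost no CPU time
    time.sleep(seconds)

def measure(fn, *args) -> dict:
    t0_wall = time.perf_counter()   # elapsed time
    t0_cpu = time.process_time()    # user + system CPU time of this process
    t0 = os.times()                 # user and system CPU time, separately
    fn(*args)
    t1 = os.times()
    return {
        "elapsed": time.perf_counter() - t0_wall,
        "cpu": time.process_time() - t0_cpu,
        "user": t1.user - t0.user,
        "system": t1.system - t0.system,
    }

if __name__ == "__main__":
    print("compute-bound:", measure(busy_work, 2_000_000))
    print("waiting:      ", measure(sleepy, 0.2))
```

For the sleeping call, elapsed time is at least 0.2 s while CPU time stays near zero, which is why elapsed time is often a poor basis for comparing processors.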


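Definition of Performance

°For some program running on machine X,

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

"X is n times faster than Y"

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$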


Definition of Performance

Problem:

• machine A runs a program in 20 seconds

• machine B runs the same program in 25 seconds
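The slide leaves the question implicit; assuming it asks how much faster A is than B, the definition of performance gives:

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{25\ \text{s}}{20\ \text{s}} = 1.25$$

so machine A is 1.25 times faster than machine B.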


Comparing and Summarizing Performance

How to compare the performance? Total Execution Time: A Consistent Summary Measure

                     Computer A   Computer B
Program 1 (sec)      1            10
Program 2 (sec)      1000         100
Total time (sec)     1001         110

$$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution time}_A}{\text{Execution time}_B} = \frac{1001}{110} = 9.1$$
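A minimal Python sketch of the total-execution-time summary, using the table's numbers. It also prints the per-program ratios to show why a single summary is needed: A is 10 times faster on Program 1, B is 10 times faster on Program 2, and only the totals give one consistent answer. The names TIMES and total_times are illustrative.

```python
# Execution times in seconds from the table: {program: (computer A, computer B)}
TIMES = {"Program 1": (1, 10), "Program 2": (1000, 100)}

def total_times(times):
    """Sum each computer's execution time over all programs."""
    total_a = sum(a for a, _ in times.values())
    total_b = sum(b for _, b in times.values())
    return total_a, total_b

total_a, total_b = total_times(TIMES)
# Performance_B / Performance_A = Execution time_A / Execution time_B
speedup_b_over_a = total_a / total_b

if __name__ == "__main__":
    for prog, (a, b) in TIMES.items():
        print(f"{prog}: time on A / time on B = {a / b:g}")
    print(f"Totals: A = {total_a} s, B = {total_b} s, B is {speedup_b_over_a:.1f}x faster")
```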


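Clock Cycles

° Instead of reporting execution time in seconds, we often use cycles

° Clock "ticks" indicate when to start activities (one abstraction):

[Figure: clock ticks along a time axis]

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$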


Clock cycles

° cycle time = time between ticks = seconds per cycle

° clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)

A 4 GHz clock has a 250 ps cycle time.
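The conversion between clock rate and cycle time is a reciprocal; a small Python sketch (function names are illustrative) checks the 4 GHz / 250 ps example in both directions:

```python
def cycle_time_ps(clock_rate_hz: float) -> float:
    """Cycle time in picoseconds: 1 / clock rate, with 1 s = 1e12 ps."""
    return 1e12 / clock_rate_hz

def clock_rate_ghz(cycle_time_in_ps: float) -> float:
    """Clock rate in GHz from a cycle time in picoseconds (1 GHz = 1e9 Hz)."""
    return 1e12 / cycle_time_in_ps / 1e9

print(cycle_time_ps(4e9))     # 250.0  (4 GHz clock)
print(clock_rate_ghz(250.0))  # 4.0    (250 ps cycle)
```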


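CPU Execution Time

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}}, \qquad \frac{\text{Seconds}}{\text{Cycle}} = \frac{1}{\text{clock rate}}$$

$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{clock rate}}$$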
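The CPU execution time formula as a one-line Python function, with illustrative (not measured) numbers. Holding everything else equal, halving the required cycles or doubling the clock rate each halves the execution time; in real designs, changing the clock often also changes the cycle count, so the two levers are not independent.

```python
def cpu_time_seconds(cpu_clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU execution time = CPU clock cycles x clock cycle time = cycles / clock rate."""
    return cpu_clock_cycles / clock_rate_hz

base = cpu_time_seconds(10e9, 4e9)          # illustrative: 10 billion cycles at 4 GHz
fewer_cycles = cpu_time_seconds(5e9, 4e9)   # halve the cycles the program needs
faster_clock = cpu_time_seconds(10e9, 8e9)  # double the clock rate
print(base, fewer_cycles, faster_clock)
```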


How to Improve Performance

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal), should you increase or decrease each of the following?

________ the # of required cycles for a program, or
________ the clock cycle time, or, said another way,
________ the clock rate.


How to Improve Performance

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal), should you increase or decrease each of the following?

_decrease_ the # of required cycles for a program, or
_decrease_ the clock cycle time, or, said another way,
_increase_ the clock rate.


How many cycles are required for a program?

°Could assume that # of cycles equals # of instructions

[Figure: the 1st, 2nd, 3rd, 4th, 5th, 6th ... instructions laid out along a time axis, one per clock cycle]

This assumption is incorrect: different instructions take different amounts of time on different machines.


Different numbers of cycles for different instructions

° Multiplication takes more time than addition

° Floating-point operations take longer than integer ones

° Accessing memory takes more time than accessing registers

° Important point: changing the cycle time often changes the number of cycles required for various instructions

[Figure: instructions taking different numbers of cycles along a time axis]


Now that we understand cycles

Components of Performance              Units of Measure
CPU execution time for a program       Seconds for the program
Instruction count                      Instructions executed for the program
Clock Cycles per Instruction (CPI)     Average number of clock cycles per instruction
Clock cycle time                       Seconds per clock cycle


CPI

CPU clock cycles = Instructions for a program x Average clock cycles per instruction (CPI)

CPU time = Instruction count x CPI x clock cycle time

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
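A Python sketch of CPU time = instruction count x CPI / clock rate. The two machines below are hypothetical, chosen to show that a slower clock can still win when its CPI is low enough for the same instruction count.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time = IC x CPI / clock rate."""
    cpu_clock_cycles = instruction_count * cpi
    return cpu_clock_cycles / clock_rate_hz

# Hypothetical machines running the same 1 billion instructions
machine_x = cpu_time(1e9, cpi=2.0, clock_rate_hz=2.0e9)  # faster clock, higher CPI
machine_y = cpu_time(1e9, cpi=1.2, clock_rate_hz=1.5e9)  # slower clock, lower CPI
print(f"X: {machine_x:.2f} s, Y: {machine_y:.2f} s")
```

Here X takes 1.00 s and Y about 0.80 s, so neither clock rate nor CPI alone predicts which machine is faster.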


Performance

°Performance is determined by execution time

°Do any of the other variables equal performance?

•# of cycles to execute program?

•# of instructions in program?

•# of cycles per second?

•average # of cycles per instruction?

•average # of instructions per second?

°Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.


CPU Clock Cycles

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

CPI_i: the average number of cycles per instruction for instruction class i

C_i: the count of the number of instructions of class i executed

n: the number of instruction classes


Example

° Instruction Classes:
• Add
• Multiply

° Average Clock Cycles per Instruction:
• Add: 1 cc
• Mul: 3 cc

° Program A executed:
• 10 Add instructions
• 5 Multiply instructions
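The slide does not spell out the question; assuming it asks for Program A's total clock cycles and average CPI, the weighted sum CPU clock cycles = sum of CPI_i x C_i can be sketched in Python (names are illustrative):

```python
def cpu_clock_cycles(cpi_by_class: dict, count_by_class: dict) -> int:
    """Sum over instruction classes i of CPI_i x C_i."""
    return sum(cpi_by_class[cls] * count for cls, count in count_by_class.items())

CPI = {"add": 1, "mul": 3}            # average clock cycles per instruction class
PROGRAM_A = {"add": 10, "mul": 5}     # instructions of each class executed

cycles = cpu_clock_cycles(CPI, PROGRAM_A)
instructions = sum(PROGRAM_A.values())
average_cpi = cycles / instructions
print(cycles)                 # 25
print(f"{average_cpi:.2f}")   # 1.67
```

The average CPI (25 cycles / 15 instructions) lies between the per-class values, weighted toward the class that is executed more often.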


Benchmarks

° Performance is best determined by running a real application

• Use programs typical of the expected workload

• Or, typical of the expected class of applications, e.g., compilers/editors, scientific applications, graphics

° Small benchmarks

• nice for architects and designers

• easy to standardize

• can be abused


Benchmarks (2)

° SPEC (Standard Performance Evaluation Corporation)

• companies have agreed on a set of real programs and inputs

• valuable indicator of performance (and compiler technology)

• can still be abused


Standard Performance Evaluation Corporation

• SPEC is supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems.

• The SPEC benchmark sets include CPU performance, graphics, high-performance computing, object-oriented computing, Java applications, client-server models, mail systems, file systems, and Web servers.


SPEC '89

°Compiler "enhancements" and performance

[Figure: SPEC performance ratio (0 to 800) for each benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing the standard compiler with an enhanced compiler]


SPEC CPU Benchmarks

CINT2000: the SPEC ratio for the integer benchmark sets

CFP2000: the SPEC ratio for the floating-point benchmark sets

$$\text{SPEC ratio} = \frac{\text{execution time of the Sun Ultra 5 [300 MHz]}}{\text{execution time measured on the computer}}$$
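A Python sketch of the SPEC ratio, where a higher ratio means a faster machine. The benchmark names and times below are made-up placeholders, not real Sun Ultra 5 reference times. Because SPEC summarizes a suite with the geometric mean of the per-benchmark ratios, the sketch includes that step too.

```python
import math

def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Reference-machine execution time divided by the measured execution time."""
    return reference_time_s / measured_time_s

def geometric_mean(values) -> float:
    """n-th root of the product of n values."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical benchmarks and times, for illustration only
REFERENCE_TIMES_S = {"bench_x": 1400.0, "bench_y": 1800.0}
MEASURED_TIMES_S = {"bench_x": 700.0, "bench_y": 225.0}

ratios = {b: spec_ratio(REFERENCE_TIMES_S[b], MEASURED_TIMES_S[b]) for b in REFERENCE_TIMES_S}
print(ratios)
print(geometric_mean(ratios.values()))
```

math.prod requires Python 3.8 or later.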


SPEC 2000

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

[Figure: CINT2000 and CFP2000 results (0 to 1400) versus clock rate in MHz (0 to 3500) for the Pentium III and Pentium 4]


SPEC 2000

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

[Figure: relative SPECINT2000 and SPECFP2000 performance (0.0 to 1.6) by benchmark and power mode (always on/maximum clock, laptop mode/adaptive clock, minimum power/minimum clock) for the Pentium M @ 1.6/0.6 GHz, Pentium 4-M @ 2.4/1.2 GHz, and Pentium III-M @ 1.2/0.8 GHz]


Amdahl's Law

Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)


Example

°Application execution time = 20 sec

• 12 seconds are spent performing add operations

° If we improve the add operation to run twice as fast, how much faster will the application run?
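Applying Amdahl's Law to this example in Python (the function name is illustrative): 8 seconds are unaffected and the 12 seconds of adds shrink to 6.

```python
def time_after_improvement(total_s: float, affected_s: float, improvement: float) -> float:
    """Amdahl's Law: unaffected time + affected time / amount of improvement."""
    unaffected_s = total_s - affected_s
    return unaffected_s + affected_s / improvement

new_time = time_after_improvement(20.0, 12.0, 2.0)  # 8 s + 12 s / 2
speedup = 20.0 / new_time
print(new_time)          # 14.0
print(f"{speedup:.2f}")  # 1.43
```

Doubling the speed of adds makes the application only about 1.43 times faster. Even infinitely fast adds would leave the 8 unaffected seconds, capping the overall speedup at 20 / 8 = 2.5.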


Amdahl's Law

° Example:

"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"


Amdahl's Law

$$\text{Execution time after improvement} = \frac{80\ \text{seconds}}{n} + (100 - 80\ \text{seconds})$$

$$\frac{100}{4} = 25 = \frac{80}{n} + 20 \quad\Rightarrow\quad n = 16$$
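Solving Amdahl's Law for the required improvement n can be sketched in Python (the function name is hypothetical). It reproduces n = 16 for the 4x target and rejects targets that could only be met if the multiply time dropped to zero, e.g. asking for 5x (20 seconds) when 20 seconds are unaffected.

```python
def required_improvement(total_s: float, affected_s: float, target_s: float) -> float:
    """Solve target = (total - affected) + affected / n for n (Amdahl's Law)."""
    unaffected_s = total_s - affected_s
    if target_s <= unaffected_s:
        # The affected part would need zero (or negative) time: no n achieves this.
        raise ValueError("target is unreachable by improving only the affected part")
    return affected_s / (target_s - unaffected_s)

print(required_improvement(100.0, 80.0, 100.0 / 4))  # 16.0
```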


MIPS (million instructions per second)

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$

Example

Code from     Instruction counts (in billions) for each instruction class
              A (1 CPI)   B (2 CPI)   C (3 CPI)
Compiler 1    5           1           1
Compiler 2    10          1           1

Clock rate = 4 GHz; A, B, C: instruction classes

• Which code sequence will execute faster according to MIPS?

• According to execution time?


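Execution time & MIPS

$$\text{CPU clock cycles}_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9$$

$$\text{CPU clock cycles}_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9$$

$$\text{Execution time}_1 = \frac{10 \times 10^9}{4 \times 10^9} = 2.5\ \text{seconds}$$

$$\text{Execution time}_2 = \frac{15 \times 10^9}{4 \times 10^9} = 3.75\ \text{seconds}$$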


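Execution time & MIPS (2)

$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{2.5 \times 10^6} = 2800$$

$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{3.75 \times 10^6} = 3200$$

So code from compiler 2 has the higher MIPS rating, yet code from compiler 1 executes faster (2.5 s vs 3.75 s).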
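A Python sketch that recomputes cycles, execution time, and MIPS for both compilers from the example's table (4 GHz clock; CPI of 1, 2, 3 for classes A, B, C). The function and constant names are illustrative.

```python
CLOCK_RATE_HZ = 4e9
CPI = {"A": 1, "B": 2, "C": 3}
# Instruction counts in billions, per instruction class
COUNTS_BILLIONS = {
    "Compiler 1": {"A": 5, "B": 1, "C": 1},
    "Compiler 2": {"A": 10, "B": 1, "C": 1},
}

def evaluate(counts_billions: dict):
    """Return (CPU clock cycles, execution time in s, MIPS) for one code sequence."""
    instructions = sum(counts_billions.values()) * 1e9
    cycles = sum(CPI[cls] * n for cls, n in counts_billions.items()) * 1e9
    time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / (time_s * 1e6)
    return cycles, time_s, mips

for name, counts in COUNTS_BILLIONS.items():
    cycles, time_s, mips = evaluate(counts)
    print(f"{name}: {cycles:.3g} cycles, {time_s} s, {mips:.0f} MIPS")
```

MIPS counts instructions without regard to how many cycles each one takes, so adding 5 billion cheap class-A instructions raises compiler 2's rating even though its code runs longer.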


Performance Evaluation

°Performance depends on

• Hardware architecture

• Software environment

°Meaning of performance depends on viewpoint

• User: time

• System Manager: throughput


Performance Evaluation

°Kinds of Performance

• Graphics

• Network

• Transactional

• Multi-user system

• I/O

• Scientific/Engineering codes


Example on the MIPS R10K

Prof run at: Tue Apr 28 15:50:26 1998
Command line: prof suboptim.ideal.m28293

109148754: Total number of cycles
0.55974s: Total execution time
77660914: Total number of instructions executed
1.405: Ratio of cycles / instruction
195: Clock rate in MHz
R10000: Target processor modelled

cycles(%)          cum %    secs   instrns    calls   procedure
61901843(56.71)    56.71    0.32   45113360   1       pdot
47212563(43.26)    99.97    0.24   32523280   1       init
31767( 0.03)       100.00   0.00   21523      1       vsum
1069( 0.00)        100.00   0.00   887        3       fflush
:                  :        :      :          :       :
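The summary figures in this profile are consistent with the earlier formulas: CPI = cycles / instructions and CPU time = cycles / clock rate. A Python sketch that recomputes them from the report's raw counts (constant names are illustrative):

```python
TOTAL_CYCLES = 109_148_754
TOTAL_INSTRUCTIONS = 77_660_914
CLOCK_RATE_HZ = 195e6  # 195 MHz

cpi = TOTAL_CYCLES / TOTAL_INSTRUCTIONS
exec_time_s = TOTAL_CYCLES / CLOCK_RATE_HZ

# Per-procedure (cycles, instructions) from the report
PROCEDURES = {
    "pdot": (61_901_843, 45_113_360),
    "init": (47_212_563, 32_523_280),
}

print(f"CPI = {cpi:.3f}")
print(f"time = {exec_time_s:.5f} s")
for name, (cycles, instrs) in PROCEDURES.items():
    share = 100 * cycles / TOTAL_CYCLES
    print(f"{name}: {share:.2f}% of cycles, CPI {cycles / instrs:.3f}")
```

The recomputed values match the report's 1.405 cycles per instruction and 0.55974 s, and the per-procedure shares match the 56.71% and 43.26% columns.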