15-447 Computer Architecture Fall 2007 © September 19, 2007 Karem Sakallah ksakalla@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/ CS-447 – Computer Architecture M,W 10-11:20am Lecture 7: Performance (Cont’d)

Dec 21, 2015

Page 1:

15-447 Computer Architecture Fall 2007 ©

September 19, 2007

Karem Sakallah, ksakalla@qatar.cmu.edu

www.qatar.cmu.edu/~msakr/15447-f07/

CS-447 – Computer Architecture

M,W 10-11:20am

Lecture 7: Performance (Cont’d)

Page 2:

Today

°Lecture & Discussion

°Next Lecture: Review

°Read the chapters & slides.

°Practice the performance examples in the Patterson book.

Done by now

Page 3:

Assessing & Understanding Performance

This chapter discusses how to measure, report, and summarize performance of a computer.

Page 4:

Motivation

It is often helpful to have some yardstick by which to compare systems

• During development to evaluate different algorithms or optimizations

• During purchasing to compare between product offerings

• …

Page 5:

Performance

° Measure, Report, and Summarize

° Make intelligent choices

° See through the marketing hype

° Key to understanding underlying organizational motivation

Page 6:

Performance

Why is some hardware better than others for different programs?

What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)

How does the machine's instruction set affect performance?

Page 7:

Which of these airplanes has the best performance?

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100             101          630           598
Boeing 747                 470         4150           610
BAC/Sud Concorde           132         4000          1350
Douglas DC-8-50            146         8720           544

°How much faster is the Concorde compared to the 747?

°How much bigger is the 747 than the Douglas DC-8?
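
Both questions reduce to simple ratios over the table. The following is a minimal Python sketch that uses only the table values; the dictionary layout, the `ratio` helper, and the passenger-throughput metric (passengers times speed) are illustrative assumptions, not something the slide defines.

```python
# Airplane data from the table: (passengers, range in miles, speed in mph).
planes = {
    "Boeing 737-100": (101, 630, 598),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}

def ratio(a, b, field):
    """Return how many times larger plane a's value is than plane b's."""
    return planes[a][field] / planes[b][field]

speed_ratio = ratio("BAC/Sud Concorde", "Boeing 747", 2)  # field 2: speed
size_ratio = ratio("Boeing 747", "Douglas DC-8-50", 0)    # field 0: passengers

# Assumed throughput metric: passengers carried times cruising speed.
throughput = {name: p * s for name, (p, _, s) in planes.items()}
best_throughput = max(throughput, key=throughput.get)

print(f"Concorde is {speed_ratio:.2f}x faster than the 747")
print(f"747 carries {size_ratio:.2f}x the passengers of the DC-8")
print("Highest passenger throughput:", best_throughput)
```

Which airplane is "best" depends on the metric chosen: the Concorde leads on speed, the DC-8 on range, and the 747 on capacity.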

Page 8:

Computer Performance

°Response Time (latency)

• How long does it take for my job to run?

• How long does it take to execute a job?

• How long must I wait for the database query?

°Throughput

• How many jobs can the machine run at once?

• What is the average execution rate?

• How much work is getting done?

Page 9:

Execution Time

°Elapsed Time

•counts everything (disk and memory accesses, I/O, etc.)

•a useful number, but often not good for comparison purposes

Page 10:

Execution Time

°CPU time

•doesn't count I/O or time spent running other programs

•can be broken up into system time and user time

•Our focus: user CPU time

•time spent executing the lines of code that are "in" our program

Page 11:

Definition of Performance

°For some program running on machine X,

$\text{Performance}_X = 1 / \text{Execution time}_X$

"X is n times faster than Y"

$\text{Performance}_X / \text{Performance}_Y = n$

Page 12:

Definition of Performance

Problem:

• machine A runs a program in 20 seconds

• machine B runs the same program in 25 seconds
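
The slide gives only the two run times. Assuming the intended question is how much faster machine A is than machine B, the definition Performance = 1 / Execution time gives the ratio directly. A small sketch; the function name `speedup` is illustrative:

```python
def speedup(time_x, time_y):
    """Return n such that machine X is n times faster than machine Y.

    Performance is 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    return time_y / time_x

n = speedup(time_x=20.0, time_y=25.0)  # machine A: 20 s, machine B: 25 s
print(f"A is {n:.2f} times faster than B")
```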

Page 13:

Comparing and Summarizing Performance

How to compare the performance? Total Execution Time: A Consistent Summary Measure

                     Computer A   Computer B
Program 1 (sec)               1           10
Program 2 (sec)            1000          100
Total time (sec)           1001          110

$\frac{\text{Performance}_B}{\text{Performance}_A} = \frac{\text{Execution Time}_A}{\text{Execution Time}_B} = \frac{1001}{110} = 9.1$

Page 14:

Clock Cycles

° Instead of reporting execution time in seconds, we often use cycles

$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$

°Clock “ticks” indicate when to start activities (one abstraction):

[Figure: clock ticks along a time axis]

Page 15:

Clock cycles

° cycle time = time between ticks = seconds per cycle

° clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)

A 4 GHz clock has a 250 ps cycle time
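
Cycle time and clock rate are reciprocals, so either can be derived from the other. A minimal sketch checking the 4 GHz figure; the helper names are illustrative:

```python
def cycle_time_s(clock_rate_hz):
    """Seconds per cycle for a clock running at clock_rate_hz cycles/second."""
    return 1.0 / clock_rate_hz

def clock_rate_from_cycle_time(cycle_time):
    """Cycles per second for a clock with the given cycle time in seconds."""
    return 1.0 / cycle_time

cycle_ps = cycle_time_s(4e9) * 1e12                    # convert s to ps
rate_ghz = clock_rate_from_cycle_time(250e-12) / 1e9   # convert Hz to GHz

print(f"4 GHz clock -> {cycle_ps:.0f} ps cycle time")
print(f"250 ps cycle time -> {rate_ghz:.1f} GHz clock")
```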

Page 16:

CPU Execution Time

$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{clock cycle time}$

$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}} = \frac{\text{Cycles}/\text{Program}}{\text{Cycles}/\text{Second}}$

where cycles/second is the clock rate and seconds/cycle is the clock cycle time.

Page 17:

How to Improve Performance

$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$

So, to improve performance (everything else being equal) you can either increase or decrease?

________ the # of required cycles for a program, or

________ the clock cycle time or, said another way,

________ the clock rate.

Page 18:

How to Improve Performance

$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$

So, to improve performance (everything else being equal) you can either increase or decrease?

_decrease_ the # of required cycles for a program, or

_decrease_ the clock cycle time or, said another way,

_increase_ the clock rate.

Page 19:

How many cycles are required for a program?

Could assume that # of cycles equals # of instructions

[Figure: time axis showing the 1st instruction, 2nd instruction, 3rd instruction, 4th, 5th, 6th, ...]

This assumption is incorrect: different instructions take different amounts of time on different machines.

Page 20:

Different numbers of cycles for different instructions

[Figure: instructions along a time axis]

° Multiplication takes more time than addition

° Floating point operations take longer than integer ones

° Accessing memory takes more time than accessing registers

° Important point: changing the cycle time often changes the number of cycles required for various instructions

Page 21:

Now that we understand cycles

Components of Performance             Units of Measure
CPU execution time for a program      Seconds for the program
Instruction count                     Instructions executed for the program
Clock Cycles per Instruction (CPI)    Average number of clock cycles per instruction
Clock cycle time                      Seconds per clock cycle

Page 22:

CPI

CPU clock cycles = Instructions for a program x Average clock cycles per Instruction (CPI)

CPU time = Instruction count x CPI x clock cycle time

$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$
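
The relation CPU time = Instruction count x CPI x clock cycle time can be evaluated directly once the clock cycle time is written as 1 / clock rate. The sketch below does that; the two machines and all of their numbers are hypothetical, chosen only to show that clock rate alone does not decide which machine is faster.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical machines running the same 1e9-instruction program.
t_p = cpu_time(1e9, cpi=2.0, clock_rate_hz=4e9)  # faster clock, higher CPI
t_q = cpu_time(1e9, cpi=1.2, clock_rate_hz=3e9)  # slower clock, lower CPI

print(f"Machine P: {t_p:.2f} s, Machine Q: {t_q:.2f} s")
```

With these assumed values, the machine with the slower clock finishes first because its lower CPI more than offsets the longer cycle time.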

Page 23:

Performance

°Performance is determined by execution time

°Do any of the other variables equal performance?

•# of cycles to execute program?

•# of instructions in program?

•# of cycles per second?

•average # of cycles per instruction?

•average # of instructions per second?

°Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

Page 24:

CPU Clock Cycles

$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$

$\text{CPI}_i$ : the average number of cycles per instruction for instruction class i.

$C_i$ : the count of the number of instructions of class i executed.

n : the number of instruction classes.

Page 25:

Example

° Instruction Classes:

• Add

• Multiply

°Average Clock Cycles per Instruction:

• Add 1cc

• Mul 3cc

°Program A executed:

• 10 Add instructions

• 5 Multiply instructions
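
The slide stops at the problem setup. Assuming the goal is the total clock cycles and the average CPI of Program A, the per-class sum (CPI of each class times its instruction count) can be sketched as follows; the list-of-pairs representation is an illustrative choice.

```python
def total_cycles(classes):
    """Sum CPI_i * C_i over (cpi, count) pairs, one pair per instruction class."""
    return sum(cpi * count for cpi, count in classes)

# (cycles per instruction, instructions executed) for Add and Multiply.
program_a = [(1, 10), (3, 5)]

cycles = total_cycles(program_a)
instructions = sum(count for _, count in program_a)
average_cpi = cycles / instructions

print(f"cycles = {cycles}, instructions = {instructions}, CPI = {average_cpi:.2f}")
```

Program A needs 10 x 1 + 5 x 3 = 25 cycles for 15 instructions, an average CPI of about 1.67.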

Page 26:

Quiz

An application using a desktop client and a remote server is limited by network performance. What happens to response time and throughput when:

°An extra network channel is added

°Networking software is upgraded to reduce communications delay

°More memory is added to the desktop computer

Page 27:

Formula Summary

° T: Execution Time (seconds)

° C: Total Number of Cycles

° f: Clock Frequency (cycles/second)

° I: (Dynamic) Instruction Count

• Ij: Count for Instructions of type j

• Cj: Cycles per Instruction of type j

T = C / f

C = I1 x C1 + … + Ik x Ck

I = I1 + I2 + … + Ik

CPI = C / I

T = (I x CPI) / f

Page 28:

Performance Calculation Example: fact(4)

    fact:
    a.  pushl %ebp            # Setup
    b.  movl  %esp,%ebp       # Setup
    c.  movl  $1,%eax         # eax = 1
    d.  movl  8(%ebp),%edx    # edx = x
    L11:
    e.  imull %edx,%eax       # result *= x
    f.  decl  %edx            # x--
    g.  cmpl  $1,%edx         # Compare x:1
    h.  jg    L11             # if > repeat
    i.  movl  %ebp,%esp       # Finish
    j.  popl  %ebp            # Finish
    k.  ret                   # Finish

f = 1 GHz

Inst Type   Inst    Cycles
1           imull   5
2           decl    2
3           Other   1

Calculate T, C, I, and CPI when fact is executed with input x = 4

Page 29:

Performance Calculation Example: fact(4)

    fact:
    a.  pushl %ebp
    b.  movl  %esp,%ebp
    c.  movl  $1,%eax
    d.  movl  8(%ebp),%edx
    L11:
    e.  imull %edx,%eax
    f.  decl  %edx
    g.  cmpl  $1,%edx
    h.  jg    L11
    i.  movl  %ebp,%esp
    j.  popl  %ebp
    k.  ret

Inst    Count   Cycles
a
b
c
d
e
f
g
h
i
j
k
Total

Inst Type   Inst    Cycles
1           imull   5
2           decl    2
3           Other   1
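
One way to fill in the count and cycle columns is to replay the listing and tally what executes. The Python sketch below does this for x = 4 under the cycle costs given in the table (imull 5, decl 2, everything else 1) and f = 1 GHz. The `trace_fact` helper is an illustrative model of the control flow, not a real x86 simulator.

```python
CYCLES = {"imull": 5, "decl": 2}   # every other instruction costs 1 cycle
CLOCK_HZ = 1e9                     # f = 1 GHz

def trace_fact(x):
    """Return the opcodes executed by fact(x), following the listing a-k."""
    trace = ["pushl", "movl", "movl", "movl"]       # a-d: setup, edx = x
    edx = x
    while True:                                     # loop body at L11
        trace += ["imull", "decl", "cmpl", "jg"]    # e-h
        edx -= 1                                    # effect of decl
        if not edx > 1:                             # jg not taken: leave loop
            break
    trace += ["movl", "popl", "ret"]                # i-k: finish
    return trace

trace = trace_fact(4)
I = len(trace)                                # dynamic instruction count
C = sum(CYCLES.get(op, 1) for op in trace)    # total cycles
CPI = C / I
T = C / CLOCK_HZ                              # seconds

print(f"I = {I}, C = {C}, CPI = {CPI:.2f}, T = {T * 1e9:.0f} ns")
```

For x = 4 the loop body runs three times, giving I = 19 instructions, C = 34 cycles, a CPI of about 1.79, and T = 34 ns.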

Page 30:

Benchmarks

° Performance best determined by running a real application

• Use programs typical of expected workload

• Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics

° Small benchmarks

• nice for architects and designers

• easy to standardize

• can be abused

Page 31:

Benchmarks (2)

° SPEC (Standard Performance Evaluation Corporation)

• companies have agreed on a set of real programs and inputs

• valuable indicator of performance (and compiler technology)

• can still be abused

Page 32:

Standard Performance Evaluation Corporation

• SPEC is supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems.

• The SPEC benchmark sets include CPU performance, graphics, high-performance computing, object-oriented computing, Java applications, client-server models, mail systems, file systems, and Web servers.

Page 33:

SPEC ‘89

°Compiler “enhancements” and performance

[Figure: bar chart of SPEC performance ratio (0 to 800) per benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing "Compiler" with "Enhanced compiler"]

Page 34:

SPEC CPU Benchmarks

CINT2000 : the SPEC ratio for the integer benchmark sets

CFP2000 : the SPEC ratio for the floating-point benchmark sets.

$\text{SPEC ratio} = \frac{\text{the execution time of Sun Ultra5 [300 MHz]}}{\text{the execution time on the measured computer}}$

Page 35:

SPEC 2000

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

[Figure: CINT2000 and CFP2000 ratings (y-axis, 0 to 1400) versus clock rate in MHz (x-axis, 500 to 3500) for the Pentium III and Pentium 4]

Page 36:

SPEC 2000

Does doubling the clock rate double the performance?

Can a machine with a slower clock rate have better performance?

[Figure: bar chart (y-axis, 0.0 to 1.6) of SPECINT2000 and SPECFP2000 by benchmark and power mode (always on/maximum clock, laptop mode/adaptive clock, minimum power/minimum clock) for the Pentium M @ 1.6/0.6 GHz, Pentium 4-M @ 2.4/1.2 GHz, and Pentium III-M @ 1.2/0.8 GHz]

Page 37:

Amdahl's Law

Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)

Page 38:

Example

°Application execution time = 20 sec

• 12 seconds are spent performing add operations

° If we improve the add operation to run twice as fast, how much faster will the application run?
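
This example can be worked with Amdahl's Law: only the time spent in add operations shrinks, and the rest is unaffected. A minimal sketch with an illustrative function name:

```python
def time_after_improvement(total, affected, improvement):
    """Amdahl's Law: unaffected time plus affected time / improvement factor."""
    unaffected = total - affected
    return unaffected + affected / improvement

# 20 s total, 12 s of it in add operations, add made twice as fast.
new_time = time_after_improvement(total=20.0, affected=12.0, improvement=2.0)
overall_speedup = 20.0 / new_time

print(f"New execution time: {new_time:.0f} s")
print(f"Overall speedup: {overall_speedup:.2f}x")
```

The application drops from 20 s to 8 + 12/2 = 14 s, so it runs about 1.43 times faster, well short of the 2x gain on the add operation itself.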

Page 39:

Amdahl’s Law

° Example:

"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"

Page 40:

Amdahl's Law

$\text{Execution time after improvement} = \frac{80 \text{ seconds}}{n} + (100 - 80) \text{ seconds}$

$\frac{100}{4} = 25 = \frac{80}{n} + 20, \quad n = 16$
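
Amdahl's Law can also be inverted to find the improvement factor needed to hit a target speedup. The sketch below solves target time = unaffected + affected / n for n. The function name and the error raised for unreachable targets are assumptions added for illustration.

```python
def required_improvement(total, affected, target_speedup):
    """Solve Amdahl's Law for the improvement factor n of the affected part."""
    unaffected = total - affected
    target_time = total / target_speedup
    if target_time <= unaffected:
        # Even an infinitely fast affected part leaves the unaffected time.
        raise ValueError("target speedup is unreachable")
    return affected / (target_time - unaffected)

# 100 s total, 80 s in multiply, program should run 4 times faster.
n = required_improvement(total=100.0, affected=80.0, target_speedup=4.0)
print(f"Multiply must become {n:.0f}x faster")
```

With 20 s of unaffected time the program can never run in 20 s or less, so any target speedup of 5 or more raises the error in this sketch.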

Page 41:

MIPS (million instructions per second)

$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$

Example

              Instruction counts (in billions) for each instruction class
Code from     A (1 CPI)   B (2 CPI)   C (3 CPI)
Compiler 1    5           1           1
Compiler 2    10          1           1

Clock rate = 4 GHz; A, B, C: instruction classes

• Which code sequence will execute faster according to MIPS?

• According to execution time?

Page 42:

Execution time & MIPS

$\text{CPU clock cycles}_1 = (5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 10 \times 10^9$

$\text{CPU clock cycles}_2 = (10 \times 1 + 1 \times 2 + 1 \times 3) \times 10^9 = 15 \times 10^9$

$\text{Execution time}_1 = \frac{10 \times 10^9}{4 \times 10^9} = 2.5 \text{ seconds}$

$\text{Execution time}_2 = \frac{15 \times 10^9}{4 \times 10^9} = 3.75 \text{ seconds}$

Page 43:

Execution time & MIPS (2)

$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{2.5 \text{ seconds} \times 10^6} = 2800$

$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{3.75 \text{ seconds} \times 10^6} = 3200$
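
The two-compiler example can be checked numerically. This sketch recomputes cycles, execution time, and MIPS from the instruction counts per class, the per-class CPI values, and the 4 GHz clock rate given in the example; the dictionary layout and the `summarize` helper are illustrative.

```python
CPI = {"A": 1, "B": 2, "C": 3}   # cycles per instruction for each class
CLOCK_HZ = 4e9                   # clock rate = 4 GHz

# Instruction counts per class for the code from each compiler.
counts = {
    "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}

def summarize(mix):
    """Return (execution time in seconds, MIPS) for one instruction mix."""
    instructions = sum(mix.values())
    cycles = sum(CPI[cls] * n for cls, n in mix.items())
    seconds = cycles / CLOCK_HZ
    mips = instructions / (seconds * 1e6)
    return seconds, mips

results = {name: summarize(mix) for name, mix in counts.items()}
for name, (seconds, mips) in results.items():
    print(f"{name}: {seconds:.2f} s, {mips:.0f} MIPS")
```

Compiler 2's code has the higher MIPS rating (3200 versus 2800) yet takes longer to run (3.75 s versus 2.5 s), so MIPS and execution time rank the two code sequences in opposite order.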

Page 44:

Performance Evaluation

°Performance depends on

• Hardware architecture

• Software environment

°Meaning of performance depends on viewpoint

• User: time

• System Manager: throughput

Page 45:

Performance Evaluation

°Kinds of Performance

• Graphics

• Network

• Transactional

• Multi-user system

• I/O

• Scientific/Engineering codes

Page 46:

Example on the MIPS R10K

    Prof run at: Tue Apr 28 15:50:26 1998
    Command line: prof suboptim.ideal.m28293

    109148754: Total number of cycles
     0.55974s: Total execution time
     77660914: Total number of instructions executed
        1.405: Ratio of cycles / instruction
          195: Clock rate in MHz
       R10000: Target processor modelled

    cycles(%)         cum %   secs   instrns    calls  procedure
    61901843(56.71)   56.71   0.32   45113360       1  pdot
    47212563(43.26)   99.97   0.24   32523280       1  init
       31767( 0.03)  100.00   0.00      21523       1  vsum
        1069( 0.00)  100.00   0.00        887       3  fflush
            :            :      :          :        :  :
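
The summary figures in this profile are tied together by the formulas from earlier in the lecture: T = C / f and CPI = C / I. A short sketch that recomputes them from the reported totals; the variable names are illustrative.

```python
total_cycles = 109_148_754        # Total number of cycles
total_instructions = 77_660_914   # Total number of instructions executed
clock_rate_hz = 195e6             # Clock rate: 195 MHz
pdot_cycles = 61_901_843          # cycles attributed to procedure pdot

exec_time = total_cycles / clock_rate_hz      # T = C / f
cpi = total_cycles / total_instructions       # CPI = C / I
pdot_share = 100.0 * pdot_cycles / total_cycles

print(f"Execution time: {exec_time:.5f} s")   # profile reports 0.55974s
print(f"CPI: {cpi:.3f}")                      # profile reports 1.405
print(f"pdot share of cycles: {pdot_share:.2f}%")  # profile reports 56.71
```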