Page 1: Computer System Performance Evaluation: Introduction

Computer System Performance Evaluation: Introduction

Eileen Kraemer

August 25, 2004

Page 2: Computer System Performance Evaluation: Introduction

Evaluation Metrics

What are the measures of interest?
- Time to complete task
  - Per workload type (RT / TP / IC / batch)
- Ability to deal with failures
  - Catastrophic / benign
- Effective use of system resources

Page 3: Computer System Performance Evaluation: Introduction

Performance Measures

- Responsiveness
- Usage level
- Missionability
- Dependability
- Productivity

Page 4: Computer System Performance Evaluation: Introduction

Classification of Computer Systems
- General purpose
- High availability
- Real-time control
- Mission-oriented
- Long-life

Page 5: Computer System Performance Evaluation: Introduction

Techniques in Performance Evaluation
- Measurement
- Simulation Modeling
- Analytic Modeling
- Hybrid Modeling

Page 6: Computer System Performance Evaluation: Introduction

Applications of Performance Evaluation
- System Design
- System Selection
- System Upgrade
- System Tuning
- System Analysis

Page 7: Computer System Performance Evaluation: Introduction

Workload Characterization

Inputs to evaluation:
- Under admin control: scheduling discipline, device connections, resource allocation policies, ...
- Environmental inputs: inter-event times, service demands, failures (= the workload)
  - Drives the real system (measurement)
  - Input to simulation
  - Basis of distributions for analytic modeling

Page 8: Computer System Performance Evaluation: Introduction

Workload characterization

How much detail? How to represent it?
- Analytical modeling: statistical properties
- Simulation: an event trace, either recorded or generated according to some statistical properties

Page 9: Computer System Performance Evaluation: Introduction

Benchmarking

Benchmarks are sets of well-known programs

Vendors run these programs and report results (some problems with this process)

Page 10: Computer System Performance Evaluation: Introduction

Metrics used (in the absence of benchmarks): processing rate
- MIPS (million instructions per second)
- MFLOPS (million floating-point operations per second)

Not particularly useful:
- Different instructions can take different amounts of time
- Instructions and the complexity of instructions differ from machine to machine, as will the number of instructions required to execute a particular program

Page 11: Computer System Performance Evaluation: Introduction

Benchmarks:

- Provide an opportunity to compare running times of programs written in a HLL
- Characterize an application domain
- Consist of a set of "typical" programs
- Some are application benchmarks (real programs), others are synthetic benchmarks

Page 12: Computer System Performance Evaluation: Introduction

Synthetic benchmarks

Programs designed to mimic real programs by matching their statistical properties:
- Fraction of statements of each type (=, if, for)
- Fraction of variables of each type (int vs. real vs. char; local vs. global)
- Fraction of expressions with a certain number and type of operators and operands
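As a rough sketch (not taken from the slides), a synthetic benchmark is just an ordinary program whose statement mix is chosen deliberately; in the C fragment below the proportions of loop, conditional, integer, and floating-point statements are arbitrary placeholders rather than measured fractions:

/* Minimal sketch of a synthetic benchmark kernel (illustrative only).
   A real synthetic benchmark would set the mix of statement and
   operand types below to match statistics gathered from real programs. */
#include <stdio.h>

int main(void)
{
    int    i, a = 1, b = 2, c = 0;    /* integer variables        */
    double x = 1.5, y = 2.5, z = 0.0; /* floating-point variables */

    for (i = 0; i < 1000000; i++) {   /* loop statement            */
        c = a + b * i;                /* integer expression        */
        if (c % 3 == 0)               /* conditional               */
            z = x * y + z;            /* floating-point expression */
        else
            z = z - x / y;
    }
    /* Print the results so a "smart" compiler cannot discard the
       work as dead code (a pitfall noted on a later slide). */
    printf("%d %f\n", c, z);
    return 0;
}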

Page 13: Computer System Performance Evaluation: Introduction

Synthetic Benchmarks

Pro:
- Can model a domain of application programs in a single program

Page 14: Computer System Performance Evaluation: Introduction

Synthetic Benchmarks

Con:
- If expressions for conditionals are chosen randomly, then code sections may be unreachable and eliminated by a "smart" compiler
- The locality of reference seen in normal programs may be violated, so resource allocation algorithms that rely on locality of reference are affected
- May be small enough to fit in cache, giving unusually good performance that is not representative of the domain the benchmark is designed to represent

Page 15: Computer System Performance Evaluation: Introduction

Well-known benchmarks for measuring CPU performance:
- Whetstone – "old"
- Dhrystone – improved on Whetstone
- Linpack
- Newer: Spice, gcc, li, nasa7, livermore
- See: http://www.netlib.org/benchmark/
- Java benchmarks: see http://www-2.cs.cmu.edu/~jch/java/resources.html

Page 16: Computer System Performance Evaluation: Introduction

Whetstone (1972)

- Synthetic, Fortran; heavy on f.p. ops
- Outdated, arbitrary instruction mixes
- Not useful with optimizing or parallelizing compilers
- Results in mega-Whetstones/sec

Page 17: Computer System Performance Evaluation: Introduction

Dhrystone (1984)

- Synthetic, C (originally Ada)
- Models programs with mostly integer arithmetic and string manipulation
- Only 100 HLL statements – fits in cache
- Calls only strcpy() and strcmp() – if the compiler inlines these, then it is not representative of real programs
- Results stated in "Dhrystones / second"
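To make the strcpy()/strcmp() point concrete, here is a hedged sketch of the kind of integer and string work Dhrystone models; this is not the actual Dhrystone source:

/* Illustrative only: a loop of string copies, string compares, and
   integer arithmetic of the kind Dhrystone exercises. If the compiler
   inlines strcpy()/strcmp(), the measured behaviour changes. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char src[] = "DHRYSTONE PROGRAM, SOME STRING";
    char dst[64];
    int  i, sum = 0;

    for (i = 0; i < 1000000; i++) {
        strcpy(dst, src);             /* string copy    */
        if (strcmp(dst, src) == 0)    /* string compare */
            sum += i % 7;             /* integer arith  */
    }
    printf("%d\n", sum);              /* keep the work observable */
    return 0;
}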

Page 18: Computer System Performance Evaluation: Introduction

Linpack

- Solves a dense 100 x 100 linear system of equations using the Linpack library package
- The statement A(x) = B(x) + C*D(x) accounts for roughly 80% of the time
- Still too small to really test out the hardware
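A sketch of that dominant loop as a C function (the name and the double-precision arrays are assumptions for illustration; Linpack itself is a Fortran package):

/* Kernel in the A(x) = B(x) + C*D(x) form noted above; in Linpack a
   single loop of this shape dominates the running time. */
void linpack_like_kernel(int n, double c, const double *b, const double *d, double *a)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c * d[i];
}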

Page 19: Computer System Performance Evaluation: Introduction

“Newer”

- Spice: mostly Fortran, integer and f.p. arithmetic, analog circuit simulation
- gcc: GNU C compiler
- li: Lisp interpreter, written in C
- nasa7: Fortran, 7 kernels using double-precision arithmetic

Page 20: Computer System Performance Evaluation: Introduction

How to compare machines?

[Chart comparing machines A–E]

Page 21: Computer System Performance Evaluation: Introduction

How to compare machines?

[Chart comparing machines A–E against the VAX 11/780, a typical 1 MIPS machine]

Page 22: Computer System Performance Evaluation: Introduction

To calculate MIPS rating

- Choose a benchmark
- MIPS = time on VAX / time on X
- So, if the benchmark takes 100 sec on the VAX and 4 sec on X, then X is a 25 MIPS machine
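The same ratio written out as a small helper (the function name is an illustrative choice, not part of any benchmark suite):

/* Relative MIPS rating: the VAX 11/780 is taken as the 1 MIPS
   reference, so MIPS(X) = time on VAX / time on X for one benchmark.
   With 100 s on the VAX and 4 s on X, this returns 25. */
double relative_mips(double time_on_vax_sec, double time_on_x_sec)
{
    return time_on_vax_sec / time_on_x_sec;
}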

Page 23: Computer System Performance Evaluation: Introduction

Cautions in calculating MIPS

- Benchmarks for all machines should be compiled by similar compilers with similar settings
- Need to control and explicitly state the configuration (cache size, buffer sizes, etc.)

Page 24: Computer System Performance Evaluation: Introduction

Features of interest for evaluation:
- Integer arithmetic
- Floating-point arithmetic
- Cache management
- Paging
- I/O

Could test one at a time ... or, using a synthetic program, exercise all at once

Page 25: Computer System Performance Evaluation: Introduction

Synthetic programs ..

Evaluate multiple features simultaneously, parameterized for the characteristics of the workload

Pro:
- Beyond CPU performance, can also measure system throughput and investigate alternative strategies

Con:
- Complex, OS-dependent
- Difficult to choose parameters that accurately reflect the real workload
- Generates lots of raw data

Page 26: Computer System Performance Evaluation: Introduction

“Script” approach

Have real users work on the machine of interest, recording all actions of the users in a real computing environment

Pro:
- Can compare the system under control and test conditions (disk 1 vs. disk 2, buffer size 1 vs. buffer size 2, etc.) under real workload conditions

Con:
- Too many dependencies; may not work on other installations, even of the same machine
- The system needs to be up and running already
- Bulky

Page 27: Computer System Performance Evaluation: Introduction

SPEC = System Performance Evaluation Cooperative (Corporation)

Mission: to establish, maintain, and endorse a standardized set of relevant benchmarks for performance evaluation of modern computer systems
- SPEC CPU – both integer and floating-point versions
- Also benchmarks for JVMs, web, graphics, and other special-purpose areas
- See: http://www.specbench.org

Page 28: Computer System Performance Evaluation: Introduction

Methodology:

10 benchmarks:
- Integer: gcc, espresso, li, eqntott
- Floating point: spice, doduc, nasa7, matrix, fpppp, tomcatv

Page 29: Computer System Performance Evaluation: Introduction

Metrics:

SPECint:
- Geometric mean of t(gcc), t(espresso), t(li), t(eqntott)

SPECfp:
- Geometric mean of t(spice), t(doduc), t(nasa7), t(matrix), t(fpppp), t(tomcatv)

SPECmark:
- Geometric mean of SPECint and SPECfp

Page 30: Computer System Performance Evaluation: Introduction

Metrics, cont’d

- SPEC thruput: a measure of CPU performance under moderate CPU contention
- Multiprocessor with n processors: two copies of the SPEC benchmark are run concurrently on each CPU and the elapsed time is noted
- SPECthruput = time on machine X / time on VAX 11/780

Page 31: Computer System Performance Evaluation: Introduction

Geometric mean ???

Arithmetic mean(x1, x2, ..., xn) = (x1 + x2 + ... + xn) / n
- AM(10, 50, 90) = (10 + 50 + 90) / 3 = 50

Geometric mean(x1, x2, ..., xn) = nth root of (x1 * x2 * ... * xn)
- GM(10, 50, 90) = (10 * 50 * 90)^(1/3) ≈ 35.6

Harmonic mean(x1, x2, ..., xn) = n / (1/x1 + 1/x2 + ... + 1/xn)
- HM(10, 50, 90) = 3 / (1/10 + 1/50 + 1/90) ≈ 22.88
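A small C sketch of the three means applied to the values above (function names are illustrative):

/* Arithmetic, geometric, and harmonic means of n positive values. */
#include <math.h>
#include <stdio.h>

double arithmetic_mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s / n;
}

double geometric_mean(const double *x, int n)
{
    double s = 0.0;                    /* sum of logs avoids overflow */
    for (int i = 0; i < n; i++) s += log(x[i]);
    return exp(s / n);
}

double harmonic_mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += 1.0 / x[i];
    return n / s;
}

int main(void)
{
    double v[] = {10, 50, 90};
    printf("AM=%.2f GM=%.2f HM=%.2f\n",
           arithmetic_mean(v, 3), geometric_mean(v, 3), harmonic_mean(v, 3));
    /* Prints approximately AM=50.00 GM=35.57 HM=22.88 */
    return 0;
}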

Page 32: Computer System Performance Evaluation: Introduction

Why geometric mean? Why not AM?

The arithmetic mean doesn't preserve running-time ratios (nor does the harmonic mean); the geometric mean does.

Example:
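For instance, suppose program P1 takes 10 s on machine A and 20 s on machine B, while P2 takes 40 s on A and 20 s on B. Normalized to A, B's ratios are 2 and 0.5; normalized to B, A's ratios are 0.5 and 2. The arithmetic mean of the ratios is 1.25 in both directions, which would label each machine 25% slower than the other. The geometric mean of the ratios is sqrt(2 * 0.5) = 1 either way, so the conclusion does not depend on which machine is taken as the reference.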

Page 33: Computer System Performance Evaluation: Introduction

Highly Parallel Architectures

For parallel machines/programs, performance depends on:
- Inherent parallelism of the application
- Ability of the machine to exploit parallelism

Less than full parallelism may result in performance << peak rate

Page 34: Computer System Performance Evaluation: Introduction

Amdahl’s Law

f = fraction of a program that is parallelizable
1 - f = fraction of a program that is purely sequential
S(n) = effective speed with n processors

S(n) = S(1) / ((1 - f) + f/n)

As n -> infinity, S(n) -> S(1) / (1 - f)
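A one-line C sketch of the formula (the function name is illustrative); it reproduces the limits worked out on the next slide:

/* Amdahl's Law: speedup relative to S(1) when a fraction f of the work
   is parallelizable and (1 - f) is purely sequential. As n grows, the
   value approaches 1 / (1 - f). */
double amdahl_speedup(double f, int n)
{
    return 1.0 / ((1.0 - f) + f / n);
}

/* amdahl_speedup(0.5, 1000000) is close to 2;
   amdahl_speedup(0.8, 1000000) is close to 5. */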

Page 35: Computer System Performance Evaluation: Introduction

Example

S(n) = S(1) / ((1 - f) + f/n); as n -> infinity, S(n) -> S(1) / (1 - f)
- Let f = 0.5: with infinite n, max S(inf) = 2
- Let f = 0.8: with infinite n, max S(inf) = 5

MIPS/MFLOPS are not particularly useful for a parallel machine

Page 36: Computer System Performance Evaluation: Introduction

Are synthetic benchmarks useful for evaluating parallel machines?

This will depend on the inherent parallelism:
- Data parallelism
- Code parallelism

Page 37: Computer System Performance Evaluation: Introduction

Data parallelism

Multiple data items operated on in parallel by the same operation
- SIMD machines
- Works well with vectors, matrices, lists, sets

Metrics:
- Average # of data items operated on per op (depends on problem size)
- (# of data items operated on / # of data items) per op (depends on the type of problem)

Page 38: Computer System Performance Evaluation: Introduction

Code parallelism

How finely can the problem be divided into parallel sub-units?

Metric: average parallelism = sum over n = 1 .. infinity of n * f(n), where f(n) = the fraction of code that can be split into at most n parallel activities
- ... not that easy to estimate
- ... not all that informative when you do
- ... dependencies may exist between parallel tasks, or between parallel and non-parallel sections of code
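A small sketch of the average-parallelism sum; the f(n) values below are invented for illustration and must add up to 1:

/* Average parallelism = sum over n of n * f(n), where f(n) is the
   fraction of the code that can be split into at most n parallel
   activities. The fractions here are illustrative only. */
#include <stdio.h>

int main(void)
{
    double f[] = {0.4, 0.3, 0.2, 0.1};   /* f(1) .. f(4), summing to 1.0 */
    double avg = 0.0;

    for (int n = 1; n <= 4; n++)
        avg += n * f[n - 1];
    printf("average parallelism = %.1f\n", avg);   /* 2.0 here */
    return 0;
}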

Page 39: Computer System Performance Evaluation: Introduction

Evaluating the performance of parallel machines is more difficult than doing so for sequential machines

Problem:
- A well-designed parallel algorithm depends on the number of processors, the interconnection pattern (bus, crossbar, mesh), the interaction mechanism (shared memory, message passing), and the vector register size

Solution:
- Pick the optimal algorithm for each machine
- Problem: that's hard to do! ... and may also depend on the actual number of processors, etc.

Page 40: Computer System Performance Evaluation: Introduction

Other complications

- Language limitations, dependencies
- Compiler dependencies
- OS characteristics:
  - Timing (communication vs. computation)
  - Process management (light vs. heavy)

Page 41: Computer System Performance Evaluation: Introduction

More complications

- A small benchmark may reside in cache (Dhrystone)
- A large memory may eliminate paging for medium-sized programs, hiding the effects of a poor paging scheme
- A benchmark may not have enough I/O
- A benchmark may contain dead code or easily optimizable code

Page 42: Computer System Performance Evaluation: Introduction

Metrics

- Speedup S(p) = running time of the best possible sequential algorithm / running time of the parallel implementation using p processors
- Efficiency = S(p) / p
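Both metrics written out as small helpers (names are illustrative), with a hypothetical numeric example:

/* Speedup S(p) = T(best sequential) / T(parallel on p processors);
   efficiency = S(p) / p. */
double speedup(double t_seq_best, double t_par)
{
    return t_seq_best / t_par;
}

double efficiency(double t_seq_best, double t_par, int p)
{
    return speedup(t_seq_best, t_par) / p;
}

/* Example (hypothetical): best sequential time 100 s, 16-processor
   time 10 s  ->  S(16) = 10, efficiency = 10/16 = 0.625. */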