Top Banner
Performance Eval uation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia Tech Performance Evaluation
133

Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Dec 28, 2015

Download

Documents

Elmer Hodges
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

1

CS5014

Research Methods in CS

Dr. Ayman Abdel-Hamid

Computer Science Department

Virginia Tech

Performance Evaluation

Page 2: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

2

OutlinePerformance Evaluation

•Introduction

•Common Mistakes

Some of the material is based on Dr. Cliff Shaffer’s Notes for CS5014. Department of Computer Science, Virginia Tech

Page 3: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

3

Examples 1/2

• Evaluate design alternatives

• Compare two or more computers, programs, algorithms

Speed, memory, usability

• Determine optimum value of a parameter (tuning, optimization)

• Locate bottlenecks

• Characterize load

• Prediction of performance on future loads

• Determine number and size of components required

Page 4: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

4

Examples 2/2

•Which is the best sorting algorithm?

•What factors affect data structure visualizations?

•Code-tune a program

•Which interface design is better?

•What are the best parameter values for a biological model?

Page 5: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

5

The Art of Performance Evaluation

•Throughput in Transactions per Second

System Workload1 Workload2

A 20 10

B 10 20

How Does system A compare to system B?

Page 6: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

6

Evaluation Issues•System

• Hardware, software, network: Clear bounding for the “system" under study

•Techniques

• Measurement, simulation, analytical modeling

•Metrics

• Response time, transactions per second

•Workload: The requests a user gives to the system

•Statistical techniques

•Experimental design

• Maximize information, minimize number of experiments

Page 7: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

7

Common Mistakes 1/5

•No goals

Each model is special purpose

Performance problems are vague when first presented

• Biased goals OUR system is better than THEIRS

• Unsystematic approach

• Analysis without understanding

People want guidance, not models

• Incorrect performance metrics

Want correct metrics, not easy ones

• Unrepresentative workload

Page 8: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

8

Common Mistakes 2/5

•Wrong evaluation technique

Easy to become married to one approach

• Overlooking important parameters

• Ignoring significant factors

Parameters that are varied in the study are called factors.

There's no use comparing what can't be changed

• Inappropriate experimental design

• Inappropriate level of detail

Page 9: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

9

Common Mistakes 3/5

•No analysis

• Erroneous analysis

• No sensitivity analysis

Measure the effect on changing a parameter

• Ignoring errors in input

• Improper treatment of outliers

Outliers are values that are too high or too low compared to a majority of values

Some should be ignored (can't happen)

Some should be retained (key special cases)

Page 10: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

10

Common Mistakes 4/5

•Assuming no change in the future

• Ignoring variability

Mean is of low significance in the face of high variability

• Too complex analysis

Complex models are \interesting" and so get published and studied

Real world use is simpler

Decision makers prefer simpler models

Page 11: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

11

Common Mistakes 5/5

•Improper presentation of results

•The proper metric to measure the performance of an analyst is the number of analyses that helped decision makers (not the number of analyses performed)

• Ignoring social aspects

• Omitting assumptions and limitations

Page 12: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

12

The Error of one-sided Hypothesis

•Consider the hypothesis “X performs better than Y".

•The danger is the following chain of reasoning:

Could this hypothesis be true?

I have evidence that the hypothesis might be true.

Therefore it is true.

What got ignored is any evidence that the hypothesis might not be (or is not) true.

Page 13: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

13

A Systematic Approach

1. State goals and define the system boundaries

2. List services and outcomes

3. Select metrics

4. List parameters (System and Workload)

5. Select factors to study

6. Select evaluation technique

7. Select workload

8. Design experiments

9. Analyze and interpret data

10. Present results. Start over, if necessary!

Page 14: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

14

Technique Selection•Choices: Analytical Modeling, Simulation, and Measurement

“Until validated, all evaluation results are suspect."

•Validate one of these approaches by comparing against another.Measurement results are just as susceptible to experimental errors and biases as the other two techniques.

•Criteria for technique selection Stage of analysis Time required Tools Accuracy Trade-off evaluation Cost Saleability

Page 15: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

15

Common Performance Metrics 1/3

•Response Time

Interval between a user’s request and the system response

Simplistic definition assuming request and responses are instantaneous

Definition 1 Time between the user finishes the request and the system starts response

Definition 2 Time between the user finishes the request and the system completes response

Page 16: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

16

Common Performance Metrics 2/3

•Throughput: The rate (requests per unit of time) at which requests can be serviced by the system.

Throughput generally increases as the load initially increases.

Eventually it stops increasing, and might then decrease.

Nominal capacity is maximum achievable throughput under ideal workload conditions (bandwidth for computer networks)

Usable capacity is maximum throughput achievable without violating a limit on response time.

Efficiency is ratio of usable to nominal capacity.

Page 17: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

17

Common Performance Metrics 3/3

•UtilizationFraction of time the resource is busy servicing requestsRatio of busy time to total elapsed time over a given period

•Bottleneck: Component with highest utilizationImproving this component often gives highest payoff

•ReliabilityProbability of errorsMean time between errors

•AvailabilityUptime and downtimeMean uptime (Mean time to Failure)

•Cost/performance ratio

Page 18: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

18

Workloads•A workload is the requests made by users of the system under study.

A test workload is any workload used in performance studies A real workload is one observed on a real system. It cannot be repeated. A synthetic workload is a reproduction of a real workload to be applied to the tested system

•Examples (for CPU performance) Addition instruction Instruction mixes Kernels Synthetic programs Application benchmarks

Page 19: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

19

Selecting Workloads

1. Determine the services for the SUT (System Under Test)

• View the system as a service provider

• Component Under Study (CUS)?

2. Select the desired level of detail

3. Confirm that the workload is representative

4. Is the workload still valid?

• A real-world workload is not repeatable

• Most workloads are models of real service requests

Page 20: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

20

Selecting Workloads-Level of Detail

• Select the desired level of detail

Most frequent request

Frequency of request types

Time-stamped sequence of requests (trace)

Average resource demand

Might be necessary to specify complete probability distribution

Distribution of resource demands

Page 21: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

21

Selecting Workloads-Representativeness

• A test workload representative of real application

• Test workload and real application should match in the following

Arrival rate

Resource demands

Resource usage profile

Page 22: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

22

Selecting Workloads-Other Factors

• Loading Level

A workload might exercise a system to its

full capacity (best case)

Beyond its capacity (worst case)

Load level observed in real workload (typical case)

• Impact of external components

• Repeatability

Page 23: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

23

Some Workload Characterization Techniques 1/3

• Averaging

• Uses arithmetic mean

• Alternatives?

• Specifying Dispersion

• Variance

• Markov Models

• Clustering

Page 24: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

24

Some Workload Characterization Techniques 2/3

Markov Models Assume the next request depends only on the last

request Next system state depends only on current system state Transition matrices

Ex: typical distribution for some system is about 4 small packets followed by a single large packet

Random chance: probability of small is always .8, large is always .2.

Markov model: Small follows small .75, large follows small .25. In contrast, small always follows large.

Page 25: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

25

Some Workload Characterization Techniques 3/3

• Clustering

To separate a population into groups with similar characteristics.

minimize the within-group variance while maximizing the between-group variance

Select representatives to simplify further processing

Page 26: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

26

Statistics: Basic Concepts 1/2

• Independent Events: Knowing that one event has occurred does not in any way change our estimate of the probability of the other event.

• Random Variable: takes one of a specified set of values with a specified probability.

Page 27: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

27

Statistics: Basic Concepts 2/2

For independent variables, covariance is zero. Why?

Page 28: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

28

Mean, Median, and Mode

(highest probability xi)

Probability Density Function (pdf) is the derivative of the CDF. P(x1 < x ≤ x2) = F(x2) – F(x1)

Page 29: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

29

Summarizing Data by a Single Number 1/2

• A single number gives key characteristic of data set

• Indices of central tendencies: mean, median, and mode

Sample mean

Sample median

Sort observations in increasing order and take observation in the middle of the series (if number even? take mean of middle two values)

Sample mode

Plot a histogram and specify the midpoint where the histogram peaks, or category occurring most frequently

Mean is affected more by outliers than median or mode

Page 30: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

30

Summarizing Data by a Single Number 2/2

• Mean and Median always exist and are unique

• Mode may not exist, or might not be unique

• Examples

Unimodal, symmetrical pdf

Bimodal, symmetrical pdf

Uniform density function

Skewed pdf

Page 31: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

31

Choosing Mean, Median, or Mode• You can't take the median or mean of categorical data

Use Mode.

• Is the total of interest?

Use mean

(Ex: Total CPU time for five database queries vs. number of windows on the screen open for each query.)

Probably use mean for the times

median for windows.

• Are the data skewed? Use Median

Page 32: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

32

Common Misuses of Means• Using means of significantly different values

Mean of 10 and 1000 msec is 505 msec

• Using mean without regard to skewness of distribution

Sum Mean Typical

10 9 11 10 10 50 10 10

5 5 5 4 31 50 10 5

• Multiplying means to get the mean of a product

Mean of a product of two random variables is equal to product of means, IF two random variables are independent

Page 33: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

33

Summarizing Variability• Mean, median, mode attempt to provide a single

“characteristic" value for the population.

But a single value might not be meaningful

People generally prefer systems with low variability

• There are different ways to measure variability (Indices of dispersion)

Range (max - min)

poor, tends to be unbounded, unstable over a range of observations, susceptible to outliers

Variance or standard deviation

10 and 90 percentiles

Quartiles (box plots)

Page 34: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

34

Variance and Standard Deviation

• Why division by n-1?

Only n-1 of the n differences are independent

• Variance is expressed in units that are the square of the unit of the observations.

• Standard deviation is in units of the mean

• Better to use coefficient of variation, since it takes the scale of measurement unit out of variability consideration

Page 35: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

35

Percentiles• Can be performed for any variables, even without bounds

• 0.9-quantile is the 90-percentile

• Percentiles at multiples of 10% are deciles

• Quartiles divide data into 4 parts at 25, 50, and 75%

25% of observations are less than or equal to first quartile

50% of observations are less than or equal to second quartile

-quantiles can be estimated by sorting the observations and taking [(n-1)+1]th element in the ordered set

Page 36: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

36

Selecting Index of Dispersion

• Is variable bounded?

Specify range

• If no bounds, is distribution unimodal, symmetric?

Use coefficient of variation

• Is distribution nonsymmetric?

Use percentiles

• Use of average traffic for network design?

Usually designed to carry 95 to 99 percentile of observed load (range or percentile)

• Finding a percentile requires several passes through the data

Page 37: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

37

Comparing Systems Using Sample Data• Most CS research work involves samples from some

“population.“ Performance experiments on workloads for programs, systems, and networks

• A fundamental goal of CS experimentation is to determine the mean () and variance of some population characteristic (a parameter).

• We can only measure the mean characteristic for the sample, not the population (a statistic)

The parameter value (population mean) is fixed.

The statistic (sample mean) is a random variable.

The values for the statistic come from some sampling distribution.

Page 38: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

38

Hypothesis Testing• One fundamental experimental question What is the

parameter value?

• Another fundamental experimental question Are two populations different?

Does a subpopulation perform better?

Is one algorithm better than another?

• Null Hypothesis: Two populations have the same mean

• Rejection of the Null Hypothesis: Do so if you are “confident" that the means are different

• Significant Difference: Two means are different with a certain reference confidence (probability)

Page 39: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

39

Confidence Intervals

Page 40: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

40

Central Limit Theorem

Page 41: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

41

Normal Distribution

The sum of a large number of independent observations from any distribution has a normal distribution

•Normal variate denoted as N(, σ), where is the mean and σ is the standard deviation

•A normal distribution with zero mean, and unit variance is called a unit normal or standard normal distribution denoted N(0,1)

•An -quantile of a unit normal variate z ~ N(0,1) is denoted by z

•If a random variable has N(, σ) distribution, then (x- )/ σ has a N (0,1) distribution

Page 42: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

42

Confidence Interval Example

Page 43: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

43

Confidence Intervals for Small Samples

•The sampling distribution only approximates a normal curve when n gets big enough, over about 30.

•Before that, the “tails" of the distribution are too big, so the estimated confidence interval is too small.

•You can calculate the confidence intervals more precisely, but you need a separate table for every n value.

•Usually there is a table to look it up in, or better yet, the statistical software will calculate it for you

•The 100(1-)% confidence interval is given by

nstx n /]1;2/1[

Page 44: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

44

Small Sample CI Example

•8 observations

-0.04, -0.19, 0.14, -0.09, -0.14, 0.19, 0.04, and 0.09

•Mean = 0

•Sample standard deviation = 1.38

•t[0.95;7] = 1.895

•90% Confidence interval is

)0926.0,0926.0(8/138.0*895.10

Page 45: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

45

Testing Against Zero

•Are the differences in processor times for two algorithms significant?

Testing if mean (differences) is zero. Check if zero is within confidence interval at desired confidence.

Note that the number of samples is likely to be small. Take from table for correct t value

Page 46: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

46

Testing Against Zero ExampleDifference in processor times of two different implementations of the same algorithm was measured on seven similar workloads {1.5, 2.6, -1.8, 1.3, -0.5, 1.7, and 2.4}. Can we say with 99% confidence that one implementation is superior to the other?

•Sample size = 7

•Mean = 7.2/7 = 1.03

•Sample variance = 2.57

•Sample standard deviation = 1.6

•100(1-) = 99, = 0.01, 1- /2=0.995

•t[0.995,6] = 3.707

•Confidence Interval is

•Interval includes zero, cannot say

)27.3,21.1(7/6.1*707.303.1

Page 47: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

47

Paired Observations•Need to compare two systems on very similar workloads

If more than 2 systems, or workloads are significantly different, analysis of experimental design techniques should be used

•Conduct n experiments on each of the two systems, one-to-one correspondence between ith test on the 2 systems, observations are paired

•Calculate difference in performance

•Construct a difference interval for the difference

if includes zero, systems are not significantly different

Page 48: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

48

Paired Observations ExampleSix similar workloads used on 2 systems

{(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6), (7.3,1.7)}

Performance difference {-13.7, 13.1, -2.8, -1.1, -3.0, 5.6}

•Sample size = 6

•Mean = -0.32

•Sample variance = 81.62

•Sample standard deviation = 9.03

•100(1-) = 90, = 0.1, 1- /2=0.95

•t[0.95,5] = 2.015

•Confidence Interval is

•Interval includes zero, two systems are not different

)11.7,75.7(69.3*015.232.0

Page 49: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

49

Unpaired Observations•Two samples of different sizes

•No correspondence between ith observations in two samples

•Make an estimate of the variance and effective number of degrees of freedom classical t-test

Compute sample means

Compute sample standard deviations

Compute mean difference

Compute standard deviation of mean difference

Compute effective number of degrees of freedom

Compute confidence interval for mean difference

Page 50: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

50

What Confidence Interval to Use?•Depends on the loss if wrong

•Sometimes 50% confidence is enough, or 99% confidence is not enough

Page 51: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

51

Regression Models•A regression model predicts a random variable as a function of other variables.

Response variable: the estimated variable

Predictor variables, predictors, factors: variables used to predict the response

All must be quantitative variables to do calculations

•What is a good model?

Minimize the distance between predicted value and observed values

Page 52: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

52

Linear Regression Model 1/2

Page 53: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

53

Linear Regression Model 2/2

n

iii

n

ii

n

iii

n

ii

iii

xbbye

xbbyeSSE

yye

110

1

1

210

1

2

0)(

)(

ˆ

Page 54: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

54

Estimation of Model Parameters 1/2

Page 55: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

55

Estimation of Model Parameters 2/2

-5

0

5

10

15

20

25

30

0 20 40 60 80 100 120

Number of disk I/O

CPU

tim

e in

mse

c

Series1

Series2

Page 56: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

56

Allocation of Variation 1/2

•The purpose of a model is to predict the response with minimum variability.

No parameters: Use mean of response variable

One parameter linear regression model: How good?

n

1i

2

n

1i

2

1

2

)(regression without SSE

y of variance)(1

1

1

1

regression without errors of Variance

yy

yyn

en

yyeError

i

i

n

ii

ii

Page 57: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

57

Allocation of Variation 2/2

Page 58: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

58

Confidence Intervals for Regression Parameters

2/122

2/1

22

2

1

0

1

xnx

ss

xnx

x

nss

eb

eb

1

0

1

0

b

b

tsb

tsb

•b0 and b1 are estimates from a single sample size n

•b0 and b1 are “statistics”

•Values for b0 and b1 are the mean value

•Obtain standard deviations for b0 and b1

•Confidence intervals are

2;2/1 ntt

Page 59: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

59

CI for Regression Parameters Example

Page 60: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

60

Verifying Regression Assumptions 1/2

•Assumption: Linear relationship between x and y

If you plot your two variables, you might find many possible results:

1. No visible relationship

2. Linear trend line

3. Two or more lines (piecewise)

4. Outlier(s)

5. Non-linear trend curve

Only the second case can use a linear model

The third case could use multiple linear models

Page 61: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

61

Verifying Regression Assumptions 2/2

•Assumption: Independent Errors

Make a scatter plot of errors versus response value

If you see a trend, you have a problem dependence of errors on predictor value

•Assumption: The predictor variable x is nonstochastic and it is measured without any error.

Page 62: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

62

Limits to Simple Linear Regression•Only one predictor is used

•The predictor variable must be quantitative (not categorical)

•The relationship between response and predictor must be linear

•The errors must be normally distributed

•Visual tests:

Plot x vs. y: Anything odd?

Plot residuals: Anything odd?

Residual plot especially helpful if there is a meaningful order to the measurements

Page 63: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

63

Other Regression Models•Multiple Linear Regression

More than one predictor variable is used

•Categorical Predictors

Predictor variables not quantitative but represent categories

CPU type or disk type

•Curvilinear Regression

Nonlinear relationship between response and predictors

•Transformations

Errors are not normally distributed or variance is not homogeneous

Page 64: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

64

Multiple Linear Regression 1/2

kkknnn

k

k

n e

e

e

b

b

b

xxx

xxx

xxx

y

y

y

.

.

.

.

.

.

.1

.....

.....

.....

.1

.1

.

.

.2

1

1

0

21

22212

12111

2

1

Page 65: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

65

Multiple Linear Regression 2/2

pTT

pe

kn

thjjjjeb

e

TTT

TT

xXXxm

s

m

t

kn

Xjccss

kn

SSEs

SST

SSESSTR

yXbyySSE

SSSSYSST

yXXXb

j

1

1;2/1

1T

2

1

)(1

sprediction ofDeviation Standard

nsobservatio future ofMean :Prediction

using computed intervals Confidence

1freedom of degrees SSE

X of termdiagonal is where, deviation Standard

1 errors ofdeviation Standard

0

EstimationParameter

Page 66: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

66

Multiple Linear Regression Example 1/3

Observations of disk I/Os, memory sizes, and CPU time for 7 programs

CPU Time Disk I/O Memory Size

2 14 70

5 16 75

7 27 144

9 42 190

10 39 210

13 50 235

20 83 400

Page 67: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

67

Multiple Linear Regression Example 2/3

•Find a linear equation to estimate CPU time

CPU time = b0+b1(Disk I/O)+b2(Memory Size)

•b = (-0.1614,0.1182,0.0265)T

•We can compute the error terms from this (difference between the regression formula and the actual observations)

•Compute R2 = SSR/SST=0.97 (regression explains 97% of variation of y)

•Coefficient of Multiple Correlation = R = 0.970.5 = 0.99

•Standard deviation of errors = Square Root(SSE/(n-3)) = 1.2

Page 68: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

68

Multiple Linear Regression Example 3/3

•The 90% confidence intervals for the parameters are (-2.11, 1.79), (-0.29, 0.53), and (-0.06, 0.11), respectively.

What does this mean?

•What is the predicted CPU time for 100 disk I/Os and a memory size of 550?

y=-0.1614+0.1182*(100)+0.0265*(550)=26.2375

Standard deviation of predicted observation = 3.3435

90% confidence interval using a t-value of 2.132 = (19.1096,33.3363)

Standard deviation for a large number of observations = 3.1385 90% confidence interval = (19.5467,32.9292)

Page 69: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

69

Analysis of Variance (ANOVA) 1/5

Page 70: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

70

Analysis of Variance (ANOVA) 2/5

Page 71: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

71

Analysis of Variance (ANOVA) 3/5

•Assuming errors are independent and normally distributed. All are identically distributed (same mean and variance)

•y’s are normally distributed since x’s are nonstochastic (measured without errors)

•Sum of squares of normal variates has a chi-square distribution

•Given two sums of squares SSi and SSj with vi and vj degrees of freedom. The ratio (SSi / vi) / (SSj / vj) has an F distribution

•The hypothesis that SSi is less than or equal to SSj is rejected at the significance level is the ratio is greater than the 1- quantile of the F-variate

•Compare ratio with F[1-,vi,vj]

Page 72: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

72

Analysis of Variance (ANOVA) 4/5

•MSR = SSR/k = mean square of the regression

Any sum of squares divided by its degrees of freedom gives the corresponding mean square

•MSE = SSE/(n-k-1)

•MSR/MSE has an F[k,n-k-1] distribution

F distribution with k numerator degrees of freedom and n-k-1 degrees of freedom

•If MSR/MSE is greater than value read from F-table, predictor variables are assumed to explain a significant fraction of the response variation (reject null hypothesis)

•This procedure is known as the F-test

Page 73: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

73

Analysis of Variance (ANOVA) 5/5

•In simple linear regression

•One predictor variable

•F-test reduces to testing b1=0

•If confidence interval does include zero, parameter is nonzero

•Regression explains a significant part of the response variation

•F-test is not required

Page 74: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

74

ANOVA Example 1/2

ANOVA calculations for disk-memory-CPU example

Computations performed column by column from the left

Page 75: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

75

ANOVA Example 2/2

ANOVA calculations for disk-memory-CPU example

•Regression passes F-test

Computed ratio > value from F-Table

None of the regression parameters are significantly different than zero?

•We have very high confidence that the regression model has predictive ability

Page 76: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

76

Problem of Multicollinearity•Dilemma: None of our parameters are significant, yet the model is!!

•The problem is that the two predictors (memory and disk I/O) are correlated (R = .9947).

•Next we test if the two parameters each give significant regression on their own.

We already did this for the Disk I/O regression model, and found that it alone accounted for about 97% of the variance.We get the same result for memory size.

•Conclusion: Each predictor alone gives as much predictive power as the two together!

•Moral: Adding more predictors is not necessarily better in terms of predictive ability (aside from cost considerations).

Page 77: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

77

Curvilinear Regression 1/3

•Verify the validity of the linear regression model

Do a scatter plot of response vs. predictor to see if it is linear

•Often you can convert to a linear model and use the standard linear regression

Take the log when the curve looks like y = bxa

ln y = ln b + a ln x

y = a +b/x

y= x/(a+bx)

Page 78: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

78

Curvilinear Regression 2/3

•Example: Amdahl’s law says I/O rate is proportional to the processor speed

I/O rate = (CPU rate)b1

Taking logs we get

log (I/O rate) = log + b1 log (MIPS rate) b0 = log Using standard linear regression, we find that the regression explains 84% of the variation.

No MIPS I/O Rate

1 19.63 288.6

2 5.45 117.30

3 2.63 64.60

4 8.24 356.40

5 14.00 373.20

6 9.87 281.10

7 11.27 149.60

8 10.13 120.60

9 1.01 31.10

10 1.26 23.70

Page 79: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

79

Curvilinear Regression 3/3

Parameter Mean Standard Deviation

Confidence Interval

b0 1.423 0.119 (1.20,1.64)

b1 0.888 0.135 (0.64, 1.14)

Page 80: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

80

Common Mistakes with Regression

•Verify that the relationship is linear. Look at a scatter diagram

•Don't worry about absolute value of parameters. They are totally dependent on an arbitrary decision regarding what dimensions to use

CPU time (sec) = 0.01 (disk I/Os)+0.001(Memory in KB)

•Always specify confidence intervals for parameters and coefficient of determination

•Test for correlation between predictor variables, and eliminate redundancy. Test to see what subset of the possible predictors is “best“ depending on cost vs. performance

•Don't make predictions too far out of the measured range

Page 81: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

81

Experimental Design

“The goal of a proper experimental design is to obtain the maximum information with the minimum number of experiments“

•Determine the affects of various factors on performance

•Does a factor have significant effect?

•An experimental design consists of specifying

The number of experiments

The factor/level combinations of each experiment

The number of replications of each experiment

Page 82: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

82

Potential Pitfalls in Experimentation

•All measured values are random variables. Must account for experimental error

•Control for important parameters

Example: User experience in a user interface study

•Must be able to allocate variation to the different factors

•There is a limit to the number of experiments you can perform

Some designs give more information per experiment.

•One-factor-at-a-time studies do not capture factor interactions.

Page 83: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

83

Types of Experimental Design 1/3

•Simple designs

One factor is varied at a time

Inefficient Does not capture interactions Not recommended

sexperiment 1

)1(1

need levels, havingfactor th factors Given

k

i in

inik

Page 84: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

84

Types of Experimental Design 2/3

•Full factorial design

Perform experiment (s) for every combination of factors at every level

Many experiments required

Captures interactions

If too expensive, can reduce number of levels (ex: 2k design k factors used at 2 levels), number of factors, or take another approach

sexperiment 1

need levels, havingfactor th factors Given

k

ii

n

inik

Page 85: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

85

Types of Experimental Design 3/3

•Fractional Factorial Design

Only run experiments for a carefully selected subset of the full factorial combinations

Page 86: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

86

2k Factorials Designs

•A 2k experimental design is used to determine the effect of k factors, each of which have two alternatives or levels.

Used to prune out the less important factors for further study

Pick the highest and lowest levels for each factor

Assumes that a factor's effect is unidirectional

Performance either continuously decreases or continuously increases as the factor is increased from minimum to maximum

Page 87: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

87

22 Factorials Designs 1/2

Page 88: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

88

22 Factorials Designs 2/2

Page 89: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

89

Allocation of Variation

Page 90: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

90

General 2k Factorials Designs 1/2

nsinteractiofactor - three!3)!3(

!

nsinteractiofactor - two!2)!2(

!effectsmain

kk

kk

k

Page 91: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

91

General 2k Factorials Designs 2/2

Page 92: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

92

2kr Factorials Designs

•Experiments are observations of random variables

•With only one observation, can't estimate error

•If we repeat each experiment r times, we get 2kr observations.

•With a 2 factor model, we now have:

y = q0 +qAxA +qBxB +qABxAxB +e

e is the experimental error

Use sign table, and put in y column the sample means of r measurements at the given factor levels

Page 93: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

93

2kr Factorials Designs Example

SSESSABSSBSSASST

222SST

SSE

ˆ

,

2222222

2

1 1

2

2

jiijABBA

i

r

jij

iijij

erqrqrq

e

yyeFactor A explains 79%, B explains 15.4%, AB explains 4%,and error explains about 1.5%.

Page 94: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

94

Confidence Intervals for Effects

iq

sr

ti

q

r

es

ABq

sB

qs

Aq

sq

s

r

r

res

)]1(22;2/1[ effectsfor intervals confidence

220

zero toup add should serror term Since

)1(k2 freedom of Degrees

)1(22

SSE2

SSE from computed erros of Variance

Confidence intervals for previous example

q0 (39.08, 42.91)

qA (19.58, 23.41)

qB (7.58,11.41)

qAB (3.08,6.91)

Page 95: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

95

2k-p Fractional Factorial Designs

Page 96: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

96

Selecting The Experiments

•The sum of the squares of each column is 2k-p

qA= (-y1+y2-y3+y4-y5+y6-y7+y8)/8

Page 97: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

97

2k-p Fractional Factorial Designs Example

Page 98: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

98

Assumptions•Big assumption: These experiments only provide so much information

•The effects due to interaction are summed into the values calculated for the separate variables

• The experiments are masking these “confounding" interactions

The effects whose influence cannot be separated are said to be confounded

• If the interaction effects are small, it is OK

Page 99: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

99

One-Factor Experiments 1/2

Page 100: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

100

One-Factor Experiments 2/2

...

.

,..

mean Column

1mean Grand

torcolumn vec a form ealternativ tobelonging nsobservatio

matrix an in arranged nsobservatio of Total

esalternativ theofeach for nsobservatio

yy

y

yar

y

jr

arar

ar

jj

j

jiij

th

Page 101: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

101

One-Factor Experiments ExampleEffects of 3 processors, observed 5 times each

•First alternative requires 13.3 bytes less than an average processor

•Third alternative requires 37.7 bytes more than an average processor

Page 102: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

102

Errors and Variation

errors alexperiment todue %6.89

%4.103.357,105

10,992.13

variationof Percentage

94,365.2SSA-SSTSSE

105,357.3SS0-SSY SST

1.992,10SSA

7.281,5287.18753 SS0

633,639 .... 120144SSY

2

22

22

j

jr

ar

For each observation, the error is the difference between the observation and the sum of themean and alternative effect.

Mean is 187.7, R's effect is -13.3, so the first error term is144 - (187.7-13.3)=-30.4

Page 103: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

103

ANOVA

Page 104: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

104

Two-Factor Full Factorial Design 1/2

•Two parameters carefully controlled and varied to study effect on performance

Compare several processors using several workloads

0

0

levelat factor ofeffect

levelat factor ofeffect

responsemean

levelat factor and levelat factor n with observatio

j

j

j

j

ij

ijijij

iB

jA

iBjAy

ey

Page 105: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

105

Two-Factor Full Factorial Design 2/2

...

...

,..

1mean Grand

of levels tocorrespond rows and of levels tocorrespond Columns

matrix an in arranged nsobservatio of Total

sexperiment require levels and have and Factors

yy

yy

yab

y

BA

abab

abbaBA

ii

jj

jiij

Page 106: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

106

Two-Factor Full Factorial ExampleEffects of 3 different cache choices, 5 different workloads

For each row (or column), compute mean of observations in that row (or column)

Difference between a row (or column) mean and overall mean gives row (or column) effect

Page 107: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

107

Estimating Experimental Errors

8.236)4.2(....2.05.3SSE

SSE errors of Variance

ˆ

ˆ response Estimated

222

1 1

2

b

i

a

jij

ijijij

ijij

e

yye

y

Page 108: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

108

Allocation of Variance

1.8%errors) to(due variationdUnexplaine

2.3% workloads todue variationof Percentage

%9.9541.402,13

12,857,20

caches by the explained variationof Percentage

236.80SSB-SSA-SSTSSE

13,402.41SS0-SSY SST

40.308SSB

20.857,12SSA

59.192,78)2.72(53 SS0

91,595 SSY

SSESSBSSASS0SSY

2

2

22

2

ii

jj

ijij

a

b

ab

y

Page 109: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

109

ANOVA

Page 110: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

110

Precautions

Page 111: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

111

Quantile-Quantile Plots

•In order to determine distribution of data

Plot observed quantiles versus theoretical quantiles

•Assume yi is the observed qith quantile

•Using theoretical distribution, the qith quantile xi is computed and

a point is plotted at (xi, yi)

•To determine xi, need to invert CDF qi=F(xi) xi = F-1(qi)

For N(0,1) xi = 4.91[qi0.14-(1-qi)0.14]

For N(,σ), scale to + σxi before plotting

Page 112: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

112

Quantile-Quantile Plots Example•Modeling error for eight predictions of a model were found to b

{-0.04, -0.19, 0.14, -.09, -0.14, 0.19, 0.04, and 0.09}

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

-1.6 -1.1 -0.6 -0.1 0.4 0.9 1.4

Normal Quantile

Resi

dual

Qua

ntile

Page 113: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

113

Q-Q Plot For Two-Factor Example

-10

-8

-6

-4

-2

0

2

4

6

8

10

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

Normal quantile

Resid

ual q

uant

ile

Page 114: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

114

Introduction to Simulation

•If system not available, a simulation model provides an easy way to predict the performance or compare several alternatives

•If system is available, a simulation model may be preferred over measurements (Why?)

•Simulation models might fail!

Produce no useful results

Produce misleading results

Page 115: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

115

Common Mistakes in Simulation

•Inappropriate level of detail

•Improper implementation language

•Unverified models

•Invalid models (may not represent the real system correctly)

•Improperly handled initial conditions

•Too short simulations

•Poor random-number generators

•Improper selection of seeds

Page 116: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

116

Terminology 1/3

•State variables

variables whose values define the state of the system

e.g., length of job queue in a CPU scheduling simulation

•Event

a change in the system state

•Continuous-time and discrete-time models

Continuous system state defined at all times (e.g., CPU scheduling)

Discrete system state defined at particular instants of time

Page 117: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

117

Terminology 2/3

•Continuous-state and discrete-state models

Nature of state variables. Time spent by students studying a certain subject (can take an infinite number of values)Queue length in a CPU scheduling simulation can only assume integer values (discrete state model)Discrete-state discrete-event model Continuity of time does not imply continuity of state

•Deterministic and probabilistic modelsOutput of a model can be predicted with certainty deterministicDifferent results on repetition for same set of input parameters probabilistic

Page 118: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

118

Terminology 3/3

•Static and dynamic models

System state changes with time?

•Open and closed models

Input external to the model and independent of it open

No external input closed

•Stable and unstable models

Steady state reached which is independent of time stable

Page 119: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

119

Programming Language Selection

•Simulation language

•General-purpose language

•Extension of a general-purpose language

Extensions as a collection of routines to handle tasks commonly required in simulations

•Simulation package

inflexibility

Page 120: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

120

Types of Simulation 1/3

•Emulation

A simulation using hardware or firmware

e.g., Terminal emulator, processor emulator.

•Monte Carlo Simulation

Static simulation or one without a time axis

Model probabilistic phenomenon that do not change characteristics with time

Evaluate non-probabilistic expressions using probabilistic models

Page 121: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

121

Types of Simulation 2/3

•Trace-driven simulation

Use a trace as input

A trace is a time-ordered record of events on a real system

e.g., A trace of page reference patterns of key programs can be input to compare different memory management schemes

Advantages: (among others) credibility, easy validation, and accurate workload

Disadvantages: (among others) complexity and being a single point of validation

Page 122: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

122

Types of Simulation 3/3

•Discrete-event simulation

ComponentsEvent schedulersimulation clock and a time advancing mechanism

Unit time versus event-driven approachesSystem state variablesEvent routinesInput routines, report routines, initialization routines, and trace routinesDynamic memory management

Page 123: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

123

Analysis of Simulation Results

•Model Verification Techniques

•Model Validation Techniques

•Transient Removal

•Stopping Criteria

Page 124: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

124

Model Verification Techniques•Top-down modular design

•Anti-bugging

•Structured walk-through

•Deterministic models

•Run simplified cases

•Trace: time-ordered list of events and associated variables

•On-Line graphic displays

•Continuity test

•Degeneracy tests

•Consistency tests

•Seed independence

Page 125: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

125

Model Validation Techniques•Validate three key aspects of the model

Assumptions

Input parameter values and distributions

Output values and conclusions

•Each of three aspects maybe subjected to a validity test by comparing with that obtained from possible sources

Expert intuition

Real system measurements

Theoretical results

Page 126: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

126

Transient Removal 1/5

•Steady-state performance is of interest

•What constitutes the transient state?

•Methods for transient removal are heuristic

Long runs

Proper initialization

Truncation

Initial data deletion

Moving average of independent replications

Batch means

Page 127: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

127

Transient Removal 2/5

•Truncation

Variability during steady state less than that during transient state

Measure variability in terms of range- min and max of observations

Observations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 10, 9, 10, 11, 10, 9, 10, 11, 10, 9

Given a sample of n observations

Ignore first l observations

Calculate min and max of n-l observations

Repeat until (l+1)th observation is neither min or max of remaining observations

Page 128: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

128

Transient Removal 3/5

•Initial data deletion

Average does not change much as observations are deleted

Randomness in observations causes averages to change slightly even during steady state

Average across replications (complete run with no change in input parameter values, seed value is different)

Get a mean trajectory across replications

Get overall mean

Delete the first l observations (init l to 1), and get overall mean for n-l values

Compute relative change in overall mean

Determine the knee where the relative change graph stabilizes

Page 129: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

129

Transient Removal 4/5

•Moving average of independent replications

Similar to initial data deletion but computes mean over a moving time interval window instead of the overall mean

Get a mean trajectory across replications

Plot a trajectory of moving average of successive 2k+1 values (init k to 1)

Repeat with k= 2,3,… until plot sufficiently smooth

Find the knee of the plot

Page 130: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

130

Transient Removal 5/5

•Batch means

Run a very long simulation and later divide into several parts of equal duration (batch)

Study the variance of batch means as a function of batch size

A long run of N observations divided into m batches of size n each (start with n=2 for example)

For each batch, compute a batch mean

Compute overall mean

Compute variance of batch means

Increase n and repeat

Plot variance as a function of n

Page 131: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

131

Stopping Criteria: Variance Estimation 1/3

•Simulation could be run until confidence interval for the mean response narrows to a desired width

•Can obtain variance of sample mean = variance of observations/n

Valid if observations are independent

Observations in most simulations are not independent

•To correctly compute the variance of the mean of correlated observations

Independent replications

Batch means

Page 132: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

132

Stopping Criteria: Variance Estimation 2/3

•Independent replications

Means of independent replications are independent though observations in a single replication are correlated

Conduct m replications of size n+n0 (n0 is length of transient phase)

Compute a mean for each replication

Compute an overall mean for all replications

Calculate the variance of replicate means

Calculate confidence interval for mean response

Will discard mn0 initial observations

Page 133: Performance Evaluation © Dr. Ayman Abdel-Hamid, CS5014, Fall 2006 1 CS5014 Research Methods in CS Dr. Ayman Abdel-Hamid Computer Science Department Virginia.

Performance Evaluation

© Dr. Ayman Abdel-Hamid, CS5014, Fall 2006

133

Stopping Criteria: Variance Estimation 3/3

•Batch means

Run a long simulation run, discard initial transient interval, and divide remaining observations into several batches

Compute means for each batch

Compute overall mean

Calculate variance of batch means

Calculate confidence interval for mean response

Will discard n0 observations (less waste)

Calculate covariance of successive batch means to find correct batch size n

Covariance should be small compared to the variance