Statistical Comparison of Two or More Systems


The most relevant of all the Basic Theory Lectures.

No Holidays.

THE MISSION

Your analysis task involves manipulating conditions of the system of interest from a prescribed set of options.

Design of Experiments: Determine whether the different options are really different. Is the best one really statistically better?

Ranking and Selection: What is the probability that the best sample indicates the best system setting?

VOCABULARY

Factor: An element of the system that will be manipulated

Setting or Level: A value that a Factor may assume

EXAMPLE: Simulation model of Football (EA Sports)

Factors: Quarterback, Running Back, Strong Safety

Settings or Levels for Quarterback: Dante’, Bret, Johnny U.

TYPES OF DESIGNS

One Factor, Two Settings: Paired Samples; Behrens-Fisher. Question: Which is best?

More than one Factor: Factorial Designs; Partially Exhaustive Designs. Question: Are the settings significant difference-makers?

PAIRED SAMPLES

Example: Quarterback Controversy! Simulate St. Louis Rams vs. Tampa Bay Bucs, recording the Quarterback Rating.
Level 1: Kurt Warner
Level 2: Marc Bulger

Run the simulation 28 times for each player, resulting in the data sets W1, W2, ..., W28 and B1, B2, ..., B28.

Is E[B] > E[W]?

BRUTE FORCE

Build a confidence interval on the quantity E[W] - E[B]. If it does not include 0.0, we have statistically significant evidence (at level α) that there is a difference.

This is equivalent to the hypothesis test H0: E[B] = E[W].
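One way to build this interval, assuming the i-th Warner run and the i-th Bulger run form a pair (replication i), is to work directly with the differences D_i = W_i - B_i. The sketch below uses the Z quantile to match the lecture's intervals (with small n a t quantile is often used instead); the function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def paired_difference_ci(w, b, alpha=0.05):
    """Z-based confidence interval on E[W] - E[B] from paired runs.

    Assumes w[i] and b[i] come from the same replication i, so the
    differences D_i = W_i - B_i are iid.
    """
    if len(w) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [wi - bi for wi, bi in zip(w, b)]
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    # The sample variance of the differences already reflects any covariance.
    se = math.sqrt(statistics.variance(diffs) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return d_bar - z * se, d_bar + z * se


def differs(ci):
    """True when the interval excludes 0.0, i.e. reject H0: E[W] = E[B]."""
    lo, hi = ci
    return not (lo <= 0.0 <= hi)
```

With 28 ratings per quarterback you would call `paired_difference_ci(w, b)` and then `differs(...)` on the result; the rule "interval excludes 0.0" is the same as rejecting H0 at level α.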

CALCULATIONS ON VARIANCES: SOME BASICS

Let X and Y be random variables.

$$\sigma_X^2 = \mathrm{VAR}[X] = E\big[(X - E[X])^2\big] = \int (x - E[X])^2 \, dF(x)$$

1) $\mathrm{VAR}[X] = E\big[(X - E[X])^2\big]$

2) $\mathrm{VAR}[X + Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] + 2\,\mathrm{COV}[X, Y]$

3) $\mathrm{COV}[X, Y] = E[XY] - E[X]\,E[Y]$

4) $\mathrm{VAR}[cX] = c^2\,\mathrm{VAR}[X]$

5) $\mathrm{VAR}[X - Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] - 2\,\mathrm{COV}[X, Y]$

COV = 0 if X and Y are independent.
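A quick way to convince yourself of identities 2) through 5) is to check their sample-moment versions on correlated data. Because the sample variance and sample covariance below share the same n - 1 divisor, the sample identities hold exactly up to floating-point rounding. This is a sketch; the data generator and helper names are hypothetical.

```python
import math
import random
import statistics


def sample_cov(xs, ys):
    """Sample covariance with the same n - 1 divisor as statistics.variance."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlated_pairs(n, slope=0.6, seed=1):
    """Illustrative data: Y = slope * X + noise, so COV[X, Y] is nonzero."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def _close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


def check_identities(xs, ys, c=3.0):
    """Sample versions of identities 2), 3), 4) and 5)."""
    n = len(xs)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    cov = sample_cov(xs, ys)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    mean_xy = statistics.fmean(x * y for x, y in zip(xs, ys))
    return {
        "var_sum": _close(statistics.variance([x + y for x, y in zip(xs, ys)]),
                          var_x + var_y + 2 * cov),
        # Identity 3) rescaled by n / (n - 1) to match the sample divisor.
        "cov": _close(cov, n * (mean_xy - x_bar * y_bar) / (n - 1)),
        "var_scaled": _close(statistics.variance([c * x for x in xs]), c * c * var_x),
        "var_diff": _close(statistics.variance([x - y for x, y in zip(xs, ys)]),
                           var_x + var_y - 2 * cov),
    }
```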

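SAMPLE MEAN

For X_1, ..., X_n independent and identically distributed like X:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

$$\mathrm{VAR}[\bar{X}] = \mathrm{VAR}\Big[\frac{1}{n}\sum_{i=1}^{n} X_i\Big] = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{VAR}[X_i] = \frac{n\,\mathrm{VAR}[X]}{n^2} = \frac{\mathrm{VAR}[X]}{n}$$

so $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$.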
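The result VAR[X-bar] = VAR[X]/n can be checked by brute-force simulation. The sketch below assumes X ~ Uniform(0, 1), so VAR[X] = 1/12 and the target is 1/(12n); the function name and the choice of distribution are illustrative.

```python
import random
import statistics


def variance_of_sample_mean(n, reps=20_000, seed=42):
    """Estimate VAR[X-bar] by simulation for X ~ Uniform(0, 1).

    Each replication draws n values and records their mean; the spread
    of those means should be close to VAR[X] / n = 1 / (12 n).
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.variance(means)
```

For n = 28 the estimate should land near 1/336, roughly 0.00298.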

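CONFIDENCE INTERVAL

α/2 probability of Type I error on each end of the confidence interval.

The basic interval for X-bar is

$$\bar{X} \pm Z_{\alpha/2}\sqrt{\mathrm{VAR}[\bar{X}]} = \bar{X} \pm Z_{\alpha/2}\sqrt{\frac{\mathrm{VAR}[X]}{n}} = \bar{X} \pm Z_{\alpha/2}\,\frac{\sigma_X}{\sqrt{n}}$$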
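A minimal sketch of the basic interval, assuming VAR[X] is known (or already estimated) and passed in as `var_x`; the function name is hypothetical. `NormalDist().inv_cdf` supplies Z_{α/2}.

```python
import math
from statistics import NormalDist, fmean


def mean_ci(data, var_x, alpha=0.05):
    """X-bar +/- Z_{alpha/2} * sqrt(VAR[X] / n)."""
    n = len(data)
    x_bar = fmean(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(var_x / n)
    return x_bar - half_width, x_bar + half_width
```

A larger α (more Type I error allowed) gives a narrower interval.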

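BASIC CONFIDENCE INTERVAL

$$(\bar{W} - \bar{B}) \pm Z_{\alpha/2}\sqrt{\mathrm{VAR}[\bar{W} - \bar{B}]}$$

$$\mathrm{VAR}[\bar{W} - \bar{B}] = \mathrm{VAR}[\bar{W}] + \mathrm{VAR}[\bar{B}] - 2\,\mathrm{COV}[\bar{W}, \bar{B}] = \frac{\mathrm{VAR}[W] + \mathrm{VAR}[B]}{28} - 0$$

The covariance term is 0 when the W runs and the B runs are independent.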

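SPREADSHEET HIGHLIGHTS 1

With U = RAND(), uniform on [0, 1):

(U - 0.5)*SQRT(12): zero mean, unit standard deviation

μ + (U - 0.5)*SQRT(12)*σ: mean μ, standard deviation σ; uniform over an interval centered at μ, extending σ*SQRT(12)/2 on each side (total width σ*SQRT(12))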
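The same construction in Python, as a sketch: `rng.random()` plays the role of RAND(), and the helper name is hypothetical. Since U has variance 1/12, multiplying (U - 0.5) by SQRT(12) gives unit variance.

```python
import math
import random
import statistics


def uniform_with_mean_sd(mu, sigma, rng):
    """mu + (U - 0.5) * sqrt(12) * sigma: uniform with mean mu, std dev sigma.

    The support is [mu - sigma*sqrt(12)/2, mu + sigma*sqrt(12)/2].
    """
    u = rng.random()  # U ~ Uniform[0, 1): mean 1/2, variance 1/12
    return mu + (u - 0.5) * math.sqrt(12) * sigma
```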

COMMON RANDOM NUMBERS

Correlation is not always BAD! Suppose we could INDUCE positive CORRELATION between the W’s and the B’s without adding any bias.

Since VAR[W-bar - B-bar] = VAR[W-bar] + VAR[B-bar] - 2COV[W-bar, B-bar], a positive covariance reduces the theoretical variance of W-bar - B-bar.

FREE POWER (power is the probability of correctly rejecting H0: equal means)

STREAMING

Segregate the random number generation task into streams connected to phenomena:

seed1: Inter-arrival times
seed2: Service times

Each stream: Z_i = a Z_{i-1} mod m

1. Change features of the service.
2. Use the exact same arrival stream for comparing each service setting.
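A minimal sketch of streaming with the multiplicative generator Z_i = a Z_{i-1} mod m. The constants a = 16807 and m = 2^31 - 1 (the "minimal standard" generator) are an assumption for illustration; the lecture does not fix them. The seeds, the exponential distributions, and `run_setting` are hypothetical. Because arrivals and service each own a stream, changing the service setting leaves the arrival sequence untouched.

```python
import math


class LehmerStream:
    """Multiplicative congruential generator Z_i = a * Z_{i-1} mod m."""

    def __init__(self, seed, a=16807, m=2**31 - 1):
        if not 0 < seed < m:
            raise ValueError("seed must be in 1..m-1")
        self.z, self.a, self.m = seed, a, m

    def next_uniform(self):
        self.z = (self.a * self.z) % self.m
        return self.z / self.m  # in (0, 1) because 0 < z < m

    def next_exponential(self, mean):
        # Inverse transform: -mean * ln(U) is exponential with that mean.
        return -mean * math.log(self.next_uniform())


def run_setting(service_mean, seed_arrivals=12345, seed_service=67890, k=5):
    """Hypothetical helper: k inter-arrival and k service times from
    separate streams, so the arrival stream is identical across settings."""
    arrivals = LehmerStream(seed_arrivals)
    service = LehmerStream(seed_service)
    inter_arrivals = [arrivals.next_exponential(1.0) for _ in range(k)]
    service_times = [service.next_exponential(service_mean) for _ in range(k)]
    return inter_arrivals, service_times
```

Calling `run_setting(0.8)` and `run_setting(0.5)` yields the same inter-arrival times and different service times, which is exactly steps 1 and 2 above.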

SPREADSHEET HIGHLIGHTS 2

Use the same results of RAND() for building both the Bulger samples and the Warner samples.

Note the CI shrinkage. Try with identical sigma. Discuss “Estimation”.
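The spreadsheet experiment can be mimicked as follows. This sketch assumes each rating is generated with the uniform construction from Spreadsheet Highlights 1; the means and standard deviations are made-up illustration values, not results. With `common=True` the same U drives both the Warner and the Bulger sample, which induces positive correlation and shrinks the interval on W-bar - B-bar. With identical sigmas the difference becomes constant and the half-width collapses to essentially zero.

```python
import math
import random
import statistics

SQRT12 = math.sqrt(12)


def simulate_pairs(n, mu_w, sd_w, mu_b, sd_b, common, seed=2024):
    """Generate n Warner/Bulger ratings, optionally with common random numbers."""
    rng = random.Random(seed)
    w, b = [], []
    for _ in range(n):
        u = rng.random()
        v = u if common else rng.random()  # CRN: reuse the same RAND() value
        w.append(mu_w + (u - 0.5) * SQRT12 * sd_w)
        b.append(mu_b + (v - 0.5) * SQRT12 * sd_b)
    return w, b


def half_width(w, b, z=1.96):
    """Half-width of the Z interval on E[W] - E[B] built from the differences."""
    diffs = [x - y for x, y in zip(w, b)]
    return z * math.sqrt(statistics.variance(diffs) / len(diffs))
```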

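Behrens-Fisher Problem: Comparison of Means

No pairs, equal sample sizes, or equal variances are required. Remember that we are after the variance of W-bar - B-bar. Common use: new samples vs. history.

$$\mathrm{VAR}[\bar{W} - \bar{B}] = \mathrm{VAR}[\bar{W}] + \mathrm{VAR}[\bar{B}] - 2\,\mathrm{COV}[\bar{W}, \bar{B}] = \frac{\mathrm{VAR}[W]}{n_W} + \frac{\mathrm{VAR}[B]}{n_B} - 0$$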
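A sketch of the corresponding Z interval for two independent samples of any sizes, with each variance estimated separately; the function name is hypothetical. With n_W = n_B = 28 it reduces to the basic interval's (VAR[W] + VAR[B])/28.

```python
import math
import statistics
from statistics import NormalDist


def two_sample_ci(w, b, alpha=0.05):
    """Z interval on E[W] - E[B] for independent samples.

    VAR[W-bar - B-bar] = VAR[W]/n_W + VAR[B]/n_B, since the COV term is 0.
    """
    n_w, n_b = len(w), len(b)
    diff = statistics.fmean(w) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(w) / n_w + statistics.variance(b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se
```

For "new samples vs. history", `w` would hold the new runs and `b` the historical data, and the two lists need not have the same length.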

SPREADSHEET HIGHLIGHTS

MULTI-SETTING CASE

Can involve many Factors or just one. Treatment i has mean μ_i. Analysis of Variance (ANOVA).

Data from treatments 1, 2, ..., n. H0: μ_1 = ... = μ_{n-1} = μ_n. Are the treatments distinguishable?

DESIGN OF EXPERIMENT

1. Determine Factors and Settings (Design = which Factors, which Settings for each Treatment)
2. Collect Data According to Design
3. Perform ANOVA
4. State Conclusion

FULL FACTORIAL

Build a sample of All Combinations of Factors:

Quarterback (2), Running Back (3), Strong Safety (3): 2x3x3 = 18 Treatments
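Enumerating a full factorial design is a Cartesian product of the factor levels. The level labels below are hypothetical placeholders; the lecture only gives the level counts.

```python
from itertools import product

# Hypothetical level labels; only the counts (2, 3, 3) come from the example.
FACTORS = {
    "Quarterback": ["QB-1", "QB-2"],
    "Running Back": ["RB-1", "RB-2", "RB-3"],
    "Strong Safety": ["SS-1", "SS-2", "SS-3"],
}


def full_factorial(factors):
    """Return one dict per treatment: every combination of factor settings."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


treatments = full_factorial(FACTORS)  # 2 * 3 * 3 = 18 treatments
```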

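HOW ANOVA WORKS

X_{i,j} is the ith sample point from the jth treatment. The errors are assumed iid Normal (never exactly true in practice!).

$$X_{i,j} = \mu_j + e_{i,j}$$

Decomposition of variability:
Observation (Obs): total variability around the grand mean
Treatment vs. Grand Mean (Tr)
Within Treatment (Res)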

HYPOTHESIS

H0: The treatment variability is random variability. The size of the treatment variability is in scale with the residual variability.

ANOVA uses sums of squares, with g treatments and n_t samples from treatment t.

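ANOVA TABLE

Here x_{j,t} is the jth sample from treatment t, \bar{x}_t is the mean of treatment t, and \bar{x} is the grand mean.

$$SS_{Tr} = \sum_{t=1}^{g} n_t\,(\bar{x}_t - \bar{x})^2, \qquad \text{degrees of freedom} = g - 1$$

$$SS_{Res} = \sum_{t=1}^{g}\sum_{j=1}^{n_t} (x_{j,t} - \bar{x}_t)^2, \qquad \text{degrees of freedom} = \sum_{t=1}^{g} n_t - g$$

$$SS_{Obs} = \sum_{t=1}^{g}\sum_{j=1}^{n_t} (x_{j,t} - \bar{x})^2, \qquad \text{degrees of freedom} = \sum_{t=1}^{g} n_t - 1$$

The sums of squares and the degrees of freedom both add up: SS_Obs = SS_Tr + SS_Res.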

REMEMBER CHI-SQUARED? From our Goodness-of-Fit Test:

If X ~ N(0,1), then for n independent X’s the sum of the n values X^2 is chi-squared with n degrees of freedom. If estimates (X-bar, sigma) were used to make the X’s N(0,1), we lose one d.f. per estimate.

F-distribution: if X is chi-squared with n d.f., Y is chi-squared with m d.f., and X and Y are independent, then (X/n)/(Y/m) has an F distribution with (n, m) d.f.
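Both facts can be illustrated by simulation: square and sum standard normals to get chi-squared draws, then take the ratio of two independent, d.f.-scaled chi-squared draws to get F draws. As a check, an F(n, m) variable has mean m/(m - 2) for m > 2. The helper names, seeds, and sample sizes are illustrative choices.

```python
import random
import statistics


def chi_square_draw(k, rng):
    """Sum of k squared independent N(0, 1) draws: chi-squared with k d.f."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def f_draws(n, m, reps=20_000, seed=11):
    """(X/n)/(Y/m) with independent X ~ chi-sq(n), Y ~ chi-sq(m)."""
    rng = random.Random(seed)
    return [(chi_square_draw(n, rng) / n) / (chi_square_draw(m, rng) / m)
            for _ in range(reps)]
```

For (n, m) = (5, 10) the average of the draws should sit near 10/8 = 1.25.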

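ANOVA HYPOTHESIS TEST

$$\frac{SS_{Tr}/\text{d.f.}_{Tr}}{SS_{Res}/\text{d.f.}_{Res}} = \frac{SS_{Tr}/(g - 1)}{SS_{Res}/\left(\sum_{t} n_t - g\right)} \sim F_{\,g-1,\ \sum_t n_t - g}$$

The normalizing cancels! Under H0, each sum of squares divided by the unknown σ^2 is chi-squared; σ^2 appears in both numerator and denominator, so it drops out of the ratio.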

ANOVA HYPOTHESIS TEST

Compare the test statistic to an F table value.

Reject H0 if it is big (exceeds the critical value) and conclude that ...

the Treatments are Different!
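Putting the table and the test together, here is a minimal one-way ANOVA sketch: it computes the three sums of squares, the degrees of freedom, and the F statistic, then compares F to a critical value that you look up in an F table. The function names are hypothetical, and the critical value is an input rather than computed, since the lecture compares against a table.

```python
import statistics


def one_way_anova(groups):
    """One-way ANOVA: sums of squares, d.f., and the F statistic.

    groups[t] holds the observations x_{j,t} from treatment t.
    """
    g = len(groups)
    n_total = sum(len(grp) for grp in groups)
    grand_mean = statistics.fmean(x for grp in groups for x in grp)
    means = [statistics.fmean(grp) for grp in groups]
    ss_tr = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means))
    ss_res = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    ss_obs = sum((x - grand_mean) ** 2 for grp in groups for x in grp)
    df_tr, df_res = g - 1, n_total - g
    f_stat = (ss_tr / df_tr) / (ss_res / df_res)
    return {"ss_tr": ss_tr, "ss_res": ss_res, "ss_obs": ss_obs,
            "df_tr": df_tr, "df_res": df_res, "f": f_stat}


def reject_h0(f_stat, f_critical):
    """Reject H0 (equal treatment means) when F exceeds the table value."""
    return f_stat > f_critical
```

For example, three treatments with three observations each give (2, 6) d.f.; a standard F table lists about 5.14 for α = 0.05 at (2, 6), so any F above that rejects H0.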

SPREADSHEET HIGHLIGHTS
