Lecture 4: Fitting distributions: goodness of fit

2001

Bio 4118 Applied Biostatistics

Université d’Ottawa / University of Ottawa


Goodness of fit
Testing goodness of fit
Testing normality
An important note on testing normality!




Goodness of fit

Measures the extent to which an empirical distribution “fits” the distribution expected under the null hypothesis.


[Figure: frequency histogram of fork length (20 to 60), observed vs. expected frequencies]




Goodness of fit: the underlying principle

If the match between observed and expected is poorer than would be expected on the basis of measurement precision, then we should reject the null hypothesis.


[Figure: two frequency histograms of fork length (20 to 60), observed vs. expected; one panel labelled "Reject H0" (poor match), the other "Accept H0" (close match)]




Testing goodness of fit: the chi-square statistic (X²)


Used for frequency data, i.e. the number of observations/results in each of n categories compared to the number expected under the null hypothesis.


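$$X^2 = \sum_{i=1}^{n} \frac{(f_i - \hat{f}_i)^2}{\hat{f}_i}$$

where $f_i$ is the observed and $\hat{f}_i$ the expected frequency in category i.

[Figure: frequency in each category/class, observed vs. expected]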
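A minimal Python sketch of the X² formula, assuming observed and expected frequencies are supplied as equal-length lists in the same category order. The function name `pearson_x2` is hypothetical; the counts are the "More data" age-class table from later in this lecture.

```python
def pearson_x2(observed, expected):
    """Pearson's X^2: sum over categories of (f_i - fhat_i)^2 / fhat_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Age-class frequencies (ages 1, 2, 3, 4) from the "More data" table
observed = [57, 24, 12, 5]
expected = [55, 24, 13, 6]

x2 = pearson_x2(observed, expected)
df = len(observed) - 1  # n categories -> n - 1 degrees of freedom
print(f"X^2 = {x2:.3f}, df = {df}")  # X^2 = 0.316, df = 3
```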




How to translate X² into p?

Compare X² to the χ² distribution with n − 1 degrees of freedom.

If p is less than the chosen significance level α, reject the null hypothesis.


[Figure: probability density of the χ² distribution (df = 5). The observed X² = 8.5 gives p ≈ 0.13, which is greater than α = 0.05, so H0 is accepted.]
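Translating X² into p means taking an upper-tail probability of the χ² distribution. In practice a library call such as SciPy's `scipy.stats.chi2.sf(x, df)` does this; the dependency-free sketch below instead uses the closed-form χ² survival function for integer degrees of freedom (the helper name `chi2_sf` is our own) and computes p for the X² = 8.5, df = 5 case shown in the figure.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


alpha = 0.05
x2, df = 8.5, 5
p = chi2_sf(x2, df)
print(f"p = {p:.3f}")  # p = 0.131
print("reject H0" if p < alpha else "accept H0")  # accept H0
```

The closed forms only cover integer df, which is all a goodness-of-fit test needs.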




Testing goodness of fit: the log likelihood-ratio chi-square statistic (G)


Similar to X², and usually gives similar results.

In some cases, G is more conservative (i.e. gives higher p values).


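$$G = 2 \sum_{i=1}^{n} f_i \ln\left(\frac{f_i}{\hat{f}_i}\right)$$

where $f_i$ is the observed and $\hat{f}_i$ the expected frequency in category i.

[Figure: frequency in each category/class, observed vs. expected]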
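The G formula in code, computed alongside X² for the same table to show that the two usually agree closely. A sketch with hypothetical function names, using the combined age classes from later in this lecture; a category with zero observed frequency contributes 0, the limit of f ln f as f approaches 0.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum f_i * ln(f_i / fhat_i); a category with f_i = 0 contributes 0."""
    return 2.0 * sum(f * math.log(f / e) for f, e in zip(observed, expected) if f > 0)


def pearson_x2(observed, expected):
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


# Combined age classes 1, 2, 3+
observed = [33, 14, 9]
expected = [37, 12, 7]

g = g_statistic(observed, expected)
x2 = pearson_x2(observed, expected)
print(f"G = {g:.3f}, X^2 = {x2:.3f}")  # G = 1.289, X^2 = 1.337
```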




χ² versus the distribution of X² or G

For both X² and G, p values are calculated assuming a χ² distribution...

...but as the sample size n decreases, both deviate more and more from χ².


[Figure: probability density of χ² (df = 5) compared with the sampling distributions of X² and G for small and very small n]
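One way to see how far X² departs from χ² at small sample sizes is a Monte Carlo check: simulate samples for which H0 is true and count how often X² reaches the tabled 5% critical value. If X² followed χ² exactly, that rate would be 0.05. A sketch under assumed settings (6 equally likely categories, so df = 5 as in the figure; 5000 replicates per sample size); all names are illustrative.

```python
import random

# Upper 5% point of the chi-square distribution with df = 5 (standard table value)
CRITICAL_DF5 = 11.0705


def pearson_x2(observed, expected):
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def rejection_rate(n, k=6, reps=5000, seed=1):
    """Share of samples simulated under H0 (k equally likely classes, so df = k - 1 = 5)
    whose X^2 reaches the tabled 5% critical value."""
    rng = random.Random(seed)
    expected = [n / k] * k
    rejections = 0
    for _ in range(reps):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1
        if pearson_x2(counts, expected) >= CRITICAL_DF5:
            rejections += 1
    return rejections / reps


for n in (6, 12, 30, 120):
    print(f"n = {n:3d}: rejection rate = {rejection_rate(n):.3f} (nominal 0.050)")
```

The rates for the smallest n can differ noticeably from 0.05 and should settle near it as n grows; exact values depend on the seed.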




Assumptions (X² and G)

n is larger than 30.
Expected frequencies are all larger than 5.
The test is quite robust except when there are only 2 categories (df = 1). For 2 categories, both X² and G overestimate χ², leading to rejection of the null hypothesis with probability greater than α, i.e. the test is liberal.
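These rules of thumb are easy to check mechanically before running the test. A minimal sketch (the helper name is hypothetical) that treats "larger than" strictly, as worded above, and is tried here on the age-class data from the following slide.

```python
def chi2_assumption_warnings(observed, expected):
    """Flag violations of the rules of thumb for X^2 and G."""
    warnings = []
    n = sum(observed)
    if n <= 30:
        warnings.append(f"n = {n} is not larger than 30")
    small = [e for e in expected if e <= 5]
    if small:
        warnings.append(f"{len(small)} expected frequencies are not larger than 5")
    if len(observed) == 2:
        warnings.append("only 2 categories (df = 1): the test is liberal")
    return warnings


# Age-class data (ages 1-4)
print(chi2_assumption_warnings([33, 14, 8, 1], [37, 12, 5, 2]))
# ['2 expected frequencies are not larger than 5']
```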




What if n is too small, there are only 2 categories, etc.?

Collect more data, thereby increasing n.
If there are more than 2 categories, combine categories.
Use a correction factor.
Use another test.

Original data:

Age (yrs)   1    2    3    4
Observed    33   14   8    1
Expected    37   12   5    2

More data:

Age (yrs)   1    2    3    4
Observed    57   24   12   5
Expected    55   24   13   6

Classes combined:

Age (yrs)   1    2    3+
Observed    33   14   9
Expected    37   12   7
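Combining classes amounts to summing observed and expected frequencies over the pooled categories, then recomputing X² with fewer degrees of freedom. A sketch reproducing the "Classes combined" table (the `pool_tail` helper is hypothetical); for df = 2 the χ² upper tail has the simple closed form exp(−x/2).

```python
import math


def pearson_x2(observed, expected):
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def pool_tail(freqs, start):
    """Merge classes freqs[start:] into one open-ended class (e.g. ages 3 and 4 -> 3+)."""
    return list(freqs[:start]) + [sum(freqs[start:])]


observed = [33, 14, 8, 1]  # ages 1, 2, 3, 4
expected = [37, 12, 5, 2]

obs_pooled = pool_tail(observed, 2)  # [33, 14, 9]
exp_pooled = pool_tail(expected, 2)  # [37, 12, 7]
x2 = pearson_x2(obs_pooled, exp_pooled)
df = len(obs_pooled) - 1  # 2
# Closed-form chi-square upper tail, valid only for df = 2
p = math.exp(-x2 / 2)
print(f"X^2 = {x2:.3f}, df = {df}, p = {p:.3f}")  # X^2 = 1.337, df = 2, p = 0.512
```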




Corrections for 2 categories

For 2 categories, both X² and G overestimate χ², leading to rejection of the null hypothesis with probability greater than α, i.e. the test is liberal.

Continuity correction: add 0.5 to observed frequencies.

Williams’ correction: divide the test statistic (G or X²) by

$$q = 1 + \frac{k^2 - 1}{6n(k - 1)}$$

where k is the number of categories and n the number of observations.

Age (yrs)   1       2
Observed    17      8
Expected    20      5

Age (yrs)   1       2
Observed    17.5    8.5
Expected    20.67   5.33
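Williams’ correction in code, applied to G for the two-category age data (17 and 8 observed versus 20 and 5 expected). A sketch: function names are hypothetical, and the df = 1 p value uses the identity P(χ²₁ ≥ x) = erfc(√(x/2)).

```python
import math


def g_statistic(observed, expected):
    return 2.0 * sum(f * math.log(f / e) for f, e in zip(observed, expected) if f > 0)


def williams_q(k, n):
    """Williams' divisor q = 1 + (k^2 - 1) / (6 n (k - 1)) for k categories and n observations."""
    return 1.0 + (k * k - 1) / (6.0 * n * (k - 1))


def chi2_sf_df1(x):
    """Upper-tail chi-square probability for df = 1."""
    return math.erfc(math.sqrt(x / 2.0))


observed, expected = [17, 8], [20, 5]
n, k = sum(observed), len(observed)

g = g_statistic(observed, expected)
q = williams_q(k, n)  # 1 + 3 / 150 = 1.02
g_adj = g / q
print(f"G = {g:.3f}, q = {q:.3f}, G_adj = {g_adj:.3f}")
print(f"p = {chi2_sf_df1(g):.3f} uncorrected, {chi2_sf_df1(g_adj):.3f} corrected")
```

Because q > 1, the corrected statistic is smaller and its p value larger, which counteracts the liberal behaviour described above.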




The binomial test

Used when there are 2 categories.

No assumptions. Calculate the exact probability of obtaining N − k individuals in category 1 and k individuals in category 2, with k = 0, 1, 2, ..., N.


[Figure: binomial distribution, p = 0.5, N = 10: probability of each number of observations from 0 to 10]
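The exact probabilities behind the figure follow from the binomial formula C(N, k) p^k (1 − p)^(N − k). A short sketch using `math.comb` (Python 3.8+) that tabulates the distribution for p = 0.5, N = 10; the function name is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k individuals in category 2 and n - k in category 1."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5
for k in range(n + 1):
    print(f"k = {k:2d}: {binomial_pmf(k, n, p):.5f}")
```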




An example: sex ratio of beavers

H0: sex ratio is 1:1, so p = q = 0.5.

p(0 of one sex, 10 of the other) = .00195
p(1 of one sex, 9 of the other) = .0195
p(9 or more individuals of the same sex) = .0215, or 2.15%.

Therefore, reject H0.


Sample      Males   Females
Observed    9       1
Expected    5       5
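The beaver calculation as code: sum the probabilities of every outcome at least as unlikely as the observed 9:1 split, which under p = 0.5 means 9 or more individuals of the same sex in either direction. A sketch with hypothetical function names; SciPy users can get the same two-tailed value from `scipy.stats.binomtest(9, 10, 0.5).pvalue` (SciPy 1.7 or later).

```python
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_binomial_p(k_obs, n, p=0.5):
    """Two-tailed exact p: total probability of outcomes no more likely than the observed one."""
    p_obs = binomial_pmf(k_obs, n, p)
    # the small relative tolerance keeps outcomes tied with p_obs despite float rounding
    return sum(pr for pr in (binomial_pmf(k, n, p) for k in range(n + 1))
               if pr <= p_obs * (1 + 1e-9))


males, females = 9, 1
p_value = exact_binomial_p(males, males + females)
print(f"p = {p_value:.4f}")  # p = 0.0215
print("reject H0" if p_value < 0.05 else "accept H0")  # reject H0
```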




Multinomial test

A simple extension of the binomial test to more than 2 categories.

With 3 categories, 2 probabilities, p and q, must be specified for the null hypothesis; the third follows from p + q + r = 1.0.

No assumptions... ...but so tedious that in practice X² is used.




Multinomial test: segregation ratios

Hypothesis: both parents are Aa, therefore the segregation ratio is 1 AA : 2 Aa : 1 aa.

So under H0, p = .25, q = .50, r = .25.

For N = 60, p < .001. Therefore, reject H0.


Genotype   Observed   Expected
AA         32         15
Aa         28         30
aa         0          15
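For three categories the exact multinomial test can still be enumerated by brute force when N is modest: list every way of splitting N = 60 offspring among AA, Aa and aa (1,891 outcomes), and add up the probabilities of outcomes no more likely than the observed one. That definition of the exact p value is an assumption here, as are the function names; the sketch shows the tedium is trivial for a computer.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of one particular set of category counts under the null probabilities."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # exact integer multinomial coefficient
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob


def exact_multinomial_p(observed, probs):
    """Sum the probabilities of all 3-category outcomes no more likely than the observed one."""
    n = sum(observed)
    p_obs = multinomial_pmf(observed, probs)
    total = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            pr = multinomial_pmf((a, b, n - a - b), probs)
            if pr <= p_obs * (1 + 1e-9):
                total += pr
    return total


observed = (32, 28, 0)      # AA, Aa, aa
probs = (0.25, 0.50, 0.25)  # 1 AA : 2 Aa : 1 aa
p = exact_multinomial_p(observed, probs)
print(f"p = {p:.2e}")
```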




Goodness of fit: testing normality

Since normality is an assumption of many parametric statistical tests, testing for normality is often required.

Tests for normality include X² or G, Kolmogorov-Smirnov, Shapiro-Wilk and Lilliefors.


[Figure: frequency by category/class, observed vs. expected under the hypothesis of a normal distribution]




Cumulative distributions

Areas under the normal probability density function and the cumulative normal distribution function


[Figure: normal probability density function and cumulative normal distribution function F, with areas of 2.28%, 50.00% and 68.27% marked]
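The cumulative normal distribution function F(z) gives the area under the density to the left of z. A sketch using `math.erf`; assuming the figure's percentages correspond to z = −2 (2.28%), z = 0 (50.00%) and the interval from −1 to +1 (68.27%), the code reproduces them. `statistics.NormalDist().cdf` is an equivalent standard-library alternative.

```python
import math


def normal_cdf(z):
    """Cumulative standard normal distribution F(z) = P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


below_minus_2sd = normal_cdf(-2.0)               # area left of z = -2
below_mean = normal_cdf(0.0)                     # area left of the mean
within_1sd = normal_cdf(1.0) - normal_cdf(-1.0)  # area between z = -1 and z = +1
print(f"{below_minus_2sd:.2%} {below_mean:.2%} {within_1sd:.2%}")  # 2.28% 50.00% 68.27%
```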




X² or G test for normality

Put the data in classes (histogram) and compute expected frequencies based on a discrete normal distribution.

Calculate X². Requires large samples (kmin = 10) and is not powerful because information is lost when the data are grouped into classes.


[Figure: frequency by category/class, observed vs. expected under the hypothesis of a normal distribution]
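A sketch of the binned X² test for normality. The choices here are ours rather than the lecture's: k = 10 equal-width classes (reading "kmin = 10" as a minimum of about ten classes), the normal mean and standard deviation estimated from the sample, and open-ended outer classes. Because two parameters are estimated, df = k − 3 rather than k − 1. The input is simulated; with real data, sparse tail classes (expected frequency of 5 or less) should first be combined as described earlier.

```python
import math
import random
import statistics


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed form)."""
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi2_normality_test(data, k=10):
    """Bin data into k equal-width classes and compare with a normal fitted by mean and SD."""
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    lo, width = min(data), (max(data) - min(data)) / k
    observed = [0] * k
    for x in data:
        observed[min(int((x - lo) / width), k - 1)] += 1
    # Outer classes are open-ended, so the expected frequencies sum to n
    edges = [lo + width * i for i in range(1, k)]
    cum = [0.0] + [fitted.cdf(e) for e in edges] + [1.0]
    expected = [n * (cum[i + 1] - cum[i]) for i in range(k)]
    x2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = k - 1 - 2  # one df lost to the total, two to the estimated mean and SD
    return x2, df, chi2_sf(x2, df), observed, expected


rng = random.Random(42)
sample = [rng.gauss(50.0, 5.0) for _ in range(200)]  # simulated data, not from the lecture
x2, df, p, observed, expected = chi2_normality_test(sample)
print(f"X^2 = {x2:.2f}, df = {df}, p = {p:.3f}")
```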




“Non-statistical” assessments of normality


Make a normal probability plot of normal equivalent deviates (NEDs) versus X.

If the points fall on a more or less straight line, the data are approximately normally distributed.


[Figure: NEDs versus X for normal and non-normal data]




Kolmogorov-Smirnov goodness of fit

Compares the observed cumulative distribution to the expected cumulative distribution under the null hypothesis.

p is based on Dmax, the maximum absolute difference between observed and expected cumulative relative frequencies.


[Figure: observed and expected cumulative frequency versus X, with Dmax marked]
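Computing Dmax needs a little care because the observed cumulative relative frequency is a step function, so the largest gap can sit just before or just after a step. A sketch (function name, null parameters and sample values are hypothetical) for an extrinsic null hypothesis; converting Dmax to p requires the Kolmogorov-Smirnov distribution, which `scipy.stats.kstest`, for example, provides.

```python
from statistics import NormalDist


def ks_dmax(data, cdf):
    """Largest absolute difference between the observed cumulative relative frequency and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        expected = cdf(x)
        # The observed cumulative frequency steps from (i - 1)/n to i/n at x,
        # so check the gap on both sides of the step.
        d = max(d, i / n - expected, expected - (i - 1) / n)
    return d


# Extrinsic null: normal with mean 5.0 and SD 0.5 fixed in advance (hypothetical values)
sample = [4.2, 4.8, 5.0, 5.1, 5.3, 5.4, 5.6, 5.9]  # hypothetical measurements
print(f"Dmax = {ks_dmax(sample, NormalDist(5.0, 0.5).cdf):.3f}")
```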




An example: wing length in flies

10 flies with wing lengths: 4, 4.5, 4.9, 5.0, 5.1, 5.3, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0

cumulative relative frequencies: .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.0


[Figure: observed and expected cumulative frequency versus wing length (4.0 to 6.0), with Dmax marked]




Lilliefors test

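The KS test is conservative when the expected distribution is based on sample statistics (e.g. a mean and standard deviation estimated from the data).

Lilliefors corrects for this to produce a more reliable test.

It should be used when the null hypothesis is intrinsic (parameters estimated from the sample) rather than extrinsic (parameters specified in advance).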
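Lilliefors' correction can be approximated by simulation: compute Dmax against a normal whose mean and SD are estimated from the sample, then compare it with Dmax values from simulated normal samples treated the same way. A sketch under those assumptions (function names, replicate count and sample values are hypothetical); `statsmodels.stats.diagnostic.lilliefors` offers a ready-made version.

```python
import random
import statistics
from statistics import NormalDist


def ks_dmax(data, cdf):
    xs = sorted(data)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def lilliefors_mc(data, reps=2000, seed=0):
    """Dmax against a normal fitted to the data, with a Monte Carlo p value."""
    n = len(data)
    d_obs = ks_dmax(data, NormalDist(statistics.mean(data), statistics.stdev(data)).cdf)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(reps):
        # Dmax with estimated mean and SD does not depend on the true mean and SD,
        # so standard normal samples suffice to build its null distribution
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        fitted = NormalDist(statistics.mean(sim), statistics.stdev(sim))
        if ks_dmax(sim, fitted.cdf) >= d_obs:
            at_least_as_large += 1
    return d_obs, (at_least_as_large + 1) / (reps + 1)


sample = [4.2, 4.8, 5.0, 5.1, 5.3, 5.4, 5.6, 5.9]  # hypothetical measurements
d, p = lilliefors_mc(sample)
print(f"Dmax = {d:.3f}, Monte Carlo p = {p:.3f}")
```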




An important note on testing normality!


When N is small, most tests have low power, so very large deviations are required to reject the null hypothesis.

When N is large, power is high, so very small deviations from normality will be sufficient to reject the null hypothesis.

So, exercise common sense!

