Hypothesis Tests II

Hypothesis Tests II. The normal distribution Normally distributed data.

Apr 01, 2015


Aylin Dewhirst


The normal distribution


Normally distributed data


Normally distributed means


First, let's consider a simpler problem…

We are testing whether the mean of a population ($Y$) equals a particular value, $H_0: \mu = \mu_0$. Now, if $H_0$ is assumed, what do we know? We have some idea about the distribution of the sample mean $\bar{Y}$. We need a measuring device that is sensitive to variations in $\bar{Y}$, or in other words, to deviations from the statement in $H_0$…
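A minimal sketch of such a measuring device for testing whether a population mean equals a value $\mu_0$. It assumes, beyond what the slide states, that the population standard deviation $\sigma$ is known, so that $z = (\bar{Y} - \mu_0)/(\sigma/\sqrt{n})$ is standard normal under the null hypothesis. The function names and the sample values are hypothetical.

```python
import math

def z_statistic(sample, mu0, sigma):
    """Standardized distance of the sample mean from mu0 (sigma assumed known)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(sample)
    ybar = sum(sample) / n
    return (ybar - mu0) / (sigma / math.sqrt(n))

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: four observations, H0: mu = 10, assumed known sigma = 2
z = z_statistic([11.0, 12.0, 9.0, 12.0], mu0=10.0, sigma=2.0)
print(z, two_sided_p_value(z))
```

The further $\bar{Y}$ drifts from $\mu_0$, the larger $|z|$ becomes and the smaller the p-value, which is exactly the sensitivity the slide asks for.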


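$(z_1, z_2)$ has a bivariate normal distribution, with density

$$f(z_1, z_2) = \frac{1}{2\pi\sigma_{z_1}\sigma_{z_2}\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(z_1-\mu_{z_1})^2}{\sigma_{z_1}^2} - \frac{2\rho(z_1-\mu_{z_1})(z_2-\mu_{z_2})}{\sigma_{z_1}\sigma_{z_2}} + \frac{(z_2-\mu_{z_2})^2}{\sigma_{z_2}^2} \right] \right\}$$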
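The bivariate normal density can be evaluated directly. In this sketch, the parameters `mu1`, `mu2`, `s1`, `s2`, `rho` stand for $\mu_{z_1}, \mu_{z_2}, \sigma_{z_1}, \sigma_{z_2}, \rho$. The function name is hypothetical and not taken from any library.

```python
import math

def bivariate_normal_pdf(z1, z2, mu1, mu2, s1, s2, rho):
    """Density of a bivariate normal with means mu1, mu2, s.d.s s1, s2, correlation rho."""
    if not (s1 > 0 and s2 > 0 and -1 < rho < 1):
        raise ValueError("need s1, s2 > 0 and -1 < rho < 1")
    u = (z1 - mu1) / s1  # standardized z1
    v = (z2 - mu2) / s2  # standardized z2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)  # quadratic form in the exponent
    norm = 2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho)  # normalizing constant
    return math.exp(-q / 2) / norm

# Usage: density at the origin for two correlated standard normals
print(bivariate_normal_pdf(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.5))
```

With $\rho = 0$, the density factorizes into the product of two univariate normal densities, which is a quick sanity check on the formula.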


[Figure: "Bivariate Normal Distribution", a surface plot of the density over the (z1, z2) plane; a second panel plots dnorm(x) against x.]

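If $(z_1, z_2)$ has a bivariate normal distribution, then the pdf of the points $(z_1 - \bar{z}, z_2 - \bar{z})$, where $\bar{z}$ is the mean of the coordinates, is a one-dimensional normal distribution. This distribution lies over the line $x_1 + x_2 = 0$, because all points of this form are situated on that line.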

If $(z_1, z_2, z_3)$ has a multivariate (trivariate) normal distribution, then the pdf of the points $(z_1 - \bar{z}, z_2 - \bar{z}, z_3 - \bar{z})$ is a two-dimensional normal distribution. This distribution lies over the plane $x_1 + x_2 + x_3 = 0$, because all points of this form are situated on that plane (hard to draw). This is how one dimension (degree of freedom) is lost!
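A quick numerical check of the geometric argument. It assumes that the points in question are deviations from the sample mean. Whatever values are drawn, the deviation vector has coordinates summing to zero, so for two variables it is confined to a line and for three to a plane. The helper name `deviations` is hypothetical.

```python
import random

def deviations(values):
    """Deviations of each value from the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

rng = random.Random(0)  # fixed seed for reproducibility
for k in (2, 3):
    for _ in range(1000):
        draw = [rng.gauss(0.0, 1.0) for _ in range(k)]
        d = deviations(draw)
        # Each deviation vector lies on the line (k=2) or plane (k=3) where coordinates sum to 0
        assert abs(sum(d)) < 1e-12
print("all deviation vectors satisfy sum(d) == 0 (up to rounding)")
```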


That means that even though the points lie in a two-dimensional space, the probability density function defined over them is essentially one-dimensional.

But,

The situation resembles the following. Assume we have two standard normal random variables, $Z_1$ and $Z_2$. Then the sum of their squares, $Z_1^2 + Z_2^2$, does not necessarily have a chi-squared distribution with two degrees of freedom. Why?


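Consider the case where $Z_2 = Z_1$. Then $Z_1^2 + Z_2^2 = 2Z_1^2$, which is twice a chi-square variable with one degree of freedom, not a chi-square variable with two. Hence, unless $Z_1$ and $Z_2$ are independent, $Z_1^2 + Z_2^2$ need not have a chi-square distribution with two degrees of freedom.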
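A small simulation sketch of this point. It draws the sum of squares once with independent standard normals and once with $Z_2 = Z_1$. Both versions have mean 2, but a chi-square variable with two degrees of freedom has variance 4, while $2Z_1^2$ has variance 8. The seed and sample size are arbitrary choices.

```python
import random

def pop_variance(xs):
    """Population variance (divide by N); adequate for a large simulated sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(42)  # fixed seed so the run is reproducible
N = 100_000

# Independent Z1, Z2: Z1^2 + Z2^2 ~ chi-square(2), mean 2, variance 4
indep = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(N)]

# Z2 = Z1: Z1^2 + Z2^2 = 2 * Z1^2, mean 2 but variance 8
dep = [2 * rng.gauss(0, 1) ** 2 for _ in range(N)]

print("independent:", sum(indep) / N, pop_variance(indep))
print("Z2 = Z1:    ", sum(dep) / N, pop_variance(dep))
```

The means agree, so only the spread reveals that the dependent case is not chi-square with two degrees of freedom.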

What is distribution?


The t-distribution


That is why we divide by $(n-1)$ when calculating the sample standard deviation.
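A simulation sketch of why the divisor matters. It uses repeated samples of size 5 from a standard normal, whose true variance is 1. Dividing the sum of squared deviations by $n-1$ gives an unbiased estimate of the variance. Dividing by $n$ underestimates it by the factor $(n-1)/n$. Strictly, it is the variance $s^2$ that becomes unbiased; $s$ itself stays slightly biased. The sample size, replication count, and seed are arbitrary.

```python
import random

rng = random.Random(1)
n, reps = 5, 100_000  # small samples make the bias easy to see
sum_div_n = sum_div_n1 = 0.0
for _ in range(reps):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true variance is 1
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)  # carries n - 1 degrees of freedom
    sum_div_n += ss / n
    sum_div_n1 += ss / (n - 1)
avg_div_n = sum_div_n / reps    # expected value (n - 1) / n = 0.8
avg_div_n1 = sum_div_n1 / reps  # expected value 1.0
print(avg_div_n, avg_div_n1)
```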


One sample t-test
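A minimal sketch of the one-sample t statistic $t = (\bar{Y} - \mu_0)/(s/\sqrt{n})$. It uses the sample s.d. $s$ with the $n-1$ divisor in place of a known $\sigma$. Under $H_0$ and normality, this statistic follows a t distribution with $n-1$ degrees of freedom. The Python standard library has no t CDF, so this sketch returns only $t$ and the degrees of freedom for comparison with a t table. The function name is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample s.d., divides by n - 1
    t = (ybar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Usage with hypothetical data
print(one_sample_t([2.0, 4.0, 6.0], mu0=2.0))
```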


Two-sample tests

B and A are types of seeds.
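A sketch of one common two-sample statistic, the pooled t statistic. It assumes both populations have the same standard deviation, which is the kind of additional assumption raised under Discussion. The seed data are not in this transcript, so the usage below borrows the two groups from the ANOVA example later on. The function name is hypothetical.

```python
import math
import statistics

def pooled_two_sample_t(a, b):
    """Two-sample t statistic assuming equal population variances (pooled s.d.)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)    # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                  # s.e. of the mean difference
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, n1 + n2 - 2

t, df = pooled_two_sample_t([9.0, 7.0, 6.0, 7.0, 6.0], [3.0, 1.0, 3.0, 4.0, 4.0])
print(t, df)
```

For two groups, $t^2$ equals the ANOVA F statistic. Here $t^2 \approx 26.67$, matching the F in the basic ANOVA source table.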


Numerical Example (wheat again)


Summary: So far we have seen what a good test statistic (and its null distribution) looks like. The distribution we have selected is a textbook distribution. Could we pick others?


Choosing Test Statistic


The t statistic


The Kolmogorov-Smirnov statistic
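A from-scratch sketch of the two-sample Kolmogorov-Smirnov statistic, $D = \max_t |F_x(t) - F_y(t)|$, the largest vertical gap between the two empirical CDFs. The t statistic reacts mainly to a difference in means. D reacts to any difference in the shape of the distributions. The function name is hypothetical, and the sketch does not compute a p-value.

```python
from bisect import bisect_right

def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions, so the largest gap occurs at a data point
    for t in xs + ys:
        fx = bisect_right(xs, t) / n  # empirical CDF of x at t
        fy = bisect_right(ys, t) / m  # empirical CDF of y at t
        d = max(d, abs(fx - fy))
    return d

print(ks_two_sample([1, 2, 3], [4, 5, 6]))  # 1.0: the samples do not overlap at all
```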


Comparing the test statistics


Sensitivity to specific alternatives


Discussion


Or…

• We need to add further assumptions, such as equality of the standard deviations of the samples.


Two-sample tests

B and A are types of seeds.

Remember?


Contingency Tables (Cross-Tabs)


We use cross-tabulation when:

• We want to look at relationships among two or three variables.

• We want a descriptive statistical measure to tell us whether differences among groups are large enough to indicate some sort of relationship among variables.


Cross-tabs are not sufficient to:

• Tell us the strength or actual size of the relationships among two or three variables.

• Test a hypothesis about the relationship between two or three variables.

• Tell us the direction of the relationship among two or more variables.

• Look at relationships between one nominal or ordinal variable and one ratio or interval variable unless the range of possible values for the ratio or interval variable is small. What do you think a table with a large number of ratio values would look like?


Because we use tables in these ways, we can set up some decision rules about how to use them:

• Independent variables should be column variables.

• If you are not looking at independent and dependent variable relationships, use the variable that can logically be said to influence the other as your column variable.

• Using this rule, always calculate column percentages rather than row percentages.

• Use the column percentages to interpret your results.


For example:

• If we were looking at the relationship between gender and income, gender would be the column variable and income would be the row variable. Logically, gender can determine income; income does not determine your gender.

• If we were looking at the relationship between ethnicity and location of a person’s home, ethnicity would be the column variable.

• However, if we were looking at the relationship between gender and ethnicity, one does not influence the other. Either variable could be the column variable.


Contingency Tables (Cross-Tabs)

Gender \ Marital Status | Married | Single
Male | 37 | 41
Female | 51 | 32

How do we measure the relationship?


What do we EXPECT if there is no relationship?

Result \ Gender | Female | Male | Total
Cured | | | 78
Not cured | | | 83
Total | 88 | 73 | 161


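Observed | Female | Male
Cured | 37 | 41
Not cured | 51 | 32

Expected | Female | Male
Cured | 42.6 | 35.4
Not cured | 45.4 | 37.6

$$\chi^2 = \frac{(37-42.6)^2}{42.6} + \frac{(41-35.4)^2}{35.4} + \frac{(51-45.4)^2}{45.4} + \frac{(32-37.6)^2}{37.6} \approx 3.18$$

(The value 3.18 uses the unrounded expected counts, e.g. $78 \times 88 / 161 = 42.63$.)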
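The expected counts and the statistic follow mechanically from the margins: expected = row total × column total / grand total. Below is a plain-Python sketch of Pearson's chi-square statistic without continuity correction, applied to the observed table above. The function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count if there is no relationship
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Observed counts: rows = Cured / Not cured, columns = Female / Male
chi2, df = chi_square_independence([[37, 41], [51, 32]])
print(round(chi2, 2), df)  # 3.18 1
```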


RESULT

• This test statistic has a χ² distribution with (2-1)(2-1) = 1 degree of freedom.

• The critical value of the χ² distribution with 1 degree of freedom at α = .01 is 6.63.

• Since 3.18 < 6.63, we do not reject the null hypothesis that the two proportions are equal, i.e., that the drug is equally effective for female and male patients.
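For one degree of freedom, the upper tail of the χ² distribution can be computed without tables. A χ²(1) variable is the square of a standard normal, so $P(X \ge x) = P(|Z| \ge \sqrt{x})$. This sketch relies on that identity, so it is valid only for 1 degree of freedom. The function name is hypothetical.

```python
import math

def chi2_df1_upper_tail(x):
    """P(X >= x) for X ~ chi-square(1): since X = Z^2, this equals P(|Z| >= sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))

print(chi2_df1_upper_tail(6.63))  # close to 0.01, the critical value quoted above
print(chi2_df1_upper_tail(3.18))  # roughly 0.07, well above alpha = .01
```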


INTRODUCTION TO ANOVA

• The easiest way to understand ANOVA is to generate a tiny data set (using the general linear model, GLM):

As a first step, set the mean, $\mu$, to 5 for a data set with 10 cases. In the table below, all 10 cases have a score of 5 at this point.

Score at $a_1$ | Score at $a_2$
5 | 5
5 | 5
5 | 5
5 | 5
5 | 5


• The next step is to add the effects of the independent variable (IV). Suppose that the effect of the treatment at $a_1$ is to raise scores by 2 units and the effect of the treatment at $a_2$ is to lower scores by 2 units.

Score at $a_1$ | Score at $a_2$
5+2=7 | 5-2=3
5+2=7 | 5-2=3
5+2=7 | 5-2=3
5+2=7 | 5-2=3
5+2=7 | 5-2=3


• The changes produced by treatment are the deviations of the scores from $\mu$. Over all of these cases, the sum of the squared deviations is

$$\sum (Y - \mu)^2 = 5(7-5)^2 + 5(3-5)^2 = 40$$

This is the sum of the (squared) effects of treatment if all cases are influenced identically by the various levels of A and there is no error.


• The third step is to complete the GLM with the addition of error.

Score at $a_1$ | Score at $a_2$
5+2+2=9 | 5-2+0=3
5+2+0=7 | 5-2-2=1
5+2-1=6 | 5-2+0=3
5+2+0=7 | 5-2+1=4
5+2-1=6 | 5-2+1=4
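The three steps amount to the model $Y_{ij} = \mu + \alpha_j + e_{ij}$, and they can be reproduced in a few lines. This sketch uses the hypothetical level labels `a1` and `a2`, because the original labels did not survive extraction. The error terms are copied from the table above.

```python
mu = 5                          # step 1: grand mean
effects = {"a1": +2, "a2": -2}  # step 2: treatment effects
errors = {                      # step 3: random error for each case
    "a1": [+2, 0, -1, 0, -1],
    "a2": [0, -2, 0, +1, +1],
}
# General linear model: Y_ij = mu + alpha_j + e_ij
scores = {g: [mu + effects[g] + e for e in errors[g]] for g in effects}
print(scores)  # {'a1': [9, 7, 6, 7, 6], 'a2': [3, 1, 3, 4, 4]}
```

The errors within each group sum to zero, so the group means are exactly $\mu + \alpha_j$ (7 and 3).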


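Then the variance for the $a_1$ group is

$$s_1^2 = \frac{(9-7)^2 + (7-7)^2 + (6-7)^2 + (7-7)^2 + (6-7)^2}{5-1} = \frac{6}{4} = 1.5$$

and the variance for the $a_2$ group is

$$s_2^2 = \frac{(3-3)^2 + (1-3)^2 + (3-3)^2 + (4-3)^2 + (4-3)^2}{5-1} = \frac{6}{4} = 1.5$$

The average of these variances is also 1.5. Check that these numbers represent error variance: they reflect random variability in scores within each group, where all cases are treated the same and are therefore uncontaminated by effects of the IV. The variance of all 10 numbers, ignoring group membership, is

$$s^2 = \frac{\sum (Y - 5)^2}{10 - 1} = \frac{52}{9} \approx 5.78$$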


Standard Setup for ANOVA

$a_1$ | $a_2$
9 | 3
7 | 1
6 | 3
7 | 4
6 | 4
Sum: 35 | 15

The difference between each score and the grand mean (GM) is broken into two components:

1. The difference between the score and its own group mean
2. The difference between that group mean and the grand mean

$$Y_{ij} - GM = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - GM)$$


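Each term is then squared and summed separately to produce the sum of squares for error and the sum of squares for treatment. The basic partition holds because the cross-product terms vanish, since within each group the deviations $Y_{ij} - \bar{Y}_j$ sum to zero:

$$\sum_i \sum_j (Y_{ij} - GM)^2 = \sum_i \sum_j (Y_{ij} - \bar{Y}_j)^2 + n \sum_j (\bar{Y}_j - GM)^2$$

The first term on the right is the sum of squares for error; the second is the sum of squares for treatment (the effect of the IV!).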
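The partition can be checked numerically on the ten scores from the tables above. The following is a plain-Python sketch with illustrative variable names. It also computes the cross-product term explicitly, to show that it vanishes.

```python
groups = [[9, 7, 6, 7, 6], [3, 1, 3, 4, 4]]  # the two groups' scores
all_scores = [y for g in groups for y in g]
gm = sum(all_scores) / len(all_scores)        # grand mean, GM
means = [sum(g) / len(g) for g in groups]     # group means, Ybar_j

ss_total = sum((y - gm) ** 2 for y in all_scores)
ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
ss_between = sum(len(g) * (m - gm) ** 2 for g, m in zip(groups, means))
# Cross-product term 2 * sum (Y_ij - Ybar_j)(Ybar_j - GM): zero because within-group deviations sum to 0
cross = 2 * sum((y - m) * (m - gm) for g, m in zip(groups, means) for y in g)
print(ss_total, ss_within, ss_between, cross)  # 52.0 12.0 40.0 0.0
```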


This is the deviation form of basic ANOVA. Each of these terms is a sum of squares (SS). The left-hand term, averaged (divided by $N-1$), is the total variance in the set of scores ignoring group membership. The first term on the right is called the sum of squares within groups, and the second is called the SS between groups. This sum is frequently symbolized as $SS_{\text{total}} = SS_{\text{within}} + SS_{\text{between}}$.


At this point it is important to realize that the total variance in the set of scores is partitioned into two sources. One is the effect of the IV and the other is all remaining effects (which we call error). Because the effects of the IV are assessed by changes in the central tendencies of the groups, the inferences that come from ANOVA are about differences in central tendency.

However, sums of squares are not yet variances. To become variances, they must be ‘averaged’. The denominators for averaging the SS must be degrees of freedom so that the statistics have the proper distribution (remember the previous slides).


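So far we know that the degrees of freedom of $SS_{\text{total}}$ must be $N-1$.

Furthermore, $df_{\text{within}} = N - k$, where $k$ is the number of groups (one degree of freedom is lost to each group mean).

Also, $df_{\text{between}} = k - 1$.

Thus we have (as expected) $df_{\text{total}} = df_{\text{within}} + df_{\text{between}}$, i.e., $N - 1 = (N - k) + (k - 1)$.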


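Variance is an ‘averaged’ sum of squares (for empirical data, of course). Then, to obtain the mean squares (MS):

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}, \qquad MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}, \qquad F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

The F distribution is the sampling distribution of the ratio of two independent chi-square variables, each divided by its degrees of freedom.

This statistic is used to test the null hypothesis that all group means are equal, $\mu_1 = \mu_2 = \cdots = \mu_k$.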


Source table for basic ANOVA

Source | SS | df | MS | F
Between | 40 | 1 | 40 | 26.67
Within | 12 | 8 | 1.5 |
Total | 52 | 9 | |
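A compact sketch that rebuilds this source table from the raw scores. `one_way_anova` is a hypothetical helper written for illustration, not a library API. For a p-value, one would compare F with an F distribution on (1, 8) degrees of freedom, for example from a statistics table.

```python
def one_way_anova(groups):
    """Return (SS, df, MS) rows keyed by source, plus F, for a one-way ANOVA."""
    scores = [y for g in groups for y in g]
    n_total, k = len(scores), len(groups)
    gm = sum(scores) / n_total  # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in groups)
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between  # "averaged" SS between groups
    ms_within = ss_within / df_within     # "averaged" SS within groups (error)
    rows = {
        "Between": (ss_between, df_between, ms_between),
        "Within": (ss_within, df_within, ms_within),
        "Total": (ss_between + ss_within, n_total - 1, None),
    }
    return rows, ms_between / ms_within

table, f = one_way_anova([[9, 7, 6, 7, 6], [3, 1, 3, 4, 4]])
for source, (ss, df, ms) in table.items():
    print(source, ss, df, ms)
print("F =", round(f, 2))  # F = 26.67
```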