Lecture 5 Hypothesis testing
Page 1: Lecture 5 Hypothesis testing. What should you know?

Lecture 5: Hypothesis testing

Page 2: Lecture 5 Hypothesis testing. What should you know?

What should you know?

• Confidence intervals (Wald and bootstrap)
• p-value
• How to find a normal probability (relates to a p-value)
• How to find a normal quantile (relates to the confidence interval)
• Central limit theorem (i.e. the standard deviation of the mean is $\sigma/\sqrt{n}$ and the distribution is approximately normal)
• Histogram (good for looking at data, assessing skewness)
• Quantile plot (good for assessing normality)
• Box plot (good for comparing samples)
• Two-sample t-test and its assumptions
• Power of a test
• Type 1 and type 2 error

Page 3: Lecture 5 Hypothesis testing. What should you know?

Example: t-test and confidence interval

Page 4: Lecture 5 Hypothesis testing. What should you know?

Example

• Load SomeData1.sav
• Do the test
• Observe the confidence interval
• Do a box plot

Page 5: Lecture 5 Hypothesis testing. What should you know?

Confidence interval

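• 95% confidence interval for the difference between the means:
  $(\bar{x}_1 - \bar{x}_2) \pm t_{0.975,\,\nu}\; s_p \sqrt{1/n_1 + 1/n_2}$, where $s_p$ is the pooled standard deviation.
• $t_{0.975,\,\nu}$ is the 0.975 quantile of a t distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom.
• This is a Wald interval: estimate plus/minus quantile × std. err.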

Page 6: Lecture 5 Hypothesis testing. What should you know?

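Return to the example SomeData1.sav
• How would you calculate the pooled standard deviation from this output?
1. Take the standard error for the difference and divide by $\sqrt{1/n_1 + 1/n_2}$, where $n_1$ and $n_2$ are the sample sizes.
2. Or, use the standard deviations $s_1$ and $s_2$ from each sample and do: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$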
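Both routes to the pooled standard deviation can be sketched in a few lines. This is a minimal illustration, assuming the equal-variance two-sample setup; the function names and the summary numbers are hypothetical and are not taken from SomeData1.sav.

```python
import math

def pooled_sd_from_se(se_diff, n1, n2):
    """Recover the pooled SD from the standard error of the difference."""
    return se_diff / math.sqrt(1.0 / n1 + 1.0 / n2)

def pooled_sd_from_samples(s1, n1, s2, n2):
    """Pool two sample SDs, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Hypothetical summary values (assumption: not real output).
s1, n1 = 2.0, 10
s2, n2 = 3.0, 15

sp = pooled_sd_from_samples(s1, n1, s2, n2)
# The standard error of the difference that the output would report.
se = sp * math.sqrt(1.0 / n1 + 1.0 / n2)

# Both routes should agree.
print(round(sp, 4), round(pooled_sd_from_se(se, n1, n2), 4))
```

The two printed values coincide, which is the point: either piece of output is enough to get back to the pooled standard deviation.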

Page 7: Lecture 5 Hypothesis testing. What should you know?

Why do we care?

• When doing a sample size calculation or a meta-analysis, you sometimes need to be able to retrieve the standard deviation from output that displays different information.

Page 8: Lecture 5 Hypothesis testing. What should you know?

Recall classic hypothesis testing framework
• State the hypotheses
• Get the test statistic
• Calculate the p-value
• If the p-value is less than the significance level (say 0.05), reject the null
• Otherwise, do not reject the null

Page 9: Lecture 5 Hypothesis testing. What should you know?

Technical point

• If the p-value is less than $\alpha$, say “there is sufficient evidence to reject the null hypothesis.”
• If the p-value is greater than $\alpha$, say “there is insufficient evidence to reject the null”, because:
  • Either the null is true
  • Or the sample size was not large enough to detect the alternative
  • Or the alternative is very close to the null (so we could not detect it)
  • Or we got unlucky

Page 10: Lecture 5 Hypothesis testing. What should you know?

Significance
What is the probability that we reject the null when the null is true? (i.e. probability of a type 1 error)

Page 11: Lecture 5 Hypothesis testing. What should you know?

Power
Is the sample size large enough to reject the null when the null is false?

Page 12: Lecture 5 Hypothesis testing. What should you know?
Page 13: Lecture 5 Hypothesis testing. What should you know?

Everybody, roll a D20 for an implausibility check

Page 14: Lecture 5 Hypothesis testing. What should you know?

Guess the modifier: If it quacks like a duck …

• Suppose we want to know whether a character has a 0 modifier for a trait checked with D20 = 20.
• Note if the check is passed.
• If passed, assume the modifier is greater than 0.
• If failed, assume the modifier is 0.

Page 15: Lecture 5 Hypothesis testing. What should you know?

Modifier   Must roll       Probability of passing the implausibility check
0          20              0.05
4          16 or better    0.25
10         10 or better    0.55
14         6 or better     0.75
15         5 or better     0.80
16         4 or better     0.85

Passing rule: modifier + dice roll > 19
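The table can be reproduced by counting die faces. A small sketch, assuming a fair 20-sided die and the passing rule "modifier + roll of at least 20"; the function name is made up for illustration. The modifier-0 row plays the role of the significance level, and the other rows are the power against each alternative.

```python
from fractions import Fraction

def pass_probability(modifier, target=20, sides=20):
    """P(modifier + roll >= target) for one roll of a fair die."""
    needed = max(1, target - modifier)           # lowest roll that passes
    passing_faces = max(0, sides - needed + 1)   # faces from `needed` up to `sides`
    return Fraction(passing_faces, sides)

for m in (0, 4, 10, 14, 15, 16):
    print(m, float(pass_probability(m)))
```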

Page 16: Lecture 5 Hypothesis testing. What should you know?

A problem

• Note that characters with very small modifiers will probably fail the test. This is called a Type 2 error.
• So the test works best if the character has a large modifier.
• A non-significant result does not “prove” that the character has a 0 modifier.

Page 17: Lecture 5 Hypothesis testing. What should you know?

Power

• The power of a test is the probability of rejecting the null when the null is false.
• Power is defined against particular alternatives.
• The modifier test is powerful against the alternative that the modifier is 16.
• The modifier test is weak against the alternative that the modifier is 4.

Page 18: Lecture 5 Hypothesis testing. What should you know?

Gaining power

• Increase the sample size
• Use a powerful test (technical stats issue)
• Refine the study design to reduce variance

These are the same techniques used to narrow the confidence interval.

Page 19: Lecture 5 Hypothesis testing. What should you know?

Some problems with NHST: Multiple testing

Page 20: Lecture 5 Hypothesis testing. What should you know?

Multiple testing

• If I roll the dice often enough, I will pass the implausibility check
• This applies to hypothesis testing
• Repeated tests on the same data set, within the same study, may yield a spurious “significant” result
• This is called a type 1 error
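How quickly spurious results accumulate can be sketched with one line of arithmetic. This assumes the tests are independent and every null is true, which is a simplification (tests on the same data set are usually correlated); the function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more type 1 errors across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests

for m in (1, 5, 14, 20):
    print(m, round(prob_at_least_one_false_positive(m), 3))
```

Under these assumptions, by 14 tests a spurious "significant" result is more likely than not.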

Page 21: Lecture 5 Hypothesis testing. What should you know?

Example: SomeData1.sav

Page 22: Lecture 5 Hypothesis testing. What should you know?

When the null is true

• Open SPSS
• Go to Transform -> random number generators -> set active generator -> Mersenne Twister
• -> Set starting points -> random start
• Load SomeData1.sav
• Add a column of random normal (all mean 0, sd 1)
• Go to Analyze -> compare means -> independent samples
• At least one person in the class should get a significant result (p < 0.05)

Page 23: Lecture 5 Hypothesis testing. What should you know?

My recommendation

• It is best to save the hypothesis test for the primary outcome
• Use confidence intervals and effect sizes for secondary outcomes

Page 24: Lecture 5 Hypothesis testing. What should you know?

What does the p-value tell me?
The p-value is not as informative as one might think

Page 25: Lecture 5 Hypothesis testing. What should you know?

What is p (the p-value)?

Page 26: Lecture 5 Hypothesis testing. What should you know?

The correct answer

• The correct answer is c)
• The p-value is the probability of getting something at least as extreme as what one got, assuming that the null hypothesis is true.

Page 27: Lecture 5 Hypothesis testing. What should you know?

p-value and sample size

• The p-value is a function of the sample size
• If the null is false (even by a small amount), a large sample size will yield a small p-value
• A large study will almost certainly yield a significant result, even when nothing interesting is happening.
• A small study will almost certainly yield a non-significant result, even when the intervention is effective.
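The dependence on sample size is easy to see numerically. The sketch below holds a tiny standardized difference fixed and lets the group size grow. It uses a large-sample normal approximation rather than the t distribution, assumes equal group sizes, and the effect size of 0.05 is an arbitrary illustrative choice.

```python
import math
from statistics import NormalDist

def two_sided_p(effect_size, n_per_group):
    """Approximate two-sided p-value when the observed standardized
    difference equals `effect_size` (normal approximation, equal groups)."""
    z = effect_size * math.sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Same negligible effect, increasingly large studies.
for n in (20, 200, 2000, 20000):
    print(n, two_sided_p(0.05, n))
```

The same negligible effect is far from significant in the small study and overwhelmingly significant in the largest one.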

Page 28: Lecture 5 Hypothesis testing. What should you know?

How many subjects do I need?

• A sample size calculation is an essential part of the design of any study.
• The number of subjects you need depends on
  • variance of the data
  • the design of the study
  • the clinically meaningful effect that you want to be able to detect
• MCID (minimal clinically important difference): the smallest change that a patient (or other subject) would view as personally important.

Page 29: Lecture 5 Hypothesis testing. What should you know?

Calculations

• Simple cases can be solved analytically
• More complex cases are resolved through simulation
• Avoid online power calculators
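One of the simple analytic cases is the two-sample comparison of means. A rough sketch, assuming equal group sizes, a common standard deviation and the normal approximation $n = 2\,(z_{1-\alpha/2} + z_{\text{power}})^2 \sigma^2 / \delta^2$ per group; a t-based calculation gives a slightly larger answer. The function name and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect a mean difference `delta` (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)   # quantile for the significance level
    z_power = z.inv_cdf(power)               # quantile for the desired power
    return math.ceil(2.0 * ((z_alpha + z_power) * sd / delta) ** 2)

# Hypothetical inputs: MCID of 5 units, standard deviation of 10 units.
print(n_per_group(delta=5.0, sd=10.0))
# Halving the difference to detect roughly quadruples the sample size.
print(n_per_group(delta=2.5, sd=10.0))
```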

Page 30: Lecture 5 Hypothesis testing. What should you know?

NHST: A history of abuse

Page 31: Lecture 5 Hypothesis testing. What should you know?

Abuses of NHST

• Fishing expeditions (NHST used as an exploratory technique)
• Many measurements of interest (leads to multiple testing)
• Measurements with high degree of variability, uncertain distributions (normality assumption violated, so p-values not accurate)
• Convenience samples (violates assumptions of randomness, independence)
• Cult-like adherence to $\alpha = 0.05$
• In the presence of electronic computers, very large databases are available for analysis; everything is significant
• Alternatively, underpowered studies; nothing is significant
• Relying on the statistician to come up with the research question (no clear hypothesis)
• RESULT: We are a long way from the scientific method

Page 32: Lecture 5 Hypothesis testing. What should you know?

Possible solutions

• Quote estimate and confidence interval and/or
• Quote an effect size.
• Never quote only the decision (reject/accept); quote the p-value

Page 33: Lecture 5 Hypothesis testing. What should you know?

What is an effect size?

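• A measure of the effect (difference between groups) that does not depend on the sample size.
• Cohen’s d: $d = (\bar{x}_1 - \bar{x}_2)/s_p$, the difference between the group means divided by the pooled standard deviation.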

• Alternate suggested effect size:

This statistic falls between 0 and 1. There are rules of thumb for what constitute large, medium and small effect.
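Since SPSS will not report it for the two-sample comparison, Cohen's d can be computed by hand. A minimal sketch, assuming the common definition of d as the mean difference divided by the pooled standard deviation (the slide's own formula did not survive, so this definition is an assumption); the data are made up.

```python
import math
from statistics import mean, stdev

def cohens_d(x, y):
    """Mean difference in units of the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    sp = math.sqrt(((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2)
                   / (n1 + n2 - 2))
    return (mean(x) - mean(y)) / sp

# Hypothetical measurements for two groups.
group_a = [5.1, 4.8, 6.0, 5.5, 5.2]
group_b = [4.2, 4.9, 4.4, 4.0, 4.6]
print(round(cohens_d(group_a, group_b), 3))
```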

Page 34: Lecture 5 Hypothesis testing. What should you know?

SPSS alert

• SPSS does not give you a Cohen’s d or the other effect size for the two-sample comparison.
• It does give the mean difference and the confidence interval.

Page 35: Lecture 5 Hypothesis testing. What should you know?

Problems with the effect size

• The effect size is sometimes taken to represent some sort of absolute measure of meaningfulness
• Measures of meaningfulness need to come from the subject matter
• Quote the p-value, not the decisions (SPSS does this)

Page 36: Lecture 5 Hypothesis testing. What should you know?

Advantages of the p-value

• The p-value measures the strength of the evidence that you have against the null hypothesis.
• The p-value is a pure number (no unit of measurement)
• A common standard across all experiments using that methodology
• Sometimes we need to make a decision: do we introduce the new treatment or not? Hypothesis testing gives an objective criterion.

Page 37: Lecture 5 Hypothesis testing. What should you know?

Ideal conditions for NHST

• Carefully designed experiments
• Everything randomized that should be randomized
• One outcome of interest
• No more subjects than necessary to achieve good power
• Structure of measurements known to be normal (or whatever distribution is assumed by the test)

Page 38: Lecture 5 Hypothesis testing. What should you know?

vocabulary

The following are equivalent
• The significance level $\alpha$
• The probability of a type 1 error
The following are related
• The probability of a type 2 error $\beta$
• The power of the test, $1 - \beta$
Difference between $\alpha$ and $\beta$:
• $\alpha$ is set by the experimenter
• $\beta$ is a consequence of the design.

Page 39: Lecture 5 Hypothesis testing. What should you know?

Pop quiz

• What is the difference between the significance level of a test and the p-value of that test?

Page 40: Lecture 5 Hypothesis testing. What should you know?

Answer

• The significance level (0.05, say) determines whether the null hypothesis is rejected or not.
• The p-value (or observed significance level) measures the degree of evidence against the null hypothesis in the data.

Page 41: Lecture 5 Hypothesis testing. What should you know?
Page 42: Lecture 5 Hypothesis testing. What should you know?

The two-sample test: Assumptions

Page 43: Lecture 5 Hypothesis testing. What should you know?

Assumptions

1. The sample means are normally distributed (or almost)
2. Variances are equal
3. Everything is independent

Page 44: Lecture 5 Hypothesis testing. What should you know?

Normality

• The t-test is usually robust with respect to this condition.
• If the sample is large enough, this condition will hold.
• As a reality check, a bootstrap test or a non-parametric test is possible.

Page 45: Lecture 5 Hypothesis testing. What should you know?

Bootstrap two-sample test

• This is a resampling test.
• The computer repeatedly permutes group membership labels amongst the cases and calculates the T-statistic with the new groups.
• If the null hypothesis is true, group membership is irrelevant.
• What proportion of the bootstrapped T statistics are more extreme than the “real” one?
• This proportion is the p-value of the test.
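The resampling procedure described above can be sketched directly. Because it shuffles group labels rather than drawing with replacement, it is what is usually called a permutation test. This is an illustrative version, assuming the pooled-variance t statistic and a two-sided comparison; the function names, the number of resamples and the data are all hypothetical.

```python
import math
import random
from statistics import mean, variance

def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

def resampling_p_value(x, y, n_resamples=2000, seed=0):
    """Proportion of label-shuffled t statistics at least as extreme as the real one."""
    rng = random.Random(seed)
    observed = abs(t_statistic(x, y))
    combined = list(x) + list(y)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(combined)                  # reassign group labels at random
        t = t_statistic(combined[:len(x)], combined[len(x):])
        if abs(t) >= observed:
            extreme += 1
    return extreme / n_resamples

# Hypothetical data: two clearly separated groups.
group_1 = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
group_2 = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]
print(resampling_p_value(group_1, group_2))
```

No normality assumption enters the p-value here; the only appeal is to the irrelevance of group labels under the null.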

Page 46: Lecture 5 Hypothesis testing. What should you know?

Transforming the data

• Sometimes a transformation of the data will produce something more normal-like
  • Take logs
  • Take square roots
  • Other transformations are possible
• My experience: this rarely works, but sometimes it does.

Page 47: Lecture 5 Hypothesis testing. What should you know?

Example: the cloud seeding data

• Load clouds.csv into SPSS
• Do a t-test of seeded vs unseeded data
• Transform with logarithms
• Repeat
• Notice that there is a significant difference when the data have been transformed.
• Question: Does it matter whether you use natural or base 10 logarithms?
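The question about the base of the logarithm can be explored numerically. Base 10 logs are natural logs divided by a constant, so the means and standard deviations rescale by the same factor and the t statistic should not move. The sketch below checks this on made-up positive values (not the contents of clouds.csv), using a pooled-variance t statistic.

```python
import math
from statistics import mean, variance

def t_statistic(x, y):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

# Hypothetical right-skewed measurements.
seeded = [5.0, 20.0, 45.0, 90.0, 150.0, 300.0, 700.0]
unseeded = [2.0, 8.0, 15.0, 30.0, 60.0, 120.0, 250.0]

t_ln = t_statistic([math.log(v) for v in seeded],
                   [math.log(v) for v in unseeded])
t_log10 = t_statistic([math.log10(v) for v in seeded],
                      [math.log10(v) for v in unseeded])
print(t_ln, t_log10)
```

The estimated difference and its confidence interval do change scale with the base, so the base still matters when reporting them.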

Page 48: Lecture 5 Hypothesis testing. What should you know?

Check for normality

• Quantile plot on each of the two samples (SPSS does not do this easily)
• Boxplot (at least gives an idea of symmetry)
• Check the residuals (SPSS does not do this easily)

Page 49: Lecture 5 Hypothesis testing. What should you know?

Heteroscedasticity: When the variances are unequal

Page 50: Lecture 5 Hypothesis testing. What should you know?

Unequal variances

• The two-sample t test assumes both samples have the same variance (resp. standard deviation)
• Violation of this assumption can be bad, especially when the sample sizes are unequal.

Page 51: Lecture 5 Hypothesis testing. What should you know?

Welch version of the t-test

• Supplied automatically in SPSS under “unequal variances assumption”
• Replaces the pooled estimate with a different estimate of the standard deviation.
• Reduces the degrees of freedom
• This inflates the quantile of the t distribution.
• Makes a more conservative test
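The two changes Welch makes can be sketched as follows: each group contributes its own squared standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation. This is an illustration with hypothetical data and a made-up function name, not a reproduction of the SPSS output.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic and its approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2   # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with unequal sizes and very different spreads.
tight = [10.2, 11.5, 9.8, 10.9, 11.1]
wide = [8.0, 14.5, 6.2, 12.9, 9.4, 15.8, 7.1, 13.3]
t, df = welch_t(tight, wide)
print(round(t, 3), round(df, 2), "pooled df would be", len(tight) + len(wide) - 2)
```

The Welch degrees of freedom never exceed the pooled value of n1 + n2 - 2, which is why the resulting test is the more conservative one.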

Page 52: Lecture 5 Hypothesis testing. What should you know?

Test for equality of variances

• Levene test
• SPSS does this automatically
• The statistic is always positive
• Large values suggest different variances
• SPSS calculates a p-value and supplies this.

Page 53: Lecture 5 Hypothesis testing. What should you know?

Example

• Load SomeData2.sav
• Run the test
• Check Levene
• Do a boxplot
• Compare to SomeData1.sav
• A boxplot is a good way to determine graphically if the samples have different variances.

Page 54: Lecture 5 Hypothesis testing. What should you know?

Lack of independence: A show stopper

Page 55: Lecture 5 Hypothesis testing. What should you know?

Example

• Load invisibility.sav
• Load invisibility RM.sav
• These files have the same numbers in the same groups
• invisibility.sav assumes two independent groups
• invisibility RM.sav assumes a repeated measures design

Page 56: Lecture 5 Hypothesis testing. What should you know?

Does a cloak of invisibility increase mischief?

1. Select a group of students. Randomly assign a cloak of invisibility to half the members. Record number of mischievous actions within the time frame of the study.

2. Select a group of students. Give each person a cloak of invisibility. Record the number of mischievous actions within the time frame. Remove the cloaks. Record number of mischievous actions within the time frame. Compare pre vs post.

Page 57: Lecture 5 Hypothesis testing. What should you know?

Matched pairs design

• Select n pairs. Members of each pair should be similar in some way.
• Randomly assign the treatment to one member of each pair; the other gets the control.
• Record what happens.
• Compare the difference.
• The “pair” might be the same individual at different time points (pre-post analysis).

Page 58: Lecture 5 Hypothesis testing. What should you know?

The analysis

• Calculate the differences (pre-post, or post-pre)
• Do a one sample t-test on the differences (use SPSS)
• Null hypothesis: the true mean difference is 0.
• Alternate hypothesis: the true mean difference is not 0.
• Test statistic $t = \bar{d} / (s_d/\sqrt{n})$ has a t distribution with n-1 degrees of freedom under the null. n is the number of pairs.
• $\bar{d}$ is the difference between the means (equivalently, the mean of the differences)
• $s_d$ is the standard deviation of the differences
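The paired analysis reduces to a one-sample calculation on the differences, $t = \bar{d}/(s_d/\sqrt{n})$ with n - 1 degrees of freedom. A minimal sketch with a hypothetical function name and made-up pre/post counts (not the contents of invisibility RM.sav):

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """One-sample t statistic on the post-minus-pre differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)                                  # number of pairs
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical counts for four subjects measured twice.
pre = [1, 2, 3, 4]
post = [2, 4, 5, 8]
t, df = paired_t(pre, post)
print(round(t, 3), df)
```

Only the differences enter the calculation, so subject-to-subject variation that affects both measurements equally drops out.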

Page 59: Lecture 5 Hypothesis testing. What should you know?

Example

• Look at the output in SPSS
• Can you see where everything comes from?
• Is there a Levene test for the paired t-test?
• What assumptions do you think need to hold for the paired t-test?
• Create a variable for the pre-post differences. Test for normality.

Page 60: Lecture 5 Hypothesis testing. What should you know?

Assumptions

• We have matched intelligently (i.e. pairs share common features)
• The differences are normal with common mean and variance
• The differences are independent

Page 61: Lecture 5 Hypothesis testing. What should you know?

The one-sided test: Seldom seen in medical research papers

Page 62: Lecture 5 Hypothesis testing. What should you know?

Independent two sample test

• Different alternate hypothesis: $H_1: \mu_2 > \mu_1$
• Or, we might consider: $H_1: \mu_2 < \mu_1$

Page 63: Lecture 5 Hypothesis testing. What should you know?

Implementation

• Not done explicitly in SPSS
• Perform the standard two-sample test
• Halve the p-value of what you get (valid when the observed difference lies in the direction of the alternate hypothesis)
• Use this value to make a decision (accept vs reject)

Page 64: Lecture 5 Hypothesis testing. What should you know?

What is going on?

• The rejection region favours values that “trend” towards the alternate hypothesis
• Suppose the alternate hypothesis is that the group 2 mean is greater than the group 1 mean, with 3 observations in each group. We observe a test statistic of 1.5.

Page 65: Lecture 5 Hypothesis testing. What should you know?

Compare with the two-sided hypothesis
• Same situation, but the alternate is that the group means differ.
• Observe the same test statistic value.
• p-value is twice the size, because we look at both tails.
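The one-tail versus two-tail comparison can be checked numerically. The sketch below integrates the t density with Simpson's rule using only the standard library. It assumes the observed value 1.5 is the t statistic and that 3 observations per group give 3 + 3 - 2 = 4 degrees of freedom for the pooled test; the function names are made up.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0

one_sided = upper_tail(1.5, df=4)   # one tail only
two_sided = 2.0 * one_sided         # both tails
print(round(one_sided, 3), round(two_sided, 3))
```

This gives about 0.10 one-sided and 0.21 two-sided, so neither version rejects at the 0.05 level in this small example.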

Page 66: Lecture 5 Hypothesis testing. What should you know?

What happens in practice

• Editors of medical journals hate one-sided hypothesis tests.
• However, they (the tests, not the editors) can be legitimate.
• If you only care about one possibility (say that the new treatment gives a bigger response than the old treatment), the one-sided test has greater power.
• You are more likely to reject the null when the null is false. This is good.

Page 67: Lecture 5 Hypothesis testing. What should you know?

Comparing 3 or more means: ANOVA (Analysis of Variance)

Page 68: Lecture 5 Hypothesis testing. What should you know?

Run example

• Open SPSS file Viagra.sav
• Note the structure of the file
  • One variable denotes group membership
  • One variable denotes the response
• Run Analyze -> Compare means -> One-way ANOVA

Page 69: Lecture 5 Hypothesis testing. What should you know?

Assumptions and goals

• The response is normally distributed
• The data are independent
• Each subject belongs to exactly one group
• All responses have the same variance
• We want to determine if the group means differ or not

Page 70: Lecture 5 Hypothesis testing. What should you know?

Look at the test statistic

• The test statistic (called F) is a ratio of variance estimates.
• Numerator: a variance estimate built from the group means and grand mean.
• Denominator: the pooled variance estimate
• The ratio of two independent estimates of the same variance has an F distribution.
• Under H0, the test statistic has an F distribution.
• If the group means differ, the F statistic will be large.

Page 71: Lecture 5 Hypothesis testing. What should you know?

The Formula

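• $SS_B = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$: between groups sum of squares (k groups)
• $SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$: within groups sum of squares (n observations)
• $F = \frac{SS_B/(k-1)}{SS_W/(n-k)}$
• F has an F distribution with k-1 and n-k degrees of freedom.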
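The F statistic can be assembled by hand from the two sums of squares. A small sketch, assuming the standard one-way layout (between-groups sum of squares weighted by group size, within-groups sum of squares around each group mean); the function name and the three groups are made up and are not the Viagra.sav values.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic with its numerator and denominator degrees of freedom."""
    k = len(groups)
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k

# Hypothetical responses for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f, df_num, df_den = one_way_f(groups)
print(f, df_num, df_den)
```

Here both sums of squares equal 6, so F = (6/2)/(6/6) = 3 on 2 and 6 degrees of freedom.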

Page 72: Lecture 5 Hypothesis testing. What should you know?

Degrees of freedom

• Yes, there are degrees of freedom here
• The F has 2 numbers associated with it:
  • the DF of the numerator = number of groups – 1
  • the DF of the denominator = n – number of groups

Page 73: Lecture 5 Hypothesis testing. What should you know?

Procedure

• Set up the data file (one variable for groups, one for responses)
• Look at the boxplot
• Run the analysis
• Note the p-value
• Note the value of R-squared
• Look at the residuals with a quantile plot to check for normality
• If significant, look at post-hoc tests to determine which means differ from the rest.
• Alternatively, can plan “contrasts” and test these when the data are balanced.