Comparing Classical and Bayesian Approaches to Hypothesis Testing. James O. Berger, Institute of Statistics and Decision Sciences, Duke University, www.stat.duke.edu

Jan 15, 2016

Page 1: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Comparing Classical and Bayesian Approaches to Hypothesis Testing

James O. Berger
Institute of Statistics and Decision Sciences
Duke University
www.stat.duke.edu

Page 2: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Outline

• The apparent overuse of hypothesis testing

• When is point null testing needed?

• The misleading nature of P-values

• Bayesian and conditional frequentist testing of plausible hypotheses

• Advantages of Bayesian testing

• Conclusions

Page 3: Comparing Classical and Bayesian Approaches to Hypothesis Testing

I. The apparent overuse of hypothesis testing

• Tests are often performed when they are irrelevant.

• Rejection by an irrelevant test is sometimes viewed as “license” to forget statistics in further analysis.

Page 4: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Prototypical example

Habitat Type   Rank   Observed Usage
A              1      3.8
B              2      3.6
C              3      2.8
D              4      1.8
E              5      1.5
F              6      0.7

Hypothesis: H0: "mean usage is equal for all habitats". Rejected (P<.025).

Page 5: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Statistical mistakes in the example

• The hypothesis is not plausible; testing serves no purpose.

• The observed usage levels are given without confidence sets.

• The rankings are based only on observed means, and are given without uncertainties. (For instance, perhaps Pr (A>B)=0.6 only.)
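The third point can be made concrete with a minimal sketch. It assumes the uncertainty about each habitat's mean usage is independent and Gaussian; the observed usages 3.8 and 3.6 come from the example table, while the standard error of 0.56 and the helper name `prob_greater` are hypothetical, chosen only for illustration. Under those assumptions the probability that A truly ranks above B comes out near 0.6, far from certain.

```python
from math import sqrt
from statistics import NormalDist


def prob_greater(mean_a, se_a, mean_b, se_b):
    """Pr(A > B) when the two means carry independent Gaussian uncertainty."""
    diff_sd = sqrt(se_a**2 + se_b**2)  # standard deviation of the difference A - B
    return NormalDist().cdf((mean_a - mean_b) / diff_sd)


# Observed usages for habitats A and B are taken from the example table.
# The standard error 0.56 is hypothetical (not given in the source).
p_ab = prob_greater(3.8, 0.56, 3.6, 0.56)
print(f"Pr(A > B) = {p_ab:.2f}")
```

Reporting such a probability (or a confidence set) next to each rank shows which orderings the data actually support.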

Page 9: Comparing Classical and Bayesian Approaches to Hypothesis Testing

II. When is testing of a point null hypothesis needed?

Answer: When the hypothesis is plausible, to some degree.

Note that, while $H_0: \theta = \theta_0$ is typically not plausible, it is a good approximation to $H_0: |\theta - \theta_0| < \epsilon$ as long as $\epsilon < \sigma/(4\sqrt{n})$ (assuming Gaussian observations with standard deviation $\sigma$).

Page 10: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Examples of hypotheses that are not realistically plausible

• H0: small mammals are as abundant on livestock grazing land as on non-grazing land

• H0: survival rates of brood mates are independent

• H0: bird abundance does not depend on the type of forest habitat they occupy

• H0: cottontail choice of habitat does not depend on the season

Page 11: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Examples of hypotheses that may be plausible, to at least some degree:

• H0: Males and females of a species are the same in terms of characteristic A.

• H0: Proximity to logging roads does not affect ground nest predation.

• H0: Pollutant A does not affect Species B.

Page 12: Comparing Classical and Bayesian Approaches to Hypothesis Testing

III. For plausible hypotheses, P-values are misleading as measures of evidence

Example: Experimental drugs D1, D2, D3, . . . are to be tested.

Each Test: H0: Di has negligible effect
           H1: Di is effective

Typical Bayesian Answer: The probability that H0 is true is 0.06.

Classical Answer (P-value): If H0 were true, the probability of observing hypothetical data as or more "extreme" than the actual data is 0.06.

Page 13: Comparing Classical and Bayesian Approaches to Hypothesis Testing

DRUG      D1     D2     D3     D4     D5     D6
P-VALUE   0.41   0.04   0.32   0.94   0.01   0.28

DRUG      D7     D8     D9     D10    D11    D12
P-VALUE   0.11   0.05   0.65   0.009  0.09   0.66

Question: How strongly do we believe that Drug i has a nonnegligible effect when
(i) the P-value is approximately 0.05?
(ii) the P-value is approximately 0.01?

Page 14: Comparing Classical and Bayesian Approaches to Hypothesis Testing

A Surprising Fact: Suppose it is known that, a priori, about 50% of the Drugs will have negligible effect. Then,

(i) of the Drugs for which the P-value is approximately 0.05, at least 25% (and typically over 50%) will have negligible effect;

(ii) of the Drugs for which the P-value is approximately 0.01, at least 7% (and typically over 15%) will have negligible effect.
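A rough numerical check of these lower bounds can be sketched as follows. The setup is an assumption, not the calculation behind the slide: each drug yields a Gaussian test statistic z that is N(0, 1) when the effect is negligible and N(0, scale^2) otherwise (effects drawn from a zero-centered normal distribution), and half of the drugs are null. The function name `null_fraction` and the grid of scales are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def null_fraction(p_value, scale, prior_null=0.5):
    """Fraction of true nulls among tests whose two-sided P-value is near p_value.

    Assumes z ~ N(0, 1) under H0 and z ~ N(0, scale**2) under H1.
    """
    z = STD_NORMAL.inv_cdf(1 - p_value / 2)  # |z| that produces this P-value
    dens_null = STD_NORMAL.pdf(z)            # density of z under H0
    dens_alt = NormalDist(0, scale).pdf(z)   # density of z under H1
    weighted_null = prior_null * dens_null
    return weighted_null / (weighted_null + (1 - prior_null) * dens_alt)


# Hypothetical spread of effect sizes, from modest to very diffuse.
for scale in (1.5, 2.0, 3.0, 5.0, 10.0):
    at_05 = null_fraction(0.05, scale)
    at_01 = null_fraction(0.01, scale)
    print(f"scale={scale:>4}: P~0.05 -> {at_05:.2f} null, P~0.01 -> {at_01:.2f} null")
```

Under this assumed model, the fraction of negligible-effect drugs stays above 25% for P-values near 0.05 and above 7% for P-values near 0.01 at every scale tried, in line with the stated bounds.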

Page 18: Comparing Classical and Bayesian Approaches to Hypothesis Testing

IV. Bayesian testing of point hypotheses

Data and Model: $X$ has density $f(x \mid \theta)$.

Example: $X$ = # of eggs hatched out of $n$ eggs in a recently polluted area (so $f$ is binomial, and $\theta$ is the true proportion that would hatch).

To Test: $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$.

Example: $\theta_0$ is the historically known proportion of eggs that hatch in the area.

Page 19: Comparing Classical and Bayesian Approaches to Hypothesis Testing

The prior distribution

Let $P_0$ and $P_1$ be the prior probabilities of $H_0$ and $H_1$. (The usual default choice is $P_0 = P_1 = 0.5$.)

Under $H_1$, let $\pi(\theta)$ be the density representing information concerning the location of $\theta$. (The usual default choice for the binomial problem is $\pi(\theta) = 1$.)

Note: There are two schools of Bayesian statistics, the subjective school, where the prior distribution reflects real extraneous information, and the objective school, where the prior is chosen in a default fashion.

Page 20: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Posterior probability that H0 is true, given the data (from Bayes theorem):

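$$\Pr(H_0 \mid \text{data } x) = \frac{P_0\, f(x \mid \theta_0)}{P_0\, f(x \mid \theta_0) + P_1 \int f(x \mid \theta)\, \pi(\theta)\, d\theta}$$

$$= \left\{ 1 + \frac{\mathrm{Beta}(x+1,\; n-x+1)}{\theta_0^{\,x}\, (1-\theta_0)^{\,n-x}} \right\}^{-1}$$

(for the binomial testing problem with the default choices, where Beta denotes the beta function)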

Page 21: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Note: Some prefer to use the Bayes Factor (or weighted likelihood ratio) of $H_0$ to $H_1$,

$$B = \frac{f(x \mid \theta_0)}{\int f(x \mid \theta)\, \pi(\theta)\, d\theta} = \frac{\text{"likelihood of data under } H_0\text{"}}{\text{"average likelihood of data under } H_1\text{"}},$$

since this does not involve prior probabilities of the $H_i$.

Example: Suppose $x = 40$ eggs hatched out of $n = 100$. Then $\Pr(H_0 \mid \text{data } x) = 0.52$ and $B = 0.92$. (Here a classical test would yield P-value = 0.05.)
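The binomial calculation can be sketched in a few lines. The counts x = 40 and n = 100 are from the example; the null value theta0 = 0.5 is an assumption (the transcript does not state it), as are the default choices P0 = P1 = 0.5 and a uniform prior under H1. The function name `binomial_point_null` is hypothetical.

```python
from math import comb


def binomial_point_null(x, n, theta0, prior_h0=0.5):
    """Bayes factor (H0 to H1) and Pr(H0 | x) for H0: theta = theta0.

    Uses a uniform prior pi(theta) = 1 on (0, 1) under H1.
    """
    lik_h0 = comb(n, x) * theta0**x * (1 - theta0) ** (n - x)
    # With pi(theta) = 1, the binomial likelihood averaged over theta is 1 / (n + 1).
    avg_lik_h1 = 1 / (n + 1)
    bayes_factor = lik_h0 / avg_lik_h1
    prior_odds = prior_h0 / (1 - prior_h0)
    posterior_h0 = 1 / (1 + 1 / (prior_odds * bayes_factor))
    return bayes_factor, posterior_h0


# x = 40 and n = 100 come from the example; theta0 = 0.5 is an assumption.
bf, post = binomial_point_null(40, 100, 0.5)
print(f"B = {bf:.2f}, Pr(H0 | x) = {post:.2f}")
```

With the assumed theta0 = 0.5 the posterior probability comes out at about 0.52, matching the example. The Bayes factor of H0 to H1 computed this way is about 1.1, and its reciprocal (about 0.91) is close to the quoted 0.92, so the quoted value may be oriented as H1 to H0 or rest on a different theta0.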

Page 22: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Conditional frequentist interpretation of the posterior probability of H0

$\Pr(H_0 \mid \text{data } x)$ is also the frequentist type I error probability, conditional on observing data of the same "strength of evidence" as the actual data $x$.

(The classical type I error probability makes the mistake of reporting the error averaged over data of very different strengths.)

Page 23: Comparing Classical and Bayesian Approaches to Hypothesis Testing

V. Advantages of Bayesian testing

• Pr (H0 | data x) reflects real expected error rates: P-values do not.

• A default formula exists for all situations:

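$$\Pr(H_0 \mid \text{data } x) = \left[ 1 + \frac{1}{f(x \mid \theta_0)} \int\!\!\int \frac{f(x \mid \theta)\, f(x^* \mid \theta)\, f(x^* \mid \theta_0)}{\int f(x^* \mid \theta')\, d\theta'}\; dx^*\, d\theta \right]^{-1},$$

where $x^*$ is independent (unobserved) data of the smallest size such that the above integrals exist.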

Page 24: Comparing Classical and Bayesian Approaches to Hypothesis Testing

• Posterior probabilities allow for incorporation of personal opinion, if desired. Indeed, if the published default posterior probability of $H_0$ is $P^*$, and your prior probability of $H_0$ is $P_0$, then your posterior probability of $H_0$ is

$$\Pr(H_0 \mid \text{data } x) = \left[ 1 + \left( \frac{1}{P_0} - 1 \right) \left( \frac{1}{P^*} - 1 \right) \right]^{-1}$$

Example: In the binomial example, recall $P^* = 0.52$.

A "skeptic" has $P_0 = 0.1$; hence $\Pr(H_0 \mid \text{data } x) = 0.11$.

A "believer" has $P_0 = 0.9$; hence $\Pr(H_0 \mid \text{data } x) = 0.91$.
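The conversion from a published default posterior to a personal one is a single line of arithmetic. The sketch below assumes the default posterior was computed with prior probability 1/2 on H0, as in the binomial example; the function name `personal_posterior` is hypothetical.

```python
def personal_posterior(default_posterior, personal_prior):
    """Turn a default posterior for H0 (computed with prior 1/2) into a personal posterior."""
    prior_odds_against = 1 / personal_prior - 1    # your prior odds of H1 to H0
    data_odds_against = 1 / default_posterior - 1  # odds of H1 to H0 implied by the default analysis
    return 1 / (1 + prior_odds_against * data_odds_against)


p_star = 0.52  # default posterior probability of H0 from the binomial example
for label, p0 in (("skeptic", 0.1), ("believer", 0.9)):
    print(f"{label}: Pr(H0 | data x) = {personal_posterior(p_star, p0):.2f}")
```

This reproduces the skeptic's 0.11 and the believer's 0.91. A reader whose prior is 1/2 recovers the published value unchanged.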

Page 25: Comparing Classical and Bayesian Approaches to Hypothesis Testing

• Posterior probabilities are not affected by the reason for stopping experimentation, and hence do not require rigid experimental designs (as do classical testing measures).

• Posterior probabilities can be used for multiple models or hypotheses.

Example:
H0: pollutant A has no effect on species B
H1: pollutant A decreases abundance of species B
H2: pollutant A increases abundance of species B

Pr(H0 | data) = .30, Pr(H1 | data) = .68, Pr(H2 | data) = .02

Page 26: Comparing Classical and Bayesian Approaches to Hypothesis Testing

An aside: integrating science and statistics via the Bayesian paradigm

• Any scientific question can be asked (e.g., What is the probability that switching to management plan A will increase species abundance by 20% more than will plan B?)

• Models can be built that simultaneously incorporate known science and statistics.

• If desired, expert opinion can be built into the analysis.

Page 27: Comparing Classical and Bayesian Approaches to Hypothesis Testing

Conclusions

• Hypothesis testing is overutilized while (Bayesian) statistics is underutilized.

• Hypothesis testing is needed only when testing a “plausible” hypothesis (and this may be a rare occurrence in wildlife studies).

• The Bayesian approach to hypothesis testing has considerable advantages in terms of interpretability (actual error rates), general applicability, and flexible experimentation.