Top Banner
PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894-August 5, 1981 Egon Pearson 11 August 1895 -12 June 1980
32

PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Dec 20, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

PHILOSOPHY OF SCIENCE: Neyman-Pearson approach

Zoltán Dienes, Philosophy of Psychology

Jerzy Neyman

April 16, 1894-August

5, 1981

Egon Pearson

11 August 1895 -12 June 1980

Page 2: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

'The statistician cannot excuse himself from the duty of getting his

head clear on the principles of scientific inference, but equally no

other thinking person can avoid a like obligation'

Fisher 1951

Page 3: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Prior to 1930s:

There were many statistical procedures

But no coherent account of what they achieved or of how to choose the right test.

Neyman and Pearson put the field of statistics on a firm logical footing

It is now orthodoxy

(but note: there are passionate attacks on just how firm their logical footing is!)

Page 4: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

What is probability?

Relative frequency interpretation

Need to specify a collective of elements – like throws of a dice.

In the long run – as number of observations goes to infinity – the proportion of throws of a dice showing a 3 is 1/6

The probability of a ‘3’ is 1/6 because that is the long run frequency of ‘3’s relative to all throws

Page 5: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

One cannot talk about the probability of a hypothesis e.g.

“this cancer drug is more effective than placebo” being true

“genes are coded by DNA” is not true 2/3 of the time in the long run – it is just true. There is no relevant long run.

A hypothesis is just true or false.

When we say what the probability of a hypothesis is, we are referring to a subjective probability

Page 6: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Neyman-Pearson (defined the philosophy underlying standard statistics):

Probabilities are strictly long-run relative frequencies – not subjective!

Statistics do not tell us the probability of your theory or the null hypothesis being true.

So what relative frequencies are we talking about?

Page 7: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

If D = some data and H = a hypothesis

For example, H = this drug is just a placebo cure for depression

D = 17 out of 20 people felt happier on the drug than the placebo

One can talk about p(D|H)

The probability of the data given the hypothesis

e.g. p(‘17 or more people happier out of 20’|’drug is a placebo’)

Page 8: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

A collective or reference class we can use:

the elements are

‘measuring for each of 20 people whether they are happier on the drug day or the placebo day’ given the drug operates just as a placebo.

Consider a hypothetical collective of an infinite number of such experiments.

That is a meaningful probability we can calculate.

Page 9: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

One can NOT talk about p(H|D)

The probability of our hypothesis given the data

e.g. p(‘my drug is a placebo’| ‘obtaining 17 out of 20 people happier’)

What is the reference class??

The hypothesis is simply true or false.

Page 10: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

P(H|D) is the inverse of the conditional probability p(D|H)

Inverting conditional probabilities makes a big difference

e.g.

P(‘dying within two years’|’head bitten off by shark’) = 1

P(‘head was bitten off by shark’|’died in the last two years’) ~ 0

P(A|B) can have a very different value from P(B|A)

Page 11: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Statistics cannot tell us how much to believe a certain hypothesis.

All we can do is set up decision rules for certain behaviours

– accepting or rejecting hypotheses –

such that in following those rules in the long run we will not often be wrong.

E.g. Decision procedure:

Run 40 subjects and reject null hypothesis if t-value larger than a critical value

Page 12: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Our procedure tells us our long term error rates BUT it does not tell us which particular hypotheses are true or false or assign any of the hypotheses a probability.

All we know is our long run error rates.

Page 13: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Need to control both types of error:

α = p(rejecting Ho|Ho) β = p(accepting Ho|Ho false)

State of World:

Decision: Ho true Ho false

Accept Ho Type II error

Reject Ho Type I error

Page 14: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Consider a year in which of the null hypotheses we test, 4000 are actually true and 1000 actually false.  

State of World

___________________________

Decision H0 true H0 false

___________________________________________________

Accept H0 3800 500

Reject H0 200 500

___________________________

4000 1000

 

α = ? β = ?

Page 15: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Need to control both types of error:

α = p(rejecting Ho/Ho) β = p(accepting Ho/Ho false)

power:

P(‘getting t as extreme or more extreme than critical’/Ho false)

Probability of detecting an effect given an effect really exists in the population. ( = 1 – β)

State of World:

Decision: Ho true Ho false

Accept Ho Type II error

Reject Ho Type I error

Page 16: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Decide on allowable α and β BEFORE you run the experiment.

e.g. set α = .05 as per normal convention

Ideally also set β = .05.

α is just the significance level you will be testing at.

But how to control β?

Page 17: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Controlling β:

Need to

1) Estimate the size of effect you think is plausible or interesting given your theory is true

2) Power tables or online programs tell you how many subjects you need to run to keep β to .05 (equivalently, to keep power at 0.95)

Page 18: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Most studies do not do this!

But they should. Strict application of the Neyman-Pearson logic means setting the risks of both Type I and Type II errors in advance (α and β).

Most researchers are extremely worried about Type I errors (false positives) i.e. whether p < .05

but allow Type II errors (false negatives) to go uncontrolled.

Leads to inappropriate judgments about what results mean and what research should be done next.

Read handout for details!

Page 19: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

You read a review of studies looking at whether meditation reduces depression. 100 studies have been run and 50 are significant in the right direction and the remainder are non-significant. What should you conclude?

Page 20: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

You read a review of studies looking at whether meditation reduces depression. 100 studies have been run and 50 are significant in the right direction and the remainder are non-significant. What should you conclude?

If the null hypothesis were true, how many would be significant?

How many significant in the right direction?

Page 21: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

"The continued very extensive use of significance tests is alarming." (Cox 1986)

"After four decades of severe criticism, the ritual of null hypothesis significance testing---mechanical dichotomous decisions around a sacred .05 criterion---still persist. “

“[significance testing] does not tell us what we want to know, and .. out of desperation, we nevertheless believe that it does!"

(Cohen 1994)

Page 22: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

“statistical significance testing retards the growth of scientific knowledge; it never makes a positive contribution”

(Schmidt & Hunter, 1997, p. 37).

“The almost universal reliance on merely refuting the null hypothesis is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology”

(Meehl, 1978, p. 817).

Page 23: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

A lot of criticism arises because most researchers do not follow the Neyman and Pearson demands in a sensible way

e.g. habitually ignoring power

BUT

The (orthodox) logic of Neyman and Pearson is also controversial

Page 24: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

To summarise:

You are allowed to draw a back and white conclusion when the decision procedure has known low error rates

Anything that affects the error rates of your decision procedure affects what decisions you can draw

Page 25: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

In general: The more opportunities you give yourself to make an error the higher the probability of an error becomes. So you must correct for this.

E.g.

Multiple tests: If you perform two t-tests the overall probability of an error is increased

Page 26: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Multiple tests:

Testing the elderly vs the middle aged

AND the middle aged vs the young

That’s two t-tests so for the overall Type I rate to be controlled at .05 could conduct each test at the .025 level.

If one test is .04, would reject the null if just doing that one test but accept the null if doing two tests.

Page 27: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

Cannot test your data once at .05 level

Then run some more subjects

And test again at .05 level

Type I error rate is no longer .05 because you gave yourself two chances at declaring significance.

Each test must be conducted at a lower p-value for the overall error rate to be kept at .05.

Does that make sense?

Should our inferences depend on what else we might have done or just on what the data actually is?

Page 28: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.
Page 29: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.
Page 30: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

If when they stopped collecting data depends on who has the better kung fu

Then the mathematically correct result depends on whose kung fu is better!

Page 31: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

The mathematically correct answer depends on whose unconscious wish to please the other is strongest!!

Page 32: PHILOSOPHY OF SCIENCE: Neyman-Pearson approach Zoltán Dienes, Philosophy of Psychology Jerzy Neyman April 16, 1894- August 5, 1981 Egon Pearson 11 August.

The Bayesian (and likelihood) approaches do not depend on

when you planned to stop running subjects,

whether you conduct other tests,

or whether the test is planned or post hoc!