Life After P-hacking (APS May 2013, Washington DC) With minor edits for posting Uri Simonsohn Penn (gave the talk) Leif Nelson UC Berkeley Joe Simmons.

Life After P-hacking (APS May 2013, Washington DC)

With minor edits for posting

Uri Simonsohn, Penn (gave the talk)

Leif Nelson, UC Berkeley

Joe Simmons, Penn also

Photo not necessary

http://opim.wharton.upenn.edu/~uws/

Definition

p-hacking: exploiting researcher degrees of freedom in pursuit of p < .05
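The definition can be made concrete with a small simulation. This is a minimal sketch, not the speakers' own analysis: it assumes one specific degree of freedom (peeking at the data at n = 20, 30, 40 and 50 per cell and stopping at the first p < .05), normally distributed data with a known SD of 1, and no true effect. The function names and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(a, b):
    """Two-sided p-value for a difference in means (z-test, known SD = 1)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(peek, sims=4000, start=20, step=10, stop=50, seed=1):
    """Share of null studies (no true effect) that reach p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(0, 1) for _ in range(stop)]
        b = [rng.gauss(0, 1) for _ in range(stop)]
        # Without peeking: one test at the planned n.
        # With peeking: a test at every look, keeping any that "works".
        looks = range(start, stop + 1, step) if peek else [stop]
        if any(two_sided_p(a[:n], b[:n]) < 0.05 for n in looks):
            hits += 1
    return hits / sims

honest = false_positive_rate(peek=False)
hacked = false_positive_rate(peek=True)
print(f"fixed n: {honest:.3f}   peeking: {hacked:.3f}")
```

Both runs use the same simulated data, so any gap between the two rates comes only from the extra looks.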

Life after p-hacking

• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?

~ Median study: n=20

• False-Positive Psych: n>20

• What can you reliably detect with n=20?

• MTurk study.
  – N=674
  – Why not published ds?

n=20 is enough for:

• Men taller than women: n = 6

• People above median age closer to retirement: n = 10

• Women, more shoes than men: n = 15

n=20 is not enough for:

• People who like spicy food are more likely to like Indian food: n = 27

• Liberals rate social equality as more important than do conservatives: n = 34

• People who like eggs report eating egg salad more often: n = 47

• Men weigh more than women: n = 47

• Smokers think smoking is less likely to kill someone than do non-smokers: n = 146


• Are you studying a bigger effect than "men weigh more than women"?

• If not, use n>50
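To put rough numbers on "what can you reliably detect with n=20?", here is a minimal sketch using the standard normal approximation for a two-cell comparison. It is not how the n values on these slides were obtained (the slides do not say); the 80% power target, the two-sided 5% test, and the function names are assumptions made for illustration. The approximation runs slightly below exact t-test calculations.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def n_per_cell(d, alpha=0.05, power=0.80):
    """Approximate n per cell needed to detect a standardized difference d."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

def detectable_d(n, alpha=0.05, power=0.80):
    """Smallest standardized difference detectable with n per cell."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return (2 * (z_alpha + z_power) ** 2 / n) ** 0.5

for n in (20, 50):
    print(f"n = {n} per cell -> smallest detectable d = {detectable_d(n):.2f}")
for d in (0.2, 0.5, 0.8):
    print(f"d = {d} -> n per cell = {n_per_cell(d)}")
```

Under these assumptions, n = 20 per cell only detects differences of close to 0.9 standard deviations, which is why only very obvious effects make the "n=20 is enough" list.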


Direct replications

[Figure: Estimate by condition (Low vs. High) for Lion's Weight, Coins, and Calories]

Estimates are way off

Subjects confused?

Big outliers

[Figure: the same three measures, marked p < .03]

[Figure: Calories only, marked p < .03]

Study 1?

• Run calories study again.
• Same exclusion rule.

Why not just conceptual replication?

• Restart p-hacking clock

• Failures do not count

Replications

• Conceptual
  – Rule out confounds
  – Rule in generalizability

• Direct
  – Rule out false-positive
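The "restart p-hacking clock" point can be illustrated with back-of-the-envelope arithmetic. This is a sketch under stated assumptions, not a result from the talk: the 30% p-hacked false-positive rate is a hypothetical number, the two studies are treated as independent, and the function name is made up for illustration.

```python
def both_significant(rate_original, rate_followup):
    """Chance that a nonexistent effect is 'significant' in both studies."""
    return rate_original * rate_followup

ALPHA = 0.05   # false-positive rate when every analysis choice is fixed in advance
HACKED = 0.30  # hypothetical false-positive rate under flexible analysis

# Conceptual replication: new design, so the flexibility is available again.
conceptual = both_significant(HACKED, HACKED)
# Direct replication: same study, same exclusion rule, nothing left to tweak.
direct = both_significant(HACKED, ALPHA)

print(f"conceptual: {conceptual:.3f}   direct: {direct:.3f}")
```

With these made-up numbers a false positive survives a p-hackable conceptual replication six times as often as a direct one, and that is before failed conceptual replications are written off as not counting.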


21 words (Google it)

How can an organic farmer compete?

How can an organic researcher compete?

• If you determined sample size in advance: say it.

• If you did not drop variables: say it.

• If you did not drop conditions: say it.

21 Word Solution (get the .pdf here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588)

Footnote 1

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.

Organic Farmer Organic Researcher




Compromise writing

• While reviewers are still in the dark ages.
• Have it both ways.
• “Clean” version in main text
  – All studies “worked” & < 2500 words
• Supplement/footnote
  – n=100 n=150
  – p=.08 w/o exclusion
  – Data and materials online
• Only reformers read small print
• Organic 21 words applies.
• Everybody likes the paper


Who to hire

If you hire based on quantity, you pass on these guys

What’s the alternative to counting papers?

• Rookies: Best 1
• Tenure: Best 3
• Full: Best 5

Try it. It is a powerful question. What’s her best paper?


What about Bayesian?

Only speak for myself here.

My prior: Bayesians will be unhappy in 3 2 1

P-hacking also invalidates Bayesian results

Let me say that again

• Bayesian proposals for Psych

1) Bayesian t-test
  • Replications use it sometimes
  • Turns out
    – α = 5%

2) Bayesian estimation
  • Latest JEP:G.
  • Turns out
    – Changes nothing

1%

t-test “vs” Bayesian Estimation changes nothing

How similar?

Results change by less than if we dropped 1 observation at random.

But!

• Isn’t data-peeking OK for Bayes?
  – Not when used for hypothesis testing

• Also:
  – Dropped subjects, measures, conditions invalidate all inference.
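The data-peeking point can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions and is not the Bayesian t-test referred to above: it uses a one-sample model with known SD = 1, a Normal(0, tau^2) prior on the effect with tau = 1, and treats "Bayes factor above 3" as the decision rule. All of these choices, and the function names, are assumptions made for illustration.

```python
import math
import random

def bf10(z, n, tau=1.0):
    """Bayes factor for H1: effect ~ Normal(0, tau^2) vs. H0: effect = 0 (SD = 1)."""
    shrink = n * tau ** 2 / (1 + n * tau ** 2)
    return math.exp(0.5 * z * z * shrink) / math.sqrt(1 + n * tau ** 2)

def false_alarm_rate(peek, sims=3000, n_min=10, n_max=100, threshold=3.0, seed=2):
    """Share of null studies in which the Bayes factor ever exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        total, crossed = 0.0, False
        for n in range(1, n_max + 1):
            total += rng.gauss(0, 1)  # one more observation; the true effect is 0
            # Peeking checks at every n from n_min on; otherwise only at n_max.
            check = (n >= n_min) if peek else (n == n_max)
            if check and bf10(total / math.sqrt(n), n, ) > threshold:
                crossed = True
        hits += crossed
    return hits / sims

fixed = false_alarm_rate(peek=False)
peeking = false_alarm_rate(peek=True)
print(f"fixed n: {fixed:.3f}   peeking: {peeking:.3f}")
```

In this toy setup, using the Bayes factor as a stopping-and-deciding rule raises the rate at which a nonexistent effect is declared supported, compared with checking once at the planned sample size.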

• P-hacking Bayesian stats

• Drunk driving leather seats

Good reasons to go Bayesian do not include p-hacking.

• Next slide is the last.


• n>50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian? Only speak for myself here.

Leif Nelson, UC Berkeley

Joe Simmons, Penn
