Mar 31, 2015
Life After P-hacking (APS, May 2013, Washington, DC)
With minor edits for posting
Uri Simonsohn, Penn (gave the talk)
Leif Nelson, UC Berkeley
Joe Simmons, Penn
Photo not necessary
Definition
p-hacking: exploiting researcher degrees of freedom in search of p < .05
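To make the definition concrete, here is an illustrative simulation sketch that is not taken from the talk. It models one researcher degree of freedom: the researcher collects two correlated measures and reports whichever of measure 1, measure 2, or their average reaches p < .05. The true effect is zero, so every "significant" result is a false positive. The choices of n = 20 per cell, a correlation of r = .5 between measures, and a normal approximation to the t-test are all assumptions.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def p_two_sided(a, b):
    """Two-sided p-value for mean(a) - mean(b), normal approximation to the t-test."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(z) / math.sqrt(2))


# Analyses a p-hacker can choose among: measure 1, measure 2, or their average.
ANALYSES = [lambda s: s[0], lambda s: s[1], lambda s: (s[0] + s[1]) / 2]


def false_positive_rate(n=20, r=0.5, sims=4000, hack=True, seed=1):
    rng = random.Random(seed)

    def subject():
        # Two standard-normal measures correlated at r; no condition effect (null is true).
        x = rng.gauss(0, 1)
        return x, r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)

    # Honest analysis: only the pre-specified first measure.
    analyses = ANALYSES if hack else ANALYSES[:1]
    hits = 0
    for _ in range(sims):
        g1 = [subject() for _ in range(n)]
        g2 = [subject() for _ in range(n)]
        if any(p_two_sided([f(s) for s in g1], [f(s) for s in g2]) < 0.05 for f in analyses):
            hits += 1
    return hits / sims


print(false_positive_rate(hack=False), false_positive_rate(hack=True))
```

Both calls use the same seed, so they see identical data. The honest rate should land near the nominal 5%, slightly above it because the normal approximation is a bit liberal at n = 20. The hacked rate is clearly higher.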
Life after p-hacking
• n > 50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?
• Median study: n = 20
• False-Positive Psychology paper: n > 20
• What can you reliably detect with n = 20?
• MTurk study
– N = 674
– Why not use published ds?
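One reason to distrust published effect sizes is that publication filters on significance. Here is a minimal simulation sketch of that filter. It assumes a hypothetical true effect of d = 0.3, n = 20 per cell, and that only results significant in the predicted direction get published. None of these numbers come from the talk.

```python
import math
import random


def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_d(true_d=0.3, n=20, sims=5000, seed=7):
    """Return (mean observed d over all studies, mean observed d over 'published' studies)."""
    rng = random.Random(seed)
    all_d, published_d = [], []
    for _ in range(sims):
        ma, va = mean_var([rng.gauss(true_d, 1) for _ in range(n)])
        mb, vb = mean_var([rng.gauss(0, 1) for _ in range(n)])
        d = (ma - mb) / math.sqrt((va + vb) / 2)  # Cohen's d with pooled SD (equal n)
        z = d * math.sqrt(n / 2)  # equals (ma - mb) / SE when both cells have n
        all_d.append(d)
        if z > 1.96:  # significant at two-sided .05, in the predicted direction
            published_d.append(d)
    return sum(all_d) / len(all_d), sum(published_d) / len(published_d)


print(simulate_d())
```

With n = 20 per cell, a study can only reach significance if its observed d exceeds about 0.62. The published average is therefore far above the true 0.3, even though the average across all studies is close to it.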
n = 20 is enough for:
• Men taller than women (n = 6)
• People above median age closer to retirement (n = 10)
• Women, more shoes than men (n = 15)
n = 20 is not enough for:
• People who like spicy food are more likely to like Indian food (n = 27)
• Liberals rate social equality as more important than do conservatives (n = 34)
• People who like eggs report eating egg salad more often (n = 47)
• Men weigh more than women (n = 47)
• Smokers think smoking is less likely to kill someone than do non-smokers (n = 146)
• Are you studying a bigger effect than “men weigh more than women”?
• If not, use n > 50.
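The slides do not say which power level or test the n values above assume. A common convention, used here as an assumption, is 80% power for a two-sided, two-sample test at α = .05. Under that convention, the normal-approximation sketch below converts between per-cell n and the smallest standardized effect d that n can reliably detect.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    z = NormalDist()  # standard normal
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def n_per_cell(d, alpha=0.05, power=0.80):
    """Approximate per-cell n needed to detect a standardized mean difference d."""
    return math.ceil(2 * (_z_sum(alpha, power) / d) ** 2)


def detectable_d(n, alpha=0.05, power=0.80):
    """Smallest d detectable with the given per-cell n (same approximation)."""
    return _z_sum(alpha, power) * math.sqrt(2 / n)


for n in (20, 47, 50):
    print(n, round(detectable_d(n), 2))
```

Under these assumptions, n = 20 per cell reliably detects only effects of about d = 0.89. By comparison, n = 47 corresponds to d ≈ 0.58 and n = 50 to d ≈ 0.56. Exact t-test power calculations give a slightly larger n than this approximation.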
Life after p-hacking
• n > 50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?
[Figure: estimates in the Low vs. High conditions for three measures: Lion's Weight, Coins, Calories]
• Estimates are way off
• Subjects confused?
• Big outliers
[Figure: Calories only, estimates in the Low vs. High conditions; p < .03]
Study 1?
• Run calories study again.
• Same exclusion rule.
Why not just conceptual replication?
• It restarts the p-hacking clock.
• Failures do not count against the original finding.
Replications
• Conceptual
– Rule out confounds
– Rule in generalizability
• Direct
– Rule out false positives
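The claim that direct replication rules out false positives can be checked with a small simulation sketch. This is an illustration, not the authors' analysis. Study 1 is p-hacked: it tries three independent measures (named after the slides' lion's weight, coins, and calories) under each of two exclusion rules, and keeps the first combination with p < .05. The direct replication then reruns that same measure with that same exclusion rule on fresh data. The null is true throughout. The exclusion cutoff (|x| > 2 SD), n = 50 per cell, and the normal approximation are all hypothetical choices.

```python
import math
import random

MEASURES = ("lion's weight", "coins", "calories")
# Hypothetical exclusion rules; scores are in SD units.
RULES = {
    "no exclusions": lambda xs: xs,
    "drop |x| > 2": lambda xs: [x for x in xs if abs(x) <= 2],
}


def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_stat(a, b):
    """z statistic for mean(a) - mean(b), using sample variances."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def run_study(rng, n, analyses):
    """Collect null data on every measure; return ((measure, rule), positive?) for the
    first analysis with p < .05, or (None, None) if none is significant."""
    data = {
        m: ([rng.gauss(0, 1) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)])
        for m in MEASURES
    }
    for measure, rule in analyses:
        low, high = (RULES[rule](xs) for xs in data[measure])
        z = z_stat(high, low)
        if abs(z) > 1.96:
            return (measure, rule), z > 0
    return None, None


def replication_check(n=50, sims=3000, seed=2):
    rng = random.Random(seed)
    all_analyses = [(m, r) for m in MEASURES for r in RULES]
    hacked = replicated = 0
    for _ in range(sims):
        analysis, positive = run_study(rng, n, all_analyses)  # p-hacked Study 1
        if analysis is None:
            continue
        hacked += 1
        # Direct replication: same measure, same exclusion rule, fresh data.
        rep, rep_positive = run_study(rng, n, [analysis])
        if rep is not None and rep_positive == positive:
            replicated += 1
    return hacked / sims, replicated / hacked


print(replication_check())
```

The first number is the share of null Study 1s that "work" after hacking, which is well above 5%. The second is the share of those that survive a direct replication in the same direction, which falls to roughly 2.5%.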
Life after p-hacking
• n > 50
• Direct replications
• 21 words (Google it)
• Compromise writing
• Who to hire
• What about Bayesian?
How can an organic farmer compete?
How can an organic researcher compete?
• If you determined sample size in advance, say it.
• If you did not drop variables, say it.
• If you did not drop conditions, say it.
21 Word Solution (get the .pdf here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2160588)
Footnote 1
We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.
[Images: Organic Farmer | Organic Researcher]
Life after p-hacking
• n > 50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?
Compromise writing
• While reviewers are still in the dark ages
• Have it both ways.
• “Clean” version in main text
– All studies “worked” & < 2,500 words
• Supplement/footnote
– n = 100, n = 150
– p = .08 without exclusion
– Data and materials online
• Only reformers read the small print.
• The organic 21-word statement applies.
• Everybody likes the paper.
Life after p-hacking
• n > 50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian?
If you hire based on quantity, you pass on these guys
What’s the alternative to counting papers?
• Rookies: best 1
• Tenure: best 3
• Full: best 5
Try it. It is a powerful question. What’s her best paper?
Life after p-hacking
• n > 50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian? (I speak only for myself here.)
My prior: Bayesians will be unhappy in 3, 2, 1
P-hacking also invalidates Bayesian results.
Let me say that again:
P-hacking also invalidates Bayesian results.
• Bayesian proposals for Psych
1) Bayesian t-test
• Replications use it sometimes
• Turns out:
– α = 5%
2) Bayesian estimation
• Latest JEP:G
• Turns out:
– Changes nothing
t-test “vs” Bayesian estimation: changes nothing
How similar? Results change by less than they would if we dropped one observation at random.
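Why might Bayesian estimation change so little? The minimal sketch below assumes a normal likelihood for the mean difference and a vague normal prior on it. This is a simplification, not the model used in the JEP:G paper. With a prior this diffuse, the 95% credible interval nearly coincides with the 95% confidence interval. The difference and standard error in the example call are hypothetical.

```python
from statistics import NormalDist


def intervals(diff, se, prior_sd=10.0, level=0.95):
    """Frequentist CI vs. Bayesian credible interval for a mean difference.

    Likelihood: diff ~ Normal(delta, se); prior: delta ~ Normal(0, prior_sd).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z * se, diff + z * se)
    post_precision = 1 / prior_sd**2 + 1 / se**2  # precisions add (conjugate normal)
    post_mean = (diff / se**2) / post_precision  # prior mean is 0
    post_sd = post_precision ** -0.5
    credible = (post_mean - z * post_sd, post_mean + z * post_sd)
    return ci, credible


ci, cred = intervals(diff=0.45, se=0.20)
print([round(x, 3) for x in ci], [round(x, 3) for x in cred])
```

Shrinking `prior_sd` toward the size of the standard error is what would make the two intervals diverge. With vague priors, they agree to about three decimals.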
But!
• Isn’t data-peeking OK for Bayes?
– Not when used for hypothesis testing
• Also:
– Dropping subjects, measures, or conditions invalidates all inference.
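The point about data-peeking can be illustrated with a hedged sketch. It uses a simple Bayes factor for a two-group mean difference with a known SD of 1 and a normal prior on the effect under H1. This is a simplification, not the JZS Bayesian t-test. The prior SD of 0.5, the evidence threshold BF10 > 3, and the looks every 10 subjects per cell are hypothetical choices. Under a true null, checking after every batch and stopping at the first BF10 > 3 declares "evidence for H1" more often than a single test at the final n.

```python
import math
import random
from statistics import NormalDist


def bf10(diff, n, prior_sd=0.5):
    """Bayes factor for H1: delta ~ Normal(0, prior_sd) vs. H0: delta = 0.

    diff is mean(a) - mean(b), with n per cell and a known SD of 1 in each cell.
    """
    se = math.sqrt(2 / n)
    marginal_h1 = NormalDist(0, math.sqrt(prior_sd**2 + se**2)).pdf(diff)
    marginal_h0 = NormalDist(0, se).pdf(diff)
    return marginal_h1 / marginal_h0


def optional_stopping(sims=2000, looks=range(10, 101, 10), threshold=3.0, seed=3):
    """Share of null studies declared 'evidence for H1': peeking vs. one look at the end."""
    rng = random.Random(seed)
    max_n = max(looks)
    peek_hits = final_hits = 0
    for _ in range(sims):
        # Null is true: both cells come from the same population.
        a = [rng.gauss(0, 1) for _ in range(max_n)]
        b = [rng.gauss(0, 1) for _ in range(max_n)]
        bfs = [bf10(sum(a[:n]) / n - sum(b[:n]) / n, n) for n in looks]
        peek_hits += any(bf > threshold for bf in bfs)  # stop at first look that crosses
        final_hits += bfs[-1] > threshold  # fixed n, single analysis
    return peek_hits / sims, final_hits / sims


print(optional_stopping())
```

The posterior at any given look does not depend on the stopping rule. The error rate of a decision rule built on it does.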
• P-hacking : Bayesian stats
• Drunk driving : leather seats
Good reasons to go Bayesian do not include p-hacking.
• Next slide is the last.
Life after p-hacking
• n > 50
• Direct replications
• 21 words
• Compromise writing
• Who to hire
• What about Bayesian? (I speak only for myself here.)
Leif Nelson, UC Berkeley
Joe Simmons, Penn