10 things statistics taught us about big data
Posted on 01-Dec-2014
Research: jtleek.com
Blogging: simplystatistics.org
Teaching: jhudatascience.org
from: jtleek@gmail.com
Roger let me know you gave him a ballpark figure for the number of students registered for his course "Computing for Data Analysis". Could you give me an idea of how many have registered for my course "Data Analysis"?
from: pangwei@coursera.org
Hi Jeff, 7,000 students! It's pretty awesome. (You'll be able to check this out yourself next week, once the class sites are up.)
from: rdpeng@gmail.com
You are f**ed. -roger
[Figure: enrollment over time]
9 classes, each 1 month long, every month
1,000,000+ enrolled
http://goo.gl/vQK0RH
http://goo.gl/xWAlPi
10 statistics things
http://goo.gl/wTAuvR
1. Problem first, not solution backward
2. Define a metric for success first
3. Analyze interactively
4. Plot your data first and always
5. Know your real sample size
6. Watch out for confounders
7. Correct for multiple testing
8. Average many predictors
9. Smooth over time and space
10. Have others check your work
Problem first, not solution backward
http://goo.gl/3vA1OB
http://hyperboleandahalf.blogspot.com/
http://cran.r-project.org/
http://bioconductor.org/
Define a metric for success before you start
http://www.agendia.com/managed-care/breast-cancer/mammaprint/
89% sensitivity, 42% specificity, 65% accuracy
http://www.biomedcentral.com/1471-2164/14/336/figure/F3
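Why fix the metric before you start: sensitivity and specificity describe the test, but accuracy also depends on how common the condition is among the people tested. A minimal Python sketch of that relationship follows; the function names and the confusion-matrix counts used to check it are hypothetical illustrations, not data behind the slide.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true positives the test detects
    specificity = tn / (tn + fp)  # share of true negatives the test clears
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all calls that are right
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy is a prevalence-weighted average of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


if __name__ == "__main__":
    # Same test characteristics (89% / 42%), different case mixes.
    for prevalence in (0.1, 0.5, 0.9):
        print(prevalence, round(accuracy_from_rates(0.89, 0.42, prevalence), 3))
```

With these two rates, accuracy runs from about 47% at 10% prevalence to about 84% at 90% prevalence, so a single accuracy figure says little unless the success metric and the target population were fixed in advance.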
Analyze interactively
http://had.co.nz/
https://twitter.com/EllieMcDonagh/status/469184554549248000
http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
Plot your data first and always
http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/
Know your real sample size
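Big data often has far more rows than independent units: many clicks from the same user, many reads from the same sample. One standard way to quantify this, swapped in here since the talk gives no formula, is Kish's design effect for clustered data: with m correlated records per independent unit and intraclass correlation icc, the variance is inflated by DEFF = 1 + (m - 1) * icc, and the effective sample size is n / DEFF. A minimal sketch, assuming equal-sized clusters; the numbers in the usage line are hypothetical.

```python
def design_effect(records_per_unit: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    if records_per_unit < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("need records_per_unit >= 1 and 0 <= icc <= 1")
    return 1 + (records_per_unit - 1) * icc


def effective_sample_size(n_records: int, records_per_unit: int, icc: float) -> float:
    """Number of independent observations that n_records are roughly worth."""
    return n_records / design_effect(records_per_unit, icc)


if __name__ == "__main__":
    # Hypothetical: 1,000,000 clicks from 1,000 users (1,000 clicks each), icc = 0.5.
    print(round(effective_sample_size(1_000_000, 1_000, 0.5)))
```

In that hypothetical, a million rows are worth about 2,000 independent observations: far closer to the number of users than to the number of rows.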
Watch out for confounders
http://xkcd.com/552/
shoe size & literacy
Correct for multiple testing
http://xkcd.com/882/
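The linked xkcd comic tests 20 jelly-bean colours at a 0.05 threshold and reports the one that comes up "significant". The sketch below is a minimal Python illustration, not code from the talk: it computes the chance of at least one false positive across m independent tests when every null hypothesis is true, and implements two standard corrections, Bonferroni (controls the family-wise error rate) and Benjamini-Hochberg (controls the false discovery rate).

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) in m independent tests when every null is true."""
    return 1 - (1 - alpha) ** m


def bonferroni(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    print(round(familywise_error_rate(0.05, 20), 3))  # 20 jelly-bean colours
```

With 20 independent tests at 0.05, the chance of at least one spurious "discovery" is about 64%; a lone p = 0.04 among 20 tests survives neither correction.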
Average many predictors
5 independent, 70%-accurate classifiers, combined by majority vote
P(majority correct) = 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + 0.7^5 ≈ 0.837, i.e. 83.7% accuracy
http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf Adapted from Todd Holloway
101 independent, 70% accurate classifiers
99.9% accuracy
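The 5-classifier arithmetic is a binomial tail: with n independent classifiers that are each right with probability p, majority vote is right whenever more than half of them are. A short Python sketch that reproduces the 5- and 101-classifier figures under that independence assumption (in practice, classifiers trained on the same data make correlated errors, so the real gain is smaller):

```python
from math import comb  # Python 3.8+


def majority_vote_accuracy(n: int, p: float) -> float:
    """P(a majority of n independent classifiers is right), each right with prob. p.

    Assumes n is odd (so there are no ties) and that errors are independent.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    need = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 5, 25, 101):
        print(n, round(majority_vote_accuracy(n, 0.7), 4))
```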
Smooth (average) over time and space
http://simplystatistics.org/2014/02/13/loess-explained-in-a-gif/
http://fivethirtyeight.com/
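The linked loess post animates the idea of smoothing by local averaging: at each point, fit a weighted straight line to nearby data, with weights that fall off with distance, and keep that line's value at the point. A simplified Python sketch of this idea follows. It uses tricube weights over the nearest `span` fraction of points and a local linear fit, but omits the robustness iterations of full loess; the function name and default span are illustrative choices, not from the talk.

```python
def loess_smooth(xs: list[float], ys: list[float], span: float = 0.3) -> list[float]:
    """Local linear smoother with tricube weights (loess without robustness steps).

    Assumes len(xs) >= 3 and that the x values are distinct.
    """
    n = len(xs)
    k = min(n, max(3, round(span * n)))  # points in each local neighbourhood
    fitted = []
    for x0 in xs:
        # Distance to the k-th nearest point sets the window width.
        dmax = sorted(abs(x - x0) for x in xs)[k - 1]
        # Tricube weights: 1 at x0, falling to 0 at the window edge and beyond.
        w = [(1 - min(abs(x - x0) / dmax, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        # Weighted least-squares line through the neighbourhood.
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


if __name__ == "__main__":
    xs = list(range(20))
    ys = [5 + (-1) ** i for i in xs]  # a flat trend plus alternating noise
    print([round(v, 2) for v in loess_smooth(xs, ys)])
```

Smaller spans follow the data more closely; larger spans average over more points and give a smoother curve.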
Have others check your work
jtleek.com/talks