Page 1
20/10/2011
1
Anthony Goldbloom
CEO, Kaggle
e-mail [email protected] @antgoldbloom
Predictive modeling competitions: making data science a sport
Photo by mikebaird, www.flickr.com/photos/mikebaird
1. Motivation
2. Does it Work?
3. Why it Works
4. How it Works
5. Case Studies
Mismatch between those with data and those with the skills to analyse it
Crowdsourcing
[Chart: Tourism Forecasting Competition. Y-axis: forecast error (MASE), shown against the existing model. X-axis: Aug 9, 2 weeks later, 1 month later, competition end.]
[Chart: dunnhumby Shopping Challenge. Y-axis: % correctly predicted visits (9 to 20). X-axis: competition progress (weeks 1 to 11).]
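The Tourism Forecasting Competition chart reports forecast error as MASE (mean absolute scaled error). A minimal sketch of how that metric is commonly computed, assuming the standard definition (the forecast's mean absolute error divided by the in-sample mean absolute error of a naive forecast). The function name `mase`, its parameters, and the sample numbers are illustrative assumptions, not taken from the competition:

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error.

    train    -- historical series used to scale the error
    actual   -- true values for the forecast horizon
    forecast -- predicted values for the same horizon
    m        -- seasonal period for the naive benchmark (1 = non-seasonal)
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if len(train) <= m:
        raise ValueError("train must contain more than m observations")

    # In-sample error of the naive forecast: predict each value with the one m steps earlier.
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    if naive_mae == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")

    # Out-of-sample error of the forecast being evaluated.
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Made-up numbers purely for illustration.
history = [10, 12, 14, 16]
print(mase(history, actual=[18, 20], forecast=[17, 22]))  # 0.75
```

A value below 1 means the forecast beats the naive benchmark on average, so lower is better.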
“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”
Kaggle’s Dark Matter Competition on the White House blog
User base: ~16,000 registered data scientists
Our User Base
• neural networks
• logistic regression
• support vector machine
• decision trees
• ensemble methods
• AdaBoost
• Bayesian networks
• genetic algorithms
• random forest
• Monte Carlo methods
• principal component analysis
• Kalman filter
• evolutionary fuzzy modeling
Users apply different techniques
Additional slides
Not MIT, not SAS … UoL?
1. Upload
2. Submit
3. Evaluate & Exchange
Use the wizard to post a competition
Participants make their entries
Competitions are judged based on predictive accuracy
Competition Mechanics
Competitions are judged on objective criteria
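Judging on predictive accuracy can be pictured as scoring each entry against held-out true outcomes and ranking the results. The sketch below is only an illustration under that assumption: the slides do not describe Kaggle's actual scoring code, and the names `accuracy`, `leaderboard`, `team_a`, and `team_b` are hypothetical.

```python
def accuracy(truth, predictions):
    """Fraction of predictions that match the held-out true outcomes."""
    if len(truth) != len(predictions) or len(truth) == 0:
        raise ValueError("truth and predictions must be non-empty and equal length")
    correct = sum(1 for t, p in zip(truth, predictions) if t == p)
    return correct / len(truth)


def leaderboard(truth, submissions):
    """Rank entries by accuracy, best first.

    submissions maps a (hypothetical) participant name to their predictions.
    """
    scores = {name: accuracy(truth, preds) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up outcomes and entries purely for illustration.
held_out_truth = [1, 0, 1, 1]
entries = {
    "team_a": [1, 0, 0, 1],
    "team_b": [1, 0, 1, 1],
}
for name, score in leaderboard(held_out_truth, entries):
    print(name, score)
```

Because the score depends only on the held-out outcomes, every entry is measured by the same objective criterion regardless of the technique behind it.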
Benchmarking
Untouched problems
2011
$3 million prize
Successful grant applications
Outcomes of a competition to predict the success of grant applications:
- Better identify likely successes to avoid wasting resources on hopeless applications
- Identify and communicate the characteristics of a successful application to future applicants
~25%
Who to hire?
Branding: “we do analytics”
Photo by gidzy, www.flickr.com/photos/gidzy
What could the world’s best analysts find in your data?
e-mail [email protected] +1 650 283 9781