Anthony Goldbloom Kaggle [email protected] @antgoldbloom
Heritage Health Prize
Photo by mikebaird, www.flickr.com/photos/mikebaird
A predictive modeling competition that’s life & death
Agenda
What is Kaggle
What are data science competitions
The Heritage Health Prize
Lessons Learned
Kaggle Connect is a marketplace of the world’s best data scientists, ranked
Kaggle Connect: submit your project and get matched with a data scientist
What are data science competitions
Competitions are judged on objective criteria
Competition Mechanics
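As a rough illustration of what "judged on objective criteria" can mean in practice, the sketch below scores each team's predictions against held-out answers and ranks teams by error. The metric used here (root mean squared logarithmic error), the function names, and all the numbers are assumptions made for illustration; the slides do not specify the scoring rule.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error; lower is better.

    Assumed metric for illustration only.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one value to score")
    total = sum(
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    )
    return math.sqrt(total / len(actual))


def rank_submissions(submissions, actual):
    """Return (team, score) pairs sorted best first (lowest error first)."""
    scored = [(team, rmsle(preds, actual)) for team, preds in submissions.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical held-out answers and two made-up submissions.
holdout = [0, 2, 0, 5]
submissions = {
    "team_a": [0.1, 1.5, 0.2, 3.0],
    "team_b": [1.0, 1.0, 1.0, 1.0],
}

leaderboard = rank_submissions(submissions, holdout)
for team, score in leaderboard:
    print(f"{team}: {score:.4f}")
```

Because the score depends only on the predictions and the hidden answers, any two entries can be compared without a judging panel, which is the property the slide title points at.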
The Heritage Health Prize
The world’s largest ever data science competition
Competition Mechanics
The prize launched on April 4th 2011 to lots of attention
First milestone prize was awarded on October 4th 2011
Winners include an IBM consultant and a hedge fund trader
[Chart: Submission Count (0 to 250) and Leading Score (0.46 to 0.474) by month, May-11 to February-13]
Meet Phil, the IBMer who was part of the team that won the first milestone prize
Second milestone prize was announced on April 4th 2012
Dave & Phil win milestone prize 2 as well
Third milestone prize was announced on October 4th 2012
Third milestone prize awarded to a BI consultant and a software engineer
The competition closed on April 4th 2013
Final prize will be announced in DC on June 4th
~2,000 participants in 1,660 teams
35,771 entries
Over 4 years’ worth of man-hours spent on the competition
Lessons Learned
Top entries bunch at around the same level of predictive accuracy
There’s only so much information you can extract from a data set
[Chart: Submission Count (0 to 250) and Leading Score (0.46 to 0.474) by month, May-11 to February-13]
“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”
Kaggle’s Dark Matter Competition on the White House blog
Phil Brierley has performed well in a huge range of problems
Kaggle Connect provides the people: many of the world’s best data scientists, ranked