Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Apr 11, 2017

Transcript
Page 1: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

The Future Of Kaggle
Where we came from and where we’re going

kaggle.com/benhamner @benhamner

Page 2: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Our mission is to help the world learn from data

Page 3: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We got started running supervised learning competitions

Page 4: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Since 2010, we’ve run

● 240 general competitions
● 1,610 university classroom competitions

We’re now doing this at scale

Page 5: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

This has attracted a talented and diverse community

Page 6: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’ve taught machine learning to hundreds of thousands of people

Page 7: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’ve pushed the state of the art forward

Page 8: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’ve learned a tremendous amount along the way

● What techniques work well
● How people win competitions
● Why our community participates
● What major pain points data scientists hit
● How we can help data scientists ameliorate these pain points

Page 9: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Great data scientists optimize the entire ML workflow

Page 10: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

GBMs and deep neural networks are incredibly effective

Page 11: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Model ensembling almost always ekes out gains
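
As a rough illustration of the point above, here is a minimal sketch of the simplest kind of ensemble: averaging the predictions of two models. The numbers and the names `model_a`, `model_b`, and `average_ensemble` are made up for this example and do not come from the talk; the assumption is that the two models make partly different errors, which is when averaging tends to help.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ensemble(*prediction_lists):
    """Average the predictions of several models, element by element."""
    return [sum(preds) / len(preds) for preds in zip(*prediction_lists)]


# Toy data (hypothetical): both models are off by 0.5 on every row,
# but they are wrong in the same direction on only half of the rows.
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 1.5, 3.5, 3.5]
model_b = [1.5, 2.5, 2.5, 3.5]

ensemble = average_ensemble(model_a, model_b)

print("model A MSE: ", mse(y_true, model_a))
print("model B MSE: ", mse(y_true, model_b))
print("ensemble MSE:", mse(y_true, ensemble))
```

On this toy data the averaged predictions have a lower error than either model alone. Where the two models err in the same direction, averaging does not help, which is why diverse models are usually preferred when building an ensemble.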

Page 12: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Successful participants avoid overfitting

Page 13: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’ve seen major pain points

Page 14: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Today’s practices are like programming in assembly

Page 15: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Compared with software engineering tools, ML tools feel like they came from the Stone Age

Page 16: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Accessing data is tough

Page 17: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Getting high quality data is even tougher

Page 18: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Cleaning data is painful

Essay: “This essay got good marks, but as far as I can tell, it's gibberish.”

Human Scores: 5/5, 4/5

Page 19: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Data leakage is common and subtle
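
One common form of leakage, shown here only as an illustrative sketch and not as an example from the talk, is computing preprocessing statistics on the full dataset before splitting it. The helper names `fit_scaler` and `transform` and the toy values are assumptions made for this example.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn the mean and standard deviation used for standardization."""
    return mean(values), pstdev(values)


def transform(values, scaler):
    """Standardize values with a previously fitted (mean, std) pair."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values; the test rows come from a shifted range.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the scaler has seen the test rows, so information about the
# held-out data has already flowed into the training features.
leaky_scaler = fit_scaler(train + test)

# Clean: statistics come from the training rows only and are then
# reused, unchanged, on the test rows.
clean_scaler = fit_scaler(train)

print("leaky mean:", leaky_scaler[0])
print("clean mean:", clean_scaler[0])
print("clean test features:", transform(test, clean_scaler))
```

The two scalers disagree because the leaky one has absorbed the test distribution. The same pattern applies to any fitted preprocessing step, such as imputation or target encoding.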

Page 20: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Going from research to production can be brutal

Page 21: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Reproducing work takes days to months

Page 22: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We can do better than this

Page 23: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Accessing data should be seamless

Page 24: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

You should never need to repeat work others have done

Page 25: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

A single command should reproduce everything start-to-end

> make all
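
The slide uses `make all` as its example. The same idea can be sketched in a single script: one entry point that runs every stage in a fixed order with no manual steps in between. Everything below (the stage names, the toy data, the `run_all` function) is hypothetical and only illustrates the shape of such a pipeline.

```python
def load_data():
    """Stage 1: return the raw rows (a fixed toy dataset here)."""
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def train_model(rows):
    """Stage 2: fit y = slope * x by least squares through the origin."""
    numerator = sum(x * y for x, y in rows)
    denominator = sum(x * x for x, _ in rows)
    return numerator / denominator


def evaluate(slope, rows):
    """Stage 3: report the largest absolute error of the fitted line."""
    return max(abs(y - slope * x) for x, y in rows)


def run_all():
    """Single entry point: run every stage in order and return the results."""
    rows = load_data()
    slope = train_model(rows)
    error = evaluate(slope, rows)
    return {"slope": slope, "max_abs_error": error}


if __name__ == "__main__":
    print(run_all())
```

Because each stage takes its inputs as arguments and nothing depends on hidden state, running the script twice gives the same result, which is the property the slide asks for.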

Page 26: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Making a successful one-line update should take seconds

Page 27: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Helpful metadata shouldn’t stay buried in minds or emails

Page 28: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Best practices should be easy defaults, not complicated custom contraptions

Page 29: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’re changing this

Page 30: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’ve launched two new products: Kernels and Datasets

Page 31: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We recently joined Google Cloud to accelerate our growth

Page 32: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Datasets, Kernels, and Competitions have an exciting future

Page 33: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

The world’s data will be accessible with a common interface

Page 34: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

That captures the important code and metadata on top of it

Page 35: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

A central searchable hub for your organization’s data

Page 36: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

A kernel is an atom of reproducible data science

Page 37: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

Kernels will be your continuous integration server for data

Page 38: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

We’ve started running code competitions

Page 39: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017

This will enable exciting new competition formats

● Backtested time series
● Live data feeds
● Reinforcement learning
● Generative modeling
● Adversarial learning
● Machine learning under computational constraints
● Sensitive datasets

Page 40: Ben Hamner, CTO, Kaggle, at MLconf NYC 2017