How to win data science competitions

Silicon Valley Big Data Science Mountain View, 3/25/2015

Arno Candel, H2O.ai
Matt Dowle, H2O.ai
Mark Landry, Team H2O.ai

Page 2: H2 o kaggle-032515

H2O, @ArnoCandel

Outline

Introduction to H2O (5 mins)

Kaggle Problems (55 mins)

Otto Group

Rain

https://github.com/h2oai/h2o-dev/tree/master/h2o-r/demos/kaggle

Page 3: H2 o kaggle-032515

Teamwork at H2O.ai

Java, Apache v2 Open-Source

#1 Java Machine Learning on GitHub

Join the community!

Page 4: H2 o kaggle-032515

H2O: Open-Source (Apache v2) Predictive Analytics Platform

Page 5: H2 o kaggle-032515

H2O Architecture - Designed for speed, scale, accuracy & ease of use

Key technical points:
• distributed JVMs + REST API
• no Java GC issues (data in byte[], Double)
• loss-less number compression
• Hadoop integration (v1, YARN)
• R package (CRAN)

Pre-built fully featured algos: K-Means, NB, PCA, CoxPH, GLM, RF, GBM, DeepLearning
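
The "loss-less number compression" point above can be illustrated with a small sketch. This is not H2O's actual implementation: the function names `compress_column` and `decompress_column` and the offset-plus-one-byte scheme are assumptions made purely for illustration. The idea shown is that a numeric column whose values are all integers within a span of 256 can be stored as one byte per value plus a shared offset and recovered exactly; anything else falls back to 8-byte doubles.

```python
import struct

def compress_column(values):
    """Pack a numeric column losslessly (hypothetical scheme, illustration only).

    Returns (mode, offset, payload):
      mode "u8":  every value is an integer within a span of 256, stored as
                  one byte per value relative to the shared offset.
      mode "f64": fallback, every value stored as an 8-byte double.
    Note: the sign of -0.0 is not preserved in "u8" mode.
    """
    if values and all(float(v).is_integer() for v in values):
        lo, hi = int(min(values)), int(max(values))
        if hi - lo <= 255:
            return "u8", lo, bytes(int(v) - lo for v in values)
    return "f64", 0, struct.pack(f"<{len(values)}d", *values)

def decompress_column(mode, offset, payload):
    """Invert compress_column and return the values as floats."""
    if mode == "u8":
        return [float(b + offset) for b in payload]
    return list(struct.unpack(f"<{len(payload) // 8}d", payload))

column = [1000.0, 1003.0, 1001.0, 1255.0]
mode, offset, payload = compress_column(column)
assert decompress_column(mode, offset, payload) == column
print(mode, len(payload))  # u8 4
```

In this example four doubles (32 bytes) shrink to four bytes plus the offset, and the round trip returns the original values.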

Page 6: H2 o kaggle-032515

H2O GitBooks

https://leanpub.com/u/h2oai

Page 7: H2 o kaggle-032515

H2O World

http://h2o.ai/h2o-world/
http://learn.h2o.ai
Watch the Videos

Day 1
• Hands-On Training
• Supervised
• Unsupervised
• Advanced Topics
• Marketing Use Case
• Product Demos
• Hacker-Fest with Cliff Click (CTO, HotSpot)

Day 2
• Speakers from Academia & Industry
• Trevor Hastie (ML)
• John Chambers (S, R)
• Josh Bloch (Java API)
• Many use cases from customers
• 3 Top Kaggle Contestants (Top 10)
• 3 Panel discussions

Join us at H2O World 2015!

Page 8: H2 o kaggle-032515

iPython Notebooks

Page 9: H2 o kaggle-032515

Sparkling Water: Spark+H2O

Page 11: H2 o kaggle-032515

Otto Group Challenge

Data:
• 93 numerical features
• 9 output classes
• 62k training set rows
• 144k test set rows

Page 12: H2 o kaggle-032515

Otto Group Challenge

Install H2O (h2o-dev)

Page 13: H2 o kaggle-032515

Otto Group Challenge

Page 14: H2 o kaggle-032515

Flow-based GUI

Page 15: H2 o kaggle-032515

Otto Group Challenge

Page 16: H2 o kaggle-032515

Otto Group Challenge

Page 17: H2 o kaggle-032515

Otto Group Challenge

LB score: 0.501 (Benchmark: 1.56), #332 out of 1203
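
For context on the scores above: the Otto Group competition was evaluated with multi-class logarithmic loss, where lower is better. A minimal sketch of that metric follows. The function name `multiclass_logloss` and the default clipping constant `eps` are choices made here for illustration and are not taken from the slides.

```python
import math

def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of true class indices (0-based)
    probs:  list of per-row probability lists, one entry per class
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = row[label] / sum(row)        # renormalise the row to sum to 1
        p = min(max(p, eps), 1 - eps)    # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

# A uniform guess over the 9 Otto classes scores ln(9), about 2.197.
uniform = [[1 / 9] * 9, [1 / 9] * 9]
print(round(multiclass_logloss([0, 3], uniform), 3))  # 2.197
```

Confident wrong predictions are penalised heavily, which is why well-calibrated class probabilities matter for this leaderboard.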

Page 19: H2 o kaggle-032515

Hands-On: Rain Challenge

Trouble: “List columns”, missing values, outliers, noise, …
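
One way to deal with the "list columns" and missing values mentioned above is to collapse each list into a few summary features. The sketch below assumes cells hold space-separated numeric strings and uses a hypothetical missing-value sentinel of `-99900.0`. The function name, the sentinel, and the choice of summaries are illustrative assumptions, not the approach presented in the talk, and outliers are not handled here.

```python
import math

def summarize_list_column(cell, missing_codes=(-99900.0,)):
    """Collapse one space-separated "list column" cell into summary features.

    missing_codes is a hypothetical set of sentinel values treated as missing;
    adjust it to whatever the data set actually uses.
    """
    values = []
    for token in cell.split():
        x = float(token)                       # "nan" parses to NaN
        if math.isnan(x) or x in missing_codes:
            continue                           # skip missing readings
        values.append(x)
    if not values:
        return {"n": 0, "mean": None, "min": None, "max": None}
    return {
        "n": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

print(summarize_list_column("1.5 -99900.0 2.5 nan"))
# {'n': 2, 'mean': 2.0, 'min': 1.5, 'max': 2.5}
```

Each list cell becomes a fixed set of numeric columns, which most tabular learners can consume directly.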

Page 20: H2 o kaggle-032515

Please Welcome Matt Dowle!

Page 22: H2 o kaggle-032515

Beating the Benchmark

Score: 0.00973, Benchmark: 0.011776 by Mark Landry

Page 23: H2 o kaggle-032515

More can be done with H2O

Page 24: H2 o kaggle-032515

Key Take-Aways

H2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning.

Join our Community and Meetups!
https://github.com/h2oai
h2ostream community forum
www.h2o.ai
@h2oai

Thank you!