KAGGLE COMPETITION: ALLSTATE CLAIMS SEVERITY Presenter: Yuwei(Ruby) Jiao 16-11-29 CS 846 Software Engineering for Big Data 1

Apr 08, 2018


KAGGLE COMPETITION: ALLSTATE CLAIMS SEVERITY

Presenter: Yuwei(Ruby) Jiao

16-11-29 CS 846 Software Engineering for Big Data 1


Outline
• Background
• Problem
• Goals
• Approach
• Expectation
• Future Work


Background
• Kaggle:
  • a platform for predictive modeling and analytics competitions
  • companies and researchers post their data
  • statisticians and data miners from all over the world compete to produce the best models
• Allstate:
  • the second largest personal lines insurer in the United States
  • is currently developing automated methods of predicting the cost, and hence severity, of claims


Problem
• Data
  • train.csv (188318 x 132)
  • test.csv (125546 x 131)
• Attributes
  • ID: 1
  • Categorical: 116
  • Continuous: 14
  • Loss: 1
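The attribute counts add up to the 132 columns of train.csv (1 + 116 + 14 + 1), and test.csv has one column fewer because it lacks the loss. A minimal sketch of how such a tally could be checked from a CSV header follows. The column names (`id`, `cat1`..`cat116`, `cont1`..`cont14`, `loss`) are an assumption not stated on the slide, and the header here is synthetic rather than read from the real file.

```python
import csv
import io
from collections import Counter


def count_attribute_kinds(header):
    """Group column names into id / categorical / continuous / loss."""
    kinds = Counter()
    for name in header:
        if name == "id":
            kinds["id"] += 1
        elif name == "loss":
            kinds["loss"] += 1
        elif name.startswith("cat"):
            kinds["categorical"] += 1
        elif name.startswith("cont"):
            kinds["continuous"] += 1
        else:
            kinds["other"] += 1
    return kinds


# Synthetic header line mimicking the ASSUMED layout of train.csv.
header_line = ",".join(
    ["id"]
    + [f"cat{i}" for i in range(1, 117)]
    + [f"cont{i}" for i in range(1, 15)]
    + ["loss"]
)

# Parse the header the same way a real file's first row would be parsed.
header = next(csv.reader(io.StringIO(header_line)))
kinds = count_attribute_kinds(header)
total_columns = sum(kinds.values())
```

With a real file, the same function could be applied to the first row returned by `csv.reader(open("train.csv", newline=""))`.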


Goals
• Explore raw data
  • Data Statistics
  • Data Visualization
  • Data Transformation
  • Data Interaction
  • Data Preparation
• Evaluation, prediction and analysis
  • Explore different machine learning models and algorithms


Approach
• Language:
  • Python 3.0
• Library:


Approach --- Data Statistics


skew
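The slide's table did not survive extraction, so the sketch below only illustrates what a skew statistic measures. It assumes the simple moment-based definition (third central moment divided by the 1.5 power of the second); library routines may apply a bias correction and therefore differ slightly. The function name and sample values are hypothetical.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: define skew as zero
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]      # evenly spread around the mean
right_tailed = [1, 1, 1, 2, 10]  # one large value, like a rare expensive claim

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
```

A value near zero suggests a roughly symmetric attribute, while a large positive value points to a long right tail.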


Approach --- Data Visualization


Approach --- Data Visualization


Approach --- Data Transformation


Approach --- Data Interaction


Approach --- Data Preparation
• Divide the dataset into train and validation sets
• Convert categorical attributes to binary vectors with one-hot encoding
  • Determining the state has a low and constant cost
  • Changing the state has a constant cost
  • Easy to design and modify
  • Easy to detect illegal states
  • Takes advantage of an FPGA's abundant flip-flops
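The two preparation steps above can be sketched in plain Python. This is an illustration under assumptions, not the presenter's code: the function names, the 20% validation fraction, the fixed seed, and the toy column are all hypothetical.

```python
import random


def one_hot_encode(column):
    """Map each categorical value to a binary vector with a single 1."""
    categories = sorted(set(column))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in column:
        vec = [0] * len(categories)
        vec[index[value]] = 1
        rows.append(vec)
    return categories, rows


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a validation share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(rows) * validation_fraction)
    val_idx = indices[:n_val]
    train_idx = indices[n_val:]
    return [rows[i] for i in train_idx], [rows[i] for i in val_idx]


# Toy categorical attribute (hypothetical values).
column = ["A", "B", "A", "C", "B"]
categories, encoded = one_hot_encode(column)
train, validation = train_validation_split(encoded, validation_fraction=0.2, seed=0)
```

In practice the category list would need to cover values seen in both train.csv and test.csv so that the encoded columns line up.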


Expectation
• Make prediction:
  • XGBoost
• Current ranking:
  • 50%
• Expected ranking:
  • 30%?


Future Work
• Feature engineering
  • Use domain knowledge of the data to create features
  • Make machine learning algorithms work
• Evaluation, prediction and analysis
  • Linear Regression (linear algorithm)
  • LASSO Linear Regression (linear algorithm)
  • KNN (non-linear algorithm)
  • SVM (non-linear algorithm)
  • Random Forest (bagging)
  • AdaBoost (boosting)
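Comparing a linear and a non-linear model on a held-out validation set could look roughly like the sketch below. It assumes mean absolute error as the score (the slides do not name a metric), uses a single feature and toy data, and hand-rolls one-feature least squares and KNN regression purely for illustration. All names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns a predictor function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=2):
    """KNN regression: average the targets of the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


# Toy non-linear relationship (y = x^2), split into train and validation.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x * x for x in train_x]
val_x = [0.5, 2.5, 4.5]
val_y = [x * x for x in val_x]

models = {
    "linear": fit_linear(train_x, train_y),
    "knn": fit_knn(train_x, train_y, k=2),
}
scores = {
    name: mean_absolute_error(val_y, [model(x) for x in val_x])
    for name, model in models.items()
}
best = min(scores, key=scores.get)  # model with the lowest validation error
```

On this toy data the non-linear model scores better, which is only a property of the example; on the real data each listed algorithm would have to be evaluated the same way.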


Thank you!

Q + A?