KAGGLE COMPETITION: ALLSTATE CLAIMS SEVERITY
Presenter: Yuwei(Ruby) Jiao
16-11-29 CS 846 Software Engineering for Big Data 1
Outline
• Background
• Problem
• Goals
• Approach
• Expectation
• Future Work
Background
• Kaggle:
  • a platform for predictive modeling and analytics competitions
  • companies and researchers post their data
  • statisticians and data miners from all over the world compete to produce the best models
• Allstate:
  • the second largest personal lines insurer in the United States
  • is currently developing automated methods of predicting the cost, and hence the severity, of claims
Problem
• Data
  • train.csv (188318 x 132)
  • test.csv (125546 x 131)
• Attributes
  • ID: 1
  • Categorical: 116
  • Continuous: 14
  • Loss: 1
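As a quick sanity check on the counts above, the column layout can be sketched as follows. The column names (`id`, `cat1` to `cat116`, `cont1` to `cont14`, `loss`) are assumptions for illustration and are not taken from the slides; only the group sizes are. The sketch also assumes that test.csv has one column fewer because it lacks the loss column.

```python
# Hypothetical column names; only the group sizes come from the slide.
ID_COLS = ["id"]
CAT_COLS = [f"cat{i}" for i in range(1, 117)]    # 116 categorical attributes
CONT_COLS = [f"cont{i}" for i in range(1, 15)]   # 14 continuous attributes
TARGET_COLS = ["loss"]                           # the value to predict

train_columns = ID_COLS + CAT_COLS + CONT_COLS + TARGET_COLS
test_columns = ID_COLS + CAT_COLS + CONT_COLS    # assumed: no target in test.csv


def group_sizes(columns):
    """Count columns per attribute group, based on the assumed names."""
    sizes = {"id": 0, "categorical": 0, "continuous": 0, "loss": 0}
    for name in columns:
        if name == "id":
            sizes["id"] += 1
        elif name == "loss":
            sizes["loss"] += 1
        elif name.startswith("cont"):
            sizes["continuous"] += 1
        elif name.startswith("cat"):
            sizes["categorical"] += 1
    return sizes


print(len(train_columns), len(test_columns))
print(group_sizes(train_columns))
```

The four groups add up to 1 + 116 + 14 + 1 = 132 columns, which matches the width of train.csv.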
Goals
• Explore raw data
  • Data Statistics
  • Data Visualization
  • Data Transformation
  • Data Interaction
  • Data Preparation
• Evaluation, prediction and analysis
  • Explore different machine learning models and algorithms
Approach --- Data Preparation
• Divide the dataset into train and validation sets
• Convert categorical attributes to binary vectors with one-hot encoding
  • Determining the state has a low and constant cost
  • Changing the state has a constant cost
  • Easy to design and modify
  • Easy to detect illegal states
  • Takes advantage of an FPGA's abundant flip-flops
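The two preparation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the presenter's actual code: the toy rows, the column names (`cat1`, `cont1`, `loss`), the 20% validation fraction, and the helper names are all made up for the example.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def fit_categories(rows, column):
    """Collect the sorted distinct values of one categorical column."""
    return sorted({row[column] for row in rows})


def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`.

    A value that was not seen when fitting maps to the all-zero vector.
    """
    return [1 if value == category else 0 for category in categories]


# Toy rows with made-up values (not taken from train.csv).
rows = [
    {"cat1": "A", "cont1": 0.31, "loss": 2213.2},
    {"cat1": "B", "cont1": 0.72, "loss": 1283.6},
    {"cat1": "A", "cont1": 0.55, "loss": 3005.1},
    {"cat1": "C", "cont1": 0.18, "loss": 939.9},
    {"cat1": "B", "cont1": 0.64, "loss": 2763.9},
]

train, validation = train_validation_split(rows, validation_fraction=0.2, seed=42)

# Learn the category list on the training part only, then encode with it.
categories = fit_categories(train, "cat1")
encoded_train = [one_hot(row["cat1"], categories) for row in train]
encoded_validation = [one_hot(row["cat1"], categories) for row in validation]
```

Fitting the category list on the training rows and reusing it for the validation rows keeps the vector layout identical across both sets. Libraries such as pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) provide the same transformation for real data.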
Expectation
• Make predictions:
  • XGBoost
• Current ranking:
  • 50%
• Expected ranking:
  • 30%?
Future Work
• Feature engineering
  • Use domain knowledge of the data to create features that make machine learning algorithms work
• Evaluation, prediction and analysis
  • Linear Regression (Linear algo)
  • LASSO Linear Regression (Linear algo)
  • KNN (Non-linear algo)
  • SVM (Non-linear algo)
  • Random Forest (Bagging)
  • AdaBoost (Boosting)
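One possible way to compare the models listed above is to fit each on the training part and score it on the validation part. The sketch below uses scikit-learn regressors on synthetic stand-in data; the data, the hyperparameters, and the choice of mean absolute error as the score are assumptions for illustration, since the slides do not specify them.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the Allstate data): 200 rows, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One candidate per bullet; hyperparameters are arbitrary placeholders.
models = {
    "Linear Regression": LinearRegression(),
    "LASSO": Lasso(alpha=0.1),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "SVM": SVR(kernel="rbf"),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=0),
}


def validation_mae(model):
    """Fit on the training split and return the MAE on the validation split."""
    model.fit(X_train, y_train)
    return mean_absolute_error(y_val, model.predict(X_val))


scores = {name: validation_mae(model) for name, model in models.items()}

# Print the candidates from best (lowest error) to worst.
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.3f}")
```

Scoring every candidate on the same held-out split makes the numbers directly comparable; the ranking on synthetic data says nothing about how the models would rank on the competition data.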