March Data Crunch Madness Team Coach K. Yi Chun Chien, Xiayu Zeng, Feifei Chen, Xiaoshan Jin March 2015
Introduction
❖ Background: The NCAA Men’s Basketball Tournament is a single-elimination tournament, currently featuring 68 college teams.
❖ Objective: Develop a predictive model that examines the factors contributing to a team’s performance, based on data from 2001-2014.
❖ Result: The model indicates that box score statistics have a large effect on a team’s 2015 results, which helps predict:
➢ Win/Lose
➢ Winning Probability
➢ Sweet Sixteen
Independent & Dependent Variables
Independent Variables:
➢ Box Score: Assists, Steals, Blocked Shots, 2/3-Point Field Goal %, Free Throw %, Tempo
➢ Seed: Seed #, whether the team is Top 5, whether the team is 15/16
➢ Location: Latitude, Longitude, Distance Difference
Dependent Variable: Win/Lose
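The seed and location variables above could be derived roughly as follows. This is a minimal sketch, not the team's actual preparation step: the function names are hypothetical, "Top 5" is assumed to mean a seed of 5 or better, and "Distance Difference" is assumed to mean the team's distance to the venue minus the opponent's.

```python
import math


def seed_flags(seed):
    """Derive the seed indicators (assumes 'Top 5' means seed <= 5)."""
    return {
        "seed": seed,
        "is_top5": int(seed <= 5),
        "is_15_or_16": int(seed in (15, 16)),
    }


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    radius = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def distance_difference(team, opponent, venue):
    """Assumed definition: team-to-venue distance minus opponent-to-venue
    distance. A negative value means the team is closer to the venue."""
    return haversine_miles(*team, *venue) - haversine_miles(*opponent, *venue)
```

Each argument to distance_difference is a (latitude, longitude) pair in degrees.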
Analysis Process
● Define Data Group: Training (80%), Validation (20%), Testing (2015)
● Build 5 Models: Decision Tree, Bootstrap Forest, Boosted Tree, Neural Network, Nominal Logistic
● Performance Validation: Accuracy, ROC Curve, AUC, RMSE
● Prediction: Probability, Win/Lose, Top 16
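The 80/20 data split and the validation metrics named in the analysis process can be sketched in plain Python. This is an illustration under stated assumptions, not the tool output used for the slides: the function names, the random seed, and the 0.5 classification threshold are all hypothetical choices.

```python
import math
import random


def split_80_20(rows, seed=0):
    """Shuffle rows and split them into 80% training and 20% validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]


def accuracy(y_true, p_win, threshold=0.5):
    """Share of games where the thresholded probability matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_win))
    return hits / len(y_true)


def rmse(y_true, p_win):
    """Root mean squared error between 0/1 outcomes and predicted probabilities."""
    total = sum((y - p) ** 2 for y, p in zip(y_true, p_win))
    return math.sqrt(total / len(y_true))


def auc(y_true, p_win):
    """Area under the ROC curve: the share of (win, loss) pairs in which the
    win received the higher predicted probability (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, p_win) if y == 1]
    neg = [p for y, p in zip(y_true, p_win) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```

Here y_true holds 1 for a win and 0 for a loss, and p_win holds a model's predicted winning probabilities for the same games.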
Distribution and Correlation
● Distribution Review: Most variables are normally distributed
● Scatter Matrix: Few variables are linearly correlated
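A scatter matrix shows linear correlation visually; the same check can be made numerically with the Pearson correlation coefficient. The sketch below is a generic illustration with a hypothetical function name, not the procedure used for the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Values near +1 or -1 indicate a strong linear relationship between two variables, and values near 0 indicate little or none.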
Performance Validation
● Performance of the 5 models (training and validation)
● ROC Curve for Validation
● Nominal Logistic Regression has the best performance
● Nominal Logistic Regression Accuracy: 72%
Prediction
● 2015 Forecast Top 16 team
● 2015 Forecast Result: 73% accuracy (Forecast Result Correct: 73%, Forecast Result Incorrect: 27%)

Result   Lose   Win
Lose        6     6
Win         5    24
Total      11    30
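The 73% figure follows from the Lose/Win counts in the result table: the matching cells (Lose/Lose and Win/Win) are the correct forecasts. A short check, with hypothetical variable names:

```python
# Counts copied from the result table: outer key is the row label,
# inner key is the column label.
table = {
    "Lose": {"Lose": 6, "Win": 6},
    "Win": {"Lose": 5, "Win": 24},
}

# Correct forecasts sit on the diagonal, where row and column labels agree.
correct = sum(table[label][label] for label in table)
total = sum(sum(row.values()) for row in table.values())
accuracy = correct / total

print(f"{correct} of {total} correct: {accuracy:.0%}")
```

This gives 30 correct forecasts out of 41 games, and the remaining 11 games account for the 27% incorrect share.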
Model Explanation
● Based on individual p-values, the most important attributes are defensive efficiency, offensive efficiency, opponent’s blocked shots and assists
● In our analysis, good offensive efficiency contributes more to a team’s success than defensive efficiency
● The closer a team is to the stadium, the better it performs
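One common way to obtain an individual p-value for a logistic regression term is a Wald test on its coefficient. The slides do not state which test was used, so the sketch below is an assumption, and the coefficients and standard errors in it are made-up placeholders rather than values from the model.

```python
import math


def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coef = 0, using the normal approximation."""
    z = coef / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def rank_by_p_value(terms):
    """Sort attribute names from most to least significant (smallest p first)."""
    p_values = {name: wald_p_value(c, se) for name, (c, se) in terms.items()}
    return sorted(p_values, key=p_values.get)


# Hypothetical (coefficient, standard error) pairs, for illustration only.
example_terms = {
    "attribute_a": (0.90, 0.20),
    "attribute_b": (0.10, 0.30),
    "attribute_c": (-0.50, 0.20),
}
ranking = rank_by_p_value(example_terms)
```

A smaller p-value means stronger evidence that the attribute's coefficient differs from zero.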
Interesting Analysis
● The average score difference is narrowing
● The score pattern for the Top 5 seeds is less volatile than that for the bottom 2 seeds
● 9 out of 16 are predicted correctly
● Only Georgetown shows a declining pattern of winning probability
Result and Conclusion
❖ Whether a team wins or loses is positively related to four primary factors:
➢ offensive efficiency
➢ defensive efficiency
➢ blocked shots
➢ assists
❖ Accuracy: Our model is 72.19% accurate in predicting a team’s result for 2015.