Pitcher Cluster Analysis in Major League Baseball

Dr. John Harris, Dr. Tom Lewis, Jamey McDowell, Ian McConnell

Furman Engaged Pres

Feb 14, 2017

James McDowell

Issue:

When predicting an individual batter’s performance against an individual pitcher, we frequently find that the batter has not faced that pitcher often enough to produce a meaningful sample size.

For example, Freddie Freeman has gone 2 for 10 against Cole Hamels this season; is this really an accurate predictor of how well he will perform against Hamels in his next at-bat?
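To show how little 10 trials can tell us, the sketch below computes a 95% Wilson score interval for a 2-for-10 record. This is our own illustration and was not part of the original analysis. It assumes the 10 trials are independent and share a single fixed success rate.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials ** 2)) / denom
    return center - half_width, center + half_width

# Freeman's 2-for-10 record against Hamels, taken from the slide
low, high = wilson_interval(2, 10)
print(f"{low:.3f} to {high:.3f}")
```

For 2 for 10 the interval runs from roughly .06 to .51. That range is compatible with nearly any realistic true rate, which is the sample-size problem this slide describes.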

Goal

We wanted to increase the sample size of batter-pitcher “matchups” in order to better predict future interactions between specific players

To accomplish this, we used clustering algorithms to group pitchers with similar styles

Hypothesis

By seeing how well a batter does against a cluster of same-style pitchers, we can better predict how he will fare against any particular pitcher in that cluster

Data Sources

Sean Forman, founder of Baseball-Reference.com

Baseball-Reference.com

Brooks Baseball

Sample Data Sheet:

Metrics Analyzed

List of every plate appearance of the 2014 regular season, sorted by date

Pitch type statistics by pitcher

Pitcher style

Batter hand (L/R/Switch)

Data Exclusions

Batters who batted only in the second half of the season

Batters who only sacrifice bunted in the first half
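One way to apply both exclusions, sketched under the assumption that each plate appearance is a simple record with a batter ID, a season half, and a sacrifice-bunt flag. The `PlateAppearance` type and its field names are hypothetical. A batter is kept only if he has at least one first-half plate appearance that was not a sacrifice bunt, and that single rule covers both bullets. Our reading of the rationale is that such batters have no usable first-half baseline, because sacrifice bunts do not count toward OBP.

```python
from dataclasses import dataclass

@dataclass
class PlateAppearance:
    batter_id: str   # hypothetical field names, for illustration only
    half: str        # "first" or "second"
    sac_bunt: bool   # True if the plate appearance was a sacrifice bunt

def eligible_batters(pas: list[PlateAppearance]) -> set[str]:
    """Batters with at least one first-half PA that was not a sacrifice bunt."""
    return {pa.batter_id for pa in pas if pa.half == "first" and not pa.sac_bunt}

def apply_exclusions(pas: list[PlateAppearance]) -> list[PlateAppearance]:
    """Drop every plate appearance belonging to an excluded batter."""
    keep = eligible_batters(pas)
    return [pa for pa in pas if pa.batter_id in keep]

sample = [
    PlateAppearance("A", "first", False),
    PlateAppearance("A", "second", False),
    PlateAppearance("B", "second", False),  # batted only in the second half
    PlateAppearance("C", "first", True),    # only sacrifice bunted in the first half
    PlateAppearance("C", "second", False),
]
print(sorted(eligible_batters(sample)))  # ['A']
```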

Clustering Methods in Use

K-means: make initial guesses at k cluster centers, assign each observation to its nearest center, then move each center to the mean of the observations assigned to it; repeat until the assignments stop changing

Decision Tree Analysis (CART): let the computer choose which pitcher characteristics most strongly affect opponents' OBP (minimum cluster size of 50)
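The k-means description above is Lloyd's algorithm. The following is a minimal pure-Python sketch of it. We assume each pitcher has already been reduced to a numeric feature vector, for example standardized pitch-independent stats. Seeding with the first k points is our own simplification and not necessarily what the authors did. In practice a library implementation with several random restarts would be the usual choice.

```python
import math

Vector = list[float]

def kmeans(points: list[Vector], k: int, max_iter: int = 100) -> tuple[list[Vector], list[int]]:
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]  # naive seeding: the first k points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

pitchers = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]  # toy feature vectors
centers, labels = kmeans(pitchers, k=2)
print(labels)  # [0, 0, 1, 1]
```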

Clustering Pitchers by...

Pitch-independent stats (strike percentage, GB/FB ratio, etc.)

L-R

Batter performance against pitcher

Pitch similarity (e.g., fastballs thrown alike)

Pitch-Independent Stats, K-means

K = 17

Large clusters are evenly sized; small clusters have obvious explanations

Pitch-Independent Stats, CART

8 clusters (leaves)

Splits chosen to best separate pitchers by opponents' OBP

Important factors:

Strike percentage

Strikeout percentage

Velocity difference between top two pitches

Pitches per plate appearance
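CART grows the tree one split at a time, each time choosing the split that most reduces the spread of the target, which here is OBP against. The sketch below shows one such step for a single numeric feature. It tries every threshold and keeps the one that minimizes the total squared error of the two children, subject to a minimum leaf size. The slides use a minimum of 50; the toy data uses 2. This is a simplified, unweighted illustration of a regression-tree split and not the authors' code. A full tree repeats the step over every feature and recurses into each child.

```python
from typing import Optional

def sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(feature: list[float], obp: list[float], min_leaf: int) -> Optional[tuple[float, float]]:
    """Return (threshold, total SSE) for the best split 'feature <= threshold', or None."""
    pairs = sorted(zip(feature, obp))
    best = None
    # Each child must keep at least min_leaf pitchers.
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # can only split between distinct feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, cost)
    return best

strike_pct = [60.0, 62.0, 64.0, 70.0, 72.0, 74.0]         # toy values, not real data
obp_against = [0.340, 0.335, 0.345, 0.300, 0.295, 0.305]
print(best_split(strike_pct, obp_against, min_leaf=2))
```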

Batter Performance Against Pitcher

8 clusters

Pitchers are placed in the same cluster if the same batters perform similarly against them

Clustering Batters

When we treat batters as individuals, they do not have enough plate appearances against each pitcher cluster to make accurate predictions

We solve this by treating all left-handed batters as a single “batter,” and doing the same for right-handed and switch hitters

Method

Using a given clustering method, assign each pitcher to a numbered cluster

For each pitcher cluster and batter type, total the plate appearances and times on base in the first half of the season
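Here is a sketch of this compilation step. It assumes each first-half plate appearance has already been reduced to a (batter hand, pitcher ID, reached base) tuple, and that cluster assignments come from a dict mapping pitcher ID to cluster number. Both representations are hypothetical. Collapsing batters to "L", "R", or "S" is what pools all left-handed hitters into one “batter,” as on the Clustering Batters slide. Every plate appearance is counted as an opportunity, which matches N on the Minimum Variance Test slide.

```python
from collections import defaultdict

def compile_cells(
    pas: list[tuple[str, str, bool]], cluster_of: dict[str, int]
) -> dict[tuple[str, int], tuple[int, int]]:
    """Total (times on base, plate appearances) per (batter hand, pitcher cluster)."""
    cells = defaultdict(lambda: [0, 0])
    for hand, pitcher_id, on_base in pas:
        cell = cells[(hand, cluster_of[pitcher_id])]
        cell[0] += int(on_base)
        cell[1] += 1
    return {key: (on, n) for key, (on, n) in cells.items()}

def predicted_obp(cells: dict[tuple[str, int], tuple[int, int]]) -> dict[tuple[str, int], float]:
    """First-half on-base rate per cell, used as the second-half prediction."""
    return {key: on / n for key, (on, n) in cells.items()}

cluster_of = {"p1": 3, "p2": 3, "p3": 5}           # hypothetical assignments
first_half = [("L", "p1", True), ("L", "p2", False), ("L", "p3", False), ("R", "p1", True)]
cells = compile_cells(first_half, cluster_of)
print(predicted_obp(cells))
```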

Example

Left-handed batters reached base 200 times in 860 plate appearances against cluster 3 in the first half

This first-half rate is the predicted performance of LHBs against cluster 3 in the second half of 2014
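Read “200 for 860” as 200 times on base (S) in 860 plate appearances (N). The prediction is then just the first-half on-base rate:

```latex
\hat{p} = \frac{S}{N} = \frac{200}{860} \approx 0.233
```

So left-handed batters would be expected to post an OBP of roughly .233 against cluster 3 in the second half.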

Method (cont.)

Run the same compilation on the second half of the season

We test the accuracy of our prediction with a minimum variance test

Minimum Variance Test

$$\sum_{i=1}^{N} (x_i - p)^2 = S(1 - 2p) + Np^2$$

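x_i = 1 if the batter reached base in plate appearance i, 0 otherwise

S = number of times on base

N = number of opportunities (plate appearances)

p = predicted OBP against the cluster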
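The identity holds because each x_i is 0 or 1, so x_i² = x_i. Expanding the sum gives Σx_i² − 2pΣx_i + Np² = S − 2pS + Np², which equals S(1 − 2p) + Np². The sketch below evaluates the score both ways on hypothetical second-half outcomes and compares two candidate predictions; a lower score means a closer prediction. The slides do not say how scores were combined across batter types and clusters. Summing them over every cell would be one natural choice, but that is our assumption.

```python
def squared_error_direct(outcomes: list[int], p: float) -> float:
    """Sum of (x_i - p)^2 over plate appearances; x_i is 1 if on base, else 0."""
    return sum((x - p) ** 2 for x in outcomes)

def squared_error_closed_form(times_on_base: int, opportunities: int, p: float) -> float:
    """Same quantity computed as S(1 - 2p) + N p^2."""
    return times_on_base * (1 - 2 * p) + opportunities * p ** 2

outcomes = [1, 0, 0, 1, 0]      # hypothetical second-half results
S, N = sum(outcomes), len(outcomes)
for p in (0.300, 0.250):        # two hypothetical predictions to compare
    print(p, squared_error_direct(outcomes, p), squared_error_closed_form(S, N, p))
```

The score is smallest when p equals the realized rate S/N, so a prediction closer to what actually happened scores lower.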

Results

Hypothesis confirmed: every clustering method we tested predicted the second half of the season better than career history did

Clustering methods that also clustered batters were the only ones to “beat” a prediction based on the batter’s first-half OBP

Conclusions

Statistics chosen by the computer (the decision tree) best separate pitchers into clusters

Sample sizes are large enough for accurate prediction when all batters are grouped into three clusters (left-handed, right-handed, and switch hitters)

Application

In-game decisions

Pre-game decisions

Future Work

Test on other years

Cluster batters in ways other than handedness

Probit Modeling