Does being on a more experienced team influence or increase the performance of rookie players in Major League Baseball?
Adam Rothstein Jesse Cox
May 3, 2016
Abstract
This paper examines the effect of the experience of a new MLB player’s team on the player's
performance in subsequent years. Using MLB player and team statistics from the 2013 and 2014
seasons, we took a sample of rookie players entering the league no earlier than 2011 and used a
simple multiple regression and a logistic regression to estimate both the size and the probability
of an improvement in skill, as measured by each player's batting average. We find that our models
have little explanatory power for this question. A bigger, more comprehensive data set would let
us exclude observations that skew the results, such as negative changes in batting average.
1 Introduction/Literature Review
Drafting players is a highlight of the preseason for loyal followers of any major league sport,
and the draft carries a lot of weight for how the team may perform that year. Being able to pick
players who are not only top performers in their league now but also have the potential to become
even better is pivotal. But what about the current performance of the team a player joins? We
speculate that a player's improvement is not entirely up to the player and their own potential,
but may also depend on the experience and notoriety of the team around them.
We know that when a player joins the MLB, their batting average is lower than it will be at
their career high (Sommers). Sommers finds that a player’s batting average increases from the
point they join the MLB until it reaches a career maximum, after which it slowly declines, and
that this process takes 7 to 9 years. Sommers’s study took into account the possibility of
injury and the minimum number of at-bats, both of which tend to skew these results. Our study
focuses on how a player’s team can influence their improvement, or more precisely, the rate at
which they improve.
Horowitz finds that MLB team owners want teams to be evenly matched, because close wins and
losses, or rather the potential for a close game, drive ticket sales, which are a major source of
profit for them (Horowitz). The study also finds that talent disparities have been much lower in
recent years (Horowitz). Finally, it concludes that competition did not drive performance to its
peak (Horowitz). We disagree with this conclusion: even though talent disparities were found to be
smaller than in previous years, we see the same teams making it to the playoffs year after year.
We also disagree with the finding that competition does not drive improvement. We theorize that a
more experienced team will have a positive spillover effect on new players, that the drive to
perform up to par will be higher on such a team, and that this will cause new players to improve
at a faster rate.
2 Data
We used MLB statistics from baseball-reference.com. We took a sample of players from the 2013
season who were playing their first year in the MLB. From these players, we excluded everyone over
the age of 25 to better capture younger players who still have more room for improvement. We used
each player's batting average for that season as their starting batting average. Once we had
narrowed our list to these players, we found them on the 2014 rosters and pulled their batting
averages for 2014.
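The sample construction above amounts to three filters: first MLB season in 2013, age 25 or younger, and present on a 2014 roster. A minimal sketch, assuming each player is stored as a record with hypothetical fields (`debut_year`, `age_2013`, `ba_2013`, `ba_2014`) that are not an actual baseball-reference.com export format; the example players and their numbers are made up:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Player:
    # Hypothetical record layout for illustration; field names are assumptions.
    name: str
    debut_year: int            # season of the player's first MLB appearance
    age_2013: int              # age during the 2013 season
    ba_2013: float             # starting batting average
    ba_2014: Optional[float]   # None if the player is not on a 2014 roster


def build_sample(players: List[Player], max_age: int = 25) -> List[Player]:
    """Keep 2013 rookies aged max_age or younger who also appear in 2014."""
    return [
        p for p in players
        if p.debut_year == 2013        # first year in the MLB was 2013
        and p.age_2013 <= max_age      # drop players over 25
        and p.ba_2014 is not None      # must be found on a 2014 roster
    ]


# Made-up players, not data from the paper.
players = [
    Player("A", 2013, 23, 0.240, 0.262),
    Player("B", 2013, 27, 0.251, 0.270),  # excluded: over 25
    Player("C", 2012, 24, 0.230, 0.245),  # excluded: not a 2013 rookie
    Player("D", 2013, 22, 0.210, None),   # excluded: not on a 2014 roster
]
sample = build_sample(players)
print([p.name for p in sample])
```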
All 30 MLB teams are represented within our sample of players, which gives a lot of variability
between teams. We therefore expected the teams to differ considerably in batting averages, runs,
playoff wins, and World Series wins. We used data from the 162 regular-season games that every
team played in 2014. In addition to these 2014 regular-season games, we used data from the 2010,
2011, 2012, 2013, and 2014 playoff and World Series games to try to capture the notoriety, or
longer-term experience, that a team has to offer.
2.1 Dependent Variables
We used two main dependent variables in our modeling and analysis. We began with a single
dependent variable, Percentage Change in BA, which is the change in batting average for our
sample of rookie players from 2013 to 2014. We soon saw that not every player's batting average
increased, as we had assumed based on our predictions and on the literature on how batting
average changes over time. To try to correct for this, we created a dummy variable, Positive
Change, indicating whether the change in batting average was positive or negative. This left us
with two dependent variables for our modeling.
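The first variable is named Percentage Change in BA but is described as the raw change from 2013 to 2014, so the sketch below computes both, along with the Positive Change dummy. The function name and the choice to code an unchanged average as 0 are assumptions, and the batting averages are made up:

```python
from typing import Optional, Tuple


def dependent_variables(ba_2013: float, ba_2014: float) -> Tuple[float, Optional[float], int]:
    """Return (change in BA, percent change in BA, Positive Change dummy)."""
    change = ba_2014 - ba_2013
    # Percent change is undefined when the starting average is zero.
    pct_change = 100.0 * change / ba_2013 if ba_2013 > 0 else None
    # Assumption: an unchanged average counts as not positive (dummy = 0).
    positive_change = 1 if change > 0 else 0
    return change, pct_change, positive_change


# Made-up batting averages for illustration only.
print(dependent_variables(0.250, 0.275))  # an improving player
print(dependent_variables(0.260, 0.240))  # a declining player
```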
2.2 Explanatory Variables
Average Batting Age on Team: The average batting age on each team is the average age of the
players who bat for that team. We think this will be significant because older players have,
presumably, been in the league longer and therefore have more experience in their profession.
This experience in turn should translate into greater improvement in individual performance. The
mean batter age for MLB teams in 2014 was about 28, with a standard deviation of 1.3. The range is
from 25 to about 30, which gives a reasonable spread for possibly explaining some of the change in
player performance (Figure 1).
Team Batting Average: This variable is the batting average of the player’s 2014 MLB team. We
realize that the batting averages of individual players feed into this variable, but each team
has enough players that we feel the small resulting correlation is not enough to throw off our
results. The mean team batting average for MLB teams in 2014 was 28.1, with a relatively low
standard deviation of only about 1.3, and the values span a range of only about 8 (Figure 1).
Team Average Runs per Game: This variable is the average number of runs per game that the MLB
team scored over the 2014 regular season. Our thinking is that a team that scores more runs per
game will consist of better players and have a higher batting average overall. The summary
statistics for this variable show a mean of 4 with a standard deviation of only .3, so with so
little variation it does not seem likely to be very significant (Figure 1).
Team Total Season Runs 2014: This variable is the total number of runs each team scored in the
2014 season. For the same reason we thought average runs per game would be significant, we
thought this variable would be too. While average runs per game had a relatively narrow spread,
total season runs had a mean of about 616 and a standard deviation of 52, a much larger spread in
absolute terms than runs per game (Figure 1).
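If Team Total Season Runs 2014 counts only the 162 regular-season games (an assumption; the text does not say), it is simply runs per game multiplied by 162. Rescaling the reported summary statistics gives a quick consistency check:

```python
GAMES_PER_SEASON = 162  # regular-season games per team (assumed to be all that total runs covers)

# Summary statistics reported for Team Total Season Runs 2014.
mean_total_runs = 616
sd_total_runs = 52

# Dividing every value by a constant divides the mean and the standard
# deviation by that same constant.
implied_mean_rpg = mean_total_runs / GAMES_PER_SEASON
implied_sd_rpg = sd_total_runs / GAMES_PER_SEASON

# The coefficient of variation (sd / mean) does not change under rescaling.
cv_total = sd_total_runs / mean_total_runs
cv_rpg = implied_sd_rpg / implied_mean_rpg

print(f"implied runs per game: mean {implied_mean_rpg:.2f}, sd {implied_sd_rpg:.2f}")
print(f"coefficient of variation: {cv_total:.3f} vs {cv_rpg:.3f}")
```

The implied mean of about 3.80 and standard deviation of about 0.32 match the reported 4 and .3 for runs per game after rounding, and the relative spread is identical. Under this assumption, the larger standard deviation of total runs reflects the change of units rather than extra variation between teams.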
Team Rank: The team rank variable is a numerical value from 1 to 5 that corresponds to each
team’s ranking within its division. There are five teams per division and six divisions in all.
The average rank was about three, close to the expected value of 3 for an even spread; since rank
1 is the top of a division, an average noticeably better (lower) than 3 would have been indicative
of better teams recruiting more rookie players (in 2013 at least) (Figure 1).
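As a check on the expected value, ignoring ties in the standings: each of the six divisions has exactly one team at each rank from 1 to 5, so the average over all 30 teams is fixed at 3, and a sample average can differ from 3 only because teams contribute different numbers of rookies. Writing $n_t$ for the (hypothetical notation) number of sampled rookies on team $t$ and $r_t$ for that team's rank:

```latex
\bar{r}_{\text{all teams}} = \frac{6\,(1+2+3+4+5)}{30} = \frac{90}{30} = 3,
\qquad
\bar{r}_{\text{sample}} = \frac{\sum_{t=1}^{30} n_t\, r_t}{\sum_{t=1}^{30} n_t}
```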
Team Win Percentage: The team’s win percentage is calculated by dividing the team’s number of
regular-season wins by 162 (the number of regular-season games each team plays per year). With a
spread of 5% and a mean of 48%, we expect the variation to be large enough for this variable to be
significant (Figure 1).
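A minimal sketch of the win-percentage formula, dividing by the 162 regular-season games; the function name and the input check are ours:

```python
REGULAR_SEASON_GAMES = 162


def win_percentage(wins: int, games: int = REGULAR_SEASON_GAMES) -> float:
    """Fraction of regular-season games won, between 0.0 and 1.0."""
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and the number of games played")
    return wins / games


print(round(win_percentage(81), 3))  # a .500 team: 0.5
print(round(win_percentage(78), 3))  # close to the 48% sample mean: 0.481
```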
Post Season Performance: The best teams in the league play in the postseason and have a chance
to compete in the World Series. Better teams often make multiple, consecutive postseason
appearances, which suggests a certain notoriety associated with some teams, so we thought this
would be a very interesting variable to include. To capture this possible effect, we created a
variable measuring postseason performance. If a team ended the regular season without making the
playoffs, it received a value of 1. If a team made the playoffs but was eliminated before the
World Series, it received a value of 2. If a team made a World Series appearance but lost, it
received a value of 3. Finally, the team that won the World Series received a value of 4. We
looked at the five most recent years (2010, 2011, 2012, 2013, and 2014) and summed each team's
values into an aggregate total. A team that never made the playoffs, and thus had the lowest score
every year, would have a value of 5. A team that won the World Series all five years would have a
value of 20. Of course, no team won all five of these years, so the observed range for this
variable is from 5 to 14, with a standard deviation of 2 (Figure 1).
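The postseason coding above can be written as a lookup table summed over the five seasons. A minimal sketch; the outcome labels are hypothetical names for the four cases in the text, and the example history is made up:

```python
from typing import Sequence

# Per-season codes from the text; the outcome labels are hypothetical names.
POSTSEASON_CODES = {
    "missed_playoffs": 1,     # no postseason appearance
    "lost_in_playoffs": 2,    # made the playoffs, eliminated before the World Series
    "lost_world_series": 3,   # reached the World Series but lost
    "won_world_series": 4,    # won the World Series
}

SEASONS = (2010, 2011, 2012, 2013, 2014)


def postseason_score(outcomes: Sequence[str]) -> int:
    """Sum the per-season codes over the five seasons, giving a value from 5 to 20."""
    if len(outcomes) != len(SEASONS):
        raise ValueError(f"expected one outcome per season {SEASONS}")
    return sum(POSTSEASON_CODES[o] for o in outcomes)


# Made-up five-season history, one outcome per season from 2010 to 2014.
history = ["lost_in_playoffs", "won_world_series", "missed_playoffs",
           "missed_playoffs", "lost_world_series"]
print(postseason_score(history))  # 2 + 4 + 1 + 1 + 3 = 11
```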
3 Problems with Data
Once we computed summary statistics for the data we compiled, we found that the change in a
player’s batting average was not always positive (Figure 1). This makes sense given the fairly
narrow range and low standard deviation of batting averages in general: not every player will
improve dramatically and consistently over their first years. To try to correct for this, we used
the Positive Change dummy, which separates players whose batting average rose from those whose
batting average fell, as the dependent variable in a logistic regression. We felt that doing this
would let us capture the probability of an increase, and its relationship with our explanatory
variables, in a way that stands apart from a basic linear regression.
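A logistic regression models the probability that Positive Change equals 1 as a logistic function of the explanatory variables, so it needs observations with both values of the dummy. The sketch below fits one by batch gradient ascent on the log-likelihood in plain Python; it is a stand-in for whatever statistical package the authors used, and the toy data are made up:

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit P(y = 1 | x) = sigmoid(w0 + w1*x1 + ... + wk*xk).

    Uses batch gradient ascent on the average log-likelihood.
    Returns [w0, w1, ..., wk], where w0 is the intercept.
    """
    if len(set(y)) < 2:
        raise ValueError("y must contain both 0 and 1 to fit a logistic regression")
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            residual = yi - p  # derivative of the log-likelihood w.r.t. the linear index
            grad[0] += residual
            for j in range(k):
                grad[j + 1] += residual * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability that Positive Change = 1 for one player."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Made-up toy data: one standardized explanatory variable and the Positive Change dummy.
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print(w)
print(predict_proba(w, [1.0]) > predict_proba(w, [-1.0]))
```

With real team variables (ages near 28, total runs near 616), standardizing each column first keeps gradient ascent stable; a package such as statsmodels would also report standard errors, which this sketch does not.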
3.1 Experiment
We used two econometric models in our study to determine the effect of team experience on
individual player development. The first was a simple multiple regression of our first dependent
variable, Percentage Change in BA, on our explanatory variables.
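Written out, and assuming all seven explanatory variables from Section 2.2 enter linearly (the symbols are ours, not the authors'), the first model for player $i$ on 2014 team $t(i)$ would be, with $\Delta BA_i$ standing for Percentage Change in BA:

```latex
\Delta BA_i = \beta_0 + \beta_1\,\mathit{Age}_{t(i)} + \beta_2\,\mathit{TeamBA}_{t(i)}
  + \beta_3\,\mathit{RPG}_{t(i)} + \beta_4\,\mathit{Runs}_{t(i)} + \beta_5\,\mathit{Rank}_{t(i)}
  + \beta_6\,\mathit{WinPct}_{t(i)} + \beta_7\,\mathit{Post}_{t(i)} + \varepsilon_i
```

If total season runs were exactly 162 times runs per game, the $\mathit{RPG}$ and $\mathit{Runs}$ columns would be perfectly collinear, and ordinary least squares could not estimate $\beta_3$ and $\beta_4$ separately; one of the two would need to be dropped.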