8/14/2019 Bowei Zhang Scholarly Potential_ Goodyear Project
2009
Written by: Bowei Zhang
Proofread by: Steven Miller, Steven Subichin
09/30/2009
Last Revision Date: 11/24/2009
Table of Contents
PROJECT INTRODUCTION
12 Months Rolling Sum and Lagged Leading Indicators
Correlation Verified To Be Linear
Market Share Forecast
Modeling Data Geography-US Models extended to include Canada
MODELING EFFORT 1-SIMPLE LINEAR REGRESSION MODELS
Modeling Assumptions and Limitations
How to Obtain Monthly Forecast from 12 Months Rolling Sum Forecasts
Seasonality-Removed By 12 Months Rolling Sum
Outliers-Strike Consideration
Outliers Reduction-Smooth Economic Leading Indicators
Model Utility and Residual Analysis
MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS
How to Build Multiple Linear Regression Models in Minitab
Reason for Not Using Multiple Regression Models Monthly Forecasts
MODELING EFFORT 3- TIME SERIES MODELS
DATA RANGE AND SOURCE
EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET
Common Tabs
Unique Tabs
Steps of Searching for New Leading Indicators
MODEL REFRESH AND UPDATE ISSUES
FILES LOCATION AND NAME
FUTURE LOOK
APPENDIX
Goodyear and Industry North America Commercial Replacement Tire
Causative Forecasting Models User
PROJECT INTRODUCTION
Lean inventory and efficient demand planning are two weapons that are especially important for any
business trying to survive a recession. Achieving both requires a demand forecasting system with a
relatively high level of accuracy. The aim of this project is to build forecasting models that reveal the
relationship between leading economic variables and Goodyear's business, so that by watching the trends
of those economic variables, Goodyear can anticipate the future highs and lows of its relevant business
segments.
Key members of the project include Steven Miller, Steven Subichin, Mike Ryan, Greg Tomsho and
Bowei Zhang.
This project focuses on the performance of Goodyear and of the total industry in the US/North America
commercial replacement tire markets. We split the commercial replacement tire market into four
different segments by tire application and wished to forecast the demand for each segment as well as the
total market as a whole.
The raw data for this project are:
- Monthly data for 74 leading economic variables (US) that may potentially relate to the commercial
  replacement tire market, from 01/1996 to 06/2009. (Multiple data sources)
- Industry's monthly shipment data for each segment of the US/North America commercial replacement
  tire market (Urban/Regional/Long Haul/Mixed Service), from 01/1996 to 06/2009. (Data source: RMA)
- Goodyear's monthly billed sales and shipment data for each segment of the US commercial replacement
  tire market (Urban/Regional/Long Haul/Mixed Service), from 01/2003 to 07/2009. (Data source: EDW)
One thing worth noting is that RMA's and Goodyear's classifications of the four market segments are
slightly different. We kept Goodyear's billed sales data for each market segment under Goodyear's own
market classification criteria, and regrouped Goodyear's shipment data using RMA's criteria. We did it
this way because the Goodyear billed sales forecast will be used to assist DP, which uses Goodyear's
market classification criteria, while Goodyear's shipment data will be used together with the Industry
shipment forecast, which uses RMA's criteria, to calculate Goodyear's future market share.
12 Months Rolling Sum and Lagged Leading Indicators
Initially we wished to find any potential linkage, linear or non-linear, between external economic
variables and the replacement tire business. To reduce the modeling noise caused by the relatively small
monthly billed sales and shipment values, and to identify their correlation with leading economic
variables more easily, we replaced each monthly tire sales/shipment data point with the sum of that
month's value and the values of the previous 11 months. Hereinafter this moving yearly total is called
the 12 months rolling sum. We calculated the correlation coefficients between the 12 months rolling sums
of billed sales/shipments of Goodyear and Industry for each market segment and the monthly data of the
74 economic variables. We assumed that some of the variables have leading capabilities for the
commercial replacement market. To test that, we simply lagged those variables by a certain number of
months when we calculated the correlation coefficients. For example, if we think it takes the
replacement tire market 2 months to respond to the movement of a leading indicator, then we use the
2 months lagged data of that variable to calculate the correlation coefficients. For Goodyear's billed
sales data, we calculated their correlation
coefficients with up to 24 months of lagged data for the 74 variables. The results appear to support
our assumption: some variables have high correlation coefficients when lagged by a short period and
others when lagged by a long period.
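The rolling sum and lagged correlation calculations above were done in Excel. As an illustration only,
here is a minimal Python/NumPy sketch of the same two steps, assuming month-aligned arrays; the function
names are hypothetical and not part of the project files.

```python
import numpy as np


def rolling_sum_12(monthly):
    """12 months rolling sum: entry t is the sum of months t-11..t.

    The first 11 entries have no complete 12-month window and are NaN.
    """
    monthly = np.asarray(monthly, dtype=float)
    out = np.full(monthly.shape, np.nan)
    if len(monthly) >= 12:
        csum = np.cumsum(monthly)
        out[11] = csum[11]
        out[12:] = csum[12:] - csum[:-12]  # drop the month that left the window
    return out


def lagged_correlation(target, indicator, lag):
    """Pearson correlation between target[t] and indicator[t - lag] (lag >= 0).

    Months where either series is missing (NaN) are ignored.
    """
    y = np.asarray(target, dtype=float)
    x = np.asarray(indicator, dtype=float)
    if lag > 0:
        # pair each target month with the indicator value `lag` months earlier
        y, x = y[lag:], x[:-lag]
    ok = ~(np.isnan(y) | np.isnan(x))
    return float(np.corrcoef(y[ok], x[ok])[0, 1])


def correlation_by_lag(target, indicator, max_lag=24):
    """Correlation for every lag from 0 to max_lag months."""
    return {lag: lagged_correlation(target, indicator, lag)
            for lag in range(max_lag + 1)}
```

Scanning `correlation_by_lag(rolling_sum_12(sales), indicator)` for each of the 74 variables mirrors the
search for short-lag and long-lag leading indicators described above.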
Correlation Verified To Be Linear
Although the correlation coefficient measures the strength of a linear relationship between two
variables, the interpretation of its value can be quite arbitrary. There is no set rule about which
values are high and which are low, and a high value does not necessarily mean a purely linear
relationship. So we also drew scatter plots to study the true relationship between tire sales/shipments
and the leading economic variables, and used the correlation coefficients as a secondary reference.
Using these two tools, we were able to identify regular relationship patterns between external variables
and sales/shipment data, by which we mean patterns that can mostly be captured by a simple mathematical
model. After careful consideration and comprehensive tests, we decided to build only simple linear
regression models (one leading indicator per market segment) for ease of understanding and use in
practice.
We have now built simple linear regression causative models, with some level of confidence, for
Goodyear's billed sales, Goodyear's shipments and Industry shipments, forecasting 2 months and 12 months
out for each of the four segments of the US only and North American commercial replacement tire markets.
For the Industry shipment models, we also built time series models to provide alternative views, and
they all achieved decent forecast accuracy. (Monthly forecast ex-post errors for the US only time series
models range from 7.75% for Urban tires to 22.24% for Mixed Service tires; monthly forecast ex-post
errors for the North America time series models range from 7.05% for Urban tires to 17.75% for Mixed
Service tires.)
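For illustration, a minimal Python/NumPy sketch of one simple linear regression causative model,
assuming the indicator's lag equals the forecast horizon (2 or 12 months). `fit_lagged_slr` and
`forecast_ahead` are hypothetical names, not functions from the project spreadsheets.

```python
import numpy as np


def fit_lagged_slr(rolling_sum, indicator, lag):
    """Least-squares fit of rolling_sum[t] = a + b * indicator[t - lag], lag >= 1."""
    y = np.asarray(rolling_sum, dtype=float)[lag:]
    x = np.asarray(indicator, dtype=float)[:-lag]
    ok = ~(np.isnan(y) | np.isnan(x))  # skip months with no complete rolling sum
    b, a = np.polyfit(x[ok], y[ok], deg=1)  # polyfit returns [slope, intercept]
    return a, b


def forecast_ahead(a, b, indicator, lag):
    """Rolling sum forecasts for the `lag` months after the last observed month.

    Because the indicator leads by `lag` months, its last `lag` observed values
    already determine the fitted rolling sums for months T+1 .. T+lag.
    """
    recent = np.asarray(indicator, dtype=float)[-lag:]
    return a + b * recent
```

This is why lagging the independent variable is what gives a causative model its forecasting horizon:
no forecast of the indicator itself is needed.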
Market Share Forecast
Since we applied the same market classification criteria (by vehicle application code) [1] to both
Goodyear and Industry shipment data when grouping them by market segment, we are able to forecast
(calculate) Goodyear's future market shares.
However, a word of caution: even though we regrouped Goodyear's shipment data using RMA's criteria,
there are still some differences between RMA's and Goodyear's definitions of certain market segments.
One piece of evidence is that Goodyear's regrouped shipment data (North America) for the Regional and
Long Haul segments differ significantly from RMA's adjustment and interpretation of the shipment data
reported by Goodyear. However, the differences in individual market segments can offset each other to a
certain extent. Also, due to a restatement issue, Goodyear's total shipment data in EDW differ from the
data sent back by RMA, by 7.5% on average during the period from 06/2007 to 04/2008.
These two facts mean that using different sources for Goodyear shipment data can lead to different
causative models. RMA's official Goodyear shipment data has its value for other analyses. However, we
chose to use EDW's Goodyear shipment data to build the causative models and calculate Goodyear's future
market share, because the modeling results from this project are intended for internal use only.
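$$
\begin{aligned}
f_{13} &= \left(\sum_{i=2}^{13} F_i^{(13)} - \sum_{i=1}^{12} F_i^{(12)}\right) + H_1 \\
       &= F_{13} + \sum_{i=2}^{12} F_i^{(13)} - \sum_{i=2}^{12} F_i^{(12)} - F_1 + H_1 \\
       &= H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i^{(13)} - \sum_{i=2}^{12} F_i^{(12)}\right)
\end{aligned}
$$
where $F_i^{(13)}$ and $F_i^{(12)}$ are the month-$i$ values implied by the rolling sum forecasts ending in
month 13 and month 12 respectively, $F_{13} = F_{13}^{(13)}$ and $F_1 = F_1^{(12)}$.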
$H_1$ is the true historical value for January of the first year. $(F_{13} - F_1)$ is treated as the
forecasted year-over-year monthly increase/decrease: in our example, the change from January of the
first year to January of the second year. We assume that the forecasted values for the same 11 months
(months 2 to 12) from the two rolling sum forecasts are almost the same, so that the artificial error
term $\left(\sum_{i=2}^{12} F_i^{(13)} - \sum_{i=2}^{12} F_i^{(12)}\right)$ is close to 0. If this
assumption does not hold, our forecasted monthly value will deviate from the true monthly forecast, which
we would like to obtain but cannot derive directly from 12 months rolling sum forecasts. This is likely to
happen to a monthly forecast when the absolute percentage forecast errors of the two related 12 months
rolling sum forecasts change dramatically, because that violates the assumption
$\sum_{i=2}^{12} F_i^{(13)} \approx \sum_{i=2}^{12} F_i^{(12)}$. Multiple regression models violate this
assumption more easily and thus generate inaccurate forecasts. More detailed discussion is given in the
section Multiple Linear Regression.
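The first form of the transformation only needs the two rolling sum forecasts and $H_1$, so it can be
applied to whole rolling sums without decomposing them into months. Below is a minimal Python/NumPy
sketch under that reading; `monthly_from_rolling` is a hypothetical helper, and both inputs are assumed
to be arrays aligned on the same months.

```python
import numpy as np


def monthly_from_rolling(rs_forecast, monthly_history):
    """Convert 12 months rolling sum forecasts into monthly forecasts.

    rs_forecast[t]     : forecast of the rolling sum ending in month t
    monthly_history[t] : actual monthly value of month t (only month t-12 is used)
    Returns f with f[t] = (rs_forecast[t] - rs_forecast[t-1]) + monthly_history[t-12]
    for t >= 12; earlier entries are NaN.
    """
    rs = np.asarray(rs_forecast, dtype=float)
    h = np.asarray(monthly_history, dtype=float)
    f = np.full(rs.shape, np.nan)
    # difference of consecutive rolling sums plus the same month one year earlier
    f[12:] = (rs[12:] - rs[11:-1]) + h[:-12]
    return f
```

With perfect rolling sum forecasts this returns the true monthly values. A constant bias in the rolling
sum forecasts cancels in the difference; only month-to-month changes in rolling sum forecast error reach
the monthly forecast, which is exactly the violation of the approximate-equality assumption above.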
Seasonality-Removed By 12 Months Rolling Sum
One benefit of using the 12 months rolling sum history as the dependent variable of the linear
regression models is that the data have no seasonality. Appendix 2 is a comparison plot of monthly and
12 months rolling sum Industry commercial replacement tire shipment data. As can be seen, the monthly
data are more volatile and show some seasonality across the history. The 12 months rolling sum, on the
other hand, has no seasonal pattern at all (this applies to every market segment in our analysis).
Over the long term, however, the 12 months rolling sum may show a regular business cycle pattern, which
can be treated as a kind of seasonality when building time series models. This topic is covered in more
detail later.
Normally, for statistical modeling purposes, if the raw data have strong seasonality, we would have to
deseasonalize them first, then build the model, and finally reseasonalize the forecast. In our models,
the transformation formula introduced above,
$f_{13} = H_1 + (F_{13} - F_1) + \left(\sum_{i=2}^{12} F_i^{(13)} - \sum_{i=2}^{12} F_i^{(12)}\right)$,
adds seasonality back to the monthly forecasts through the monthly history term ($H_1$ in the formula).
Hence, by using this formula, we avoid the seasonality issue in the raw data while keeping seasonality in
the monthly forecasts. The leading indicators we picked for this project are all free of seasonality.
However, if in the future we want to bring in new economic variables with a seasonal pattern, we will
have to deseasonalize them before use.
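If a seasonal indicator is ever added, one common option (our assumption, not something tested in this
project) is classical multiplicative decomposition: estimate a seasonal index for each calendar month
from the ratio of the data to a centered 12-month moving average, then divide each month by its index.
A minimal sketch with hypothetical function names:

```python
import numpy as np


def seasonal_indices(monthly, start_month=0):
    """Classical multiplicative seasonal indices (ratio to a centered 12-month moving average).

    start_month is the calendar month of the first observation (0 = January).
    Needs at least two years of data so every calendar month gets a ratio.
    """
    y = np.asarray(monthly, dtype=float)
    n = len(y)
    # centered 2x12 moving average: half weight on the two end months
    w = np.r_[0.5, np.ones(11), 0.5] / 12.0
    cma = np.full(n, np.nan)
    cma[6:n - 6] = np.convolve(y, w, mode="valid")
    ratios = y / cma
    months = (np.arange(n) + start_month) % 12
    idx = np.array([np.nanmean(ratios[months == m]) for m in range(12)])
    return idx * 12.0 / idx.sum()  # normalize so the 12 indices average to 1


def deseasonalize(monthly, start_month=0):
    """Divide each month by its seasonal index."""
    y = np.asarray(monthly, dtype=float)
    idx = seasonal_indices(y, start_month)
    months = (np.arange(len(y)) + start_month) % 12
    return y / idx[months]
```

The deseasonalized series could then be lagged and screened like any other candidate indicator.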
Outliers-Strike Consideration
Outlying data points in either the independent or the dependent variables of regression and time series
models can heavily skew forecast results and hence forecast accuracy. Among the many possible causes of
outliers, an unusual one-time event can generate abnormal history data. For example, the strike that
began on Oct 5th, 2006 and ended in early 2007 made Goodyear's commercial replacement tire sales in each
market segment extremely low from 11-2006 to 01-2007. Sales for 12-2006 in particular were lower than
the lowest points of the recent recession. As a result, for Goodyear's total replacement tire market, we
over-forecasted by about 63% and 117% for 11-2006 and 12-2006, respectively. To fix the problem, we
replaced the monthly sales data from 11-2006 to 01-2007 with the average of the same month from 2003 to
2005 and reran the linear regression model. The total cumulative absolute forecast error during the
model building period (12-2003 to 12-2007) decreased from the original 20.07% to 14.57%. However, the
model validation error (ex-post error over the period from 01-2008 to 06-2009) increased slightly, from
the original 9.37% to 10.30%. One possible explanation is that, without the outlier clean-up, the model
learned from the dampened sales during the strike period and carried that learning over to the forecasts
for the ex-post period, during which the recession also dampened sales. Hence data clean-up in this case
did not improve the model's ex-post forecast accuracy. For details about the test, please refer to the
tab named "Industry outlier fix" in the Excel file named "Goodyear Billed Sales Causative Models 2 months
out (US only)". After careful thought, we concluded that our Goodyear models are robust enough to contain
some outlying raw data in the model building period without deteriorating the forecast results.
Hence we kept our models as they are.
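The strike clean-up tested above amounts to a lookup and an average. A minimal Python sketch, assuming
the sales history is held in a dictionary keyed by (year, month); `replace_outliers` is a hypothetical
name.

```python
from statistics import mean


def replace_outliers(sales, outlier_periods, base_years):
    """Replace outlying months with the same calendar month's average over base years.

    sales           : dict mapping (year, month) -> monthly sales
    outlier_periods : iterable of (year, month) keys to replace
    base_years      : years whose same-month values are averaged
    Returns a new dict; the input is left unchanged.
    """
    cleaned = dict(sales)
    for year, month in outlier_periods:
        cleaned[(year, month)] = mean(sales[(y, month)] for y in base_years)
    return cleaned
```

For the strike test, the call would use outlier_periods `[(2006, 11), (2006, 12), (2007, 1)]` and
base_years `[2003, 2004, 2005]`.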
Outliers Reduction-Smooth Economic Leading Indicators
However, there is one easy way to reduce, at least partially, the outlying forecast points. Before we dive
into that, let's first look at how we calculate the monthly forecast residual. Assuming $H_{13}$ is the
monthly history value of the 13th month and using the transformation formula mentioned earlier, we get
the following residual formula:
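$$
\begin{aligned}
H_{13} - f_{13} &= H_{13} - H_1 - (F_{13} - F_1) - \left(\sum_{i=2}^{12} F_i^{(13)} - \sum_{i=2}^{12} F_i^{(12)}\right) \\
                &= (H_{13} - F_{13}) - (H_1 - F_1) - \left(\sum_{i=2}^{12} F_i^{(13)} - \sum_{i=2}^{12} F_i^{(12)}\right)
\end{aligned}
$$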
As can be seen, abnormal values of $H_{13}$ or $H_1$, or an abnormal change in a leading indicator's
monthly values, can produce a large residual. (An abnormal change in the indicator may cause two
consecutive 12 months rolling sum forecasts to change dramatically, which would very likely violate our
assumption that $\sum_{i=2}^{12} F_i^{(13)} \approx \sum_{i=2}^{12} F_i^{(12)}$.) Hence one reasonable
remedy for outlying monthly forecasts is to smooth the leading indicators' monthly values by replacing
each with the average of that month's value and the values of the previous 11 months. This transformation
of the leading indicators reduces the volatility of the monthly forecasts. For an example, please refer
to Appendix 3. We tested this transformation of leading indicators on the Industry shipment 12 months out
model for the North America region. As shown by the plot, the transformation makes the forecasts smoother
and closer to the history values. In fact, the ex-post error during the period from 01/2008 to 06/2009
dropped from 21.59% to only 9.35% after we took the 12 months rolling average of the leading indicator's
monthly values.
This smoothing technique will not always generate more accurate forecast results. However, it will
definitely help make monthly forecasts less volatile if the leading indicators used are volatile in nature.
We applied this technique for Industry shipment 12 months out forecast models for US only and North
America data.
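A minimal Python/NumPy sketch of the smoothing step (the helper name is hypothetical). The smoothed
series would then be lagged and used exactly like the raw indicator.

```python
import numpy as np


def trailing_average_12(indicator):
    """Replace each month's value with the mean of that month and the previous 11 months.

    The first 11 entries have no complete window and are NaN.
    """
    x = np.asarray(indicator, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= 12:
        out[11:] = np.convolve(x, np.ones(12) / 12.0, mode="valid")
    return out
```

The month-to-month change of the smoothed value is one twelfth of the change between a month and the
same month a year earlier, which is why it dampens the jumps between consecutive rolling sum forecasts.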
Model Utility and Residual Analysis
Appendix 4 is a comparison table of the causative models' utility. Key metrics include R-square (both
original and adjusted) and the cumulative absolute percentage errors (ex-post period) of both the
causative and the naïve models. The naïve models simply assume that what happened yesterday will happen
again tomorrow. Hence, for the naïve models, we take the current month's sales/shipment as the forecast
for 2 months and 12 months out. As coded in blue in the rightmost column of this table, the ex-post
forecast errors of the naïve models are all higher than those of the causative models, except for the
Regional Market. We think this could be either a coincidence or a sign that the Regional Market is,
relatively speaking, stable enough to repeat its history over time. Neither explanation invalidates the
effectiveness of our causative models, though.
Ex-post errors are cumulative absolute percentage errors. Specifically, this heuristic metric is
calculated by dividing the sum of the absolute values of all monthly forecast residuals by the sum of monthly
history values over a given period. We prefer this metric because it measures our forecasts' absolute
deviation from history, rather than deviation from a constant (the mean), as R-square does.
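Both the ex-post metric and the naïve benchmark are easy to reproduce outside the spreadsheet. A minimal
Python/NumPy sketch, assuming aligned monthly arrays of actuals and forecasts; the function names are
hypothetical.

```python
import numpy as np


def cumulative_ape(actual, forecast):
    """Sum of |actual - forecast| divided by the sum of actual values over the same months."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.abs(a - f).sum() / a.sum())


def naive_forecast(monthly_history, horizon):
    """Naïve model: the forecast for month t is the actual value of month t - horizon."""
    h = np.asarray(monthly_history, dtype=float)
    f = np.full(h.shape, np.nan)
    f[horizon:] = h[:-horizon]
    return f
```

For example, actuals of 100, 200, 100 against forecasts of 110, 180, 100 give (10 + 20 + 0) / 400 = 7.5%.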
The R-square value is another tool for assessing the effectiveness of regression models. The higher the
R-square (including the adjusted one), the larger the proportion of the total variation in the n observed
values of the dependent variable that is explained by the regression model. However, there is no absolute
standard for what counts as a good value. As can be seen from the table (color coded in yellow), the
Goodyear models have relatively high R-square values and the Industry models lower ones, even though the
Industry models are as accurate as the Goodyear models in terms of monthly forecast ex-post errors. This
raises two questions. First, how long can we keep using the causative models before we have to revise
them? (This question is addressed in the last section of this document.) Second, can we build multiple
regression models that produce both small monthly forecast ex-post errors and a high (adjusted) R-square
value? (Addressed below.)
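To make the contrast concrete, a minimal Python/NumPy sketch of R-square and adjusted R-square (the
function name is hypothetical). R-square compares the residuals with the deviations of the data from
their mean, and adjusted R-square penalizes each additional predictor (`n_predictors`), which matters
when comparing simple and multiple regression models.

```python
import numpy as np


def r_squared(y, y_hat, n_predictors):
    """Return (R-square, adjusted R-square) for fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj
```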
MODELING EFFORT 2- MULTIPLE LINEAR REGRESSION MODELS
How to Build Multiple Linear Regression Models in Minitab
Since our Industry models generally have low R-square values, the Industry total market shipment 2 months
out forecast model was picked for this test. What we wish to build is a multiple regression model with a
high R-square and a low cumulative absolute percentage error for monthly forecasts during the ex-post
period. Minitab's automatic model selection functions were used to perform the test.
Ideally, we would load data for as many variables as possible into Minitab and let the software
generate an optimal solution for us. However, Minitab can only process a limited number of variables
using the Stepwise [5] and Best subsets [6] selection methods, so some variables need to be screened out
of the candidate pool as follows:
- Select the 12 months rolling sum of total Industry shipments and the 2 months lagged data of the 74
  variables.
- Calculate the correlation coefficients between the rolling sum values and the 74 variables, and keep
  the variables with a correlation coefficient higher than 40% or lower than -40%. This step reduced the
  number of potential variables from 74 to 30.
- Use Stat > Basic Statistics > Correlation in Minitab to generate a Correlation Matrix, which includes
  the P-value for each correlation coefficient between any pair of variables, including the dependent
  variable (shipment). (If the correlation coefficient between two variables is higher than 0.9 or lower
  than -0.9, one of them can be considered redundant for the dependent variable in the model. If more
  than two variables are mutually correlated, first compare their P-values with respect to the dependent
  variable and screen out those with higher P-values; if the P-values are the same, keep those with a
  higher absolute correlation coefficient with the dependent variable.)
- Use the Correlation Matrix to eliminate redundant variables. This step reduced the number of variables
  from 30 to 20.
- Use the Stepwise and Best subsets methods in Minitab to generate the best multiple linear regression
  models.
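The two screening steps before Minitab's selection methods can also be scripted. The sketch below is an
approximation of the procedure above, not Minitab's output: it keeps indicators with |r| > 0.4 against
the rolling sum, then greedily drops any indicator whose |r| with an already-kept indicator exceeds 0.9,
keeping the one more strongly correlated with the dependent variable. For a fixed sample size the
P-value of a correlation coefficient decreases as |r| increases, so ranking by |r| should give the same
order as ranking by P-value. `screen_variables` is a hypothetical name.

```python
import numpy as np


def screen_variables(y, X, names, min_abs_r=0.40, redundancy_r=0.90):
    """Two-stage screening of candidate leading indicators.

    y     : dependent variable (12 months rolling sum), shape (n,)
    X     : lagged candidate indicators, shape (n, k)
    names : list of k indicator names
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    r_y = np.array([np.corrcoef(y, X[:, j])[0, 1] for j in range(k)])
    # stage 1: keep indicators reasonably correlated with the dependent variable
    stage1 = [j for j in range(k) if abs(r_y[j]) > min_abs_r]
    # stage 2: visit from strongest to weakest, dropping near-duplicates of kept ones
    stage1.sort(key=lambda j: -abs(r_y[j]))
    kept = []
    for j in stage1:
        redundant = any(abs(np.corrcoef(X[:, j], X[:, m])[0, 1]) > redundancy_r
                        for m in kept)
        if not redundant:
            kept.append(j)
    return [names[j] for j in kept]
```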
The best two models generated by the Best subsets method are a 10-variable linear regression model [7]
and a 9-variable linear regression model [8].
The best model from the Stepwise method is a 5-variable linear regression model [9].
Even though these models all have high R-square values (around 90%) and low ex-post forecast errors for
the 12 months rolling sum forecasts, both better than those of our original one-variable linear
regression models, their ex-post errors for monthly forecasts are much higher (around 40%, versus around
9% for our original model). However, the multiple regression models' cumulative absolute percentage
errors for monthly forecasts during the model building period (around 12%) are not far from those of
our original models (8%).
Reason for Not Using Multiple Regression Models Monthly Forecasts
We found that the reason the multiple regression models did not perform well for monthly forecasts
during the ex-post period is related to the assumption behind our transformation from 12 months rolling
sum forecasts to monthly forecasts. As mentioned before, we assume that the forecasted values for the
same 11 months from two consecutive 12 months rolling sum forecasts are almost the same, namely
$\sum_{i=2}^{12} F_i^{(13)} \approx \sum_{i=2}^{12} F_i^{(12)}$.
However, this assumption does not always hold, and it is more easily violated by multiple linear
regression models than by simple linear regression models. The multiple linear regression models in our
case all have high R-square values, which means that the variation of the dependent variable (tire
shipments/sales) is explained to a large extent by the multiple variables we included (although,
mathematically speaking, the more variables we add to a multiple regression model, the higher its
R-square). The downside is that with multiple regression models we have more external factors to
control, and each one's fluctuation can affect our final transformed monthly forecasts.
Look at the data and plots in Appendix 10. The vertical axis of the plots in Appendix 10 is the
cumulative absolute percentage error of the 12 months rolling sum forecast. The blue line represents the
errors of the 12 months rolling sum forecast from our original simple linear regression model. The red
and green lines represent forecasts from two multiple linear regression models selected by the Best
subsets method. As can be seen, before period 49 (12-2007), the multiple regression models are more
accurate than the simple linear regression model in terms of the 12 months rolling sum forecast.
During the ex-post period, from 01-2008 to 06-2009, the forecast accuracy of the multiple regression
models fluctuates more heavily than that of the simple linear regression model. That is because there
are more variables in the multiple regression models, so it is more likely that the recession's impact
on those variables will skew the 12 months rolling sum forecast. Larger fluctuations between two
consecutive 12 months rolling sum forecasts violate the
$\sum_{i=2}^{12} F_i^{(13)} \approx \sum_{i=2}^{12} F_i^{(12)}$ assumption and cause the related monthly
forecasts to have high forecast errors.
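A short derivation makes this mechanism explicit. Write $R_t$ for the actual 12 months rolling sum ending
in month $t$, $\hat{R}_t$ for its forecast, and $e_t = R_t - \hat{R}_t$ for the rolling sum forecast error
(notation introduced here for illustration); $\hat{R}_{13}$ and $\hat{R}_{12}$ play the role of the two sums
in the transformation formula.

$$
\begin{aligned}
R_{13} - R_{12} &= \sum_{i=2}^{13} H_i - \sum_{i=1}^{12} H_i = H_{13} - H_1, \\
H_{13} &= (R_{13} - R_{12}) + H_1, \\
f_{13} &= (\hat{R}_{13} - \hat{R}_{12}) + H_1, \\
H_{13} - f_{13} &= (R_{13} - \hat{R}_{13}) - (R_{12} - \hat{R}_{12}) = e_{13} - e_{12}.
\end{aligned}
$$

So the monthly error equals the change in rolling sum forecast error from one month to the next, not the
rolling sum error itself. Because a rolling sum is roughly 12 times a typical monthly value $M$, a swing
in rolling sum error from +1% to -1% (from $+0.12M$ to $-0.12M$) gives a monthly error of about $0.24M$,
roughly 24% of a typical month. This is consistent with models that have low rolling sum errors still
producing high monthly errors when those rolling sum errors fluctuate.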
To sum up, because of the technique we use to transform 12 months rolling sum forecasts into monthly
forecasts, and because multiple regression models are more difficult to control and maintain, we
think simple linear regression models are better for our modeling purposes, even though they have
relatively smaller R-square values than multiple regression models.
It is natural to think that if we can use monthly shipment/sales as dependent variables directly to build
multiple regression models then we can have both high forecast accuracy and high R-square. However,
the monthly data is too volatile compared with 12 months rolling sum values, and as tested, we can barely
find well correlated external indicators for monthly shipment/sales data.
MODELING EFFORT 3- TIME SERIES MODELS
In addition to causative models, we also tested time series models for the tire sales/shipment data. The
forecasting method we used is called exponential smoothing, which weights the observed time series values
unequally: more recent observations are weighted more heavily than more remote ones. This method
estimates the level, trend (optional) and seasonality (optional) of the history and projects them into
the future to make forecasts. As mentioned earlier, Goodyear's history data date back to 2003 and
Industry's to 1996. Unlike causative models, for time series studies the more data we have, the easier it
is to capture trend and seasonality, if there are any. As shown by the plots in Appendix 11, Goodyear's
history data are too short to show an obvious trend and seasonality, while Industry's history data show
them strongly enough to be a good candidate for the Multiplicative Holt Winters method [12]. Actually,
the seasonality indicated in the Industry data is long-term business cyclicality, because the 12 months
rolling sum values do not have seasonality in themselves. But this cyclical pattern can be modeled as a
kind of seasonality.
Due to data availability, we built time series models for Industry shipments only, using monthly data
from 01/1996 to 12/2007, and tested each model over the period from 01/2008 to 06/2009. The monthly
history data are very volatile even though they show trend and seasonality over the history. To make sure
we built the best time series models we could, we tested four models using both the monthly shipment data
and the 12 month rolling sum shipment data for each market segment. The four models are: Level only;
Level + Trend; Level + Trend + Increasing Seasonality (Multiplicative Holt Winters method); and Level +
Trend + Constant Seasonality (Additive Holt Winters method [13]). Hence, in total, 8 time series models
were tested in Minitab for each market segment. As expected, the model that generates the smallest
monthly forecast ex-post error for every market segment except Mixed Service is the Multiplicative Holt
Winters method on the 12 months rolling sum history data. Mixed Service being an exception did not
surprise us, because of its relatively complex business structure. Mixed Service's history plot does not
show very typical, easily recognizable trend and seasonality patterns either. The best time series model
for this market segment is Level only on the 12 month rolling sum history. This model generates a
monthly forecast ex-post error of 22.24%, which is higher than those of all the Multiplicative Holt
Winters models for the other market segments. The Level only model means that if we wish to forecast
multiple periods beyond 06/2009, we would get the same 12 month rolling sum forecast for every future
period. In that case, according to the transformation formula introduced previously
($f_{13} = \left(\sum_{i=2}^{13} F_i^{(13)} - \sum_{i=1}^{12} F_i^{(12)}\right) + H_1$), all future
monthly forecasts will equal the monthly history values one year earlier. This is a kind of naïve model
too.
All time series models were run in Minitab. After loading the shipment history data into Minitab, simply
go to Stat > Time Series > Single Exp Smoothing, Double Exp Smoothing, or Winters' Method for the Level
only, Level + Trend, and Level + Trend + Seasonality models, respectively. For the Level only and
Level + Trend models, Minitab can generate optimal models by automatically searching for the level and
trend smoothing constants that minimize the sum of squared errors. For the Holt Winters method, we have
to define all three smoothing constants (level, trend and seasonality) manually; the default value of
0.2 for all three worked well for our project most of the time.
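For reference outside Minitab, a minimal Python/NumPy sketch of the textbook Holt Winters recursions
with additive trend and multiplicative seasonality, using the 0.2 smoothing constants mentioned above as
defaults. The initialization (first season's mean as the starting level, difference of the first two
seasons' means as the starting trend) is our assumption and may differ from Minitab's, so forecasts will
not match Minitab exactly. The function name is hypothetical.

```python
import numpy as np


def holt_winters_multiplicative(y, period=12, alpha=0.2, beta=0.2, gamma=0.2, horizon=12):
    """Level + trend + multiplicative seasonality; returns `horizon` forecasts after the data."""
    y = np.asarray(y, dtype=float)
    m = period
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] / level)  # initial factor for each position in the cycle
    for t in range(m, len(y)):
        s = season[t - m]  # factor from the same position one cycle earlier
        prev_level = level
        level = alpha * y[t] / s + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * y[t] / level + (1 - gamma) * s)
    n = len(y)
    # h steps ahead: extrapolate level and trend, reuse the latest factor for that position
    return np.array([(level + h * trend) * season[n - m + (h - 1) % m]
                     for h in range(1, horizon + 1)])
```

Setting beta and gamma aside (level only) reduces this to simple exponential smoothing, the model that
turned out best for Mixed Service.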
All the Industry shipment modeling results are stored in an Excel file named "Industry Time Series
modeling". For each market segment, there are three tabs in this Excel file. Take the Urban market as an
example. In the "Urban forecast results" tab, the monthly forecasts, the monthly forecasts' cumulative absolute
percentage errors, and history vs. forecast plots for each of the eight time series models are listed for
comparison. In the "Rolling to monthly transform- Urb" tab, the 12 months rolling sum forecasts generated
by Minitab can be copied into the column named "Urban 12 months rolling sum forecast" to generate the
monthly forecasts in the rightmost column. The transformation formula defined previously is already
embedded in this calculation. In the last tab, called "Error Calculator- Urban", transformed or
Minitab-generated monthly forecasts can be copied into the related column to get the forecast error
statistics, color coded in blue.
DATA RANGE AND SOURCE
As the old saying goes, "Garbage in, garbage out." To keep this cliché from coming true in our project,
we have to carefully maintain and process the raw data. All Goodyear billed sales data are available
from EDW. Goodyear's market group names are slightly different from the RMA names for the four market
segments; for a detailed transformation table, please refer to Appendix 14. All Goodyear shipment data
have to be processed manually in order to apply RMA's market classification. This job was previously done
by Greg Tomsho using the material number to vehicle application code table generated by Steven D. Miller.
For a copy of this table, please see the file named pbu03_all. All Industry shipment data by market
segment are available from RMA. Contact Krista Liem for the latest industry data.
All the key leading indicators used for our modeling purposes are summarized in the table named "Key
Leading Indicators" [15]. In general, our leading indicators come from three sources: the Federal Reserve
Bank of St. Louis, the Energy Information Administration (US Dept. of Energy), and Freight Transportation
Research (FTR) Associates. The FTR database is updated monthly and can be accessed by Krista Liem.
Another thing worth noting is the data range. For the Goodyear and Industry causative models, we used
sales/shipment history data from 2003 to 2009. For the Industry time series models, we used data from
1996 to 2009. It makes sense to use more history data to study purely the time series trend and
seasonality pattern. However, since so many macroeconomic factors can affect tire sales/shipments
dramatically over a long period of time, it would be risky to use, say, 12 years of tire sales/shipment
history to build simple linear regression models. As a matter of fact, at the initial stage of our
project, we built causative models for Industry shipments using history data for the past 12.5 years,
and then reduced the data range to the past 6.5 years and re-ran the models. It turned out that with
less shipment history data, we got lower monthly forecast ex-post errors, and we had to change some of
the previously selected leading indicators.
Hence, to build effective causative models, we may have to consider dropping some of the oldest data in
the modeling period when new data become available, while for time series models it is OK to include new
data points and keep the old data. Also, most of the organizations that publish our economic indicator
data revise their published data periodically. As new monthly data become available, the data for past
periods may also have changed. If that is the case, all revised data within our modeling data range
should be used to re-run the models and get new sales/shipment forecasts.
EXPLANATION OF THE STANDARD LINEAR REGRESSION SPREADSHEET
All the causative models developed so far share the same standardized Excel spreadsheet structure.
There are six files in total for each category of modeling, named as follows:
Goodyear Billed Sales Causative Models 2 months out
Goodyear Billed Sales Causative Models 12 months out
Goodyear Shipment Causative Models 2 months out
Goodyear Shipment Causative Models 12 months out
Industry Shipment Causative Models 2 months out
Industry Shipment Causative Models 12 months out
There are two sets of models: one for US-only data and another for North America data. Hence there are
12 files in total. Every file contains the following 11 tabs. Take Goodyear Billed Sales Causative
Models 2 months out (US only) as an example.
Common Tabs
1. ReadMe:
It contains description of the models within the Excel file and description of each indexed tab and
how to use them.
2. Scatter Plots:
For each market segment and each of the 74 economic variables, there is a matching scatter plot
in this tab. All the data used come from the tab x-months Lagging Data Set. If the
structure of the data in that tab does not change, the scatter plots update automatically
as the data changes. However, if new data points are added, the plots must be changed manually to
reflect them. To do that, right-click on a plot and choose Select Data;
you will then be directed to the tab x-months Lagging Data Set, where you can re-select the
raw data.
3. x-months Lagging Data Set:
x can be either 2 or 12, depending on the purpose of the model. Our causative
models can forecast the dependent variables' future values because we lagged the
independent variables when constructing the linear regression relationships. To
forecast 2 months out, we lag the leading indicators by 2 months; to forecast 12
months out, we lag the leading indicators by 12 months. Hence this tab lists billed sales 12
months rolling sum data and the 74 economic variables lagged by 2 months, from 12-2003 to
06-2009, covering both the model building period and the validation (ex-post) period.
4. All Data:
This tab lists all 74 variables' monthly history data from 01-1996 to 06-2009. Some variables
may have missing data points in the most recent history. This tab was set up to store all history
data used for the project.
5. Correlation Coefficients:
This tab contains the monthly history of tire sales and the automatically calculated 12 months rolling
sum values, together with the monthly history data of all 74 variables. The table outlined with a red dotted line
at the bottom of this tab lists the correlation coefficients (calculated using the =CORREL() function
in Excel) between each market segment's 12 months rolling sum over the period from 12-2003 to 06-2009 (the same period
as used in the scatter plots) and the 74 variables lagged by 2 months. All
correlation coefficients with absolute values above 80% (0.8) are highlighted in color using
Conditional Formatting in Excel. To update the correlation coefficients as new data
comes in, add the new monthly sales, drag down the Excel cells to get the 12 months rolling
sums, add the new monthly data for the 74 variables, and reset the embedded formulas to include the new
12 months rolling sums and leading indicators.
This tool is used together with the scatter plots to detect potential linear relationships between leading
indicators and tire sales data.
6. Forecast errors:
This tab lists all the selected variables (chosen using the scatter plots and correlation coefficients in previous tabs) and their cumulative absolute percentage errors during the period 01/2008 to 06/2009
(ex-post errors) for both the 12 month rolling sum forecasts and the transformed monthly forecasts.
The 12 month rolling sum forecast ex-post errors are used to monitor how effectively our simple linear
regression models capture the potential linear relationship between 12 months
rolling sum sales and leading indicators. If the relationship is close to linear, this ex-post error
should be small. The monthly forecast ex-post error is used to check whether our model can
generate decent monthly forecasts in the near future. Normally, the ex-post error for the 12 months rolling
sum forecast should be smaller than that of the monthly forecast.
7. Urban-x:
Tabs 7 to 11 contain the models we used to generate monthly forecasts. All these tabs have the same
structure and are self-explanatory. For illustration purposes, a detailed explanation is provided here only for the Urban-2 tab of the Goodyear Billed Sales 2 months out model.
The only two columns that need to be updated from external data sources are the monthly history
of tire sales and the column named after the selected leading indicator. You can drag down the
column named 12 months rolling sum history to get the 12 months rolling sums needed for
modeling.
Then use the Regression function in the Excel add-in called Data Analysis [16] to select the
dependent variable (the 12 months rolling sum history of tire sales) and the independent variable (the 2 months lagged
monthly history of the leading indicator) over the modeling period. The Regression function in
Data Analysis generates a detailed ANOVA analysis, as shown in Appendix [17]. The two orange
highlighted numbers are the coefficients for the constant (intercept) and the leading indicator in the
simple linear regression model. Copy those two numbers into the corresponding locations at the top of the table; the monthly forecasts (at the far right of the table) and forecast errors
(at the top right of the table) will then be generated automatically. If the data currently selected for model
building is still used for forecasting future monthly sales, you don't have to change the coefficients
previously entered at the top of the table. When new monthly sales data and leading indicator
data become available, you can add them and change the formula for the new ex-post error
calculation. If, after a certain period of time, new data needs to be added to the modeling period,
you have to rerun the Data Analysis add-in to reselect the corresponding 12 month rolling sum
tire sales and leading indicator monthly data.
Most of the data for new cells can be obtained by dragging down the cells in Excel.
8. Regional-x:
See tab 7 for instructions.
9. Long haul-x:
See tab 7 for instructions.
10. Mixed service-x:
See tab 7 for instructions.
11. Total Market-x:
See tab 7 for instructions.
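The data preparation behind tab 3 (x-months Lagging Data Set) and tab 5 (Correlation Coefficients) can be sketched in Python. This is a minimal illustration, not the spreadsheets' own logic: it assumes each monthly series is a plain list ordered oldest to newest, that all series start in the same month, and the function names are hypothetical. `lagged_pairs` pairs each 12 months rolling sum with the indicator value observed `lag` months earlier, `correl` computes the Pearson coefficient that Excel's CORREL returns, and `screen` mimics the 80% highlighting rule.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def rolling_sum_12(monthly: Sequence[float]) -> List[Optional[float]]:
    """12 months rolling sum; the first 11 months have no full window (None)."""
    return [
        sum(monthly[t - 11 : t + 1]) if t >= 11 else None
        for t in range(len(monthly))
    ]


def lagged_pairs(
    target: Sequence[Optional[float]], indicator: Sequence[float], lag: int
) -> Tuple[List[float], List[float]]:
    """Pair target[t] with indicator[t - lag]; lag=2 mirrors a '2 months out' set.

    Assumes both series start in the same month; months where the target
    has no value (None) are skipped."""
    xs: List[float] = []
    ys: List[float] = []
    for t in range(lag, min(len(target), len(indicator) + lag)):
        value = target[t]
        if value is not None:
            xs.append(indicator[t - lag])
            ys.append(value)
    return xs, ys


def correl(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def screen(
    rolling: Sequence[Optional[float]],
    indicators: Dict[str, Sequence[float]],
    lag: int = 2,
    threshold: float = 0.8,
) -> Dict[str, float]:
    """Return {name: r} for lagged indicators whose |r| exceeds `threshold`."""
    kept: Dict[str, float] = {}
    for name, series in indicators.items():
        xs, ys = lagged_pairs(rolling, series, lag)
        r = correl(xs, ys)
        if abs(r) > threshold:
            kept[name] = r
    return kept
```

A high |r| only flags a candidate; as the text says, the scatter plots are still needed to judge whether the relationship is actually linear.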
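The calculations in tab 6 (Forecast errors) and tab 7 (Urban-x) can likewise be sketched in Python, again assuming plain monthly lists and using hypothetical function names. `fit_simple_ols` returns the least-squares intercept and slope, the two numbers copied from the Excel Regression output. `monthly_from_rolling` uses the identity y_t = S_t - S_{t-1} + y_{t-12}, which follows from the definition of a 12 months rolling sum S_t; the report's own transformation is described in its section on obtaining monthly forecasts from 12 months rolling sum forecasts, and this sketch does not claim to reproduce it. The report also does not state the formula behind its "cumulative absolute percentage errors", so `mean_ape` assumes the mean of the monthly absolute percentage errors.

```python
from typing import List, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined for a constant predictor")
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def forecast_rolling_sum(b0: float, b1: float, lagged_x: Sequence[float]) -> List[float]:
    """12 months rolling sum forecasts from already-lagged indicator values."""
    return [b0 + b1 * v for v in lagged_x]


def monthly_from_rolling(rs_forecast: float, rs_prev: float, y_12_back: float) -> float:
    """S_t - S_{t-1} = y_t - y_{t-12}, hence y_t = S_t - S_{t-1} + y_{t-12}."""
    return rs_forecast - rs_prev + y_12_back


def abs_pct_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Absolute percentage error, in percent, for each ex-post month."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage error is undefined when an actual value is 0")
    return [abs(a - f) / abs(a) * 100.0 for a, f in zip(actual, forecast)]


def mean_ape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average of the monthly absolute percentage errors (assumed error measure)."""
    errors = abs_pct_errors(actual, forecast)
    if not errors:
        raise ValueError("no ex-post months to evaluate")
    return sum(errors) / len(errors)
```

Running `abs_pct_errors` once on the 12 months rolling sum forecasts and once on the monthly forecasts gives the two error series that the Forecast errors tab compares.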
Steps of Searching for New Leading Indicators
The logical steps for using the multiple tabs in each model/Excel file to search for the best leading indicator
for each market segment are as follows. Take Goodyear Billed Sales Causative Models 2
months out (US only) as an example.
1. Update tire sales and leading indicator monthly data in the tab Correlation Coefficients.
2. Adjust the formulas to include the new data when refreshing the correlation coefficient calculation table
in this tab.
3. Copy and paste the new tire sales 12 month rolling sum data (including the ex-post period) and leading
indicator monthly data to the tab 2-months Lagging Data Set.
4. Go to the tab Scatter Plots and update the scatter plots one by one, if necessary, to include the new data added
in the tab 2-months Lagging Data Set.
5. Observe the scatter plots. If a linear relationship is found, consider that variable a candidate for
testing.
6. If a linear relationship is not obvious, use the tab Correlation Coefficients to search for
variables with high correlation coefficients with the 12 months rolling sum tire sales data.
7. To test all the candidate variables for a specific market segment, copy their data to the corresponding market segment tab one by one, then perform the following test starting from step 8.
8. Update the monthly data for both tire sales and the selected leading indicator in the specific market
segment tab.
9. Click Data > Data Analysis > Regression in Excel, select the matching 12 months rolling sum
sales and lagged monthly data for the leading indicator (lagged by 2 months in this case), and perform the
ANOVA analysis.
10. Copy the coefficients for the constant and the variable in the linear regression model from the ANOVA
analysis generated by Excel to the corresponding positions at the top of the market segment tab.
11. Drag down the columns called 12 months rolling sum forecast and monthly forecast if
necessary. All formulas are already embedded.
12. Copy and paste the ex-post forecast errors for both the 12 months rolling sum and monthly forecasts, automatically generated at the top right of the table in the market segment tab, to the corresponding
positions in the tab named Forecast Errors.
13. Repeat steps 7 to 12 until the ex-post errors generated by every potential leading indicator are
recorded in the Forecast Errors tab.
14. Select the variable that does not generate negative monthly forecasts and gives a low monthly
forecast ex-post error.
15. If outlying monthly forecasts are generated by the chosen leading indicator, either adjust the
forecast manually or select a backup leading indicator from the tab
Forecast Errors.
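Steps 13 and 14 amount to recording one error per candidate, filtering, and taking a minimum. The sketch below is a minimal Python illustration, assuming each candidate's ex-post monthly forecasts are already available as a list and that the error measure is the mean absolute percentage error (the report does not give its exact formula); the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumed error measure)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need non-empty series of equal length")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def select_indicator(
    actual_monthly: Sequence[float],
    candidate_forecasts: Dict[str, Sequence[float]],
) -> Tuple[Optional[str], Dict[str, float]]:
    """Record every candidate's ex-post error (step 13), then return the
    lowest-error candidate with no negative monthly forecast (step 14)."""
    errors = {
        name: mape(actual_monthly, fc) for name, fc in candidate_forecasts.items()
    }
    eligible = [
        name for name, fc in candidate_forecasts.items() if all(v >= 0 for v in fc)
    ]
    best = min(eligible, key=lambda name: errors[name]) if eligible else None
    return best, errors
```

The full `errors` dictionary is returned as well, playing the role of the Forecast Errors tab, so a backup indicator (step 15) can be read off as the next-lowest eligible entry.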
MODEL REFRESH AND UPDATE ISSUES
To use linear regression models to forecast, one important underlying assumption is that the linear
relationship between the independent variable (leading indicators in our models) and the dependent variable
(tire sales/shipment) will last into the future. A similar underlying assumption for exponential
smoothing models is that the trend and seasonality will last into the future. In practice, however, these
assumptions won't hold forever, which raises the question of when to revisit the models. The
suggested re-modeling cycle for our project is 6 months. Every six months, when we have 6 more months of
new tire sales/shipment data, we can evaluate the effectiveness of each model. If the leading indicator still
works well, the only thing to do might be to add the new data to the modeling period and drop an equal
amount of old data, if necessary. If the chosen external economic variable loses its power to lead tire
sales/shipment, then a backup leading indicator may be found in the Forecast Errors tab of each
model/Excel file, or a completely new leading indicator should be brought in using the 15-step
approach described above.
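The two ways of extending the modeling period, dropping an equal amount of old data (suggested for the causative models) or keeping all old data (acceptable for time series models, as noted earlier), can be sketched as a small list operation. This assumes the modeling period is a list of monthly values ordered oldest to newest; `refresh_modeling_period` is a hypothetical helper.

```python
from typing import List, Sequence


def refresh_modeling_period(
    modeling: Sequence[float], new_months: Sequence[float], fixed_length: bool = True
) -> List[float]:
    """Append newly observed months to the modeling period.

    fixed_length=True drops as many of the oldest months as were added, so the
    window length stays the same; fixed_length=False keeps every old month."""
    combined = list(modeling) + list(new_months)
    if fixed_length:
        return combined[len(new_months):]
    return combined
```

With the suggested 6 month re-modeling cycle, `new_months` would hold the six newest months of sales/shipment data.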
All updated information about the leading indicators chosen for this project is stored in the file named Key
Leading Indicators. Some of the economic variables for our 2 months out models have a delivery lag of
around 45 to 60 days. That means that to use some of our causative models effectively, we first need to obtain
forecast values of the leading indicators. Sometimes these forecast values are provided by the data
source organizations; sometimes we need to produce the forecasts ourselves using time series modeling
techniques.
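The report does not say which time series technique is used to fill this delivery gap. As one illustration only, the sketch below extends a leading indicator with Holt's linear-trend exponential smoothing, a non-seasonal relative of the Holt Winters methods in Appendix [12] and [13]. The smoothing parameters `alpha` and `beta` are arbitrary illustrative defaults, not values from the project, and `holt_forecast` is a hypothetical name.

```python
from typing import List, Sequence


def holt_forecast(
    series: Sequence[float], horizon: int, alpha: float = 0.5, beta: float = 0.3
) -> List[float]:
    """Holt's linear-trend exponential smoothing, forecasting `horizon` months.

    Level and trend are initialised from the first two observations; the
    default alpha and beta are illustrative, not tuned for any indicator."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must be in (0, 1]")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

For a 2 months out model with a 60 day delivery lag, `horizon=2` would fill the two missing indicator months before the regression is applied.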
FILES LOCATION AND NAME
All the files related to this project are stored at the following location:
T:\NAT\703 Commercial Demand Planning\Commercial Modeling
For details about all the folders and their contents, please see Appendix [18].
FUTURE LOOK
Depending on the effectiveness of the causative models developed for this project as new data becomes
available, we can:
Revise and maintain our current models
Transfer the modeling technique to Goodyear's other business segments
Automate the modeling procedures in Excel using an advanced programming language
APPENDIX
[1] RMA Commercial Truck Tire Classification
Market segment | Vehicle Application Code | Description
Urban | 220 | Light, Medium, and Wide-Base Truck Tires marketed to operate specifically in pickup and delivery service in a local area (e.g. retail and wholesale pick-up and delivery, emergency vehicles, and intracity bus fleets).
Regional | 230 | Medium, Wide Base and Heavy Truck Tires marketed to operate in a limited (150 mile radius) delivery or service related vocation (e.g. State & local government, emergency vehicles, public utility, school bus, food, petroleum and manufacturing goods distribution, and inter-modal piggy-back trailers).
Long haul | 240 | Medium, Wide Base and Heavy Truck Tires marketed to operate in long distance, high annual mileage operations (e.g. Less-Than-Trailer-Load, Trailer-Load, and Lease/Rental Fleets, Common Contract Carriers, and Inter-City Bus Fleets).
On-Off/Off Highway (Mixed service) | 250 | All Light, Medium, Wide Base, Heavy and Large-off-the-Road Truck Tires marketed to operate in off and on-off highway applications (e.g. construction, mining, sanitation, and logging).
[2] Comparison of monthly data with 12 months rolling sum data
[3] Using 12 months rolling average to smooth leading indicator will sometimes improve forecast results
Figure: Industry Regional Market Segment Shipment Forecast, Jan-08 to Jun-10 (series: Actual History; Forecast using leading indicator's monthly data; Forecast using leading indicator's 12 months moving average).
[4] Causative Models Utility Comparison Table
[5] Stepwise regression removes and adds variables to the regression model for the purpose of identifying a useful subset of the predictors. Minitab provides three commonly used procedures: standard stepwise regression (adds and removes variables), forward selection (adds variables), and backward elimination (removes variables).
When you choose the stepwise method, you can enter a starting set of predictor variables in Predictors in initial model. These variables are removed if their p-values are greater than the Alpha to enter value. If you want to keep variables in the model regardless of their p-values, enter them in Predictors to include in every model in the main dialog box.
When you choose the stepwise or forward selection method, you can set the value of Alpha for entering a new variable in the model in Alpha to enter.
When you choose the stepwise or backward elimination method, you can set the value of Alpha for removing a variable from the model in Alpha to remove.
[6] Best subsets regression identifies the best-fitting regression models that can be constructed with the predictor variables you specify. Best subsets regression is an efficient way to identify models that achieve your goals with as few predictors as possible. Subset models may actually estimate the regression coefficients and predict future responses with smaller variance than the full model using all predictors.
Minitab examines all possible subsets of the predictors, beginning with all models containing one predictor, then all models containing two predictors, and so on. By default, Minitab displays the two best models for each number of predictors.
For example, suppose you conduct a best subsets regression with three predictors. Minitab will report the best and second best one-predictor models, followed by the best and second best two-predictor models, followed by the full model containing all three predictors.
[7] Best multiple regression models by Minitab Best-subsets method-10 variables
The regressi
TOTAL = 16972
- 65
- 171
[8] Best multiple regression models by Minitab Best-subsets method-9 variables
The regressi
TOTAL = 12404
- 265
+ 10.
[9] The best multiple regression model by Minitab Step-wise method-5 variables
The regressi
TOTAL = - 283+
[10] 12 months rolling sum forecasts absolute percentage errors comparison table and plot for single and multiple linear regression models
[11] Goodyear's 12 months rolling sum history plot for Total Market billed sales
Figure: Time Series Plot of Gyt 12-month rolling sales (x-axis: Year/Month, 2003 to 2007; y-axis: Gyt 12-month rolling sales).
Industry's 12 months rolling sum history plot for Total Market shipment
Figure: Time Series Plot of Industry 12-month rolling Ship (x-axis: Year/Month, 1996 to 2007; y-axis: Industry 12-month rolling Ship).
[12] Multiplicative Holt Winters method
A time series modeling technique that is able to capture increasing seasonal variation.
[13] Additive Holt Winters method
A time series modeling technique that is able to capture constant seasonal variation.
[14] Goodyear's Market Group and RMA name transformation
[15] Key Leading Indicators and their sources
N.O.
Var 1 Industrial
Var 2 C
Var 3 Real Retail
Var 4 Ho
Var 5 2-4 Uni
Var 6 Conference Board In
Var 7 UM Index of
Var 8 Di
Var 9 WTI
Var 11 M1 Mon
Var 12 ISM
External Eco
[16] The Data Analysis add-in in Excel 2007 can be activated as follows:
Click the Microsoft Office Button, and then click Excel Options. Click Add-Ins, and then in the Manage box, select Excel Add-ins. Click Go. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
[17] ANOVA analysis generated by Regression function of Data analysis add-in in Excel 2007
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
[18] Project Folders and their contents
2 months and 12 months out Ca
Greg Tomsho
2 months and 12 months ou
James Krein
Commercial Replacement Indust
Folder Name: 2 month
Goodyear Billed Sales Causative Mo
Goodyear Billed Sales Causative Mo