Chapter 10 – Basic Regression Analysis with Time Series Data
Feb 08, 2016
What is Time Series Data and why is it Different?
• There is a time ordering of the data.
• The past can affect the future, but the future cannot affect the past.
Example: National population from 1900 to 2006 (data set NATPOP)
• Time series data are random in nature: formally, the process that generates time series data is called a stochastic process or time series process.
• A random sample from a population differs from a sample of time series data, which is a single realization of that stochastic process.
Examples of Time Series Data:
• Uncorrelated data, constant process model
• Autocorrelated data
• Trend
• Cyclic or seasonal data
• Nonstationary data
• A mixture of patterns
• Cyclic patterns of different magnitudes
• Atypical events
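The pattern types listed above can be imitated with simple simulated series, which helps when reading the corresponding plots. This is a minimal sketch using only the Python standard library; the function names and parameter values (phi = 0.8, slope 0.5, period 12, and so on) are arbitrary illustrative choices, not taken from the slides' figures.

```python
import math
import random


def white_noise(n, level=0.0, sd=1.0, seed=0):
    """Uncorrelated data around a constant level (constant process model)."""
    rng = random.Random(seed)
    return [level + rng.gauss(0.0, sd) for _ in range(n)]


def ar1(n, phi=0.8, sd=1.0, seed=0):
    """Autocorrelated data: y_t = phi * y_{t-1} + e_t, starting from y_0 = 0."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sd)
        y.append(prev)
    return y


def linear_trend(n, intercept=10.0, slope=0.5, sd=1.0, seed=0):
    """Trend: a straight line in t plus uncorrelated noise."""
    rng = random.Random(seed)
    return [intercept + slope * t + rng.gauss(0.0, sd) for t in range(n)]


def seasonal(n, period=12, amplitude=5.0, sd=1.0, seed=0):
    """Cyclic or seasonal data: a sine wave with the given period plus noise."""
    rng = random.Random(seed)
    return [amplitude * math.sin(2 * math.pi * t / period) + rng.gauss(0.0, sd)
            for t in range(n)]


def random_walk(n, sd=1.0, seed=0):
    """Nonstationary data: y_t = y_{t-1} + e_t, so the spread grows over time."""
    rng = random.Random(seed)
    y, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, sd)
        y.append(level)
    return y


# A mixture of patterns: trend plus seasonality (different seeds keep the noise independent).
mixture = [a + b for a, b in zip(linear_trend(120, seed=1), seasonal(120, seed=2))]
```

Setting phi = 0 in ar1 reduces it to uncorrelated noise, while phi close to 1 makes the series look increasingly like the random walk.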
Famous Time Series Expert – Yogi Berra
The future ain’t what it used to be.
You can observe a lot just by watching.
The basic graphical display for time series data is the time series plot, which is simply a graph of the observations against the time periods.
Time Series Plot Example
Open the TRAFFIC2 data set and make a time series plot of statewide total accidents (totacc) against year.
In Minitab, you need to choose the series and the time stamp.
Time series plots
Notice that the histograms look very similar even though the time series behavior is very different.
Histogram of totacc
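The point about histograms can be checked directly: shuffling a series keeps exactly the same values (and therefore the same histogram) but destroys its time ordering. This sketch uses a made-up, steadily rising series rather than the TRAFFIC2 data, and it measures time dependence with the sample lag-1 autocorrelation, a statistic that is not part of the original slides.

```python
import random


def lag1_autocorr(y):
    """Sample lag-1 autocorrelation: how strongly y_t moves with y_{t-1}."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    return num / denom


# Hypothetical, steadily rising series (not the TRAFFIC2 data).
series = [float(t) for t in range(100)]

# Same values in a random order.
shuffled = series[:]
random.Random(42).shuffle(shuffled)

# Identical sorted values means identical histograms for any choice of bins ...
print(sorted(series) == sorted(shuffled))
# ... yet the time ordering, and hence the autocorrelation, is very different.
print(round(lag1_autocorr(series), 3), round(lag1_autocorr(shuffled), 3))
```

A histogram only summarizes which values occur, so it cannot distinguish these two series; a time series plot can.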
When there are two or more variables of interest, scatter plots can be useful.
Forecasting
It is difficult to make predictions, especially about the future. – Niels Bohr
Forecasting is useful in many fields:
• Business and industry
• Economics
• Finance
• Environmental sciences
• Social sciences
• Political sciences
Data Analysis Process:
1. Problem definition
2. Data collection
3. Data analysis
4. Model selection and fitting
5. Model validation
6. Model deployment
7. Monitoring forecasting model performance
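Steps 4 and 5 of this process can be sketched for a forecasting problem. With time series data, validation usually holds out the most recent observations rather than a random subset, so the model is judged on periods it has not seen. This is a minimal sketch under that assumption; the series values are made up, and the "last value" forecast is a hypothetical baseline model, not one used in the slides.

```python
def time_ordered_split(y, holdout):
    """Split a series into an earlier training part and a later test part.

    Unlike a random split, this preserves time ordering: the model is fitted
    only on observations from before the period it is asked to forecast.
    """
    if not 0 < holdout < len(y):
        raise ValueError("holdout must be between 1 and len(y) - 1")
    return y[:-holdout], y[-holdout:]


def naive_forecast(train, horizon):
    """Hypothetical baseline model: repeat the last observed value."""
    return [train[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute forecast error over the test period."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative series (made-up values).
y = [10, 12, 13, 15, 16, 18, 19, 21]
train, test = time_ordered_split(y, holdout=3)   # fit on the first 5, validate on the last 3
forecast = naive_forecast(train, len(test))
print(mean_absolute_error(test, forecast))
```

Any candidate model can be swapped in for naive_forecast; a model that cannot beat this simple baseline on the holdout period is unlikely to be worth deploying.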
Time Series Example – Data Set FERTIL3
gfr – number of children born to every 1,000 women of childbearing age from 1913 to 1984.
Make a time series plot of gfr
pe – average real dollar value of the personal tax exemption from 1913 to 1984.
Make a time series plot of pe
We want to predict gfr. Let's try this model:
$gfr = \beta_0 + \beta_1 pe + u$
The notation for a time series model is slightly different: observations are indexed by the time period t rather than by i.
$gfr_t = \beta_0 + \beta_1 pe_t + u_t$
The regression equation is
gfr = 96.3 - 0.0071 pe

Predictor      Coef   SE Coef      T      P
Constant     96.344     4.305  22.38  0.000
pe         -0.00710   0.03592  -0.20  0.844

S = 19.9400   R-Sq = 0.1%   R-Sq(adj) = 0.0%
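The Coef, SE Coef, T, S, and R-Sq columns in this kind of output come from ordinary least squares. The sketch below reproduces those columns with NumPy for the simple model $gfr_t = \beta_0 + \beta_1 pe_t + u_t$. The function name ols_table is hypothetical, and because the FERTIL3 file is not included here, the example runs on a tiny made-up data set; with the real data you would pass the gfr and pe columns instead.

```python
import numpy as np


def ols_table(y, X, names):
    """Fit y = X b + u by OLS and return Minitab-style (name, coef, se, t) rows, S, and R-Sq.

    X must already contain a column of ones for the constant.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = resid @ resid                      # sum of squared residuals
    sst = ((y - y.mean()) ** 2).sum()        # total sum of squares
    s2 = ssr / (n - k)                       # estimated error variance; S is its square root
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    rows = [(nm, c, e, c / e) for nm, c, e in zip(names, coef, se)]
    return rows, np.sqrt(s2), 1 - ssr / sst


# Tiny illustrative data (hypothetical, not FERTIL3).
pe = [0.0, 1.0, 2.0, 3.0]
gfr = [1.0, 2.0, 2.0, 4.0]
X = np.column_stack([np.ones(len(pe)), pe])
rows, s, r2 = ols_table(gfr, X, ["Constant", "pe"])
for name, c, se, t in rows:
    print(f"{name:<10}{c:>10.4f}{se:>10.4f}{t:>8.2f}")
print(f"S = {s:.4f}   R-Sq = {100 * r2:.1f}%")
```

The P column is omitted to keep the sketch NumPy-only; a two-sided p-value would be 2 * scipy.stats.t.sf(abs(T), n - k).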
Residual plots
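This model suffers from misspecification: pe alone explains essentially none of the variation in gfr (R-Sq = 0.1%, and the p-value for pe is 0.844).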
Scatter plot of gfr vs. pe
What could affect the general fertility rate in the U.S.? Many things! How about these two:
• World War II
• Availability of the birth control pill
ww2 is a dummy variable:
• 1 if year is 1941 through 1945
• 0 otherwise
pill is a dummy variable:
• 1 if year is 1963 or greater
• 0 otherwise
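The two dummy definitions translate directly into code. This is a minimal sketch that builds ww2 and pill from a list of years covering the FERTIL3 sample period; the function name make_dummies is hypothetical.

```python
def make_dummies(years):
    """Return the ww2 and pill dummy variables for a sequence of years."""
    ww2 = [1 if 1941 <= yr <= 1945 else 0 for yr in years]   # World War II years
    pill = [1 if yr >= 1963 else 0 for yr in years]          # birth control pill available
    return ww2, pill


years = list(range(1913, 1985))   # 1913 through 1984
ww2, pill = make_dummies(years)
print(len(years), sum(ww2), sum(pill))
```

Over 1913 through 1984 this gives 72 years, of which 5 have ww2 = 1 and 22 have pill = 1.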
Fit this model with two dummy variables:
$gfr_t = \beta_0 + \beta_1 pe_t + \beta_2 ww2_t + \beta_3 pill_t + u_t$
The regression equation is
gfr = 98.7 + 0.0825 pe - 24.2 ww2 - 31.6 pill

Predictor      Coef   SE Coef      T      P
Constant     98.682     3.208  30.76  0.000
pe          0.08254   0.02965   2.78  0.007
ww2         -24.238     7.458  -3.25  0.002
pill        -31.594     4.081  -7.74  0.000

S = 14.6851   R-Sq = 47.3%   R-Sq(adj) = 45.0%
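The fitted coefficients can be read as changes in predicted gfr, holding the other variables fixed. This sketch plugs the estimates from the output into the fitted equation; the pe values of 100 and 110 are arbitrary illustrative inputs, and predicted_gfr is a hypothetical helper name.

```python
# Estimated coefficients from the Minitab output for gfr on pe, ww2, and pill.
B0, B_PE, B_WW2, B_PILL = 98.682, 0.08254, -24.238, -31.594


def predicted_gfr(pe, ww2, pill):
    """Fitted gfr (births per 1,000 women of childbearing age)."""
    return B0 + B_PE * pe + B_WW2 * ww2 + B_PILL * pill


# Effect of a hypothetical $10 increase in pe, holding ww2 and pill fixed.
print(predicted_gfr(110, 0, 0) - predicted_gfr(100, 0, 0))
# Predicted difference between a war year and a non-war year with the same pe.
print(predicted_gfr(100, 1, 0) - predicted_gfr(100, 0, 0))
```

A $10 increase in pe raises predicted gfr by about 0.83 births per 1,000 women, whereas the World War II years are predicted to have about 24.2 fewer births per 1,000 women.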
Residual plots:
gfr = 98.7 + 0.0825 pe - 24.2 ww2 - 31.6 pill
Summary:
• Adding ww2 and pill improves the model significantly: R-Sq rises from 0.1% to 47.3%, and pe becomes statistically significant (p = 0.007)
• The model gives insight into historical variables that affect gfr
• The model may not be very useful for future predictions
Modeling with lags
Economic theory implies that pe (the tax value of having a child) may affect gfr (the general fertility rate) with a lag.
Examine the lagged variables pe_1, pe_2, pe_3, and pe_4, where pe_k is the value of pe k years earlier.
Making lags in Minitab is easy. Go to Stat > Time Series > Lag.
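Outside Minitab, a lag is just a shifted copy of a column with missing values at the start. This plain-Python sketch shows what pe_1 and pe_2 contain and why lagged regressions use fewer observations; the helper names lag and complete_rows are hypothetical, and the pe values are illustrative, not the FERTIL3 column.

```python
def lag(series, k):
    """Shift a series down by k periods: lagged[t] = series[t - k].

    The first k entries have no earlier value, so they are None (missing).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [None] * k + list(series[: len(series) - k])


def complete_rows(*columns):
    """Keep only the time periods in which every column is observed."""
    return [row for row in zip(*columns) if None not in row]


# Illustrative values (not the actual FERTIL3 pe column).
pe = [10.0, 20.0, 30.0, 40.0]
pe_1, pe_2 = lag(pe, 1), lag(pe, 2)
print(pe_1)
print(complete_rows(pe, pe_1, pe_2))
```

With 72 annual observations and two lags, the first two years have missing values, so a regression on pe, pe_1, and pe_2 uses 70 observations.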
Residual plots:
The regression equation is
gfr = 95.9 + 0.073 pe - 0.006 pe_1 + 0.034 pe_2 - 22.1 ww2 - 31.3 pill

Predictor      Coef   SE Coef      T      P
Constant     95.870     3.282  29.21  0.000
pe           0.0727    0.1255   0.58  0.565
pe_1        -0.0058    0.1557  -0.04  0.970
pe_2         0.0338    0.1263   0.27  0.790
ww2          -22.13     10.73  -2.06  0.043
pill        -31.305     3.982  -7.86  0.000

S = 14.2701   R-Sq = 49.9%   R-Sq(adj) = 45.9%
Personally, I think adding the two lags overcomplicates the model for little gain: none of pe, pe_1, or pe_2 is individually significant (p = 0.565, 0.970, and 0.790), and R-Sq(adj) rises only from 45.0% to 45.9%. I recommend against including the two lags.
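The R-Sq(adj) comparison can be reproduced from the reported R-Sq values using the standard formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where k is the number of slope coefficients. The sketch assumes n = 72 for 1913 through 1984 and n = 70 for the lagged model, since the two lags leave the first two years without values; the results match the Minitab output up to rounding.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Model with pe, ww2, pill: 1913 through 1984 gives n = 72, k = 3.
print(adjusted_r2(0.473, 72, 3))
# Model adding pe_1 and pe_2: the two lags drop the first two years, so n = 70, k = 5.
print(adjusted_r2(0.499, 70, 5))
```

The penalty factor (n - 1)/(n - k - 1) grows with k, which is why the 2.6-point gain in R-Sq shrinks to under one point in R-Sq(adj).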