Forecasting
V Bandi and R Lahdelma
Jan 17, 2016
Forecasting?
• Decision-making deals with future problems
- Data describing the future is therefore needed
• A forecast is a representation of what will occur in the future
[Figure: operative, tactical and strategic decisions placed along a time axis running from hours, days, weeks and months to a year, 5 years and 20-50 years]
Time horizons of forecasts
• Depending on the purpose, the time horizon may differ
- Operational planning
  • Day – week level
- Tactical planning
  • Week – month – year
- Strategic planning
  • Year – 10 years – 50 years
Requirements of a forecasting model
• Sufficient accuracy
- Depends on the purpose of the forecast
  • Operative decisions require a high degree of accuracy
• Availability of the necessary input data
- Having access to real data is always a challenge
• The model must be easy to update and maintain when the system changes
- Not overly complex or specialized
Different approaches to forecasting
• Theory-oriented
- The laws of physics determine how the system behaves; therefore the model is formed based on theoretical laws
  • Example: heat is transferred through radiation, conduction and convection
• Data-oriented
- History data is analyzed in order to find dependencies
  • Requires applied mathematical techniques
Different approaches to forecasting
• In practice it is wise to use both approaches (theoretical and data-oriented) together
- The structure of the forecast model is planned based on theory, but the parameters are estimated from history data
- Sometimes observing the data can reveal dependencies that are otherwise missed in theoretical analyses
- Understanding the laws of physics allows making the model more generic and accurate
Let us try some simple forecast models
Forecasting demand for Cars
• The demand for Toyota cars over the first six months in the Helsinki region is summarized in the following table. Forecast the demand for cars in the next 6 months.
Month Number of units
Jan 46
Feb 56
Mar 43
Apr 43
May 60
Jun 72
Forecast demand for cars
• Simple modeling techniques
- Based on averages and weighted averages
• In the example, the dependency between month and number of units sold is hard to identify
- It is very difficult to forecast accurately
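As a rough illustration of the averaging techniques mentioned above, the sketch below applies a plain average, a 3-month moving average and a weighted average to the six monthly figures from the table. The function names, the window length and the weights 1, 2, 3 are assumptions chosen for illustration only; they do not come from the slides.

```python
# Monthly unit sales from the table (Jan..Jun).
sales = [46, 56, 43, 43, 60, 72]


def simple_average(values):
    """Forecast the next value as the mean of all observations."""
    return sum(values) / len(values)


def moving_average(values, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def weighted_average(values, weights):
    """Forecast the next value as a weighted mean of the most recent
    len(weights) observations; weights[-1] applies to the latest one."""
    recent = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


forecast_mean = simple_average(sales)
forecast_ma3 = moving_average(sales, window=3)
# Illustrative weights only: the latest month counts three times as much
# as the month two steps back.
forecast_w3 = weighted_average(sales, weights=[1, 2, 3])

print(f"mean of all months:      {forecast_mean:.1f}")
print(f"3-month moving average:  {forecast_ma3:.1f}")
print(f"weighted (1, 2, 3):      {forecast_w3:.1f}")
```

The three methods give noticeably different forecasts for the same six numbers, which is one way to see why a data set this short is hard to forecast accurately.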
Forecasting applications in Engineering
• Planning and optimization
- Example: coordination of cogeneration
• Simulation
- Planning new systems
- Improving existing systems
- Understanding the behavior of systems
Forecasting methods
• Based on averages
- Moving averages
  • Smoothing techniques
• Regression
- Linear regression
  • In simplest form: $Y = aX + b$
  • $Y$ dependent variable, $X$ independent variable
- Non-linear regression
- Dynamic regression
• Neural networks and many more
Regression analysis
• A regression analysis is used for forecasting one variable from another
- We must decide which variable will be the independent variable X and which the dependent variable Y
- This choice is usually motivated by a theory or hypothesis of causality
  • The alleged “cause” is X and the alleged “effect” is Y
What regression does
• A regression analysis produces a straight line that estimates the average value of Y at any specific value of X
- Example: heat demand forecast over a year based on outdoor temperature, $y_t = a_0 + a_1 x_t$
  • $a_0 = 261$ MW
  • $a_1 = -11.3$ MW/°C
- The curve fits badly at high temperatures
  • Therefore it is also misaligned for cold temperatures
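To make the example concrete, the following sketch evaluates the fitted line with the two coefficients quoted above at a few outdoor temperatures. The function name and the sample temperatures are assumptions for illustration.

```python
A0 = 261.0   # MW, intercept quoted on the slide
A1 = -11.3   # MW per degree Celsius, slope quoted on the slide


def heat_demand_mw(outdoor_temp_c):
    """Evaluate the straight-line model y = a0 + a1 * x."""
    return A0 + A1 * outdoor_temp_c


# Temperature at which the straight line reaches zero demand.
zero_crossing_c = -A0 / A1

for temp in (-20, -10, 0, 10, 20):
    print(f"{temp:>4} C -> {heat_demand_mw(temp):6.1f} MW")
print(f"line reaches 0 MW at about {zero_crossing_c:.1f} C")
```

Because the model is a straight line, it keeps decreasing as the temperature rises and eventually predicts zero or negative demand, which no real system has. This is one visible symptom of a poor fit at high temperatures.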
Forecast regression model
• The model aims to explain the behavior of the unknown quantity $y$ in terms of known quantities $x$, parameters $a$ and random noise $e$
- $y = f(x, a) + e$
• The structure of the model (the shape of function $f$) can be determined based on theory, based on intuition or by exploring history data
- The parameters $a$ are estimated from history data so that the noise $e$ is minimized
  • When the model has a good structure, $e$ is white noise
• Forecasting models can be classified according to the shape of function $f$
Linear Regression model based on one dependent and one independent variable
• A model where a single dependent variable $y$ is explained by a single independent variable $x$ is fitted to history data
- $y_t = a_0 + a_1 x_t$, where $t = 1,\dots,T$
• This is a linear equation system with two unknowns
- The system can be solved in the least squares sense (2-norm)
- To solve it we augment it with an error variable $e_t$
Linear Regression model, Determining parameters
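• We seek parameter values $a$ that minimize the sum of squares of the error variables
Min $\sum_{t=1}^{T} e_t^2$ s.t. $e_t = a_0 + a_1 x_t - y_t$, $t = 1,\dots,T$
• If we introduce the vector/matrix notation
$a = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$, $y = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}$, $e = \begin{bmatrix} e_1 \\ \vdots \\ e_T \end{bmatrix}$
and
$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_T \end{bmatrix}$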
Linear Regression model, Determining parameters
• The problem in vector/matrix format
Min $e^T e$ s.t. $e = Xa - y$
• Substituting $e$ into the objective function yields an unconstrained optimization problem
Min $(Xa - y)^T (Xa - y) = a^T X^T X a - 2 a^T X^T y + y^T y$
• Setting the derivative w.r.t. $a$ to zero gives the solution
$2 X^T X a - 2 X^T y = 0$
$a = (X^T X)^{-1} X^T y$
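For the single-variable model, $X^T X$ is a 2x2 matrix, so the solution $a = (X^T X)^{-1} X^T y$ can be written out explicitly. The sketch below does this in plain Python. The function name `fit_line` and the temperature and demand values are made up for illustration and are not data from the lecture.

```python
def fit_line(x, y):
    """Least-squares fit of y_t = a0 + a1 * x_t via the normal equations.

    With rows [1, x_t] in X:
        X^T X = [[T, sum(x)], [sum(x), sum(x^2)]]
        X^T y = [sum(y), sum(x*y)]
    The 2x2 system is solved in closed form.
    """
    T = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = T * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is singular: all x values are identical")
    a0 = (sxx * sy - sx * sxy) / det
    a1 = (T * sxy - sx * sy) / det
    return a0, a1


# Hypothetical history data: outdoor temperature (C) and heat demand (MW).
temps = [-10, -5, 0, 5, 10, 15]
demand = [370, 320, 265, 200, 150, 95]

a0, a1 = fit_line(temps, demand)
errors = [a0 + a1 * x_t - y_t for x_t, y_t in zip(temps, demand)]
print(f"a0 = {a0:.1f} MW, a1 = {a1:.2f} MW/C")
print(f"sum of squared errors = {sum(e * e for e in errors):.1f}")
```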
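Generalizations of linear regression: multiple independent (explaining) variables
• Linear regression model with multiple independent variables $x_i$
$y_t = a_0 + a_1 x_{1,t} + a_2 x_{2,t} + \dots + a_n x_{n,t}$, where $t = 1,\dots,T$
• Now there are more unknown parameters $a_i$ and the X-matrix becomes wider
$a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}$
and
$X = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{n,1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1,T} & \cdots & x_{n,T} \end{bmatrix}$
• The matrix formulas and solution remain the same
$a = (X^T X)^{-1} X^T y$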
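The same formula can be sketched for several independent variables by forming $X^T X$ and $X^T y$ and solving the resulting linear system. The code below uses a small Gaussian elimination routine so that it needs no external libraries. All function names and the sample data are assumptions for illustration; in practice a library routine such as `numpy.linalg.lstsq` would normally be preferred for numerical robustness.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_multiple(rows, y):
    """Least-squares fit of y_t = a0 + a1*x_{1,t} + ... + an*x_{n,t}.

    rows[t] holds [x_{1,t}, ..., x_{n,t}]; a column of ones is prepended
    for the intercept a0. Returns [a0, a1, ..., an].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * y_t for row, y_t in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical data with two explaining variables per time step.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]  # generated from y = 1 + 2*x1 - 3*x2
coeffs = fit_multiple(rows, y)
print([round(c, 6) for c in coeffs])
```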
Heat Demand Forecast
• Heat demand depends on
- Weather
  • Outside temperature, wind, solar radiation, seasons
- Building properties
- Residents' behavior
• Forecasting requires identification of independent variables
Heat Demand Forecast
• Accurate heat demand forecast
- Weather, residents' behavior and building properties can be considered as independent variables
- Forecast modelling with all independent variables requires data
  • Obtaining data is challenging
• According to previous studies, outside temperature has the most influence on heat demand
Heat demand forecast using Regression based on outside temperature
• Dependent variable
- Heat consumption (historical data)
• Independent variable
- Outside temperature (historical data)
• Forecasting model $y_t = a_0 + a_1 x_t$
• The curve fits badly at high temperatures, therefore it is also misaligned for cold temperatures
Standard Deviation (SD) or RMSE (Root-Mean-Squared-Error)
• The square root of the mean of the squared errors
- The use of SD or RMSE is very common, and it makes an excellent general-purpose error criterion for forecasts
• $\mathrm{stdev}(e) = \sqrt{e^T e / T}$
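The formula above translates directly into a few lines of code. The sketch below also includes a relative version that divides by the average actual demand, which is one way to express the error as a percentage. The function names and sample numbers are assumptions for illustration.

```python
import math


def rmse(actual, forecast):
    """Root-mean-squared error: sqrt(e^T e / T) with e = forecast - actual."""
    errors = [f - a for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_rmse(actual, forecast):
    """RMSE expressed as a fraction of the average actual value."""
    mean_actual = sum(actual) / len(actual)
    return rmse(actual, forecast) / mean_actual


# Hypothetical hourly demand (MW) and the corresponding forecast.
actual = [100.0, 120.0, 90.0, 110.0]
forecast = [105.0, 110.0, 95.0, 115.0]

print(f"RMSE = {rmse(actual, forecast):.2f} MW")
print(f"relative RMSE = {100 * relative_rmse(actual, forecast):.1f} %")
```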
Forecast based on outdoor temperature: forecast vs actual for a sample week
The forecast is good on average but not quite satisfactory: the RMSE (Root-Mean-Squared-Error), or standard deviation, of the annual forecast is 20%
Forecast based on outdoor temperature: forecast vs actual for a sample week
• RMSE = 20% (of average demand): not a good forecast
• Reason for low accuracy
- Outside temperature alone cannot explain heat consumption completely
- This can be seen from the correlation coefficient between outside temperature and heat consumption
Correlation coefficient
• The correlation coefficient is a number between -1 and 1 that indicates the strength of the linear relationship between two variables
- r ≈ 1: very strong positive linear relationship between X and Y
- r ≈ 0: no linear relationship between X and Y. Y does not tend to increase or decrease as X increases
- r ≈ -1: very strong negative linear relationship between X and Y. Y decreases as X increases
• The sign of r (+ or -) indicates the direction of the relationship between X and Y. The magnitude of r (how far away from zero it is) indicates the strength of the relationship.
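A minimal sketch of how the usual (Pearson) sample correlation coefficient can be computed is given below. The function name and the small data sets are assumptions for illustration, chosen so that the three cases r ≈ 1, r ≈ 0 and r ≈ -1 each appear once.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_pos = correlation([1, 2, 3], [2, 4, 6])          # Y rises with X
r_none = correlation([1, 2, 3, 4], [1, -1, -1, 1])  # no linear trend
r_neg = correlation([1, 2, 3], [6, 4, 2])           # Y falls as X rises

print(f"r_pos = {r_pos:.3f}, r_none = {r_none:.3f}, r_neg = {r_neg:.3f}")
```

Note that the second data set has a clear U-shaped pattern yet gives r = 0, because r measures only the linear part of a relationship.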
Correlation between outside temperature and heat consumption for a single building
• Correlation coefficient for a building: r = -0.956
- Strong negative relationship
- The model would have been more accurate had r been closer to -1
Residents' behavior in a building
• People's behavior usually has a rhythm (a strong, regularly repeated pattern)
• Let us hypothesize that residents' behavior follows a similar rhythm on weekdays (Monday to Friday) and on weekends (Saturday and Sunday)
• Let us modify the forecast model using these weekly rhythms
Modified Forecast Model
• Original forecast model
$y_t = a_0 + a_1 x_t$
• The y-intercept $a_0$ also has a negative effect on accuracy, because it enters the forecast as a constant
• Modified forecast model
$y_t = a_{h(t)} + a_1 x_t$
where $a_{h(t)}$ is a social component based on the weekly rhythm
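One possible reading of the modified model is that $h(t)$ assigns each time step to a class of the weekly rhythm and each class gets its own intercept, while the temperature slope $a_1$ is shared. The sketch below fits that version by least squares. The two-class labelling ("weekday", "weekend"), the function name and the data are assumptions for illustration; the slides do not specify how $h(t)$ is defined.

```python
from collections import defaultdict


def fit_rhythm_model(x, y, h):
    """Least-squares fit of y_t = a_{h(t)} + a1 * x_t.

    h[t] is the class label of time step t. For a shared slope and
    per-class intercepts, the least-squares solution is:
      a1      = pooled within-class covariance / within-class variance
      a_class = mean(y in class) - a1 * mean(x in class)
    Returns (intercepts_by_class, a1).
    """
    groups = defaultdict(list)
    for x_t, y_t, h_t in zip(x, y, h):
        groups[h_t].append((x_t, y_t))

    means = {}
    sxy = 0.0
    sxx = 0.0
    for label, pts in groups.items():
        mean_x = sum(p[0] for p in pts) / len(pts)
        mean_y = sum(p[1] for p in pts) / len(pts)
        means[label] = (mean_x, mean_y)
        sxy += sum((px - mean_x) * (py - mean_y) for px, py in pts)
        sxx += sum((px - mean_x) ** 2 for px, _ in pts)
    if sxx == 0:
        raise ValueError("x does not vary within any class")

    a1 = sxy / sxx
    intercepts = {label: my - a1 * mx for label, (mx, my) in means.items()}
    return intercepts, a1


# Hypothetical data: temperature, demand and day class per time step.
temps = [0, 1, 2, 0, 1, 2]
labels = ["weekday", "weekday", "weekday", "weekend", "weekend", "weekend"]
demand = [10, 8, 6, 5, 3, 1]

intercepts, slope = fit_rhythm_model(temps, demand, labels)
print(intercepts, slope)
```

A finer classification, for example one class per hour of the week, fits the same function without changes because only the labels differ.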
Forecast using weekly rhythm
The RMSE (Root-Mean-Squared-Error), or standard deviation, of the annual forecast is 13%
Improving accuracy of the model
• The weekly rhythm model does not consider that some weeks and days are different
- E.g. during holiday seasons, religious holidays etc. the demand differs from that of a normal weekday
  • The days can be classified as e.g. working day, Saturday, holiday
• It is possible to include more independent variables
- Solar radiation, wind speed and direction, cloudiness, ...
  • In general these affect the precision only a little
• History data from multiple years
- Weighted regression: recent history can be given more weight
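Weighted regression can be sketched for the single-variable model by minimizing $\sum_t w_t e_t^2$ instead of $\sum_t e_t^2$. In the code below the weights decay geometrically with the age of the observation. The function names, the decay factor and the data are assumptions for illustration; the slides do not state which weighting scheme is used.

```python
def fit_line_weighted(x, y, w):
    """Weighted least-squares fit of y_t = a0 + a1 * x_t.

    Minimizes sum(w_t * e_t^2). The closed form uses weighted means:
      a1 = sum(w*(x-mx)*(y-my)) / sum(w*(x-mx)^2),  a0 = my - a1*mx
    """
    sw = sum(w)
    mx = sum(wt * xt for wt, xt in zip(w, x)) / sw
    my = sum(wt * yt for wt, yt in zip(w, y)) / sw
    sxy = sum(wt * (xt - mx) * (yt - my) for wt, xt, yt in zip(w, x, y))
    sxx = sum(wt * (xt - mx) ** 2 for wt, xt in zip(w, x))
    if sxx == 0:
        raise ValueError("weighted x values do not vary")
    a1 = sxy / sxx
    return my - a1 * mx, a1


def recency_weights(n, decay):
    """Weights decay**age, where the newest observation has age 0."""
    return [decay ** (n - 1 - t) for t in range(n)]


# Hypothetical history, oldest first: the system changed after two steps.
x = [0, 1, 0, 1]
y = [0, 1, 5, 7]

equal = fit_line_weighted(x, y, [1, 1, 1, 1])
recent = fit_line_weighted(x, y, recency_weights(len(x), decay=0.1))
print("equal weights: ", equal)
print("recent-heavy:  ", recent)
```

With equal weights the fit averages over the old and new behavior, while the decaying weights pull the parameters toward the most recent observations.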