Very short-term wind speed forecasting with Bayesian structural break model

Renewable Energy 50 (2013) 637–647

Yu Jiang a, Zhe Song a,*, Andrew Kusiak b

a School of Business, Nanjing University, 22 Hankou Road, Nanjing 210093, China
b Department of Mechanical and Industrial Engineering, 3131 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527, United States

Article info

Article history:
Received 27 November 2011
Accepted 31 July 2012
Available online

Keywords: Time series; Forecasting; Wind power; Wind speed; Bayesian structural break model

* Corresponding author.

0960-1481/$ - see front matter © 2012 Elsevier Ltd. http://dx.doi.org/10.1016/j.renene.2012.07.041

Abstract

This paper examines a new time series method for very short-term wind speed forecasting. The time series forecasting model is based on Bayesian theory and structural break modeling, which can incorporate domain knowledge about wind speed as a prior. In addition, the Bayesian structural break model predicts wind speed as a set of possible values, in contrast to the single-value prediction of classical time series models. This set of predicted values can be used in various applications, such as wind turbine predictive control and wind power scheduling. The proposed model is tested with actual wind speed data collected from utility-scale wind turbines.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Wind is one of the most promising green energy sources. The world’s installed wind power capacity has increased exponentially in recent years, and the wind industry is expanding into a large-scale business [1]. However, many technical challenges must be solved to further reduce the cost of wind energy and to improve the performance and reliability of wind turbines and power systems. Among these challenges, wind speed and power forecasting is a key technology that can be applied to various aspects of the wind industry, such as wind turbine predictive control [2,3], wind power grid integration and economic dispatch [4,5], and wind farm siting. Forecasting in the wind industry can be roughly divided into two categories: (very) short-term and long-term forecasting [1]. Short-term forecasting focuses on predicting wind speeds or power several minutes, hours or days ahead. Long-term forecasting involves predictions several days, weeks or even months ahead. According to the published literature, the methods used to perform these predictions are statistical learning, physical modeling and hybrids of the two [1,6]. Specifically, for very short-term wind speed or power forecasting, time series modeling techniques such as ARIMA or neural networks are widely used and have achieved promising results [15–19].

Statistical learning emphasizes learning from historical data, including wind data, control system data and so on [1]. The forecasting model is usually represented as a time series model or dynamic model, and the model’s parameters are estimated by minimizing some cost function over a historical training dataset. Different statistical learning algorithms can achieve this goal, such as data mining algorithms and neural networks [7–9]. Forecasting models based on statistical learning are easy to build and have relatively high prediction accuracy for short-term forecasting. One common complaint is that the models are difficult to interpret, which is why they are sometimes called black boxes. In addition, forecasting errors can accumulate quickly as the prediction horizon increases.

Another classical way to forecast wind speed is based on physical models from atmospheric science. These methods are widely used by numerical weather forecasting systems. The forecasting models are typically composed of differential equations derived from the first law [1]. Physical modeling requires deep knowledge of atmospheric science and is not easy to set up and maintain. Its main strengths lie in interpretability and long-term forecasting. Compared with statistical learning, physical modeling cannot fully utilize data coming from various sources. Most importantly, forecasting models built from physical modeling have difficulty providing very short-term predictions (e.g. a few minutes ahead).

Recognizing the advantages and disadvantages of both forecasting methods, hybrid models have been proposed to integrate statistical learning with physical modeling. One popular and practical way is to use numerical weather forecasting data as part of the inputs of a forecasting model extracted by statistical learning algorithms, such as neural networks [10] or auto-regression models [11]. Training data for the learning algorithms is roughly composed of wind data (e.g. directions and speeds), control system data and numerical weather forecasting data. The hybrid approach is widely accepted by forecasting practitioners in the wind industry as a way to provide accurate and adaptive prediction models or ensembles of different prediction models [1,11–14]. A thorough review of state-of-the-art wind power forecasting can be found in the Argonne National Lab’s report [1].

Table 1
Notations of parameters in the model.

| Notation | Parameter |
|---|---|
| $\mu_j$ | Location parameter of regime $j$ |
| $h_j$ | Precision parameter of regime $j$ |
| $s_t$ | Indicator of structural break at time $t$ |
| $s$ | Scale parameter of regime precisions $h_j$ |
| $\nu$ | Shape parameter of regime precisions $h_j$ |
| $\mu$ | Location parameter of regime locations $\mu_j$ |
| $h$ | Scale parameter of regime locations $\mu_j$ |
| $p$ | Probability of occurrence of a break |

In this paper we focus on very short-term wind speed or power forecasting, where forecasting horizons are usually several hours or even minutes. Very short-term wind speed forecasting can usually be regarded as a time series modeling problem in which only past wind speeds are used to predict future wind speeds.

The aforementioned literature focuses on predicting wind speed or power point by point, i.e. a single forecast for a specific time stamp. However, a single-point forecast does not meet the needs of various applications, such as power system operations, where risks and uncertainties have to be quantified. To solve this problem, some researchers have emphasized the need to generate scenarios or probabilistic forecasts of wind power. In other words, the wind power forecast is treated as a random variable following some distribution [20–22].

So far only a limited number of time series modeling techniques have been applied to very short-term wind power forecasting. Moreover, these time series models do not provide a specific mechanism for integrating prior knowledge about wind speeds into the forecasting process.

A Bayesian approach solves this issue by specifying prior distributions of the model parameters. Beyond that, a Bayesian forecasting model produces a predictive sample of the wind speeds. The predictive sample provides more information than a classical point forecast. For instance, it can be used to obtain quantile forecasts.

This paper tests the possibility of applying a state-of-the-art structural break time series method developed by Geweke and Jiang [23] to very short-term wind speed forecasting. The forecasting model applied in this paper is based on Bayesian theory and structural break modeling. Structural break modeling was originally developed to detect structural breaks or regime changes in nonlinear (macroeconomic and financial) time series by classical hypothesis testing (see [24] for a recent survey). Ignoring structural breaks when estimating and forecasting nonlinear time series leads to unreliable results [25–27]. However, classical structural break modeling approaches are not suitable for forecasting time series with evolving structural breaks, because they do not account for new breaks occurring out of sample. Bayesian approaches solve this problem by introducing the probability of in-sample breaks as a parameter and then using its posterior information to forecast out-of-sample breaks. Applications of Bayesian forecasting in economics in the presence of structural breaks include [23,28–30]. Wind speed time series are nonlinear in general and experience structural breaks or regime changes over time. Recently, [31–33] applied a regime-switching model to capture nonlinearities caused by regime changes of time series, where the number of regimes is finite and past regimes can recur. The structural break model applies a similar idea to the regime-switching model [34], but it captures nonlinearities of time series by assuming an infinite number of different regimes. This characteristic leads to more flexibility in modeling nonlinear time series [23], such as wind speed.

In this paper we pioneer the use of a Bayesian structural break model to forecast wind speed. The advantages of this approach are: (1) Bayesian techniques allow us to incorporate domain knowledge about very short-term wind speed patterns into the time series model as a prior; (2) the Bayesian structural break model seamlessly predicts the wind speed as a set of possible values, in contrast to the single-value predictions of traditional time series models or the post-processing of predictions into probabilistic forms.

On the other hand, very short-term wind speed forecasting has not been fully studied in the contemporary literature, especially for forecasts that are used for predictive control purposes (e.g. using the forecasted wind speeds to optimally control wind turbines for load reduction and power maximization). Forecasting over a long horizon is hard. However, forecasting seconds or minutes ahead is also not trivial when the prediction frequency is high, e.g. when predictions are generated every 10 s. The proposed model is tested with actual 10-s average wind speed data collected from utility-scale wind turbines.

2. Bayesian structural break model and wind speed forecast

In recent years, many time series models have been developed to deal with nonlinearities in time series that cause problems for traditional linear models. Among these models, structural break models have become increasingly popular and have achieved promising computational results in the estimation and prediction of economic time series data [23,28,34]. The structural break model can be interpreted as a nonlinear model with linear properties that differ across regimes. That is to say, observations in different regimes are drawn from different distributions. Changes between regimes are called structural breaks, and a structural break usually appears when there is an unexpected shift in the level or volatility of a time series. Ignoring structural breaks in estimation and forecasting can lead to large errors and unreliable results.

Wind speed time series are nonlinear in nature. Even within an hour, wind speeds experience different periods, for example periods of high and low speeds, or periods of high and low volatility. Without considering the physical reasons behind this sudden-change phenomenon, we can statistically infer that different regimes (distributions) generate the wind speed time series. Therefore, a multiple structural break model is an appropriate tool to forecast wind speed in the presence of regime changes or structural breaks. On the other hand, wind speeds are not totally random. Based on historical data or domain knowledge, we roughly know when the wind speed will go up and down within a certain time horizon. For example, for a specific site, local people can describe the wind speed patterns within a day, which translate into structural breaks. Bayesian techniques can therefore be used to incorporate this domain knowledge into the structural break model, where the number of distributions or regimes is given a prior. Table 1 lists the notations of the parameters in the model.

Let $\mathbf{y} = (y_1, \ldots, y_T)'$ denote a time series of wind speed with $T$ observations. Corresponding to each observation there is a latent variable $s_t$ taking the value 0 or 1, and the $s_t$ are independent and identically distributed Bernoulli variables with $\Pr(s_t = 1) = p$. The set of latent variables $\mathbf{s} = (s_1, \ldots, s_{T-1})'$ can be used to track structural breaks or regime changes of a time series in the following way: (1) $s_t = 0$ indicates that no structural break or regime change occurs at time $t$, i.e. observation $y_{t+1}$ is in the same regime as $y_t$; (2) $s_t = 1$ indicates that a break or regime change occurs at time $t$, so observation $y_{t+1}$ is in a new regime starting from time $t+1$. Given the latent variables, we can determine the regime in which observation $y_t$ lies and the total number of regimes, $J$, of a time series. If $y_t$ is in the $j$th regime, then $j = 1 + \sum_{k=1}^{t-1} s_k$, and the total number of regimes is $J = 1 + \sum_{k=1}^{T-1} s_k$. The model further assumes that observations in regime $j$ are normally distributed as

$$y_t \mid \mu_j, h_j \overset{iid}{\sim} N\left(\mu_j, h_j^{-1}\right) \quad (j = 1, \ldots, J),$$

where $\mu_j$ and $h_j$ represent the location (mean) and precision (inverse of variance) parameters of regime $j$.

Fig. 1. Sample paths of change rates (10-sec. data).

Table 2
Results with different prior beliefs of p (10-sec. data).

| Prior mean | Posterior mean (Sample 1) | Posterior mean (Sample 2) |
|---|---|---|
| 0.01 | 0.0191 | 0.0209 |
| 0.05 | 0.0394 | 0.0642 |
| 0.1 | 0.0413 | 0.0732 |
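As a minimal sketch of the bookkeeping described above, the following Python snippet maps a vector of break indicators to regime labels and simulates a series from the regime-wise normal model. The function names (`regime_index`, `simulate_series`) and all numerical values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def regime_index(s):
    """Map break indicators s_1..s_{T-1} to 1-based regime labels of y_1..y_T."""
    s = np.asarray(s, dtype=int)
    # y_1 is in regime 1; y_{t+1} starts a new regime whenever s_t = 1,
    # so the label of y_t is 1 + sum_{k<t} s_k.
    return 1 + np.concatenate(([0], np.cumsum(s)))

def simulate_series(s, mu, h, rng):
    """Draw y_t ~ N(mu_j, 1/h_j), where j is the regime that contains y_t."""
    j = regime_index(s) - 1                 # 0-based positions into mu and h
    mu = np.asarray(mu, dtype=float)
    h = np.asarray(h, dtype=float)
    return rng.normal(mu[j], 1.0 / np.sqrt(h[j]))

s = [0, 0, 1, 0, 1, 0, 0]                   # T - 1 = 7 indicators, so T = 8
labels = regime_index(s)                    # regimes 1,1,1,2,2,3,3,3
rng = np.random.default_rng(0)
y = simulate_series(s, mu=[0.0, 0.5, -0.3], h=[400.0, 100.0, 400.0], rng=rng)
```

The last label equals the total number of regimes, J = 1 + (number of breaks), which is 3 in this example.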

2.1. Prior distributions

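The model adopts a two-tier hierarchical prior structure for reasons discussed below. The first level contains the regime coefficients $\{\mu_j, h_j\}$ $(j = 1, \ldots, J)$ and the latent variables $\{s_t\}$ $(t = 1, \ldots, T-1)$, and their prior distributions are

$$
\begin{aligned}
s h_j &\sim \chi^2(\nu), \\
\mu_j \mid h_j &\sim N\left(\mu,\; h_j^{-1} \cdot h^{-1}\right), \\
s_t &\sim \text{Bernoulli}(p).
\end{aligned}
\tag{1}
$$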

The combination of the prior distributions of $\mu_j$ and $h_j$ in (1) forms a so-called normal-gamma prior, which is a conjugate prior distribution, i.e. the prior and posterior distributions of $\{\mu_j, h_j\}$ are both members of the normal-gamma family. This conjugate prior makes it possible to marginalize the regime coefficients in the joint posterior distribution (see details in Section 2.2) and therefore simplifies the posterior simulation.

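The second level consists of the parameters $\mu$, $h$, $s$, $\nu$, which specify the prior distributions in (1), and the parameter $p$, which governs the frequency of structural breaks or regime changes. Their prior distributions are

$$
\begin{aligned}
\mu \mid \underline{\mu}, \underline{h} &\sim N\left(\underline{\mu},\; \underline{h}^{-1}\right), \\
\underline{s}\, h \mid \underline{s}, \underline{\nu} &\sim \chi^2\left(\underline{\nu}\right), \\
\underline{\alpha}\, s \mid \underline{\alpha}, \underline{\beta} &\sim \chi^2\left(\underline{\beta}\right), \\
\nu \mid \underline{\lambda} &\sim \exp\left(\underline{\lambda}\right), \\
p \mid \underline{a}, \underline{b} &\sim \text{Beta}\left(\underline{a}, \underline{b}\right),
\end{aligned}
\tag{2}
$$

where $\underline{\mu}, \underline{h}, \underline{s}, \underline{\nu}, \underline{\alpha}, \underline{\beta}, \underline{\lambda}, \underline{a}, \underline{b}$, each denoted with an underbar, are fixed hyperparameters that must be specified before the model is applied.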
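To make the two-tier structure concrete, here is a hedged Python sketch that draws once from the hierarchical prior: second-level parameters first, then break indicators and regime coefficients. Several points are assumptions rather than statements from the paper: the dictionary keys `alpha`/`beta` and `a`/`b` are merely labels for the two hyperparameter pairs, exp(λ) is read as an exponential distribution with rate λ, and the numerical values of `mu`, `a` and `b` are illustrative.

```python
import numpy as np

# Illustrative hyperparameters (assumed values; "lam" is treated as a rate).
HYPER = {"mu": 0.0, "h": 1.0e4, "s": 0.08, "nu": 8.0,
         "alpha": 800.0, "beta": 8.0, "lam": 0.25, "a": 3.95, "b": 391.05}

def draw_from_prior(T, hyper, rng):
    """One joint draw of second-level parameters, breaks and regime coefficients."""
    # Second level: parameters that govern the regime-level priors.
    mu = rng.normal(hyper["mu"], 1.0 / np.sqrt(hyper["h"]))
    h = rng.chisquare(hyper["nu"]) / hyper["s"]           # s_ * h ~ chi2(nu_)
    s = rng.chisquare(hyper["beta"]) / hyper["alpha"]     # alpha_ * s ~ chi2(beta_)
    nu = rng.exponential(1.0 / hyper["lam"])              # assumption: rate lam
    p = rng.beta(hyper["a"], hyper["b"])
    # First level: break indicators, then one (mu_j, h_j) pair per regime.
    breaks = rng.binomial(1, p, size=T - 1)
    J = 1 + int(breaks.sum())
    h_j = rng.chisquare(nu, size=J) / s                   # s * h_j ~ chi2(nu)
    mu_j = rng.normal(mu, 1.0 / np.sqrt(h_j * h))         # variance 1 / (h_j * h)
    return {"mu": mu, "h": h, "s": s, "nu": nu, "p": p,
            "breaks": breaks, "mu_j": mu_j, "h_j": h_j}

draw = draw_from_prior(T=1088, hyper=HYPER, rng=np.random.default_rng(1))
```

Because every regime draws its coefficients from the same second-level parameters, regimes share information through them while still differing from one another.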

Note that the number of breaks $J-1$ has a binomial prior distribution with probability parameter $p$, which can be viewed as an implicit hierarchical prior distribution. Therefore, the number of breaks or regimes is treated as a random variable and estimated as a parameter, which removes the unreasonable restriction that it is fixed ex ante, as in the models of [34,35].

The first level of the hierarchical prior structure expresses the belief that the regime coefficients come from a common distribution, while the second-level parameters allow for differences across regimes. The hierarchical prior therefore provides more flexibility in the prior beliefs about the parameters. Through the second-level parameters $\mu$, $h$, $s$, $\nu$ and $p$, the hierarchical prior also allows information about the coefficients of one regime to depend on information from other regimes. This property is particularly useful for forecasting time series, since the frequency and size of out-of-sample breaks are not completely random: they depend partly on the history of the sample data and the in-sample breaks, and partly on a random element.

Fig. 2. Interval forecasts of sample 1 (10-sec. data): prior mean of p = 0.01.


2.2. Posterior distributions and simulation

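Let $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_J)'$, $\mathbf{h} = (h_1, \ldots, h_J)'$, $\theta = (\mu, h, s, \nu, p)'$ and $\mathbf{s} = (s_1, \ldots, s_{T-1})'$. Then, according to Bayes’ theorem,

$$
f(\boldsymbol{\mu}, \mathbf{h}, \theta, \mathbf{s} \mid \mathbf{y})
= \frac{f(\mathbf{y} \mid \boldsymbol{\mu}, \mathbf{h}, \theta, \mathbf{s})\, f(\boldsymbol{\mu}, \mathbf{h}, \mathbf{s} \mid \theta)\, f(\theta)}{f(\mathbf{y})}
\propto f(\mathbf{y} \mid \boldsymbol{\mu}, \mathbf{h}, \theta, \mathbf{s})\, f(\boldsymbol{\mu}, \mathbf{h}, \mathbf{s} \mid \theta)\, f(\theta).
$$

Therefore, the joint posterior probability density kernel is proportional to the product of the conditional density function of $\mathbf{y}$ and the prior density functions of all parameters and latent variables (Appendices 1, 2, 3 and 4 give all details). As mentioned before, the conjugate normal-gamma prior distribution given in (1) leads to a normal-gamma posterior distribution of $\{\mu_j, h_j\}$ $(j = 1, \ldots, J)$:

$$
\begin{aligned}
\left( s + s_j^2 + \frac{n_j h \left(\mu - \bar{y}_j\right)^2}{n_j + h} \right) h_j \,\Big|\, (s, \nu, \mu, h, p, \mathbf{s}, \mathbf{y}) &\sim \chi^2\left(n_j + \nu\right), \\
\mu_j \mid h_j, s, \nu, \mu, h, p, \mathbf{s}, \mathbf{y} &\sim N\left( \frac{h\mu + n_j \bar{y}_j}{h + n_j},\; h_j^{-1} \cdot \left(h + n_j\right)^{-1} \right),
\end{aligned}
\tag{3}
$$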

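where $\bar{y}_j = (1/n_j)\, \iota_{n_j}' \mathbf{y}_j$; $s_j^2 = (\mathbf{y}_j - \iota_{n_j} \bar{y}_j)'(\mathbf{y}_j - \iota_{n_j} \bar{y}_j)$; $\mathbf{y}_j$ is the vector containing the $n_j$ observations in regime $j$; and $\iota_n$ denotes an $n \times 1$ vector of ones. The normal-gamma posterior distribution in (3) makes it possible to analytically integrate the regime coefficients out of the joint posterior distribution,

$$
f(\theta, \mathbf{s} \mid \mathbf{y}) \propto \int \cdots \int f(\mathbf{y} \mid \boldsymbol{\mu}, \mathbf{h}, \theta, \mathbf{s})\, f(\boldsymbol{\mu}, \mathbf{h}, \theta, \mathbf{s})\, d\boldsymbol{\mu}\, d\mathbf{h},
$$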

which implies that we only need to deal with the posterior distributions of $\theta = (\mu, h, s, \nu, p)'$ and the latent variables $\mathbf{s}$. This property significantly simplifies the simulation of the posterior distribution and thus reduces the computational cost. The joint posterior density kernel of $f(\theta, \mathbf{s} \mid \mathbf{y})$ is given in Appendix 5.
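The conjugate update in (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `draw_regime_coefficients`, the simulated data and the parameter values are hypothetical, and the second-level parameters are simply treated as given.

```python
import numpy as np

def draw_regime_coefficients(y_j, mu, h, s, nu, rng):
    """Draw (mu_j, h_j) from the normal-gamma posterior (3) for one regime."""
    y_j = np.asarray(y_j, dtype=float)
    n_j = y_j.size
    ybar = y_j.mean()
    ss = np.sum((y_j - ybar) ** 2)                      # s_j^2 in (3)
    scale = s + ss + n_j * h * (mu - ybar) ** 2 / (n_j + h)
    h_j = rng.chisquare(n_j + nu) / scale               # scale * h_j ~ chi2(n_j + nu)
    mean = (h * mu + n_j * ybar) / (h + n_j)
    mu_j = rng.normal(mean, 1.0 / np.sqrt(h_j * (h + n_j)))
    return mu_j, h_j

rng = np.random.default_rng(2)
y_j = 2.0 + 0.1 * rng.standard_normal(500)              # hypothetical regime data
draws = np.array([draw_regime_coefficients(y_j, mu=0.0, h=1.0, s=0.08, nu=8.0, rng=rng)
                  for _ in range(2000)])
posterior_mean = (1.0 * 0.0 + 500 * y_j.mean()) / (1.0 + 500)
```

With 500 observations the posterior mean of the location is dominated by the regime average, so the draws of `mu_j` cluster tightly around `posterior_mean`.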

Fig. 3. Interval forecasts of sample 1 (10-sec. data): prior mean of p = 0.05.

The posterior distribution can be simulated using Markov chain Monte Carlo (MCMC). The idea is to construct a Markov chain $\{\theta^{(m)}, \mathbf{s}^{(m)}\}_{m=1}^{M}$ on a general state space such that the limiting distribution of the chain is $f(\theta, \mathbf{s} \mid \mathbf{y})$. More specifically, the posterior simulator is, globally, a Gibbs sampler with six blocks: $\mu$, $h$, $s$, $\nu$, $p$ and $\mathbf{s}$. The Gibbs sampling algorithm to obtain a sequence $\{\theta^{(m)}, \mathbf{s}^{(m)}\}_{m=1}^{M}$ proceeds as follows:

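1. Make an initial draw $(\mu^{(0)}, h^{(0)}, s^{(0)}, \nu^{(0)}, p^{(0)})$ from the corresponding prior distributions in (2) and generate $\mathbf{s}^{(0)} = (s_1^{(0)}, \ldots, s_{T-1}^{(0)})'$ from $s_t \sim \text{Bernoulli}(p^{(0)})$ $(t = 1, \ldots, T-1)$.

2. For $m = 1$ to $M$, successively draw

a) $\mu^{(m)}$ from $f(\mu \mid h^{(m-1)}, s^{(m-1)}, \nu^{(m-1)}, p^{(m-1)}, \mathbf{s}^{(m-1)}, \mathbf{y})$;

b) $h^{(m)}$ from $f(h \mid \mu^{(m)}, s^{(m-1)}, \nu^{(m-1)}, p^{(m-1)}, \mathbf{s}^{(m-1)}, \mathbf{y})$;

c) $s^{(m)}$ from $f(s \mid \mu^{(m)}, h^{(m)}, \nu^{(m-1)}, p^{(m-1)}, \mathbf{s}^{(m-1)}, \mathbf{y})$;

d) $\nu^{(m)}$ from $f(\nu \mid \mu^{(m)}, h^{(m)}, s^{(m)}, p^{(m-1)}, \mathbf{s}^{(m-1)}, \mathbf{y})$;

e) $p^{(m)}$ from $f(p \mid \mu^{(m)}, h^{(m)}, s^{(m)}, \nu^{(m)}, \mathbf{s}^{(m-1)}, \mathbf{y})$;

f) $\mathbf{s}^{(m)}$ from $f(\mathbf{s} \mid \mu^{(m)}, h^{(m)}, s^{(m)}, \nu^{(m)}, p^{(m)}, \mathbf{y})$.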

Notice that, except for $p$, the conditional posterior density functions (kernels) in steps 2a for $\mu$, 2b for $h$, 2c for $s$, 2d for $\nu$, and 2f for $\mathbf{s}$ are not of any known form; therefore, Metropolis–Hastings algorithms are applied in the simulation of these blocks. The conditional posterior probability density kernels and the Metropolis–Hastings algorithms used here are given in Appendices 6 and 7.
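Step 2e is the one block with a known form. A hedged sketch follows, assuming the standard Beta-Bernoulli conjugate update implied by the Beta prior in (2) and the Bernoulli indicators in (1); the exact kernels the authors use are in their appendices, which are not reproduced here. The function name `draw_p` and the numbers are illustrative.

```python
import numpy as np

def draw_p(breaks, a, b, rng):
    """Draw p given the break indicators: Beta(a + #breaks, b + #non-breaks)."""
    breaks = np.asarray(breaks, dtype=int)
    n_breaks = int(breaks.sum())
    n_no_break = breaks.size - n_breaks
    return rng.beta(a + n_breaks, b + n_no_break)

rng = np.random.default_rng(3)
breaks = np.zeros(1087, dtype=int)
breaks[::55] = 1                                  # 20 hypothetical in-sample breaks
p_draws = np.array([draw_p(breaks, a=3.95, b=391.05, rng=rng) for _ in range(5000)])
expected = (3.95 + breaks.sum()) / (3.95 + 391.05 + breaks.size)
```

Since $p$ enters the joint density only through the prior of the indicators and its own Beta prior, this draw does not need the data or the other parameters once the indicators are given.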

2.3. Forecasting

If structural breaks occurred in the past, they can be expected to occur in the future as well. Forecasting time series in the presence of future breaks therefore becomes important and necessary. For the purpose of forecasting, a successful model should have the following desirable features: (1) new breaks can occur over the forecast horizon; (2) the number of breaks in the future should not be fixed ex ante; (3) the observed sample and the in-sample breaks should partially inform the size and frequency of out-of-sample breaks; (4) the model should be easy and convenient to implement.

Table 3
Performance evaluation of sample 1 (10-sec. data).

| | Persistent model | AR(4) model | Imp (%) | NNs | Imp (%) | Our model, p = 0.01 | Imp (%) | Our model, p = 0.05 | Imp (%) |
|---|---|---|---|---|---|---|---|---|---|
| MAE | 0.177 | 0.191 | −7.910 | 0.174 | 1.695 | 0.165 | 6.780 | 0.17 | 3.955 |
| MSE | 0.058 | 0.062 | −6.897 | 0.047 | 18.966 | 0.055 | 5.172 | 0.054 | 6.897 |
| RMSE | 0.24 | 0.249 | −3.750 | 0.216 | 10 | 0.235 | 2.083 | 0.232 | 3.333 |

Based on the Gibbs sampler given in Section 2.2, the procedure for generating $K$-step-ahead out-of-sample forecasts, $y_{T+1}, \ldots, y_{T+K}$, conditional on the observed sample is as follows:

1. At iteration $m$, generate $s_{T+k}^{(m)} \sim \text{Bernoulli}(p^{(m)})$ $(k = 0, \ldots, K-1)$ and then use $s_T^{(m)}, \ldots, s_{T+K-1}^{(m)}$ to determine the regime number of each of $y_{T+1}, \ldots, y_{T+K}$.

2. Generate the regime coefficients $(h_J^{(m)}, \mu_J^{(m)})$ of the last regime according to the posterior distributions in (3).

3. If there are new regimes out of sample, generate the regime coefficients $(h_j^{(m)}, \mu_j^{(m)})$ $(j \geq J+1)$ from $s^{(m)} h_j \sim \chi^2(\nu^{(m)})$; $\mu_j \mid h_j^{(m)} \sim N(\mu^{(m)},\; h_j^{(m)-1} \cdot h^{(m)-1})$.

4. Generate $y_{T+1}^{(m)}, \ldots, y_{T+K}^{(m)}$ from $N(\mu_j^{(m)}, h_j^{(m)-1})$, using the regime coefficients that correspond to the regime number of each of $y_{T+1}^{(m)}, \ldots, y_{T+K}^{(m)}$ determined in step 1.

The forecasting procedure described above satisfies the four desirable features.

• By means of the latent variables, the model allows breaks to occur out of sample, i.e. a break occurs at time $T+k$ if the latent variable $s_{T+k} = 1$ $(k = 0, \ldots, K-1)$.

• As a characteristic inherited from the in-sample modeling, the forecasting model also treats the number of out-of-sample breaks as a binomial random variable, where the probability that a break occurs is $p$.

• The frequency of out-of-sample breaks also depends on the information about in-sample breaks, since $p$ is drawn from the posterior distribution and reflects the frequency of in-sample breaks.

• Finally, the procedure is easy to implement. It can be embedded into the original MCMC by adding a few steps at the end of each iteration. The computational cost is relatively low. For example, in the computational study below, generating 100,000 draws of a 6-step-ahead predictive sample takes only about 3 min on an average PC, which is simpler and more efficient than training neural network forecasting models.
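The four forecasting steps can be sketched for a single MCMC iteration as follows. This is a minimal illustration, not the authors' implementation: the function name `forecast_path`, the dictionary layout of `theta` and all numerical values are assumptions, and the coefficients of the last in-sample regime are taken as already drawn from (3).

```python
import numpy as np

def forecast_path(K, theta, last_regime, rng):
    """Draw one path (y_{T+1}, ..., y_{T+K}) given one state of the chain.

    theta holds the iteration-m draws of mu, h, s, nu and p; last_regime is
    (mu_J, h_J), assumed to have been drawn from the posterior in (3).
    """
    mu_j, h_j = last_regime
    path = np.empty(K)
    for k in range(K):
        # Step 1: s_{T+k} = 1 puts y_{T+k+1} in a new regime.
        if rng.random() < theta["p"]:
            # Step 3: a new regime takes its coefficients from the prior (1).
            h_j = rng.chisquare(theta["nu"]) / theta["s"]
            mu_j = rng.normal(theta["mu"], 1.0 / np.sqrt(h_j * theta["h"]))
        # Step 4: the observation comes from the current regime.
        path[k] = rng.normal(mu_j, 1.0 / np.sqrt(h_j))
    return path

rng = np.random.default_rng(4)
theta = {"mu": 0.0, "h": 100.0, "s": 0.01, "nu": 8.0, "p": 0.02}
paths = np.array([forecast_path(6, theta, (0.01, 400.0), rng) for _ in range(2000)])
```

Collecting one such path per iteration of the chain yields the predictive sample from which point and interval forecasts are later computed.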

3. Computational study

To test the efficacy of the proposed forecasting approach, real wind speed time series are used. The wind speed data used in this paper were collected by the anemometer installed on top of each wind turbine’s nacelle at a wind farm located on the east coast of Jiangsu province, China. Although the data were sampled at a very high frequency by the wind turbine’s SCADA system, they were averaged and stored at 10-s or 10-min intervals (referred to as the 10-s or the 10-min average data), as is common in the wind industry. The 10-s (high-frequency) data show the dynamic nature of the wind turbine, while the 10-min data reflect a rather steady state of the turbine. The data used in this research were collected over a period of two weeks. The proposed forecasting method is tested on both 10-s and 10-min average data and compared with a persistent forecasting model, in which the wind speed forecast is given by $\hat{y}_{T+1} = y_T, \ldots, \hat{y}_{T+K} = y_T$. The persistent model assumes that the near-future wind speed will be the same as the current wind speed. Although it is rather simple, the persistent model is used as a reference model to justify the implementation of advanced forecasting models [1]. When the forecasting time step is very short, for example 10 s, and $K$ is relatively small (e.g. 1 or 2), the persistent model is hard to beat. One of many scientific explanations is the inertia of the air (i.e. the physical laws). In the forecasting comparisons below, we compare the performance of the persistent model with our model, and also with the AR model and neural networks, at a certain time horizon, e.g. $K = 6$. In this paper we confine the comparisons to these popular and typical forecasting models due to limited space. The computational results already justify the value of the Bayesian structural break model in very short-term wind speed forecasting. Thorough and complete comparisons with other time series forecasting models are left for future research.

Although numerous computational experiments were performed for different wind speed time series of different wind turbines, this paper only reports the computational results for a randomly selected wind turbine. The quantitative levels of the forecasting results and the magnitude of the fluctuations in forecasting performance depend on the dataset, but the qualitative results are consistent.

Fig. 4. Interval forecasts of sample 2 (10-sec. data): prior mean of p = 0.01.

Fig. 5. Interval forecasts of sample 2 (10-sec. data): prior mean of p = 0.05.

In order to obtain a stationary time series, we focus on the series of log change rates of wind speeds instead of directly using the wind speed time series in our forecasting model. For either the 10-s or the 10-min dataset, let $y_t$ be the averaged wind speed of period $t$; then the rate of wind speed change of period $t$ is defined as $rate_t = \ln(y_t / y_{t-1})$. In the computational study below, based on the series of change rates $rate = (rate_1, \ldots, rate_T)'$, we first forecast $rate_{T+1}, rate_{T+2}, \ldots, rate_{T+K}$ over a forecasting horizon of $K$ periods, and then the $K$-step-ahead prediction of wind speed is calculated as $y_{T+K} = y_T \exp\left(\sum_{k=1}^{K} rate_{T+k}\right)$.
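The transformation to log change rates and back can be sketched as follows. The helper names (`log_change_rates`, `speeds_from_rates`) and the sample speeds are illustrative assumptions.

```python
import numpy as np

def log_change_rates(y):
    """rate_t = ln(y_t / y_{t-1}) for a series of positive wind speeds."""
    y = np.asarray(y, dtype=float)
    return np.log(y[1:] / y[:-1])

def speeds_from_rates(y_T, future_rates):
    """y_{T+k} = y_T * exp(rate_{T+1} + ... + rate_{T+k}) for k = 1..K."""
    return y_T * np.exp(np.cumsum(future_rates))

speeds = np.array([8.0, 8.4, 8.1, 8.6, 8.5])        # hypothetical 10-s averages (m/s)
rates = log_change_rates(speeds)
rebuilt = speeds_from_rates(speeds[0], rates)       # recovers speeds[1:]
persistent = speeds_from_rates(speeds[-1], np.zeros(6))
```

Forecasting a change rate of zero for every future period reproduces the persistent model, which is why `persistent` stays at the last observed speed over all six steps.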

As mentioned before, we need to specify the hyperparameters of the prior distributions before implementing our forecasting model. Prior specifications allow us to incorporate domain knowledge into wind speed forecasting. For the location parameter $\mu$ and scale parameter $h$ of the regime location parameter $\mu_j$, we set $\underline{\mu} = \sum_{t=1}^{T} rate_t / T$, $\underline{h} = 10000$, $\underline{s} = 0.08$ and $\underline{\nu} = 8$. For the shape parameter $\nu$ and scale parameter $s$ of the regime precision parameter $h_j$, we specify $\underline{\lambda} = 0.25$, $\underline{\alpha} = 800$ and $\underline{\beta} = 8$. These values of the hyperparameters provide diffuse prior distributions and reflect no specific prior information. We also conduct a prior sensitivity analysis, and the results indicate that the posterior distributions of all parameters are not sensitive to different values of the above hyperparameters. The parameter $p$ reflects the prior belief about the frequency of structural breaks, and the posterior distributions of $p$ and of the number of regimes are found to be sensitive to different prior beliefs about $p$, which leads to different forecasting accuracy. To check forecasting performance under different prior beliefs about $p$, we experiment with different pairs of $\underline{a}$ and $\underline{b}$ chosen so that the prior mean of $p$ takes different values while the prior standard deviation of $p$ is kept equal to half of its prior mean. In Section 3.1, forecasting results on 10-s average data are reported with the prior mean of $p$ being 0.01 and 0.05, while the prior mean of $p$ is 0.05 and 0.2 for forecasting based on 10-min average data in Section 3.2.
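The paper does not list the Beta hyperparameter pairs themselves, but they follow from the stated constraint. The sketch below solves the standard Beta mean and variance formulas for a given prior mean with the standard deviation fixed at half the mean; the function name `beta_hyperparameters` is an assumption, and the resulting numbers are derived here rather than quoted from the source.

```python
def beta_hyperparameters(mean, sd_ratio=0.5):
    """Return (a, b) of a Beta prior with the given mean and sd = sd_ratio * mean.

    Uses mean = a / (a + b) and var = mean * (1 - mean) / (a + b + 1).
    """
    sd = sd_ratio * mean
    total = mean * (1.0 - mean) / sd**2 - 1.0      # a + b
    if total <= 0.0:
        raise ValueError("no Beta distribution matches this mean and sd")
    return mean * total, (1.0 - mean) * total

pairs = {m: beta_hyperparameters(m) for m in (0.01, 0.05, 0.1, 0.2)}
a, b = pairs[0.01]                                  # approximately (3.95, 391.05)
```

Under this constraint the first parameter stays between 3 and 4 for all four prior means, so changing the prior mean mostly changes the second parameter.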

3.1. Forecasting results on 10-s average data

Forecasting the wind speed at 10-s intervals has many critical applications in wind power, such as predictive control of wind turbines. Knowing the distribution of the near-future wind speed allows the control system to make optimal adjustments of the pitch angle and generator torque.

Table 4
Performance evaluations of sample 2 (10-sec. data).

| | Persistent model | AR(4) model | Imp (%) | NNs | Imp (%) | Our model, p = 0.01 | Imp (%) | Our model, p = 0.05 | Imp (%) |
|---|---|---|---|---|---|---|---|---|---|
| MAE | 1.44 | 1.668 | −15.833 | 2.433 | −68.958 | 1.385 | 3.819 | 1.38 | 4.167 |
| MSE | 3.372 | 3.551 | −5.308 | 6.146 | −82.266 | 2.996 | 11.151 | 3.137 | 6.969 |
| RMSE | 1.834 | 1.884 | −2.726 | 2.479 | −35.169 | 1.731 | 5.616 | 1.771 | 3.435 |

For the purpose of forecasting performance evaluation, two datasets of 10-s wind speeds are randomly selected: (1) 0:00:00–3:01:20, Aug. 8, 2007; (2) 14:00:00–17:01:20, Sep. 22, 2009. The corresponding time series of change rates for each dataset contains 1088 observations. Fig. 1 shows the sample paths of the two series of change rates, where abrupt changes happen often and present challenges for forecasting.

Before forecasting, it is interesting to investigate the posterior information of parameter $p$ under different prior beliefs about $p$. The results listed in Table 2 indicate that the posterior mean of $p$ increases as the prior mean of $p$ increases; however, when the prior mean is relatively large, an increment of 0.05 (from 0.05 to 0.1) in the prior mean leads to only a small increment in the posterior mean. For this reason we choose the prior mean of $p$ to be 0.01 and 0.05 in the forecasting experiments below.

Using the forecasting method described in Section 2.3, we make a 6-step-ahead (i.e. 1 min ahead) forecast of wind speed $y_{T+6}$ for $T = 1079, 1080, \ldots, 1088$. A predictive sample of 1000 draws is generated for each of the 10 forecasting objects. We choose the widely used absolute value function as the loss function; as discussed in [36], the optimal point forecast is then the median of the predictive distribution. Therefore, we report the median of the predictive sample as the forecasted value and also record percentiles of the predictive sample to provide interval forecasts.

Let $y_{T+6}$ be the measured value of wind speed and $\hat{y}_{T+6}$ be the forecasted value; then the forecasting error is defined as $e_{T+6} = \hat{y}_{T+6} - y_{T+6}$ for $T = 1079, \ldots, 1088$. The forecasting performance is evaluated based on three criteria, the mean absolute error (MAE), mean square error (MSE) and root mean square error (RMSE), which are defined as

$$
\text{MAE} = \frac{1}{10} \sum_{T=1079}^{1088} \left| e_{T+6} \right|, \qquad
\text{MSE} = \frac{1}{10} \sum_{T=1079}^{1088} \left( e_{T+6} \right)^2, \qquad
\text{RMSE} = \sqrt{\frac{1}{10} \sum_{T=1079}^{1088} \left( e_{T+6} \right)^2}.
\tag{4}
$$

Fig. 6. Sample paths of change rates (10-min. data).

Table 5
Results with different prior beliefs of p (10-min. data).

| Prior mean | Posterior mean (Sample 1) | Posterior mean (Sample 2) |
|---|---|---|
| 0.01 | 0.0421 | 0.0305 |
| 0.05 | 0.0792 | 0.0615 |
| 0.1 | 0.0912 | 0.0717 |
| 0.2 | 0.0961 | 0.0768 |

In order to compare forecasting performance with other forecasting models, such as the autoregressive model and neural networks, we set the persistent forecasting model as a benchmark and report the improvement relative to the persistent forecasting model, which is defined as

$$
\text{Imp} = \frac{\text{EvC}_{per} - \text{EvC}}{\text{EvC}_{per}} \times 100\%,
\tag{5}
$$

where EvC is the evaluation criterion under consideration from (4), which can be MAE, MSE or RMSE, and $\text{EvC}_{per}$ denotes the value of EvC for the persistent forecasting model.
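The evaluation criteria in (4) and the relative improvement in (5) are straightforward to compute. The sketch below uses hypothetical function names (`evaluate`, `improvement`) and made-up forecast values; only the two MAE figures passed to `improvement` are taken from Table 3.

```python
import numpy as np

def evaluate(forecast, measured):
    """MAE, MSE and RMSE of the forecasting errors e = forecast - measured."""
    e = np.asarray(forecast, dtype=float) - np.asarray(measured, dtype=float)
    mse = float(np.mean(e ** 2))
    return {"MAE": float(np.mean(np.abs(e))), "MSE": mse, "RMSE": float(np.sqrt(mse))}

def improvement(evc_per, evc):
    """Improvement (%) relative to the persistent model, as in (5)."""
    return (evc_per - evc) / evc_per * 100.0

scores = evaluate(forecast=[1.0, 2.0, 3.0], measured=[1.0, 1.0, 5.0])
imp_ar = improvement(0.177, 0.191)      # MAE of the AR(4) model in Table 3
imp_ours = improvement(0.177, 0.165)    # MAE of the proposed model, p = 0.01
```

A negative improvement means the model is worse than the persistent benchmark on that criterion, as is the case for the AR(4) model in Table 3.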

3.1.1. Sample 1

Figs. 2 and 3 display the interval forecasts for sample 1 along with the measured and forecasted values. Fig. 2 is for the case in which the prior mean of parameter $p$ is 0.01 and Fig. 3 is for 0.05. The interval forecasts indicate that the measured values are inside the 60% confidence interval in both cases.
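As a hedged illustration of how a point forecast and an interval forecast can be read off a predictive sample, the snippet below takes the median and a central percentile interval. The function name `summarize_predictive_sample` and the stand-in sample are assumptions, and the choice of a central (equal-tailed) interval is one plausible reading of the intervals shown in the figures.

```python
import numpy as np

def summarize_predictive_sample(sample, coverage=0.6):
    """Median point forecast and central interval with the given coverage."""
    tail = 50.0 * (1.0 - coverage)                  # percent left in each tail
    lower, median, upper = np.percentile(sample, [tail, 50.0, 100.0 - tail])
    return median, (lower, upper)

sample = np.arange(1, 1001, dtype=float)            # stand-in for 1000 predictive draws
median, (lower, upper) = summarize_predictive_sample(sample, coverage=0.6)
inside = np.mean((sample >= lower) & (sample <= upper))
```

The median is used as the point forecast because it minimizes the expected absolute error, which matches the loss function chosen in Section 3.1.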

Table 3 shows the forecasting performance evaluations for sample 1. The order of the AR model, here and in the following sections, is specified according to the partial autocorrelation function of the time series. For the neural networks, the inputs are the past wind speeds at $t, t-1, \ldots, t-6$, and the output is the wind speed at $t+6$. The neural networks are trained on the samples from 00:02:00 to 3:00:40 of Aug. 8, 2007. For example, the first training data point is constructed as follows: the output of the neural networks is at $t+6$ = 00:02:00, and the inputs are at $t$ = 00:01:00, $t-1$ = 00:00:50, $t-2$ = 00:00:40, $t-3$ = 00:00:30, $t-4$ = 00:00:20, $t-5$ = 00:00:10 and $t-6$ = 00:00:00. We trained 100 neural networks with different network structures and functions. Based on the training error analysis, the best five are selected to form an ensemble. The ensemble is then used to forecast the wind speeds from 3:00:50 to 3:02:20. In both cases, our forecasting model performs better than the persistent forecasting model and the autoregressive model. Compared with the neural networks, our model gives a lower MAE and a higher MSE, but both are very close. This is probably due to the choice of the absolute value function, rather than the quadratic function, as the loss function.

3.1.2. Sample 2

The same forecasting experiments are carried out for sample 2 as for sample 1. Figs. 4 and 5 show the interval forecasts for the two cases, and Table 4 lists the results of the performance evaluations. As in the comparisons for sample 1, the neural networks are trained and tested on the corresponding samples. The measured values are all inside the 80% confidence interval, and our method outperforms the other forecasting models in both cases.

3.2. Forecasting results on 10-min average data

Forecasting the wind speed or wind power at a 10-min interval is not only helpful in wind turbine predictive control, but also critical for wind power scheduling and the related preparation for ramp events. Two datasets of 10-min average data are used for forecasting: (1) 0:00 Aug 8 to 23:50 Aug 14, 2007; (2) 0:00 Sep 22 to 23:50 Sep 28, 2008. Both datasets contain 1008 observations. Fig. 6 shows the sample paths of the corresponding series of change rates, whose magnitudes of change are relatively larger than those of the 10-s series.
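As a small illustration of the data layout, the sketch below averages consecutive blocks of 10-s readings into 10-min values and checks the dataset length. Whether the authors derived the 10-min series in exactly this way is an assumption; the helper name is illustrative.

```python
import numpy as np

def block_average(series, block):
    """Average consecutive, non-overlapping blocks of length `block`;
    a trailing partial block is dropped."""
    series = np.asarray(series, dtype=float)
    n_blocks = len(series) // block
    return series[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)

SAMPLES_PER_10_MIN = 60        # number of 10-s readings in one 10-min window
OBS_PER_WEEK = 7 * 24 * 6      # 10-min averages in seven full days

# Two 10-min windows of a toy 10-s series.
ten_min = block_average(np.arange(120), SAMPLES_PER_10_MIN)
```

Seven days of 10-min averages give 1008 values, which matches the stated length of both datasets.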

Table 5 shows the posterior information of parameter p under different prior beliefs. When the prior mean of p is large, increasing it further has little influence on the posterior mean. We choose the prior mean of p to be 0.05 and 0.2 in the forecasting experiments.

For T = 990, ..., 999, our forecasting method is applied to produce 6-step-ahead (1 h ahead) predictions. Interval forecasts are provided for both samples. Using criteria similar to those defined in (4) and (5), forecasting performance is evaluated and compared to the persistent model.
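The rolling evaluation can be sketched for the persistent benchmark as below. The series is simulated, and pooling absolute errors per lead time is an assumption made here for illustration; the paper does not spell out how errors are aggregated over origins and steps.

```python
import numpy as np

def rolling_persistence_errors(y, origins, horizon):
    """Forecast errors of the persistent model for each origin T.

    T counts the observations available (1-based), so the last observed
    value is y[T - 1] and the targets are y[T], ..., y[T + horizon - 1].
    """
    y = np.asarray(y, dtype=float)
    errors = np.empty((len(origins), horizon))
    for row, T in enumerate(origins):
        last_observed = y[T - 1]
        errors[row] = y[T : T + horizon] - last_observed
    return errors

# Stand-in series with the same length as the 10-min datasets.
rng = np.random.default_rng(1)
y = 8.0 + np.cumsum(rng.normal(scale=0.3, size=1008))

errors = rolling_persistence_errors(y, origins=range(990, 1000), horizon=6)
mae_by_step = np.abs(errors).mean(axis=0)   # one MAE per lead time (10 to 60 min)
```

Any other forecaster can be evaluated the same way by replacing `last_observed` with its own 6-step forecast path.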

3.2.1. Sample 1

Fig. 7 shows the predictive densities of the 10-min-ahead forecast ($\hat{y}_{T+1}$) through the 60-min-ahead forecast ($\hat{y}_{T+6}$) for the sample that ends at T = 990. As the forecast horizon grows, the predictive density becomes more and more diffuse, which is consistent with intuition.
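One mechanism behind this widening can be made concrete. Under the break process of Appendix 2, read here as an independent break probability p in each period, the chance that at least one break occurs before the forecast target grows with the horizon. The sketch below computes that probability; it is an illustration of this single effect, not a reproduction of the predictive densities.

```python
def prob_break_within(p, k):
    """Probability of at least one break in k steps when each step carries
    an independent break probability p."""
    return 1.0 - (1.0 - p) ** k

# Break probabilities for the six 10-min lead times with p = 0.05.
horizon_probs = [prob_break_within(0.05, k) for k in range(1, 7)]
```

With p = 0.05 the probability rises from 0.05 at one step to about 0.26 at six steps, so a growing share of the predictive mass comes from regimes that have not yet been observed.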

Figs. 8 and 9 display the interval forecasts. When forecasts become unreliable, as when measured values fall outside the 90% confidence interval (22:40 and 22:50), our model makes corrections by updating the sample (23:00 and later).

Fig. 7. Predictive densities for sample 1 (10-min. data, T = 990, p = 0.05).

Fig. 8. Interval forecasts of sample 1 (10-min. data): prior mean of p = 0.05.

Table 6 shows the results of performance evaluation for both cases. Our method performs better than the persistent model and close to the AR model, but worse than the NNs. Nevertheless, our model consistently outperforms the persistent model.


Table 6. Performance evaluations of sample 1 (10-min. data).

| | Persistent model | AR(4) model | Imp (%) | NNs | Imp (%) | Our model, p = 0.01 | Imp (%) | Our model (second case) | Imp (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MAE | 3.015 | 3.015 | 0.000 | 2.762 | 8.391 | 2.963 | 1.725 | 3.01 | 0.166 |
| MSE | 12.763 | 12.687 | 0.595 | 10.488 | 17.825 | 12.414 | 2.734 | 12.684 | 0.619 |
| RMSE | 3.573 | 3.562 | 0.308 | 3.238 | 9.376 | 3.523 | 1.399 | 3.561 | 0.336 |

Fig. 9. Interval forecasts of sample 1 (10-min. data): prior mean of p = 0.2.

3.2.2. Sample 2

Interval forecasts are plotted in Figs. 10 and 11. Measured values are all inside the 90% confidence interval. Results of performance evaluation are given in Table 7 and indicate that our forecasting model performs better than the persistent model in both cases. The AR model has a forecasting performance similar to that of our model, and is better than the NNs.


Table 7. Performance evaluations of sample 2 (10-min. data).

| | Persistent model | AR(4) model | Imp (%) | NNs | Imp (%) | Our model, p = 0.01 | Imp (%) | Our model, p = 0.05 | Imp (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MAE | 2.331 | 2.282 | 2.102 | 2.537 | -8.837 | 2.285 | 1.973 | 2.266 | 2.789 |
| MSE | 7.582 | 7.251 | 4.366 | 9.531 | -25.706 | 7.344 | 3.139 | 7.169 | 5.447 |
| RMSE | 2.754 | 2.671 | 3.014 | 3.087 | -12.092 | 2.71 | 1.598 | 2.677 | 2.796 |



Across the four samples studied, our model consistently outperforms the persistent model. Neither the AR model nor the NNs are reliable: in some cases, their performance is significantly worse than that of the persistent model.

4. Conclusion

This paper introduced a new time series model for forecasting very short-term wind speed. The forecasting model integrates the concepts of structural breaks and Bayesian inference, which allows prior information about the wind speeds to be incorporated into the model and can improve forecasting performance. For very short-term wind speed or power forecasting (e.g. when the forecasting time step is 10 s), the persistent model is very hard to beat. The proposed method is tested with real-world wind speed time series and its forecasting performance is compared with a benchmark persistent model and other popular forecasting approaches. Computational results confirm the advantages of the proposed method, which outperforms all other tested methods with the exception of NN forecasting; the NNs, however, unlike our method, provide quite unreliable forecasts in some cases. The method reported in this paper has several limitations. For example, the algorithm's computing efficiency is not high and the size of the training sample is not large enough. Nevertheless, the method sheds new light on the research of wind speed and power forecasting for the wind industry. Further research is needed to make this approach more accurate and robust so that it can be deployed in real-time applications.

In addition, several assumptions used in this forecasting model could be relaxed. For example, within each regime the simple normal distribution currently used could be extended to a more complicated form such as an autoregressive model or a linear regression model.

Acknowledgment

The authors thank the reviewers for providing careful reviews and valuable comments. This research has been supported by the Natural Science Foundation of China, Grant No. 71001050, and the Iowa Energy Center, Grant No. 07-01.

Appendix

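1. Conditional density kernel of observations y

$$
f(\mathbf{y}\mid\boldsymbol{\mu},\mathbf{h},\mathbf{s}) \propto \left(\prod_{j=1}^{J} h_j^{n_j/2}\right) \exp\left[-\sum_{j=1}^{J} h_j\,(\mathbf{y}_j-\iota_{n_j}\mu_j)'(\mathbf{y}_j-\iota_{n_j}\mu_j)\Big/2\right],
$$

where $\boldsymbol{\mu} = (\mu_1,\ldots,\mu_J)'$, $\mathbf{h} = (h_1,\ldots,h_J)'$, $\mathbf{y}_j$ is the vector containing the $n_j$ observations in regime $j$, and $\iota_n$ denotes an $n \times 1$ vector of ones.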
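A minimal sketch of evaluating the logarithm of this kernel is given below. It assumes an indexing convention that the appendix does not fix: `breaks` has length T - 1 and `breaks[i] == 1` places a regime boundary between observations i and i + 1 (0-based). Function names are illustrative.

```python
import numpy as np

def split_regimes(y, breaks):
    """Split y into consecutive regimes; breaks[i] == 1 puts a boundary
    between y[i] and y[i + 1] (a convention assumed for this sketch)."""
    y = np.asarray(y, dtype=float)
    cut_points = np.flatnonzero(breaks) + 1
    return np.split(y, cut_points)

def log_likelihood_kernel(y, breaks, mu, h):
    """sum_j [ (n_j / 2) ln h_j - h_j * SSE_j / 2 ], with SSE_j taken
    about the regime mean mu_j."""
    regimes = split_regimes(y, breaks)
    if not (len(regimes) == len(mu) == len(h)):
        raise ValueError("need one (mu_j, h_j) pair per regime")
    total = 0.0
    for y_j, mu_j, h_j in zip(regimes, mu, h):
        sse = float(np.sum((y_j - mu_j) ** 2))
        total += 0.5 * len(y_j) * np.log(h_j) - 0.5 * h_j * sse
    return total

# Two regimes of two observations each, with exact regime means.
value = log_likelihood_kernel([1, 1, 5, 5], [0, 1, 0], mu=[1, 5], h=[1, 4])
```

In the example both sums of squares are zero, so the value reduces to (2/2) ln 1 + (2/2) ln 4 = ln 4.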

2. Conditional density kernel of latent variables s

$$
f(\mathbf{s}\mid p) \propto p^{J-1}(1-p)^{T-J}
$$


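3. Prior density kernels

Underlined symbols denote fixed prior hyperparameters.

a) $f(\mathbf{h}\mid s,\nu,\mathbf{s}) \propto \prod_{j=1}^{J}\left[2^{\nu/2}\Gamma(\nu/2)\right]^{-1} s^{\nu/2}\, h_j^{(\nu-2)/2}\exp(-s h_j/2)$

b) $f(\boldsymbol{\mu}\mid\mathbf{h},\mu,h,\mathbf{s}) \propto \prod_{j=1}^{J} h_j^{1/2} h^{1/2}\exp\left[-h_j h(\mu_j-\mu)^2/2\right]$

c) $f(\mu\mid\underline{\mu},\underline{h}) \propto \exp\left[-\underline{h}(\mu-\underline{\mu})^2/2\right]$

d) $f(h\mid\underline{s},\underline{\nu}) \propto h^{(\underline{\nu}-2)/2}\exp(-\underline{s}\,h/2)$

e) $f(s\mid a,b) \propto s^{(b-2)/2}\exp(-a s/2)$

f) $f(\nu\mid\lambda) \propto \exp(-\lambda\nu)$

g) $f(p\mid a,b) \propto p^{a-1}(1-p)^{b-1}$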
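Kernels a) and b) can be read as standard distributions: a) is a gamma kernel in each regime precision, so that s times h_j is chi-square with ν degrees of freedom, and b) is a normal kernel for each regime mean with precision h times h_j. Under that reading, regime parameters can be drawn from the prior as sketched below; the function name and the numerical values are illustrative only.

```python
import numpy as np

def draw_regime_parameters(n_regimes, mu, h, s, nu, rng):
    """Draw (mu_j, h_j) for each regime from prior kernels a) and b).

    s * h_j ~ chi-square(nu)  <=>  h_j ~ Gamma(shape = nu / 2, scale = 2 / s)
    mu_j | h_j ~ Normal(mu, 1 / (h * h_j))
    """
    h_j = rng.gamma(shape=nu / 2.0, scale=2.0 / s, size=n_regimes)
    mu_j = rng.normal(loc=mu, scale=1.0 / np.sqrt(h * h_j))
    return mu_j, h_j

rng = np.random.default_rng(42)
mu_j, h_j = draw_regime_parameters(200_000, mu=0.0, h=1.0, s=2.0, nu=4.0, rng=rng)
```

The prior mean of each precision is ν / s under this reading, which the simulated draws reproduce (about 2 for the values used here).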

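4. Joint conditional posterior density kernels of all parameters and latent variables

$$
\begin{aligned}
f(\boldsymbol{\mu},\mathbf{h},\theta,\mathbf{s}\mid\mathbf{y}) \propto{}& \left[2^{\nu/2}\Gamma(\nu/2)\right]^{-J} s^{J\nu/2}\, h^{J/2}\prod_{j=1}^{J} h_j^{(\nu+n_j-1)/2} \\
&\cdot \exp\left\{-\sum_{j=1}^{J} h_j\left[s + h(\mu_j-\mu)^2 + (\mathbf{y}_j-\iota_{n_j}\mu_j)'(\mathbf{y}_j-\iota_{n_j}\mu_j)\right]\Big/2\right\} \\
&\cdot \exp\left[-\underline{h}(\mu-\underline{\mu})^2/2\right]\, h^{(\underline{\nu}-2)/2}\exp(-\underline{s}\,h/2)\,\exp(-\lambda\nu) \\
&\cdot s^{(b-2)/2}\exp(-a s/2)\; p^{a+J-2}(1-p)^{b+T-J-1}
\end{aligned}
\tag{6}
$$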

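5. Joint conditional posterior density kernels after integration of $\boldsymbol{\mu}$ and $\mathbf{h}$

$$
\begin{aligned}
f(\theta,\mathbf{s}\mid\mathbf{y}) ={}& f(\mu,h,s,\nu,p,\mathbf{s}\mid\mathbf{y}) \\
\propto{}& \prod_{j=1}^{J}\frac{\Gamma\left((n_j+\nu)/2\right)}{\Gamma(\nu/2)} \cdot \prod_{j=1}^{J}\left(\frac{h}{h+n_j}\right)^{1/2} s^{J\nu/2} \\
&\cdot \prod_{j=1}^{J}\left[s + s_j^2 + \frac{n_j h}{n_j+h}(\mu-\bar{y}_j)^2\right]^{-(n_j+\nu)/2} \\
&\cdot \exp\left[-\underline{h}(\mu-\underline{\mu})^2/2\right]\, h^{(\underline{\nu}-2)/2}\exp(-\underline{s}\,h/2)\,\exp(-\lambda\nu) \\
&\cdot s^{(b-2)/2}\exp(-a s/2)\; p^{a+J-2}(1-p)^{b+T-J-1},
\end{aligned}
$$

where $\bar{y}_j$ is the sample mean of the observations in regime $j$ and $s_j^2 = (\mathbf{y}_j-\iota_{n_j}\bar{y}_j)'(\mathbf{y}_j-\iota_{n_j}\bar{y}_j)$ is their sum of squared deviations from that mean.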
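The regime-level factors of this integrated kernel are what the samplers evaluate repeatedly, so a small helper is useful. The sketch below computes the logarithm of one regime's factor, reading ȳ_j as the regime sample mean and s_j² as the sum of squared deviations about it. The function name is hypothetical and the numbers in the example are arbitrary.

```python
import math
import numpy as np

def regime_log_marginal(y_j, mu, h, s, nu):
    """Log of one regime's factor after integrating out mu_j and h_j:

    ln Gamma((n_j + nu)/2) - ln Gamma(nu/2) + 0.5 ln(h / (h + n_j))
      + (nu/2) ln s - ((n_j + nu)/2) ln A_j,
    with A_j = s + s_j^2 + n_j h / (n_j + h) * (mu - ybar_j)^2.
    """
    y_j = np.asarray(y_j, dtype=float)
    n_j = len(y_j)
    ybar = float(y_j.mean())
    ss = float(np.sum((y_j - ybar) ** 2))
    a_j = s + ss + (n_j * h / (n_j + h)) * (mu - ybar) ** 2
    return (math.lgamma((n_j + nu) / 2.0) - math.lgamma(nu / 2.0)
            + 0.5 * math.log(h / (h + n_j))
            + 0.5 * nu * math.log(s)
            - 0.5 * (n_j + nu) * math.log(a_j))

# One regime with observations 1 and 3, centred on mu = 2.
value = regime_log_marginal([1.0, 3.0], mu=2.0, h=1.0, s=1.0, nu=2.0)
```

Summing this quantity over regimes and adding (J - 1) ln p + (T - J) ln(1 - p) gives the log kernel that depends on the break configuration.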

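6. Conditional posterior density kernels of parameters and latent variables

a) $$f(s\mid\mu,h,\nu,p,\mathbf{s},\mathbf{y}) \propto s^{(b-2)/2}\exp(-a s/2)\, s^{J\nu/2}\prod_{j=1}^{J}\left[s + s_j^2 + \frac{n_j h}{n_j+h}(\mu-\bar{y}_j)^2\right]^{-(n_j+\nu)/2}$$

b) $$f(\nu\mid\mu,h,s,p,\mathbf{s},\mathbf{y}) \propto \exp(-\lambda\nu)\, s^{J\nu/2}\prod_{j=1}^{J}\frac{\Gamma\left((n_j+\nu)/2\right)}{\Gamma(\nu/2)}\prod_{j=1}^{J}\left[s + s_j^2 + \frac{n_j h}{n_j+h}(\mu-\bar{y}_j)^2\right]^{-\nu/2}$$

c) $$f(\mu\mid h,s,\nu,p,\mathbf{s},\mathbf{y}) \propto \exp\left[-\underline{h}(\mu-\underline{\mu})^2/2\right]\prod_{j=1}^{J}\left[s + s_j^2 + \frac{n_j h}{n_j+h}(\mu-\bar{y}_j)^2\right]^{-(n_j+\nu)/2}$$

d) $$f(h\mid\mu,s,\nu,p,\mathbf{s},\mathbf{y}) \propto h^{(\underline{\nu}-2)/2}\exp(-\underline{s}\,h/2)\prod_{j=1}^{J}\left(\frac{h}{h+n_j}\right)^{1/2}\prod_{j=1}^{J}\left[s + s_j^2 + \frac{n_j h}{n_j+h}(\mu-\bar{y}_j)^2\right]^{-(n_j+\nu)/2}$$

e) $$f(p\mid\mu,s,\nu,h,\mathbf{s},\mathbf{y}) \propto p^{a+J-2}(1-p)^{b+T-J-1} \;\Rightarrow\; p\mid(\mu,s,\nu,h,\mathbf{s},\mathbf{y}) \sim \mathrm{Beta}(a+J-1,\; b+T-J)$$

f) $$f(\mathbf{s}\mid\mu,s,\nu,p,h,\mathbf{y}) \propto p^{J-1}(1-p)^{T-J}\, s^{J\nu/2}\cdot\prod_{j=1}^{J}\frac{\Gamma\left((n_j+\nu)/2\right)}{\Gamma(\nu/2)}\cdot\prod_{j=1}^{J}\left(\frac{h}{h+n_j}\right)^{1/2}\prod_{j=1}^{J}\left[s + s_j^2 + \frac{n_j h}{n_j+h}(\mu-\bar{y}_j)^2\right]^{-(n_j+\nu)/2}$$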
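Of the conditionals above, only the one for p in item e) is a standard distribution, so it can be drawn directly rather than by a Metropolis-Hastings step. A minimal sketch with NumPy follows; the hyperparameter values and regime count are made up for illustration and are not the paper's settings.

```python
import numpy as np

def draw_break_probability(a, b, n_regimes, n_obs, rng):
    """Gibbs draw of p from Beta(a + J - 1, b + T - J)."""
    return rng.beta(a + n_regimes - 1, b + n_obs - n_regimes)

def posterior_mean_p(a, b, n_regimes, n_obs):
    """Mean of Beta(a + J - 1, b + T - J)."""
    return (a + n_regimes - 1) / (a + b + n_obs - 1)

rng = np.random.default_rng(7)
# Illustrative values: Beta(1, 19) prior (mean 0.05), J = 11 regimes, T = 1000.
draws = np.array([draw_break_probability(1.0, 19.0, 11, 1000, rng)
                  for _ in range(20_000)])
mean_p = posterior_mean_p(1.0, 19.0, 11, 1000)
```

The posterior mean depends on the data only through the number of regimes J and the sample length T, which is why a long sample with few detected breaks pulls p toward small values regardless of the prior.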


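7. Metropolis-Hastings algorithms

a) The Metropolis-Hastings algorithms for drawing $\mu$, $h$, $s$, $\nu$ are similar, and here we just take $\mu$ as an example:

i. Find $\hat{\mu}$ that maximizes the conditional posterior density of $\mu$. This can be done by solving

$$\frac{\partial}{\partial\mu}\ln k(\mu\mid h,s,\nu,p,\mathbf{s},\mathbf{y}) = 0,$$

where $k(\mu\mid h,s,\nu,p,\mathbf{s},\mathbf{y}) \propto f(\mu\mid h,s,\nu,p,\mathbf{s},\mathbf{y})$ denotes the conditional posterior density kernel given in Appendix 6c.

ii. Evaluate

$$I_{\hat{\mu}} = -\frac{\partial^2}{\partial\mu^2}\ln k(\mu\mid h,s,\nu,p,\mathbf{s},\mathbf{y})\Big|_{\mu=\hat{\mu}}.$$

iii. Draw a candidate $\tilde{\mu}$ from $N(\hat{\mu}, I_{\hat{\mu}}^{-1})$ and accept it with probability

$$\min\left\{\frac{k(\tilde{\mu}\mid h,s,\nu,p,\mathbf{s},\mathbf{y})\big/\exp\left[-I_{\hat{\mu}}(\tilde{\mu}-\hat{\mu})^2/2\right]}{k(\mu\mid h,s,\nu,p,\mathbf{s},\mathbf{y})\big/\exp\left[-I_{\hat{\mu}}(\mu-\hat{\mu})^2/2\right]},\;1\right\}.$$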
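Steps i to iii amount to an independence Metropolis-Hastings sampler whose normal proposal is centred at the mode of the conditional kernel, with variance equal to the inverse curvature there. The sketch below works for any scalar log kernel and uses finite differences with Newton iterations to find the mode; both are implementation choices made here, not details taken from the paper. The acceptance probability is the weight ratio capped at 1.

```python
import math
import numpy as np

def second_derivative(f, x, eps=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / eps ** 2

def find_mode(log_k, x0, n_iter=50, eps=1e-4):
    """Newton iterations on d/dx ln k(x) = 0 using finite differences."""
    x = x0
    for _ in range(n_iter):
        grad = (log_k(x + eps) - log_k(x - eps)) / (2.0 * eps)
        hess = second_derivative(log_k, x, eps)
        if hess >= 0:              # not locally concave: stop the search
            break
        step = grad / hess
        x -= step
        if abs(step) < 1e-8:
            break
    return x

def tailored_mh_step(log_k, current, rng):
    """One draw: normal proposal at the mode, accept with min(ratio, 1)."""
    mode = find_mode(log_k, current)
    info = -second_derivative(log_k, mode)       # I evaluated at the mode
    if info <= 0:
        return current, False
    candidate = rng.normal(mode, 1.0 / math.sqrt(info))

    def log_weight(x):
        # ln k(x) minus the log of the normal proposal kernel
        return log_k(x) + 0.5 * info * (x - mode) ** 2

    log_ratio = log_weight(candidate) - log_weight(current)
    if rng.uniform() < math.exp(min(0.0, log_ratio)):
        return candidate, True
    return current, False
```

For a kernel that is itself normal, the proposal coincides with the target and almost every candidate is accepted; for the kernels in Appendix 6 the acceptance rate depends on how well the normal approximation fits.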

b) The Metropolis-Hastings algorithm for drawing $\mathbf{s}$.

i. Get a candidate draw $\tilde{\mathbf{s}}$ from the previous draw $\mathbf{s}$ by choosing one of three ways according to probabilities $q_{add}$, $q_{del}$, $q_{move}$ (which sum to 1):
(1) with probability $q_{add}$ add a break, i.e. randomly select a time point $t^* \in \{t : s_t = 0\}$ and set $\tilde{s}_{t^*} = 1$;
(2) with probability $q_{del}$ delete a break, i.e. randomly select a time point $t^* \in \{t : s_t = 1\}$ and set $\tilde{s}_{t^*} = 0$;
(3) with probability $q_{move}$ move a break, i.e. randomly select a time point $t^* \in \{t : s_t = 1\}$ and then set $\tilde{s}_{t^*} = 0$ and $\tilde{s}_{t^*+1} = 1$ or $\tilde{s}_{t^*-1} = 1$ with equal probability 0.5 (if $t^* = 1$ or $t^* = T-1$, then only set $\tilde{s}_2 = 1$ or $\tilde{s}_{T-2} = 1$; if the time point moved to is originally a break, the move is equivalent to deleting the break).

ii. Calculate $q(\tilde{\mathbf{s}}\mid\mathbf{s})$, the probability of changing $\mathbf{s}$ to $\tilde{\mathbf{s}}$, and $q(\mathbf{s}\mid\tilde{\mathbf{s}})$, the probability of changing $\tilde{\mathbf{s}}$ to $\mathbf{s}$.

iii. Accept the candidate draw $\tilde{\mathbf{s}}$ with probability

$$\min\left\{\frac{k(\tilde{\mathbf{s}}\mid\mu,s,\nu,p,h,\mathbf{y})\big/q(\tilde{\mathbf{s}}\mid\mathbf{s})}{k(\mathbf{s}\mid\mu,s,\nu,p,h,\mathbf{y})\big/q(\mathbf{s}\mid\tilde{\mathbf{s}})},\;1\right\},$$

where $k(\mathbf{s}\mid\mu,s,\nu,p,h,\mathbf{y})$ is the conditional posterior density kernel given in Appendix 6f.
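The candidate-generation step and the acceptance rule can be sketched as follows. The break vector is stored 0-based with length T - 1, equal move probabilities are used as a default, and a move that is impossible (for instance deleting when there are no breaks) returns the current state unchanged; all three are assumptions of this sketch. Computing the proposal probabilities of step ii is not implemented here, so `accept` takes them as inputs.

```python
import math
import numpy as np

def propose_breaks(s, rng, q_add=1/3, q_del=1/3, q_move=1/3):
    """Return a candidate break vector obtained by add, delete or move."""
    s = np.asarray(s, dtype=int)
    cand = s.copy()
    zeros = np.flatnonzero(s == 0)
    ones = np.flatnonzero(s == 1)
    kind = rng.choice(["add", "del", "move"], p=[q_add, q_del, q_move])
    if kind == "add" and zeros.size:
        cand[rng.choice(zeros)] = 1
    elif kind == "del" and ones.size:
        cand[rng.choice(ones)] = 0
    elif kind == "move" and ones.size and s.size > 1:
        t = int(rng.choice(ones))
        if t == 0:                         # first point: only a move right
            target = 1
        elif t == s.size - 1:              # last point: only a move left
            target = s.size - 2
        else:
            target = t + int(rng.choice([-1, 1]))
        cand[t] = 0
        cand[target] = 1                   # if already a break, this is a deletion
    return cand

def accept(log_k_cand, log_k_curr, q_forward, q_reverse, rng):
    """Accept with probability min{[k(cand)/q(cand|curr)] / [k(curr)/q(curr|cand)], 1}."""
    log_ratio = (log_k_cand - math.log(q_forward)) - (log_k_curr - math.log(q_reverse))
    return rng.uniform() < math.exp(min(0.0, log_ratio))
```

Each candidate differs from the current vector in at most two positions and changes the number of breaks by at most one, which keeps the kernel ratio cheap to update because only the regimes adjacent to the changed points are affected.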


References

[1] Monteiro R, Bessa V, Miranda A, Botterud A, Wang J and Conzelmann G. 2009.Wind power forecasting: state-of-the-art 2009. Argonne National Laboratory;Argonne, Illinois, USA, Tech. Rep. ANL/DIS-10e1.

[2] Kusiak A, Song Z, Zheng H. Anticipatory control of wind turbines with data-driven predictive models. IEEE Transactions on Energy Conversion 2009;24:766e74.

[3] Kusiak A, Li W, Song Z. Dynamic control of wind turbines. Renewable Energy2010;35:456e63.

[4] Botterud A, Wang J, Miranda V, Bessa RJ. Wind power forecasting in U.S.electricity markets. The Electricity Journal 2010;23:71e82.

[5] Barthelmie RJ, Murray F, Pryor SC. The economic benefit of short-term fore-casting for wind energy in the UK electricity market. Energy Policy 2008;36:1687e96.

[6] Ma L, Luan S, Jiang C, Liu H, Zhang Y. A review on the forecasting of windspeed and generated power. Renewable and Sustainable Energy Reviews2009;3:915e20.

[7] Kusiak A, Zheng H, Song Z. Short-term prediction of wind farm power: a datamining approach. IEEE Transactions on Energy Conversion 2009;24:125e36.

[8] Sfetsos A. A comparison of various forecasting techniques applied to meanhourly wind speed time series. Renewable Energy 2000;21:23e35.

[9] Alexiadis MC, Dokopoulos PS, Sahsamanoglou HS, Manousaridis IM. Short-term forecasting of wind speed and related electrical power. Solar Energy1998;63:61e8.

[10] Jursa R, Rohrig K. Short-term wind power forecasting using evolutionaryalgorithms for the automated specification of artificial intelligence models.International Journal of Forecasting 2008;24:694e709.

[11] Sanchez I. Short-term prediction of wind energy production. InternationalJournal of Forecasting 2006;22:43e56.

[12] Kusiak A, Zheng H, Song Z. Wind farm power prediction: a data-miningapproach. Wind Energy 2009;12:275e93.

[13] Ramirez-Rosado J, Fernandez-Jimenez LA, Monteiro C. Comparison of twonew short-term wind-power forecasting systems. Renewable Energy 2009;34:1848e54.

[14] Sanchez I. Adaptive combination of forecasts with application to wind energy.International Journal of Forecasting 2008;24:679e93.

[15] Hong Y, Chang H, Chiu C. Hour-ahead wind power and speed forecastingusing simultaneous perturbation stochastic approximation (SPSA) algorithmand neural network with fuzzy inputs. Energy 2010;35(9):3870e6.

[16] Kavasseri RG, Seetharaman K. Day-ahead wind speed forecasting usingf-ARIMA models. Renewable Energy 2009;34:1388e93.

[17] Liu H, Tian H, Chen C, Li Y. A hybrid statistical method to predict wind speedand wind power. Renewable Energy 2010;35:1857e61.

[18] Wong WK, Xia M, Chu WC. Adaptive neural network model for time-seriesforecasting. European Journal of Operational Research 2010;207:807e16.

[19] CadenasE,RiveraW.Windspeed forecasting in threedifferent regionsofMexico,using a hybrid ARIMAeANNmodel. Renewable Energy 2010;35:2732e8.

[20] Pinson P, Kariniotakis G. On-line assessment of prediction risk for wind powerproduction forecasts. Wind Energy 2004;7:119e32.

[21] Pinson P, Madsen H, Nielsen HA, Papaefthymiou G, Klöckl B. From probabi-listic forecasts to statistical scenarios of short-term wind power production.Wind Energy 2009;12:51e62.

[22] Pinson P, Nielsen HA, Madsen H, Kariniotakis G. Skill forecasting fromensemble predictions of wind power. Applied Energy 2009;86:1326e34.

[23] Geweke J, Jiang Y. Inference and prediction in a multiple-structural-breakmodel. Journal of Econometrics 2011;132:172e85.

[24] Perron P. Dealing with structural breaks. In: Patterson K, Mills T, editors.Palgrave handbook of econometrics. Econometric theory, vol. 1. PalgraveMacmillan; 2006. p. 278e352.

[25] Clements M, Hendry D. Forecasting economic time series. Cambridge: Cam-bridge University Press; 1998.

[26] Clements M, Hendry D. Forecasting non-stationary economic time series.Cambridge: The MIT Press; 1999.

[27] PesaranM,TimmermannA.Howcostly is it to ignorebreakswhen forecasting thedirection of a time series? International Journal of Forecasting 2004;20:411e25.

[28] Koop G, Potter S. Estimation and forecasting in models with multiple breaks.Review of Economic Studies 2007;74:763e89.

[29] Jochmann M, Koop G, Strachan R. Bayesian forecasting using stochastic searchvariable selection in a VAR subject to breaks. International Journal of Fore-casting 2010;26:326e47.

[30] Chauvet M, Potter S. Business cycle monitoring with structural changes.International Journal of Forecasting 2010;26:777e93.

[31] Gneiting T, Larson K, Westrick K, Genton M, Aldrich E. Calibrated probabilisticforecasting at the stateline wind energy center: the regime-switching spaceetime method. Journal of the American Statistical Association 2006;101:968e79.

[32] Pinson P, Christensen L, Madsen H, Sorensen P, Donovan M, Jensen L. Regime-switching modelling of the fluctuations of offshore wind generation. Journalof Wind Engineering and Industrial Aerodynamics 2008;96:2327e47.

[33] Pinson P, Madsen H. Adaptive modelling and forecasting of wind powerfluctuations with Markov-switching autoregressive models. Journal of Fore-casting 2012;31(4):281e313.

[34] Pesaran M, Pettenuzzo D, Timmermann A. Forecasting time series subject tomultiple structural breaks. Review of Economic Studies 2006;73:1057e84.

[35] Chib S. Estimation and comparison of multiple change-point models. Journalof Econometrics 1998;86:221e41.

[36] Gneiting T. Quantiles as optimal point predictors. International Journal ofForecasting 2011;27(2):197e207.
