Improved synthetic wind speed generation using modified Mycielski approach
Mehmet Fidan1, Fatih Onur Hocaoğlu2,3,*, and Ömer N. Gerek1
1Department of Electrical Engineering, Anadolu University, 26555, Eskişehir, Turkey
2Engineering Faculty, Department of Electrical Engineering, Afyon Kocatepe University, 03200 Afyonkarahisar, Turkey
3Solar and Wind Research and Application Center, Afyon Kocatepe University, 03200 Afyonkarahisar, Turkey
SUMMARY
In this paper, novel approaches for wind speed data generation using the Mycielski algorithm are developed and presented. To show the accuracy of the developed approaches, we used three-year collected wind speed data belonging to two deliberately selected, different regions of Turkey (Izmir and Kayseri) to generate artificial wind speed data. The data belonging to the first two years are used for training, whereas the remaining one-year data are used for testing and accuracy comparison purposes. The concept of distinct synthetic data production with correlation-wise and distribution-wise similar statistical properties constitutes the main idea of the proposed methods for a successful artificial wind speed generation. Generated data are compared with test data for both regions in the sense of basic statistics, Weibull distribution parameters, transition probabilities, spectral densities, and autocorrelation functions; and are also compared with the data generated by the classical first-order Markov chains method. Results indicate that the accuracy and realistic behavior of the proposed method are superior to those of the classical method in the literature. Comparisons and results are discussed in detail. Copyright © 2011 John Wiley & Sons, Ltd.
KEY WORDS
wind speed; prediction; Mycielski; Markov; modeling; synthetic
data generation
Correspondence
*F. O. Hocaoğlu, Engineering Faculty, Department of Electrical Engineering, Afyon Kocatepe University, 03200 Afyonkarahisar, Turkey.
E-mail: [email protected]
Received 28 April 2010; Revised 19 May 2011; Accepted 12 June
2011
1. INTRODUCTION
Knowledge of wind speed time series is of vital importance for evaluating the characteristics of the data and for determining the wind regime of a region for electricity generation purposes [1–6]. Synthetic wind data generation gives useful insight into the underlying process of the meteorological phenomenon. There are a number of studies that deal with synthetic wind speed modeling. A brief review of these studies is given in the succeeding paragraphs:
Sahin and Sen modeled the wind speed data measured in the Marmara region of Turkey using first-order Markov chains [7]. Tore et al. used first-order Markov chain models for synthetic generation of hourly wind speed time series in the Corsica region [8]. Youcef Ettoumi et al. modeled three-hourly wind speed and wind direction data by means of Markov chains [9]. Autoregressive models, Markov chains, and wavelet methods were also used for wind speed data generation by Aksoy et al. [10]. Shamshad et al. generated hourly wind speed data using first-order and second-order Markov chains and compared the two using wind speed data measured in two different regions of Malaysia [11]. Their study concluded that the wind speed behavior improves slightly as the Markov-model order increases. Recently, Hocaoglu et al. also modeled wind speed data using Markov chains and observed the effect of the number of Markov states [12]. In these studies, the wind speed generation was based on Markov transition probabilities. The use of Markov chains to generate wind speed data is appropriate for approximating the general statistical parameters of the original data. However, it is not appropriate for approximating the variations or for keeping the correlations between the samples of the wind speed data. Spectral density analysis gives useful insight into the behavior of the data in time [13]. The spectral characteristics of the previously reported methods are clearly far from the characteristics of real wind speed data.
INTERNATIONAL JOURNAL OF ENERGY RESEARCH
Int. J. Energy Res. 2012; 36:1226–1237
Published online 31 August 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/er.1893
Copyright © 2011 John Wiley & Sons, Ltd.
In this paper, a novel approach using the Mycielski algorithm (as presented in Section 2) is applied to generate artificial wind speed data. Originally, the Mycielski algorithm was designed as a predictor, which was also used for forecasting the wind speed in the future [14]. The Mycielski algorithm was also developed and used for coding and compression in communications [15], or, with simple inversion modifications, as a pseudo-random number generator [16]. In this paper, the Mycielski algorithm is converted, with several changes, from a predictor into a wind speed data generator. The generated data have the same characteristics as the original data. However, the generated data are totally different from the original data on a sample basis. To show the accuracy and efficiency of the proposed method, we also tested the Markov chain approach for wind speed data generation, as described in Section 3. The generated data from the proposed methods and the measured data are compared, and a detailed analysis of the generated data is carried out in Section 4. In that section, the generation methods are discussed in the sense of Weibull distribution parameters, transition probabilities, spectral densities, and autocorrelation functions of the generated and measured data. Finally, the results and the conclusions are given in Section 5.
2. WIND SPEED DATA GENERATION USING MODIFIED MYCIELSKI ALGORITHM
Wind speed time series generation is important for understanding the underlying process of the data. Such a study is necessary for further analysis of wind data.
The Mycielski algorithm performs a prediction on the time series data using the total exact history of the data samples. The basic idea of the algorithm is to search for the longest suffix string at the end of the data sequence that has been repeated at least once in the history of the sequence. The search starts with a short (length = 1) template size and goes on increasing the template size as long as matches are found in the history. When the longest repeating sequence is determined, the value of the sample right after the repeating template is assigned as the prediction value. The prediction rule works according to the intuitive fact that if this pattern had appeared like this in the past, then it is expected to behave the same now.
A time series predictor can be generalized with the expression in Equation (1):
$$\hat{x}[n+1] = f_{n+1}(x[1], \ldots, x[n]) \tag{1}$$
where the difference between $\hat{x}[n+1]$ and the actual value x[n+1] is expected to be small. For our particular case, the function f(·) performs an iterative algorithm that starts from the shortest data segment at the end (i.e., a length of one sample: x[n]) and then increases the data segment length one sample at a time to the left side, as (x[n−1], x[n]), (x[n−2], x[n−1], x[n]), and so forth. Meanwhile, the segments are searched from the end point to the start point by sliding over the samples. Several matches could be found during the algorithm run. At the point of a no-match, the segment has become so long that it is not encountered anywhere in the past sequence. At that point, the prediction is made as the next sample value of the latest encountered (one sample shorter) matching string. Naturally, the algorithm searches through the whole data sequence repeatedly for each prediction step, and it has high computational requirements. The overall scheme can be analytically expressed as follows:
$$m = \arg\max_{L} \{\, x[k] = x[n],\; x[k-1] = x[n-1],\; \ldots,\; x[k-L+1] = x[n-L+1] \,\}, \qquad f_{n+1} = \hat{x}[n+1] = x[m] \tag{2}$$
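As an illustration of the search described above, the following minimal Python sketch predicts the next symbol of a sequence from its longest repeating suffix. The function name `mycielski_predict` and the choice to return `None` when even the last symbol has never occurred before are assumptions made for this sketch; it is not the authors' implementation. The example call uses the 20-sample binary sequence of Table I.

```python
def mycielski_predict(seq):
    """Predict the next element of seq from its longest repeating suffix.

    Hypothetical helper written for illustration only.
    """
    n = len(seq)
    prediction = None  # assumption: no prediction if the last element is new
    for length in range(1, n):
        pattern = seq[n - length:]  # current suffix (the searched pattern)
        match_start = -1
        # Scan the history from its end toward its start; the match must lie
        # entirely before the suffix itself.
        for k in range(n - 2 * length, -1, -1):
            if seq[k:k + length] == pattern:
                match_start = k
                break
        if match_start < 0:
            break  # no repeat: keep the prediction of the shorter pattern
        prediction = seq[match_start + length]  # sample right after the match
    return prediction


print(mycielski_predict("01101110011101011011"))  # prints 1
```

The inner loop rescans the history for every pattern length, which reflects the high computational cost noted in the text; a practical implementation would likely use a faster string-matching structure.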
The original Mycielski algorithm works on binary sequences. For binary sequences, the steps of the algorithm can also be shown with an example as in Table I.

Table I. A sample run for basic Mycielski prediction.
X[0–19] = 01101110011101011011, length: 20, X[20] = ?

Scanned history       Searched pattern   Repeat location   Prediction location   Prediction value
0110111001110101101   1                  18                19                    1
011011100111010110    11                 15                17                    0
01101110011101011     011                14                17                    0
0110111001110101      1011               2                 6                     1
011011100111010       11011              1                 6                     1
01101110011101        011011             0                 6                     1
0110111001110         1011011            No repeat         Previous location     Previous prediction
Stop procedure. X[20] = 1

Because the time series data adopted herein consist of real numerical values, the algorithm should be modified for the artificial wind speed generation process. Another modification was also necessary to avoid cyclic and repeated generation outputs. In this work, two types of modifications were proposed to remedy these problems. The first proposed algorithm can be defined as Mycielski generation with random noise addition (Myc-1), and the second algorithm can be called Mycielski generation for level-reduced wind speed data (Myc-2). For both of the modification methods, the wind speed data were acquired for three years and were used for the generation of one-year artificial wind speed data. The measured wind speed values vary between 0 and 14 m/s with 0.1 m/s quantized levels. For both the Myc-1 and Myc-2 methods, the training set is kept the same (two years of data). However, both of these methods utilize a randomization process to vary the output year data at each run, making it possible to produce several distinct realizations.
2.1. Myc-1 algorithm
In the Myc-1 algorithm, the first sample (following a recorded history of 3 years of data) of artificial data was predicted using a modified Mycielski algorithm. The modification consists of relaxing the exact matching criterion of Equation (2) to a close match in the Hamming distance. The Hamming distance can be written as in Equation (3).
$$d_H(x, y) = \begin{cases} 1, & x \neq y \\ 0, & x = y \end{cases} \qquad x, y \in \mathbb{R} \tag{3}$$
This kind of alteration is necessary to process time series data with non-integer values. By nature, floating point numbers hardly match in an exact manner; therefore, a match is assumed if two numbers are close within an interval. For each numerical comparison, this interval was taken as 0.2 in our experiments. This redefined distance can be expressed as in Equation (4).
$$d_{TOL}(x, y) = \begin{cases} 1, & |x - y| > 0.2 \\ 0, & |x - y| \le 0.2 \end{cases} \qquad x, y \in \mathbb{R} \tag{4}$$
The method continues by also perturbing the prediction value to a randomized level. In particular, a random value between −0.4 and 0.4 was added to the prediction. This perturbation was observed to avoid the loops and cyclic limits of the produced data that otherwise occur. The result of the addition was taken as the first sample of artificial data, and the history was updated by attaching this sample to the end of the list, which was the three-year data in the beginning. These sample generation and history update steps were continued until the length of the generated data corresponded to one year. These steps of the Myc-1 algorithm are also explained in the flow diagram shown in Figure 1.
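The predict, perturb, and append loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names, the small slack `EPS` added to the 0.2 tolerance (so that quantized values such as 3.2 and 3.0 compare as a match despite floating point rounding), the fallback to the last value when nothing matches, and the clamping of negative values to zero are all assumptions of this sketch.

```python
import random

TOL = 0.2    # matching tolerance of Equation (4), in m/s
NOISE = 0.4  # half-width of the uniform perturbation, in m/s
EPS = 1e-9   # assumption: slack against floating point rounding


def predict_tolerant(seq, tol=TOL):
    """Longest-repeating-suffix prediction with tolerance-based matching."""
    n = len(seq)
    prediction = seq[-1]  # assumption: fall back to the last value
    for length in range(1, n):
        match_start = -1
        # Search the history backward for a window close to the suffix.
        for k in range(n - 2 * length, -1, -1):
            if all(abs(seq[k + i] - seq[n - length + i]) <= tol + EPS
                   for i in range(length)):
                match_start = k
                break
        if match_start < 0:
            break
        prediction = seq[match_start + length]
    return prediction


def myc1_generate(history, n_out, seed=None):
    """Generate n_out samples by predicting, adding noise, and appending."""
    rng = random.Random(seed)
    work = list(history)  # working copy; the caller's history is not modified
    generated = []
    for _ in range(n_out):
        value = predict_tolerant(work) + rng.uniform(-NOISE, NOISE)
        value = max(value, 0.0)  # assumption: keep wind speed non-negative
        work.append(value)
        generated.append(value)
    return generated


sample = myc1_generate([3.0, 3.1, 3.5, 4.0, 3.9, 3.2, 3.0, 3.1, 3.4], 5, seed=1)
print(len(sample))  # prints 5
```

The brute-force search is quadratic or worse in the history length, so a full run on multi-year hourly data would be slow in pure Python; the sketch is meant for short sequences.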
The purpose of the random noise is to keep the repeat search from getting stuck. Unless the random noise is used, the generated data will be a copy of a part of the history. Because of the proposed modification, our generated data would be unique and totally different from the original data. On the other hand, the generated data have statistical properties that are pleasingly similar to those of the original data, as presented in Section 4. The concept of distinct synthetic data production with correlation-wise and distribution-wise similar statistical properties constitutes the main idea of the proposed methods for a successful artificial wind speed generation.
[Figure 1. Flow diagram of Myc-1 algorithm. P_N: Nth sample of generated wind speed data. Diagram blocks: Length(History) = 3 × 365 × 24, N = 0, P_N = 0; Search Repetition; Repeat Found? (Yes/No); Update P_N; Extend Searched Pattern, Shorten Scanned History; P_N = P_N + Noise, N = N + 1; History = {History, P_N}; N = 365 × 24? (Yes/No); Stop Procedure.]
2.2. Myc-2 algorithm
In the Myc-2 algorithm, the original data values were first rounded to the nearest integer values. This rounding step reduces the number of wind speed levels and consequently increases the lengths of repeating segments. Naturally, a relaxed Hamming distance is not necessary for the integer-valued sequence, because exact matching is possible with integer comparisons.
The algorithm starts by predicting the first value from this rounded history. Then the history data are separated into three clusters corresponding to three years. The first four days of data (corresponding to 24 × 4 data values) are generated from one randomly selected cluster out of the three clusters of the history.
The searched part was limited to the selected data cluster. Arbitrarily changing the cluster makes sure that the generated data do not exactly duplicate a long segment of the history data, which could eventually get stuck in a cyclic and repeating pattern. Once the four days of data are generated, the selected part of the history is shuffled with another randomly selected cluster of the history. These generation and history shuffling steps, which are explained with the flow diagram in Figure 2, continue until the one-year artificial data generation is completed.
The main motive of Myc-2 is to obtain not only statistical properties similar to those of the original data but also similar spectral and autocorrelation characteristics. These similarities are stated and analyzed in detail in Section 4. The history shuffling gives the opportunity of analyzing different parts of the total history and avoids getting stuck searching in the same place of the past data, which consequently avoids generating synthetic data that may be a long copy of the history.
[Figure 2. Flow diagram of Myc-2 algorithm. P_N: Nth sample of generated wind speed data. Diagram blocks: History = {History_1, History_2, History_3}, Round History; Length(History_1) = Length(History_2) = Length(History_3) = 365 × 24; N = 0, P_N = 0, M = 0; N = N + 1; History_M = History_j, j ∈ {1, 2, 3}; Search Repetition; Repeat Found? (Yes/No); Extend Searched Pattern, Shorten Scanned History; Update P_N; (N % (24 × 4)) = 0? (Yes/No); History_M = {History_M, P_N}; M = M + 1; N = 365 × 24? (Yes/No); Stop Procedure.]
3. WIND SPEED GENERATION USING MARKOV CHAINS
To compare the proposed Mycielski methods with the classical Markov-based methods, we briefly describe the Markov method here. Markov chains are based on the probabilities of observing a transition from one state (say, a predefined wind speed interval) to another within discrete time intervals. The probabilities are expressed in the form of a so-called probability transition matrix. While constructing a Markov model, finitely many states of the system must be determined [17]. In this study, to generate wind speed time series, wind speed values are transformed into wind state intervals. The width of the intervals is selected as 1 m/s. Then, the corresponding Markov chain transition probabilities are calculated.
As an illustration, let the number of states at each time instant be n. Consequently, there will be n × n possible transitions between two successive time instants. It is then possible to find the transition probabilities, p_ij, from a state at time t to another state at time t+1, and accordingly, the following transition probability matrix (A), which holds the described transition probabilities at the corresponding rows and columns, can be constructed from the observed wind speed data.
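$$A = \begin{bmatrix} p_{11} & p_{12} & p_{13} & \cdots & p_{1n} \\ p_{21} & p_{22} & p_{23} & \cdots & p_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & p_{n3} & \cdots & p_{nn} \end{bmatrix} \tag{5}$$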
For the state transition matrix A, the following constraints (6) and (7) must be satisfied:
$$0 \le p_{ij} \le 1 \tag{6}$$
$$\sum_{j=1}^{n} p_{ij} = 1 \tag{7}$$
The probabilities p_ij in Equation (5) can be calculated from Equation (8):
$$p_{ij} = \frac{m_{ij}}{\sum_{j} m_{ij}}, \qquad i, j = 1, 2, \ldots, n \tag{8}$$
Here m_ij represents the number of observed transitions from state i to state j that happen within one step of a time interval.
In practice, the transition probability matrix elements constitute the relative frequency of the measured wind speeds that fall into the jth state at time t+1 provided that the wind speed was in the ith state at the previous time step. Finally, the cumulative transition probabilities of a system given with transition probabilities in Equation (5) can be calculated using Equation (9).
$$P_{ik} = \sum_{j=1}^{k} p_{ij} \tag{9}$$
Here, P_ik represents the cumulative transition probability in the ith row at the kth state.
After model construction is completed by probability calculations using the available training data, the algorithm given in the succeeding paragraphs is applied to generate synthetic wind speed time series.
1. Cumulative transition probabilities are calculated using Equation (9), and a cumulative transition matrix in the form of Equation (5) is obtained.
2. An initial state is selected at random (using random number generation fed by the Weibull distribution with available parameters).
3. A random number is produced with uniform distribution between 0 and 1.
4. The random value is located between two consecutive cumulative probabilities of the current row: it is greater than the cumulative probability of the previous state but less than or equal to the cumulative probability of the following state. The upper bound of this interval is determined to be the new wind state.
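The four steps above can be sketched in Python as follows. This is an illustrative sketch with 0-based states, not the authors' code: the function names are hypothetical, and the initial state is passed in as an argument instead of being drawn from a Weibull distribution as in step 2.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def cumulative(matrix):
    """Row-wise cumulative sums, as in Equation (9)."""
    return [list(accumulate(row)) for row in matrix]


def generate_states(matrix, n_steps, start, seed=None):
    """Generate a state sequence from a transition matrix (0-based states)."""
    rng = random.Random(seed)
    cum = cumulative(matrix)
    state = start  # assumption: the caller supplies the initial state
    sequence = []
    for _ in range(n_steps):
        u = rng.random()  # uniform random number in [0, 1)
        # First index k whose cumulative probability is >= u, i.e.
        # P_{i,k-1} < u <= P_{ik}.
        k = bisect_left(cum[state], u)
        state = min(k, len(matrix) - 1)  # guard against rounding in the sums
        sequence.append(state)
    return sequence


# A deterministic cycle 0 -> 1 -> 2 -> 0 makes the behavior easy to check.
cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(generate_states(cycle, 6, start=0, seed=0))  # prints [1, 2, 0, 1, 2, 0]
```

To turn states back into wind speeds, a value inside the selected interval still has to be chosen; the text that follows mentions adding random noise to the state values for that purpose.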
To produce more realistic artificial data, we also added a random amount of noise to the state values. This algorithm is available in previous studies [7–12] and is widely used for wind speed generation from first-order and second-order Markov chains.
4. RESULTS
4.1. Data generation results
To test the wind speed data generation abilities of the modified Mycielski methods (proposed herein) and Markov chains, we used wind speed data belonging to the first three years (2003–2005) for training, whereas the remaining one-year data (2006) are used for testing purposes for two regions, Izmir and Kayseri. The training and test data for the Izmir and Kayseri regions are plotted in Figures 3 and 4, respectively.
First, the modified Mycielski algorithms mentioned in Section 2 are applied, and one-year wind speed data are generated for both regions using the available training data. Then, using the Markov chain approach, we calculated the state transition probabilities of the training data for both regions, and we applied the algorithm given in Section 3 to generate artificial wind speed data. To test the efficiency of the methods, we compared the basic statistics of the generated data from both methods. The basic statistics of the generated and measured data for the Izmir and Kayseri regions are given in Table II. In this table, the basic statistics of the training data used for generation are also presented. It should be noted that the basic statistics constitute an initial performance measure, and they do not capture the time-variational behavior of the produced data. Detailed analyses are presented in Sections 4.2 to 4.5.
In Table II, Myc-1 and Myc-2 indicate the two different modified Mycielski algorithms described in Section 2, whereas Markov indicates wind speed data generation using the Markov approach described in Section 3.
It can be observed from Table II that the mean values of the actual test data are closer to the mean values of the data generated by the Mycielski methods than to those of the Markov approach. Similar observations can be made for the median values, too. The standard deviations of the methods are relatively close; therefore, the variations of the generated data are similar for all methods.
To illustrate the sample-wise performance of the methods on wind speed data generation, the generated data from Markov, Myc-1, and Myc-2 are plotted in Figure 5 for the Izmir region (such graphs for the Kayseri region are also available upon request but are not reported in the manuscript to save space). Clearly, Markov and Myc-1 generate data with arbitrary numerical values, whereas, because of its rounded (interval-based) number generation behavior, Myc-2 produces data with stepped values (Figure 7).
[Figure 3. (a) Training data, (b) test data obtained from Izmir. Axes: Hour versus Wind Speed (m/s).]
[Figure 4. (a) Train data, (b) test data obtained from Kayseri. Axes: Hour versus Wind Speed (m/s).]
Table II. Basic statistics of the data for Izmir and Kayseri.

Region    Data         Max (m/s)   Mean (m/s)   Median (m/s)   Std. Dev. (m/s)
Izmir     Test data    13.6        3.4          3.1            2.0
          Train data   13.6        3.7          3.4            2.1
          Markov       14.0        3.0          2.7            2.0
          Myc-1        12.3        3.5          3.2            2.0
          Myc-2        11.0        3.6          3.0            2.0
Kayseri   Test data    14.2        1.5          1.1            1.3
          Train data   16.4        1.6          1.2            1.3
          Markov       10.9        1.2          0.8            1.3
          Myc-1        10.8        1.6          1.2            1.3
          Myc-2        12.0        1.5          1.0            1.2
Despite its stepped-valued visualization, the statistical and autocorrelation-wise properties of the Myc-2 method were found to be superior to those of the other tested methods.
Finally, examining the basic statistics and time deviations of the generated and measured data, we can conclude that the modified Mycielski algorithms perform better than Markov models for wind speed data generation.
A proper accuracy analysis of the produced synthetic wind speed data should include the examination of Weibull parameters, Markov transition probabilities, spectral densities, and calculated autocorrelation values of measured and generated test data for cyclic behavior tests. These comparative tests are performed in the following subsections. The analyses are performed for both the Izmir and Kayseri regions, and the results are interpreted.
4.2. Weibull parameters of generated data
In wind speed-related studies such as wind regime determination, Weibull parameter values play an important role. For instance, Ulgen and Hepbasli explained the Weibull parameters of the wind speed for the Izmir region for the years between 1995 and 1999 [18]. Şahin and Aksakal also analyzed the Weibull parameters of wind speed for the eastern region of Saudi Arabia [19]. Classically, the histogram of wind speed values obeys a Weibull distribution, where the parameters (such as variance and mean values) are utilized for determination of the wind potential and regime. Therefore, size optimizations of wind turbines can be performed according to the expected electrical energy from wind turbines. Accurate optimizations obviously depend on accurate determination of the described Weibull parameters. Therefore, in this subsection, the success of the proposed method on wind speed data generation is examined by fitting the test and generated data to the Weibull distribution (with two parameters) according to the data obtained from both regions. The Weibull distribution function with two parameters can be described by Equation (10).
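$$f(v) = \frac{k}{c} \left( \frac{v}{c} \right)^{k-1} \exp\left[ -\left( \frac{v}{c} \right)^{k} \right], \qquad v \ge 0;\; k, c > 0 \tag{10}$$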
In this equation, k and c represent the shape and scale parameters of the distribution function.
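For reference, Equation (10) can be evaluated with a few lines of Python. The function names below are hypothetical, and the mean formula c·Γ(1 + 1/k) is the standard result for the two-parameter Weibull distribution and is not taken from the paper; the parameter values in the example are arbitrary and are not the fitted values of this study.

```python
import math


def weibull_pdf(v, k, c):
    """Two-parameter Weibull density of Equation (10); k = shape, c = scale."""
    if v < 0:
        return 0.0
    # Note: for v = 0 and k < 1 the density diverges and Python raises
    # ZeroDivisionError.
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def weibull_mean(k, c):
    """Mean of the Weibull distribution: c * Gamma(1 + 1/k)."""
    return c * math.gamma(1.0 + 1.0 / k)


# Arbitrary illustrative parameters.
print(weibull_pdf(0.0, 2.0, 4.0))  # prints 0.0
print(weibull_mean(1.0, 2.0))      # prints 2.0
```

A quick sanity check for any fitted pair (k, c) is that the density integrates to approximately one over the observed speed range.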
[Figure 5. Generated wind speed data from (a) Markov, (b) Myc-1, (c) Myc-2 approach for Izmir. Axes: Hour versus Wind Speed (m/s).]
The Weibull parameters of the data are calculated and tabulated in Table III for the measured and generated data of the Izmir and Kayseri regions.
It can be calculated from Table III that, for the Izmir region, the absolute errors between the measured test data and the Markov approach are 0.2372 and 0.0638 for the k and c parameters, respectively. On the other hand, the absolute errors for the same parameters between the measured test data and the data generated by the proposed methods are 0.0517 and 0.0854 for Myc-1, and 0.1759 and 0.057 for Myc-2. Therefore, it can be claimed that the generated data are well suited for wind regime determination. To illustrate the distribution-wise matching of the generated data to Weibull distributions, the wind regime histograms of the generated data of the Izmir region from the proposed methods are plotted in Figure 6. A similar histogram is also available upon request for the Kayseri region.
Corresponding theoretical Weibull distributions are also drawn as overlay plots on the histograms. It is clear from these plots that the distributions of the generated data closely match the Weibull behavior.
4.3. Markov transition probability analysis of generated data
In this section, to further analyze the efficiency of the proposed methods, we calculated the Markov transition probabilities of the generated data from each method and compared them with the transition probabilities of the measured data. The transition probabilities are calculated from Equation (8), and transition probability matrices are formed. For simple visualization, the matrices are rendered as mesh plots, as presented in Figure 7 for the Izmir region. It can be noticed that the transition structures of all of the methods closely match the structure obtained from the test data, with some differences in detail.
Table III. Weibull parameters of measured (test) and generated (Markov, Myc-1, and Myc-2) data.

          Test              Markov            Myc-1             Myc-2
Region    k        c        k        c        k        c        k        c
Izmir     3.9765   1.8604   3.7393   1.7965   4.0282   1.9458   4.1524   1.9174
Kayseri   1.8877   1.5482   1.9236   1.5311   2.0121   1.5689   1.8436   1.6183

[Figure 6. Distribution histograms of (a) measured test data, (b) generated data from Markov, (c) generated data from Myc-1, and (d) generated data from Myc-2 approaches for Izmir. Axes: Wind Speed Intervals (m/s) versus Probability Density.]

4.4. Spectral density analysis of generated data
The energy spectral density describes how the energy (or variance) of a signal or a time series is distributed with frequency. In this study, the spectral analysis of the wind speed time series is carried out to obtain information about oscillatory changes in wind speed. Such an analysis is useful to illustrate spectral similarities of the generated and measured data. Although the previous techniques (in Sections 4.1 and 4.2) are commonly applied for analysis, such time-variational analysis (together with the autocorrelation analysis) is not commonly applied to artificially generated data, except by Aksoy et al., who compared autocorrelation coefficients of the observed and generated data. The reason for staying away from spectral or correlation analysis was the lack of time-variational similarity of artificial data to the natural data in previous studies. The Mycielski methods (particularly, Myc-2) not only accurately model the Weibull parameters but also provide a natural oscillatory behavior in the generated data. To support this observation, we calculated the spectral densities of the generated data from the Markov, Myc-1, and Myc-2 approaches and compared them with the spectral behavior of the original test data. The spectra are plotted for the Izmir region in Figure 8. Similar spectra plots for the Kayseri region are not reported here to save space. However, they are available upon request from the authors.
In the plots of spectra, the fundamental harmonic points, which are the first peaks following the start of the graph, correspond to the frequency value of the main oscillation. In these three graphs, the tagged points show that the fundamental frequency peak appears at frequency = 11.94 µHz. This frequency corresponds to the period of 24 h, which means one day. The other harmonics naturally appear at integer multiples of the fundamental frequency. The existence of a fundamental period of 24 h is a natural and expected behavior of the wind phenomenon. The existence of this behavior in the synthetic data generated by the Myc-2 method (Figure 8c) indicates an important strength of the proposed method in terms of depicting the hard-to-achieve cyclic structure of the natural wind speed data. This is a clear advantage of the method, which does not exist in other generation methods, and the parametric design of dynamic wind power systems is thought to benefit from this accuracy.
[Figure 7. State transition probabilities of (a) test data, (b) generated data from Markov, (c) generated data from Myc-1, (d) generated data from Myc-2 for Izmir. Axes: Wind State (i), Wind State (j), Transition Probability.]

4.5. Autocorrelation results of generated data
As a final study, the characteristics of the generated data from all approaches are discussed in the sense of autocorrelation calculations. The autocorrelations at time lag k are determined using the following equation:
$$r_k = \frac{\dfrac{1}{N-k} \sum_{i=1}^{N-k} (x_i - \bar{x})(x_{i+k} - \bar{x})}{\dfrac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})} \tag{11}$$
where $\bar{x}$ is the mean of the wind speed time series (x_i, i = 1, 2, ..., N). Mathematically, the autocorrelation function can be considered as the inverse Fourier transform of the power spectral density. Because the spectral harmonic behavior of the Myc-2 method is evident, it is expected that the autocorrelation function of this method is superior to those of the other methods. The autocorrelations calculated from the data of both regions for the measured and generated wind speed are presented in Figure 9a and b for the Izmir and Kayseri regions, respectively.
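Equation (11) translates directly into code. The Python sketch below uses a hypothetical function name and a synthetic hourly signal with a pure 24 h cycle (an assumption chosen only to make the expected peaks obvious); it shows the autocorrelation rising to about 1 at a lag of 24 h and falling to about −1 at a lag of 12 h.

```python
import math


def autocorrelation(x, k):
    """Autocorrelation r_k of Equation (11) at lag k (0 <= k < len(x))."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[i] - mean) * (x[i + k] - mean)
                    for i in range(n - k)) / (n - k)
    denominator = sum((v - mean) ** 2 for v in x) / n
    return numerator / denominator


# Synthetic hourly signal with a pure daily cycle over 10 days (illustrative).
hourly = [3.0 + math.sin(2.0 * math.pi * t / 24.0) for t in range(240)]
print(round(autocorrelation(hourly, 24), 6))  # prints 1.0
print(round(autocorrelation(hourly, 12), 6))  # prints -1.0
```

For measured wind data the peaks at multiples of 24 h are much weaker than in this idealized signal, but their presence or absence is the cyclic behavior discussed in this section.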
The critical difference and strength of the Mycielski method become apparent in this analysis. In Figure 9a and b, peak values of the autocorrelation functions are indicated for the measured (test) data. The autocorrelation functions can be interpreted as the inverse Fourier transforms of the spectral densities depicted in Figure 8, as explained in Section 4.4. The peaks appearing at the fundamental frequencies in Figure 8 also appear as peaks in the autocorrelation functions, but this time at the positions of the fundamental period. Naturally, the location of the fundamental period corresponds to 24 h (one day). In other words, there are considerable correlations between the data obtained at the same hour of consecutive days. By analyzing the autocorrelation figures, similar to the analysis in Section 4.4, we can see that Markov and Myc-1 are unable to depict the cyclic (or pseudo-periodic) behavior of the wind speed data (their autocorrelation functions decay smoothly, with no peaks to indicate a cyclic behavior). This inability was also confirmed in the plots of Shamshad et al. in their study about synthetic wind speed generation from Markov-based models [11]. However, artificially generated data using the proposed Myc-2 algorithm have the ability to catch this correlation-wise cyclic behavior, as depicted in Figure 9a and b. This capability is verified by the data obtained from the two geographically different locations (Izmir: seaside, Kayseri: inland). The described natural behavior in terms of daily correlations was found to exist in a similar way for both sites, although the match looks better for Izmir with the Myc-2 method.
5. CONCLUSION
In this study, novel approaches for synthetic wind speed
generation are proposed and compared with approaches
Figure 8. Spectral density of measured (test) data and (a) Markov
generated, (b) Myc-1 generated, (c) Myc-2 generated data for Izmir.
[Each panel plots the power spectrum in dB against frequency in
microhertz, 0 to 150.]
that exist in the literature. The proposed approaches use
modifications of the Mycielski prediction algorithm for generating
samples of artificial wind speed data. The Mycielski algorithm
basically searches for long repeated patterns in the history to
predict the next sample in the time series. To demonstrate the
efficiency of the proposed algorithms, we selected two
geographically different regions (Izmir and Kayseri) and used the
wind speed data of these regions. Apart from the fine performance
in terms of matched Weibull distributions and state transition
probabilities, it is observed that one of the modified Mycielski
algorithms (denoted by Myc-2) is able to capture and produce
samples with daily quasi-cyclic behavior. This property inherently
exists in real-life data, and it can be seen from the peaks of the
autocorrelation or spectral density plots. The Myc-2 method was
observed to produce data with very similar autocorrelation
characteristics. Such a property of generating artificial wind
speed data with natural daily variations (due to day-night
transitions) was not encountered in the data generated by previous
artificial data generation methods in the literature, making the
proposed approach a noteworthy model for understanding the time
series mechanisms of the wind speed phenomenon. The pattern search
strategy of the Mycielski algorithm proves to be a promising and
plausible approach for similar applications requiring forecasting.
NOMENCLATURE
x = measured data value
x̂ = predicted value of x
d_H = Hamming distance
m = location of the sample that is used as predicted value
f = function of prediction
d_TOL = predefined distance
A = state transition matrix
p_ij = probability of transition from state i to state j
m_ij = number of observed transitions from state i to state j that
happen within one step of a time interval
P_ik = transition probability in the ith row at the kth state
k = shape parameter of Weibull distribution function
c = scale parameter of Weibull distribution function
f(v) = Weibull distribution function
x̄ = mean of wind speed time series
r_k = autocorrelation at time lag k
n = length of the data
Figure 9. Autocorrelation functions of measured and generated
wind speed data for (a) Izmir and (b) Kayseri. [Each panel plots
autocorrelation against lag in hours, 0 to 100, for test data,
Myc-1, Myc-2, and Markov.]
ACKNOWLEDGEMENTS
The authors thank the Turkish State Meteorological Service (DMI)
for supplying hourly wind speed data. The authors are also grateful
to four anonymous reviewers of this journal for their helpful
comments on an earlier version of this paper.
REFERENCES
1. Celik AN. A statistical analysis of wind power density based on the Weibull and Rayleigh models at the southern region of Turkey. Renewable Energy 2004; 29:593-604.
2. Hrayshat ES. Wind resource assessment of the Jordanian southern region. Renewable Energy 2007; 32:1948-1960.
3. Kavak Akpinar E, Akpinar S. An assessment on seasonal analysis of wind energy characteristics and wind turbine characteristics. Energy Conversion and Management 2005; 46:1848-1867.
4. Migoya E, Crespo A, Jiménez Á, García J, Manuel F. Wind energy resource assessment in Madrid region. Renewable Energy 2007; 32:1467-1483.
5. Kurban M, Hocaoğlu FO. Potential analysis of wind energy as a power generation source. Energy Sources, Part B: Economics, Planning, and Policy 2010; 5:19-28.
6. Hocaoğlu FO, Kurban M. Regional wind energy resource assessment. Energy Sources, Part B: Economics, Planning, and Policy 2010; 5:41-49.
7. Sahin AD, Sen Z. First order Markov chain approach to wind speed modeling. Journal of Wind Engineering and Industrial Aerodynamics 2001; 89:263-269.
8. Tore MC, Poggi P, Louche A. Markovian model for studying wind speed time series in Corsica. International Journal of Renewable Energy Engineering 2001; 3:311-319.
9. Youcef Ettoumi F, Sauvageot H, Adane AHE. Statistical bivariate modeling of wind using first order Markov chain and Weibull distribution. Renewable Energy 2003; 28:1787-1802.
10. Aksoy H, Toprak ZF, Aytek A, Ünal NE. Stochastic generation of hourly mean wind speed data. Renewable Energy 2004; 29:2111-2131.
11. Shamshad A, Bawadi MA, Wan Hussin WMA, Majid TA, Sanusi SAM. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 2005; 30:693-708.
12. Hocaoğlu FO, Gerek ÖN, Kurban M. The effect of Markov chain state size for synthetic wind speed generation. The 10th International Conference on Probabilistic Methods Applied to Power Systems (PMAPS2008), Rincón, Puerto Rico, May 25-29, 2008.
13. Hocaoğlu FO, Gerek ÖN, Kurban M. A novel wind speed modeling approach using atmospheric pressure observations and hidden Markov models. Journal of Wind Engineering and Industrial Aerodynamics 2010; 98:472-481.
14. Hocaoğlu FO, Fidan M, Gerek ÖN. Mycielski approach for wind speed prediction. Energy Conversion and Management 2009; 50:1436-1443.
15. Fidan M, Gerek ON. A time improvement over the Mycielski algorithm for predictive signal coding: Mycielski-78. Proceedings of the 14th European Signal Processing Conference EUSIPCO 2006, Florence, Sep. 2006.
16. Ehrenfeucht A, Mycielski J. A pseudorandom sequence: how random is it? The American Mathematical Monthly 1992; 99(4):373-375.
17. Benjamin JR, Cornell CA. Probability, Statistics and Decision for Civil Engineers. McGraw-Hill: New York, 1970.
18. Ulgen K, Hepbasli A. Determination of Weibull parameters for wind energy analysis of Izmir, Turkey. International Journal of Energy Research 2002; 26:495-506.
19. Şahin AZ, Aksakal A. A statistical analysis of wind energy potential at the eastern region of Saudi Arabia. International Journal of Energy Research 1999; 23:909-917.