Self-adaptive Differential Evolution Algorithm for Numerical Optimization
A. K. Qin and P. N. Suganthan
School of Electrical and Electronic Engineering
Abstract- In this paper, we propose a novel Self-adaptive Differential Evolution algorithm (SaDE), where the choice of learning strategy and the two control parameters F and CR are not required to be pre-specified. During evolution, the suitable learning strategy and parameter settings are gradually self-adapted according to the learning experience. The performance of SaDE is reported on the set of 25 benchmark functions provided by the CEC2005 special session on real-parameter optimization.
1 Introduction
The differential evolution (DE) algorithm, proposed by Storn and Price [1], is a simple but powerful population-based stochastic search technique for solving global optimization problems. Its effectiveness and efficiency have been successfully demonstrated in many application fields such as pattern recognition [1], communication [2] and mechanical engineering [3]. However, the control parameters and learning strategies involved in DE are highly dependent on the problem under consideration. For a specific task, we may have to spend a huge amount of time trying various strategies and fine-tuning the corresponding parameters. This dilemma motivates us to develop a Self-adaptive DE algorithm (SaDE) to solve general problems more efficiently.
In the proposed SaDE algorithm, two of DE's learning strategies are selected as candidates due to their good performance on problems with different characteristics. These two learning strategies are applied to individuals in the current population with probabilities proportional to their previous success rates, to generate potentially good new solutions. Two of the three critical parameters associated with the original DE algorithm, namely CR and F, are adaptively changed instead of taking fixed values, to deal with different classes of problems. The third critical parameter of DE, the population size NP, remains a user-specified variable, to tackle problems of different complexity.
2 Differential Evolution Algorithm
The original DE algorithm is described in detail as follows. Let S ⊂ R^n be the n-dimensional search space of the problem under consideration. DE evolves a population of NP n-dimensional individual vectors, i.e. candidate solutions, X_i = (x_i,1, ..., x_i,n) ∈ S, i = 1, ..., NP, from one generation to the next. The initial population should ideally cover the entire parameter space, each parameter of each individual vector being drawn from a uniform distribution between the prescribed lower and upper parameter bounds x_j^L and x_j^U.
At each generation G, DE employs mutation and crossover operations to produce a trial vector U_i,G for each individual vector X_i,G, also called the target vector, in the current population.
a) Mutation operation
For each target vector X_i,G at generation G, an associated mutant vector V_i,G = (v_1i,G, v_2i,G, ..., v_ni,G) can usually be generated by using one of the following five strategies, as implemented in the online available codes []:
"DE/randl/ ": ViG -Xrl,G + F* (Xr2,G Xr3,G)"DE/best/ ": ViEG -Xbest,G + F *(Xr ,G - Xr2,G)"DE/current to best/l ":
Vi,G = Xi,G + F- (XbeStG - Xi,G)+ F* (XIG - Xr2GG)"DE/best/2":Vi,G = Xbes,G + F .(Xrl,G - Xr2,G)+ F (X3 ,G - Xr4,G)"DE/rand/2":Vi,G = XrlG + F * (Xr2,G -Xr3,G)+ F (Xr4,G - XrsG)
where the indices r1, r2, r3, r4, r5 are mutually different random integers generated in the range [1, NP], all of which must also differ from the current target vector's index i. F is a factor in [0, 2] for scaling difference vectors, and X_best,G is the individual vector with the best fitness value in the population at generation G.
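As an illustration, the five mutation strategies above can be sketched in Python with NumPy; the function name `mutate` and its signature are our own, not taken from the paper.

```python
import numpy as np

def mutate(pop, i, best, F, strategy, rng):
    """Generate a mutant vector V_i for target index i using one of the
    five classic DE strategies (a sketch, not the authors' code)."""
    NP = len(pop)
    # r1..r5: mutually different random indices, all different from i
    r1, r2, r3, r4, r5 = rng.choice(
        [j for j in range(NP) if j != i], size=5, replace=False)
    if strategy == "rand/1":
        return pop[r1] + F * (pop[r2] - pop[r3])
    if strategy == "best/1":
        return pop[best] + F * (pop[r1] - pop[r2])
    if strategy == "current-to-best/1":
        return pop[i] + F * (pop[best] - pop[i]) + F * (pop[r1] - pop[r2])
    if strategy == "best/2":
        return pop[best] + F * (pop[r1] - pop[r2]) + F * (pop[r3] - pop[r4])
    if strategy == "rand/2":
        return pop[r1] + F * (pop[r2] - pop[r3]) + F * (pop[r4] - pop[r5])
    raise ValueError("unknown strategy: " + strategy)
```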
b) Crossover operation
After the mutation phase, a "binomial" crossover operation is applied to each pair of generated mutant vector V_i,G and its corresponding target vector X_i,G to generate a trial vector U_i,G = (u_1i,G, u_2i,G, ..., u_ni,G):

u_ji,G = v_ji,G   if (rand_j[0,1] < CR) or (j = j_rand)
u_ji,G = x_ji,G   otherwise,                               j = 1, 2, ..., n

where rand_j[0,1] is a uniform random number drawn anew for each parameter index j.
Authorized licensed use limited to: UNIVERSITY OF NOTTINGHAM. Downloaded on December 11, 2009 at 07:19 from IEEE Xplore. Restrictions apply.
1786
where CR is a user-specified crossover constant in the range [0, 1) and j_rand is a randomly chosen integer in the range [1, n], which ensures that the trial vector U_i,G differs from its corresponding target vector X_i,G by at least one parameter.
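A minimal NumPy sketch of this binomial crossover (the helper name is ours):

```python
import numpy as np

def binomial_crossover(target, mutant, CR, rng):
    """Build a trial vector: each parameter is taken from the mutant with
    probability CR; position j_rand is always taken from the mutant so the
    trial differs from the target by at least one parameter."""
    n = len(target)
    j_rand = rng.integers(n)       # forced crossover position
    mask = rng.random(n) < CR      # per-parameter uniform draws
    mask[j_rand] = True
    return np.where(mask, mutant, target)
```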
c) Selection operation
If the values of some parameters of a newly generated trial vector exceed the corresponding upper or lower bounds, we randomly and uniformly reinitialize them within the search range. Then the fitness values of all trial vectors are evaluated and a selection operation is performed: the fitness value f(U_i,G) of each trial vector is compared to that of its corresponding target vector f(X_i,G) in the current population. If the trial vector has a smaller or equal fitness value (for a minimization problem) than the corresponding target vector, the trial vector replaces the target vector and enters the population of the next generation; otherwise, the target vector remains in the population for the next generation. The operation is expressed as follows:
X_i,G+1 = U_i,G   if f(U_i,G) <= f(X_i,G)
X_i,G+1 = X_i,G   otherwise
The above three steps are repeated generation after generation until some specific stopping criterion is satisfied.
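Putting the three operations together, one generation of the basic DE/rand/1/bin loop might look like the following self-contained sketch; the objective `f` and all names are illustrative, not the authors' code.

```python
import numpy as np

def de_generation(pop, fit, f, F, CR, rng):
    """One generation of DE/rand/1/bin: mutation, binomial crossover,
    and greedy one-to-one selection for a minimization problem f."""
    NP, n = pop.shape
    new_pop, new_fit = pop.copy(), fit.copy()
    for i in range(NP):
        r1, r2, r3 = rng.choice(
            [j for j in range(NP) if j != i], size=3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])   # mutation
        mask = rng.random(n) < CR               # binomial crossover
        mask[rng.integers(n)] = True
        u = np.where(mask, v, pop[i])
        fu = f(u)
        if fu <= fit[i]:                        # greedy selection
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```

Repeating this step until a stopping criterion (e.g. a maximum number of function evaluations) is met yields the basic DE algorithm.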
3 SaDE: Strategy and Parameter Adaptation
To achieve good performance on a specific problem using the original DE algorithm, we need to try all available (usually five) learning strategies in the mutation phase and fine-tune the corresponding critical control parameters CR, F and NP. Several studies [4], [6] have pointed out that the performance of the original DE algorithm is highly dependent on the strategy and parameter settings. Although we may find the most suitable strategy and the corresponding control parameters for a specific problem, doing so may require a huge amount of computation time. Also, during different stages of evolution, different strategies and parameter settings with different global and local search capabilities might be preferred. Therefore, we attempt to develop a new DE algorithm that can automatically adapt the learning strategies and parameter settings during evolution. Some related work on parameter or strategy adaptation in evolutionary algorithms can be found in [7], [8].
The idea behind our proposed learning strategy adaptation is to probabilistically select one of several available learning strategies and apply it to the current population. Hence, we need several candidate learning strategies to choose from, and a procedure to determine the probability of applying each of them. In our current implementation, we select two learning strategies as candidates, "rand/1/bin" and "current to best/2/bin", which are respectively expressed as:

V_i,G = X_r1,G + F * (X_r2,G - X_r3,G)
V_i,G = X_i,G + F * (X_best,G - X_i,G) + F * (X_r1,G - X_r2,G)
The reason for this choice is that these two strategies have been commonly used in many DE studies [] and reported to perform well on problems with distinct characteristics. The "rand/1/bin" strategy usually maintains good diversity, while the "current to best/2/bin" strategy shows good convergence properties, which we also observed in our preliminary experiments.
Since we have two candidate strategies, assuming that the probability of applying strategy "rand/1/bin" to each individual in the current population is p1, the probability of applying the other strategy is p2 = 1 - p1. The initial probabilities are set equal, i.e., p1 = p2 = 0.5, so both strategies have an equal probability of being applied to each individual in the initial population. For a population of size NP, we randomly generate a vector of size NP whose elements are uniformly distributed in the range [0, 1]. If the j-th element of this vector is smaller than or equal to p1, the strategy "rand/1/bin" is applied to the j-th individual in the current population; otherwise the strategy "current to best/2/bin" is applied. After evaluation of all newly generated trial vectors, the numbers of trial vectors successfully entering the next generation that were generated by the strategies "rand/1/bin" and "current to best/2/bin" are recorded as ns1 and ns2, respectively, and the numbers of discarded trial vectors generated by the two strategies are recorded as nf1 and nf2. These counters are accumulated over a specified number of generations (50 in our experiments), called the "learning period". Then the probability p1 is updated as:

p1 = ns1 * (ns2 + nf2) / (ns2 * (ns1 + nf1) + ns1 * (ns2 + nf2))
The above expression represents the success rate of trial vectors generated by strategy "rand/1/bin" as a fraction of the sum of this rate and the success rate of trial vectors generated by strategy "current to best/2/bin" during the learning period. The probability of applying the two strategies is therefore updated after each learning period. We also reset all four counters ns1, ns2, nf1 and nf2 after each update, to avoid possible side-effects accumulated from the previous learning stage. This adaptation procedure can gradually
evolve the most suitable learning strategy at different learning stages for the problem under consideration.
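The strategy assignment and probability update described above can be sketched as follows; the function names are ours, and the update formula is the success-rate ratio just described.

```python
import numpy as np

def assign_strategies(p1, NP, rng):
    """Assign 'rand/1/bin' to individual j if a uniform draw <= p1,
    otherwise 'current to best/2/bin'."""
    return np.where(rng.random(NP) <= p1,
                    "rand/1/bin", "current to best/2/bin")

def update_p1(ns1, nf1, ns2, nf2):
    """New p1 = success rate of strategy 1 divided by the sum of the two
    strategies' success rates over the learning period."""
    return ns1 * (ns2 + nf2) / (ns2 * (ns1 + nf1) + ns1 * (ns2 + nf2))
```

For example, `update_p1(9, 1, 1, 9)` gives 0.9, matching success rates of 0.9 and 0.1 for the two strategies.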
In the original DE, the three critical control parameters CR, F and NP are closely related to the problem under consideration. Here, we keep NP as a user-specified value, as in the original DE, to deal with problems of different dimensionalities. Between the two parameters CR and F, CR is much more sensitive to the problem's properties and complexity, such as multi-modality, while F is more related to the convergence speed. According to our initial experiments, the choice of F allows greater flexibility, although values in (0, 1] are preferred most of the time. Here, we allow F to take different random values in the range (0, 2], drawn from a normal distribution with mean 0.5 and standard deviation 0.3, for different individuals in the current population. This scheme maintains both local (small F values) and global (large F values) search ability to generate potentially good mutant vectors throughout the evolution process.
The control parameter CR plays an essential role in the original DE algorithm. A proper choice of CR may lead to good performance under several learning strategies, while a wrong choice may result in performance deterioration under any learning strategy. Also, a good CR value usually falls within a small range, within which the algorithm can perform consistently well on a complex problem. Therefore, we accumulate the previous learning experience within a certain generation interval so as to dynamically adapt the value of CR to a suitable range. We assume CR is normally distributed with mean CRm and standard deviation 0.1. Initially, CRm is set to 0.5, and different CR values conforming to this normal distribution are generated for each individual in the current population. These CR values are kept for several generations (5 in our experiments), after which a new set of CR values is generated from the same distribution.
During every generation, the CR values associated with trial vectors that successfully enter the next generation are recorded. After a specified number of generations (25 in our experiments), CR has thus been regenerated several times (25/5 = 5 times in our experiments) under the same normal distribution with center CRm and standard deviation 0.1, and we recalculate the mean of the normal distribution of CR from all the recorded CR values corresponding to successful trial vectors during this period. With this new mean and the standard deviation 0.1, we repeat the above procedure. As a result, a proper CR value range can be learned to suit the particular problem. Note that we empty the record of successful CR values once we recalculate the normal distribution's mean, to avoid possible inappropriate long-term accumulation effects.
We introduce the above learning strategy and
parameter adaptation schemes into the original DE algorithm and develop a new Self-adaptive Differential Evolution algorithm (SaDE). SaDE requires neither the choice of a particular learning strategy nor the setting of specific values for the critical control parameters CR and F. The learning strategy and the control parameter CR, which are highly dependent on the problem's characteristics and complexity, are self-adapted using previous learning experience. Therefore, the SaDE algorithm can demonstrate consistently good performance on problems with different properties, such as unimodal and multimodal problems. The influence on SaDE's performance of the number of generations over which previous learning information is collected is not significant, as we investigated further.
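The parameter schemes described above, i.e. a random F per individual and CR drawn from a periodically re-centred normal distribution, can be sketched as follows. The redrawing of out-of-range F values is our assumption about how the bound (0, 2] is enforced; the paper states only the range and distribution.

```python
import numpy as np

def sample_F(NP, rng):
    """Draw F for each individual from N(0.5, 0.3), redrawing any value
    outside (0, 2] (our assumption for enforcing the stated range)."""
    F = rng.normal(0.5, 0.3, NP)
    while ((F <= 0) | (F > 2)).any():
        bad = (F <= 0) | (F > 2)
        F[bad] = rng.normal(0.5, 0.3, bad.sum())
    return F

def sample_CR(CRm, NP, rng):
    """Draw CR for each individual from N(CRm, 0.1), clipped to [0, 1]."""
    return np.clip(rng.normal(CRm, 0.1, NP), 0.0, 1.0)

def update_CRm(successful_CRs):
    """Every 25 generations, re-centre the CR distribution on the mean of
    the CR values that produced successful trial vectors, then empty the
    record (done by the caller)."""
    return float(np.mean(successful_CRs))
```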
To speed up the convergence of the SaDE algorithm, we apply a local search procedure after a specified number of generations (200 in our experiments) to 5% of the individuals, including the best individual found so far and individuals randomly selected from the best 50% of the current population. We employ the Quasi-Newton method as the local search method. A local search operator is required because the prespecified MAX_FES is too small to reach the required accuracy level.
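The selection of individuals for this periodic local search might be sketched as follows; this reflects our reading of the text, and the quasi-Newton refinement step itself is omitted since the paper gives no implementation details for it.

```python
import numpy as np

def local_search_indices(fitness, frac=0.05, rng=None):
    """Pick ~frac of the population for local refinement: always the best
    individual, plus random picks from the best 50% of the population."""
    rng = rng or np.random.default_rng()
    NP = len(fitness)
    k = max(1, round(frac * NP))
    order = np.argsort(fitness)        # ascending fitness: best first
    chosen = [int(order[0])]           # always include the best individual
    pool = order[1:NP // 2]            # remainder of the best 50%
    if k > 1:
        chosen += [int(j) for j in
                   rng.choice(pool, size=k - 1, replace=False)]
    return chosen
```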
4 Experimental Results
We evaluate the performance of the proposed SaDE algorithm on a new set of test problems comprising 25 functions of varying complexity, of which 5 are unimodal and the other 20 are multimodal. Experiments are conducted on all 25 functions in 10 dimensions (10-D) and on the first 15 functions in 30 dimensions (30-D). We choose the population size to be 50 and 100 for the 10-D and 30-D problems, respectively.
For each function, SaDE is run 25 times. The best function error values achieved at FES = 1e+2, FES = 1e+3 and FES = 1e+4 for the 25 test functions are listed in Tables 1-5 for 10-D and Tables 6-8 for 30-D, respectively. Successful FES and success performance are listed in Tables 9 and 10 for 10-D and 30-D, respectively.
The 10-D convergence maps of the SaDE algorithm on functions 1-5, 6-10, 11-15, 16-20 and 21-25 are plotted in Figures 1-5, respectively. The 30-D convergence maps of the SaDE algorithm on functions 1-5, 6-10 and 11-15 are illustrated in Figures 6-8, respectively.
Figure 1. Convergence Graph for Functions 1-5
Figure 2. Convergence Graph for Functions 6-10
Figure 3. Convergence Graph for Functions 11-15
Figure 4. Convergence Graph for Functions 16-20
Figure 5. Convergence Graph for Functions 21-25
Figure 6. Convergence Graph for Functions 1-5 (30-D)
Figure 7. Convergence Graph for Functions 6-10 (30-D)
Figure 8. Convergence Graph for Functions 11-15 (30-D)
From the results, we observe that, for the 10-D problems, the SaDE algorithm can find the global optimal solution for functions 1, 2, 3, 4, 6, 7, 9, 12 and 15 with success rates 1, 1, 0.64, 0.96, 1, 0.24, 1, 1 and 0.92, respectively. For some functions, e.g. function 3, although the success rate is not 1, the final obtained best solutions are very close to the success level. For the 30-D problems, the SaDE algorithm can find the global optimal solutions for functions 1, 2, 4, 7 and 9 with success rates 1, 0.96, 0.52, 0.8 and 1, respectively. However, from function 16 through 25, the SaDE algorithm cannot find any global optimal solution for either 10-D or 30-D over the 25 runs, due to the high multi-modality of those composite functions; the local search process associated with SaDE also causes the algorithm to prematurely converge to a local optimal solution. Therefore, we do not list the 30-D results for functions 16-25. The algorithm complexity, as defined at http://www.ntu.edu.sg/home/EPNSugan/, is calculated on 10, 30 and 50 dimensions of function 3 to show its relationship with increasing dimensionality, as in Table 9. We use Matlab 6.1 to implement the algorithm, and the system configuration is listed as follows:
System Configuration:
Intel Pentium 4 CPU 3.00 GHz
1 GB of memory
Windows XP Professional Version 2002
5 Conclusions
In this paper, we proposed a Self-adaptive Differential Evolution algorithm (SaDE), which can automatically adapt its learning strategies and the associated parameters during evolution. The performance of the proposed SaDE algorithm is evaluated on the newly proposed testbed for the CEC2005 special session on real-parameter optimization.
Bibliography
[1] R. Storn and K. V. Price, "Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces", Journal of Global Optimization, 11:341-359, 1997.
[2] J. Ilonen, J.-K. Kamarainen and J. Lampinen, "Differential Evolution Training Algorithm for Feed-Forward Neural Networks", Neural Processing Letters, Vol. 17, No. 1, 93-105, 2003.
[3] R. Storn, "Differential Evolution Design of an IIR-Filter", in Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC'96), IEEE Press, New York, 268-273, 1996.
[4] T. Rogalsky, R. W. Derksen and S. Kocabiyik, "Differential Evolution in Aerodynamic Optimization", in Proc. of the 46th Annual Conference of the Canadian Aeronautics and Space Institute, 29-36, 1999.
[5] K. V. Price, "Differential Evolution vs. the Functions of the 2nd ICEO", in Proc. of the 1997 IEEE International Conference on Evolutionary Computation (ICEC'97), pp. 153-157, Indianapolis, IN, USA, April 1997.
[6] R. Gaemperle, S. D. Mueller and P. Koumoutsakos, "A Parameter Study for Differential Evolution", in A. Grmela, N. E. Mastorakis, editors, Advances in Intelligent Systems, Fuzzy Systems, Evolutionary Computation, WSEAS Press, pp. 293-298, 2002.
[7] J. Gomez, D. Dasgupta and F. Gonzalez, "Using Adaptive Operators in Genetic Search", in Proc. of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1580-1581, 2003.
[8] Bryant A. Julstrom, "What Have You Done for Me Lately? Adapting Operator Probabilities in a Steady-State Genetic Algorithm", in Proc. of the 6th International Conference on Genetic Algorithms.