AMaLGaM IDEAs in Noisy Black-Box Optimization Benchmarking

Peter A. N. Bosman, Centre for Mathematics and Computer Science, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands, [email protected]
Jörn Grahl, Johannes Gutenberg University Mainz, Dept. of Information Systems & Business Administration, Jakob Welder-Weg 9, D-55128 Mainz, Germany, [email protected]
Dirk Thierens, Utrecht University, Dept. of Information and Computing Sciences, P.O. Box 80089, 3508 TB Utrecht, The Netherlands, [email protected]

ABSTRACT
This paper describes the application of a Gaussian Estimation-of-Distribution Algorithm (EDA) for real-valued optimization to the noisy part of a benchmark introduced in 2009 called BBOB (Black-Box Optimization Benchmarking). Specifically, the EDA considered here is the recently introduced parameter-free version of the Adapted Maximum-Likelihood Gaussian Model Iterated Density-Estimation Evolutionary Algorithm (AMaLGaM-IDEA). The version with incremental model building (iAMaLGaM-IDEA) is also considered.

Categories and Subject Descriptors
G.1.6 [Numerical Analysis]: Optimization: Global Optimization, Unconstrained Optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

General Terms
Algorithms

Keywords
Benchmarking, Black-box optimization, Evolutionary computation

1. METHOD
Estimation-of-distribution algorithms attempt to automatically exploit features of a problem's structure by probabilistically modeling the search space based on previously evaluated solutions and generating new solutions by sampling the probabilistic model.

The EDA considered here is the Adapted Maximum-Likelihood Gaussian Model Iterated Density-Estimation Evolutionary Algorithm (AMaLGaM-IDEA, or AMaLGaM for short). In AMaLGaM, the probability distribution used is the normal, also known as the Gaussian, distribution.
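The basic select-estimate-sample loop of a maximum-likelihood Gaussian EDA, including a crude form of the anticipated mean shift described in this section, can be sketched as follows. This is an illustrative sketch only, not the actual AMaLGaM implementation: the selection fraction tau, the ams_factor, the ridge term, and the choice to shift the entire offspring mean are simplifying assumptions (AMaLGaM additionally adapts the covariance scaling and applies the shift to only part of the offspring).

```python
import numpy as np

def eda_generation(population, fitness, rng, tau=0.35, ams_factor=1.0):
    """One generation of a simple maximum-likelihood Gaussian EDA (minimization).

    Sketch only: tau and ams_factor are illustrative names, not the
    paper's parameter settings.
    """
    n, d = population.shape
    # Truncation selection: keep the best tau * n solutions.
    order = np.argsort(fitness)
    selected = population[order[: max(2, int(tau * n))]]

    # Maximum-likelihood estimates of mean and covariance from the
    # selected solutions, with a tiny ridge for numerical stability.
    mean = selected.mean(axis=0)
    cov = np.cov(selected, rowvar=False) + 1e-10 * np.eye(d)

    # Crude proxy for the anticipated mean shift: extrapolate from the
    # whole population's mean toward the selected solutions' mean to
    # speed up descent along slopes.
    shift = ams_factor * (mean - population.mean(axis=0))

    # Sample the next population from N(mean + shift, cov).
    return rng.multivariate_normal(mean + shift, cov, size=n)

# Usage: a few generations on the 5-D sphere function.
rng = np.random.default_rng(0)
pop = rng.normal(0.0, 5.0, size=(50, 5))
best_before = (pop ** 2).sum(axis=1).min()
for _ in range(30):
    pop = eda_generation(pop, (pop ** 2).sum(axis=1), rng)
best_after = (pop ** 2).sum(axis=1).min()
```

On a unimodal slope such as the sphere, the shift term moves the sampling mean further downhill than the selected solutions alone would, which is the intent of the anticipated-mean-shift mechanism.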
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO'09, July 8–12, 2009, Montréal, Québec, Canada.
Copyright 2009 ACM 978-1-60558-505-5/09/07 ...$5.00.

This EDA uses maximum-likelihood estimates for the mean and the covariance matrix, estimated from the selected solutions. It has a mechanism that scales up the covariance matrix when required to prevent premature convergence on slopes. It furthermore has a mechanism that anticipates the mean shift in the next generation to speed up descent (in the case of minimization) along slopes. In another paper [1], AMaLGaM and its incremental-learning variant iAMaLGaM were tested on the noiseless variant of the BBOB benchmark. Due to space restrictions, we refer the interested reader to that workshop paper for more details on AMaLGaM, such as the parameters and other settings, as well as the CPU timing experiment.

2. RESULTS AND CONCLUSION
Results from experiments according to [3] on the benchmark functions given in [2, 4] are presented in Figures 1 and 2 and in Tables 1 and 3 for AMaLGaM, and in Figures 3 and 4 and in Tables 2 and 4 for iAMaLGaM.

Problems with severe noise and multimodality appear to be the hardest for (i)AMaLGaM. Even within 10^6 · D evaluations the optimum cannot be found within a desirable precision for larger D. The difference between AMaLGaM and iAMaLGaM is not large. Most likely due to its larger base population size, AMaLGaM performs slightly better. The difference is larger for the multimodal problems, which is consistent with earlier findings.

3. REFERENCES
[1] P. A. N. Bosman, J. Grahl, and D. Thierens. AMaLGaM IDEAs in noiseless black-box optimization benchmarking. In A. Auger et al., editors, Proceedings of the Black Box Optimization Benchmarking (BBOB) Workshop at the Genetic and Evolutionary Computation Conference (GECCO-2009), New York, New York, 2009. ACM Press. (To appear.)
[2] S. Finck, N. Hansen, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Presentation of the noisy functions. Technical Report 2009/20, Research Center PPE, 2009.
[3] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-parameter black-box optimization benchmarking 2009: Experimental setup. Technical Report RR-6828, INRIA, 2009.
[4] N. Hansen, S. Finck, R. Ros, and A. Auger. Real-parameter black-box optimization benchmarking 2009: Noisy functions definitions. Technical Report RR-6829, INRIA, 2009.
[Figure 1: 30 log-log panels, one per noisy function f101-f130 (Sphere and Rosenbrock with moderate noise; Sphere, Rosenbrock, Step-ellipsoid, Ellipsoid, Sum of different powers, Schaffer F7, Griewank-Rosenbrock, and Gallagher with severe noise; each under Gaussian, uniform, and Cauchy noise), with dimensions 2-40 on the abscissa and legend exponents +1, +0, -1, -2, -3, -5, -8.]
Figure 1: AMaLGaM: Expected Running Time (ERT, •) to reach fopt + ∆f and median number of function evaluations of successful trials (+), shown for ∆f = 10, 1, 10^-1, 10^-2, 10^-3, 10^-5, 10^-8 (the exponent is given in the legend of f101 and f130), versus dimension in log-log presentation. ERT(∆f) equals #FEs(∆f) divided by the number of successful trials, where a trial is successful if fopt + ∆f was surpassed during the trial. #FEs(∆f) is the total number of function evaluations while fopt + ∆f was not surpassed during the trial, summed over all respective trials (successful and unsuccessful); fopt denotes the optimal function value. Crosses (×) indicate the total number of function evaluations, #FEs(−∞). Numbers above ERT symbols indicate the number of successful trials. Annotated numbers on the ordinate are decimal logarithms. Additional grid lines show linear and quadratic scaling.
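The ERT definition in the caption amounts to a few lines of code. Below is a minimal sketch under an assumed trial encoding (not BBOB's actual data layout): each trial is a pair (hit, evals), where hit says whether fopt + ∆f was surpassed and evals counts the function evaluations spent while the target was not yet surpassed (the full budget for unsuccessful trials).

```python
def expected_running_time(trials):
    """ERT per the caption: function evaluations spent while fopt + ∆f
    was not yet surpassed, summed over all trials (successful and
    unsuccessful), divided by the number of successful trials.

    trials: list of (hit, evals) pairs; this encoding is an
    illustrative assumption of the sketch.
    """
    successes = sum(1 for hit, _ in trials if hit)
    if successes == 0:
        return float("inf")  # ERT is undefined/infinite with no successes
    return sum(evals for _, evals in trials) / successes

# Three successful trials plus one that exhausted a 10000-evaluation budget:
ert = expected_running_time([(True, 800), (True, 1200), (True, 1000), (False, 10000)])
# → (800 + 1200 + 1000 + 10000) / 3 ≈ 4333.3
```

Note how the unsuccessful trial's full budget is charged to the numerator, so ERT penalizes unreliable runs, not just slow ones.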
Figure 2: AMaLGaM: Empirical cumulative distribution functions (ECDFs), plotting the fraction of trials versus running time (left) or ∆f (right). Left subplots: ECDF of the running time (number of function evaluations), divided by the search space dimension D, to fall below fopt + ∆f with ∆f = 10^k, where k is the first value in the legend. Right subplots: ECDF of the best achieved ∆f divided by 10^k (upper left lines, in continuation of the left subplot), and of the best achieved ∆f divided by 10^-8 for running times of D, 10D, 100D, ... function evaluations (from right to left, cycling black-cyan-magenta). Top row: all results from all functions; second row: moderate-noise functions; third row: severe-noise functions; fourth row: severe-noise and highly multimodal functions. The legends indicate the number of functions that were solved in at least one trial. FEvals denotes the number of function evaluations, D and DIM denote the search space dimension, and ∆f and Df denote the difference to the optimal function value.
[Figure 3: 30 log-log panels for iAMaLGaM, one per noisy function f101-f130 (Sphere and Rosenbrock with moderate noise; Sphere, Rosenbrock, Step-ellipsoid, Ellipsoid, Sum of different powers, Schaffer F7, Griewank-Rosenbrock, and Gallagher with severe noise; each under Gaussian, uniform, and Cauchy noise), with dimensions 2-40 on the abscissa and legend exponents +1, +0, -1, -2, -3, -5, -8.]
Figure 3: iAMaLGaM: Expected Running Time (ERT, •) to reach fopt + ∆f and median number of function evaluations of successful trials (+), shown for ∆f = 10, 1, 10^-1, 10^-2, 10^-3, 10^-5, 10^-8 (the exponent is given in the legend of f101 and f130), versus dimension in log-log presentation. ERT(∆f) equals #FEs(∆f) divided by the number of successful trials, where a trial is successful if fopt + ∆f was surpassed during the trial. #FEs(∆f) is the total number of function evaluations while fopt + ∆f was not surpassed during the trial, summed over all respective trials (successful and unsuccessful); fopt denotes the optimal function value. Crosses (×) indicate the total number of function evaluations, #FEs(−∞). Numbers above ERT symbols indicate the number of successful trials. Annotated numbers on the ordinate are decimal logarithms. Additional grid lines show linear and quadratic scaling.
Figure 4: iAMaLGaM: Empirical cumulative distribution functions (ECDFs), plotting the fraction of trials versus running time (left) or ∆f (right). Left subplots: ECDF of the running time (number of function evaluations), divided by the search space dimension D, to fall below fopt + ∆f with ∆f = 10^k, where k is the first value in the legend. Right subplots: ECDF of the best achieved ∆f divided by 10^k (upper left lines, in continuation of the left subplot), and of the best achieved ∆f divided by 10^-8 for running times of D, 10D, 100D, ... function evaluations (from right to left, cycling black-cyan-magenta). Top row: all results from all functions; second row: moderate-noise functions; third row: severe-noise functions; fourth row: severe-noise and highly multimodal functions. The legends indicate the number of functions that were solved in at least one trial. FEvals denotes the number of function evaluations, D and DIM denote the search space dimension, and ∆f and Df denote the difference to the optimal function value.
[Table 1 data here; surviving header fragments: f101 in 5-D, N=15, mFE=2892; f101 in 20-D, N=15, mFE=31809.]
Table 1: AMaLGaM: Shown are, for functions f101-f120 and for a given target difference to the optimal function value ∆f: the number of successful trials (#); the expected running time to surpass fopt + ∆f (ERT, see Figure 1); the 10%-tile and 90%-tile of the bootstrap distribution of ERT; and the average number of function evaluations in successful trials or, if none was successful, as last entry the median number of function evaluations to reach the best function value (RTsucc). If fopt + ∆f was never reached, figures in italics denote the best achieved ∆f-value of the median trial and of the 10%- and 90%-tile trials. Furthermore, N denotes the number of trials, and mFE denotes the maximum number of function evaluations executed in one trial. See Figure 1 for the names of the functions.
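The 10%- and 90%-tiles of the bootstrap distribution of ERT mentioned in the caption can be approximated by resampling the trials with replacement and recomputing ERT on each resample. The sketch below assumes a plain (hit, evals) trial encoding, where hit marks a successful trial and evals is the number of evaluations charged to it (the full budget if unsuccessful); this encoding, n_boot, and the seed are assumptions of the sketch, not the BBOB post-processing code.

```python
import numpy as np

def bootstrap_ert_percentiles(trials, n_boot=1000, seed=0):
    """10%- and 90%-tiles of the bootstrap distribution of ERT.

    trials: list of (hit, evals) pairs (illustrative encoding).
    """
    rng = np.random.default_rng(seed)
    erts = []
    for _ in range(n_boot):
        # Resample the trials with replacement and recompute ERT.
        idx = rng.integers(len(trials), size=len(trials))
        sample = [trials[i] for i in idx]
        successes = sum(1 for hit, _ in sample if hit)
        total = sum(evals for _, evals in sample)
        erts.append(total / successes if successes else float("inf"))
    return np.percentile(erts, [10, 90])

lo, hi = bootstrap_ert_percentiles(
    [(True, 800), (True, 1200), (True, 1000), (False, 10000)]
)
```

Resamples containing no successful trial yield an infinite ERT, which naturally lands in the upper tail of the bootstrap distribution.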
[Table 2 data here; surviving header fragments: f101 in 5-D, N=15, mFE=1072; f101 in 20-D, N=15, mFE=13288.]
Table 2: iAMaLGaM: Shown are, for functions f101-f120 and for a given target difference to the optimal function value ∆f: the number of successful trials (#); the expected running time to surpass fopt + ∆f (ERT, see Figure 1); the 10%-tile and 90%-tile of the bootstrap distribution of ERT; and the average number of function evaluations in successful trials or, if none was successful, as last entry the median number of function evaluations to reach the best function value (RTsucc). If fopt + ∆f was never reached, figures in italics denote the best achieved ∆f-value of the median trial and of the 10%- and 90%-tile trials. Furthermore, N denotes the number of trials, and mFE denotes the maximum number of function evaluations executed in one trial. See Figure 1 for the names of the functions.
[Table 3 data here; surviving header fragments: f121 in 5-D, N=15, mFE=2.71e6; f121 in 20-D, N=15, mFE=3.04e6.]
Table 3: AMaLGaM: Shown are, for functions f121-f130 and for a given target difference to the optimal function value ∆f: the number of successful trials (#); the expected running time to surpass fopt + ∆f (ERT, see Figure 1); the 10%-tile and 90%-tile of the bootstrap distribution of ERT; and the average number of function evaluations in successful trials or, if none was successful, as last entry the median number of function evaluations to reach the best function value (RTsucc). If fopt + ∆f was never reached, figures in italics denote the best achieved ∆f-value of the median trial and of the 10%- and 90%-tile trials. Furthermore, N denotes the number of trials, and mFE denotes the maximum number of function evaluations executed in one trial. See Figure 1 for the names of the functions.
[Table 4 data here; surviving header fragments: f121 in 5-D, N=15, mFE=3.09e6; f121 in 20-D, N=15, mFE=3.60e6.]
Table 4: iAMaLGaM: Shown are, for functions f121-f130 and for a given target difference to the optimal function value ∆f: the number of successful trials (#); the expected running time to surpass fopt + ∆f (ERT, see Figure 1); the 10%-tile and 90%-tile of the bootstrap distribution of ERT; and the average number of function evaluations in successful trials or, if none was successful, as last entry the median number of function evaluations to reach the best function value (RTsucc). If fopt + ∆f was never reached, figures in italics denote the best achieved ∆f-value of the median trial and of the 10%- and 90%-tile trials. Furthermore, N denotes the number of trials, and mFE denotes the maximum number of function evaluations executed in one trial. See Figure 1 for the names of the functions.