Characterizing Fault Tolerance of Genetic Algorithms in Desktop Grid Systems
Daniel Lombraña González, Juan Luis Jiménez Laredo, Francisco Fernández de Vega, Juan Julián Merelo Guervós
April 8, 2010
D. Lombraña, JJ. Jiménez, F. Fernández, JJ. Merelo. Evocop 2010
This paper presents a study of the fault-tolerant nature of Genetic Algorithms (GAs) on a real-world Desktop Grid System, without implementing any kind of fault-tolerance mechanism. The aim is to extend previous work on fault-tolerance characterization in Genetic Programming to parallel GAs. The results show that, despite the system degradation, GAs achieve results of similar quality to those of a failure-free system in three of the six scenarios under study. Additionally, we show that a small increase in the initial population size is a successful method to provide resilience to system failures in five of the scenarios. Such results suggest that Parallel GAs are inherently and naturally fault-tolerant.
Transcript
Characterizing Fault Tolerance of Genetic Algorithms in Desktop Grid Systems
Daniel Lombraña González, Juan Luis Jiménez Laredo, Francisco Fernández de Vega, Juan Julián Merelo Guervós
April 8, 2010
D. Lombrana, JJ. Jimenez, F. Fernandez, JJ. Merelo Evocop 2010
Outline
1 Introduction
2 Motivation
3 Methodology
4 Experiments and Results
5 Conclusions
Introduction
Parallel Genetic Algorithms (PGA)
Sometimes Evolutionary Algorithms (EAs) require long execution times.
One solution is to use:
Parallel Computing and
Distributed Platforms.
Parallel algorithms can be run in
Failures in distributed platforms
Distributed platforms are prone to errors.
Failures are expected events rather than catastrophic exceptions.
Fault Tolerance
Fault tolerance is the ability of a system to behave in a well-defined manner once a failure occurs.
Different techniques have been developed to cope with failures:
Checkpointing:
E. Elnozahy, L. Alvisi, Y. Wang, and D. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys (CSUR), 34(3):375–408, 2002.
Rejuvenation frameworks:
A. T. Tai and K. S. Tso. A performability-oriented software rejuvenation framework for distributed applications. In DSN '05, pages 570–579, Washington, DC, USA, 2005. IEEE Computer Society.
etc.
The use of a fault-tolerance technique mandates that:
the application has to be modified, and even
the parallel algorithm.
Thus, this modification can represent a heavy burden for the developer.
Motivation
Parallel EAs and Fault Tolerance
To the best of our knowledge, there has been little research on the fault-tolerance features of Parallel EAs (PEAs) in general and of PGA applications in particular.
Previous Works
We first studied the fault-tolerant nature of Parallel Genetic Programming (PGP) on real-world Desktop Grid Systems, concluding that PGP is fault-tolerant by default.
Daniel Lombraña González, Francisco Fernández de Vega, and Henri Casanova. Characterizing fault tolerance in genetic programming. Future Generation Computer Systems, 2010. DOI: 10.1016/j.future.2010.02.006.
Daniel Lombraña González, Francisco Fernández de Vega, and Henri Casanova. Characterizing fault tolerance in genetic programming. In Workshop on Bio-Inspired Algorithms for Distributed Systems, pages 1–10. Barcelona, Spain, 2009. ISBN 978-1-60558-564-2.
Proposal
Based on this insight
This work builds on the previous ones and extends the study of fault tolerance in EAs to PGAs, using the same methodology.
Methodology
Master-Worker
Desktop Grid platforms (DGs)
DGs exhibit large numbers of failures.
DG failure behavior has been studied in the literature.
DGs are low-cost when compared to clusters of comparable scale.
PGA applications are loosely coupled and thus well-suited to DGs.
Desktop Grid Platforms
DGs are very promising for PGA applications, and
their high failure rate makes them a great test case for studying and characterizing the fault-tolerance abilities of PGAs.
Experiments
In order to characterize the fault-tolerant nature of PGAs, we run two kinds of experiments:
in a failure-free environment, and
replaying and simulating failure traces from real-world DG platforms.
DG traces
We perform simulations of DG platforms and of host availability based on three real-world traces:
entrfin,
ucb,
xwtr.
Trace    Hosts  Venue                 Time
Entrfin  275    San Diego             1.0 months
Ucb      85     UC Berkeley           1.5 months
Xwtr     100    Université Paris-Sud  1.0 months
Using the traces
We consider two cases:
hosts that become unavailable never become available again (worst-case assumption), and
the complete host churn (unavailable hosts can be re-acquired afterwards).
Each case is run for two different days of each trace.
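The worst-case assumption can be sketched in Python as follows. The trace layout (one list of booleans per host, one entry per time step), the function names, and the sample data are assumptions made for illustration; the slides do not describe the real trace format.

```python
from typing import List

# Assumed layout: trace[host][step] is True when the host is available.
Trace = List[List[bool]]


def without_return(trace: Trace) -> Trace:
    """Worst case: once a host becomes unavailable it never comes back."""
    result = []
    for host in trace:
        alive = True
        row = []
        for available in host:
            alive = alive and available  # stays False after the first failure
            row.append(alive)
        result.append(row)
    return result


def hosts_available(trace: Trace) -> List[int]:
    """Number of available hosts at each time step."""
    return [sum(step) for step in zip(*trace)]


# Hypothetical 3-host, 5-step trace (not taken from entrfin, ucb, or xwtr).
churn = [
    [True, True, False, True, True],
    [True, False, False, False, True],
    [True, True, True, True, True],
]
print(hosts_available(churn))                  # complete host churn
print(hosts_available(without_return(churn)))  # hosts never return
```

With host churn the host count can recover after a failure; in the worst-case variant it can only decrease.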
Host availability for 1 day of the ucb trace
[Figure: number of available computers (0 to 25) versus time step (0 to 300), for the original trace and the trace without return.]
Experiments and Results
Problems
We conduct experiments with a 3-trap instance:
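\[
\mathrm{trap}(u(\vec{x})) =
\begin{cases}
\frac{a}{z}\,(z - u(\vec{x})), & \text{if } u(\vec{x}) \le z \\
\frac{b}{l - z}\,(u(\vec{x}) - z), & \text{otherwise}
\end{cases}
\tag{1}
\]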
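Equation (1) can be sketched in Python as below. The slides do not give the values of a, b, and z, so the defaults used here (a = 2, b = 3, z = 2 for a block of length l = 3) are assumptions chosen only to make the example runnable. The sketch also assumes that u is the number of ones in a block, that l equals the sub-function size, and that blocks are contiguous; the function names are hypothetical.

```python
from typing import Sequence


def trap(u: int, l: int = 3, a: float = 2.0, b: float = 3.0, z: int = 2) -> float:
    """Trap value of a block of length l containing u ones, following Eq. (1)."""
    if u <= z:
        return a / z * (z - u)
    return b / (l - z) * (u - z)


def fitness(bits: Sequence[int], k: int = 3) -> float:
    """Sum of trap values over consecutive blocks of k bits (assumed layout)."""
    return sum(trap(sum(bits[i:i + k]), l=k) for i in range(0, len(bits), k))


# Trap value for every possible number of ones in a 3-bit block.
print([trap(u) for u in range(4)])
# Fitness of two 30-bit individuals: all ones and all zeros.
print(fitness([1] * 30), fitness([0] * 30))
```

Under these assumed parameters the all-zeros block scores better than any block with one or two ones, which is what makes the function deceptive for a GA.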
GA Parameters for 3-Trap instance
Trap instance
Size of sub-function (k)     3
Number of sub-functions (m)  10
Individual length (L)        30
GA settings
GA                           GGA
Population size              3000
Selection of Parents         Binary Tournament
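Binary tournament selection, listed above for parent selection, can be sketched as follows. This is a generic textbook version, not the authors' code: the function name, drawing the two contestants with replacement, and the stand-in fitness (the count of ones) are all assumptions.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def binary_tournament(
    population: Sequence[T],
    fitness: Callable[[T], float],
    rng: random.Random,
) -> T:
    """Draw two individuals at random and return the fitter one."""
    first = rng.choice(population)
    second = rng.choice(population)
    return first if fitness(first) >= fitness(second) else second


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Small hypothetical population of 30-bit individuals.
population: List[List[int]] = [
    [rng.randint(0, 1) for _ in range(30)] for _ in range(20)
]
# Stand-in fitness: number of ones (not the trap function from the slides).
parents = [binary_tournament(population, sum, rng) for _ in range(len(population))]
print(len(parents))
```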
Population size vs. generation
[Figure: individuals (0 to 4000) and % of loss (0 to 100) versus generations (0 to 50), for the traces entrfin 1, entrfin 2, ucb 1, ucb 2, xwtr 1, and xwtr 2.]
Obtained Fitness for 3-Trap Day1
Error Free fitness = 23.56
Trace        Fitness  Wilcoxon Test                   Significantly different?
Entrfin      23.30    W = 6093, p-value = 0.002688    yes
Entrfin 10%  23.47    W = 5408.5, p-value = 0.2535    no
Entrfin 20%  23.48    W = 5360, p-value = 0.3137      no
Entrfin 30%  23.49    W = 5283.5, p-value = 0.4271    no
Entrfin 40%  23.57    W = 4923.5, p-value = 0.8286    no
Entrfin 50%  23.59    W = 4910.5, p-value = 0.7994    no
Ucb          23.22    W = 6453, p-value = 6.877e-05   yes
Ucb 10%      23.27    W = 6098.5, p-value = 0.002753  yes
Ucb 20%      23.37    W = 5837.5, p-value = 0.02051   yes
Ucb 30%      23.40    W = 5664, p-value = 0.06588     no
Ucb 40%      23.51    W = 5186.5, p-value = 0.6004    no
Ucb 50%      23.42    W = 5623, p-value = 0.08335     no
Xwtr         23.56    W = 5056, p-value = 0.8748      no
Xwtr 10%     23.57    W = 4923.5, p-value = 0.8286    no
Xwtr 20%     23.68    W = 4474, p-value = 0.1245      no
Xwtr 30%     23.73    W = 4259.5, p-value = 0.02812   yes
Xwtr 40%     23.68    W = 4502, p-value = 0.1466      no
Xwtr 50%     23.71    W = 4356.5, p-value = 0.05817   no
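The W column is assumed here to be the two-sample Wilcoxon rank-sum statistic in its Mann-Whitney form. The sketch below computes that statistic in plain Python for two small made-up samples and applies a yes/no rule to p-values. The sample data, the function names, and the 0.05 threshold are assumptions: the slides do not state the significance level, although every yes/no entry in the table is consistent with 0.05.

```python
from typing import Sequence


def rank_sum_w(x: Sequence[float], y: Sequence[float]) -> float:
    """Count pairs with x_i > y_j, scoring each tie as 0.5."""
    w = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                w += 1.0
            elif xi == yj:
                w += 0.5
    return w


def significantly_different(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule assumed for the 'Significantly different?' column."""
    return p_value < alpha


# Made-up best-fitness samples (not data from the experiments).
error_free = [24.0, 23.0, 25.0, 24.0]
with_failures = [23.0, 22.0, 24.0, 23.0]

w = rank_sum_w(error_free, with_failures)
midpoint = len(error_free) * len(with_failures) / 2  # expected W when there is no difference
print(w, midpoint)
print(significantly_different(0.002688), significantly_different(0.2535))
```

A W far from the midpoint n1 * n2 / 2 indicates that one sample tends to have larger values than the other; the p-value measures how surprising that distance would be if the two samples came from the same distribution.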
Obtained fitness for 3-Trap Day2
Error Free fitness = 23.56
Trace        Fitness  Wilcoxon Test                   Significantly different?
Entrfin      23.57    W = 4979.5, p-value = 0.9546    no
Entrfin 10%  23.69    W = 4397.5, p-value = 0.07682   no
Entrfin 20%  23.67    W = 4522.5, p-value = 0.1645    no
Entrfin 30%  23.70    W = 4405, p-value = 0.08086     no
Entrfin 40%  23.69    W = 4453.5, p-value = 0.11      no
Entrfin 50%  23.75    W = 4162.5, p-value = 0.01234   yes
Ucb          23.09    W = 6672.5, p-value = 7.486e-06 yes
Ucb 10%      23.12    W = 6826, p-value = 6.647e-07   yes
Ucb 20%      23.14    W = 6654, p-value = 7.223e-06   yes
Ucb 30%      23.26    W = 6371, p-value = 0.0001507   yes
Ucb 40%      23.37    W = 5893.5, p-value = 0.01316   yes
Ucb 50%      23.32    W = 6108, p-value = 0.002166    yes
Xwtr         23.60    W = 4806, p-value = 0.5791      no
Xwtr 10%     23.62    W = 4765, p-value = 0.5002      no
Xwtr 20%     23.69    W = 4453.5, p-value = 0.11      no
Xwtr 30%     23.60    W = 4806, p-value = 0.5791      no
Xwtr 40%     23.63    W = 4688.5, p-value = 0.3695    no
Xwtr 50%     23.77    W = 4065.5, p-value = 0.004877  yes
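The rows labeled 10% to 50% are read here as runs whose initial population is enlarged by that percentage over the base size of 3000. That reading and the helper name below are assumptions; under them, the initial population sizes can be computed as:

```python
BASE_POPULATION = 3000  # population size listed in the GA settings


def enlarged_population(base: int, increase_pct: int) -> int:
    """Initial population size after an increase of increase_pct percent."""
    return base * (100 + increase_pct) // 100


sizes = {pct: enlarged_population(BASE_POPULATION, pct) for pct in (0, 10, 20, 30, 40, 50)}
print(sizes)
```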
Obtained fitness with host-churn
Table: Day1
Error Free fitness = 23.56
Trace    Fitness  Wilcoxon Test                   Significantly different?
Entrfin  23.52    W = 5222, p-value = 0.5322      no
Ucb      21.31    W = 9708.5, p-value < 2.2e-16   yes
Xwtr     23.64    W = 4640, p-value = 0.2982      no
Table: Day2
Error Free fitness = 23.56
Trace    Fitness  Wilcoxon Test                   Significantly different?
Entrfin  23.58    W = 4931, p-value = 0.8452      no
Ucb      23.03    W = 7038.5, p-value = 4.588e-08 yes
Xwtr     23.7     W = 4405, p-value = 0.08086     no
Conclusions
Summary of Results
PGA applications are fault-tolerant by nature in DG platforms.
PGAs exhibit graceful degradation, a well-known fault-tolerance technique, in DG platforms.
We provided a new method to mitigate the effect of failures by increasing the initial population.
Conclusions
We have studied and characterized the behavior of PGA applications running in distributed platforms with high failure rates.
We have tested the PGA fault tolerance using three real-world DG traces.
Our main conclusion is that PGA inherently provides graceful degradation.