The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher [email protected] Univ. of Houston - Clear Lake, Houston, TX, USA
Kim Kaminsky [email protected] Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html
The 2006 IEEE International Conference on Information Reuse and Integration
About the Author: Gary D. Boetticher
Ph.D. in Machine Learning and Software Engineering
A neural network-based software reuse economic model
Executive member of IEEE Reuse Standard Committees (1990s)
Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA
[email protected]
Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics
Motivating Questions
Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?
If so, how could these insights be utilized to make better breeding decisions?
Genetic Program Overview
How do X, Y, and Z produce RESULT?

X    Y    Z    RESULT
2    4    5    30
5    3    2    16
:    :    :    :
1    3    6    24

1) Create a population of equations
2) Determine the fitness for each (1 / Stand. Error)

Eq#     Equation       Fitness
1       X+Y            87
2       (Z-X)*Y+X      84
:       :              :
1000    (X*X)-Z        57

3) Breed Equations
Parents:      X + Y          (Z-X) * Y+X
Offspring:    (Z-X) + Y      X * Y+X

4) Generate new populations and breed until a solution is found
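
Step 2 scores each equation as 1 / standard error. The following is a minimal Python sketch of that scoring, assuming the standard error is the root-mean-square residual over the training rows (the slides do not define it). The names `TRAINING_ROWS`, `standard_error`, and `fitness` are illustrative and do not come from the original. The fitness values on the slide (87, 84, 57) appear to be illustrative and on a different scale, so this sketch is not expected to reproduce them.

```python
import math

# The three training rows visible on the slide: (X, Y, Z, RESULT).
TRAINING_ROWS = [(2, 4, 5, 30), (5, 3, 2, 16), (1, 3, 6, 24)]


def standard_error(equation, rows):
    """Root-mean-square residual of equation(x, y, z) against RESULT.

    Assumption: the slides do not say which standard error is meant.
    """
    squared = [(equation(x, y, z) - result) ** 2 for x, y, z, result in rows]
    return math.sqrt(sum(squared) / len(squared))


def fitness(equation, rows):
    """Fitness = 1 / standard error; a perfect fit gets infinite fitness."""
    error = standard_error(equation, rows)
    return math.inf if error == 0 else 1.0 / error


# Two candidate equations from the slide's population.
population = {
    "X+Y": lambda x, y, z: x + y,
    "(Z-X)*Y+X": lambda x, y, z: (z - x) * y + x,
}
scores = {name: fitness(eq, TRAINING_ROWS) for name, eq in population.items()}
```

Sorting `scores` by value in descending order gives the fitness ranking that step 3 would breed from.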
Genetic Program Overview

Generation N
Equation               Fitness
(X+Y)                  87
(X - Z) * (Y * Y)      86
ZY                     75
:                      :
Y                      22
Y - X                  18

Generation N+1
Equation               Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y

Why discard legacy information?
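
One way to keep the legacy information is to store parent references alongside each equation. The sketch below is a hypothetical record layout, not the authors' implementation: the `Chromosome` class, its fields, and `ancestor_fitness` are all assumed names. Pairing the example child with those two parents is also an assumption, suggested by the crossover pattern in the two generations shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Chromosome:
    """One equation plus the lineage that a plain GP would throw away."""
    ident: int
    equation: str
    generation: int
    fitness: Optional[float] = None   # None until the equation is evaluated
    parents: Tuple[int, ...] = ()     # idents of the parent chromosomes


def ancestor_fitness(child: Chromosome, archive: Dict[int, Chromosome]) -> List[float]:
    """Fitness of each recorded parent, looked up in an archive of past chromosomes."""
    return [
        archive[p].fitness
        for p in child.parents
        if archive[p].fitness is not None
    ]


# Generation N parents from the slide, with their fitness values.
archive = {
    1: Chromosome(1, "(X+Y)", generation=0, fitness=87),
    2: Chromosome(2, "(X - Z) * (Y * Y)", generation=0, fitness=86),
}
# A Generation N+1 offspring; its parent pairing is an assumption.
child = Chromosome(3, "(X + Y) * (Y * Y)", generation=1, parents=(1, 2))
parent_scores = ancestor_fitness(child, archive)
```

With records like this, the fitness history of any equation can be followed back through earlier generations instead of being lost at each breeding step.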
Goal: Examine fitness patterns over time

The same ranked list appears under each of Generation 1, Generation 2, and Generation 3:

Equation                         Fitness
(X+Y)                            87
(X - Z) * (Y * Y)                86
ZY                               85
(X - Z) * (Y * Y)                84
Y                                79
Y - X                            75
Z + Y                            75
(X - Z) * (Y * Y)                75
Y                                73
Y - X                            71
(X - Z) * (Y * Y) + W + W        68
Y - X                            67
ZY                               66
(X - Z) * (Y * Y)                66
Y                                65
Y - X                            65
(X - Z) * (Y * Y) + W + W        64
Y - X                            64
Z - Y                            62
(X - Z) * (Y * Y)                59
Y                                58
Y - X                            55
(X - Z) * (Y * Y) + W + W        44

Localized?
Volatile?
Proof of Concept Experiments - 1
5 experiments using synthetic equations:
Z = W + X + Y
Z = 2 * X + Y - W
Z = X / Y
Z = X^3
Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence
Genetic Program
1000 Chromosomes (Equations)
50 Generations
Breeding based on fitness rank
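
The data preparation for these five experiments can be sketched as follows. This is only an illustration under stated assumptions: the slides give neither the input ranges, the number of rows, nor the size of the perturbation, so the uniform range [1, 10], the 100 rows, and the 1% multiplicative noise are all hypothetical, as are the names `TARGETS` and `make_dataset`.

```python
import random

# The five synthetic target equations listed on the slide.
TARGETS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}


def make_dataset(target, rows=100, noise=0.01, seed=0):
    """Sample (W, X, Y) and return rows (W, X, Y, Z) with Z slightly perturbed.

    Assumptions: inputs are uniform on [1, 10], which keeps Y away from zero
    for X / Y, and the perturbation is multiplicative noise of at most +/-1%.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = target(w, x, y) * (1.0 + rng.uniform(-noise, noise))
        data.append((w, x, y, z))
    return data


datasets = {name: make_dataset(fn) for name, fn in TARGETS.items()}
```

Because the perturbed data no longer fits any target equation exactly, no chromosome reaches a perfect score early, which is the stated purpose of the perturbation.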
Proof of Concept Experiments - 2
For the 1000 Chromosomes:
Divide into 5 groups of 200 (by fitness)
Focus on the best, middle, and worst groups
See where each group’s offspring occur in the next generation
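
The grouping and tracking steps above can be sketched in Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, and it assumes the GP records which parents produced each child (the `parents_of` mapping), which the slides do not describe.

```python
from collections import Counter

GROUPS = 5  # the slide's 5 groups of 200 out of 1000 chromosomes


def fitness_groups(fitness_by_id, groups=GROUPS):
    """Map chromosome id -> group index (0 = best fifth, groups - 1 = worst)."""
    ranked = sorted(fitness_by_id, key=fitness_by_id.get, reverse=True)
    size = max(1, len(ranked) // groups)
    return {cid: min(rank // size, groups - 1) for rank, cid in enumerate(ranked)}


def offspring_placement(parent_fitness, child_fitness, parents_of):
    """Count (parent group, child group) pairs across one generation step.

    parents_of maps each child id to the ids of its parents; recording this
    mapping during breeding is an assumption about the GP implementation.
    """
    parent_group = fitness_groups(parent_fitness)
    child_group = fitness_groups(child_fitness)
    counts = Counter()
    for child, parents in parents_of.items():
        for parent in parents:
            counts[(parent_group[parent], child_group[child])] += 1
    return counts


# Toy data: 10 parents and 10 children, each child keeping its parent's rank.
parent_fitness = {i: 100 - i for i in range(10)}
child_fitness = {i: 100 - i for i in range(10)}
parents_of = {i: (i,) for i in range(10)}
counts = offspring_placement(parent_fitness, child_fitness, parents_of)
```

Reading the counts for group 0 (best), group 2 (middle), and group 4 (worst) shows where each of the three groups of interest places its offspring in the next generation.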
Results for Z = W + X + Y
[Figure panels: Best, Middle, Worst]
Results for Z = 2 * X + Y - W
[Figure panels: Best, Middle, Worst]
Results for Z = X / Y
[Figure panels: Best, Middle, Worst]
Results for Z = X^3
[Figure panels: Best, Middle, Worst]
Results for Z = W^2 + W * X - Y
[Figure panels: Best, Middle, Worst]
Applied Experiments
Best class produces best offspring. Now what?
Compare 2 Genetic Programs (GPs)
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times.
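
The second GP's selection rule might look like the sketch below. It rests on one reading of the slide: that "replicates 5 times" means each top-20% chromosome enters the breeding pool five times, restoring the original population size. That reading, the function names, and the random pairing scheme are all assumptions.

```python
import random


def top_fifth_breeding_pool(population, fitness, copies=5):
    """Keep the top 20% by fitness and repeat each survivor `copies` times.

    Assumption: this is what "breeds only the top 20% ... and replicates
    5 times" means; the slides do not spell out the mechanics.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[: max(1, len(ranked) // 5)]
    return [chromosome for chromosome in elite for _ in range(copies)]


def pair_up(pool, rng):
    """Shuffle the pool and pair neighbours as parents for crossover."""
    shuffled = pool[:]
    rng.shuffle(shuffled)
    return list(zip(shuffled[0::2], shuffled[1::2]))


# Toy population: integers stand in for equations, fitness is the value itself.
population = list(range(1000))
pool = top_fifth_breeding_pool(population, fitness=lambda c: c)
pairs = pair_up(pool, random.Random(0))
```

With 1000 chromosomes this gives a pool of 1000 entries drawn from only 200 distinct parents, so every crossover pair comes from the best class.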