The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models Gary D. Boetticher [email protected] Univ. of Houston - Clear Lake, Houston, TX, USA http://nas.cl.uh.edu/boetticher/ publications.html The 2006 IEEE International Conference on Information Reuse and Integration Kim Kaminsky [email protected] Univ. of Houston - Clear Lake, Houston, TX, USA

Jan 12, 2016

Transcript
Page 1

The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models

Gary D. Boetticher [email protected] Univ. of Houston - Clear Lake, Houston, TX, USA

Kim Kaminsky [email protected] Univ. of Houston - Clear Lake, Houston, TX, USA

http://nas.cl.uh.edu/boetticher/publications.html
The 2006 IEEE International Conference on Information Reuse and Integration

Page 2

About the Author: Gary D. Boetticher

Ph.D. in Machine Learning and Software Engineering
    A neural network-based software reuse economic model

Executive member of IEEE Reuse Standard Committees (1990s)

Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …

Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA

[email protected]

Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics

Page 3

Motivating Questions

Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?

If so, how could these insights be utilized to make better breeding decisions?


Page 4

Genetic Program Overview

Which equation maps X, Y, and Z to RESULT?

X    Y    Z    RESULT
2    4    5    30
5    3    2    16
:    :    :    :
1    3    6    24

1) Create a population of equations

2) Determine the fitness for each (1 / Stand. Error)

Eq#     Equation       Fitness
1       X+Y            87
2       (Z-X)*Y+X      84
:       :              :
1000    (X*X)-Z        57

3) Breed Equations

Parents:      X + Y          (Z-X) * Y+X
Offspring:    (Z-X) + Y      X * Y+X

4) Generate new populations and breed until a solution is found
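Step 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define "Stand. Error", so root-mean-square error is used here as a stand-in, and only the three fully visible data rows are used. The printed values therefore will not match the fitness numbers on the slide (87, 84, 57). The names `ROWS`, `CANDIDATES`, `fitness`, and `rank_population` are hypothetical.

```python
import math

# The three fully visible rows of the slide's table: (X, Y, Z, RESULT).
ROWS = [(2, 4, 5, 30), (5, 3, 2, 16), (1, 3, 6, 24)]

# Candidate equations from the slide's population, written as Python callables.
CANDIDATES = {
    "X+Y": lambda x, y, z: x + y,
    "(Z-X)*Y+X": lambda x, y, z: (z - x) * y + x,
    "(X*X)-Z": lambda x, y, z: x * x - z,
}


def fitness(equation, rows):
    """Return 1 / error, where error is the root-mean-square residual.

    Assumption: RMSE stands in for the slide's "Stand. Error"; the slide
    does not give the exact formula. A perfect fit returns infinity.
    """
    squared = [(equation(x, y, z) - result) ** 2 for x, y, z, result in rows]
    error = math.sqrt(sum(squared) / len(squared))
    return math.inf if error == 0 else 1.0 / error


def rank_population(candidates, rows):
    """Return equation names sorted from highest to lowest fitness."""
    return sorted(
        candidates,
        key=lambda name: fitness(candidates[name], rows),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_population(CANDIDATES, ROWS):
        print(f"{name:12s} fitness = {fitness(CANDIDATES[name], ROWS):.4f}")
```

The three visible rows happen to be consistent with RESULT = (X + Y) * Z, so that equation gets an infinite fitness under this sketch; a real GP would stop once such a solution appears (step 4).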

Page 5

Genetic Program Overview

Generation N

Equation              Fitness
(X+Y)                 87
(X - Z) * (Y * Y)     86
ZY                    75
:                     :
Y                     22
Y - X                 18

Generation N+1

Equation              Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y

Why discard legacy information?

Page 6

Goal: Examine fitness patterns over time

The slide shows the same ranked list under three column headings: Generation 1, Generation 2, and Generation 3.

Equation                         Fitness
(X+Y)                            87
(X - Z) * (Y * Y)                86
ZY                               85
(X - Z) * (Y * Y)                84
Y                                79
Y - X                            75
Z + Y                            75
(X - Z) * (Y * Y)                75
Y                                73
Y - X                            71
(X - Z) * (Y * Y) + W + W        68
Y - X                            67
ZY                               66
(X - Z) * (Y * Y)                66
Y                                65
Y - X                            65
(X - Z) * (Y * Y) + W + W        64
Y - X                            64
Z - Y                            62
(X - Z) * (Y * Y)                59
Y                                58
Y - X                            55
(X - Z) * (Y * Y) + W + W        44

Localized?

Volatile?

Page 7

Proof of Concept Experiments - 1

5 experiments using synthetic equations:
Z = W + X + Y
Z = 2 * X + Y - W
Z = X / Y
Z = X^3
Z = W^2 + W * X - Y

Data slightly perturbed to prevent premature convergence

Genetic Program
1000 Chromosomes (Equations)
50 Generations
Breeding based on fitness rank
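One way to picture the "slightly perturbed" synthetic data is the sketch below. Everything beyond the five target equations is an assumption made for illustration: the sample size, the input range of 1 to 10, the multiplicative noise of up to 1%, and the names `make_dataset` and `TARGETS` are not from the slides.

```python
import random


def make_dataset(target, n_rows=100, noise=0.01, seed=0):
    """Sample W, X, Y uniformly and return rows (w, x, y, z), z slightly perturbed.

    Assumptions: the value range [1, 10] and the multiplicative noise level
    are illustrative; the slides only say the data were slightly perturbed.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = target(w, x, y) * (1.0 + rng.uniform(-noise, noise))
        rows.append((w, x, y, z))
    return rows


# The five synthetic targets listed on the slide.
TARGETS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}

if __name__ == "__main__":
    for name, target in TARGETS.items():
        print(name, make_dataset(target, n_rows=2))
```

Keeping the inputs strictly positive avoids division by zero for Z = X / Y; a different range would need its own guard.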

Page 8

Proof of Concept Experiments - 2

For the 1000 Chromosomes:

Divide into 5 groups of 200 (by fitness)

Focus on the best, middle, and worst groups

See where each group’s offspring occur in the next generation
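The grouping and offspring tracking described on this slide could be sketched as follows. This is a minimal illustration, not the authors' code: ranks are assumed to be 0-based with 0 as the fittest, each child is tied to a single parent for brevity, and the names `fitness_group` and `offspring_distribution` are hypothetical.

```python
from collections import Counter


def fitness_group(rank, population_size=1000, n_groups=5):
    """Map a 0-based fitness rank (0 = best) to a group index (0 = best group)."""
    return rank * n_groups // population_size


def offspring_distribution(parent_ranks, child_ranks,
                           population_size=1000, n_groups=5):
    """Count, per parent group, which group each offspring lands in.

    parent_ranks[i] is the generation-N rank of the parent of the child whose
    generation-N+1 rank is child_ranks[i]. Tracking one parent per child is
    an assumption made to keep the sketch short.
    """
    table = {group: Counter() for group in range(n_groups)}
    for parent_rank, child_rank in zip(parent_ranks, child_ranks):
        parent_group = fitness_group(parent_rank, population_size, n_groups)
        child_group = fitness_group(child_rank, population_size, n_groups)
        table[parent_group][child_group] += 1
    return table


if __name__ == "__main__":
    # Four hypothetical parent/child rank pairs.
    table = offspring_distribution([0, 10, 500, 999], [5, 450, 430, 980])
    for group, counts in table.items():
        print(group, dict(counts))
```

With 1000 chromosomes and 5 groups, groups 0, 2, and 4 correspond to the best, middle, and worst groups the slide focuses on.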

Page 9

Results for Z = W + X + Y

Figure: one panel each for the Best, Middle, and Worst groups (plots not captured in the transcript).

Page 10

Results for Z = 2 * X + Y - W

Figure: one panel each for the Best, Middle, and Worst groups (plots not captured in the transcript).

Page 11

Results for Z = X / Y

Figure: one panel each for the Best, Middle, and Worst groups (plots not captured in the transcript).

Page 12

Results for Z = X^3

Figure: one panel each for the Best, Middle, and Worst groups (plots not captured in the transcript).

Page 13

Results for Z = W^2 + W * X - Y

Figure: one panel each for the Best, Middle, and Worst groups (plots not captured in the transcript).

Page 14

Applied Experiments

Best class produces best offspring. Now what?

Compare 2 Genetic Programs (GPs)
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times.

Genetic Program
1000 Chromosomes (Equations)
50 Generations
20 Trials

Equations to model
Z = Sin(W) + Sin(X) + Sin(Y)
Z = log10(WX) + (Y * Z)
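The second GP's selection rule can be read in more than one way. The sketch below takes one plausible reading, stated as an assumption: the top 20% of a fitness-ranked population is kept and copied 5 times, which restores the original population size before breeding. The names `lineage_breeding_pool` and `pick_parents`, and the uniform parent draw, are hypothetical.

```python
import random


def lineage_breeding_pool(ranked_population, top_fraction=0.2, copies=5):
    """Keep the top fraction of a best-first ranked population, repeated `copies` times.

    Assumption: this is one reading of "breeds only the top 20% of a
    population and replicates 5 times". With 1000 chromosomes the pool
    again holds 1000 entries.
    """
    keep = round(len(ranked_population) * top_fraction)
    return ranked_population[:keep] * copies


def pick_parents(pool, rng):
    """Draw two parents uniformly from the pool (hypothetical selection rule)."""
    return rng.choice(pool), rng.choice(pool)


if __name__ == "__main__":
    # Stand-in population: integers 0..999, where 0 is the fittest chromosome.
    ranked = list(range(1000))
    pool = lineage_breeding_pool(ranked)
    print(len(pool), max(pool))
    print(pick_parents(pool, random.Random(0)))
```

By contrast, a vanilla GP with rank-based breeding would still give lower-ranked chromosomes some chance of being selected.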

Page 15

Results for Z = Sin(W) + Sin(X) + Sin(Y)

                                       Vanilla-Based GP    Lineage-Based GP
Average Fitness                        591.8               740.9
Average r^2                            0.8734              0.9315
Ave. Generations needed to complete    29.1                28.5

Page 16

Results for Z = log10(W X) + (Y * Z)

                                       Vanilla-Based GP    Lineage-Based GP
Average Fitness                        210.9               346.5
Average r^2                            0.7244              0.8069
Ave. Generations needed to complete    50.0                48.6
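To put the two results tables side by side, the relative change from the vanilla GP to the lineage-based GP can be computed directly from the reported averages. The snippet only restates the slide values; the names `RESULTS` and `percent_change` are hypothetical.

```python
# Averages copied from the two results slides: (vanilla GP, lineage-based GP).
RESULTS = {
    "Z = Sin(W) + Sin(X) + Sin(Y)": {
        "fitness": (591.8, 740.9),
        "r2": (0.8734, 0.9315),
        "generations": (29.1, 28.5),
    },
    "Z = log10(W X) + (Y * Z)": {
        "fitness": (210.9, 346.5),
        "r2": (0.7244, 0.8069),
        "generations": (50.0, 48.6),
    },
}


def percent_change(vanilla, lineage):
    """Relative change from the vanilla value to the lineage value, in percent."""
    return 100.0 * (lineage - vanilla) / vanilla


if __name__ == "__main__":
    for equation, metrics in RESULTS.items():
        print(equation)
        for metric, (vanilla, lineage) in metrics.items():
            print(f"  {metric:12s} {percent_change(vanilla, lineage):+.1f}%")
```

Average fitness rises by roughly 25% and 64% for the two equations, while the average number of generations falls by only about 2% to 3%.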

Page 17

Conclusions

Proof of concept experiments demonstrate the viability of considering lineage in GPs

Applied experiments show that lineage-based GP modeling produces better results faster