Silico-paleontology with graph databases
Rooting through the relics of digital evolution
Nic McPhee & David Donatucci (w/ Thomas Helmuth)
Division of Science and Mathematics
University of Minnesota, Morris
Morris, Minnesota, USA
May 2015
Genetic Programming Theory and Practice
University of Michigan
Ann Arbor, MI
McPhee & Donatucci (UMN Morris), Graph database analysis of GP dynamics, May 2015, GPTP, Ann Arbor MI
The Big Picture
Genetic programming clearly works.
But we rarely know why or how.
Databases allow examination of the internal interactions of a run.
Graph databases are better suited for this than relational databases.
Silico-paleontology can help us understand and improve our tools.
Outline
1 What do we know? (And how do we talk about it?)
2 Using a graph database
3 Let’s go exploring!
4 Conclusions
1 What do we know? (And how do we talk about it?)
  We throw so much away
  Summary results are highly lossy
  Plots are better (but can still obscure details)
  Can we zoom in to individual runs?
We keep/see/share so little
EC research has the potential to generate huge amounts of data.
What do we normally do with that data?
We normally throw it away, and paleontologists weep!
Using a graph database: Cypher
Can model (complex) paths
Find Nic’s parents:
(Nic)<-[:PARENT_OF]-(p)
Find all Nic’s grandparents:
(Nic)<-[:PARENT_OF*2]-(gp)
Find all of Nic’s ancestors at most 5 steps back:
(Nic)<-[:PARENT_OF*1..5]-(a)
Find all Nic’s siblings:
(Nic)<-[:PARENT_OF]-()-[:PARENT_OF]->(s)
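To make the path patterns concrete, here is a minimal Python sketch that answers the same four questions over a small in-memory graph. The family data and the function names (`ancestors`, `siblings`) are made up for illustration; this is not how Neo4j evaluates Cypher, only an emulation of what the patterns ask for.

```python
from collections import defaultdict

# Hypothetical family data: (parent, child) pairs standing in for PARENT_OF edges.
PARENT_OF = [
    ("Ann", "Nic"), ("Bob", "Nic"), ("Ann", "Sue"),
    ("Cat", "Ann"), ("Dan", "Ann"), ("Eve", "Bob"),
]

parents = defaultdict(set)
children = defaultdict(set)
for p, c in PARENT_OF:
    parents[c].add(p)
    children[p].add(c)


def ancestors(person, min_steps, max_steps):
    """People reached by following PARENT_OF backwards min_steps..max_steps times."""
    found, frontier = set(), {person}
    for step in range(1, max_steps + 1):
        frontier = {p for node in frontier for p in parents[node]}
        if step >= min_steps:
            found |= frontier
    return found


def siblings(person):
    """Other children of any of this person's parents."""
    return {s for p in parents[person] for s in children[p]} - {person}


print(sorted(ancestors("Nic", 1, 1)))  # like (Nic)<-[:PARENT_OF]-(p)
print(sorted(ancestors("Nic", 2, 2)))  # like (Nic)<-[:PARENT_OF*2]-(gp)
print(sorted(ancestors("Nic", 1, 5)))  # like (Nic)<-[:PARENT_OF*1..5]-(a)
print(sorted(siblings("Nic")))         # like the sibling pattern
```

The Cypher versions state the same traversals declaratively, which is the point of the slide: the variable-length pattern replaces the explicit loop.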
3 Let’s go exploring!
  Setup
  Comparing the end-games
What are we exploring?
Tom Helmuth provided a lot of data:
  A number of program synthesis problems taken from intro computing texts
  Three different selection mechanisms: lexicase, tournament, and implicit fitness sharing (IFS)
  All using the Clojush implementation of Lee Spector’s PushGP system
    https://github.com/lspector/Clojush
  Population size 1,000; ≤ 300 generations
  See [Helmuth and Spector, 2015] for more.
We used the batch-import tool and custom scripts to import into Neo4j.
  https://github.com/jexp/batch-import
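As a rough picture of what such an import script has to do, the sketch below splits per-individual records into a node table and a PARENT_OF edge table and writes them as tab-separated text. Everything here is an assumption made for illustration: the record fields, the column names, and the sample data are hypothetical, and the slides do not show either the Clojush log format or the header syntax that batch-import expects.

```python
import csv
import io

# Hypothetical run records; field names are assumptions, not Clojush's log format.
individuals = [
    {"generation": 1, "location": 7, "total_error": 12, "parents": ["0:003", "0:042"]},
    {"generation": 1, "location": 8, "total_error": 0, "parents": ["0:003"]},
]


def node_id(generation, location):
    """Build a 'generation:location' label like the ones on the slides."""
    return f"{generation}:{location:03d}"


def to_tables(records):
    """Split records into node rows and PARENT_OF edge rows."""
    nodes, edges = [], []
    for rec in records:
        child = node_id(rec["generation"], rec["location"])
        nodes.append([child, rec["generation"], rec["total_error"]])
        for parent in rec["parents"]:
            edges.append([parent, child, "PARENT_OF"])
    return nodes, edges


def to_tsv(header, rows):
    """Render a header and rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


nodes, edges = to_tables(individuals)
print(to_tsv(["id", "generation", "total_error"], nodes))  # placeholder column names
print(to_tsv(["start", "end", "type"], edges))             # placeholder column names
```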
McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 15 / 26
We have data from hundreds of runs
Currently a very “by hand” process
Definitely learned valuable things about:
  The behavior of lexicase
  Role of alternation (a type of crossover) in PushGP
  Impact of test cases on evolutionary dynamics
We’ll look at results from two runs:
  Both successful on replace-space-with-newline problem
  One using lexicase (sol’n found in 88 gens)
  One using tournament selection (sol’n found in 151 gens)
How did we construct a winner?
How is a winner constructed at the end of a run?
This query finds all ancestors of a winner (zero total_error) going back at most 8 steps:
MATCH (w) WHERE w.total_error = 0
MATCH (p)-->(c)-[*0..7]->(w)
RETURN DISTINCT id(p), id(c);
8 steps is fairly arbitrary; returns a small enough set to visualize.
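The same question can be sketched in Python as a backwards walk from the winners, collecting each distinct (parent, child) edge that lies within the step limit. The function name `ancestry_edges` and the toy edge list are hypothetical; the sketch assumes edges point from parent to child, as in the query.

```python
def ancestry_edges(parent_edges, winners, max_steps=8):
    """Distinct (parent, child) edges on paths of at most max_steps edges ending at a winner."""
    parents = {}
    for p, c in parent_edges:
        parents.setdefault(c, set()).add(p)

    edges = set()
    frontier = set(winners)  # nodes 0 steps from a winner
    for _ in range(max_steps):
        next_frontier = set()
        for child in frontier:
            for parent in parents.get(child, ()):
                edges.add((parent, child))
                next_frontier.add(parent)
        frontier = next_frontier
    return edges


# Toy lineage: a -> b -> c -> w, plus a second parent x of c.
toy = [("a", "b"), ("b", "c"), ("c", "w"), ("x", "c")]
print(sorted(ancestry_edges(toy, {"w"}, max_steps=2)))
```

With `max_steps=8` the children `c` are at most 7 steps from a winner, which matches the `[*0..7]` bound in the query.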
Lexicase selection
[Figure: ancestry graph for the lexicase run, generations 79 to 87. Labeled individuals: 80:220, 82:447, 83:047, 83:124, 83:619, 84:319, 85:086, 86:261, 87:719, 87:941, 87:947, and 42 other winners.]
A number of observations:
  45(!) “winning” individuals
  Individual “86:261” is (a) parent of all 45
  Individual “86:261” is a parent of 934 (of 1,000) individuals in the next generation
Seriously?!? 934 offspring?!?
Turns out to be an extreme case of a common phenomenon with lexicase
Nodes marked with diamonds all had at least 100 offspring
Shaded diamonds also have at least 5 offspring that are winners or ancestors of winners
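Finding the "diamond" nodes amounts to counting distinct offspring per individual and applying a threshold. A minimal sketch, assuming the ancestry is available as (parent, child) pairs; the function names and the toy edges are hypothetical.

```python
from collections import defaultdict


def offspring_counts(parent_edges):
    """Number of distinct children for each individual that has any."""
    kids = defaultdict(set)
    for parent, child in parent_edges:
        kids[parent].add(child)
    return {parent: len(cs) for parent, cs in kids.items()}


def prolific(parent_edges, threshold=100):
    """Individuals with at least `threshold` distinct offspring."""
    return {p for p, n in offspring_counts(parent_edges).items() if n >= threshold}


# Toy data: "a" has three distinct children (one edge is listed twice), "b" has one.
toy_edges = [("a", "c1"), ("a", "c2"), ("a", "c3"), ("a", "c3"), ("b", "c1")]
print(offspring_counts(toy_edges))
print(prolific(toy_edges, threshold=3))
```

The default threshold of 100 matches the diamond criterion on the slide; the toy example lowers it so the result is visible on a handful of edges.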
What’s the total error (fitness) of “86:261”?
  4,034(!)
  Bottom quartile!
  But had 934 offspring!
  Failed to return on 4 cases (error 1,000 each)
  Got 2 other answers wrong (error 17 each)
  Terrible total error, but perfect on 194 of 200 tests
  Great for lexicase!
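The arithmetic is 4 × 1,000 + 2 × 17 = 4,034. To see why such a profile can still do well, here is a sketch of lexicase selection as it is usually described: shuffle the test cases, then repeatedly keep only the candidates with the best error on the next case. The "specialist" error vector follows the numbers on the slide; the "generalist" competitor (error 5 on every case) is entirely hypothetical, as are the function and variable names.

```python
import random


def lexicase_select(errors, rng):
    """Pick one individual; `errors` maps a name to its list of per-case errors."""
    candidates = list(errors)
    cases = list(range(len(next(iter(errors.values())))))
    rng.shuffle(cases)
    for case in cases:
        best = min(errors[name][case] for name in candidates)
        candidates = [name for name in candidates if errors[name][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


# Error profile from the slide: perfect on 194 cases, 1,000 on 4, 17 on 2.
specialist = [0] * 194 + [1000] * 4 + [17] * 2
# Hypothetical competitor: mediocre everywhere, but a much lower total error.
generalist = [5] * 200

errors = {"specialist": specialist, "generalist": generalist}
rng = random.Random(0)
wins = sum(lexicase_select(errors, rng) == "specialist" for _ in range(1000))

print(sum(specialist), sum(generalist))  # total errors: 4034 and 1000
print(wins, "of 1000 selections went to the specialist")
```

In this two-individual toy population the first shuffled case decides the outcome, so the specialist is selected whenever that case is one of its 194 perfect ones, even though its total error is about four times worse.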
What’s the total error (fitness) of “85:086”?
  100,000!
  Rank 971 out of 1,000
  But had 180 offspring
  Got all the “print” cases
  Failed to return value for all 100 “return” cases (error 1,000 each)
  Terrible total error, but perfect on 100 of 200 tests
  Fine for lexicase
High proportion of mutations:
  Roughly half the offspring in this graph created via mutation
  Probably why there’s less branching
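One way to estimate such proportions from the ancestry graph alone is to count recorded parents per child. The sketch below assumes that a child with one distinct parent came from mutation and a child with two came from a crossover-style operator such as alternation. That is an assumption for illustration only: the real data may record the operator explicitly, and a crossover of an individual with itself would be miscounted here.

```python
from collections import defaultdict


def operator_shares(parent_edges):
    """Fraction of children with one distinct parent vs. more than one."""
    parents = defaultdict(set)
    for parent, child in parent_edges:
        parents[child].add(parent)
    total = len(parents)
    if total == 0:
        return {"mutation": 0.0, "crossover": 0.0}
    single = sum(1 for ps in parents.values() if len(ps) == 1)
    return {"mutation": single / total, "crossover": (total - single) / total}


# Toy data: c1 and c3 have one parent each, c2 and c4 have two.
toy_lineage = [
    ("a", "c1"), ("a", "c2"), ("b", "c2"),
    ("a", "c3"), ("a", "c4"), ("b", "c4"),
]
print(operator_shares(toy_lineage))
```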
Tournament selection
[Figure: ancestry graph for the tournament selection run, generations 142 to 150.]
Much broader: 42 ancestors of a winner for tournament 9 gens back; 14 for lexicase
About two-thirds created via crossover, so more branching than lexicase
4 Conclusions
Still early days, but we can definitely see some useful things:
  Differences in ways selection mechanisms work
  Support for hypotheses (e.g., Tom’s paper)
  Evidence for importance of crossover in PushGP
  Impact of test cases on evolutionary dynamics
Future Work
  Automate more of the work
  Examine more runs/problems/etc.
  Explore how to include this “on-line”
Thanks!
Thank you for your time and attention!
Thanks to M. Kirbie Dramdahl (University of Minnesota, Morris), and to Lee Spector’s Computational Intelligence group (Hampshire College) for ideas and feedback.
References
Burlacu, B., Affenzeller, M., Kommenda, M., Winkler, S., and Kronberger, G. (2013). Visualization of genetic lineages and inheritance information in genetic programming. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’13 Companion, pages 1351–1358, New York, NY, USA. ACM.
Helmuth, T. and Spector, L. (2015). General program synthesis benchmark suite. In Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation, GECCO ’15, New York, NY, USA. ACM.