Figs, Wasps, Gophers, and Lice: A Computational Exploration of Coevolution
Ran Libeskind-Hadas
Department of Computer Science
Harvey Mudd College
Jan 03, 2016
The Cophylogeny Problem
From Hafner MS and Nadler SA, Phylogenetic trees support the coevolution of parasites and their hosts. Nature 1988, 332:258-259
Obligate Mutualism of Figs and Fig Wasps
From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004
ovipositor
Indigobirds and Finches
www.indigobirds.com
• High level of host specificity (e.g. eggs and mouth markings)
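Cophylogeny Reconstruction
Problem Instance (the input): a host tree with tips a, b, c; a parasite tree with tips d, e; and tip associations linking each parasite tip to the host tip it lives on.
Possible Solutions
[Figure: two possible reconstructions of the parasite tree (d, e) onto the host tree (a, b, c)]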
Event Cost Model
• Cospeciation: host and parasite speciate together
• Duplication: the parasite speciates on its host without the host speciating
• Host switch: a parasite lineage moves onto a different host lineage
• Loss: a parasite lineage is absent from one branch of a host speciation (for example, through extinction)
[Figure: each event type illustrated on the host tree (a, b, c) and parasite tree (d, e)]
The two possible solutions, written as sums of their events:
Cost = duplication + cospeciation + 3 * loss
Cost = cospeciation + host switch + loss
Some typical costs: cospeciation = 0, duplication = 2, host switch = 3, loss = 2
Cost = 2 + 0 + 3 * 2 = 8
Cost = 0 + 3 + 2 = 5
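A minimal Python sketch of the arithmetic above: a solution's cost is the sum, over event types, of the number of events times the cost per event. The per-event costs here (cospeciation 0, duplication 2, host switch 3, loss 2) are an assumption reconstructed from the slide's addends and its totals of 8 and 5; `TYPICAL_COSTS` and `solution_cost` are hypothetical names.

```python
# Hypothetical per-event costs, reconstructed from the "Some typical costs" slide.
TYPICAL_COSTS = {"cospeciation": 0, "duplication": 2, "host switch": 3, "loss": 2}


def solution_cost(event_counts, costs=TYPICAL_COSTS):
    """Total cost: for each event type, (number of events) x (cost per event)."""
    return sum(costs[event] * count for event, count in event_counts.items())


# The two candidate solutions, described only by their event counts.
first = {"duplication": 1, "cospeciation": 1, "loss": 3}
second = {"cospeciation": 1, "host switch": 1, "loss": 1}
print(solution_cost(first), solution_cost(second))  # 8 5
```

Under these costs the solution with a host switch wins, even though a host switch is the priciest single event, because it avoids a duplication and two losses.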
How hard is this problem?
• If host switches are not permitted, we can find optimal solutions in “next-to-no-time” (time proportional to the number of nodes in the trees)…
• … but host switches shouldn’t be ignored – they are quite common…
• … and with host switches, this problem is computationally hard. How hard?
• Let’s take a short aside on “hardness”…
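When host switches are excluded, one classic approach (borrowed from gene-tree/species-tree reconciliation, and not necessarily the method these slides have in mind) is the LCA mapping: each parasite tip maps to its associated host tip, and each internal parasite node maps to the lowest common ancestor of its children's hosts. A node is a duplication if it maps to the same host as one of its children and a cospeciation otherwise; losses are counted from host-tree depths along each parasite edge. This is a hypothetical sketch: `index_host`, `lca`, and `reconcile` are made-up names, trees are binary nested tuples of tip labels, and the simple walk-up LCA is not itself linear-time.

```python
def index_host(tree, parent=None, depth=0, info=None):
    """Map every host node (a nested tuple or a tip label) to (parent, depth)."""
    if info is None:
        info = {}
    info[tree] = (parent, depth)
    if isinstance(tree, tuple):
        for child in tree:
            index_host(child, tree, depth + 1, info)
    return info


def lca(u, v, info):
    """Lowest common ancestor: lift the deeper node, then lift both together."""
    while info[u][1] > info[v][1]:
        u = info[u][0]
    while info[v][1] > info[u][1]:
        v = info[v][0]
    while u != v:
        u, v = info[u][0], info[v][0]
    return u


def reconcile(parasite, info, tips):
    """Count events under the LCA mapping (no host switches allowed).

    `tips` maps each parasite tip label to the host tip label it lives on.
    """
    counts = {"cospeciation": 0, "duplication": 0, "loss": 0}

    def visit(p):
        if not isinstance(p, tuple):          # parasite tip: use its tip association
            return tips[p]
        child_hosts = [visit(c) for c in p]
        here = lca(child_hosts[0], child_hosts[1], info)
        dup = here in child_hosts             # stays on the same host as a child
        counts["duplication" if dup else "cospeciation"] += 1
        for h in child_hosts:
            gap = info[h][1] - info[here][1]  # host edges spanned by this parasite edge
            counts["loss"] += gap if dup else gap - 1
        return here

    visit(parasite)
    return counts


host = (("a", "b"), "c")
print(reconcile((("d", "e"), "f"), index_host(host), {"d": "a", "e": "c", "f": "b"}))
```

The example reports one cospeciation, one duplication, and three losses. Replacing the walk-up `lca` with a constant-time LCA structure would give the near-linear running time mentioned above.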
A Short Aside on “Hard” Problems
Snowplows of Northern Minnesota
[Figure: map of Burrsburg, Frostbite City, Shiversville, Tundratown, and Freezeapolis]
Greed? Brute Force?
“Greed” isn’t always good!
[Figure: map including Temptingville, with points labelled A to F]
“Hard” Problems
The Travelling Salesperson Problem
[Figure: map of New York, Moscow, Paris, San Francisco, and Claremont, with distances 242, 742, 1342, 1942, and 2142]
Brute Force? Greed?
[Figure: four nearby towns, Claremont, Montclare, Clearmont, and Montclear, joined by roads of lengths 1, 1, 1, and 22, and then a road of length 1042]
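The two strategies on these slides can be compared in a few lines of Python. The sketch below uses made-up road lengths between the four towns (not the slide's numbers) and hypothetical function names. Brute force tries all (n - 1)! tours from a fixed start; greedy nearest-neighbour always drives to the closest unvisited town.

```python
from itertools import permutations

# Made-up symmetric road lengths (NOT the slide's numbers).
ROADS = {
    ("Claremont", "Montclare"): 1,
    ("Montclare", "Clearmont"): 1,
    ("Clearmont", "Montclear"): 1,
    ("Montclear", "Claremont"): 10,
    ("Claremont", "Clearmont"): 2,
    ("Montclare", "Montclear"): 2,
}


def dist(u, v):
    """Road length between two towns, looked up in either direction."""
    return ROADS[(u, v)] if (u, v) in ROADS else ROADS[(v, u)]


def tour_length(tour):
    """Length of a closed tour that returns to its starting town."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def brute_force(towns):
    """Fix the start and try every ordering of the rest: (n - 1)! tours."""
    start, rest = towns[0], towns[1:]
    return min(((start,) + p for p in permutations(rest)), key=tour_length)


def greedy(towns):
    """Nearest neighbour: always drive to the closest unvisited town."""
    tour = [towns[0]]
    unvisited = set(towns[1:])
    while unvisited:
        nearest = min(unvisited, key=lambda t: dist(tour[-1], t))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tuple(tour)


towns = ("Claremont", "Montclare", "Clearmont", "Montclear")
print(tour_length(greedy(towns)), tour_length(brute_force(towns)))  # 13 6
```

Greedy grabs the three cheap roads first and is then forced onto the 10-unit road home, for a total of 13; brute force finds a tour of length 6. Brute force is always right, but the number of tours it checks grows factorially with the number of towns.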
n² versus 2ⁿ
The Fast-O-Matic performs 10⁹ operations/sec.
        n = 10     n = 30     n = 50     n = 70
n²      100        900        2,500      4,900
        < 1 sec    < 1 sec    < 1 sec    < 1 sec
2ⁿ      1,024      ≈ 10⁹      ≈ 10¹⁵     ≈ 10²¹
        < 1 sec    1 sec      13 days    37 thousand years
Computers double in speed every 2 years. Let’s just wait 10 years!
Five doublings make the Fast-O-Matic 32 times faster: 37 thousand years → about 1,170 years!
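The running times in this table follow from dividing the operation count by 10⁹ operations per second. A short Python check, assuming a 365.25-day year (the constant names are hypothetical), recomputes each entry and the effect of waiting ten years while speed doubles every two years.

```python
OPS_PER_SEC = 10**9                   # the Fast-O-Matic's speed
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def seconds_needed(operations, ops_per_sec=OPS_PER_SEC):
    """Wall-clock seconds to perform `operations` steps at `ops_per_sec`."""
    return operations / ops_per_sec


for n in (10, 30, 50, 70):
    quad, expo = seconds_needed(n**2), seconds_needed(2**n)
    print(f"n = {n}: n^2 needs {quad:.1e} s, 2^n needs {expo:.3g} s "
          f"({expo / SECONDS_PER_DAY:.3g} days, {expo / SECONDS_PER_YEAR:.3g} years)")

# Waiting 10 years while speed doubles every 2 years gives 2**5 = 32x.
speedup = 2 ** (10 // 2)
years_later = seconds_needed(2**70) / SECONDS_PER_YEAR / speedup
print(f"n = 70 after waiting: about {years_later:,.0f} years")
```

For n = 70 this gives roughly 1.18 × 10¹² seconds, a little over 37 thousand years; a 32-fold speedup still leaves more than a thousand years.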
Snowplows and Travelling Salesperson Revisited!
[Figure: a cloud labelled “NP-complete problems” containing the Travelling Salesperson Problem, the Snowplow Problem, and Protein Folding]
Tens of thousands of other known problems go in this cloud!!
Cophylogeny Problem!
“I can’t find an efficient algorithm. I guess I’m too dumb.”
“I can’t find an efficient algorithm because no such algorithm is possible!”
“I can’t find an efficient algorithm, but neither can all these famous people.”
Cartoons from “Computers and Intractability: A Guide to the Theory of NP-Completeness” by M. Garey and D. Johnson
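$1 million: the Clay Mathematics Institute’s Millennium Prize for settling whether P = NP, that is, whether NP-complete problems have efficient algorithms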
Coping with NP-completeness…
• Brute force
• Ad hoc heuristics
• Meta-heuristics
• Approximation algorithms
A Meta-heuristic Approach
• Fix a timing for the host tree: a relative ordering of the speciation events
• All host switches occur “horizontally” in time, i.e., between host lineages that coexist at that point in the timing
• For a given timing, we can solve the problem optimally using Dynamic Programming
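Genetic Algorithm
• Host tree and three different possible orderings of its speciation events
• The genetic algorithm evolves a population of timings, scoring each timing with the dynamic program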
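A timing is an ordering of the host tree’s internal (speciation) nodes in which every node comes after its parent. The hypothetical generator below (nested-tuple trees, made-up names `timings` and `label`) enumerates every timing of a small example tree; that tree happens to have exactly three, but it is not claimed to be the tree in the slide. The count grows very quickly with tree size, which is why a heuristic search over timings is attractive. The dynamic program that scores a single timing is not reproduced here.

```python
def label(node):
    """Name a host node by the tips beneath it, e.g. 'abc'."""
    return node if isinstance(node, str) else "".join(label(c) for c in node)


def timings(host):
    """Yield every ordering of the internal (speciation) nodes of `host`
    in which each node appears after its parent."""
    def internal_children(node):
        return [c for c in node if isinstance(c, tuple)]

    def extend(order, ready):
        if not ready:                      # every speciation event is placed
            yield order
            return
        for i, node in enumerate(ready):
            # Placing `node` now makes its internal children eligible.
            remaining = ready[:i] + ready[i + 1:] + internal_children(node)
            yield from extend(order + [node], remaining)

    yield from extend([], [host] if isinstance(host, tuple) else [])


# Hypothetical host tree (((a, b), c), (d, e)) with four speciation events.
example = ((("a", "b"), "c"), ("d", "e"))
for t in timings(example):
    print([label(node) for node in t])
```

Each printed list is one timing, starting with the root speciation `abcde`; in all three, `ab` follows `abc`, while `de` can fall anywhere after the root.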
Jane 2.0 (available at www.cs.hmc.edu/~hadas/jane)
What Jane does…
Gopher/louse pair: 8 tips on the gopher tree, 10 tips on the louse tree
The best solutions found are listed here, along with their total costs.
But perhaps those “seemingly good” solutions of cost 11 are no better than random…
In “Stats” mode, we can generate random tip mappings or entirely random parasite trees.
Here, we ran 50 trials with random tip mappings.
The red dashed line shows the best solution found to our original dataset and the blue histogram shows the costs for the 50 random trials. In this case, none of the random trials resulted in solutions of cost 11 or less!
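The reasoning behind the histogram is a randomization test: if few random instances score as well as the real data, the real solution is unlikely to be a fluke. A minimal sketch, assuming a uniform random tip mapping (Jane’s actual randomization may differ) and hypothetical function names; the trial costs in the example are invented placeholders, since real ones require solving each randomized instance.

```python
import random


def random_tip_mapping(parasite_tips, host_tips, rng):
    """Assumed null model: each parasite tip gets a host tip chosen
    uniformly at random, independently of the other parasite tips."""
    return {p: rng.choice(host_tips) for p in parasite_tips}


def empirical_p_value(observed_cost, random_costs):
    """Fraction of random trials whose best cost is at least as good as
    (i.e., no more than) the cost found for the real data."""
    hits = sum(1 for cost in random_costs if cost <= observed_cost)
    return hits / len(random_costs)


rng = random.Random(0)
print(random_tip_mapping(["d", "e", "f"], ["a", "b", "c"], rng))
# Invented costs standing in for 5 random trials.
print(empirical_p_value(11, [14, 13, 15, 12, 16]))  # 0.0
```

With 50 trials and no random cost of 11 or less, this estimate is 0, which is best read as “below roughly 1/50” rather than as exactly zero.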
Jane Demo!