Perfect phylogenetic networks, and inferring language evolution Tandy Warnow The University of Texas at Austin (Joint work with Don Ringe, Steve Evans, and Luay Nakhleh)

Dec 18, 2015

Perfect phylogenetic networks, and inferring language evolution

Tandy Warnow
The University of Texas at Austin

(Joint work with Don Ringe, Steve Evans, and Luay Nakhleh)

Species phylogeny

Orangutan Gorilla Chimpanzee Human

From the Tree of Life website, University of Arizona

Possible Indo-European tree (Ringe, Warnow and Taylor 2000)

Controversies for Indo-European history

• Subgrouping: Other than the 10 major subgroups, what is likely to be true? In particular, what about
– Indo-Hittite
– Italo-Celtic
– Greco-Armenian
– Anatolian + Tocharian
– Satem Core?

Historical Linguistic Data

• A character is a function that maps a set of languages, L, to a set of states.

• Three kinds of characters:
– Phonological (sound changes)
– Lexical (meanings based on a wordlist)
– Morphological (especially inflectional)
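
As a minimal sketch of the definition on this slide, a character can be stored as a mapping from languages to states; the states then partition the languages into classes. The language names and state labels below are toy values invented for illustration, not data from the talk, and `state_classes` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy lexical character: each language is assigned the state (cognate class)
# of its word for one meaning. These assignments are invented for illustration.
character = {
    "LangA": 0,
    "LangB": 0,
    "LangC": 1,
    "LangD": 1,
    "LangE": 2,
}


def state_classes(char):
    """Group languages by the state the character assigns to them."""
    classes = defaultdict(set)
    for language, state in char.items():
        classes[state].add(language)
    return dict(classes)


classes = state_classes(character)
```

Each key of `classes` is a state and each value is the set of languages that share it, which is the form in which the later slides talk about cognate classes.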

Homoplasy-free evolution

• When a character changes state, it changes to a new state not in the tree

• In other words, there is no homoplasy (character reversal or parallel evolution)

• First inferred for weird innovations in phonological characters and morphological characters in the 19th century.

[Figure: example tree whose nodes are labeled with the character states 0 and 1.]

Lexical characters can also evolve without homoplasy

• For every cognate class, the nodes of the tree in that class should form a connected subset, as long as there is no undetected borrowing or parallel semantic shift.

• However, in practice, lexical characters are more likely to evolve homoplastically than complex phonological or morphological characters.
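
The connectedness condition above can be checked mechanically. The following is a sketch under stated assumptions, not the authors' implementation: languages sit at the leaves of an undirected tree given as an edge list, and a character is treated as homoplasy-free on that tree when the minimal subtrees spanning each state's languages share no node. The function names and the four-language toy tree are hypothetical.

```python
def tree_from_edges(edges):
    """Build an undirected adjacency map from a list of (node, node) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def spanning_subtree(adj, targets):
    """Nodes of the minimal subtree of the tree `adj` that connects `targets`."""
    # Copy the adjacency sets, then repeatedly prune leaves not in `targets`.
    remaining = {node: set(nbrs) for node, nbrs in adj.items()}
    prunable = [n for n, nbrs in remaining.items()
                if len(nbrs) <= 1 and n not in targets]
    while prunable:
        node = prunable.pop()
        if node not in remaining:
            continue  # already pruned via another neighbor
        for nbr in remaining.pop(node):
            remaining[nbr].discard(node)
            if len(remaining[nbr]) <= 1 and nbr not in targets:
                prunable.append(nbr)
    return set(remaining)


def is_compatible(adj, character):
    """True if each state of `character` can occupy a connected part of the tree."""
    used = set()
    for state in set(character.values()):
        languages = {lang for lang, s in character.items() if s == state}
        subtree = spanning_subtree(adj, languages)
        if subtree & used:
            return False  # two states would need the same node
        used |= subtree
    return True


# Toy tree ((A,B),(C,D)) with internal nodes u and v (invented for illustration).
tree = tree_from_edges([("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")])
fits = is_compatible(tree, {"A": 0, "B": 0, "C": 1, "D": 1})
clashes = is_compatible(tree, {"A": 0, "B": 1, "C": 0, "D": 1})
```

In the toy tree the first character separates {A, B} from {C, D} along the internal edge, so it fits; the second would require both internal nodes to carry two states at once, so it does not.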

[Figure: example tree whose nodes are labeled with the character states 0, 1, and 2.]

Differences between different characters

• Lexical: most easily borrowed (most borrowings detectable), and homoplasy relatively frequent (we estimate about 25-30% overall for our wordlist, but a much smaller percentage for basic vocabulary).

• Phonological: can still be borrowed, but much less likely than lexical. Complex phonological characters are infrequently (if ever) homoplastic, although simple phonological characters are very often homoplastic.

• Morphological: least easily borrowed, least likely to be homoplastic.

Linguistic character evolution

• Characters are lexical, phonological, and morphological.

• Homoplasy is much less frequent than in biomolecular data: most changes result in a new state, and hence there is an unbounded number of possible states.

• Borrowing between languages occurs, but can often be detected.

Our methods/models

• Ringe & Warnow “Almost Perfect Phylogeny”: most characters evolve without homoplasy under a no-common-mechanism assumption (various publications since 1995)

• Ringe, Warnow, & Nakhleh “Perfect Phylogenetic Network”: extends APP model to allow for borrowing, but assumes homoplasy-free evolution for all characters (Language, 2005)

• Warnow, Evans, Ringe & Nakhleh “Extended Markov model”: parameterizes PPN and allows for homoplasy provided that homoplastic states can be identified from the data (to appear, Cambridge University Press)

• Ongoing work: incorporating unidentified homoplasy.

First analysis: Almost Perfect Phylogeny

• The original dataset contained 375 characters (336 lexical, 17 morphological, and 22 phonological).

• We screened the dataset to eliminate characters likely to evolve homoplastically or by borrowing.

• On this reduced dataset (259 lexical, 13 morphological, 22 phonological), we attempted to maximize the number of compatible characters while requiring that certain of the morphological and phonological characters be compatible. (This computational problem is NP-hard.)
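
To make the optimization concrete, here is a simplified sketch, not the procedure used in the talk. It assumes every character has exactly two states, in which case a set of characters fits one tree without homoplasy whenever every pair passes the four-gamete test, so the largest compatible set can be found by exhaustive search. The IE characters are multistate, so this is a swapped-in special case; the function names, the `required` parameter, and the toy data are hypothetical.

```python
from itertools import combinations


def pair_compatible(c1, c2, languages):
    """Four-gamete test: two binary characters conflict only if all four
    combinations of states occur among the languages."""
    combos = {(c1[lang], c2[lang]) for lang in languages}
    return len(combos) < 4


def max_compatible_subset(characters, languages, required=()):
    """Largest pairwise-compatible set of binary characters containing `required`.

    Exhaustive search, exponential in the number of characters.
    """
    optional = [name for name in characters if name not in required]
    for size in range(len(optional), -1, -1):
        for extra in combinations(optional, size):
            chosen = list(required) + list(extra)
            if all(pair_compatible(characters[a], characters[b], languages)
                   for a, b in combinations(chosen, 2)):
                return set(chosen)
    return None  # the required characters conflict with each other


# Toy binary characters over five languages (invented for illustration).
languages = ["A", "B", "C", "D", "E"]
characters = {
    "c1": {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1},
    "c2": {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1},
    "c3": {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0},
}
best = max_compatible_subset(characters, languages)
best_with_c3 = max_compatible_subset(characters, languages, required=("c3",))
```

Here `c3` conflicts with both other characters, so the unconstrained optimum keeps `c1` and `c2`, while forcing `c3` to be compatible leaves only `c3` itself. The exponential search reflects the NP-hardness noted on the slide.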

Indo-European Tree (95% of the characters compatible)

Initial analysis

• Initial analysis of the IE dataset revealed that no perfect phylogeny for that dataset existed, even after careful screening.

• Possible explanations:
– Homoplasy
– Polymorphism (e.g. rock/stone)
– Mistakes in character coding
– Borrowing (horizontal transmission)

Modelling borrowing: Networks and Trees within Networks

Perfect Phylogenetic Networks

• An underlying tree + additional contact edges

• No cycles that involve tree edges

• Each character is compatible on at least one of the trees “inside” the network

[Figure: example network whose nodes are labeled with the character states 1 and 2.]

PPN Reconstruction Method

Minimum Increment to a PPN (MIPPN):
(1) Estimate the underlying “genetic” tree
(2) Add a minimum number of contact edges to make all characters compatible

(NP-hard to solve exactly even when the genetic tree is known, so we do exhaustive search on each candidate tree.)

The Indo-European (IE) Dataset

• 24 languages

• 294 characters: 22 phonological, 13 morphological, and 259 lexical

• We examined five different “genetic” trees, one of which had a minimum number of incompatible characters (14 lexical characters)

Quality of Solutions

Three mathematical criteria:
1. # characters incompatible with the “genetic” tree T
2. # additional contact edges needed to obtain a PPN from T
3. # borrowing events needed to make all characters compatible on the PPN

Also: feasibility with respect to the archaeological and historical record
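
One way to compare candidate solutions on the three mathematical criteria is sketched below. It assumes the criteria are applied in the order listed, each one breaking ties in the previous one; the slides do not state how the criteria are combined, so that ordering is an assumption. The `Candidate` type, the `rank` function, and all the numbers are hypothetical, and the archaeological and historical feasibility check is left to human judgment.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    incompatible_characters: int  # criterion 1
    contact_edges: int            # criterion 2
    borrowing_events: int         # criterion 3


def rank(candidates):
    """Sort candidates by the three criteria, compared in the listed order."""
    return sorted(
        candidates,
        key=lambda c: (c.incompatible_characters, c.contact_edges, c.borrowing_events),
    )


# Toy scores, invented for illustration only.
candidates = [
    Candidate("tree X", 16, 3, 5),
    Candidate("tree Y", 15, 4, 4),
    Candidate("tree Z", 15, 3, 6),
]
ranked = rank(candidates)
best_name = ranked[0].name
```

With these toy scores, "tree Z" ranks first: it ties "tree Y" on incompatible characters and needs fewer contact edges.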

Our best PPN (Language, 2005)

Are we done?

• We observed that of the three contact edges, only two are well-supported. If we eliminate the weakly supported edge, then we must explain the incompatibility of some characters (either through homoplasy or polymorphism).

• Challenge: How to model polymorphism, homoplasy, borrowing, and genetic transmission?

Other work

• Stochastic model of language evolution incorporating homoplasy, showing identifiability and efficient reconstructability (to appear, Cambridge University Press)

• Comparison of various methods on the IE dataset (to appear, Transactions of the Philological Society)

• Modelling polymorphism (SIAM J. Computing, and ongoing)

• Simulation study (ongoing)

For more information

• Please see the Computational Phylogenetics for Historical Linguistics web site for papers, data, and additional material http://www.cs.rice.edu/~nakhleh/CPHL

Acknowledgements

• The Program for Evolutionary Dynamics at Harvard

• NSF, the David and Lucile Packard Foundation, the Radcliffe Institute for Advanced Studies, and the Institute for Cellular and Molecular Biology at UT-Austin.

• Collaborators: Don Ringe, Steve Evans, and Luay Nakhleh.