
Estimating the Reliability of Evolutionary Trees¹,²

David Penny* and Michael Hendy† *Department of Botany and Zoology and †Department of Mathematics and Statistics, Massey University, Palmerston North, New Zealand

Six protein sequences from the same 11 mammalian taxa were used to estimate the accuracy and reliability of phylogenetic trees using real, rather than simulated, data. A tree comparison metric was used to measure the increase in similarity of minimal trees as larger, randomly selected subsets of nucleotide positions were taken. The ratio of the observed to the expected number of incompatibilities for each nucleotide position (character) is a good predictor of the number of changes required at that position on the minimal (most-parsimonious) tree. This allows a higher weighting of nucleotide positions that have changed more slowly and should result in the minimal length tree converging to the correct tree as more sequences are obtained. An estimate was made of the smallest subset of trees that need to be considered to include the actual historical tree for a given set of data. It was concluded that it is possible to give a reasonable estimate of the reliability of the final tree, at least when several sequences are combined. With the present data, resolving the rodent-primate-lagomorph (rabbit) trichotomy is the least certain aspect of the final tree, followed then by establishing the position of dog. In our opinion, it is unreasonable to publish an evolutionary tree derived from sequence data without giving an idea of the reliability of the tree.

Introduction

It is a difficult task to estimate the accuracy of knowledge about unique events in the remote past. Nevertheless, estimating the reliability of evolutionary trees is an important task for molecular evolution.

We have concluded quantitatively (Penny et al. 1982) that sequences of macromolecules do carry evolutionary information because minimal (most-parsimonious) trees derived from each of five different protein sequences from 11 mammalian taxa listed in table 1 were much more similar than would be expected by chance. However, no pair of the 21 minimal trees found for the five proteins was identical, nor were any of the minimal trees from a single protein identical to minimal trees derived from all five sequences combined into one long sequence. This result (that minimal trees derived from different proteins are very similar but not identical) is to be expected when one studies an evolutionary process with a strong stochastic element. Nevertheless, it leaves the problem of just how close a particular minimal tree is to the historical evolutionary tree (T*).

1. Key words: bootstrap samples, evolution, jackknifing, myoglobin, probability, phylogenetic trees, weighting characters.

2. This paper was presented to the Fourth International Theriological Congress, Edmonton, Alberta, Canada, August 13-20, 1985.

Address for correspondence and reprints: Dr. David Penny, Department of Botany and Zoology, Massey University, Palmerston North, New Zealand.

Mol. Biol. Evol. 3(5):403-417. 1986. © 1986 by The University of Chicago. All rights reserved. 0737-4038/86/0305-3503$02.00


Downloaded from https://academic.oup.com/mbe/article/3/5/403/988007 by guest on 28 December 2021


Table 1 Informative Positions for Myoglobin

1 Kangaroo; 2 Sheep; 3 Cow; 4 Pig; 5 Horse; 6 Dog; 7 Rodent; 8 Rabbit; 9 Monkey; 10 Ape; 11 Human

1 2 3 123456789012345678901234567890 1234

GUGAGAGGUACAAACCCGAGCAAUGGACCGGAAA A.AG.. . . . . . .UC.. . .C.. . . . .C.A.. . . . . .C.G.C..AC.. .GAGA.C. .CCGA.G.. . .C.U .C.G.C. .AC.. .GAGAAC. .CCGA.G..U. ..U . . . G C G . . . . . . . . . . . . . . . . . . . . . . . . . . . U . . . . . C. .AC. .CC.. .AC. .CC.. .C..A. . . .

A.AC . .AAA.G. .C.. ..C.. . . . .C.A.A. . . . . . . C . .AA. .G.. . . . .A. . . . . . . .G. .A. . . .

C . . . . . AC... . . . . . . . . . .CC. .CG..A.C.. . . . . C . . . . . . GU.. . . . . U.. . . . . . .A.U.. . . . . . U . . . . . . GC.. . . . .UG.. . . . . .A.U.C. . . . . C . . . . . . GC.. . . . .UG.. . . . . .A.U.C.

NOTE. Sequences are from Dayhoff (1972), except rodent (the mole rat, Spalax ehrenbergi), which is from Gurnett et al. (1984). In this case ape is the gorilla, and other taxa are as in Penny et al. (1982). The consensus sequence is shown as the top row; a dot (period) indicates that a species has the consensus nucleotide. Sequences are converted from amino acids to best-guess nucleotides. The total data set includes these 34 nucleotide positions together with 132 from cytochrome c, hemoglobins α and β, and fibrinopeptides A and B (Penny et al. 1982, with the omission of character 85). These six sequences (including myoglobin) are available for the 11 closely related taxa shown here (rodent sequences are the most diverse, coming from rat, house mouse, and mole rat, all of which are myomorphic rodents). Nucleotide positions, columns, and characters are used as equivalent terms.

With the six proteins used in this study (i.e., the original five plus myoglobin) there are 25 different minimal trees (including the minimal trees for all six proteins combined). At least 24 of these minimal trees must be wrong! To accept any one of these minimal trees as T* gives at least a 96% chance (24 of 25) of having rejected the correct tree and of having accepted an incorrect one.

In a previous paper (Penny and Hendy 1985a) we showed that a minimal tree derived from one set of sequence data was a good predictor in that it gave a good estimate of the minimal tree derived from other sequences. Nevertheless, because different sequences gave different minimal trees, it follows that most of the minimal trees cannot be the correct T*. This shows that any claim that a minimal tree is the correct T* is unjustified. However, we do need to quantify just how good an approximation a minimal tree is to T*. In statistics, a type I error is rejecting a hypothesis that is correct and a type II error is accepting an incorrect hypothesis (Sokal and Rohlf 1981). It appears that most publications on trees commit both types of error, rejecting the correct hypothesis and accepting an incorrect one.

We have previously discussed three approaches used to help evaluate the results of tree-building methods. We called them simulation, analytical, and empirical (Penny and Hendy 1985a). The most popular approach has been to use computer simulation. To follow the spirit of the method, we will randomly select three examples: the early work of Peacock and Boulter (1975) and the more recent works of Tateno et al. (1982) and Fiala and Sokal (1985). We did develop a full simulation of the covarion model of molecular evolution (Fitch and Markowitz 1970), because this seemed to be the most realistic model of molecular evolution. However, since 1975 we have not used the simulation approach for estimating because we found that there were many parameters that could be varied in the covarion model. It was premature to test just one, or a few sets, of conditions. It is necessary to be able to independently test the assumptions on a data set before full use of simulation can be made.

Examples of the analytical approach include Felsenstein’s (1981) analysis of the conditions necessary for minimal length trees to converge on T* and our own analysis (Penny 1982) of why methods that rely on distance matrices do not use all the available information in the data. Our work showing a large loss of information in converting sequences to distance matrices (Penny 1982) has been a stimulus to finding methods that make more use of the information between nucleotide positions. The first stage is a method to detect the nucleotide positions that change more slowly. This information can then be used to find methods that are more reliable.

Analytical approaches are essential to a full understanding, but assumptions of a particular model need to be tested independently on a given data set before we can make full use of analytical approaches. It may not be known whether the assumptions of a model are valid, for example, whether there is a low probability of change on a line of descent (Felsenstein 1981). Results presented in this paper are a start in testing whether a particular set of data conforms with the assumptions of a model.

It is important to use real, rather than only simulated, data to determine what can be learned about the evolutionary process. This leads to the third approach for evaluating the results of tree-building methods, the empirical. This approach uses biological (real) data and attempts to test hypotheses about the data. Examples are Fitch and Atchley’s (1985) use of inbred strains of mice with known phylogeny and Felsenstein’s (1985) use of bootstrap samples (Efron 1982) to estimate the reliability of individual edges on the tree. We have used several sampling methods in a similar approach that measures the similarity of minimal trees derived from subsets (Penny and Hendy 1985a). Comparing the similarity of trees found with different sets of data from the same taxa has been called congruence by authors working with morphological data (Mickevich 1978; Schuh and Polhemus 1980).

This paper is an application of the empirical approach that seeks to test methods and models of evolutionary-tree reconstruction. We have published some results (Penny and Hendy 1985a) based on the five sets of sequences that were known for the same 11 mammalian taxa. Now that a rodent myoglobin sequence is available, we have repeated the calculations including myoglobins for these 11 taxa.

The prediction from the theory of descent is that minimal trees for one set of data will be similar to minimal trees from a second data set. Our expectations in this study were that (1) minimal trees derived from larger subsets of data would be more similar than minimal trees derived from smaller subsets of data, (2) minimal trees derived from subsets of data derived from six proteins would be more similar than those derived from subsets of data derived from five proteins, and (3) objective weighting that gives more emphasis to compatible rather than to random characters would result in minimal trees derived from different subsets being more similar than those found with uniform weighting.

The theme of the present paper is that it is possible to give an estimate of the reliability of results. Despite the difficulty of the problem, it should not be acceptable to publish a tree that (1) is not guaranteed to be optimal for the criterion used, (2) has been found by a technique of unknown reliability with real data, and (3) takes no account of the amount of noise in the data. A recent paper on “Integrity in Science” (Branscombe 1985) makes the comment, “To make the literature worth reviewing, authors . . . must give the reader quantitative estimates of the amounts by which the values given may be in error.”


Material and Methods

The data are a set of 166 nucleotides derived from six proteins whose sequences are known for the same 11 mammals (see table 1 for species). Sequences from cytochrome c, hemoglobins α and β, and fibrinopeptides A and B were given earlier (Penny et al. 1982), and those from myoglobin are presented in table 1.

Three methods have been used in this study to sample nucleotide positions from the combined sequences. The first method randomly partitions the nucleotide positions into two equal sized, disjoint subsets (Schuh and Polhemus 1980). (Disjoint subsets have no nucleotide positions in common.) In this study each subset has 83 characters and is known as a “halfling” or “Hobbit.” We call this “the method of Hobbits.” The partitioning was repeated on the halflings to give first four, then eight, disjoint, equal sized subsets having 41 and 20 nucleotide positions, respectively.

The second method randomly selects characters without replacement. Subsets with 20, 41, 83, and 152 characters were used. This form of resampling is based on “jackknifing” (see review in Efron 1982). No nucleotide position occurs more than once in a subset.

In the third method, “bootstrapping” (Efron 1982; Felsenstein 1985), sets of 166 characters are selected with replacement (in any sample some characters may be omitted, while others may be selected more than once). A branch-and-bound program (Hendy and Penny 1982) was used to find all minimal trees for each selected sample, as well as for the combined sequences.
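The three sampling schemes described above can be sketched in a few lines of Python. This is only an illustration, not the program used in the study: the function names are hypothetical, positions are represented by integer indices, and the sketch assumes that when a set with an odd number of positions is halved the leftover position is simply dropped (which reproduces the subset sizes 83, 41, and 20).

```python
import random


def hobbits(positions, rng):
    """Randomly split positions into two disjoint, equal-sized halves.

    Assumption: with an odd number of positions, one is left out.
    """
    shuffled = list(positions)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


def jackknife(positions, size, rng):
    """Select `size` positions without replacement."""
    return rng.sample(list(positions), size)


def bootstrap(positions, rng):
    """Select as many positions as there are in the data, with replacement."""
    pool = list(positions)
    return [rng.choice(pool) for _ in pool]


rng = random.Random(1986)          # fixed seed so the sketch is repeatable
all_positions = range(166)         # the 166 informative positions
half_a, half_b = hobbits(all_positions, rng)
quarter_a, quarter_b = hobbits(half_a, rng)
eighth_a, eighth_b = hobbits(quarter_a, rng)
jack = jackknife(all_positions, 152, rng)
boot = bootstrap(all_positions, rng)
```

Each sample would then be passed to a tree-search routine; that step is not shown here.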

An additional, fourth, method of forming subsets was used in an earlier study (Penny and Hendy 1985a). This method uses one protein and compares minimal trees for that protein against minimal trees derived from the other sequences combined. For example, the lengths of trees derived from cytochrome c are compared with the lengths of the same trees derived from the combined sequences of the hemoglobins and fibrinopeptides. This method was particularly useful in showing that shorter-length trees were better predictors than longer trees.

The partition metric (Robinson and Foulds 1981) was used to compare trees. This metric is simple to calculate (Penny and Hendy 1985b), its probability distribution is known for both binary and nonbinary trees (Hendy et al. 1984), and the tree comparison method can be modified for several other applications (Penny and Hendy 1985b). For binary trees, with n = 11 taxa, the metric takes even values between 0 (trees identical) and 16 (2n - 6; no partitions in common). Figure 1 shows the probability values for binary trees.
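As an illustration of the partition metric, the sketch below counts the partitions (splits) of the taxa that occur in one tree but not in the other. The representation of a tree as nested Python tuples of taxon names and the function names are assumptions made for this example only.

```python
def splits(tree):
    """Return the non-trivial partitions of an unrooted tree.

    The tree is given as nested tuples whose leaves are taxon names.
    Each partition is stored as the side that excludes a fixed
    reference taxon, so both descriptions of one edge coincide.
    """
    clades = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        clades.add(below)
        return below

    taxa = leaves(tree)
    reference = min(taxa)
    result = set()
    for side in clades:
        # Skip the whole taxon set and edges leading to a single taxon.
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(taxa - side if reference in side else side)
    return result


def partition_distance(tree_a, tree_b):
    """Number of partitions found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = (("a", "b"), "c", ("d", "e"))
t2 = (("a", "c"), "b", ("d", "e"))
t3 = (("a", "d"), "b", ("c", "e"))
print(partition_distance(t1, t2))  # 2
print(partition_distance(t1, t3))  # 4, the maximum 2n - 6 for n = 5
```

For binary trees the distance is always even, because both trees have the same number (n - 3) of non-trivial partitions.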

Compatibility analysis (Le Quesne 1969; Fitch 1975; Sneath et al. 1975; Estabrook et al. 1976; Guise et al. 1982) was used to measure the extent to which characters are consistent with each other. For each character i the observed number of incompatibilities (Oi) is counted. The original method of Le Quesne (1969) is modified for characters with up to four discrete states (see table 2).

The expected number of incompatibilities (Ei) is calculated on the assumption that the distribution of nucleotides within each position is independent of other nucleotide positions (see table 3). Fisher’s exact test (Sokal and Rohlf 1981), extended to a 4 × 4 contingency table, is used to calculate the expected frequency of every possible combination. The method is illustrated in table 3.
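The calculation for one pair of positions can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: the number of incompatibilities in a table is taken to be the number of independent cycles in the graph that joins each row state to each column state it co-occurs with (for two-state characters this reduces to the usual test that all four combinations are present), and the probability of a table with fixed margins is the standard exact-test (hypergeometric) probability. All function names are hypothetical. Ei for a position would be the sum of such pairwise expectations over all other positions.

```python
from fractions import Fraction
from math import factorial, prod


def incompatibilities(table):
    """Cycle rank of the graph linking co-occurring row and column states."""
    rows, cols = len(table), len(table[0])
    parent = list(range(rows + cols))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges, used = 0, set()
    for i in range(rows):
        for j in range(cols):
            if table[i][j] > 0:
                edges += 1
                used.update((i, rows + j))
                parent[find(i)] = find(rows + j)
    components = len({find(v) for v in used})
    return edges - len(used) + components


def tables(row_sums, col_sums):
    """Yield every table of non-negative integers with the given margins."""
    if len(row_sums) == 1:
        if sum(col_sums) == row_sums[0]:
            yield [list(col_sums)]
        return

    def fill_row(total, caps):
        if not caps:
            if total == 0:
                yield []
            return
        for x in range(min(total, caps[0]) + 1):
            for rest in fill_row(total - x, caps[1:]):
                yield [x] + rest

    for first in fill_row(row_sums[0], list(col_sums)):
        remaining = [c - x for c, x in zip(col_sums, first)]
        for rest in tables(row_sums[1:], remaining):
            yield [first] + rest


def table_probability(table):
    """Exact probability of a table given its row and column sums."""
    n = sum(map(sum, table))
    margins = [sum(r) for r in table] + [sum(c) for c in zip(*table)]
    numerator = prod(factorial(m) for m in margins)
    denominator = factorial(n) * prod(factorial(x) for r in table for x in r)
    return Fraction(numerator, denominator)


def expected_incompatibilities(row_sums, col_sums):
    """Expected number of incompatibilities for one pair of positions."""
    return sum(table_probability(t) * incompatibilities(t)
               for t in tables(row_sums, col_sums))


# Observed table for positions 4 (rows A, C, G) and 13 (columns A, C, U).
observed = [[0, 3, 1], [3, 0, 0], [3, 0, 1]]
print(incompatibilities(observed))                            # 0
print(float(expected_incompatibilities([4, 3, 4], [6, 3, 2])))
```

Exact fractions are used so that the probabilities of all tables sum to exactly one; for long sequences a faster enumeration would be needed.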

The aim is to test whether the ratio Oi/Ei can be used to predict the reliability of each character. Several forms of objective weighting based on Oi/Ei have been tested (Penny and Hendy 1985a). To weight the ordered characters higher than the random characters, the ratios are subtracted from one to give an initial weight of 1 - (Oi/Ei).


FIG. 1. The expected frequencies for the distance between randomly selected pairs of binary trees. For a given number of taxa n, the relative frequency f(d) is the proportion of pairs of trees (among all pairs) that are distance d apart. For d = 0, 2, 4, ..., 2n - 6, -(log f[d]) is plotted. Diagonal lines connect points of the same d value, and, for convenience, the horizontal lines that connect points with the same n - (d/2) values are shown. Thus, for example, the frequency f of pairs of binary trees for 11 taxa where d = 6 is 10^-4.4. Distances are calculated on the partition metric of Robinson and Foulds (1982), and the expected frequencies are given in table 4 of Hendy et al. (1984). (Values of f for up to n = 11 are also in Penny et al. 1982.) A method for calculating the distance between two trees is in Penny and Hendy (1985b).

A value of one means that the states of that character are highly ordered (and in agreement) when compared with those of the other characters. A value of zero means that the states of the column are randomly distributed when paired with those of the other columns (although not necessarily with triples or higher groups). For some characters, Oi > Ei; that is, the observed value is higher than the expected value. This implies that the distribution of character states is not random but disagrees with that of the other characters. These characters would give negative initial weightings. Negative values would give contradictory results for a minimal tree, so we truncate these weights at zero. We take, therefore, the basic weights for each nucleotide position Wi as maximum [1 - (Oi/Ei), 0]. Results with differential weighting are compared against those with uniform weighting (in which each nucleotide position is given a weight of one).

Results

The minimal (most-parsimonious) tree for the 11 myoglobin sequences (table 1) with uniformly weighted characters is shown in figure 2A. This tree is different from all 21 minimal trees derived from the other five sequences (cytochrome c, hemoglobins α and β, and fibrinopeptides A and B), including the combined five sequences (Penny et al. 1982). In addition, the three minimal trees for the combined six proteins are shown as figures 2B-2D. These trees are also different from the 21 minimal trees in Penny et al. (1982) and from the minimal tree for myoglobin. There are thus 25 different minimal trees derived from these six proteins.


Table 2 Incompatibilities for the First 10 Nucleotide Positions

Nucleotide Position    Incompatibilities with positions 1-34    Oi    Ei    1 - Oi/Ei

1     _..1..111.1.1................1....     7    13.91    0.497
2     1 1 ._............... . . . . . . . . . . . . . . .     2    13.91    0.856
3     .._1..111.1.1................1....     7    13.91    0.497
4     1.1_.1..21...1...22..22..111...1..    19    36.67    0.482
5     ...._.......1.....................     1    13.37    0.925
6     1 1 1...........1.1.1 . . . ._...... . . . .     6    20.41    0.706
7     1.1..._.1....1...11..11..111...1..    12    20.41    0.412
8     1.1...._1....1...11......111......     9    13.37    0.327
9     1.12..11_.1.1....1...11..111.1.1.1    17    23.94    0.290
10    1........_..1....1...........1.1.1     6    20.41    0.706

NOTE. This table shows the incompatibilities for the first 10 nucleotide positions from table 1. The number of incompatibilities between a pair of nucleotide positions is shown as 1 or 2, and compatible pairs are indicated by a period. Oi is the observed number of incompatibilities in each row. Ei is the expected number summed over comparisons of a nucleotide position with all others by the method illustrated in table 3. The final column (1 - Oi/Ei) is the initial weighting. The values shown are positive, but columns 18 and 32 in table 1 give negative weights (Oi > Ei). An interpretation is, for example, that the information in col. 5 of table 1 is consistent with the remaining information (and, by implication, reliable), whereas the information in col. 32 (sheep and rabbit sharing the same code) is inconsistent with the remaining data (and, by implication, an “error” as far as phylogeny is concerned). This analysis is done before a minimal tree is found.

Although different from the other minimal trees, the one for myoglobin (fig. 2A) is very similar to those found previously, much too similar to be ascribed to chance. For example, the minimal tree (fig. 2A) is at distance six both from one minimal tree for cytochrome c and from the minimal tree for fibrinopeptide A. The probability of randomly selecting any two trees with six differences is 4.0 × 10^-5 (fig. 1). Thus it is valid to state that the minimal tree (fig. 2A) for myoglobin is very similar to those of the other proteins. In this case, two of the near-minimal trees for the combined five sequences (Penny et al. 1982) are distance four from the minimal tree (fig. 2A). The trees derived from the combined sequences are thus better predictors of the minimal tree for myoglobin than are any of the trees derived from individual proteins.

Table 3 Calculation of the Expected Number of Incompatibilities

Rows: nucleotide at position 4. Columns: nucleotide at position 13.

         A    C    U    Total
A        *    3    1     4
C        3    *    *     3
G        3    *    1     4
Total    6    3    2    11

Minimum spanning tree (right-hand side) joining the nonzero entries: AC, AU, CA, GA, GU.

NOTE. The example uses nucleotide positions 4 and 13 from table 1 (there are four A’s, three C’s, and four G’s at position 4). The entries n_ij of the left-hand table are filled in all possible ways that give the observed row and column sums for positions 4 (rows) and 13 (columns). (The example shown is that observed in the data of table 1.) A minimum spanning tree (right-hand side) is found for the nonzero entries. In the example it can be done with only two changes for each nucleotide position, which is the minimum for three nucleotides at each position; there are therefore no duplicate changes with this combination. The expected frequency of each possible combination is $\prod_i n_{i\cdot}! \, \prod_j n_{\cdot j}! \,/\, (n! \, \prod_{ij} n_{ij}!)$. Summing over all ways of filling the table gives the expected frequency of incompatibilities, as well as the expected variance. One algorithm for this computation is given in Pagano and Halvorsen (1981).

FIG. 2. Minimal (most parsimonious) trees with uniform weighting. Tree A is for the myoglobin data (table 1) and is 59 nucleotide substitutions long. The branch-and-bound algorithm (Hendy and Penny 1982) was used and will have found all minimal trees. Trees B-D are the uniformly weighted minimal trees (length 371) based on the combined sequences of the six proteins (myoglobin [table 1]; cytochrome c, hemoglobins α and β, and fibrinopeptides A and B [Penny et al. 1982]).

The next test is whether it is possible to predict which nucleotide positions will be the best predictors of the minimal tree. Figure 3 is a plot of Oi/Ei against the proportion of duplicate (homoplasious) (parallel, convergent, or reverse) changes. Each point on the graph is one nucleotide position of the myoglobin data. There is a high positive correlation, which demonstrates that Oi/Ei is a useful predictor of the reliability for each nucleotide position on the minimal trees. There is a similar pattern in each case studied thus far, myoglobin (fig. 3), hemoglobin β (Penny and Hendy 1985a), and hemoglobin α (data not shown). In this pattern characters with a high weight fit the minimal tree exactly, those with a very low or negative weight contradict the tree almost completely, and there is a region in the middle with a changeover.

A conclusion deducible from figure 3 is that it is possible to predict which nucleotide positions are most reliable before the minimal tree is found. The basic weights (Wi = maximum [1 - (Oi/Ei), 0]) can be used directly, but, because a line of best fit in figure 3 is not linear through the origin, we have tried several simple functions of Wi (Penny and Hendy 1985a). The weightings selected for further study were (1) uniform weighting (Ui = 1), (2) linear weighting (Ui = Wi), and (3) squares weighting (Ui = Wi²).
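The basic weight and the three weighting schemes are simple enough to write down directly. In the sketch below the function names and the scheme labels are hypothetical; the (Oi, Ei) pairs are those listed for positions 1, 4, and 5 in table 2.

```python
def basic_weight(observed, expected):
    """Wi = maximum[1 - (Oi/Ei), 0]; negative values are truncated at zero."""
    return max(1.0 - observed / expected, 0.0)


def final_weight(w, scheme):
    """Ui under uniform, linear, or squares weighting."""
    if scheme == "uniform":
        return 1.0
    if scheme == "linear":
        return w
    if scheme == "squares":
        return w * w
    raise ValueError(f"unknown weighting scheme: {scheme}")


# (Oi, Ei) for nucleotide positions 1, 4, and 5 of table 2.
for observed, expected in [(7, 13.91), (19, 36.67), (1, 13.37)]:
    w = basic_weight(observed, expected)
    print(round(w, 3), round(final_weight(w, "squares"), 3))
```

The first value printed on each line reproduces the final column of table 2 for those positions (0.497, 0.482, and 0.925). A position with Oi > Ei receives a weight of zero under both the linear and the squares scheme.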

The next step is to test whether the minimal trees are becoming more similar as larger subsets are being analyzed. Minimal trees were found for all subsets with uniform, linear, and squares weighting. Results from comparing minimal trees derived from disjoint subsets are given in table 4, and those from comparing minimal trees derived from other subsets are given in table 5. One aim was to detect whether the trees were converging on a small subset of trees (Felsenstein 1983) and whether this occurred faster with differential weighting.

As expected (excluding the bootstrap samples, which are not directly comparable with the smaller subsets in table 5 and are discussed below) the trees derived from larger subsets were more similar than those derived from smaller subsets. The results in table 4 are particularly interesting, since they show strongly that minimal trees from nonoverlapping sets of data become increasingly similar as more nucleotide positions are included.

FIG. 3. Plot of incompatibilities against duplications for the myoglobin data. For the minimal tree, duplications are the number of nucleotide substitutions at a given position in excess of the theoretical minimum for that position, the theoretical minimum being one less than the number of different kinds of nucleotides at that position. The X-axis (incompatibilities, observed/expected) is the ratio of observed to expected incompatibilities (Oi/Ei) for each character (nucleotide position) of the data. This ratio is calculated before the minimal trees are found. The Y-axis is the proportion of the maximum possible number of duplicated character states for each character (whether reversions, parallel changes, or convergent changes) summed over the six shortest trees. There are one minimal tree and five trees one longer than minimal (Penny et al. 1982) for the myoglobin data. A character with no duplications (zero on the Y-axis) fits the minimal trees exactly. A character with a value of one on the Y-axis contradicts the minimal trees completely. Given the random nature of substitutions that will have occurred in a gene, the ratio Oi/Ei is an effective predictor for how well a nucleotide position will fit the minimal tree.
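The number of duplications at one position, as defined in the caption to figure 3, can be illustrated with a short sketch. It assumes that the number of substitutions a position requires on a given tree is counted with Fitch's parsimony algorithm for unordered states on a binary tree; the text does not say how the counts were obtained, and the tree (nested tuples), taxon names, and function names below are hypothetical.

```python
def fitch_changes(tree, states):
    """Minimum number of changes for one character on a binary tree.

    `tree` is a nested tuple of taxon names with two children per node;
    `states` maps each taxon name to its nucleotide at this position.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):
            return {states[node]}
        left, right = (visit(child) for child in node)
        if left & right:
            return left & right
        changes += 1            # no shared state: one substitution needed
        return left | right

    visit(tree)
    return changes


def duplications(tree, states):
    """Changes in excess of (number of different nucleotides - 1)."""
    minimum = len(set(states.values())) - 1
    return fitch_changes(tree, states) - minimum


tree = ((("a", "b"), "c"), ("d", "e"))
fits = {"a": "A", "b": "A", "c": "G", "d": "G", "e": "G"}
conflicts = {"a": "A", "b": "G", "c": "A", "d": "G", "e": "G"}
print(duplications(tree, fits))       # 0
print(duplications(tree, conflicts))  # 1
```

The first character needs one change and fits the tree exactly; the second needs two changes where one would be the theoretical minimum, so it has one duplication.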

Minimal trees derived from the larger subsets of the six proteins are found to be more similar than those derived from the equivalent subsets of five proteins (right-hand column, table 4). The improvement is small with the smaller subsets because they are approximately the same size (18 and 20 nucleotide positions and 33 and 41 nucleotide positions). We have tried to give a quantitative interpretation of this increase in tree similarity. With six proteins and with uniform weighting there was an average difference of 6.26 between the minimal trees derived from two halflings (83 characters) (table 4). With 11 taxa, the probability (by logarithmic interpolation; fig. 1) of randomly selecting trees this similar is 5.37 × 10^-5. By interpolation this corresponds to 17,600 trees (5.37 × 10^-5 × 34,459,425). (34,459,425 is the number of unrooted binary trees for 11 taxa; Penny et al. 1982.) The corresponding figure for the halflings from the five proteins (66 characters) is an average distance between minimal trees of 7.22, which corresponds to 48,500 trees.
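The count of unrooted binary trees used in this conversion is the product of the odd numbers from 3 to 2n - 5. The sketch below (with hypothetical function names) computes that count and converts a probability from figure 1 into an equivalent number of trees; the probability used in the example is the value of 4.0 × 10^-5 quoted earlier for a distance of six.

```python
def unrooted_binary_trees(n_taxa):
    """Number of unrooted binary trees on n labelled taxa: 3 x 5 x ... x (2n - 5)."""
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def equivalent_trees(probability, n_taxa):
    """Number of trees corresponding to a probability of similarity."""
    return probability * unrooted_binary_trees(n_taxa)


print(unrooted_binary_trees(11))          # 34459425
print(round(equivalent_trees(4.0e-5, 11)))
```

With 11 taxa, a probability of 4.0 × 10^-5 therefore corresponds to roughly 1,378 trees.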

Table 4
Average Distances between Minimal Trees from Disjoint Subsets from Six Combined Proteins

                                      WEIGHTING USED
SUBSET          Uniform                 Linear                  Squares                 Uniform(a)
Half (83)       6.26 [0.21; 1,818]      6.40 [0.22; 2,108]      5.90 [0.28; 1,240]      7.22
Quarter (41)    8.04 [0.18; 11,900]     7.32 [0.22; 5,565]      7.35 [0.21; 5,744]      8.05
Eighth (20)     10.25 [0.21; 123,900]   9.82 [0.27; 78,580]     9.86 [0.25; 81,990]     10.27

NOTE.-Pairs of subsets of nucleotide positions were formed by selecting at random (without replacement) from the 166 informative nucleotide positions. All minimal trees were found for each subset. Distances between minimal trees were made by the partition metric for tree comparisons (Penny and Hendy 1985b). Note that in this table the trees being compared are from subsets that have no nucleotide positions in common; that is, they are disjoint subsets. The number of characters in each subset is shown in parentheses. Values in brackets are the standard error of the mean followed by the number of binary trees equivalent to the average distance between minimal trees. The final column contains results from the five proteins excluding myoglobin in Penny and Hendy (1985a) and are given with uniform weighting.

(a) Results using only the original five proteins.

We use the ratio of the sizes of these subsets as a measure of the improvement in the accuracy of a technique. We call the increase in this ratio the improvement in “precision” or “sharpness” of our estimate. We do not know the absolute value of the precision; rather, we estimate the relative increase in precision that results from including more data or modifying the technique. In this case, increasing the subset size from 66 to 83 characters has improved the precision of our estimate by approximately three times (48,500:17,600).
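As a worked check of the ratio just quoted, using only the two tree counts given in the text for the 66-character and 83-character halflings:

```latex
\[
\frac{48{,}500}{17{,}600} \approx 2.76
\]
```

This rounds to the "approximately three times" stated above.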

Objective weighting has increased the similarity (reduced the distances) between minimal trees derived from different subsets, although the effect is less than that seen when only the five proteins are used (Penny and Hendy 1985a). With uniform weighting the average distance between minimal trees derived from subsets with 83 characters was 5.07; with squares weighting the average distance was 4.53 (table 5). These values correspond to 510 and 280 trees, respectively. The precision is thus increased approximately two times (510:280) over the uniform weighting method. Despite the smaller effect of squares weighting in this study, the difference between 5.07 and 4.53 remains significant.
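The distances discussed here are partition-metric distances: the number of internal-edge bipartitions of the taxa that occur in one tree but not in the other. The following is a minimal sketch of that calculation, not the authors' program. It assumes trees are written as nested tuples of taxon names, and all function names are illustrative. For unrooted binary trees on 11 taxa the largest possible value under this definition is 2 × (11 - 3) = 16.

```python
def leaves(tree) -> frozenset:
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree) -> set:
    """Return the nontrivial bipartitions (internal edges) of a tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same edge always has the same representation.
    """
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if reference in below else below
        # Keep only edges with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def partition_distance(tree_a, tree_b) -> int:
    """Count bipartitions present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must be on the same set of taxa")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    t1 = (("A", "B"), "C", ("D", "E"))
    t2 = (("A", "C"), "B", ("D", "E"))
    print(partition_distance(t1, t2))  # 2
```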

Table 5
Comparisons of Minimal Trees from Subsets of Nucleotide Positions

                                  WEIGHTING USED
SUBSET                  Uniform         Linear          Squares
Bootstrap (166; 50)     4.46 [2.32]     4.87 [2.10]     4.50 [2.03]
Jackknife (152; 40)     2.59 [1.20]     2.00 [1.44]     1.65 [1.62]
Half (83; 40)           5.07 [1.93]     4.92 [1.96]     4.53 [1.91]
Quarter (41; 80)        7.53 [1.78]     6.70 [2.15]     6.71 [2.12]
Eighth (20; 80)         10.12 [1.94]    9.60 [2.45]     9.90 [2.39]

NOTE.-Bootstrap samples were found by randomly selecting nucleotide positions with replacement; all others are formed without replacement. The number of nucleotide positions and the number of subsets used are shown in parentheses. The average distance between minimal trees, using the partition metric, is shown. The values in brackets are the standard deviations. In contrast with table 4, in these results smaller subsets may include the same nucleotide positions and the larger subsets must have nucleotide positions in common.

In this study differential weighting has had a smaller effect than it had in an earlier study (Penny and Hendy 1985a) with five proteins. It may be that we have reached a plateau in improving our estimate of T*. With the six proteins, the minimal trees derived with uniform weighting are similar to those derived with five proteins with squares weighting. Thus we did get a useful effect with weighting the five proteins. However, differential weighting with six proteins gives only a small improvement in precision. We appear to be left with more difficult problems to solve. The first is the primate-lagomorph-rodent trichotomy; the second is the placing of dog. These two problems are discussed below. In addition to the effect of faster convergence, the squares-weighting consensus trees for the minimal trees derived from each subset are more consistent even when smaller subsets are taken.

The minimal tree derived with squares weighting and using all 166 characters is shown in figure 4. We expect that this tree would be the best available estimate of T*, but we require a quantitative estimate of its reliability. What we would like to have is an estimate of the average distance between minimal trees derived from the full data of 166 columns. One of the best estimates of variability is that for the minimal trees derived from disjoint subsets with up to 83 columns (table 4). We could extrapolate the tree comparison values for disjoint subsets in table 4, assuming that the same trends continue as more data become available. This may give us an estimate of the expected variability between minimal trees derived from disjoint subsets with 166 columns, when the additional data become available!

FIG. 4.-Minimal tree from the full data (six proteins, 166 characters) using squares weighting. The root has been placed arbitrarily between kangaroo and the eutherians. This tree differs from the minimal tree based on five proteins (Penny and Hendy 1985a) only in having rabbit and rodents on a side branch rather than branching independently from the common line with primates.

A better process may be to establish upper and lower estimates of the expected variability. One approach is to use the results in table 5 (with squares weighting and excluding the bootstrap samples) and to extrapolate to 166 characters (see fig. 5). This process gives a predicted difference between minimal trees derived from different subsets of approximately 1.5, corresponding to approximately 50 trees. This is a lower estimate of how many trees could be T*. Should this trend continue, then we might expect that approximately twice as many informative nucleotide positions would be needed before we could have confidence that the minimal tree is T*. The results in figure 5 are plotted on a scale of the reciprocal of subset size because a plot using the square root of the sample size is not linear. This feature warrants further investigation because we would have expected it to be linear with the square root of subset size.

Finally, we have taken bootstrap samples (sampling characters with replacement), found the minimal trees, and compared them (table 5). The general consensus tree (Penny et al. 1982) for these minimal trees with squares weighting has been determined and is the tree already found in figure 4. The bootstrap samples may give us an upper estimate on the number of trees that need to be considered to include T* because the variability of the bootstrap sample is similar to that of the disjoint subsets with half the data, as predicted by Felsenstein (1985). Consequently, the expected variability between disjoint subsets with 166 columns is expected to be less than the bootstrap variability.
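The three sampling schemes compared in tables 4 and 5 differ only in how column indices are drawn. The following minimal sketch shows one way to generate them, assuming the 166 informative positions are indexed 0 to 165. The function names and the fixed seed are illustrative assumptions, and the step of finding minimal trees for each sample is not shown.

```python
import random


def bootstrap_sample(n_sites: int, rng: random.Random) -> list:
    """Draw n_sites column indices WITH replacement (bootstrap)."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


def jackknife_sample(n_sites: int, keep: int, rng: random.Random) -> list:
    """Draw `keep` distinct column indices WITHOUT replacement."""
    return sorted(rng.sample(range(n_sites), keep))


def disjoint_subsets(n_sites: int, size: int, rng: random.Random) -> tuple:
    """Draw two subsets of `size` columns with no column in common."""
    if 2 * size > n_sites:
        raise ValueError("two disjoint subsets of this size do not fit")
    order = list(range(n_sites))
    rng.shuffle(order)
    return sorted(order[:size]), sorted(order[size:2 * size])


if __name__ == "__main__":
    rng = random.Random(1986)  # arbitrary seed, for repeatability only
    boot = bootstrap_sample(166, rng)
    jack = jackknife_sample(166, 152, rng)
    half_a, half_b = disjoint_subsets(166, 83, rng)
    print(len(boot), len(jack), len(half_a), len(half_b))  # 166 152 83 83
```

Because bootstrap samples repeat some columns and omit others, they carry less independent information than the full data, which is consistent with the observation above that their variability resembles that of the halflings.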

Felsenstein (1985) has also used bootstrap samples and has used a nonbinary consensus tree in which each partition occurs in at least 95% of the samples. This is probably appropriate in a classification study if categories are not to be established until there is reasonable confidence that they will be supported by additional work. A nonbinary tree is a degenerate case of a number of binary trees. The nonbinary tree ignores the differences between these binary trees and may not be so useful in predicting minimal trees when more data is available.
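A consensus of this kind can be sketched as a frequency filter on partitions: keep every partition that occurs in at least a given fraction of the sampled trees. The sketch below assumes each tree has already been reduced to a set of partitions, each stored as a frozenset of the taxa on one consistently chosen side. The function name and the data layout are illustrative assumptions, not Felsenstein's implementation.

```python
from collections import Counter


def consensus_partitions(trees: list, threshold: float = 0.95) -> set:
    """Return the partitions found in at least `threshold` of the trees.

    `trees` is a list of sets of partitions; each partition is a frozenset
    of taxon names, normalized the same way in every tree.
    """
    if not trees:
        raise ValueError("need at least one tree")
    counts = Counter(part for tree in trees for part in set(tree))
    return {part for part, n in counts.items() if n / len(trees) >= threshold}


if __name__ == "__main__":
    ab, ac, de = frozenset("AB"), frozenset("AC"), frozenset("DE")
    samples = [{ab, de}] * 19 + [{ac, de}]
    kept = consensus_partitions(samples, threshold=0.95)
    print(sorted(sorted(p) for p in kept))  # [['A', 'B'], ['D', 'E']]
```

Partitions that fall below the threshold are simply dropped, which is how the resulting tree becomes nonbinary and why it hides the differences between the binary trees it summarizes.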

The tree in figure 4 is similar to the minimal trees with uniform weighting (fig. 2B-2D). The main differences are that differential weighting has (1) placed horse and pig in their accepted positions by placing pig next to the other artiodactyls (cow and sheep) and (2) brought dog within the main group of eutherians. The difference vis-a-vis the best tree derived from five proteins (Penny and Hendy 1985a) is that rabbit (a lagomorph) and rodents share a common ancestor, rather than independently joining the primate line.

FIG. 5.-Average similarity of trees from different-sized subsets of the data (horizontal axis: proportion of sites in subset, marked at 1/8, 1/4, 1/2, and 1). ○ = uniform weighting and + = squares weighting, from table 4 (disjoint subsets). × and △ are uniform weighting and squares weighting, respectively, from table 5 (characters selected randomly without replacement, jackknifing). To estimate the reliability of our final tree, one needs to know the expected variation between minimal trees from disjoint subsets of 166 nucleotide positions. The jackknifing subsets allow one to derive a lower estimate of the variability.

Table 6
Time Required to Find All the Minimal Trees for a Subset

                            WEIGHTING USED
SAMPLE              Uniform         Linear       Squares      Ratio
Bootstrap (166)     1,094 [87]      106 [50]     65 [50]      16.8
Jackknife (152)     1,106 [137]     98 [41]      60 [41]      18.4
Halfling (83)       579 [84]        49 [40]      36 [40]      16.1
Quarter (41)        283 [360]       28 [96]      27 [101]     10.5
Eighth (20)         148 [884]       75 [326]     60 [263]     2.5

NOTE.-Average CPU time, in seconds, required to find all minimal trees for subsets of different sizes and with three forms of weighting. The total number of minimal trees, for the number of subsets given in table 5, is given in brackets. Objective weighting gives a significant reduction in CPU time, and the effect increases with the size of the subsets. Ratio is the time required to find all minimal trees under uniform weighting divided by the time required under squares weighting. The computer used was a Prime 750.

Discussion

Biologists seem to seek “The One Tree” and appear not to be satisfied by a range of options. However, there is no logical difficulty with having a range of trees. There are 34,459,425 possible trees for 11 taxa (Penny et al. 1982), and to reduce this to the order of 10-50 trees is analogous to an accuracy of measurement of approximately one part in 10⁶.
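The "one part in 10⁶" figure can be checked directly from the numbers in the paragraph above, by dividing the total number of trees by each end of the 10-50 range:

```latex
\[
\frac{34{,}459{,}425}{50} \approx 6.9 \times 10^{5},
\qquad
\frac{34{,}459{,}425}{10} \approx 3.4 \times 10^{6}
\]
```

Both values are of the order of 10⁶, which is the sense in which the reduction is comparable to a measurement accurate to about one part in a million.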

Many measurements in biology are only accurate to one or two significant figures and pale when compared to physical measurements that may be accurate to 10 significant figures. To be able to estimate an accuracy of one tree in 10⁶ reflects the increasing sophistication of tree reconstruction methods. (Note that, on this argument, to identify an organism to a species is also analogous to a measurement with an accuracy of approximately one in 10⁶.)

The 25 shortest trees have been found using squares weighting on the complete data set. The two trees closest to minimal have rabbit and mouse independently joining the primate line. Clearly, the exact relationship of these taxa is the least certain conclusion on the minimal tree. In this case the three possibilities can be represented by a nonbinary tree, joining rabbit, mouse, and the primates to a single point. Although their exact relationship is uncertain, the close relationship of rodents and lagomorphs is one of the strongest conclusions resulting from molecular studies (Penny and Hendy 1985b).

The next problem is the position of dog. The next shortest trees have dog forming the first separation on the eutherian line. After that, the shortest trees have dog joining with the early primates, after the separation of rodents and lagomorphs. None of these positions can yet be excluded. In this case it is not helpful to represent the positions of dog on a multifurcating point, because there are preferred positions for dog within the tree; for example, none of the near-minimal trees have dog separating rodents and lagomorphs. To take a nonbinary tree is to hide the large amount of information on the relative lengths of the individual trees, and it is known that shorter trees are better predictors when more information is available (Penny and Hendy 1985a).

An improved understanding could result from combining dog hemoglobins with those of other carnivores. If there is something unusual, say, in the dog hemoglobins, then this may be eliminated by comparing it with those of other carnivores. We call this a testing of the “stability” of the tree as new taxa are added, as opposed to a measuring of the “convergence” as more nucleotide positions are included. It is a limitation of our present programs that all sequences are necessary for each of the taxa. There appears to be no fundamental reason why these programs cannot be made more general without losing any of the power of the method.

We suggest that objective weighting is using more of the information in the data and that this is why it increases the reliability of the results. The information being used to weight characters (comparisons between columns) is information discarded when a distance matrix is formed (Penny 1982).

The approach used in this study does require a lot of computer time (table 6). Most of this time was devoted to finding large numbers of minimal and near-minimal trees for approximately 290 subsets of data, with and without various forms of weighting. However, the branch-and-bound method (Hendy and Penny 1982) with squares weighting runs faster than that with uniformly weighted data, particularly as the number of nucleotide positions increases. With our program on a Prime 750 computer, it required 66 s with all 166 characters to find the minimal tree with squares weighting and 174 s to get the 25 shortest trees. The calculation of Ei does require significant computer time, although it is possible that approximations could be found that would reduce this time considerably.

The present method of weighting has proved to be the most powerful of those tested, possibly because it allows the detection of nucleotide positions that deviate from randomness by having either more or fewer incompatibilities than would be expected had they occurred by chance. We have not tested triples or higher groups of characters; some possible maximum likelihood methods (Felsenstein 1981); identifying duplicated character states (Guise et al. 1982); other recursive optimization approaches (Farris 1969); or some possible discriminant functions.

Several other methods have been proposed to weight characters (Farris 1969; Hansell and Ewing 1973; Hogeweg 1976; Davies 1981; Felsenstein 1981; Whiffin 1982), and these could be evaluated by our general procedure. We therefore cannot claim that the present method is in any sense the best method. The general approach allows these other methods of weighting to be tested.

The method can also be used to compare different approaches to tree building. Do minimal length trees converge faster as more information is included? Questions such as these can be tested with all the data included, rather than with half the data. Again, it can be tested whether trees converge faster if the molecular clock is assumed.

The reason that differential weighting gives better results is probably related to Felsenstein’s calculations (1983) that the minimal length tree is expected to converge on the true historical tree T* if the probability of change between nucleotides is low.

If there is a high probability for nucleotide changes, then the minimal tree will not necessarily converge on the true T* as more positions are added. The difficulty in using Felsenstein’s analysis has been that the probability of change of any character was not known-and that it almost certainly varies at different positions. With the present form of weighting, based on Oi and Ei, we are almost certainly weighting more highly those nucleotides that have changed more slowly.

Finally, we would like to return to the important point of estimating the reliability of a particular evolutionary tree. Branscomb (1985), in a paper titled “Integrity in Science,” makes the comment: “Information that is wrong is not useful. And information lacking evidence revealing whether it is right or wrong is scarcely more so.” Given this definition, there is a need for more integrity in the study of evolutionary trees.

Note. All of the analyses in this paper can be run from one large interactive program that is available from the authors.

LITERATURE CITED

BRANSCOMB, L. M. 1985. Integrity in science. Am. Sci. 73:421-423.

DAVIES, R. G. 1981. Information theory and character selection in the numerical taxonomy of some male Diaspididae (Hemiptera: Coccoidea). Syst. Entomology 6:149-178.

DAYHOFF, M. O. 1972. Atlas of protein sequence and structure. Vol. 5 (plus suppl. 1-3). National Biomedical Research Foundation, Silver Spring, Md.

EFRON, B. 1982. The jackknife, the bootstrap, and other resampling plans. SIAM Monogr. no. 38. Soc. Ind. Appl. Math., Philadelphia.

ESTABROOK, G. F., C. S. JOHNSON, and F. R. MCMORRIS. 1976. An algebraic analysis of cladistic characters. Discrete Math. 16:141-147.

FARRIS, J. S. 1969. A successive approximations approach to character weighting. Syst. Zool. 18:374-385.

FELSENSTEIN, J. 1981. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biol. J. Linn. Soc. 16:183-196.

-. 1983. Statistical inference of phylogenies. J. R. Stat. Soc. 146:246-272.

-. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

FIALA, K. L., and R. R. SOKAL. 1985. Factors determining the accuracy of cladogram estimation: evaluation using computer simulation. Evolution 39:609-622.

FITCH, W. M. 1975. On the problem of discovering the most parsimonious tree. Am. Nat. 111:223-257.

FITCH, W. M., and W. R. ATCHLEY. 1985. Evolution in inbred strains of mice appears rapid. Science 228:1169-1175.

FITCH, W. M., and E. MARKOWITZ. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579-593.

GUISE, A., D. PEACOCK, and T. GLEAVES. 1982. A method for identification of parallelism in discrete data sets. Zool. J. Linn. Soc. 74:293-303.

GURNETT, A. M., J. P. O’CONNELL, D. E. HARRIS, H. LEHMANN, K. A. JOYSEY, and E. NEVO. 1984. The myoglobin of rodents: Lagostomus maximus (viscacha) and Spalax ehrenbergi (mole rat). J. Protein Chem. 3:445-454.

HANSELL, R. I. C., and B. EWING. 1973. The detection and estimation of character weighting in classifications. J. Theor. Biol. 39:297-314.

HENDY, M. D., C. H. C. LITTLE, and D. PENNY. 1984. Comparing trees with pendant vertices labelled. SIAM J. Appl. Math. 44:1054-1067.

HENDY, M. D., and D. PENNY. 1982. Branch and bound algorithms to determine minimal evolutionary trees. Math. Biosci. 59:277-290.

HOGEWEG, P. 1976. Iterative character weighting in numerical taxonomy. Comput. Biol. Med. 6:199-211.

LE QUESNE, W. J. 1969. A method for objectively selecting characters in numerical taxonomy. Syst. Zool. 18:201-205.

MICKEVICH, M. R. 1978. Taxonomic congruence. Syst. Zool. 27:143-158.

PAGANO, M., and K. T. HALVORSEN. 1981. An algorithm for finding the exact significance levels of r × c contingency tables. J. Am. Stat. Assoc. 76:931-934.

PEACOCK, D., and D. BOULTER. 1975. Use of amino acid sequence data in phylogeny and evaluation of methods using computer simulation. J. Mol. Biol. 95:513-527.

PENNY, D. 1982. Towards a basis for classification: the incompleteness of distance measures, incompatibility analysis and phenetic classification. J. Theor. Biol. 96:129-142.

PENNY, D., L. R. FOULDS, and M. D. HENDY. 1982. Testing the theory of evolution by comparing phylogenetic trees constructed from 5 different protein sequences. Nature 297:197-200.

PENNY, D., and M. D. HENDY. 1985a. Testing methods of evolutionary tree construction. Cladistics 1:266-272.

-. 1985b. The use of tree comparison methods. Syst. Zool. 34:75-82.

ROBINSON, D. F., and L. R. FOULDS. 1981. Comparison of phylogenetic trees. Math. Biosci. 53:131-147.

SCHUH, R. T., and J. T. POLHEMUS. 1980. Analysis of taxonomic congruence among morphological, ecological, and biogeographic data sets for the Leptopodomorpha (Hemiptera). Syst. Zool. 29:1-26.

SNEATH, P. H. A., M. J. SACKIN, and R. P. AMBLER. 1975. Detecting evolutionary incompatibilities from protein sequences. Syst. Zool. 24:311-332.

SOKAL, R. R., and F. J. ROHLF. 1981. Introduction to biostatistics. 2d ed. W. H. Freeman, San Francisco.

TATENO, Y., M. NEI, and F. TAJIMA. 1982. Accuracy of estimated phylogenetic trees from molecular data. I. Distantly related species. J. Mol. Evol. 18:387-404.

WHIFFIN, T. 1982. Numerical analysis of volatile oil data in systematic studies of Australian rainforest trees. Taxon 31:204-210.

WALTER M. FITCH, reviewing editor

Received December 23, 1985; revision received May 12, 1986.
