1 Parsimony, Likelihood, Common Causes, and Phylogenetic Inference Elliott Sober Philosophy Department University of Wisconsin, Madison

Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Dec 30, 2015

Maxwell Frost

Page 1: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

1

Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Elliott Sober

Philosophy Department

University of Wisconsin, Madison

Page 2: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

2 suggested uses of O’s razor

• O’s razor should be used to constrain the order in which hypotheses are to be tested.

• O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.

2

Page 3: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Pluralism about Ockham’s razor?

• [Pre-test] O’s razor should be used to constrain the order in which hypotheses are to be tested.

• [Post-test] O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.

3

Page 4: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

these can be compatible, but…

• [Pre-test] O’s razor should be used to constrain the order in which hypotheses are to be tested.

• [Post-test] O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.

If pre-test O’s razor is “rejectionist,” then

post-test O’s razor won’t have a point.

4

Page 5: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

these can be compatible, but…

• [Pre-test] O’s razor should be used to constrain the order in which hypotheses are to be tested.

• [Post-test] O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.

If the pre-test idea involves testing hypotheses one

at a time, then it views testing as noncontrastive.

5

Page 6: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

within the post-test category of support/plausibility ...

• Bayesianism – compute posterior probs.

• Likelihoodism – compare likelihoods.

• Frequentist model selection criteria like AIC – estimate predictive accuracy.

6

Page 7: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

I am a pluralist about these broad philosophies …

• Bayesianism – compute posterior probs

• Likelihoodism – compare likelihoods

• Frequentist model selection criteria like AIC – estimate predictive accuracy.

7

Page 8: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

I am a pluralist about these broad philosophies …

• Bayesianism – compute posterior probs

• Likelihoodism – compare likelihoods

• Frequentist model selection criteria like AIC – estimate predictive accuracy.

Not that each is okay as a global thesis

about all scientific inference …

8

Page 9: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

• Bayesianism – compute posterior probs

• Likelihoodism – compare likelihoods

• Frequentist model selection criteria like AIC – estimate predictive accuracy.

But I do think that each has its place.

I am a pluralist about these broad philosophies …

9

Page 10: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Ockham’s Razors*

Different uses of O’s razor have different

justifications and some have none at all.

* “Let’s Razor Ockham’s Razor,” in D. Knowles (ed.), Explanation and

Its Limits, Cambridge University Press, 1990, 73-94.

10

Page 11: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Parsimony and Likelihood

In model selection criteria like AIC and BIC, likelihood and parsimony are conflicting desiderata.

AIC(M) = log[Pr(Data │ L(M))] - k
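The trade-off in this score can be sketched with a toy coin-tossing example. The sketch assumes the usual reading of the formula: L(M) is the likeliest (best-fitting) member of model M and k is the number of adjustable parameters. The function names and the two coin models are illustrative assumptions, not part of the slides.

```python
import math

def log_likelihood(heads: int, tosses: int, p: float) -> float:
    """Log of Pr(sequence of tosses | p); zero counts are skipped to avoid log(0)."""
    tails = tosses - heads
    total = 0.0
    if heads:
        total += heads * math.log(p)
    if tails:
        total += tails * math.log(1.0 - p)
    return total

def aic_score(heads: int, tosses: int, free_p: bool) -> float:
    """Score in the slide's form: log Pr(Data | L(M)) - k. Higher is better."""
    if free_p:
        p_hat = heads / tosses   # likeliest member of the one-parameter model
        k = 1
    else:
        p_hat = 0.5              # the fair-coin model has no adjustable parameter
        k = 0
    return log_likelihood(heads, tosses, p_hat) - k

# Hypothetical data: 52 heads in 100 tosses, then 70 heads in 100 tosses.
for heads in (52, 70):
    simple = aic_score(heads, 100, free_p=False)
    complex_ = aic_score(heads, 100, free_p=True)
    print(heads, "simple" if simple > complex_ else "complex")
```

The more complex model always fits at least as well, so the penalty k is what lets the simpler model win when the gain in fit is small.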

11

Page 12: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

In model selection criteria like AIC and BIC, likelihood and parsimony are conflicting desiderata.

In other settings, parsimony

has a likelihood justification.

Parsimony and Likelihood

12

Page 13: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

the Law of Likelihood

Observation O favors H1 over H2

iff

Pr(O│H1) > Pr(O│H2)
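As a minimal sketch of how the Law of Likelihood is applied, the comparison below uses a made-up observation (8 heads in 10 tosses) and two made-up hypotheses about the coin's bias. The function names and numbers are assumptions for illustration only.

```python
import math

def binomial_prob(heads: int, tosses: int, p: float) -> float:
    """Pr(exactly `heads` heads in `tosses` tosses | bias p)."""
    return math.comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

def favored(pr_o_given_h1: float, pr_o_given_h2: float) -> str:
    """Law of Likelihood: O favors H1 over H2 iff Pr(O|H1) > Pr(O|H2)."""
    if pr_o_given_h1 > pr_o_given_h2:
        return "H1"
    if pr_o_given_h2 > pr_o_given_h1:
        return "H2"
    return "neither"

# H1: bias 0.8; H2: bias 0.5 (both hypothetical).
pr_h1 = binomial_prob(8, 10, 0.8)
pr_h2 = binomial_prob(8, 10, 0.5)
print(favored(pr_h1, pr_h2), pr_h1 / pr_h2)  # the ratio measures degree of favoring
```

Note that the law is contrastive: it says nothing about H1 on its own, only about H1 relative to H2.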

13

Page 14: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

a Reichenbachian idea

Salmon’s example of plagiarism

[Common Cause] C → E1 and C → E2

[Separate Causes] C1 → E1 and C2 → E2

14

Page 15: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

a Reichenbachian idea

Salmon’s example of plagiarism

[Common Cause] C → E1 and C → E2 (more parsimonious)

[Separate Causes] C1 → E1 and C2 → E2

15

Page 16: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Reichenbach’s argument

IF (i) A cause screens-off its effects from each other

(ii) All probabilities are non-extreme (≠ 0,1)

(iii) a particular parameterization of the CC and SC models

(iv) cause/effect relationships are “homogenous” across branches.

THEN Pr[Data │Common Cause] > Pr[Data │Separate Causes].
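The inequality can be checked numerically under one possible parameterization. Everything specific here is an assumption made for illustration: each cause is present with probability c, effect Ei occurs with probability pi if its cause is present and qi if it is absent, the same c, pi, qi are used in both models, and the data are the matching observation that E1 and E2 both occur.

```python
def pr_match_common_cause(c, p1, q1, p2, q2):
    """Pr(E1 and E2 | CC): one cause C; the effects are independent given C."""
    return c * p1 * p2 + (1 - c) * q1 * q2

def pr_match_separate_causes(c, p1, q1, p2, q2):
    """Pr(E1 and E2 | SC): independent causes C1 and C2, one per effect."""
    return (c * p1 + (1 - c) * q1) * (c * p2 + (1 - c) * q2)

# Hypothetical non-extreme values; each effect is made more probable by its cause.
args = dict(c=0.3, p1=0.9, q1=0.2, p2=0.8, q2=0.1)
cc = pr_match_common_cause(**args)
sc = pr_match_separate_causes(**args)
print(cc, sc, cc > sc)
```

Under these assumptions the difference works out algebraically to c(1 - c)(p1 - q1)(p2 - q2), which is positive whenever c is non-extreme and both effects depend on their causes in the same direction.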

16

Page 17: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

parameters and homogeneity

[Common Cause] C → E1 (parameter p1) and C → E2 (parameter p2)

[Separate Causes] C1 → E1 (parameter p1) and C2 → E2 (parameter p2)

17

Page 18: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

IF (i) A cause screens-off its effects from each other.

(ii) All probabilities are non-extreme

(iii) parameterization of the CC and SC models.

(iv) cause/effect relationships are “homogenous” across branches.

THEN Pr[Data │Common Cause] > Pr[Data │Separate Causes].

The more parsimonious hypothesis

has the higher likelihood.

Reichenbach’s argument

18

Page 19: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

IF (i) A cause screens-off its effects from each other.

(ii) All probabilities are non-extreme

(iii) parameterization of the CC and SC models.

(iv) cause/effect relationships are “homogenous” across branches.

THEN Pr[Data │Common Cause] > Pr[Data │Separate Causes].

Parsimony and likelihood are

ordinally equivalent.

Reichenbach’s argument

19

Page 20: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Some differences with Reichenbach

• I am comparing two hypotheses.

• I’m not using R’s Principle of the Common Cause.

• I take the evidence to be the matching of the students’ papers, not their “correlation.”

20

Page 21: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

empirical foundations for likelihood ≈ parsimony

(i) A cause screens-off its effects from each other.

(ii) All probabilities are non-extreme

(iii) parameterization of the CC and SC models.

(iv) cause/effect relationships are “homogenous” across branches.

By adopting different assumptions, you can

arrange for CC to be less likely than SC.

Now likelihood and parsimony conflict!

21
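One way such a conflict could arise is sketched below. The parameterization is an illustrative assumption, not taken from the slides: a cause is present with probability c, and effect Ei has probability pi when its cause is present and qi when it is absent. The changed assumption is that the two effects respond to their cause in opposite directions.

```python
def likelihood_gap(c, p1, q1, p2, q2):
    """Pr(E1 and E2 | Common Cause) minus Pr(E1 and E2 | Separate Causes)."""
    common = c * p1 * p2 + (1 - c) * q1 * q2
    separate = (c * p1 + (1 - c) * q1) * (c * p2 + (1 - c) * q2)
    return common - separate

# The cause raises the probability of E1 (0.9 vs 0.2) but lowers that of E2 (0.1 vs 0.8).
gap = likelihood_gap(c=0.3, p1=0.9, q1=0.2, p2=0.1, q2=0.8)
print(gap)  # negative: the matching data are now more probable under Separate Causes
```

With these hypothetical numbers the more parsimonious Common Cause hypothesis has the lower likelihood.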

Page 22: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

empirical foundations for likelihood ≈ parsimony

(i) A cause screens-off its effects from each other.

(ii) All probabilities are non-extreme

(iii) parameterization of the CC and SC models.

(iv) cause/effect relationships are “homogenous” across branches.

Note: the R argument shows that these are

sufficient for likelihood ≈ parsimony,

not that they are necessary.

22

Page 23: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Parsimony in Phylogenetic Inference

Two sources: ─ Willi Hennig

─ Luigi Cavalli-Sforza and Anthony Edwards

Two types of inference problem: ─ find the best tree “topology”

─ estimate character states of ancestors

23

Page 24: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

24

1. Which tree topology is better?

H C G H C G

(HC)G H(CG)

MP: (HC)G is better supported than H(CG) by data D if and only if

(HC)G is a more parsimonious explanation of D than H(CG) is.

Page 25: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

25

An Example of a Parsimony Calculation

Tip states (the same on both trees): H = 1, C = 1, G = 0; state 0 at the root of each tree

(HC)G H(CG)
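The calculation on this slide can be sketched as a count of the minimum number of character-state changes each tree requires. The sketch assumes that the slide's digits give tip states H = 1, C = 1, G = 0 and a root state of 0; the nested-tuple tree encoding and the function name are illustrative choices.

```python
import math

def min_changes(node, tip_states, state):
    """Fewest changes in the subtree below `node`, given that `node` is in `state`."""
    if isinstance(node, str):                    # a leaf: its state is observed
        return 0 if tip_states[node] == state else math.inf
    total = 0
    for child in node:                           # an internal node: a tuple of children
        total += min(
            min_changes(child, tip_states, s) + (s != state)  # +1 if the branch changes
            for s in (0, 1)
        )
    return total

tips = {"H": 1, "C": 1, "G": 0}
hc_g = (("H", "C"), "G")   # (HC)G
h_cg = ("H", ("C", "G"))   # H(CG)

print(min_changes(hc_g, tips, 0))  # one change, on the branch leading to the HC ancestor
print(min_changes(h_cg, tips, 0))  # two changes
```

On this character (HC)G needs fewer changes than H(CG), so it is the more parsimonious explanation of these data.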

Page 26: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Tip states: H = 1, C = 1, G = 1

A=?

26

2. What is the best estimate of the character states of ancestors in an assumed tree?

Page 27: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Tip states: H = 1, C = 1, G = 1

A=?

MP says that the best estimate is that A=1.

27

2. What is the best estimate of the character states of ancestors in an assumed tree?

Page 28: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

28

Maximum Likelihood

H C G H C G

(HC)G H(CG)

ML: (HC)G is better supported than H(CG) by data D if and only if Pr[D│(HC)G] > Pr[D│H(CG)].

Page 29: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

29

Maximum Likelihood

H C G H C G

(HC)G H(CG)

ML: (HC)G is better supported than H(CG) by data D if and only if PrM[D│(HC)G] > PrM[D│H(CG)]. ML is “model dependent.”

Page 30: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

the present situation in evolutionary biology

• MP and ML sometimes disagree.

• The standard criticism of MP is that it assumes that evolution proceeds parsimoniously.

• The standard criticism of ML is that you need to choose a model of the evolutionary process.

30

Page 31: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

31

When do parsimony and likelihood agree?

• (Ordinal Equivalence) For any data set D and any pair of phylogenetic hypotheses H1 and H2, Pr(D│H1) > Pr(D│H2) iff H1 is a more parsimonious explanation of D than H2 is.

Page 32: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

32

When do parsimony and likelihood agree?

• (Ordinal Equivalence) For any data set D and any pair of phylogenetic hypotheses H1 and H2, PrM(D│H1) > PrM(D│H2) iff H1 is a more parsimonious explanation of D than H2 is.

• Whether likelihood agrees with parsimony depends on the probabilistic model of evolution used.

Page 33: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

33

When do parsimony and likelihood agree?

• (Ordinal Equivalence) For any data set D and any pair of phylogenetic hypotheses H1 and H2, PrM(D│H1) > PrM(D│H2) iff H1 is a more parsimonious explanation of D than H2 is.

• Whether likelihood agrees with parsimony depends on the probabilistic model of evolution used.

• Felsenstein (1973) showed that the postulate of very low rates of evolution suffices for ordinal equivalence.

Page 34: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

34

Does this mean that parsimony assumes that rates are low?

• NO: the assumptions of a method are the propositions that must be true if the method correctly judges support.

Page 35: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

35

Does this mean that parsimony assumes that rates are low?

• NO: the assumptions of a method are the propositions that must be true if the method correctly judges support.

• Felsenstein showed that the postulate of low rates suffices for ordinal equivalence, not that it is necessary for ordinal equivalence.

Page 36: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

36

Tuffley and Steel (1997)

• T&S showed that the postulate of “no-common-mechanism” also suffices for ordinal equivalence.

• “no-common-mechanism” means that each character on each branch is subject to its own drift process.

Page 37: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

the two probability models of evolution

Felsenstein

• Rates of change are low, but not necessarily equal.

• Drift not assumed:

Pr(i → j) and Pr(j → i)

may differ.

Tuffley and Steel

• Rates of change can be high.

• Drift is assumed:

Pr(i → j) = Pr(j → i)

37

Page 38: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

38

How to use likelihood to define what it means for parsimony to assume something

• The assumptions of parsimony = the propositions that must be true if parsimony correctly judges support.

• For a likelihoodist, parsimony correctly judges support if and only if parsimony is ordinally equivalent with likelihood.

• Hence, for a likelihoodist, parsimony assumes any proposition that follows from ordinal equivalence.

Page 39: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

39

A Test for what Parsimony does not assume

Model M → ordinal equivalence → A

where A = what parsimony assumes

Page 40: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

40

A Test for what Parsimony does not assume

Model M → ordinal equivalence → A

where A = what parsimony assumes

• If model M entails ordinal equivalence, and M entails proposition X,

X may or may not be an assumption of parsimony.

Page 41: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

41

A Test for what Parsimony does not assume

Model M → ordinal equivalence → A

where A = what parsimony assumes

• If model M entails ordinal equivalence, and M entails proposition X,

X may or may not be an assumption of parsimony.

• If model M entails ordinal equivalence, and M does not entail proposition X, then X is not an assumption of parsimony.

Page 42: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

42

applications of the negative test

• T&S’s model does not entail that rates of change are low; hence parsimony does not assume that rates are low.

• F’s model does not assume neutral evolution; hence parsimony does not assume neutrality.

Page 43: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

43

How to figure out what parsimony does assume?

• Find a model that forces parsimony and likelihood to disagree about some example.

• Then, if parsimony is right in what it says about the example, the model must be false.

Page 44: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

44

Example #1

Task: Infer the character state of the MRCA of species that

all exhibit the same state of a quantitative character.

10 10 … 10 10

A=?

The MP estimate is A=10.

When is A=10 the ML estimate? And when is it not?

Page 45: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

45

Answer

10 10 … 10 10

A=?

ML says that A=10 is the best estimate (and thus agrees with MP) if there is neutral evolution or selection is pushing each lineage towards a trait value of 10.

Page 46: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

46

Answer

10 10 … 10 10

A=?

ML says that A=10 is the best estimate (and thus agrees with MP) if there is neutral evolution or selection is pushing each lineage towards a trait value of 10.

ML says that A=10 is not the best estimate (and thus disagrees with MP) if (*) selection is pushing all lineages towards a single trait value different from 10.

Page 47: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

47

Answer

10 10 … 10 10

A=?

ML says that A=10 is the best estimate (and thus agrees with MP) if there is neutral evolution or selection is pushing each lineage towards a trait value of 10.

ML says that A=10 is not the best estimate (and thus disagrees with MP) if (*) selection is pushing all lineages towards a single trait value different from 10.

So: Parsimony assumes, in this problem, that (*) is false.
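A rough sketch of why the answer turns on the direction of selection, under assumptions that go beyond the slides: all tips descend independently from A along branches of equal duration t, and each lineage follows an Ornstein-Uhlenbeck process in which the expected tip value is optimum + (A - optimum) * exp(-alpha * t), with a variance that does not depend on A. Setting alpha = 0 gives pure drift. When all tips share one value, the likelihood is maximized by the A whose expected tip value equals that observed value.

```python
import math

def ml_ancestor(tip_value: float, optimum: float, alpha: float, t: float) -> float:
    """ML estimate of A when every tip shows `tip_value` (assumed OU model, star tree).

    Solves tip_value = optimum + (A - optimum) * exp(-alpha * t) for A.
    """
    return optimum + (tip_value - optimum) * math.exp(alpha * t)

print(ml_ancestor(10, optimum=10, alpha=0.1, t=1.0))  # selection towards 10: A = 10
print(ml_ancestor(10, optimum=0, alpha=0.0, t=1.0))   # drift (alpha = 0): A = 10
print(ml_ancestor(10, optimum=20, alpha=0.1, t=1.0))  # selection towards 20: A below 10
```

In the third case the lineages are being pulled up towards 20, so tips that are all at 10 are best explained by an ancestor that started below 10.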

Page 48: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

48

Example #2

Task: Infer the character state of the MRCA of two species that

exhibit different states of a dichotomous character.

1 0

A=?

A=0 and A=1 are equally parsimonious. When are they equally likely? And when are they unequally likely?

Page 49: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

49

Answer

1 0

A=?

ML agrees with MP that A=0 and A=1 are equally good estimates if the same neutral process occurs in the two lineages.

Page 50: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

50

Answer

1 0

A=?

ML agrees with MP that A=0 and A=1 are equally good estimates if the same neutral process occurs in the two lineages.

ML disagrees with MP if (*) the same selection process occurs in both lineages.

Page 51: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

51

Answer

1 0

A=?

ML agrees with MP that A=0 and A=1 are equally good estimates if the same neutral process occurs in the two lineages.

ML disagrees with MP if (*) the same selection process occurs in both lineages.

So: Parsimony assumes, in this problem, that (*) is false.
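The disagreement can be illustrated with a two-state continuous-time Markov model. The model choice is an assumption for illustration: u is the rate of change 0 → 1, v is the rate of change 1 → 0, both lineages have the same duration t, drift is represented by u = v and selection favoring state 1 by u > v.

```python
import math

def transition_prob(start: int, end: int, u: float, v: float, t: float) -> float:
    """Pr(lineage ends in `end` | it starts in `start`) after time t."""
    decay = math.exp(-(u + v) * t)
    pi1 = u / (u + v)          # equilibrium probability of state 1
    pi0 = v / (u + v)          # equilibrium probability of state 0
    if end == 1:
        return pi1 + (pi0 if start == 1 else -pi1) * decay
    return pi0 + (pi1 if start == 0 else -pi0) * decay

def likelihood(ancestor: int, tips, u: float, v: float, t: float) -> float:
    """Pr(tip states | A = ancestor), with lineages independent given A."""
    result = 1.0
    for tip in tips:
        result *= transition_prob(ancestor, tip, u, v, t)
    return result

tips = [1, 0]
print(likelihood(0, tips, 1.0, 1.0, 1.0), likelihood(1, tips, 1.0, 1.0, 1.0))  # drift
print(likelihood(0, tips, 2.0, 1.0, 1.0), likelihood(1, tips, 2.0, 1.0, 1.0))  # selection
```

Under drift the two ancestral states are equally likely; under selection favoring state 1 in both lineages, A = 0 has the higher likelihood, so ML breaks the tie that parsimony leaves standing.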

Page 52: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

52

Conclusions about phylogenetic parsimony ≈ likelihood

• The assumptions of parsimony are the propositions that must be true if parsimony correctly judges support.

Page 53: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

53

Conclusions about phylogenetic parsimony ≈ likelihood

• The assumptions of parsimony are the propositions that must be true if parsimony correctly judges support.

• To find out what parsimony does not assume, use the test described [M → ordinal equivalence → A].

Page 54: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

54

Conclusions about phylogenetic parsimony ≈ likelihood

• The assumptions of parsimony are the propositions that must be true if parsimony correctly judges support.

• To find out what parsimony does not assume, use the test described [M → ordinal equivalence → A].

• To find out what parsimony does assume, look for examples in which parsimony and likelihood disagree, not for models that ensure that they agree.

Page 55: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

55

Conclusions about phylogenetic parsimony ≈ likelihood

• The assumptions of parsimony are the propositions that must be true if parsimony correctly judges support.

• To find out what parsimony does not assume, use the test described [M → ordinal equivalence → A].

• To find out what parsimony does assume, look for examples in which parsimony and likelihood disagree, not for models that ensure that they agree.

• Maybe parsimony’s assumptions vary from problem to problem.

Page 56: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

broader conclusions

• underdetermination: O’s razor often comes up when the data don’t settle truth/falsehood or acceptance/rejection.

56

Page 57: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

broader conclusions

• underdetermination: O’s razor often comes up when the data don’t settle truth/falsehood or acceptance/rejection.

• reductionism: when O’s razor has authority, it does so

because it reflects some other, more fundamental, desideratum.

57

Page 58: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

broader conclusions

• underdetermination: O’s razor often comes up when the data don’t settle truth/falsehood or acceptance/rejection.

• reductionism: when O’s razor has authority, it does so

because it reflects some other, more fundamental, desideratum. [But there isn’t a single global justification.]

58

Page 59: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

broader conclusions

• underdetermination: O’s razor often comes up when the data don’t settle truth and falsehood.

• reductionism: when O’s razor has authority, it does so

because it reflects some other, more fundamental, desideratum.

• two questions: When parsimony has a precise meaning, we can investigate: What are its presuppositions? What suffices to justify it?

59

Page 60: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

A curiosity: in the R argument, to get a difference in likelihood, the hypotheses should not specify the

states of the causes.

[Common Cause] C → E1 (parameter p1) and C → E2 (parameter p2)

[Separate Causes] C1 → E1 (parameter p1) and C2 → E2 (parameter p2)

60

Page 61: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

61

Example #0

Task: Infer the character state of the MRCA of species that

all exhibit the same state of a dichotomous character.

1 1 … 1 1

A=?

The MP inference is that A=1. When is A=1 the ML inference?

Page 62: Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

62

Example #0

Task: Infer the character state of the MRCA of species that

all exhibit the same state of a dichotomous character.

1 1 … 1 1

A=?

The MP inference is that A=1. When is A=1 the ML inference?

Answer: when lineages have finite duration and the process is Markovian. It doesn’t matter whether selection or drift is the process at work.
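A numerical sketch of this answer, assuming a two-state continuous-time Markov model with rate u for 0 → 1 and rate v for 1 → 0, and tips that descend independently from A along lineages of equal finite duration t. These modeling details and the function names are illustrative assumptions.

```python
import math

def prob_end_in_1(start: int, u: float, v: float, t: float) -> float:
    """Pr(lineage ends in state 1 | it starts in `start`) after finite time t."""
    decay = math.exp(-(u + v) * t)
    pi1 = u / (u + v)                     # long-run probability of state 1
    if start == 1:
        return pi1 + (1 - pi1) * decay
    return pi1 - pi1 * decay

def likelihood_all_ones(ancestor: int, n_tips: int, u: float, v: float, t: float) -> float:
    """Pr(all n_tips show state 1 | A = ancestor)."""
    return prob_end_in_1(ancestor, u, v, t) ** n_tips

# Drift (u = v), selection for state 1 (u > v), and selection against state 1 (u < v).
for u, v in [(1.0, 1.0), (5.0, 0.1), (0.1, 5.0)]:
    l1 = likelihood_all_ones(1, 4, u, v, 1.0)
    l0 = likelihood_all_ones(0, 4, u, v, 1.0)
    print(u, v, l1 > l0)
```

In this model the gap between the two single-lineage probabilities is exp(-(u + v) * t), which is positive for any finite t, so A = 1 is the ML inference whatever the relative sizes of u and v.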