Causal Inference and Graphical Models Peter Spirtes Carnegie Mellon University

Dec 28, 2015



Overview

Manipulations Assuming no Hidden Common Causes
• From DAGs to Effects of Manipulation
• From Data to Sets of DAGs
• From Sets of DAGs to Effects of Manipulation

There May Be Hidden Common Causes
• From Data to Sets of DAGs
• From Sets of DAGs to Effects of Manipulations


If I were to force a group of people to smoke one pack a day, what percentage would develop lung cancer?

The Evidence


P(Lung cancer = yes) = 1/2


P(Lung Cancer = yes|Teeth white = yes) = 1/4

Conditioning on Teeth white = yes


Manipulating Teeth white = yes


P(Lung Cancer = yes || Teeth white = yes) = 1/2

Manipulating Teeth white = yes - After Waiting

P(Lung Cancer = yes | Teeth white = yes) = 1/4
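The contrast between conditioning and manipulating can be reproduced on a toy model. A minimal sketch, assuming hypothetical parameters that are not on the slides: P(Smoking) = 1/2, P(Lung cancer | smoker) = 3/4, P(Lung cancer | non-smoker) = 1/4, and teeth are white exactly when the person does not smoke. These numbers are chosen only so that the observational probabilities agree with the ones above.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters (not given on the slides), chosen so that the
# observational numbers match: P(L=yes) = 1/2 and P(L=yes | T=white) = 1/4.
P_SMOKE = {True: F(1, 2), False: F(1, 2)}
P_CANCER_GIVEN_SMOKE = {True: F(3, 4), False: F(1, 4)}


def joint(force_white=None):
    """Joint over (smokes, teeth_white, cancer).

    Smoking causes both stained teeth and lung cancer. If force_white is
    given, teeth color is set by manipulation instead of by smoking.
    """
    dist = {}
    for s, lc in product([True, False], repeat=2):
        t = (not s) if force_white is None else force_white
        p_lc = P_CANCER_GIVEN_SMOKE[s] if lc else 1 - P_CANCER_GIVEN_SMOKE[s]
        dist[(s, t, lc)] = dist.get((s, t, lc), 0) + P_SMOKE[s] * p_lc
    return dist


def prob(dist, event):
    """Probability of an event given as a predicate on (s, t, lc)."""
    return sum(p for outcome, p in dist.items() if event(*outcome))


obs = joint()
p_cancer = prob(obs, lambda s, t, lc: lc)
p_cancer_given_white = (prob(obs, lambda s, t, lc: lc and t)
                        / prob(obs, lambda s, t, lc: t))
p_cancer_do_white = prob(joint(force_white=True), lambda s, t, lc: lc)
print(p_cancer, p_cancer_given_white, p_cancer_do_white)  # 1/2 1/4 1/2
```

Conditioning on white teeth selects non-smokers, so the cancer rate drops; manipulating teeth color leaves smoking, and hence cancer, untouched.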


Smoking Decision

Setting insurance rates for smokers: conditioning.

Suppose the Surgeon General is considering banning smoking. Will this decrease smoking? Will decreasing smoking decrease cancer? Will it have negative side-effects, e.g. more obesity? How is greater life expectancy valued against the decrease in pleasure from smoking?


Manipulations and Distributions

Since Smoking determines Teeth white, P(T,L,R,W) = P(S,L,R,W)

But the manipulation of Teeth white leads to different results than the manipulation of Smoking

Hence the distribution does not always uniquely determine the results of a manipulation


Causation

We will infer average causal effects. We will not consider quantities such as the probability of necessity, the probability of sufficiency, or the counterfactual probability that I would get a headache conditional on taking an aspirin, given that I did not take an aspirin.

The causal relations are between properties of a unit at a time, not between events.

Each unit is assumed to be causally isolated.

The causal relations may be genuinely indeterministic, or only apparently indeterministic.


Causal DAGs

Probabilistic Interpretation of DAGs: A DAG represents a distribution P when each variable is independent of its non-descendants conditional on its parents in the DAG.

Causal Interpretation of DAGs: There is a directed edge from A to B (relative to V) when A is a direct cause of B.

An acyclic graph is not a representation of reversible or feedback processes.


Conditioning

Conditioning maps a probability distribution and an event into a new probability distribution:

f(P(V), e) → P'(V), where P'(V=v) = P(V=v)/P(e) if v is in e, and P'(V=v) = 0 otherwise
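The conditioning map can be written directly as a function on a finite joint distribution. A minimal sketch, assuming a distribution stored as a dictionary from outcomes to probabilities and an event given as a predicate (both representation choices are assumptions, not part of the slides):

```python
from fractions import Fraction


def condition(P, event):
    """Map a joint distribution P and an event e to the conditional distribution.

    P maps outcomes (tuples of values) to probabilities, and event is a
    predicate on outcomes. Outcomes outside the event get probability 0,
    so they are simply omitted from the result.
    """
    p_e = sum(p for v, p in P.items() if event(v))
    if p_e == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return {v: p / p_e for v, p in P.items() if event(v)}


# Hypothetical joint over (Smoking, Teeth white).
P = {("yes", "no"): Fraction(1, 2), ("no", "yes"): Fraction(1, 2)}
print(condition(P, lambda v: v[1] == "yes"))  # {('no', 'yes'): Fraction(1, 1)}
```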


Manipulating

A manipulation maps a population joint probability distribution, a causal DAG, and a set of new probability distributions for a set of variables, into a new joint distribution

Manipulating, for {X1,…,Xn} ⊆ V:

f: P(V) (population distribution), G (causal DAG), {P'(X1 | Non-Descendants(G,X1)), …, P'(Xn | Non-Descendants(G,Xn))} (manipulated variables) → P'(V) (manipulated distribution)

(assumption that manipulations are independent)

$$P'(\mathbf{X}) = \prod_i P'(X_i \mid \text{Non-Descendants}(G, X_i))$$
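The manipulation map can be sketched as a truncated factorization: every non-manipulated variable keeps its factor P(X | parents), and each manipulated variable's factor is replaced by its new distribution. A minimal sketch for binary variables, assuming hypothetical names (`manipulate`, `cpt`, `new_dists`) and, for simplicity, new distributions that do not depend on the non-descendants. The toy DAG S → T, S → L (smoking, teeth white, lung cancer) and its numbers are assumptions, not taken from the slides.

```python
from fractions import Fraction
from itertools import product


def manipulate(variables, parents, cpt, new_dists):
    """Manipulated joint P'(V) for binary variables (truncated factorization).

    variables: list of variable names
    parents:   dict var -> list of parents in the causal DAG G
    cpt:       dict var -> function(value, parent_values) -> P(var = value | parents)
    new_dists: dict manipulated var -> {value: probability}
    """
    joint = {}
    for values in product([0, 1], repeat=len(variables)):
        v = dict(zip(variables, values))
        p = Fraction(1)
        for x in variables:
            if x in new_dists:
                p *= new_dists[x][v[x]]  # new distribution replaces P(x | parents)
            else:
                p *= cpt[x](v[x], {q: v[q] for q in parents[x]})  # factor unchanged
        joint[values] = p
    return joint


def bernoulli(p1):
    return lambda value, pa: p1 if value == 1 else 1 - p1


def cancer_cpt(value, pa):
    p1 = Fraction(3, 4) if pa["S"] == 1 else Fraction(1, 4)
    return p1 if value == 1 else 1 - p1


# Hypothetical DAG: S (smoking) -> T (teeth white), S -> L (lung cancer).
variables = ["S", "T", "L"]
parents = {"S": [], "T": ["S"], "L": ["S"]}
cpt = {
    "S": bernoulli(Fraction(1, 2)),
    "T": lambda value, pa: Fraction(int(value == 1 - pa["S"])),  # white iff non-smoker
    "L": cancer_cpt,
}


def p_cancer(joint):
    return sum(p for values, p in joint.items() if values[2] == 1)


print(p_cancer(manipulate(variables, parents, cpt, {"T": {0: 0, 1: 1}})))  # 1/2
print(p_cancer(manipulate(variables, parents, cpt, {"S": {0: 1, 1: 0}})))  # 1/4
```

Manipulating teeth color to white leaves the cancer rate at its population value, while manipulating everyone to not smoke lowers it.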


Manipulation Notation (Adapting Lauritzen)

The distribution of Lung Cancer given the manipulated distribution of Smoking: P(Lung Cancer || P'(Smoking))

The distribution of Lung Cancer conditional on Radon given the manipulated distribution of Smoking: P(Lung Cancer | Radon || P'(Smoking)) = P(Lung Cancer, Radon || P'(Smoking)) / P(Radon || P'(Smoking))

First manipulate, then condition.


Ideal Manipulations

• No fat hand
• Effectiveness

Whether or not any actual action is an ideal manipulation of a variable Z is not part of the theory; it is input to the theory.

With respect to a system of variables containing murder rates, outlawing cocaine is not an ideal manipulation of cocaine usage:
• It is not entirely effective: people still use cocaine.
• It affects murder rates directly, not via its effect on cocaine usage, because of increased gang warfare.


3 Representations of Manipulations

• Structural Equation
• Policy Variable
• Potential Outcomes


College Plans

Sewell and Shah (1968) studied five variables from a sample of 10,318 Wisconsin high school seniors.

SEX [male = 0, female = 1]
IQ = Intelligence Quotient [lowest = 0, highest = 3]
CP = college plans [yes = 0, no = 1]
PE = parental encouragement [low = 0, high = 1]
SES = socioeconomic status [lowest = 0, highest = 3]


College Plans - A Hypothesis

[Figure: hypothesized causal DAG over SES, SEX, PE, CP, and IQ]


Equational Representation

$$x_i = f_i(\mathrm{pa}_i(G), \varepsilon_i)$$

If the ε_i are causes of two or more variables, they must be included in the analysis.

There is a distribution over the ε_i.

The equations and the distribution over the ε_i determine a distribution over the x_i.

When manipulating variable x_i to a value c, replace its equation with x_i = c.
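A minimal sketch of the equational representation, assuming a hypothetical linear structural equation model Z → X → Y, Z → Y with Gaussian errors and made-up coefficients (0.8, 0.5, 1.0). Manipulating X replaces its equation with X = c, which cuts the dependence of X on Z; the simulated effect of X on Y then differs from the observational regression slope, which also picks up the path through Z.

```python
import random


def simulate(n, do_x=None, seed=0):
    """Sample a hypothetical linear SEM with Z -> X -> Y and Z -> Y.

    Each variable is a function of its parents plus an independent error.
    Passing do_x replaces the equation for X with X = do_x.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if do_x is None:
            x = 0.8 * z + rng.gauss(0, 1)
        else:
            x = do_x  # manipulated: equation replaced by a constant
        y = 0.5 * x + 1.0 * z + rng.gauss(0, 1)
        rows.append((z, x, y))
    return rows


def slope(rows):
    """Least-squares slope of Y on X."""
    xs = [r[1] for r in rows]
    ys = [r[2] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


n = 20000
observational_slope = slope(simulate(n))  # about 0.5 + 0.8 / 1.64, i.e. roughly 0.99
effect = (sum(r[2] for r in simulate(n, do_x=1.0)) / n
          - sum(r[2] for r in simulate(n, do_x=0.0, seed=1)) / n)  # about 0.5
print(round(observational_slope, 2), round(effect, 2))
```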


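Policy Variable Representation

Manipulation notation and the corresponding policy variable notation:
P(PE,SES,SEX,IQ,CP) corresponds to P(PE,SES,SEX,IQ,CP | policy = off)
Suppose P'(PE=1) = 1; this corresponds to P(PE=1 | policy = on) = 1
P(SES,SEX,IQ,CP,PE=1 || P'(PE)) corresponds to P(SES,SEX,IQ,CP,PE=1 | policy = on)
P(CP | PE || P'(PE)) corresponds to P(CP | PE, policy = on)

[Figures: Pre-manipulation and Post-manipulation causal DAGs over SES, SEX, IQ, PE, CP]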


From DAG to Effects of Manipulation

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]


Causal Sufficiency

A set of variables is causally sufficient if every common cause of two or more variables in the set is also in the set.

{PE,CP,SES} is causally sufficient. {IQ,CP,SES} is not causally sufficient.

[Figure: causal DAG over SEX, SES, IQ, PE, CP]


Causal Markov Assumption

For a causally sufficient set of variables, the joint distribution is the product of the distribution of each variable conditional on its parents in the causal DAG.

P(SES,SEX,PE,CP,IQ) = P(SES)P(SEX)P(IQ|SES)P(PE|SES,SEX,IQ)P(CP|PE)

[Figure: causal DAG over SEX, SES, IQ, PE, CP]


Equivalent Forms of Causal Markov Assumption

• In the population distribution, each variable is independent of its non-descendants in the causal DAG (non-effects) conditional on its parents (immediate causes).

• If X is d-separated from Y conditional on Z (written as <X,Y|Z>) in the causal graph, then X is independent of Y conditional on Z in the population distribution (denoted I(X,Y|Z)).

[Figure: causal DAG over SEX, SES, IQ, PE, CP]
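d-separation can be tested mechanically. A minimal sketch using Lauritzen's moralization criterion, assuming a DAG stored as a dictionary from each variable to its list of parents. The example graph is an assumption since the figure did not survive: its edges are read off the factorization used in the Manipulation Theorem slide (CP with parents PE, SES, IQ), and `policy` is a policy variable for PE.

```python
from collections import deque


def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """Test <xs, ys | zs> by moralization (xs, ys, zs disjoint sets).

    1. Keep only xs, ys, zs and their ancestors.
    2. Moralize: link every child to its parents, "marry" co-parents,
       and drop edge directions.
    3. xs and ys are d-separated given zs iff every path between them in
       this undirected graph passes through zs.
    """
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    nbrs = {v: set() for v in keep}
    for child in keep:
        pas = parents.get(child, [])
        for i, p in enumerate(pas):
            nbrs[child].add(p)
            nbrs[p].add(child)
            for q in pas[i + 1:]:  # marry co-parents
                nbrs[p].add(q)
                nbrs[q].add(p)
    seen = set(xs)
    queue = deque(xs)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in nbrs[v] - set(zs) - seen:
            seen.add(w)
            queue.append(w)
    return True


# Assumed college-plans DAG plus a policy variable for PE.
G = {
    "SES": [], "SEX": [], "policy": [],
    "IQ": ["SES"],
    "PE": ["SES", "SEX", "IQ", "policy"],
    "CP": ["PE", "SES", "IQ"],
}
print(d_separated(G, {"policy"}, {"CP"}, {"PE", "SES", "IQ"}))  # True
print(d_separated(G, {"policy"}, {"CP"}, {"PE"}))               # False
print(d_separated(G, {"SEX"}, {"SES"}, set()))                   # True
```

The second query fails because conditioning on PE opens the path policy → PE ← SES → CP.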


Causal Markov Assumption

Causal Markov implies that if X is d-separated from Y conditional on Z in the causal DAG, then X is independent of Y conditional on Z.

Causal Markov is equivalent to assuming that the causal DAG represents the population distribution.

What would a failure of Causal Markov look like? If X and Y are dependent, but X does not cause Y, Y does not cause X, and no variable Z causes both X and Y.


Causal Markov Assumption

• Assumes that no unit in the population affects other units in the population.
   If the "natural" units do affect each other, the units should be re-defined to be aggregations of units that don't affect each other. For example, individual people might be aggregated into families.

• Assumes variables are not logically related, e.g. x and x².

• Assumes no feedback.


Manipulation Theorem - No Hidden Variables

P(PE,SES,SEX,CP,IQ || P'(PE)) = P(SES)P(SEX)P(CP|PE,SES,IQ)P(IQ|SES)P(PE|policy = on) = P(SES)P(SEX)P(CP|PE,SES,IQ)P(IQ|SES)P'(PE)

[Figure: causal DAG over SEX, SES, IQ, PE, CP with a policy variable]


Invariance

Note that P(CP|PE,SES,IQ,policy = on) = P(CP|PE,SES,IQ,policy = off) because the policy variable is d-separated from CP conditional on PE, SES, IQ.

We say that P(CP|PE,SES,IQ) is invariant. An invariant quantity can be estimated from the pre-manipulation distribution.

This is equivalent to one of the rules of the Do Calculus and can also be applied to latent variable models.

[Figure: causal DAG over SEX, SES, IQ, PE, CP with a policy variable]


Calculating Effects

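$$
\begin{aligned}
P(cp \,\|\, P'(PE)) &= \sum_{pe} P(cp \mid pe \,\|\, P'(PE))\, P(pe \,\|\, P'(PE)) && \text{(chain rule)}\\
&= \sum_{pe} P(cp \mid pe \,\|\, P'(PE))\, P'(pe) && \text{(definition of } P'(PE))\\
&= \sum_{pe} \Big( \sum_{iq,ses} P(cp \mid pe, ses, iq \,\|\, P'(PE))\, P(iq \mid pe, ses \,\|\, P'(PE))\, P(ses \mid pe \,\|\, P'(PE)) \Big) P'(pe) && \text{(chain rule)}\\
&= \sum_{pe} \Big( \sum_{iq,ses} P(cp \mid pe, ses, iq \,\|\, P'(PE))\, P(iq \mid ses \,\|\, P'(PE))\, P(ses \,\|\, P'(PE)) \Big) P'(pe) && \text{(d-separation in manipulated DAG)}\\
&= \sum_{pe} \Big( \sum_{iq,ses} P(cp \mid pe, ses, iq)\, P(iq \mid ses)\, P(ses) \Big) P'(pe) && \text{(invariance)}
\end{aligned}
$$

[Figure: causal DAG over SEX, SES, IQ, PE, CP with a policy variable]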
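The derivation can be checked numerically. A minimal sketch, assuming hypothetical binary versions of the variables and made-up conditional probabilities (not the Sewell and Shah estimates) for the DAG whose factorization appears in the Manipulation Theorem: the last line, computed only from the pre-manipulation joint, matches the manipulated distribution computed directly, while ordinary conditioning on PE = 1 does not.

```python
from fractions import Fraction as F
from itertools import product


def bern(q, v):
    """P(V = v) for a binary V with P(V = 1) = q."""
    return q if v == 1 else 1 - q


# Hypothetical binary CPTs (not the Sewell and Shah estimates).
def p_ses(s): return F(1, 2)
def p_sex(x): return F(1, 2)
def p_iq(i, s): return bern(F(1, 3) + F(1, 3) * s, i)
def p_pe(e, s, x, i): return bern(F(1, 5) + F(1, 5) * s + F(1, 10) * x + F(1, 5) * i, e)
def p_cp(c, e, s, i): return bern(F(1, 10) + F(2, 5) * e + F(1, 5) * s + F(1, 5) * i, c)


NAMES = ("ses", "sex", "iq", "pe", "cp")


def joint(new_pe=None):
    """Observational joint, or the manipulated joint when new_pe = {pe: P'(pe)}."""
    P = {}
    for s, x, i, e, c in product([0, 1], repeat=5):
        pe_factor = p_pe(e, s, x, i) if new_pe is None else new_pe[e]
        P[(s, x, i, e, c)] = p_ses(s) * p_sex(x) * p_iq(i, s) * pe_factor * p_cp(c, e, s, i)
    return P


def marg(P, **fixed):
    """Probability that the named variables take the given values."""
    return sum(p for k, p in P.items()
               if all(k[NAMES.index(n)] == v for n, v in fixed.items()))


obs = joint()


def inner_sum(cp, pe):
    """Sum over iq, ses of P(cp|pe,ses,iq) P(iq|ses) P(ses), from the observational joint."""
    total = F(0)
    for s, i in product([0, 1], repeat=2):
        p_c = marg(obs, cp=cp, pe=pe, ses=s, iq=i) / marg(obs, pe=pe, ses=s, iq=i)
        p_i = marg(obs, iq=i, ses=s) / marg(obs, ses=s)
        total += p_c * p_i * marg(obs, ses=s)
    return total


new_pe = {0: F(0), 1: F(1)}                          # the manipulation P'(PE = 1) = 1
formula = sum(inner_sum(1, e) * new_pe[e] for e in (0, 1))
brute_force = marg(joint(new_pe), cp=1)             # P(CP = 1 || P'(PE)) directly
naive = marg(obs, cp=1, pe=1) / marg(obs, pe=1)     # ordinary conditioning on PE = 1
print(formula == brute_force, naive > formula)      # True True
```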


From Sample to Sets of DAGs

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]


From Sample to Population to DAGs

Constraint-Based
• Uses tests of conditional independence
• Goal: Find the set of DAGs whose d-separation relations most closely match the results of conditional independence tests

Score-Based
• Uses scores such as the Bayesian Information Criterion or the Bayesian posterior
• Goal: Maximize score


Two Kinds Of Search

                                                 Constraint   Score
Uses non-conditional-independence information    No           Yes
Quantitative comparison of models                No           Yes
Single test result leads astray                  Yes          No
Easy to apply to latent variable models          Yes          No


Bayesian Information Criterion

$$\log P(D \mid \hat{\theta}_G, G) - \frac{d}{2} \log N$$

where D is the sample data, G is a DAG, θ̂_G is the vector of maximum likelihood estimates of the parameters for DAG G, N is the sample size, and d is the dimensionality of the model, which in DAGs without latent variables is simply the number of free parameters in the model.
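A minimal sketch of this score for discrete DAGs without latent variables, assuming data given as a list of dictionaries and hypothetical counts for two binary variables. It also illustrates a point made for unrestricted discrete models: X → Y and Y → X receive the same score.

```python
import math
from collections import Counter


def bic(data, parents, arity):
    """BIC of a discrete DAG: log P(D | theta_hat_G, G) - (d / 2) log N.

    data:    list of dicts mapping variable -> observed value
    parents: dict variable -> list of parents in G
    arity:   dict variable -> number of possible values
    """
    n = len(data)
    loglik, dim = 0.0, 0
    for v, pa in parents.items():
        family = Counter((tuple(r[p] for p in pa), r[v]) for r in data)
        parent_cfg = Counter(tuple(r[p] for p in pa) for r in data)
        # The maximum likelihood estimate of P(v | pa) is a ratio of counts.
        loglik += sum(c * math.log(c / parent_cfg[cfg]) for (cfg, _), c in family.items())
        # Free parameters: (arity - 1) for each parent configuration.
        dim += (arity[v] - 1) * math.prod(arity[p] for p in pa)
    return loglik - dim / 2 * math.log(n)


# Hypothetical data: 100 joint observations of two binary variables.
data = ([{"X": 0, "Y": 0}] * 40 + [{"X": 0, "Y": 1}] * 10
        + [{"X": 1, "Y": 0}] * 20 + [{"X": 1, "Y": 1}] * 30)
arity = {"X": 2, "Y": 2}
empty = bic(data, {"X": [], "Y": []}, arity)
x_to_y = bic(data, {"X": [], "Y": ["X"]}, arity)
y_to_x = bic(data, {"X": ["Y"], "Y": []}, arity)
print(round(empty, 2), round(x_to_y, 2), round(y_to_x, 2))
```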


3 Kinds of Alternative Causal Models

[Figure: four causal DAGs over SEX, SES, IQ, PE, CP, labeled True Model, Alternative 1, Alternative 2, and Alternative 3]


Alternative Causal Models

[Figure: True Model and Alternative 1]

Constraint-Based: Alternative 1 violates the Causal Markov Assumption by entailing that SES and IQ are independent.

Score-Based: Use a score that prefers a model that contains the true distribution over one that does not.


Alternative Causal Models

[Figure: True Model and Alternative 2]

Constraint-Based: Assume that if SEX and CP are independent (conditional on some subset of variables such as PE, SES, and IQ) then SEX and CP are not adjacent: the Causal Adjacency Faithfulness Assumption.

Score-Based: Use a score such that if two models contain the true distribution, it chooses the one with fewer parameters. The True Model has fewer parameters.


Both Assumptions Can Be False

True Model: the independence holds for all values of the parameters.

Alternative 2: the independence holds only for parameters on a lower-dimensional surface (Lebesgue measure 0).


When Not to Assume Faithfulness

Deterministic relationships between variables entail "extra" conditional independence relations, in addition to those entailed by the global directed Markov condition.

If A → B → C, and B = A, and C = B, then not only I(A,C|B), which is entailed by the global directed Markov condition, but also I(B,C|A), which is not.

The deterministic relations are theoretically detectable, and when present, faithfulness should not be assumed.

Do not assume faithfulness in feedback systems in equilibrium.
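The deterministic example can be checked with exact arithmetic. A minimal sketch, assuming A is uniform on {0, 1, 2} (a hypothetical choice): both I(A,C|B) and I(B,C|A) hold, while B and C are dependent unconditionally.

```python
from fractions import Fraction
from itertools import product

# A -> B -> C with deterministic links B = A and C = B; A uniform on {0, 1, 2}.
joint = {(a, a, a): Fraction(1, 3) for a in range(3)}


def p(**fixed):
    """Probability that the named variables (a, b, c) take the given values."""
    names = ("a", "b", "c")
    return sum(pr for k, pr in joint.items()
               if all(k[names.index(n)] == v for n, v in fixed.items()))


def cond_indep(x, y, z):
    """Check P(x, y | z) = P(x | z) P(y | z) for all values, exactly."""
    for vx, vy, vz in product(range(3), repeat=3):
        pz = p(**{z: vz})
        if pz == 0:
            continue
        lhs = p(**{x: vx, y: vy, z: vz}) / pz
        rhs = (p(**{x: vx, z: vz}) / pz) * (p(**{y: vy, z: vz}) / pz)
        if lhs != rhs:
            return False
    return True


print(cond_indep("a", "c", "b"))        # True: entailed by the Markov condition
print(cond_indep("b", "c", "a"))        # True: "extra" independence from determinism
print(p(b=0, c=0) == p(b=0) * p(c=0))   # False: B and C are dependent
```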


Alternative Causal Models

[Figure: True Model and Alternative 3]

Constraint-Based: Alternative 3 entails the same set of conditional independence relations as the True Model; there is no principled way to choose.


Alternative Causal Models

[Figure: True Model and Alternative 3]

Score-Based: Whether or not one can choose depends upon the parametric family. For unrestricted discrete or linear Gaussian models, there is no way to choose: the BIC scores will be the same. For linear non-Gaussian models, the True Model will be preferred (because while the two models entail the same second-order moments, they entail different fourth-order moments).


Patterns

A pattern (or p-dag) represents a set of DAGs that all have the same d-separation relations, i.e. a d-separation equivalence class of DAGs.

The adjacencies in a pattern are the same as the adjacencies in each DAG in the d-separation equivalence class.

An edge is oriented as A → B in the pattern if it is oriented as A → B in every DAG in the equivalence class.

An edge is left unoriented, A - B, in the pattern if the edge is oriented as A → B in some DAGs in the equivalence class, and as A ← B in other DAGs in the equivalence class.
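A pattern is well defined because d-separation equivalence has a simple graphical test: two DAGs are d-separation equivalent if and only if they have the same adjacencies and the same unshielded colliders (Verma and Pearl). A minimal sketch of that test, assuming DAGs stored as hypothetical child-to-parents dictionaries:

```python
def skeleton(parents):
    """Adjacencies of a DAG given as dict child -> list of parents."""
    return {frozenset((c, p)) for c, ps in parents.items() for p in ps}


def unshielded_colliders(parents):
    """Triples (a, c, b) with a -> c <- b and a, b not adjacent."""
    skel = skeleton(parents)
    return {(a, c, b) for c, ps in parents.items() for a in ps for b in ps
            if a < b and frozenset((a, b)) not in skel}


def markov_equivalent(g1, g2):
    """Same adjacencies and same unshielded colliders."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))


chain = {"A": [], "B": ["A"], "C": ["B"]}        # A -> B -> C
fork = {"A": ["B"], "B": [], "C": ["B"]}         # A <- B -> C
collider = {"A": [], "B": ["A", "C"], "C": []}   # A -> B <- C
print(markov_equivalent(chain, fork), markov_equivalent(chain, collider))  # True False
```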


Patterns to Graphs

All of the DAGs in a d-separation equivalence class can be derived from the pattern that represents the d-separation equivalence class by orienting the unoriented edges in the pattern.

Every orientation of the unoriented edges is acceptable as long as it creates no new unshielded colliders.

That is, A - B - C can be oriented as A → B → C, A ← B ← C, or A ← B → C, but not as A → B ← C.
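The three-node case can be enumerated directly. A minimal sketch, assuming edges stored as (tail, head) pairs; in larger patterns an implementation would also need to rule out directed cycles, which cannot arise here.

```python
from itertools import product

# The unoriented pattern A - B - C (A and C are not adjacent).
edges = [("A", "B"), ("B", "C")]
adjacent = {frozenset(e) for e in edges}


def new_colliders(oriented):
    """Triples x -> y <- z with x and z non-adjacent, for edges given as (tail, head)."""
    into = {}
    for tail, head in oriented:
        into.setdefault(head, []).append(tail)
    return [(x, y, z) for y, tails in into.items() for x in tails for z in tails
            if x < z and frozenset((x, z)) not in adjacent]


for flips in product([False, True], repeat=len(edges)):
    oriented = [(b, a) if flip else (a, b) for (a, b), flip in zip(edges, flips)]
    verdict = "new collider" if new_colliders(oriented) else "ok"
    print(" ".join(f"{t}->{h}" for t, h in oriented), verdict)
# A->B B->C ok
# A->B C->B new collider
# B->A B->C ok
# B->A C->B ok
```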


Patterns

[Figure: the DAGs in a D-separation Equivalence Class over SEX, SES, IQ, PE, CP, and the Pattern that represents them]


Search Methods

Constraint-Based:
• PC (correct in the limit)
• Variants of PC (correct in the limit, better on small sample sizes)

Score-Based:
• Greedy hill climbing
• Simulated annealing
• Genetic algorithms
• Greedy Equivalence Search (correct in the limit)
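The adjacency phase of PC can be sketched with an independence oracle standing in for statistical tests. A minimal sketch, assuming a hypothetical true DAG X → Z ← Y, Z → W whose conditional independences (under faithfulness) are listed by hand in `FACTS`; a real implementation would replace `indep` with a test on data and would apply further orientation rules after the collider step.

```python
from itertools import combinations

# Hypothetical oracle for the DAG X -> Z <- Y, Z -> W: each entry is
# ({a, b}, S) such that a and b are independent given S.
FACTS = {
    (frozenset("XY"), frozenset()),
    (frozenset("XW"), frozenset("Z")),
    (frozenset("XW"), frozenset("ZY")),
    (frozenset("YW"), frozenset("Z")),
    (frozenset("YW"), frozenset("ZX")),
}


def indep(a, b, s):
    return (frozenset((a, b)), frozenset(s)) in FACTS


def pc_skeleton(nodes, indep):
    """Adjacency phase of PC: drop a - b once some separating set is found."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in nodes):
        for a in nodes:
            for b in sorted(adj[a]):
                # Condition only on current neighbors of a (other than b).
                for s in combinations(sorted(adj[a] - {b}), depth):
                    if indep(a, b, s):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = set(s)
                        break
        depth += 1
    return adj, sepset


def colliders(adj, sepset):
    """a - c - b becomes a -> c <- b when a, b are non-adjacent and c is not in their sepset."""
    out = []
    for c in sorted(adj):
        for a, b in combinations(sorted(adj[c]), 2):
            if b not in adj[a] and c not in sepset[frozenset((a, b))]:
                out.append((a, c, b))
    return out


adj, sepset = pc_skeleton(["W", "X", "Y", "Z"], indep)
edges = sorted((a, b) for a in adj for b in adj[a] if a < b)
print(edges)                   # [('W', 'Z'), ('X', 'Z'), ('Y', 'Z')]
print(colliders(adj, sepset))  # [('X', 'Z', 'Y')]
```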


From Sets of DAGs to Effects of Manipulation

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]


Causal Inference in Patterns

Is P(IQ) invariant when SES is manipulated to a constant? Can't tell.
• If SES → IQ, then the policy variable is d-connected to IQ given the empty set: no invariance.
• If SES ← IQ, then the policy variable is not d-connected to IQ given the empty set: invariance.

[Figure: pattern over SEX, SES, IQ, PE, CP with a policy variable for SES]


Causal Inference in Patterns

Different DAGs represented by the pattern give different answers as to the effect of manipulating SES on IQ: the effect is not identifiable.

In these cases, the output should be "can't tell".

Note the difference from using Bayesian networks for classification: we can use either DAG equally well for correct classification, but we have to know which one is true for correct inference about the effect of a manipulation.

[Figure: pattern over SEX, SES, IQ, PE, CP with a policy variable for SES]


Causal Inference in Patterns

Is P(CP|PE,SES,IQ) invariant when PE is manipulated to a constant? Can tell. The policy variable is d-separated from CP given PE, SES, IQ regardless of which way the unoriented edge points, so invariance holds in every DAG represented by the pattern.

[Figure: pattern over SEX, SES, IQ, PE, CP with a policy variable for PE]


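College Plans

[Figure: causal DAG over SES, SEX, PE, CP, IQ]

$$
\begin{aligned}
P(cp \mid pe \,\|\, P'(PE)) &= \sum_{iq,ses} P(cp \mid pe, ses, iq \,\|\, P'(PE))\, P(iq \mid ses, pe \,\|\, P'(PE))\, P(ses \mid pe \,\|\, P'(PE))\\
&= \sum_{iq,ses} P(cp \mid pe, ses, iq \,\|\, P'(PE))\, P(iq \mid ses \,\|\, P'(PE))\, P(ses \,\|\, P'(PE))\\
&= \sum_{iq,ses} P(cp \mid pe, ses, iq)\, P(iq \mid ses)\, P(ses)
\end{aligned}
$$

P(cp | pe || P'(PE)) is not invariant, but it is identifiable; the factors in the last line are invariant.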


Good News

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]

In the large sample limit, there are algorithms (PC, Greedy Equivalence Search) that are arbitrarily close to correct (or output “can’t tell”) with probability 1 (pointwise consistency).


Bad News

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]

At every finite sample size, every method will be far from truth with high probability for some values of the truth (no uniform consistency.) (Typically not true of classification problems.)


Why Bad News?

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]

The problem - small differences in population distribution can lead to big changes in inference to causal DAGs.


Strengthening the Faithfulness Assumption: Strong versus Weak

Weak adjacency faithfulness assumes a zero conditional dependence between X and Y entails a zero-strength edge between X and Y

Strong adjacency faithfulness assumes in addition that a weak conditional dependence between X and Y entails a weak-strength edge between X and Y

Under this assumption, there are uniform consistent estimators of the effects of manipulations.


Obstacles to Causal Inference from Non-experimental Data
• unmeasured confounders
• measurement error, or discretization of data
• mixtures of different causal structures in the sample
• feedback
• reversibility
• the existence of a number of models that fit the data equally well
• an enormous search space
• low power of tests of independence conditional on large sets of variables
• selection bias
• missing values
• sampling error
• complicated and dense causal relations among sets of variables
• complicated probability distributions


From Data to Sets of DAGs - Possible Hidden Variables

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]


Why Latent Variable Models?

For classification problems, introducing latent variables can help get closer to the right answer at smaller sample sizes, but they are not needed to get the right answer in the limit.

For causal inference problems, latent variables are needed to get the right answer in the limit.


Score-Based Search Over Latent Models

• Structural EM interleaves estimation of parameters with structural search
• Can also search over latent variable models by calculating posteriors
• But there are substantial computational and statistical problems with latent variable models


DAG Models with Latent Variables

• Facilitates construction of causal models
• Provides a finite search space

'Nice' statistical properties, which DAG models with latent variables do not have in general:
• Always identified
• Correspond to a set of distributions characterized by independence relations
• Have a well-defined dimension
• Asymptotic existence of ML estimates


Solution

Embed each latent variable model in a 'larger' model without latent variables that is easier to characterize.

Disadvantage: uses only the conditional independence information in the distribution.

[Figure: sets of distributions; the Latent variable model is contained in the Model imposing only independence constraints on observed variables]


Alternative Hypothesis and Some D-separations

[Figure: DAG over SES, SEX, PE, CP, IQ with latent variables L1 and L2]

<CP,{IQ,L1,SEX}|{L2,PE,SES}>
<PE,{IQ,L2}|{L1,SEX,SES}>
<IQ,{SEX,PE,CP}|{L1,L2,SES}>
<SES,{SEX,IQ,L1,L2}|>
<L2,{SES,L1,SEX,PE}|>
<SEX,{L1,SES,L2,IQ}|>
<L1,{SES,L2,SEX}|>
<SEX,CP|{PE,SES}>

These entail conditional independence relations in the population.


D-separations Among Observed

[Figure: the same DAG over SES, SEX, PE, CP, IQ with latent variables L1 and L2, repeating the d-separations listed above]


D-separations Among Observed

It can be shown that no DAG with just the measured variables has exactly the set of d-separation relations among the observed variables. In this sense, DAGs are not closed under marginalization.

[Figure: DAG over SES, SEX, PE, CP, IQ with latent variables L1 and L2]


Mixed Ancestral Graphs

Under a natural extension of the concept of d-separation to graphs with bidirected edges (↔), MAG(G) is a graphical object that contains only the observed variables, and has exactly the d-separations among the observed variables.

[Figure: Latent Variable DAG over SES, SEX, PE, CP, IQ with latent variables L1 and L2, and the Corresponding MAG over the observed variables]


Mixed Ancestral Graph Construction

There is an edge between A and B if and only if every set C with <{A},{B}|C> contains a latent variable (that is, no set of observed variables d-separates A and B).

If A and B are adjacent, then A → B if and only if A is an ancestor of B.

If A and B are adjacent, then A ↔ B if and only if A is not an ancestor of B and B is not an ancestor of A.
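The two orientation rules reduce to ancestor tests. A minimal sketch, assuming a hypothetical DAG with one latent variable L (L → A, L → B, A → C, with A, B, C observed) and taking the adjacencies as given: A and B cannot be d-separated by any set of observed variables because of L, A and C are directly linked, and B and C are d-separated by {A}, so they are not adjacent. The code applies only the ancestor test; it does not search for separating sets.

```python
def ancestors_of(parents, v):
    """All ancestors of v in a DAG given as dict child -> list of parents."""
    seen, stack = set(), [v]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def mag_edge(parents, a, b):
    """Mark an (already adjacent) pair of observed variables in the MAG."""
    if a in ancestors_of(parents, b):
        return f"{a} -> {b}"
    if b in ancestors_of(parents, a):
        return f"{b} -> {a}"
    return f"{a} <-> {b}"


# Hypothetical DAG with latent L: L -> A, L -> B, A -> C; observed A, B, C.
dag = {"L": [], "A": ["L"], "B": ["L"], "C": ["A"]}
for a, b in [("A", "B"), ("A", "C")]:
    print(mag_edge(dag, a, b))
# A <-> B
# A -> C
```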


Suppose SES Unmeasured

[Figure: a DAG over SEX, PE, CP, IQ with latent variables L1 and L2 (SES unmeasured), its Corresponding MAG, and Another DAG with the same MAG]


Mixed Ancestral Models

• Can score and evaluate in the usual ways
• Not every parameter is directly interpreted as a structural (causal) coefficient
• Not every part of the marginal manipulated model can be predicted from the mixed ancestral graph
   Because multiple DAGs can have the same MAG, they might not all agree on the effect of a manipulation.
   It is possible to tell from the MAG when all of the DAGs with that MAG agree on the effect of a manipulation.


Mixed Ancestral Graph

Mixed ancestral models are closed under marginalization.

In the linear normal case, the parameterization of a MAG is just a special case of the parameterization of a linear structural equation model.

There is a maximum likelihood estimator of the parameters (Drton).

The BIC score is easy to calculate.

In the discrete case, it is not known how to parameterize a MAG, although some progress has been made.


Some Markov Equivalent Mixed Ancestral Graphs

[Figure: four mixed ancestral graphs over SEX, PE, CP, IQ]

These different MAGs all have the same d-separation relations.


Partial Ancestral Graphs

[Figure: a Partial Ancestral Graph over SEX, PE, CP, IQ (with o-marked edge ends) and the d-separation equivalent MAGs it represents]


Partial Ancestral Graph P represents MAG M:
• A is adjacent to B in P iff A and B are adjacent in M.
• A → B iff A is an ancestor of B in every MAG d-separation equivalent to M.
• A ↔ B iff A and B are not ancestors of each other in every MAG d-separation equivalent to M.
• A o→ B iff B is not an ancestor of A in every MAG d-separation equivalent to M, and A is an ancestor of B in some MAGs d-separation equivalent to M, but not in others.
• A o-o B iff A is an ancestor of B in some MAGs d-separation equivalent to M, but not in others, and B is an ancestor of A in some MAGs d-separation equivalent to M, but not in others.


Partial Ancestral Graph

Partial Ancestral Graph represents:
• ancestor features common to MAGs that are d-separation equivalent
• d-separation relations in the d-separation equivalence class of MAGs

Can be parameterized by turning it into a mixed ancestral graph.

Can be scored and evaluated like a MAG.


FCI Algorithm

• In the large sample limit, with probability 1, the output is a PAG that represents the true graph over O.
• If the algorithm needs to test high order conditional independence relations, then it is:
   Time consuming: the worst case number of conditional independence tests (complete PAG) is
   $$O\left(\binom{n}{2} 2^{n-2}\right)$$
   Unreliable (low power of tests).
• Modified versions can halt at any given order of conditional independence test, at the cost of more "Can't tell" answers.
• Not useful information when each pair of variables has a common hidden cause.
• There is a provably correct score-based search, but it outputs "can't tell" in most cases.
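To see how fast this bound grows, a minimal sketch that counts one test for every pair of variables and every subset of the remaining n - 2 variables, which is the reading of the bound assumed here:

```python
from math import comb


def worst_case_ci_tests(n):
    """C(n, 2) * 2**(n - 2): every pair of variables, tested given every
    subset of the remaining n - 2 variables."""
    return comb(n, 2) * 2 ** (n - 2)


for n in (5, 10, 20):
    print(n, worst_case_ci_tests(n))
# 5 80
# 10 11520
# 20 49807360
```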


Output for College Plans

[Figure: Output of FCI Algorithm, and PAG Corresponding to Output of PC Algorithm, each over SES, SEX, PE, CP, IQ]

These are different because no DAG can represent the d-separations in the output of the FCI algorithm.


From Sets of DAGs to Effects of Manipulations - May Be Hidden Common Causes

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]


Manipulation Model for PAGs

A PAG can be used to calculate the results of manipulations for which every DAG represented by the PAG gives the same answer.

It is possible to tell from the PAG that the policy variable for PE is d-separated from CP given PE. Hence P(CP|PE) is invariant.

[Figure: PAG output by FCI over SES, SEX, PE, CP, IQ]


Comparison with non-latent case

FCI: P(cp|pe||P'(PE)) = P(cp|pe)

PC: $$P(cp \mid pe \,\|\, P'(PE)) = \sum_{iq,ses} P(cp \mid pe, ses, iq)\, P(iq \mid ses)\, P(ses)$$

                              FCI     PC
P(CP=0|PE=0||P'(PE))          .063    .095
P(CP=1|PE=0||P'(PE))          .937    .905
P(CP=0|PE=1||P'(PE))          .572    .484
P(CP=1|PE=1||P'(PE))          .428    .516
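One way to read the table is through the difference in the probability of college plans (CP = 0 means yes) between high (PE = 1) and low (PE = 0) parental encouragement. A short computation using only the numbers in the table:

```python
# Numbers copied from the table above (CP coded yes = 0, no = 1).
fci = {"PE=0": 0.063, "PE=1": 0.572}  # P(CP=0 | PE || P'(PE)) under FCI
pc = {"PE=0": 0.095, "PE=1": 0.484}   # the same quantity under PC

for name, est in (("FCI", fci), ("PC", pc)):
    diff = est["PE=1"] - est["PE=0"]
    print(f"{name}: {diff:.3f}")
# FCI: 0.509
# PC: 0.389
```

Both methods agree on the direction of the effect, but the FCI-based estimate, which allows for hidden common causes, is larger.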


Good News

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]

In the large sample limit, there is an algorithm (FCI) whose output is arbitrarily close to correct (or output “can’t tell”) with probability 1 (pointwise consistency).


Bad News

[Figure: roadmap linking Sample, Population Distribution, Causal DAGs, and Effect of Manipulation; links labeled "Sampling and Distributional Assumptions, Prior", "Causal Axioms, Prior", and "Background Knowledge"]

At every finite sample size, every method will be arbitrarily far from truth with high probability for some values of the truth (no uniform consistency.)


Other Constraints

• The disadvantage of using MAGs or FCI is that they only use conditional independence information.
• In the case of latent variable models, there are constraints implied on the observed margin that are not conditional independence relations, regardless of the family of distributions.
   These can be used to choose between two different latent variable models that have the same d-separation relations over the observed variables.
• In addition, there are constraints implied on the observed margin that are particular to a family of distributions.


Examples of Open Questions

• Complete non-parametric manipulation calculations for partially known DAGs with latent variables
• Define strong faithfulness for the latent case
• Calculating constraints (non-parametric or parametric) from latent variable DAGs
• Using constraints (non-parametric or parametric) to guide search for latent variable DAGs
• Latent variable score-based search over PAGs
• Parameterizations of MAGs for other families of distributions
• Completeness of do-calculus for PAGs
• Time series inference


Introductory Books on Graphical Causal Inference

• Causation, Prediction, and Search, by P. Spirtes, C. Glymour, R. Scheines, MIT Press, 2000.
• Causality: Models, Reasoning, and Inference, by J. Pearl, Cambridge University Press, 2000.
• Computation, Causation, and Discovery, ed. by C. Glymour and G. Cooper, AAAI Press, 1999.