Causal Clustering of Variables with Multiple Latent Causes (More Theory than Applied) Peter Spirtes, Erich Kummerfeld, Richard Scheines, Joe Ramsey 1

Feb 23, 2016

Transcript
Page 1: An example

1

Causal Clustering of Variables with Multiple Latent Causes (More Theory than Applied)

Peter Spirtes, Erich Kummerfeld, Richard Scheines, Joe Ramsey

Page 2: An example

2

An example

Person 1

1. Stress
2. Depression
3. Religious Coping

Task: learn causal model

Data from Bongjae Lee, described in Silva et al. 2006

Page 3: An example

3

These variables cannot be measured directly.
They are estimated by asking people to answer questions, and constructing a model that relates the measured answers to the unobserved variables.

Problems:
What is the relationship between the measured variables and the latent variables to be estimated?
Some questions:
Might be caused by multiple latent variables
Might be caused by answers to previous questions
Might be caused by latent variables that are not being estimated

Example

Page 4: An example

4

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Example

This edge is not identifiable (unlike the single-factor case, where all of the latent connections are identifiable if the measurement model is simple).

Page 5: An example

5

A set of variables V is causally sufficient iff each cause that is a direct cause relative to V of any pair of variables in V, is also in V. It is minimal if the set formed by removing any latent variables is not causally sufficient.

Causal Sufficiency

Page 6: An example

6

[Figure: graph over the latent variables L1-L6 only]

Structural Graph

The structural graph has all and only the latent variables, and the edges between the latent variables.

Page 7: An example

7

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Measurement Graph

The measurement graph has a minimal causally sufficient set of variables, and all of the edges except the latent-latent edges.

Page 8: An example

8

A pure n-factor measurement model for an observed set of variables O is such that:
Each observed variable has exactly n latent parents.
No observed variable is an ancestor of any other observed variable or of any latent variable.

A set of observed variables O in a pure n-factor measurement model is a pure cluster if each member of the cluster has the same set of n parents.

Pure Measurement Models

Page 9: An example

9

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Impure Measurement Model

Strategy: (1) find a subset of variables for which (i) the measurement model is simple, and (ii) it is possible to determine that it is simple, without knowing the true structural model; (2) then find structural model.

Page 10: An example

10

[Figure: graph with latent variables L1-L4 and observed variables X1-X11]

Pure Measurement SubModel

Page 11: An example

Use of Pure Measurement Submodel

[Figure: graph with latent variables L1-L4 and observed variables X1-X11]

Actual Impure Measurement Model

Page 12: An example

Use of Pure Measurement Submodel

[Figure: graph with latent variables L1-L4 and observed variables X1-X11]

If the measurement model is treated as pure, no structural model will fit the data well.
But adding an L1 -> L3 edge may improve the fit because it allows for correlations between X1-X6 and X7-X11.

Assumed Pure Measurement Model

Page 13: An example

13

Causally unconnected variables are independent.
No observed variable is a cause of a latent variable.
No correlations are close to 0 or to 1 (pre-process).
All of the sub covariance matrices are invertible.
No feedback.
(In practice) There is a one-factor pure measurement submodel.
Each variable is a linear function of its parents in the graph + a noise term that is uncorrelated with any of the other noise terms (linear structural equation model).

Silva 06 (and others) Assumptions

Page 14: An example

14

Let ΣA,B be the submatrix of the covariance matrix Σ with rows from A and columns from B.

For each quartet of variables there are 3 different tetrad constraints: <1,2;3,4 > <1,3;2,4> <1,4;2,3>

Only two of the constraints are independent: any two entail the third.

Vanishing Tetrad Constraints
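
A minimal sketch of the three tetrads for one quartet. It assumes that <i,j;k,l> denotes the determinant of the 2x2 covariance submatrix with rows {i,j} and columns {k,l}, which is an assumption made here to match the submatrix notation. The helper names, the one-factor toy model and its loadings are hypothetical. The sketch also shows why only two of the three constraints are independent: the first and third tetrad always sum to the second.

```python
from fractions import Fraction

def tetrad(cov, rows, cols):
    """Determinant of the 2x2 submatrix of cov with the given rows and columns."""
    (i, j), (k, l) = rows, cols
    return cov[i][k] * cov[j][l] - cov[i][l] * cov[j][k]

def one_factor_cov(loadings, noise_vars):
    """Covariance implied by X_i = a_i * L + e_i with var(L) = 1 (toy model)."""
    n = len(loadings)
    return [[loadings[r] * loadings[c] + (noise_vars[r] if r == c else 0)
             for c in range(n)] for r in range(n)]

# Hypothetical loadings for four indicators of a single latent (0-based indices).
a = [Fraction(2), Fraction(3, 2), Fraction(1, 2), Fraction(4)]
cov = one_factor_cov(a, [Fraction(1)] * 4)

t1 = tetrad(cov, (0, 1), (2, 3))  # <1,2;3,4>
t2 = tetrad(cov, (0, 2), (1, 3))  # <1,3;2,4>
t3 = tetrad(cov, (0, 3), (1, 2))  # <1,4;2,3>

# An arbitrary symmetric matrix: the tetrads need not vanish,
# but the first and third still sum to the second.
arbitrary = [[5, 1, 2, 3], [1, 6, 4, 7], [2, 4, 8, 9], [3, 7, 9, 10]]
u1 = tetrad(arbitrary, (0, 1), (2, 3))
u2 = tetrad(arbitrary, (0, 2), (1, 3))
u3 = tetrad(arbitrary, (0, 3), (1, 2))
```

In the one-factor toy model all three tetrads vanish; in the arbitrary matrix none does, yet u1 + u3 equals u2, so any two vanishing tetrads force the third.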

Page 15: An example

15

For each sextuple of variables there are 10 different sextad constraints: <1,2,3;4,5,6> <1,2,4;3,5,6> <1,2,5;3,4,6> <1,2,6;3,4,5> <1,3,4;2,5,6> <1,3,5;2,4,6> <1,3,6;2,4,5> <1,4,5;2,3,6> <1,4,6;2,3,5> <1,5,6;2,3,4>

Vanishing sextad constraints
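
The count of 10 comes from splitting six variables into two unordered triples. A short sketch that enumerates them (the function name is hypothetical):

```python
from itertools import combinations

def sextads(six):
    """All splits of six variables into two unordered triples, written (A, B)."""
    six = sorted(six)
    result = []
    # Fixing the smallest variable in A avoids listing each split twice.
    for rest in combinations(six[1:], 2):
        a = (six[0],) + rest
        b = tuple(v for v in six if v not in a)
        result.append((a, b))
    return result

all_sextads = sextads([1, 2, 3, 4, 5, 6])
```

The enumeration order matches the list on the slide, starting with <1,2,3;4,5,6> and ending with <1,5,6;2,3,4>.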

Page 16: An example

16

An algebraic constraint is linearly entailed by a DAG if it is true of the implied covariance matrix for every value of the free parameters (the linear coefficients and the variances of the noise terms).

Entailed Algebraic Constraints

Page 17: An example

17

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

A trek in G from i to j is an ordered pair of directed paths (P1; P2) where P1 has sink i, P2 has sink j, and both P1 and P2 have the same source k.

(L5,X13;L5,X14); (L6,X13;L6,X14); (X13;X13,X14)

Simple Treks

Page 18: An example

18

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

The two paths of a simple trek intersect only at the source.

(L5,X13;L5,X14); (L6,X13;L6,X14); (X13;X13,X14)

X13 side; X14 side

Simple Treks
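
A rough sketch of trek enumeration and the simple-trek test. The edge set is a hypothetical fragment (L5 -> X13, L5 -> X14, L6 -> X13, L6 -> X14, X13 -> X14), chosen only because it is consistent with the three treks listed above; the full graph in the figure is not recoverable from the transcript. Function names are also hypothetical.

```python
def directed_paths(edges, source, sink):
    """All directed paths from source to sink in a DAG given as {node: [children]}."""
    if source == sink:
        return [(source,)]
    paths = []
    for child in edges.get(source, []):
        for tail in directed_paths(edges, child, sink):
            paths.append((source,) + tail)
    return paths

def treks(edges, nodes, i, j):
    """All treks (P1, P2): P1 runs from a common source k to i, P2 from k to j."""
    return [(p1, p2)
            for k in nodes
            for p1 in directed_paths(edges, k, i)
            for p2 in directed_paths(edges, k, j)]

def is_simple(trek):
    """A trek is simple when its two paths share only the source."""
    p1, p2 = trek
    return set(p1) & set(p2) == {p1[0]}

# Hypothetical fragment of the example graph.
edges = {"L5": ["X13", "X14"], "L6": ["X13", "X14"], "X13": ["X14"]}
nodes = ["L5", "L6", "X13", "X14"]

all_treks = treks(edges, nodes, "X13", "X14")
simple_treks = [t for t in all_treks if is_simple(t)]
```

In this fragment there are five treks from X13 to X14, of which three are simple. The two non-simple ones reach X14 through X13, so both paths contain X13 as well as the source.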

Page 19: An example

19

Two-Factor Model

A = {1,2,3}, B = {4,5,6}, CA = {L1}, CB = {L2}

A is t-separated from B by <CA,CB> ->

Page 20: An example

20

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Let A, B, CA, and CB be four subsets of V(G) which need not be disjoint. The pair (CA;CB) trek separates (or t-separates) A from B if for every trek (P1; P2) from a vertex in A to a vertex in B, either P1 contains a vertex in CA or P2 contains a vertex in CB.

T-separation
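
The definition can be checked by brute force on a small graph. The sketch below assumes a hypothetical pure two-factor cluster in which L1 and L2 are each parents of X1-X6 and L1 -> L2; it is not the graph in the figure, and the function names are illustrative.

```python
def directed_paths(edges, source, sink):
    """All directed paths from source to sink in a DAG given as {node: [children]}."""
    if source == sink:
        return [(source,)]
    return [(source,) + tail
            for child in edges.get(source, [])
            for tail in directed_paths(edges, child, sink)]

def t_separates(edges, nodes, ca, cb, a, b):
    """True if every trek (P1, P2) from A to B has P1 meeting CA or P2 meeting CB."""
    for i in a:
        for j in b:
            for k in nodes:  # k is the common source of the trek
                for p1 in directed_paths(edges, k, i):
                    for p2 in directed_paths(edges, k, j):
                        if not (set(p1) & ca or set(p2) & cb):
                            return False
    return True

# Hypothetical pure two-factor cluster.
xs = [f"X{n}" for n in range(1, 7)]
edges = {"L1": ["L2"] + xs, "L2": xs}
nodes = ["L1", "L2"] + xs

A = ["X1", "X2", "X3"]
B = ["X4", "X5", "X6"]
separated = t_separates(edges, nodes, {"L1"}, {"L2"}, A, B)
not_separated = t_separates(edges, nodes, {"L1"}, set(), A, B)
```

With CA = {L1} and CB = {L2}, every trek is blocked on one side or the other. Dropping L2 from CB leaves the treks with source L2 unblocked.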

Page 21: An example

21

The submatrix ΣA,B has rank less than or equal to r for all covariance matrices consistent with the graph G if and only if there exist subsets CA, CB of V(G) with #CA + #CB ≤ r such that (CA,CB) t-separates A from B. Consequently, rk(ΣA,B) ≤ min{#CA + #CB : (CA,CB) t-separates A from B},

and equality holds for generic covariance matrices consistent with G (all parameter values outside a set of Lebesgue measure zero).

If the rank of a submatrix is n, then every (n+1) x (n+1) minor of that submatrix is zero.

Choke Set Theorem
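
A small numerical illustration of the rank bound, assuming a hypothetical linear two-factor model with integer parameters so that the arithmetic is exact. The loadings, latent covariance and helper names are made up for the example. With choke sets of total size 2 between {X1,X2,X3} and {X4,X5,X6}, the 3x3 cross-covariance block has rank at most 2, so its determinant (a sextad) vanishes, while a 2x2 minor of the same block does not.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total

def implied_cov(loadings, latent_cov, noise_vars):
    """Sigma = Lambda * Phi * Lambda^T + diag(noise) for a linear measurement model."""
    n, k = len(loadings), len(latent_cov)
    cov = [[sum(loadings[r][p] * latent_cov[p][q] * loadings[c][q]
                for p in range(k) for q in range(k))
            for c in range(n)] for r in range(n)]
    for r in range(n):
        cov[r][r] += noise_vars[r]
    return cov

def submatrix(cov, rows, cols):
    """Rows and columns of cov selected by 0-based index lists."""
    return [[cov[r][c] for c in cols] for r in rows]

# Hypothetical parameters: six indicators, two correlated latents.
loadings = [[1, 2], [3, 1], [2, 5], [4, 1], [1, 3], [2, 2]]
latent_cov = [[2, 1], [1, 3]]
cov = implied_cov(loadings, latent_cov, [1] * 6)

sextad = det(submatrix(cov, [0, 1, 2], [3, 4, 5]))  # <1,2,3;4,5,6>
minor_2x2 = det(submatrix(cov, [0, 1], [3, 4]))     # a 2x2 minor of the same block
```

Here the sextad is exactly 0 while the 2x2 minor is -275, so the cross-covariance block has rank exactly 2 for these parameter values.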

Page 22: An example

22

Algebraic Constraint Faithfulness Assumption: If an algebraic constraint holds in the population distribution, then it is linearly entailed to hold by the causal DAG.
Partial correlations
Tetrads
Sextads

Strong Faithfulness Assumption (for finite sample sizes): A causal DAG does not have parameters such that sextads that are not entailed to vanish are very close to zero.

Algebraic Constraint Faithfulness Assumption

Page 23: An example

23

Violations of the Algebraic Faithfulness Assumption have Lebesgue measure 0.
There is a lower dimensional surface in the space of parameters on which faithfulness is violated.

Violations of the Strong Algebraic Faithfulness Assumption do not have Lebesgue measure 0.
The set of parameters on which faithfulness is almost violated is not lower dimensional than the space of parameters.

As the number of variables grows, the probability of some violation of faithfulness becomes large.

Algebraic Constraint Faithfulness Assumption

Page 24: An example

24

Advantages
No need for estimation of model.
No iterative algorithm.
No local maxima.
No problems with identifiability.
Fast to compute.

Disadvantages
Does not contain information about inequalities.
Power and accuracy of tests?
Difficulty in determining implications among constraints.

Advantages and Disadvantages of Algebraic Constraints

Page 25: An example

25

Input: data from the observed variables in a linear model.
Output: set of variables that appear in an (almost) pure measurement model, clustered into (almost) pure subsets.

We haven't defined almost pure (not in the Silva 06 sense): there is a list of impurities that can't be detected by constraint search, but we don't know whether it is complete.

The basic idea can be applied, with trivial modifications (in theory), to arbitrary numbers of latent parents, using different constraints.

FindTwoFactorClusters: Algorithm Sketch (from Kummerfeld)

Page 26: An example

26

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Complete Sextet – All 10 sextads hold

<1,2,3;4,5,6> <1,2,4;3,5,6> <1,2,5;3,4,6> <1,2,6;3,4,5> <1,3,4;2,5,6> <1,3,5;2,4,6> <1,3,6;2,4,5> <1,4,5;2,3,6> <1,4,6;2,3,5> <1,5,6;2,3,4>

Page 27: An example

27

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Complete Sextet – All 10 sextads hold

<1,2,3;4,5,8> <1,2,4;3,5,8> <1,2,5;3,4,8> <1,2,8;3,4,5> <1,3,4;2,5,8> <1,3,5;2,4,8> <1,3,8;2,4,5> <1,4,5;2,3,8> <1,4,8;2,3,5> <1,5,8;2,3,4>

Page 28: An example

28

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

<X13,X14> do not appear together in any entailed sextad.

Remove one of the variables.
Heuristic: remove the variable which appears in the fewest sextads that hold.

1. Remove one of pair of variables that appear in no sextads that hold
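
A sketch of this removal heuristic, under the assumption that the sextads judged to hold are already available as a list of (triple, triple) pairs. The toy input below is hypothetical and is not derived from the example graph; the tie-breaking rule (drop the second variable of the pair) is also an assumption.

```python
from itertools import combinations

def sextads(six):
    """All splits of six variables into two unordered triples."""
    six = sorted(six)
    out = []
    for rest in combinations(six[1:], 2):
        a = (six[0],) + rest
        out.append((a, tuple(v for v in six if v not in a)))
    return out

def variable_to_remove(variables, holding_sextads):
    """For the first pair that co-occurs in no holding sextad, return the member
    appearing in fewer holding sextads (the second member on ties); None if
    every pair co-occurs somewhere."""
    members = [set(a) | set(b) for a, b in holding_sextads]
    count = {v: sum(v in m for m in members) for v in variables}
    for x, y in combinations(variables, 2):
        if not any(x in m and y in m for m in members):
            return x if count[x] < count[y] else y
    return None

# Hypothetical input: all sextads over 1..6 hold, plus one sextad involving 7.
variables = [1, 2, 3, 4, 5, 6, 7]
holding = sextads([1, 2, 3, 4, 5, 6]) + [((1, 2, 3), (4, 5, 7))]
removed = variable_to_remove(variables, holding)
```

In the toy input, variables 6 and 7 never appear together in a holding sextad, and 7 appears in only one holding sextad against ten for 6, so 7 is the one removed.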

Page 29: An example

29

[Figure: graph with latent variables L1-L6 and observed variables X1-X13]

<X13,X14> do not appear together in any entailed sextad.

Remove one of the variables.
Heuristic: remove the variable which appears in the fewest sextads that hold.

1. Remove one of pair of variables that appear in no sextads that hold

Page 30: An example

30

A subset of 5 variables is a good pentuple iff, when any sixth variable is added to the pentuple, the resulting sextuple is complete.

Good Pentuple
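
A sketch of the good-pentuple test. In practice completeness would be decided by statistical tests of the 10 sextads; here it is replaced by a hand-coded toy oracle, which is purely an assumption chosen to mimic the running example (clusters X1-X6 and X7-X12, with X12 and X13 sharing an extra latent). All names are hypothetical.

```python
def is_good_pentuple(pentuple, variables, is_complete):
    """A pentuple is good iff adding any sixth variable yields a complete sextuple."""
    base = frozenset(pentuple)
    return all(is_complete(base | {v}) for v in variables if v not in base)

# Toy stand-in for the sextad tests (an assumption, not the real procedure).
CLUSTERS = [set(range(1, 7)), set(range(7, 13))]

def toy_is_complete(sextuple):
    """Complete iff at least five members share a cluster and the impure
    pair X12, X13 is not present together."""
    if {12, 13} <= sextuple:
        return False
    return any(len(sextuple & c) >= 5 for c in CLUSTERS)

variables = range(1, 14)  # X1..X13, after X14 has been removed

good_a = is_good_pentuple((1, 2, 3, 4, 5), variables, toy_is_complete)
good_b = is_good_pentuple((1, 2, 3, 4, 7), variables, toy_is_complete)
good_c = is_good_pentuple((7, 8, 9, 10, 12), variables, toy_is_complete)
good_d = is_good_pentuple((7, 8, 9, 10, 11), variables, toy_is_complete)
```

Under this toy oracle, <1,2,3,4,5> and <7,8,9,10,11> are good pentuples, while <1,2,3,4,7> and <7,8,9,10,12> are not.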

Page 31: An example

31

[Figure: graph with latent variables L1-L6 and observed variables X1-X13]

2. Find all good pentuples

<1,2,3,4,5,6> <1,2,3,4,5,7 > <1,2,3,4,5,8 > <1,2,3,4,5,9 > <1,2,3,4,5,10 > <1,2,3,4,5,11 > <1,2,3,4,5,12 > <1,2,3,4,5,13>

Any subset of X1-X6 with 5 variables is a good pentuple

Page 32: An example

32

[Figure: graph with latent variables L1-L6 and observed variables X1-X13]

<1,2,3,4,7> is not a good pentuple

<1,2,3,4,7,6> <1,2,3,4,7,5 > <1,2,3,4,7,8 > <1,2,3,4,7,9 > <1,2,3,4,7,10 > <1,2,3,4,7,11 > <1,2,3,4,7,12 > <1,2,3,4,7,13>

Page 33: An example

33

<7,8,9,10,12,1> <7,8,9,10,12,2> <7,8,9,10,12,3> <7,8,9,10,12,4> <7,8,9,12,11,5> <7,8,9,12,11,6> <7,8,9,10,12,11> <7,8,9,10,12,13>

[Figure: graph with latent variables L1-L6 and observed variables X1-X13]

<7,8,9,10,12> is not a good pentuple

Page 34: An example

34

For a given set of variables, if all subsets of 5 are good pentuples, merge them.

All subsets of size 5 of X1-X6 are good pentuples, so merge.

[Figure: graph with latent variables L1-L6 and observed variables X1-X13]

3. Merge Good Pentuples

Page 35: An example

35

[Figure: graph with latent variables L1-L6 and observed variables X1-X13]

<7,8,9,10,11> is a good pentuple

<7,8,9,10,11,1> <7,8,9,10,11,2> <7,8,9,10,11,3> <7,8,9,10,11,4> <7,8,9,10,11,5> <7,8,9,10,11,6> <7,8,9,10,11,12> <7,8,9,10,11,13>

Page 36: An example

36

X12 and X13 do not appear in any good pentuples. If X13 is removed, all subsets of size 5 of X7-X12 become good pentuples, so they are merged. (Similarly for X12.)

[Figure: graph with latent variables L1-L6 and observed variables X1-X12]

4. Check whether leftover variables should be removed, and repeat previous
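
Steps 2 to 4 can be sketched end to end. The greedy merge below is one possible reading of "if all subsets of 5 are good pentuples, merge them", not necessarily the procedure used in FindTwoFactorClusters, and the completeness oracle is again a hand-coded toy (an assumption mimicking the running example, with X12 and X13 sharing an extra latent).

```python
from itertools import combinations

CLUSTERS = [set(range(1, 7)), set(range(7, 13))]

def toy_is_complete(sextuple):
    """Toy stand-in for testing that all 10 sextads of a sextuple hold."""
    if {12, 13} <= sextuple:
        return False
    return any(len(sextuple & c) >= 5 for c in CLUSTERS)

def good_pentuples(variables, is_complete):
    """All 5-subsets that stay complete when any sixth variable is added."""
    variables = list(variables)
    good = set()
    for p in combinations(variables, 5):
        base = frozenset(p)
        if all(is_complete(base | {v}) for v in variables if v not in base):
            good.add(base)
    return good

def merge_clusters(variables, is_complete):
    """Grow each good pentuple into a set whose 5-subsets are all good pentuples."""
    variables = list(variables)
    good = good_pentuples(variables, is_complete)
    clusters = []
    for seed in sorted(good, key=sorted):
        if any(seed <= c for c in clusters):
            continue  # already covered by an earlier cluster
        cluster = set(seed)
        for v in variables:
            candidate = cluster | {v}
            if v not in cluster and all(
                    frozenset(s) in good for s in combinations(candidate, 5)):
                cluster = candidate
        clusters.append(cluster)
    return clusters

with_x13 = merge_clusters(range(1, 14), toy_is_complete)     # X1..X13
without_x13 = merge_clusters(range(1, 13), toy_is_complete)  # X13 removed
```

With X13 present, the toy run yields X1-X6 and only X7-X11, since X12 appears in no good pentuple. After X13 is removed, every 5-subset of X7-X12 becomes good and the second cluster grows to X7-X12.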

Page 37: An example

37

We can (conceptually) remove L5 because it is not needed to make a causally sufficient set. However, L6 has to remain, and X7-X12 is not pure by our definition because X12 has 3 latent parents.

[Figure: graph with latent variables L1, L2, L3, L4, L6 and observed variables X1-X12]

4. Check whether leftover variables should be removed, and repeat previous

Page 38: An example

38

Collider Model – Impure Cluster, but Complete Sextet

Choke sets <{L1},{L7}> where L7 is on the X6 side

Page 39: An example

39

Spider Model – Impure Cluster, but Complete Sextet

Choke sets <{L1},{L1}>

Page 40: An example

40

However, the spider model and the collider model do not receive the same chi-squared score when estimated, so in principle they can be distinguished from a 2-factor model.
Expensive.
Requires multiple restarts.
Need to test only pure clusters.
If non-Gaussian, may be able to detect additional impurities.

Checking with Estimated Model

Page 41: An example

41

For sextads, the first step is to check 10 * (n choose 6) sextads.

However, in a large proportion of social science contexts, there are at most 100 observed variables and 15 or 16 latents.
If the data are based on questionnaires, one generally can't get people to answer more questions than that.

Simulation studies by Kummerfeld indicate that, given the vanishing sextads, the rest of the algorithm is subexponential in the number of clusters, but exponential in the size of the clusters.

Complexity
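
To give a sense of scale for the first step, the count can be computed directly (the function name is hypothetical):

```python
from math import comb

def sextad_count(n):
    """Number of sextads over n observed variables: 10 per sextuple."""
    return 10 * comb(n, 6)

count_100 = sextad_count(100)
```

For n = 100 this is 11,920,524,000 sextads, roughly 1.2 x 10^10.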

Page 42: An example

42

Problems in Testing Constraints

Tests require (algebraic) independence among constraints.

Additional complication: when some correlations or partial correlations are non-zero, additional dependencies among constraints arise.

Some models entail that neither of a pair of sextad constraints vanishes, but that they are equal to each other.

Page 43: An example

43

For single-factor submodels, the algorithm can be applied to more than a hundred measured variables, with accuracy comparable to the Silva 06 algorithm.

Preliminary Results

Page 44: An example

44

3 latents, 6 measures, 1 crossconstruct impurity, 2 direct edge impurities, 20 trials

# 2 clusters: 15/20
# 1 cluster: 5/20
# 0 clusters: 2/20
Average misassigned: 1
Average left out if 2 clusters: 1
Average impurities left in: .1

Sanity Check Simulation for 2-Factor

Page 45: An example

45

[Figure: graph with latent variables L1-L6 and observed variables X1-X14]

Extension to Non-linearity

Theory: As long as parts (choke sets to observed) of the graph are linear with additive noise, t-separation theorem still holds.

Practice: The algorithm can be applied (with same caveats) even if the structural model is non-linear or has feedback.

Page 46: An example

46

Described an algorithm that relies on weakened assumptions:
Weakened the linearity assumption to linearity below the latents.
Weakened the assumption of existence of pure submodels to existence of n-pure submodels.
Conjectured to be correct if we add assumptions of no star or collider models, and faithfulness of constraints.
Is there reason to believe in faithfulness of constraints when there are non-linear relationships among the latents?

Summary

Page 47: An example

47

Give a complete list of assumptions for the output of the algorithm to be pure.
Speed up the algorithm.
Modify the algorithm to deal with almost unfaithful constraints as much as possible.
Add a structure learning component to the output of the algorithm.
Silva: Gaussian process model among latents, linearity below latents.
Identifiability questions for structural models with pure measurement models.

Open Problems

Page 48: An example

48

Silva, R. (2010). Gaussian Process Structure Models with Latent Variables. Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI-10).

Silva, R., Scheines, R., Glymour, C., & Spirtes, P. (2006a). Learning the structure of linear latent variable models. J Mach Learn Res, 7, 191-246.

Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Ann Stat, 38(3), 1665-1685.

References

Page 49: An example

49

3 latents, 6 measures, 1 crossconstruct impurity, 2 direct edge impurities, 10 trials

Sanity Check Simulation

Cluster 1   Cluster 2   Cluster 3   Impurities
5/6         4/6         4/5         2
3/5         4/6         4/5         1
3/5         4/6         4/5         2
5/6         4/6         4/5         2
6/6         6/6         -           3
3/6         3/5         -           1
3/5         3/6         -           2
-           -           -           3
5/6         -           -           3
3/6         -           -           3

Page 50: An example

50

3 latents, 6 measures, 10 trials

Sanity Check Simulation

Clusters +   Clusters -   Unassigned   Misassigned
0            0            4            2
1            1            10           2
0            0            4            2
0            0            4            3
0            1            10           2
0            0            4            4
0            0            4            4
1            0            3            1
0            0            4            1
0            0            4            2

Page 51: An example

51

Main Example

Sanity Check Simulation

Clusters +   Clusters -   Unassigned   Misassigned   Impure
0            0            1            1             0
1            1            10           2
0            0            4            2
0            0            4            3
0            1            10           2
0            0            4            4
0            0            4            4
1            0            3            1
0            0            4            1
0            0            4            2

Page 52: An example

52

3 latents, 6 measures, 1 crossconstruct impurity, 2 direct edge impurities, 10 trials

Sanity Check Simulation for 2-Factor

Unassigned   Misassigned   Impurities Missed
6            1             0
1            0             0
6            0             0
1            0             0
2            1             0
1            2             0
10           0             0
10           0             0
0            0             0
7            1             0

Page 53: An example

53

Suppose A = {X2,X3}, B = {X4,X5}, CA = {L1}, CB =

X1 = 2 L1 + f1(e1)
X2 = 3 X1 + f2(e2,X6)
X3 = 0.8 L1 + f3(e3)
X4 = 0.6 L1 + f4(e4)
X5 = 0.9 L1 + f5(e5)

D(CA,A) = {X1,X2,X3}, D(CB,B) =

Illustration of Linearity Below Choke Set

Page 54: An example

54

Theorem: Suppose G is a directed graph containing CA , A, CB , and B, <CA ,CB > t-separates A and B, and A and B are linear below their choke sets CA and CB . Then rank(cov(A,B)) ≤ #CA + #CB .

Theorem 2: Suppose G is a directed graph containing CA , A, CB , and B, and A and B are linear below CA, CB but <CA ,CB > does not t-separate A and B. Then there is a covariance matrix compatible with the graph in which rank(cov(A,B)) > #CA + #CB .

Proof: This follows from Sullivant et al. for linear models.

Question: Is there a natural sense in which the set of parameters for which rank(cov(A,B)) ≤ #CA + #CB is of measure 0 if it is not entailed by t-separation, even for the non-linear case?

Extension of Choke Point Theorem