Talk: Joint causal inference on observational and experimental data - NIPS 2016 "What If?" workshop poster

Feb 14, 2017

Sara Magliacane
Transcript
Page 1: Talk: Joint causal inference on observational and experimental data - NIPS 2016 "What If?" workshop poster

Joint causal inference on observational and experimental datasets

Sara Magliacane, Tom Claassen, Joris M. Mooij

[email protected]

What If?, 10th December, 2016

Sara Magliacane (VU, UvA) Joint causal inference 10-12-2016 1 / 21

Part I

Introduction

Causal inference: learning causal relations from data

Definition

X causes Y ($X \dashrightarrow Y$) = intervening upon (changing) X changes Y

We can represent causal relations with a causal DAG (possibly with hidden variables):

[Figure: causal DAG over X, Y and Z]

E.g. X = Smoking, Y = Cancer, Z = Genetics

Causal inference = structure learning of the causal DAG

Traditionally, causal relations are inferred from interventions.

Sometimes, interventions are unethical, infeasible or too expensive.
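To make the interventional definition concrete, here is a minimal Python sketch. It assumes a toy linear model in which Z confounds X and Y and X also affects Y directly. The edges and coefficients are assumptions, since the slide's figure does not fix them. Intervening on X recovers the causal effect of 0.5, while the observational regression slope is inflated by the confounder.

```python
import random


def sample(n, do_x=None, seed=0):
    """Draw n samples (z, x, y) from a toy linear SCM with Z -> X, Z -> Y and X -> Y.

    If do_x is given, X is set to that value for every sample (a perfect
    intervention), which cuts the Z -> X mechanism.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                # Z, e.g. genetics
        if do_x is None:
            x = z + rng.gauss(0.0, 1.0)        # X, e.g. smoking, driven by Z
        else:
            x = do_x                           # intervention do(X = do_x)
        y = 0.5 * x + z + rng.gauss(0.0, 1.0)  # Y, e.g. cancer risk
        samples.append((z, x, y))
    return samples


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Interventional contrast: the true causal effect of X on Y in this model is 0.5.
effect = (mean(y for _, _, y in sample(20000, do_x=1.0, seed=1))
          - mean(y for _, _, y in sample(20000, do_x=0.0, seed=2)))

# Observational regression slope of Y on X: confounded by Z, close to 1.0 here.
obs = sample(20000, seed=3)
mx = mean(x for _, x, _ in obs)
my = mean(y for _, _, y in obs)
slope = (mean((x - mx) * (y - my) for _, x, y in obs)
         / mean((x - mx) ** 2 for _, x, _ in obs))
```

With these assumed coefficients, the observational slope comes out near twice the interventional effect. This is why confounding makes purely observational conclusions risky.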

Causal inference from observational and experimental data

Holy Grail of Causal Inference

Learn as much causal structure as possible from observations, integrating background knowledge and experimental data.

Current causal inference methods:

Score-based: evaluate models using a penalized likelihood score

Constraint-based: use statistical independences to express constraints over possible causal models

Advantage of constraint-based methods:

can handle latent confounders naturally
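Constraint-based methods need a conditional independence test. One common choice for roughly Gaussian data is a Fisher z-test on the partial correlation; this is not necessarily the test used in this work. The sketch below restricts the conditioning set to at most one variable to stay short, and the chain X → W → Y is a hypothetical example.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z=None):
    """Correlation of x and y, or their partial correlation given one variable z."""
    rxy = corr(x, y)
    if z is None:
        return rxy
    rxz, ryz = corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for H0: x independent of y (given z), assuming Gaussian data."""
    k = 0 if z is None else 1                  # size of the conditioning set
    r = partial_corr(x, y, z)
    r = max(min(r, 0.999999), -0.999999)       # keep atanh finite
    stat = math.sqrt(len(x) - k - 3) * math.atanh(r)
    phi = 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)


# Hypothetical chain X -> W -> Y: X and Y are dependent, but independent given W.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ws = [x + rng.gauss(0, 1) for x in xs]
ys = [w + rng.gauss(0, 1) for w in ws]
p_marginal = fisher_z_pvalue(xs, ys)         # tiny: dependence detected
p_conditional = fisher_z_pvalue(xs, ys, ws)  # typically large: no evidence against X ⊥⊥ Y | W
```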

Joint inference on observational and experimental data

Advantage of score-based methods:

can formulate joint inference on observational and experimental data and learn the targets of interventions, e.g. [Eaton and Murphy, 2007].

[Figure 6 from Eaton and Murphy (2007): models of the T-cell signalling data. (a) A partial model of the T-cell pathway as currently accepted by biologists, with interventions marked as activators or inhibitors, from [SPP+05]; (b) edges with marginal probability above 0.5 as estimated by [SPP+05]; (c) edges with marginal probability above 0.5 estimated assuming known perfect interventions; (d) edges with marginal probability above 0.5 estimated assuming uncertain, imperfect interventions and a fan-in bound of k = 2, including edges from the intervention nodes.]

Example from [Eaton and Murphy, 2007]

Goal: Can we perform joint inference using constraint-based methods?

Part II

Joint Causal Inference

Joint Causal Inference: Assumptions

Idea: Jointly model several observational or experimental datasets $\{D_r\}_{r \in \{1, \dots, n\}}$ with zero or more, possibly unknown, intervention targets.

We assume a unique underlying causal DAG across datasets, defined over system variables $\{X_j\}_{j \in \mathcal{X}}$ (some of which are possibly hidden).

Example

[Figure: the same causal DAG over X1, X2 and X3 underlies three datasets: D1 and D2 with unknown interventions, D3 with the intervention do(X1).]

Note: this setting cannot handle certain intervention types, e.g., perfect interventions.

Joint Causal Inference: SCM

We introduce two types of dummy variables in the data:

a regime variable R, indicating which dataset $D_r$ a data point is from

intervention variables $\{I_i\}_{i \in \mathcal{I}}$, which are functions of R

We assume that we can represent the whole system as an acyclic SCM:

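$$R = E_R,$$

$$I_i = g_i(R), \quad i \in \mathcal{I},$$

$$X_j = f_j\big(X_{\mathrm{pa}(X_j) \cap \mathcal{X}},\, I_{\mathrm{pa}(X_j) \cap \mathcal{I}},\, E_j\big), \quad j \in \mathcal{X},$$

$$P\big((E_k)_{k \in \mathcal{X} \cup \{R\}}\big) = \prod_{k \in \mathcal{X} \cup \{R\}} P(E_k).$$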
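Below is a minimal simulation of an SCM of this form. The maps g_1 and g_2, the mechanisms f_1 and f_2, and the graph are all hypothetical. The point is structural: R is drawn from its own noise, each I_i is a deterministic function of R, and each system variable depends only on its parents plus independent noise.

```python
import random

# Hypothetical intervention maps g_i: regime R -> value of I_i (binary here for simplicity).
G = {
    "I1": lambda r: 1 if r in (2, 4) else 0,
    "I2": lambda r: 1 if r in (3, 4) else 0,
}


def sample_joint(n, seed=0):
    """Sample rows of (R, I1, I2, X1, X2) from a toy JCI-style acyclic SCM.

    Illustrative structure: I1 -> X1, X1 -> X2, I2 -> X2. All exogenous noise
    terms (E_R, E_1, E_2) are drawn independently, matching the factorized P(E).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = rng.choice([1, 2, 3, 4])               # R = E_R
        i1, i2 = G["I1"](r), G["I2"](r)            # I_i = g_i(R), no noise of their own
        x1 = 2.0 * i1 + rng.gauss(0.0, 1.0)        # X1 = f_1(I1, E_1)
        x2 = x1 - 3.0 * i2 + rng.gauss(0.0, 1.0)   # X2 = f_2(X1, I2, E_2)
        rows.append({"R": r, "I1": i1, "I2": i2, "X1": x1, "X2": x2})
    return rows


rows = sample_joint(1000)
```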

Joint Causal Inference: single joint causal DAG

We represent the SCM with a causal DAG C representing all datasets jointly:

R   I1   I2   X1     X2     X4
1   20   0    0.1    0.2    0.5
1   20   0    0.13   0.21   0.49
1   20   0    ...    ...    ...
2   20   1    ...    ...    ...
3   30   0    ...    ...    ...
4   30   1    ...    ...    ...

4 datasets with 2 interventions

[Figure: causal DAG C over the dummy variables R, I1, I2 and the system variables X1, X2, X3, X4]

We assume Causal Markov and Minimality hold in C.
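The pooled table above can be built by stacking the datasets and adding the dummy columns. This sketch reads the mapping from regime to (I1, I2) off the table. The measurement values for regimes 2 to 4 are placeholders, not data from the talk, and the function name `pool` is hypothetical. As in the table, there is no X3 column.

```python
# Regime -> (I1, I2), read off the table: two intervention variables, four regimes.
SETTINGS = {1: (20, 0), 2: (20, 1), 3: (30, 0), 4: (30, 1)}


def pool(datasets):
    """Stack per-regime datasets into one table with the dummy columns R, I1, I2.

    datasets: maps a regime label r to a list of (X1, X2, X4) measurements.
    Returns rows (R, I1, I2, X1, X2, X4), ordered by regime.
    """
    pooled = []
    for r, samples in sorted(datasets.items()):
        i1, i2 = SETTINGS[r]          # I_i = g_i(R)
        for x1, x2, x4 in samples:
            pooled.append((r, i1, i2, x1, x2, x4))
    return pooled


# Regime 1 uses the two rows shown in the table; the other values are placeholders.
datasets = {
    1: [(0.1, 0.2, 0.5), (0.13, 0.21, 0.49)],
    2: [(0.4, 0.1, 0.7)],
    3: [(0.9, 0.3, 0.2)],
    4: [(0.8, 0.6, 0.3)],
}
table = pool(datasets)
for row in table:
    print(row)
```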

Joint Causal Inference: Faithfulness violations

Causal Faithfulness assumption:

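$X \perp\!\!\!\perp Y \mid W \implies X \perp_d Y \mid W \; [C]$

Example (R → I1 → X1): $I_1 = g_1(R) \implies X_1 \perp\!\!\!\perp I_1 \mid R$, since conditioning on R fixes $I_1$ ... but $X_1 \not\perp_d I_1 \mid R$

Solution: D-separation [Spirtes et al., 2000]: $X_1 \perp_D I_1 \mid R$

Problem: $\perp_D$ is provably complete (for now) only for functionally determined relations

Solution: restrict the deterministic relations to $I_i = g_i(R)$ for all $i \in \mathcal{I}$

D-Faithfulness: $X \perp\!\!\!\perp Y \mid W \implies X \perp_D Y \mid W \; [C]$.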
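Below is a minimal sketch of one way to check D-separation. It assumes the DAG is given as a parent map and the set of deterministic nodes is known. Det(W) is computed as a closure: a deterministic node is added once all its parents are determined. If X or Y lands in Det(W), the separation holds trivially; otherwise d-separation given Det(W) is checked with the standard moralized-ancestral-graph criterion. The function names and the graph R → I1 → X1 are illustrative, not the authors' implementation.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """d-separation of disjoint sets xs, ys given zs, via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        pas = [p for p in parents.get(child, ()) if p in keep]
        for p in pas:                          # parent-child edges, made undirected
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pas, 2):      # "marry" parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    # Delete the conditioning nodes and search for any path from xs to ys.
    seen, stack = set(xs), list(xs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v] - zs:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


def det_closure(parents, deterministic, ws):
    """Det(W): W plus every deterministic node whose parents are all in Det(W)."""
    closed = set(ws)
    changed = True
    while changed:
        changed = False
        for v in deterministic:
            if v not in closed and set(parents.get(v, ())) <= closed:
                closed.add(v)
                changed = True
    return closed


def D_separated(parents, deterministic, xs, ys, zs):
    """True if xs or ys is determined by zs, else d-separation given Det(zs)."""
    det = det_closure(parents, deterministic, zs)
    if set(xs) & det or set(ys) & det:
        return True
    return d_separated(parents, xs, ys, det)


# R -> I1 -> X1, where I1 = g1(R) is deterministic.
parents = {"I1": ["R"], "X1": ["I1"]}
deterministic = {"I1"}
print(d_separated(parents, {"X1"}, {"I1"}, {"R"}))
print(D_separated(parents, deterministic, {"X1"}, {"I1"}, {"R"}))
```

The first check reports that X1 and I1 are d-connected given R. The second reports D-separation, matching the example on this slide.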

Joint Causal Inference

Joint Causal Inference (JCI) = given all the assumptions, reconstruct the causal DAG C from independence test results.

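$X_1 \not\perp\!\!\!\perp I_1$

$X_1 \perp\!\!\!\perp I_1 \mid R$

$X_4 \perp\!\!\!\perp X_2 \mid I_1$, ...

⇒

[Figure: reconstructed causal DAG C over R, I1, I2, X1, X2, X3, X4]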

Problem: current constraint-based methods cannot work with JCI, because of faithfulness violations.
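Here is a small numerical illustration of the problem. It assumes a hypothetical two-regime setting with I1 = g1(R), where X1 depends on I1. X1 and I1 are strongly dependent marginally, yet I1 is constant within each regime, so X1 ⊥⊥ I1 | R holds trivially. A PC-style procedure that deletes an edge whenever it finds a separating set would therefore remove the true edge I1 → X1.

```python
import random
import statistics


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)
rows = []
for _ in range(2000):
    r = rng.choice([1, 2])            # hypothetical regimes 1 and 2
    i1 = 0 if r == 1 else 1           # I1 = g1(R), deterministic
    x1 = 2.0 * i1 + rng.gauss(0, 1)   # X1 depends on I1: the true edge I1 -> X1
    rows.append((r, i1, x1))

# Marginally, X1 and I1 are clearly dependent (population correlation about 0.7).
marginal = corr([i1 for _, i1, _ in rows], [x1 for _, _, x1 in rows])

# Given R, I1 has zero variance within each regime, so X1 ⊥⊥ I1 | R holds trivially
# even though I1 -> X1 is a real edge.
within = {r: statistics.pvariance([i1 for rr, i1, _ in rows if rr == r]) for r in (1, 2)}
```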

Part III

Extending constraint-based methods for JCI

A simple strategy for dealing with faithfulness violations

Idea: Faithfulness violations → Partial inputs

A simple strategy for dealing with functionally determined relations:

1. Rephrase the constraint-based method in terms of d-separations

2. For each independence test result, derive sound d-separations:

$X \not\perp\!\!\!\perp Y \mid W \implies X \not\perp_d Y \mid W$

$X \notin \mathrm{Det}(W)$ and $Y \notin \mathrm{Det}(W)$ and $X \perp\!\!\!\perp Y \mid W \implies X \perp_d Y \mid \mathrm{Det}(W)$

where Det(W) = the variables determined by (a subset of) W.

Conjecture: sound also for a larger class of deterministic relations.
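The translation in step 2 can be sketched as a function that maps one test result to zero or one d-separation statement. Det(W) is supplied as a function. `jci_det` is a hypothetical version for JCI that assumes the only deterministic relations are I_i = g_i(R), as on the faithfulness slide. The three test results used below are those listed on the JCI slide.

```python
def sound_d_statements(x, y, ws, independent, det):
    """Translate one CI test result into d-separation facts that remain sound
    under deterministic relations.

    det: function mapping a conditioning set W to Det(W).
    Returns a list of (kind, x, y, conditioning_set) tuples, possibly empty.
    """
    ws = frozenset(ws)
    if not independent:
        # A dependence always yields a d-connection given W.
        return [("d_connected", x, y, ws)]
    dw = frozenset(det(ws))
    if x in dw or y in dw:
        # X or Y is fixed by W: the independence carries no structural information.
        return []
    # Otherwise the independence yields a d-separation given Det(W).
    return [("d_separated", x, y, dw)]


def jci_det(ws, interventions=("I1", "I2")):
    """Hypothetical Det(W) for JCI: if R is in W, every I_i = g_i(R) is determined."""
    ws = set(ws)
    return ws | set(interventions) if "R" in ws else ws


print(sound_d_statements("X1", "I1", [], False, jci_det))
print(sound_d_statements("X1", "I1", ["R"], True, jci_det))
print(sound_d_statements("X4", "X2", ["I1"], True, jci_det))
```

Note that X1 ⊥⊥ I1 | R produces no statement at all. It becomes a partial input rather than a false d-separation.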

Ancestral Causal Inference with Determinism (ACID)

Idea: Faithfulness violations → Partial inputs

Traditional constraint-based methods (e.g., PC, FCI, ...):

cannot handle partial inputs

cannot exploit the rich background knowledge in JCI

Solution: Logic-based methods, e.g., ACI [Magliacane et al., 2016]

⇒ we implement the strategy in an extension of ACI called ACID (Ancestral Causal Inference with Determinism).

Rephrasing ACI rules in terms of d-separations

ACI rules:

Example

For X , Y , W disjoint (sets of) variables:

$(X \perp\!\!\!\perp Y \mid W) \land (X \not\dashrightarrow W) \implies X \not\dashrightarrow Y$

ACID rules:

Example

For X , Y , W disjoint (sets of) variables:

$(X \perp_d Y \mid W) \land (X \not\dashrightarrow W) \implies X \not\dashrightarrow Y$
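The logic-based methods on these slides combine many such rules over weighted inputs. The sketch below only forward-chains this single ACID rule on hard (unweighted) facts until nothing new is derived. It shows how d-separations and known non-ancestral relations combine. The function name and the input facts are hypothetical.

```python
def derive_non_ancestral(d_separations, not_causes):
    """Apply (X ⊥d Y | W) and (X causes no w in W)  =>  (X does not cause Y)
    repeatedly until no new fact is derived.

    d_separations: iterable of (x, y, W) with W a frozenset; d-separation is symmetric.
    not_causes:    set of (a, b) pairs meaning "a is not an ancestor of b".
    """
    d_separations = list(d_separations)
    known = set(not_causes)
    changed = True
    while changed:
        changed = False
        for x, y, ws in d_separations:
            for a, b in ((x, y), (y, x)):        # the rule applies in both orientations
                if (a, b) not in known and all((a, w) in known for w in ws):
                    known.add((a, b))
                    changed = True
    return known


facts = derive_non_ancestral(
    d_separations=[("X1", "X2", frozenset({"X3"})), ("X4", "X5", frozenset())],
    not_causes={("X1", "X3")},
)
```

With an empty W, the rule says that a marginal d-separation rules out causation in both directions.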

ACID-JCI

ACID-JCI = ACID rules + sound d-separations from the strategy + JCI background knowledge

For example:

$\forall i \in \mathcal{I}, \forall j \in \mathcal{X}: (X_j \not\dashrightarrow R) \land (X_j \not\dashrightarrow I_i)$

“Standard variables cannot cause dummy variables”

Example

[Figure: three example graphs over R, I1 and X1]
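The JCI background knowledge can be generated mechanically. The minimal sketch below, with hypothetical function names, lists the forbidden ancestral relations (each X_j ⇢ R and X_j ⇢ I_i). It then checks a candidate set of ancestral relations against them.

```python
def jci_background(system_vars, intervention_vars, regime="R"):
    """JCI background knowledge as (cause, effect) pairs that are ruled out:
    no system variable X_j causes the regime R or any intervention variable I_i."""
    forbidden = set()
    for x in system_vars:
        forbidden.add((x, regime))
        for i in intervention_vars:
            forbidden.add((x, i))
    return forbidden


def violates_background(ancestral_pairs, forbidden):
    """Return the candidate's ancestral relations that contradict the background knowledge."""
    return sorted(set(ancestral_pairs) & forbidden)


bk = jci_background(["X1", "X2", "X3", "X4"], ["I1", "I2"])
# A hypothetical candidate in which X1 causes I1 is ruled out.
print(violates_background({("R", "I1"), ("R", "X1"), ("X1", "I1")}, bk))
```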

Part IV

Evaluation

Simulated data accuracy: example Precision Recall curve

[Figure: precision-recall curves for ancestral (“causes”) and non-ancestral (“not causes”) relations]

Precision-recall curves for 2000 randomly generated causal graphs with 4 system variables and 3 interventions

ACID-JCI substantially improves accuracy compared with merging learnt structures (merged ACI)
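Precision-recall curves of this kind are typically computed by ranking candidate relations by their confidence score and sweeping a threshold. The slides do not specify the exact evaluation protocol, so this is only a generic sketch. The scores and ground truth below are made up.

```python
def precision_recall_curve(scores, truth):
    """(precision, recall) after each prediction, ranking candidates by decreasing score.

    scores: maps a candidate relation, e.g. ("X1", "X2") for X1 ⇢ X2, to a confidence score.
    truth:  non-empty set of relations that hold in the true graph.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    true_positives = 0
    curve = []
    for k, relation in enumerate(ranked, start=1):
        if relation in truth:
            true_positives += 1
        curve.append((true_positives / k, true_positives / len(truth)))
    return curve


scores = {("X1", "X2"): 3.0, ("X2", "X3"): 1.0, ("X3", "X1"): -2.0}
truth = {("X1", "X2"), ("X3", "X1")}
print(precision_recall_curve(scores, truth))
```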

Conclusion

Joint Causal Inference, a powerful formulation of causal discovery over multiple datasets for constraint-based methods

A simple strategy for dealing with faithfulness violations due to functionally determined relations

An implementation, ACID-JCI, that substantially improves the accuracy w.r.t. the state of the art

Future work:

Improve scalability (now max 7 variables in C).

Working paper: https://arxiv.org/abs/1611.10351

Collaborators:

Tom Claassen, Joris M. Mooij

References

Claassen, T. and Heskes, T. (2011). A logical characterization of constraint-based causal discovery. In UAI 2011, pages 135–144.

Eaton, D. and Murphy, K. (2007). Exact Bayesian structure learning from uncertain interventions. In Proceedings of the 10th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 107–114.

Magliacane, S., Claassen, T., and Mooij, J. M. (2016). Ancestral causal inference. In NIPS.

Mooij, J. M. and Heskes, T. (2013). Cyclic causal discovery from continuous equilibrium data. In Nicholson, A. and Smyth, P., editors, Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI-13), pages 431–439. AUAI Press.

Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search. MIT Press, 2nd edition.

Ancestral Causal Inference (ACI)

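Weighted list of inputs: $I = \{(i_j, w_j)\}$:

E.g. $I = \{(Y \perp\!\!\!\perp Z \mid X, 0.2), (Y \not\perp\!\!\!\perp X, 0.1)\}$

Any consistent weighting scheme, e.g. frequentist, Bayesian

For any possible ancestral structure C, we define the loss function:

$$\mathrm{Loss}(C, I) := \sum_{(i_j, w_j) \in I \,:\, i_j \text{ is not satisfied in } C} w_j$$

Here, whether "$i_j$ is not satisfied in C" is defined by the ancestral reasoning rules.

For each possible causal relation $X \dashrightarrow Y$, provide the score:

$$\mathrm{Conf}(X \dashrightarrow Y) = \min_{C \in \mathcal{C}} \mathrm{Loss}\big(C, I + (X \not\dashrightarrow Y, \infty)\big) - \min_{C \in \mathcal{C}} \mathrm{Loss}\big(C, I + (X \dashrightarrow Y, \infty)\big)$$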
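To make the two formulas concrete, the sketch below brute-forces all ancestral structures (strict partial orders) on three variables and computes Loss and Conf. As a simplifying assumption, it only accepts ancestral input statements, whose satisfaction can be read directly off C. Handling independence inputs such as Y ⊥⊥ Z | X would require ACI's ancestral reasoning rules, which are not reproduced here. An infinite weight implements the hard constraint added by I + (·, ∞). The inputs are hypothetical.

```python
from itertools import product

VARS = ("X", "Y", "Z")
PAIRS = [(a, b) for a in VARS for b in VARS if a != b]


def ancestral_structures():
    """Yield every acyclic, transitive ancestral relation on VARS as a frozenset of (a, b) pairs."""
    for mask in product([False, True], repeat=len(PAIRS)):
        rel = {pair for pair, keep in zip(PAIRS, mask) if keep}
        acyclic = all((b, a) not in rel for a, b in rel)
        transitive = all((a, d) in rel for a, b in rel for c, d in rel if b == c and a != d)
        if acyclic and transitive:
            yield frozenset(rel)


def satisfied(structure, statement):
    """Hypothetical check for ancestral inputs only: ("causes" or "not_causes", a, b)."""
    kind, a, b = statement
    return ((a, b) in structure) == (kind == "causes")


def loss(structure, inputs):
    """Loss(C, I): total weight of the inputs that the structure does not satisfy."""
    return sum(w for stmt, w in inputs if not satisfied(structure, stmt))


def confidence(a, b, inputs):
    """Conf(a ⇢ b): min loss when forcing not(a ⇢ b), minus min loss when forcing a ⇢ b."""
    structures = list(ancestral_structures())
    inf = float("inf")
    forced_not = inputs + [(("not_causes", a, b), inf)]
    forced_yes = inputs + [(("causes", a, b), inf)]
    return (min(loss(c, forced_not) for c in structures)
            - min(loss(c, forced_yes) for c in structures))


# Hypothetical weighted inputs: X ⇢ Y (weight 0.2) and Y ⇢ Z (weight 0.1).
inputs = [(("causes", "X", "Y"), 0.2), (("causes", "Y", "Z"), 0.1)]
conf_xz = confidence("X", "Z", inputs)   # positive: evidence for X ⇢ Z
conf_zx = confidence("Z", "X", inputs)   # negative: evidence against Z ⇢ X
```

By transitivity, denying X ⇢ Z forces dropping at least the cheaper input (0.1), so Conf(X ⇢ Z) = 0.1. Symmetrically, Conf(Z ⇢ X) = −0.1.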
