Learning causal theories

Josh Tenenbaum
MIT Department of Brain and Cognitive Sciences
Computer Science and AI Lab (CSAIL)

Collaborators

Charles Kemp Noah Goodman

Tom Griffiths Vikash Mansinghka

How do people learn causal relations from data?

A standard answer: infer the network structure that best fits the statistics of the observed data.

[Figure: a causal network inferred from observations. Labels: Structure, Data.]

What’s missing from this account? The background knowledge that makes causal learning possible.

• Causal schemata: domain-specific theories that constrain “natural” causal hypotheses
  – Abstract classes of variables and mechanisms
  – Causal laws defined over these classes

• Causal variables: substrate of causal hypotheses
  – Which variables are relevant
  – How variables ground out in perceptual and motor experience

The puzzle: This background knowledge must itself be learned, and learned together with specific causal relations. How?

A possible answer: hierarchical Bayesian models

Learning causal schemata

[Figure: three levels labeled Causal schema, Causal model, and Event data. The schema groups variables into classes:
Behaviors: high-fat diet, working in factory, …
Diseases: heart disease, lung cancer, …
Symptoms: coughing, chest pain, …]

(Griffiths & Tenenbaum; Kemp, Goodman, Tenenbaum)
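The three levels named on this slide (causal schema, causal model, event data) can be read as one joint inference problem. The factorization below is the generic hierarchical Bayes form, written here as a sketch and not taken from the slides; the symbols S, M, and D are notation introduced for this note, and the conditional independence of D from S given M is an assumption.

```latex
% S = causal schema, M = causal model (network), D = event data.
% Assumption: the data depend on the schema only through the causal model.
P(S, M \mid D) \;\propto\; P(D \mid M)\, P(M \mid S)\, P(S)

% Marginalizing over causal models gives the posterior over schemata:
P(S \mid D) \;\propto\; P(S) \sum_{M} P(D \mid M)\, P(M \mid S)
```

Under this reading, a schema earns support from every causal model it makes probable, which is one way to see how abstract knowledge and specific causal relations could be learned together.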

TO ADD
• # of Bayes nets on 12 variables: 521939651343829405020504063
• # of Bayes nets on 12 variables that fit this schema: 131072
• Maybe put the Bayes net learning slide next.
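The first count above can be checked with Robinson's recurrence for the number of labeled directed acyclic graphs, taking "Bayes nets on 12 variables" to mean DAG structures on 12 labeled nodes. The sketch below is an illustration added here, not code from the talk; the function name `count_dags` is hypothetical.

```python
from math import comb


def count_dags(n: int) -> int:
    """Count labeled DAGs on n nodes using Robinson's recurrence:
    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k).
    """
    counts = [1]  # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


# Unconstrained structures on 12 variables.
print(count_dags(12))

# The schema-constrained count on the slide, 131072, equals 2**17.
print(2 ** 17)
```

The second figure, 131072, equals 2**17. That would be the count if the schema left 17 candidate edges that are each independently present or absent, but that reading is an assumption; the slides do not say how the number was derived.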

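[Figure: models recovered from 20, 80, and 1000 samples over variables 1 to 16, comparing a two-level hierarchy (Data, Causal model) with a three-level hierarchy (Data, Causal model, Causal schema). Panel label: recovered model. Axis label: # samples.]

(Mansinghka, Kemp, Tenenbaum, Griffiths)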

Towards more schema-based machine learning

[Figure: two groups of variables (1 to 6 …, 7 to 16 …) and two classes (c1, c2); the value 0.4 appears alongside the classes.]

“blessing of abstraction”
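One way to read the figure labels is as a class-based prior over graphs: each variable belongs to a class, and the schema gives a probability of an edge between any two classes. The sketch below assumes that variables 1 to 6 are in class c1, that variables 7 to 16 are in class c2, and that 0.4 is the probability of an edge from a c1 variable to a c2 variable. These assignments are a guess from the surviving labels, and `sample_graph` is a hypothetical helper, not the authors' model.

```python
import random


def sample_graph(classes, edge_prob, rng):
    """Sample directed edges from a class-level schema.

    classes: dict mapping variable -> class label.
    edge_prob: dict mapping (source class, target class) -> probability;
               class pairs not listed get probability 0.
    """
    edges = []
    for i, ci in classes.items():
        for j, cj in classes.items():
            if i != j and rng.random() < edge_prob.get((ci, cj), 0.0):
                edges.append((i, j))
    return edges


# Hypothetical schema read off the figure labels (an assumption).
classes = {v: "c1" for v in range(1, 7)}
classes.update({v: "c2" for v in range(7, 17)})
edge_prob = {("c1", "c2"): 0.4}

edges = sample_graph(classes, edge_prob, random.Random(0))
print(len(edges), "edges sampled, all from class c1 to class c2")
```

Under this schema only the 6 × 10 = 60 pairs running from c1 to c2 can carry an edge, so every sampled graph is acyclic and the hypothesis space is far smaller than the space of all graphs on 16 variables.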


[Figure: objects cashew, almond, chestnut, walnut, macadamia; effect: rash. Event data are repeated (nut, rash) observations. In the causal model, two nuts have estimated strengths +0.54 and +0.47, and the strength for a nut without data is marked “?”.]

(Kemp, Goodman, Tenenbaum)

[Figure: the same causal model and event data for walnut, cashew, macadamia, almond, and chestnut, with strengths +0.54 and +0.47 and the remaining strength marked “+0.5?”.]

[Figure: a causal schema level added above the causal model and event data. The nuts are grouped into types T1 and T2, and the value +0.5 appears at the type level as the strength for rash.]


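[Figure: four training conditions, labeled 1) to 4). Two panels show objects o1 to o8 in two groups of four (o1 to o4 and o5 to o8), each with an effect e; two panels show objects o1 to o6 with an effect e. Other labels: GO, + + + +, Type 1, Type 2.]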

One-shot learning: Design

• Training phase: 1 of 4 conditions shown above.
• Test phase, one of two cases:
  – new object activates the machine once.
  – new object fails to activate the machine once.
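The test phase can be illustrated with a small conjugate calculation: if training has taught the learner which types of object exist and how strong each type is, a single trial with a new object both suggests its type and updates its strength. The sketch below is an illustration under stated assumptions, not the authors' model. Each type is given a Beta prior over strength, and the Beta parameters (means 0.1, 0.9, and 0.5) are hypothetical values chosen only to echo the strengths shown on the slides.

```python
from fractions import Fraction


def one_shot_posterior_mean(types, activated):
    """Posterior mean strength of a new object after a single trial.

    types: list of (weight, a, b). Each type has a Beta(a, b) prior over
           strength; weight is the prior probability of that type.
    activated: True if the object activated the machine on its one trial.
    """
    post = []
    for w, a, b in types:
        # Marginal likelihood of the single outcome under Beta(a, b).
        lik = Fraction(a, a + b) if activated else Fraction(b, a + b)
        # Conjugate Beta update after one Bernoulli observation.
        a2, b2 = (a + 1, b) if activated else (a, b + 1)
        post.append((w * lik, Fraction(a2, a2 + b2)))
    z = sum(w for w, _ in post)
    return sum(w * m for w, m in post) / z


# Hypothetical learned schemas (assumptions, not data from the study).
two_types = [(Fraction(1, 2), 1, 9), (Fraction(1, 2), 9, 1)]  # weak and strong types
one_type = [(Fraction(1), 5, 5)]                              # one type near 0.5

for name, schema in [("two types", two_types), ("one type", one_type)]:
    hit = one_shot_posterior_mean(schema, activated=True)
    miss = one_shot_posterior_mean(schema, activated=False)
    print(name, float(hit), float(miss))
```

With two well-separated types, one observation moves the estimate most of the way toward the matching type, while with a single type near 0.5 the same observation barely shifts it. This is the qualitative contrast that a one-shot design of this kind is suited to probe.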

One-shot learning: Results

[Figure: for each training condition, plots of likelihood against strength. Rows: Condition, Model, People. Horizontal axis: strength. Vertical axis: likelihood. Strength labels on the condition panels: +0.5; +0.1 and +0.9; +0.1.]

Question: what is the causal strength of the new object?

What’s missing from this account?

• Causal schemas: domain-specific theories that constrain “natural” causal hypotheses
  – Abstract classes of variables and mechanisms
  – Causal laws defined over these classes

• Causal variables: constituents of causal hypotheses
  – Which variables are relevant
  – How variables ground out in perceptual and motor experience

A possible answer: hierarchical Bayesian models (Kemp, Goodman, Griffiths, Tenenbaum).

The problem

A child learns that petting the cat leads to purring, while pounding leads to growling. But what are the origins of these symbolic event concepts (“variables”) over which causal links are defined?

• Option 1: Variables are innate.

• Option 2 (“clusters then causes”): Variables are learned first, independent of causal relations, through a kind of bottom-up perceptual clustering.

• Option 3: Variables are learned together with causal relations.

Learning grounded causal models (Goodman, Mansinghka & Tenenbaum)

[Figure: hypotheses and data for the grounded model. Labels: Hypotheses; Data; Time t; Time t’; variables A, B, C.]

“Alien control panel” experiment

[Figure: bar charts over A, B, C across conditions. Blue bar: human. Red bar: model.]

“blessing of abstraction”

Testing joint model vs. bottom-up model

How many variables are discovered?

[Figure: panels for Joint Model, Humans, and Bottom-up Model over A, B, C. Blue bars: 3 variables. Red bars: 4 variables.]

Conclusions

• Hierarchical Bayesian models (HBMs) explain how the background knowledge that supports causal learning may itself be learned from data through rational inferential means.
  – Domain-specific schemas constraining candidate causal networks
  – Causal variables grounded in sensorimotor experience

• These issues are more general than just causal learning, relevant to learning associations, symbolic rules, …

• Contrast with traditional approaches to knowledge acquisition:
  – Classical empiricism: variables are innate, schemata learned slowly by accretion and superposition.
  – Classical nativism: variables are innate, schemata are innate.
  – Hierarchical Bayes: variables and schemata could be learned; abstract knowledge may be learned from surprisingly little data.

• Ongoing and future work: Applying HBMs to many different aspects of cognitive development – categories and properties, word learning, syntax in language, social relations, theory of mind, …

“Alien control panel” experiment

[Figure: control panel with labels A, B, C and a, b, c.]

Modeling learning curves

[Figure: bar charts. Blue bar: human. Red bar: model.]
