Page 1: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian models of human learning and inference

Josh Tenenbaum, MIT

Department of Brain and Cognitive Sciences
Computer Science and AI Lab (CSAIL)

Thanks to Tom Griffiths, Charles Kemp, Vikash Mansinghka

(http://web.mit.edu/cocosci/Talks/nips06-tutorial.ppt)

Page 2: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The probabilistic revolution in AI

• Principled and effective solutions for inductive inference from ambiguous data:
– Vision
– Robotics
– Machine learning
– Expert systems / reasoning
– Natural language processing

• Standard view: no necessary connection to how the human brain solves these problems.

Page 3: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Probabilistic inference inhuman cognition?

• "People aren't Bayesian"
– Kahneman and Tversky (1970s-present): "heuristics and biases" research program. 2002 Nobel Prize in Economics.
– Slovic, Fischhoff, and Lichtenstein (1976): "It appears that people lack the correct programs for many important judgmental tasks.... it may be argued that we have not had the opportunity to evolve an intellect capable of dealing conceptually with uncertainty."

– Stephen Jay Gould (1992): “Our minds are not built (for whatever reason) to work by the rules of probability.”

Page 4: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The probability of breast cancer is 1% for a woman at 40 who participates in a routine screening. If a woman has breast cancer, the probability is 80% that she will have a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she will also have a positive mammography.

A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

A. greater than 90%
B. between 70% and 90%
C. between 50% and 70%
D. between 30% and 50%
E. between 10% and 30%
F. less than 10%

95 out of 100 doctors choose one of the higher ranges. The correct answer is F, less than 10% ("base rate neglect").
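Checking the answer is a one-line application of Bayes' rule to the numbers in the vignette; a minimal sketch in Python:

```python
# Posterior probability of breast cancer given a positive mammogram,
# using the numbers from the vignette above.
p_cancer = 0.01              # prior: base rate of breast cancer
p_pos_given_cancer = 0.80    # P(positive | cancer)
p_pos_given_healthy = 0.096  # P(positive | no cancer)

p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy
posterior = p_cancer * p_pos_given_cancer / p_pos
print(f"P(cancer | positive) = {posterior:.3f}")  # ~0.078, i.e. answer F
```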

Page 5: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Availability biases in probability judgment

• How likely is it that a randomly chosen word
– ends in "g"?
– ends in "ing"?

• When buying a car, how much do you weigh your friend’s experience relative to consumer satisfaction surveys?

Page 6: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 7: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 8: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Probabilistic inference inhuman cognition?

• "People aren't Bayesian"
– Kahneman and Tversky (1970s-present): "heuristics and biases" research program. 2002 Nobel Prize in Economics.

• Psychology is often drawn towards the mind’s errors and apparent irrationalities.

• But the computationally interesting question remains: How does the mind work so well?

Page 9: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian models of cognition

Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...]

Language acquisition and processing [Brent, de Marken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …]

Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …]

Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …]

Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …]

Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …]

Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …]

Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …]

Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …]

Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]

Page 10: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning concepts from examples

• Word learning

[Figure: three labeled examples, each captioned "horse".]

Page 11: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning concepts from examples

“tufa”

“tufa”

“tufa”

Page 12: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Everyday inductive leaps

How can people learn so much about the world...
– Kinds of objects and their properties
– The meanings of words, phrases, and sentences
– Cause-effect relations
– The beliefs, goals and plans of other people
– Social structures, conventions, and rules

... from such limited evidence?

Page 13: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Contributions of Bayesian models

• Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions.

• Explain how and why human learning and reasoning works, in terms of (approximations to) optimal statistical inference in natural environments.

• A framework for studying people’s implicit knowledge about the structure of the world: how it is structured, used, and acquired.

• A two-way bridge to state-of-the-art AI and machine learning.

Page 14: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Marr’s Three Levels of Analysis

• Computation: "What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?"

• Algorithm: Cognitive psychology

• Implementation: Neurobiology

Page 15: Bayesian models of human       learning and inference Josh Tenenbaum MIT

What about those errors?

• The human mind is not a universal Bayesian engine.
• But the mind does appear adapted to solve important real-world inference problems in approximately Bayesian ways, e.g.:
– Predicting everyday events
– Causal learning and reasoning
– Learning concepts from examples
• As with perceptual tasks, adults and even young children solve these problems mostly unconsciously, effortlessly, and successfully.

Page 16: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Technical themes

• Inference in probabilistic models
– Role of priors, explaining away.
• Learning in graphical models
– Parameter learning, structure learning.
• Bayesian model averaging
– Being Bayesian over network structures.
• Bayesian Occam's razor
– Trade off model complexity against data fit.

Page 17: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Technical themes

• Structured probabilistic models
– Grammars, first-order logic, relational schemas.
• Hierarchical Bayesian models
– Acquire abstract knowledge, support transfer.
• Nonparametric Bayes
– Flexible models that grow in complexity as new data warrant.
• Tractable approximate inference
– Markov chain Monte Carlo (MCMC), sequential Monte Carlo (particle filtering).

Page 18: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Outline

• Predicting everyday events

• Causal learning and reasoning

• Learning concepts from examples

Page 19: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Outline

• Predicting everyday events

• Causal learning and reasoning

• Learning concepts from examples

Page 20: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Basics of Bayesian inference

• Bayes’ rule:

• An example
– Data: John is coughing
– Some hypotheses:
1. John has a cold
2. John has lung cancer
3. John has a stomach flu

– Likelihood P(d|h) favors 1 and 2 over 3

– Prior probability P(h) favors 1 and 3 over 2

– Posterior probability P(h|d) favors 1 over 2 and 3

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h_i \in H} P(d \mid h_i)\,P(h_i)}$$
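To make the example concrete, here is a minimal sketch of that computation in Python; the numerical priors and likelihoods are invented for illustration, chosen only to match the qualitative pattern above:

```python
# Hypothetical numbers: likelihoods favor hypotheses 1,2 over 3,
# and priors favor 1,3 over 2, as on the slide.
prior      = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.30}  # P(h)
likelihood = {"cold": 0.80, "lung cancer": 0.70, "stomach flu": 0.05}  # P(d|h)

evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
for h, p in posterior.items():
    print(f"P({h} | coughing) = {p:.3f}")
# The posterior favors "cold" (hypothesis 1) over 2 and 3.
```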

Page 21: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference in perception and sensorimotor integration

(Weiss, Simoncelli & Adelson 2002) (Kording & Wolpert 2004)

Page 22: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Memory retrieval as Bayesian inference (Anderson & Schooler, 1991)

[Figure, three panels: power law of forgetting (log memory strength vs. log delay in hours); additive effects of practice & delay (mean # recalled vs. retention interval in days); spacing effects in forgetting (log delay in seconds).]

Page 23: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Memory retrieval as Bayesian inference (Anderson & Schooler, 1991)

For each item in memory, estimate the probability that it will be useful in the present context.

Use priors based on the statistics of natural information sources.

Page 24: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Memory retrieval as Bayesian inference (Anderson & Schooler, 1991)

[Figure, three panels: the same three patterns (power law of forgetting; additive effects of practice & delay; spacing effects in forgetting) found in environmental statistics, plotted as log need odds vs. log # days since last occurrence.]

[New York Times data; c.f. email sources, child-directed speech]

Page 25: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Everyday prediction problems (Griffiths & Tenenbaum, 2006)

• You read about a movie that has made $60 million to date. How much money will it make in total?

• You see that something has been baking in the oven for 34 minutes. How long until it’s ready?

• You meet someone who is 78 years old. How long will they live?

• Your friend quotes to you from line 17 of his favorite poem. How long is the poem?

• You see taxicab #107 pull up to the curb in front of the train station. How many cabs in this city?

Page 26: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Making predictions

• You encounter a phenomenon that has existed for $t_{past}$ units of time. How long will it continue into the future? (i.e., what is $t_{total}$?)

• We could replace “time” with any other quantity that ranges from 0 to some unknown upper limit.

Page 27: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference

$$\underbrace{P(t_{total} \mid t_{past})}_{\text{posterior probability}} \propto \underbrace{P(t_{past} \mid t_{total})}_{\text{likelihood}}\;\underbrace{P(t_{total})}_{\text{prior}}$$

Page 28: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference

$$\underbrace{P(t_{total} \mid t_{past})}_{\text{posterior probability}} \propto \underbrace{1/t_{total}}_{\text{likelihood}}\;\underbrace{1/t_{total}}_{\text{prior}}$$

Assume a random sample ($0 < t_{past} < t_{total}$), giving the $1/t_{total}$ likelihood, and an "uninformative" prior $P(t_{total}) \propto 1/t_{total}$ (e.g., Jeffreys, Jaynes).

Page 29: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference

$$P(t_{total} \mid t_{past}) \propto \underbrace{1/t_{total}}_{\text{random sampling}} \cdot \underbrace{1/t_{total}}_{\text{``uninformative'' prior}}$$

[Figure: the posterior $P(t_{total} \mid t_{past})$ as a function of $t_{total}$, nonzero only for $t_{total} > t_{past}$.]

Page 30: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference

$$P(t_{total} \mid t_{past}) \propto \underbrace{1/t_{total}}_{\text{random sampling}} \cdot \underbrace{1/t_{total}}_{\text{``uninformative'' prior}}$$

Best guess for $t_{total}$: the $t$ such that $P(t_{total} > t \mid t_{past}) = 0.5$.

Page 31: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference

$$P(t_{total} \mid t_{past}) \propto \underbrace{1/t_{total}}_{\text{random sampling}} \cdot \underbrace{1/t_{total}}_{\text{``uninformative'' prior}}$$

Yields Gott's Rule: $P(t_{total} > t \mid t_{past}) = 0.5$ when $t = 2\,t_{past}$; i.e., the best guess is $t_{total} = 2\,t_{past}$.
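The factor of two follows in one line from the two assumptions above, filling in the step the slide leaves implicit:

$$P(t_{total} \mid t_{past}) \propto t_{total}^{-2} \;\Rightarrow\; P(t_{total} > t \mid t_{past}) = \frac{\int_t^\infty u^{-2}\,du}{\int_{t_{past}}^\infty u^{-2}\,du} = \frac{t_{past}}{t},$$

which equals 0.5 exactly when $t = 2\,t_{past}$.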

Page 32: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Evaluating Gott’s Rule

• You read about a movie that has made $78 million to date. How much money will it make in total?
– "$156 million" seems reasonable.
• You meet someone who is 35 years old. How long will they live?
– "70 years" seems reasonable.
• Not so simple:
– You meet someone who is 78 years old. How long will they live?
– You meet someone who is 6 years old. How long will they live?

Page 33: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The effects of priors

• Different kinds of priors $P(t_{total})$ are appropriate in different domains:
– power-law form, e.g., wealth, contacts
– Gaussian form, e.g., height, lifespan

[Gott: $P(t_{total}) \propto t_{total}^{-1}$]

Page 34: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The effects of priors

Page 35: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Evaluating human predictions

• Different domains with different priors:
– A movie has made $60 million
– Your friend quotes from line 17 of a poem
– You meet a 78-year-old man
– A movie has been running for 55 minutes
– A U.S. congressman has served for 11 years
– A cake has been in the oven for 34 minutes

• Use 5 values of $t_{past}$ for each.

• People predict $t_{total}$.

Page 36: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 37: Bayesian models of human       learning and inference Josh Tenenbaum MIT

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

Page 38: Bayesian models of human       learning and inference Josh Tenenbaum MIT

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

How long did the typical pharaoh reign in ancient Egypt?

Page 39: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Summary: prediction

• Predictions about the extent or magnitude of everyday events follow Bayesian principles.

• Contrast with Bayesian inference in perception, motor control, memory: no "universal priors" here.

• Predictions depend rationally on priors that are appropriately calibrated for different domains:
– Form of the prior (e.g., power-law or exponential)
– Specific distribution given that form (parameters)
– Non-parametric distribution when necessary.

• In the absence of concrete experience, priors may be generated by qualitative background knowledge.

Page 40: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Outline

• Predicting everyday events

• Causal learning and reasoning

• Learning concepts from examples

Page 41: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian networks

Four random variables:
– X1: coughing
– X2: high body temperature
– X3: flu
– X4: lung cancer

[Graph: X3 → X1, X3 → X2, X4 → X1, with distributions P(x3), P(x4), P(x2|x3), P(x1|x3, x4).]

Nodes: variables. Links: direct dependencies.

Each node has a conditional probability distribution.

Data: observations of X1, ..., X4.

Page 42: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Causal Bayesian networks

Four random variables:
– X1: coughing
– X2: high body temperature
– X3: flu
– X4: lung cancer

[Graph: the same network, X3 → X1, X3 → X2, X4 → X1, with P(x3), P(x4), P(x2|x3), P(x1|x3, x4).]

Nodes: variables. Links: causal mechanisms.

Each node has a conditional probability distribution.

Data: observations of and interventions on X1, ..., X4.

(Pearl; Glymour & Cooper)

Page 43: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Inference in causal graphical models

• Explaining away or "discounting" in social reasoning (Kelley; Morris & Larrick)

• "Screening off" in intuitive causal reasoning (Waldmann; Rehder & Burnett; Blok & Sloman; Gopnik & Sobel)
– Better in chains than common-cause structures; common-cause structures are handled better when the mechanisms are clearly independent.

• Understanding and predicting the effects of interventions (Sloman & Lagnado; Gopnik & Schulz)

[Graphs: common effect A → C ← B, common cause A ← B → C, and chain A → B → C; compare P(c|b) with P(c|b, a) and P(c|b, not a).]

Page 44: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning graphical models

[Graphs: candidate network structures over X1, ..., X4, each with parameters P(x3), P(x4), P(x2|x3), P(x1|x3, x4).]

• Structure learning: what causes what?

• Parameter learning: how do causes work?

Page 45: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian learning of causal structure

Data d: observations of (X1, X2, X3, X4), e.g. (1,1,0,1), (1,0,1,0), (1,0,0,1), (1,1,1,1).

Causal hypotheses h: candidate networks over X1, ..., X4 (e.g., one with a link X4 → X2 and one without).

1. What is the most likely network h given observed data d?

$$P(h_i \mid d) = \frac{P(d \mid h_i)\,P(h_i)}{\sum_j P(d \mid h_j)\,P(h_j)}$$

2. How likely is there to be a link X4 → X2?

$$P(X_4 \rightarrow X_2 \mid d) = \sum_{h_j \in H} P(X_4 \rightarrow X_2 \mid h_j)\,P(h_j \mid d)$$

(Bayesian model averaging)

Page 46: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian Occam’s Razor

[Figure: $p(D = d \mid M)$ plotted over all possible data sets d, for a narrow model M1 and a broad model M2.]

For any model M,

$$\sum_{d \in \mathcal{D}} p(D = d \mid M) = 1.$$

Law of "conservation of belief": a model that can predict many possible data sets must assign each of them low probability.

(MacKay, 2003; Ghahramani tutorials)
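A toy illustration of conservation of belief (the two models below are my own, not from the tutorial): a flexible model that spreads belief over every dataset must assign any particular dataset less probability than a constrained model that happened to predict it.

```python
from itertools import product

# All possible datasets: binary sequences of length 4.
datasets = list(product([0, 1], repeat=4))

# M1: constrained model, all belief on "mostly ones" sequences.
support = [d for d in datasets if sum(d) >= 3]
p_m1 = {d: (1 / len(support) if d in support else 0.0) for d in datasets}

# M2: flexible model, belief spread uniformly over every dataset.
p_m2 = {d: 1 / len(datasets) for d in datasets}

d_obs = (1, 1, 1, 1)
print(p_m1[d_obs], p_m2[d_obs])                # 0.2 vs 0.0625
print(sum(p_m1.values()), sum(p_m2.values()))  # both sum to 1.0
```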

Page 47: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning causation from contingencies

Subjects judge the extent to which C causes E (rated on a scale from 0 to 100), given contingency data:

                 E present (e+)   E absent (e-)
C present (c+):        a                b
C absent (c-):         c                d

e.g., "Does injecting this chemical cause mice to express a certain gene?"

Page 48: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Two models of causal judgment

• Delta-P (Jenkins & Ward, 1965):

$$\Delta P = P(e^+ \mid c^+) - P(e^+ \mid c^-)$$

• Power PC (Cheng, 1997):

$$\text{power} = \frac{\Delta P}{1 - P(e^+ \mid c^-)}$$
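Both measures are simple functions of the contingency counts; a minimal sketch (the example counts are mine):

```python
def delta_p(a, b, c, d):
    """Delta-P: P(e+|c+) - P(e+|c-), from the contingency table above."""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Cheng's (1997) causal power: Delta-P / (1 - P(e+|c-))."""
    return delta_p(a, b, c, d) / (1 - c / (c + d))

# Example: effect in 6/8 cases with the cause, 2/8 without it.
print(delta_p(6, 2, 2, 6))       # 0.5
print(causal_power(6, 2, 2, 6))  # ~0.667
```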

Page 49: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Judging the probability that C → E (Buehner & Cheng, 1997; 2003)

[Figure: human judgments ("People") compared with ΔP and causal power, across conditions with ΔP = 0.00, 0.25, 0.50, 0.75, 1.00.]

• Independent effects of both ΔP and causal power.
• At ΔP = 0, judgments decrease with base rate ("frequency illusion").

Page 50: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning causal strength (parameter learning)

Assume this causal structure:

[Graph: B → E ← C, with strength w0 on the background cause B and w1 on the candidate cause C.]

ΔP and causal power are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E|B,C):
– linear: ΔP
– noisy-OR: causal power

Page 51: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning causal structure (Griffiths & Tenenbaum, 2005)

• Hypotheses:
h1: B → E ← C (background B with strength w0, candidate cause C with strength w1)
h0: B → E (background B only, strength w0)

• Bayesian causal support: the log likelihood ratio (Bayes factor) gives the evidence in favor of h1,

$$\text{support} = \log \frac{P(d \mid h_1)}{P(d \mid h_0)},$$

where, under the noisy-OR parameterization (assuming uniform parameter priors, but see Yuille et al., Danks et al.):

$$P(d \mid h_1) = \int_0^1\!\!\int_0^1 P(d \mid w_0, w_1)\; p(w_0, w_1 \mid h_1)\, dw_0\, dw_1$$

$$P(d \mid h_0) = \int_0^1 P(d \mid w_0)\; p(w_0 \mid h_0)\, dw_0$$
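A minimal numerical sketch of causal support, assuming the noisy-OR likelihood and uniform parameter priors named above; the grid integration and the example counts are my own choices:

```python
import numpy as np

def causal_support(a, b, c, d, grid=200):
    """log P(data|h1) / P(data|h0) with a noisy-OR likelihood and
    uniform priors on w0, w1, approximated on a midpoint grid.
    Counts: a=(c+,e+), b=(c+,e-), c=(c-,e+), d=(c-,e-)."""
    w = (np.arange(grid) + 0.5) / grid
    w0, w1 = np.meshgrid(w, w, indexing="ij")
    p1 = 1 - (1 - w0) * (1 - w1)                 # P(e+|c+) under h1
    lik1 = p1**a * (1 - p1)**b * w0**c * (1 - w0)**d
    lik0 = w**(a + c) * (1 - w)**(b + d)         # under h0, C has no effect
    return np.log(lik1.mean()) - np.log(lik0.mean())

print(causal_support(6, 2, 2, 6))   # positive: evidence for the link C -> E
print(causal_support(4, 4, 4, 4))   # negative: the simpler h0 wins (Occam)
```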

Page 52: Bayesian models of human       learning and inference Josh Tenenbaum MIT

[Figure: Buehner and Cheng (1997) data. Correlations with people's judgments: ΔP (r = 0.89), causal power (r = 0.88), Bayesian support (r = 0.97).]

Page 53: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Implicit background theory

• Injections may or may not cause gene expression, but gene expression does not cause injections.
– No hypotheses with E → C.
• Other naturally occurring processes may also cause gene expression.
– All hypotheses include an always-present background cause B → E.
• Causes are generative, probabilistically sufficient, and independent, i.e., each cause independently produces the effect in some proportion of cases.
– Noisy-OR parameterization.

Page 54: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Sensitivity analysis

[Figure: people's judgments compared with Bayesian support under the noisy-OR parameterization, χ², and support under a generic parameterization.]

Page 55: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Generativity is essential

• Predictions result from the "ceiling effect": ceiling effects only matter if you believe a cause increases the probability of an effect.

[Figure: Bayesian support (scale 0-100) for contingencies with ΔP = 0 at decreasing base rates: P(e+|c+) / P(e+|c-) = 8/8, 6/8, 4/8, 2/8, 0/8.]

Page 56: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Blicket detector (Sobel, Gopnik, and colleagues)

See this? It’s a blicket machine. Blickets make it go.

Let's put this one on the machine.

Oooh, it’s a blicket!

Page 57: Bayesian models of human       learning and inference Josh Tenenbaum MIT

"Backwards blocking" (Sobel, Tenenbaum & Gopnik, 2004)

– Initially: nothing on detector; detector silent (A=0, B=0, E=0)
– Trial 1: A and B on detector; detector active (A=1, B=1, E=1)
– Trial 2: A on detector; detector active (A=1, B=0, E=1)
– 4-year-olds judge whether each object is a blicket:
A: a blicket (100% say yes)
B: probably not a blicket (34% say yes)

[Graph: A → E ← B, with both links initially uncertain ("?").]

(cf. "explaining away in weight space", Dayan & Kakade)

Page 58: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Possible hypotheses?

[Figure: the space of candidate causal graphs over A, B, and E, enumerating the possible configurations of links.]

Page 59: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian causal learning

With a uniform prior on hypotheses and a generic parameterization:

[Figure: probability that A and B are blickets across trials; the values shown (0.32, 0.32, 0.34, 0.34) barely distinguish A from B.]

Page 60: Bayesian models of human       learning and inference Josh Tenenbaum MIT

A stronger hypothesis space

• Links can only exist from blocks to detectors.

• Blocks are blickets with prior probability q.

• Blickets always activate detectors; detectors never activate on their own (i.e., deterministic OR parameterization, no hidden causes).

Four hypotheses (which of A, B sends a link to E):

                     h00 (neither)  h01 (B only)  h10 (A only)  h11 (both)
P(E=1 | A=0, B=0):        0              0             0             0
P(E=1 | A=1, B=0):        0              0             1             1
P(E=1 | A=0, B=1):        0              1             0             1
P(E=1 | A=1, B=1):        0              1             1             1

Priors: P(h00) = (1 – q)², P(h01) = (1 – q)q, P(h10) = q(1 – q), P(h11) = q².
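A small sketch of inference in this stronger hypothesis space (deterministic OR, prior q as above; the function and variable names are mine), applied to the backwards-blocking trials from slide 57:

```python
def blicket_posterior(q, trials):
    """Posterior P(A is blicket), P(B is blicket) under deterministic OR.
    trials: list of (A, B, E) observations."""
    post = {(0, 0): (1 - q)**2, (0, 1): (1 - q) * q,
            (1, 0): q * (1 - q), (1, 1): q**2}
    for A, B, E in trials:
        for a, b in post:
            if int(A and a or B and b) != E:   # detector fires iff a blicket is on it
                post[(a, b)] = 0.0             # deterministic: inconsistent hypotheses die
    z = sum(post.values())
    return (sum(p for (a, _), p in post.items() if a) / z,
            sum(p for (_, b), p in post.items() if b) / z)

# Trial 1: A and B on, detector fires; Trial 2: A alone, detector fires.
print(blicket_posterior(0.3, [(1, 1, 1), (1, 0, 1)]))
# -> (1.0, 0.3): A is surely a blicket; B falls back to its prior q.
```

Note the clean prediction: after backwards blocking, B's posterior equals its prior q, which is what the prior-manipulation experiment on the next slide tests.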

Page 61: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik)

[Figure: blicket judgments for A and B at each phase (initial, after the A-B trial, after the A-alone trial), across conditions with different prior probabilities q.]

Page 62: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning more complex structures

• Tenenbaum et al., Griffiths & Sobel: detectors with more than two objects and noisy mechanisms

• Steyvers et al., Sobel & Kushnir: active learning with interventions (c.f. Tong & Koller, Murphy)

• Lagnado & Sloman: learning from interventions on continuous dynamical systems

Page 63: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Inferring hidden causes (Kushnir, Schulz, Gopnik, & Danks, 2003)

The "stick ball" machine

[Figure: evidence shown to participants in three conditions: common unobserved cause (4x, 2x, 2x), independent unobserved causes (1x, 2x, 2x, 2x, 2x), one observed cause (2x, 4x).]

Page 64: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian learning with unknown number of hidden variables

(Griffiths et al. 2006)

Page 65: Bayesian models of human       learning and inference Josh Tenenbaum MIT

[Figure: model predictions vs. people's judgments in the common unobserved cause, independent unobserved causes, and one observed cause conditions; two model parameters set to 0.3 and 0.8; r = 0.94.]

Page 66: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Inferring latent causes in classical conditioning

(Courville, Daw, Gordon, Touretzky 2003)

Training: A → US; A, X, B → US
Test: X; X, B

e.g., A = noise, X = tone, B = click, US = shock

Page 67: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Inferring latent causes in perceptual learning

(Orban, Fiser, Aslin, Lengyel 2006)

Learning to recognize objects and segment scenes:

Page 68: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Inferring latent causes in sensory integration

(Kording et al. 2006, NIPS 06)

Page 69: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Coincidences (Griffiths & Tenenbaum, in press)

• The birthday problem: how many people do you need to have in the room before the probability exceeds 50% that two of them have the same birthday? (Answer: 23.)

• The bombing of London
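A quick check of that answer (assuming 365 equally likely birthdays):

```python
# Smallest n for which P(some shared birthday among n people) > 0.5.
p_distinct, n = 1.0, 0
while 1 - p_distinct <= 0.5:
    n += 1
    p_distinct *= (365 - (n - 1)) / 365
print(n, round(1 - p_distinct, 3))   # 23 0.507
```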

Page 70: Bayesian models of human       learning and inference Josh Tenenbaum MIT

How much of a coincidence?

Page 71: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian coincidence factor:

$$\log \frac{P(d \mid \text{latent})}{P(d \mid \text{random})}$$

Chance: dates generated at random. Latent common cause: alternative hypotheses include proximity in date, matching days of the month, matching month, ...

[Figure: ten birthdays (x marks) along an August timeline, generated either by chance or by a latent common cause C.]

Page 72: Bayesian models of human       learning and inference Josh Tenenbaum MIT

How much of a coincidence?

Page 73: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian coincidence factor:

$$\log \frac{P(d \mid \text{latent})}{P(d \mid \text{random})}$$

[Figure: the same birthday data. Chance: a uniform distribution over dates. Latent common cause: a uniform distribution plus a regularity concentrated on the coinciding dates.]

Page 74: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Summary: causal inference & learning

• Human causal induction can be explained using core principles of graphical models:
– Bayesian inference (explaining away, screening off)
– Bayesian structure learning (Occam's razor, model averaging)
– Active learning with interventions
– Identifying latent causes

Page 75: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Summary: causal inference & learning

• Crucial constraints on hypothesis spaces come from abstract prior knowledge, or "intuitive theories":
– What are the variables?
– How can they be connected?
– How are their effects parameterized?

• Big open questions:
– How can these theories be described formally?
– How can these theories be learned?

Page 76: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Hierarchical Bayesian framework (Griffiths, Tenenbaum, Kemp et al.)

Abstract principles
  ↓
Structure
  ↓
Data

Page 77: Bayesian models of human       learning and inference Josh Tenenbaum MIT

A theory for blickets (c.f. PRMs, BLOG, FOPL)

Page 78: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning with a uniform prior on network structures:

[Figure: true network over attributes 1-12; observed data (patients × attributes); a sample of 75 observations; the structure recovered under a uniform prior.]

Page 79: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning with a block-structured prior on network structures (Mansinghka et al. 2006):

[Figure: the same true network and 75 sampled observations (patients × attributes 1-12), now learned with a latent assignment z of attributes to blocks {1-4}, {5-8}, {9-12} and a matrix of between-block link probabilities (0.8, 0.0, 0.01; 0.0, 0.0, 0.75; 0.0, 0.0, 0.0).]

Page 80: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The "blessing of abstraction"

[Figure: true structure of graphical model G over 16 variables, generated from an abstract theory Z (class assignments z and class-level edge probabilities). Recovery of the edges of G and of the classes z is shown after 20, 80, and 1000 samples: the abstract level is learned at least as fast as the concrete structure.]

Hierarchical model: Abstract theory Z → Graph G → Data D (compare learning G directly from D).

Page 81: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Outline

• Predicting everyday events

• Causal learning and reasoning

• Learning concepts from examples

Page 82: Bayesian models of human       learning and inference Josh Tenenbaum MIT

“tufa”

“tufa”

“tufa”

Learning from just one or a few examples, and mostly unlabeled examples (“semi-supervised learning”).

Page 83: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Simple model of concept learning

“This is a blicket.”

“Can you show me the other blickets?”

Page 84: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Simple model of concept learning

Other blickets.

“This is a blicket.”

Page 85: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Simple model of concept learning

Learning from just one positive example is possible if:– Assume concepts refer to clusters in the world.

– Observe enough unlabeled data to identify clear clusters.

(c.f. Learning with mixture models and EM, Ghahramani & Jordan, 1994; Nigam et al. 2000)

Other blickets.

“This is a blicket.”

Page 86: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Concept learning with mixture models in cognitive science

• Fried & Holyoak (1984): modeled unsupervised and semi-supervised categorization as EM in a Gaussian mixture.

• Anderson (1990): modeled unsupervised and semi-supervised categorization as greedy sequential search in an infinite (Chinese restaurant process) mixture.

Page 87: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Infinite (CRP) mixture models

• Construct from k-component mixtures by integrating out mixing weights, collapsing equivalent partitions, and taking the limit as k → ∞.

• Does not require that we commit to a fixed (or even finite) number of classes.

• Effective number of classes can grow with the number of data points, balancing complexity with data fit.

• Computationally much simpler than applying Bayesian Occam's razor or cross-validation.

• Easy to learn with standard Monte Carlo approximations (MCMC, particle filtering), hopefully avoiding local minima.
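A minimal sketch of sampling from the Chinese restaurant process prior itself (the concentration parameter alpha and the defaults are my choices):

```python
import random

def sample_crp(n, alpha=1.0, seed=0):
    """Sample a partition of n items from a CRP with concentration alpha."""
    random.seed(seed)
    sizes, assignments = [], []
    for _ in range(n):
        # Sit at existing table k with prob proportional to sizes[k],
        # or open a new table with prob proportional to alpha.
        weights = sizes + [alpha]
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(0)
        sizes[k] += 1
        assignments.append(k)
    return assignments

print(sample_crp(20))   # class labels; the number of classes grows with n
```

Pairing each table with a component distribution turns this prior into the infinite mixture model above.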

Page 88: Bayesian models of human       learning and inference Josh Tenenbaum MIT

High school lunch room analogy

Page 89: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Sampling from the CRP:

[Figure: lunch-room tables labeled "nerds", "jocks", "punks", "preppies", with new students joining tables in proportion to their size.]

Page 90: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 91: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Gibbs sampler (Neal): reassign each object by trading off two forces: assign to larger groups (the CRP prior) and group with similar objects (the likelihood).

[Figure: the lunch-room groups "nerds", "jocks", "punks", "preppies" under Gibbs sampling.]

Page 92: Bayesian models of human       learning and inference Josh Tenenbaum MIT

A typical cognitive experiment

Training stimuli (F1 F2 F3 F4 Label):
1 1 1 1 1
1 0 1 0 1
0 1 0 1 1
0 0 0 0 0
0 1 0 0 0
1 0 1 1 0

Test stimuli:
0 1 1 1 ?
1 1 0 1 ?
1 1 1 0 ?
1 0 0 0 ?
0 0 1 0 ?
0 0 0 1 ?

Page 93: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Anderson (1990), “Rational model of categorization”:

Greedy sequential search in an infinite mixture model.

Sanborn, Griffiths, Navarro (2006), “More rational model of categorization”:

Particle filter with a small # of particles

Page 94: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Towards more natural concepts

Page 95: Bayesian models of human       learning and inference Josh Tenenbaum MIT

CrossCat: Discovering multiple structures that capture different subsets of features (Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006)

Page 96: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06)

[Figure: a concept × concept × predicate data array, partitioned into blocks.]

Biomedical predicate data from UMLS (McCrae et al.):
– 134 concepts: enzyme, hormone, organ, disease, cell function, ...
– 49 predicates: affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease), ...

(c.f. Xu, Tresp, et al. SRL 06)

Page 97: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06)

[Figure: learned concept clusters and the relations between them, e.g., Diseases affect Organisms; Chemicals interact with Chemicals; Chemicals cause Diseases.]

Page 98: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning from very few examples

Cows have T9 hormones.
Sheep have T9 hormones.
Goats have T9 hormones.
→ All mammals have T9 hormones.

Cows have T9 hormones.
Seals have T9 hormones.
Squirrels have T9 hormones.
→ All mammals have T9 hormones.

• Property induction

• Word learning

“tufa”

“tufa”

“tufa”

Page 99: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The computational problem (c.f. semi-supervised learning)

[Figure: a species × features matrix (Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant) with a "new property" column of ? entries to be inferred.]

(85 features from Osherson et al., e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’,…)

Page 100: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Many sources of priors

Chimps have T9 hormones. → Gorillas have T9 hormones. (taxonomic similarity)

Poodles can bite through wire. → Dobermans can bite through wire. (jaw strength)

Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria. (food web relations)

Page 101: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Hierarchical Bayesian Framework (Kemp & Tenenbaum)

• F: form (here, a tree), with prior P(form)
• S: structure (a particular tree over mouse, squirrel, chimp, gorilla), with probability P(structure | form)
• D: data (features F1-F4 for each species, plus a new property "Has T9 hormones" whose values are mostly unobserved), with probability P(data | structure)

Page 102: Bayesian models of human       learning and inference Josh Tenenbaum MIT

P(D|S): How the structure constrains the data of experience

• Define a stochastic process over structure S that generates hypotheses h.
– For generic properties, the prior should favor hypotheses that vary smoothly over the structure.
– Many properties of biological species were actually generated by such a process (i.e., mutation + selection).

[Figure: a property that varies smoothly over the tree (P(h) high) vs. one scattered across the tree (P(h) low).]

Page 103: Bayesian models of human       learning and inference Josh Tenenbaum MIT

P(D|S): How the structure constrains the data of experience

Generate a continuous function y over the structure S from a Gaussian process (~ random walk, diffusion), with covariance determined by the graph (roughly $y \sim N(0, \Delta_S^{-1})$, where $\Delta_S$ is the graph Laplacian) [Zhu, Ghahramani & Lafferty 2003]; then threshold y to obtain the binary hypothesis h.
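A small sketch of this generative process, using a graph-Laplacian covariance in the spirit of Zhu, Ghahramani & Lafferty (2003); the chain graph and the regularization constant are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# A chain graph over 8 species (adjacency matrix W).
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# Graph Laplacian; a small ridge term keeps the covariance well defined.
L = np.diag(W.sum(axis=1)) - W
cov = np.linalg.inv(L + 0.1 * np.eye(n))

# Sample a smooth function y over the graph, then threshold to get h.
y = rng.multivariate_normal(np.zeros(n), cov)
h = (y > 0).astype(int)
print(h)   # neighboring nodes tend to share the property
```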

Page 104: Bayesian models of human       learning and inference Josh Tenenbaum MIT

[Figure: Structure S, a tree over Species 1-10, generating Data D, a species × features matrix.]

(85 features from Osherson et al., e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’,…)

Page 105: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 106: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 107: Bayesian models of human       learning and inference Josh Tenenbaum MIT

[Figure: the same Structure S over Species 1-10 and feature Data D, now with a "new property" column of ? entries to be inferred. (85 features from Osherson et al., e.g., for Elephant: 'gray', 'hairless', 'toughskin', 'big', 'bulbous', 'longleg', 'tail', 'chewteeth', 'tusks', 'smelly', 'walks', 'slow', 'strong', 'muscle', 'quadrapedal', ...)]

Page 108: Bayesian models of human       learning and inference Josh Tenenbaum MIT

[Figure: model predictions under tree-based vs. 2D-space-based priors, against human judgments for arguments such as "Gorillas, Mice, Seals have property P → All mammals have property P" and "Cows, Elephants, Horses have property P → All mammals have property P".]

Page 109: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Reasoning about spatially varying properties

“Native American artifacts” task

Page 110: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Hypotheses

Property type:     "has T9 hormones"     "can bite through wire"   "carry E. Spirus bacteria"
Theory/structure:  taxonomic tree        directed chain            directed network
Theory/process:    + diffusion process   + drift process           + noisy transmission

[Figure: example structures over Classes A-G for each theory: a taxonomic tree, a chain ordering the classes, and a directed (food-web-like) network.]

Page 111: Bayesian models of human       learning and inference Josh Tenenbaum MIT

[Figure: a food web over Kelp, Herring, Tuna, Mako shark, Sand shark, Dolphin, and Human, drawn in two layouts.]

Page 112: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Hierarchical Bayesian Framework

• F: form (tree, chain, or space)
• S: structure (e.g., a tree, a chain, or a 2D space over mouse, squirrel, chimp, gorilla)
• D: data (species × features matrix, F1-F4)

Page 113: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Discovering structural forms

[Figure: feature data over Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle, and a structure discovered over those animals.]

Page 114: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Discovering structural forms

[Figure: the same animals organized in two ways: Linnaeus's tree, and the "Great chain of being", a linear order running from Rock and Plant through the animals up to Angel and God.]

Page 115: Bayesian models of human       learning and inference Josh Tenenbaum MIT

People can discover structural forms

• Scientists
– Tree structure for living kinds (Linnaeus)
– Periodic structure for chemical elements (Mendeleev)

• Children
– Hierarchical structure of category labels
– Clique structure of social groups
– Cyclical structure of seasons or days of the week
– Transitive structure for value

Page 116: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Typical structure learning algorithms assume a fixed structural form

Flat clusters: K-means, mixture models, competitive learning
Line: Guttman scaling, ideal point models
Tree: hierarchical clustering, Bayesian phylogenetics
Circle: circumplex models
Euclidean space: MDS, PCA, factor analysis
Grid: self-organizing maps, generative topographic mapping

Page 117: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Hierarchical Bayesian Framework

• F: form
• S: structure; P(S|F) favors simplicity
• D: data (species × features, F1-F4); P(D|S) favors smoothness [Zhu et al., 2003]

Page 118: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Structural forms as graph grammars

[Figure: each structural form paired with the graph-grammar growth process that generates it.]

Page 119: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 120: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Development of structural forms as more data are observed

Page 121: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Beyond "Nativism" versus "Empiricism"

• "Nativism": Explicit knowledge of structural forms for core domains is innate.
– Atran (1998): The tendency to group living kinds into hierarchies reflects an "innately determined cognitive structure".
– Chomsky (1980): "The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth."

• "Empiricism": General-purpose learning systems without explicit knowledge of structural form.
– Connectionist networks (e.g., Rogers and McClelland, 2004).
– Traditional structure learning in probabilistic graphical models.

Page 122: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Summary: concept learning

Models based on Bayesian inference over hierarchies of structured representations (form F, structure S, data D, as above).
– How does abstract domain knowledge guide learning of new concepts?
– How can this knowledge be represented, and how might it be learned?
– How can probabilistic inference work together with flexibly structured representations to model complex, real-world learning and reasoning?

Page 123: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Contributions of Bayesian models

• Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions.

• Explain how and why human learning and reasoning works, in terms of (approximations to) optimal statistical inference in natural environments.

• A framework for studying people’s implicit knowledge about the structure of the world: how it is structured, used, and acquired.

• A two-way bridge to state-of-the-art AI and machine learning.

Page 124: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Looking forward

• What we need to understand: the mind's ability to build rich models of the world from sparse data.
– Learning about objects, categories, and their properties
– Causal inference
– Language comprehension and production
– Scene understanding
– Understanding other people's actions, plans, thoughts, goals

• What do we need to understand these abilities?
– Bayesian inference in probabilistic generative models
– Hierarchical models, with inference at all levels of abstraction
– Structured representations: graphs, grammars, logic
– Flexible representations, growing in response to observed data

Page 125: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning word meanings (Tenenbaum & Xu)

Abstract principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias
  ↓
Structure
  ↓
Data

Page 126: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Causal learning and reasoning (Griffiths, Tenenbaum, Kemp et al.)

Abstract principles
  ↓
Structure
  ↓
Data

Page 127: Bayesian models of human       learning and inference Josh Tenenbaum MIT

"Universal Grammar": hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)

P(grammar | UG)
  ↓
Grammar, e.g.:
S → [NP VP]
NP → [Det Adj Noun RelClause]
RelClause → [Rel NP V]
VP → [VP NP]
VP → [Verb]

P(phrase structure | grammar)
  ↓
Phrase structure

P(utterance | phrase structure)
  ↓
Utterance

P(speech | utterance)
  ↓
Speech signal

(c.f. Chater and Manning, 2006)

Page 128: Bayesian models of human       learning and inference Josh Tenenbaum MIT

(Han & Zhu, 2006; c.f.,Zhu, Yuanhao & Yuille NIPS 06 )

Vision as probabilistic parsing

Page 129: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Goal-directed action (production and comprehension)

(Wolpert et al., 2003)

Page 130: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Open directions and challenges

• Effective methods for learning structured knowledge
– How to balance the expressiveness/learnability tradeoff?

• More precise relation to psychological processes
– To what extent do mental processes implement boundedly rational methods of approximate inference?

• Relation to neural computation
– How to implement structured representations in brains?

• Modeling individual subjects and single trials
– Is there a rational basis for probability matching?

• Understanding failure cases
– Are these simply "not Bayesian", or are people using a different model? How do we avoid circularity?

Page 131: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Want to learn more?

• Special issue of Trends in Cognitive Sciences (TiCS), July 2006 (Vol. 10, no. 7), on “Probabilistic models of cognition”.

• Tom Griffiths’ reading list, a/k/a http://bayesiancognition.com

• Summer school on probabilistic models of cognition, July 2007, Institute for Pure and Applied Mathematics (IPAM) at UCLA.

Page 132: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 133: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Extra slides

Page 134: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian prediction

$$\underbrace{P(t_{total} \mid t_{past})}_{\text{posterior probability}} \propto \underbrace{1/t_{total}}_{\text{random sampling}} \cdot \underbrace{P(t_{total})}_{\text{domain-dependent prior}}$$

What is the best guess for $t_{total}$? Compute the $t$ such that $P(t_{total} > t \mid t_{past}) = 0.5$.

We compared the median of the Bayesian posterior with the median of subjects' judgments... but what about the distribution of subjects' judgments?

Page 135: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Sources of individual differences

• Individuals' judgments could be noisy.

• Individuals' judgments could be optimal, but with different priors.
– e.g., each individual has seen only a sparse sample of the relevant population of events.

• Individuals' inferences about the posterior could be optimal, but their judgments could be based on probability (or utility) matching rather than maximizing.

Page 136: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Individual differences in prediction

[Figure: proportion of judgments below the predicted value, plotted against the quantile of the Bayesian posterior distribution P(t_total | t_past).]

Page 137: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Individual differences in prediction

Average over all prediction tasks:
• movie run times
• movie grosses
• poem lengths
• life spans
• terms in congress
• cake baking times

[Figure: the same quantile analysis of P(t_total | t_past), averaged over tasks.]

Page 138: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Individual differences in concept learning

Page 139: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Why probability matching?

• Optimal behavior under some (evolutionarily natural) circumstances:
– Optimal betting theory, portfolio theory
– Optimal foraging theory
– Competitive games
– Dynamic tasks (changing probabilities or utilities)

• Side-effect of algorithms for approximating complex Bayesian computations:
– Markov chain Monte Carlo (MCMC): instead of integrating over complex hypothesis spaces, construct a sample of high-probability hypotheses.
– Judgments from individual (independent) samples can on average be almost as good as using the full posterior distribution.

Page 140: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Markov chain Monte Carlo

(Metropolis-Hastings algorithm)
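The slide presents the algorithm as an animation; as a stand-in, here is a minimal Metropolis-Hastings sketch for a one-dimensional target (the Gaussian target and the proposal width are my choices):

```python
import math, random

def metropolis_hastings(log_p, x0=0.0, steps=10_000, step=1.0, seed=0):
    """Sample from a density known only up to normalization, via log_p."""
    random.seed(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + random.gauss(0, step)          # symmetric random walk
        # Accept with probability min(1, p(proposal) / p(x)).
        if math.log(random.random()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return samples

# Target: standard normal, up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x)
print(sum(samples) / len(samples))   # ~0: matches the target's mean
```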

Page 141: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The puzzle of coincidences

Discoveries of hidden causal structure are often driven by noticing coincidences. . .

• Science
– Halley's comet (1705)

Page 142: Bayesian models of human       learning and inference Josh Tenenbaum MIT

(Halley, 1705)

Page 143: Bayesian models of human       learning and inference Josh Tenenbaum MIT

(Halley, 1705)

Page 144: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The puzzle of coincidences

Discoveries of hidden causal structure are often driven by noticing coincidences. . .

• Science
– Halley's comet (1705)
– John Snow and the cause of cholera (1854)

Page 145: Bayesian models of human       learning and inference Josh Tenenbaum MIT
Page 146: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Rational analysis of cognition

• Often one can show that apparently irrational behavior is actually rational.

Which cards do you have to turn over to test this rule: "If there is an A on one side, then there is a 2 on the other side"?

Page 147: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Rational analysis of cognition

• Often one can show that apparently irrational behavior is actually rational.

• Oaksford & Chater's rational analysis:
– Optimal data selection based on maximizing expected information gain.
– Test the rule "If p, then q" against the null hypothesis that p and q are independent.
– Assuming p and q are rare predicts people's choices (see the sketch below).
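A compact sketch of the optimal-data-selection calculation (my simplification of Oaksford & Chater's model: a deterministic rule hypothesis vs. independence, equal priors, and rare p and q with P(p) = 0.1, P(q) = 0.2):

```python
import math

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def eig(p_dep, p_ind, prior=0.5):
    """Expected information gain about dependence vs. independence from
    turning a card whose hidden side shows the relevant symbol with
    probability p_dep under the rule and p_ind under independence."""
    gain = entropy(prior)
    for o_dep, o_ind in [(p_dep, p_ind), (1 - p_dep, 1 - p_ind)]:
        p_o = prior * o_dep + (1 - prior) * o_ind
        if p_o > 0:
            gain -= p_o * entropy(prior * o_dep / p_o)
    return gain

b, c = 0.1, 0.2           # rarity: P(p) and P(q) are both small
cards = {                  # P(relevant hidden symbol | card) under (rule, independence)
    "p":     (1.0, c),                # hidden side: q
    "not-p": ((c - b) / (1 - b), c),  # hidden side: q
    "q":     (b / c, b),              # hidden side: p
    "not-q": (0.0, b),                # hidden side: p
}
for card, (dep, ind) in cards.items():
    print(card, round(eig(dep, ind), 3))
# -> p > q > not-q > not-p, matching people's choice frequencies
```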

Page 148: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Integrating multiple forms of reasoning (Kemp, Shafto, Berke & Tenenbaum NIPS 06)

1) Taxonomic relations between categories
2) Causal relations between features
... where the parameters of causal relations vary smoothly over the category hierarchy.

T9 hormones cause elevated heart rates.
Elevated heart rates cause faster metabolisms.
Mice have T9 hormones.
...?

Page 149: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Integrating multiple forms of reasoning

Page 150: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06)

[Figure: the concept × concept × predicate data array, partitioned into blocks.]

Biomedical predicate data from UMLS (McCrae et al.):
– 134 concepts: enzyme, hormone, organ, disease, cell function, ...
– 49 predicates: affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease), ...

(c.f. Xu, Tresp, et al. SRL 06)

Page 151: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning relational theories

[Figure: learned concept clusters and relations between them, e.g., Diseases affect Organisms; Chemicals interact with Chemicals; Chemicals cause Diseases.]

Page 152: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning annotated hierarchies from relational data

(Roy, Kemp, Mansinghka, Tenenbaum NIPS 06)

Page 153: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Learning abstract relational structures

Primate troop, "x beats y": dominance hierarchy
Bush administration, "x told y": tree
Prison inmates, "x likes y": cliques
Kula islands, "x trades with y": ring

Page 154: Bayesian models of human       learning and inference Josh Tenenbaum MIT

Bayesian inference in neural networks (Rao, in press)

Page 155: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The big problem of intelligence

• The development of intuitive theories in childhood.
– Psychology: How do we learn to understand others' actions in terms of beliefs, desires, plans, intentions, values, morals?
– Biology: How do we learn that people, dogs, bees, worms, trees, flowers, grass, coral, moss are alive, but chairs, cars, tricycles, computers, the sun, Roomba, robots, clocks, rocks are not?

Page 156: Bayesian models of human       learning and inference Josh Tenenbaum MIT

The big problem of intelligence

Consider a man named Boris.
– Is the mother of Boris's father his grandmother?
– Is the mother of Boris's sister his mother?
– Is the son of Boris's sister his son?

(Note: Boris and his family were stranded on a desert island when he was a young boy.)

• Common sense reasoning.