Transcript
Page 1:

Bayesian models of human inference

Josh Tenenbaum

MIT

Page 2:

The Bayesian revolution in AI

• Principled and effective solutions for inductive inference from ambiguous data:
– Vision
– Robotics
– Machine learning
– Expert systems / reasoning
– Natural language processing

• Standard view in AI: no necessary connection to how the human brain solves these problems.
– Heuristics & Biases program in the background (“We know people aren’t Bayesian, but…”).

Page 3:

Bayesian models of cognition

Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...]

Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …]

Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …]

Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …]

Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …]

Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …]

Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …]

Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …]

Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …]

Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]

Page 4:

How to meet up with mainstream JDM (judgment and decision making) research, i.e., heuristics & biases?

1. How to reconcile apparently contradictory messages of H&B and Bayesian models?

Are people Bayesian or aren’t they? When are they, when aren’t they, and why?

2. How to integrate the H&B and Bayesian research approaches?

Page 5:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
– People are Bayesian in low-level input or output processes that have a long evolutionary history shared with other species, e.g., vision, motor control, memory retrieval.

Page 6:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
– Higher-level cognition can be Bayesian when information is presented in formats that we have evolved to process, and that support simple heuristic algorithms; e.g., base-rate neglect disappears with “natural frequencies”.

[Figure: the same diagnosis problem presented with explicit probabilities vs. natural frequencies; a worked version of the two formats follows below.]
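To make the contrast concrete, here is a minimal sketch of the kind of diagnosis problem at issue. The numbers are the standard textbook ones (1% base rate, 80% hit rate, 9.6% false-alarm rate), not values taken from the slides.

```python
# Diagnosis problem, explicit-probability format: apply Bayes' rule directly.
p_disease = 0.01             # base rate
p_pos_given_disease = 0.80   # hit rate of the test
p_pos_given_healthy = 0.096  # false-alarm rate of the test

p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)
print(p_disease * p_pos_given_disease / p_pos)  # ~0.078

# Natural-frequency format: the same computation phrased as counts.
# Of 1000 people, 10 have the disease and 8 of them test positive;
# of the 990 healthy people, about 95 test positive.
print(8 / (8 + 95))  # ~0.078
```

Both routes give the same answer, roughly 8%; Gigerenzer’s point is that people reliably compute the count-based version but neglect the base rate in the probability-based one.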

Page 7:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis
– Bayes can illuminate core human cognitive capacities for inductive inference – learning words and concepts, projecting properties of objects, causal inference, or action understanding: problems we solve effortlessly, unconsciously, and successfully in natural contexts, which any five-year-old solves better than any animal or computer.

Page 8:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis

Causal induction (Sobel, Griffiths, Tenenbaum, & Gopnik)

[Figure: candidate causal graphs with potential causes A and B and effect E, illustrated with blicket-detector trials: an AB trial followed by an A trial.]

Page 9:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis

Word learning (Tenenbaum & Xu)

[Figure: a hypothesis space of candidate word meanings, evaluated against the observed data; see the sketch below.]
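A minimal sketch of the Bayesian word-learning idea: hypotheses are nested candidate extensions, and the “size principle” makes a few tightly clustered examples favor the narrowest consistent hypothesis. The toy hypothesis space below is illustrative, not the actual Tenenbaum & Xu space.

```python
# Toy nested hypothesis space for the meaning of a novel word.
hypotheses = {
    "dalmatians": {"dalmatian"},
    "dogs": {"dalmatian", "terrier"},
    "animals": {"dalmatian", "terrier", "cat", "horse"},
}

def posterior(examples):
    """P(h | examples) with a uniform prior and 'size principle' likelihood:
    each example is sampled uniformly from the true extension, so
    P(examples | h) = (1/|h|)^n if all examples fall in h, else 0."""
    scores = {}
    for name, extension in hypotheses.items():
        if all(x in extension for x in examples):
            scores[name] = (1.0 / len(extension)) ** len(examples)
        else:
            scores[name] = 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

# One 'dalmatian' example is ambiguous between all three hypotheses;
# three such examples make 'dalmatians' far more probable than 'dogs'
# or 'animals' (it would be a suspicious coincidence otherwise).
print(posterior(["dalmatian"]))
print(posterior(["dalmatian", "dalmatian", "dalmatian"]))
```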

Page 10:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis

– Bayes can illuminate core human cognitive capacities for inductive inference – learning words and concepts, projecting properties of objects, causal inference, or action understanding: problems we solve effortlessly, unconsciously, and successfully in natural contexts, which a five-year-old solves better than any animal or computer.

– The mind is not good at explicit Bayesian reasoning about verbally or symbolically presented statistics, unless core capacities can be engaged.

Page 11:

When are people Bayesian, and why?

• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis

[Figure (Krynski & Tenenbaum): judgments on the statistical vs. causal versions of the diagnosis problem, showing the proportions of correct answers vs. base-rate neglect.]

Page 12:

How to meet up with mainstream JDM (judgment and decision making) research, i.e., heuristics & biases?

1. How to reconcile apparently contradictory messages of H&B and Bayesian models?

Are people Bayesian or aren’t they? When are they, when aren’t they, and why?

2. How to integrate the H&B and Bayesian research approaches?

Page 13:

Reverse engineering

• The goal is to reverse-engineer human inference.
– A computational understanding of how the mind works and why it works the way it does.

• Even for core inferential capacities, we are likely to observe behavior that deviates from any ideal Bayesian analysis.

• These deviations are likely to be informative about how the mind works.

Page 14:

Analogy to visual illusions

(Shepard)

• Highlight the problems the visual system is designed to solve: inferring world structure from images, not judging properties of the images themselves.

• Reveal the visual system’s implicit assumptions about the physical world and the processes of image formation that are needed to solve these problems.

(Adelson)

Page 15:

How do we interpret deviations from a Bayesian analysis?

• H&B: People aren’t Bayesian, but use some other means of inference.
– Base-rate neglect: representativeness heuristic
– Recency bias: availability heuristic
– Order-of-evidence effects: anchoring and adjustment
– …

• Not so compelling as reverse engineering.
– What engineer would want to design a system based on “representativeness”, without knowing how it is computed, why it is computed that way, what problem it attempts to solve, when it works, or how its accuracy and efficiency compare to some ideal computation or other heuristics?

Page 16:

How do we interpret deviations from a Bayesian analysis?

Multiple levels of analysis (Marr)

• Computational theory
– What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed?

• Representation and algorithm
– How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources?

• Hardware implementation

Page 17:

How do we interpret deviations from a Bayesian analysis?

Multiple levels of analysis (Marr)

• Computational theory
– What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed?

• Representation and algorithm
– How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources?

• Hardware implementation

Bayes (at the level of computational theory)

Page 18:

Different philosophies

• H&B
– One canonical Bayesian analysis of any given task, and we know what it is.
– The ideal Bayesian solution can be computed.
– The question “Are people Bayesian?” is empirically meaningful on any given task.

• Bayes+Marr
– Many possible Bayesian analyses of any given task, and we need to discover which best characterize cognition.
– The ideal Bayesian solution can only be approximately computed.
– The question “Are people Bayesian?” is not an empirical one, at least not for an individual task. Bayes is a framework-level assumption, like distributed representations in connectionism or condition-action rules in ACT-R.

Page 19:

How do we interpret deviations from a Bayesian analysis?

Multiple levels of analysis (Marr)

• Computational theory
– What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed?

• Representation and algorithm
– How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources?

• Hardware implementation

Page 20:

The centrality of causal inference

(Griffiths & Tenenbaum)

• In visual perception:

– Judge P(scene|image features) rather than P(image features|scene) or P(image features|other image features).

• Coin flipping: Which sequence is more likely to come from flipping a fair coin, HHTHT or HHHHH?

• Coincidences: How likely is it that 2 people at a random party of 25 share the same birthday? That 3 people share one at a party of 10? (A quick simulation follows below.)
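Neither answer appears on the slide; a quick Monte Carlo sketch (our own illustration) shows why the first event is unremarkable while the second is a genuine coincidence.

```python
import random
from collections import Counter

def p_shared_birthday(n_people, k_share, trials=100_000, seed=0):
    """Estimate the probability that at least k_share of n_people
    share a birthday, assuming 365 equally likely days."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(365) for _ in range(n_people))
        if max(counts.values()) >= k_share:
            hits += 1
    return hits / trials

print(p_shared_birthday(25, 2))  # ~0.57: more likely than not
print(p_shared_birthday(10, 3))  # ~0.001: genuinely surprising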

Page 21:

(Griffiths & Tenenbaum)

Rational measure of evidential support: the likelihood ratio

    P(data | h1) / P(data | h0)

Judgments of randomness:

    P(random | data) ∝ P(data | random) / P(data | regular)

Judgments of coincidence:

    P(regular | data) ∝ P(data | regular) / P(data | random)
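As a concrete instance, here is a minimal sketch of the support ratio for the two coin sequences from page 20. The “regular” alternative used here — a coin of unknown bias with a uniform prior, integrated out — is our own assumption; the slide does not specify a regular model.

```python
from math import factorial

def p_data_random(seq):
    """Likelihood under the 'random' hypothesis: a fair coin."""
    return 0.5 ** len(seq)

def p_data_regular(seq):
    """Likelihood under one simple 'regular' hypothesis (an assumption
    here): a coin of unknown bias theta with a uniform prior,
    integrated out analytically:
    integral of theta^h (1-theta)^t dtheta = h! t! / (h+t+1)!."""
    h, t = seq.count("H"), seq.count("T")
    return factorial(h) * factorial(t) / factorial(h + t + 1)

for seq in ["HHTHT", "HHHHH"]:
    support = p_data_random(seq) / p_data_regular(seq)
    print(seq, round(support, 3))
# HHTHT 1.875 -> the data favor 'random'
# HHHHH 0.188 -> the data favor 'regular'
```

This matches the intuition behind the coin-flipping question: HHHHH is exactly as probable as HHTHT under a fair coin, but it is far better explained by a regular process.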

Page 22:

How do we interpret deviations from a Bayesian analysis?

Multiple levels of analysis (Marr)

• Computational theory
– What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed?

• Representation and algorithm
– How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources?

• Hardware implementation

Page 23:

Assuming the world is simple

• In visual perception:
– “Slow and smooth” prior on visual motion.

• Causal induction:
– P(blicket) = 1/6, deterministic “activation law”.

[Figure: blicket-detector trials. After an AB trial and then an A trial: P(A is a blicket | data) = 1, P(B is a blicket | data) ≈ 1/6. After an AB trial and then an AC trial: P(A is a blicket | data) ≈ 3/4, P(B is a blicket | data) ≈ 1/4. A sketch of the first computation follows below.]
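The first pair of numbers follows directly from the 1/6 prior plus the deterministic activation law. A minimal enumeration sketch (our own, with hypothetical helper names) reproduces them:

```python
from itertools import product

P_BLICKET = 1 / 6  # prior from the slide

def activates(on_detector, blicket):
    """Deterministic 'activation law': the detector fires iff at least
    one object placed on it is a blicket."""
    return any(blicket[o] for o in on_detector)

def posterior(trials):
    """Enumerate all blicket assignments; keep those consistent with
    the observed trials (all of which activated the detector)."""
    objects = sorted({o for trial in trials for o in trial})
    post = {o: 0.0 for o in objects}
    z = 0.0
    for bits in product([False, True], repeat=len(objects)):
        blicket = dict(zip(objects, bits))
        prior = 1.0
        for o in objects:
            prior *= P_BLICKET if blicket[o] else 1 - P_BLICKET
        if all(activates(t, blicket) for t in trials):
            z += prior
            for o in objects:
                if blicket[o]:
                    post[o] += prior
    return {o: p / z for o, p in post.items()}

# Backward blocking: an AB trial, then an A trial, both activating.
print(posterior([("A", "B"), ("A",)]))
# {'A': 1.0, 'B': 0.1666...} -- the slide's P(A) = 1, P(B) = 1/6
```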

Page 24:

Recognizing the world is complex (Kemp & Tenenbaum)

• In visual perception:
– Need uncertainty about the coherence ratio and velocity of coherent motion. (Lu & Yuille)

• Property induction:
– Properties should be distributed stochastically over a tree structure, not just focused on single branches.

Example argument: Gorillas have T9 cells. Seals have T9 cells. Therefore, horses have T9 cells.

Bayes with a single-branch prior: r = 0.50

Page 25:

Recognizing the world is complex (Kemp & Tenenbaum)

• In visual perception:
– Need uncertainty about the coherence ratio and velocity of coherent motion. (Lu & Yuille)

• Property induction:
– Properties should be distributed stochastically over a tree structure, not just focused on single branches.

Example argument: Gorillas have T9 cells. Seals have T9 cells. Therefore, horses have T9 cells.

Bayes with a “mutation” prior: r = 0.92 (a sketch of such a prior follows below)
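A minimal sketch of what a “mutation” prior can look like: the property switches on or off along the branches of a tree, so it tends to cluster on branches but can also appear in scattered species. The tree, the rate, and all names below are illustrative, not Kemp & Tenenbaum’s actual model.

```python
import math
import random

# A toy tree for illustration: node -> (parent, branch_length).
# Parents are listed before their children.
TREE = {
    "root":      (None, 0.0),
    "primates":  ("root", 1.0),
    "gorilla":   ("primates", 1.0),
    "ungulates": ("root", 1.0),
    "horse":     ("ungulates", 1.0),
    "cow":       ("ungulates", 1.0),
    "seal":      ("root", 2.0),
}

def sample_property(rate=0.3, rng=random):
    """Sample a true/false labeling of every node under a two-state
    'mutation' process: along a branch of length t, the property flips
    with probability (1 - exp(-2*rate*t)) / 2."""
    state = {}
    for node, (parent, t) in TREE.items():
        if parent is None:
            state[node] = rng.random() < 0.5  # root state: a coin flip
        else:
            p_flip = (1 - math.exp(-2 * rate * t)) / 2
            state[node] = state[parent] != (rng.random() < p_flip)
    return state

def p_conclusion(premises, conclusion, trials=100_000, seed=0):
    """Monte Carlo estimate of P(conclusion | premises) by rejection:
    keep only sampled labelings consistent with the premises."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(trials):
        s = sample_property(rng=rng)
        if all(s[x] for x in premises):
            kept += 1
            hits += s[conclusion]
    return hits / kept

# 'Gorillas and seals have it' spans distant branches, so the property
# looks widely distributed and generalizes fairly strongly to horses.
print(p_conclusion(["gorilla", "seal"], "horse"))
```

Unlike a single-branch prior, this process gives every labeling of the species some probability, with nearby species more likely to share the property.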

Page 26:

[Figure (Kemp & Tenenbaum): four kinds of properties — “has T9 hormones”, “is found near Minneapolis”, “can bite through wire”, “carry E. Spirus bacteria” — each suggesting a different structured prior over how the property is distributed.]

Page 27:

How do we interpret deviations from a Bayesian analysis?

Multiple levels of analysis (Marr)

• Computational theory
– What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed?

• Representation and algorithm
– How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources?

• Hardware implementation

Page 28:

Sampling-based approximate inference (Griffiths et al., Goodman et al.)

• In visual perception:
– Temporal dynamics of bi-stability due to fast sampling-based approximation of a bimodal posterior (Schrater & Sundareswara).

• Order effects in category learning:
– Particle filter (sequential Monte Carlo), an online approximate inference algorithm assuming stationarity. (A minimal particle filter is sketched below.)

• Probability matching in classification decisions:
– Sampling-based approximations with guarantees of near-optimal generalization performance.
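A minimal particle-filter sketch (our own, not the Griffiths et al. model) shows how an online sampling approximation can produce order effects that exact Bayes would not: with few particles, early observations prune the sample set irreversibly.

```python
import random

def particle_filter_mean(data, n_particles, seed=0):
    """Sequential Monte Carlo estimate of a coin's bias theta from a
    stream of 0/1 observations, processed one at a time."""
    rng = random.Random(seed)
    particles = [rng.random() for _ in range(n_particles)]  # theta ~ U(0,1)
    for x in data:
        # Weight each particle by the likelihood of the new observation...
        weights = [th if x == 1 else 1.0 - th for th in particles]
        # ...then resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return sum(particles) / len(particles)

ones_first = [1] * 10 + [0] * 10
zeros_first = [0] * 10 + [1] * 10

# Exact Bayes gives the same posterior for both orders (the data are
# exchangeable); a small particle set does not.
for n in (5, 10_000):
    print(n, particle_filter_mean(ones_first, n),
             particle_filter_mean(zeros_first, n))
```

With 5 particles the two orders give visibly different estimates even though the exact posterior is identical; with 10,000 particles the estimates converge.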

Page 29:

Conclusions

• “Are people Bayesian?”, “When are they Bayesian?”
– Maybe not the most interesting questions in the long run…

• What is the best way to reverse-engineer cognition at multiple levels of analysis? Assuming core inductive capacities are approximately Bayesian at the computational-theory level offers several benefits:
– Explanatory power: why does cognition work?
– Fewer degrees of freedom in modeling
– A bridge to state-of-the-art AI and machine learning
– Tools to study the big questions: What are the goals of cognition? What does the mind know about the world? How is that knowledge represented? What are the processing mechanisms and why do they work as they do?