Bayesian models of human inductive learning
Josh Tenenbaum, MIT
Transcript
Page 1

Bayesian models of human inductive learning

Josh Tenenbaum, MIT

Page 2

Everyday inductive leaps

How can people learn so much about the world from such limited evidence?
– Kinds of objects and their properties
– The meanings of words, phrases, and sentences
– Cause-effect relations
– The beliefs, goals and plans of other people
– Social structures, conventions, and rules

“tufa”

Page 3

Modeling Goals

• Explain how and why human learning and reasoning work, in terms of (approximations to) optimal statistical inference in natural environments.

• Computational-level theories that provide insights into algorithmic- or processing-level questions.

• Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions.

• A framework for studying people’s implicit knowledge about the world: how it is structured, used, and acquired.

• A two-way bridge to state-of-the-art AI, machine learning.

Page 4

1. How does background knowledge guide learning from sparsely observed data? Bayesian inference, with priors based on background knowledge.

2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, rules, logic, relational schemas, theories.

3. How is background knowledge itself learned? Hierarchical Bayesian models, with inference at multiple levels of abstraction.

4. How can background knowledge constrain learning yet maintain flexibility, balancing assimilation and accommodation? Nonparametric models, growing in complexity as the data require.

Explaining inductive learning

Page 5

Two case studies

• The number game
• Property induction

Page 6

The number game

• Program input: number between 1 and 100

• Program output: “yes” or “no”

Page 7

The number game

• Learning task:
– Observe one or more positive (“yes”) examples.
– Judge whether other numbers are “yes” or “no”.

Page 8

The number game

Examples of “yes” numbers → Generalization judgments (N = 20):
– 60 → Diffuse similarity

Page 9

The number game

Examples of “yes” numbers → Generalization judgments (N = 20):
– 60 → Diffuse similarity
– 60 80 10 30 → Rule: “multiples of 10”

Page 10

The number game

Examples of “yes” numbers → Generalization judgments (N = 20):
– 60 → Diffuse similarity
– 60 80 10 30 → Rule: “multiples of 10”
– 60 52 57 55 → Focused similarity: numbers near 50-60

Page 11

The number game

Examples of “yes” numbers → Generalization judgments (N = 20):
– 16 → Diffuse similarity
– 16 8 2 64 → Rule: “powers of 2”
– 16 23 19 20 → Focused similarity: numbers near 20

Page 12

The number game

Main phenomena to explain:
– Generalization can appear either similarity-based (graded) or rule-based (all-or-none).
– Learning from just a few positive examples.

Examples → generalization judgments:
– 60 → Diffuse similarity
– 60 80 10 30 → Rule: “multiples of 10”
– 60 52 57 55 → Focused similarity: numbers near 50-60

Page 13

Divisions into “rule” and “similarity” subsystems?

• Category learning
– Nosofsky, Palmeri et al.: RULEX
– Erickson & Kruschke: ATRIUM

• Language processing
– Pinker, Marcus et al.: Past tense morphology

• Reasoning
– Sloman
– Rips
– Nisbett, Smith et al.

Page 14

• H: Hypothesis space of possible concepts:
– h1 = {2, 4, 6, 8, 10, 12, …, 96, 98, 100} (“even numbers”)

– h2 = {10, 20, 30, 40, …, 90, 100} (“multiples of 10”)

– h3 = {2, 4, 8, 16, 32, 64} (“powers of 2”)

– h4 = {50, 51, 52, …, 59, 60} (“numbers between 50 and 60”)

– . . .

Bayesian model

Representational interpretations for H:
– Candidate rules

– Features for similarity

– “Consequential subsets” (Shepard, 1987)

Page 15

Three hypothesis subspaces for number concepts

• Mathematical properties (24 hypotheses):
– Odd, even, square, cube, prime numbers
– Multiples of small integers
– Powers of small integers

• Raw magnitude (5050 hypotheses):
– All intervals of integers with endpoints between 1 and 100.

• Approximate magnitude (10 hypotheses):
– Decades (1-10, 10-20, 20-30, …)

Page 16

• H: Hypothesis space of possible concepts:
– Mathematical properties: even, odd, square, prime, . . .
– Approximate magnitude: {1-10}, {10-20}, {20-30}, . . .
– Raw magnitude: all intervals between 1 and 100.

• X = {x1, . . . , xn}: n examples of a concept C.

• Evaluate hypotheses given data:

– p(h) [prior]: domain knowledge, pre-existing biases

– p(X|h) [likelihood]: statistical information in examples.

– p(h|X) [posterior]: degree of belief that h is the true extension of C.

Bayesian model

$$p(h \mid X) \;=\; \frac{p(X \mid h)\, p(h)}{\sum_{h' \in H} p(X \mid h')\, p(h')}$$

Page 17

Generalizing to new objects

Given p(h|X), how do we compute p(y ∈ C | X), the probability that C applies to some new stimulus y?

(Diagram: observed examples X = {x1, x2, x3, x4}, a hypothesis h consistent with them, background knowledge, and the query “y ∈ C?”.)

Page 18

Generalizing to new objects

Hypothesis averaging:

Compute the probability that C applies to some new object y by averaging the predictions of all hypotheses h, weighted by p(h|X):

$$p(y \in C \mid X) \;=\; \sum_{h \in H} p(y \in C \mid h)\; p(h \mid X),
\qquad
p(y \in C \mid h) \;=\; \begin{cases} 1 & \text{if } y \in h \\ 0 & \text{if } y \notin h \end{cases}$$

Page 19

Likelihood: p(X|h)

• Size principle: Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.

• Follows from assumption of randomly sampled examples + law of “conservation of belief”:

• Captures the intuition of a “representative” sample.

$$p(X \mid h) \;=\; \begin{cases} \dfrac{1}{\text{size}(h)^{\,n}} & \text{if } x_1, \ldots, x_n \in h \\[4pt] 0 & \text{if any } x_i \notin h \end{cases}$$

(Conservation of belief: $\sum_{d \in D} p(D = d \mid M) = 1$.)
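To make the model concrete, here is a minimal Python sketch (my own illustration, not code from the talk) of the pieces defined above: a small hypothesis space of consequential subsets, the size-principle likelihood, a prior that downweights unnatural hypotheses, and generalization by hypothesis averaging. The specific hypotheses and prior weights are placeholder choices.

```python
# Minimal sketch of the Bayesian number game (illustrative hypotheses and priors).

N = 100  # the game uses numbers 1..100

# A few candidate hypotheses ("consequential subsets"), as sets of integers.
hypotheses = {
    "even numbers":    set(range(2, N + 1, 2)),
    "multiples of 10": set(range(10, N + 1, 10)),
    "powers of 2":     {2, 4, 8, 16, 32, 64},
    "numbers 50-60":   set(range(50, 61)),
    "multiples of 10 except 50, 70": set(range(10, N + 1, 10)) - {50, 70},
}

# Illustrative prior: conceptually unnatural hypotheses get near-zero weight.
prior = {name: 1.0 for name in hypotheses}
prior["multiples of 10 except 50, 70"] = 1e-4
Z0 = sum(prior.values())
prior = {name: p / Z0 for name, p in prior.items()}

def likelihood(X, name):
    """Size principle: p(X|h) = 1 / size(h)^n if every example lies in h, else 0."""
    ext = hypotheses[name]
    return 1.0 / len(ext) ** len(X) if all(x in ext for x in X) else 0.0

def posterior(X):
    """p(h|X) proportional to p(X|h) p(h), normalized over the hypothesis space."""
    scores = {name: likelihood(X, name) * prior[name] for name in hypotheses}
    Z = sum(scores.values())
    return {name: s / Z for name, s in scores.items()}

def p_generalize(y, X):
    """Hypothesis averaging: p(y in C | X) = sum over h containing y of p(h|X)."""
    return sum(p for name, p in posterior(X).items() if y in hypotheses[name])

X = [60, 80, 10, 30]
print(posterior(X))         # posterior mass concentrates on "multiples of 10"
print(p_generalize(20, X))  # high: 20 is in the dominant hypothesis
print(p_generalize(67, X))  # near zero: 67 is in no high-posterior hypothesis
```

In this toy version, X = {60, 80, 10, 30} puts nearly all posterior mass on “multiples of 10”, so generalization looks rule-like; a single example such as 60 leaves the posterior spread over several hypotheses, and generalization comes out graded.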

Page 20

Illustrating the size principle

(Figure: the even numbers 2–100 on a number line, with two hypotheses h1 and h2 marked as candidate subsets.)

Page 21

Illustrating the size principle

(Same figure, with observed examples added.)

Data slightly more of a coincidence under h1

Page 22

Illustrating the size principle

(Same figure, with more observed examples.)

Data much more of a coincidence under h1

Page 23

Prior: p(h)

• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.

• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.

Page 24

Prior: p(h)

• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.

• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.

e.g., X = {60 80 10 30}:

$$p(X \mid \text{multiples of 10}) = \frac{1}{10^4} = 0.0001$$

$$p(X \mid \text{multiples of 10, except 50 and 70}) = \frac{1}{8^4} \approx 0.00024$$

Page 25

The “ugly duckling” theorem

How would we generalize without any inductive bias – without constraints on the hypothesis space, informative priors or likelihoods?

(Matrix: objects 1–4 × all 16 logically possible hypotheses.)

Page 26

The “ugly duckling” theorem

(Matrix: objects 1–4 × all 16 logically possible hypotheses.)

p(X = {3} | h) across the 16 hypotheses: 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0

Page 27

The “ugly duckling” theorem

(Matrix: objects 1–4 × all 16 logically possible hypotheses.)

p(X = {3} | h) across the 16 hypotheses: 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0

Summing over the hypotheses consistent with X = {3}:
p(1 ∈ C | X = {3}) = 4/8
p(2 ∈ C | X = {3}) = 4/8
p(4 ∈ C | X = {3}) = 4/8

Page 28

The “ugly duckling” theorem

(Matrix: objects 1–4 × all 16 logically possible hypotheses.)

p(X = {3, 1} | h) across the 16 hypotheses: 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0

p(2 ∈ C | X = {3, 1}) = 2/4
p(4 ∈ C | X = {3, 1}) = 2/4

Without any inductive bias – constraints on hypotheses, informative priors or likelihoods – no meaningful generalization!
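A tiny Python sketch of this point (my own illustration: all 2^4 hypotheses over four objects, a uniform prior, and equal likelihood for every consistent hypothesis) shows that every unobserved object ends up with generalization probability exactly 1/2, whatever the examples are.

```python
# Sketch: generalization with no inductive bias, i.e. a uniform prior over all
# 2^4 logically possible hypotheses about 4 objects.
from itertools import combinations

objects = [1, 2, 3, 4]
# All logically possible concepts = all subsets of the objects.
hypotheses = [set(c) for r in range(len(objects) + 1)
              for c in combinations(objects, r)]

def p_generalize(y, X):
    consistent = [h for h in hypotheses if X <= h]  # hypotheses containing all examples
    return sum(1 for h in consistent if y in h) / len(consistent)

print(p_generalize(4, {3}))     # 0.5
print(p_generalize(2, {3}))     # 0.5
print(p_generalize(4, {3, 1}))  # 0.5 -- still 0.5, no matter what is observed
```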

Page 29

Posterior:

• X = {60, 80, 10, 30}

• Why prefer “multiples of 10” over “even numbers”? p(X|h).

• Why prefer “multiples of 10” over “multiples of 10 except 50 and 20”? p(h).

• Why does a good generalization need both high prior and high likelihood? p(h|X) ~ p(X|h) p(h)

$$p(h \mid X) \;=\; \frac{p(X \mid h)\, p(h)}{\sum_{h' \in H} p(X \mid h')\, p(h')}$$

Page 30

Prior: p(h)

• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.

• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.

• p(h) encodes relative weights of alternative theories:

H: total hypothesis space, a mixture of three subspaces:
– H1: Mathematical properties (24 hypotheses): even numbers, powers of two, multiples of three, …; p(H1) = 1/5, so p(h) = p(H1)/24 for each
– H2: Raw magnitude (5050 hypotheses): 10-15, 20-32, 37-54, …; p(H2) = 3/5, so p(h) = p(H2)/5050 for each
– H3: Approximate magnitude (10 hypotheses): 10-20, 20-30, 30-40, …; p(H3) = 1/5, so p(h) = p(H3)/10 for each
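As a sketch of how such a structured prior can be written down (the subspace weights are the ones on the slide; the member hypotheses below are placeholders, e.g. only 4 of the 24 mathematical properties are listed):

```python
# Sketch: a prior built from weighted subspaces, p(h) = p(H_k) / |H_k| for h in H_k.

def structured_prior(subspaces, weights):
    """subspaces: name -> list of hypotheses; weights: name -> p(H_k)."""
    prior = {}
    for name, hyps in subspaces.items():
        for h in hyps:
            prior[(name, h)] = weights[name] / len(hyps)
    return prior

math_props = ["even", "odd", "square", "prime"]   # stand-in for the 24 math hypotheses
raw_magnitude = [(a, b) for a in range(1, 101) for b in range(a, 101)]   # all 5050 intervals
approx_magnitude = [(max(1, 10 * i), 10 * (i + 1)) for i in range(10)]   # decades 1-10, 10-20, ...

prior = structured_prior(
    {"math": math_props, "raw": raw_magnitude, "approx": approx_magnitude},
    {"math": 1 / 5, "raw": 3 / 5, "approx": 1 / 5},
)
print(len(raw_magnitude))        # 5050 interval hypotheses, as on the slide
print(prior[("math", "even")])   # p(H1) / |H1| for this toy version of H1
print(prior[("raw", (50, 60))])  # (3/5) / 5050
```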

Page 31

(Figure: human generalization judgments vs. Bayesian model predictions for the example sets 60; 60 80 10 30; 60 52 57 55; 16; 16 8 2 64; 16 23 19 20.)

Page 32

Examples: 16

Page 33

Examples: 16 8 2 64

Page 34

Examples: 16 23 19 20

Page 35

Summary of the Bayesian model

• How do the statistics of the examples interact with prior knowledge to guide generalization?

• Why does generalization appear rule-based or similarity-based?

posterior ∝ likelihood × prior (size principle; hypothesis averaging)

– broad p(h|X): similarity gradient
– narrow p(h|X): all-or-none rule

Page 36

Summary of the Bayesian model

• How do the statistics of the examples interact with prior knowledge to guide generalization?

• Why does generalization appear rule-based or similarity-based?

posterior ∝ likelihood × prior (size principle; hypothesis averaging)

– broad p(h|X): many h of similar size, or very few examples (i.e. 1)
– narrow p(h|X): one h much smaller

Page 37

Alternative models

• Neural networks

• Hypothesis ranking and elimination

• Similarity to exemplars

Time?

Page 38

Alternative models
• Neural networks

(Diagram: a network relating the example numbers 60, 80, 10, 30 to candidate feature units: even, multiple of 10, power of 2, multiple of 3.)

Page 39

Alternative models
• Neural networks
• Hypothesis ranking and elimination

(Diagram: the same features, even, multiple of 10, power of 2, multiple of 3, treated as ranked hypotheses, 1 2 3 4 …, checked against the examples 60, 80, 10, 30.)

Page 40

Alternative models
• Neural networks
• Hypothesis ranking and elimination
• Similarity to exemplars
– Average similarity:

$$p(y \in C \mid X) \;=\; \frac{1}{|X|} \sum_{x_j \in X} \mathrm{sim}(y, x_j)$$

(Figure: Data vs. Model predictions, r = 0.80, for the example sets 60; 60 80 10 30; 60 52 57 55.)
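For comparison, a sketch of the exemplar-averaging rule above; sim() is a placeholder similarity function (exponential decay in numeric distance), not the measure used in the original model comparison.

```python
# Sketch: similarity-to-exemplars generalization, p(y in C | X) = mean_j sim(y, x_j).
import math

def sim(y, x, scale=10.0):
    """Placeholder similarity: exponential decay with numeric distance."""
    return math.exp(-abs(y - x) / scale)

def p_generalize_exemplar(y, X):
    return sum(sim(y, x) for x in X) / len(X)

print(p_generalize_exemplar(55, [60, 52, 57]))  # high: 55 is near the examples
print(p_generalize_exemplar(90, [60, 52, 57]))  # low: 90 is far from the examples
```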

Page 41

Alternative models

• Neural networks

• Hypothesis ranking and elimination

• Similarity to exemplars
– Flexible similarity? Bayes.

Page 42

The universal law of generalization

(Figure: probability of generalization as a function of distance in psychological space.)

Page 43

Explaining the universal law (Tenenbaum & Griffiths, 2001)

Bayesian generalization when the hypotheses correspond to convex regions in a low-dimensional metric space (e.g., intervals in one dimension), with an isotropic prior.

(Figure: p(y ∈ C | x) as a function of the distance between the example x and test items y1, y2, y3.)
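A small numerical illustration (my own sketch, not the derivation in the paper): take the hypotheses to be all intervals within 1..100 with a uniform prior and the size-principle likelihood; generalization from a single example then falls off with distance from that example, qualitatively like the exponential law.

```python
# Sketch: Bayesian generalization from one example, with hypotheses = all
# integer intervals [a, b] in 1..100, uniform prior, size-principle likelihood.
def p_generalize(y, x, lo=1, hi=100):
    num = den = 0.0
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):
            if a <= x <= b:
                w = 1.0 / (b - a + 1)   # p(x | h) under the size principle
                den += w
                if a <= y <= b:
                    num += w
    return num / den

for y in (50, 52, 55, 60, 70):
    print(y, round(p_generalize(y, 50), 3))  # decreases as y moves away from 50
```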

Page 44

Asymmetric generalization

• Assume a gradient of typicality.
• Examples sampled in proportion to their typicality:
• Size of hypotheses now

(Figure: typicality gradient; e.g., horse vs. camel.)

Page 45

Asymmetric generalization

Symmetry may depend heavily on the context:
– Healthy levels of hormone (left) versus healthy levels of toxin (right)
– Predicting the durations or magnitudes of events.

Page 46

Modeling word learning

Bayesian inference over a tree-structured hypothesis space (Xu & Tenenbaum; Schmidt & Tenenbaum).

(Figure: three objects labeled “tufa”.)

Page 47

Taking stock

• A model of high-level, knowledge-driven inductive reasoning that makes strong quantitative predictions with minimal free parameters (r² > 0.9 for mean judgments on 180 generalization stimuli, with 3 free numerical parameters).

• Explains qualitatively different patterns of generalization (rules, similarity) as the output of a single general-purpose rational inference engine.

• Differently structured hypothesis spaces account for different kinds of generalization behavior seen in different domains and contexts.

Page 48

What’s missing: How do we choose a good prior?

• Can we describe formally how these priors are generated by abstract knowledge or theories?

• Can we move from ‘weak rational analysis’ to ‘strong rational analysis’ in inductive learning?
– “Weak”: behavior consistent with some reasonable prior.
– “Strong”: behavior consistent with the “correct” prior given the structure of the world (cf. ideal observer analyses in vision).

• Can we explain how people learn these rich priors?
• Can we work with more flexible priors, not just restricted to a small subset of all logically possible concepts?
– Would like to be able to learn any concept, even complex and unnatural ones, given enough data (a non-dogmatic prior).

Page 49

• How likely is the conclusion, given the premises?

“Similarity”, “Typicality”, “Diversity”

Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones.

Horses have T9 hormones. Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones.

Horses have T9 hormones.

Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones.

Flies have T9 hormones.

Property induction

Page 50

The computational problem

(Matrix: species × features, plus a new property column with unknown (?) values. Species: Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant.)

85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…

“Transfer Learning”, “Semi-Supervised Learning”

Page 51

(Matrix: species × features, plus the new property “has T9 hormones”, observed for some species and queried (?) for others. Species: Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant.)

Premises (X): Horses have T9 hormones. Rhinos have T9 hormones.
Conclusion (Y): Cows have T9 hormones.

With hypotheses h (candidate extensions of the property) and prior P(h), where X is the set of premises and Y the conclusion:

$$P(Y \mid X) \;=\; \frac{\sum_{h \text{ consistent with } X, Y} P(h)}{\sum_{h \text{ consistent with } X} P(h)}$$
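A minimal sketch of this computation. The candidate extensions and their prior weights below are made up for illustration (in the talk the prior comes from the structured models described next), but the P(Y|X) formula is the one above.

```python
# Sketch: property induction by summing prior probability over consistent hypotheses.
species = ["horse", "cow", "rhino", "elephant", "chimp", "gorilla", "mouse", "squirrel"]

# Hypothetical prior over a few candidate extensions of the novel property.
prior = {
    frozenset(species): 0.3,                                # "all mammals"
    frozenset({"horse", "cow", "rhino", "elephant"}): 0.3,  # large herbivores
    frozenset({"chimp", "gorilla"}): 0.2,                   # primates
    frozenset({"mouse", "squirrel"}): 0.2,                  # rodents
}

def p_conclusion(Y, X):
    """P(Y | X) = sum of P(h) over h consistent with X and Y,
    divided by the sum over h consistent with X alone."""
    num = sum(p for h, p in prior.items() if X <= h and Y <= h)
    den = sum(p for h, p in prior.items() if X <= h)
    return num / den

X = {"horse", "rhino"}                # premises: horses and rhinos have the property
print(p_conclusion({"cow"}, X))       # strong under this prior: cows share those extensions
print(p_conclusion({"squirrel"}, X))  # weaker: only the broadest extension covers squirrels
```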

Page 52

Hierarchical Bayesian Framework (Kemp & Tenenbaum)

F: form, with prior P(form)
S: structure, with P(structure | form); e.g., a tree with the species (mouse, squirrel, chimp, gorilla) at its leaf nodes
D: data, with P(data | structure); e.g., observed features F1–F4 for each species, plus a partially observed property “has T9 hormones” (? for unobserved species)

Page 53

P(D|S): How the structure constrains the data of experience

• Define a stochastic process over structure S that generates candidate property extensions h.
– Intuition: properties should vary smoothly over structure.

(Figure: a property that varies smoothly over the structure gets P(h) high; one that does not gets P(h) low.)

Page 54

P(D|S): How the structure constrains the data of experience

(Diagram: over the structure S, a continuous function y is drawn from a Gaussian Process (~ random walk, diffusion) and then thresholded to give a binary property extension h.) [Zhu, Lafferty & Ghahramani 2003]

Page 55

P(D|S): How the structure constrains the data of experience

(Same diagram: Gaussian Process (~ random walk, diffusion) over S, then threshold, yielding h.) [Zhu, Lafferty & Ghahramani 2003]
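A minimal sketch of this generative step, assuming a graph-Laplacian covariance of the kind used by Zhu, Lafferty & Ghahramani (2003); the chain-structured graph and the noise scale below are placeholder choices.

```python
# Sketch: sample a smooth latent function over a graph-structured S from a
# Gaussian process with a graph-Laplacian covariance, then threshold it to get
# a binary property extension h.
import numpy as np

rng = np.random.default_rng(0)

# Structure S: a chain over 10 nodes (adjacency matrix).
n = 10
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = np.diag(A.sum(axis=1)) - A               # graph Laplacian
sigma2 = 1.0
K = np.linalg.inv(L + np.eye(n) / sigma2)    # covariance: smooth functions get high probability

y = rng.multivariate_normal(np.zeros(n), K)  # latent function y over the structure
h = (y > 0).astype(int)                      # threshold -> binary property extension h

print(h)  # neighboring nodes tend to share the property value
```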

Page 56

(Figure: a structure S over Species 1–10 generates the observed feature data D.)

85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…

Page 57
Page 58

[cf. Lawrence, 2004; Smola & Kondor, 2003]

Page 59

(Figure: the structure S over Species 1–10, the observed features D, and a new property observed for some species and queried (?) for the rest.)

85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…

Page 60

(Figure: probability of generalization.)

Page 61

Gorillas have property P. Mice have property P. Seals have property P.

All mammals have property P.

Cows have property P. Elephants have property P.

Horses have property P.

(Figure: model predictions vs. human judgments for these arguments, comparing a Tree-structured prior with a 2D spatial prior.)

Page 62

Testing different priors

(Figure: an inductive bias can be correct, wrong, too weak, or too strong.)

Page 63

Learning about spatial properties

Geographic inference task: “Given that a certain kind of native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?”

(Figure: results for a Tree-structured prior vs. a 2D spatial prior on the geographic task.)

Page 64

Summary so far

• A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations:
– Qualitatively different priors are appropriate for different domains of property induction.
– In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors.
– A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph.

• Remaining question: How can we learn appropriate structures for different domains?

Page 65

Hierarchical Bayesian Framework

F: form (Tree, Clusters, Linear, …)
S: structure: a tree, a set of clusters, or a linear order over the species (mouse, squirrel, chimp, gorilla)
D: data: observed features F1–F4 for each species
Page 66

F: form (Tree, Clusters, Linear, …)
S: structure over the species (mouse, squirrel, chimp, gorilla)
D: data: observed features F1–F4

P(S | F) favors simplicity; P(D | S) favors smoothness [Zhu et al., 2003].
Page 67

Hypothesis space of structural forms

Order, Chain, Ring, Partition, Hierarchy, Tree, Grid, Cylinder

Page 68
Page 69
Page 70

Development of structural forms as more data are observed

Page 71

The “blessing of abstraction”

• Often quicker to learn at higher levels of abstraction.
– Quicker to learn that you have a biased coin than to learn its precise bias, or to learn that you have a second-order polynomial than to learn the precise coefficients.
– Quicker to learn that shape matters most for labeling object categories than to learn the labels for most categories.
– Quicker to learn that a domain is tree-structured than to learn the precise tree that best characterizes it.

• Explanation in hierarchical Bayesian models:
– At higher levels, hypothesis spaces get smaller and simpler, and draw support (albeit indirectly) from a broader sample of data.
– The total hypothesis space gets bigger when we add levels of abstraction, but the effective number of degrees of freedom only decreases, because higher levels specify constraints on lower levels.
– Hence the overall learning problem becomes easier.

Page 72

Beyond “Nativism” versus “Empiricism”

• “Nativism”: Explicit knowledge of structural forms for core domains is innate.
– Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”.
– Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.”

• “Empiricism”: General-purpose learning systems without explicit knowledge of structural form.
– Connectionist networks (e.g., Rogers and McClelland, 2004).
– Traditional structure learning in probabilistic graphical models.
Page 73

Conclusions

• Computational tools for studying core questions of human learning (and building more human-like machine learning):
– What is the structure of knowledge, at multiple levels of abstraction?
– How does abstract domain knowledge guide new learning?
– How can abstract domain knowledge itself be learned?
– How can inductive biases provide strong constraints yet be flexible?

• A different way to think about the development of cognition.
– Powerful abstractions can be learned “from the top down”, together with or prior to learning more concrete knowledge.

• Go beyond the traditional “either-or” dichotomies:
– How can probabilistic inference over symbolic hypotheses span the range of “rule-based” to “similarity-based” generalization?
– How can domain-general learning mechanisms acquire domain-specific representations?
– How can structured symbolic representations be acquired by statistical learning?