Exploring cultural transmission by iterated learning
Tom Griffiths, Brown University
Mike Kalish, University of Louisiana
With thanks to: Anu Asnaani, Brian Christian, and Alana Firl
Cultural transmission
• Most knowledge is based on secondhand data
• Some things can only be learned from others – cultural knowledge transmitted across generations
• What are the consequences of learners learning from other learners?
Iterated learning (Kirby, 2001)
Each learner sees data, forms a hypothesis, produces the data given to the next learner
Objects of iterated learning
• Knowledge communicated through data
• Examples:
– religious concepts
– social norms
– myths and legends
– causal theories
– language
Analyzing iterated learning
[Diagram: a chain of learners, alternating PL(h|d) and PP(d|h)]
PL(h|d): probability of inferring hypothesis h from data d
PP(d|h): probability of generating data d from hypothesis h
Analyzing iterated learning
What are the consequences of iterated learning?
[Chart: prior analyses of iterated learning, arranged by algorithm complexity (simple vs. complex) and method (simulations vs. analytic results)]
– Complex algorithms, simulations: Kirby (2001); Brighton (2002); Smith, Kirby, & Brighton (2003)
– Simple algorithms, analytic results: Komarova, Niyogi, & Nowak (2002)
– Complex algorithms, analytic results: ?
Bayesian inference
[Portrait: Reverend Thomas Bayes]
• Rational procedure for updating beliefs
• Foundation of many learning algorithms
• Widely used for language learning
Bayes’ theorem
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$$
h: hypothesis; d: data
P(h|d): posterior probability; P(d|h): likelihood; P(h): prior probability
The denominator sums over the space of hypotheses H.
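A minimal sketch of this computation over a discrete hypothesis space; the coin-flipping hypotheses, prior, and likelihood values are invented for illustration.

```python
# Bayes' theorem over a discrete hypothesis space.
# All hypotheses, priors, and likelihoods below are illustrative.
from math import comb

def posterior(prior, likelihood, d):
    """P(h|d) = P(d|h)P(h) / sum over h' of P(d|h')P(h')."""
    unnorm = {h: likelihood(d, h) * p for h, p in prior.items()}
    z = sum(unnorm.values())               # sum over the hypothesis space H
    return {h: p / z for h, p in unnorm.items()}

prior = {"fair": 0.7, "biased": 0.3}       # P(h): hypothetical prior
rates = {"fair": 0.5, "biased": 0.9}       # P(heads | h)

def likelihood(d, h):                      # P(d|h): binomial likelihood
    n, k = d                               # n flips, k heads
    return comb(n, k) * rates[h] ** k * (1 - rates[h]) ** (n - k)

print(posterior(prior, likelihood, (3, 3)))  # 3 heads in 3 flips favors "biased"
```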
Iterated Bayesian learning
Learners are Bayesian agents
$$P_L(h \mid d) = \frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}$$
[Diagram: the iterated learning chain, alternating PL(h|d) and PP(d|h)]
Markov chains
• Variables: x(t+1) is independent of the history given x(t)
• Converges to a stationary distribution under easily checked conditions for ergodicity
[Diagram: chain x → x → x → …]
Transition matrix: T = P(x(t+1)|x(t))
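As a sketch, here is a two-state chain with an invented transition matrix; sampling x(t+1) from T given only x(t) makes the Markov property concrete, and the empirical state frequencies approach the stationary distribution.

```python
# Simulating a Markov chain: x(t+1) depends only on x(t), via T.
# T[i, j] = P(x(t+1) = i | x(t) = j); the values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])                 # column-stochastic transition matrix

x = 0                                      # initial state
counts = np.zeros(2)
for _ in range(100_000):
    x = rng.choice(2, p=T[:, x])           # next state depends on current state only
    counts[x] += 1
print(counts / counts.sum())               # approaches [2/3, 1/3], the stationary distribution
```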
Stationary distributions
• Stationary distribution:
$$\pi_i = \sum_j P(x(t+1) = i \mid x(t) = j)\,\pi_j = \sum_j T_{ij}\,\pi_j$$
• In matrix form, $\pi = T\pi$, so the stationary distribution $\pi$ is the first eigenvector of the matrix T
• The second eigenvalue sets the rate of convergence
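A sketch of both points for the same illustrative chain: the stationary distribution is the eigenvector of T with eigenvalue 1, and the second eigenvalue governs how quickly the chain converges to it.

```python
# Stationary distribution as the first eigenvector of T.
import numpy as np

T = np.array([[0.9, 0.2],
              [0.1, 0.8]])                 # same illustrative chain as above

vals, vecs = np.linalg.eig(T)
order = np.argsort(-vals.real)             # eigenvalues, largest first
pi = vecs[:, order[0]].real                # eigenvector for eigenvalue 1
pi /= pi.sum()                             # normalize to a probability distribution
print(pi)                                  # [2/3, 1/3]
print(vals.real[order[1]])                 # second eigenvalue: rate of convergence (0.7)
```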
Analyzing iterated learning
[Diagram: d0 → h1 → d1 → h2 → d2 → h3 → …, alternating PL(h|d) and PP(d|h)]
• A Markov chain on hypotheses: h1 → h2 → h3 → …, with transition probability Σd PL(h'|d) PP(d|h)
• A Markov chain on data: d0 → d1 → d2 → …, with transition probability Σh PP(d'|h) PL(h|d)
• A Markov chain on hypothesis-data pairs: (h1,d1) → (h2,d2) → (h3,d3) → …
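The chain on hypotheses can be built explicitly by composing the learner's posterior with the production probabilities. A sketch with random toy values for the prior and likelihoods (not from any experiment):

```python
# Transition matrix of the chain on hypotheses: T[h', h] = sum_d PL(h'|d) PP(d|h).
import numpy as np

rng = np.random.default_rng(1)
n_h, n_d = 3, 4
prior = rng.dirichlet(np.ones(n_h))            # P(h): toy prior
PP = rng.dirichlet(np.ones(n_d), size=n_h).T   # PP[d, h] = P(d|h); columns sum to 1

PL = (PP * prior).T                            # Bayes: PL[h, d] proportional to PP[d, h] P(h)
PL /= PL.sum(axis=0, keepdims=True)            # normalize over hypotheses

T = PL @ PP                                    # compose production and learning
print(T.sum(axis=0))                           # each column sums to 1: a valid Markov chain
```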
Stationary distributions
• Markov chain on h converges to the prior, p(h)
• Markov chain on d converges to the “prior predictive distribution”:
$$p(d) = \sum_h p(d \mid h)\,p(h)$$
• Markov chain on (h,d) is a Gibbs sampler for
$$p(d,h) = p(d \mid h)\,p(h)$$
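A sketch checking the first two claims numerically on the same kind of toy problem: the leading eigenvector of the hypothesis chain recovers the prior, and that of the data chain recovers the prior predictive distribution.

```python
# Verifying convergence targets for iterated Bayesian learning (toy values).
import numpy as np

rng = np.random.default_rng(1)
n_h, n_d = 3, 4
prior = rng.dirichlet(np.ones(n_h))            # P(h)
PP = rng.dirichlet(np.ones(n_d), size=n_h).T   # PP[d, h] = P(d|h)
PL = (PP * prior).T                            # Bayesian posterior PL[h, d]
PL /= PL.sum(axis=0, keepdims=True)

def stationary(T):
    """Leading eigenvector of a column-stochastic matrix, normalized."""
    vals, vecs = np.linalg.eig(T)
    pi = vecs[:, np.argmax(vals.real)].real
    return pi / pi.sum()

print(np.allclose(stationary(PL @ PP), prior))       # chain on h -> the prior
print(np.allclose(stationary(PP @ PL), PP @ prior))  # chain on d -> prior predictive
```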
Implications
• The probability that the nth learner entertains the hypothesis h approaches p(h) as n → ∞
• Convergence to the prior occurs regardless of:
– the properties of the hypotheses themselves
– the amount or structure of the data transmitted
• The consequences of iterated learning are determined entirely by the biases of the learners
Identifying inductive biases
• Many problems in cognitive science can be formulated as problems of induction
– learning languages, concepts, and causal relations
• Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
• What biases guide human inductive inferences?
If iterated learning converges to the prior, then it may provide a method for investigating biases
Serial reproduction (Bartlett, 1932)
• Participants see stimuli, then reproduce them from memory
• Reproductions of one participant are stimuli for the next
• Stimuli were interesting, rather than controlled
– e.g., “War of the Ghosts”
Iterated function learning (heavy lifting by Mike Kalish)
• Each learner sees a set of (x,y) pairs
• Makes predictions of y for new x values
• Predictions become the data for the next learner, as in the sketch below
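A minimal sketch of this transmission loop, substituting least-squares line fitting for the human learner; the initial data, learner, and noise level are stand-ins, not the experimental procedure.

```python
# Iterated function learning with a toy learner: fit a line, pass noisy
# predictions on to the next generation.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
y = rng.uniform(0, 1, size=x.shape)            # arbitrary initial (x, y) data

for generation in range(9):
    a, b = np.polyfit(x, y, deg=1)             # learner's hypothesis: a line y = ax + b
    y = a * x + b + rng.normal(0, 0.05, x.shape)  # predictions (plus noise) for the next learner
    print(generation + 1, round(a, 2), round(b, 2))
```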
Function learning experiments
[Display: a stimulus bar, a response slider, and feedback]
Examine iterated learning with different initial data
[Results: functions produced at iterations 1–9, one row per set of initial data]
Iterated concept learning (heavy lifting by Brian Christian)
• Each learner sees examples from a species
• Identifies species of four amoebae
• Iterated learning is run within-subjects
Two positive examples
Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001)
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$$
d: 2 amoebae; h: set of 4 amoebae
$$P(d \mid h) = \begin{cases} 1/|h|^m & d \in h \\ 0 & \text{otherwise} \end{cases}$$
m: number of amoebae in the set d (= 2)
|h|: number of amoebae in the set h (= 4)
$$P(h \mid d) = \frac{P(h)}{\sum_{h' : d \in h'} P(h')}$$
The posterior is the renormalized prior.
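A sketch of this model; the universe of six items and the uniform prior are placeholders. Because every hypothesis has |h| = 4, the likelihood is identical for every hypothesis consistent with the data, so the posterior is just the prior restricted to the consistent hypotheses and renormalized.

```python
# Concept-learning model: hypotheses are sets of 4 items, data are positive
# examples, P(d|h) = (1/|h|)^m if d is in h, else 0. Item set is illustrative.
from itertools import combinations

universe = range(6)                            # illustrative set of amoebae
hypotheses = list(combinations(universe, 4))   # all sets of 4 amoebae
prior = {h: 1 / len(hypotheses) for h in hypotheses}  # placeholder uniform prior

def posterior(d, prior):
    """Renormalized prior over hypotheses containing all examples in d."""
    consistent = {h: p for h, p in prior.items() if set(d) <= set(h)}
    z = sum(consistent.values())
    return {h: p / z for h, p in consistent.items()}

post = posterior(d=(0, 1), prior=prior)        # two positive examples
print(len(post), max(post.values()))           # 6 consistent hypotheses, each 1/6
```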
What is the prior?
Classes of concepts (Shepard, Hovland, & Jenkins, 1961)
[Figure: the six classes of concepts, defined over the dimensions shape, size, and color]
Experiment design (for each subject)
• 6 iterated learning chains, one per class (Classes 1–6)
• 6 independent learning “chains”, one per class (Classes 1–6)
Estimating the prior
Estimating the prior
Estimated prior probabilities:
Class 1: 0.861
Class 2: 0.087
Class 3: 0.009
Class 4: 0.002
Class 5: 0.013
Class 6: 0.028
[Scatter: Bayesian model vs. human subjects, r = 0.952]
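If iterated learning converges to the prior, a simple estimator falls out: tabulate the relative frequencies of the hypothesis classes produced late in the chains. A sketch with hypothetical response counts:

```python
# Estimating the prior from late-chain responses (hypothetical data).
from collections import Counter

responses = ["Class 1"] * 52 + ["Class 2"] * 5 + ["Class 6"] * 3  # hypothetical
counts = Counter(responses)
total = sum(counts.values())
prior_estimate = {c: n / total for c, n in counts.items()}
print(prior_estimate)                          # relative frequency per class
```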
Two positive examples (n = 20)
[Plots: probability of each hypothesis across iterations, for human learners and the Bayesian model]
Two positive examples (n = 20)
[Plot: probability of each hypothesis, Bayesian model vs. human learners]
Three positive examples
Three positive examples (n = 20)
[Plots: probability of each hypothesis across iterations, for human learners and the Bayesian model]
Three positive examples (n = 20)
[Plot: probability of each hypothesis, Bayesian model vs. human learners]
Conclusions
• The consequences of iterated learning with Bayesian learners are determined by the biases of the learners
• Consistent results are obtained with human learners
• Provides an explanation for cultural universals…
– universal properties are probable under the prior
– a direct connection between mind and culture
• …and a novel method for evaluating the inductive biases that guide human learning
Discovering the biases of models
Generic neural network: [figure]
Discovering the biases of models
EXAM (DeLosh, Busemeyer, & McDaniel, 1997): [figure]
Discovering the biases of models
POLE (Kalish, Lewandowsky, & Kruschke, 2004): [figure]