Developmental Science 9:2 (2006), pp 125–157
© 2006 The Authors. Journal compilation © 2006 Blackwell
Publishing Ltd. Published by Blackwell Publishing Ltd., 9600
Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA
02148, USA.
TARGET ARTICLE WITH COMMENTARIES AND RESPONSE
Gaze following: why (not) learn it?
Jochen Triesch,1,2 Christof Teuscher,3 Gedeon O. Deák1 and Eric Carlson1
1. Department of Cognitive Science, University of California, San Diego, USA
2. Frankfurt Institute for Advanced Studies, Johann Wolfgang Goethe University, Germany
3. Los Alamos National Laboratory, Los Alamos, USA
For commentaries on this article see Csibra (2006), Moore (2006)
and Richardson and Thomas (2006).
Abstract
We propose a computational model of the emergence of gaze following skills in infant–caregiver interactions. The model is based on the idea that infants learn that monitoring their caregiver’s direction of gaze allows them to predict the locations of interesting objects or events in their environment (Moore & Corkum, 1994). Elaborating on this theory, we demonstrate that a specific Basic Set of structures and mechanisms is sufficient for gaze following to emerge. This Basic Set includes the infant’s perceptual skills and preferences, habituation and reward-driven learning, and a structured social environment featuring a caregiver who tends to look at things the infant will find interesting. We review evidence that all elements of the Basic Set are established well before the relevant gaze following skills emerge. We evaluate the model in a series of simulations and show that it can account for typical development. We also demonstrate that plausible alterations of model parameters, motivated by findings on two different developmental disorders – autism and Williams syndrome – produce delays or deficits in the emergence of gaze following. The model makes a number of testable predictions. In addition, it opens a new perspective for theorizing about cross-species differences in gaze following.
Introduction
The capacity for shared attention is a cornerstone of social intelligence. It plays a crucial role in the communication between infant and caregiver (Brazelton, Koslowski & Main, 1974; Kaye, 1982; Adamson & Bakeman, 1991; Adamson, 1995; Moore & Dunham, 1995). By 9–12 months most infants can follow adults’ gaze and pointing gestures, and monitor a caregiver’s affect and use it to modulate their own response to an ambiguous stimulus. These behaviors emerge and coalesce on a predictable schedule (e.g. Butterworth & Itakura, 2000; Deák, Flom & Pick, 2000), although specific milestones show considerable individual differences in age of attainment (Mundy & Gomes, 1998; Markus, Mundy, Morales, Delgado & Yale, 2000). Shared attention skills allow the young of our species to learn what is important in the environment, based on the patterns of attention in older, more expert individuals. In conjunction with a shared language, these skills allow children to communicate what they perceive and think about, and to construct mental representations of what others perceive and think about. Consequently, shared attention is crucial for language and communication (Bruner, 1983; Baldwin, 1993; Tomasello, 1999).
The term shared attention is typically used to denote a set of different skills comprising gaze following, pointing and requesting behaviors. While some authors use the terms joint and shared attention interchangeably to refer to the matching of one’s focus of attention with that of another person, other authors make a subtle distinction between the two. ‘Shared’ attention is sometimes reserved for the more complex form of communication, wherein two individuals attend to the same object, and each has knowledge of the other’s attention to this object (Tomasello, 1995; Emery, 2000). In this paper, we will be concerned with joint attention more broadly, which we view as an important precursor to the emergence of true shared attention. Our particular focus is on gaze following, which may be defined as looking where somebody else is looking. Gaze following is a good starting point for investigations into shared attention, because it develops early in life and is a precedent for other shared attention skills.
Address for correspondence: Jochen Triesch, Department of Cognitive Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0515, USA; e-mail: [email protected]
How does gaze following emerge?
Starting with a pioneering study by Scaife and Bruner (1975), the emergence of gaze following has been investigated in many studies. There has been some debate about when gaze following emerges in human infants, with most estimates ranging from 3 to 12 months (e.g. Butterworth & Cochran, 1980; D’Entremont, Hains & Muir, 1997; Hood, Willen & Driver, 1998; Morales, Mundy & Rojas, 1998). The reasons for this wide range are threefold. First, researchers have used different criteria to define gaze following (Tomasello, 1995). Second, different levels of sophistication of gaze following can be distinguished. Third, different experimental paradigms may differ in sensitivity. The earliest signs or precursors of gaze following can be observed around 3 months of age, and some very rudimentary skills are even present in newborns (Farroni, Massaccesi, Pividori & Johnson, 2004). In particular, D’Entremont et al. (1997) showed that 3-month-olds will turn their eyes in the direction of an adult’s head turn more frequently than in the opposite direction. Their observation requires rather ideal conditions, such as targets that are well within the infant’s visual field. In addition, these demonstrations of ‘gaze following’ seem to rely on more basic visual tracking mechanisms that facilitate gaze shifts in the direction of motion of a centrally located stimulus. In fact, such motion cueing may initially be necessary, but by around 9 months static head pose alone can be sufficient for gaze following (Moore, Angelopoulos & Bennett, 1997).
Beyond these first signs of gaze following, Butterworth and Jarrett (1991) proposed three different stages of gaze following emerging around 6, 12 and 18 months, respectively (but also see Deák et al., 2000). These stages are defined by infants’ new abilities, first to ignore distracting visual objects, and later to follow adults’ gaze to locations outside of their visual field.
An important line of research is concerned with the specific features that infants use to establish the adult’s direction of gaze. There is evidence that younger infants rely more on the caregiver’s head pose than the eyes, whereas between 12 and 14 months there is a significant increase in sensitivity to eye orientation (Caron, Butler & Brooks, 2002). By 18 months, gaze following is reliably produced on the basis of eye movements alone (Butterworth & Jarrett, 1991). This body of work suggests that limitations of the infant’s developing face processing skills may play an important role in their ability to follow gaze.
A rather difficult question is what gaze following skills imply about how infants at various ages conceptualize their caregivers’ looking behavior. Although early accounts interpreted gaze following skills as indicating considerable social understanding or even a theory of mind, it has been argued that young infants may learn to follow gaze without such an understanding (Moore & Corkum, 1994; Corkum & Moore, 1995). More recently, Woodward (2003) demonstrated that infants need not have an understanding of the relation between a person who looks and the object of his or her gaze. In addition, early gaze following skills may not even require a representational strategy involving the identification of the caregiver as an intentional, perceiving individual (Leekam, Hunnisett & Moore, 1998). Certainly, such representations will emerge over time in older infants, but they might not be necessary to explain the emergence of gaze following behaviors.
Gaze following in other species
Humans are not the only species that exhibits gaze following. Gaze following has been demonstrated in a number of other species, including some (but not all) non-human primates (e.g. Itakura, 1996, 2004; Emery, Lorincz, Perrett, Oram & Baker, 1997; Tomasello, Call & Hare, 1997). Chimpanzees even seem to exhibit the more advanced level of gaze following that requires ignoring a distractor object along the scan path – Butterworth’s geometric stage of gaze following (see above) (Tomasello, Hare & Agnetta, 1999). In addition, Hare, Call, Agnetta and Tomasello (2000) demonstrated that chimpanzees know what conspecifics can and cannot see. There has also been some work with non-primates. Domestic dogs, for example, are capable of following the gaze of humans at about the level of 6- to 9-month-old human infants (but are not capable of shared attention) (Hare & Tomasello, 1999; Agnetta, Hare & Tomasello, 2000). In contrast, wolves don’t seem to follow the gaze of humans (Hare, Brown, Williamson & Tomasello, 2002). Why some species are able to follow gaze while other species are not is currently unclear. Behavioral research has been cataloging cross-species differences, but little is known about the underlying reasons for them.
The role of learning
Early attempts to explain gaze following postulated the existence of innate modules. Examples of strongly nativist theories have been articulated by Leslie and Baron-Cohen (Leslie, 1987; Baron-Cohen, 1995). Such approaches have marginalized the role of learning in the development of cognitive skills. One line of critique against modular accounts is that they tend to have little predictive power, because it is typically not made explicit how the modules work internally and exactly what information is passed between them (see Deák & Triesch, in press, for detailed analysis). In principle, however, this criticism can be overcome, and recent computational and robotic modeling work has started to address this question (Scassellati, 2002).
An alternative view explains the emergence of gaze following by postulating that infants gradually discover that monitoring their caregiver’s direction of gaze allows them to predict where interesting visual events will be. This idea was first articulated by Moore and Corkum (1994; Corkum & Moore, 1995). Note that while this view highlights the role of learning processes, it does not preclude an evolved propensity to follow gaze in certain situations, which depends only minimally or not at all on early social experiences. Such mechanisms may be important in jump-starting the learning process. There is substantial evidence consistent with a learning account. In particular, Corkum and Moore (1998) (C&M) demonstrated that 8-month-old infants can be trained to follow their caregiver’s gaze in a contingent reinforcement paradigm, where an interesting visual stimulus was shown if the infant followed the adult’s gaze to the stimulus location. C&M concluded that ‘learning could be involved in the acquisition of gaze following’ (p. 37). A second experiment by C&M, however, seems somewhat inconsistent with a pure learning account. Specifically, they found it more difficult to train infants to look to the location opposite of where the adult turned. This prompts C&M to claim that ‘simple learning is not sufficient as the mechanism through which joint attention cues acquire their signal value’ (p. 28). In our view, however, C&M’s second experiment is quite difficult to interpret and the results appear still consistent with a learning account.1
The importance of learning is also supported by some evidence, albeit preliminary, that gaze following skills emerge gradually through social experience. Deák et al. (2000) found that 12- and 18-month-old infants’ gaze following diminished less across trials if targets were novel and distinctive, than if targets were repetitive and identical. This suggests that even in a single interaction with as few as 12 trials, infants adjust their expectations about the validity of adults’ social cues for predicting visual reward. Also, Deák et al. (Deák, Wakabayashi, Sepeta & Triesch, 2004) reported preliminary observational data showing that gaze and gesture following skills emerge somewhat gradually between 5 and 10 months of age, which is consistent with an ongoing learning process. In sum, then, there is intriguing evidence to suggest that learning models might explain how gaze following and other joint attention skills emerge in the first 18 months. However, existing models are too vague to specify the kinds of data that would help us sharpen a powerful, predictive account of how these skills emerge.
The need for computational models
Our ultimate goal is to explain how and why gaze following (in its different forms) emerges at a level that reveals the underlying mechanisms of change in the brain and their relation to changes in overt social behavior. A theory of the emergence of gaze following should account for the experimental findings obtained in behavioral experiments, be consistent with known neuroscience data, and make specific predictions that can be used to falsify it. It should offer plausible explanations for differences in populations with developmental disorders and in other species. All else being equal, it should be as simple and parsimonious as possible.
In this paper we propose an account of the emergence of gaze following and evaluate its plausibility through computational modeling. Like many others, we believe that computational models can be a great aid in theorizing about developmental phenomena. The benefits of such an approach have been adequately discussed in several places (e.g. Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996; O’Reilly & Munakata, 2002). For instance, computational models can be very helpful in bridging the explanatory gap between biological mechanisms and observed behaviors. Importantly, computational approaches can be useful in analyzing the causal structure of developmental processes, that is, which changes may be necessary or sufficient for developmental events like the emergence of a new cognitive skill. These questions cannot easily be studied experimentally because (1) changes to individual neural processes are not readily observable or manipulable, and (2) there are typically many processes changing at the same time, making it very difficult to answer questions about cause and effect relations. Computational modeling may be particularly helpful in studying such relations because one can easily monitor all changes in the model, and systematically prohibit or promote certain changes in order to study how this alters the developmental trajectory.
1 There are at least two questions about the proper interpretation of Experiment 2 in Corkum and Moore (1998). First, it is unclear to what extent the participants could already follow gaze, because the exclusion measure was not very powerful. Corkum and Moore’s interpretation rests on the assumption that the tested infants were incapable of any gaze following. Second, motion cues may have facilitated gaze shifts in the direction of the caregiver’s head turn, but Corkum and Moore’s interpretation rests on the assumption that turns in the opposite direction are equally likely a priori. This does not consider that motion cueing facilitates gaze shifts in the same direction, which is supported by current evidence (e.g. Farroni et al., 2000).
The specific approach described in the following is comparable to other modeling work in the area of cognitive development. To some extent our approach is inspired by connectionist models (Elman et al., 1996) and dynamical systems approaches to development (Thelen & Smith, 1994). We share with connectionist modelers the desire to explain behavior in terms of underlying neural structures. In contrast to classical connectionist models of development, however, our approach emphasizes aspects of the embodied nature of cognitive development (Clark, 1997; Wilson, 2002). In particular, we consider the role of the learner’s situated real-time interaction with its environment. A good understanding and careful modeling of this interaction is a central goal of our approach (see Schlesinger & Parisi, 2001, for another example of this approach). These issues have also been addressed to some extent within the dynamic systems approach (Thelen, Schöner, Scheier & Smith, 2000), but our approach emphasizes the role of biologically plausible reward-driven learning processes. It is surprising to us that reward-driven learning mechanisms such as Temporal Difference learning (see below) are rarely being used in computational models of infant development. For example, connectionist style models typically utilize supervised learning (often using the backpropagation learning mechanism) which is not applicable to many developmental learning contexts. Similarly, in dynamical systems approaches, goal-directed learning is frequently not addressed either. Instead, the transition from one (younger and less capable) developmental state to the next (older and more capable) state is often modeled by changing a control parameter of the dynamical system in order to account for different performance levels. What is not addressed is what forces may drive these changing control parameters in developing infants. We feel that computational models that aim to carefully capture the affect-driven learning during situated, real-time interactions with the environment hold much promise for advancing our understanding of early cognitive development. The account that follows is an attempt to evaluate the promise of such models in the context of gaze following.
The Basic Set account of gaze following
At the heart of our account lies the idea that infants learn gaze following because they discover that monitoring their caregiver’s direction of gaze allows them to predict where interesting visual sights occur. Elaborating on this idea, we propose that gaze following (and other attention-sharing skills) emerge through the interplay of a Basic Set of structures and mechanisms. This set includes perceptual skills and preferences, reward-driven learning, habituation and a structured social environment (Fasel, Deák, Triesch & Movellan, 2002). In the following, we will briefly discuss each component of this Basic Set, and review evidence that each of these is functioning in normally developing infants before the time that the first solid gaze following skills emerge. This is crucial for establishing the viability of this set as a causal precursor for the emergence of gaze following skills. We will then describe how these components may interact to allow for the learning of gaze following.
Perceptual skills and preferences
Several perceptual skills and preferences that are in place by 3 months of age or earlier might be important for shared attention skills to develop. Even the youngest infants prefer human stimuli, especially their caregivers’ faces and voices (Brazelton et al., 1974; DeCasper & Fifer, 1980; Pascalis, de Schonen, Morton, Deruelle & Fabre-Grenet, 1995). One interpretation is that social stimuli have a higher salience than competing inanimate stimuli (Bates, 1979). Infants also generally enjoy social interaction. Around 2–3 months, infants begin responding in a more consistent and focused way to caregivers. At the same time most infants produce their first social smiles, and parents report greater engagement and ‘presence’ during interactions (Cole & Cole, 1996). Infants as young as 3 months prefer looking at the eyes of an approaching person, rather than the mouth (Haith, Bergman & Moore, 1979).
Attention-shifting skills (critical for following gaze or pointing cues) begin to mature around 3–4 months (e.g. Butcher, Kalverboer & Geuze, 2000; Farroni, Johnson, Brockbank & Simion, 2000; Johnson, Posner & Rothbart, 1994), but other, more complex perceptual skills will continue to undergo significant changes. A skill that is highly relevant to the development of gaze following and other attention-sharing skills is face processing, or more specifically, head pose and eye direction perception (i.e. discriminating the rotational angles of the face, and estimating the line of gaze). One study found that 1-month-olds prefer a photograph of their caregiver’s face in frontal to profile poses, suggesting that even young infants can discriminate extreme differences in caregivers’ head poses (Sai & Bushnell, 1998). But this finding has not been extended, so we do not know how well infants of different ages can discriminate different head poses. It appears that 8–10-month-olds use head pose, not eye direction, to estimate adults’ gaze direction (Moore et al., 1997). Robust use of the eyes seems to emerge later, with significant improvement between 12 and 14 months (Caron et al., 2002). Thus, by this age, face processing skills must be sufficiently well developed to allow for robust gaze following even in somewhat ambiguous circumstances. However, for gaze following to be successful, the ability to accurately encode the caregiver’s head pose needs to be mapped to the proper motor behaviors, which requires additional learning processes.
Reward-driven learning
Reward-driven learning, we claim, is important for learning attention-sharing. Reward-driven or reinforcement learning occurs in 2- and 3-month-olds (Kaye, 1982) and may even be present at birth (Floccia, 1997).2 Two-month-olds can, for example, learn within minutes to predict the locations of the next interesting event in a simple repeated sequence (Haith, Hazan & Goodman, 1988). We propose that the principal learning mechanisms used for acquiring attention-sharing behaviors are neurally plausible processes of Reinforcement Learning called Temporal Difference or TD learning (Sutton, 1988; Sutton & Barto, 1998). These processes are not merely Skinnerian, nor are they anti-mentalistic, but they have the goal of formalizing the relation between an agent’s affect-laden experienced outcomes (positive or negative) and the agent’s means of adapting behavior to increase positive outcomes and decrease negative ones. TD learning in particular has been tied to specific neuromodulatory systems (Schultz, Dayan & Montague, 1997), and recent models are neurally plausible (Montague, Hyman & Cohen, 2004). In particular, the firing of dopaminergic neurons in parts of the basal ganglia has been associated with the temporal difference signal from which TD learning methods derive their name. Although TD learning has previously played almost no role in developmental models, it holds promise for understanding the development of behaviors in all contexts that involve affectively valued outcomes. Reward-driven learning, however, may not be the only learning mechanism that is important for the emergence of gaze following.
Habituation
Habituation also plays an important role in our theory as a fundamental learning process. Habituation processes have complex dynamics that are in themselves challenging to understand and to model (Sirois & Mareschal, 2002). In most previous modeling attempts, habituation was related directly to the behavioral responses of the organism, e.g. the strength or probability of a motor response to a certain stimulus. Our view is somewhat different in that we relate habituation processes to changes in the internal evaluation or reward of a stimulus. Together, habituation and reward-driven learning (see above) will produce certain behavioral sequences and modify them adaptively. For example, when an infant looks at a caregiver’s face, or at a toy held by the caregiver, habituation will systematically occur, which we interpret as a systematically declining reward value over time for looking at this object. Dishabituation, conversely, amounts to a recovery of this reward. Because TD learning predicts future rewards, habituation will facilitate attention shifts away from the current target so that a new, more rewarding target can be fixated. Dishabituation leads to a relative recovery of the reward value of an object when a different stimulus is attended. These processes, in conjunction with reward-driven learning of behavioral policies, will produce cycles of attention-shifting between interesting social objects in the visual environment, such as the caregiver, and various other objects with properties that infants find interesting. The utility of these cycles for learning to follow gaze will depend on predictable behavior patterns provided by the caregiver.
Structured social environment
We posit that the most relevant situations for learning shared attention skills include interactions such as face-to-face play, feeding, diaper changing and bathing, which make up a high proportion of infants’ waking time. What is important about such interactions, we hypothesize, is their predictable event-contingency structure. This structure is learnable, by means of reinforcement learning and habituation, and infants can learn to maximize their positive engagement in such interactions. Studies on the statistical structure of infant–parent interactions generally show that each participant synchronizes his or her actions with the other, and selects actions based partly on the other’s recent actions, emotions and messages (Watson & Ramey, 1985). We hypothesize that infants soon start to predict where interesting objects and events will be, based on their caregivers’ gaze patterns. The caregiver’s gaze is predictive of interesting sights because caregivers will tend to look at other people or at objects they are manipulating (Land, Mennie & Rusted, 1999), and infants are interested in such stimuli.
2 Sometimes the term contingency learning is used in the developmental literature. We use reinforcement learning because it is more common in neuroscience, cognitive science and machine learning, and because it makes explicit an assumption that is implicit in the idea of contingency learning – specifically, that the learner is motivated or affectively driven to predict, and experience, certain outcomes.
The emergence of gaze following
How can the Basic Set elements (perceptual skills and preferences, TD learning, habituation, structured social environment provided by caregivers) act in concert to allow gaze following to emerge? Our claim is that infants (or other developing organisms, or even robots) with these ‘ingredients’ will learn to anticipate the locations of interesting visual stimuli based on caregivers’ attentive behaviors, both intentional (e.g. pointing) and unintentional (e.g. reflexive looking). They will learn to parse social events into conditions and outcomes, each associated with a hedonic value. A typical social sequence that supports learning might include the following events:
1. Initially, the caregiver and infant are looking at one another, in part because the infant has a preference for looking at social stimuli (i.e. it is rewarding to do so).
2. The caregiver looks away toward an object (possibly while holding or pointing to it), causing, first, a reduction in the reward value of the caregiver’s face (making the infant more likely to search for other stimuli); and second, producing directional motion of the head or eyes, which can trigger a same-direction attention shift by the infant (Farroni et al., 2000). Also, the infant may start to habituate to the caregiver’s face, further biasing the infant to make a gaze shift.
3. In some of these cases, due to ‘noisy’ action selection or random exploration of different behaviors (e.g. Sutton & Barto, 1998), the infant makes a gaze shift in the same direction as the adult. This can result in the infant looking directly at the rewarding sight, or it can bring the sight into the field of view so that a subsequent eye movement can bring it to the center of gaze.
4. In these cases, the infant on average receives a relatively greater reward (in terms of interesting sights) than if he or she had selected other actions. In a ‘high-reward sequence’, infants receive information about contingencies between the caregiver’s head pose and the presence of interesting visual events in a certain location. This allows infants to learn that it is beneficial to follow caregivers’ gaze shifts by shifting their own gaze to the same regions of space.
In summary, we propose that the Basic Set of structures and mechanisms outlined above allows infants to learn to follow gaze because they learn to exploit the caregiver’s tendency to look at things that are interesting (rewarding) for the infant. This theory is geared to explain the basic phenomenon of gaze following, i.e. how the infant learns to associate the head pose of others with gaze shifts to certain locations inside or outside of its own visual field. Ultimately, the test of this theory will be whether it can be extended to explain many of the interesting subtleties such as the ordered sequence of the development of gaze following skills, or the value of different caregiver cues (eyes, face, body posture) for joint attention, or the later development of theory-of-mind-like representations. We are optimistic that our framework provides a good starting point for this endeavor, and that we will eventually be able to account for a large range of empirical phenomena, including ‘higher’ shared attention skills. We will return to this point in the discussion.
Computational model
We now present a simple computational model to test whether the mechanisms of the Basic Set can lead to the emergence of gaze following and to explore how alterations of model parameters can simulate some developmental disorders that are characterized by delays in the emergence of gaze following.3 The goal of this inquiry is to determine under what conditions the Basic Set is sufficient for the emergence of gaze following. We do not suggest, however, that all of the Basic Set elements are strictly necessary – some might be replaceable by alternative mechanisms. Also, we do not claim that this set is sufficient for a comprehensive account of all human attention-sharing behaviors. It merely attempts to explain the basic gaze following behaviors that progressively emerge during the first year in typically developing infants, and, hopefully, disruptions of this progression that occur in certain developmental disorders. Future work will establish whether the model can also explain, for example, point-following behaviors.
The model was implemented in Matlab. The source code is available at http://mesa.ucsd.edu
Environment and caregiver model
The simulation comprises a model of the infant (referred to simply as ‘infant’, merely for expositional fluency), a model of the caregiver (the ‘caregiver’) and a model of the environment in which they interact. An overview of the model is given in Figure 1. As a simplification in the model, we assume that infant and caregiver are facing each other and remain in the same position. The space surrounding infant and caregiver is discretized into N distinct regions. The caregiver can look at any of these regions or at the infant. The infant can look at any of these regions or at the caregiver. The infant’s and caregiver’s shifting of gaze are the only ways they interact with each other and the environment. Time runs in discrete steps, each corresponding to roughly a quarter of a second. Each gaze shift is assumed to take one time step.
3 An initial account of the model was given in Carlson and Triesch (2003).
At any time there is one interesting object present or event
occurring in one of the N regions of the environment. This could
be an interesting toy, a third social agent, the caregiver’s hand
manipulating an object or performing a gesture, or other stimuli
that the infant would find interesting. We will refer to this object
or event as the target. (Below we will also consider environments
with multiple targets.) After some minimum time Tmin at one
location, the target is relocated to a randomly chosen new location
with some probability pshift per time step.
Whenever the target moves, the caregiver model shifts its
direction of gaze. There is a certain probability pvalid that the
caregiver will be looking at the new location of the target.
Otherwise, the caregiver’s new direction of gaze is drawn from a
uniform distribution over all of the other N locations (one for the
infant plus N − 1 locations not containing the target). Thus, the
parameter pvalid models how predictive the caregiver’s direction of
gaze is for indicating the location of the interesting target.
The parameter pvalid also has a second function. We can use it to
model inaccuracies in the infant’s head pose discrimination.
Consider the case where the caregiver is always looking at the
target. Even in this case, if the infant’s head pose discrimination
is inaccurate or noisy, the infant will not be able to correctly
infer the caregiver’s head pose and, as a consequence, the estimated
head pose will not be very predictive of rewarding sights. Thus, a
not-so-predictive caregiver whose head pose can be estimated
accurately and a highly predictive caregiver whose head pose we can
only infer correctly some fraction of the time will produce the
same net effect, and we can model both situations with the same
parameter pvalid.
Note that this environment and caregiver model is extremely
simple. In particular, the caregiver is not responding to the infant
in any way. This is obviously a gross simplification of the complex,
reciprocal dynamics of infant–caregiver interactions (e.g. Kaye,
1982), but as we will demonstrate below, even this kind of social
environment can be sufficient for gaze following to emerge. More
complex, interactive caregiver models have also recently been
investigated, and these show that the caregiver’s behavior plays an
important role (Teuscher & Triesch, 2004). In particular, the
caregiver’s behavior has to be properly matched to the parameters of
the infant model for optimal learning speed, although gaze following
will emerge under a wide range of caregiver behaviors.
Infant model
Our infant model is essentially that of a pleasure-driven agent.
There are many ways of formalizing this idea but a particularly
appropriate formal framework is reinforcement learning (Sutton
& Barto, 1998). Besides being the basis for modern theories of
learning under rewards and punishments, reinforcement learning is
also an important subfield of machine learning with some impressive
application successes (Sutton & Barto, 1998). In particular, our
model uses temporal difference learning (TD learning) algorithms,
which have been proposed as models for certain basal ganglia
functions (Schultz et al., 1997). A detailed description of the
equations of the model is given in the Appendix.
We conceive the infant as a reinforcement learning system that
learns to make two kinds of decisions. First, at any given time it
decides whether to shift gaze or keep fixating the same location.
Second, it decides where to look next, once the decision to shift
the direction of gaze has been made. The information available to
the infant includes the identity of its current object of fixation,
its associated reward value, and the length of time the infant has
been fixating this object. If and only if the fixated object is the
caregiver, the infant will know the caregiver’s current head pose.
Looking, reward and habituation
The infant model receives rewards for looking at interesting
things. The amount of reward received depends on the contents of
the infant’s gaze and how habituated the infant is to those
contents. There are four possible things for the infant to see:
(1) a frontal view of the caregiver (in case the caregiver is also
looking at the infant), (2) a non-frontal view of the caregiver,
which we simply refer to as a profile view (in case the caregiver is
not looking at the infant), (3) the target or (4) nothing.
Associated with these sights are the base rewards Rfrontal,
Rprofile, Rtarget, Rnothing. The actual reward received by the
infant is the base reward attenuated by habituation. As the infant
looks at a location, the infant habituates to its contents in the
sense that the actual reward for any object at this location will
decrease over time. Similarly, dishabituation is modeled as a
recovery of the actual reward for objects at other locations.

Figure 1 Overview of the model showing infant, caregiver and
interesting object. Corresponding model parameters are given in
brackets. Note that while we draw the spatial locations as arranged
in a hexagonal fashion, the model does not assume or use any
specific topological relations between these locations.
For each object in the environment, including the caregiver,
the infant has a habituation value $h_{\mathrm{fix}}(t) \in [0,1]$,
indicating the fraction of the base reward the infant receives for
looking at this object. A value of $h_{\mathrm{fix}} = 1$ means that
the infant is not habituated to the object, while a value of
$h_{\mathrm{fix}} = 0$ means that the infant is completely habituated
to the object. As the infant continues to fixate on an object its
habituation value decreases according to

$$h_{\mathrm{fix}}(t) = h_{\mathrm{fix}}(0)\,e^{-\beta t},$$

where $h_{\mathrm{fix}}(0)$ is the habituation level at the beginning
of the current fixation, $t$ is the time since the start of the
fixation, and $\beta$ is the habituation rate. Thus, the actual
reward received by the infant at time $t$ is

$$r_{\mathrm{actual}}(t) = R_{\mathrm{fix}}\,h_{\mathrm{fix}}(t),$$

where $R_{\mathrm{fix}} \in \{R_{\mathrm{frontal}}, R_{\mathrm{profile}},
R_{\mathrm{target}}, R_{\mathrm{nothing}}\}$ is the base reward. At the
same time, the reward levels for objects at locations not being
fixated recover in a corresponding fashion, modeling dishabituation.
In particular, when the infant is not looking at an object it
dishabituates according to

$$h_{\mathrm{nofix}}(t) = 1 - h_{\mathrm{nofix}}(0)\,e^{-\beta t},$$

where $t$ is the time since last looking at that object and
$h_{\mathrm{nofix}}$ is the level of habituation of this object
currently not being fixated.
One infant, two agents: when and where
Inspired by the proposal that the decisions of when to shift gaze
and where to shift gaze are made in separate neural pathways
(Findlay & Walker, 1999), the infant model consists of two
separate agents. The state space of the when-agent, which decides
whether to continue to fixate on the same location or shift gaze,
has two dimensions. The first dimension represents the time the
infant has been fixating at the same location, discretized as the
number of time steps (0, 1, 2, ..., 8, 9 or more). The second
dimension is the actual reward received by the infant. This is the
total reward the infant receives on that time step, taking
habituation into account, discretized uniformly into ten bins
between the maximum and minimum possible actual rewards.
If the when-agent makes the decision to shift gaze, the
where-agent determines the target of the gaze shift. The state
space of this agent has only a single dimension: the caregiver’s
head pose. Importantly, unless the infant is looking at the
caregiver, the caregiver’s head pose will be unknown to the infant.
Concretely, this agent distinguishes N + 2 different states: N for
the N different head poses observed when the caregiver looks at the
N regions of space, plus one for the caregiver’s head pose when the
caregiver is facing the infant, plus one state to represent that
the head pose of the caregiver is unknown to the infant. The
where-agent learns to map these states onto N + 1 different actions:
one action for looking at each of the N regions of space and one
action for looking at the caregiver. Note that we assume a
one-to-one correspondence between a caregiver head pose and the
region of space the caregiver looks at. In reality, this mapping is
ambiguous and the ambiguity can produce characteristic errors in
gaze following (Butterworth & Jarrett, 1991). Modeling this
ambiguity and how the infant learns to resolve it is the subject of
a separate paper (Lau & Triesch, 2004).
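To make the two state spaces concrete, the following Python sketch encodes them as integer indices. The function names, the label `INFANT`, the index layout and the default reward bounds are assumptions chosen for illustration; the text fixes only the sizes of the spaces (10 x 10 states for the when-agent, N + 2 states and N + 1 actions for the where-agent).

```python
INFANT = -1  # hypothetical label: the caregiver is looking at the infant


def when_state(fixation_steps, actual_reward, r_min=0.0, r_max=1.0, n_bins=10):
    """State of the when-agent: (time index, reward bin).

    Fixation time is capped at 9 ("9 or more"); the actual reward is
    binned uniformly between r_min and r_max (assumed bounds).
    """
    time_idx = min(fixation_steps, 9)
    frac = (actual_reward - r_min) / (r_max - r_min)
    reward_idx = max(0, min(int(frac * n_bins), n_bins - 1))
    return time_idx, reward_idx


def where_state(infant_gaze, caregiver_gaze, n_regions=10):
    """State of the where-agent, one of N + 2 indices.

    Actions (and infant gaze) are indexed 0..N-1 for regions and N for
    the caregiver. States: 0..N-1 = head pose toward region i,
    N = caregiver facing the infant, N + 1 = head pose unknown.
    """
    if infant_gaze != n_regions:      # not looking at the caregiver
        return n_regions + 1          # head pose is unknown
    if caregiver_gaze == INFANT:
        return n_regions
    return caregiver_gaze
```

Under this layout, gaze following corresponds to the where-agent preferring action i in state i for each of the N region poses.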
Learning in both agents occurs through the SARSA algorithm (see
Appendix), which was chosen because of its simplicity. Both agents
balance exploration vs. exploitation by selecting actions with a
softmax action selection mechanism (see Appendix). It should be
noted that separating the infant model into two separate learning
agents is not strictly necessary. We would expect similar results
for a simpler model that uses a single reinforcement learning agent
to model the infant, whose state space was the product space of the
state spaces of the when and where agents, and whose possible
actions are to shift gaze to any of the N + 1 locations. However,
the learning time would be expected to increase because of the
higher dimensionality of the resulting state space.
Experiments
Normal emergence of gaze following
In this section we describe a first analysis of the model and the
effects of some model parameters on its learning behavior. For easy
reference, all parameters, their default values, and their allowed
ranges are listed in Table 1. In the following, default parameter
values are used unless otherwise indicated. The effect of changing
several parameters is discussed below. Generally speaking, the model
is robust to changes in the parameters over wide ranges. The
parameters Tmin, pshift and pvalid were set ad hoc but could
eventually be set in accordance with data from an observational
study of naturalistic infant–caregiver interactions that is
currently under way (Deák et al., 2004).
To quantify the emergence of gaze following in the model and its
dependence on model parameters we use the following approach. At
specific points during the learning process we temporarily ‘freeze’
the model and evaluate its behavior for 1000 time steps (which
corresponds to slightly more than 4 minutes of simulated
interaction), after which the learning process resumes. The model
behavior at these stages of the learning process is analyzed by
observing the infant model interacting with the environment and
computing two statistics. The caregiver index CGI is defined as
the frequency of the infant’s gaze shifts towards the caregiver:

$$\mathrm{CGI} \doteq \frac{\#\,\text{gaze shifts to caregiver}}{\#\,\text{gaze shifts}} \qquad (1)$$

The gaze following index GFI is the frequency of gaze shifts that
lead from the location of the caregiver to where the caregiver is
looking:

$$\mathrm{GFI} \doteq \frac{\#\,\text{gaze shifts from caregiver to correct location}}{\#\,\text{gaze shifts}} \qquad (2)$$

An example run of the system with the default parameters is
shown in Figure 2. The model first learns to alternate gaze between
the caregiver and other locations. In terms of the model, the
when-agent discovers that it is best not to continue staring at a
single location for too long. At the same time, the where-agent
discovers that if the infant is not looking at the caregiver it
tends to be rewarding to make a gaze shift back to the caregiver.
After this has been achieved, gaze following behavior slowly
emerges. Here, the where-agent discovers that unexpectedly high
rewards tend to follow gaze shifts to certain locations, depending
on the caregiver’s head pose. It learns to correctly map the
caregiver’s head pose to gaze shifts to the locations that the
caregiver looks at. The increasing average reward the model obtains
per time step during this phase confirms that gaze following is in
fact beneficial for the model under these parameters. Note that for
a model without habituating rewards it would be optimal to
continually stare at the caregiver.

Figure 2 Emergence of gaze following in simple environment with
just one interesting target present at any time. The solid curve
plots the caregiver index (CGI), the solid curve with circles plots
the gaze following index (GFI) and the dotted curve plots average
reward per time step, as functions of the number of learning
iterations. Error bars indicate standard deviations across 15
simulations.

Table 1 Overview of model parameters, their allowed ranges and
default values

Symbol    Explanation                                      Range      Default
N         number of spatial regions                        1, 2, ...  10
∆t        duration of one simulation step                  arbitrary  ∼250 ms
α         learning rate                                    [0,1]      0.0025
β         habituation rate                                 [0,∞]      1
γ         discount factor for future rewards               [0,1]      0.8
τ         temperature (randomness of action selection)     [0,∞]      0.095
Rfrontal  reward for looking at frontal view of caregiver  [−∞,∞]     1
Rprofile  reward for looking at profile view of caregiver  [−∞,∞]     1
Rtarget   reward for looking at target                     [−∞,∞]     1
Rnothing  reward for looking at other region               [−∞,∞]     0
Tmin      minimum target stationary time (steps)           [0,∞]      4
pshift    probability of target shift per time step        [0,1]      0.5
pvalid    predictiveness of caregiver gaze                 [0,1]      0.75

A microscopic view of the behavior of the infant model is shown
in Figure 3 (top). It shows the fixation behavior of the infant
during various stages of the learning process. Fixations on the
caregiver are indicated by white pixels, target fixations by black
pixels, and fixations on other regions of space by grey pixels.
The quick development of a preference for looking at the caregiver
is visible as the increase in the number of white pixels (caregiver
fixations) during the first few rows. The subsequent increase in
target fixations (black pixels) is the effect of the emergence of
gaze following. Gaze following episodes are shown by black pixels to
the right of white pixels.4 The increase in the number of such
episodes during learning directly reflects the increasing GFI
(compare Figure 2).
Figure 4 shows that gaze following will still be learned in more
complex environments, where multiple interesting events occur
simultaneously. In this case, the learning is somewhat slower
because the infant may temporarily learn incorrect associations
between a particular caregiver head pose and a gaze shift to a
location not looked at by the caregiver but that nevertheless
contains an interesting event.
4 Note that there can be instances of black pixels to the right
of white pixels that do not correspond to gaze following. This
occurs when the infant looks away from the caregiver to a location
not looked at by the caregiver that happens by chance to hold the
interesting object. These instances are comparatively rare, however.
More precisely, the probability of the infant finding the target
this way is only (1 − pvalid)/(N − 1), where N is the number of
locations in the environment.
Figure 4 Gaze following in the presence of multiple targets for
various values of pvalid. The gaze following performance averaged
over 100 000 steps (y-axis) is plotted as a function of the number
of targets that are present simultaneously (x-axis). Error bars
indicate standard error across 15 simulations. Gaze following is
diminished if significant ambiguities due to multiple targets
exist. Also, a reduced predictiveness of the caregiver pvalid has a
negative impact on gaze following performance. The dashed
horizontal line marks the ‘chance level’ of gaze following expected
for an infant who first looks to the caregiver and then shifts gaze
randomly to any of the N locations.
Figure 3 Microscopic analysis of model behavior for normally
developing (top), autism-like (center) and Williams-like (bottom)
model. Each row of pixels shows the target of the infant’s gaze as
a function of time (for 50 steps). The gaze target is color coded,
with white corresponding to the caregiver, black corresponding to
the target, and grey corresponding to other regions of space. In
particular, an instance of gaze following is represented by a black
pixel lying to the right of a white pixel. Different rows show the
behavior at different times during the learning process (every 4000
steps).
We have also experimented with making Rprofile smaller than
Rfrontal to capture infants’ preference for frontal faces (Sai &
Bushnell, 1998). We found that gaze following performance is largely
determined by Rprofile, with higher Rprofile values leading to
faster learning. The value of Rfrontal plays a comparatively small
role, because the current caregiver model only looks at the infant
infrequently. A systematic analysis of learning speed as a function
of caregiver reward is given below in the context of modeling
developmental disorders.
Analysis of model parameters
Predictiveness of caregiver’s gaze
An important parameter of the model is pvalid (see Figure 4).
Unless pvalid is high enough, gaze following will not emerge. For
pvalid = 0.25, the GFI remains very poor, even when there is only
one interesting target in the environment. There are two
interpretations of this result, corresponding to the two
interpretations of pvalid (see above). First, a highly informative
caregiver, i.e. one who frequently looks at the interesting target,
facilitates the acquisition of gaze following. This confirms the
importance of one component of the Basic Set: a structured social
environment. Second, limitations of the infant’s ability to
discriminate head poses will delay the infant’s acquisition of gaze
following. Currently, little is known about how real infants’
ability to discriminate head poses develops, but such data would be
most useful in constraining the model (see also Lau & Triesch,
2004).
Speed of learning: learning rate and habituation
We hypothesized that the learning rate α and the habituation
rate β might both influence the speed with which gaze following can
be acquired. In the trivial case of α = 0 no learning takes place at
all, and gaze following obviously cannot emerge. However, too high a
learning rate can also cause problems. This is illustrated in Figure
5, top. In general, an intermediate value for the learning rate
seems to be optimal, which is common for reinforcement learning
models.
Figure 5, bottom, shows the effect of the habituation rate β on
the learning process. It shows that an infant that habituates faster
(high β) learns to follow gaze more quickly. By contrast, slow
habituation (low β) will result in less frequent gaze shifts between
objects and therefore in fewer opportunities for the necessary
learning experiences. Interestingly, however, even without any
habituation (β = 0) gaze following is still learned, but very
slowly. In this case, gaze shifts away from the most rewarding
object occur only through the random selection of exploratory
actions. The infant will spend most of its time looking at the
caregiver, which is the optimal thing to do. Due to the random
softmax action selection mechanism, however, which sometimes
explores the consequences of seemingly suboptimal actions, the
infant will look away from the caregiver, which creates an
opportunity to discover the benefit of following gaze. We conclude
that although habituation is not strictly necessary if there are
other mechanisms for exploratory gaze shifting, learning may be
very slow without it. The model thus predicts that infants who
habituate quickly (in the sense of the model) may learn gaze
following faster than their peers. This prediction is consistent
with some evidence that infants who are ‘fast habituators’ at 5
months have better social and communicative skills at 13 months
(Tamis-LeMonda & Bornstein, 1989), although care has to be taken
because our notion of habituation as a decaying reward for a
visual stimulus is not identical to the common behavioral measures
of habituation.

Figure 5 Top: Effect of learning rate on emergence of gaze
following. A higher learning rate α leads to accelerated initial
learning as measured by the gaze following index (GFI). However, a
high learning rate can lead to problems in the long run. The infant
may never acquire a high level of gaze following. Error bars
indicate standard errors across 15 runs. Bottom: Effect of
habituation rate on learning of gaze following. Faster habituation
leads to accelerated learning as measured by the gaze following
index (GFI). Even without any habituation gaze following is still
learned – albeit very slowly. Error bars indicate standard error
across 15 simulations.
In summary, both learning rate and habituation rate influence the
speed of learning and may be related to individual differences in
the emergence of gaze following in real infants. However, they act
on the learning process in different ways. The learning rate α
determines how much an individual learning experience changes the
infant’s future behavior. The habituation rate β determines how
many relevant learning experiences the infant encounters during a
fixed amount of time.
Modeling failures of the emergence of gaze following in autism
and Williams syndrome
Any account of gaze following should answer why gaze following
emerges, and why gaze following may not emerge under certain
circumstances. An important line of research concerns differences in
shared attention skills in developmental disorders such as autism
and Williams syndrome. Autism is a Pervasive Developmental Disorder
characterized by impairment in social interactions and
communication (e.g. Dawson, Meltzoff, Osterling, Rinaldi &
Brown, 2004), as well as atypical cognitive processing. Shared
attention deficits are the most consistent early predictors of the
social and language deficits of autism (Osterling & Dawson,
1994). Thus, a critical test of our model is its capacity to
simulate autistic failure of gaze following.
A more subtle challenge is to test the model’s capacity to
simulate a disorder that is associated with less striking and more
idiosyncratic differences in joint attention. Williams syndrome is a
rare genetic disorder that is characterized by (among other things)
hypersocial behavior, differences in face processing and deficits in
learning and attention. Most importantly for us, there is also some
evidence for deficits in triadic shared attention skills
(Bertrand, Mervis, Rice & Adamson, 1993; Laing, Butterworth,
Ansari, Gsödl, Longhi, Panagiotaki, Paterson & Karmiloff-Smith,
2002; Mervis, Morris, Klein-Tasman, Bertrand, Kwitny, Appelbaum &
Rice, 2003), although more research is needed in this area.
While traditional nativist/modularist accounts typically
propose broken or missing modules as the origin of developmental
disorders (Baron-Cohen, 1995), our account prompts us to look for
potential differences in the components of the Basic Set that may
lead to different developmental trajectories. The goal here is not
to provide a comprehensive model of these developmental disorders,
but to show how specific aspects of these disorders may contribute
to deficits in gaze following.
Changes in the reward structure
In the last section we have already seen how differences in
learning rate or habituation rate can slow down or even prevent the
emergence of gaze following. For autism spectrum disorders and
Williams syndrome, however, a particularly interesting candidate is
the reward structure of the model, because in both kinds of
disorders the affective value of faces may be altered.
An intriguing attribute of autism is disinterest in faces. In
general, the interest in or appeal of social stimuli is diminished
in autism (Adrien, Lenoir, Martineau, Perrot, Hameury, Larmande
& Sauvage, 1993; Chawarska, Klin & Volkmar, 2003; Maestro,
Muratori, Cavallaro, Pei, Stern, Golse & Palacio-Espasa, 2002;
Tantam, Holmes & Cordess, 1993; Klin, Jones, Schultz &
Volkmar, 2003; Dawson, Meltzoff, Osterling, Rinaldi & Brown,
1998). For some (but not all) individuals with autism, direct eye
contact even seems to be aversive, a phenomenon known as gaze
avoidance (Hutt & Ounsted, 1966; Richer & Coss, 1976;
Langdell, 1978). It has been proposed many times that a disruption
in face processing may be an underlying cause for social deficits in
autism (e.g. Trepagnier, 1996; Howard, Cowell, Boucher, Broks,
Mayes, Farrant & Roberts, 2000; Klin et al., 2003). Why faces
are in some ways less salient or rewarding to individuals with
autism is not clear. It may be that faces are too unpredictable for
autistics, an idea consistent with the hypothesis that autistics
prefer highly predictable stimuli (Gergely & Watson, 1999); it
may also be that anatomical differences in the amygdala (which
participates in processing facial affect displays) play a role
(e.g. Howard et al., 2000; Baumann & Kemper, 2005). Regardless
of the cause, this symptom, and its long-term effect on social
learning, bears more precise (ideally quantitative) specification.
In contrast to the disinterest in faces in autism, children with
Williams syndrome show a high preference for looking at faces over
looking at other objects (Bertrand et al., 1993; Bellugi,
Lichtenberger, Jones, Lai & St George, 2000; Mervis et al.,
2003). In addition, altered as well as delayed emergence of face
processing skills has been reported (Karmiloff-Smith, Thomas,
Annaz, Humphreys, Ewing, Brace, Van Duuren, Pike, Grice &
Campbell, 2004).
What would happen in the model if looking at the caregiver was
made aversive, as for an atypical baby who finds faces unpredictable
and overstimulating, or made highly positive, as for a hypersocial
infant with an extreme preference for human faces over other
sights?
To test the effect of different reward structures on the learning
process, we systematically varied the reward parameters Rfrontal,
Rprofile and Rtarget over a range of values. For simplicity we
restricted ourselves to the case where Rprofile = Rfrontal. Figure 6
summarizes the results. For each combination of reward values we ran
the simulation for 10^5 time steps and measured the GFI at the end
of this time. Figure 6 plots the GFI averaged over 10 experiments as
a function of Rfrontal and Rtarget.
For Rtarget ≤ 0 no gaze following behavior emerges. This makes
intuitive sense because if the targets that the caregiver tends to
look at are not rewarding for the infant, there is no benefit in
gaze following behavior. That is, no additional reward can be
obtained by following the caregiver’s gaze. If Rfrontal and
Rprofile are small or even negative, modeling reduced interest in or
aversion to faces as seen in autism, gaze following behavior does
not develop normally. Depending on the caregiver and target reward,
the infant model will spend little time looking at the caregiver.
For example, while the ‘normal’ model with a base reward of 1 for
the caregiver (frontal and profile) and for the target spends 49% of
its time looking at the caregiver and 14% of the time looking at the
target (averaged over the entire learning period), the
‘autistic-like’ model with caregiver reward of −1 will spend only
1% of its time looking at the caregiver and 11% looking at the
target (which it occasionally finds by chance without utilizing the
caregiver’s gaze). As a consequence, the learning process is slowed
down or even prevented, and the GFI stays close to zero. The
microscopic behavior of such a model is shown in Figure 3 (middle).
Thus, a reduced reward for looking at the caregiver’s face or
aversiveness of the caregiver is sufficient to explain delays or
complete failure in the emergence of gaze following.
It is interesting to note that an analysis of the model shows
that even for negative caregiver rewards, the model will
nevertheless slowly learn how to follow gaze, even if it does not
exhibit the behavior on a regular basis. By analyzing the infant’s
action selection probabilities we found that the probability for
following the caregiver’s gaze once the infant is looking at the
caregiver slowly but clearly rises above those for other actions.
However, the model rarely executes a complete gaze following
sequence because it is unrewarding to do so, due to first having to
look at the aversive caregiver. This behavior of the model might
explain a puzzling finding by Leekam, Baron-Cohen, Perret, Milders
and Brown (1997) that autistic children can follow gaze if
explicitly told to do so, though they may rarely do it
spontaneously. This finding is very problematic for previous
accounts of the emergence of gaze following. We know of no theory
that offers a satisfactory explanation for it. Subsequent studies
by Leekam and colleagues (Leekam et al., 1998; Leekam, López &
Moore, 2000) suggest that autistic children can be trained to
follow gaze through contingent presentation of rewarding visual
stimuli (Whalen & Schreibman, 2003), but that a lack of
motivation to engage with the experimenter may impede learning.
These findings are also consistent with our account. The
association from caregiver head pose to regions in space is
learned (although slowly) due to the constant low level of random
exploration, but gaze following is simply not rewarding enough to be
produced on a regular basis. If, however, an additional incentive
for following gaze is present (e.g. being asked to look where
another person is looking, or being trained via operant
conditioning), the behavior can be elicited. Our account is also in
line with the finding that gaze following in response to static
pictures may be ‘easier’, if we make the additional assumption that
static pictures of faces are not as aversive as dynamic displays
(Klin et al., 2003).
It should be noted that an infant who looks less at faces due to
a diminished reward for faces can be expected to develop deficits in
face processing skills such as fine discrimination of head poses
or estimation of the direction of gaze. This will likely exacerbate
delays in the emergence of gaze following. The model could capture
this by making the parameter pvalid a function of the total amount
of time the infant has been looking at the caregiver.
Figure 6 Learning performance as a function of caregiver and
target reward. For the caregiver reward we use Rfrontal = Rprofile
≡ Rcaregiver. The z-axis corresponds to the GFI after 10^5 time
steps of learning, averaged over 10 repetitions of the
experiment.
We also tested what would happen if the reward for looking at the
caregiver is much higher than the reward for looking at the target.
This manipulation may be thought of as an attempt to model
differences in Williams syndrome, where children exhibit an
abnormally high preference for faces. Our experiments with the
model show that in this case, somewhat surprisingly, the learning of
gaze following can be substantially delayed (Figure 6). To give an
example, a ‘Williams-like’ model with a base reward of 5 for looking
at the caregiver and a base reward of 0.5 for looking at the target
will spend 51% of its time looking at the caregiver but only 5%
looking at the target. Thus, little gaze following will be observed,
as illustrated in Figure 3 (bottom). The reason is that because the
caregiver is relatively so rewarding to look at, it makes little
difference to the infant where it looks in between fixations on the
caregiver: the probability of looking at the target is only slightly
higher than the probability of looking at any other region of space
under the model’s probabilistic action selection rule.
Deficits in attention-shifting
Another important aspect of autism spectrum disorders is a
deficit in shifting attention. For example, many studies have shown
that people with autism are slower to shift attention between
targets (e.g. Casey, Gordon, Manheim & Rumsey, 1993;
Wainwright-Sharp & Bryson, 1993; Goldberg, Lasker, Zee, Garth,
Tien & Landa, 2002; Landry & Bryson, 2004). This deficit
might be related to cerebellar abnormalities (Harris, Courchesne,
Townsend, Carper & Lord, 1999). Slow attention shifting can
be incorporated into the model in the following way. Instead of gaze
shifts taking effect immediately, we introduce a latency Tlat of 1
to 3 time steps. After the infant makes a decision to shift gaze, it
has to wait Tlat time steps before the gaze shift takes effect.
Figure 7 shows how this affects the emergence of gaze following. In
these experiments all other parameters were set to their default
values. The error bars indicate standard errors of 15 independent
simulations per condition. As can be seen in the figure, the
additional latency can slow down or even prevent the emergence of
gaze following behavior, because there is a growing probability that
by the time the infant shifts gaze, the rewarding sight has moved to
a different location. This effect is clearly visible in infants with
a normal, positive caregiver reward (Figure 7, top). However, it is
more pronounced for a caregiver reward of zero, i.e. infants who
find their caregivers uninteresting but not aversive (Figure 7,
bottom), and it is even more pronounced for a model with negative
caregiver reward (not shown). These results and the previous ones
show that either different reward structures, or poor
attention-shifting, or both, can explain gaze following
deficits in autism within the proposed model.
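The effect of the latency can be illustrated with a small calculation. This is a sketch under an assumption that is not taken from the model description: the rewarding sight is taken to leave its location independently at each time step with a fixed probability `p_move` (a hypothetical parameter). The probability that the sight is still in place when a gaze shift decided at time t takes effect at time t + Tlat is then (1 - p_move)^Tlat.

```python
def still_there_probability(p_move, t_lat):
    """Probability that the sight has not moved after t_lat time steps.

    Assumption: the sight moves independently at each step with
    probability p_move (hypothetical parameter, for illustration only).
    """
    if not 0.0 <= p_move <= 1.0:
        raise ValueError("p_move must lie in [0, 1]")
    if t_lat < 0:
        raise ValueError("t_lat must be non-negative")
    return (1.0 - p_move) ** t_lat


if __name__ == "__main__":
    p_move = 0.2  # illustrative value only
    for t_lat in range(4):  # Tlat = 0 (no deficit) up to 3, as in the text
        print(t_lat, still_there_probability(p_move, t_lat))
```

For any p_move strictly between 0 and 1 the probability shrinks geometrically with Tlat, which matches the qualitative argument that longer latencies leave fewer rewarded gaze shifts to learn from.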
Regarding Williams syndrome, a noteworthy recent report on the
perception of faces in adults with Williams syndrome finds less
accuracy in determining the direction of gaze, and significantly
longer response latencies during face perception (Mobbs, Garrett,
Menon, Rose, Bellugi & Reiss, 2004). Given our results above, we
can conclude that both of these symptoms, if present in infants,
would exacerbate problems in the emergence of gaze following. Less
accuracy in determining the direction of gaze will lower the
predictiveness of the caregiver (smaller pvalid), while longer
response latencies can be thought of as increasing Tlat. In a
similar vein, recently observed inaccuracies of saccade targeting
and a higher number of corrective saccades in Williams syndrome
(van der Geest, van Haselen, van Hagen, Govaerts, de Coo, de Zeeuw &
Frens, 2004) may also contribute to longer latencies before the
target of a gaze shift is reached, exacerbating difficulties in
learning to follow gaze.
Figure 7 Learning performance for infant models with attention
shifting deficits of varying degree. Top: for normal, positive
caregiver reward. Bottom: for zero caregiver reward. Note the
different scales on the axes. Error bars indicate standard error
across 15 simulations.
Summary
To summarize, simple manipulations to the reward structure and
attention shifting behavior of the model, motivated by findings on
two very different developmental disorders, lead to deficits in the
emergence of shared attention. What is needed for further
constraining the model is more experimental data on how, for
example, the accuracy of infants’ head pose discrimination, or the
preference for viewing frontal vs. profile faces, develops in
normally and atypically developing infants.
Summary of model predictions
Although our model is simple and incorporates only well-known and
accepted infant skills, it makes a number of novel predictions,
summarized below. The list is certainly not exhaustive, since there
are many ways of manipulating the model (we invite readers to
download the software from http://mesa.ucsd.edu and derive new
predictions). Of course, not all predictions of the model will
lend themselves to experimental investigation, and some
manipulations would be unethical to do with real infants. Leaving
these concerns aside, the model makes the following predictions.
1. Fast habituation leads to quicker acquisition of gaze
following. The systematic variation of the habituation
parameter β showed an advantage in learning speed for faster
habituation. Fast habituation in the model leads to more gaze shifts
per time interval on average, which produces more opportunities to
learn the predictive value of the caregiver’s direction of gaze, all
else being equal.
2. Face perception skills should correlate with gaze following
ability. One interpretation of the parameter pvalid was that it
reflected accuracy of head pose estimation in infants. The model
showed that without a sufficiently high pvalid, gaze following will
not emerge.
3. Infants with general learning deficits should also have an
impairment in the acquisition of gaze following. Choosing too small
a learning rate in the model leads to delays in the emergence of
gaze following. Not surprisingly, though, too high a learning rate
was also found to be maladaptive.
4. Infants whose visual preferences do not match their
caregivers’ should have deficits in gaze following. The
model shows that if the reward values associated with the
objects/events that caregivers tend to look at are not higher than
those for random locations, gaze following will not emerge. By the
same token, infants whose caregivers produce few predictive gaze cues
(e.g. due to visual deficits) should also learn gaze following more
slowly.
5. Infants who find faces too attractive should have deficits
in gaze following. Using a caregiver reward much higher than the
target reward leads to deficits in gaze following in the model.
6. Infants who find faces uninteresting or aversive should have
deficits in gaze following. Using small positive or negative rewards
for looking at the caregiver leads to gradual deficits in the
emergence of gaze following. This problem may be exacerbated by a
poor development of face processing skills caused by aversiveness (or
even neutrality) of faces.
7. Infants with deficits in attention-shifting should exhibit
delays in learning gaze following. The model shows that slow
attention-shifting (Tlat > 0) leads to a sluggish emergence of
gaze following behavior.
8. Amount of caregiver contact should influence emergence of
gaze following. An infant who experiences few face-to-face
interactions with caregivers may be slower to acquire gaze following
because of a shortage of relevant learning experiences.
9. Differences in caregiver behavior can aid or hinder the
emergence of gaze following. Varying the model parameters related to
the caregiver behavior (pshift, Tmin), while keeping the parameters
of the infant identical, leads to differences in learning speed. It
is likely that ‘optimal’ caregiver behavior depends on particular
infant parameters. Thus, the optimal caregiver behavior will
generally be different for each infant, especially in the case of
abnormally developing infants. More work is needed to understand
these issues and their potential ramifications for
therapeutic interventions (Teuscher & Triesch, 2004).
10. Lesioning certain neural pathways should impair gaze
following behavior. We assume that information about the
caregiver’s direction of gaze is extracted from face processing
areas including (but not necessarily limited to) the Fusiform Face
Area (Kanwisher, McDermott & Chun, 1997). Control of gaze shifts
is assumed to be mediated through areas such as the Frontal
Eye Fields (Tehovnik, Sommer, Chou, Slocum & Schiller, 2000).
Our temporal difference learning model assumes that pathways
between these sites (direct or indirect) are modified during learning,
and lesioning these pathways may impair gaze following.
Discussion
We have proposed a model of the emergence of gaze following in
situated infant–caregiver interactions. Our account is an
elaboration of ideas that explain the emergence of gaze following
as a learning process driven by hedonic principles (Moore &
Corkum, 1994). Infants are viewed as pleasure-driven agents, who
learn to exploit information about their caregiver’s head movement
and head pose (and, later, eye direction) to find interesting sights
in their environment. More specifically, we have proposed a Basic
Set of structures and mechanisms that allow the infant to succeed in
learning in an appropriately structured environment where the
caregiver tends to look at things that the infant will find
interesting. The proposed Basic Set has a small number of elements
but, as our computer simulations demonstrate, it is sufficient for
gaze following to emerge. In particular, no additional specialized
cognitive modules are necessary to explain the emergence of gaze
following in infant–caregiver interactions. Note that all elements
of our proposed Basic Set are established within days of birth (or,
for attention-shifting, at around 3 months) in typically developing
infants. This does not mean that we think all other mechanisms are
unimportant for a comprehensive account of the emergence of gaze
following. It merely means that other mechanisms are not required
for explaining the basic gaze following phenomenon.
We have used the model to demonstrate how the Basic Set
mechanisms are sufficient to allow an infant to learn to associate a
particular head pose of the caregiver with a gaze shift to a
location outside of the infant’s field of view. This specific
ability emerges rather late in normal development. Earlier signs of
gaze following may be learned in a very similar way, however. The
presence of the Basic Set mechanisms in even very young infants makes
a learning account of any earlier gaze following competence
plausible. For example, in the context of the present model it is
easy to see that, say, gaze following to targets inside the infant’s
field of view may be learned with the same mechanisms, only more
easily and faster/earlier. The only Basic Set element for which
there is no evidence of its presence within days of birth is the
ability to shift gaze away from a central stimulus. Indeed, all
demonstrations of very early ‘gaze following’ have to remove the
face stimulus after the gaze shift to facilitate a gaze shift to the
periphery. Overall, we find it hard to envision an account of the
progressive expansion of gaze following competence in infancy that
is not based on a gradual learning process. Again, as stated in the
introduction, this view does not at all preclude the presence of
evolved rudimentary propensities that contribute to gaze following
in specific situations, but it places a clear emphasis on learning,
especially for the emergence of more advanced gaze following skills.
It has been noted that infants will follow not only the line of
regard of humans, but also that of non-human objects with face-like
features, or objects that behave contingently to them (Johnson,
Slaughter & Carey, 1998). This suggests that infants’ capacity
for joint attention is a generalizable skill that is not tightly
tied to specific situations with specific caregivers. Rather, it is
a robust skill that extends flexibly to various social interactions.
Our model readily accounts for these findings, if the additional
assumption is made that such non-human objects may be
able to activate some of the same head pose and gaze direction
sensitive neurons in the infant’s face processing areas that are
utilized for following the gaze of humans.
Related work
A few related models have recently been proposed in the
literature. The idea of using temporal difference learning to
model the acquisition of gaze following was first mentioned by
Matsuda and Omori (2001). They model a learning situation as used by
Corkum and Moore (1998), where an experimenter monitors the infant’s
behavior and gives visual rewards to the infant when it follows the
caregiver’s gaze. Their paper lacks details, however, and does
not explicitly model how the caregiver’s direction of gaze becomes
associated with certain gaze shifts. We consider explaining this
process to be the central problem of learning gaze following.
A recent model by Nagai, Hosoda, Morita and Asada (2003) has been
implemented in a robot. Their model, which was developed
concurrently with ours, shares a number of aspects of our model
(Fasel et al., 2002; Carlson & Triesch, 2003). In Nagai et al.’s
model the infant also learns to associate head poses of the
caregiver with appropriate gaze shifts based on the success or
failure of finding a visually appealing stimulus. To this end, a
neural network is trained to map the robot’s current gaze direction
and an image of the caregiver’s face onto the desired gaze shift.
Their model, however, does not utilize temporal difference learning,
but rather an ad hoc learning mechanism. Also, no attempts are
made to explain failures of the emergence of gaze following in
either developmental disorders or in other species. On the positive
side, the authors do not make the simplifying assumption that
caregiver head poses have a one-to-one correspondence with regions
in space, which we have used here. Nagai et al. also attempt to
explain the progressive development of gaze following skills as
described by Butterworth and Jarrett (1991). However, a closer look
at their model reveals that the most sophisticated
so-called representational stage cannot be achieved. In contrast,
new models of our group correctly capture the sequential emergence
of all skill levels described by Butterworth and Jarrett (Lau &
Triesch, 2004; Jasso, Triesch & Teuscher, 2005). Interestingly,
these models predict that limitations in head pose discrimination
ability and/or depth perception ability may be the factors
preventing younger infants from learning advanced gaze following
skills (Butterworth’s geometric and representational stages).
Taken together, the current study and our more recent ones
point to the possibility that simple perceptual limitations may
limit the emergence of advanced gaze following skills. We think it
is crucial for the field to carefully study how perceptual skills
(head pose discrimination, gaze direction estimation, depth
perception) and gaze following skills co-develop in the same
individual, in order to test the predicted causal relation between
these factors.
Developmental disorders
Our account of the emergence of gaze following offers new
perspectives on failures of its emergence in developmental
disorders. If a small Basic Set of ‘ingredients’ is demonstrably
sufficient for the emergence of gaze following, then in situations
where the learning process does not succeed, one or several elements
of the Basic Set, or their interaction, has been compromised.
Elaborating on this idea, we showed how changes to the model
motivated by two different developmental disorders (autism and
Williams syndrome) can lead to delays or deficits in learning gaze
following. In particular, our model is consistent with the idea
that in autism an initial reduction in preference for faces might be
at the root of a cascade of problems leading to deficits in gaze
following and attention-sharing. Our account is also consistent with
evidence of the success of therapeutic interventions where
infants are explicitly rewarded for a desired behavior such as
following gaze (Whalen & Schreibman, 2003). Finally, the model
points to the possibility that various combinations of a few small
alterations in the developing infant, none of which may be critical
by itself, could conspire to produce severe deficits. This is
consistent with the characterization of autism as a spectrum
disorder.
While our account of deficits in gaze following in different
developmental disorders may seem simplistic, it nevertheless offers
important lessons. Most prominently, the model shows that very
different causes can lead to deficits in the emergence of gaze
following. These causes include (but are not limited to) parameters
related to face perception, learning, habituation and value/reward
systems. Given that several completely independent causes can all
lead to deficits in gaze following, it appears ill-advised to use
deficits in gaze following to define a
disorder. This is still the case in autism, where deficits in social
interaction skills such as gaze following are used to define the
syndrome. Our hope is that computational modeling efforts like
ours will help in understanding complex developmental disorders by
helping to better differentiate symptoms and narrow down their
primary causes. This, in turn, will suggest promising avenues
for treatment and early diagnosis.
Cross-species differences
A good account of the emergence of gaze following should also
explain differences in the emergence of gaze following behavior, or
the complete absence of it, in other species. Since a simple Basic
Set of structures and mechanisms is sufficient for gaze following to
emerge, any species with the Basic Set should be able to acquire gaze
following to some degree. Deficits or differences in the Basic Set
may limit the emergence of gaze following, as seen in our discussion
of developmental disorders.
Across vertebrate species some Basic Set elements such as
habituation and reward-driven learning are essentially ubiquitous,
suggesting that these are likely not the missing factors. This
inference demands some caution, however, because the presence of,
say, reward-driven learning does not mean that just any contingencies
can be learned. Nevertheless, we feel that differences in
other Basic Set elements are more relevant.
Regarding perceptual skills and preferences, the basic questions
are how strongly infants of other species prefer to look at
conspecifics, and how well they can distinguish different head
or eye orientations. The first question can be studied with
controlled preferential looking paradigms to evaluate visual
preferences for looking at conspecifics (or humans) (e.g. Bard,
Platzman, Lester & Suomi, 1992). Our model predicts that a (not
too large) preference for looking at conspecifics’ faces is
beneficial (although not strictly necessary) for gaze following to
emerge.
In terms of the ability to distinguish different head or eye
poses of conspecifics, there is evidence that, for example, many
primate species can do so to some extent (Itakura, 2004).
Interestingly, eye direction may be particularly easy to discern
for humans because of the white sclera (Kobayashi & Kohshima,
1997; Emery, 2000). We assume that gaze direction (orientation of
the eyes) is more informative than just head pose, but it is also
harder to perceptually discriminate, because the eyes are small.
A first attempt to relate such differences to our model is as
follows. If an animal with a weaker perceptual system can only
inaccurately estimate a conspecific’s head position, then this cue
will be less predictive of
interesting sights compared to accurate knowledge of the
conspecific’s direction of gaze. Thus, as explained above, we can
attempt to model limited perceptual skills by reducing the
predictiveness of the caregiver’s gaze, pvalid. As our experiments
showed, reducing pvalid slows the emergence of gaze following or
prevents it altogether. Thus, some species may not learn to follow
gaze at all or may only learn primitive forms of gaze following
because their perceptual apparatus does not allow them to gather
sufficiently accurate information about conspecifics’ gaze direction
to make gaze following worthwhile. A more detailed analysis
of the perceptual requirements for higher gaze following skills
specifically implicates depth perception abilities and accuracy of
gaze direction estimation as possible culprits (Lau & Triesch,
2004). Generally speaking, we can expect advanced gaze following
skills only in those species that have adequate perceptual
abilities.
A related issue is foveation. The more foveation there is in an
animal’s visual system, the more important it is to look directly at
the most relevant regions of the environment. Gaze following can
help to identify such regions. At the same time, a more foveated
vision system will be better at making fine discriminations, say, of
a conspecific’s direction of gaze, which benefits gaze following.
Thus, we suspect that there may be a correlation between the
degree of foveation of a species’ visual system and its propensity
to follow gaze.
Regarding a structured social environment, a first condition
for the emergence of gaze following is that the species must live in
social groups. Further, the gaze of conspecifics must be
predictive of informative events. Note that gaze can have a number
of other meanings in social species that could potentially impact
gaze following. For instance, gaze aversion is found in several
monkey species (Argyle & Cook, 1976). In such species, direct eye
contact is a gesture of aggression, and it is particularly
important for members of such species to be sensitive to
direct versus averted gaze, as indicated by head and eye direction
(Coss, 1978; Emery, 2000).
Point following
Although we have focused on gaze following in this paper, note
that point following may be learned based on the same principles.
Pointing with an outstretched and aligned arm, hand and finger is
the most natural way to intentionally direct another’s attention to
a new target, and caregivers and (older) infants do produce pointing
gestures to direct each other’s attention (Bates, Camaioni &
Volterra, 1975; Lempers, 1979; Leung & Rheingold, 1981). To
model the emergence of point following, we could simply choose to
identify different caregiver head poses in the current model with
different pointing gestures performed by the caregiver. However,
there are certain differences to consider. First, while the
caregiver frequently shifts gaze, pointing gestures during
naturalistic exchanges are rare by comparison (Deák
et al., 2004). Second, pointing gestures are likely to be
more salient for infants because of the large amount of
movement involved. Third, infants may be better at
discriminating pointing direction than head direction
because the extended arm provides a better directional cue
(Deák et al., 2000). Fourth, pointing gestures are likely to be more
predictive of interesting events, because caregivers will tend to
engage in this ‘effort’ only when a particularly relevant
environmental stimulus is present. All but the first of these four
points suggest that it might be easier for infants to learn point
following. In fact, human infants by 9 months follow gaze much more
reliably when it is accompanied by a point (Flom, Deák, Phill
& Pick, 2003), and a quasi-naturalistic observational study
shows that infants from 5 to 10 months are far more likely to follow
a parent’s point than a parent’s gaze shift (Deák et al., 2004).
Future work
Of course, our model and the ones discussed above must be seen as
only first steps towards a full computational account of the
emergence of gaze following. In many respects, these models are
still overly simplistic. Examples of simplifications in our model
are the restriction to a small set of discrete spatial regions, the
absence of peripheral vision and the stereotypic, non-interactive
behavior of the caregiver model, just to name a few. Recent work has
started to address some of these issues (Lau & Triesch, 2004;
Teuscher & Triesch, 2004; Jasso & Triesch, 2004). Another
limitation is that the model currently does not address how higher
attention sharing skills may emerge. Future work needs to
demonstrate that models such as the present one can be scaled up to
explain the emergence of more advanced attention sharing skills.
Despite these shortcomings and limitations, we think our model is a
useful step in theorizing about the emergence of
gaze following and share