
Dynamics of the Vocal Imitation Process: How a Zebra Finch Learns Its Song

Ofer Tchernichovski,1*† Partha P. Mitra,2* Thierry Lints,1* Fernando Nottebohm1*

Song imitation in birds provides good material for studying the basic biology of vocal learning. Techniques were developed for inducing the rapid onset of song imitation in young zebra finches and for tracking trajectories of vocal change over a 7-week period until a match to a model song was achieved. Exposure to a model song induced the prompt generation of repeated structured sounds (prototypes) followed by a slow transition from repetitive to serial delivery of syllables. Tracking this transition revealed two phenomena: (i) Imitations of dissimilar sounds can emerge from successive renditions of the same prototype, and (ii) developmental trajectories for some sounds followed paths of increasing acoustic mismatch until an abrupt correction occurred by period doubling. These dynamics are likely to reflect underlying neural and articulatory constraints on the production and imitation of sounds.

Vocal imitation is guided by auditory information, requires intact hearing, and is very sensitive to the age or reproductive condition of the individual (1, 2). The brain circuits that govern this skill in songbirds have been described (3). We here report on conditions that bring vocal learning under fine experimental control and provide a detailed acoustic analysis of the sound transformations that underlie the learning process.

Zebra finch (Taeniopygia guttata) males develop their song between 35 and 90 days after hatching, a time known as the sensitive period for vocal learning (4). This song consists of complex sounds (“syllables”) separated by silent intervals (5). A song motif is composed of dissimilar syllables repeated in a fixed order (5). When a young male zebra finch is reared singly in the company of an adult male, it develops a song that is a close copy of the sounds and temporal order of that male’s song (4, 6). Acquisition of the auditory memory of the model song can start as early as 25 days after hatching, but this onset can be delayed by withholding exposure to the model (7, 8). Once acquired, a stored representation of the model song can be converted to a motor imitation. This conversion has been modeled by assuming simple Hebbian and reinforcement learning rules (9). Nevertheless, past technical limitations encountered when studying early song development have left much of the fine-grained structure of the imitation process unexplored.

In many songbirds, as in humans, first acquisition of auditory memories of learned sounds occurs before the onset of vocal learning (10). Under such conditions, it can be difficult to distinguish between the learned and the innate component of the developing sounds. In the zebra finch, however, the sensory phase of model acquisition overlaps with the period of motor development of learned vocalization (5). We took advantage of this overlap to delay model acquisition so as to obtain a baseline of “untutored” song during the early subsong stage, and we then examined the effect of exposure to a model song during the remainder of the sensitive period for vocal learning. Untutored subsong was recorded, and then birds were trained, starting on day 43 after hatching, to peck at a key to trigger a short song playback from a small speaker housed within a plastic male model (11). Training persisted until the end of the experiment. To enhance the identification of time-frequency structure in subsong, which typically consists of poorly structured sounds, we used multitaper spectral analysis techniques and estimated spectral derivatives that act like “edge detectors” in the time-frequency plane of the spectrogram (12) (Fig. 1). Figure 1B presents spectral derivatives of the emerging song of a bird just before training (on the morning of training day 1), and Fig. 1, E and F, shows the spectral derivatives on training days 2 and 3. As shown, song had changed remarkably by 2 days after the onset of training: Sounds became more structured and appeared in a more predictable temporal order. On training day 3, some sounds were already similar to those of the model song.

1Field Research Center, The Rockefeller University, Millbrook, NY 12545, USA. 2Bell Laboratories, Lucent Technologies, Murray Hill, NJ 07974, USA.

*All authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected]

Fig. 1. An example of training. (A) Acclimation to the training apparatus from days 30 to 42 after hatching, in the presence of a plastic model of an adult male (on middle perch). (B) Untutored subsongs were recorded on day 43. Spectral derivatives provide a representation of song that is similar but superior to the traditional sound spectrogram. Instead of power spectrum versus time, we present directional derivatives (changes of power) on a gray scale so that the detection of frequency contours is locally optimized. This was particularly useful for the analysis of juvenile song. (C) The keys were then uncovered. The bird learned to peck on either one of the keys to induce a short song playback from the plastic model. (D) Song playback was composed of two renditions of the model song depicted. The overall daily exposure was limited to 28 s. As shown, the bird’s song had changed by (E) the second and (F) the third day of training.

RESEARCH ARTICLES

www.sciencemag.org SCIENCE VOL 0 ● MONTH 2001

Indirect Imitation Trajectories

Viewed most simply, an imitation trajectory could be represented by a path leading directly from the acoustic features of sounds produced before exposure, to those of a target sound present in the model song (13). Alternatively, an imitation trajectory might deviate from a direct path to negotiate constraints imposed, for example, by propensities of brain function and/or the physics of sound generation by the vocal organ. For example, in the bird’s vocal organ, as in some musical instruments, pitch might become unstable across certain ranges of airflow (14). An indirect trajectory of sound changes could either avoid those ranges or take advantage of them. An example of an automatically traced (15) imitation trajectory of a simple harmonic stack (16) is presented in Fig. 2A. A raw imitation of the model’s harmonic stack was apparent on training day 5, although the pitch at that time was slightly higher than the model’s. We measured the pitch of harmonic stacks produced by this bird every day until the pitch matched that of the model. As shown, the pitch error between the developing sound and the model sound increased slowly and consistently from training days 5 to 13, and then when the pitch reached the frequency of the model’s first harmonic, it was corrected by an abrupt period doubling. Assessed on the basis of pitch alone, model approximation in this case was indirect, but in other cases, it was direct (Fig. 2B). A period-doubling trajectory makes sense only if the initial pitch is higher than that of the model (17). To test for a possible effect of initial pitch on the trajectory taken, we examined a sample of 10 period-doubling trajectories and found that the initial pitch ranged between 61 and 582 Hz above that of the target. Tracking 10 non–period-doubling trajectories gave a significantly lower initial pitch (range, 156 Hz below the target to 132 Hz above it; t test, P < 0.01). We conclude that the initial pitch predicts most, though not all, period-doubling trajectories.
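The arithmetic behind an indirect trajectory can be sketched in a few lines. This is only an illustration under stated assumptions: the target pitch and the daily pitch values are invented, the "first harmonic" is taken to mean twice the target pitch, and the helper names `pitch_error` and `period_doubling` are hypothetical rather than part of the authors' analysis.

```python
def pitch_error(pitch_hz, target_hz):
    """Absolute pitch mismatch between a developing sound and the model."""
    return abs(pitch_hz - target_hz)


def period_doubling(pitch_hz):
    """Doubling the period of an oscillation halves its frequency."""
    return pitch_hz / 2.0


# Hypothetical numbers, chosen only to illustrate the shape of the trajectory.
target = 600.0
daily_pitch = [700.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0]

# The error grows day by day as the pitch climbs away from the target...
errors = [pitch_error(p, target) for p in daily_pitch]

# ...until the pitch reaches twice the target, where one period doubling
# removes the whole error in a single step.
final_error = pitch_error(period_doubling(daily_pitch[-1]), target)

# Starting below the target, halving the pitch only makes the mismatch worse.
error_before = pitch_error(500.0, target)
error_after = pitch_error(period_doubling(500.0), target)
```

In this toy case the error rises monotonically and then drops to zero after the halving step, whereas a sound that starts below the target gains nothing from period doubling, in line with the role of initial pitch described above.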

The above findings do not contradict the model-approximation theory, which does not specify how the approximation is achieved. For example, the “indirect” trajectory may be operationally short if it is easier to increase pitch and then take advantage of the nonlinear dynamics of sound production by the syrinx to reduce pitch by half (14). Our results show, however, that the imitation trajectory of even a simple sound cannot be explained by just invoking a gradual reduction of acoustic error. We now describe the song imitation process more generally.

The Early Generative Phase

The song development process is complex, and to make progress in quantification, we reduced the song to a set of four features: Wiener entropy, spectral continuity, pitch, and frequency modulation (12, 18) (Fig. 3A). On any 1 day, we characterized the song by the distribution of these features, computed on a frame-by-frame basis over a 10-s time period (12). This obviated the need for partitioning or classifying sounds (12, 15). Thus, in this section, study of song development is reduced to studying the development of feature distributions. We studied the changes in both the mean values and the SDs of the features. In addition, we examined how the feature distribution approached that of the model using the Kolmogorov-Smirnov (KS) statistic (the changes in mean, SD, and KS statistic do not necessarily mirror each other).
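Of the four features, Wiener entropy is the easiest to illustrate. The sketch below uses the standard definition, the logarithm of the ratio of the geometric mean to the arithmetic mean of a power spectrum; it is an assumption that this matches the computation described in (12, 18), and the function name `wiener_entropy` and the toy spectra are hypothetical.

```python
import math


def wiener_entropy(power):
    """Log ratio of geometric to arithmetic mean of a power spectrum.

    Close to 0 for a flat (white-noise-like) spectrum and strongly
    negative when the power is concentrated in a few frequency bins.
    """
    eps = 1e-12  # guards against log(0) for empty bins
    n = len(power)
    log_geometric_mean = sum(math.log(p + eps) for p in power) / n
    arithmetic_mean = sum(power) / n
    return log_geometric_mean - math.log(arithmetic_mean + eps)


# Toy spectra for a single analysis frame (64 frequency bins each).
white_noise = [1.0] * 64            # equal power in every bin
pure_tone = [1e-6] * 64             # nearly all power in one bin
pure_tone[10] = 1.0

entropy_noise = wiener_entropy(white_noise)
entropy_tone = wiener_entropy(pure_tone)
```

Applied frame by frame, a measure of this kind yields one value per frame, and the distribution of those values over a song sample is the sort of feature distribution discussed in this section.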

We used the SDs of features to define a measure called “feature diversity” (19), which is an estimate of the range of different sounds produced by a bird during a specific stage of learning. As shown in Fig. 3B, training birds with a model song induced an abrupt increase of song feature diversity by the second day of training (paired t test, n = 42, P < 0.0001; in 29 out of 42 birds trained, feature diversity increased above the upper 0.05% confidence interval, and in 2 birds, it decreased below the lower 0.05% confidence interval). The changes in the mean feature values for Wiener entropy and spectral continuity are shown in Fig. 3C. On average, the sounds produced had higher spectral continuity and lower Wiener entropy, signifying higher temporal stability and higher tonality respectively; that is, sounds became more structured. Moreover, not only were pitch, spectral continuity, and Wiener entropy values on the second day of training significantly different from those on day 1 (paired t test, n = 42, P < 0.01), they also differed from those of age-paired males kept under similar conditions but without exposure to a model song (t test, n1 = 42, n2 = 12, P < 0.05). We do not have statistical data on moment-to-moment changes across birds, but in one bird that was recorded continuously during the second day of training, we observed abrupt acoustic changes over a period of 3 hours (Fig. 3D). The changes of feature diversity for this bird from day 1 to day 2 were within the high end of the typical range (percentile = 26).

Fig. 2. Indirect and direct approaches to the imitation of harmonic stacks. (A) Spectral derivatives of a developing harmonic stack in reference to a syllable from a model song (left). The pitch of the harmonic stack is given at the top of each panel (16). A quantitative examination of the pitch error between this developing harmonic and the model harmonic stack shows a gradual increase of error, followed by an abrupt period doubling that reduced the error in a single step (right). The graph presents the mean pitch values of harmonic stacks produced by this bird across 30-s samples of subsong recorded on each training day. (B) An example of harmonic stack imitation where pitch error gradually decreased until a match to the model syllable was reached.

It could be that the transitions that occurred during the second day of training resulted from excitement caused by song stimulation, rather than from learning. To test for a model-specific effect, we trained 10 birds with a high-pitch (up to 6086 Hz) model song and 10 birds with a low-pitch (up to 2567 Hz) model song. As shown in Fig. 3E, the high-pitch model induced the generation of higher pitch sounds than the low-pitch model as early as training day 2 (t test, P < 0.05), indicating that the early training-induced changes in acoustic structure were, at least partly, due to vocal learning. The first and largest changes in Fig. 3, B through D, occurred during the second day of training, after a night’s sleep (20). Thereafter, feature diversity increased moderately but significantly (correlation coefficient r = 0.65, P < 0.05). We conclude that training birds with a model song induced the production of structured sounds. This effect is in line with a generative process (Fig. 3F), where structure emerges concurrently with a steep increase in the diversity of sounds produced. We refer to this period of song learning as the early generative phase.

Although the untutored subsong of any one individual exhibited low feature diversity, the feature distribution across individuals spanned a wide range of feature values [e.g., the subsong of five randomly selected untutored birds had a low mean feature diversity of 0.87; when these samples were pooled, they gave a high feature diversity of 0.97 (compare to Fig. 3B)]. We therefore wondered whether birds that (by chance) started with song features similar to those of the model imitated better than other birds. This was not so: There was no correlation between the KS statistic before training and the KS statistic when imitation was complete (n = 42, r² = 0.06). In other words, similarity to the song model before training did not predict the quality of final imitation.

Fig. 3. The effect of training on song features. (A) The four song features: Wiener entropy (a measure of “tonality” from pure tone to white noise), spectral continuity (a measure of the continuity of frequency contours), pitch, and frequency modulation (FM) (the change of pitch over time, e.g., frequency downsweep). (B through D) The effect of training with a model song across birds (n = 42); error bars represent SE uniformly throughout the panels. There are significant changes of feature diversity (B) and in Wiener entropy and spectral continuity (C) during the second day of training. An example of moment-to-moment changes in song features in an individual bird is shown in (D). (E) Exposure to a high-pitch model song versus exposure to a low-pitch model song. The high-pitch model induced higher pitch sounds in the bird’s song starting on training day 2. (F) Processes of selection and generation can lead to similar outcomes. To distinguish between them, one needs longitudinal data. In the case of selection, feature diversity (represented here by the range of colors) decreases as imitation proceeds, whereas in the case of generation, feature diversity increases [see (B)]. (G) We used the KS statistic to trace the approximation of the developing song to the model song in terms of distances between the distributions of features. The KS statistic is 0.0 when a perfect match is achieved (e.g., when a song is compared to itself).


Although the feature distributions showed marked changes during the first days of training, they nevertheless did not move substantially closer to the corresponding distributions of the model song (Fig. 3G). The following example clarifies this: The mean pitch of the subsong shown in Fig. 1B was 952 Hz, whereas the mean pitch of the model (Fig. 1D) was 1265 Hz. On training day 2, however, the mean pitch of the bird’s song was 1933 Hz. The mean pitch on day 2, though higher than its initial value, was less similar to the model because high-pitch sounds were disproportionately more frequent than in the model song. The slow changes in the KS statistic indicate that although the generative phase is intense, achieving accurate similarity to the entire model song is a slow process, to which we now turn.
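The point that a feature distribution can change markedly without approaching the model can be illustrated with a two-sample KS statistic, the largest vertical distance between two empirical cumulative distributions. The sketch below is a generic implementation, not the authors' code; the function name `ks_statistic` and the pitch samples are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)  # fraction of a <= value
        cdf_b = bisect_right(b, value) / len(b)  # fraction of b <= value
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical frame-by-frame pitch values in Hz (not data from the study).
model = [1000, 1100, 1200, 1300]
before_training = [800, 900, 1000, 1100]   # mean below the model
after_two_days = [1200, 1300, 1400, 1500]  # mean above the model

ks_self = ks_statistic(model, model)
ks_before = ks_statistic(before_training, model)
ks_after = ks_statistic(after_two_days, model)
```

With these invented samples the mean pitch moves from below the model to above it, yet the KS statistic is 0.5 both before and after, while a song compared to itself gives 0.0. A change in the mean, then, need not bring the distribution any closer to the model.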

Transition from Repetitive to Serial Production

Immelmann and others (4, 5, 21) found that syllables imitated from a model song appeared in subsong before the appropriate serial order of syllables was apparent. We observed that young zebra finches tend to produce back-to-back repetitions of similar sounds. As imitation proceeded, the birds produced a greater diversity of syllables delivered in a serial order, as in the model song. These events are reminiscent of the emergence in human infants of reduplicated (canonical) babbling (22–24) and the transition to variegated babbling (10, 25). It is therefore of interest to measure this transition and to examine how it might be affected by training the bird with a specific model song.

To measure the transition from repetitive to serial delivery, we estimated the median duration between two repetitions of sound, termed “period” (26). As shown in Fig. 4, this period increased during training from ~300 to 525 ms. We then partitioned the 10-s sample of song into syllables (27). As shown in Fig. 4, the mean number of syllables per period increased as well, from 1.5 to 3 syllables per period. Nevertheless, syllable duration did not change, indicating that the increase of period was not due to an increase in syllable duration, nor to the assembly of short syllables into long syllables. Rather, the increase in period reflected the emergence of longer sequences of different syllables. Despite the marked increase in song structure on training day 2, there was only a moderate (nonsignificant) increase in the number of syllables per period during the first week of training. This result is in line with our observation that the increase in song structure was initially confined to one or two types of syllables that the bird sang repeatedly. No significant increase in period was observed in control birds that were kept under similar conditions but that were not trained (28). Moreover, the song of some of our control birds (isolates) consisted of abnormal back-to-back repetitions of the same syllable type (29). We conclude that although sequential patterns of structured sounds appeared in some socially isolated birds, a transition from repetitive to sequential sound production was encouraged by exposure to a model song.
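One way to make the notions of period and syllables per period concrete is sketched below. It assumes, purely for illustration, that syllables have already been segmented and labeled by type; the procedures in (26, 27) may differ, and the function `repetition_stats` and the toy sequences are hypothetical.

```python
from statistics import mean, median


def repetition_stats(syllables):
    """Return (period_ms, syllables_per_period) for a labeled sequence.

    `syllables` is a list of (onset_ms, label) pairs sorted by onset.
    The period is the median time between successive onsets of the same
    label; syllables per period is the mean number of syllable onsets
    spanned by each such repetition. Raises statistics.StatisticsError
    if no label ever repeats.
    """
    last_seen = {}  # label -> (index, onset) of its previous occurrence
    intervals, counts = [], []
    for index, (onset, label) in enumerate(syllables):
        if label in last_seen:
            previous_index, previous_onset = last_seen[label]
            intervals.append(onset - previous_onset)
            counts.append(index - previous_index)
        last_seen[label] = (index, onset)
    return median(intervals), mean(counts)


# Repetitive delivery: the same sound back to back.
repetitive = [(0, "a"), (150, "a"), (300, "a"), (450, "a")]
# Serial delivery: three different syllables repeated in a fixed order.
serial = [(0, "a"), (150, "b"), (300, "c"),
          (450, "a"), (600, "b"), (750, "c")]

period_rep, per_period_rep = repetition_stats(repetitive)
period_ser, per_period_ser = repetition_stats(serial)
```

In this toy case the syllable spacing is identical in both sequences, but the period triples and the number of syllables per period rises from 1 to 3 once the bird delivers different syllables in series, which mirrors the distinction drawn above between longer periods and longer syllables.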

Sound Differentiation in Situ

We now turn to questions of how local transitions of acoustic features are linked to changes in the temporal organization of sounds and whether sounds of different types, such as harmonic stacks, high-pitch notes, and frequency downsweeps, emerge from primitive versions of each type. To assess syllable origins, we used an automated procedure to trace the development of specific imitations backward in time (15, 18) to reconstruct a likely trajectory of sound alterations that could explain the final outcome. Imitation trajectories of four birds are shown in Fig. 5. In bird A, we traced the imitation of two different syllables of a model song: The first syllable includes a high-pitch note, a vibrato, and a slightly modulated harmonic sound. The second syllable, similar to a “male long call” (5), starts with a harmonic downsweep that is followed by a nonmodulated harmonic stack. As shown, the imitation trajectories of these two syllables originate from repetitions of the same early “prototype” sound. The prototype sound was, in this case, more similar to the first syllable and was transformed markedly to give rise to the second one. Transformations occurred, apparently, while the relative positions of the sounds involved remained unchanged. We call this effect “sound differentiation in situ,” to convey the notion that antecedents of the sounds of the adult song differentiated in their final temporal relation, with no translocations. Visual inspection of the sound spectrograms supported this interpretation.

In bird B (Fig. 5), we present an imitation trajectory of a different song syllable. The first part of this syllable consists of a nonmodulated harmonic stack; the second part consists of a sharp high-pitch sweep and a broadband sound. There is a short high-pitch note present in the bird’s song on training day 35. There was no similar note in the model song, but the origin of the high note becomes clear when tracking it back in time. Although the first and second parts of the imitated syllable were quite different, they emerged by transforming two back-to-back renditions of the same prototype sound. Once again, sounds were differentiated in situ. Because the prototype was similar to the second part of the model syllable, developing the harmonic stack in the first part required major transitions. This result is counterintuitive, as untutored harmonic sounds could have provided excellent raw material to develop the imitation of the harmonic stack. This latter strategy appears to have been the one first adopted by bird D (Fig. 5) during training day 5. However, the harmonic stack present in that bird’s song on day 5 was not generated in situ and was eventually abandoned. On training day 21, bird D started to generate another (rather inaccurate) version of the harmonic stack, but this time in situ, and it persisted into adulthood. This observation suggests that the laborious in situ differentiation of the harmonic sound was necessitated by constraints that hinder sound translocation and did not arise from a lack of “appropriate raw material” for generating this sound.

We examined in 10 birds the development of the syllable shown in Fig. 5, B through D. In all cases the harmonic stack developed in situ and the high-pitch sweep developed before the harmonic stack started to emerge. Therefore, the three examples (Fig. 5, B through D) are representative of how this syllable developed. We infer that some aspects of an imitation trajectory are stereotyped across birds. The in situ differentiation of two back-to-back renditions of the same prototype sound provides a mechanism of transition from the primitive, repetitive state to a mature state where sequences of dissimilar sounds are produced in a fixed order. During this transition, ensembles of sounds, either within or across syllables, differentiate in fixed sequential relations and in a chronological order that is idiosyncratic to that ensemble. We do not know if all types of sounds found in zebra finch song differentiate in situ, but clearly this is a common mechanism. In the current data set, we found at least one instance of in situ differentiation in 9 out of 10 birds (two groups of 5, each group tutored with a different song composed of three or four syllables). In five of these birds, there were two instances of in situ differentiation. In addition, our observations suggest that syllables of different types need not emerge from primitive versions of each of these types (Fig. 5E).

Fig. 4. The transition from repetitive to serial production of sounds. The period of sound repetition (blue) increased during song development. The syllable duration (green) did not change consistently, but the number of syllables per period (red) increased throughout song development. Error bars indicate SE.

A Revised View of Vocal Imitation

Zebra finches can imitate the song syllables that they first hear during the sensitive period for vocal learning (4, 5). We looked at how imitation proceeded when a model song was first presented at 43 days of age, when juvenile subsong is already in place. The phenomenology we describe might have been different had model songs been presented earlier or later. This caveat aside, it is clear that the trajectories for vocal imitation we encountered were neither arbitrary nor straightforwardly determined by acoustic differences between the model syllables and the subsong of a juvenile before training.

Fig. 5. Sound differentiation in situ. Imitation of specific sounds was tracked back from parts of the mature version of a bird’s song (recorded on day 90 after hatching) until the automated procedure could no longer find a suitable match among earlier sounds. The automated tracking of a bird’s imitation trajectory backward through time proceeds from the top to the bottom of the figure, and only samples of this tracking are shown here; the emergence of the bird’s song is followed from bottom to top. The training day is indicated on each spectrograph. (A) Bird A generated two different syllables from successive renditions of a common prototype. (B through D) Three birds, respectively, trained with the same model song, generated two parts of a syllable from successive renditions of the same prototype. The imitation trajectories across birds are similar. The red arrow (bird B) indicates a remnant of the prototype that generated this sound. The blue arrow (bird D) indicates harmonic stacks that emerged during early stages of training that were eventually omitted. These harmonic stacks were not identified by the automated procedure, but by visual inspection. (E) Two models of song imitation from prototypes. For birds A through D, sounds of different types emerged from a common prototype as schematized in the right panel.

The song imitation process may face central and peripheral constraints. Constraints may emerge, for example, from the nonlinear peripheral dynamics of sound production, placing demands on subtle control of the vocal apparatus if a complex sound is to be imitated. The period-doubling trajectories that we observed represent examples of how the song imitation process may take into account such nontrivial peripheral dynamics. Constraints shaping the imitation trajectory may also emerge from the need to integrate successive vocal gestures delivered with only brief or no intervening silent gaps, a contextual effect that perhaps favors in situ differentiation of sounds. In addition, central constraints may arise from the way in which forebrain song nuclei modulate the activity of brainstem circuits that first evolved to produce simple, unlearned sounds or rhythmic behaviors.

We found that different final sounds in the song do not necessarily emerge from a primitive version of each of those types but may be generated from the same prototype. Much experimental work will be necessary to identify the variables that shape these imitation trajectories. In some instances, these trajectories can be indirect, leading us to propose that, in such cases, direct imitation trajectories might be costly in terms of the overall control effort required to arrive at the final sound ensemble. A systematic description of the diversity of prototype sounds and of the operations that birds perform to achieve a wide range of mature syllables may help clarify the contribution of innate and external factors shaping the imitation trajectory.

Although vocal learning remains a highly complex phenomenon, the tools that we used simplify the objective study of its dynamics. A fully automated recording system is now available (30) that can capture the entirety of vocal ontogeny and analyze changes in real time, paving the way for identifying the molecular, cellular, and circuit events that must underlie the moment-to-moment progression toward vocal imitation.

References and Notes

1. W. H. Thorpe, Ibis 100, 535 (1958).

2. M. Konishi, Z. Tierpsychol. 22, 770 (1965).

3. F. Nottebohm, in The Design of Animal Communication, M. Hauser, M. Konishi, Eds. (MIT Press, Cambridge, MA, 1999), pp. 63–110.

4. K. Immelmann, in Bird Vocalization, R. Hinde, Ed. (Cambridge Univ. Press, Cambridge, 1969), pp. 61–74.

5. R. E. Zann, The Zebra Finch: Synthesis of Field and Laboratory Studies (Oxford Univ. Press, Oxford, 1996).

6. O. Tchernichovski, F. Nottebohm, Proc. Natl. Acad. Sci. U.S.A. 95, 8951 (1998).

7. L. A. Eales, Anim. Behav. 33, 1293 (1985).

8. A. E. Jones, C. ten Cate, P. J. B. Slater, J. Comp. Psychol. 110, 354 (1996).

9. T. W. Troyer, A. J. Doupe, J. Neurophysiol. 84, 1204 (2000).

10. A. J. Doupe, P. K. Kuhl, Annu. Rev. Neurosci. 22, 567 (1999).

11. Young males were raised by their mothers (no adult male present) until they were 30 days old. Each juvenile was then placed singly in a soundproof chamber that contained a plastic model of an adult zebra finch male (Fig. 1A). At 35 to 40 days after hatching, birds started to produce soft subsong, recorded for each bird on days 42 and 43. Training started on day 43. Within 36 hours, most birds (42 out of 50) began to peck at either one of two keys (31, 32); the 8 birds that failed to peck by that time were not included in the data. We provided two keys to encourage pecking activity (e.g., to overcome preferences to one side of the cage). Pecking either one of the keys induced the playback of the same short (1.4 s) model song from a tiny speaker placed inside the plastic bird (32). Each playback consisted of two identical repetitions of a single song motif recorded from an adult bird. Training day 1 was taken as the first day the bird activated the song control keys. Each day consisted of two training sessions (32). In brief, during each session, we reinforced the first 10 key pecks with a song playback. Additional key pecks were allowed but were not reinforced, so that the overall daily quota of model song that a bird could trigger was, at most, 28 s. We trained a total of 42 birds with one of four different model songs, and 12 additional birds were kept in the training apparatus as controls and not trained. A few minutes of song were recorded digitally (16 bits, 44.1 kHz) from each bird at least once a day during days 42 to 52 after hatching and at least once a week thereafter, until day 90 (by which time they produced stable song typical of adults).

12. O. Tchernichovski, F. Nottebohm, C. E. Ho, B. Pesaran, P. P. Mitra, Anim. Behav. 59, 1167 (2000).

13. C. W. Clark, P. Marler, K. Beeman, Ethology 76, 101 (1987).

14. M. S. Fee, B. Shraiman, B. Pesaran, P. P. Mitra, Nature 395, 67 (1998).

15. Similarity measurements were performed as described in (12) and (18). Briefly, two sounds were considered similar if feature analysis for the two sounds yielded at least 90% similarity. In tracing imitation trajectories, we took a conservative approach and also verified the trajectories a posteriori by visual inspection. For example, a syllable from the mature song was compared to a random sample of 10 s of song recorded on an earlier day (e.g., day 83). If the similarity criterion was met and an earlier version of the sound was identified, then this earlier version was used as a point of entry for the next comparison with a still earlier sample of song, and so on recursively. This approach circumvents the need to partition juvenile sounds, as the similarity section automatically provides boundaries. Although the procedure could trace similarity of one sound to two or more earlier fragments (e.g., to test for sound translocation), robust implementation of this capability would require continuous recording, which was not done in the present study. To ensure that imitation trajectories were reproducible, we repeated the procedure using different 10-s samples of sound from each day that song was recorded. We confirmed that an imitation trajectory was reproducible by calculating the similarity across the two independently identified antecedents. We maintained confidence in the trajectory as long as the significant similarity between antecedent versions on a same day was above 80% [(18), pp. 43–45].

16. Harmonic stacks are composed of nonmodulated harmonic frequencies, where the fundamental frequency defines the pitch. Harmonic stacks were traced automatically as described (12, 15, 18). Even though period doubling causes a discontinuous change of pitch, tracing is still possible with the other features. Pitch was measured in the middle of the traced sound.

17. When the initial pitch (at the beginning of the trajectory) was lower than the target, reaching the first harmonic would require that the fundamental be crossed first, matching the pitch to the model sound before period doubling could occur, which then would seem to make period doubling unnecessary.

18. O. Tchernichovski, P. P. Mitra, Sound Analysis User Manual, version 1.01 (ed. 2, 2000) (available at http://talkbank.org/animal/sa.pdf).

19. The feature diversity $d(t)$ is a scaled average (across birds and features) of the standard deviations $s_a(t, b)$ of song features $a = 1, \ldots, 4$, computed for a given bird $b$ and for a given time epoch $t$ ($t$ denotes the day of the 10-s sample) with the samples of the features involved during that 10-s epoch. The SDs for different features have different scales; this is taken care of by defining a scale for each feature by pooling all observations for that feature across birds $b$ and measurement windows $t$

$$\bar{s}_a = \frac{1}{T N_b} \sum_{t} \sum_{b} s_a(t, b)$$

where $T$ is the total number of epochs and $N_b$ is the number of birds. For each training day we compute

$$d(t) = \frac{1}{4 N_b} \sum_{a=1}^{4} \sum_{b=1}^{N_b} \frac{s_a(t, b)}{\bar{s}_a}$$

The SE of the feature diversity is computed for a given epoch by treating the birds as different statistical samples. The error bars on Fig. 3C are SEs of the mean feature values. To obtain these, we computed the mean feature values for individual birds for a given epoch; the grand mean of these individual means is then plotted. The SEs are computed as before by treating the birds as different statistical samples. This measure should not be confused with the feature diversity, which denotes the spread of feature values within individual birds. The two are not in principle related.

20. A. S. Dave, A. C. Yu, D. Margoliash, Science 282, 2250 (1998).

21. H. Hultsch, D. Todt, Anim. Behav. 44, 590 (1992).

22. R. E. Stark, in Child Phonology, G. H. Yeni-Komshian, J. F. Kavanagh, C. A. Ferguson, Eds. (Academic Press, New York, 1980), pp. 73–92.

23. D. K. Oller, in Child Phonology, G. H. Yeni-Komshian, J. F. Kavanagh, C. A. Ferguson, Eds. (Academic Press, New York, 1980), pp. 93–112.

24. M. M. Vihman, in Developmental Neurocognition: Speech and Face Processing in the First Year of Life, B. de Boysson-Bardies et al., Eds. (Kluwer, Dordrecht, Netherlands, 1993), pp. 411–419.

25. L. Roug, I. Landberg, J. Lundberg, J. Child Lang. 16, 19 (1989).

26. Period was calculated by implementing the similarity measurement (15) as follows. Starting from a randomly selected frame within a 10-s epoch, the time frame is first moved forward from the point of origin until it encounters a frame that is less than 75% similar to the sound at the selected frame, so as to depart from the original sound. The time window is moved further forward, until it encounters a sound that is at least 90% similar to the original frame. This gives an instantiation of the time elapsed between two similar sounds. This procedure is repeated 50 times, each time starting from a different randomly selected frame within the 10-s epoch, and the median of the durations thus identified is a measure of periodic structure in the song. We call this the period.

27. To characterize the overall temporal structure of the juvenile song with reproducible statistical results, we partition sounds into syllables (continuous sounds) based on “silent intervals,” where no frequency contours are detected (12). Although the statistical estimate of the number of syllables is reproducible, the units themselves are not similarly robust in the juvenile song and would not be suitable for automated categorization.

28. To test for a possible effect of training on the transition from repetitive to serial delivery of syllables, we raised 12 control birds without exposure to the model song under housing conditions otherwise identical to those encountered by the birds that were trained. We found that the period of the mature song in the control birds was significantly shorter than the final period in the trained birds (366 ± 66 ms versus 524 ± 26 ms, P < 0.01) and was within the range of the period soon after the onset of training.

29. P. Price, J. Comp. Physiol. Psychol. 93, 260 (1979).

30. Our software for automated recording, song recognition, and real-time analysis of the imitation process, as well as the raw sound data, are available (at no charge) at http://talkbank.org/animal.

31. P. Adret, Anim. Behav. 46, 149 (1993).

32. O. Tchernichovski, T. Lints, P. P. Mitra, F. Nottebohm, Proc. Natl. Acad. Sci. U.S.A. 96, 12901 (1999).

33. We thank M. Konishi, M. Schmidt, P. Marler, and the reviewers for their useful comments and the staff of The Rockefeller University Field Research Center for their technical support. Supported by U.S. Public Health Service (PHS) grant DC04722-01 to O.T., PHS grant MH18343 to F.N., the Mary Flagler Cary Charitable Trust, the Herbert and Nell Singer Foundation, and the Phipps Family Foundation.
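The feature diversity defined in note 19 can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the nested-list layout `sd[t][b][a]` (epoch, bird, feature) and the function names `feature_scales` and `feature_diversity` are hypothetical, and every epoch is assumed to contain the same set of birds.

```python
from statistics import mean


def feature_scales(sd):
    """Pooled scale per feature: the mean of the SDs over all epochs and birds.

    sd[t][b][a] is the standard deviation of feature a for bird b in epoch t
    (this data layout is an assumption made for the sketch).
    """
    n_features = len(sd[0][0])
    return [
        mean(sd[t][b][a] for t in range(len(sd)) for b in range(len(sd[t])))
        for a in range(n_features)
    ]


def feature_diversity(sd, t, scales):
    """d(t): SDs divided by their feature scale, averaged over features and birds."""
    birds = sd[t]
    n_features = len(scales)
    total = sum(
        birds[b][a] / scales[a]
        for b in range(len(birds))
        for a in range(n_features)
    )
    return total / (n_features * len(birds))


# Toy data: 2 epochs, 2 birds, 2 features measured on very different scales.
sd = [
    [[1.0, 10.0], [1.0, 10.0]],  # epoch 0
    [[3.0, 30.0], [3.0, 30.0]],  # epoch 1
]
scales = feature_scales(sd)
diversity = [feature_diversity(sd, t, scales) for t in range(len(sd))]
```

Dividing by the pooled scale puts all features on a common footing before averaging. When every bird contributes to every epoch, the average of d(t) over all epochs equals 1, which gives a quick sanity check on any implementation.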

21 December 2000; accepted 27 February 2001
Published online 15 March 2001; 10.1126/science.1058522
Include this information when citing this paper.
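The period estimate of note 26 lends itself to a brief Python sketch. Several pieces are assumptions made here for illustration: `similarity` is a hypothetical callable returning a score between 0 and 1 (standing in for the feature-based similarity measure of notes 12 and 18), `frame_ms` is an assumed frame step, and trials in which no similar sound recurs before the end of the epoch are simply discarded, a case the note does not address.

```python
import random
from statistics import median


def estimate_period(frames, similarity, frame_ms, n_trials=50, rng=None):
    """Median time between a randomly chosen frame and the next similar sound.

    frames     : sequence of per-frame sound descriptors for one 10-s epoch
    similarity : hypothetical function (frame, frame) -> score in [0, 1]
    frame_ms   : assumed time step between consecutive frames, in ms
    """
    rng = rng or random.Random()
    durations = []
    for _ in range(n_trials):
        start = rng.randrange(len(frames))
        i = start + 1
        # Move forward until the sound departs from the origin (< 75% similar).
        while i < len(frames) and similarity(frames[start], frames[i]) >= 0.75:
            i += 1
        # Keep moving until a sound at least 90% similar to the origin recurs.
        while i < len(frames) and similarity(frames[start], frames[i]) < 0.90:
            i += 1
        # Assumption: drop trials that run off the end of the epoch.
        if i < len(frames):
            durations.append((i - start) * frame_ms)
    return median(durations) if durations else None


# Toy epoch: two alternating sounds, 10 ms per frame.
toy_frames = list("AB" * 20)
exact_match = lambda x, y: 1.0 if x == y else 0.0
period = estimate_period(toy_frames, exact_match, frame_ms=10,
                         rng=random.Random(0))
```

Using the median rather than the mean keeps the estimate robust to the occasional trial that lands on an atypical sound.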
