
Journal of Neurolinguistics xxx (2013) 384–408


Shared and distinct neural correlates of vowel perception and production

Krystyna Grabski a,*, Jean-Luc Schwartz a, Laurent Lamalle b,c, Coriandre Vilain a, Nathalie Vallée a, Monica Baciu d, Jean-François Le Bas b,e, Marc Sato a,*

a Gipsa-Lab, Département Parole & Cognition, UMR CNRS 5216 & Grenoble Université, France
b Structure Fédérative de Recherche n°1 "RMN Biomédicale et Neurosciences" – Unité IRM 3T Recherche, Centre Hospitalier Universitaire de Grenoble, France
c INSERM, France
d Laboratoire de Psychologie et Neurocognition, UMR CNRS 5105 & Université Pierre Mendès France, France
e Centre Hospitalier Universitaire de Grenoble, France

Article info

Article history: Received 19 November 2012; Accepted 25 November 2012

Keywords: Speech perception; Speech production; Sensorimotor interactions; Phonetic features; Mirror system; fMRI; Sparse sampling

Abstract

Recent neurobiological models postulate that sensorimotor interactions play a key role in speech perception and speech motor control, especially under adverse listening conditions or in the case of complex articulatory speech sequences. The present fMRI study aimed to investigate whether isolated vowel perception and production might also induce sensorimotor activity, independently of syllable sequencing and coarticulation mechanisms, using a sparse acquisition technique in order to limit the influence of scanner noise. To this aim, participants first passively listened to French vowels previously recorded from their own voice. In a subsequent production task, done within the same imaging session and using the same acquisition parameters, participants were asked to overtly produce the same vowels. Our results demonstrate that a left postero-dorsal stream, linking auditory speech percepts with articulatory representations and including the posterior inferior frontal gyrus, the adjacent ventral premotor cortex and the temporoparietal junction, is an influential part of both vowel perception and production. Specific analyses on phonetic features further confirmed the involvement of the left postero-dorsal stream in vowel processing and motor control. Altogether, these results suggest that vowel representations are largely distributed over sensorimotor brain areas and provide further evidence for a functional coupling between speech perception and production systems.

© 2013 Elsevier Ltd. All rights reserved.

* Corresponding authors. GIPSA-LAB, UMR CNRS 5216, Grenoble Universités, Domaine Universitaire BP 46, 38402 Saint Martin d'Hères cedex, France. Tel.: +33 04 76 57 50 61; fax: +33 04 76 57 47 10. E-mail addresses: [email protected] (K. Grabski), [email protected] (M. Sato).

0911-6044/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jneuroling.2012.11.003

1. Introduction

Do speech representations draw on procedural knowledge and sensory-motor experience? Although a functional distinction between frontal motor sites for speech production and temporal auditory sites for speech perception has long been postulated, some recent psycholinguistic and neurobiological models of speech perception and/or production argue instead for a functional coupling between sensory and motor systems. These models share the postulate that sensorimotor interactions play a key role in speech perception and speech motor control (Callan, Jones, Callan, & Akahane-Yamada, 2004; Guenther, 2006; Guenther & Vladusich, 2012; Hickok, Houde, & Rong, 2011; Hickok & Poeppel, 2000, 2004, 2007; Perkell, 2012; Rauschecker, 2011; Rauschecker & Scott, 2009; Schwartz, Ménard, Basirat, & Sato, 2012; Scott & Johnsrude, 2003; Skipper, Van Wassenhove, Nusbaum, & Small, 2007; Wilson & Iacoboni, 2006). During speech production, modulations of neural responses observed within the auditory and somatosensory cortices are thought to reflect feedback control mechanisms in which the sensory consequences of the speech-motor act are compared with the actual sensory input in order to further control production (for recent reviews, see Guenther & Vladusich, 2012; Perkell, 2012). Conversely, motor system activity observed during speech perception has been proposed to partly constrain the phonetic interpretation of the sensory inputs through the internal generation of candidate articulatory categorizations (for recent reviews, see d'Ausilio, Craighero, & Fadiga, 2012; Schwartz, Sato, & Fadiga, 2008).

The working hypothesis of the present functional magnetic resonance imaging (fMRI) study capitalizes on this 'embodied' approach and on the theoretical proposal that phonetic processing is driven by both sensory and motor constraints and develops from self-organization processes at the ontogenetic and cultural scales (Schwartz, Abry, Boë, & Cathiard, 2002; Schwartz, Boë, & Abry, 2007; Schwartz et al., 2012). In this framework, the present study focuses on oral vowels, considered as elementary units, which are of double interest: (1) they are the simplest conceivable speech units, poorly contaminated by complex coarticulation effects, and hence able to reveal the core perceptuo-motor network for speech perception and speech production; and (2) they are well specified and described in articulatory and auditory terms, around a universal set of phonetic features defining their place of articulation. Although some of the relevant cortical structures have been identified in recent years, precisely which brain structures are specialized in the auditory processing and motor control of vowels, and how they are organized functionally during both speech perception and production, remain largely unclear.

1.1. Sensorimotor interactions in speech perception and production

One long-standing problem in understanding how listeners process the acoustic signal to recover phonetic information comes from the high variability of the speech signal. Indeed, the coarticulation-driven composition of articulatory commands during speech production is non-linearly transformed into a complex composition of acoustic features, so that the correspondence between sounds and phonemes is far from transparent (see Perkell & Klatt, 1986). In response to this problem, several psycholinguistic models of speech perception argue that the phonetic interpretation of sensory speech inputs is determined, or at least partly constrained, by articulatory procedural knowledge (Fowler, 1986; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman & Mattingly, 1985; Liberman & Whalen, 2000; Schwartz et al., 2002, 2012; for recent reviews, Galantucci, Fowler, & Turvey, 2006; Schwartz et al., 2008).

Over the past two decades, thanks to new techniques and new discoveries about the primate and human brains, this question has resurfaced and led to heated debates. This resurgence of interest is largely due to the discovery of 'mirror neurons', a particular set of premotor and parietal neurons forming a specialized action observation/execution matching system in the macaque's brain. Mirror neurons are polymodal visuo-motor or audio-visuo-motor neurons in the ventral premotor and posterior parietal cortices (areas F5 and PF) of the macaque monkey which have been shown to discharge both when the monkey performs hand or mouth actions and when it views or listens to similar actions made by another individual (Di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992; Ferrari, Gallese, Rizzolatti, & Fogassi, 2003; Fogassi et al., 2005; Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Kohler et al., 2002; Keysers et al., 2003; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996). The existence of mirror neurons suggests that action observation partly involves the same neural circuits that are used in action performance. Since then, numerous neurophysiological and brain imaging experiments have provided evidence for the existence of a putative mirror-neuron system in humans (for reviews, Rizzolatti & Craighero, 2004; Rizzolatti, Fogassi, & Gallese, 2001). In addition to action recognition, the human mirror-neuron system has been proposed to play a fundamental role in speech processing by providing a neurophysiological mechanism that creates 'motor parity' between communicating individuals (e.g., Arbib, 2005; Gentilucci & Corballis, 2006; Rizzolatti & Arbib, 1998; Rizzolatti & Craighero, 2004; see also Aboitiz & Garcia, 1997).

More direct evidence for a link between speech perception and production comes from the observed activity of speech motor areas (i.e., the left posterior part of the inferior frontal gyrus, the ventral premotor and primary motor cortices) and areas subserving proprioception related to mouth movements (i.e., the somatosensory cortex) in speech perception tasks. These motor and somatosensory areas are active during auditory, visual and/or auditory-visual speech perception (e.g., Callan, Callan, Gamez, Sato, & Kawato, 2010; Callan et al., 2003; Callan et al., 2004; Calvert & Campbell, 2003; Möttönen et al., 2004; Nishitani & Hari, 2002; Ojanen et al., 2005; Paulesu et al., 2003; Pekkola et al., 2006; Pulvermüller et al., 2006; Skipper, Nusbaum, & Small, 2005; Skipper et al., 2007; Tremblay & Small, 2011; Wilson & Iacoboni, 2006; Wilson, Saygin, Sereno, & Iacoboni, 2004). In addition, results from both single-pulse transcranial magnetic stimulation (TMS) and fMRI studies suggest that activity within the ventral premotor cortex and the orofacial primary motor cortex is somatotopically organized in relation to the orofacial effectors involved in the production of the perceived speech stimuli (Fadiga, Craighero, Buccino, & Rizzolatti, 2002; Pulvermüller et al., 2006; Roy, Craighero, Fabbri-Destro, & Fadiga, 2008; Sato, Buccino, Gentilucci, & Cattaneo, 2010; Skipper et al., 2007). Finally, recent repetitive and double-pulse TMS studies also suggest that the speech motor centers are causally recruited during speech categorization, especially in the case of acoustically ambiguous syllables or when phonological segmentation or working memory processes are strongly required (d'Ausilio et al., 2009, 2011; Meister, Wilson, Deblieck, Wu, & Iacoboni, 2007; Möttönen & Watkins, 2009; Sato, Tremblay, & Gracco, 2009; see also Sato et al., 2011). Regarding these latter studies, it is worth noting that, although it is not a matter of debate that the speech perception and production systems interact in some way (e.g., Diehl, Lotto, & Holt, 2004; Lotto, Hickok, & Holt, 2009), the question of whether articulatory processes mediate speech perception in adults remains vigorously debated (e.g., d'Ausilio et al., 2009; Hickok & Poeppel, 2007; Lotto et al., 2009; Sato et al., 2009; Scott, McGettigan, & Eisner, 2009; Sato et al., 2011). Rather, it has been argued that sensorimotor integration processes in speech perception are crucial for speech development and the acquisition of new vocabulary (Hickok & Poeppel, 2000, 2004, 2007) or, in adults, during conversational exchange, especially for the timing of turn-taking (Scott et al., 2009).

The existence of sensorimotor interaction is also a central idea in speech production research. For instance, modulation of auditory cortex responses during self-produced overt speech, compared with when the utterances are taped and replayed to the subjects, and during covert speech in the absence of speech-contingent auditory input, has been shown in a number of brain imaging studies (e.g., Christoffels, Formisano, & Schiller, 2007; Christoffels, van de Ven, Waldorp, Formisano, & Schiller, 2011; Curio, Neuloh, Numminen, Jousmäki, & Hari, 2000; Houde, Nagarajan, Sekihara, & Merzenich, 2002; Numminen & Curio, 1999; Paus, Perry, Zatorre, Worsley, & Evans, 1996). In addition, during overt speech production, increased activity has been observed in the auditory cortex during altered or delayed auditory feedback (e.g., Christoffels et al., 2007; Hashimoto & Sakai, 2003; Heinks-Maldonado, Nagarajan, & Houde, 2006; Tourville, Reilly, & Guenther, 2008), as well as in the anterior supramarginal gyrus during unexpected somatosensory feedback (Golfinopoulos et al., 2011), as compared to normal auditory/somatosensory feedback. Taken together, these results are usually interpreted in the framework of feedback corrective sensory-to-motor loops.


For instance, in the DIVA model of speech production (Directions Into Velocities of Articulators; e.g., Guenther, 2006; Guenther & Vladusich, 2012), modulated responses within the auditory cortex are thought to reflect online corrective control mechanisms in which the sensory consequences of the speech-motor act are compared with the actual sensory input in order to further control production and to help distinguish the sensory consequences of our own actions from sensory signals due to changes in the outside world (for similar models derived from state feedback control theory and internal models of the vocal tract, see also Hickok et al., 2011; Ventura, Nagarajan, & Houde, 2009).
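To make the corrective-loop idea concrete, the following toy sketch illustrates the general scheme of comparing the predicted or intended sensory consequence with noisy feedback; it is a minimal illustration under assumptions of my own (a one-dimensional linear forward model, a scalar gain, and a target F1 value), not the DIVA implementation.

```python
import numpy as np

def corrective_loop(target_f1=300.0, gain=0.4, n_steps=25, seed=0):
    """Toy sensory-to-motor corrective loop: a scalar 'articulatory' command
    is updated in proportion to the auditory error (illustrative only)."""
    rng = np.random.default_rng(seed)
    command = 0.0                                     # arbitrary initial motor command
    for _ in range(n_steps):
        produced_f1 = 2.0 * command + 250.0           # toy forward model: command -> F1 (Hz)
        feedback = produced_f1 + rng.normal(0.0, 5.0) # noisy auditory feedback
        error = target_f1 - feedback                  # discrepancy: intended vs actual state
        command += gain * error / 2.0                 # corrective update (inverse of slope 2.0)
    return command

print(corrective_loop())   # converges near (300 - 250) / 2 = 25
```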

1.2. Shared and distinct neural correlates of vowel perception and production

Taken together, these neurobiological models and brain imaging studies argue for a functional coupling between speech perception and production. However, despite clear support from brain imaging and TMS studies for sensorimotor interactions in speech processing and motor control, few fMRI studies have conjointly examined the neural correlates of both speech perception and speech production, and all of them investigated speech at the syllabic level (Callan et al., 2010; Okada & Hickok, 2006; Pulvermüller et al., 2006; Skipper et al., 2007; Tremblay & Small, 2011; Wilson et al., 2004; Zheng, Munhall, & Johnsrude, 2010).

In the present fMRI experiment, we sought to determine the shared and distinct neural correlates of steady-state vowel perception and production, as elementary speech units. Based on the above-mentioned studies, one fundamental question is whether vowel perception and production both involve a left postero-dorsal processing stream (e.g., Callan et al., 2004, 2010; Hickok & Poeppel, 2000, 2004, 2007; Hickok et al., 2011; Rauschecker, 2011; Rauschecker & Scott, 2009; Scott & Johnsrude, 2003; Skipper et al., 2007; Tremblay & Small, 2011; Wilson & Iacoboni, 2006), linking auditory speech percepts with articulatory representations in the motor cortex. In addition, does vowel production modulate sensory cortex responses, compared to speech perception (e.g., Curio et al., 2000; Christoffels et al., 2007, 2011; Houde et al., 2002; Numminen & Curio, 1999; Paus et al., 1996)? Since sensorimotor interactions are assumed to depend crucially on acoustic/articulatory and phonetic complexity, observing them during vowel perception and production, that is, for elementary speech units and independently of any syllable sequencing and coarticulation mechanisms, would provide further support for a functional coupling between the auditory and motor systems.

Another goal of the experiment was to further test whether phonetic information about distinctive vowel features might lead to a topographic segregation in the sensory and motor cortices. Indeed, recent fMRI studies of speech perception using multivariate statistical pattern recognition analyses demonstrated robust above-chance classification for a set of isolated vowel stimuli in temporal areas extending anterior and posterior to the lateral Heschl's gyri and down into the superior temporal sulcus (Formisano, De Martino, Bonte, & Goebel, 2008), as well as for vowel versus consonant categories in the anterior part of the superior temporal gyrus (Obleser, Leaver, Vanmeter, & Rauschecker, 2010). However, these studies specifically focused on the auditory system and, to the best of our knowledge, no study has examined the neural coding of phonetic features during both vowel perception and production.
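The multivariate analyses cited above decode stimulus identity from distributed voxel patterns. The following cross-validated sketch shows the general form of such an analysis with scikit-learn; the data array, labels and classifier choice are placeholders of my own, not the pipeline of Formisano et al. (2008).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Assumed inputs: one activity pattern per trial over a set of voxels,
# and one vowel label per trial (random placeholders shown here).
rng = np.random.default_rng(0)
patterns = rng.standard_normal((162, 500))   # 162 trials x 500 voxels (illustrative)
labels = np.repeat(np.arange(9), 18)         # 9 vowel classes, 18 trials each

# Above-chance cross-validated accuracy would indicate that the voxel
# patterns carry information discriminating the vowel categories.
scores = cross_val_score(LinearSVC(dual=False), patterns, labels, cv=6)
print(f"mean decoding accuracy: {scores.mean():.2f} (chance = {1/9:.2f})")
```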

To these aims, participants first passively listened to 9 French steady-state vowels (multiple occurrences of the /i/, /y/, /u/, /e/, /ø/, /o/, /ɛ/, /œ/ and /ɔ/ vowels) previously recorded from their own voice. In a subsequent production task, done within the same imaging session and using exactly the same acquisition parameters, participants were asked to overtly produce the same 9 French steady-state vowels. These vowels were selected in order to compare the following phonetic features: height (close, mid-close and mid-open vowels), backness (front and back vowels) and roundedness (rounded and unrounded vowels). As expected (Ladefoged, 2006; Ménard, Schwartz, & Aubin, 2008; Schwartz, Boë, Vallée, & Abry, 1997a, 1997b), acoustical analyses showed that the height feature, corresponding to the vertical position of the jaw, was inversely correlated with the relative frequency of the first formant (F1) for close, mid-close and mid-open vowels. The roundedness/backness features, corresponding to the configuration of the lips (rounded or not) and to the position of the tongue (front or back), were correlated with the relative frequency of the second formant (F2), which decreased from unrounded/front to rounded/front and to rounded/back vowels. Therefore, our experimental design varied orthogonally height (close, mid-close and mid-open vowels) and backness/roundedness (unrounded/front, rounded/front and rounded/back vowels) in these two acoustic/articulatory dimensions.
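This 3 × 3 factorial structure can be summarized as a simple mapping from each vowel to its height and roundedness/backness levels; the feature assignments below follow the text, while the dictionary layout itself is just an illustrative convenience.

```python
# Height x roundedness/backness design of the 9 French steady-state vowels.
VOWEL_FEATURES = {
    "i": ("close", "unrounded/front"),
    "y": ("close", "rounded/front"),
    "u": ("close", "rounded/back"),
    "e": ("mid-close", "unrounded/front"),
    "ø": ("mid-close", "rounded/front"),
    "o": ("mid-close", "rounded/back"),
    "ɛ": ("mid-open", "unrounded/front"),
    "œ": ("mid-open", "rounded/front"),
    "ɔ": ("mid-open", "rounded/back"),
}
```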


Finally, it is also worth noting that sensorimotor speech interactions and phonetic processing are known to vary as a function of the external environment, and most of the studies that conjointly examined the neural correlates of both speech perception and speech production used fMRI acquisition protocols entailing continuous loud scanner noise during the perception task, the production task, or both (Callan et al., 2010; Okada & Hickok, 2006; Skipper et al., 2007; Tremblay & Small, 2011; Wilson et al., 2004). As previously mentioned, it is well known that auditory feedback is crucial in monitoring vocal output. Altering the quality of acoustic feedback during speech production not only induces compensatory articulatory changes but also activity changes in brain areas related to auditory monitoring and speech-motor planning and control, as compared to normal auditory feedback (e.g., Christoffels et al., 2007, 2011; Tourville et al., 2008; Zheng et al., 2010). In addition, some neurobiological models of speech perception also assume that the motor system is strongly recruited under adverse conditions in order to resolve phonetic ambiguity (Callan et al., 2004; Skipper et al., 2005, 2007; Wilson & Iacoboni, 2006). This proposal is indirectly supported by fMRI studies showing stronger activity of the motor system depending on the intelligibility of the speech input, as during masked or distorted versus intelligible speech (e.g., Binder, Liebenthal, Possing, Medler, & Douglas Ward, 2004; Zekveld, Heslenfeld, Festen, & Schoonhoven, 2006) or during the auditory identification of non-native versus native phonemes (e.g., Callan et al., 2004; Wilson & Iacoboni, 2006). Therefore, in order to limit possible noise-induced bias in our results, we used a sparse, clustered fMRI acquisition technique during both vowel perception and production, which allows stimulus presentation and overt responses to occur in relative silence and eliminates susceptibility artifacts due to articulation-related movement.

2. Methods

2.1. Participants

Fourteen healthy adults (eleven males and three females; mean age 29 years, range 21–44 years), all native French speakers, participated in the study after giving their informed consent. All were right-handed according to a standard handedness inventory (Oldfield, 1971), had normal or corrected-to-normal vision and reported no history of motor, speaking or hearing disorders. Participants were screened for neurological, psychiatric and other possible medical problems, as well as for contraindications to MRI. The protocol was approved by the Grenoble University Ethical Committee and was carried out in accordance with the ethical standards of the 1964 Declaration of Helsinki.

2.2. Stimuli

Prior to the experiment, multiple utterances of the 9 French steady-state vowels (/i/, /y/, /u/, /e/, /ø/, /o/, /ɛ/, /œ/ and /ɔ/) were individually recorded by each participant in a soundproof room. The vowels were selected according to the following phonetic features: height (close, mid-close and mid-open vowels), backness (front and back vowels) and roundedness (rounded and unrounded vowels). Each vowel was produced 10 times in a pseudo-randomized order (the same vowel never occurring twice in succession). For each utterance, a 1000 ms visual instruction informed the participants about the vowel to produce and indicated the onset and offset of the vowel production. Participants were instructed to produce each vowel from a neutral closed-mouth position as soon as they perceived the visual instruction and to maintain the production until the visual cue disappeared. The interval between utterances was 5 s. In total, 1260 vowels were recorded and digitized in individual sound files at a sampling rate of 44.1 kHz with 16-bit quantization. For each vowel and each participant, the fundamental frequency (F0), the first and second formants (F1, F2) and the duration were calculated using the Praat software (Institute of Phonetic Sciences, University of Amsterdam, NL). Out of the 10 occurrences, 6 clearly articulated tokens were selected per speaker and per vowel for the fMRI experiment (perception task). The corresponding data are provided in Table 1 and displayed in Fig. 1A. Vowel duration was around 500 ms, indicating that subjects typically began their production 500 ms after the onset of the 1000 ms visual instruction. The distribution of formant values for the French vowels appeared classical, with some dispersion due to inter-subject differences (including gender effects).
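A sketch of this kind of acoustic measurement (F0, F1, F2 and duration per token) is shown below; it assumes the Python parselmouth wrapper around Praat rather than the Praat application used by the authors, and the file name and midpoint-based formant reading are illustrative choices of my own.

```python
import parselmouth
from parselmouth.praat import call

def measure_vowel(path="vowel_i_01.wav"):
    """Return mean F0 (Hz), F1/F2 at the temporal midpoint (Hz), duration (s)."""
    sound = parselmouth.Sound(path)
    duration = sound.get_total_duration()
    midpoint = duration / 2.0
    pitch = sound.to_pitch()
    f0 = call(pitch, "Get mean", 0, 0, "Hertz")   # mean F0 over the whole token
    formants = sound.to_formant_burg()            # Burg formant tracking
    f1 = formants.get_value_at_time(1, midpoint)
    f2 = formants.get_value_at_time(2, midpoint)
    return f0, f1, f2, duration
```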

Table 1. Mean values of fundamental frequency (F0, in Hz), first and second formants (F1, F2, in Hz) and duration (in s) for each pre-recorded vowel type, averaged across participants (standard errors of the mean are indicated).

| Vowel | Height | Backness | Roundedness | Mean F0 | Mean F1 | Mean F2 | Mean duration |
|---|---|---|---|---|---|---|---|
| /i/ | Close | Front | Unrounded | 157 ± 13 | 267 ± 9 | 2141 ± 49 | 0.521 ± 0.041 |
| /y/ | Close | Front | Rounded | 154 ± 12 | 269 ± 10 | 1803 ± 31 | 0.506 ± 0.044 |
| /u/ | Close | Back | Rounded | 158 ± 12 | 287 ± 12 | 759 ± 30 | 0.519 ± 0.044 |
| /e/ | Mid-close | Front | Unrounded | 152 ± 12 | 372 ± 13 | 2105 ± 56 | 0.525 ± 0.042 |
| /ø/ | Mid-close | Front | Rounded | 149 ± 11 | 365 ± 11 | 1380 ± 43 | 0.520 ± 0.040 |
| /o/ | Mid-close | Back | Rounded | 151 ± 12 | 378 ± 11 | 784 ± 28 | 0.525 ± 0.042 |
| /ɛ/ | Mid-open | Front | Unrounded | 148 ± 12 | 534 ± 23 | 1826 ± 53 | 0.505 ± 0.045 |
| /œ/ | Mid-open | Front | Rounded | 149 ± 12 | 508 ± 24 | 1425 ± 44 | 0.524 ± 0.042 |
| /ɔ/ | Mid-open | Back | Rounded | 149 ± 12 | 530 ± 29 | 973 ± 36 | 0.541 ± 0.041 |


As expected (Ladefoged, 2006; Ménard et al., 2008; Schwartz et al., 1997a, 1997b), with F1 and F2 values expressed as a percentage from the grand mean for all vowels, vowel height (close, mid-close and mid-open vowels) was inversely correlated with the relative frequency of F1, while F2 remained stable (see Fig. 1B, top). Conversely, vowel roundedness/backness (from unrounded/front to rounded/front and to rounded/back vowels) was correlated with the relative frequency of F2, while F1 values remained stable (see Fig. 1B, bottom). Therefore, our experimental design varied orthogonally height (close, mid-close and mid-open vowels) and backness/roundedness (unrounded/front, rounded/front and rounded/back vowels) in these two acoustic/articulatory dimensions.
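The normalization underlying Fig. 1B can be expressed compactly; the following minimal sketch uses the group means from Table 1 (the array layout and printing are my own illustrative choices).

```python
import numpy as np

vowels = ["i", "y", "u", "e", "ø", "o", "ɛ", "œ", "ɔ"]
f1 = np.array([267, 269, 287, 372, 365, 378, 534, 508, 530], dtype=float)
f2 = np.array([2141, 1803, 759, 2105, 1380, 784, 1826, 1425, 973], dtype=float)

# Express each formant as a percentage deviation from the grand mean of all vowels.
f1_pct = 100.0 * (f1 - f1.mean()) / f1.mean()
f2_pct = 100.0 * (f2 - f2.mean()) / f2.mean()
for v, p1, p2 in zip(vowels, f1_pct, f2_pct):
    print(f"/{v}/  F1: {p1:+6.1f}%   F2: {p2:+6.1f}%")
```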

2.3. Procedure

The fMRI experiment consisted of 6 functional runs (see Fig. 2): the first 3 runs involved passive vowel listening (perception task) and the last 3 runs involved overt vowel production (production task). To minimize possible covert motor simulation, the perception task was performed first, with participants simply instructed to pay attention to the auditory stimuli. In order to match as closely as possible the volume of the auditory stimuli in the perception task to that of the auditory feedback in the production task (despite bone/skull conduction), an intensity-matching procedure was carried out for each participant prior to the experiment (Christoffels et al., 2007). To this aim, after being placed in the scanner, participants first overtly produced several vowels and then passively listened to a few vowels previously recorded from their own voice. Participants were asked whether the volume was similar to hearing their own voice and, if necessary, the volume of the stimuli was increased or decreased.

Fig. 1. A) Two-dimensional F1–F2 acoustic space of the 9 pre-recorded French steady-state vowels according to the 1260 utterances, with the mean F1–F2 values averaged across participants for each vowel type (error bars represent standard errors of the mean, SEM). See Table 1 for details. B) Mean F1 and F2 values expressed as a percentage from the grand mean of all vowels according to height (close, mid-close, mid-open vowels; top) and roundedness/backness phonetic dimensions (unrounded/front, rounded/front, rounded/back vowels; bottom).


Fig. 2. A) Experimental design. The fMRI experiment consisted of 6 functional runs: the first 3 runs involved passive vowel listening (perception task) and the last 3 runs involved overt vowel production (production task). For both tasks, each vowel or resting condition occurred 18 times in a pseudorandomized order. B) Timeline of a single trial. For each trial, the time interval between the perceived or produced vowel and the midpoint of the following functional scan acquisition was randomly varied between 4 s, 5 s and 6 s. TR: repetition time; TA: acquisition time.


In the perception task, participants passively listened to the 9 French steady-state vowels (/i/, /y/, /u/, /e/, /ø/, /o/, /ɛ/, /œ/ and /ɔ/) previously recorded from their own voice (6 distinct occurrences per vowel). A resting condition, without any movement or auditory stimulation, served as baseline. Each of the 3 functional runs consisted of 63 trials, and each trial was 10 s in length (with a single vowel being presented, or without any auditory presentation in the resting condition). In the production task, done within the same imaging session and using exactly the same acquisition parameters, participants were asked to overtly produce the same 9 French steady-state vowels. As in the perception task, a resting condition, without any movement or auditory stimulation, was also added. There were 63 trials in each functional run and each trial was 10 s in length. In each trial, a visual instruction related to the vowel (e.g., "i") or to the resting condition ("–") was displayed for 1000 ms, indicating the onset and offset for the vowel production. Participants were instructed to produce each vowel from a neutral closed-mouth position as soon as they perceived the visual instruction and to maintain the production until the visual cue disappeared.

Apart from vowel production, participants were instructed not to move during the whole experimental session to avoid head-movement artifacts. They were trained a few days prior to the scanning session, and both tasks were practiced again just before entering the scanner. After the experiment, no participant reported any difficulty performing the tasks. During the production task, a digital audio recording was made of the participants' verbal responses to monitor their production during scanning and for offline analysis (see below).

2.4. Data acquisition

Magnetic resonance images were acquired with a 3T whole-body MRI scanner (Bruker Medspec S300) equipped with a transmit/receive quadrature birdcage head coil. Participants lay supine in the scanner, with head movements minimized with foam cushions. To reduce exposure to scanner noise, they wore earplugs in addition to protective ear shells equipped with noise-reducing passive material and MRI-compatible headphones, through which the auditory stimuli were delivered during the perception task. In the production task, vowel productions were recorded using an MRI-compatible microphone. Visual instructions were presented using the Presentation software (Neurobehavioral Systems, Albany, USA) and displayed on a screen situated behind the scanner, viewed via a mirror fixed above the subject's eyes.

The fMRI experiment consisted of six functional runs and one anatomical run. Functional images were obtained using a T2*-weighted echoplanar imaging (EPI) sequence with whole-brain coverage (TR = 10 s, acquisition time = 2600 ms, TE = 30 ms, flip angle = 90°). Each functional scan comprised forty axial slices parallel to the anteroposterior commissural plane, acquired in interleaved order (72 × 72 matrix; field of view: 216 × 216 mm²; 3 × 3 mm² in-plane resolution with a slice thickness of 3 mm without gap). A high-resolution T1-weighted whole-brain structural image was acquired for each participant after the third functional run (MP-RAGE, sagittal volume of 256 × 224 × 176 mm³ with 1 mm isotropic resolution, inversion time = 900 ms, two segments, segment repetition time = 2500 ms, segment duration = 1795 ms, TR/TE = 16/5 ms with 35% partial echo, flip angle = 30°).


In order to avoid movement artefacts due to vowel production and to minimize scanner noise during both vowel perception and production, a "sparse sampling" acquisition paradigm was used (e.g., Birn, Bandettini, Cox, & Shaker, 1999; Bohland & Guenther, 2006; Grabski et al., 2012; Gracco, Tremblay, & Pike, 2005; Hall et al., 1999; Obleser et al., 2006, 2007; Özdemir, Norton, & Schlaug, 2006; Pulvermüller et al., 2006; Sörös et al., 2006; Tourville et al., 2008; Uppenkamp, Johnsrude, Norris, Marslen-Wilson, & Patterson, 2006; Zaehle et al., 2007; Zheng et al., 2010). This acquisition technique exploits the neurophysiological properties of the slowly rising hemodynamic response, whose peak is estimated to occur with a 4–6 s delay in the case of speech perception and production (e.g., Bohland & Guenther, 2006; Grabski et al., 2012; Gracco et al., 2005; Sörös et al., 2006; Zaehle et al., 2007). In the present study, functional scanning therefore occurred only during a fraction of the TR, alternating with silent inter-scan periods during which participants listened to or produced single vowels. In the perception task, in each run and for each TR, the time interval between the vowel onset and the midpoint of the following functional scan acquisition was varied between 4 s, 5 s and 6 s. In the production task, due to an estimated 500 ms oral response delay (consistent with what was observed in the preliminary vowel recording session, see above), the time interval between the visual instruction onset and the midpoint of the following functional scan acquisition was varied between 4.5 s, 5.5 s and 6.5 s in each run and for each TR.
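The trial timing can be made concrete with a small sketch. It rests on one assumption of my own that the paper does not state explicitly: that the 2.6 s acquisition occupies the end of each 10 s TR, so the acquisition midpoint of trial n falls at n·TR + (TR − TA/2), and the event onset is that midpoint minus the jittered delay.

```python
TR, TA = 10.0, 2.6            # repetition time and acquisition time (s)
DELAYS = [4.0, 5.0, 6.0]      # jittered event-to-midpoint intervals (perception task)

def event_onset(trial_index: int, delay: float) -> float:
    """Stimulus onset (s from run start) such that the midpoint of the
    *following* acquisition falls `delay` seconds after the event."""
    scan_midpoint = trial_index * TR + (TR - TA / 2.0)  # assumes scan at end of TR
    return scan_midpoint - delay

# Example: trial 0 with a 5 s delay -> event at 3.7 s, scan midpoint at 8.7 s.
print(event_onset(0, 5.0))
```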

Each of the 6 functional runs consisted of 63 functional scans and was 10.5 min in length. In each run, the 9 vowels and the resting condition were presented or produced 6 times in a pseudorandomized order (the same vowel or phonetic feature never occurring twice in succession). During vowel perception and production, the order of delay times was also pseudorandomly counterbalanced (the same acquisition delay never occurred twice in successive functional scans), with the three delay times occurring 18 times each (2 times per vowel). Three "dummy" scans at the beginning of each run were added to allow for equilibration of the MRI signal and were removed from the analyses. In total, 189 functional scans were therefore acquired during each of the perception and production tasks, consisting of 18 functional scans for each of the 9 vowels, 18 functional scans for the baseline and the 3 × 3 "dummy" scans.
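A sketch of this constrained randomization is given below for one run of 60 non-dummy trials; the greedy-shuffle-with-retry helper is my own reading of the two succession constraints described in the text, not the authors' script.

```python
import random

VOWELS = list("iyueøoɛœɔ")                 # the 9 French vowels
TRIALS = VOWELS * 6 + ["rest"] * 6          # 60 trials per run
DELAYS = [4.0, 5.0, 6.0] * 18               # 54 vowel-trial delays (s)

def constrained_shuffle(items, rng, max_tries=1000):
    """Shuffle so that no item appears twice in succession (greedy + retry)."""
    for _ in range(max_tries):
        pool, seq = list(items), []
        while pool:
            candidates = [x for x in pool if not seq or x != seq[-1]]
            if not candidates:
                break                        # dead end: restart from scratch
            pick = rng.choice(candidates)
            pool.remove(pick)
            seq.append(pick)
        if len(seq) == len(items):
            return seq
    raise RuntimeError("no valid sequence found")

rng = random.Random(1)
trial_order = constrained_shuffle(TRIALS, rng)
delay_order = iter(constrained_shuffle(DELAYS, rng))
# Pair each vowel trial with a delay; resting trials carry no event delay.
run = [(t, next(delay_order) if t != "rest" else None) for t in trial_order]
print(run[:5])
```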

2.5. Data analysis

Data were analyzed using the SPM5 software package (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK) running on Matlab 7.1 (Mathworks, Natick, MA, USA). Activated brain regions were labeled using the SPM Anatomy toolbox (Eickhoff et al., 2005) and, when necessary, the Talairach Daemon software (Lancaster et al., 2000). For visualization, activation maps were superimposed on a standard brain template using the MRICRON software (http://www.sph.sc.edu/comd/rorden/mricron/).

2.5.1. Production errors

Participant responses in the production task were analyzed offline for possible production errors. In total, 2268 vowels and 252 baseline trials were recorded and digitized at a sampling rate of 44.1 kHz with 16-bit quantization. An auditory inspection of all vowel and baseline trials was carried out to ensure the correctness of participant responses. Three types of errors were observed: omission, wrong production and hesitation. In total, 19 functional scans in which an error occurred were removed from the statistical analyses (on average 0.75% ± 0.25% of errors per participant), with overall performance exceeding 99% and at least 90% correct responses in all runs for all participants.

2.5.2. Data preprocessing

Apart from scans with production errors, the first three volumes of each run ("dummy" scans) were also discarded before statistical analyses. For each participant, the functional series were first realigned by estimating the 6 movement parameters of a rigid-body transformation in order to control for head movements between scans. After segmentation of the T1 structural image and coregistration to the mean functional image, all functional images were spatially normalized into the standard stereotaxic space of the Montreal Neurological Institute (MNI) using the segmentation parameters of the T1 structural image. All functional images were then smoothed using a 6 mm full-width-at-half-maximum Gaussian kernel, in order to improve the signal-to-noise ratio and to compensate for anatomical variability among individual brains.

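As a rough programmatic equivalent of this pipeline, a minimal sketch using Nipype's SPM interfaces is shown below; the use of Nipype and the file names are illustrative assumptions of my own (the authors worked with SPM5 directly), so treat this as a sketch of the steps rather than the study's code.

```python
from nipype.interfaces import spm

# Realign the functional series (rigid body, 6 motion parameters per scan).
realign = spm.Realign(in_files="run1.nii", register_to_mean=True)

# Coregister the structural T1 to the mean functional image.
coreg = spm.Coregister(target="meanrun1.nii", source="T1.nii")

# Segment the T1; the segmentation parameters drive normalization to MNI space.
segment = spm.Segment(data="T1.nii")

# Write the normalized functional images using the segmentation parameters.
normalize = spm.Normalize(parameter_file="T1_seg_sn.mat",
                          apply_to_files=["rrun1.nii"], jobtype="write")

# Smooth with a 6 mm FWHM isotropic Gaussian kernel.
smooth = spm.Smooth(in_files="wrrun1.nii", fwhm=[6, 6, 6])

# Each interface is executed with .run(), e.g. realign.run().
```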

2.5.3. Data analysis

For each participant, neural activations related to the perception and production tasks were analyzed using the General Linear Model (GLM; Friston et al., 1995). In the GLM, each run included 9 regressors of interest (one for each perceived or produced vowel) and 6 realignment parameters, with the silent trials forming an implicit baseline. In total, each regressor of interest and the baseline condition comprised 18 functional scans. The blood-oxygen-level-dependent (BOLD) response for each event was modeled using a single-bin finite impulse response (FIR) basis function spanning the time of acquisition (2.6 s). Before estimation, high-pass filtering with a cutoff period of 128 s was applied. Beta weights associated with the modeled FIR responses were then computed to fit the observed BOLD signal time course in each voxel for each condition. For both the perception and production tasks, individual statistical maps were calculated for each vowel contrasted with the related baseline and subsequently used for group statistics.
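In a sparse design with a single-bin FIR basis, each event simply loads on the one scan acquired after it. The following minimal sketch shows the resulting first-level fit; the array shapes, placeholder labels and least-squares solver are illustrative, not the SPM5 internals.

```python
import numpy as np

n_scans, n_vowels = 180, 9
# condition_per_scan[i] is the vowel index (0-8) for scan i, or None for rest.
condition_per_scan = [None] * n_scans          # placeholder labeling
motion = np.zeros((n_scans, 6))                # realignment parameters (placeholder)

# Single-bin FIR: one column per vowel, 1 on the scan following that event.
fir = np.zeros((n_scans, n_vowels))
for scan, cond in enumerate(condition_per_scan):
    if cond is not None:
        fir[scan, cond] = 1.0

X = np.column_stack([fir, motion, np.ones(n_scans)])   # design matrix + constant
Y = np.zeros(n_scans)                                  # one voxel's time course
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)           # per-condition beta weights
```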

In order to draw population-based inferences (Friston, Holmes, & Worsley, 1999), a second-level random-effect group analysis was carried out. Our experimental design varied orthogonally height (close, mid-close and mid-open vowels) and backness/roundedness (unrounded/front, rounded/front and rounded/back vowels) in both the acoustic and articulatory dimensions. Therefore, a three-way repeated measures analysis of variance (ANOVA) was performed, with the "Task" (2 levels: perception, production), "Height" (3 levels: close, mid-close, mid-open) and "Roundedness/Backness" (3 levels: unrounded/front, rounded/front, rounded/back) conditions as within-subject factors and the subjects treated as a random factor. Two t-contrasts were first calculated to determine the mean activations specifically associated with vowel perception and production, irrespective of the phonetic features. To identify overlapping regions of activation between the perception and production tasks, a conjunction analysis (Friston, Holmes, Price, Buchel, & Worsley, 1999; Friston, Holmes, & Worsley, 1999; Nichols, Brett, Andersson, Wager, & Poline, 2005) was then conducted. Conversely, an F-contrast related to the main effect of the task factor was assessed to determine brain regions that showed significant changes in activity between the perception and production tasks. Finally, two F-contrasts related to the main effects of the height and roundedness/backness dimensions were calculated for both the perception and production tasks. All inferences are reported at a false discovery rate (FDR; Genovese, Lazar, & Nichols, 2002) corrected level of p < .05 and a cluster extent of 25 voxels. In addition, specific t-contrasts were computed for each vowel in both the perception and production tasks (see Table S1 and Figure S1 in the supplementary materials).
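The FDR correction cited (Genovese et al., 2002) applies the Benjamini-Hochberg step-up procedure to the map of voxelwise p-values; a minimal sketch of that procedure follows (the variable names and the random example are illustrative).

```python
import numpy as np

def fdr_threshold(pvals, q=0.05):
    """Benjamini-Hochberg step-up threshold over a map of voxel p-values.
    Returns the largest p-value declared significant, or None if none is."""
    p = np.sort(np.asarray(pvals, dtype=float).ravel())
    n = p.size
    below = p <= q * np.arange(1, n + 1) / n   # step-up criterion: p(i) <= q*i/n
    return p[below].max() if below.any() else None

# Example with uniform random p-values (purely illustrative).
print(fdr_threshold(np.random.default_rng(0).uniform(size=10000), q=0.05))
```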

For all contrasts, maximum activation peaks were first determined in each cluster. The locations of the activation peaks were then labeled according to probabilistic cytoarchitectonic maps (Eickhoff et al., 2005), as implemented in the SPM Anatomy toolbox. If a brain region was not assigned or not specified in the SPM Anatomy toolbox, the coordinates of the activation peak were converted from MNI space to the standard stereotaxic space of Talairach and Tournoux (1988) and the corresponding brain region was determined using the Talairach Daemon software (Lancaster et al., 2000).
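The MNI-to-Talairach conversion is commonly performed with Matthew Brett's piecewise-linear approximation; whether the authors used exactly this variant is my assumption, but a sketch of it looks like this.

```python
import numpy as np

def mni2tal(x, y, z):
    """Approximate MNI -> Talairach conversion (Brett's piecewise transform)."""
    above = np.array([[0.9900, 0.0,     0.0   ],
                      [0.0,    0.9688,  0.0460],
                      [0.0,   -0.0485,  0.9189]])
    below = np.array([[0.9900, 0.0,     0.0   ],
                      [0.0,    0.9688,  0.0420],
                      [0.0,   -0.0485,  0.8390]])
    m = above if z >= 0 else below          # different scaling above/below AC-PC
    return tuple(m @ np.array([x, y, z], dtype=float))

print(mni2tal(-54, 8, 26))   # e.g. the left IFG peak from Table 2
```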

3. Results

Surface renderings of brain activity and maximum activation peaks observed in the perception and production tasks, the conjunction analysis and the main effect of task are provided in Tables 2 and 3 and Figs. 3 and 4. Surface renderings of brain activity according to the main effects of the height and roundedness/backness dimensions in the perception and production tasks are provided in Table 4 and Figs. 5–7. Surface renderings of brain activity and maximum activation peaks observed for each vowel in the perception (A) and production (B) tasks are provided in Figure S1 and Table S1 in the supplementary materials.

3.1. Production

Vowel production induced bilateral activations of the primary sensorimotor and premotor cortices, the posterior inferior frontal gyrus (pIFG; pars opercularis), the rolandic operculum and the superior temporal gyrus/sulcus (STG/STS). Activations of the supplementary motor area (SMA, extending to the anterior cingulate cortex), the left anterior insular cortex and the superior and inferior parietal cortices (including the supramarginal gyrus (SMG), the precuneus and the parietal operculum) were also observed. Further activity was found in the basal ganglia (caudate nucleus and substantia nigra), the limbic system (anterior cingulate cortex and thalamus), the red nucleus, the cerebellum (culmen) and the visual cortex.


Table 2. Maximum activation peak summary for the perception and production tasks and the conjunction analysis. All contrasts are computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space). Each cell gives BA (where assigned), x, y, z and the T value.

| Region | H | Perception (BA; x, y, z; T) | Production (BA; x, y, z; T) | Conjunction (BA; x, y, z; T) |
|---|---|---|---|---|
| Frontal regions | | | | |
| Inferior frontal gyrus | L | 44; -54, 8, 26; 5.87 | 44; -54, 8, 26; 7.14 | 44; -54, 8, 26; 4.87 |
| Inferior frontal gyrus | L | 45; -52, 32, 16; 3.17 | | |
| Inferior frontal gyrus | R | 44; 60, 4, 16; 3.07 | 44; 54, 4, -2; 8.66 | |
| Premotor cortex | L | | 6; -60, 0, 22; 15.51 | |
| Premotor cortex | R | 6; 36, -16, 28; 3.59 | 6; 46, -6, 58; 8.76 | |
| Primary motor cortex | L | | 4; -46, -12, 34; 23.45 | |
| Primary motor cortex | R | | 4; 52, -6, 36; 27.48 | |
| Middle frontal gyrus | R | 10; 28, 48, 16; 3.45 | | |
| Dorsolateral prefrontal cortex | L | 9; -42, 8, 30; 5.32 | | |
| Dorsolateral prefrontal cortex | L | 46; -44, 26, 28; 4.08 | | |
| Superior frontal gyrus | L | 9; -28, 42, 34; 3.49 | | |
| Supplementary motor area | L | | 6; -2, -4, 62; 11.27 | |
| Temporal regions | | | | |
| Temporopolar area | L | 38; -56, 8, -8; 4.33 | 38; -56, 8, -6; 4.16 | 38; -56, 8, -6; 4.16 |
| Temporopolar area | R | 38; 38, 2, -18; 3.10 | | |
| Superior temporal gyrus | L | 22; -60, -44, 10; 7.22 | 22; -58, -44, 8; 5.92 | 22; -58, -44, 8; 5.92 |
| Superior temporal gyrus | R | 22; 56, -8, 4; 12.64 | 22; 50, -18, 6; 17.05 | 22; 56, -8, 4; 12.64 |
| Transverse temporal gyrus | L | 41; -52, -20, 6; 14.47 | 41; -36, -34, 16; 18.18 | 41; -52, -20, 6; 14.47 |
| Transverse temporal gyrus | R | 41; 42, -24, 12; 13.50 | 41; 56, -26, 10; 7.64 | |
| Middle temporal gyrus | L | 21; -46, -60, 10; 3.59 | | |
| Middle temporal gyrus | L | 37; -56, -58, 14; 3.33 | | |
| Middle temporal gyrus | R | 21; 40, -8, -10; 3.63 | | |
| Fusiform gyrus | R | 37; 42, -50, -14; 2.94 | | |
| Parietal regions | | | | |
| Somatosensory cortex | L | | 3; -18, -40, 50; 2.57 | |
| Somatosensory cortex | R | | 2; 26, -40, 50; 2.48 | |
| Supramarginal gyrus | L | 40; -48, -38, 18; 8.31 | 40; -52, -38, 16; 9.03 | 40; -48, -38, 18; 8.31 |
| Supramarginal gyrus | R | 40; 54, -30, 18; 7.68 | 40; 52, -30, 28; 10.17 | |
| Rolandic operculum | L | | 43; -62, -8, 8; 14.27 | |
| Rolandic operculum | R | 43; 62, -20, 12; 9.46 | 43; 40, -8, 18; 9.35 | 43; 62, -18, 14; 7.99 |
| Superior parietal lobule/Precuneus | L | 7; -8, -74, 36; 3.49 | 7; -22, -50, 42; 4.47 | |
| Superior parietal lobule/Precuneus | R | 7; 30, -60, 60; 4.11 | 7; 22, -54, 64; 3.04 | |
| Insular cortex | | | | |
| Insula/claustrum | L | 13; -46, -20, 26; 4.20 | 13; -38, 4, 4; 8.43 | |
| Insula/claustrum | L | -32, 2, 8; 3.48 | | |
| Limbic system | | | | |
| Hippocampus | L | -28, -32, -2; 3.89 | | |
| Parahippocampal gyrus | R | 19; 40, -48, -6; 3.25 | | |
| Mammillary body | L | -2, -14, -12; 3.68 | | |
| Anterior cingulate cortex | L | 24; 0, -8, 26; 4.78 | 32; -4, 12, 40; 8.31 | |
| Anterior cingulate cortex | R | 24; 14, -6, 32; 3.96 | 32; 8, 14, 34; 4.22 | |
| Posterior cingulate cortex | L | 23; -6, -32, 24; 5.84 | | |
| Thalamus | L | -14, -32, -2; 3.40 | -12, -18, 4; 8.76 | -14, -32, -2; 3.40 |
| Thalamus | R | 14, -14, -2; 3.62 | 14, -14, -2; 4.27 | 14, -14, -2; 3.62 |
| Basal ganglia | | | | |
| Striatum | L | -8, 0, 26; 3.72 | -28, -14, -2; 9.89 | |
| Striatum | R | 8, 16, 16; 4.76 | | |
| Substantia nigra | L | -12, -22, -12; 3.48 | -12, -22, -12; 3.29 | -12, -22, -12; 3.29 |
| Substantia nigra | R | 12, -26, -8; 5.56 | 10, -24, -8; 4.64 | 10, -24, -8; 4.57 |
| Midbrain | | | | |
| Red nucleus | L | -8, -26, -6; 4.23 | -8, -26, -6; 5.12 | -8, -26, -6; 4.23 |
| Cerebellum | | | | |
| Culmen | L | | -14, -64, -12; 8.09 | |
| Occipital regions | | | | |
| Striate cortex | L | 17; -2, -70, 12; 3.62 | 17; -6, -74, 12; 10.54 | 17; -2, -70, 12; 3.62 |
| Striate cortex | R | 17; 6, -84, 10; 4.59 | 17; 20, -62, 10; 9.45 | |
| Extrastriate cortex | L | 19; -44, -64, -2; 3.18 | 18; -2, -68, 20; 3.91 | |
| Extrastriate cortex | R | | 18; 10, -66, 14; 8.09 | |



3.2. Perception

As in the production task, vowel perception induced large bilateral activation of the STG/STS, from the temporopolar area anteriorly to the temporo-parietal junction posteriorly. STG/STS activity included the primary, secondary and associative auditory cortices and extended medially to the posterior insular cortex and inferiorly to the middle temporal gyrus (MTG). Bilateral frontal activations were observed in the posterior IFG around the pars opercularis, extending to the adjacent ventral premotor cortex, with additional activation of the pars triangularis in the left hemisphere. Additional frontal activations were also displayed in the left dorsolateral prefrontal cortex, the left superior frontal gyrus and the right middle frontal gyrus. Superior and inferior parietal activations were observed in the SMG, the precuneus and the right rolandic operculum. Further activity was observed in the basal ganglia (including the striatum and the substantia nigra), in limbic structures (including the thalamus, the cingulate cortex and the hippocampus) and in the red nucleus. Finally, small clusters of striate and extrastriate visual activations were also observed.

3.3. Conjunction

The conjunction analysis revealed common bilateral activation of the STG/STS (including the primary, secondary and associative auditory cortices), extending rostrally to the left temporopolar area and dorsally to the left SMG and the right parietal operculum. Activation of the opercular part of the inferior frontal gyrus, extending to the adjacent ventral premotor cortex, was observed in the left hemisphere. Additional activity was observed in the thalamus, the substantia nigra, the red nucleus and the visual cortex.

3.4. Main effect of task

The main effect of task revealed significant activity differences between the perception and production tasks. Compared to the baseline, enhanced activity in the production task was observed in several brain areas showing almost no activity in the perception task (see Fig. 4, top). These brain regions included parts of the primary motor and premotor cortices, the SMA, the right SMG, the insular cortex, the basal ganglia (caudate nucleus and globus pallidus), the limbic system (anterior cingulate cortex and thalamus) and the visual cortex. In addition, temporal and parietal regions in the left superior temporal and transverse temporal gyri, as well as in the right rolandic operculum, showed enhanced activity relative to baseline in both the production and perception tasks, but with stronger activity in the production task. Finally, 'deactivations' (i.e., lower activity compared to the baseline) were also observed in the production task (see Fig. 4, bottom) in part of the left IFG (pars triangularis) and the right prefrontal cortex, the left middle frontal gyrus, the dorsolateral prefrontal cortex and superior frontal gyrus, and the left medial frontal gyrus. Additional deactivations were found in the angular gyrus, extending into the posterior part of the superior temporal gyrus, the middle temporal gyrus and the SMG, in the superior parietal lobule and in the posterior cingulate cortex.


Table 3. Maximum activation peak summary for the main effect of task. All contrasts are computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space). Contrast estimates are given for the perception and production tasks.

| Region | H | BA | x, y, z | F | Perception | Production |
|---|---|---|---|---|---|---|
| Production > perception | | | | | | |
| Frontal regions | | | | | | |
| Premotor cortex | L | 6 | -46, -12, 34 | 171 | -0.06 | 0.76 |
| Premotor cortex | R | 6 | 52, -6, 36 | 216 | -0.03 | 0.71 |
| Primary motor cortex | L | 4 | -40, -16, 38 | 143 | -0.05 | 0.73 |
| Primary motor cortex | R | 4 | 20, -30, 60 | 36 | -0.03 | 0.20 |
| Supplementary motor area | L | 6 | -2, -4, 62 | 39 | -0.04 | 0.48 |
| Temporal regions | | | | | | |
| Superior temporal gyrus | L | 22 | -60, 4, 4 | 15 | 0.09 | 0.45 |
| Transverse temporal gyrus | L | 41 | -42, -28, 14 | 20 | 0.29 | 0.59 |
| Parietal regions | | | | | | |
| Supramarginal gyrus | R | 40 | 46, -28, 26 | 19 | 0.01 | 0.21 |
| Rolandic operculum | R | 43 | 56, -10, 14 | 31 | 0.13 | 0.47 |
| Insular cortex | | | | | | |
| Insula | L | 13 | -36, -10, 16 | 16 | -0.03 | 0.13 |
| Insula | R | 13 | 38, -8, 18 | 27 | -0.03 | 0.19 |
| Limbic system | | | | | | |
| Anterior cingulate cortex | L | 32 | -4, 10, 40 | 19 | -0.05 | 0.08 |
| Anterior cingulate cortex | R | 32 | 20, 32, 16 | 16 | -0.05 | 0.08 |
| Thalamus | L | | -12, -16, 2 | 24 | -0.02 | 0.15 |
| Thalamus | R | | 28, -2, -8 | 19 | -0.04 | 0.16 |
| Basal ganglia | | | | | | |
| Caudate nucleus | R | | 36, -40, 6 | 10 | -0.04 | 0.07 |
| Globus pallidus | L | | -18, 0, -2 | 13 | -0.05 | 0.06 |
| Occipital regions | | | | | | |
| Striate cortex | L | 17 | -14, -66, 6 | 22 | -0.01 | 0.30 |
| Striate cortex | R | 17 | 22, -62, 8 | 27 | -0.03 | 0.26 |
| Extrastriate cortex | L | 18 | -12, -52, -4 | 16 | -0.03 | 0.27 |
| Extrastriate cortex | L | 19 | -4, -82, 30 | 12 | 0.02 | 0.29 |
| Extrastriate cortex | R | 18 | 10, -62, 0 | 16 | -0.04 | 0.20 |
| Production < perception | | | | | | |
| Frontal regions | | | | | | |
| Inferior frontal gyrus/Prefrontal cortex | L | 45 | -52, 30, 10 | 13 | 0.07 | -0.14 |
| Inferior frontal gyrus/Prefrontal cortex | R | 9 | 34, 10, 36 | 18 | 0.03 | -0.15 |
| Middle frontal gyrus | L | 10 | -38, 46, 18 | 23 | 0.84 | -0.19 |
| Middle frontal gyrus | R | 6 | 42, 24, 46 | 20 | 0.00 | -0.22 |
| Dorsolateral prefrontal cortex/Superior frontal gyrus | L | 46 | -40, 18, 28 | 10 | 0.06 | -0.08 |
| Dorsolateral prefrontal cortex/Superior frontal gyrus | L | 8 | -16, 28, 52 | 11 | 0.08 | -0.09 |
| Dorsolateral prefrontal cortex/Superior frontal gyrus | R | 6 | 20, 24, 58 | 28 | -0.02 | -0.26 |
| Dorsolateral prefrontal cortex/Superior frontal gyrus | R | 8 | 38, 18, 54 | 23 | 0.01 | -0.23 |
| Dorsolateral prefrontal cortex/Superior frontal gyrus | R | 9 | 32, 46, 32 | 13 | 0.02 | -0.15 |
| Dorsolateral prefrontal cortex/Superior frontal gyrus | R | 10 | 22, 50, 26 | 13 | 0.05 | -0.09 |
| Medial frontal gyrus | L | 6 | -20, 6, 56 | 16 | 0.08 | -0.12 |
| Temporal regions | | | | | | |
| Superior temporal gyrus | R | 22 | 54, -54, 16 | 20 | 0.01 | -0.20 |
| Middle temporal gyrus | L | 37 | -38, -64, 8 | 17 | 0.07 | -0.12 |
| Middle temporal gyrus | R | 22 | 52, -48, -4 | 28 | 0.00 | -0.19 |
| Middle temporal gyrus | R | 37 | 32, -38, -14 | 15 | 0.00 | -0.14 |
| Parietal regions | | | | | | |
| Angular gyrus | L | 39 | -34, -74, 28 | 13 | 0.03 | -0.16 |
| Supramarginal gyrus | L | 40 | -54, -52, 22 | 21 | -0.01 | -0.22 |
| Supramarginal gyrus | R | 40 | 60, -52, 22 | 24 | 0.09 | -0.17 |
| Superior parietal lobule/Precuneus | L | 7 | -6, -58, 50 | 10 | 0.01 | -0.16 |
| Superior parietal lobule/Precuneus | R | 7 | 32, -66, 54 | 21 | 0.02 | -0.26 |
| Limbic system | | | | | | |
| Posterior cingulate cortex | L | 31 | -12, -42, 32 | 12 | 0.03 | -0.12 |
| Posterior cingulate cortex | R | 31 | 4, -42, 38 | 36 | 0.06 | -0.28 |


Fig. 3. Surface rendering of brain regions activated in the vowel perception and production tasks (perception, production) and showing overlapping activity between the two tasks (conjunction). All contrasts are computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space). See Table 2 for details.



3.5. Main effects of height and backness/roundedness phonetic dimensions

The main effect of height in the production task showed enhanced activity in several brain areas for mid-open vowels, as compared to close and mid-close vowels (see Fig. 6 and Table 4). Activity differences were observed in the posterior IFG (pars opercularis), the left anterior insular cortex, the premotor cortex bilaterally, the SMA, the SMG and the somatosensory cortex. No brain region showed a significant change in activity for the roundedness/backness dimension in the production task, or for the height and roundedness/backness dimensions in the perception task.

4. Discussion

Recent neurobiological and psycholinguistic models of speech perception and production argue for a functional connection between sensory and motor systems (Callan et al., 2004; Guenther, 2006; Guenther & Vladusich, 2012; Hickok & Poeppel, 2000, 2004, 2007; Hickok et al., 2011; Perkell, 2012; Rauschecker, 2011; Rauschecker & Scott, 2009; Scott & Johnsrude, 2003; Schwartz et al., 2012; Skipper et al., 2007; Wilson & Iacoboni, 2006). In speech perception, the phonetic interpretation of the acoustic speech signal is thought to be constrained by internal motor simulation based on articulatory procedural knowledge. Motor inferential processes are primarily assumed to tackle the problem of the variability of the acoustic speech signal in relation to coarticulation phenomena by use of articulatory motor procedural knowledge, and to be particularly necessary under adverse listening conditions. In speech production, it is hypothesized that sensorimotor corrective loops help register possible discrepancies between the intended and the actual sensory states in order to further control production. Corrective control mechanisms would be strongly recruited in the case of altered auditory feedback and/or complex articulatory speech sequences but, conversely, would not play a key role during the production of overlearned and simple speech units in normal circumstances (Guenther & Vladusich, 2012). The present study was designed to further refine these hypotheses by determining possible sensorimotor interactions during steady-state vowel perception and production, independently of syllable sequencing and coarticulation mechanisms, and using a sparse acquisition technique in order to limit the influence of scanner background noise on auditory processing as well as movement-induced artefacts. Our results demonstrate that, in addition to the auditory cortex, a left postero-dorsal stream, including the opercular part of Broca's area, the adjacent ventral premotor cortex and the temporo-parietal junction, is an influential part of vowel processing and motor control. These brain areas were indeed found by a conjunction analysis to be activated during both vowel perception and production. However, while specific analyses on phonetic features further confirmed the involvement of a left postero-dorsal stream, no topographic segregation between vowels was observed. Altogether, these results strongly suggest that vowel representations are largely distributed over sensorimotor brain areas, and provide further support for a close relationship between speech perception and production.


Fig. 4. Surface rendering of brain regions showing significant changes in activity between the two tasks (main effect; top: Production > Perception; bottom: Perception > Production). All contrasts are computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space). See Table 3 for details.



Table 4. Maximum activation peak summary for the main effect of the height phonetic dimension in the production task (no brain region showed a significant change in activity for the roundedness/backness dimension in the production task, or for the height and roundedness/backness dimensions in the perception task). The contrast is computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space). Contrast estimates are given for close, mid-close and mid-open vowels.

| Region | H | BA | x, y, z | F | Close | Mid-close | Mid-open |
|---|---|---|---|---|---|---|---|
| Frontal regions | | | | | | | |
| Inferior frontal gyrus | L | 44 | -52, 8, 6 | 11.08 | 0.51 | 0.55 | 1.02 |
| Premotor cortex | L | 6 | -44, 0, 42 | 13.91 | -0.03 | 0.09 | 0.38 |
| Premotor cortex | R | 6 | 28, -8, 54 | 11.83 | -0.10 | 0.05 | 0.24 |
| Supplementary motor area | L | 6 | -2, 12, 52 | 22.57 | -0.17 | 0.18 | 0.70 |
| Parietal regions | | | | | | | |
| Inferior parietal lobule/Supramarginal gyrus | L | 40 | -32, -56, 40 | 10.54 | -0.17 | -0.05 | 0.20 |
| Somatosensory cortex | R | 2 | -44, -40, 46 | 9.53 | -0.09 | -0.10 | 0.25 |
| Insular cortex | | | | | | | |
| Insula | L | 13 | -30, 20, 10 | 12.50 | 0.04 | 0.22 | 0.44 |


Fig. 5. Height phonetic dimension: surface rendering of brain regions activated for close, mid-close and mid-open vowels in the perception and production tasks. All contrasts are computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space).



Fig. 6. Height phonetic dimension: surface rendering of brain regions showing a significant change in activity for the height dimension in the production task. No brain regions showed a significant change for the height dimension in the perception task. The contrast is computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space). See Table 4 for details.


Fig. 7. Roundedness/backness phonetic dimension: surface rendering of brain regions activated for unrounded/front, rounded/front and rounded/back vowels in the perception and production tasks. No brain regions showed a significant change for the roundedness/backness dimension in either the perception or the production task. All contrasts are computed from the random-effect group analysis (p < .05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space).


4.1. Vowel production

Although the neural correlates of speech motor control have been extensively examined (e.g., Bohland & Guenther, 2006; Brown, Ingham, Ingham, Laird, & Fox, 2005; Brown, Ngan, & Liotti, 2008; Chang, Kenney, Loucks, Poletto, & Ludlow, 2009; Guenther, Ghosh, & Tourville, 2006; Murphy et al., 1997; Özdemir et al., 2006; Riecker, Ackermann, Wildgruber, Dogil, & Grodd, 2000; Riecker, Ackermann, Wildgruber, Meyer, et al., 2000; Riecker et al., 2005; Riecker, Brendel, Ziegler, Erb, & Ackermann, 2008; Sörös et al., 2006; Terumitsu, Fujii, Suzuki, Kwee, & Nakada, 2006; Wise, Greene, Büchel, & Scott, 1999), previous studies have mostly focused on syllable production tasks that involve complex phoneme and/or syllable sequencing as well as coarticulation mechanisms. Several issues also limit the information gained from the few studies investigating the neural structures governing the production of single phonemes, i.e. steady-state vowels (Brown, Ngan, & Liotti, 2008; Ghosh, Tourville, & Guenther, 2008; Grabski et al., 2012; Özdemir et al., 2006; Sörös et al., 2006; Terumitsu et al., 2006). Indeed, these studies only involved the production of a single vowel (Sörös et al., 2006: /a/; Terumitsu et al., 2006: /e/; Brown, Ngan, & Liotti, 2008: /ə/; Grabski et al., 2012: /i/), focused on the primary motor system (Terumitsu et al., 2006), reported results by collapsing vowels and monosyllables (Ghosh et al., 2008) and/or, more importantly, used vowel production as a control task to specifically determine neural activity related to the production of syllables or words/sentences (Ghosh et al., 2008; Özdemir et al., 2006; Sörös et al., 2006). By using nine distinct steady-state vowels, our results further confirm and refine the brain processes underlying vowel production. A large-scale neural network of cortical and subcortical motor regions was observed (see Fig. 3 and Table 2; see also Figure S1 and Table S1 in the supplementary materials), including activation in the primary sensorimotor and premotor cortices – extending ventrally to the pIFG, the rolandic operculum and the left insular cortex and dorsally to the SMA – in the basal ganglia, the red nucleus and the cerebellum. Bilateral activation of the STG/STS, including the primary, secondary and associative auditory cortices and extending from the temporopolar area anteriorly to the temporo-parietal junction posteriorly, was also observed. Further activity was found in the superior and inferior parietal cortices, the limbic system (anterior cingulate cortex and thalamus) and the visual cortex.

The vowel production network observed is in general agreement with previous studies of speech motor control and with the recent suggestion of a "minimal network for overt speech production" (Bohland & Guenther, 2006). This neural network comprises distinct cortical and subcortical brain regions involved in motor preparation, execution, auditory/phonological processing and sensorimotor regulation loops (for reviews, see Bohland & Guenther, 2006; Guenther, 2006; Guenther & Vladusich, 2012; Jürgens, 2002; Riecker et al., 2008; Sörös et al., 2006). These brain areas are traditionally assigned to the initiation and suppression of verbal utterances (SMA and anterior portions of the cingulate cortex), planning of movement sequences (premotor ventrolateral-frontal cortex and left anterior insula), auditory feedback and phonological processing (STG/STS) and, finally, motor execution involving fine muscular coordination of the intended movement and innervation of vocal tract muscles (corticobulbar system, basal ganglia, cerebellum, red nucleus). In addition, online regulation of motor commands during sustained vowel production may also involve adaptive timing mechanisms (superior cerebellum as well as basal ganglia via thalamo-motor projections) and feedback corrective mechanisms, in case of discrepancy between the intended and the actual auditory and proprioceptive inputs (STG/STS, primary and associative somatosensory areas and inferior parietal cortex; see below).

Specific analyses on the height (close, mid-close and mid-open vowels) and roundedness/backness (unrounded/front, rounded/front and rounded/back vowels) phonetic features further confirmed the involvement of these brain areas in vowel production, with similar, almost entirely overlapping brain areas and no topographic segregation observed between features (see Figs. 4 and 6). However, while no brain regions showed a significant change in activity for the roundedness/backness dimension, enhanced activity was observed in the posterior IFG, the left anterior insular cortex, the premotor cortex bilaterally, the SMA, the SMG and the somatosensory cortex for mid-open vowels, as compared to close and mid-close vowels (see Fig. 6 and Table 4). Regarding this latter result, it is worth noting that, compared to other vowels, French mid-open vowels (especially /œ/ and /ɔ/) are rarely produced in isolation, but rather appear in closed syllabic context (Durand, Laks, & Lyche, 2002; Straka, 1981). This lower "naturalness" in isolated context might have influenced motor accuracy, a proposal indirectly supported by a larger variability/distribution of F1 values for mid-open compared to close and mid-close prerecorded vowels (see Fig. 1 and Table 1). Interestingly, increased activity was observed not only in brain areas classically devoted to motor initiation and planning (SMA, pIFG, premotor cortex and anterior insular cortex) but also in the secondary somatosensory cortex and the inferior parietal lobule. In the DIVA model (Guenther, 2006; Guenther & Vladusich, 2012), somatosensory state and error maps, located in these regions, are hypothesized to compare the somatosensory target, i.e. the expected tactile and proprioceptive sensations of the produced speech sound, with the actual somatosensory feedback from the vocal tract. The output of the somatosensory error map then propagates to the motor cortex in order to provide corrective motor commands. Consistent with this hypothesis, Golfinopoulos et al. (2011) observed bilaterally increased activity in the anterior supramarginal gyrus when somatosensory feedback was unexpectedly modified during speech production by means of a jaw perturbation. In the present study, the observed increase of activity in the somatosensory cortex and the inferior parietal lobule for mid-open vowels might therefore reflect stronger somatosensory monitoring demands during speech production in order to further control production. Interestingly, no such activity modulation was observed between phonetic features in the auditory cortex, suggesting distinct corrective feedback loops for somatosensory and auditory monitoring. It has to be further noted that lower activity was observed in these regions for close and mid-close vowels not only compared to mid-open vowels but also compared to the resting condition (see Table 4). This suggests that activity within these parietal regions is suppressed during the self-generated production of overlearned vowels. Taken together, these results provide further support for the existence of an efference copy of motor commands that reduces somatosensory activity in the inferior parietal lobule and the somatosensory cortex in case of accurate vowel production but, in contrast, entails increased activity under less natural circumstances and higher monitoring demands (Guenther, 2006; Guenther & Vladusich, 2012; Golfinopoulos et al., 2011; see also Christoffels et al., 2011).
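To make this corrective-loop interpretation concrete, the toy sketch below caricatures the comparator logic attributed to the DIVA somatosensory error map: feedback that matches the target yields near-zero error (suppression for overlearned vowels), whereas a mismatch drives a corrective command. All numerical values and the simple proportional gain are our own illustrative assumptions, not the model's published equations.

    import numpy as np

    def somatosensory_error(target, feedback):
        """Error-map activity: discrepancy between the expected
        (target) and actual somatosensory state of the vocal tract."""
        return target - feedback

    def corrective_command(error, gain=0.5):
        """Residual error fed back as a corrective motor command (a
        plain proportional controller stands in for DIVA's learned
        error-to-motor mapping)."""
        return gain * error

    # Overlearned close vowel: feedback matches the target, error ~ 0,
    # so parietal/somatosensory "error" activity stays suppressed.
    target = np.array([0.8, 0.2, 0.5])   # hypothetical tactile/proprioceptive state
    accurate_feedback = target + 0.01
    print(corrective_command(somatosensory_error(target, accurate_feedback)))

    # Less natural mid-open vowel: larger mismatch, larger corrective
    # drive, mirroring the enhanced SMG/somatosensory activity above.
    perturbed_feedback = target + np.array([0.3, -0.2, 0.1])
    print(corrective_command(somatosensory_error(target, perturbed_feedback)))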

Finally, the comparison between vowel production and perception showed three different hemodynamic patterns (see Table 3 and Fig. 4). As expected (see Jardri et al., 2007), a number of brain regions were only activated in the speech production condition (primary motor and premotor cortices, SMA, and parts of the right SMG, insular cortex, basal ganglia, limbic system and visual cortex). Other areas showed no reactivity in the passive listening condition but were deactivated during vowel production with respect to the resting condition (dorsolateral prefrontal cortex and superior frontal gyrus, middle and medial frontal gyrus, pars triangularis of the inferior frontal gyrus, angular gyrus and adjacent supramarginal gyrus, posterior part of the superior temporal and middle temporal gyri, superior parietal lobule and posterior cingulate cortex). These brain areas showing higher levels of activity at "rest" are commonly referred to as the "default mode" network (Raichle et al., 2001). It is indeed acknowledged that a resting state entails organized, functional brain activity, including verbal and visual imagery and self-awareness (Binder et al., 1999; Gusnard, Akbudak, Shulman, & Raichle, 2001; McKiernan, Kaufman, Kucera-Thompson, & Binder, 2003; Mazoyer et al., 2001; Raichle et al., 2001). Albeit debated, "task-induced deactivations" (TID) have been proposed to result from the reallocation of attentional resources: deactivation occurs in brain regions not needed for the actual task because they "require attentional input to stay active; when those attentional resources are needed for processing other information and are reallocated, these brain regions become deactivated" (McKiernan et al., 2003, 2006). Finally, temporal and parietal regions in the left superior temporal and transverse gyri, as well as in the adjacent right rolandic operculum, showed enhanced activity compared to the baseline in both the production and perception tasks, but with stronger activity in the production task. This latter result appears at odds with previous fMRI studies showing a decrease of auditory cortex responses during self-produced overt speech, as compared to when the utterances are taped and replayed to the subjects (Christoffels et al., 2007, 2011). Although an intensity matching procedure was performed prior to the experiment in order to match as closely as possible the volume of the auditory stimuli in the perception task and that of the auditory feedback in the production task, one likely explanation is that this enhanced activity results from auditory perceptual differences due to skull bone conduction and middle ear contraction.

4.2. Vowel perception

Apart from the auditory system, vowel perception induced parietal activations in the SMG, the precuneus and the right rolandic operculum, and frontal activations in the posterior IFG around the pars opercularis, extending to the adjacent ventral premotor cortex, with additional activation of the pars triangularis in the left hemisphere. Additional frontal activations were also evident in the left dorsolateral prefrontal cortex, the left superior frontal gyrus and the right middle frontal gyrus. Further activity was observed in the basal ganglia (including the caudate nucleus and the substantia nigra), in limbic structures (including the thalamus, the cingulate cortex and the hippocampus) and in the red nucleus. Finally, small clusters of striate and extrastriate visual activations were observed (see Fig. 3 and Table 2; see also Figure S1 and Table S1 in the supplementary materials).

As in the production task, vowel perception induced large bilateral activation of the STG/STS, including Heschl's gyrus and extending anteriorly to the temporopolar area, posteriorly to the posterior planum temporale and the temporo-parietal junction, and ventrally to the MTG. These activations appear in line with previous fMRI studies of vowel perception and, more generally, with recent neurobiological models of speech perception (e.g., Callan et al., 2004; Guenther, Nieto-Castanon, Ghosh, & Tourville, 2004; Hickok & Poeppel, 2007; Hickok et al., 2011; Obleser et al., 2006; Rauschecker, 2011; Rauschecker & Scott, 2009; Scott & Johnsrude, 2003; Skipper et al., 2007; Uppenkamp et al., 2006; Wilson & Iacoboni, 2006). There is indeed broad agreement that, during auditory speech perception, the acoustic speech signal is processed via parallel streams specialized for analyzing different aspects of the signal, with initial, low-level acoustic processing of speech and non-speech sounds occurring bilaterally in the dorsal superior temporal gyri (including Heschl's gyrus and the planum temporale). The exact localization of brain areas specialized in phonetic processing and categorization remains, however, controversial (for a recent review, see Turkeltaub & Coslett, 2010). Some researchers propose that complex acoustic signals, not necessarily specific to speech, are first processed in the lateral middle superior temporal gyrus/sulcus, with the phonetic representations of speech sounds ultimately instantiated in the anterior part of the STG/STS (e.g., Obleser et al., 2006; Rauschecker, 2011; Rauschecker & Scott, 2009; Scott & Johnsrude, 2003). In contrast, other researchers argue that phonetic categories are instantiated posteriorly, in the posterior part of the middle temporal gyrus and STS (Hickok & Poeppel, 2007; see also Belin & Zatorre, 2000). Building on these models, recent studies also suggest different brain localizations for phonetic processing during vowel perception. For instance, Obleser et al. (2006) observed that listening to vowel sequences elicited more activation than non-speech noise in the anterior superior temporal cortex, as well as a topographic distinction between front and back phonetic features (albeit using fixed-effect analyses).


Guenther et al. (2004) rather observed increased activity in the anterior and posterior middle temporal gyri during the auditory perception of prototypical examples of a vowel category, compared to sounds that were not clear members of a vowel category. Finally, by contrasting vowel perception with non-speech auditory stimuli, Uppenkamp et al. (2006) observed a specific bilateral region of activation located in the STS, midway along the temporal lobe in the anteroposterior direction and inferior to Heschl's gyrus. While all the above-mentioned regions were found to be activated in the present study, our results do not allow us to argue in favor of a particular hypothesis, since non-speech control stimuli were not included. They nevertheless do not contradict the notion of an auditory anterior "what" (Scott & Johnsrude, 2003) or a posterior "ventral" (Hickok & Poeppel, 2007) stream involved in acoustic-phonetic decoding during vowel perception.

Further activations were observed in the posterior IFG around the pars opercularis, extending to the adjacent ventral premotor cortex, with additional activation of the pars triangularis in the left hemisphere, and in the temporo-parietal junction (including the posterior part of the STG/STS and the ventral part of the supramarginal gyrus). These activations appear in line with recent neurobiological models of speech perception (Callan et al., 2004; Hickok & Poeppel, 2000, 2004, 2007; Rauschecker, 2011; Rauschecker & Scott, 2009; Scott & Johnsrude, 2003; Skipper et al., 2007; Wilson & Iacoboni, 2006) and with the existence of a left posterior "how" or "dorsal" processing stream, linking auditory speech percepts with articulatory representations in the posterior part of the inferior frontal gyrus and the ventral premotor cortex, via the inferior parietal lobule (Rauschecker, 2011; Rauschecker & Scott, 2009) or via area "Spt" (a region at the parieto-temporal boundary; Hickok & Poeppel, 2007). These models postulate that, by means of successive sensory-to-motor and motor-to-sensory projections, the analysis of incoming speech inputs is mediated by a simulation signal from the speech motor system (i.e., an "efference copy") that predicts the sensory consequences of the activated articulatory motor commands, thus constraining the phonetic interpretation of the sensory inputs. For example, Skipper et al. (2007) propose that sensory inputs of speech (auditory and/or visual) induce early speech representations in the left posterior superior temporal gyrus (pSTG) that can be understood as mere hypotheses about the final percept. For the final interpretation, these representations would first be mapped onto the motor control commands of speech production, representing the goal of the articulatory movement and localized in the left pIFG. The activated motor control commands would be further mapped in the vPMC and primary motor cortices onto the motor commands that could be used to reach that goal. The activated motor commands are then assumed to generate, through an efference copy, predictions of both the acoustic and somatosensory consequences of the realization of those commands. In the model, these predictions would be sent back to the left pSTG, where they would be compared with the initial perceptual hypotheses, the strongest hypothesis ultimately forming the final interpretation. Similarly, Rauschecker and Scott (2009; see also Rauschecker, 2011) propose that, starting out from Heschl's gyrus, speech perception involves acoustic-phonetic decoding in an anterior ventral stream all the way to the anterior temporal cortex and to category-invariant inferior frontal regions, with further motor mapping in the vPMC onto articulatory representations, and finally to the inferior parietal lobule and pSTG/STS by means of efference copy. In the reverse direction, a dorsal stream pivots around the inferior parietal lobule, as a sensorimotor integration region, where comparisons between the predictive efference copy from the vPMC and sensory information take place and help disambiguate phonological information.
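The predict-and-compare account summarized above can be condensed into a small schematic. The sketch below is our hedged, highly simplified reading of such efference-copy models: candidate phonetic hypotheses generate forward predictions of the sensory input, and the best-matching prediction wins. The vowel set, formant values and distance metric are all hypothetical, and no claim is made that any of the cited models computes exactly this.

    import numpy as np

    # Hypothetical auditory input: noisy F1/F2 formants (Hz) of a vowel.
    auditory_input = np.array([310.0, 2250.0])

    # Candidate phonetic hypotheses and the sensory consequences an
    # internal forward model would predict for their motor commands.
    forward_predictions = {
        "/i/": np.array([300.0, 2300.0]),
        "/e/": np.array([450.0, 2100.0]),
        "/u/": np.array([320.0, 800.0]),
    }

    def mismatch(prediction, observation):
        """Efference-copy comparison: distance between the predicted
        and the observed auditory representation."""
        return np.linalg.norm(prediction - observation)

    # The hypothesis whose prediction best matches the input forms
    # the final interpretation.
    best = min(forward_predictions,
               key=lambda v: mismatch(forward_predictions[v], auditory_input))
    print(f"Final interpretation: {best}")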

Finally, specific analyses on phonetic features failed to reveal any topographic segregation or activity differences between features (see Figs. 4 and 6). This null finding appears consistent with a recent fMRI study (Obleser et al., 2010) in which univariate subtraction-based analyses did not provide evidence for activation differences between phonetic categories (but see Obleser et al., 2006, using a fixed-effect analysis). In contrast, using multivariate statistical pattern recognition analyses in the same study, Obleser et al. (2010) demonstrated above-chance classification for vowel versus consonant categories in the entire superior temporal cortex, with regional differences in classification accuracy for the two categories. This latter result appears in line with a seminal study by Formisano et al. (2008), also using multivariate analyses, which demonstrated discriminative maps for a set of isolated vowel stimuli in temporal areas extending anteriorly and posteriorly from the lateral Heschl's gyri down into the superior temporal sulcus. Capitalizing on these findings, our results further underline the current limits of localization methods based on univariate subtraction-based analyses and indirectly argue for largely distributed vowel representations over a wide array of sensorimotor brain areas (Obleser et al., 2010).
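To illustrate why multivariate analyses can detect vowel information that univariate subtraction misses, the sketch below cross-validates a linear classifier on simulated voxel patterns whose mean activation is identical across two vowel categories. It follows the spirit, not the specifics, of Formisano et al. (2008) and Obleser et al. (2010); the data, signal strength and classifier settings are illustrative assumptions only.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(2)
    n_trials, n_voxels = 120, 500

    # Two vowel categories with identical mean amplitude but a subtle,
    # spatially distributed pattern difference across voxels.
    pattern = rng.normal(0.0, 0.3, n_voxels)
    pattern -= pattern.mean()            # zero mean: no net univariate effect
    labels = np.repeat([0, 1], n_trials // 2)
    X = rng.normal(0.0, 1.0, (n_trials, n_voxels))
    X[labels == 1] += pattern

    # Univariate "subtraction": the mean signal difference is ~0.
    print("mean signal difference:",
          X[labels == 1].mean() - X[labels == 0].mean())

    # Multivariate decoding: cross-validated classification of the
    # distributed pattern recovers vowel identity well above chance.
    accuracy = cross_val_score(LinearSVC(), X, labels, cv=5).mean()
    print("cross-validated decoding accuracy:", accuracy)

Because the classifier weights all voxels jointly, it can exploit fine-grained spatial structure that averages out in a voxel-by-voxel subtraction, which is the core argument for distributed vowel representations.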

4.3. Concluding remarks on sensorimotor interactions

As confirmed by the conjunction analysis, common sensorimotor activations during vowel perception and production were observed in the STG/STS, extending dorsally to the left temporo-parietal junction, and in the opercular part of the left inferior frontal gyrus and the adjacent ventral premotor cortex. Altogether, these results argue for a functional connection between sensory and motor systems during both vowel perception and production. It is not without interest to note that this activity was observed in conditions that could be conceived as extremely favorable for the perceptual system: that is, with "simple" units not much contaminated by coarticulation mechanisms; with quasi-stationary stimuli, produced by the speakers themselves and hence not imposing the need for normalization procedures for their categorization; and perceived in silence thanks to the sparse-sampling paradigm selected in this study. Therefore, it can be suggested that the involvement of the motor system in speech perception observed in this work can be regarded as evidence that such involvement is quasi-systematic in speech perception.

However, while our results clearly demonstrate the recruitment of the motor system during passive vowel perception, they are intrinsically correlational and cannot be used to address causality. Indeed, electrocortical stimulation studies during neurosurgical operations, TMS studies and clinical data from frontal aphasic patients are largely inconclusive regarding a possible functional role of the motor system in speech processing under normal listening conditions (for a review, see Sato et al., 2009). Although recent TMS studies showed that temporarily disrupting the activity of the motor system disrupts subjects' ability to perform syllable categorization in case of acoustically ambiguous or noisy syllables (d'Ausilio et al., 2009; Meister et al., 2007; Möttönen & Watkins, 2009), no interference effects were observed in syllable discrimination tasks with syllables presented without noise (d'Ausilio, Bufalari, Salmas, & Fadiga, 2011; Sato et al., 2009; but see Sato et al., 2011, for postperceptual bias effects). Furthermore, it is also worth noting that, despite common activation in the left pIFG and the adjacent ventral premotor cortex, stronger activation for mid-open vowels was observed in these regions only during vowel production, not during vowel perception. In a recent fMRI study, Tremblay and Small (2011) also observed activation in the left ventral premotor cortex during both speech perception and production, but activity modulation due to articulatory complexity only during the production task, a result that does not confirm a completely "specified efferent motor signal during speech perception" (Tremblay & Small, 2011).

From these results, and given that motor activity was observed during vowel perception independently of any syllable sequencing and coarticulation mechanisms and with limited influence of scanner background noise, alternative functions of the motor system, apart from phonetic disambiguation, have to be discussed. As previously noted, even for critics of a causal role of the motor system in phonetic processing, it is not a matter of debate that speech perception and production systems interact in some ways (Diehl et al., 2004; Hickok et al., 2011; Lotto et al., 2009). Notably, correlated neural activity in auditory and motor systems can be seen as unsurprising, since coactivation of auditory and motor regions during self-produced speech sounds might lead to the formation of specific articulatory–auditory links by means of associative learning and Hebbian principles (Lotto et al., 2009; Pulvermüller et al., 2006), notably during speech acquisition in infancy. On this view, the observed sensorimotor dorsal stream in speech perception would not be strictly involved in phonetic processing but would rather be the consequence of previous associative learning processes. Another possibility is that the motor activity observed during vowel perception might represent a specific case of automatic sensorimotor convergence (Sato et al., 2009; Scott et al., 2009). In adults, convergence effects have indeed been shown to be systematic during conversational interactions and to manifest in many different forms, including posture (Shockley, Santana, & Fowler, 2003), head movements and facial expressions (Estow, Jamieson, & Yates, 2007), respiratory movements (McFarland, 2001), vocal intensity (Natale, 1975) and rate of speech (Giles, Coupland, & Coupland, 1991). At the phonetic level, previous studies have also highlighted a strong tendency for a speaker to "imitate" a number of phonetic characteristics of another speaker's speech, the so-called phonetic convergence effect. While some previous studies on phonetic convergence involved natural settings and conversational interactions (e.g., Aubanel & Nguyen, 2010; Pardo, 2006; Sancier & Fowler, 1997), this effect also appears in laboratory tasks during exposure to auditory and/or visual speech stimuli (e.g., Delvaux & Soquet, 2007; Gentilucci & Bernardis, 2007; Gentilucci & Cattaneo, 2005; Kappes, Baumgaertner, Peschke, & Ziegler, 2009; Kerzel & Bekkering, 2000). In that respect, although highly speculative, the motor activity observed during vowel perception might represent automatic sensorimotor adaptive processing of the perceived stimuli and might therefore indirectly argue for a central role "for motor representations and processes in conversation, an essential aspect of human language in its most basic use" (Scott et al., 2009).

Acknowledgments

This study was supported by research grants from the CNRS (Centre National de la Recherche Scientifique) and the ANR (Agence Nationale de la Recherche, ANR SPIM "Imitation in Speech") to M.S., and from Grenoble-INP (BQR "Modyc: Modélisation dynamique de l'activité cérébrale") to M.S. and K.G. We would like to thank all participants of the study. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jneuroling.2012.11.003.

References

Aboitiz, F., & Garcia, V. (1997). The evolutionary origin of the language areas in the human brain. A neuroanatomical perspective. Brain Research Reviews, 25, 381–396.
Arbib, M. A. (2005). From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28(2), 105–124.
Aubanel, V., & Nguyen, N. (2010). Automatic recognition of regional phonological variation in conversational interaction. Speech Communication, 52, 577–586.
d'Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2011). The role of the motor system in discriminating degraded speech sounds. Cortex, 48(7), 882–887.
d'Ausilio, A., Craighero, L., & Fadiga, L. (2012). The contribution of the frontal lobe to the perception of speech and language. Journal of Neurolinguistics, 25(5), 328–335.
d'Ausilio, A., Pulvermüller, F., Salmas, P., Bufalari, I., Begliomini, C., & Fadiga, L. (2009). The motor somatotopy of speech perception. Current Biology, 19(5), 381–385.
Belin, P., & Zatorre, R. J. (2000). 'What', 'where' and 'how' in auditory cortex. Nature Neuroscience, 3, 965–966.
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Rao, S. M., & Cox, R. W. (1999). Conceptual processing during the conscious resting state: a functional MRI study. Journal of Cognitive Neuroscience, 11, 80–93.
Binder, J. R., Liebenthal, E., Possing, E. T., Medler, D. A., & Douglas Ward, B. (2004). Neural correlates of sensory and decision processes in auditory object identification. Nature Neuroscience, 7, 295–301.
Birn, R., Bandettini, P., Cox, R., & Shaker, R. (1999). Event-related fMRI of tasks involving brief motion. Human Brain Mapping, 7(2), 106–114.
Bohland, J. W., & Guenther, F. H. (2006). An fMRI investigation of syllable sequence production. NeuroImage, 32(2), 821–841.
Brown, S., Ingham, R. J., Ingham, J. C., Laird, A. R., & Fox, P. T. (2005). Stuttered and fluent speech production: an ALE meta-analysis of functional neuroimaging studies. Human Brain Mapping, 25(1), 105–117.
Brown, S., Ngan, E., & Liotti, M. (2008). A larynx area in the human motor cortex. Cerebral Cortex, 18, 837–845.
Callan, D., Callan, A., Gamez, M., Sato, M. A., & Kawato, M. (2010). Premotor cortex mediates perceptual performance. NeuroImage, 51(2), 844–858.
Callan, D. E., Jones, J. A., Callan, A. M., & Akahane-Yamada, R. (2004). Phonetic perceptual identification by native- and second-language speakers differentially activates brain regions involved with acoustic phonetic processing and those involved with articulatory-auditory/orosensory internal models. NeuroImage, 22, 1182–1194.
Callan, D. E., Jones, J. A., Munhall, K. G., Callan, A. M., Kroos, C., & Vatikiotis-Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport, 14, 2213–2217.
Calvert, G. A., & Campbell, R. (2003). Reading speech from still and moving faces: the neural substrates of visible speech. Journal of Cognitive Neuroscience, 15(1), 57–70.
Chang, S. E., Kenney, M. K., Loucks, T. M., Poletto, C. J., & Ludlow, C. L. (2009). Common neural substrates support speech and nonspeech vocal tract gestures. NeuroImage, 47, 314–325.
Christoffels, I. K., Formisano, E., & Schiller, N. O. (2007). Neural correlates of verbal feedback processing: an fMRI study employing overt speech. Human Brain Mapping, 28(9), 868–879.
Christoffels, I. K., van de Ven, V., Waldorp, L. J., Formisano, E., & Schiller, N. O. (2011). The sensory consequences of speaking: parametric neural cancellation during speech in auditory cortex. PLoS ONE, 6(5), e18307.
Curio, G., Neuloh, G., Numminen, J., Jousmaki, V., & Hari, R. (2000). Speaking modifies voice-evoked activity in the human auditory cortex. Human Brain Mapping, 9, 183–191.
Delvaux, V., & Soquet, A. (2007). The influence of ambient speech on adult speech productions through unintentional imitation. Phonetica, 64, 145–173.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91, 176–180.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179.
Durand, J., Laks, B., & Lyche, C. (2002). La phonologie du français contemporain: usages, variétés et structure. In C. Pusch, & W. Raible (Eds.), Romanistische Korpuslinguistik – Korpora und gesprochene Sprache/Romance Corpus Linguistics – Corpora and Spoken Language (pp. 93–106). Tübingen: Gunter Narr Verlag.
Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25, 1325–1335.
Estow, S., Jamieson, J. P., & Yates, J. R. (2007). Self-monitoring and mimicry of positive and negative social behaviors. Journal of Research in Personality, 41, 425–433.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: a TMS study. European Journal of Neuroscience, 15, 399–402.
Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 17, 1703–1714.
Fogassi, L., Ferrari, P. F., Gesierich, B., Rozzi, S., Chersi, F., & Rizzolatti, G. (2005). Parietal lobe: from action organization to intention understanding. Science, 308, 662–667.
Formisano, E., De Martino, F., Bonte, M., & Goebel, R. (2008). "Who" is saying "what"? Brain-based decoding of human voice and speech. Science, 322, 970–973.
Fowler, C. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3–28.
Friston, K. J., Holmes, A. P., Poline, J. B., Grasby, P. J., Williams, S. C., Frackowiak, R. S., et al. (1995). Analysis of fMRI time-series revisited. NeuroImage, 2(1), 45–53.
Friston, K. J., Holmes, A. P., Price, C. J., Buchel, C., & Worsley, K. J. (1999). Multisubject fMRI studies and conjunction analyses. NeuroImage, 10, 385–396.
Friston, K. J., Holmes, A. P., & Worsley, K. J. (1999). How many subjects constitute a study? NeuroImage, 10, 1–5.
Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13(3), 361–377.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
Genovese, C. R., Lazar, N. A., & Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15(4), 870–878.
Gentilucci, M., & Bernardis, P. (2007). Automatic audiovisual integration in speech perception. Neuropsychologia, 45, 608–615.
Gentilucci, M., & Cattaneo, L. (2005). Automatic audiovisual integration in speech perception. Experimental Brain Research, 167, 66–75.
Gentilucci, M., & Corballis, M. C. (2006). From manual gesture to speech: a gradual transition. Neuroscience and Biobehavioral Reviews, 30, 949–960.
Ghosh, S. S., Tourville, J. A., & Guenther, F. H. (2008). A neuroimaging study of premotor lateralization and cerebellar involvement in the production of phonemes and syllables. Journal of Speech, Language, and Hearing Research, 51(5), 1183–1202.
Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: communication, context, and consequence. In H. Giles, N. Coupland, & J. Coupland (Eds.), Contexts of accommodation: Developments in applied sociolinguistics (pp. 1–68). Cambridge, UK: Cambridge University Press.
Golfinopoulos, E., Tourville, J. A., Bohland, J. W., Ghosh, S. S., Nieto-Castanon, A., & Guenther, F. H. (2011). fMRI investigation of unexpected somatosensory feedback perturbation during speech. NeuroImage, 55(3), 1324–1338.
Grabski, K., Lamalle, L., Vilain, C., Schwartz, J.-L., Vallée, N., Troprès, I., et al. (2012). Functional MRI assessment of orofacial articulators: neural correlates of lip, jaw, laryngeal and tongue movements. Human Brain Mapping, 33(10), 2306–2321.
Gracco, V. L., Tremblay, P., & Pike, G. B. (2005). Imaging speech production using fMRI. NeuroImage, 26, 294–301.
Guenther, F. H. (2006). Cortical interactions underlying the production of speech sounds. Journal of Communication Disorders, 39, 350–365.
Guenther, F. H., Ghosh, S. S., & Tourville, J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language, 96(3), 280–301.
Guenther, F. H., Nieto-Castanon, A., Ghosh, S. S., & Tourville, J. A. (2004). Representation of sound categories in auditory cortical maps. Journal of Speech, Language, and Hearing Research, 47, 46–57.
Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production. Journal of Neurolinguistics, 25(5), 408–422.
Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001). Medial prefrontal cortex and self-referential mental activity: relation to a default mode of brain function. Proceedings of the National Academy of Sciences of the United States of America, 98, 4259–4264.
Hall, D. A., Haggard, M. P., Akeroyd, M. A., Palmer, A. R., Summerfield, A. Q., Elliott, M. R., et al. (1999). "Sparse" temporal sampling in auditory fMRI. Human Brain Mapping, 7(3), 213–223.
Hashimoto, Y., & Sakai, K. L. (2003). Brain activations during conscious self-monitoring of speech production with delayed auditory feedback: an fMRI study. Human Brain Mapping, 20, 22–28.
Heinks-Maldonado, T. H., Nagarajan, S. S., & Houde, J. F. (2006). Magnetoencephalographic evidence for a precise forward model in speech production. NeuroReport, 17(13), 1375–1379.
Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron, 69(3), 407–422.
Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences, 4(4), 131–138.
Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67–99.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402.
Houde, J. F., Nagarajan, S. S., Sekihara, K., & Merzenich, M. M. (2002). Modulation of the auditory cortex during speech: an MEG study. Journal of Cognitive Neuroscience, 14, 1125–1138.
Jardri, R., Pins, D., Bubrovszky, M., Despretz, P., Pruvo, J.-P., Steinling, M., et al. (2007). Self awareness and speech processing: an fMRI study. NeuroImage, 35, 1645–1653.
Jürgens, U. (2002). Neural pathways underlying vocal control. Neuroscience & Biobehavioral Reviews, 26(2), 235–258.
Kappes, J., Baumgaertner, A., Peschke, C., & Ziegler, W. (2009). Unintended imitation in nonword repetition. Brain & Language, 111, 140–151.
Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: evidence from stimulus response compatibility. Journal of Experimental Psychology: Human Perception and Performance, 26, 634–647.
Keysers, C., Kohler, E., Umilta, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2003). Audiovisual mirror neurons and action recognition. Experimental Brain Research, 153, 628–636.
Kohler, E., Keysers, C., Umilta, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002). Hearing sounds, understanding actions: action representation in mirror neurons. Science, 297, 846–848.
Ladefoged, P. (2006). A course in phonetics (5th ed.). Boston, MA: Thomson Wadsworth.
Lancaster, J. L., Woldorff, M. G., Parsons, L. M., Liotti, M., Freitas, C. S., Rainey, L., et al. (2000). Automated Talairach atlas labels for functional brain mapping. Human Brain Mapping, 10(3), 120–131.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1–36.
Liberman, A. M., & Whalen, D. H. (2000). On the relation of speech to language. Trends in Cognitive Sciences, 3(7), 254–264.
Lotto, A., Hickok, G., & Holt, L. (2009). Reflections on mirror neurons and speech perception. Trends in Cognitive Sciences, 13, 110–114.
McFarland, D. H. (2001). Respiratory markers of conversational interaction. Journal of Speech, Language, and Hearing Research, 44(1), 128–143.
McKiernan, K. A., D'Angelo, B. R., Kaufman, J. N., & Binder, J. R. (2006). Interrupting the "stream of consciousness": an fMRI investigation. NeuroImage, 29, 1185–1191.
McKiernan, K. A., Kaufman, J. N., Kucera-Thompson, J., & Binder, J. R. (2003). A parametric manipulation of factors affecting task-induced deactivation in functional neuroimaging. Journal of Cognitive Neuroscience, 15(3), 394–408.
Mazoyer, B., Zago, L., Mellet, E., Bricogne, S., Etard, O., Houde, O., et al. (2001). Cortical networks for working memory and executive functions sustain the conscious resting state in man. Brain Research Bulletin, 54, 287–298.
Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D., & Iacoboni, M. (2007). The essential role of premotor cortex in speech perception. Current Biology, 17(19), 1692–1696.
Ménard, L., Schwartz, J. L., & Aubin, J. (2008). Invariance and variability in the production of the height feature in French vowels. Speech Communication, 50, 14–28.
Möttönen, R., Järveläinen, J., Sams, M., & Hari, R. (2004). Viewing speech modulates activity in the left SI mouth cortex. NeuroImage, 24, 731–737.
Möttönen, R., & Watkins, K. E. (2009). Motor representations of articulators contribute to categorical perception of speech sounds. The Journal of Neuroscience, 29(31), 9819–9825.
Murphy, K., Corfield, D. R., Guz, A., Fink, G. R., Wise, R. J., Harrison, J., et al. (1997). Cerebral areas associated with motor control of speech in humans. Journal of Applied Physiology, 83(5), 1438–1447.
Natale, M. (1975). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality and Social Psychology, 32, 790–804.
Nichols, T. E., Brett, M., Andersson, J., Wager, T., & Poline, J. B. (2005). Valid conjunction inference with the minimum statistic. NeuroImage, 25, 653–660.
Nishitani, N., & Hari, R. (2002). Viewing lip forms: cortical dynamics. Neuron, 36, 1211–1220.
Numminen, J., & Curio, G. (1999). Differential effects of overt, covert and replayed speech on vowel-evoked responses of the human auditory cortex. Neuroscience Letters, 272, 29–32.
Obleser, J., Boecker, H., Drzezga, A., Haslinger, B., Hennenlotter, A., Roettinger, M., et al. (2006). Vowel sound extraction in anterior superior temporal cortex. Human Brain Mapping, 27(7), 562–571.
Obleser, J., Leaver, A., Vanmeter, J., & Rauschecker, J. P. (2010). Segregation of vowels and consonants in human auditory cortex: evidence for distributed hierarchical organization. Frontiers in Psychology, 1, 232.
Obleser, J., Wise, R. J. S., Dresner, M. A., & Scott, S. K. (2007). Functional integration across brain regions improves speech perception under adverse listening conditions. Journal of Neuroscience, 27(9), 2283–2289.
Ojanen, V., Möttönen, R., Pekkola, J., Jääskeläinen, I. P., Joensuu, R., Autti, T., et al. (2005). Processing of audiovisual speech in Broca's area. NeuroImage, 25, 333–338.
Okada, K., & Hickok, G. (2006). Left posterior auditory-related cortices participate both in speech perception and speech production: neural overlap revealed by fMRI. Brain and Language, 98(1), 112–117.
Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia, 9, 97–114.
Özdemir, E., Norton, A., & Schlaug, G. (2006). Shared and distinct neural correlates of singing and speaking. NeuroImage, 33, 628–635.
Pardo, J. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119(4), 2382–2393.
Paulesu, E., Perani, D., Blasi, V., Silani, G., Borghese, N. A., De Giovanni, U., et al. (2003). A functional-anatomical model for lipreading. Journal of Neurophysiology, 90(3), 2005–2013.
Paus, T., Perry, D., Zatorre, R., Worsley, K. J., & Evans, A. C. (1996). Modulation of cerebral blood flow in the human auditory cortex during speech: role of motor-to-sensory discharges. European Journal of Neuroscience, 8, 2236–2246.
Pekkola, J., Laasonen, M., Ojanen, V., Autti, T., Jaaskelainen, L. P., Kujala, T., et al. (2006). Perception of matching and conflicting audiovisual speech in dyslexic and fluent readers: an fMRI study at 3T. NeuroImage, 29(3), 797–807.
Perkell, J. S. (2012). Movement goals and feedback and feedforward control mechanisms in speech production. Journal of Neurolinguistics, 25(5), 382–407.
Perkell, J. S., & Klatt, D. H. (1986). Invariance and variability in speech processes. Hillsdale, NJ: L. Erlbaum.
Pulvermüller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O., & Shtyrov, Y. (2006). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences of the United States of America, 103(20), 7865–7870.
Raichle, M. E., McLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences of the United States of America, 98, 676–682.
Rauschecker, J. P. (2011). An expanded role for the dorsal auditory pathway in sensorimotor control and integration. Hearing Research, 271, 16–25.
Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nature Neuroscience, 12(6), 718–724.
Riecker, A., Ackermann, H., Wildgruber, D., Dogil, G., & Grodd, W. (2000). Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum. Neuroreport, 11(9), 1997–2000.
Riecker, A., Ackermann, H., Wildgruber, D., Meyer, J., Dogil, G., Haider, H., et al. (2000). Articulatory/phonetic sequencing at the level of the anterior perisylvian cortex: a functional magnetic resonance imaging (fMRI) study. Brain and Language, 75(2), 259–276.
Riecker, A., Brendel, B., Ziegler, W., Erb, M., & Ackermann, H. (2008). The influence of syllable onset complexity and syllable frequency on speech motor control. Brain and Language, 107(2), 102–113.
Riecker, A., Mathiak, K., Wildgruber, D., Erb, M., Hertrich, I., Grodd, W., et al. (2005). fMRI reveals two distinct cerebral networks subserving speech motor control. Neurology, 64(4), 700–706.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188–194.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661–670.
Roy, A. C., Craighero, L., Fabbri-Destro, M., & Fadiga, L. (2008). Phonological and lexical motor facilitation during speech listening: a transcranial magnetic stimulation study. Journal of Physiology – Paris, 102(1–3), 101–105.
Sancier, M., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25, 421–436.
Sato, M., Buccino, G., Gentilucci, M., & Cattaneo, L. (2010). On the tip of the tongue: modulation of the primary motor cortex during audiovisual speech perception. Speech Communication, 52(6), 533–541.
Sato, M., Grabski, K., Glenberg, A., Brisebois, A., Basirat, A., Ménard, L., et al. (2011). Articulatory bias in speech categorization: evidence from use-induced motor plasticity. Cortex, 47(8), 1001–1003.
Sato, M., Tremblay, P., & Gracco, V. (2009). A mediating role of the premotor cortex in phoneme segmentation. Brain and Language, 111(1), 1–7.
Schwartz, J.-L., Abry, C., Boë, L.-J., & Cathiard, M. A. (2002). Phonology in a theory of perception-for-action-control. In J. Durand, & B. Lacks (Eds.), Phonology: From phonetics to cognition (pp. 240–280). Oxford: Oxford University Press.
Schwartz, J. L., Boë, L. J., & Abry, C. (2007). Linking the Dispersion-Focalization Theory (DFT) and the Maximum Utilization of the Available Distinctive Features (MUAF) principle in a Perception-for-Action-Control Theory (PACT). In M. J. Solé, P. Beddor, & M. Ohala (Eds.), Experimental approaches to phonology (pp. 104–124). Oxford: Oxford University Press.
Schwartz, J. L., Boë, L. J., Vallée, N., & Abry, C. (1997a). Major trends in vowel system inventories. Journal of Phonetics, 25, 233–254.
Schwartz, J. L., Boë, L. J., Vallée, N., & Abry, C. (1997b). The dispersion-focalization theory of vowel systems. Journal of Phonetics, 25, 255–286.
Schwartz, J.-L., Ménard, L., Basirat, A., & Sato, M. (2012). The Perception for Action Control Theory (PACT): a perceptuo-motor theory of speech perception. Journal of Neurolinguistics, 25(5), 336–354.
Schwartz, J.-L., Sato, M., & Fadiga, L. (2008). The common language of speech perception and action: a neurocognitive perspective. Revue Française de Linguistique Appliquée, 13(2), 9–22.
Scott, S. K., & Johnsrude, I. S. (2003). The neuroanatomical and functional organization of speech perception. Trends in Neurosciences, 26(2), 100–107.
Scott, S. K., McGettigan, C., & Eisner, F. (2009). A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nature Reviews Neuroscience, 10(4), 295–302.
Shockley, K., Santana, M.-V., & Fowler, C. A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception and Performance, 29, 326–332.
Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2005). Listening to talking faces: motor cortical activation during speech perception. NeuroImage, 25, 76–89.
Skipper, J. I., Van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17(10), 2387–2399.
Sörös, P., Sokoloff, L. G., Bose, A., McIntosh, A. R., Graham, S. J., & Stuss, D. T. (2006). Clustered functional MRI of overt speech production. NeuroImage, 32(1), 376–387.
Straka, G. (1981). Sur la formation de la prononciation française d'aujourd'hui. Travaux de linguistique et de littérature, 19(1), 161–248.
Talairach, J., & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain. New York: Thieme Medical Publishers.
Terumitsu, M., Fujii, Y., Suzuki, K., Kwee, I. L., & Nakada, T. (2006). Human primary motor cortex shows hemispheric specialization for speech. Neuroreport, 17(11), 1091–1095.
Tourville, J. A., Reilly, K. J., & Guenther, F. H. (2008). Neural mechanisms underlying auditory feedback control of speech. NeuroImage, 39(3), 1429–1443.
Tremblay, P., & Small, S. L. (2011). On the context-dependent nature of the contribution of the ventral premotor cortex to speech perception. NeuroImage, 57, 1561–1571.
Turkeltaub, P. E., & Coslett, H. B. (2010). Localization of sublexical speech perception components. Brain and Language, 114(1), 1–15.
Uppenkamp, S., Johnsrude, I. S., Norris, D., Marslen-Wilson, W., & Patterson, R. D. (2006). Locating the initial stages of speech-sound processing in human temporal cortex. NeuroImage, 31, 1284–1296.
Ventura, M. I., Nagarajan, S. S., & Houde, J. F. (2009). Speech target modulates speaking induced suppression in auditory cortex. BMC Neuroscience, 10, 58.
Wilson, S. M., & Iacoboni, M. (2006). Neural responses to non-native phonemes varying in producibility: evidence for the sensorimotor nature of speech perception. NeuroImage, 33(1), 316–325.
Wilson, S. M., Saygin, A. P., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7, 701–702.
Wise, R. J., Greene, J., Büchel, C., & Scott, S. K. (1999). Brain regions involved in articulation. Lancet, 353(9158), 1057–1061.
Zaehle, T., Schmidt, C. F., Meyer, M., Baumann, S., Baltes, C., Boesiger, P., et al. (2007). Comparison of "silent" clustered and sparse temporal fMRI acquisitions in tonal and speech perception tasks. NeuroImage, 37(4), 1195–1204.
Zekveld, A. A., Heslenfeld, D. J., Festen, J. M., & Schoonhoven, R. (2006). Top-down and bottom-up processes in speech comprehension. NeuroImage, 32, 1826–1836.
Zheng, Z. Z., Munhall, K. G., & Johnsrude, I. S. (2010). Functional overlap between regions involved in speech perception and in monitoring one's own voice during speech production. Journal of Cognitive Neuroscience, 22(8), 1770–1781.