
Contents lists available at ScienceDirect

Physica Medica

journal homepage: http://www.physicamedica.com

Physica Medica 30 (2014) 604–618

Review paper

Speech MRI: Morphology and function

Andrew D. Scott a,b, Marzena Wylezinska a,c, Malcolm J. Birch a, Marc E. Miquel a,c,*

a Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, United Kingdom
b NIHR Cardiovascular Biomedical Research Unit, The Royal Brompton Hospital, Sydney Street, London SW3 6NP, United Kingdom
c Barts and The London NIHR CVBRU, London Chest Hospital, London E2 9JX, United Kingdom

Article info

Article history:
Received 3 March 2014
Received in revised form 24 April 2014
Accepted 1 May 2014
Available online 28 May 2014

Keywords:
Speech imaging
MRI dynamic imaging
Real-time imaging
Vocal tract imaging and modelling
Cleft palate
Vocal tract morphology
Speech production

* Corresponding author. Clinical Physics, St Bartholomew's Hospital, Barts Health NHS Trust, London EC1A 7BE, United Kingdom. Tel.: +44 203 465 6771; fax: +44 203 465 5785.

E-mail address: [email protected] (M.E. Miquel).

http://dx.doi.org/10.1016/j.ejmp.2014.05.001
1120-1797/© 2014 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

Abstract

Magnetic Resonance Imaging (MRI) plays an increasing role in the study of speech. This article reviews the MRI literature of anatomical imaging, imaging for acoustic modelling and dynamic imaging. It describes existing imaging techniques attempting to meet the challenges of imaging the upper airway during speech and examines the remaining hurdles and future research directions.

Introduction

Basics of speech production

The production of human speech is a complex process (see Refs. [1–4], for example) involving numerous organs, namely: the lungs, diaphragm and chest wall; the larynx, pharynx and vocal folds (or cords); the tongue, lips and soft palate (or velum); and the teeth, jaw and nasal cavity. The lungs, driven by the diaphragm and/or chest wall, provide the airflow, which travels through the lower respiratory tract (the bronchioles, the two bronchial tubes and the trachea). Air enters the upper respiratory tract (the focus of this review) through the larynx, which contains the vocal cords; a schematic of the upper airway in mid-sagittal view is shown in Fig. 1. Air is forced through a narrow gap between the vocal folds, which vibrate, producing a fundamental frequency plus harmonics. By manipulating the tension, length and separation of the vocal cords and controlling the airflow between them, the fundamental frequency and volume, and therefore the intonation, of speech can be controlled. The remainder of the upper respiratory tract forms a series of connected resonant cavities which can be modified in size and shape using the pharynx, velum, uvula, jaw, tongue and lips. These manipulators modify the formant frequencies of speech. For consonant sounds, speech is further complicated by articulation, which is the partial or full obstruction of the vocal tract by a pair of articulators: the tongue tip and upper teeth, the tongue body and hard palate, the lips, or the velum and tongue dorsum, for example (Fig. 2).
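A common first approximation for how a resonant cavity sets formant frequencies (not taken from this review) treats the neutral vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at F_n = (2n − 1)c/4L. The sketch below assumes a speed of sound of 350 m/s and a tract length of 17.5 cm; both values are illustrative assumptions, not measurements.

```python
def uniform_tube_formants(length_m: float, n_formants: int = 3,
                          speed_of_sound_m_s: float = 350.0) -> list[float]:
    """Resonances of a uniform tube closed at the glottis and open at the lips.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# Illustrative adult tract length of 17.5 cm (an assumption, not a measured value).
formants = uniform_tube_formants(0.175)
print([round(f) for f in formants])  # [500, 1500, 2500]
```

Shortening the tube raises every formant in proportion; the articulators described above break this uniformity, which is why measured vocal tract shapes matter.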

Clinical assessment of speech

A number of acquired and inherited diseases can affect the function of the speech organs, including cancer [5,6], clefts of the lips and/or palate [4], laryngitis, vocal cord polyps [7], nodules or cysts [8], neurological conditions [9] and some endocrine disorders [10]. Standard diagnostic tools for speech therapists assessing speech disorders include aural assessment, acoustic analysis, oral and nasal airflow measurements [11] and external measurement of the impedance across the vocal folds to measure their movement (electroglottography) [12]. Imaging techniques may also be used, most commonly endoscopy. Stroboscopy is a variant of endoscopy which is used for analysis of vocal fold movement in slow motion [13]. X-ray fluoroscopy is also used to assess the movement of the tongue [14], soft palate and pharyngeal wall [15,16]. However, endoscopy is invasive (although minimally), potentially causing discomfort and abnormal speech. It is also limited to producing en-face images of exterior surfaces within the vocal tract. In contrast, X-ray fluoroscopy is non-invasive and operates at both high temporal (~30 frames per second (fps)) and high in-plane spatial resolution (<0.5 × 0.5 mm² [17]), but is limited to producing projections of the anatomy. The soft-tissue contrast of X-ray fluoroscopy is relatively poor, but the bony structures are clearly seen and the outline of both the soft palate and posterior pharyngeal wall can be identified. Typically, the lateral view is used, but other orientations such as the Towne's and basal views [15,18,19] have been used to add information. Increased contrast [20] is available by coating the vocal tract with barium contrast, at the expense of increased patient discomfort [21]. X-ray imaging also results in an ionising radiation dose to both the patient and operator [22]. Estimates of the average patient effective dose in dysphagia studies, which target similar anatomy, include 0.2 mSv [23], 0.4 mSv [24] and 0.85 mSv [22], but depend on the exposure time. Speech assessments may occur regularly, resulting in repeated radiation exposures, and studies are typically longer than dysphagia protocols in order to obtain a varied speech sample.

Figure 1. Anatomy of the upper airways, mid-sagittal view.

Alternative imaging techniques: CT and ultrasound

Other imaging techniques available for imaging the speech articulators include ultrasound and X-ray computed tomography (CT). Ultrasound is relatively low cost, widely available, rapid and free from ionising radiation [25]. Relatively narrow (~2 mm) sections of the mid part of the tongue can be imaged in coronal or sagittal planes, from underneath the chin, at temporal resolutions of around 30 fps and sub-millimetre spatial resolution. However, while the palate and velum may be observed in some lingual positions, air interfaces and bone are highly reflective and structures beyond these interfaces are not visualised, which somewhat limits the technique. Furthermore, the need for the transducer to be in direct contact with the skin may interfere with normal speech.

CT is able to obtain high spatial resolution (<1 × 1 × 1 mm³) volumetric images with delineation of soft tissue and bone, which may be reformatted in any plane. However, radiation doses are typically greater than those from planar X-ray studies. Furthermore, temporal resolution is currently limited to around 165 ms at 3 gantry rotations per second, or around 85 ms using state-of-the-art dual-source CT systems [26,27], which is insufficient for detailed analysis of tongue, lip and velar motion.
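The quoted CT figures are consistent with a simple rule of thumb relating temporal resolution to gantry rotation time. The sketch below assumes that a half-scan reconstruction needs roughly half a rotation of data and that a dual-source system, with two tubes mounted roughly 90° apart, needs roughly a quarter rotation; the extra fan angle needed in practice is ignored.

```python
def ct_temporal_resolution_ms(rotations_per_second: float,
                              n_sources: int = 1) -> float:
    """Approximate temporal resolution of a half-scan CT reconstruction.

    A half-scan needs ~180 degrees of projection data. One source covers this
    in half a rotation; two sources ~90 degrees apart share it, so a quarter
    rotation suffices.
    """
    rotation_time_ms = 1000.0 / rotations_per_second
    return rotation_time_ms / (2 * n_sources)


single = ct_temporal_resolution_ms(3.0)             # single source at 3 rotations/s
dual = ct_temporal_resolution_ms(3.0, n_sources=2)  # dual source at 3 rotations/s
print(f"single source: {single:.0f} ms, dual source: {dual:.0f} ms")
```

Both values are several times longer than the ~33 ms frame interval of 30 fps fluoroscopy.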

MRI: imaging anatomy and function?

MR imaging provides tomographic images with excellent soft tissue contrast in any plane without the use of ionising radiation, and scanners are now commonplace. While MR was previously considered a “slow” imaging modality, modern techniques, largely developed as a result of the desire to capture or freeze the motion of the heart, can achieve temporal resolutions far exceeding those available with CT and even ultrasound [28]. As it is also able to provide images of both the anatomy and function of the vocal tract, MRI has the potential to become the modality of choice in speech imaging. This has resulted in a large body of literature describing the acquisition of MR images of the vocal tract and their potential diagnostic and modelling applications.

Despite the growing body of literature, outside technical MR journals many studies fail to adequately or accurately describe the MR acquisition parameters. Studies often neglect to fully describe important parameters such as resolution [29–34], flip angle [33,35–41], echo train length [33,42–45] or sequence type [30,46,47]. Errors in the description of the methods include echo times (TE) longer than repetition times (TR) [48], descriptions of multi-slice 2D sequences as 3D acquisitions [46] and a steady-state turbo spin echo sequence (instead of single shot) [49].
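Both kinds of problem, missing parameters and internally inconsistent ones, are easy to screen for mechanically. The following is a hypothetical checklist, not a published reporting standard: the field names are illustrative assumptions, and the only consistency rule encoded is the one cited above (an echo time cannot exceed the repetition time).

```python
# Hypothetical minimum set of fields; names are illustrative, not a standard.
REQUIRED_FIELDS = (
    "sequence_type",      # e.g. "2D multi-slice TSE", "3D spoiled GRE"
    "tr_ms",
    "te_ms",
    "flip_angle_deg",
    "echo_train_length",
    "voxel_size_mm",      # (x, y, z)
)


def check_reported_protocol(protocol: dict) -> list[str]:
    """Return human-readable problems with a reported MR protocol."""
    problems = [f"missing: {field}" for field in REQUIRED_FIELDS
                if protocol.get(field) is None]
    tr, te = protocol.get("tr_ms"), protocol.get("te_ms")
    if tr is not None and te is not None and te >= tr:
        problems.append(f"inconsistent: TE ({te} ms) >= TR ({tr} ms)")
    return problems


# Hypothetical description with an impossible TE and no flip angle.
reported = {"sequence_type": "2D TSE", "tr_ms": 4.0, "te_ms": 80.0,
            "echo_train_length": 16, "voxel_size_mm": (1.0, 1.0, 4.0)}
for issue in check_reported_protocol(reported):
    print(issue)
```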

While existing work has considered the application of MRI in a variety of situations, challenges and limitations still exist and further work is required before the modality is used routinely in the vocal tract. For example, large changes in magnetic susceptibility are present at tissue–air interfaces, which result in sequence-dependent artefacts; signal-to-noise ratio (SNR) must be traded for temporal and/or spatial resolution; many of the reconstruction techniques for rapid acquisitions are computationally challenging; and imaging is almost universally performed with the subject in the supine position.
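The SNR trade-off can be made explicit with the standard proportionality SNR ∝ ΔxΔyΔz·√T, where T is the total time spent sampling signal. The sketch below deliberately ignores field strength, coils, receiver bandwidth choices and relaxation, so it only compares protocols that differ in voxel size and sampling time; the example numbers are hypothetical.

```python
import math


def relative_snr(voxel_mm: tuple[float, float, float], sampling_time_s: float,
                 ref_voxel_mm: tuple[float, float, float],
                 ref_sampling_time_s: float) -> float:
    """SNR relative to a reference, using SNR proportional to volume * sqrt(time)."""
    volume_ratio = math.prod(voxel_mm) / math.prod(ref_voxel_mm)
    return volume_ratio * math.sqrt(sampling_time_s / ref_sampling_time_s)


# Halving the in-plane voxel size in both directions at a fixed time
# leaves a quarter of the SNR ...
print(relative_snr((1.0, 1.0, 5.0), 1.0, (2.0, 2.0, 5.0), 1.0))   # 0.25
# ... and recovering it by averaging needs 16 times the sampling time.
print(relative_snr((1.0, 1.0, 5.0), 16.0, (2.0, 2.0, 5.0), 1.0))  # 1.0
```

The quadratic cost in time is why higher spatial or temporal resolution in speech MRI usually comes at a visible cost in image noise.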

No review of this large and expanding field has previously been published. In this review we will attempt to provide a rational approach to the body of literature, divided into imaging anatomy, imaging for acoustic modelling and dynamic imaging. We will describe existing techniques attempting to meet the challenges of imaging the upper airway during speech and examine the remaining hurdles.

Static imaging of the vocal tract

The excellent soft tissue contrast and lack of ionising radiation have made MR imaging a popular research tool in analysing the morphology of the upper respiratory tract in healthy and patient cohorts. Despite the intrinsically dynamic nature of speech production, a large amount of research has been performed while the subject is either silent or during sustained phonation of a relevant sound. For many of these acquisitions, where spatial resolution and contrast are more important than a short acquisition duration, multi-slice 2D spin echo based techniques are used for their resilience to artefacts caused by differences in magnetic susceptibility, resulting in imaging times >10 s. The very first MR studies of the upper respiratory tract were performed in this way at low field strengths for assessment of the vocal tract shape [50–52]. However, such long phonations increase the potential for accidental movement or breathing artefact, and more rapid gradient echo based sequences have been used as an alternative to reduce the acquisition duration to 1 s or below (see Section 4).

Figure 2. Anatomy of the palate and principal musculature involved in speech, en-face view.
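The gap between spin echo and gradient echo acquisition times follows from the usual Cartesian scan-time expression, T = TR × (phase-encode lines / echo train length) × averages. The parameter values below are hypothetical and chosen only to show orders of magnitude; in an interleaved multi-slice acquisition the other slices are excited within the same TR, so they do not add to this time.

```python
import math


def cartesian_scan_time_s(tr_ms: float, n_phase_encodes: int,
                          n_averages: int = 1, echo_train_length: int = 1) -> float:
    """Scan time of a 2D Cartesian acquisition: TR x shots x averages.

    Each TR ("shot") acquires echo_train_length phase-encode lines; in an
    interleaved multi-slice scan, other slices are excited within the same TR.
    """
    shots = math.ceil(n_phase_encodes / echo_train_length)
    return tr_ms * shots * n_averages / 1000.0


# Hypothetical parameter sets, chosen only to show orders of magnitude.
print(cartesian_scan_time_s(500.0, 128))                         # 64.0 (spin echo)
print(cartesian_scan_time_s(3000.0, 256, echo_train_length=16))  # 48.0 (turbo spin echo)
print(cartesian_scan_time_s(5.0, 128))                           # 0.64 (spoiled gradient echo)
```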

Imaging the larynx

MR imaging of the larynx (and the pharynx) is commonplace for the diagnosis and assessment of benign growths, malignant tumours, relapsing polychondritis and necrotising fasciitis [53]. However, tomographic imaging of the vocal folds is less commonly used in assessing speech: as the fundamental frequency of normal speech is between 100 and 250 Hz [54], the opening and closing of the vocal folds are too rapid for the temporal resolution of MR techniques. Despite this, MR imaging at 1.5 T and below has been used to image the vocal folds during sustained phonation in order to aid vocal fold modelling [55], to assess laryngeal surgery (thyroplasty) [56] and to assess the angulation and superior–inferior movement of the larynx [44]. The limited duration of phonation restricts such studies to 2D acquisitions, although the multiple thin contiguous slices available from 3D imaging would be preferable. In recent work, high resolution 3D imaging at 3 T was achieved by dividing the phase encode lines between multiple sustained phonations [57]. Static studies (without phonation) reduce the limitations on imaging time but are affected by both voluntary motion and involuntary motion due to respiration, coughing, swallowing and pulsatile carotid flow. To address this, Barral et al. [58] used navigator echoes interleaved within a high resolution 3D acquisition to detect, and either reacquire or correct, phase encode lines corrupted by motion. Studies are typically performed with spin echo based techniques to minimise artefacts caused by the sharp changes in magnetic susceptibility present at the tissue–air interfaces. Balanced steady-state free precession (bSSFP) imaging, a rapid technique with high SNR, has also been demonstrated for real-time imaging of the larynx during phonation (4 fps) [59], but the resulting artefacts due to magnetic susceptibility were severe.
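Navigator-based rejection of motion-corrupted lines can be sketched in a few lines. This is a generic accept/reject scheme under stated assumptions, not the implementation of Barral et al. [58]: each phase encode line is assumed to be paired with a displacement (in mm) measured by the navigator acquired just before it, and lines whose displacement departs from a reference position by more than a tolerance are queued for reacquisition.

```python
from dataclasses import dataclass


@dataclass
class EncodedLine:
    index: int            # phase encode index within the acquisition
    navigator_mm: float   # displacement from the preceding navigator echo


def lines_to_reacquire(lines: list[EncodedLine], reference_mm: float,
                       tolerance_mm: float) -> list[int]:
    """Return phase encode indices whose navigator falls outside the window."""
    return [line.index for line in lines
            if abs(line.navigator_mm - reference_mm) > tolerance_mm]


# Hypothetical navigator trace in which a swallow disturbs lines 2 and 3.
trace = [EncodedLine(i, d) for i, d in enumerate([0.1, -0.2, 2.5, 1.8, 0.0])]
print(lines_to_reacquire(trace, reference_mm=0.0, tolerance_mm=0.5))  # [2, 3]
```

A real scheme would repeat the flagged lines until they pass, or correct them instead of discarding them; the text above mentions both options.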

Analysing vocal tract shape

The tomographic nature of MRI, together with its suitability for use in healthy volunteers, makes it the modality of choice for imaging the vocal tract at rest and during sustained phonation. Data from such studies are typically used in modelling speech production and analysing the function of the articulators.

The formant frequencies of speech can be predicted from an area function, which describes the variation in cross-sectional area with length along the vocal tract (see Fig. 3). A number of studies have used MRI to obtain area functions [31,48,52,60–63], typically with spin echo based sequences, which are less sensitive to signal loss at the tissue–air interfaces. Most simply, area functions can be derived from measurements of the vocal tract on a single mid-sagittal MRI image or X-ray, using empirical relationships relating mid-sagittal width to cross-sectional area [60,64]. However, the casts used to generate the empirical relationships are typically obtained from cadavers and may not fully represent the vocal tract during speech. Alternatively, area functions may be estimated from sparsely spaced 2D MR images positioned perpendicular to the vocal tract along its length [32]. The cross-sectional area at points between the slices can be estimated by fitting a model to the measured cross-sectional areas and the mid-sagittal distance obtained from a mid-sagittal image. Demolin et al. [65] used such a technique in 2 healthy subjects, with 14 slices positioned along the vocal tract during sustained phonation of French oral vowels. The measured areas were compared with those calculated from published empirical relationships and from subject-specific models using the mid-sagittal width measured from the same images. Performance of the models varied between vocal tract sections (oral, nasal, pharyngeal, larynx, etc.) and, overall, the subject-specific models performed better. To avoid such empirical relationships altogether, 2D acquisitions with multiple contiguous slices [62,63,66,67] or even 3D acquisitions [68] have been used to capture the vocal tract configuration during sustained phonation. This provides information on vocal tract configuration unavailable with any other imaging modality except X-ray CT, which has limited applicability in healthy subjects. As a result of such studies, measured area functions are available for English fricative consonants [66], vowels, nasals and plosives [62,69], laterals [70], rhotics [71], Tamil liquid consonants [72], French vowels [48] and an extensive collection of European Portuguese consonants and vowels [73] (see Fig. 3). In addition, one study considered the differences in area functions between different voice qualities (normal, “yawny” and “twangy”) [74] and another, by the same author [75], considered the changes in vocal tract configuration in the same speaker over 8 years. While the data were not acquired directly for analysis of the vocal tract, work by Fitch et al. [76] demonstrates relationships in 129 young subjects between vocal tract length and subject size, sex and pubescent status. Recently, differences in the morphology of the whole vocal tract among three voice professionals were also demonstrated using 3D volumetric MRI [68].

Figure 3. Example of area function calculation for Portuguese vowels. Reproduced from Martins et al. [73] with permission from Elsevier.
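The model-fitting step can be illustrated with one functional form often used for such mappings, a power law A = α·d^β relating mid-sagittal distance d to cross-sectional area A. Treat this as an assumption: the text does not state which model each study used, α and β are usually fitted separately per vocal tract region, and the data below are synthetic, generated from known coefficients so that the fit can be checked.

```python
import math


def fit_power_law(distances_cm: list[float], areas_cm2: list[float]) -> tuple[float, float]:
    """Least-squares fit of A = alpha * d**beta on log-transformed data."""
    xs = [math.log(d) for d in distances_cm]
    ys = [math.log(a) for a in areas_cm2]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope in log-log space
    alpha = math.exp(mean_y - beta * mean_x)  # intercept back-transformed
    return alpha, beta


# Synthetic "slice" measurements generated with alpha = 1.6, beta = 1.4.
d = [0.4, 0.7, 1.0, 1.3, 1.8]
a = [1.6 * x ** 1.4 for x in d]
alpha, beta = fit_power_law(d, a)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")  # alpha = 1.60, beta = 1.40

# The fitted model then estimates area wherever only the mid-sagittal
# distance is known, e.g. between sparsely spaced perpendicular slices.
print(alpha * 0.85 ** beta)
```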

The accuracy of the formant frequencies predicted from the measured area functions varies by subject and phonation. In work which compared the predicted frequencies with those measured in speech samples recorded outside the scanner, the mean absolute error in the first three calculated formants was 7.5 ± 8.0% (range 0.46%–43.6%) in a male subject (vowels, nasals and plosives) [62] and 9.4 ± 6.6% (range 0.2%–22.1%) in a female subject (vowels and laterals) [69]. Vocal tract geometry is often simplified to assist in the computation. However, analysis of MRI data has also demonstrated increased error when the piriform fossa [69,77] and the asymmetry of the nasal cavities [78,79] are not included when calculating formants.
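Formants are typically predicted from an area function with an acoustic tube model. A minimal lossless version is sketched below under simplifying assumptions: plane-wave propagation, a closed glottis and zero pressure at the lips, with none of the radiation, wall or viscous losses (or side branches such as the piriform fossa) that realistic models include. One chain matrix per equal-length section is multiplied from glottis to lips, and formants are taken as the zeros of the D element, where the glottis-to-lips volume velocity transfer function has its poles. A uniform tube serves as a check because its resonances should fall at (2n − 1)c/4L.

```python
import math


def chain_matrix_d(areas_cm2, section_cm, freq_hz, c_cm_s=35000.0):
    """D element of the glottis-to-lips chain matrix of a lossless tube model.

    Each section is [[cos kl, j sin(kl) / A], [j A sin(kl), cos kl]], with
    rho*c set to 1 because it cancels in D. With zero pressure at the lips,
    U_lips / U_glottis = 1 / D, so the formants are the zeros of D.
    """
    kl = 2.0 * math.pi * freq_hz / c_cm_s * section_cm
    cs, sn = math.cos(kl), math.sin(kl)
    a, b, c, d = 1 + 0j, 0j, 0j, 1 + 0j       # running product, starts at identity
    for area in areas_cm2:                    # sections ordered glottis -> lips
        m11, m12, m21, m22 = cs, 1j * sn / area, 1j * area * sn, cs
        a, b, c, d = (a * m11 + b * m21, a * m12 + b * m22,
                      c * m11 + d * m21, c * m12 + d * m22)
    return d.real                             # D is purely real when lossless


def formants_from_area_function(areas_cm2, section_cm, n_formants=3,
                                f_max_hz=5000.0, step_hz=10.0):
    """Scan a frequency grid for sign changes of D, then refine by bisection.

    A coarse step can miss two zeros closer together than step_hz.
    """
    formants = []
    lo = step_hz / 3.0                        # odd offset makes landing exactly on a root unlikely
    d_lo = chain_matrix_d(areas_cm2, section_cm, lo)
    while lo < f_max_hz and len(formants) < n_formants:
        hi = lo + step_hz
        d_hi = chain_matrix_d(areas_cm2, section_cm, hi)
        if (d_lo > 0) != (d_hi > 0):
            left, right, d_left = lo, hi, d_lo
            for _ in range(50):               # bisection on the bracketing interval
                mid = 0.5 * (left + right)
                d_mid = chain_matrix_d(areas_cm2, section_cm, mid)
                if (d_mid > 0) == (d_left > 0):
                    left, d_left = mid, d_mid
                else:
                    right = mid
            formants.append(0.5 * (left + right))
        lo, d_lo = hi, d_hi
    return formants


# Sanity check: a uniform 17.5 cm tube (10 sections of 1.75 cm, 3 cm^2 each).
print([round(f) for f in formants_from_area_function([3.0] * 10, 1.75)])  # [500, 1500, 2500]
```

A measured area function, sampled at equal spacing from glottis to lips, can be passed in place of the uniform list; comparing its output with recorded speech gives errors of the kind quoted above.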

Recently, the super-resolution reconstruction technique [80] was applied to improve vocal tract MR images. Multiple orthogonal low-resolution MRI stacks were integrated into one isotropic super-resolution volume. This implementation also incorporated both edge-preserving regularisation, to improve SNR and resolution, and motion correction between acquisitions of different stacks. When applied to low-resolution MR stacks derived from library MR datasets (ATR Human Information Science Laboratories MRI database of Japanese vowel production [81]), the resolution was improved by a factor of three. The overall area function derived from super-resolution volumes also produced better prediction of formants, particularly for formants sensitive to area perturbation at constrictions (i.e. in the laryngeal cavity).

Beyond area functions, MRI has been used to analyse the positioning of the articulators during various sustained phonations [79], particularly the tongue [70–72,82,83]. Work by Narayanan and Alwan [70] demonstrated grooving of the tongue along the mid-line, with lateral flow channels and a convex shape in the posterior section, during the phonation of lateral sounds (e.g. “l”). In contrast, the rhotic sounds [71] were characterised by a more retracted tongue with a convex anterior shape and concave posterior shape. Such an analysis is only possible using MRI, as other modalities are either unable to obtain this information or unethical for use in healthy subjects.

Accurate modelling also requires information on the dental geometry. However, the lack of MRI signal from the teeth means they must be superimposed onto the MRI images. Various methods have been used to achieve this, including data from supplementary CT imaging [62,69], but the associated radiation dose may make ethical approval difficult to obtain. Other work has used dental casts, which may be either measured with callipers [70,71,82,84] or imaged in a bath of water [67,83]. Alternatively, Takemoto et al. [85] acquired MRI data of a subject using blueberry juice as a contrast agent in the oral cavity, leaving the teeth as signal voids. In this case, images of the teeth were obtained in an acquisition with the subject in the prone position, and the images for speech modelling were acquired separately in the supine position. However, there is no obvious reason why dental imaging and imaging of sustained phonation could not be performed in the same session in the same patient position, thus reducing the complications of co-registering the two datasets.
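At its simplest, co-registering a separately acquired dental dataset with the speech images is a rigid registration problem. The sketch below is a 2D least-squares (Procrustes-style) solution from paired landmarks. The landmarks are hypothetical, and a practical implementation would work in 3D (for example with an SVD-based solution) and might use intensity-based rather than landmark-based matching.

```python
import math


def rigid_register_2d(src, dst):
    """Least-squares rotation angle (rad) and translation mapping src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # centre both sets
        num += px * qy - py * qx      # cross terms, weight of sin(theta)
        den += px * qx + py * qy      # dot terms, weight of cos(theta)
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def apply_rigid_2d(theta, t, points):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


# Hypothetical landmarks (mm) on a dental cast and the same points in MRI space.
cast = [(0.0, 0.0), (30.0, 5.0), (15.0, 40.0), (-10.0, 25.0)]
mri = apply_rigid_2d(math.radians(12.0), (4.0, -7.5), cast)
theta, t = rigid_register_2d(cast, mri)
print(round(math.degrees(theta), 6), tuple(round(v, 6) for v in t))  # 12.0 (4.0, -7.5)
```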

MRI data have also been used in combination with other speech analysis tools. Palatography coats the tongue with black powder, and electropalatography (EPG) uses an oral insert covered with electrodes attached to the hard palate, to detect the locations of tongue contact [70–72,86]. While MRI and EPG have been used for static vocal tract analysis, electromagnetic articulography (EMA) [87], which tracks miniature coils attached to the speech articulators, has been used to add dynamic information to static MRI studies [72,88]. Dynamic information on the movement of the tongue and lips may also be obtained from video imaging and has been used to supplement static MRI data in speech modelling [83]. Recently, a method was proposed for use in speech simulation [89] based on volumetric MRI and CT data consisting of sustained articulations of vowels and consonants. The model was applied to the synthesis of consonant–vowel syllables, which were recognised in 82% of cases in a perception test.

Figure 4. High-resolution imaging of the levator veli palatini. (a) Mid-sagittal image showing the plane required to obtain an in-plane view of the muscle (b).

Many such studies have used spin echo based sequences [62,83], which are resistant to signal drop-out artefacts at the tissue–air interfaces. A number of studies have also used spoiled gradient echo acquisitions [48,82,90], which are generally faster and potentially allow a 3D acquisition [73,90] rather than the more common multi-slice 2D techniques. 3D acquisitions allow the acquisition of multiple thin, contiguous, non-overlapping slices with better SNR per unit acquisition time than the equivalent non-interleaved multi-slice 2D acquisition.
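The SNR advantage of 3D encoding follows from a counting argument, sketched below under simplifying assumptions: identical voxel size, TR and readout for both methods, a non-interleaved 2D acquisition in which slices are acquired one after another, and no differences in relaxation or flip angle. In the same total time, each 2D slice is reconstructed from only its own lines, whereas every 3D voxel draws on all lines in the slab, so SNR improves by √N for N slices. The protocol numbers are hypothetical.

```python
import math


def voxel_snr(voxel_mm3: float, readouts_per_voxel: int, readout_ms: float) -> float:
    """Relative SNR: voxel volume times sqrt(total sampling time behind the voxel)."""
    return voxel_mm3 * math.sqrt(readouts_per_voxel * readout_ms)


# Hypothetical protocol. Sequential 2D needs n_slices x n_phase_lines TRs and
# 3D needs n_phase_lines x n_slices TRs, so the total times are equal.
n_phase_lines, n_slices, readout_ms, voxel_mm3 = 192, 40, 3.0, 1.0
snr_2d = voxel_snr(voxel_mm3, n_phase_lines, readout_ms)             # a slice uses only its own lines
snr_3d = voxel_snr(voxel_mm3, n_phase_lines * n_slices, readout_ms)  # every line in the slab contributes
print(snr_3d / snr_2d, math.sqrt(n_slices))  # both ~6.32
```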

Obtaining images at sufficient resolution for multiple sustained sounds is time consuming. For example, Story et al. [62] imaged 22 sustained sounds using axial 5 mm slices at 0.9 × 0.9 mm² resolution to cover the whole vocal tract in one subject, who spent between 7 and 8 h in the scanner over several sessions. Other work [86] has imaged the vocal tract in multiple orientations to produce slices perpendicular to the vocal tract over the maximum length, at the expense of even longer acquisition durations. However, compressed sensing [91] was recently used to reconstruct 3D gradient echo acquisitions of the vocal tract that were obtained with 5-fold undersampling [92]. Images at 1.5 × 1.5 × 2.0 mm³ over a 240 × 240 × 100 mm³ field of view (FOV) were acquired in 7 s. Such studies provide information used in speech synthesis and increase the understanding of speech production. However, there are few if any clinical applications and, unless techniques like compressed sensing are employed, any potential future clinical application is limited by the long acquisition durations required.
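The compressed sensing protocol quoted above can be unpacked with some arithmetic. The sketch assumes, none of which is stated in the text, a Cartesian 3D acquisition with the readout along one 240 mm axis, both remaining axes phase encoded and undersampled together, and one k-space line per TR. Under those assumptions it yields the matrix size, the number of acquired lines and the implied TR.

```python
def undersampled_3d_protocol(fov_mm, voxel_mm, undersampling, duration_s):
    """Matrix size, full and acquired line counts, and implied TR (ms).

    Assumes axis 0 is the readout, axes 1 and 2 are phase encoded, and one
    readout line is acquired per TR.
    """
    matrix = tuple(round(f / v) for f, v in zip(fov_mm, voxel_mm))
    full_lines = matrix[1] * matrix[2]
    acquired_lines = full_lines // undersampling
    implied_tr_ms = 1000.0 * duration_s / acquired_lines
    return matrix, full_lines, acquired_lines, implied_tr_ms


matrix, full, acquired, tr = undersampled_3d_protocol(
    fov_mm=(240, 240, 100), voxel_mm=(1.5, 1.5, 2.0), undersampling=5, duration_s=7.0)
print(matrix, full, acquired, f"{tr:.3f} ms")  # (160, 160, 50) 8000 1600 4.375 ms
```

Under the same assumptions, a fully sampled acquisition at that TR would take 5 × 7 = 35 s, which illustrates why undersampling matters for sustained phonation.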

Imaging the palate and associated musculature

The soft palate is one of the speech articulators most frequently studied using MRI. The soft-tissue contrast enables non-invasive imaging of the muscular anatomy, providing information which is otherwise only available via surgical or cadaveric techniques. As a result, MRI-derived information has been used to describe the normal soft palate and could be used on a per-subject basis to guide cleft palate surgery. The levator veli palatini (LVP) muscle (Fig. 2) produces the primary elevating force for the soft palate, and much of the static imaging work in MRI of the soft palate has focused on it. The LVP forms a sling from its lateral attachments to the temporal bone and the cartilage surrounding the auditory tube, through the soft palate. In normal subjects [93], MRI data have demonstrated that the LVP is, on average, oriented at 122.4° (male 122.0°, female 122.8°), and the muscle is best visualised from images oriented in this plane (see Fig. 4). Most studies have used multi-slice 2D imaging in the LVP plane, but there are no data on how the angulation varies between subjects and, therefore, on whether subject-specific orientation of LVP imaging planes is required or whether imaging at 122.4° is sufficient in every subject. Several studies in patients [42,45,94] have used multi-slice 2D axial acquisitions or subject-specific orientation of imaging, which enable the clinical team to determine the morphology of the abnormal LVP in cleft palate patients. Alternatively, recent work [95] used a 3D acquisition with isotropic resolution and a 3D turbo spin echo (TSE) sequence with variable flip angle (SPACE) [96]. The isotropic 3D imaging volume was positioned axially and oblique imaging planes were retrospectively calculated, thus avoiding the requirement to identify the plane of the LVP at acquisition time. Such acquisitions would also be more amenable than 2D multi-slice acquisitions to reformatting in alternative planes, to study the other muscular anatomy of the soft palate. However, swallowing or other head motion often degrades the quality of LVP studies, and the nature and severity of the resultant image artefacts are partly dependent on the k-space trajectory. 2D studies are often performed with interleaved slice ordering for temporal efficiency, so motion can result in blurring or misregistration of a subset of slices. For 3D acquisitions, every reconstructed slice is partly dependent on the data from every k-space location and, therefore, the whole dataset could be ruined by a relatively short period of motion. If, on the other hand, motion only occurs at the outer k-space positions, the artefact might be very minor.
3D studies are also generally SNR efficient, but studies of the LVP are usually proton density weighted for optimal contrast [93,94] or T2 weighted [43,97–99] (although the use of other contrasts has been described [49,93]), requiring long repetition times (TR). For 2D studies, the long TR is used to acquire other slices, but for 3D imaging long echo trains must be used to maintain efficiency.
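The dependence of motion artefacts on where in k-space the corruption occurs can be shown with a 1D toy model. A hypothetical rectangular profile is Fourier transformed, a constant phase error (a crude stand-in for the phase a small translation would introduce) is applied either to the central or to the outermost k-space samples, and the image-domain error is measured. The profile, the number of corrupted samples and the phase error are all assumptions; real motion affects all dimensions and can also change magnitudes.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (inverse includes the 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def rms_error_after_corruption(profile, corrupted, phase_rad):
    """RMS image error after applying a phase error to selected k-space samples."""
    kspace = dft(profile)
    for k in corrupted:
        kspace[k] *= cmath.exp(1j * phase_rad)
    image = dft(kspace, inverse=True)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(image, profile)) / len(profile))


N = 64
profile = [1.0 if 24 <= i < 40 else 0.0 for i in range(N)]
# Unshifted DFT ordering: index 0 is the k-space centre and N // 2 the edge.
central = [k % N for k in range(-4, 4)]
outer = list(range(N // 2 - 4, N // 2 + 4))
err_central = rms_error_after_corruption(profile, central, math.pi / 2)
err_outer = rms_error_after_corruption(profile, outer, math.pi / 2)
print(f"central samples corrupted: {err_central:.4f}, outer samples: {err_outer:.4f}")
```

Because most of the energy of a smooth object sits near the k-space centre, corrupting the central samples produces an error more than ten times larger than corrupting the same number of outer samples in this toy example.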

Images of the LVP can also be obtained during sustained phonation, although spatial resolution is compromised to account for the limited acquisition duration. While static imaging has been performed at up to 0.65 × 0.65 mm² in-plane resolution with 2.2 mm thick slices [43], or at 0.8 mm isotropic resolution with 3D imaging [95], images of the LVP during sustained phonation typically have spatial resolutions of around 0.65 × 1.0 mm² with 4 mm slices [49].

Figure 5. Dynamic imaging methods. Dynamic imaging of speech can be performed using triggered methods (a), retrospective reordering techniques (b) or real-time imaging (c). Triggered techniques repeatedly acquire a segment of k-space (a subset of the phase encode lines required for an image) while the subject produces a test phrase. The subject repeats the test phrase until enough segments have been acquired to fill k-space. The retrospective reordering technique of Shadle et al. [36] repeatedly acquires full images while the subject freely repeats the test phrase. The k-space data are retrospectively assigned to the imaging frames based on the simultaneously acquired audio data. Real-time imaging uses imaging acceleration techniques to acquire images rapidly while the subject produces the test phrase.

A.D. Scott et al. / Physica Medica 30 (2014) 604–618

A number of measurements of the static LVP are often performed, including length, distance between points of origin, thickness and angle at the points of origin. However, none of these measurements has yet been shown to be a useful measure/predictor of velopharyngeal function. Furthermore, while a number of studies have measured such parameters, the number of participants is generally small (4–12) and the majority of studies fail to provide estimates of the between-subject variance on most variables [49,93,97]. Tian et al. [43] acquired high resolution images in 17 normal subjects in the plane of the LVP during quiet breathing, and images during sustained phonation in the same oblique plane and in the axial plane. Five variables derived from the measured parameters (maximum effective velopharyngeal ratio, maximum pharyngeal constriction ratio, maximum levator shortening ratio, maximum levator stretch ratio and maximum velar height) were used to describe velopharyngeal function. These were presented as mean and standard deviation for the 17 healthy subjects imaged. In follow-up work, the same group imaged 30 children (10 normal, 10 with repaired cleft palate and normal velopharyngeal function, and 10 with repaired cleft palate and velopharyngeal insufficiency) during nasal breathing and sustained phonation [99]. Of the static measurements, the differences between normal and cleft subjects were larger than the differences between the repaired cleft patients with and without velopharyngeal insufficiency. The distance between the points where the LVP inserts into the velum is smaller in non-cleft subjects, and the pharynx is less deep. The patients with residual velopharyngeal insufficiency had larger gaps between the uvula and posterior pharyngeal wall and shorter posterior vela (the distance from the centre of the LVP to the tip of the uvula). Of the dynamic variables, the maximum effective velopharyngeal ratio (maximum of [distance from posterior of hard palate to centre of LVP/distance from posterior of hard palate to posterior pharyngeal wall]) was significantly greater in the repaired cleft group with correct velopharyngeal function than in the group with velopharyngeal insufficiency.
The maximum pharyngeal constriction ratio (1 − [minimal pharyngeal width at phonation/pharyngeal width at rest]) is significantly greater in the velopharyngeal insufficiency patients than in the non-cleft subjects.
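The two dynamic ratios above are simple quotients of distances measured on the images. The following is a minimal Python sketch of that arithmetic only; the function names and the distances in millimetres are invented for illustration and are not data from the cited studies.

```python
from typing import Sequence


def max_effective_vp_ratio(palate_to_lvp_mm: Sequence[float],
                           palate_to_wall_mm: Sequence[float]) -> float:
    """Maximum over frames of (posterior hard palate -> LVP centre) /
    (posterior hard palate -> posterior pharyngeal wall)."""
    if len(palate_to_lvp_mm) != len(palate_to_wall_mm):
        raise ValueError("one pair of distances per frame is required")
    return max(a / b for a, b in zip(palate_to_lvp_mm, palate_to_wall_mm))


def max_pharyngeal_constriction_ratio(min_width_phonation_mm: float,
                                      width_rest_mm: float) -> float:
    """1 - (minimal pharyngeal width at phonation / pharyngeal width at rest)."""
    return 1.0 - min_width_phonation_mm / width_rest_mm


# Hypothetical distances (mm), for illustration only
print(max_effective_vp_ratio([10.0, 12.0], [20.0, 20.0]))  # 0.6
print(max_pharyngeal_constriction_ratio(15.0, 20.0))       # 0.25
```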

Normative data to improve understanding of levator morphology in the presence of a cleft or other condition were acquired in a sample of 30 subjects (15 males and 15 females) [100] using a high resolution 3D technique. Significant differences between men and women were observed: LVP muscle length, distance between levator insertion points, and angle of origin were all larger in men than in women. However, variations in the relative size of the cranium or height of the individual were not proportionate to the variations observed in the levator muscle.

Other musculature in the velopharyngeal region has been identified in LVP MRI studies [97], but the potential of MRI to study these structures, including the palatoglossus and palatopharyngeus, is yet to be fulfilled.

Dynamic imaging

Imaging at rest or during sustained phonation can only provide limited information on the vocal tract configuration during normal speech. The lack of ionising radiation, excellent soft tissue visualisation and high frame rates now available mean that MR is a powerful technique for the study of both normal and pathological speech. However, achieving the optimum balance in the trade-off between SNR, temporal resolution, spatial resolution and subject tolerance is challenging. Studies have suggested that a minimum temporal resolution of 20 fps is required to capture the motion of the tongue [101] or soft palate [102], but very little work has considered the minimum spatial resolution required [103]. For evaluation of soft palate motion, temporal resolution is prioritised. Using a spiral imaging sequence, 22 fps at 1.9 × 1.9 mm² has been achieved [102,104], and using techniques widely available on clinical MR systems, 20 fps has been achieved at a reduced spatial resolution of 2.7 × 2.7 mm² [103].

Triggered acquisitions

High spatial and temporal resolution can be achieved by repeatedly acquiring a subset of the phase encode lines (a segment) required to create an image while the subject performs a speech task (see Fig. 5). The speech task is then performed again and the next segment of k-space is repeatedly acquired. This is repeated until k-space is filled ([number of phase encode lines in image/number of lines per segment] times), resulting in an image series with a temporal resolution equal to the duration of the segment. In order to synchronise the production of the speech task with the acquisition, a trigger signal is required. It is possible to generate a trigger signal from the electrocardiogram (ECG) monitor usually used for cardiac gated studies [40]. However, this means that the subject must synchronise the speech task with his or her heart rate, and changes in the speed of the task are difficult to control. Minimal additional effort is required to build a device that generates a trigger signal at a fixed, user-defined repetition rate [37,38]. Such a device can be used both to trigger the scanner via the ECG interface (usually by simulating an ECG R-wave) and to provide an audible or visual cue to the subject.
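The bookkeeping of a segmented (triggered) acquisition follows directly from the description above: the number of repetitions is the number of phase encode lines divided by the lines per segment, and each frame lasts one segment (lines per segment × TR). The sketch below is a hypothetical helper that makes this explicit; the 256-line matrix, 4 ms TR and 2 s phrase are assumed values, and trigger dead time and rests between repetitions are ignored.

```python
import math
from typing import NamedTuple


class SegmentedTiming(NamedTuple):
    repetitions: int        # times the subject repeats the speech task (per slice)
    frame_ms: float         # temporal resolution = duration of one segment
    frames_per_phrase: int  # cine frames that fit into one repetition of the phrase
    scan_time_s: float      # ignores trigger dead time and rests between repetitions


def segmented_cine_timing(n_phase_encodes: int, lines_per_segment: int,
                          tr_ms: float, phrase_duration_s: float) -> SegmentedTiming:
    repetitions = math.ceil(n_phase_encodes / lines_per_segment)
    frame_ms = lines_per_segment * tr_ms
    frames_per_phrase = int((phrase_duration_s * 1000.0) // frame_ms)
    scan_time_s = repetitions * phrase_duration_s
    return SegmentedTiming(repetitions, frame_ms, frames_per_phrase, scan_time_s)


# Hypothetical protocol: 256 phase encodes, TR = 4 ms, 2 s test phrase
for lines in (1, 8):
    print(lines, segmented_cine_timing(256, lines, 4.0, 2.0))
```

With one line per segment this reproduces the 256 repetitions mentioned below; acquiring 8 lines per segment cuts the repetitions to 32 at the cost of a longer (32 ms) frame.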

Problems arise with these techniques when the speech task is not perfectly synchronised with the acquisition, which is more likely after a large number of repetitions, or when the speech task is not perfectly reproducible. NessAiver et al. [105] analysed the timing of a repeated speech task and found differences of up to 95 ms in the onset of speech features. Many studies make little effort to minimise the number of repetitions and often acquire only one line of k-space per segment [38,39,90], resulting in acquisitions that require up to 256 repetitions of the speech task [39]. In contrast, cardiac cine imaging techniques typically acquire a number of lines per segment and sacrifice some signal to noise ratio for reduced imaging times using parallel imaging techniques [106,107]. Alternatively, one group [36,108] has developed a method of retrospectively reconstructing high temporal resolution gated images from a lower temporal resolution acquisition performed during free repetitions of the test phrase. The raw k-space data are saved and the relative temporal position of each phase encode line is identified from the spectral characteristics of a simultaneously acquired audio recording. The lines are then rebinned according to their relative temporal positions to form images.
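The rebinning step can be illustrated with a simple rule: assign each line to a frame according to the time elapsed since the most recent start of the test phrase. This is a minimal sketch, not the method of Shadle et al.: it assumes the phrase onsets have already been extracted from the audio (the spectral analysis is not shown), and the function name and arguments are hypothetical.

```python
import bisect
from collections import defaultdict


def rebin_lines(line_times_ms, line_pe_index, onset_times_ms, frame_ms, n_frames):
    """Assign each continuously acquired k-space line to a frame of a virtual cine.

    line_times_ms  : acquisition time of every line
    line_pe_index  : phase-encode index of every line
    onset_times_ms : sorted start times of each repetition of the phrase
                     (assumed to have been detected from the audio already)
    Returns {frame: {pe_index: [line numbers]}}; duplicate lines could be averaged.
    """
    bins = defaultdict(lambda: defaultdict(list))
    for n, (t, pe) in enumerate(zip(line_times_ms, line_pe_index)):
        k = bisect.bisect_right(onset_times_ms, t) - 1    # most recent onset at or before t
        if k < 0:
            continue                                       # acquired before the first repetition
        frame = int((t - onset_times_ms[k]) // frame_ms)   # relative time -> frame index
        if frame < n_frames:
            bins[frame][pe].append(n)
    return {f: dict(lines) for f, lines in bins.items()}


# Four lines, two phrase repetitions starting at 0 ms and 105 ms, 20 ms frames
print(rebin_lines([0, 10, 110, 120], [0, 1, 3, 0], [0, 105], 20, 4))
# {0: {0: [0, 3], 1: [1], 3: [2]}}
```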

Several studies have also acquired multi-slice data by acquiring additional repetitions of the test phrase. For example, Takemoto et al. [46] acquired 20 sagittal slices at 33 ms temporal resolution (1 × 2 × 6 mm³ spatial resolution) over 640 repetitions of the test phrase. The retrospective gating technique of Shadle et al. [36] is capable of similarly high spatio-temporal resolution with multi-slice imaging: three parallel sagittal slices were acquired at 1.9 × 1.9 × 5 mm³ spatial resolution and reconstructed to 16 ms temporal resolution using 288 repetitions of the test phrase.

Such techniques have been applied primarily to small groups (1–10) of healthy subjects for analysis of articulation during speech [36,47], including the calculation of area functions for acoustic modelling [40,46], and in multi-slice multi-planar acquisitions for monitoring glottal and tongue movement [39,90]. The same technique has been used for assessment of velopharyngeal function [37,38] and, in conjunction with functional MRI, to assess the brain activation associated with misarticulations in cleft palate patients [109].

Despite their flaws, gated techniques are currently the most straightforward way of acquiring dynamic vocal tract images at high temporal and spatial resolution. The highest published acquired temporal resolutions are around 17 ms for a single slice acquisition with a spatial resolution of approximately 1 × 2 × 6 mm³ [46,90]. The predominant limiting factor on temporal and spatial resolution is the subject's ability to repeat the test phrase reproducibly. However, the minimum TR of the acquisition and the motion of the articulators during this time could also limit the effective temporal and spatial resolution.

Tagged acquisitions

Very few techniques can provide information on the internal deformation of soft tissue structures. Tagged MRI was developed for tracking deformation of the myocardium during the cardiac cycle [110–113]. Such techniques tag the tissue with a spatial variation in the magnetisation available for imaging. As the tissue moves, so do the tags, and the deformation of the tags indicates the motion of the tissue. As early as 1992, such techniques were applied to study the internal deformation of the oral structures, the tongue in particular [114,115]. These early studies used a non-selective inversion recovery technique with selective reinversion of a number of planes perpendicular to the imaging plane. After this preparation phase, which is applied while the subject is in the rest position, the subject produces a sound or moves his/her tongue and holds the position. The imaging data are acquired while the subject maintains the position, and image intensity is increased in the planes containing the reinverted tissue. A more commonly used technique is spatial modulation of magnetisation or SPAMM [111], which introduces a sinusoidal variation in the magnetisation and, therefore, in the image intensity (see Fig. 6). SPAMM is typically used with cine imaging to study the internal deformation of tissue dynamically. However, the intensity of the tags fades rapidly during the cine images due to longitudinal recovery of magnetisation (time constant T1). A variation of this technique, complementary spatial modulation of magnetisation (CSPAMM), increases tag intensity and reduces tag fade by acquiring two sets of SPAMM tagged images with alternating polarity in the tag pattern and subtracting the two images [105]. Clearly, CSPAMM requires twice as much data and, therefore, the acquisition duration is twice as long.
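The effect of T1 recovery on SPAMM tags, and why subtracting the two CSPAMM acquisitions helps, can be shown with an idealised 1D model. The sketch assumes an ideal 1-1 SPAMM preparation, so that immediately afterwards Mz/M0 = cos(2πx/period), and ignores the effect of the imaging pulses; T1 = 800 ms and the 8 mm tag spacing are placeholder values. The SPAMM tag amplitude decays as exp(−t/T1) while an untagged background 1 − exp(−t/T1) grows in; in the CSPAMM difference the background cancels and the amplitude is doubled.

```python
import numpy as np

T1_MS = 800.0    # assumed tissue T1, illustrative only
PERIOD_MM = 8.0  # assumed tag spacing


def tagged_mz(x_mm, t_ms, polarity=1.0):
    """Mz/M0 a time t after an ideal 1-1 SPAMM preparation (imaging pulses ignored)."""
    decay = np.exp(-t_ms / T1_MS)
    tags = polarity * np.cos(2 * np.pi * x_mm / PERIOD_MM)
    return tags * decay + (1.0 - decay)   # tags fade while untagged signal recovers


x = np.linspace(0.0, 40.0, 401)
for t in (0.0, 300.0, 900.0):
    spamm = tagged_mz(x, t)
    # CSPAMM: subtract the acquisition with inverted tag polarity
    cspamm = tagged_mz(x, t, +1.0) - tagged_mz(x, t, -1.0)
    for name, s in (("SPAMM", spamm), ("CSPAMM", cspamm)):
        amp = np.ptp(s) / 2               # tag modulation amplitude
        offset = (s.max() + s.min()) / 2  # unmodulated (recovered) background
        print(f"t={t:5.0f} ms {name:6s} tag amplitude={amp:.2f} background={offset:.2f}")
```

In this idealised model the tag amplitude itself still decays with T1 in both cases; what CSPAMM removes is the growing background that washes out tag contrast.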

Tagged acquisitions are typically acquired as triggered acquisitions, where the tagging preparation is applied immediately after the trigger and the segmented image acquisition is performed afterwards during the speech task. The segmented nature of the acquisition means that temporal and spatial resolution depend on the number of repetitions of the test phrase, and the quality of the resultant data depends on the reproducibility of the test phrase. In order to provide two-dimensional motion information, either two tagging preparation modules are applied to form a grid-like tag pattern, or two images are acquired with perpendicular tagging directions and the grid pattern is created by combining the images.

The imaging data can be acquired using a variety of sequences [112], including non-Cartesian and hybrid echo planar imaging techniques, which have advantages for acquiring high temporal resolution data [116,117]. However, all tagged speech studies have used standard gradient echo techniques, such as spoiled fast low angle shot (FLASH) [105] or bSSFP.

Tag tracking can be performed by manual tracing [115], which is time consuming for cine acquisitions, or by automated techniques. A variety of automatic and semi-automatic tracking techniques exist [118–120], but the harmonic phase (HARP) technique is the most widespread. HARP uses SPAMM or CSPAMM data as an input. The sinusoidal variation in the image intensity is represented as an off-centre peak in k-space. Data outside a small region around the off-centre peak are nulled and a Fourier transform is then applied to the data. The magnitude of the resultant data is a low resolution anatomical image, while the phase varies linearly from 0 to 2π in the tagging direction and then wraps with the same period as the tags in the original data. A composite image is formed by multiplying the magnitude and phase of the HARP data, and motion analysis is performed by automatically following small, easy to track regions of interest.

In the upper vocal tract, tagged MRI techniques have been used to study the internal deformation of the tongue, predominantly in the mid-sagittal plane. Initial studies performed the tagging preparation, then included a delay for movement of the tongue before imaging while the subject maintained the desired tongue position [114,115,121,122]. The results were analysed with reference to the muscular anatomy of the tongue to provide indications of the possible muscles involved in producing Japanese vowels [114,115,121,123], later with electromyographic comparisons [123].


Figure 6. Images derived from tagged MRI data. Subtraction of the two CSPAMM datasets produces high intensity sinusoidal variations in image intensity (a). HARP processing of the data results in the wrapped phase image (b). Horizontally and vertically tagged images can be combined and processed to form a checkerboard-like modulation in signal intensity (c). Adapted from Parthasarathy et al. [126], Copyright Acoustical Society of America.


Napadow et al. [122] used a SPAMM technique with a FLASH imaging sequence. Strain maps in the tongue were derived for generic static tongue positions in the mid-sagittal plane (anterior protrusion and sagittal tongue curl) and in an axial slice through the tongue (lateral tongue bending). Tagged cine MRI using a segmented imaging technique was first used to analyse tongue deformation during production of the sound /ka/ in one example subject [124]. This work used a triggered acquisition in the mid-sagittal plane and manual tracking of tags. The muscles involved in the production of /k/ and then /a/ were inferred from the regional strains calculated from the images, although the technique cannot rule out passive contraction of a particular muscle caused by a neighbouring fibre bundle. Liu et al. [125] used HARP analysis of multiple sagittal and axial images to derive 3D tongue motion. Parthasarathy et al. [126] applied HARP processing to CSPAMM cine tagged MRI of the tongue acquired in three perpendicular planes. Two images (one horizontally tagged and one vertically tagged) were acquired for each of 26 slices while the single subject repeated /disuk/ 416 times. The automated nature of HARP analysis meant that the trajectories of individual points, strains, principal strains and velocity fields could be calculated for every frame and every slice. Stone et al. [127] imaged 8 subjects using a CSPAMM tagged technique while the subjects said /i/ then /u/, and processed the data using HARP. The 8 datasets included 3 acquired in the same subject, three different native languages and one patient who had a previous glossectomy. Principal component analysis was used to represent tongue motion as a mean velocity field with dataset-to-dataset variations. The results demonstrated large intra-subject differences in the same test phrase, and similarities in vowel production between a patient with a glossectomy and a native Japanese speaker, and between native Tamil and American English speakers.
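Representing a set of velocity fields as a mean field plus dataset-to-dataset variations is what a principal component analysis provides. The sketch below shows the generic SVD-based PCA on synthetic random fields (8 datasets of 10 × 12 in-plane velocity vectors, chosen only to echo the number of datasets above); it is not the processing pipeline of Stone et al., and the function name is hypothetical.

```python
import numpy as np


def velocity_pca(fields):
    """PCA of velocity fields stacked along axis 0 (one entry per dataset).

    Returns the mean field, principal components (rows), per-dataset scores and
    the fraction of variance explained by each component."""
    X = fields.reshape(fields.shape[0], -1)   # datasets x (voxels * velocity components)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U * s                            # coordinates of each dataset along each component
    explained = s**2 / np.sum(s**2)
    return mean, Vt, scores, explained


rng = np.random.default_rng(0)
fields = rng.normal(size=(8, 10, 12, 2))      # synthetic data, not measurements
mean, components, scores, explained = velocity_pca(fields)
print(np.round(explained, 3))
```

Each dataset is recovered exactly as the mean plus its scores times the components; truncating to the first few components keeps the dominant modes of variation.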

Real-time acquisitions

Real-time imaging techniques

In contrast to triggered acquisitions, real-time imaging does not require reproducible and synchronised repeated speech tasks, but achieving sufficient temporal and/or spatial resolution with adequate SNR is even more challenging. A variety of imaging sequences, each with its own strengths and weaknesses, have been used. Sequences widely available on clinical MRI scanners typically achieve low temporal resolutions (2–10 fps), but are easily implemented and well understood. TSE sequences with the zoom [128] and partial Fourier techniques have been used to achieve around 6 fps [65,88,129,130]. The zoom technique applies the excitation and refocusing pulses of the spin echo perpendicular to each other, so that only a reduced FOV in the phase encoding direction is excited, saving imaging time. The partial Fourier technique [131] acquires only slightly more than half of the phase encode lines and uses the conjugate symmetry of k-space to recover the rest of the data. Gradient echo sequences have also been used, typically achieving low temporal resolutions (2–10 fps [33,132–134]), although higher temporal resolutions are common in cardiac MRI studies [135,136]. Rapid gradient echo acquisitions can be subdivided depending on how the remaining transverse magnetisation is dealt with at the end of each repetition time (TR). Radiofrequency (RF) spoiled gradient echo techniques, or FLASH (fast low angle shot) type sequences [137], discard the remaining transverse magnetisation by modifying the phase of every excitation pulse. Alternatively, without RF spoiling, the remaining transverse magnetisation may be used in the next TR to improve SNR. This variant, which we will refer to as steady state free precession (SSFP) [138], requires a constant integral of the gradient waveform on each axis in each TR. While the SNR of the resultant images is increased with respect to FLASH images, motion during imaging results in signal loss. The balanced SSFP technique (bSSFP) [139,140] uses additional gradients to fully refocus the magnetisation in each TR, further improving SNR. Such sequences are motion resilient and very SNR efficient, but are sensitive to inhomogeneities in the magnetic field caused by imperfections in the magnet or the presence of objects in the field, e.g. a patient. Both FLASH [133,141,142] and bSSFP [33,59,143] acquisitions have been used for real-time imaging of the vocal tract in the mid-sagittal plane. While FLASH acquisitions may be slightly faster, their lower relative SNR may mean that more data must be acquired to compensate.
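The conjugate symmetry used by partial Fourier acquisition can be demonstrated directly: for a real-valued object, K(−k) = conj(K(k)), so unacquired lines on one side of k-space can be copied from their mirror images. The sketch below assumes a strictly real test object; real MR images carry phase, which is why practical partial Fourier reconstructions use phase-corrected methods such as homodyne or POCS, not attempted here. An odd number of rows (63) is used so that every missing row has an acquired mirror.

```python
import numpy as np


def partial_fourier_fill(kspace, n_extra):
    """Fill unacquired rows (ky < -n_extra) using Hermitian symmetry K(-k) = conj(K(k)).

    kspace is in unshifted FFT order; exact only for a real-valued object."""
    ny, nx = kspace.shape
    ky = np.rint(np.fft.fftfreq(ny) * ny).astype(int)    # integer ky of each row
    cols = (-np.arange(nx)) % nx                          # column index holding -kx
    filled = kspace.copy()
    for row in np.flatnonzero(ky < -n_extra):
        filled[row] = np.conj(kspace[(-row) % ny, cols])  # row (-row) % ny holds +|ky|
    return filled


rng = np.random.default_rng(1)
image = rng.random((63, 64))                        # real-valued test object
full = np.fft.fft2(image)
acquired = np.rint(np.fft.fftfreq(63) * 63) >= -4   # slightly more than half of the rows
partial = full * acquired[:, None]
recon = np.fft.ifft2(partial_fourier_fill(partial, 4)).real
print(acquired.sum(), "of 63 rows acquired; max error:", np.abs(recon - image).max())
```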

The highest temporal resolutions have been achieved using non-Cartesian techniques, which sample k-space in a non-rectilinear fashion. The most commonly used non-Cartesian trajectories are radial and spiral, which are most often employed in spoiled (FLASH-like) sequences. Spiral trajectories can be time efficient, as a relatively large proportion of k-space can be sampled in each TR. However, the images are prone to artefacts [144], including those due to off-resonant spins caused by magnetic field inhomogeneity or chemical shift (i.e. fat). These artefacts appear as a blurring of the off-resonant tissue in all directions and become more severe with longer spiral trajectories. Methods do exist for reducing the image degradation [145], but such techniques make the already non-standard reconstruction more complicated and are, therefore, typically only used in speech studies at 3 T, where off-resonance effects are more severe. Radial acquisitions [146] are less sensitive to off-resonance artefacts but are inherently inefficient. In practice, radial acquisitions are usually undersampled, meaning that too few k-space profiles are acquired to fulfil the Nyquist criterion at the k-space periphery. In Cartesian imaging, undersampling results in replicates of the imaged object appearing within the imaged field of view. In contrast, undersampled radial imaging results in streaking artefacts, which are usually considered less detrimental to diagnostic image quality [147,148]. Radial acquisitions are also relatively motion insensitive, as the centre of k-space is heavily oversampled. As for spiral imaging, radial trajectories are primarily confined to research institutions and are not typically available on the standard clinical MRI scanners found in general radiology departments.
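The inefficiency and undersampling of radial acquisitions can be quantified. With N readout samples across the diameter and N_s full-diameter spokes spread over 180°, neighbouring spokes are separated at the edge of k-space by about (N/2)·Δk·π/N_s; requiring this to be at most one sample spacing Δk gives N_s ≥ πN/2, roughly 57% more profiles than the N lines of the equivalent Cartesian acquisition. The helper names and spoke counts below are illustrative assumptions.

```python
import math


def radial_nyquist_spokes(n_readout: int) -> int:
    """Full-diameter spokes needed so that neighbouring spokes are no more than one
    k-space sample apart at the edge of k-space: N_spokes >= (pi / 2) * N_readout."""
    return math.ceil(math.pi / 2 * n_readout)


def radial_undersampling(n_readout: int, n_spokes: int) -> float:
    """Undersampling factor relative to the radial Nyquist criterion."""
    return radial_nyquist_spokes(n_readout) / n_spokes


for spokes in (202, 50, 15):  # hypothetical spoke counts per frame for a 128-sample readout
    print(spokes, "spokes ->", round(radial_undersampling(128, spokes), 1), "x undersampled")
```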

Application of clinically available real-time techniques

Despite the relatively low temporal resolutions available, a number of dynamic speech studies have been performed using clinically available real-time techniques. As described above, TSE zoom imaging with partial Fourier acquisition has been used for imaging the vocal tract at low spatiotemporal resolution (2 × 4 × 6 mm³ at 4–6 fps) [65]. The same technique has also been used to study velopharyngeal closure [129] at a resolution of 1.5 × 3 × 6 mm³ at 5–6 fps, with imaging performed in the mid-sagittal plane, in the axial plane at the height of maximal velopharyngeal closure and in the coronal plane. In a comparison with X-ray videofluoroscopy in 8 subjects, there was complete agreement in the pattern of velopharyngeal closure.

The earliest real-time gradient echo studies of the vocal tract [149] used FLASH sequences, had poor temporal resolution (3 fps) and low spatial resolution (3 × 6 × 8 mm³), and were used to study vocal tract configurations during sustained vowel production in the mid-sagittal plane and in axial slices. While dynamic imaging during sustained phonation avoids the blurring and artefacts caused by a failure to maintain the phonation completely during a static acquisition, it does not fully represent normal speech. Higher spatio-temporal resolutions were made possible by the introduction of parallel imaging techniques (SENSE or GRAPPA) and improved scanner hardware [132,141]. Parallel imaging techniques acquire undersampled data with multiple receive coils, each with a spatially varying sensitivity profile [106,107]. The data from each of the coils, together with information on the coil sensitivity profiles, are used to resolve the aliasing. Echternach et al. [132,150–152] have achieved around 8 fps at 1.4 × 2.2 × 11 mm³ using GRAPPA [153] and FLASH gradient echo imaging to evaluate vocal tract configurations in opera singers during register changes. Alternatively, the additional performance obtained using parallel imaging has been used to improve spatial resolution (1 × 1 × 6 mm³ at 2 fps) in studies assessing velopharyngeal function in children with velopharyngeal insufficiency [34,142]. Drissi et al. [33] also used a higher spatial resolution at the same temporal resolution (2 fps) with a bSSFP sequence for assessment of velopharyngeal function in children. However, it is debatable whether such high in-plane spatial resolution is required to determine whether velopharyngeal closure occurs. For assessment of velopharyngeal closure during normal speech, higher temporal resolution is certainly required, and the motion blurring at 2 fps is likely to degrade the effective spatial resolution.
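How coil sensitivities resolve aliasing can be seen in an idealised, noiseless 1D SENSE example with a reduction factor of 2: keeping every other k-space line halves the FOV, so each folded pixel y contains object pixels y and y + N/2, weighted by each coil's sensitivity, and a small linear system per pixel separates them. The two linear sensitivity profiles and the 8-pixel object are hypothetical; GRAPPA, also mentioned above, instead works in k-space and is not shown.

```python
import numpy as np


def sense_unfold_r2(aliased, sens):
    """Unfold a 2x-undersampled 1D profile.

    aliased: (n_coils, N/2) folded coil images; sens: (n_coils, N) coil sensitivities.
    Pixel y of each folded image contains pixels y and y + N/2 of the full FOV."""
    n_coils, half = aliased.shape
    out = np.zeros(2 * half, dtype=complex)
    for y in range(half):
        S = sens[:, [y, y + half]]   # encoding matrix for the two overlapping pixels
        rho, *_ = np.linalg.lstsq(S, aliased[:, y], rcond=None)
        out[[y, y + half]] = rho
    return out


rng = np.random.default_rng(2)
N = 8
obj = rng.random(N)                              # true object profile
sens = np.vstack([np.linspace(1.0, 0.2, N),      # two hypothetical coil sensitivities
                  np.linspace(0.2, 1.0, N)])
coil_k = np.fft.fft(sens * obj, axis=1)          # fully sampled k-space for each coil
aliased = np.fft.ifft(coil_k[:, ::2], axis=1)    # every other line: half FOV, folded
recon = sense_unfold_r2(aliased, sens)
print(np.abs(recon - obj).max())
```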
One potential method for improving spatial or temporal resolution in real-time soft palate imaging is to use adaptive averaging [154]. Increased temporal or spatial resolution can be traded for reduced SNR, which is then recovered retrospectively. The adaptive averaging algorithm identifies frames where the soft palate is in a similar configuration and selectively averages these frames to improve SNR. This technique is demonstrated in Supplementary data file 3.
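The idea of selectively averaging frames in which the soft palate has a similar configuration can be sketched with a correlation threshold inside a region of interest. This is a hypothetical illustration, not the published algorithm [154]: the Pearson-correlation similarity measure, the 0.95 threshold and the synthetic "open/closed" frames are all assumptions.

```python
import numpy as np


def adaptive_average(frames, roi, threshold=0.95):
    """Average each frame with all frames whose ROI content is highly correlated with it.

    frames: (T, H, W) real-time series; roi: boolean (H, W) mask around the soft palate."""
    v = frames[:, roi].astype(float)
    v -= v.mean(axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # assumes the ROI is not constant in any frame
    corr = v @ v.T                                  # Pearson correlation between every pair of frames
    out = np.empty(frames.shape, dtype=float)
    for t in range(frames.shape[0]):
        similar = corr[t] >= threshold              # always includes frame t itself
        out[t] = frames[similar].mean(axis=0)
    return out


rng = np.random.default_rng(0)
open_palate, closed_palate = rng.random((2, 16, 16))          # two synthetic configurations
frames = np.stack([open_palate, closed_palate] * 3)           # alternating, 6 frames
frames = frames + 0.02 * rng.normal(size=frames.shape)        # added noise
smoothed = adaptive_average(frames, np.ones((16, 16), dtype=bool))
```

Frames in the same configuration are averaged together (here frames 0, 2, 4 and frames 1, 3, 5), improving SNR without blurring across different palate positions.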

A few studies have used bSSFP techniques to acquire real-time vocal tract images for swallowing [143], assessment of velopharyngeal function [33,103] or assessment of vocal cord configuration [59]. For assessment of velopharyngeal function, Scott et al. [103] compared sequences with spatial resolutions between 1.6 × 1.6 × 10 mm³ and 2.7 × 2.7 × 10 mm³ for temporal resolutions between 9 and 20 fps, using SENSE parallel imaging and partial Fourier at both 1.5 T (bSSFP) and 3 T (SSFP). bSSFP sequences were too sensitive to off-resonance effects to use at 3 T, but SSFP sequences produced reliably good images. At 1.5 T, bSSFP failed in some cases, but often produced very good images. For assessment of palate motion the highest temporal resolution was preferred, whereas when assessment of morphology was required some temporal resolution was traded for increased spatial resolution. Examples of acquisitions at 1.5 T (10 fps) and 3 T (20 fps) are shown in the Supplementary data files (files 1 & 2). The only other published comparison of techniques for real-time imaging of the vocal tract compared hybrid echo-planar imaging (hEPI) sequences to FLASH acquisitions for assessment of swallowing [134]. While the hEPI sequence was rejected for imaging swallowing due to artefacts, other studies [154] were able to obtain high temporal resolution with few artefacts when imaging the soft palate, by using a longer echo train length (~11, cf. 3 in Anagnostara et al. [134]) with SENSE and partial Fourier techniques on a modern 1.5 T scanner. Images acquired with these methods are shown in Supplementary data file 3.

Applications of non-Cartesian imaging techniques

Initial studies [101,155–157] using spiral imaging in the upper vocal tract obtained acquired spatial and temporal resolutions similar to those possible with Cartesian techniques (2.7 × 2.7 × 5 mm³ at 9 fps). However, as multiple spiral trajectories (each rotated with respect to the last) are required to form a complete image, higher frame rate images can be reconstructed using a sliding window technique (see Fig. 7). In the initial spiral work, mid-sagittal images of the vocal tract during speech were acquired at 9 fps and reconstructed using a sliding window at 24 fps [101]; example videos acquired using this technique are available at http://sail.usc.edu/production/rtmri/jasa2004/. This protocol was also successfully applied in a real-time study of sound production by a male beatbox artist [158]; movie files of different sounds are available at http://sail.usc.edu/span/beatboxing/. While each sliding window image has the temporal resolution of the original acquisition, the time difference between the starts of successive frames can be much shorter. A spiral sliding window technique [41] was also used to study the timing differences between motion of the tongue tip and soft palate from mid-sagittal images. Timing differences were demonstrated between the movement of the tongue and the palate, the duration of which depended on the position of a nasal consonant sound, [n] [159], in the word and on the location of the stress in the word. In another study [160], a similar MRI protocol was applied to investigate articulatory setting in speech production at 22.4 fps. The study was performed on five healthy volunteers and showed significant differences between vocal tract postures adopted during inter-speech pauses and those at absolute rest before speech.
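The sliding window bookkeeping is simple arithmetic: each reconstructed frame uses a full set of interleaves, but successive frames start only a few interleaves apart. The sketch below reproduces the Fig. 7 example (24 interleaves per full image, window advanced in steps of 4, giving 5 intermediate frames between independent images); the TR and interleaf counts in the frame-rate example are assumed values.

```python
def sliding_window_frames(n_acquired, interleaves_per_image, step):
    """(first, last) interleaf index for each sliding-window frame."""
    return [(start, start + interleaves_per_image - 1)
            for start in range(0, n_acquired - interleaves_per_image + 1, step)]


def frame_rates(tr_ms, interleaves_per_image, step):
    """Acquired (independent) and reconstructed frame rates in frames per second."""
    acquired_fps = 1000.0 / (tr_ms * interleaves_per_image)
    reconstructed_fps = 1000.0 / (tr_ms * step)
    return acquired_fps, reconstructed_fps


frames = sliding_window_frames(48, 24, 4)   # two full k-space acquisitions, as in Fig. 7
print(len(frames), frames[:2], frames[-1])  # 7 [(0, 23), (4, 27)] (24, 47)
print(frame_rates(5.0, 20, 4))              # (10.0, 50.0) for an assumed 5 ms TR
```

Each reconstructed frame still integrates data over the whole window, so the reconstructed frame rate exceeds the true temporal resolution, as noted above.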

In order to acquire spiral data at higher temporal resolution in the mid-sagittal plane, studies have used a reduced FOV [104]. Signal in the brain and posterior section of the neck is eliminated using saturation bands, preventing it from aliasing back into the reduced FOV. The data were acquired at 3 T, where the signal to noise ratio is higher, and an alternating TE was used to provide data for field inhomogeneity corrections. The average acquired frame rate was 21 fps (1.9 × 1.9 × 6.5 mm³ spatial resolution), including the time for the saturation pulses, and the data were reconstructed to 30 fps. This technique was used to investigate the influence of the velopharyngeal mechanism on the acoustic characteristics of a nasalised consonant in a dynamic setting.

Figure 7. Spiral acquisition with sliding window reconstruction. In this example, k-space is fully sampled every 24 interleaved spiral acquisitions. By sliding the reconstruction window in steps of 4 spirals, it is possible to reconstruct 5 intermediate frames between each full k-space acquisition.

An alternative way of accelerating spiral acquisitions is to use parallel imaging, which is more complex than for Cartesian data. Kim et al. [161] used a golden angle spiral trajectory [162], which results in interleaves with almost equal angular spacing for any number of interleaves. Images at multiple temporal resolutions were reconstructed, albeit with lower SNR at higher frame rates. The final combined dataset had higher temporal resolution in more dynamic regions and lower temporal resolution, but better SNR, in stationary areas. In another study using accelerated radial imaging, Niebergall et al. [163] achieved 30 fps at 1.5 × 1.5 × 10 mm³ resolution for imaging the vocal tract in the mid-sagittal plane and the larynx in the coronal plane. An example from this study is shown in the Supplementary data (file 4). In contrast to regular parallel imaging techniques, the coil sensitivity information was calculated iteratively during the parallel imaging reconstruction [164]. Temporal and spatial filters were applied retrospectively [28]. The authors were able to demonstrate differences in tongue contours between production of vowel sounds individually, within a single word and within a sentence. The results also demonstrated nasalisation of a vowel sound (i.e. lowering of the velum) prior to a nasal consonant.
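Why a golden angle increment gives near-uniform coverage for any number of interleaves can be checked numerically: by the three-gap theorem, consecutive multiples of a fixed angle never leave more than three distinct gap sizes around the circle, and for the golden angle these gaps stay close to uniform spacing. The sketch assumes an increment of 360°·(1 − 1/φ) ≈ 137.5°, appropriate for interleaves that span a full 360° (for diametric radial spokes over 180° the analogous increment is about 111.25°); the exact increment used by Kim et al. [161] is not stated here.

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2
GOLDEN_ANGLE_DEG = 360.0 * (1 - 1 / PHI)   # ~137.5 deg, for interleaves spanning 360 deg


def interleaf_angles(n, increment_deg=GOLDEN_ANGLE_DEG):
    """Rotation angle of each of the first n interleaves."""
    return np.mod(np.arange(n) * increment_deg, 360.0)


def angular_gaps(angles_deg):
    """Gaps between neighbouring angles around the circle (including the wrap-around gap)."""
    a = np.sort(angles_deg)
    return np.diff(np.append(a, a[0] + 360.0))


def n_distinct(values, tol=1e-6):
    """Number of distinct values, treating values closer than tol as equal."""
    v = np.sort(values)
    return 1 + int(np.sum(np.diff(v) > tol))


for n in (5, 13, 20, 34):
    gaps = angular_gaps(interleaf_angles(n))
    # ratio of the largest gap to the gap of a perfectly uniform n-interleaf design
    print(n, "interleaves: max gap / uniform gap =", round(gaps.max() / (360.0 / n), 2),
          "| distinct gap sizes:", n_distinct(gaps))
```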

Other recent work has focussed on obtaining real-time speech data in multiple orientations [163,165,166]. This has been achieved using three perpendicular slices, acquired as interleaved radial trajectories, resulting in a temporal resolution of 156 ms at 2.4 × 2.4 × 6 mm³. A real-time acquisition of three planes from Ref. [165] is shown in the Supplementary data (file 5). A reduced FOV (20 × 20 cm²) was enabled by using a custom receive coil (Fig. 8), and SNR is boosted because the interleaved slice ordering results in a longer effective TR and, therefore, increased recovery of longitudinal magnetisation. This was used to show temporal changes in tongue grooving during natural speech. Similar temporal and spatial resolutions with perpendicular or even parallel slices could be achieved using segmented Cartesian techniques, hybrid EPI for example, with parallel imaging. Alternatively, promising initial results demonstrating high frame rates (20 fps, 2.2 × 2.2 × 8 mm³) with 5 parallel sagittal slices, obtained using an interleaved spiral–Cartesian trajectory and an iterative reconstruction, were recently presented [167]. Another group [168] used audio information to align 2D dynamic data into 3D dynamic movies of vocal tract shaping. This technique allowed reconstruction of 3D vocal tract movies with 2.4 × 2.4 × 3 mm³ spatial resolution and 78 ms temporal resolution. The same group [166] is working towards a true real-time 3D technique using a stack-of-spirals trajectory and has achieved 17 sagittal images (or kz phase encode steps) at 2.4 × 2.4 × 3.0 mm³ resolution with a temporal resolution of 1 s. Substantial acceleration is expected with the inclusion of parallel imaging and highly undersampled trajectories. Model-based methods enable reconstruction from heavily undersampled data, based on some necessary assumptions. Such an approach was applied to dynamic 3D vocal tract imaging [169] using an undersampled stack-of-spirals imaging sequence and a reconstruction based on a partially separable model [170]. The resultant images had a frame rate of 8.6 fps (roughly 8× acceleration) and an effective spatial resolution of 3 × 3 × 3.1 mm³.
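A partially separable model of order L assumes that the image series can be written as x(r, t) = Σ_{l=1..L} c_l(r)·φ_l(t), i.e. that the Casorati matrix (voxels × time frames) has rank at most L. The sketch below only illustrates this model assumption on fully sampled synthetic data via a truncated SVD; the published method [169,170] estimates the model from undersampled data, which this sketch does not attempt. The rank-3 synthetic movie is hypothetical.

```python
import numpy as np


def low_rank_approx(movie, rank):
    """Best rank-`rank` approximation of an image series (T, ...) via truncated SVD.

    Rows of the Casorati matrix are voxels and columns are time frames; a partially
    separable model of order L says this matrix has rank <= L."""
    T = movie.shape[0]
    casorati = movie.reshape(T, -1).T              # voxels x time
    U, s, Vt = np.linalg.svd(casorati, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # keep the dominant spatial/temporal pairs
    return approx.T.reshape(movie.shape)


rng = np.random.default_rng(3)
spatial = rng.random((3, 16, 16))   # 3 synthetic spatial basis images
temporal = rng.random((40, 3))      # 3 synthetic temporal basis functions
movie = (temporal @ spatial.reshape(3, -1)).reshape(40, 16, 16)   # exactly rank 3
for L in (1, 2, 3):
    err = np.linalg.norm(low_rank_approx(movie, L) - movie) / np.linalg.norm(movie)
    print(L, round(err, 6))
```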

Other considerations

Acoustic recording

Audio recorded simultaneously during dynamic acquisitions allows verification of correct pronunciation, acoustic analysis of the sounds actually spoken, and audio–video synchronisation. However, this is challenging because of the loud noise of the MR scanner during imaging and the presence of strong, time-varying magnetic fields, together with the requirement to isolate the MRI receive equipment from electromagnetic interference. Standard electrical microphones are generally unsuitable for these reasons, although electrical microphones were used in early work at low field [84].

As a result of the difficulties in recording sound during imaging, many studies, particularly of sustained phonation [32,46,48,60,69,78,124], have recorded sound outside the scanner after imaging. This can provide high-quality recordings for acoustic analysis, but the recordings are difficult to synchronise with the imaging data and the phonation may vary between the two sessions.

It is standard for MRI scanners to be provided with an intercom system for communication with the patient between acquisitions. However, such systems are not designed for communication during imaging, and it is difficult to hear the patient speak during many imaging sequences. Despite this, some of the earliest work recorded speech via the intercom [108,171], but the results are often unsatisfactory [40]. Another workaround is to transfer the sound to a microphone in the scanner control room using plastic tubing and a cup held in front of the subject [34].

Many recent real-time studies use optical microphone systems, which relay the audio signal from the scanner bore to the control room using fibre optics. The microphone itself consists of two optical fibres and a membrane. The first fibre carries source light into the microphone, where the membrane modulates the intensity of the light, which is carried back out of the scanner in the second fibre. The intensity of the returned light depends on the displacement of the membrane. The light source and detector/converter are located in the control room. The microphone can be constructed to have a figure-of-eight spatial response, so that sound entering from both the front and the back is significantly attenuated, whereas sound from only one direction is not. Such devices have been used alone [142], but more commonly with adaptive noise cancellation algorithms [105] to reduce the residual scanner noise. More advanced systems [104,155] use a second optical fibre microphone to record a reference audio signal, which is used to reduce the scanner noise.

Figure 8. A 16-channel receive coil array for accelerated upper airway MRI at 3 T. (a) Photograph of the 16-channel coil array. The coil array can be held close to the face (b) or folded up and back to permit patient entry and exit. (c) Illustration of coil sensitivity and the upper airway regions of interest (ROIs) used in the evaluation of SNR (1 – upper lip, 2 – lower lip, 3 – front tongue, 4 – mid tongue, 5 – back tongue, 6 – palate, 7 – velum, and 8 – pharyngeal wall). Adapted from Kim et al. [174], Copyright Wiley-Liss, Inc.

A.D. Scott et al. / Physica Medica 30 (2014) 604–618
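The reference-microphone approach to adaptive noise cancellation can be illustrated with a normalised least-mean-squares (NLMS) filter, a standard adaptive-filter choice; the cited systems may use different algorithms. The sketch assumes a primary channel (speech plus scanner noise at the subject's microphone) and a reference channel that picks up scanner noise only. The filter learns the acoustic path from the reference to the primary channel and subtracts its noise estimate. The synthetic signals (a 220 Hz tone standing in for speech, white noise standing in for scanner noise, a short hypothetical FIR acoustic path) and the parameters `n_taps` and `mu` are illustrative assumptions.

```python
import numpy as np


def nlms_cancel(primary, reference, n_taps=32, mu=0.1, eps=1e-8):
    """Return `primary` minus an adaptive NLMS estimate of the noise it contains."""
    w = np.zeros(n_taps)    # adaptive FIR model of the reference -> primary path
    buf = np.zeros(n_taps)  # most recent reference samples, newest first
    cleaned = np.empty_like(primary)
    for n in range(len(primary)):
        buf[1:] = buf[:-1]
        buf[0] = reference[n]
        noise_est = w @ buf
        err = primary[n] - noise_est             # error signal = speech estimate
        w += mu * err * buf / (eps + buf @ buf)  # normalised LMS weight update
        cleaned[n] = err
    return cleaned


fs = 8000                                        # assumed audio sampling rate (Hz)
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(1)
speech = 0.3 * np.sin(2 * np.pi * 220 * t)       # stand-in for the voice
reference = rng.standard_normal(t.size)          # stand-in for scanner noise
path = np.array([0.6, -0.3, 0.2, 0.1])           # hypothetical acoustic path
noise_at_mic = np.convolve(reference, path)[: t.size]
primary = speech + noise_at_mic

cleaned = nlms_cancel(primary, reference)
half = t.size // 2                               # skip the convergence phase
before = np.mean(noise_at_mic[half:] ** 2)
after = np.mean((cleaned[half:] - speech[half:]) ** 2)
print(f"noise reduction: {10 * np.log10(before / after):.1f} dB")
```

Real scanner noise is far louder and strongly periodic, and any block-based or adaptive processing adds latency, which matters if the cleaned audio is fed back to the subject.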

The high audible noise levels during imaging also mean that subjects may find it difficult to hear their own speech. This can lead to louder and, therefore, unnatural speech. To reduce this effect, studies of sustained phonation have played target sounds during imaging [84], and other work has provided audio feedback of the subject's own voice [132]. In the latter case, however, the latency of any noise cancellation algorithm used must be considered.

Synchronisation

As there is a finite reconstruction time for each MR image and a latency associated with many noise cancellation algorithms, simply capturing the video stream from the scanner graphics card together with the sound from a fibre optic microphone system is usually inadequate for analysis. Synchronising the recorded audio with video created from the MR images requires knowledge of the time at which each image was acquired and of the timing of the recorded audio. Synchronisation of real-time images with the audio can be performed using the audible start and end times of the scanner noise together with accurate information on the sequence timings from a scanner simulator [104]. Alternatively, it may be possible to record a timing trigger signal from the scanner corresponding to known positions in the image acquisition [155], which can be converted into an audio trigger and recorded simultaneously with the speech audio in a separate channel [154] or used to trigger the start of the audio recording [172]. Automatic synchronisation has been performed using custom software written to run on the scanner host and reconstruction computers [142,173].
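As a sketch of the separate-trigger-channel approach, assume the scanner timing pulse has been recorded as a second audio channel alongside the speech, with one pulse at the start of each image frame and a known, constant frame duration. Rising edges in the trigger channel then give frame start times, and each frame can be paired with the matching segment of speech audio. The threshold, the one-pulse-per-frame convention, the 78 ms frame duration and the helper names are assumptions for illustration. Which acquisition event a real trigger marks depends on the sequence and vendor.

```python
import numpy as np


def trigger_onsets(trigger, threshold=0.5):
    """Sample indices where the trigger channel rises above `threshold`."""
    above = trigger > threshold
    return np.flatnonzero(~above[:-1] & above[1:]) + 1


def audio_for_frames(speech, onsets, frame_len):
    """Slice the speech channel into one audio segment per image frame."""
    return [speech[start:start + frame_len] for start in onsets]


fs = 44100                        # assumed audio sampling rate (Hz)
frame_s = 0.078                   # assumed frame duration (s)
frame_len = round(frame_s * fs)   # frame duration in audio samples
n = 5 * fs
trigger = np.zeros(n)
true_starts = np.arange(fs // 2, n - frame_len, frame_len)  # first frame at 0.5 s
for s in true_starts:
    trigger[s:s + 100] = 1.0      # 100-sample TTL-like pulse per frame
speech = np.random.default_rng(2).standard_normal(n)  # placeholder audio

onsets = trigger_onsets(trigger)
segments = audio_for_frames(speech, onsets, frame_len)
print(f"{len(onsets)} frames; frame 0 starts at {onsets[0] / fs:.3f} s")
```

Recording the trigger in the same audio file as the speech, rather than relying on the scanner and audio clocks agreeing, keeps both signals on a single time base.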

Receive coil selection

The receive coil used for imaging has an important role in determining the imaging field of view and has a large influence on the image SNR. Phased array coils are standard and are a requirement for parallel imaging techniques. Unless there is a specific

reason to use a birdcage coil (to enable transmit/receive, for example [104]), a phased array coil should be preferred. While the coverage of a standard head coil may not extend inferiorly enough to cover the vocal tract, except in children [33], head and neck or neurovascular arrays usually cover this region and are often the receive coil of choice for the vocal tract [103,132,142]. However, using a coil with a more localised sensitivity profile can increase SNR or allow a reduced FOV and, therefore, faster imaging. Commercially available coils for carotid artery imaging have been used to image the larynx [59], but they have a low penetration depth and so are not suitable for imaging the other, deeper structures of the vocal tract. To improve sensitivity, a small flexible phased array coil has been combined with a bilateral array coil placed on both sides of the neck; in fact, the fastest real-time acquisitions [163] were obtained with similar coils.

To target specific regions of the vocal tract, purpose-built receive coils have been used. Dedicated phased array coils have been developed for larynx imaging as an alternative to carotid artery coils [39,58]; they cover only the relevant area and can be optimised for imaging at the depth of the larynx. Dedicated coils have also been developed for imaging the oral and velopharyngeal regions of the upper vocal tract [60,173,174]. While standard head or head and neck coils encompass large volumes and have coil elements distributed around this volume, the custom coils tend to cover only the front of the head, from the nose to the upper neck. An example of such a coil, designed by Kim et al. [174] and comprising 16 coil elements, is illustrated in Fig. 8. Compared with a neurovascular array coil, it increases the SNR in the upper airway regions (Fig. 8c) by a factor of between 1.5 (posterior pharyngeal wall) and 8.8 (lower lip) for unaccelerated imaging. However, such specialised coils are not commercially available and must be designed and built in-house, requiring collaboration from the scanner manufacturer.
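The trade-off between the SNR gain of a dedicated coil and faster imaging can be made explicit with the standard parallel-imaging approximation $\mathrm{SNR}_R = \mathrm{SNR}_1/(g\sqrt{R})$, where $R$ is the acceleration factor and $g \ge 1$ the coil geometry factor. This is a textbook relation used here for illustration, not an analysis reported in [174]. Writing $G$ for the unaccelerated SNR gain of the dedicated coil over the reference coil, and requiring accelerated dedicated-coil images to be no noisier than unaccelerated reference-coil images:

$$
\frac{G\,\mathrm{SNR}_1^{\mathrm{ref}}}{g\sqrt{R}} \;\ge\; \mathrm{SNR}_1^{\mathrm{ref}}
\quad\Longleftrightarrow\quad
R \;\le\; \left(\frac{G}{g}\right)^{2}
$$

With the figures quoted above and an ideal $g = 1$, $G = 1.5$ (posterior pharyngeal wall) allows $R \le 2.25$, whereas $G = 8.8$ (lower lip) gives $R \le 77$, far beyond the 16 receive elements that bound SENSE-type unfolding. Because $g$ grows with $R$ in practice, the usable acceleration is set by the deepest region of interest rather than by the most superficial one.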

The effects of supine patient positioning

The large majority of MRI scanners are of the horizontal-bore type, and the patient lies supine for vocal tract imaging. While this scanner arrangement allows high field strengths (typically 1.5–7 T), the supine position means that gravity acts in the anterior–posterior direction rather than in the superior–inferior direction, as it does when the patient is upright. The effect of supine positioning on


the positioning of the speech articulators is subject specific and, therefore, difficult to generalise [175,176]. Studies using ultrasound [175], electromagnetometry [177], X-ray microbeam [178], horizontal-bore [130] and open MRI systems [176] have suggested that the predominant tongue motion in the supine position is a displacement of the tongue body towards the posterior pharyngeal wall, although this is not universally true. This gravity-induced displacement can be described by a subject-specific rigid-body translation [175]. However, there appear to be compensatory mechanisms that protect airway patency and allow normal speech. Acoustic analysis of supine vs. upright vowel sounds [175] found little difference between the formants produced, and the tongue motion for specific sounds was similar. A comparison of velar positioning and LVP geometry during phonation [97] found little difference (although there were only four subjects) between upright and supine positioning, apart from a small, statistically significant reduction in velar height during production of /i/.
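Under the rigid-translation description above, and assuming point correspondences between upright and supine tongue contours are available (an assumption for illustration; the cited study [175] may have used a different procedure), the least-squares translation is simply the mean displacement of corresponding points. The residual after removing that shift indicates how much of the positional change is not a rigid shift. The contour shape, the 3 mm shift and the axis convention below are hypothetical.

```python
import numpy as np


def fit_translation(upright, supine):
    """Least-squares rigid translation mapping upright points onto supine points.

    upright, supine: (n_points, 2) arrays of corresponding (x, y) coordinates in mm,
    with x along the anterior-posterior axis (an illustrative convention).
    """
    shift = np.mean(supine - upright, axis=0)  # minimises sum ||u + shift - s||^2
    residual = supine - (upright + shift)
    rms = np.sqrt(np.mean(np.sum(residual ** 2, axis=1)))
    return shift, rms


# Hypothetical tongue-dorsum contour (mm), shifted 3 mm posteriorly with small jitter.
theta = np.linspace(0.2, np.pi - 0.2, 40)
upright = np.column_stack([30 * np.cos(theta), 20 * np.sin(theta)])
rng = np.random.default_rng(3)
supine = upright + np.array([-3.0, 0.0]) + rng.normal(0, 0.2, upright.shape)

shift, rms = fit_translation(upright, supine)
print(f"estimated shift {shift.round(2)} mm, residual RMS {rms:.2f} mm")
```

A residual RMS comparable to the contouring uncertainty would support the rigid-translation description for that subject; a large residual would point to shape change as well as displacement.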

There are variations in vocal tract configuration between upright and supine positioning, and open MRI scanners that allow scanning in the upright position should be used further to quantify these variations. However, the differences seem limited [178], and the compensatory mechanisms, in healthy subjects at least, appear to be sufficiently effective to permit meaningful speech data to be acquired in the supine position.

Conclusions: current research, future directions and applications

Despite increasing interest in the field, it is undeniable that MRI has yet to fulfil its potential in the study of speech. One of the current limitations is the limited coverage in the through-plane direction (typically 1–3 slices), coupled with a temporal resolution that is still too low for many clinical applications. Much current research focuses on addressing these issues. For example, a novel method for rapid tracking of soft palate motion during speech using navigators was recently demonstrated for the first time [179]. Pencil beam navigators achieved 37 lines per second and a novel turbo navigator technique 62 lines per second, both exceeding current maximum 2D real-time imaging frame rates. Other research focuses on exploiting spatial–temporal correlations within the data to accelerate dynamic MRI through undersampled acquisition schemes and model-based reconstructions. Techniques such as k-t SENSE [180] and k-t FOCUSS [181] have been demonstrated in the vocal tract, the latter appearing very promising for speech imaging in clinical practice [182].
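One generic way to turn a stream of 1D navigator profiles into a displacement trace is to cross-correlate each profile with a reference profile and take the lag of the correlation peak as the shift. This is a sketch only, not the method of [179]; the Gaussian "velum" profile, the 0.5 mm pixel size and the helper names are assumptions. At the quoted rates, 37 and 62 lines per second correspond to roughly 27 ms and 16 ms between successive measurements.

```python
import numpy as np


def profile_shift(reference, profile):
    """Integer-sample shift of `profile` relative to `reference` (positive = higher index)."""
    ref = reference - reference.mean()
    cur = profile - profile.mean()
    corr = np.correlate(cur, ref, mode="full")  # index i <-> lag i - (len(ref) - 1)
    return int(np.argmax(corr)) - (len(ref) - 1)


def velum_profile(centre, n=256, width=6.0):
    """Hypothetical 1D navigator profile: a Gaussian bump where the velum sits."""
    x = np.arange(n)
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


pixel_mm = 0.5                              # assumed navigator pixel size (mm)
reference = velum_profile(120.0)
positions = [120, 124, 130, 126, 121]       # simulated velum positions (pixels)
trace_mm = [profile_shift(reference, velum_profile(p)) * pixel_mm for p in positions]
print(trace_mm)
```

In practice, sub-pixel refinement of the peak and a check on the correlation strength would be needed before such a trace could be trusted for clinical measurement.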

Recently, attempts have been made to provide 3D visualisation of the whole vocal tract in real time by applying model-based reconstructions [170] to highly undersampled stack-of-spirals data [169], or via a 3D volumetric navigation strategy that can achieve 100 fps [183].

The acquisition of larger amounts of data, necessary at high temporal resolutions, will inevitably require faster computing, as the images will need to be reconstructed on the fly to become an integral part of a clinical examination. Graphics processing unit (GPU) computing could address this issue and has already shown promising results for computationally intensive reconstructions in cardiac MRI [180,184–187].

Despite current hardware and software limitations, speech MRI is advancing in the clinical field. In a recent study using clinically available techniques [188], both children and adults were imaged and a full speech examination, led by a speech therapist, was carried out. However, audio–video synchronisation was performed

off-line, and better integration with the current scanner software would require intensive collaboration with the manufacturer.

One key advantage of using MRI in clinical practice would be to combine anatomical and dynamic information. However, a large number of patients requiring speech assessment have orthodontic devices that could render real-time speech examination difficult or impossible [189]. This highlights the need for an integrated approach if MRI becomes a routine part of the management of these patients. Furthermore, the majority of patients with speech disorders are young children. In our experience, although children tolerate the MRI examination nearly as well as adults, efforts have to be made to complete a full examination in a relatively short period of time, ideally less than half an hour. Motion artefacts are more prevalent in younger patients, and it would be valuable to integrate motion-correction schemes such as that described by Anderson et al. [190] into the acquisition. Moving towards true 3D acquisitions would also reduce the time required for planning. This is important as the potential range of clinical applications for real-time vocal tract MR imaging is increasing: apart from cleft palate, dynamic vocal tract imaging in sleep apnoea [191] and parkinsonism [192] has been reported.

Finally, beyond clinical applications, there is a large area of speech and language research [193,194] where the addition of real-time speech MRI to fMRI studies would provide valuable information. Such studies may provide, for example, a better understanding of the mechanisms of phonation, the processes involved in learning a second language, or the mechanisms of singing [159].

Acknowledgements

ADS is currently funded by the Royal Brompton Hospital National Institute of Health Research Cardiovascular Biomedical Research Unit. MEM is partly (20%) funded by the Barts and The London National Institute for Health Research Cardiovascular Biomedical Research Unit. We are grateful to Andreia Freitas for proof-reading the final version of the manuscript.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.ejmp.2014.05.001.

References

[1] Mathieson L. The voice and its disorders. London: Whurr Publishers Ltd; 2001.

[2] LaCroix A. Speech production – physics models and prospective applications. Proceed Image and Signal Processing Analysis – ISPA; 2001.

[3] Fry D. The physics of speech. Cambridge: Cambridge University Press; 1979.

[4] Wyatt L, Sell D, Russel J, Harding A, Harland K, Albery F. Cleft palate dissected: a review of current knowledge and analysis. Br J Plast Surg 1996;49:143–9.

[5] Pauloski B, Rademaker A, McConnel F, Heiser M, Cardinale S, Shedd D, et al. Speech and swallowing function after anterior tongue and floor of mouth resection with distal flap reconstruction. J Speech Lang Hear Res 1993;36:267–76.

[6] Pauloski B, Rademaker A, Logemann J, Colangelo L. Speech and swallowing in irradiated and nonirradiated postsurgical and cancer patients. Otolaryngol Head Neck Surg 1998;118:616–25.

[7] Zhang Y, Jiang J. Chaotic vibrations of vocal fold model with a unilateral polyp. J Acoust Soc Am 2004;115:1266–9.

[8] Childers D, Hicks D, Moore G, Alsaka Y. A model for vocal fold vibratory motion, contact area, and the electroglottogram. J Acoust Soc Am 1986;80:1309–20.

[9] Ludlow C, Connor N, Bassich C. Speech timing in Parkinson's and Huntington's disease. Brain Lang 1987;32:195–204.

[10] Stemple J, Glaze L, Gerdema B. Clinical voice pathology: theory and management. 3rd ed. 2000.

[11] Fletcher S. Theory and instrumentation for quantitative measurement of nasality. In: Proceedings of International Congress on Cleft Palate, Houston; 1970. pp. 601–10.


[12] Childers D, Hicks D, Moore G, Eskenazi L, Lalwani A. Electroglottography and vocal fold physiology. J Speech Lang Hear Res 1990;33:245–54.

[13] Wendler J. Stroboscopy. J Voice 1992;6:149–54.

[14] Veis S, Logemann J, Colangelo L. Effects of three techniques on maximum posterior movement of the tongue base. Dysphagia 2000;15:142–5.

[15] Havstam C, Lohmander A, Persson C, Dotevall H, Lith A, Lilja J. Evaluation of VPI-assessment with videofluoroscopy and nasoendoscopy. Br J Plast Surg 2005;58:922–31.

[16] Shprintzen RJ, Golding-Kushner KJ. Evaluation of velopharyngeal insufficiency. Otolaryngol Clin North Am 1989;22:519–36.

[17] Geise R. The AAPM/RSNA Physics Tutorial for Residents. Fluoroscopy: recording of fluoroscopic images and automatic exposure control. RadioGraphics 2001;21:227–37.

[18] Stringer D, Witzel M. Comparison of multi-view videofluoroscopy and nasopharyngoscopy in the assessment of velopharyngeal insufficiency. Cleft Palate J 1989;26:88–92.

[19] Golding-Kushner KJ, Argamaso RV, Cotton RT, Grames LM, Henningsson G, Jones DL, et al. Standardization for the reporting of nasopharyngoscopy and multiview videofluoroscopy: a report from an International Working Group. Cleft Palate J 1990;27:337–47.

[20] Jallon J, Berthommier F. A semi-automatic method for extracting vocal-tract movements from X-ray films. Speech Commun 2009;51:97–104.

[21] Cohn E, Rood S, McWilliams B, Skolnick L, Abdelmalek L. Barium sulphate coating of the nasopharynx in lateral view videofluoroscopy. Cleft Palate J 1984;21:7–17.

[22] Crawley M, Savage P, Oakley F. Patient and operator dose during fluoroscopic examination of swallow mechanism. Br J Radiol 2004;77:654–66.

[23] Zammit-Maempel I, Chappel C, Leslie P. Radiation dose in videofluoroscopic swallow studies. Dysphagia 2007;22:13–7.

[24] Wright R, Boyd C, Workman A. Radiation doses to patients during pharyngeal videofluoroscopy. Dysphagia 1998;13:133–45.

[25] Stone M. A guide to analysing tongue motion from ultrasound images. Clin Linguist Phon 2004;19:455–500.

[26] Rybicki F, Otero H, Steigner M, Vorobiof G, Nallamshetty L, Misouras D, et al. Initial evaluation of coronary images from 320-detector row computed tomography. Int J Cardiovasc Imaging 2008;24:535–46.

[27] Flohr T, McCollough C, Bruder H, Petersilka M, Gruber K, Suss C, et al. First performance evaluation of a dual-source CT (DSCT) system. Eur Radiol 2006;16:256–68.

[28] Uecker M, Zhang S, Voit D, Karaus A, Merboldt KD, Frahm J. Real-time MRI at a resolution of 20 ms. NMR Biomed 2010;23:986–94.

[29] Atick B, Bekerecioglu M, Tan O, Etlik O, Davran R, Arslam H. Evaluation of dynamic magnetic resonance imaging in assessing velopharyngeal insufficiency during phonation. J Craniofac Surg 2009;19:566–72.

[30] Soquet A, Lecuit V, Metens T, Demoloin D. Mid-sagittal cut to area function transformations: direct measurements of mid-sagittal distance and area with MRI. Speech Commun 2002;36:169–80.

[31] Sulter A, Miller D, Wolf R, Schutte H, Wit H, Mooyaart E. On the relation between the dimensions and resonance characteristics of the vocal tract: a study with MRI. Magn Reson Imaging 1992;10:365–73.

[32] Greenwood AR, Goodyear CC, Martin PA. Measurements of vocal tract shapes using magnetic resonance imaging. Commun Speech Vision IEE Proc I 1992;139:553–60.

[33] Drissi C, Mitrofanoff M, Talandier C, Falip C, Le Couls V, Adamsbaum C. Feasibility of dynamic MRI for evaluating velopharyngeal insufficiency in children. Eur Radiol 2011;21:1462–9.

[34] Silver A, Nimku K, Ashland J, Satrajtt S, vanderKouwe A, Brigger M, et al. Cine magnetic resonance imaging with simultaneous audio to evaluate pediatric velopharyngeal insufficiency. Arch Otolaryngol Head Neck Surg 2011;137:258–63.

[35] Masaki S, Tiede M, Honda K, Shimada Y, Fujimoto I, Nakamura Y, et al. MRI-based speech production study using a synchronized sampling method. J Acoust Soc Jpn 1999;20:375–9.

[36] Shadle C, Mohammad M, Carter J, Jackson P. Dynamic magnetic resonance imaging: new tools for speech research. In: 14th Int Cong Phon Sci; 1999. pp. 623–6.

[37] Kane AA, Butman JA, Mullick R, Skopec M, Choyke P. A new method for the study of velopharyngeal function using gated magnetic resonance imaging. Plast Reconstr Surg 2002;109:472–81.

[38] Shinagawa H, Ono T, Honda E, Masaki S, Shimada Y, Fujimoto I, et al. Dynamic analysis of articulatory movement using magnetic resonance movies: methods and implications in cleft lip and palate. Cleft Palate Craniofac J 2005;42:225–30.

[39] Kim H, Honda K, Maeda S. Stroboscopic-cine MRI study of the phasing between the tongue and the larynx in the Korean three-way phonation contrast. J Phon 2005;33:1–26.

[40] Ventura SM, Freitas DR, Tavares JM. Toward dynamic magnetic resonance imaging of the vocal tract during speech production. J Voice 2011;25:511–8.

[41] Byrd D, Tobin S, Bresh E, Narayanan S. Timing effects of syllable structure and stress on nasals: a real-time MRI examination. J Phon 2009;37:97–110.

[42] Perry JL, Kuehn DP, Wachtel JM, Bailey JS, Luginbuhl LL. Using magnetic resonance imaging for early assessment of submucous cleft palate: a case report. Cleft Palate Craniofac J 2011;49:35–41.

[43] Tian W, Redett R. New velopharyngeal measurements at rest and during speech: implications and applications. J Craniofac Surg 2009;20:532–9.

[44] Ahmad MD, Morin A, Cotton F. Dynamic MRI of larynx and vocal fold vibrations in normal phonation. J Voice 2009;23:5.

[45] Kuehn DP, Ettema SL, Goldwasser MS, Barkmeier JC, Wachtel JM. Magnetic resonance imaging in the evaluation of occult submucous cleft palate. Cleft Palate Craniofac J 2001;38:421–31.

[46] Takemoto H, Honda K, Masaki S, Shimada Y, Fujimoto I. Measurement of temporal changes in vocal tract area function from 3D cine-MRI data. J Acoust Soc Am 2006;119:1037–49.

[47] Inoue MS, Ono T, Honda E, Kurabayashi T, Ohyama K. Application of magnetic resonance imaging movie to assess articulatory movement. Orthod Craniofac Res 2007;9:157–62.

[48] Clement P, Hans S, Hartl DM, Maeda S, Vaissiere J, Brasnu D. Vocal tract area function for vowels using three-dimensional magnetic resonance imaging. A preliminary study. J Voice 2007;21:522–30.

[49] Ha S, Kuehn DP, Cohen M, Alperin N. Magnetic resonance imaging of the levator veli palatini muscle in speakers with repaired cleft palate. Cleft Palate Craniofac J 2007;44:494–505.

[50] Lufkin R, Hanafee W. Application of surface coils to MR anatomy of the larynx. Am J Radiol 1985;145:483–9.

[51] Rokkaku M, Hashimoto S, Imaizumi S, Niimi S, Kiritani S. Measurements of the three-dimensional shape of the vocal tract based on the magnetic resonance imaging technique. Ann Bull RILP 1986;20:47–54.

[52] Baer T, Gore JC, Boyce S, Nye PW. Application of MRI to the analysis of speech production. Magn Reson Imaging 1987;5:1–7.

[53] Becker M, Burkhardt K, Dulguerov P, Abdelkarim A. Imaging of larynx and hypopharynx. Eur J Radiol 2008;66:460–79.

[54] Titze I. Principles of voice production. Prentice Hall; 1994.

[55] Pickup B, Thomson S. Flow-induced vibratory response of idealized versus magnetic resonance imaging-based synthetic vocal fold models. J Acoust Soc Am 2010;128:124–9.

[56] Bryant N, Gracco C, Sasaki T, Vining E. MRI evaluation of vocal fold paralysis before and after type I thyroplasty. Laryngoscope 1996;106:1386–92.

[57] Frauenrath T, Goemmel A, Butenweger C, Otten M, Niendorf T. 3D mapping of the vocal fold geometry during articulatory maneuvers using ultrashort echo time imaging at 3 T. Proc Intl Soc Magn Reson Med 2010;18:2413.

[58] Barral J, Santos J, Damrose E, Fischbeim N, Nishimura D. Real-time motion correction for high-resolution larynx imaging. Magn Reson Med 2011;66:174–9.

[59] Schlamann M, Lehnerdt G, Maderwald S, Ladd S. Dynamic MRI of the vocal cords using phased-array coils: a feasibility study. Indian J Radiol Imaging 2009;19:127–30.

[60] Lakshminarayanan A, Lee S, McCutcheon M. MR imaging of the vocal tract during vowel production. J Magn Reson Imaging 1991;1:71–6.

[61] Moore C. The correspondence of vocal tract resonance with volumes obtained from magnetic resonance images. J Speech Lang Hear Res 1992;35:1023–47.

[62] Story B, Titze I, Hoffman E. Vocal tract area functions from magnetic resonance imaging. J Acoust Soc Am 1996;100:537–54.

[63] Yang C, Kasuya H. Speaker individualities of vocal tract shapes of Japanese vowels by magnetic resonance imaging. In: Spoken Language, 1996, ICSLP 96, Proceedings; 1996. pp. 950–3.

[64] Fant G. Acoustic theory of speech production. Hague: Mouton; 1960.

[65] Demolin D, Hassid S, Metens T, Soquet A. Real-time MRI and articulatory coordination in speech. C R Soc Biol 2002;325:547–56.

[66] Narayanan SS, Alwan AA, Haker K. An articulatory study of fricative consonants using magnetic resonance imaging. J Acoust Soc Am 1995;98:1325–47.

[67] Yang C, Members H, Nonmember S, Nonmember T. Accurate measurement of vocal tract shape using magnetic resonance imaging. Electron Commun Jpn 1995;78:63–73.

[68] Ventura S, Freitas D, Ramos I, Tavares JMRS. Morphologic differences in the vocal tract resonance cavities of voice professionals: an MRI-based study. J Voice 2013;27:132–40.

[69] Story B, Titze I, Hoffman E. Vocal tract area functions for an adult female speaker based on volumetric imaging. J Acoust Soc Am 1998;104:471–87.

[70] Narayanan S, Alwan A, Haker K. Towards articulatory-acoustic models for liquid approximants based on MRI and EPG. Part I. The laterals. J Acoust Soc Am 1997;101:1064–77.

[71] Alwan A, Narayanan S, Haker K. Towards articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part II. The rhotics. J Acoust Soc Am 1997;101:1078–89.

[72] Narayanan S, Byrd D, Kaun A. Geometry, kinematics, and acoustics of Tamil liquid consonants. J Acoust Soc Am 1999;106:1993–2007.

[73] Martins P, Carbone I, Pinto A, Silva A, Teixeira A. European Portuguese MRI based speech production studies. Speech Commun 2008;50:925–52.

[74] Story B, Titze I, Hoffman E. The relationship of vocal tract shape to three voice qualities. J Acoust Soc Am 2001;109:1651–67.

[75] Story B. Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. J Acoust Soc Am 2008;123:327–35.

[76] Fitch W, Giedd J. Morphology and development of the human vocal tract: a study using magnetic resonance imaging. J Acoust Soc Am 1999;106:1511–22.

[77] Dang J, Honda K, Suzuki H. Acoustic characteristics of the piriform fossa in models and humans. J Acoust Soc Am 1996;101:456–65.


[78] Dang J, Honda K, Suzuki H. Morphological and acoustical analysis of the nasal and the paranasal cavities. J Acoust Soc Am 1994;98:2088–100.

[79] Rong P, Kuehn D. The effect of oral articulation on the acoustic characteristics of nasalized vowels. J Acoust Soc Am 2010;127:2543–53.

[80] Woo J, Murano E, Stone M, Prince J. Reconstruction of high resolution tongue volumes from MRI. IEEE Trans Biomed Eng 2012;59:3511–34.

[81] Zhou X, Woo J, Stone M, Prince J, Epsy-Wylson C. Improved vocal tract reconstruction and modelling using an image super resolution technique. J Acoust Soc Am Express Lett 2013;133:439–45.

[82] Narayanan S, Alwan A. A nonlinear dynamical systems analysis of fricative consonants. J Acoust Soc Am 1995;97:2511–24.

[83] Badin P, Bailly G, Reveret L. Three-dimensional linear articulatory modelling of tongue, lips and face, based on MRI and video images. J Phon 2002;30:533–53.

[84] Baer T, Gore J, Gracco L, Nye P. Analysis of vocal tract shape and dimensions using magnetic resonance imaging: vowels. J Acoust Soc Am 1991;90:799–829.

[85] Takemoto H, Kitamura T, Nishimoto H, Honda K. A method of tooth superimposition and MRI data for accurate measurement of vocal tract shape and dimensions. Acoust Sci Technol 2004;25:468–74.

[86] Engwall O. Combining MRI, EMA and EPG measurements in a three-dimensional tongue model. Speech Commun 2003;41:303–29.

[87] Perkell J, Cohen M, Svirsky M, Garabieta M, Jackson M. Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. J Acoust Soc Am 1992;92:3078–96.

[88] Engwall O. A revisit to the application of MRI to the analysis of speech production – testing our assumptions. Proc Intl Seminar Speech Prod 2003;6:43–8.

[89] Birkholz P. Modelling consonant-vowel coarticulation for articulatory speech synthesis. PLoS One 2013;8:17.

[90] Kim Y, Narayanan S, Nayak K. Ultra-high resolution upper airway MRI with compressed sensing and parallel imaging. Proc Intl Soc Magn Reson Med 2009;17:382.

[91] Lustig M, Donoho D, Pauly J. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn Reson Med 2007;58:1182–95.

[92] Kim Y, Narayanan S, Nayak K. Accelerated three-dimensional upper airway MRI using compressed sensing. Magn Reson Med 2009;61:1434–40.

[93] Ettema S, Kuehn D, Perlman A, Alperin N. Magnetic resonance imaging of the levator veli palatini muscle during speech. Cleft Palate Craniofac J 2002;39:130–44.

[94] Kuehn DP, Ettema SL, Goldwasser MS, Barkmeier JC. Magnetic resonance imaging of the levator veli palatini muscle before and after primary palatoplasty. Cleft Palate Craniofac J 2004;41:584–92.

[95] Perry JL, Kuehn DP, Sutton BP. Morphology of the levator veli palatini muscle using magnetic resonance imaging. Cleft Palate Craniofac J 2011;22:499–503.

[96] Mugler J, Bao S, Mulkern R, Guttman C, Robertson R, Jolesz F, et al. Optimized single-slab three-dimensional spin-echo MR imaging of the brain. Radiology 2000;216:891–9.

[97] Perry J. Variations in velopharyngeal structures between upright and supine positions using upright magnetic resonance imaging. Cleft Palate Craniofac J 2011;48:123–33.

[98] Yamawaki Y, Nishimura Y, Suzuki Y, Sawada M, Yamawaki S. Rapid magnetic resonance imaging for assessment of velopharyngeal muscle movement on phonation. Am J Otol 1997;18:210–3.

[99] Tian W, Yin H, Li Y, Zhao S, Zheng Q, Shi B. Magnetic resonance imaging assessment of velopharyngeal structures in Chinese children after primary palatal repair. J Craniofac Surg 2010;21:568–77.

[100] Perry J, Kuehn D, Sutton B, Gamge J. Sexual dimorphism of the levator veli palatini muscle: an imaging study. Cleft Palate Craniofac J; 2013. http://dx.doi.org/10.1597/12-128.

[101] Narayanan S, Nayak K, Lee S, Sethy A, Byrd D. An approach to real-time magnetic resonance imaging for speech production. J Acoust Soc Am 2004;115:1771–6.

[102] Bae Y, Kuehn DP, Conway CA, Sutton BP. Real-time magnetic resonance imaging of velopharyngeal activities with simultaneous speech recordings. Cleft Palate Craniofac J 2011;48:695–707.

[103] Scott AD, Boubertakh R, Birch MJ, Miquel ME. Towards clinical assessment of velopharyngeal closure using MRI: evaluation of real-time MRI sequences at 1.5 T and 3 T. Br J Radiol 2012;85:1083–92.

[104] Sutton BP, Conway CA, Bae Y, Seethamraju R, Kuehn DP. Faster dynamic imaging of speech with field inhomogeneity corrected spiral fast low angle shot (FLASH) at 3 T. J Magn Reson Imaging 2010;32:1228–37.

[105] NessAiver MS, Stone M, Parthasarathy V, Kahana Y, Paritsky A. Recording high quality speech during tagged cine-MRI studies using a fiber optic microphone. J Magn Reson Imaging 2006;23:92–7.

[106] Larkman D, Nunes R. Parallel magnetic resonance imaging. Phys Med Biol 2007;52:15–55.

[107] Deshmane A, Gulani V, Griswold M, Seiberlich N. Parallel MR imaging. J Magn Reson Im 2012;36:55–77.

[108] Mohammad M, Moore E, Carter J, Shadle C, Gunn S. Using MRI to image themoving vocal tract during speech. In: Proc 5th Eurospeech Conf, vol. 4; 1997.pp. 2027e31.

[109] Sato-Wakabayashi M, Inoue-Arai M, Ono T, Honda E, Kurabayashi T,Morlyama K. Combined fMRI and MRI movie in the evaluation of articulationin subjects with and without cleft lip and palate. Cleft Palate Craniofac J2008;45:309e14.

[110] Zerhouni E, Parish D, Rogers W, Yang A, Shapiro E. Human heart tagging withMR imaging e a method for noninvasive assessment of myocardial motion.Radiology 1988;169:59e64.

[111] Axel L, Doherty L. MR imaging of motion with spatial modulation ofmagnetization. Radiology 1989;171:841e5.

[112] Ibrahim E. Myocardial tagging by cardiovascular magnetic resonance: evo-lution of techniques-pulse sequences, analysis, algorithms, and applications.J Cardiovasc Magn Reson 2011;13:1e40.

[113] Simpson R, Keegan J, Firmin D. MR assessment of regional myocardial me-chanics. J Magn Reson Imaging 2012;37:576e99.

[114] Niitsu M, Campeau N, Holsinger-Bampton A. Tracking motion with taggedrapid gradient-echo magnetization prepared MR imaging. J Magn ResonImaging 1992;2:155e63.

[115] Kumada M, Niitsu M, Niimi S, Hirose H. A study on the inner structure of thetongue in the production of the 5 Japanese vowels by tagging snapshot. MRI.Ann Bull RILP 1992;26:1e12.

[116] Ryf S, Kissinger K, Spiegel M, Börnert P, Manning W, Boesiger P, et al. Spiral MR myocardial tagging. Magn Reson Med 2004;51:237–42.

[117] Peters D, Epstein F, McVeigh E. Myocardial wall tagging with undersampled reconstruction. Magn Reson Med 2001;45:562–77.

[118] Guttman M, Prince J, McVeigh E. Tag and contour detection in tagged MR images of the left ventricle. IEEE Trans Med Imaging 1994;13:74–88.

[119] Prince J, McVeigh E. Motion estimation from tagged MR image sequences. IEEE Trans Med Imaging 1992;11:238–49.

[120] Zhang S, Douglas M, Yaroslavsky L, Summers R, Dilsizian V, Fananapazir L, et al. A Fourier based algorithm for tracking SPAMM tags in gated magnetic resonance cardiac images. Med Phys 1996;23:1359–69.

[121] Kumada M, Niitsu M, Niimi S, Hirose H, Itai Y. A study on the inner structure of the tongue for production of 5 Japanese vowels by tagging snapshot MRI; a second report. Ann Bull RILP 1993;27:1–12.

[122] Napadow V, Chen Q, Wedeen V, Gilbert R. Intramural mechanisms of the human tongue in association with physiological deformations. J Biomech 1999;32:1–13.

[123] Niimi S, Kumada M, Niitsu M. Functions of tongue related muscle during production of five Japanese vowels. Ann Bull RILP 1994;28:33–40.

[124] Stone M, Davis EP, Douglas AS, NessAiver M, Gullapalli R, Levine WS, et al. Modeling the motion of the internal tongue from tagged cine-MRI images. J Acoust Soc Am 2001;109:2974–82.

[125] Liu X, Stone M, Prince J. Tracking tongue motion in three dimensions using tagged MR images. In: Proceedings IEEE International Symposium on Biomedical Imaging; 2006. pp. 1372–5.

[126] Parthasarathy V, Prince JL, Stone M, Murano EZ, NessAiver M. Measuring tongue motion from tagged cine-MRI using harmonic phase (HARP) processing. J Acoust Soc Am 2007;121:491–504.

[127] Stone M, Liu X, Chen H. A preliminary application of principal components and cluster analysis to internal tongue deformation patterns. Comput Methods Biomech Biomed Engin 2010;13:493–502.

[128] Buecker A, Adam G, Neuerburg JM, Glowinski A, van Vaals JJ, Guenther RW. MR-guided biopsy using a T2-weighted single-shot zoom imaging sequence (local look technique). J Magn Reson Imaging 1998;8:955–9.

[129] Beer AJ, Hellerhoff P, Zimmermann A, Mady K, Sader R, Rummeny EJ, et al. Dynamic near-real-time magnetic resonance imaging for analyzing the velopharyngeal closure in comparison with videofluoroscopy. J Magn Reson Imaging 2004;20:791–7.

[130] Engwall O. From real-time to 3D tongue movements. In: Proc ICSLP, vol. 4; 2004. pp. 1109–12.

[131] Feinberg D, Hale J, Watts J, Kaufman L, Mark A. Halving MR imaging time by conjugation: demonstration at 3.5 kG. Radiology 1986;161:527–31.

[132] Echternach M, Sundberg J, Markl M, Richter B. Professional opera tenors’ vocal tract configurations in registers. Folia Phoniatr Logop 2010;62:278–87.

[133] Jager L, Gunther E, Gauger J, Reiser M. Fluoroscopic MR of the pharynx in patients with obstructive sleep apnea. Am J Neuroradiol 1998;19:1205–14.

[134] Anagnostara A, Stoeckli S, Weber OM, Kollias SS. Evaluation of the anatomical and functional properties of deglutition with various kinetic high-speed MRI sequences. J Magn Reson Imaging 2001;14:194–9.

[135] Razavi R, Hill DL, Keevil SF, Miquel ME, Muthurangu V, Hegde S, et al. Cardiac catheterisation guided by MRI in children and adults with congenital heart disease. Lancet 2003;362:1877–82.

[136] Sievers B, Schrader S, Hunold P, Barkhausen J, Erbel R. Free breathing 2D multi-slice real-time gradient-echo cardiovascular magnetic resonance imaging: impact on left ventricular function measurements compared with standard multi-breath hold 2D steady-state free precession imaging. Acta Cardiol 2011;66:489–97.

[137] Frahm J, Haase A, Matthaei D. Rapid NMR imaging of dynamic processes using the FLASH technique. Magn Reson Med 1986;3:321–7.

[138] Sekihara K. Steady-state magnetizations in rapid NMR imaging using small flip angles and short repetition intervals. IEEE Trans Med Imaging 1987;6:157–64.

[139] Scheffler K, Lehnhardt S. Principles and applications of balanced SSFP techniques. Eur Radiol 2003;13:2409–18.

[140] Oppelt A, Graumann R, Barfuß H, Fischer H, Hartl W, Schajor W. FISP – a new fast MRI sequence. Electromedica 1986;54:15–8.

[141] Mady K, Sader R, Zimmermann A, Hoole P, Beer A, Zeilhofer H, et al. Assessment of consonant articulation in glossectomee speech by dynamic MRI. In: Proc 7th International Conference on Spoken Language Processing, ICSLP 2002; 2002. pp. 961–4.


A.D. Scott et al. / Physica Medica 30 (2014) 604–618

[142] Maturo S, Silver A, Nimkin K, Sagar P, Ashland J, van der Kouwe A, et al. MRI with synchronized audio to evaluate velopharyngeal insufficiency. Cleft Palate Craniofac J 2012;49:761–3.

[143] Akin E, Sayin M, Karacay S, Bulakbasi N. Real-time balanced turbo field echo cine-magnetic resonance imaging evaluation of tongue movements during deglutition in subjects with anterior open bite. Am J Orthod Dentofacial Orthop 2006;129:24–8.

[144] Delattre B, Heidemann R, Crowe L. Spiral demystified. Magn Reson Imaging 2010;28:862–81.

[145] Noll D, Pauly J, Meyer C, Nishimura D, Macovski A. Deblurring for non-2D Fourier transform magnetic resonance imaging. Magn Reson Med 1992;25:319–34.

[146] Rasche V, Holz D, Proksa R. MR fluoroscopy using projection reconstruction multi-gradient-echo (prMGE) MRI. Magn Reson Med 1999;42:324–34.

[147] Peters D, Korosec F, Grist T, Block W, Holden J, Vigen K, et al. Undersampled projection reconstruction applied to MR angiography. Magn Reson Med 2000;43:91–102.

[148] Larson A, Simonetti O, Li D. Coronary MRA with 3D undersampled projection reconstruction TrueFISP. Magn Reson Med 2002;48:594–602.

[149] Crary MA, Kotzur IM, Gauger J, Gorham M, Burton S. Dynamic magnetic resonance imaging in the study of vocal tract configuration. J Voice 1996;10:378–88.

[150] Echternach M, Sundberg J, Markl M, Schumacher M, Richter B. Vocal tract in female registers – a dynamic real-time MRI study. J Voice 2010;24:133–40.

[151] Echternach M, Sundberg J, Markl M, Richter B. Vocal tract configurations in male Alto register functions. J Voice 2010;25:670–7.

[152] Breyer T, Echternach M, Arndt S, Richter B, Speck O, Schumacher M, et al. Dynamic magnetic resonance imaging of swallowing and laryngeal motion using parallel imaging at 3 T. Magn Reson Imaging 2009;27:48–54.

[153] Griswold M, Jakob P, Heidemann R, Nittka M, Jellus V, Wang J, et al. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med 2002;47:1202–10.

[154] Scott A, Boubertakh R, Birch M, Miquel M. Adaptive averaging applied to dynamic imaging of the soft palate. Magn Reson Med 2012;70:865–74.

[155] Bresch E, Adams J, Pouzer A, Lee S, Byrd D, Narayanan S. Semi-automatic processing of real-time MR image sequences for speech production studies. Proc Interspeech; 2006:427–35.

[156] Lee S, Bresch E, Narayanan S. An exploratory study of emotional speech production using functional data analysis techniques. Proc Interspeech; 2006:11–7.

[157] Sutton B, Tsao J, Shinagawa H, Kuehn D. Dynamic imaging of muscles during speech using interleaved spiral FLASH. Proc Intl Soc Magn Reson Med 2006;14:3377.

[158] Proctor M, Bresch E, Byrd D, Nayak K, Narayanan S. Paralinguistic mechanism of production in human “beatboxing”: a real-time magnetic resonance imaging study. J Acoust Soc Am 2013;133:1043–54.

[159] Miller N, Gregory J, Semple S, Aspden R, Stollery P, Gilbert F. The effects of humming and pitch on craniofacial and craniocervical morphology measured using MRI. J Voice 2012;26:90–102.

[160] Ramanarayanan V, Goldstein L, Byrd D, Narayanan S. An investigation of articulatory setting using real-time magnetic resonance imaging. J Acoust Soc Am 2013;134:510–2.

[161] Kim YC, Narayanan SS, Nayak KS. Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order. Magn Reson Med 2011;65:1365–71.

[162] Winkelmann S, Schaeffter T, Koehler T, Eggers H, Doessel O. An optimal radial profile order on the golden ratio for time-resolved MRI. IEEE Trans Med Imaging 2007;26:68–77.

[163] Niebergall A, Zhang S, Kunay E, Keydana G, Job M, Uecker M, et al. Real-time MRI of speaking at a resolution of 33 ms: undersampled radial FLASH with nonlinear inverse reconstruction. Magn Reson Med 2013;69:477–85.

[164] Uecker M, Hohage T, Block K, Frahm J. Image reconstruction by regularized nonlinear inversion – joint estimation of coil sensitivities and image content. Magn Reson Med 2008;60:674–83.

[165] Kim Y, Proctor M, Narayanan S, Nayak K. Improved imaging of lingual articulation using real-time multislice MRI. J Magn Reson Imaging 2012;35:943–9.

[166] Zhu Y, Kim Y, Proctor M, Narayanan S, Nayak K. Towards dynamic 3D MRI of speech. Proc Intl Soc Magn Reson Med 2012;20:294.

[167] Fu M, Christodoulou AG, Naber AT, Kuehn DP, Liang Z-P, Sutton BP. High-frame-rate multislice speech imaging with sparse sampling of (k,t)-space. Proc Intl Soc Magn Reson Med 2012;20:12.

[168] Zhu Y, Toutios A, Narayanan S, Nayak K. Faster 3D vocal tract real-time MRI using constrained reconstruction. Proc Interspeech; 2013:1292–7.

[169] Zhu Y, Kim Y, Proctor M, Narayanan S, Nayak K. Visualization of vocal tract shaping during speech. IEEE Trans Med Imaging 2013;32:838–49.

[170] Liang Z. Spatiotemporal imaging with partially separable functions. Proc IEEE ISBI; 2007:988–92.

[171] Demolin D, Metens T, Soquet A. Real time MRI and articulatory coordination in vowels. In: Proceedings 5th Speech Production Seminar; 2000. pp. 86–94.

[172] van der Kouwe A, Sagar P, Silver A, Maturo S, Nimkin K, Hartnick C. Automatic generation of movie with sound during speech production for assessing velopharyngeal insufficiency. Proc Intl Soc Magn Reson Med 2011;19:4275.

[173] Bresch E, Narayanan S. Region segmentation in the frequency domain applied to upper airway real-time magnetic resonance images. IEEE Trans Med Imaging 2009;28:323–39.

[174] Kim Y, Hayes C, Narayanan S, Nayak K. 16-Channel receive coil array for accelerated upper airway MRI at 3 Tesla. Magn Reson Med 2011;65:1711–8.

[175] Stone M, Stock G, Bunin K, Kumar K, Epstein M. Comparison of speech production in upright and supine position. J Acoust Soc Am 2007;122:532–42.

[176] Kitamura T, Takemoto H, Honda K, Shimada Y, Fujimoto I, Syakudo Y, et al. Differences in vocal tract shape between upright and supine postures: observations by an open-type MRI scanner. Acoust Sci Technol 2005;26:465–9.

[177] Tiede M, Masaki S, Wakumoto M, Vatikiotis-Bateson E. Magnetometer observation of articulation in sitting and supine conditions. J Acoust Soc Am 1997;102.

[178] Traser L, Burdumy M, Richter B, Vicari M, Echternach M. The effect of supine and upright position on vocal tract configurations during singing – a comparative study in professional tenors. J Voice 2013;27:141–9.

[179] Scott A, Boubertakh R, Birch M, Pinkstone M, Miquel M. Rapid tracking of soft palate motion during speech using pencil beam and turbo navigators. Proc Intl Soc Magn Reson Med 2013;21:250.

[180] Hansen M, Muthurangu V, Baltes C, Kozerke S, Tsao J, Razavi R, et al. Real-time imaging of speech production using radial k-t SENSE. Proc Intl Soc Magn Reson Med 2006;14:3187.

[181] Jung H, Sung K, Nayak K, Kim Y, Ye J. k-t FOCUSS: a general compressed sensing framework for high resolution dynamic MRI. Magn Reson Med 2009;61:103–17.

[182] Wylezinska-Arridge M, Birch M, Miquel M. k-t FOCUSS in real-time imaging of the soft palate for speech assessment. Proc Intl Soc Magn Reson Med 2014;22:3447.

[183] Fu M, Zhao B, Holtrop J, Perry J, Kuehn D, Liang Z, et al. High-frame-rate full-vocal-tract imaging based on the partial separability model and volumetric navigation. Proc Intl Soc Magn Reson Med 2013;21:607.

[184] Kowalik G, Steeden J, Pandya B, Odille F, Atkinson D, Taylor A, et al. Real-time flow with fast GPU reconstruction for continuous assessment of cardiac output. J Magn Reson Imaging 2012;36:1777–83.

[185] Joseph A, Kowallick J, Merboldt K, Voit D, Schaetz S, Zhang S, et al. Real-time flow MRI of the aorta at a resolution of 40 msec. J Magn Reson Imaging; 2013. http://dx.doi.org/10.1002/jmri.24328.

[186] Bassett E, Kholmovski E, Wilson D, DiBella E, Dosdall D, Ranjan R, et al. Evaluation of highly accelerated real-time cardiac cine MRI in tachycardia. NMR Biomed 2014;27:175–83.

[187] Hansen M, Atkinson D, Sorensen T. Cartesian SENSE and k-t SENSE reconstruction using commodity graphics hardware. Magn Reson Med 2008;59:463–9.

[188] Miquel M, Wylezinska-Arridge M, Pinkstone M, Theobald C, Birch M, Scott A. Assessment of velopharyngeal closure and soft palate anatomy using MRI in cleft palate patients. Med Phys Int 2013;1:459.

[189] Wylezinska-Arridge M, Pinkstone M, Hay N, Birch M, Miquel M. Orthodontic appliances and quality of cleft and speech MRI. Magn Reson Mater Phys 2013;26:469.

[190] Anderson A, Velikina J, Block W, Wieben O, Samsonov A. Adaptive retrospective correction of motion artefacts in cranial MRI with multicoil three-dimensional radial acquisitions. Magn Reson Med 2013;69:1094–103.

[191] Kim Y, Lebel R, Wu Z, Ward SD, Khoo M, Nayak K. Real-time 3D magnetic resonance imaging of the pharyngeal airway in sleep apnea. Magn Reson Med; 2013:1–10.

[192] Kumaran S, Gudwani S, Saxena M, Behari M. Study of articulatory movement from the single slice dynamic imaging of the vocal tract in parkinsonism. Proc Intl Soc Magn Reson Med 2013;21:2843.

[193] Price CJ. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. Neuroimage 2012;62:816–48.

[194] Simmonds A, Wise R, Dhanjal N, Leech R. A comparison of sensory motor activity during speech in first and second languages. J Neurophysiol 2011;106:470–9.