
Natural variations in speech intelligibility: An fMRI study
Han-Gyol Yi1, Rajka Smiljanic2, Bharath Chandrasekaran1,3

1Department of Communication Sciences and Disorders, 2Department of Linguistics, 3Department of Psychology
The University of Texas at Austin, Austin, TX, United States

1524

BACKGROUND

• Speech is often processed in adverse listening situations. Listeners use visual cues to compensate for degraded speech intelligibility7.

• However, nonnative visual cues are incorporated poorly, although visual cues should be the most beneficial10.

Figure 2 | Poor integration of nonnative visual cues


Figure 1 | Neural processing of intelligible speech

• Results from experiments that have artificially degraded speech intelligibility suggest that intelligible speech is processed in anterior and posterior superior temporal sulci5. Corresponding results are replicated here (spectrally rotated speech; n=23; TFCE; FWE-corrected p-values).

• However, more relevant in real-life speech communication are (a) natural variations in speaking style: native vs. nonnative speech2, and (b) availability and utilization of visual cues: audiovisual (AV) vs. audio-only (AO) modalities7.
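The spectrally rotated control condition in Figure 1 can be illustrated with a minimal numpy sketch. This is not the authors' stimulus-generation code: published rotated-speech stimuli typically low-pass filter the signal and rotate the spectrum around roughly 2 kHz, whereas this simplified version inverts the full spectrum around the Nyquist frequency by modulating with an fs/2 carrier; the function name is hypothetical.

```python
import numpy as np

def spectrally_invert(signal: np.ndarray) -> np.ndarray:
    """Flip a signal's spectrum around the Nyquist frequency.

    Multiplying sample n by (-1)**n modulates the signal with a carrier
    at fs/2, so a component at frequency f is mirrored to fs/2 - f:
    low frequencies become high and vice versa. Energy is preserved,
    but intelligibility is destroyed.
    """
    n = np.arange(signal.size)
    return signal * (-1.0) ** n
```

For example, a 1 kHz tone sampled at 8 kHz comes out as a 3 kHz tone (4000 - 1000 Hz) with the same total energy.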

METHODS

Participants. Young adult (18 < age < 35) native English speakers (n=22; 13f).

Scan parameters. Siemens Magnetom Skyra 3T MRI scanner. T1: MPRAGE (1.0 × 1.0 × 1.0 mm). T2*: EPI (flip 60°; 2.0 × 2.0 × 2.0 mm; 36 slices; 50% distance factor; TR=1.8 s; TE=30 ms).

fMRI analysis3,4,6,9. FSL v5.0. T2* registered to T1 (linear; df=6), then to MNI152 (2 mm; linear; df=12). Permuted p-values (n=50,000; TFCE), FWE-corrected3.

Behavioral analysis1. Neural index value outliers (n=2) excluded. Linear mixed-effects models using S4 classes (lme4). P-values obtained via MCMC sampling (linear; n=10,000) or z-statistics (binomial). See supplementary materials for details on the behavioral analysis.
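The permutation-based FWE correction described above can be sketched in numpy. This is an illustration of the max-statistic method that underlies tools like FSL's randomise, not the actual analysis: the TFCE enhancement step applied in the poster is omitted for brevity, and the function name and toy dimensions are assumptions.

```python
import numpy as np

def permutation_fwe_p(data: np.ndarray, n_perm: int = 1000,
                      seed: int = 0) -> np.ndarray:
    """One-sample permutation test with sign flipping (subjects x voxels).

    FWE correction via the max-statistic method: each voxel's corrected
    p-value is the fraction of permutations whose maximum statistic
    anywhere in the image meets or exceeds that voxel's observed value.
    (randomise additionally applies TFCE before taking the maximum;
    that step is omitted here.)
    """
    rng = np.random.default_rng(seed)
    n_sub = data.shape[0]

    def stat(d):
        # mean / standard error: a t-like statistic per voxel
        return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n_sub))

    observed = stat(data)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # randomly flip the sign of each subject's map under the null
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        max_null[i] = stat(data * signs).max()
    # corrected p: proportion of permutation maxima >= observed statistic
    return (max_null[None, :] >= observed[:, None]).mean(axis=1)
```

Because the null distribution is built from the image-wise maximum, a voxel survives only if its statistic beats chance everywhere at once, which is what makes the resulting p-values family-wise corrected.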

RESULTS

Figure 4 | Main effect of nativeness

• Native speech (AVnat+AOnat-AVnon-AOnon) activated the right posterior middle and superior temporal gyri, and the right angular gyrus.

• Nonnative speech (AVnon+AOnon-AVnat-AOnat) activated (a) the speech-language network, including the bilateral middle superior temporal gyri and the left inferior frontal gyrus; and (b) the cognitive network, including the left dorsolateral prefrontal cortex and the anterior cingulate gyrus.

• Activation pattern for nonnative speech, relative to native speech, suggests more effortful processing8.

Figure 5 | Main effect of modality

• AV (AVnat+AVnon-AOnat-AOnon) activated the ventral portion of the cortex, including the bilateral occipital cortices and anterior portions of the bilateral superior and inferior temporal sulci.

• AO (AOnat+AOnon-AVnat-AVnon) activated the dorsal portion of the cortex, including the right parietal and dorsolateral prefrontal cortices.

• Activation pattern for the AO modality, relative to the AV modality, suggests more effortful processing8.

Figure 6 | Modality by nativeness interaction
* Contrast pre-masked by an independent localizer task.

• The modality by speaker interaction (AVnat-AOnat-AVnon+AOnon) effect was observed in V1. This area showed greater activation in the AV than in the AO modality, but more so for native than for nonnative speech.

• Each participant's percent contrast activation in this region was used as the neural index to establish the brain-behavior link in nonnative speech intelligibility processing.

• A greater neural index was associated with: (a) better audiovisual integration for speech perception in noise in general (p<.0001) but poorer nonnative audiovisual integration (p=.0092); (b) decreased clarity ratings for nonnative sentences, regardless of modality (p=.0052); and (c) increased implicit bias towards making Caucasian-American and Asian-Foreign associations (p<.0001).
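The neural index and brain-behavior link above can be sketched as two small numpy helpers: average the per-participant interaction contrast within the V1 ROI, then correlate that index with a behavioral measure. This is a schematic of the general approach, not the authors' pipeline; function names, shapes, and the use of a plain Pearson correlation are assumptions.

```python
import numpy as np

def neural_index(contrast_maps: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Mean contrast activation within an ROI, one value per participant.

    contrast_maps: (n_subjects, n_voxels) interaction contrast values
    roi_mask:      (n_voxels,) 0/1 mask for the region (here, V1)
    """
    return contrast_maps[:, roi_mask.astype(bool)].mean(axis=1)

def pearson_r(x, y) -> float:
    """Plain Pearson correlation between neural index and behavior."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))
```

The poster's actual behavioral tests used linear mixed-effects models (lme4) rather than simple correlations; this sketch only conveys the shape of the brain-behavior computation.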


Pseudorandom presentation (2 to 3 s ITI)
Clarity rating: 1-to-4 Likert scale (passive listening)
Speech-in-noise perception (outside the scanner; n=40)

[Figure 1 contrast: Speech - Rotated; p-values < .05, FWE-corrected]

[Figure 6 contrast: AV(Nat-Non) - AO(Nat-Non); p-values < .05, FWE-corrected]

Native vs. nonnative speech in AV vs. AO

Implicit association test
- Congruous: Caucasian-American; Asian-Foreign
- Incongruous: Caucasian-Foreign; Asian-American
- Slower RT indicates an implicit association against the condition
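The RT-based scoring logic above can be sketched as follows. This is a simplified version of the standard IAT D-score (the full Greenwald et al. algorithm adds trial filtering and block-wise computation, omitted here), and the function name is hypothetical; a positive value means incongruent pairings were answered more slowly, i.e., an implicit association consistent with the congruent pairing.

```python
import numpy as np

def iat_effect(rt_congruent: np.ndarray, rt_incongruent: np.ndarray) -> float:
    """Simplified IAT effect: mean RT difference (incongruent - congruent)
    divided by the standard deviation of all trials pooled together.
    Slower incongruent responses yield a positive score."""
    all_rts = np.concatenate([rt_congruent, rt_incongruent])
    return float((rt_incongruent.mean() - rt_congruent.mean())
                 / all_rts.std(ddof=1))
```

Swapping the two arguments flips the sign, so the score is directional: it measures which pairing the listener finds easier, not just how variable their responses are.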

DISCUSSION

Acknowledgment

This work was funded by the Longhorn Innovation Fund for Technology (BC & RS). The authors thank Kirsten Smayda, Jasmine E. B. Phelps, and Rachael Gilbert for significant contributions to data collection and processing; the faculty and staff of the Imaging Research Center at The University of Texas at Austin for technical support and counsel; and the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing grid resources that contributed to the research results reported within this poster.

References

1. Bates, D. (2011). ‘lme4: Linear mixed-effects models using S4 classes’.
2. Bradlow, A. R. (2008). ‘Perceptual adaptation to non-native speech’, Cognition, vol. 106, no. 2, pp. 707-729.
3. Jenkinson, M. (2012). ‘FSL’, NeuroImage, vol. 62, pp. 782-790.
4. Nichols, T. E. (2002). ‘Nonparametric permutation tests for functional neuroimaging: A primer with examples’, Human Brain Mapping, vol. 15, pp. 1-25.
5. Scott, S. K. (2013). ‘The neural processing of masked speech’, Hearing Research, in press.
6. Smith, S. M. (2004). ‘Advances in functional and structural MR image analysis and implementation as FSL’, NeuroImage, vol. 23, no. S1, pp. 208-219.
7. Sumby, W. H. (1954). ‘Visual contribution to speech intelligibility in noise’, Journal of the Acoustical Society of America, vol. 26, no. 2, pp. 212-215.
8. Wong, P. C. M. (2009). ‘Aging and cortical mechanisms of speech perception in noise’, Neuropsychologia, vol. 47, no. 3, pp. 693-703.
9. Woolrich, M. W. (2009). ‘Bayesian analysis of neuroimaging data in FSL’, NeuroImage, vol. 45, pp. S173-S186.
10. Yi, H. (Submitted). ‘Visual influences on non-native speech perception’, Journal of the Acoustical Society of America.

Poster presented at the 19th Annual Meeting of the Organization for Human Brain Mapping, Seattle, WA.


How is natural speech intelligibility processed?

• Processing of speech is more effortful for nonnative than for native speech, and for the AO than for the AV modality.

• V1 activation is greater in the AV than in the AO modality. However, V1 activation is also greater for native than for nonnative speech.

• These results suggest that: (a) less intelligible speech necessitates additional effort in resolving degraded auditory cues; and (b) native AV speech leads to increased V1 involvement because the visual cues can be used to increase intelligibility, or are at least considered usable.

Why are nonnative visual cues poorly integrated?

• V1, crucial for AV processing, is activated more strongly for native AV than for nonnative AV speech.

• Listeners with greater implicit Asian-Foreign associations also consider nonnative speech to be less intelligible, inefficiently integrate nonnative visual cues, and exhibit a greater modality by nativeness interaction in V1 activation.

• Inefficient nonnative AV speech processing may be affected by an underlying bias of assuming non-Caucasian speakers to be nonnative speakers, which modifies V1 activity during nonnative speech.


SoundBrainLab

Figure 3 | Implicit bias in nonnative speech

[Figure 4 legend: Nat - Non and Non - Nat contrasts; p-values < .05, FWE-corrected]
[Figure 5 legend: AV - AO and AO - AV contrasts; p-values < .05, FWE-corrected]

• In the American English-speaking environment, Asian speakers are assumed to be nonnative speakers of English. The implicit association test can measure this automatic bias10.

• Listeners with greater implicit Caucasian-American and Asian-Foreign associations were more likely to show enhanced performance for native speech in AV (r(17)=.482; p=.037)10.

• This finding suggests that a non-linguistic bias can affect nonnative speech perception10.

Stimuli
- Meaningful English sentences (n=80)
- Native vs. nonnative English (native Korean) speakers (n=8; 4f)
- Audiovisual (AV) vs. audio-only (AO) modality
- Face images: Caucasian vs. Asian
- Scene images: American vs. Foreign

Questions
• How does the brain resolve natural variations in speech intelligibility?
  - Native vs. nonnative speech
  - Audiovisual (AV) vs. audio-only (AO) modalities

Findings
• Nonnative and AO speech lead to more effortful processing.
• Greater V1 activation to native AV speech is associated with inefficient nonnative AV speech processing.