Speech Science XIII Speech perception is special (deutsche Begleitnotizen) Version WS 2007-8

Mar 26, 2015

Eric Curtis
Transcript
Page 1: Speech Science XIII Speech perception is special (deutsche Begleitnotizen) Version WS 2007-8.

Speech Science XIII

Speech perception is special (deutsche Begleitnotizen)

Version WS 2007-8

Page 2

Topics

• Speech perception as simple pattern matching?

• Evidence for and against a “speech mode” of speech perception.

• A bird’s-eye view of the perception landscape

• Reading: BHR (3rd ed.), chapter 6 (part 2), pp. 203-229; (5th ed.), chap. 11, pp. 237-272; P.-M., 3.2.2., part 2 + 3.2.3., pp. 162-173

Page 3

Speech perception as pattern matching

• The “acoustic cue” concept suggests acoustic patterns which can be stored in memory (learned)

• But huge variability in the acoustic structure of any linguistic unit (sound, syllable or word) argues against a simple pattern-matching mechanism.

• The question of how much of the variability is stored and used when perceiving speech divides scientists. The brain is very powerful, but how is that power used?

• Most agree that we don’t just (passively) receive input, but that we actively work with it to create our percepts. But how? – We look first at vowels.

Page 4

How do we deal with vowels?

• Vowel formants vary greatly with the size of the vocal tract.

• But formants change in relation to one another, and they change together with other properties (e.g. F0: children – adults; women – men).

• The relative values of formants have therefore been examined.

• We do change our interpretation of formant values a) as a function of (very) different F0 values, and b) as a function of preceding formant values.

• And – our two-formant model of vowels is not reality

Page 5

Two or more formants?

Two-formant synthetic vowels which best match natural vowels (after Carlson et al. 1975, Fig. 1)

Page 6

F0 as a factor in perceived vowel quality?

For a 140 Hz fundamental, the same vowels are generally perceived with F1 values 80 Hz lower than for a 280 Hz F0 (after Miller 1953).

Page 7

Vowels relative to preceding context.

Ladefoged and Broadbent (1957) demonstrated that the size of the speaker producing a carrier phrase (and therefore the values of the speaker’s vowel formants) affected the interpretation of the test words at the end of the carrier phrase. (The test words were not produced by different speakers.)

[Figure: table with the columns “Speaker” and “Formants of carrier relative to testword”; four rows of phonetic symbols (//).]

Relation of carrier-phrase formants to test-word formant values (e.g. “F1 up” = higher carrier-phrase formants, therefore test word heard as less open, i.e. with a lower F1).
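The carrier-phrase effect can be illustrated with a small sketch. It is a toy model, not Ladefoged and Broadbent's procedure: it assumes that the listener judges the test-word F1 relative to the mean F1 of the carrier phrase. The function names, the boundary ratio and all Hz values are hypothetical and chosen only for illustration.

```python
def relative_f1(test_f1_hz, carrier_f1_hz):
    """Test-word F1 as a ratio of the mean F1 heard in the carrier phrase."""
    carrier_mean = sum(carrier_f1_hz) / len(carrier_f1_hz)
    return test_f1_hz / carrier_mean


def heard_as(test_f1_hz, carrier_f1_hz, boundary_ratio=1.0):
    """Toy decision rule (assumption): F1 above the carrier mean sounds more open."""
    ratio = relative_f1(test_f1_hz, carrier_f1_hz)
    return "more open" if ratio > boundary_ratio else "less open"


# All values below are hypothetical illustration data, not measurements.
TEST_WORD_F1 = 450.0                      # identical test word in both conditions
LOW_F1_CARRIER = [300.0, 400.0, 500.0]    # carrier phrase with lowered F1 ("F1 down")
HIGH_F1_CARRIER = [400.0, 550.0, 700.0]   # carrier phrase with raised F1 ("F1 up")

for name, carrier in [("F1 down", LOW_F1_CARRIER), ("F1 up", HIGH_F1_CARRIER)]:
    print(name, round(relative_f1(TEST_WORD_F1, carrier), 3),
          heard_as(TEST_WORD_F1, carrier))
```

The same acoustic test word lands on different sides of the boundary depending only on the carrier, which is the direction of the effect described on the slide.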

Page 8

Immediate vs. wider context

• The carrier phrase influence shows effects of wider context. The F0 effect is vowel-intrinsic, but average F0 over a phrase also provides a wider F0 context.

• So one important question is whether we simply change the frame within which we process vowel formants according to the information about the speaker that we collect during the utterance.

• This would mean that vowels would be more difficult to identify at the beginning of utterances (from unknown speakers!) – i.e., vowels offered with no prior information.

… Is this the case?

Page 9

Isolated vowels vs. vowels in syllabic context

Formants rarely stay constant for long in C_C syllabic context.

This could lead to the assumption that isolated vowels with well-defined, steady-state formants should be identified with more certainty.

But Stevens (1968) showed that steady-state isolated vowels are, in fact, less well identified than syllable-context vowels.

Page 10

Syllabic context 2

Percent errors: 42.6%, 31.2%, 17.0%, 9.5%

Strange et al. (1976) showed that the effect of syllabic context was more important (a difference of 21.7 to 25.6 percentage points) than the effect of listening to one speaker at a time (a difference of 7.5 to 11.4 percentage points).
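The quoted differences can be recomputed from the four error rates. The slide does not say which value belongs to which condition; the assignment used below is an assumption, chosen because it is the one that reproduces the differences given in the text.

```python
# Percent errors from the slide. The mapping of values to conditions is an
# assumption inferred from the differences quoted in the text.
errors = {
    ("isolated", "mixed speakers"): 42.6,
    ("isolated", "one speaker"): 31.2,
    ("syllable", "mixed speakers"): 17.0,
    ("syllable", "one speaker"): 9.5,
}


def context_effect(speaker_condition):
    """Error reduction from hearing the vowel in a syllable instead of in isolation."""
    diff = errors[("isolated", speaker_condition)] - errors[("syllable", speaker_condition)]
    return round(diff, 1)


def speaker_effect(context):
    """Error reduction from listening to one speaker at a time."""
    diff = errors[(context, "mixed speakers")] - errors[(context, "one speaker")]
    return round(diff, 1)


print("syllabic context:", context_effect("one speaker"), context_effect("mixed speakers"))
print("one speaker at a time:", speaker_effect("syllable"), speaker_effect("isolated"))
```

Under this assignment the context effect comes out at 21.7 and 25.6 percentage points, and the speaker effect at 7.5 and 11.4, matching the ranges in the text.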

Page 11

The importance of vowel-target info. vs. vowel-dynamics

Verbrugge & Rakerd (1986) investigated the contribution of the dynamic, movement information vs. the “vowel-defining” target information.

The whole syllable was clearly easiest to recognise (91.7%). But even if the central target section was missing, almost 80% were correctly identified.

Page 12

The Motor Theory of Speech Perception

• The assumption of an articulatory basis to our speech perception mechanisms has been explicit for over 40 years (internationally since a landmark Speech Communication Seminar in Stockholm in 1962)

• The Haskins Laboratories (USA) presented evidence (from earlier experimental work) that:

– We identify acoustically different stimuli as one and the same articulatorily defined speech sound.

– We can only discriminate acoustic differences between stimuli that cross category boundaries, although the differences within categories are just as great.

Page 13

Categorical Perception

Series of acoustically equidistant stimuli: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

E.g., 1 is a typical // F2-transition, 8 is a typical // transition and 15 is a typical // transition. Stimuli 2-7 and 9-14 are steps between these typical stimuli.

[Figure: two plots over the stimulus series: number of judgements for a category (// // //) and discriminability of stimulus pairs.]
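The pattern described above, labelling that switches abruptly between categories and discrimination that peaks at the category boundaries, can be sketched with a toy model. It is an illustration only, not the Haskins prediction formula: it assumes that labels follow a softmax over the distance to three prototypes at stimuli 1, 8 and 15, and that two stimuli can be told apart only when they receive different labels. The `SHARPNESS` value and all function names are hypothetical.

```python
import math

PROTOTYPES = [1.0, 8.0, 15.0]  # stimulus numbers of the three typical transitions
SHARPNESS = 1.0                # hypothetical: larger values give steeper boundaries


def label_probs(stimulus):
    """Probability of each of the three labels (softmax over squared distance)."""
    scores = [-SHARPNESS * (stimulus - p) ** 2 / 2.0 for p in PROTOTYPES]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def prob_different_label(a, b):
    """Probability that stimuli a and b are given different category labels."""
    pa, pb = label_probs(a), label_probs(b)
    return 1.0 - sum(x * y for x, y in zip(pa, pb))


# Identification: most likely label (0, 1 or 2) for each stimulus in the series.
identification = {s: max(range(3), key=lambda k: label_probs(s)[k]) for s in range(1, 16)}

# Discrimination: label-based discriminability of each pair of neighbours.
discrimination = {(s, s + 1): prob_different_label(s, s + 1) for s in range(1, 15)}

for pair, p in discrimination.items():
    print(pair, round(p, 3))
```

In this sketch all neighbouring pairs are acoustically equidistant, yet only the pairs that straddle a boundary (around stimuli 4/5 and 11/12) get a high value; pairs inside a category stay close to zero.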

Page 14

Categorical Perception 2

• Further experiments with many other acoustic properties that come from articulations which are not categorically separable (VOT, /l – r/, vowel categories, etc.) brought about a theoretical modification …

• Categorical perception is “acquired”, and the increased distinctiveness between categories is also acquired. The low-sensitivity baseline between the category boundaries can be seen as psychoacoustically normal sensitivity.

• Normal perception in persons with disturbed articulation induced a theoretical fall-back to a position where the link between perception and production was more abstract …

The position was referred to as “the speech mode” of perception. This still made speech perception special.

Page 15

The Speech Mode of perception

• Many experiments showed that the functional goal of speech perception made it special:

• Dichotic signals (different parts played into the left and right ear) were heard as one speech sound, but the separate elements were still audible.

• Separate words played into the left and right ear were heard as one word if the sounds of the two words could combine, e.g. “pay” + “lay” → “play”. This was heard even if the /l/ started before the release of the /p/!

• Even more dramatic is the perceptual “switch” which can occur with “sine analogue speech”. Some people hear it as strange music until they are asked whether they can understand what is being said. They then hear it as speech (and cannot switch back to the music mode).

Page 16

Other influences on phonetic perception:

Visual Information

• The prime input in speech perception is the acoustic signal, but we can often also see the person who is speaking and therefore have a subconscious knowledge of the visual information accompanying the acoustics.

• A laboratory mistake led to the discovery that a video clip of a spoken // together with the acoustic signal of // is often perceived as //. Acoustic // with a video of //, on the other hand, is heard as //.

• This “McGurk” effect (after the person who discovered it) has since been systematically investigated. It confirms that we cannot ignore visual information, but the synchronisation must be accurate for fusion to take place.

Page 17

Semantic influences

There is an effect of almost 25% in the recognition of a real word compared with a non-word along a stimulus series that has a word and a non-word as its end stimuli: the Ganong effect.

Page 18

Anti-Speech-Mode

• There are still many scientists who consider the speech-mode approach too much like “hocus pocus”. They concentrate on a more direct relationship between the acoustic signal and the percept.

• “Feature detectors” have been another attempt to link the acoustic signal directly with the linguistic units in a more passive model of speech perception. Animals have high-level neuronal detectors linked to vital functions, so why not humans?

• Stevens’ “quantal theory” of (plosive) perception rests on the fact that /t, d/ tend to have high-frequency energy, /g, k/ have middle-frequency energy, and /b, p/ have low-frequency energy. Therefore, the same relative acoustic information serves the distinction independent of context.