Page 1: Speech Perception Richard Wright Linguistics 453.

Speech Perception

Richard Wright

Linguistics 453

Page 2: Speech Perception Richard Wright Linguistics 453.

Class Overview

Physiology

Auditory Shaping of the signal

Auditory Cues

Normalization and Context

Experiment types

Page 3: Speech Perception Richard Wright Linguistics 453.

Physiology 1: The Ear

Outer: Pinna, Ear Canal, Ear Drum

Middle: Ossicles, Oval Window

Inner: Cochlea — Basilar Membrane, Tectorial Membrane, Hair Cells

Page 4: Speech Perception Richard Wright Linguistics 453.

[Figure: anatomy of the ear. Labels: pinna, ear canal (external auditory meatus), ear drum (tympanic membrane), ossicular chain, oval window, cochlea, auditory nerve]

Page 5: Speech Perception Richard Wright Linguistics 453.

Physiology 1: The Outer Ear

Pinna: directional hearing

Ear Canal: high frequency emphasis (very short resonator closed at one end)

Ear Drum: membrane’s vibrations convert pressure fluctuations to mechanical movement
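The description of the ear canal as a short resonator closed at one end can be made concrete with the quarter-wavelength formula, f = c / (4L). The sketch below is illustrative only: the canal length of 2.5 cm and the speed of sound of 343 m/s are assumed round values, not figures taken from these slides.

```python
def quarter_wave_resonances(length_m, speed_of_sound=343.0, count=3):
    """Resonances of a tube closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# Assumption: a 2.5 cm canal, chosen only as a convenient round number.
CANAL_LENGTH_M = 0.025

resonances = quarter_wave_resonances(CANAL_LENGTH_M)
for n, f in enumerate(resonances, start=1):
    print(f"resonance {n}: {f:.0f} Hz")
```

Under these assumptions the lowest resonance falls in the 3-4 kHz region, which is one way to read the "high frequency emphasis" noted on the slide.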

Page 6: Speech Perception Richard Wright Linguistics 453.

Physiology 1: The Middle Ear

Ossicles (Malleus, Incus, Stapes):

Convert eardrum movement to movement of oval window; this overcomes the air-to-fluid impedance mismatch.

Lower frequency emphasis (500-4000 Hz)

Lessen impact of very loud noises by stiffening (damping)

Page 7: Speech Perception Richard Wright Linguistics 453.

Physiology 1: The Inner Ear

Cochlea: fluid filled cavity, wave propagation in fluid caused by movement of oval window

Basilar Membrane: stiff and narrow at base, wide and flaccid at apex: base = high frequencies and apex = low frequencies (acts like a series of band-pass filters). Most of the membrane is devoted to sounds below 5000 Hz.

Shearing between Basilar and Tectorial membranes displaces hair cells, exciting cochlear nerve endings
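The slides do not give a formula for the place-to-frequency map along the basilar membrane. One commonly cited approximation is Greenwood's function with human parameters (A = 165.4, a = 2.1, k = 0.88); it is brought in here as an outside assumption, purely to illustrate the base = high, apex = low layout. The function names are made up for this sketch.

```python
import math

# Assumption: Greenwood-style constants for the human cochlea (not from the slides).
A, SLOPE, K = 165.4, 2.1, 0.88


def place_to_frequency(x):
    """Characteristic frequency in Hz at position x (0 = apex, 1 = base)."""
    return A * (10 ** (SLOPE * x) - K)


def frequency_to_place(f_hz):
    """Inverse map: proportion of membrane length from the apex for f_hz."""
    return math.log10(f_hz / A + K) / SLOPE


fraction_below_5k = frequency_to_place(5000.0)
print(f"apex: {place_to_frequency(0.0):.0f} Hz, base: {place_to_frequency(1.0):.0f} Hz")
print(f"share of membrane tuned below 5000 Hz: {fraction_below_5k:.2f}")
```

With these assumed constants, roughly 70% of the membrane's length is tuned to frequencies below 5000 Hz, in line with the slide's statement.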

Page 8: Speech Perception Richard Wright Linguistics 453.

Physiology 2: Neural Pathway

Cochlear Nerve

Cochlear Nucleus

Lateral Lemniscus

Auditory Cortex

Page 9: Speech Perception Richard Wright Linguistics 453.

[Figure: ascending auditory pathway. Labels: cochlear nerve, cochlear nucleus, superior olive, lateral lemniscus, inferior colliculus, medial geniculate, auditory radiations, cortex, mid-line, Held, Monakow, Probst, CIC]

Page 10: Speech Perception Richard Wright Linguistics 453.

Auditory Shaping of the Signal

Frequency Selectivity: Changes in frequency of stimulus do not result in equivalent changes in sensitivity

Non-linear loudness sensitivity

Phase Locking and noise reduction

Lateral Inhibition and Tuning

Onsets and neural spikes

Page 11: Speech Perception Richard Wright Linguistics 453.

Frequency Selectivity

[Figure: Bark function plotted against frequency. Horizontal axis: Hz, 0 to 5000; vertical axis ticks: 0 to 18]
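The slide plots the Bark function but the transcript preserves no formula for it. As a rough illustration, the sketch below uses Traunmüller's (1990) approximation, which is an assumption here and may not be the exact curve on the slide. It shows that equal steps in Hz shrink in Bark as frequency rises.

```python
def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency in Hz (Traunmüller-style formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# The same 500 Hz step, taken low and high in the frequency range.
low_step = hz_to_bark(1000.0) - hz_to_bark(500.0)
high_step = hz_to_bark(4500.0) - hz_to_bark(4000.0)

print(f"500 -> 1000 Hz spans {low_step:.2f} Bark")
print(f"4000 -> 4500 Hz spans {high_step:.2f} Bark")
```

The low-frequency step covers several times as many Bark as the high-frequency one, which is the sense in which frequency changes do not produce equivalent changes in sensitivity.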

Page 12: Speech Perception Richard Wright Linguistics 453.

Onset Advantage

[Figure: schematic of speech signal (F1, F2) aligned with an auditory nerve fiber's response, from Delgutte and Kiang (1984). Signal labels: consonant release transient, formant transitions, steady state (saturated response). Response labels: rapid adaptation, short term adaptation, spontaneous level of fiber]

Page 13: Speech Perception Richard Wright Linguistics 453.

What are Cues?

Cues: information in the signal that listeners use in recovering the segmental content of the utterance
– Place cues
– Manner cues
– Voicing cues
– Vowel quality cues

Page 14: Speech Perception Richard Wright Linguistics 453.

Distribution of Cues

Place cues

[Figure: schematic spectrogram (F1, F2, F3). Labels: stop release burst, fricative noise, F2 transitions, nasal pole and zero]

Page 15: Speech Perception Richard Wright Linguistics 453.

Distribution of Cues

Manner cues

[Figure: schematic spectrogram (F1, F2, F3). Labels: stop release burst, fricative noise, nasal pole and zero, abruptness and degree of attenuation, slope of formant transitions, nasalization of vowel]

Page 16: Speech Perception Richard Wright Linguistics 453.

Distribution of Cues

Voicing cues

[Figure: schematic spectrogram (F1, F2, F3). Labels: release burst amplitude, aspiration noise, vowel duration (labeled twice), VOT, periodicity, stricture duration]

Page 17: Speech Perception Richard Wright Linguistics 453.

Distribution of Cues

Stop release bursts are very brief and difficult to recover: stops rely on formant transition cues

Page 18: Speech Perception Richard Wright Linguistics 453.

Distribution of Cues

Fricative noise, particularly sibilant, contains robust cues: fricatives may be recovered in the absence of formant transitions

Page 19: Speech Perception Richard Wright Linguistics 453.

Distribution of Cues

Nasals contain strong manner cues but weak place cues

Page 20: Speech Perception Richard Wright Linguistics 453.

Onset Advantage

Redundancy advantage: Onset stops automatically have both a release burst and a set of formant transitions

Coda stops may be unreleased and therefore have less cue redundancy

Page 21: Speech Perception Richard Wright Linguistics 453.

Onset Advantage

[Figure: onset consonant with flanking vowels. Labels: F2 transitions, F2 transition, release burst, abrupt attenuation (twice), VOT, vowel length (twice), constriction duration]

Page 22: Speech Perception Richard Wright Linguistics 453.

Experimental Tasks

Identification

Discrimination

Rating

Method of Adjustment (MOA)

Page 23: Speech Perception Richard Wright Linguistics 453.

Exp. Tasks 1: Identification

Listeners are asked to identify stimuli as speech sounds...

Open set: options open

Forced choice: listeners' choices constrained

Page 24: Speech Perception Richard Wright Linguistics 453.

Experiment 1: Onset vs Coda

Stimuli
– male speaker of American English
– /ba, da, ga, ab, ad, ag/ bursts excised
– 16 bit, 22 kHz
– mixed in three levels of white noise:
• no noise
• noise at 2 dB above RMS of signal
• noise at 2 dB below RMS of signal
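The three noise conditions can be sketched as scaling white noise to a target RMS relative to the signal. This is a minimal standard-library illustration, not the procedure actually used for the experiment: the 22050 Hz sample rate, the sine-tone stand-in for a recorded syllable, and the function names are all assumptions.

```python
import math
import random

SAMPLE_RATE = 22050  # assumption: "22 kHz" read as 22050 Hz


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_white_noise(signal, noise_db, seed=0):
    """Add Gaussian white noise whose RMS is noise_db dB relative to the signal RMS.

    noise_db=None means the "no noise" condition. Returns (mixed, noise).
    """
    if noise_db is None:
        return list(signal), [0.0] * len(signal)
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = rms(signal) * 10 ** (noise_db / 20.0) / rms(raw)
    noise = [n * scale for n in raw]
    return [s + n for s, n in zip(signal, noise)], noise


# Stand-in stimulus: 100 ms of a 440 Hz tone (a real study would use the recorded syllable).
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 10)]

for level in (None, 2.0, -2.0):
    mixed, noise = mix_with_white_noise(tone, level)
    label = "no noise" if level is None else f"{level:+.0f} dB re signal RMS"
    print(label, f"mixed RMS = {rms(mixed):.3f}")
```

Scaling by 10 ** (dB / 20) works because the levels on the slide are stated relative to RMS amplitude rather than power.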

Page 25: Speech Perception Richard Wright Linguistics 453.

Experiment 1: Onset vs Coda

Task
– onsets & codas mixed and randomized
– presented binaurally over headphones
– 3 way forced choice task: “B D G”
– labeled button press
– self paced
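A mixed, randomized trial list for this kind of 3-way forced choice task could be built along the following lines. The syllables and noise conditions come from the slides; the number of repetitions, the dictionary layout, and the scoring helper are hypothetical choices made for this sketch.

```python
import itertools
import random

SYLLABLES = ["ba", "da", "ga", "ab", "ad", "ag"]
NOISE_LEVELS_DB = [None, 2, -2]  # None = no noise; dB relative to signal RMS


def build_trials(repetitions=1, seed=0):
    """Cross syllables with noise levels, then shuffle onsets and codas together."""
    trials = []
    for syllable, noise in itertools.product(SYLLABLES, NOISE_LEVELS_DB):
        position = "onset" if syllable[0] in "bdg" else "coda"
        consonant = syllable[0] if position == "onset" else syllable[-1]
        for _ in range(repetitions):
            trials.append({
                "syllable": syllable,
                "position": position,
                "noise_db": noise,
                "answer": consonant.upper(),  # matches the "B D G" buttons
            })
    random.Random(seed).shuffle(trials)
    return trials


def score_by_position(trials, responses):
    """Proportion correct for onset and coda trials separately."""
    counts = {"onset": [0, 0], "coda": [0, 0]}
    for trial, response in zip(trials, responses):
        counts[trial["position"]][0] += int(response == trial["answer"])
        counts[trial["position"]][1] += 1
    return {pos: correct / total for pos, (correct, total) in counts.items()}


trials = build_trials(repetitions=2)
perfect = [t["answer"] for t in trials]
print(len(trials), score_by_position(trials, perfect))
```

Scoring onsets and codas separately is what lets the onset advantage show up as a difference in proportion correct.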

Page 26: Speech Perception Richard Wright Linguistics 453.

Exp. Tasks 2: Discrimination

Listeners are asked to respond “same” or “different” to presented sets of stimuli

AX discrimination: fixed initial stimulus, variable second stimulus (same/different)

ABX discrimination: two fixed initial stimuli, variable third stimulus (same as A, same as B)
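The difference between the two designs is easiest to see in how a single trial is assembled. The sketch below is a hedged illustration: the stimulus labels ("step1", "step2") and the dictionary fields are hypothetical placeholders, not part of the original materials.

```python
import random


def ax_trial(a, b, rng):
    """AX: A is fixed, X is either A again or B; the answer is same/different."""
    x = rng.choice([a, b])
    return {"A": a, "X": x, "answer": "same" if x == a else "different"}


def abx_trial(a, b, rng):
    """ABX: A and B are fixed, X repeats one of them; the answer is A or B."""
    x = rng.choice([a, b])
    return {"A": a, "B": b, "X": x, "answer": "A" if x == a else "B"}


rng = random.Random(0)
ax_block = [ax_trial("step1", "step2", rng) for _ in range(10)]
abx_block = [abx_trial("step1", "step2", rng) for _ in range(10)]
print(ax_block[0])
print(abx_block[0])
```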

Page 27: Speech Perception Richard Wright Linguistics 453.

Experiment 2: vowel discrimination

Stimuli
– Synthetic vowel continuum
– Equal steps: 2.37 Bark along F1-F2 dimension
– 16 bit, 11 kHz
– variable AX design
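One way to picture a continuum with equal Bark steps is to interpolate between two (F1, F2) endpoints in Bark rather than in Hz. The sketch below does this under stated assumptions: the Hz-to-Bark conversion is Traunmüller's approximation, and the endpoint formant values and the number of steps are invented for illustration, so the resulting step size will not match the 2.37 Bark used in the experiment.

```python
import math


def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency in Hz (Traunmüller-style formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_continuum(start_hz, end_hz, n_steps):
    """Return n_steps + 1 (F1, F2) pairs in Hz, equally spaced in Bark."""
    start = [hz_to_bark(f) for f in start_hz]
    end = [hz_to_bark(f) for f in end_hz]
    points = []
    for i in range(n_steps + 1):
        t = i / n_steps
        points.append(tuple(bark_to_hz(s + t * (e - s)) for s, e in zip(start, end)))
    return points


def bark_distance(p, q):
    """Euclidean distance between two (F1, F2) points, measured in Bark."""
    return math.dist([hz_to_bark(f) for f in p], [hz_to_bark(f) for f in q])


# Hypothetical endpoints (F1, F2) in Hz, chosen only for illustration.
continuum = bark_continuum((300.0, 2300.0), (700.0, 1200.0), n_steps=4)
steps = [bark_distance(p, q) for p, q in zip(continuum, continuum[1:])]
for (f1, f2), step in zip(continuum[1:], steps):
    print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz, step = {step:.2f} Bark")
```

Spacing the steps in Bark rather than Hz keeps neighboring stimuli at a constant auditory distance, which is presumably why the continuum is specified that way.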

Page 28: Speech Perception Richard Wright Linguistics 453.

Experiment 2: vowel discrimination

Task
– same/different response to vowel pairs
– presented binaurally over headphones
– labeled button press
– speeded (limited time to decide)

Page 29: Speech Perception Richard Wright Linguistics 453.

Exp. Tasks 3: Ratings

Listeners are asked to rate a stimulus in some way: goodness, similarity, accentedness

Example: Effect of intonational contour on naturalness: listeners hear sentences with and without f0 contour and rate naturalness on a 1-5 scale.

Page 30: Speech Perception Richard Wright Linguistics 453.

Exp. Tasks 4: MOA

Listeners are asked to adjust a stimulus along some dimensions until it fits some criterion: matches another stimulus, sounds most natural, matches a category, etc. (can be identification, discrimination, or rating exp.)

Page 31: Speech Perception Richard Wright Linguistics 453.

Advantages and shortcomings 1

Open identification
– Good: most natural, subjects understand
– Bad: time consuming, little control of variables, stats difficult (non-comparable responses across subjects)

Forced choice identification
– Good: less time consuming, control of response variables
– Bad: not as natural

Page 32: Speech Perception Richard Wright Linguistics 453.

Advantages and shortcomings 2

Discrimination
– Good: allows experimenter to map relationship between classification and discrimination
– Bad: very time consuming, not at all natural, unintuitive to subjects

Page 33: Speech Perception Richard Wright Linguistics 453.

Advantages and shortcomings 3

Rating
– Good: allows experimenter to map preferences in a multidimensional space, allows for correlation between one or more aspects of stimulus
– Bad: hard to control interactions between preferences and stimulus variables, not that natural

Page 34: Speech Perception Richard Wright Linguistics 453.

Advantages and shortcomings 4

Method of adjustment (MOA)
– Good: much quicker method of mapping multidimensional perceptual space
– Bad: not natural, complex interaction of stimulus variables