Speech acoustics

Objectives:
  Describe relative frequency and intensity of phonemes by voice, manner, and formant frequency.
  Describe various phonemic cues.
  Describe speech constraints.

Average speech intensity

~65 dB SPL (~45 dB HL)
30 dB range
Any vowel has more power than any consonant
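The decibel figures above can be converted to physical quantities with the standard definitions. The sketch below assumes the usual 20 µPa reference pressure for dB SPL; the function names are illustrative and do not come from the slides.

```python
P_REF_PA = 20e-6  # assumed standard dB SPL reference pressure (20 micropascals)

def spl_to_pressure_pa(level_db_spl):
    """RMS sound pressure in pascals for a level given in dB SPL."""
    return P_REF_PA * 10 ** (level_db_spl / 20)

def db_to_power_ratio(delta_db):
    """Power ratio that corresponds to a level difference in dB."""
    return 10 ** (delta_db / 10)

print(f"65 dB SPL = {spl_to_pressure_pa(65):.4f} Pa")
print(f"30 dB range = power ratio of {db_to_power_ratio(30):.0f}")
```

Under these definitions, average speech is roughly 0.036 Pa, and a 30 dB range means the strongest speech sounds carry about 1000 times the power of the weakest.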

Average speech frequency

~50 – 10,000 Hz
Most energy below 1000 Hz

Fundamental frequency
  Men: 100 Hz
  Women: 200 Hz
  Children: 300 Hz
  Crying babies: 500 Hz

Cues for talker identity
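Because voiced speech is periodic, each fundamental frequency fixes a glottal period (1/F0) and a set of harmonics at integer multiples of F0. The sketch below works this out for the values listed above. The 1000 Hz ceiling is borrowed from the "most energy below 1000 Hz" note, and the helper names are made up for illustration.

```python
# Fundamental frequencies from the slide, in Hz.
F0_HZ = {"men": 100, "women": 200, "children": 300, "crying babies": 500}

def period_ms(f0_hz):
    """Duration of one glottal cycle in milliseconds."""
    return 1000.0 / f0_hz

def harmonics_up_to(f0_hz, ceiling_hz):
    """Harmonic frequencies (integer multiples of F0) at or below ceiling_hz."""
    return list(range(f0_hz, ceiling_hz + 1, f0_hz))

for talker, f0 in F0_HZ.items():
    harmonics = harmonics_up_to(f0, 1000)
    print(f"{talker}: period {period_ms(f0):.1f} ms, "
          f"{len(harmonics)} harmonics at or below 1000 Hz")
```

A higher fundamental spaces the harmonics further apart, so fewer of them fall in the low-frequency region where most speech energy lies.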

Average speech duration

Vowels: 130 – 360 msec
Consonants: 20 – 150 msec
Rate: ~5 syllables/second; ~12 phonemes/second
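The rate figures imply average unit durations that can be checked against the vowel and consonant ranges. This is a small arithmetic sketch using only the numbers above; the variable names are illustrative.

```python
SYLLABLES_PER_SECOND = 5
PHONEMES_PER_SECOND = 12

def mean_duration_ms(units_per_second):
    """Average duration of one unit in milliseconds at the given rate."""
    return 1000.0 / units_per_second

syllable_ms = mean_duration_ms(SYLLABLES_PER_SECOND)
phoneme_ms = mean_duration_ms(PHONEMES_PER_SECOND)
phonemes_per_syllable = PHONEMES_PER_SECOND / SYLLABLES_PER_SECOND

print(f"mean syllable duration: {syllable_ms:.0f} ms")
print(f"mean phoneme duration: {phoneme_ms:.0f} ms")
print(f"phonemes per syllable: {phonemes_per_syllable:.1f}")
```

The mean phoneme duration of about 83 msec sits inside the consonant range and below the vowel range, which is what a mix of short consonants and longer vowels would give.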

Vowel formants

[Figure: vowel chart with four quadrants labeled High F1 / Low F2, High F1 / High F2, Low F1 / Low F2, Low F1 / High F2]
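One way to read the four quadrants is as a lookup from a vowel's first two formants to a label. The sketch below is illustrative only. The formant values are approximate adult-male averages as commonly quoted from Peterson and Barney (1952), and the 500 Hz and 1500 Hz dividing lines are arbitrary assumptions, not values from the slides.

```python
F1_SPLIT_HZ = 500   # assumed boundary between "low" and "high" F1
F2_SPLIT_HZ = 1500  # assumed boundary between "low" and "high" F2

# Approximate (F1, F2) averages in Hz for four corner vowels (assumed values).
VOWELS_HZ = {
    "i": (270, 2290),   # as in "beet"
    "æ": (660, 1720),   # as in "bat"
    "ɑ": (730, 1090),   # as in "father"
    "u": (300, 870),    # as in "boot"
}

def quadrant(f1_hz, f2_hz):
    """Label a vowel by which side of each assumed boundary its formants fall on."""
    f1_label = "High F1" if f1_hz >= F1_SPLIT_HZ else "Low F1"
    f2_label = "High F2" if f2_hz >= F2_SPLIT_HZ else "Low F2"
    return f"{f1_label} / {f2_label}"

for vowel, (f1, f2) in VOWELS_HZ.items():
    print(vowel, quadrant(f1, f2))
```

With these assumed values, each of the four corner vowels lands in a different quadrant.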

Consonants: place, manner, voicing

Consonants: energy bands

Consonant   Frequency bands (Hz)               Intensity
r           600-800, 1000-1500, 1800-2400      46
l           250-400, 2000-3000                 43
sh          1500-2000, 4500-5500               41
ng          250-400, 1000-1500, 2000-3000      41
ch          1500-2000, 4000-5000               38
n           250-350, 1000-1500, 2000-3000      37
m           250-350, 1000-1500, 2500-3500      35
th (ð)      250-350, 4500-6000                 34
t           2500-3500                          34
h           1500-2000                          32
k           2000-2500                          34
j           200-300, 2000-3000                 36
f           4000-5000                          34
g           200-300, 1500-2500                 33
s           5000-6000                          32
z           200-300, 4000-5000                 31
v           300-400, 3500-4500                 31
p           1500-2000                          30
d           300-400, 2500-3000                 29
b           300-400, 2000-2500                 29
th (θ)      ~6000                              28
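The table lends itself to simple queries, for example which consonants have no energy band below a given frequency. The sketch below transcribes the bands into a dictionary. The function name is hypothetical, the third ng band is entered as 2000-3000 Hz, and the "~6000" entry for th (θ) is represented as a single point at 6000 Hz.

```python
# Energy bands in Hz, transcribed from the table (third "ng" band assumed 2000-3000).
CONSONANT_BANDS_HZ = {
    "r": [(600, 800), (1000, 1500), (1800, 2400)],
    "l": [(250, 400), (2000, 3000)],
    "sh": [(1500, 2000), (4500, 5500)],
    "ng": [(250, 400), (1000, 1500), (2000, 3000)],
    "ch": [(1500, 2000), (4000, 5000)],
    "n": [(250, 350), (1000, 1500), (2000, 3000)],
    "m": [(250, 350), (1000, 1500), (2500, 3500)],
    "th (ð)": [(250, 350), (4500, 6000)],
    "t": [(2500, 3500)],
    "h": [(1500, 2000)],
    "k": [(2000, 2500)],
    "j": [(200, 300), (2000, 3000)],
    "f": [(4000, 5000)],
    "g": [(200, 300), (1500, 2500)],
    "s": [(5000, 6000)],
    "z": [(200, 300), (4000, 5000)],
    "v": [(300, 400), (3500, 4500)],
    "p": [(1500, 2000)],
    "d": [(300, 400), (2500, 3000)],
    "b": [(300, 400), (2000, 2500)],
    "th (θ)": [(6000, 6000)],  # "~6000" treated as a point (assumption)
}

def no_energy_below(cutoff_hz):
    """Consonants whose every listed band starts at or above cutoff_hz."""
    return [
        consonant
        for consonant, bands in CONSONANT_BANDS_HZ.items()
        if all(low >= cutoff_hz for low, _high in bands)
    ]

print(no_energy_below(1000))
print(no_energy_below(2000))
```

With a 1000 Hz cutoff the result is sh, ch, t, h, k, f, s, p and th (θ), which are exactly the voiceless consonants in the table. Every voiced consonant has a band below 1000 Hz.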

Phonemic cues - Stops

Closure
  Voiceless stops – silent period
  Voiced stops – low level energy
Burst
  Wide-band energy
  ~40 msec
  Greater intensity for voiceless stops
  Frequency depends on place
Formant transition
  First formant always rising
  Second formant transition depends on

Phonemic cues - Stops

Voice easier to detect than place
For voiced stops
  Voice-onset time is earlier
  Energy present at fundamental frequency
  Burst energy is lower in amplitude
  Vowels are longer in duration before voiced final stops (“eyes” v. “ice”)

Phonemic cues - Nasals

Always voiced
Continuant
Nasal resonance
  highest for /m/
  lowest for /n/
Second formant (frequency and transition) gives place information

Phonemic cues - Fricatives

Hissing quality
Voiced fricatives
  Periodic
  Lower frequency
  Lower amplitude
  Greater overall energy (from fundamental)
Sibilants (s, z, sh, zh)
  Higher amplitude than other fricatives

[Figure: panels labeled f, θ, s, ʃ]

Suprasegmental cues

Stress
  changes in fundamental frequency, intensity, duration
Intonation
  changes in fundamental frequency, pitch pattern
  expresses attitudes, feeling, meaning (command, request, statement)
Duration
  variations in speech sounds due to context of other sounds

Speech constraints

Syntactic
  S = NP (Aux) VP
  NP = (Det) (AP) N (PP)
    “the naughty boy in the daycare…”
  VP = V (NP) (PP) (Adv)
    “…took the toy away brusquely”
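The phrase-structure rules above can be sketched as a small recognizer over part-of-speech tags, where parentheses in the rules become optional constituents. The slides do not define AP or PP, so this sketch assumes AP is a single adjective (tag "Adj") and PP = P NP. The tag names, helper functions and hand-tagged examples are all illustrative.

```python
def tag(name):
    """Parser that consumes exactly one tag equal to `name`."""
    def run(tags, i):
        return {i + 1} if i < len(tags) and tags[i] == name else set()
    return run

def optional(parser):
    """Parser that may consume what `parser` consumes, or nothing."""
    def run(tags, i):
        return {i} | parser(tags, i)
    return run

def seq(*parsers):
    """Run parsers one after another, tracking every possible end position."""
    def run(tags, i):
        positions = {i}
        for parser in parsers:
            positions = {end for start in positions for end in parser(tags, start)}
        return positions
    return run

def np(tags, i):        # NP = (Det) (AP) N (PP), with AP assumed to be one Adj
    return seq(optional(tag("Det")), optional(tag("Adj")), tag("N"), optional(pp))(tags, i)

def pp(tags, i):        # PP = P NP (assumed; not given on the slide)
    return seq(tag("P"), np)(tags, i)

def vp(tags, i):        # VP = V (NP) (PP) (Adv)
    return seq(tag("V"), optional(np), optional(pp), optional(tag("Adv")))(tags, i)

def sentence(tags, i):  # S = NP (Aux) VP
    return seq(np, optional(tag("Aux")), vp)(tags, i)

def is_sentence(tags):
    """True if the whole tag sequence can be derived from S."""
    return len(tags) in sentence(tags, 0)

# "the naughty boy in the daycare took the toy brusquely" (hand-tagged, "away" omitted)
example = ["Det", "Adj", "N", "P", "Det", "N", "V", "Det", "N", "Adv"]
print(is_sentence(example))
print(is_sentence(["N", "Det", "V"]))
```

The first sequence is accepted and the scrambled one is rejected. In this sense syntax narrows what a listener can expect next: after a determiner and adjective, only a noun fits these rules.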

Speech constraints

Syntactic
  The question “What should you eat”
    Answer is a noun phrase
  The question “How should you eat”
    Answer is an adverbial phrase

Speech constraints

Semantic
  Words in a sentence are related meaningfully
    “Plug the mouse into the computer”
Situational
  Conversation usually refers to the context of the environment
    “I like that oat!”
    Mall vs. Farm

Overlapping cues help protect the signal from noise

Speech predictability helps protect the signal from noise

Noise can come from
  the speaker (poor intelligibility, etc.)
  the environment (distractions, etc.)
  the listener (ESL, etc.)

Effects of hearing loss on speech perception

Objectives:
  Describe speech characteristics that are lost and that are preserved for hearing losses of various degree, type and configuration.

Auditory Response Area

[Figure: auditory response area plotted against frequency]

Speech audiogram

[Figure: audiogram with X markers]

Speech audiogram

34 dots

Correlating SII to speech

Adult values (children would be worse)

Digits easy

Words hard

Deafness
  No access to average speech

Severe
  Access to only loudest components of speech
  Speech production
    High airflow rate
    Speech initiation at low lung volumes
    Poor velar control (nasality)
    High fundamental frequency
    Slow speech rate

Moderate
  Access to louder half of speech, or to loud speech
  Speech production
    Substitutions and distortions
    Errors in affricates, fricatives and blends

Slight to Mild
  Access to all but the quietest components of speech
  Speech production
    Fewer distortions/substitutions
    Good intelligibility

Rising v. Sloping loss

SII = 64 SII = 45
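Dot-counting on a speech audiogram can be sketched as follows: each dot marks a piece of the speech spectrum, a dot counts as audible when it lies at or above the listener's threshold (in dB HL) at that frequency, and the index is the percentage of dots that are audible. Everything in this sketch is hypothetical. The ten-dot layout is a toy and not the real dot positions, both audiograms are invented, and the results are not meant to reproduce the SII values on the slide.

```python
def audible_percent(dots, thresholds_db_hl):
    """Percentage of dots whose level is at or above the threshold at that frequency."""
    audible = sum(
        1 for freq_hz, level_db_hl in dots
        if level_db_hl >= thresholds_db_hl[freq_hz]
    )
    return 100.0 * audible / len(dots)

# Toy layout (hypothetical): (frequency in Hz, level in dB HL), with more dots
# placed at 2000 and 4000 Hz than at 500 Hz.
TOY_DOTS = [
    (500, 40),
    (1000, 35), (1000, 45),
    (2000, 25), (2000, 35), (2000, 45), (2000, 50),
    (4000, 30), (4000, 40), (4000, 50),
]

# Invented audiograms: thresholds in dB HL at each dot frequency.
sloping_loss = {500: 20, 1000: 30, 2000: 45, 4000: 60}  # worse at high frequencies
rising_loss = {500: 60, 1000: 45, 2000: 30, 4000: 20}   # worse at low frequencies

print("sloping:", audible_percent(TOY_DOTS, sloping_loss))
print("rising:", audible_percent(TOY_DOTS, rising_loss))
```

Because the toy layout puts more dots in the higher frequencies, the sloping loss hides more of them than the rising loss does.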
