3. Production and Classification of Speech Sounds (Most materials from these slides come from Dan Jurafsky)

3. Production and Classification of Speech · PDF file3. Production and Classification of Speech Sounds ... First Formant for neutral vowel ... varies for every language as their sounds

Mar 14, 2018


lammien
Transcript
Page 1:

3. Production and Classification of Speech Sounds

(Most materials from these slides come from Dan Jurafsky)

Page 2:

Tractament Digital de la Parla 2

Speech Production Process

  Respiration:
    We (normally) speak while breathing out. Respiration provides the airflow (“pulmonic egressive airstream”).

  Phonation:
    The airstream sets the vocal folds in motion. Vibration of the vocal folds produces sound. In voiceless sounds they do not vibrate. The sound is then modulated by:

  Articulation and Resonance:
    The shape of the vocal tract, characterized by:
    Oral tract: teeth, soft palate (velo del paladar), hard palate (paladar duro), tongue (lengua), lips (labio), uvula (campanilla)
    Nasal tract

Page 3:

Basic facts about sound waves (review)

f = c/λ

where c = speed of sound and λ = wavelength (longitud de onda, in meters).

c = 34400 cm/s (≈ 350 m/s) at 21 degrees Celsius at sea level.
Example: with λ = 10 m, the frequency is f = 350/10 = 35 Hz.
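As a quick check of the relation f = c/λ, the following minimal Python sketch converts between wavelength and frequency. The function names are illustrative, and the default c = 350 m/s is the rounded value used in the slide's example, so treat the results as approximate.

```python
def frequency_hz(wavelength_m, speed_of_sound_m_s=350.0):
    """Frequency in Hz from f = c / wavelength (c defaults to the rounded 350 m/s)."""
    if wavelength_m <= 0:
        raise ValueError("wavelength must be positive")
    return speed_of_sound_m_s / wavelength_m


def wavelength_from_frequency(freq_hz, speed_of_sound_m_s=350.0):
    """Wavelength in meters from wavelength = c / f."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_of_sound_m_s / freq_hz


print(frequency_hz(10.0))                # the slide's example: 10 m wavelength
print(wavelength_from_frequency(500.0))  # wavelength of a 500 Hz tone
```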

Page 4:

Simple model of speech production

Page 5:

Source/filter model of speech production

Fig. 2.2 Source/system model for a speech signal. (Block diagram: an excitation generator, driven by the excitation parameters, produces the excitation signal; a linear system, driven by the vocal tract parameters, turns it into the speech signal.)

… simulation of sound generation and transmission in the vocal tract [36, 93], but, for the most part, it is sufficient to model the production of a sampled speech signal by a discrete-time system model such as the one depicted in Figure 2.2. The discrete-time time-varying linear system on the right in Figure 2.2 simulates the frequency shaping of the vocal tract tube. The excitation generator on the left simulates the different modes of sound generation in the vocal tract. Samples of a speech signal are assumed to be the output of the time-varying linear system.

In general such a model is called a source/system model of speech production. The short-time frequency response of the linear system simulates the frequency shaping of the vocal tract system, and since the vocal tract changes shape relatively slowly, it is reasonable to assume that the linear system response does not vary over time intervals on the order of 10 ms or so. Thus, it is common to characterize the discrete-time linear system by a system function of the form:

\[
H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 - \sum_{k=1}^{N} a_k z^{-k}}
     = \frac{b_0 \prod_{k=1}^{M} (1 - d_k z^{-1})}{\prod_{k=1}^{N} (1 - c_k z^{-1})},
\tag{2.1}
\]

where the filter coefficients a_k and b_k (labeled as vocal tract parameters in Figure 2.2) change at a rate on the order of 50–100 times/s. Some of the poles (c_k) of the system function lie close to the unit circle and create resonances to model the formant frequencies. In detailed modeling of speech production [32, 34, 64], it is sometimes useful to employ zeros (d_k) of the system function to model nasal and fricative …
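To illustrate how a pole pair near the unit circle creates a resonance, here is a minimal Python sketch of a single two-pole section of H(z). The mapping from formant frequency and bandwidth to pole radius, r = exp(-πB/fs), is a common textbook approximation and an assumption here, as are the function names and the example values (500 Hz formant, 80 Hz bandwidth, 8 kHz sampling rate).

```python
import cmath
import math


def resonator_coefficients(formant_hz, bandwidth_hz, sample_rate_hz):
    """Return (a1, a2) for H(z) = 1 / (1 - a1 z^-1 - a2 z^-2).

    The conjugate pole pair c, c* has radius r and angle theta, so
    (1 - c z^-1)(1 - c* z^-1) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # assumed bandwidth rule
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz
    return 2.0 * r * math.cos(theta), -r * r


def magnitude_response(a1, a2, freq_hz, sample_rate_hz):
    """Evaluate |H(z)| on the unit circle at the given frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    return abs(1.0 / (1.0 - a1 * z_inv - a2 * z_inv * z_inv))


FS = 8000.0
a1, a2 = resonator_coefficients(500.0, 80.0, FS)
for f in (100.0, 300.0, 500.0, 700.0, 1500.0):
    # The gain should be largest close to the 500 Hz formant.
    print(f, round(magnitude_response(a1, a2, f, FS), 2))
```

Cascading several such sections, one per formant, gives an all-pole system of the form in (2.1) with M = 0.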

Page 6:

Speech Production

Fundamental frequency/F0/pitch

Formant frequencies

Page 7:

Nasal Cavity

Pharynx (faringe)

Vocal Folds (pliegues vocales, within the Larynx = laringe)

Trachea (tráquea)

Lungs (pulmones)

(Techmer 1880)

Section of the Vocal Tract

Page 8:

Larynx and Vocal Folds
The Larynx (voice box)
  Located above the trachea (tráquea) and below the pharynx (faringe)
  Contains the vocal folds (adjective for larynx: laryngeal)
Vocal Folds (pliegues vocales)
  Two bands of muscle and tissue in the larynx
  Can be set in motion to produce sound (voicing)

Page 9:

Vocal cords

The vocal cords (folds) form a relaxation oscillator. Air pressure builds up and blows them apart. Air flows through the orifice and pressure drops allowing the vocal cords to close. Then the cycle is repeated.

Page 10:

Bernoulli's Principle in the Glottis (3D movement of the glottis)

vocal folds

basic horizontal open/close voicing cycle

refinement with vertical vocal fold motion

Vertical view

Air from the lungs creates a pressure difference that pushes the vocal folds open. When the pressure equalizes, they close again.

Page 11:

Vocal Fold Configurations

aspiration voicing aspirated voicing (air blowing)

Page 12:

Vocal Folds Vibration

UCLA Phonetics Lab Demo

Page 13:

Glottal flow

In a voiced sound the glottis opens/closes letting air go through in bursts (see image).

In unvoiced sounds the air just goes through it.

Page 14:

Organs involved in speech production

By changing the position of the speech articulators we modify the sound coming from the vocal cords to generate different speech sounds.

The speech articulators are the lips, the jaw, the tongue body and tip, the velum, and the hyoid bone (whose position sets larynx height and pharynx width).

(Figure: Speech Production Model)

Page 15:

Resonances of the vocal tract

The human vocal tract as an open tube
Air in a tube of a given length will tend to vibrate and resonate at certain frequencies

Page 16:

Resonances of the vocal tract
  The vocal tract is modeled as a cylindrical tube open at one end.
  Standing waves form in tubes.
  Waves will resonate if their wavelength corresponds to the dimensions of the tube. The associated frequencies are called formants.
  Constraint: the pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end.

Source Mouth

Air pressure

Page 17:

First Formant for neutral vowel

Length of the tube (vocal tract): L = 17.5 cm

F1 = c/λ1 = c/(4L) = 35000 (cm/s) / (4 × 17.5 cm) = 500 Hz

So we expect a neutral vowel to have its 1st resonance (formant) at 500 Hz.

Page 18:

Other Formants

Higher resonances fit an odd number of quarter wavelengths in the tube:

L = (3/4) λ    L = (5/4) λ    L = (7/4) λ
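The quarter-wavelength conditions for a tube closed at one end can be summarized as F_n = (2n - 1) c / (4L). The Python sketch below evaluates this for the neutral-vowel tube, assuming the same rounded values as the slides (L = 17.5 cm, c = 35000 cm/s); the function name is illustrative.

```python
def tube_formants_hz(length_cm=17.5, speed_of_sound_cm_s=35000.0, count=4):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


print(tube_formants_hz())                 # [500.0, 1500.0, 2500.0, 3500.0]
print(tube_formants_hz(length_cm=14.0))   # a shorter tract shifts every formant up
```

A shorter tube raises all resonances, which is consistent with shorter vocal tracts producing higher formants.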

Page 19:

Change of Vocal Tract Shape for Generating Different Spectra

Page 20:

Example: different vocal tract shapes

Martin Riches – Talking machine

Page 21:

Making speech visible: Spectrograms

Speech spectrogram: represents the sound intensity versus time and frequency.

Depending on how it is computed, it can be classified as:
•  Wideband spectrogram: spectral analysis of short waveform sections (~10 ms) with a 1 ms shift.
   •  Frequency resolution is low
   •  Spectral intensity resolves individual periods of the speech and shows vertical lines in voiced regions
•  Narrowband spectrogram: spectral analysis of long waveform sections (~50 ms) with a 1 ms shift.
   •  Frequency resolution is high
   •  Spectral intensity resolves individual pitch harmonics and shows horizontal lines in voiced regions
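The trade-off between the two spectrogram types follows from the window length. The Python sketch below compares the ~10 ms and ~50 ms windows, assuming a Hamming analysis window (whose null-to-null main-lobe width is roughly 4/T Hz), a 16 kHz sampling rate and a 125 Hz pitch. All three are assumptions chosen for illustration, not values from the slides.

```python
def window_length_samples(duration_s, sample_rate_hz):
    """Number of samples in an analysis window of the given duration."""
    return int(round(duration_s * sample_rate_hz))


def hamming_mainlobe_width_hz(duration_s):
    """Approximate null-to-null main-lobe width of a Hamming window: 4 / T."""
    return 4.0 / duration_s


def resolves_harmonics(duration_s, f0_hz):
    """Harmonics spaced f0 apart are separated only if the main lobe is narrower than f0."""
    return hamming_mainlobe_width_hz(duration_s) < f0_hz


FS = 16000   # assumed sampling rate (Hz)
F0 = 125.0   # assumed pitch of the speaker (Hz)
for name, duration in (("wideband", 0.010), ("narrowband", 0.050)):
    print(name,
          window_length_samples(duration, FS), "samples,",
          hamming_mainlobe_width_hz(duration), "Hz main lobe,",
          "resolves harmonics:", resolves_harmonics(duration, F0))
```

Under these assumptions the short window smears neighbouring harmonics together, while the long window separates them into horizontal lines.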

Page 22:

Narrowband/wideband example

Page 23:

Narrowband Spectrogram:

Page 24:

Wideband Spectrogram:

Page 25:

Voiced sounds

These are the sounds generated when the vocal folds are vibrating.

(Figure: Sound Source for Voiced Sounds)

Page 26:

Unvoiced sounds
When the vocal cords are open, air passes through unobstructed.

The source is usually modeled with a random number generator.

Different sounds are generated in the same way by changing the shape and movements of the resonant cavity.

There are two kinds:
•  Created by aspiration: the noise is produced at the glottis (for example [h] in “house”)
•  Created by frication: the noise is produced above the glottis
Special case: if the air moves very quickly, the turbulence causes a different kind of phonation: whisper
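The two kinds of source can be sketched in a few lines of Python: a periodic impulse train as a crude stand-in for voiced excitation, and uniform random noise for unvoiced excitation. The function names, the 8 kHz sampling rate and the 100 Hz pitch are illustrative assumptions, and real glottal pulses are smoother than single-sample impulses.

```python
import random


def voiced_excitation(f0_hz, sample_rate_hz, num_samples):
    """Impulse train with one unit pulse per pitch period."""
    period = max(1, int(round(sample_rate_hz / f0_hz)))
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def unvoiced_excitation(num_samples, seed=0):
    """White-noise-like excitation from a seeded random number generator."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


pulses = voiced_excitation(100.0, 8000, 240)   # 30 ms of voiced source
noise = unvoiced_excitation(240)               # 30 ms of unvoiced source
print(sum(pulses))                             # number of pulses in the segment
print(min(noise) >= -1.0 and max(noise) <= 1.0)
```

Either signal can then be passed through the same vocal tract filter to obtain a voiced or an unvoiced sound.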

Page 27:

Consonants and Vowels
•  Consonants:
   •  Sometimes produced with changes in the vocal tract shape (e.g. /R/, plosives)
   •  The vocal tract is usually partially or totally constricted
   •  Phonetically, sounds with audible noise produced by a constriction
•  Vowels:
   •  Produced using a fixed vocal tract shape
   •  There is no audible noise produced by a constriction
   •  They are relatively long compared to most consonants
   •  They are sustained sounds, always voiced
   •  The position of the tongue is the most important factor in determining the vowel sound

Page 28:

Phonemes
•  A phoneme is the link between the orthography (written words) and the sound (spoken words). It tells us how a written word is spoken.
•  It is most important in languages like English, where there is often no direct relationship between phonemes and graphemes (letters).
•  The phonetic transcription is the written representation of phonemes. It is based on a phonetic alphabet, which varies between languages because their sounds usually differ (e.g. /r/ in Spanish and English).
•  There are several phonetic alphabet conventions, such as the IPA (International Phonetic Alphabet) or the Arpabet, which is designed so that all phoneme symbols can be typed on a computer keyboard.

Page 29:

Phonetic alphabets

Page 30:

Phonetic transcription examples

•  Based on ideal (dictionary-based) pronunciations of all words in the sentence:
   •  “My name is Larry”: /M/ /AY/ - /N/ /EY/ /M/ - /IH/ /Z/ - /L/ /AE/ /R/ /IY/
   •  “How old are you”: /H/ /AW/ - /OW/ /L/ /D/ - /AA/ /R/ - /Y/ /UW/
   •  “Speech processing is fun”: /S/ /P/ /IY/ /CH/ - /P/ /R/ /AH/ /S/ /EH/ /S/ /IH/ /NG/ - /IH/ /Z/ - /F/ /AH/ /N/
•  Word ambiguity abounds:
   •  “lives”: /L/ /IH/ /V/ /Z/ (he lives here) versus /L/ /AY/ /V/ /Z/ (a cat has nine lives)
   •  “record”: /R/ /EH/ /K/ /ER/ /D/ (he holds the world record) versus /R/ /IY/ /K/ /AW/ /D/ (please record my favorite show tonight)

In real speech, the final phonetic transcription of a sentence also depends on the coarticulation between adjacent words.
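A dictionary-based transcription can be sketched as a simple lookup. The Python example below uses a hypothetical five-word mini-lexicon of Arpabet pronunciations (the `LEXICON` table and function names are made up for illustration, not taken from any real pronunciation dictionary), and keeps every candidate pronunciation so that ambiguous words stay visible.

```python
# Hypothetical mini-lexicon: each word maps to a list of candidate Arpabet pronunciations.
LEXICON = {
    "my": [["M", "AY"]],
    "name": [["N", "EY", "M"]],
    "is": [["IH", "Z"]],
    "larry": [["L", "AE", "R", "IY"]],
    "lives": [["L", "IH", "V", "Z"], ["L", "AY", "V", "Z"]],  # verb vs. plural noun
}


def transcribe(sentence):
    """Return, for each word, the list of its candidate pronunciations."""
    result = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError("word not in mini-lexicon: " + word)
        result.append(LEXICON[word])
    return result


def is_ambiguous(word):
    """True if the lexicon lists more than one pronunciation for the word."""
    return len(LEXICON[word.lower()]) > 1


for candidates in transcribe("My name is Larry"):
    print(" | ".join("/" + "/ /".join(p) + "/" for p in candidates))
print(is_ambiguous("lives"))
```

Picking the right candidate for an ambiguous word, and adjusting phonemes at word boundaries, needs context that a plain lookup does not have.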

Page 31:

Phoneme classification

Phonemes can be classified according to:
•  Place of articulation: where the major constriction happens inside the vocal tract
•  Manner of articulation: the way in which the sound is produced
•  Phonation: whether they are voiced or unvoiced

Vowels are usually classified only by the place of articulation of the tongue, subdivided into vertical and horizontal positioning.

Page 32:

Place of Articulation (in consonants)
Consonants are classified according to the location where the airflow is most constricted (aire es más encogido). This is called the place of articulation.
Three major kinds of place of articulation:
  Labial (with lips, con el labio)
  Coronal (using tip or blade of tongue, utilizando la punta o la hoja de la lengua)
  Dorsal (using back of tongue, utilizando el espalda de la lengua)

Page 33:

Labial place

Page 34:

Coronal place

Page 35:

Dorsal place

Page 36:

Manner of Articulation
There are three main manners of articulation:
•  Obstruent: the sound is produced by obstructing the airflow, which increases the air pressure in the vocal tract
   •  Examples are plosives (p, t, k, b, d, g)
•  Sonorant: produced without any turbulent airflow in the vocal tract
   •  Examples are vowels and nasals
•  Lateral: produced with a partial occlusion along the midline of the mouth by the tongue, letting air flow around its sides
   •  An example is the L sound

Page 37:

Consonants

Place of articulation (where the constriction happens)

Manner of articulation

Page 38:

Spanish Consonants

Page 39:

Review for English consonants

Distinctive Features

Classify non-vowel/non-diphthong sounds in terms of distinctive features:

•  Place of articulation
   •  Bilabial (lips): p, b, m, w
   •  Labiodental (between lips and front of teeth): f, v
   •  Dental (teeth): th, dh
   •  Alveolar (front of palate): t, d, s, z, n, l
   •  Palatal (middle of palate): sh, zh, r
   •  Velar (at velum): k, g, ng
   •  Pharyngeal (at end of pharynx): h
•  Manner of articulation
   •  Glide (smooth motion): w, l, r
   •  Nasal (lowered velum): m, n, ng
   •  Stop (constricted vocal tract): p, t, k, b, d, g
   •  Fricative (turbulent source): f, th, s, sh, v, dh, z, zh, h
   •  Voicing (voiced source): b, d, g, v, dh, z, zh, m, n, ng, w, l, r
   •  Mixed source (both voicing and unvoiced): j, ch
   •  Whispered: h
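The distinctive-feature groupings on this slide can be held in small lookup tables. The Python sketch below is one possible encoding, assuming the slide's groupings (for example bilabial p, b, m, w and stops p, t, k, b, d, g); the table and function names are illustrative, and sounds the slide does not place (such as j and ch) simply return None for that feature.

```python
PLACE = {
    "bilabial": {"p", "b", "m", "w"},
    "labiodental": {"f", "v"},
    "dental": {"th", "dh"},
    "alveolar": {"t", "d", "s", "z", "n", "l"},
    "palatal": {"sh", "zh", "r"},
    "velar": {"k", "g", "ng"},
    "pharyngeal": {"h"},
}
MANNER = {
    "glide": {"w", "l", "r"},
    "nasal": {"m", "n", "ng"},
    "stop": {"p", "t", "k", "b", "d", "g"},
    "fricative": {"f", "th", "s", "sh", "v", "dh", "z", "zh", "h"},
}
VOICED = {"b", "d", "g", "v", "dh", "z", "zh", "m", "n", "ng", "w", "l", "r"}


def _find(table, consonant):
    """Return the first feature name whose set contains the consonant, else None."""
    return next((name for name, members in table.items() if consonant in members), None)


def describe(consonant):
    """Look up place, manner and voicing for one consonant symbol."""
    return {
        "place": _find(PLACE, consonant),
        "manner": _find(MANNER, consonant),
        "voiced": consonant in VOICED,
    }


print(describe("p"))
print(describe("z"))
```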

Page 40:

Vowels

Place of articulation of the tongue (horizontal)

Place of articulation of the tongue (vertical)

Page 41:

Spanish Vowels

Page 42:

Page 43:

Vowels

IY AA UW

Page 44:

Vowel Formant Frequencies

Page 45:

Vowel formant frequencies (II)

Taken from an English database with 5 male speakers and 10 repetitions per vowel and speaker

Page 46:

Vowel spectrograms

Page 47:

Spanish vowels

Page 48:

El golpe de timón fue sobrecogedor…

Page 49:

…para el tenaz capitán y su tripulación