CE 40763 Digital Signal Processing Fall 1992 Speech Production Hossein Sameti Department of Computer Engineering Sharif University of Technology

Mar 22, 2016

Page 1: CE 40763 Digital Signal Processing Fall 1992 Speech Production

CE 40763 Digital Signal Processing

Fall 1992

Speech Production

Hossein Sameti

Department of Computer Engineering Sharif University of Technology

Page 2

Speech Generation and Perception:

The study of the anatomy of the organs of speech is required as a background for articulatory and acoustic phonetics.

An understanding of hearing and perception is needed in the fields of speech synthesis and speech enhancement, and is useful in the field of automatic speech recognition.

Page 3

Schematic diagram of the human speech production system

Page 4

Organs of Speech: Lungs and trachea

◦ Source of air during speech.

◦ The vocal organs work by using compressed air; this is supplied by the lungs and delivered to the system by way of the trachea.

◦ These organs also control the loudness of the resulting speech.

◦ The trachea and lungs together constitute the pulmonary tract.

Page 5

Organs of Speech: The Larynx

◦ This is a complicated system of cartilages and muscles containing and controlling the vocal cords. The principal parts are: the cricoid cartilage, the thyroid cartilage, the arytenoid cartilages, and the vocal cords.

◦ The opening between the vocal folds, where they come together, is called the glottis.

Page 6

Vocal folds during voicing (glottal closure) and during breathing

Page 7

Organs of Speech: The Vocal Tract

◦ Laryngeal pharynx: beneath the epiglottis

◦ Oral pharynx: behind the tongue, between the epiglottis and the velum

◦ Nasal pharynx: above the velum, at the rear end of the nasal cavity

◦ Oral cavity: forward of the velum and bounded by the lips, tongue and palate

◦ Nasal cavity: above the palate and extending from the pharynx to the nostrils

Page 8

Vocal Tract

Page 9

Vocal Tract Model

Page 10

A General Discrete-Time Model For Speech Production

Page 11

Time Waveform Of Volume Velocity Of The Glottal Source Excitation

Page 12

Magnitude Spectrum Of One Pulse Of The Volume Velocity At The Glottis

Page 13

Position Of The Vocal Cords And Cartilages (a) For Phonation (b) For Whispering

Page 15

Speech Production:

The operation of the system is divided into two functions:

◦ Excitation

◦ Modulation

Excitation (glottis) → Modulation (vocal tract) → Radiation → Speech
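The excitation, modulation and radiation chain above can be sketched in a few lines of Python. This is only an illustrative sketch, not the model used in the lecture: the excitation is a bare impulse train, the vocal tract is approximated by a cascade of two-pole resonators, and radiation is approximated by a first difference. The sampling rate, the 100 Hz pulse rate, and the formant frequencies and bandwidths are all assumed values chosen for illustration.

```python
import math

def impulse_train(n_samples, fs, f0):
    """Excitation: one unit pulse per pitch period (a crude glottal source)."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, fs, freq, bandwidth):
    """Modulation: two-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / fs)          # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y

def radiate(x):
    """Radiation at the lips, approximated by a first difference."""
    out, prev = [], 0.0
    for xn in x:
        out.append(xn - prev)
        prev = xn
    return out

def synthesize(n_samples=800, fs=8000, f0=100.0,
               formants=((730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0))):
    """Pass the excitation through each (frequency, bandwidth) resonator, then radiate."""
    s = impulse_train(n_samples, fs, f0)
    for freq, bw in formants:   # assumed, roughly "AH"-like formant values
        s = resonator(s, fs, freq, bw)
    return radiate(s)

speech = synthesize()
```

Changing only `formants` changes the vowel quality while the pulse rate stays fixed, which is the sense in which the vocal tract modulates the glottal excitation.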

Page 16

Vocal Vowels: AH, EE, EH, OH, OO

Duck Call

Page 17

Speech Production: Excitation

Excitation is done in several ways:

◦ Phonation (making of a voiced sound)

This is the oscillation of the vocal cords.

The arytenoid cartilages close and stretch the vocal cords.

When air is forced through the vocal cords, they vibrate.

The opening and closing of the cords breaks the airstream up into pulses.

Page 18

Speech Production:

The repetition rate of the pulses is termed the pitch.

At low levels of air pressure, oscillation may become irregular; these irregularities are known as "vocal fry".

Speech sounds accompanied by phonation are called voiced; others, unvoiced or mute.

◦ Whispering (speaking softly)

The vocal cords are drawn together, but with a small triangular opening between the arytenoid cartilages.
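Since pitch is defined above as the repetition rate of the glottal pulses, it can be read off a signal as the sampling rate divided by the pulse spacing. The sketch below is one common way to do this (autocorrelation peak picking), not something taken from the slides; the function name, the search range of 50 to 400 Hz, and the test signal are assumptions for illustration.

```python
def estimate_pitch(signal, fs, f_min=50.0, f_max=400.0):
    """Return pitch in Hz as fs divided by the lag of the autocorrelation peak."""
    lag_min = int(fs / f_max)   # shortest period considered
    lag_max = int(fs / f_min)   # longest period considered
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:  # strict '>' keeps the shortest lag on ties
            best_lag, best_value = lag, value
    return fs / best_lag

fs = 8000      # samples per second (assumed)
period = 64    # samples between pulses (hypothetical value)
pulses = [1.0 if n % period == 0 else 0.0 for n in range(1024)]
pitch_hz = estimate_pitch(pulses, fs)   # 8000 / 64 = 125 Hz
```

On real speech the autocorrelation peak is much less clean than for this idealized pulse train, and unvoiced or whispered segments have no clear peak at all.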

Page 19

Speech Production:

◦ Frication

Frication can occur with or without phonation.

◦ Compression

If the release is abrupt and clean, the sound is a stop or plosive.

If gradual and turbulent, the sound can pass into the related fricative and is termed an affricate.

Page 20

Speech Production:

◦ Vibration

If air is forced through a closure other than the vocal cords, vibrations may be set up.

Modulation

◦ This is what we do to impose information on the glottal output.

Articulatory phonetics: how the organs of speech are positioned to produce any given speech sound.

Acoustic phonetics: what the measurable acoustical correlates of any given speech sound are, and how acoustical features in general correspond to phonetic and articulatory ones.

Page 21

Hearing and perception:

Hearing is the process by which sound is received and converted into nerve impulses.

Perception is the post-processing within the brain by which the sounds heard are interpreted and given meaning.

Page 22

The structure of the peripheral auditory system

Page 23

Sectional View Of The Human Ear

Page 24

Hearing: The ear is divided into three parts:

◦ The outer ear consists of:

The pinna (visible, convoluted cartilage). Its convoluted shape provides some directional cues.

The external canal (external auditory meatus). A uniform tube, 2.7 cm long by 0.7 cm across. It has a number of resonant frequencies, the lowest at about 3 kHz.

The eardrum (tympanic membrane). A stiff, conical structure at the end of the meatus. It vibrates in response to sound.
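The 3 kHz figure for the ear canal can be checked with a back-of-the-envelope model. The sketch below assumes the canal behaves like an ideal uniform tube open at the pinna and closed at the eardrum (a quarter-wave resonator), and assumes a speed of sound of 343 m/s; neither assumption is stated in the slides, and a real canal is neither perfectly uniform nor rigidly terminated.

```python
SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C (assumed)
CANAL_LENGTH = 0.027     # m, the 2.7 cm quoted for the external canal

def canal_resonances(length_m, count=3, c=SPEED_OF_SOUND):
    """Resonances of a tube open at one end and closed at the other:
    f_k = (2k - 1) * c / (4 * L) for k = 1, 2, 3, ..."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, count + 1)]

resonances_hz = canal_resonances(CANAL_LENGTH)
# The first entry is the lowest resonance, c / (4 L), a little under 3.2 kHz.
```

Under these assumptions the lowest resonance falls close to 3 kHz, and the higher ones sit at odd multiples of it.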

Page 25

Hearing:

◦ The middle ear

Is an air-filled cavity

Separated from the outer ear by the tympanic membrane

Connected to the inner ear by the oval and round windows

Connected to the outside world by way of the eustachian tube

Page 26

Hearing:

The eustachian tube permits equalization of air pressure between the middle ear and the surrounding atmosphere.

The middle ear contains three tiny bones (ossicles):

Malleus (hammer)

Incus (anvil)

Stapes (stirrup)

The functions of the ossicles:

Impedance transformation

Amplitude limiting

Page 27

Hearing:

◦ The inner ear

Vestibular apparatus: used for balance and sensing orientation.

The round and oval windows

Cochlea: a snail-shaped passage communicating with the middle ear via the round and oval windows. It contains the transducers which convert acoustical vibration to nerve impulses.

Page 28

The Cochlea as It Would Appear If Unwound

Page 29

Cross Section Of One Turn Of The Cochlea

Page 30

Position Of Maximum Amplitude Along Basilar Membrane As A Function Of Applied Frequency
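The figure itself is not reproduced in this transcript. As a rough stand-in, the relation between frequency and position of maximum response is often approximated by Greenwood's empirical function. The sketch below uses the constants commonly quoted for the human cochlea, with position expressed as a fraction of the membrane length measured from the apex. Both the model and its constants are assumptions brought in for illustration and may not match the curve shown on the slide.

```python
import math

# Commonly quoted Greenwood constants for humans (assumed, not from the slides).
A, ALPHA, K = 165.4, 2.1, 0.88

def place_to_frequency(x):
    """Characteristic frequency in Hz at fractional distance x from the apex
    (x = 0 at the apex, x = 1 at the base next to the oval window)."""
    return A * (10.0 ** (ALPHA * x) - K)

def frequency_to_place(f_hz):
    """Inverse mapping: fractional distance from the apex for a given frequency."""
    return math.log10(f_hz / A + K) / ALPHA

low_place = frequency_to_place(200.0)     # low frequencies peak near the apex
high_place = frequency_to_place(8000.0)   # high frequencies peak near the base
```

The mapping is close to logarithmic over most of the audible range, so equal frequency ratios correspond to roughly equal distances along the membrane.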

Page 31

Frequency Response Of A Point On The Basilar Membrane