12.5 MATLAB Exercise Composite Vocal Tract Model
Program Directory: matlab_gui\composite_vocal_tract
Program Name: composite_vocal_tract_GUI25.m
GUI data file: composite_vocal_tract.mat
Callbacks file: Callbacks_composite_vocal_tract_GUI25.m
TADSP: Section 5.2.4, pp. 226-227, Figure 5.47
This MATLAB exercise illustrates the role of each of the three
components of the speech model for voiced and for unvoiced speech
sounds, namely the vocal tract response, the glottal pulse (for
voiced sounds), and the radiation response at the lips.
Composite Vocal Tract Model Theory of Operation
Given a vowel sound, specified by a set of M = 3 (in this
exercise) variable resonances and one fixed resonance at 4500 Hz
(both center frequencies and bandwidths of the resonances are
given), along with a fixed radiation characteristic and a fixed
glottal pulse, this MATLAB exercise computes and plots (on a common
graphics panel) the log magnitude frequency responses of the vocal
tract (as specified by the set of M variable and one fixed
resonances), the radiation characteristic, and the glottal pulse,
along with the composite response of all three components. For an
unvoiced vowel sound (a whispered vowel) the exercise computes and
plots (on a common graphics panel) the log magnitude response of the
vocal tract along with the log magnitude radiation characteristic.
For the radiation characteristic, this exercise uses a simple
first order system with response:
\[ R(z) = 1 - z^{-1} \tag{1} \]
For the glottal pulse characteristic, this exercise uses a Rosenberg
glottal pulse approximation with a rise duration of 6% of the pitch
period and a decay duration of 3% of the pitch period. The exercise
also solves for the impulse response of the composite system and
plots it on the upper graphics panel.
The vowel spectrum is computed as a cascade of M = 3 variable and one
fixed second order systems (each one representing one of the M
formants of the designated vowel sound) of the form:
\[ H_v(z) = \prod_{k=1}^{M}
   \frac{1 - 2e^{-2\pi B_k/F_s}\cos(2\pi F_k/F_s) + e^{-4\pi B_k/F_s}}
        {1 - 2e^{-2\pi B_k/F_s}\cos(2\pi F_k/F_s)\,z^{-1} + e^{-4\pi B_k/F_s}\,z^{-2}}
   \tag{2} \]
where M is 3 for this exercise, F_k, k = 1:M is the set of vowel
formant center frequencies, B_k, k = 1:M is the set of vowel formant
bandwidths, and F_s is the sampling frequency of the system (10000
Hz is nominally chosen as the sampling rate for this exercise).
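As a rough illustration of equations (1) and (2), the following MATLAB
sketch evaluates the vocal tract cascade and the radiation filter
directly on the unit circle and plots their log magnitude responses.
It is not the GUI's own code. The formant frequencies and bandwidths
are assumed illustrative values for an IY-like vowel (the program
loads its own values from its data file), and the variable names F, B,
nfreq, Hv, R, HvdB, RdB, and compdB are hypothetical.

```matlab
% Sketch: log magnitude responses from equations (1) and (2).
Fs = 10000;                        % sampling rate in Hz (exercise default)
% ASSUMED illustrative values, not taken from composite_vocal_tract.mat:
F = [270 2290 3010 4500];          % 3 variable formants + fixed 4500 Hz resonance
B = [60 90 150 200];               % assumed bandwidths in Hz
nfreq = 512;                       % frequency points from 0 up to (not incl.) Fs/2
f = (0:nfreq-1) * (Fs/2) / nfreq;  % frequency axis in Hz
zinv = exp(-1j*2*pi*f/Fs);         % z^{-1} evaluated on the unit circle

Hv = ones(1, nfreq);               % vocal tract response, equation (2)
for k = 1:numel(F)
    r  = exp(-2*pi*B(k)/Fs);       % pole radius exactly as written in equation (2)
    c  = 2*r*cos(2*pi*F(k)/Fs);
    Hv = Hv .* (1 - c + r^2) ./ (1 - c*zinv + r^2*zinv.^2);
end
R = 1 - zinv;                      % radiation response, equation (1)

HvdB   = 20*log10(abs(Hv));
RdB    = 20*log10(abs(R) + eps);   % eps avoids log10(0) at f = 0
compdB = HvdB + RdB;               % composite for a whispered vowel (no glottal pulse)

plot(f, HvdB, f, RdB, f, compdB);
xlabel('Frequency (Hz)'); ylabel('Log magnitude (dB)');
legend('vocal tract', 'radiation', 'composite');
```

Each second order section is normalized by its numerator so that its
gain at z = 1 is unity, which is why the vocal tract curve starts at
0 dB. For a voiced vowel, the log magnitude spectrum of the glottal
pulse would be added to the composite as a third term.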
Composite Vocal Tract Model GUI Design
The GUI for this exercise consists of two panels, 4 graphics
panels, 1 title box and 11 buttons. The functionality of the two
panels is:
1. one panel for the graphics display,
2. one panel for parameters related to the composite vocal tract
model, and for running the program.
The set of four graphics panels is used to display the
following:
1. the impulse response of the vowel selected for this exercise
from a set of 10 standard vowel sounds (top graphics panel),
2. the log magnitude frequency response of the Rosenberg
glottal pulse, the selected vowel sound, and the radiation filter,
along with the composite vocal tract log magnitude frequency
response for voiced vowel sounds. For whispered vowel sounds, this
exercise plots the log magnitude frequency response of the selected
vowel sound, along with the log magnitude frequency response of the
radiation characteristic, and the composite response for unvoiced
vowel sounds (second graphics panel),
3. a wideband spectrogram display of the composite vowel sound,
using either voiced or unvoiced excitation (third graphics
panel),
4. a narrowband spectrogram display of the composite vowel
sound, using either voiced or unvoiced excitation (fourth graphics
panel).
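The difference between the wideband and narrowband displays comes
from the analysis window length. The sketch below computes both with
a plain short-time Fourier transform in base MATLAB. It is only an
illustration: the 5 msec and 40 msec window durations, the quarter
window hop, the FFT size, and the pulse train used as a stand-in
signal are all assumptions, not settings taken from the program.

```matlab
% Sketch: wideband vs. narrowband spectrograms of a stand-in signal.
fs = 10000;                          % sampling rate in Hz (exercise default)
x  = zeros(fs/2, 1);                 % 0.5 s stand-in signal (column vector)
x(1:100:end) = 1;                    % impulse every 10 msec
dyn_range = 60;                      % display dynamic range in dB (default)
win_ms = [5 40];                     % ASSUMED wideband / narrowband window lengths
nfft   = 1024;                       % ASSUMED FFT size
S = cell(1, 2);

for i = 1:2
    L   = round(win_ms(i) * fs / 1000);            % window length in samples
    w   = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));    % Hamming window
    hop = max(1, round(L/4));                      % frame shift in samples
    nfr = floor((numel(x) - L)/hop) + 1;           % number of full frames
    X   = zeros(nfft/2 + 1, nfr);
    for m = 1:nfr
        seg    = x((m-1)*hop + (1:L)) .* w;        % windowed frame
        Xf     = fft(seg, nfft);
        X(:,m) = abs(Xf(1:nfft/2 + 1));            % keep 0 .. fs/2
    end
    SdB  = 20*log10(X / max(X(:)) + eps);          % normalize peak to 0 dB
    S{i} = max(SdB, -dyn_range);                   % clip to the dynamic range

    subplot(2, 1, i);
    imagesc(((0:nfr-1)*hop + L/2)/fs, (0:nfft/2)*fs/nfft, S{i});
    axis xy; xlabel('Time (s)'); ylabel('Frequency (Hz)');
    title(sprintf('%d msec window', win_ms(i)));
end
```

With the short window, individual pitch pulses are resolved in time;
with the long window, the harmonics of the pitch frequency are
resolved in frequency.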
The title box displays the information about the selected vowel
sound, the sampling rate and the period of the excitation pulses (in
msec). The functionality of the 11 buttons is:
1. a popupmenu button that allows the user to select the vowel
sound (from a set of 10 vowel sounds) which is to be synthesized
using the glottal pulse stream and the vowel formants,
2. an editable button that specifies the period of the voiced
vowel excitation pulses in msec (the default value is 10 msec),
3. an editable button that specifies the sampling rate of the
simulation in samples per second (the default value is Fs = 10000
samples per second),
4. a popupmenu button that allows the user to select the vowel
sound excitation as either periodic Glottal Pulses (for voiced vowel
sounds), or Gaussian noise excitation (for whispered vowel
sounds),
5. an editable button that specifies the opening cycle of the
glottal pulse as a percentage of the pitch period of the sound,
6. an editable button that specifies the closing cycle of the
glottal pulse as a percentage of the pitch period of the sound,
7. a pushbutton to compute and plot the log magnitude response
of the three individual vowel sound components (for voiced vowels)
and the two individual vowel sound components (for whispered
vowels), along with a plot of the composite log magnitude frequency
response of the selected vowel sound (either voiced or
whispered),
8. a pushbutton to play, in sequence, the pitch pulse sequence,
the excitation function obtained from the convolution of the glottal
pulse with the pitch pulse sequence, and the resulting vowel sound
obtained by convolving the vowel impulse response with the
excitation sequence (for voiced vowel sounds), or the noise
excitation function and the composite whispered vowel sound,
9. an editable button that specifies the spectrogram dynamic
range of plotting; the dynamic range parameter, dyn_range, has a
default of 60 dB,
10. a pushbutton to plot wideband and narrowband spectrograms
for either the voiced vowel or the whispered vowel on separate
graphics panels,
11. a pushbutton to close the GUI.
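The playback sequence described for button 8 can be sketched as the
following signal chain. This is a minimal sketch under stated
assumptions, not the program's callback code: the formant values, the
0.5 second duration, and the names dur, p, g, e, voiced, and whisper
are hypothetical, and the vocal tract and radiation sections are
combined into one filter by multiplying out their polynomials.

```matlab
% Sketch: pitch pulses -> glottal excitation -> vowel (voiced and whispered).
fsd = 10000;  iperiod = 10;          % defaults: samples/s and period in msec
dur = 0.5;                           % ASSUMED duration in seconds
P = round(iperiod * fsd / 1000);     % pitch period in samples
N = round(dur * fsd);                % total number of samples

p = zeros(1, N);  p(1:P:N) = 1;      % pitch pulse sequence

% Rosenberg glottal pulse with 6% opening and 3% closing cycles
N1 = round(0.06 * P);  N2 = round(0.03 * P);
g  = [0.5*(1 - cos(pi*(0:N1)/N1)), cos(pi*(1:N2)/(2*N2))];
e  = conv(p, g);  e = e(1:N);        % voiced excitation

% ASSUMED illustrative formants (Hz); the program loads its own values.
F = [270 2290 3010 4500];  B = [60 90 150 200];
b = [1 -1];  a = 1;                  % start from the radiation filter 1 - z^-1
for k = 1:numel(F)
    r = exp(-2*pi*B(k)/fsd);         % pole radius as written in equation (2)
    c = 2*r*cos(2*pi*F(k)/fsd);
    b = b * (1 - c + r^2);           % gain of the k-th section
    a = conv(a, [1, -c, r^2]);       % cascade the denominators
end

voiced  = filter(b, a, e);               % voiced vowel
whisper = filter(b, a, randn(1, N));     % whispered vowel (Gaussian noise)

% To listen (scaled to avoid clipping):
% soundsc(p, fsd); soundsc(e, fsd); soundsc(voiced, fsd); soundsc(whisper, fsd);
```

Filtering with filter is equivalent to convolving the excitation with
the (infinitely long) impulse response of the cascade, so no explicit
truncation of the vowel impulse response is needed in this sketch.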
Composite Vocal Tract Model Scripted Run
A scripted run of the program composite_vocal_tract_GUI25.m is as
follows:
1. run the program composite_vocal_tract_GUI25.m from the
directory matlab_gui\composite_vocal_tract,
2. using the popupmenu button, select the vowel sound IY for
this exercise, where the choice of vowels includes the 10 vowel
sounds [IY, IH, EH, AE, AH, AA, AO, UH, UW, ER],
3. using the editable buttons, choose a period, iperiod, and a
sampling rate, fsd, for the exercise; the default values are 10 msec
for iperiod and 10000 samples per second for fsd,
4. using the popupmenu button, select the excitation for the
vowel sound, namely either Glottal Pulses (for voiced vowels) or
White Noise (for whispered vowels),
5. using the editable buttons, choose values for the glottal
pulse opening and closing cycles for voiced vowels, specified as
percentages of the pitch period; the defaults of 6% opening cycle
and 3% closing cycle represent very narrow glottal pulses, which
produce excellent quality vowel sounds and useful vowel
spectrograms,
6. using the editable button for spectrogram dynamic range,
dyn_range, choose a desired value; 60 dB is a fairly general range
for most vowel sounds,
7. hit the Plot Vowel Spectrogram button to plot the components
of the voiced vowel spectrum, namely the glottal pulse spectrum, the
vocal tract response, the radiation response, and the composite
spectrum for voiced vowel sounds; similarly, for whispered vowel
sounds, hitting this button plots the components of the whispered
vowel spectrum, namely the vocal tract response, the radiation
response, and the composite spectrum for whispered vowel sounds,
8. hit the Play Vowel Sound button to play out a fixed number of
samples of the impulse excitation, followed by the selected vowel
sound excited by the glottal pulse excitation (for voiced vowel
sounds), or the noise excitation and the whispered vowel (for
whispered vowel sounds); hitting the Play Vowel Sound button causes
a message box to pop up and specify the sequence of sounds to be
played either for voiced vowel sounds or for whispered speech,
9. experiment with different vowel sounds, pitch periods,
sampling rates, and excitations to see and hear the impact on the
resulting vowel sound,
10. hit the Close GUI button to terminate the run.
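To see how narrow the default glottal pulse in step 5 is, the sketch
below converts the percentages into sample counts and builds a
Rosenberg pulse in its common raised-cosine form. The names iperiod,
fsd, and the default values come from this exercise; open_pct,
close_pct, P, N1, N2, g, and GdB are hypothetical names, and the exact
pulse formula used inside the program is assumed rather than
confirmed here.

```matlab
% Sketch: Rosenberg glottal pulse from the default GUI settings.
fsd       = 10000;   % sampling rate in samples per second (default)
iperiod   = 10;      % pitch period in msec (default)
open_pct  = 6;       % opening (rise) cycle, percent of the pitch period
close_pct = 3;       % closing (decay) cycle, percent of the pitch period

P  = round(iperiod * fsd / 1000);   % pitch period in samples
N1 = round(open_pct  / 100 * P);    % opening duration in samples
N2 = round(close_pct / 100 * P);    % closing duration in samples

n1 = 0:N1;                          % opening phase: rises from 0 to 1
n2 = N1+1 : N1+N2;                  % closing phase: falls from 1 back to 0
g  = [0.5*(1 - cos(pi*n1/N1)), cos(pi*(n2 - N1)/(2*N2))];

% Log magnitude spectrum of the pulse, from 0 to fsd/2
nfft = 1024;
G    = fft(g, nfft);
GdB  = 20*log10(abs(G(1:nfft/2 + 1)) + eps);
f    = (0:nfft/2) * fsd / nfft;

subplot(2, 1, 1); stem(0:numel(g)-1, g); xlabel('Sample'); ylabel('g[n]');
subplot(2, 1, 2); plot(f, GdB); xlabel('Frequency (Hz)'); ylabel('dB');
```

With the defaults the pulse occupies only about ten samples of each
pitch period, so its spectrum rolls off slowly and leaves most of the
spectral shaping to the vocal tract resonances.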
An example of the graphical output obtained from this exercise
using the vowel sound /IY/ with an excitation pulse period of 10
msec and a sampling rate of fsd = 10000 samples per second is shown
in Figure 1 for Glottal Pulses excitation and in Figure 2 for White
Noise excitation. The upper graphics panel shows the vowel impulse
response and the second graphics panel shows the log magnitude
frequency response of each of the three components of the vowel
response (for voiced vowel sounds), or of each of the two components
of the vowel response (for whispered vowel sounds), along with the
composite response of the vocal tract. The third graphics panel
shows a wideband spectrogram of the vowel sound, and the fourth
graphics panel shows a narrowband spectrogram of the vowel sound.
Composite Vocal Tract Model Issues for Experimentation
1. run the scripted exercise above, and answer the following:
   - how long (in msec) is the impulse response of the synthetic
     vowel used for this exercise (IY in this case)?
   - what is the pitch frequency, in Hz, for this exercise?
   - how much improvement in sound quality is obtained by using the
     Rosenberg glottal pulse, the radiation response curve at the
     lips, and the vocal tract response, as obtained from a 3-formant
     representation of each of the 10 vowel sounds?
2. change the vowel sound to any other vowel in the list and
repeat the exercise with the new vowel sound
   - is there a uniformity of the vowel sound quality across the
     different vowel sounds?
   - do any of the vowels sound much better than the others?
   - do any of the vowels sound much worse than the others?
Figure 1: Voiced vowel sound impulse response (upper graphics
panel) and log magnitude frequency responses of vocal tract response
components (second graphics panel). Plots of wideband and
narrowband spectrograms, for the voiced vowel sound, are shown in
the third and fourth graphics panels.
3. change either the pitch pulse period or the sampling rate
   - what effect does changing the pitch period have on the resulting
     vowel quality?
   - what effect does changing the sampling rate have on the
     resulting vowel quality?
4. using the IY vowel sequence, answer the following
questions:
   - the radiation frequency response and the vocal tract frequency
     response for the IY vowel emphasize high frequencies. Why is the
     overall composite frequency response lower at high frequencies
     than at low frequencies?
   - what feature(s) of the composite frequency response account for
     the rapidly varying components of the impulse response?
   - note that there are two small bumps in the frequency response at
     very low frequencies. Which component causes each of these
     bumps?
5. using the IH vowel sequence, answer the following
questions:
   - compare the components of the vowels IH and IY. How have the
     formant frequencies changed from IY to IH?
   - why does the impulse response of IH have more pronounced rapidly
     varying components?
   - note that the frequency response of IH has two distinct peaks at
     low frequencies. The lowest frequency peak is called the voice
     bar for the way it would show up in a spectrogram. What part of
     the model causes the voice bar peak in the composite frequency
     response?
   - what requirement on the first formant frequency is necessary for
     there to be a clearly distinguished voice bar peak?
Figure 2: Whispered vowel sound impulse response (upper
graphics panel) and log magnitude frequency responses of vocal tract
response components (second graphics panel). Plots of wideband and
narrowband spectrograms, for the whispered vowel sound, are shown in
the third and fourth graphics panels.