INTRODUCTION
The objective of this thesis is to research and develop prosodic
features for discriminating proper name uses in an alerting context
(e.g., John, can I have that book?) from a referential context
(e.g., I saw John yesterday). Prosodic measurements based on pitch
and energy are analyzed to introduce new prosody-based features
into the Wake-Up-Word Speech Recognition system. In the process of
finding the prosodic features, an innovative data collection method
was designed and developed.
In a conventional automatic speech recognition system, users are
required to physically activate the recognition system by clicking
a button or by manually starting the application. The Wake-Up-Word
Speech Recognition system invented by Kpuska changes the way people
activate their systems by enabling users to do so with their voice
alone. The Wake-Up-Word Speech Recognition system will eventually
further improve the way people use speech recognition systems by
enabling speech-only interfaces.
In the Wake-Up-Word Speech Recognition system, a word or phrase
is used as a Wake-Up-Word (WUW), indicating to the system that the
user requires its attention (e.g., an alerting context). Any user
can activate the system by uttering the WUW (e.g., Operator), which
will enable the application to accept the following command (e.g.,
Next slide please). Since the same word may occur in a referential
context, where no attention from the system is needed, it is
important to discriminate accurately between the two. This use of
the same word is referred to as a non-Wake-Up-Word (nonWUW)
context. The following examples further demonstrate the use of the
word Operator in those two contexts:
Example sentence 1: Operator, please go to the next slide.
Example sentence 2: We are using the word operator as the WUW.
The cases depicted above indicate different user intentions. In
the first example, the word operator is used as a way to alert the
system and get its attention. In the second example, the same word,
operator, is used to refer to the word itself, hence the term
referential context. The current Wake-Up-Word Speech Recognition
system implements only the pre- and post-WUW silence as a prosodic
feature to differentiate the alerting and referential contexts. In
this thesis, pitch- and energy-based prosodic features are used.
The problem of general prosodic analysis is introduced in Section
1.1.
In Chapter 2, the use of pitch as a prosodic feature is
described. Pitch in general represents the intonation of speech,
and intonation is used to convey linguistic and paralinguistic
information (Lehiste, 1970). The definition and characteristics of
pitch are covered in Section 2.1. In Section 2.2, a pitch
estimation method named eSRFD (Enhanced Super Resolution
Fundamental Frequency Determinator) (Bagshaw, 1994) is introduced.
Finally, Section 2.3 presents the derivation of multiple
pitch-based features from pitch measurements to find the best
feature for discriminating the WUW used in an alerting context
from a referential one.
In Chapter 3, an additional prosodic feature based on energy is
described. The definition of prominence, an important prosodic
feature based on energy and pitch, and its characteristics are
covered in Section 3.1. In the following Section 3.2, the
computation of energy is presented. Finally, in Section 3.3, the
derivation of multiple energy features from the energy measurement
is presented and analyzed.
In Chapter 4, an innovative approach to speech data collection
is presented. After a number of prosodic analysis experiments
conducted using the WUWII Corpus (Tudor, 2007), validation of the
obtained results on a different data set was deemed necessary.
Since, to our knowledge, no specialized speech database is
available, Dr. Wallace's idea of collecting data from movies was
adopted. We designed a system which extracts speech from the audio
channel and, if necessary, video information from recorded media
(e.g., DVDs) of movies and/or TV series. This project is currently
under development by Dr. Kpuska's VoiceKey Group.
The problem definition and system introduction are explained
in Section 4.1, followed by the system design in Section 4.2.
1.1 Prosodic Analysis
The word prosody refers to the intonational and rhythmic aspects
of a language (Merriam-Webster Dictionary). Its etymology comes
from ancient Greek, where it referred to a song sung with
instrumental music. Later, the word was used for the science of
versification and the laws of meter, governing the modulation of
the human voice in reading poetry aloud. In modern phonetics the
word prosody most often refers to those properties of speech that
cannot be derived from the segmental sequence of phonemes
underlying human utterances (William J. Hardcastle, 1997).
Based on the phonological aspect, prosody may be classified
into structure, tune, and prominence.
1. Prosodic structure refers to the noticeable breaks or
disjunctures between words in sentences, which can also be
interpreted as the duration of the silence between words as a
person speaks. This factor has been considered in the current
Wake-Up-Word Speech Recognition system, where a minimal silence
period before and after the WUW must be present. The silence period
before the WUW is usually longer than the average silence period of
a nonWUW or other parts of the sentence.
2. Tune refers to the intonational melody of an utterance
(Jurafsky &amp; Martin), which can be quantified by the pitch
measurement, also known as the fundamental frequency of speech. The
details of pitch characteristics, the pitch estimation algorithm,
and the usage of pitch features are presented and explained in
Chapter 2.
3. Finally, prominence includes the measurement of stress and
accent in speech. Prominence is measured in our experiments using
the energy of the sound. The details of energy computation, feature
derivation based on energy, and experimental results are presented
in Chapter 3.
PITCH FEATURES
In this chapter, the intonational melody of an utterance,
computed using the pitch measurement, is described. The pitch
feature, also referred to as the fundamental frequency, and a
comparison of various pitch estimation algorithms are covered in
Section 2.1. Based on the results from multiple fundamental
frequency determination algorithms (FDAs), the eSRFD (Enhanced
Super Resolution Fundamental Frequency Determinator) is selected as
the algorithm of choice to perform the pitch estimation. The
details of the eSRFD algorithm are covered in Section 2.2. The
derivation of multiple pitch-based features and their performance
evaluations are covered in Section 2.3.
2.1 Pitch and Pitch Estimation Methods
Intonation is one of the prosodic features that may contain the
key information for discriminating the referential context from
the alerting context. The intonation of speech is strictly
interpreted as the ensemble of pitch variations in the course of an
utterance (Hart, 1975). Tonal languages such as Mandarin Chinese
have lexical forms that are distinguished by different levels or
patterns of pitch of a particular phoneme. In contrast, pitch in
intonation languages, such as English, the Germanic languages, the
Romance languages, and Japanese, is used syntactically. In
addition, the intonation patterns in intonation languages are
grouped over a number of words, called intonation groups.
Intonation groups of words are usually uttered in one single
breath. The pitch measurement in intonation languages reveals the
emotion of a person and the intention of his/her speech. For
example:
Can you pass me the phone?
The pattern of continuously rising pitch over the last three
words in the above sentence indicates a request.
In strict terms, pitch is defined as the fundamental frequency,
or fundamental repetition rate, of a sound. The typical pitch range
is 60-200 Hz for an adult male and 200-400 Hz for adult females and
children. Contraction of the vocal folds produces a relatively high
pitch and, vice versa, expanded vocal folds produce a lower pitch.
This explains why a person's pitch rises when he/she gets nervous
or surprised. The reason a male usually has a lower pitch than
females and children can also be explained by the fact that males
usually have longer and larger vocal folds.
After years of development, pitch estimation methods can be
classified into the following three categories:
1. Frequency-domain methods, such as CFD (cepstrum-based F0
determinator) and HPS (harmonic product spectrum), use a
frequency-domain representation of the speech signal to find the
fundamental frequency.
2. Time-domain methods, such as FBFT (feature-based F0 tracker)
(Phillips, 1985), which uses perceptually motivated features, and
PP (parallel processing method), produce fundamental frequency
estimates by analyzing the waveform in the time domain.
3. Cross-correlation methods, such as IFTA (integrated F0
tracking algorithm) and SRFD (super resolution F0 determinator),
use a waveform similarity metric based on a normalized
cross-correlation coefficient.
The eSRFD (Enhanced Super Resolution Fundamental Frequency
Determinator) method (Bagshaw, 1994) was chosen to extract the
pitch measurement for the Wake-Up-Word because of its high overall
accuracy. According to Bagshaw's experiments, the eSRFD algorithm
achieves a combined voiced and unvoiced error rate below 17% and
low-gross fundamental frequency error rates of 2.1% and 4.2% for
male and female speech, respectively. Figure 2.1 and Figure 2.2
below show the error rate comparison charts between eSRFD and other
FDAs for male and female voices, respectively.
Figure 2.1 FDA Evaluation Chart: Male Speech. Reproduced from
(Bagshaw, 1994)
In Figure 2.1 and Figure 2.2, the purple bars indicate the
low-gross F0 error, which refers to the halving error, where the
pitch has been estimated wrongly at a value about half of the
actual pitch. The green bars represent the high-gross F0 error,
which refers to the doubling error, where the pitch has been
estimated wrongly at a value about twice the actual pitch. The
voiced error, represented by red bars, refers to unvoiced frames
misidentified as voiced ones by the FDA. Finally, the unvoiced
error, represented by blue bars, means that voiced data has been
misidentified as unvoiced.
Figure 2.2 FDA Evaluation Chart: Female Speech. Reproduced from
(Bagshaw, 1994)
Figure 2.1 and Figure 2.2 refer to the male and female
fundamental frequency evaluation charts. They show that the eSRFD
algorithm achieves the lowest overall error rate. This result was
confirmed in the more recent study of (Veprek &amp; Scordilis,
2002). Consequently, eSRFD has been chosen as the FDA in our
project.
2.2 eSRFD Frequency Determinator Algorithm
The eSRFD is an enhanced version of SRFD (Medan, 1991). The
program flow chart of the eSRFD FDA is illustrated in Figure
2.3.
The theory behind the SRFD algorithm is to use a normalized
cross-correlation coefficient to quantify the degree of similarity
between two adjacent, non-overlapping sections of speech. In eSRFD,
a frame is divided into three consecutive sections instead of two
as in the original SRFD algorithm.
At the beginning, the sampled waveform is passed through a
low-pass filter to remove signal noise. The sampled utterance is
then divided into non-overlapping frames of 6.5 ms length
(tinterval = 6.5 ms). Each frame contains a set of samples, SN,
which is divided into three consecutive segments, each containing
an equal number of samples, n, where n varies with the candidate
fundamental period. The segmentation is defined by Equation 2-1
below and further described in Figure 2.4 below.
Figure 2.3 eSRFD flow chart
Equation 2-1
Figure 2.4 Analysis segments of the eSRFD FDA
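The framing and three-section layout described above can be sketched as follows; this is a minimal illustration, in which the frame reference index t, the sampling rate, and the handling of frames near the signal boundaries are assumptions rather than details taken from Bagshaw's implementation.

```python
import numpy as np

FRAME_INTERVAL_S = 0.0065  # 6.5 ms non-overlapping frames, as in the text

def frame_starts(num_samples, fs):
    """Start indices of the 6.5 ms non-overlapping analysis frames."""
    step = int(round(FRAME_INTERVAL_S * fs))
    return list(range(0, num_samples - step + 1, step))

def segments(signal, t, n):
    """Three consecutive sections of n samples around reference index t.

    The layout follows the description of eSRFD: x precedes the frame
    reference, y and z follow it.  Returns None for any section that
    falls outside the signal (such frames are labeled unvoiced).
    """
    x = signal[t - n:t] if t - n >= 0 else None
    y = signal[t:t + n] if t + n <= len(signal) else None
    z = signal[t + n:t + 2 * n] if t + 2 * n <= len(signal) else None
    return x, y, z
```

The segment length n is not fixed: it is swept over the candidate fundamental periods, so the three sections are re-cut for every candidate.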
In eSRFD, each frame is processed by a silence detector, which
labels the frame as unvoiced if the sum of the absolute values of
xmin, xmax, ymin, ymax, zmin, and zmax is smaller than a preset
value (e.g., a 50 dB signal-to-noise level); conversely, the frame
is considered voiced if that sum is larger than the preset value.
No fundamental frequency search is performed if the frame is marked
as unvoiced. In cases where at least one of the segments xn, yn, or
zn is not defined, which usually happens at the beginning and the
end of the speech file, the frames are labeled as unvoiced and no
FDA is applied to them.
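The silence test just described can be sketched as below; the threshold value is an assumption (the text quotes only an example preset level), and undefined boundary segments are represented here as None.

```python
import numpy as np

def is_unvoiced(x, y, z, threshold):
    """Silence detector described above: the frame is unvoiced when any
    of the three sections is undefined, or when the summed absolute
    peak values of x, y and z fall below the preset level."""
    if x is None or y is None or z is None:
        return True  # boundary frames: no FDA is applied
    peak_sum = sum(abs(s.min()) + abs(s.max()) for s in (x, y, z))
    return peak_sum < threshold
```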
If the frame is not labeled as silence, then candidate values
for the fundamental period are searched over values of n within the
range Nmin to Nmax using the normalized cross-correlation
coefficient Px,y(n), as described by Equation 2-2.
Equation 2-2
In Equation 2-2, the decimation factor L is used to lower the
computational load of the algorithm. Smaller L values allow higher
resolution but also increase the computational load of the FDA;
larger L values produce faster computation with a lower-resolution
search. L is set to 1 here, since the purpose of this research is
to find as accurate a relationship as possible between pitch
measurements in WUW words; computational speed is considered
secondary and thus is not taken into account. However, the variable
L will be reconsidered when this algorithm is integrated into the
WUW Speech Recognition System.
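The candidate search can be sketched with the standard form of a normalized cross-correlation under decimation by L; this is an illustration of the general technique, not a reproduction of Equation 2-2, and the peak-picking step is simplified here to plain thresholding, which is an assumption.

```python
import numpy as np

def norm_xcorr(a, b, L=1):
    """Normalized cross-correlation between two equal-length sections,
    decimated by factor L (L = 1 keeps full resolution, as in the text)."""
    a, b = np.asarray(a, float)[::L], np.asarray(b, float)[::L]
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def period_candidates(signal, t, n_min, n_max, threshold, L=1):
    """Candidate fundamental periods n in [n_min, n_max] whose
    coefficient P_x,y(n) exceeds the (adaptive) threshold T_srfd."""
    out = []
    for n in range(n_min, n_max + 1):
        if t - n < 0 or t + n > len(signal):
            continue  # section undefined at the signal boundary
        p = norm_xcorr(signal[t - n:t], signal[t:t + n], L)
        if p > threshold:
            out.append((n, p))
    return out
```

On a perfectly periodic signal the true period and its multiples all correlate strongly, which is exactly why the later scoring and 0.77-biased selection stages are needed.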
Figure 2.5 Analysis segments for Px,y(n) in the eSRFD
The candidate values of the fundamental period of a frame are
found by locating peaks in the normalized cross-correlation
Px,y(n). If this value exceeds a specified threshold, Tsrfd, the
frame is further considered a voiced candidate. This threshold is
adaptive and depends on the voicing classification of the previous
frame and three preset parameters. The definition of Tsrfd is given
in Equation 2-3. If the previous frame is unvoiced or silent, Tsrfd
is equal to 0.88. If the previous frame is voiced, Tsrfd is equal
to the larger of 0.75 and 0.85 times the Px,y value of the previous
frame. The threshold is adjusted because the present frame is more
likely to be voiced if the previous frame is voiced as well.
Tsrfd = 0.88, if the previous frame is unvoiced or silent
Tsrfd = max(0.75, 0.85 · Px,y of the previous frame), if the
previous frame is voiced
Equation 2-3
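The adaptive threshold can be sketched directly from the values quoted above (0.88, 0.75, and 0.85 are the three preset parameters named in the text):

```python
def adaptive_threshold(prev_voiced, prev_pxy=None,
                       t_unvoiced=0.88, floor=0.75, decay=0.85):
    """T_srfd as described above: 0.88 after an unvoiced/silent frame,
    otherwise the larger of 0.75 and 0.85 times the previous frame's
    P_x,y value."""
    if not prev_voiced:
        return t_unvoiced
    return max(floor, decay * prev_pxy)
```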
If no candidates for the fundamental period are found in the
frame, the frame is reclassified as unvoiced and no further
processing is applied to it. Otherwise, the frame is classified as
voiced and the optimal candidate is found as described next.
After computing the first normalized cross-correlation
coefficient, Px,y, the second normalized cross-correlation
coefficient, Py,z, is calculated for the voiced frame. Py,z is
described by Equation 2-4 below.
Equation 2-4
After the second normalized cross-correlation, a score is given
to each candidate. If a candidate pitch value of a frame has both
Px,y and Py,z larger than Tsrfd, a score of 2 is given to the
candidate. If only Px,y is above Tsrfd, a score of 1 is assigned. A
higher score indicates a higher probability that the candidate
represents the fundamental period of the frame. After the candidate
scores are assigned, if there are one or more candidates with a
score of 2, all candidates with a score of 1 in that frame are
removed from the candidate list. If there is exactly one candidate
with a score of 2, that candidate is taken as the best estimate of
the fundamental period of the particular frame. If there are
multiple candidates with a score of 1 but none with a score of 2,
an optimal fundamental period is sought from the remaining
candidates.
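The scoring and pruning rules can be sketched as follows; representing the Py,z values as a dictionary keyed by candidate period is an illustrative choice, not the thesis implementation, and the input candidates are assumed to have already passed the Px,y threshold test.

```python
def score_candidates(cands, pyz, t_srfd):
    """Score each period candidate as described above.

    `cands` is a list of (n, p_xy) pairs that already satisfy
    P_x,y(n) > T_srfd; `pyz` maps n -> P_y,z(n).  Score 2: both
    coefficients exceed T_srfd; score 1: only P_x,y does.  If any
    candidate scores 2, the score-1 candidates are discarded.
    """
    scored = [(n, 2 if pyz.get(n, 0.0) > t_srfd else 1) for n, _ in cands]
    if any(s == 2 for _, s in scored):
        scored = [(n, s) for n, s in scored if s == 2]
    return sorted(scored)  # ascending fundamental period
```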
In the case of multiple candidates with a score of 1 and none
with a score of 2, the candidates are sorted in ascending order of
fundamental period. The last candidate on the list, which has the
largest fundamental period, represents a fundamental period of nM,
while the mth candidate represents a fundamental period of nm.
Figure 2.6 Analysis segments for q(nm) in the eSRFD
Then the third normalized cross-correlation coefficient, q(nm),
between two sections of length nM spaced nm apart, is calculated
for each candidate. Equation 2-5 describes the normalized
cross-correlation coefficient q(nm) used in this case.
Equation 2-5
After the third normalized cross-correlation coefficient is
generated, the q(nm) of the first candidate on the list is assumed
to be the optimal value. If a subsequent q(nm), multiplied by 0.77,
is larger than the current optimal value, the candidate for which
that q(nm) was computed becomes the new optimum. The same rule is
applied through the list of candidates, yielding the optimal
candidate value.
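The 0.77-biased selection among the remaining score-1 candidates can be sketched as below; `q` maps each candidate period to its q(nm) value and is assumed precomputed via Equation 2-5.

```python
def pick_optimal(periods, q, bias=0.77):
    """Choose among score-1 candidates as described above: candidates
    are visited in ascending period order, and a later candidate
    replaces the current optimum only if bias * q(n_m) exceeds it,
    which favours shorter (higher-frequency) candidates."""
    periods = sorted(periods)
    best_n, best_q = periods[0], q[periods[0]]
    for n in periods[1:]:
        if bias * q[n] > best_q:
            best_n, best_q = n, q[n]
    return best_n
```

The bias factor makes a longer-period candidate win only when its correlation is markedly stronger, which counteracts the tendency of period multiples to correlate almost as well as the true period.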
In the case where only one candidate has a score of 1 and no
candidate has a score of 2, the probability that the candidate is
the true fundamental period of the frame is low. In such a case, if
both the previous and the subsequent frames are silent, the current
frame is an isolated frame and is reclassified as silent. If either
the previous or the next frame is voiced, we assume the candidate
of the current frame is optimal and it defines the fundamental
period of the current frame.
The above algorithm has a high probability of misidentifying
voiced frames as unvoiced or silent frames. In order to counteract
this imbalance, a bias is applied when all three of the conditions
below are satisfied:
The two previous frames were voiced frames.
The fundamental period of the previous frame is not temporarily
on hold.
The fundamental frequency of the previous frame is less than 7/4
times the fundamental frequency of its next voiced frame and
greater than 5/8 of the next frame.
After the fundamental frequency is obtained, the pitch contour
is passed through a median filter in order to further minimize the
occurrence of doubling or halving errors.
The median filter has a default length of 7, but the size is
decreased to 5 or 3 when there are fewer than 7 consecutive voiced
frames. Figure 2.7 below shows an example of doubling points being
corrected by the median filter. In Figure 2.7, the top row shows
the pitch measurement generated by the eSRFD FDA and the bottom row
shows the measurement corrected by the median filter. As can be
seen from the figure, the two points marked as doubling errors were
fixed by the median filter.
Figure 2.7 Median filter example
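The shrinking median filter can be sketched as below; the handling of the ends of a voiced run with a shrunken centred window is an assumption, since the text does not specify the edge behaviour.

```python
import numpy as np

def median_smooth(pitch_run, max_width=7):
    """Median-filter one run of consecutive voiced pitch values.

    The window width is 7 by default, reduced to 5 or 3 when the run
    is shorter, as described above; isolated doubling/halving points
    are replaced by the local median.
    """
    width = max_width
    while width > len(pitch_run) and width > 3:
        width -= 2  # shrink 7 -> 5 -> 3 for short voiced runs
    half = width // 2
    out = []
    for i in range(len(pitch_run)):
        lo, hi = max(0, i - half), min(len(pitch_run), i + half + 1)
        out.append(float(np.median(pitch_run[lo:hi])))
    return out
```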
We applied the above pitch estimation method to the WUWII
(Wake-Up-Word II) corpus, which contains approximately 3410
utterances, every one of which contains at least one WUW. Figure
2.8 displays a sample utterance containing the following
sentence:
Hi. You know, I have this cool wildfire service and, you know,
I'm gonna try to invoke it right now. Wildfire
Figure 2.8 Example, WUWII00073_009.ulaw
In Figure 2.8, the first row shows the waveform of the speech,
the second row shows the pitch estimate from the eSRFD FDA, the
third shows the pitch estimate after the median filter, and the
last row shows the spectrogram of the speech. The WUW of this
sentence is Wildfire, which is the section delineated between the
two red lines.
2.3 Pitch Features
The pattern of the fundamental frequency contour of an utterance
waveform represents the intonation of the speech. Since, to the
best of our knowledge, the problem of discriminating the use of
words in an alerting context from a referential context has never
been addressed before, a specialized corpus containing WUWs is
necessary. In this project, the corpus named WUWII was chosen. The
WUWII corpus contains 3410 sample utterances, and each utterance
contains at least one of five different WUWs: Wildfire, Operator,
ThinkEngine, Onword, and Voyager.
Our hypothesis is that the intonation rises on the WUW; thus
there should be an increase in the average pitch and the maximum
pitch of the WUW sections compared to the nonWUW sections.
Based on this hypothesis, the average pitch and maximum pitch of
the WUW are considered and the following twelve features are
derived.
1. APW_AP1SBW: The relative change of the average pitch of WUW
to the average pitch of the previous section just before WUW.
2. AP1sSW_AP1SBW: The relative change of the average pitch of
the first section of WUW to the average pitch of previous section
just before WUW.
3. APW_APALL: The relative change of the average pitch of WUW to
the average pitch of the entire speech sample excluding the WUW
sections.
4. AP1sSW_APALL: The relative change of the average pitch of the
first section of the WUW to the average pitch of the entire speech
sample excluding the WUW sections.
5. APW_APALLBW: The relative change of the average pitch of the
WUW to the average pitch of entire speech sample before the
WUW.
6. AP1sSW_APALLBW: The relative change of the average pitch of
the first section of the WUW to the average pitch of the entire
speech sample before the WUW.
7. MaxP_MaxP1SBW: The relative change of the maximum pitch in
the WUW sections to the maximum pitch in the previous section just
before the WUW.
8. MaxP1sSW_MaxP1SBW: The relative change of the maximum pitch
in the first section of the WUW to the maximum pitch of the
previous section just before the WUW.
9. MaxPW_MaxPAll: The relative change of the maximum pitch of
the WUW to the maximum pitch of the entire speech sample excluding
the WUW sections.
10. MaxP1sSW_MaxPAll: The relative change of the maximum pitch
of the first section of the WUW to the maximum pitch of the entire
speech sample excluding the WUW sections.
11. MaxP1sSW_MaxPAllBW: The relative change of the maximum
pitch in the first section of the WUW to the maximum pitch of the
entire speech before the WUW.
12. MaxPW_MaxPAllBW: The relative change of the maximum pitch
in the WUW sections to the maximum pitch of the entire speech
sample before the WUW.
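As an illustration, the first feature (APW_AP1SBW) might be computed as below; the (A - B)/B form of the relative change, the per-frame representation of the pitch track (0 for unvoiced frames), and the (start, end) frame ranges are assumptions made for this sketch.

```python
import numpy as np

def apw_ap1sbw(pitch, wuw, before):
    """Sketch of feature 1 (APW_AP1SBW): relative change of the average
    pitch inside the WUW region with respect to the average pitch of
    the section just before it.

    `pitch` is the per-frame pitch track with 0 for unvoiced frames;
    `wuw` and `before` are (start, end) frame ranges assumed given by
    the WUW segmentation.  Only voiced frames enter the averages.
    """
    def voiced_avg(rng):
        seg = np.asarray(pitch[rng[0]:rng[1]], float)
        voiced = seg[seg > 0]
        return voiced.mean() if voiced.size else float("nan")
    a, b = voiced_avg(wuw), voiced_avg(before)
    return (a - b) / b  # relative change, assumed (A - B) / B
```

The remaining eleven features differ only in which region pair is compared and in whether the average or the maximum of the voiced pitch values is taken.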
In the presented experiment, no significantly discriminating
pattern was found in the results. The results of the WUW
experiments using the pitch features defined above are shown in
Table 2-1. The best feature across all WUWs is the relative change
of the maximum pitch of the WUW to the maximum pitch of the section
just before the WUW. The results could be improved if clear
syllabic boundaries were defined; however, syllable boundaries in
English are not clearly defined. The details of the results are
shown in Appendix A.
Besides the above features, other approaches, such as pitch
measurement patterns, can also be used to discriminate WUWs from
nonWUWs. This is one of the current research topics of Raymond
Sastraputera, a graduate student working with Dr. Kepuska. The
potential approaches to pitch-based features are covered in
Chapter 5.
WUW: All

Feature              Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
APW_AP1SBW                 1415      726      51        0       0      689      49
AP1sSW_AP1SBW              1415      735      52        0       0      680      48
APW_APALL                  2282      947      41        0       0     1335      59
AP1sSW_APALL               2282      996      44        2       0     1284      56
APW_APALLBW                2188      962      44        0       0     1226      56
AP1sSW_APALLBW             2188     1003      46        2       0     1183      54
MaxP_MaxP1SBW              1415      948      67       53       4      414      29
MaxP1sSW_MaxP1SBW          1415      719      51       54       4      642      45
MaxPW_MaxPAll              2282     1020      45      109       5     1153      51
MaxP1sSW_MaxPAll           2282      716      31      213       9     1353      59
MaxP1sSW_MaxPAllBW         2188     1069      49      111       5     1008      46
MaxPW_MaxPAllBW            2188     1003      35        2      10     1183      55

Table 2-1 Pitch Features Result, All WUWs
ENERGY FEATURES
As mentioned in Section 1.1, prominence can be measured using
the energy of the utterance. If pitch represents the intonation of
speech, then energy represents its stress. In this chapter, the
same concept that was applied to pitch in Chapter 2 is used with
energy to generate a similar feature set.
3.1 Energy Characteristic
In an English sentence, certain syllables are more prominent
than others; these are called accented syllables. Accented
syllables are usually either louder or longer than the other
syllables in the same word. In English, a different position of the
accented syllable in the same word is used to differentiate the
meaning of the word. For example, the word object used as a noun,
with the accent on the first syllable, compared to the same word
object used as a verb, with the accent on the second syllable
(Cutler, 1986), has a different placement of the accented syllable,
indicated by the stress mark in the phonetic transcription. If this
idea of accented speech is applied to the entire sentence instead
of a single word, it may provide additional clues about the use of
a word of interest and its meaning within the sentence.
Classifying the factors that model a speaker's speech and how a
speaker chooses to accentuate a particular syllable within the
whole sentence is a very complex problem. However, the measurement
of accented syllables can be done simply by using the energy of the
speech signal and its pitch change.
3.2 Energy Extraction
The energy of a speech signal can be expressed through
Parseval's theorem, as in Equation 3-1 below.
E = Σn |x[n]|² = (1/2π) ∫ |X(ω)|² dω
Equation 3-1
In Equation 3-1, the energy of a signal is defined in both the
time and frequency domains. Both |x[n]|² and |X(ω)|² represent an
energy density, which can be thought of as energy per unit of time
and energy per unit of frequency, respectively.
A fixed frame size (6.5 ms), the same as in the pitch
computation, is used here as well. After the energy is calculated
for all samples of each utterance in the WUWII corpus, the energy
features are computed in a similar fashion as the pitch features of
Section 2.3, as described in the next section.
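A small sketch of the per-frame energy computation follows; the frequency-domain form is included to illustrate Parseval's relation, which for the DFT matches the time-domain sum up to the 1/N normalization of the transform.

```python
import numpy as np

def frame_energy(frame):
    """Time-domain energy of one 6.5 ms analysis frame: sum of |x[n]|^2."""
    frame = np.asarray(frame, float)
    return float(np.sum(frame * frame))

def spectral_energy(frame):
    """The same energy computed in the frequency domain; by Parseval's
    theorem for the DFT, sum |X[k]|^2 / N equals sum |x[n]|^2."""
    X = np.fft.fft(np.asarray(frame, float))
    return float(np.sum(np.abs(X) ** 2) / len(frame))
```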
3.3 Energy Features
As in the previous experiments with pitch features, 12
energy-based features were computed and tested. The features are
represented as relative changes, as defined in Equation 3-2.
Relative change between A and B = (A − B) / B
Equation 3-2
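The relative change and the tallies reported in the results tables (Pt > 0, Pt = 0, Pt < 0) can be sketched as follows; the (A - B)/B reading of the relative change is an assumption of this sketch.

```python
def relative_change(a, b):
    """Relative change of A with respect to B, read here as (A - B) / B,
    so a positive value means A exceeds the reference B."""
    return (a - b) / b

def tally_signs(values):
    """Counts reported in the results tables: how many feature values
    are positive (Pt > 0), zero (Pt = 0) and negative (Pt < 0)."""
    pos = sum(1 for v in values if v > 0)
    zero = sum(1 for v in values if v == 0)
    neg = sum(1 for v in values if v < 0)
    return pos, zero, neg
```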
The features are listed below:
1. AEW_AE1SBW: The relative change of the average energy of the
WUW to the average energy of previous section just before the
WUW.
2. AE1sSW_AE1SBW: The relative change of the average energy of
the first section of the WUW to the average energy of previous
section just before the WUW.
3. AEW_AEAll: The relative change of the average energy of the
WUW to the average energy of the entire sample speech excluding the
WUW sections.
4. AE1sSW_AEAll: The relative change of the average energy of
the first section in the WUW to the average energy of the entire
utterance excluding the WUW sections.
5. AEW_AEAllBW: The relative change of the average energy of the
WUW to the average energy of all speech before the WUW.
6. AE1sSW_AEAllBW: The relative change of the average energy of
the first section in the WUW to the average energy of the entire
sample speech before the WUW.
7. MaxEW_MaxE1SBW: The relative change of the maximum energy in
the WUW sections to the maximum energy in the previous section just
before the WUW.
8. MaxE1sSW_MaxE1SBW: The relative change of the maximum energy
in the first section of the WUW to the maximum energy in the
previous section just before the WUW.
9. MaxEW_MaxEAll: The relative change of the maximum energy in
the WUW to the maximum energy of the entire speech sample excluding
the WUW section.
10. MaxE1sSW_MaxEAll: The relative change of the maximum energy
in the first section of the WUW to the maximum energy of the entire
speech sample excluding the WUW section.
11. MaxE1sSW_MaxEAllBW: The relative change of the maximum
energy in the first section of the WUW to the maximum energy of the
entire speech before the WUW.
12. MaxEW_MaxEAllBW: The relative change of the maximum energy
in the WUW sections to the maximum energy of the entire speech
sample before the WUW.
In this experiment, a few of the features may not be
implementable in a real-time application, since they rely on
measurements taken after the WUW of interest; however, they may
still lead to interesting conclusions. For real-time speech
recognition systems, the features that do not rely on measurements
past the WUW of interest are the most useful. Table 3-1 below shows
the results of the energy feature measurements based on all WUWs of
the WUWII corpus, namely the words Operator, ThinkEngine, Onword,
Wildfire, and Voyager. The details broken down for each word are
included in Appendix B.
WUW: All WUWs

Feature              Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                 1479     1164      79        0       0      315      21
AE1sSW_AE1SBW              1479     1283      84        1       0      240      16
AEW_AEAll                  2175     1059      49        9       9     1116      51
AE1sSW_AEAll               2175     1155      53        2       0     1018      47
AEW_AEAllBW                1969     1427      72        0       0      542      28
AE1sSW_AEAllBW             1969     1562      79        3       0      404      21
MaxEW_MaxE1SBW             1479     1244      84       20       1      215      15
MaxE1sSW_MaxE1SBW          1479     1221      83       13       1      245      17
MaxEW_MaxEAll              2175     1373      63       13       1      245      17
MaxE1sSW_MaxEAll           2175     1336      61       25       1      814      37
MaxE1sSW_MaxEAllBW         1969     1209      61       16       1      744      38
MaxEW_MaxEAllBW            1969     1562      60        3       1      404      39

Table 3-1 Energy Feature Result of All WUWs
Based on the results shown in Table 3-1 above, the following
three features performed best in discriminating the WUW from other
word tokens:
AE1sSW_AE1SBW: The relative change of the average energy of the
first section of the WUW compared to the average energy of the last
section before the WUW. Using this feature, 84% of the data shows
that the average energy of the first section of the WUW is higher
than the average energy of the previous section. The result is
illustrated in Figure 3.1 below, depicting the distribution of the
feature values as well as the cumulative distribution.
Figure 3.1 Distribution and cumulative plots of the energy
feature AE1sSW_AE1SBW
MaxEW_MaxE1SBW: The relative change of the maximum energy in the
WUW sections compared to the maximum energy in the last section
before the WUW. Using this feature, 84% of the samples show that
the maximum energy in the WUW sections is higher than the maximum
energy of the previous section. The distribution of the feature
values as well as the cumulative distribution are shown in Figure
3.2 below.
Figure 3.2 Distribution and cumulative plot of the energy
feature, the maximum energy of the WUW
MaxE1sSW_MaxE1SBW: The relative change of the maximum energy of
the first section of the WUW compared to the maximum energy in the
last section before the WUW. This feature correctly discriminated
83% of cases, which exhibited a higher maximum energy in the first
section of the WUW than in the previous section. The distribution
and cumulative plots of this feature are shown in Figure 3.3.
Figure 3.3 Distribution and cumulative plot of the energy
feature, the maximum energy of the first section of the WUW
The above results are based on all the data, including all five
different WUWs. Thus, investigating each word independently may be
more appropriate. The detailed performance results for each
individual WUW are covered in Appendix B.
Linguistically, one of the more appropriate WUWs is the word
Operator. This word is also used in the current Wake-Up-Word
Speech Recognition System. Based on the results in Table 3-2, two
features show that in over 90% of the WUW cases the average or
maximum energy is higher than in the other regions of the speech.
These two features are:
AE1sSW_AE1SBW: The relative change of the average energy of the
first section of the WUW compared to the average energy of the last
section before the WUW. Using this feature, 94% of samples show the
first section of the WUW with a higher average energy than the
previous section.
AE1sSW_AEAllBW: The relative change of the average energy of the
first section of the WUW compared to the average energy of the
entire speech before the WUW. Using this feature, 91% of samples
show that the first section of the WUW has a higher average
energy.
WUW: Operator

Feature              Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                  275      228      83        0       0       47      17
AE1sSW_AE1SBW               275      258      94        0       0       17       6
AEW_AEAll                   418      248      59        0       0      170      41
AE1sSW_AEAll                418      290      69        1       0      127      30
AEW_AEAllBW                 394      303      77        0       0       91      23
AE1sSW_AEAllBW              394      359      91        1       0       34       9
MaxEW_MaxE1SBW              275      240      87        1       0       34      12
MaxE1sSW_MaxE1SBW           275      243      88        0       0       32      12
MaxEW_MaxEAll               418      290      69        4       1      124      30
MaxE1sSW_MaxEAll            418      285      68        6       1      127      30
MaxE1sSW_MaxEAllBW          394      272      69        4       1      118      30
MaxEW_MaxEAllBW             394      359      68        1       1       34      30

Table 3-2 Energy Feature Result of WUW Operator
Based on the performed experiments, the WUW Wildfire achieved the best overall result. For this word, four features scored higher than 90%. The results are shown in the table below. The four best features are:
AEW_AE1SBW: the relative change of the average energy of the entire WUW compared to the average energy of the last section just before the WUW. For 90% of the samples, the average energy of the WUW is higher than that of the preceding section.
AE1sSW_AE1SBW: the relative change of the average energy of the first section of the WUW compared to the average energy of the last section before the WUW. With this feature, 93% of the samples show that the first section of the WUW has higher average energy.
MaxEW_MaxE1SBW: the relative change of the maximum energy of the WUW sections compared to the maximum energy of the last section before the WUW. With this feature, 91% of the samples show that the WUW has higher maximum energy.
MaxE1sSW_MaxEAllBW: the relative change of the maximum energy of the first section of the WUW compared to the maximum energy of all sections before the WUW. With this feature, 90% of the samples show that the first section of the WUW has higher maximum energy.
WUW: Wildfire

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                  282     253     90       0      0      29     10
AE1sSW_AE1SBW               282     261     93       0      0      21      7
AEW_AEAll                   340     173     51       0      0     167     49
AE1sSW_AEAll                340     185     54       0      0     155     46
AEW_AEAllBW                 298     252     85       0      0      46     15
AE1sSW_AEAllBW              298     265     89       0      0      33     11
MaxEW_MaxE1SBW              282     258     91       8      3      16      6
MaxE1sSW_MaxEAllBW          282     253     90       2      1      27     10
MaxEW_MaxEAll               340     230     68       4      1     106     31
MaxE1sSW_MaxEAll            340     219     64       4      1     117     34
MaxE1sSW_MaxEAllBW          298     195     65       4      1      99     33
MaxEW_MaxEAllBW             298     265     62       0      1      33     36

Table A3 Energy Feature Result of WUW Wildfire
The complete results are shown in Appendix B.
From the obtained results, it can be concluded that the WUW is frequently accentuated compared to the rest of the words in the utterance.
DATA COLLECTION
In this chapter, we introduce a novel way to collect speech samples, together with the preliminary design of the data collection system.
4.1 Introduction to the data collection
After developing WUW discriminant features based on the two prosodic measurements of pitch and energy, described in Chapters 2 and 3, we realized that the data used to generate those features may not be the most suitable. The corpus used in the project was the WUWII corpus. It only provides data on the WUW in the alerting situation and does not contain data for the same word used in the referential situation. As a result, we could only analyze the changes between the alerting type of WUW and the overall sentence, without information on the same word in the referential situation. Another drawback of the current WUWII corpus is that its speech is not spontaneous. The testers were told to use the WUW to make up a sentence; under such circumstances, a tester may change the way he or she normally speaks.
In order to perform a more complete analysis, we need a corpus that includes both alerting and referential WUW contexts with naturally spoken utterances. Dr. Wallace came up with the idea of extracting audio samples from movies and TV series.
Extracting speech samples from movies and TV series has the following advantages compared to the previous data collection method:
1. The speech examples are more natural. The speech from professional actors is more natural since they tend to think and speak like a particular character and to act out the situation of the character they are depicting.
2. The data collection process costs much less, since we are not compensating individuals to record their voices. We are not currently considering copyright issues, since we use the data for scientific research purposes only.
3. A large amount of data can be collected in a short period of time once the process is fully automated.
4. The voice channel data is of CD quality. In this project, we extract speech data from recorded videos, as opposed to the conventional phone-line or cell-phone recordings contained in the WUWII corpus.
5. No manual labeling is required. We plan to use the transcripts obtained from the video channel (see System Design). The transcripts provide time stamps for all spoken sentences; thus, manual labeling is not needed.
Given these advantages, we plan to design an automatic data collection system to collect speech data suitable for prosodic analysis of proper name use in the referential context vs. the alerting (or WUW) context.
4.2 System Design
The data collection project is part of the prosodic feature analysis project, which is illustrated by the program flow chart in Figure 4.1. The prosodic feature analysis project can be divided into three sub-projects.
Figure A1 Program Flow Chart
The green boxes in Figure 4.1 represent the functions of the prosodic feature extraction and analysis project, described in Chapters 2 and 3 of this thesis. The blue boxes depict the WUW data collection project. Finally, the purple boxes represent the future project on video analysis.
In the prosodic feature analysis project, we use prosodic features generated from acoustic measurements to differentiate the context of the words. In part of the WUW data collection project, language analysis tools will be used to automatically classify the words of interest, in this case as referential or alerting. At the moment, the capabilities of this tool, RelEx, must be augmented in order to achieve this goal. The outcome of the WUW Speech Data Collection project will not only be a specialized corpus for the prosodic analysis project, but also a confirmation of the results from the prosodic analysis. The detailed program flow chart of the WUW Speech Data Collection System is shown in Figure 4.2 below.
Figure A2 WUW Audio Data Collection System Program Flow
Diagram
The inputs to the system are (1) the video file of the movie or TV series, (2) the video transcription file, which will be used if provided and otherwise extracted from the video stream, and (3) an English first names dictionary. In the case when there is no video transcription file and the subtitles are encoded into the video stream, the subtitle extractor SubRip will extract the subtitles and the time stamps of the sentences from the video stream. An example of a transcription file is provided in Figure 4.3 below.
Figure A3 Example of Video Transcription File
The transcription files provide the following information: the date and time when the file was created, the subtitle index number, the start and end time of each subtitle, and the subtitle transcription.
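A transcription file in this SubRip (.srt) layout, consisting of an index, a time span of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and the subtitle text, can be parsed with a short routine. The following is a sketch, not the actual parser used in the project:

```python
import re

# One .srt entry: index, time span line, then the text block.
ENTRY = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\n|\Z)",
    re.S,
)

def to_seconds(ts):
    # "00:01:02,500" -> 62.5
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(text):
    """Yield (index, start_sec, end_sec, subtitle_text) for each entry."""
    for m in ENTRY.finditer(text):
        yield (int(m.group(1)), to_seconds(m.group(2)),
               to_seconds(m.group(3)), m.group(4).strip())

sample = "1\n00:00:01,500 --> 00:00:03,000\nJohn, can I have that book?\n\n"
entries = list(parse_srt(sample))
```

The resulting time stamps, in seconds, are what the sentence parser and audio parser below operate on.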
The audio extractor extracts the audio channel from the video file. Then, using the English first names dictionary and the sentence transcriptions with time markers, the sentence parser, an application developed by VoiceKey team members, selects the sentences that include English first names. Figure 4.4 below shows an example of the output of the sentence parser.
Figure A4 Example of output of the sentence parser
In the next step, the audio parser uses the information from the sentence parser to extract the corresponding audio sections from the audio file produced by the media audio extractor.
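This extraction step amounts to cutting the audio channel at the subtitle time stamps. A minimal sketch using Python's standard wave module (file names are hypothetical; the project's actual audio parser is a separate program):

```python
import wave

def extract_segment(in_path, out_path, start_sec, end_sec):
    """Copy the [start_sec, end_sec] portion of a WAV file to out_path."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(start_sec * rate))
        frames = src.readframes(int((end_sec - start_sec) * rate))
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)   # frame count is corrected on close
        dst.writeframes(frames)
```

Each selected sentence thus becomes its own audio file, keyed by the subtitle index and time span.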
After extraction of a sentence that contains a name, RelEx is used to analyze the selected sentence. RelEx is an English-language semantic relationship extractor based on the Carnegie Mellon link parser. RelEx is able to provide sentence information on subject, object, and indirect object, as well as various word tags such as verb, gender, and noun. The current status of the WUW data collection project is the development of a rule-based or statistical pattern recognition process based on the relationship information produced by RelEx. Ultimately, the system will be able to accurately identify whether the name in the sentence is used in a WUW or nonWUW context.
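The rule-based step under development can be viewed as pattern matching over the relations RelEx emits. The sketch below is purely illustrative: the relation labels follow RelEx's general naming style, but the actual rules for this project are still being designed.

```python
def classify_name_use(name, relations):
    """Guess WUW (alerting) vs. nonWUW (referential) use of a name.

    relations: (relation, head, dependent) triples, e.g.
    ("_obj", "saw", "John").  If the name fills a subject, object, or
    indirect-object slot of a verb, treat the use as referential; a
    name outside any such relation is assumed to be a vocative
    (alerting) use.  A real classifier would need richer rules.
    """
    for rel, head, dep in relations:
        if name in (head, dep) and rel in ("_subj", "_obj", "_iobj"):
            return "nonWUW"
    return "WUW"

# "I saw John yesterday" vs. "John, can I have that book?"
label_ref = classify_name_use("John", [("_obj", "saw", "John")])
label_wuw = classify_name_use("John", [])
```

In the first call the name is a verb argument, so the use is classified as referential; in the second the name stands outside any argument relation and is taken as alerting.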
A necessary step in the automation process is to obtain precise time markers indicating the words of interest. To achieve this, one could use HTK, the Hidden Markov Model Toolkit, to perform forced alignment on the audio input. HTK was initially developed by the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED). HTK uses Hidden Markov models (HMMs), which compare the acoustic features of the incoming audio with the known acoustic features of the (typically 41) English phonemes to predict the most likely combination of phonemes for the audio, and maps the words from the lexicon dictionary. In our case, since the transcription of the sentences is known, HTK is used to map the phonemes of the known words to the corresponding time intervals. The phoneme time labels, or equivalently the word boundaries of the spoken sentence, are used to locate the WUWs or nonWUWs in time. Note that this step can also be performed by Microsoft's SDK speech recognition system, which is fully integrated into Microsoft's Vista OS. The advantage of Microsoft's system is that we do not need to train it, since the acoustic models are pre-built. However, development of an application incorporating the Microsoft SDK features is necessary. Alternatively, HTK does not require any significant integration coding; however, it does require accurate models. Automation of the described data collection process will be made possible by integrating the outputs from RelEx with the forced alignment.
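Whether produced by HTK or by the Microsoft SDK, the alignment output reduces to a list of word boundaries, so locating the name token becomes a simple lookup. A sketch over assumed (start, end, word) tuples:

```python
def locate_word(alignment, word):
    """Return (start_sec, end_sec) of the first occurrence of word in a
    forced-alignment result given as (start, end, word) tuples, or None."""
    for start, end, w in alignment:
        if w.lower() == word.lower():
            return (start, end)
    return None

# Hypothetical alignment of "Operator, please go to the next slide."
alignment = [(0.00, 0.45, "operator"), (0.45, 0.62, "please"),
             (0.62, 0.80, "go"), (0.80, 0.95, "to")]
span = locate_word(alignment, "Operator")
```

The returned span, combined with the RelEx label, gives the time-segmented WUW/nonWUW annotation described next.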
With time-segmented sentence labels of the audio stream indicating the WUW or nonWUW context, a new corpus can be generated, just like the WUWII corpus. This data will be used to perform prosodic analysis and to develop new, or refine existing, prosodic features. It is expected that further study with the new data will not only validate the current prosodic analysis results but also provide directions for developing new prosodic features. The ultimate goal is to find the prosodic patterns of the WUW, the nonWUW, and the other parts of the sentence.
Conclusion
This thesis investigated two types of prosodic features and designed an innovative data collection system.
The pitch based features in section 2.3 did not provide significant discriminating patterns. The following are potential ways to improve the performance:
1. Build a specialized corpus that contains both WUWs and nonWUWs. The speech sentences in the current corpus, WUWII, contain only WUWs and no nonWUWs. A new speech data collection system is designed in Chapter 4 in order to improve the performance of the features.
2. Use different approaches to defining pitch based features. Instead of using average and maximum pitch measurements of the WUW, the pitch contour pattern should also be considered. Since we are interested in the general pattern of WUWs rather than any specific WUW, patterns that exclude the word-specific pitch contour should be developed.
The energy based features in section 3.3 provide significant discriminating patterns. A future improvement is to quantify the level of change by comparing WUWs to nonWUWs.
The new data collection system is an ongoing project that will eventually provide sufficient data on both WUWs and nonWUWs. The data will help us research new patterns for discriminating the alerting context from the referential context.
References
AOAMedia.com. (n.d.). AoA Audio Extractor. Retrieved from AOAMedia.com.
Bagshaw, P. C. (1994). Automatic prosodic analysis for computer aided pronunciation teaching.
Campbell, M. (n.d.). Behind The Name. Retrieved from http://www.behindthename.com/
Cutler, A. (1986). Forbear is a homophone: Lexical prosody does not constrain lexical access. Language and Speech.
Hardcastle, W. J., & Laver, J. (1997). The Handbook of Phonetic Sciences. p. 640.
't Hart, J. (1975). Integrating different levels of intonation analysis. Phonetics, pp. 309-327.
Jurafsky, D., & Martin, J. H. (n.d.). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Kpuska, V. C. (2006). Leading and Trailing Silence in Wake-Up-Word Speech Recognition. Industry, Engineering & Management Systems 2006. Cocoa Beach.
Kpuska, V. WUWII Corpus.
Lehiste, I. (1970). Suprasegmentals. Cambridge, Massachusetts: The MIT Press.
Machine Intelligence Laboratory of the Cambridge University Engineering Department. (n.d.). HTK, The Hidden Markov Model Toolkit.
Medan, Y. (1991). Super resolution pitch determination of speech signals. IEEE Trans. Signal Processing ASSP-39(1), 40-48.
Merriam-Webster Dictionary. (n.d.). Merriam-Webster Dictionary.
Novamente LLC. (n.d.). RelEx Semantic Relationship Extractor. Retrieved from http://opencog.org/wiki/RelEx
Pattarapong, R., Ramdhan, R., & Beharry, X. (2009). Sentence Parser Program.
Phillips, M. (1985). A feature-based time domain pitch tracker. Journal of the Acoustical Society of America, 77, S9-S10(A).
Rojanasthien, P., Ramdhan, R., & Beharry, X. (2009). Audio Parser Program.
Temperley, D., Lafferty, J., & Sleator, D. (n.d.). CMU Link Grammar Parser.
Tudor, K. B. (2007). Triple Scoring of Hidden Markov Models in Wake-Up-Word Speech Recognition.
Veprek, P., & Scordilis, M. (2002). Analysis, Enhancement and Evaluation of Five Pitch Determination Techniques. Elsevier Science Journal of Speech Communication, 37, 249-270.
Zuggy. (n.d.). SubRip.
Pitch Feature Experimental Result
Wake-Up-Word: All

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
APW_AP1SBW                 1415     726     51       0      0     689     49
AP1sSW_AP1SBW              1415     735     52       0      0     680     48
APW_APALL                  2282     947     41       0      0    1335     59
AP1sSW_APALL               2282     996     44       2      0    1284     56
APW_APALLBW                2188     962     44       0      0    1226     56
AP1sSW_APALLBW             2188    1003     46       2      0    1183     54
MaxP_MaxP1SBW              1415     948     67      53      4     414     29
MaxP1sSW_MaxP1SBW          1415     719     51      54      4     642     45
MaxPW_MaxPAll              2282    1020     45     109      5    1153     51
MaxP1sSW_MaxPAll           2282     716     31     213      9    1353     59
MaxP1sSW_MaxPAllBW         2188    1069     49     111      5    1008     46
MaxPW_MaxPAllBW            2188    1003     35       2     10    1183     55

Table A1 Pitch Features Result of All WUW
Figure A1 Distribution and Cumulative plot of pitch feature,
APW_AP1SBW
Figure A2 Distribution and Cumulative plot of pitch feature,
AP1sSW_AP1SBW
Figure A3 Distribution and Cumulative plot of pitch feature,
APW_APALL
Figure A4 Distribution and Cumulative plot of pitch feature,
AP1sSW_APALL
Figure A5 Distribution and Cumulative plot of pitch feature,
APW_APALLBW
Figure A6 Distribution and Cumulative plot of pitch feature,
AP1sSW_APALLBW
Figure A7 Distribution and Cumulative plot of pitch feature,
MaxP_MaxP1SBW
Figure A8 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxP1SBW
Figure A9 Distribution and Cumulative plot of pitch feature,
MaxPW_MaxPAll
Figure A10 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxPAll
Figure A11 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxPAllBW
Figure A12 Distribution and Cumulative plot of pitch feature,
MaxPW_MaxPAllBW
WUW: Operator

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
APW_AP1SBW                  268     122     46       0      0     146     54
AP1sSW_AP1SBW               268     113     42       0      0     155     58
APW_APALL                   461     184     40       0      0     277     60
AP1sSW_APALL                461     182     39       0      0     279     61
APW_APALLBW                 455     187     41       0      0     268     59
AP1sSW_APALLBW              455     179     39       0      0     276     61
MaxP_MaxP1SBW               268     155     58      12      4     101     38
MaxP1sSW_MaxP1SBW           268      94     35       8      3     166     62
MaxPW_MaxPAll               461     192     42      27      6     240     52
MaxP1sSW_MaxPAll            461     144     31      48     10     269     58
MaxP1sSW_MaxPAllBW          455     209     46      27      6     219     48
MaxPW_MaxPAllBW             455     179     33       0     12     276     55

Table A2 Pitch Features Result of WUW Operator
Figure A13 Distribution and Cumulative plot of pitch feature,
APW_AP1SBW
Figure A14 Distribution and Cumulative plot of pitch feature,
AP1sSW_AP1SBW
Figure A15 Distribution and Cumulative plot of pitch feature,
APW_APALL
Figure A16 Distribution and Cumulative plot of pitch feature,
AP1sSW_APALL
Figure A17 Distribution and Cumulative plot of pitch feature,
APW_APALLBW
Figure A18 Distribution and Cumulative plot of pitch feature,
AP1sSW_APALLBW
Figure A19 Distribution and Cumulative plot of pitch feature,
MaxP_MaxP1SBW
Figure A20 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxP1SBW
Figure A21 Distribution and Cumulative plot of pitch feature,
MaxPW_MaxPAll
Figure A22 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxPAll
Figure A23 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxPAllBW
Figure A24 Distribution and Cumulative plot of pitch feature,
MaxPW_MaxPAllBW
WUW: Wildfire

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
APW_AP1SBW                  266     111     42       0      0     155     58
AP1sSW_AP1SBW               266     132     50       0      0     134     50
APW_APALL                   323      70     22       0      0     253     78
AP1sSW_APALL                323      89     28       0      0     234     72
APW_APALLBW                 297      73     25       0      0     224     75
AP1sSW_APALLBW              297      97     33       0      0     200     67
MaxP_MaxP1SBW               266     175     66      12      5      79     30
MaxP1sSW_MaxP1SBW           266     141     53      12      5     113     42
MaxPW_MaxPAll               323      84     26       9      3     230     71
MaxP1sSW_MaxPAll            323      54     17      11      3     258     80
MaxP1sSW_MaxPAllBW          297      79     27       9      3     209     70
MaxPW_MaxPAllBW             297      97     18       0      0     200     79

Table A3 Pitch Features Result of WUW Wildfire
Figure A25 Distribution and Cumulative plot of pitch feature,
APW_AP1SBW, WUW: Wildfire
Figure A26 Distribution and Cumulative plot of pitch feature,
AP1sSW_AP1SBW, WUW: Wildfire
Figure A27 Distribution and Cumulative plot of pitch feature,
APW_APALL, WUW: Wildfire
Figure A28 Distribution and Cumulative plot of pitch feature,
AP1sSW_APALL, WUW: Wildfire
Figure A29 Distribution and Cumulative plot of pitch feature,
APW_APALLBW, WUW: Wildfire
Figure A30 Distribution and Cumulative plot of pitch feature,
AP1sSW_APALLBW, WUW: Wildfire
Figure A31 Distribution and Cumulative plot of pitch feature,
MaxP_MaxP1SBW, WUW: Wildfire
Figure A32 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxP1SBW, WUW: Wildfire
Figure A33 Distribution and Cumulative plot of pitch feature,
MaxPW_MaxPAll, WUW: Wildfire
Figure A34 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxPAll, WUW: Wildfire
Figure A35 Distribution and Cumulative plot of pitch feature,
MaxP1sSW_MaxPAllBW, WUW: Wildfire
Figure A36 Distribution and Cumulative plot of pitch feature,
MaxPW_MaxPAllBW, WUW: Wildfire
Energy Feature Experimental Result
WUW: All WUWs

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                 1479    1164     79       0      0     315     21
AE1sSW_AE1SBW              1479    1283     84       1      0     240     16
AEW_AEAll                  2175    1059     49       9      9    1116     51
AE1sSW_AEAll               2175    1155     53       2      0    1018     47
AEW_AEAllBW                1969    1427     72       0      0     542     28
AE1sSW_AEAllBW             1969    1562     79       3      0     404     21
MaxEW_MaxE1SBW             1479    1244     84      20      1     215     15
MaxE1sSW_MaxEAllBW         1479    1221     83      13      1     245     17
MaxEW_MaxEAll              2175    1373     63      13      1     245     17
MaxE1sSW_MaxEAll           2175    1336     61      25      1     814     37
MaxE1sSW_MaxEAllBW         1969    1209     61      16      1     744     38
MaxEW_MaxEAllBW            1969    1562     60       3      1     404     39

Table B1 Energy Features Result of All WUW
Figure B1 Distribution and Cumulative plot of energy feature,
the average energy of WUW
Figure B2 Distribution and Cumulative plot of energy feature,
the average energy of WUW
Figure B3 Distribution and Cumulative plot of energy feature,
the average energy of WUW
Figure B4 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW
Figure B5 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW
Figure B6 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW
Figure B7 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW
Figure B8 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW
Figure B9 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW
Figure B10 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW
Figure B11 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW
Figure B12 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW
WUW: Operator

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                  275     228     83       0      0      47     17
AE1sSW_AE1SBW               275     258     94       0      0      17      6
AEW_AEAll                   418     248     59       0      0     170     41
AE1sSW_AEAll                418     290     69       1      0     127     30
AEW_AEAllBW                 394     303     77       0      0      91     23
AE1sSW_AEAllBW              394     359     91       1      0      34      9
MaxEW_MaxE1SBW              275     240     87       1      0      34     12
MaxE1sSW_MaxEAllBW          275     243     88       0      0      32     12
MaxEW_MaxEAll               418     290     69       4      1     124     30
MaxE1sSW_MaxEAll            418     285     68       6      1     127     30
MaxE1sSW_MaxEAllBW          394     272     69       4      1     118     30
MaxEW_MaxEAllBW             394     359     68       1      1      34     30

Table B2 Energy Feature Result of WUW Operator
Figure B13 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Operator
Figure B14 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Operator
Figure B15 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Operator
Figure B16 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Operator
Figure B17 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Operator
Figure B18 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Operator
Figure B19 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Operator
Figure B20 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Operator
Figure B21 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Operator
Figure B22 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Operator
Figure B23 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Operator
Figure B24 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Operator
WUW: ThinkEngine

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                  293     182     62       0      0     111     38
AE1sSW_AE1SBW               293     194     66       1      0      98     33
AEW_AEAll                   414     159     38       0      0     255     62
AE1sSW_AEAll                414     178     43       0      0     236     57
AEW_AEAllBW                 388     201     52       0      0     187     48
AE1sSW_AEAllBW              388     229     59       1      0     158     42
MaxEW_MaxE1SBW              293     209     71       3      1      81     28
MaxE1sSW_MaxEAllBW          293     195     67       5      2      93     32
MaxEW_MaxEAll               414     197     48       3      1     214     52
MaxE1sSW_MaxEAll            414     186     45       2      0     226     55
MaxE1sSW_MaxEAllBW          388     180     46       3      1     205     53
MaxEW_MaxEAllBW             388     229     45       1      1     158     54

Table B3 Energy Feature Result of WUW ThinkEngine
Figure B25 Distribution and Cumulative plot of energy feature,
the average energy of WUW, ThinkEngine
Figure B26 Distribution and Cumulative plot of energy feature,
the average energy of WUW, ThinkEngine
Figure B27 Distribution and Cumulative plot of energy feature,
the average energy of WUW, ThinkEngine
Figure B28 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, ThinkEngine
Figure B29 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, ThinkEngine
Figure B30 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, ThinkEngine
Figure B31 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, ThinkEngine
Figure B32 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, ThinkEngine
Figure B33 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, ThinkEngine
Figure B34 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, ThinkEngine
Figure B35 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, ThinkEngine
Figure B36 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, ThinkEngine
WUW: Onword

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                  262     207     79       0      0      55     21
AE1sSW_AE1SBW               262     221     84       0      0      41     16
AEW_AEAll                   435     215     49       0      0     220     51
AE1sSW_AEAll                435     226     52       0      0     209     48
AEW_AEAllBW                 389     306     79       0      0      83     21
AE1sSW_AEAllBW              389     327     84       0      0      62     16
MaxEW_MaxE1SBW              262     228     87       5      2      29     11
MaxE1sSW_MaxEAllBW          262     226     86       3      1      33     13
MaxEW_MaxEAll               435     229     69       2      0     134     31
MaxE1sSW_MaxEAll            435     295     68       3      1     137     31
MaxE1sSW_MaxEAllBW          389     261     67       2      1     126     32
MaxEW_MaxEAllBW             389     327     66       0      1      62     33

Table B4 Energy Feature Result of WUW Onword
Figure B37 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Onword
Figure B38 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Onword
Figure B39 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Onword
Figure B40 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Onword
Figure B41 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Onword
Figure B42 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Onword
Figure B43 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Onword
Figure B44 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Onword
Figure B45 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Onword
Figure B46 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Onword
Figure B47 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Onword
Figure B48 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Onword
WUW: Wildfire

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                  282     253     90       0      0      29     10
AE1sSW_AE1SBW               282     261     93       0      0      21      7
AEW_AEAll                   340     173     51       0      0     167     49
AE1sSW_AEAll                340     185     54       0      0     155     46
AEW_AEAllBW                 298     252     85       0      0      46     15
AE1sSW_AEAllBW              298     265     89       0      0      33     11
MaxEW_MaxE1SBW              282     258     91       8      3      16      6
MaxE1sSW_MaxEAllBW          282     253     90       2      1      27     10
MaxEW_MaxEAll               340     230     68       4      1     106     31
MaxE1sSW_MaxEAll            340     219     64       4      1     117     34
MaxE1sSW_MaxEAllBW          298     195     65       4      1      99     33
MaxEW_MaxEAllBW             298     265     62       0      1      33     36

Table B5 Energy Feature Result of WUW Wildfire
Figure B49 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Wildfire
Figure B50 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Wildfire
Figure B51 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Wildfire
Figure B52 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Wildfire
Figure B53 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Wildfire
Figure B54 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Wildfire
Figure B55 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Wildfire
Figure B56 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Wildfire
Figure B57 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Wildfire
Figure B58 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Wildfire
Figure B59 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Wildfire
Figure B60 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Wildfire
WUW: Voyager

Feature              Valid Data  Pt > 0  % > 0  Pt = 0  % = 0  Pt < 0  % < 0
AEW_AE1SBW                  281     220     78       0      0      61     22
AE1sSW_AE1SBW               281     229     81       0      0      52     19
AEW_AEAll                   361     149     41       0      0     212     59
AE1sSW_AEAll                361     161     45       1      0     199     55
AEW_AEAllBW                 325     207     64       0      0     118     36
AE1sSW_AEAllBW              325     222     68       1      0     102     31
MaxEW_MaxE1SBW              281     234     83       2      1      45     16
MaxE1sSW_MaxEAllBW          281     231     82       2      1      48     17
MaxEW_MaxEAll               361     172     48       5      1     184     51
MaxE1sSW_MaxEAll            361     167     46       7      2     187     52
MaxE1sSW_MaxEAllBW          325     148     46       3      1     174     54
MaxEW_MaxEAllBW             325     222     44       1      1     102     55

Table B6 Energy Feature Result of WUW Voyager
Figure B61 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Voyager
Figure B62 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Voyager
Figure B63 Distribution and Cumulative plot of energy feature,
the average energy of WUW, Voyager
Figure B64 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Voyager
Figure B65 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Voyager
Figure B66 Distribution and Cumulative plot of energy feature,
the average energy the first section in WUW, Voyager
Figure B67 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Voyager
Figure B68 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Voyager
Figure B69 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Voyager
Figure B70 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Voyager
Figure B71 Distribution and Cumulative plot of energy feature,
the maximum energy of the WUW, Voyager
Figure B72 Distribution and Cumulative plot of energy feature,
the maximum energy of the first section in WUW, Voyager
[Figures: cumulative plots (y-axis: %) of the relative energy features (WUWAE-LSAE)/LSAE, (WUWAE-AllAE)/AllAE, (WUWAE-AllAE before WUW)/AllAE before WUW, (WUW1stAE-LSAE)/LSAE, (WUW1stAE-AllAE)/AllAE, (WUW1stAE-AllAE before WUW)/AllAE before WUW, (WUWMAXE-LSMAXE)/LSMAXE, (WUWMAXE-AllMAXE)/AllMAXE, (WUWMAXE-AllMAXE before WUW)/AllMAXE before WUW, (WUW1stMAXE-LSMAXE)/LSMAXE, and (WUW1stMAXE-AllMAXE before WUW)/AllMAXE before WUW]
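The relative energy features plotted above all share the form (feature of the WUW segment minus feature of a context segment) divided by the feature of the context segment, and each cumulative plot is the empirical cumulative distribution of that ratio over the data set. A minimal sketch, using average energy (AE) as the example feature; the function names and the use of raw sample arrays are assumptions for illustration:

```python
import numpy as np

def average_energy(samples):
    """Mean energy (mean squared amplitude) of a signal segment."""
    return float(np.mean(np.asarray(samples, dtype=float) ** 2))

def relative_energy_feature(wuw_segment, context_segment):
    """(WUWAE - contextAE) / contextAE: relative average energy of
    the WUW segment against a context segment (e.g. the whole
    utterance for the AllAE variants)."""
    wuw_ae = average_energy(wuw_segment)
    ctx_ae = average_energy(context_segment)
    return (wuw_ae - ctx_ae) / ctx_ae

def cumulative_percent(values):
    """Empirical cumulative distribution in percent, i.e. the data
    behind a cumulative plot: sorted feature values vs. the percent
    of observations at or below each value."""
    v = np.sort(np.asarray(values, dtype=float))
    pct = 100.0 * np.arange(1, len(v) + 1) / len(v)
    return v, pct
```

The same helpers apply to the MAXE variants by replacing the mean with a frame-wise maximum, and to the LS and before-WUW variants by changing the context segment.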
The super-resolution decision is based on the normalized cross-correlation between two adjacent speech segments x and y of candidate period length n, evaluated at every L-th sample:

ρxy(n) = Σj x(jL)·y(jL) / √( Σj x²(jL) · Σj y²(jL) ),  n = n_min, ..., n_max

A candidate period is accepted as voiced when its correlation exceeds the threshold T_srfd = 0.88; the extended variant additionally applies the adaptive threshold T'_srfd = max(0.75, 0.85·ρ'(n_p)).
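A minimal sketch of this correlation-based voicing decision, assuming a one-dimensional sample array and unit decimation by default; the function names and the search interface are illustrative, and only the 0.88 threshold is taken from the text:

```python
import numpy as np

T_SRFD = 0.88  # voicing threshold from the text

def srfd_correlation(signal, start, n, L=1):
    """Normalized cross-correlation between two adjacent segments of
    length n starting at `start`, sampled every L-th point.
    Returns a value in [-1, 1]."""
    x = np.asarray(signal[start : start + n], dtype=float)[::L]
    y = np.asarray(signal[start + n : start + 2 * n], dtype=float)[::L]
    denom = np.sqrt(np.sum(x * x) * np.sum(y * y))
    if denom == 0.0:
        return 0.0
    return float(np.sum(x * y) / denom)

def best_period(signal, start, n_min, n_max):
    """Pick the candidate period with the highest correlation; declare
    the frame voiced only if that correlation exceeds T_SRFD."""
    scores = {n: srfd_correlation(signal, start, n)
              for n in range(n_min, n_max + 1)}
    n_best = max(scores, key=scores.get)
    if scores[n_best] >= T_SRFD:
        return n_best, scores[n_best]
    return None, scores[n_best]
```

On a perfectly periodic signal the two adjacent segments coincide at the true period, so the correlation peaks near 1 there and the candidate passes the threshold.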
The same normalized cross-correlation, ρyz(n), is computed between the second and third adjacent segments, y and z.
[Figures: cumulative plots (y-axis: %, No. of Data) of the relative pitch features (WUWAP-LSAP)/LSAP, (WUW1stAP-LSAP)/LSAP, (WUWAP-AllAP)/AllAP, (WUW1stAP-AllAP)/AllAP, (WUWMAXP-LSMAXP)/LSMAXP, (WUW1stMAXP-LSMAXP)/LSMAXP, (WUWMAXP-AllMAXP)/AllMAXP, (WUWMAXP-AllMAXP before WUW)/AllMAXP before WUW, and (WUW1stMAXP-AllMAXP before WUW)/AllMAXP before WUW]
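The pitch-based features mirror the energy ones, with the extra step that average pitch (AP) must be computed over voiced frames only. A minimal sketch, assuming an f0 track in which unvoiced frames are marked with 0 (the names are illustrative, not from the thesis):

```python
import numpy as np

def average_pitch(f0_track):
    """Average pitch over voiced frames only (f0 > 0); returns 0.0
    when no frame is voiced."""
    f0 = np.asarray(f0_track, dtype=float)
    voiced = f0[f0 > 0]
    return float(voiced.mean()) if voiced.size else 0.0

def relative_pitch_feature(wuw_f0, context_f0):
    """(WUWAP - contextAP) / contextAP: relative average pitch of the
    WUW segment against a context segment (e.g. the whole utterance
    for the AllAP variants)."""
    wuw_ap = average_pitch(wuw_f0)
    ctx_ap = average_pitch(context_f0)
    return (wuw_ap - ctx_ap) / ctx_ap
```

For instance, a WUW spoken at 150 Hz inside an utterance averaging 125 Hz over its voiced frames yields a relative feature of 0.2, i.e. a 20% pitch rise on the alerting word.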