8/3/2019 Time Domain Methods1 (1)
Time-Domain Methods for Speech Processing
Introduction
Speech Processing Methods
Time-domain methods: operate directly on the waveform of the speech signal.
Frequency-domain methods: involve some form of spectral representation.
Time-Domain Measurements
Average zero-crossing rate, energy, and the autocorrelation function.
Very simple to implement.
Provide a useful basis for estimating important features of the speech signal, e.g.,
Voiced/unvoiced classification
Pitch estimation
Time-Domain Methods for Speech Processing
Time-Dependent Processing of Speech
Time-Dependent Nature of Speech
Example utterance: "This is a test."
Time-Dependent Nature of Speech
Short-Time Behavior of Speech
Assumption: the properties of the speech signal change slowly with time.
Analysis frames: short segments of the speech signal, which usually overlap one another.
Time-Dependent Analyses
Analyzing each frame may produce either a single number or a set of numbers, e.g.,
Energy (a single number)
Vocal tract parameters (a set of numbers)
The result is a new, time-dependent sequence.
General Form
$$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$$
n: Frame index
x(m): Speech signal
T[]: A linear or nonlinear transformation.
w(m): A window function (of finite or infinite length).
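For a finite window, the general form is a convolution of the sequence T[x(m)] with w(m), so it can be computed in a few lines. The sketch below is a minimal illustration in Python with NumPy, assuming a finite-length window stored as w(0), ..., w(N-1) and x(m) = 0 for m < 0; the function name short_time_analysis is hypothetical.

```python
import numpy as np

def short_time_analysis(x, w, transform):
    """Compute Q_n = sum_m T[x(m)] w(n - m) for n = 0, ..., len(x) - 1.

    x         : 1-D sequence of speech samples x(m), m = 0, 1, ...
    w         : finite window w(m), m = 0, ..., len(w) - 1 (zero elsewhere)
    transform : elementwise function playing the role of T[.]
    """
    t = transform(np.asarray(x, dtype=float))  # the sequence T[x(m)]
    # Q_n is T[x(n)] convolved with w(n); samples with m < 0 are taken as zero,
    # and the first len(x) outputs of the full convolution are Q_0 .. Q_{len(x)-1}.
    return np.convolve(t, np.asarray(w, dtype=float))[: len(t)]
```

For example, short_time_analysis(x, np.ones(N), np.square) gives the short-time energy with a rectangular window, while transform=np.abs gives a short-time average magnitude.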
General Form
Q_n is a sequence of local weighted average values of the sequence T[x(m)]:
$$Q_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m)$$
Example
Energy:
$$E = \sum_{m=-\infty}^{\infty} x^2(m)$$
Short-time energy:
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$
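Example
The short-time energy
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$
is a special case of the general form with
$$T[x(m)] = x^2(m), \qquad w(m) = \begin{cases} 1, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
so that
$$E_n = \sum_{m=-\infty}^{\infty} T[x(m)]\, w(n-m).$$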
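The equivalence of the direct sum and the general form with a rectangular window can be checked numerically. A minimal sketch in Python with NumPy, assuming samples before m = 0 are zero; both function names are hypothetical.

```python
import numpy as np

def short_time_energy_direct(x, N):
    """E_n = sum_{m=n-N+1}^{n} x^2(m); samples before m = 0 are treated as zero."""
    x = np.asarray(x, dtype=float)
    E = np.zeros(len(x))
    for n in range(len(x)):
        lo = max(0, n - N + 1)            # clip the window at the start of the signal
        E[n] = np.sum(x[lo : n + 1] ** 2)
    return E

def short_time_energy_general(x, N):
    """Same quantity via E_n = sum_m T[x(m)] w(n-m), with T[x] = x^2 and rectangular w."""
    w = np.ones(N)                        # w(m) = 1 for 0 <= m <= N-1, 0 otherwise
    x2 = np.asarray(x, dtype=float) ** 2  # T[x(m)] = x^2(m)
    return np.convolve(x2, w)[: len(x2)]
```

Both functions return [1, 5, 13, 25] for x = [1, 2, 3, 4] and N = 2.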
General Short-Time-Analysis Scheme
x(n) → T[ ] → linear filter → Q_n
The linear filter (impulse response w(n)) is a lowpass filter whose characteristics depend on the choice of window.
Time-Domain Methods for Speech Processing
Short-Time Energy and Average Magnitude
Applications
Silence Detection
Segmentation
Lip Sync
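Short-Time Energy
$$\begin{aligned}
E_n &= \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2 \\
    &= \sum_{m=-\infty}^{\infty} x^2(m)\, w^2(n-m) \\
    &= \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m) \\
    &= x^2(n) * h(n), \qquad h(n) = w^2(n)
\end{aligned}$$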
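The last step says the short-time energy is just x^2(n) passed through a filter with impulse response h(n) = w^2(n). A minimal sketch in Python with NumPy that checks this against the literal definition, assuming a finite window and x(m) = 0 for m < 0; the function names are hypothetical and the Hamming window is only an example choice.

```python
import numpy as np

def short_time_energy(x, w):
    """E_n = x^2(n) * h(n) with h(n) = w^2(n), for n = 0 .. len(x)-1."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(w, dtype=float) ** 2
    return np.convolve(x ** 2, h)[: len(x)]

def short_time_energy_by_definition(x, w):
    """E_n = sum_m [x(m) w(n - m)]^2, evaluated literally (slow, for checking)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(w)
    E = np.zeros(len(x))
    for n in range(len(x)):
        for m in range(max(0, n - N + 1), n + 1):  # w(n - m) is nonzero only here
            E[n] += (x[m] * w[n - m]) ** 2
    return E

x = np.random.default_rng(0).standard_normal(200)
w = np.hamming(31)
E_fast = short_time_energy(x, w)
E_slow = short_time_energy_by_definition(x, w)
```

The filtering form is what makes E_n cheap to compute: one squaring and one convolution, regardless of the window shape.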
Block Diagram Representation
x(n) → [ ]^2 → x^2(n) → h(n) → E_n
x(n) → | | → |x(n)| → w(n) → M_n
where h(n) = w^2(n).
What is the effect of windows?
The Effects of Windows
Window length
Window function
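Rectangular Window
$$h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
$$H(e^{j\omega}) = \frac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$$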
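The closed form can be checked against a direct evaluation of H(e^{jω}) = Σ h(n) e^{-jωn}. A minimal sketch in Python with NumPy; the helper names dtft and rect_response are hypothetical, and the closed form is only evaluated away from ω = 0, where sin(ω/2) = 0.

```python
import numpy as np

def dtft(h, omega):
    """H(e^{jw}) = sum_n h(n) e^{-jwn} for a finite sequence h(0), ..., h(len(h)-1)."""
    n = np.arange(len(h))
    return np.exp(-1j * np.outer(omega, n)) @ np.asarray(h, dtype=float)

def rect_response(N, omega):
    """Closed form sin(wN/2)/sin(w/2) * e^{-jw(N-1)/2}; omega must avoid multiples of 2*pi."""
    return np.sin(omega * N / 2) / np.sin(omega / 2) * np.exp(-1j * omega * (N - 1) / 2)

N = 8
omega = np.linspace(0.01, np.pi, 200)
direct = dtft(np.ones(N), omega)
closed = rect_response(N, omega)
```

The magnitude is N at ω = 0 and the first zeros fall at ω = ±2π/N, so the mainlobe narrows as N grows.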
Rectangular Window
[Figure: magnitude response |H(e^{jω})| of the rectangular window versus ω, marking the mainlobe width (first zeros at ω = ±2π/N) and the peak sidelobe.]
Rectangular Window
$$H(e^{j\omega}) = \frac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2}$$
What is this?
Discuss the effect of window duration.
Discuss the effect of mainlobe width and sidelobe peak.
Commonly Used Windows
[Figure: the rectangular, Bartlett, Hanning, Hamming, and Blackman windows plotted over 0 ≤ n ≤ 20, with amplitude from 0 to 1.]
Commonly Used Windows (each w(n) = 0 outside 0 ≤ n ≤ N-1)
Rectangular:
$$w(n) = 1, \qquad 0 \le n \le N-1$$
Bartlett (triangular):
$$w(n) = \begin{cases} \dfrac{2n}{N-1}, & 0 \le n \le \dfrac{N-1}{2} \\[4pt] 2 - \dfrac{2n}{N-1}, & \dfrac{N-1}{2} < n \le N-1 \end{cases}$$
Hanning:
$$w(n) = 0.5 - 0.5\cos\!\left[\frac{2\pi n}{N-1}\right], \qquad 0 \le n \le N-1$$
Hamming:
$$w(n) = 0.54 - 0.46\cos\!\left[\frac{2\pi n}{N-1}\right], \qquad 0 \le n \le N-1$$
Blackman:
$$w(n) = 0.42 - 0.5\cos\!\left[\frac{2\pi n}{N-1}\right] + 0.08\cos\!\left[\frac{4\pi n}{N-1}\right], \qquad 0 \le n \le N-1$$
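The window definitions translate directly into code. A minimal sketch in Python with NumPy, assuming N ≥ 2; the function name window is hypothetical. NumPy's own np.bartlett, np.hanning, np.hamming, and np.blackman use the same symmetric definitions, which the check below relies on.

```python
import numpy as np

def window(name, N):
    """Return w(0), ..., w(N-1) for the named window (N >= 2)."""
    n = np.arange(N)
    c1 = np.cos(2 * np.pi * n / (N - 1))
    if name == "rectangular":
        return np.ones(N)
    if name == "bartlett":  # triangular: rises to 1 at n = (N-1)/2, then falls
        return np.where(n <= (N - 1) / 2, 2 * n / (N - 1), 2 - 2 * n / (N - 1))
    if name == "hanning":
        return 0.5 - 0.5 * c1
    if name == "hamming":
        return 0.54 - 0.46 * c1
    if name == "blackman":
        return 0.42 - 0.5 * c1 + 0.08 * np.cos(4 * np.pi * n / (N - 1))
    raise ValueError(f"unknown window: {name}")

w_hamming = window("hamming", 21)
```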
Commonly Used Windows
Rectangular
Bartlett
Hanning
Hamming
Blackman
The rectangular window has the least mainlobe width.
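The tradeoff between mainlobe width and peak sidelobe level can be measured from a densely sampled magnitude response. A minimal sketch in Python with NumPy, assuming a nonnegative window (so the peak of |W(e^{jω})| is at ω = 0) and taking the mainlobe edge as the first local minimum of the magnitude; the function name spectral_summary and the FFT size are illustrative choices.

```python
import numpy as np

def spectral_summary(w, nfft=1 << 16):
    """Return (mainlobe width in radians, peak sidelobe level in dB re the peak)."""
    w = np.asarray(w, dtype=float)
    mag = np.abs(np.fft.rfft(w, nfft))              # |W(e^{jw})| sampled on 0 <= w <= pi
    mag_db = 20 * np.log10(np.maximum(mag / mag[0], 1e-300))
    k = 1
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:  # walk down the mainlobe
        k += 1                                       # k = first local minimum (mainlobe edge)
    width = 2 * (2 * np.pi * k / nfft)               # mainlobe spans -w_k .. +w_k
    return width, mag_db[k:].max()                   # largest sidelobe beyond the edge

N = 51
results = {name: spectral_summary(w) for name, w in
           [("rectangular", np.ones(N)), ("hamming", np.hamming(N)), ("blackman", np.blackman(N))]}
```

For the rectangular window the mainlobe width comes out close to 4π/N with a peak sidelobe of roughly -13 dB; the tapered windows trade a wider mainlobe for much lower sidelobes.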
Examples: Short-Time Energy
[Figure: short-time energy computed with a rectangular window and with a Hamming window.]
Examples: Average Magnitude
[Figure: short-time average magnitude computed with a rectangular window and with a Hamming window.]
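The Effects of Window Length
Increasing the window length N decreases the bandwidth of the lowpass filter (the window).
If N is too small, e.g., less than one pitch period, E_n and M_n fluctuate very rapidly, following the fine details of the waveform.
If N is too large, e.g., on the order of several pitch periods, E_n and M_n change very slowly and do not adequately reflect the changing properties of the speech signal.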
The Choice of Window Length
No single value of N is entirely satisfactory, because the duration of a pitch period varies from about 2 ms for a high-pitched female or a child up to 25 ms for a very low-pitched male.
Sampling Rate
The bandwidth of both E_n and M_n is just that of the lowpass filter, so they need not be sampled as frequently as the speech signal.
For example:
Frame size = 20 ms
Sampling period of E_n and M_n (frame shift) = 10 ms
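With 20 ms frames and a 10 ms shift, E_n only needs to be computed once per frame. A minimal sketch in Python with NumPy, assuming a sampling rate of 8 kHz for the usage check (so a frame is 160 samples and the shift is 80 samples); the function names are hypothetical.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split x into overlapping frames; returns shape (num_frames, frame_len)."""
    frame_len = int(round(fs * frame_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))  # frame shift = sampling period of E_n
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    num_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples i*hop .. i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return np.asarray(x, dtype=float)[idx]

def frame_energy(frames, window):
    """Energy of each windowed frame: sum over the frame of [x(m) w(.)]^2."""
    return np.sum((frames * window) ** 2, axis=1)

fs = 8000
x = np.arange(fs, dtype=float)       # 1 s of placeholder samples
frames = frame_signal(x, fs)         # 160-sample frames every 80 samples
```

One second of audio at 8 kHz thus yields 99 energy values instead of 8000, which is the point of the reduced sampling rate.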
Main Applications of E_n and M_n
To provide a basis for distinguishing voiced speech segments from unvoiced segments.
Silence detection.
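Differences of E_n and M_n
$$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2$$
Because the samples enter as squares, E_n emphasizes large sample-to-sample variations in x(n).
$$M_n = \sum_{m=-\infty}^{\infty} |x(m)|\, w(n-m)$$
The dynamic range (max/min) of M_n is approximately the square root of the dynamic range of E_n.
The differences in level between voiced and unvoiced regions are therefore not as pronounced in M_n as in E_n.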
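The square-root relation between the dynamic ranges follows from the fact that scaling x(n) by a scales E_n by a^2 but M_n only by a. A minimal sketch in Python with NumPy, using a synthetic loud/quiet test signal and a rectangular window as assumptions; the function name energy_and_magnitude is hypothetical.

```python
import numpy as np

def energy_and_magnitude(x, w):
    """Return (E_n, M_n) for n = 0 .. len(x)-1, taking x(m) = 0 for m < 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    E = np.convolve(x ** 2, w ** 2)[: len(x)]   # E_n = x^2(n) * w^2(n)
    M = np.convolve(np.abs(x), w)[: len(x)]     # M_n = |x(n)| * w(n)
    return E, M

# 100 "loud" samples (amplitude 10) followed by 100 "quiet" ones (amplitude 0.1).
x = np.concatenate([10 * np.ones(100), 0.1 * np.ones(100)])
E, M = energy_and_magnitude(x, np.ones(20))
energy_range = E[50] / E[150]      # loud/quiet ratio for E_n
magnitude_range = M[50] / M[150]   # loud/quiet ratio for M_n
```

Here the loud-to-quiet ratio is about 10^4 for E_n and about 10^2 for M_n, i.e., the square root.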
FIR and IIR
All the windows discussed so far are FIR filters.
Each of them is a lowpass filter.
The filter can also be an IIR filter.
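IIR Example
$$h(n) = \begin{cases} a^n, & n \ge 0 \\ 0, & n < 0 \end{cases} \qquad H(z) = \frac{1}{1 - a z^{-1}}, \quad 0 < a < 1$$
Recursive formulas:
Short-time energy:
$$E_n = a E_{n-1} + x^2(n)$$
Short-time average magnitude:
$$M_n = a M_{n-1} + |x(n)|$$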
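The recursions need only one multiply-add per sample and no stored window, yet they produce the same output as convolving with h(n) = a^n u(n). A minimal sketch in Python with NumPy, assuming zero initial conditions (E_{-1} = M_{-1} = 0); the function names and a = 0.9 are illustrative.

```python
import numpy as np

def recursive_energy(x, a):
    """E_n = a E_{n-1} + x^2(n), starting from E_{-1} = 0."""
    E = np.zeros(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        prev = a * prev + xn * xn
        E[n] = prev
    return E

def recursive_magnitude(x, a):
    """M_n = a M_{n-1} + |x(n)|, starting from M_{-1} = 0."""
    M = np.zeros(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        prev = a * prev + abs(xn)
        M[n] = prev
    return M

x = np.random.default_rng(0).standard_normal(200)
a = 0.9
h = a ** np.arange(len(x))   # h(n) = a^n for n >= 0, truncated to the signal length
```

The impulse response of recursive_energy is exactly a^n: an input of [1, 0, 0, 0] with a = 0.5 gives [1, 0.5, 0.25, 0.125].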
Time-Domain Methods for Speech Processing
Short-Time Average Zero-Crossing Rate
Voiced and Unvoiced Signals
The /i/ in "This" (voiced)
The /s/ in "This" (unvoiced)
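The Short-Time Average Zero-Crossing Rate
$$Z_n = \sum_{m=-\infty}^{\infty} \left|\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\right| w(n-m)$$
Block diagram: x(n) → sgn[ ] → first difference → | | → lowpass filter → Z_n
$$\mathrm{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases} \qquad w(m) = \begin{cases} \dfrac{1}{2N}, & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases}$$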
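Each sign change contributes |sgn[x(m)] - sgn[x(m-1)]| = 2, and the window weight 1/(2N) turns the sum into "zero crossings per sample" over the last N samples. A minimal sketch in Python with NumPy, assuming x(m) = 0 contributions are ignored before the first sample (the first difference is taken as 0 at m = 0); the function names are hypothetical.

```python
import numpy as np

def sgn(x):
    """sgn[x] = 1 for x >= 0 and -1 for x < 0 (zero counts as positive)."""
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

def zero_crossing_rate(x, N):
    """Z_n = sum_m |sgn[x(m)] - sgn[x(m-1)]| w(n-m), with w(m) = 1/(2N) on 0..N-1."""
    s = sgn(x)
    d = np.abs(np.diff(s))              # 2 where consecutive samples change sign, else 0
    d = np.concatenate(([0.0], d))      # align: d[m] compares x(m) with x(m-1); d[0] = 0
    w = np.full(N, 1.0 / (2 * N))       # rectangular window scaled by 1/(2N)
    return np.convolve(d, w)[: len(d)]

fs, f0 = 8000, 100
n = np.arange(fs)
tone = np.sin(2 * np.pi * f0 * n / fs + 0.1)   # phase offset keeps samples off exact zeros
Z_tone = zero_crossing_rate(tone, 800)
Z_noise = zero_crossing_rate(np.random.default_rng(0).standard_normal(4000), 800)
```

A sinusoid of frequency f0 crosses zero 2·f0 times per second, so Z_n settles at 2·f0/fs = 0.025 here, while white noise gives a value near 0.5, which is why unvoiced (noise-like) speech shows a high zero-crossing rate.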
Distribution of Zero-Crossings
Example
Block diagram of the voiced/unvoiced classification.
Frame-by-frame processing of the speech signal.
Zero-Crossing Rate
A zero crossing occurs when successive samples have different algebraic signs.
The rate at which zero crossings occur is a simple measure of the frequency content of a signal.
The zero-crossing rate is the number of times, within a given time interval or frame, that the amplitude of the speech signal passes through zero.
Distribution of zero-crossings for unvoiced and voiced speech
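A definition for the zero-crossing rate is:
$$Z_n = \sum_{m=-\infty}^{\infty} \left|\mathrm{sgn}[x(m)] - \mathrm{sgn}[x(m-1)]\right| w(n-m)$$
where
$$\mathrm{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}$$
and
$$w(n) = \begin{cases} \dfrac{1}{2N}, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$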
Original speech signal for the word "four".
Summary
The energy of speech is a parameter for classifying voiced and unvoiced parts. Voiced speech has high energy because of its periodicity, and unvoiced speech has low energy.
The zero-crossing rate is an important parameter for voiced/unvoiced classification. It is also used as part of the front-end processing in automatic speech recognition systems.
Voiced speech is produced by excitation of the vocal tract by the periodic flow of air at the glottis, and it usually shows a low zero-crossing count.
Unvoiced speech is produced by a constriction of the vocal tract narrow enough to cause turbulent airflow, which results in noise and a high zero-crossing count.
The results suggest that zero-crossing rates are low for voiced parts and high for unvoiced parts, whereas energy is high for voiced parts and low for unvoiced parts.
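These two observations combine into a simple frame-by-frame voiced/unvoiced decision. The sketch below is a minimal illustration in Python with NumPy, not the method behind the results above: it assumes samples normalized to [-1, 1], a Hamming-windowed energy, 20 ms frames with a 10 ms shift, and two hypothetical thresholds (energy_thresh, zcr_thresh) that would need calibrating on real recordings.

```python
import numpy as np

def classify_frames(x, fs, frame_ms=20.0, hop_ms=10.0,
                    energy_thresh=0.01, zcr_thresh=0.25):
    """Label each frame 'voiced' (high energy, low ZCR) or 'unvoiced' (otherwise).

    energy_thresh and zcr_thresh are hypothetical values for signals in [-1, 1].
    """
    frame_len = int(round(fs * frame_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    w = np.hamming(frame_len)
    x = np.asarray(x, dtype=float)
    labels = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = np.mean((frame * w) ** 2)             # normalized short-time energy
        signs = np.where(frame >= 0, 1.0, -1.0)        # sgn[x(m)]
        zcr = np.mean(np.abs(np.diff(signs))) / 2.0    # fraction of sign changes
        if energy > energy_thresh and zcr < zcr_thresh:
            labels.append("voiced")
        else:
            labels.append("unvoiced")
    return labels

fs = 8000
n = np.arange(4000)
voiced_like = 0.5 * np.sin(2 * np.pi * 150 * n / fs)                  # periodic, strong
unvoiced_like = 0.05 * np.random.default_rng(1).standard_normal(4000)  # noise-like, weak
labels = classify_frames(np.concatenate([voiced_like, unvoiced_like]), fs)
```

The synthetic "voiced" half is a 150 Hz tone and the "unvoiced" half is low-level noise; both are stand-ins for real speech, so the labels only illustrate how the energy and zero-crossing rules interact.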