IIT Bombay, arjayan@ee.iitb.ac.in, pcpandey@ee.iitb.ac.in
14th National Conference on Communications, 1-3 Feb. 2008, IIT Bombay, Mumbai, India
Detection of Acoustic Landmarks with High Resolution for Speech Processing
A. R. Jayan, P. C. Pandey, V. K. Pandey
{arjayan, pcpandey, vinod}@ee.iitb.ac.in
EE Dept, IIT Bombay
3rd February, 2008
PRESENTATION OUTLINE
1. Introduction
   ▪ Acoustic properties of clear speech
   ▪ Landmark detection
   ▪ Need for high time resolution
2. Automated landmark detection with high resolution
   ▪ Pass 1
   ▪ Pass 2
3. Experimental results
4. Summary and conclusion
1. INTRODUCTION
Acoustic properties of clear speech
Clear speech: speech produced with clear articulation when talking to a hearing-impaired listener or in a noisy environment.
▪ Accurate detection of regions for modification
▪ Analysis-modification-synthesis with low processing artifacts
▪ Processing without increasing the overall speaking rate: an increase in the duration of transition regions with a corresponding decrease in steady-state segments
Intelligibility enhancement using properties of clear speech
Hazan & Simpson, 1998
▪ manually labeled VCV utterances and sentences
▪ intensity modification: stop burst +12 dB, frication +6 dB, nasal +6 dB
▪ spectral modification by filtering
Colotte & Laprie, 2000
▪ automated method for identifying regions, based on mel-cepstral analysis
▪ stops and unvoiced fricatives amplified by +4 dB
▪ transition segments time-scaled by 1.8, 2.0 (TD-PSOLA)
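As a rough illustration of what these intensity modifications mean at the sample level, the sketch below converts a level change in dB to a linear amplitude factor and applies it to one segment. It assumes the quoted dB figures are amplitude gains (20 log10 convention); the function names `db_to_gain` and `amplify_segment` are hypothetical and do not come from the cited studies.

```python
def db_to_gain(db):
    """Linear amplitude factor for a level change of `db` decibels (20*log10 convention)."""
    return 10.0 ** (db / 20.0)

def amplify_segment(samples, start, end, db):
    """Copy of `samples` with samples[start:end] scaled by `db` decibels."""
    gain = db_to_gain(db)
    return [s * gain if start <= i < end else s for i, s in enumerate(samples)]

if __name__ == "__main__":
    # Gains quoted on the slide; the segment boundaries would come from the labels.
    for label, db in [("stop burst", 12), ("frication", 6), ("nasal", 6), ("Colotte & Laprie", 4)]:
        print(f"{label}: +{db} dB is an amplitude factor of {db_to_gain(db):.2f}")
```

A +6 dB change roughly doubles the amplitude and +12 dB roughly quadruples it, which is why such boosts are confined to short, accurately located segments.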
Landmark detection
Speech landmarks
▪ Regions containing important information for speech perception
▪ Associated with spectral transitions
Landmark types
1. Abrupt-consonantal (AC): tight constrictions of primary articulators
2. Abrupt (A): fast glottal or velum activity
3. Non-abrupt (N): semi-vowel landmarks, less vocal tract constriction
4. Vocalic (V): vowel landmarks
Distribution: abrupt (~68%), vocalic (~29%), non-abrupt (~3%)
Landmarks
Liu, 1996
▪ Based on energy variation in 6 spectral bands: 0-0.4, 0.8-1.5, 1.2-2.0, 2.0-3.5, 3.5-5.0, 5.0-8 kHz
▪ Parameter: first difference of the maximum energy (log) in each spectral band
  time-step = 50 ms at the coarse level, 26 ms at the fine level
▪ Matching of peaks across bands for locating boundaries
Application: extraction of features for supporting speech recognition
Detection rate vs. temporal resolution
[Figure: detection rate plotted against temporal resolution; labelled values 73 %, 83 %, 88 %, 44 %]
▪ Uses the same processing for all types of landmarks
Niyogi & Sondhi, 2002
▪ for stop consonants
▪ total energy and energy above 3 kHz, in log scale
▪ measure of spectral flatness
▪ non-linear operator optimized for burst detection
Salomon et al., 2002
▪ Hilbert-transform-based envelope to extract temporal parameters
▪ spectral information
▪ adaptive time-steps (5 ms for burst onset, 30 ms for frication, 2 × pitch period for periodic regions)
Alani & Deriche, 1999
▪ wavelet-transform-based decomposition
▪ energy variations in 6 bands
Need for high temporal resolution and detection rate
Application dependent
Speech recognition: analysis is performed around landmarks for parameter extraction
▪ high accuracy
▪ moderate temporal resolution (20-30 ms)
Intelligibility enhancement: modify landmark regions
▪ high temporal resolution (< 5 ms)
▪ some tolerance to detection errors, but low tolerance to insertions, as insertions may introduce distortions
Landmark type
▪ Short-duration events (bursts) need high time resolution
▪ Voicing onsets/offsets may not require this much resolution, as signal properties remain the same for a long duration
Factors limiting detection rate and temporal resolution
▪ Effectiveness of parameters in capturing acoustic variations
  ▪ short-time energy variation in spectral bands: weak bursts may not get detected
  ▪ centroid frequency: not well defined during low-energy segments
  ▪ fixed band boundaries: may not adapt to speech variability
▪ Smoothing performed during parameter extraction
  ▪ temporal smoothing of the spectrum affects time resolution
▪ Type of distance measure
  ▪ first-difference operation is not optimized for all types of landmarks
  ▪ a time-step of 10 ms is too large for burst detection
▪ Effect of noise on parameters
▪ Acoustic cues for the different phonetic events are distributed non-homogeneously in the time-frequency plane
▪ Separate detectors are required for each phonetic class
▪ Each detector must use the method most suited to the phonetic event
Objective
Automated detection of landmarks for stop consonants with high temporal resolution, for applications in speech intelligibility enhancement
2. AUTOMATED LANDMARK DETECTION
[Block diagram]
Pass 1: speech → short-time spectral analysis → computation of energy peaks and centroids in the frequency bands → computation of energy and centroid RORs → computation of spectral transition index → landmark localization → landmarks (pass 1)
Pass 2: landmarks (pass 1) → wavelet decomposition around landmarks → computation of short-time energy and ZCRs → computation of energy and ZCR RORs → landmark localization → landmarks (pass 2)
Landmark detection using spectral peaks and centroids
Pass 1
Spectrum divided into five non-overlapping bands
▪ 0-0.4, 0.4-1.2, 1.2-2.0, 2.0-3.5, 3.5-5.0 kHz
▪ sampling frequency 10 k samples/s
▪ 512-point FFT on 6 ms frames
▪ frame rate 1 ms
Parameters
▪ maximum energy in each spectral band, every 1 ms
▪ band centroid estimated in each band, every 1 ms
▪ features similar to formant peaks and formant frequencies
▪ can be estimated easily
▪ not much affected by noise
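The analysis settings above translate into frame lengths and FFT-bin ranges by simple arithmetic. The sketch below works this out; the helper names (`ms_to_samples`, `band_bins`) and the convention that each band is half-open, so a bin on a shared edge belongs to the upper band, are assumptions made for illustration.

```python
import math

FS = 10_000    # sampling frequency in samples/s (from the slide)
NFFT = 512     # FFT length (from the slide)
BANDS_HZ = [(0, 400), (400, 1200), (1200, 2000), (2000, 3500), (3500, 5000)]

def ms_to_samples(ms, fs=FS):
    """Number of samples in `ms` milliseconds."""
    return round(ms * fs / 1000)

def band_bins(lo_hz, hi_hz, fs=FS, nfft=NFFT):
    """Inclusive FFT-bin range (k1, k2) covering lo_hz <= f < hi_hz (assumed convention)."""
    k1 = math.ceil(lo_hz * nfft / fs)
    k2 = math.ceil(hi_hz * nfft / fs) - 1
    return k1, k2

if __name__ == "__main__":
    print("frame length:", ms_to_samples(6), "samples, zero-padded to", NFFT)
    print("frame shift: ", ms_to_samples(1), "samples")
    print("bin spacing: ", FS / NFFT, "Hz")
    for lo, hi in BANDS_HZ:
        print(f"{lo}-{hi} Hz -> bins {band_bins(lo, hi)}")
```

A 6 ms frame holds only 60 samples, so the 512-point FFT interpolates the spectrum rather than adding resolution; the bin spacing of about 19.5 Hz is finer than what the frame length can resolve.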
Peak energy
$$E_p(b,n) = 10 \log_{10} \left( \max_{k_1 \le k \le k_2} |X_n(k)|^2 \right)$$
Centroid frequency
$$f_c(b,n) = \frac{\sum_{k=k_1}^{k_2} k\,|X_n(k)|^2}{\sum_{k=k_1}^{k_2} |X_n(k)|^2} \cdot \frac{f_s}{N}$$
Rate-of-rise functions
$$E'_p(b,n) = E_p(b,n+K) - E_p(b,n-K)$$
$$f'_c(b,n) = f_c(b,n+K) - f_c(b,n-K)$$
Transition index
$$T_r(n) = \sum_{b=1}^{5} E'_p(b,n)\, f'_c(b,n)$$
▪ tracks simultaneous variation of energy and centroid
▪ centroids given less weighting in low-energy areas
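The following is a minimal sketch of how the peak energy, centroid frequency, rate-of-rise functions and transition index could be computed, assuming the Pass 1 settings (10 kHz sampling, 60-sample frames every 10 samples, 512-point transform). It uses a direct DFT in plain Python so that it runs without dependencies; an FFT library would be the usual choice. The bin ranges in `BAND_BINS`, the `EPS` guard, the clamping at the ends of the contours, and the reading of `K` as a number of 1 ms frames on either side are assumptions, not details taken from the slides.

```python
import cmath
import math

FS, NFFT = 10_000, 512     # sampling rate and transform length from the slides
FRAME, HOP = 60, 10        # 6 ms frames every 1 ms at 10 kHz
# Inclusive bin ranges assumed for the bands 0-0.4, 0.4-1.2, 1.2-2.0, 2.0-3.5, 3.5-5.0 kHz
BAND_BINS = [(0, 20), (21, 61), (62, 102), (103, 179), (180, 255)]
EPS = 1e-12                # keeps log10 and the division finite in silence (assumption)

def power_spectrum(frame):
    """|X_n(k)|^2 for k = 0 .. NFFT/2 - 1, by direct DFT of the zero-padded frame."""
    out = []
    for k in range(NFFT // 2):
        w = -2j * math.pi * k / NFFT
        out.append(abs(sum(x * cmath.exp(w * m) for m, x in enumerate(frame))) ** 2)
    return out

def peak_and_centroid(spec, k1, k2):
    """Peak energy E_p (dB) and centroid frequency f_c (Hz) over bins k1..k2."""
    band = spec[k1:k2 + 1]
    e_p = 10.0 * math.log10(max(band) + EPS)
    weighted = sum(k * p for k, p in zip(range(k1, k2 + 1), band))
    f_c = weighted / (sum(band) + EPS) * FS / NFFT
    return e_p, f_c

def ror(track, K):
    """Rate of rise track[n+K] - track[n-K]; indices are clamped at the ends."""
    last = len(track) - 1
    return [track[min(n + K, last)] - track[max(n - K, 0)] for n in range(len(track))]

def transition_index(signal, K):
    """T_r(n): sum over the five bands of E'_p(b, n) * f'_c(b, n)."""
    starts = range(0, len(signal) - FRAME + 1, HOP)
    specs = [power_spectrum(signal[s:s + FRAME]) for s in starts]
    T = [0.0] * len(specs)
    for k1, k2 in BAND_BINS:
        pairs = [peak_and_centroid(sp, k1, k2) for sp in specs]
        d_e = ror([e for e, _ in pairs], K)
        d_f = ror([f for _, f in pairs], K)
        for n in range(len(T)):
            T[n] += d_e[n] * d_f[n]
    return T

if __name__ == "__main__":
    # Synthetic test signal (made up): 30 ms of silence, then a 1 kHz tone.
    sig = [0.0] * 300 + [math.sin(2 * math.pi * 1000 * i / FS) for i in range(300)]
    T = transition_index(sig, K=2)
    print("frame with largest T_r:", max(range(len(T)), key=T.__getitem__))
```

Because each band contributes the product of the two rates of rise, a centroid that jumps around in a band whose energy hardly changes adds little to the index, which is one way to read the remark that centroids get less weighting in low-energy areas.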
Example: /uka/
[Figure: peak and centroid contours in the five bands: 0-0.4, 0.4-1.2, 1.2-2.0, 2.0-3.5, 3.5-5.0 kHz]
Example: /uka/
[Figure: peak and centroid ROR contours, time step = 26 ms, in the five bands: 0-0.4, 0.4-1.2, 1.2-2.0, 2.0-3.5, 3.5-5.0 kHz]
Example: /uka/
Transition index
derived from RORs with time step = 26 ms
Example: /uka/
Transition index
derived from RORs with time step = 4 ms
Less sensitive to slow transitions
Problems
Large time step (> 20 ms)
▪ locates landmarks with lower temporal accuracy
▪ also detects slowly varying events (higher detection rate)
Small time step (< 5 ms)
▪ detects abrupt transitions with good resolution
▪ misses slow transitions
Pass 2:
Re-analyze the landmarks detected in Pass 1 with a small time step
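The trade-off can be seen on a toy example. The sketch below applies the same rate-of-rise difference with a small and a large step to a synthetic energy contour (made-up data, one value per 1 ms frame) containing one abrupt 30 dB jump and one slow 30 dB rise. The 10 dB detection threshold and the reading of `K` as frames on either side of `n` are hypothetical choices for illustration only.

```python
def ror(track, K):
    """Rate of rise track[n+K] - track[n-K]; indices are clamped at the ends."""
    last = len(track) - 1
    return [track[min(n + K, last)] - track[max(n - K, 0)] for n in range(len(track))]

# Synthetic contour in dB: abrupt jump at frame 50, slow rise over frames 150-209.
contour = ([0.0] * 50 + [30.0] * 100
           + [30.0 + 0.5 * i for i in range(60)] + [60.0] * 40)

THRESHOLD_DB = 10.0  # hypothetical detection threshold

def summarize(K):
    r = ror(contour, K)
    jump_width = sum(v >= THRESHOLD_DB for v in r[:120])  # frames flagged around the jump
    slow_peak = max(r[120:])                              # largest response to the slow rise
    return jump_width, slow_peak

if __name__ == "__main__":
    for K in (2, 13):
        width, slow = summarize(K)
        print(f"K={K:2d}: jump flagged over {width} frames, slow rise peaks at {slow} dB")
```

With the small step the jump is pinned down to a few frames but the slow rise stays far below the threshold; with the large step both events exceed it, at the cost of a response to the jump that is spread over tens of milliseconds. This is the motivation for detecting with a large step first and refining with a small one.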
Improving temporal resolution: Pass 2
▪ 40 ms window centered around burst landmarks detected in pass 1
▪ decomposed to 6 levels by discrete Meyer wavelet
▪ detail (high-frequency) contents in the lower two levels used for localizing bursts
Parameters
▪ short-time energy variation
▪ zero-crossing rate
Compute normalized RORs with a time-step of 3 ms
Get a new transition index as
$$T_{ez}(n) = 0.5 \sum_{l=1}^{2} E'_n(l,n)\, Z'_n(l,n)$$
Relocate the landmark to the location corresponding to the peak in $T_{ez}(n)$
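The relocation step can be sketched as follows. This is not the authors' implementation: the wavelet decomposition is omitted (it would need a wavelet library), so `levels` simply holds two stand-in signals at the original sampling rate in place of the two detail levels. The frame length `win`, the one-sample hop, the combination of the two normalized RORs as a product, the reading of the 3 ms time step as `K` = 15 samples on either side at 10 kHz, and the choice of the centre of a flat maximum are all assumptions.

```python
def short_time(signal, win):
    """Energy and zero-crossing count of every length-`win` frame (hop of one sample)."""
    energy, zcr = [], []
    for s in range(len(signal) - win + 1):
        fr = signal[s:s + win]
        energy.append(sum(x * x for x in fr))
        zcr.append(sum((a >= 0) != (b >= 0) for a, b in zip(fr, fr[1:])))
    return energy, zcr

def normalized_ror(track, K):
    """track[n+K] - track[n-K], clamped at the ends and scaled to a peak magnitude of 1."""
    last = len(track) - 1
    r = [track[min(n + K, last)] - track[max(n - K, 0)] for n in range(len(track))]
    peak = max(abs(v) for v in r) or 1.0
    return [v / peak for v in r]

def relocate(levels, win, K):
    """Sample index of the peak of T_ez(n) inside the analysis window."""
    T = [0.0] * (len(levels[0]) - win + 1)
    for sig in levels:
        e, z = short_time(sig, win)
        d_e, d_z = normalized_ror(e, K), normalized_ror(z, K)
        for n in range(len(T)):
            T[n] += 0.5 * d_e[n] * d_z[n]
    top = max(T)
    ties = [n for n, v in enumerate(T) if v >= top - 1e-9]
    centre_frame = (ties[0] + ties[-1]) // 2   # middle of a flat maximum
    return centre_frame + win // 2             # frame start -> frame centre

if __name__ == "__main__":
    # 40 ms window at 10 kHz (400 samples): silence, then a burst-like
    # alternating sequence starting at sample 230 (made-up data).
    burst = [0.0] * 230 + [(-1.0) ** i for i in range(170)]
    print("relocated landmark at sample", relocate([burst, burst], win=10, K=15))
```

In this toy case the pass-1 estimate would be the window centre (sample 200), and the relocated position moves to the burst onset because both the energy and the zero-crossing count rise there at the same time.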
Relocating stop landmarks