Arlindo Veiga, Dirce Celorico, Jorge Proença, Sara Candeias, Fernando Perdigão
Prosodic and Phonetic Features for Speaking Styles Classification and Detection
IberSPEECH 2012 - VII Jornadas en Tecnología del Habla & III Iberian SLTech Workshop
November 21-23 2012, Universidad Autónoma de Madrid, Madrid, SPAIN
Segmentation performance:
[Figure: F1-score versus collar (seconds), for collars of 0.5 s, 1.0 s, 1.5 s and 2.0 s; F1-score axis from 0.3 to 0.8]
Results
Segmentation performance:
[Figure: recall versus collar (seconds), for collars of 0.5 s, 1.0 s, 1.5 s and 2.0 s; vertical axis labeled Accuracy, from 0.5 to 1.0]
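The collar-based scores plotted above can be illustrated with a minimal sketch. It assumes that a detected boundary counts as correct when it lies within the collar of a reference boundary that has not already been matched; the slides do not state the exact matching rule, so the greedy nearest-boundary matching and the names `match_boundaries` and `segmentation_scores` are hypothetical.

```python
def match_boundaries(reference, detected, collar):
    """Count reference boundaries matched by a detected boundary within +/- collar seconds.

    Assumption: one-to-one greedy matching, nearest unused detection first.
    """
    detections = sorted(detected)
    used = [False] * len(detections)
    hits = 0
    for ref_time in sorted(reference):
        best = None
        for j, det_time in enumerate(detections):
            if used[j] or abs(det_time - ref_time) > collar:
                continue
            if best is None or abs(det_time - ref_time) < abs(detections[best] - ref_time):
                best = j
        if best is not None:
            used[best] = True
            hits += 1
    return hits


def segmentation_scores(reference, detected, collar):
    """Return (precision, recall, F1) of detected boundaries for a given collar."""
    hits = match_boundaries(reference, detected, collar)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Illustrative boundary times in seconds (made up, not taken from the slides).
reference_boundaries = [10.0, 20.0, 30.0]
detected_boundaries = [10.3, 21.2, 40.0]
for collar in (0.5, 1.0, 1.5, 2.0):
    print(collar, segmentation_scores(reference_boundaries, detected_boundaries, collar))
```

A wider collar can only add matches, which is consistent with scores being reported over a range of collars from 0.5 s to 2.0 s.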
Results
Automatic detection
Speech / non-speech detection
Type of features    AT       Speech    Non-speech
Phonetic            91.5%    94.9%     62.2%
Prosodic            93.2%    97.0%     61.0%
Combination         93.3%    96.6%     64.9%
Read / spontaneous detection
Type of features    AT       Read      Spontaneous
Phonetic            76.7%    91.9%     38.6%
Prosodic            81.1%    93.0%     51.2%
Combination         83.3%    92.7%     59.6%
“AT”: agreement time, the percentage of frames correctly classified
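As an illustration of the agreement-time measure, the sketch below expands reference and hypothesis segments into frame labels and counts matching frames. The 10 ms frame shift, the segment format `(start, end, label)` and the function names are assumptions made for this example, and the per-class figure is computed here as the share of reference frames of a class that are labeled correctly, which the slides do not define explicitly.

```python
def frame_labels(segments, n_frames, frame_shift=0.01, default="non-speech"):
    """Expand (start_s, end_s, label) segments into one label per frame."""
    labels = [default] * n_frames
    for start, end, label in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        for k in range(first, last):
            labels[k] = label
    return labels


def agreement_time(reference, hypothesis):
    """Percentage of frames whose hypothesis label equals the reference label."""
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return 100.0 * matches / len(reference)


def class_agreement(reference, hypothesis, label):
    """Percentage of reference frames of one class that are labeled correctly."""
    pairs = [(r, h) for r, h in zip(reference, hypothesis) if r == label]
    if not pairs:
        return 0.0
    return 100.0 * sum(r == h for r, h in pairs) / len(pairs)


# Made-up example: 2 s of audio, speech in the first second.
ref = frame_labels([(0.0, 1.0, "speech")], n_frames=200)
hyp = frame_labels([(0.0, 0.8, "speech")], n_frames=200)
print(agreement_time(ref, hyp))
print(class_agreement(ref, hyp, "speech"))
```

Because the measure is frame-weighted, a class that covers most of the audio dominates the overall figure, so the per-class columns are worth reading alongside it.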
Results
Classification only (using given manual segmentation)
Speech / non-speech classifier
Type of features    Acc.     Speech    Non-speech
Phonetic            93.8%    96.7%     82.0%
Prosodic            93.8%    97.5%     81.9%
Combination         94.4%    97.6%     84.0%
Read / spontaneous classifier
Type of features    Acc.     Read      Spontaneous
Phonetic            83.2%    92.8%     55.4%
Prosodic            86.4%    95.0%     61.6%
Combination         87.4%    93.7%     69.5%
“Acc.”: accuracy
Conclusions and future work
Read speech can be differentiated from spontaneous speech with reasonable accuracy.
Good results were obtained with only a few simple measures of the speech signal.
A combination of phonetic and prosodic features provided the best results (both seem to carry important and complementary information).
We have already implemented several important features, such as hesitation detection, aspiration detection using word-spotting techniques, speaker identification using GMMs, and jingle detection based on audio fingerprinting.
We intend to automatically segment all audio genres and speaking styles.
THANK YOU
Appendix – BIC
BIC (Bayesian Information Criterion)
Dissimilarity measure between two consecutive segments, $X_1$ and $X_2$, of a window $X$.
Two hypotheses:
H0: no change of signal characteristics. Model: one Gaussian, $X \sim N(x; \mu_X, \Sigma_X)$
H1: change of characteristics. Two Gaussians, $X_1 \sim N(x; \mu_{X_1}, \Sigma_{X_1})$ and $X_2 \sim N(x; \mu_{X_2}, \Sigma_{X_2})$
$\mu$: mean vector; $\Sigma$: covariance matrix
Maximum likelihood ratio between H0 and H1:
$R(i) = \frac{N_X}{2} \log|\Sigma_X| - \frac{N_{X_1}}{2} \log|\Sigma_{X_1}| - \frac{N_{X_2}}{2} \log|\Sigma_{X_2}|$
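The likelihood ratio between the one-Gaussian and two-Gaussian hypotheses can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes N as the number of frames in each segment, uses full-covariance maximum-likelihood estimates, and covers only the ratio R(i), not any model-complexity penalty. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of (N, d) frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def likelihood_ratio(window, i):
    """R(i) for a candidate change point i: window[:i] is X1, window[i:] is X2."""
    x1, x2 = window[:i], window[i:]
    n, n1, n2 = len(window), len(x1), len(x2)
    return 0.5 * (n * log_det_cov(window)
                  - n1 * log_det_cov(x1)
                  - n2 * log_det_cov(x2))


# Synthetic 2-dimensional features with a change of mean at frame 200.
rng = np.random.default_rng(0)
changed = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                     rng.normal(5.0, 1.0, (200, 2))])
candidates = range(20, 380)  # keep enough frames on each side of the split
best = max(candidates, key=lambda i: likelihood_ratio(changed, i))
print(best, likelihood_ratio(changed, best))
```

The margin of 20 frames on each side is an arbitrary choice that keeps both covariance estimates well conditioned; real feature vectors of higher dimension would need a larger margin.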