Copyright © 2018 IJECCE, All rights reserved
International Journal of Electronics Communication and Computer Engineering
Volume 9, Issue 1, ISSN (Online): 2249–071X
Classification of the Fricative and Occlusive
Consonants According to the Place and the Mode of
Articulation
Soufyane Mounir*, Karim Tahiry and Abdelmajid Farchi
Date of publication (dd/mm/yyyy): 03/03/2018
Abstract – In this article, we study the classification of
occlusive and fricative consonants in Modern Standard
Arabic for three different articulation sites: bilabial, alveolar
(dental) / interdental and velar. By calculating the four
spectral moments after preprocessing the speech signal, we
can classify these consonants according to the place and the
mode of articulation.
Keywords – Fricative, Occlusive, Place of Articulation,
Spectral Moments
I. INTRODUCTION
Several methods can be adopted to improve the speech
recognition rate. Among them is feature extraction, in which
the signal is represented by observation vectors computed
with methods such as linear predictive coding (LPC) or
Mel-frequency cepstral coefficients (MFCC). The feature
extraction phase is a very important factor in the
development of a recognition system [1, 2, 3].
Nishinuma studied French and sought to define protocols
for detecting consonant clusters based on temporal
measurements. He used bisyllabic and trisyllabic words into
which he inserted the target syllables CCV, VCC, CV and
VC, associated with the vowels /i, a, ã/ [4].
Using statistical analysis, he identified five parameters:
voicing, manner of articulation of the first half of the
group, ratio of the vowel duration to the duration of the
consonant segment, duration of the consonant segment, and
position in the word. These rules allowed a correct
classification of 90.13% of consonant groups.
Other researchers have exploited spectral moments to
classify consonants. This has been a popular subject in the
phonetic literature over recent decades [5, 6, 7, 9, 10], in
speech processing and automatic speech recognition [11, 12,
13] and in the clinical phonetics literature [14, 15]. Forrest
et al. sought to classify occlusive consonants using spectral
moment analysis [16, 17, 18]; reliable results were found for
certain categories, such as the place of articulation of
fricatives [19, 20]. McMurray and Jongman used a broad
combination of measures to model fricative perception and
representation [9].
Other researchers have found it difficult to extract
invariants that make it possible to distinguish fricatives
according to the place of articulation [18, 20, 9, 21]. The
acoustic parameters reported for fricatives, namely spectral
moments, F2 onset frequency, locus equations, spectral
slope, location of spectral peaks, static and dynamic
amplitude measurements and noise duration, were measured
from discrete Fourier transforms [22, 23, 24]. These studies
concluded that there is no invariance in the acoustic signal
and that, therefore, the categorization of speech by listeners
requires a massive integration of cues as well as compensation
mechanisms able to handle contextual influences.
Spinu and Lilley were interested in the classification of
fricatives [25]. Working on a corpus of Romanian fricatives,
they made two comparisons. The first compared two acoustic
measurements used to encode the speech signal: spectral
moments and cepstral coefficients. The second compared two
techniques for determining the regions of a segment from
which the measurements are extracted. In one technique, the
phonetic segments were divided into three zones of almost
equal duration; in the other, hidden Markov models (HMM)
were used to decompose each segment into three regions so
as to minimize the variance of the measurements within each
region. Having classified the fricatives according to place of
articulation, voicing, palatalization status and speaker
gender using logistic regression, they found that cepstral
coefficients are more reliable than spectral moments for
classification. They also concluded that the zones identified
by HMM give a higher classification rate than regions of
equal duration.
In our study, we try to classify the occlusive and fricative
consonants of Modern Standard Arabic for three places of
articulation, (bilabial, alveolar or dental, velar) and
(bilabial, interdental, velar) respectively, by means of the
spectral moments: spectral mean (m1), standard deviation
(m2), skewness (m3) and kurtosis (m4). We also compare
these two modes of articulation.
II. CORPUS
Data were collected from fifteen Moroccan men, who
pronounced each CV syllable four times,
where "C" is the consonant and "V" is the vowel. The
consonants concerned are the occlusives (bilabial: /b/ = /ب/,
alveolar or dental: /d/ = /د/, velar: /k/ = /ك/) and the
fricatives (bilabial: /f/ = /ف/, interdental: /ð/ = /ذ/, velar:
/ɣ/ = /غ/). For the vowels, we used the short vowels
/a, i, u/. The recordings were made in an isolated room with
the "Praat" software, using a microphone (Labtec AM-232,
sensitivity: 35 dB, impedance: 2.2 kΩ, bandwidth: 20 to
8500 Hz) placed at a distance of 20 cm and connected to a PC.
The sound was digitized directly on the PC at a sampling
frequency of 22050 Hz. We used the same software to segment
the CV syllables.
III. SPEECH SIGNAL PROCESSING
A. Preprocessing
The preprocessing of the voice signal for automatic
speech recognition is a compression of the data that
facilitates real-time estimation. The estimation itself can
be made in the time domain or on the result of a short-term
analysis produced by the preprocessing. It is unnecessary to
process the whole signal (speech / non-speech); we therefore
need to isolate the vocal activity by combining two
techniques: the energy level and the zero-crossing rate.
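The combination of energy level and zero crossings can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the frame length of 256 samples, the hop of 128 samples and both thresholds (`energy_ratio`, `zcr_min`) are assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Cut a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    return np.stack([x[s:s + frame_len] for s in starts])

def detect_speech(x, frame_len=256, hop=128, energy_ratio=0.1, zcr_min=0.3):
    """Return a boolean mask with True for frames kept as vocal activity.

    Thresholds are hypothetical: a frame is kept when its short-term
    energy exceeds a fraction of the loudest frame, or when it has a high
    zero-crossing rate together with a small but non-negligible energy
    (typical of weak noisy sounds such as fricatives).
    """
    frames = frame_signal(x, frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    # Fraction of adjacent sample pairs whose sign differs.
    sign_changes = np.abs(np.diff(np.sign(frames), axis=1)) > 0
    zcr = np.mean(sign_changes, axis=1)
    e_thr = energy_ratio * energy.max() if energy.size else 0.0
    loud = energy > e_thr
    noisy = (zcr > zcr_min) & (energy > 0.1 * e_thr)
    return loud | noisy
```

In practice the thresholds would have to be tuned on the recordings; the values above only serve to make the example run.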
B. Preemphasis
The spectrogram shows a decrease in amplitude, which we
compensate by pre-emphasizing the speech signal x(n), that
is, by computing x′(n) = x(n) − αx(n − 1). This filter
amplifies the high frequencies: the larger α is, the more the
magnitude is raised at high frequencies. In our experiment,
we chose α = 0.95, obtained from the following formula,
where $F_s$ is the sampling frequency:
$$\alpha = e^{-\frac{2\pi \cdot 100}{F_s}}$$
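A minimal sketch of the pre-emphasis filter x′(n) = x(n) − αx(n − 1) is given below. The function names are hypothetical, and leaving the first sample unchanged is an assumption, since the text does not say how x(−1) is handled. Note that with a sampling frequency of 22050 Hz the exponential formula evaluates to roughly 0.97, so the value α = 0.95 reported in the text is passed explicitly here rather than recomputed.

```python
import numpy as np

def preemphasis_coefficient(fs, fc=100.0):
    """Evaluate alpha = exp(-2*pi*fc/fs) with fc = 100 Hz as in the text."""
    return float(np.exp(-2.0 * np.pi * fc / fs))

def preemphasize(x, alpha=0.95):
    """Apply x'(n) = x(n) - alpha * x(n-1).

    Assumption: the first sample is copied unchanged (x(-1) treated as 0).
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return x
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y
```

A constant (zero-frequency) signal is strongly attenuated by this filter, while rapid sample-to-sample changes are preserved, which is the high-frequency emphasis described above.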
C. Windowing and FFT
Before extracting the parameters of the speech signal, it
is essential to break it down into segments because it is
non-stationary. Multiplying each segment by a Hamming
window attenuates the discontinuities at its ends. This
window is given by the following equation:
$$x_1(n) = x_2(n)\left(0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)\right), \quad n = 0, 1, \ldots, N-1 \tag{1}$$
where N is the number of samples, $x_2(n)$ is the segment and
$x_1(n)$ is the windowed segment.
The FFT step transforms the speech signal into the
frequency domain [2]:
$$X_n = \sum_{k=0}^{N-1} x_k \, e^{-\frac{2\pi j k n}{N}}, \quad n = 0, 1, \ldots, N-1 \tag{2}$$
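The windowing and FFT steps can be illustrated with the short sketch below. It assumes the standard Hamming window for n = 0, …, N − 1 (the same values as NumPy's `np.hamming`) and a frame of 256 samples; the function names are hypothetical. `np.fft.rfft` returns only the bins 0 to N/2, which is sufficient for a real signal.

```python
import numpy as np

def hamming(N):
    """Standard Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (N - 1))

def windowed_power_spectrum(frame):
    """Window one frame and return its power spectrum on bins 0..N/2."""
    frame = np.asarray(frame, dtype=float)
    N = len(frame)
    spectrum = np.fft.rfft(frame * hamming(N))  # one-sided DFT of a real frame
    return np.abs(spectrum) ** 2

# Usage: a 256-sample cosine whose frequency falls exactly on bin 32.
n = np.arange(256)
power = windowed_power_spectrum(np.cos(2.0 * np.pi * 32 * n / 256))
peak_bin = int(np.argmax(power))
```

The window tapers the frame towards its ends, so the energy of the test tone stays concentrated around its own bin instead of leaking across the spectrum.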
D. Calculating Spectral Moments
In our work, we are interested in the method of spectral
moments, computed after transforming the signal into the
frequency domain. The four spectral moments concerned are:
spectral mean (m1), standard deviation (m2), skewness (m3)
and kurtosis (m4). They are respectively given by:
$$m_1 = \sum_{i=0}^{n/2} f_i \, \frac{P(f_i)}{\sum_{k=0}^{n/2} P(f_k)}, \qquad m_2 = \sqrt{\sum_{i=0}^{n/2} (f_i - m_1)^2 \, \frac{P(f_i)}{\sum_{k=0}^{n/2} P(f_k)}},$$
$$m_3 = \sum_{i=0}^{n/2} \left(\frac{f_i - m_1}{m_2}\right)^3 \frac{P(f_i)}{\sum_{k=0}^{n/2} P(f_k)}, \qquad m_4 = -3 + \sum_{i=0}^{n/2} \left(\frac{f_i - m_1}{m_2}\right)^4 \frac{P(f_i)}{\sum_{k=0}^{n/2} P(f_k)},$$
where $P(f_i)$ is the power of the spectrum, $f_i = \frac{2 f_{nyq}\, i}{n}$,
$i = 0, 1, \ldots, n/2$, $n = 256$ and $f_{nyq}$ is the Nyquist
frequency [26].
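As an illustration, the four moments can be computed from a one-sided power spectrum as sketched below. This assumes the usual reading in which the normalized power P(f_i) / ΣP acts as a probability weight over the frequencies f_i = 2·f_nyq·i/n, with m1 the power-weighted mean frequency; the function name `spectral_moments` is hypothetical.

```python
import numpy as np

def spectral_moments(power, fs):
    """Return (m1, m2, m3, m4) for a one-sided power spectrum.

    power: values P(f_i) for i = 0..n/2 (length n/2 + 1).
    fs:    sampling frequency in Hz, so that f_nyq = fs / 2.
    """
    power = np.asarray(power, dtype=float)
    half = len(power) - 1                          # n/2
    freqs = (fs / 2.0) * np.arange(half + 1) / half  # f_i = 2*f_nyq*i/n
    weights = power / power.sum()                  # normalized power
    m1 = np.sum(freqs * weights)                   # spectral mean
    m2 = np.sqrt(np.sum((freqs - m1) ** 2 * weights))  # standard deviation
    z = (freqs - m1) / m2
    m3 = np.sum(z ** 3 * weights)                  # skewness
    m4 = np.sum(z ** 4 * weights) - 3.0            # excess kurtosis
    return float(m1), float(m2), float(m3), float(m4)

# Usage with n = 256 (129 bins) and the sampling rate of the corpus.
m1, m2, m3, m4 = spectral_moments(np.ones(129), fs=22050)
```

For a perfectly flat spectrum, as in the usage line, the mean sits at half the Nyquist frequency, the skewness is zero and the excess kurtosis is negative, which gives a convenient sanity check for an implementation.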
IV. RESULTS AND DISCUSSIONS
The value of "m1" for the consonant /k/ is greater than for
/b/ and /d/. Among the fricatives, the spectral mean of /f/
has the largest value, followed by /ð/ and then /ɣ/ (Figures
1 and 2). From these results, we find that the spectral mean
makes it possible to identify the velar among the occlusive
consonants, whereas it separates all the places of
articulation of the fricative consonants studied. We also
find that, for occlusive consonants, the spectral mean is
greater when the consonant is produced at the level of the
posterior cavity than when it is produced at the level of the
anterior oral cavity; this reflects the size of the cavity
involved. These results agree with those of Nittrouer and
Stevens, who stated that the spectral mean depends on the
oral cavity [27, 28]. For the standard deviation, the values
of the alveolar occlusive consonants are larger than those of
the velar ones, which are in turn larger than those of the
bilabial ones. Among the fricative consonants, the
interdental spectra are likewise the most dispersed,
followed by the bilabial and then the velar.
The third spectral moment provides information on the
asymmetry of the spectrum compared to the normal
distribution. All occlusive consonants have positive values,
which shows that their spectrum is offset to the left, that
is, the acoustic energy is concentrated in the low
frequencies. More precisely, the velar consonants are the
closest to the axis of symmetry of the normal distribution,
unlike the bilabial consonants. For the fricative
consonants, on the other hand, we find two opposite signs:
the spectrum of the interdental consonants is shifted to the
right of the axis of symmetry while remaining close to it,
whereas the two other places of articulation have spectra
whose energy is concentrated in the low frequencies, the
spectrum of the bilabial being the farthest from this axis.
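The link between the sign of the third moment and the position of the energy can be checked on synthetic spectra. The sketch below is purely illustrative: the two exponential spectra and the 1500 Hz decay constant are invented for the example and are not measurements from the corpus.

```python
import numpy as np

def spectral_skewness(power, freqs):
    """Skewness of a spectrum, using normalized power as the weight."""
    weights = np.asarray(power, dtype=float)
    weights = weights / weights.sum()
    mean = np.sum(freqs * weights)
    std = np.sqrt(np.sum((freqs - mean) ** 2 * weights))
    return float(np.sum(((freqs - mean) / std) ** 3 * weights))

freqs = np.linspace(0.0, 11025.0, 129)   # 0 Hz up to the Nyquist frequency
low_heavy = np.exp(-freqs / 1500.0)      # energy concentrated at low frequencies
high_heavy = low_heavy[::-1]             # mirror image: energy at high frequencies

skew_low = spectral_skewness(low_heavy, freqs)
skew_high = spectral_skewness(high_heavy, freqs)
```

Here `skew_low` is positive and `skew_high` is negative with the same magnitude, matching the reading used in the text: a positive value indicates energy concentrated in the low frequencies, a negative value energy shifted towards the high frequencies.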
The results obtained from the last moment for the
occlusive consonants show that the spectral distribution of
the bilabial is the narrowest, since it corresponds to the
largest value of the kurtosis, whereas that of the velar is
the most flattened. For fricatives, the spectrum of the velar
is also the most flattened compared with the other places of
articulation. We can say that the moments "m1", "m2" and
"m3" make it possible to classify occlusive and fricative
consonants according to the place of articulation.
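The reading of the fourth moment (a large value for a narrow, peaked spectrum and a low value for a flattened one) can likewise be illustrated on synthetic data. The two spectra below, a narrow peak at 3000 Hz on a weak floor and a completely flat spectrum, are invented for the example and do not come from the recordings.

```python
import numpy as np

def spectral_kurtosis(power, freqs):
    """Excess kurtosis of a spectrum, using normalized power as the weight."""
    weights = np.asarray(power, dtype=float)
    weights = weights / weights.sum()
    mean = np.sum(freqs * weights)
    std = np.sqrt(np.sum((freqs - mean) ** 2 * weights))
    return float(np.sum(((freqs - mean) / std) ** 4 * weights) - 3.0)

freqs = np.linspace(0.0, 11025.0, 129)
# Narrow peak at 3000 Hz (300 Hz wide) sitting on a weak constant floor.
peaked = np.exp(-0.5 * ((freqs - 3000.0) / 300.0) ** 2) + 0.01
flat = np.ones_like(freqs)               # energy spread evenly over all bins

kurt_peaked = spectral_kurtosis(peaked, freqs)
kurt_flat = spectral_kurtosis(flat, freqs)
```

The peaked spectrum yields a positive excess kurtosis and the flat one a negative value, which is the ordering the discussion relies on when it calls a spectrum narrow or flattened.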
Fig. 1. The four spectral moments of the occlusive consonants for three places in context CV
Fig. 2. The four spectral moments of the fricative consonants for three places in context CV
Fig. 3. Comparison between the values of the four spectral moments of the occlusive and fricative consonants for three
different places of articulation.
We notice that, for the spectral mean, the bilabial and
interdental fricatives have larger values than the bilabial
and alveolar (dental) occlusive consonants respectively
(Figure 3). On the other hand, the velar fricatives have a
lower spectral mean than the velar occlusive consonants. For
the second moment, the spectral distributions of the bilabial
and dental occlusive consonants are less dispersed than those
of the bilabial and interdental fricatives respectively, but
no difference is observed between the velar occlusive and the
velar fricative.
Concerning the third moment, the spectral distributions
of the bilabial and velar occlusives and fricatives have
their energy concentrated in the low frequencies, but the
former are more offset than the latter. The main difference
concerns the dentals and interdentals: the interdentals have
a right-hand asymmetry (negative value), while the energy of
the spectral distributions of the dentals is concentrated in
the low frequencies. This may be due to the small difference
in the place of articulation.
Notable results for the fourth moment show that the
bilabial, interdental and velar fricative spectra are narrower
than those of the bilabial, alveolar and velar occlusives
respectively.
We can say that "m1" makes it possible to differentiate
between the occlusive consonants and the fricative
consonants according to the place of articulation. The first
moment is lower for the bilabial and alveolar occlusives
than for the bilabial and interdental fricatives. On the other
hand, it is higher for the velar occlusive than for the velar
fricative. The moment "m4" makes it possible to distinguish
the occlusive consonants from the fricative consonants
whatever the place of articulation studied, since its values
for the occlusives are, for all these places, larger than
those for the fricatives.
V. CONCLUSION
This study of Modern Standard Arabic, in which we
treated two different modes of articulation (occlusive and
fricative) for three different sites of articulation
(bilabial, dental / interdental, velar), revealed several
interesting results. The moments "m1", "m3" and "m4" can
classify consonants according to the place of articulation
within a single mode of articulation. The moment "m4" makes
it possible to distinguish between the occlusives and the
fricatives irrespective of the place of articulation.
REFERENCES
[1] Deroo, O., 1998, « Modèles dépendant du contexte et méthodes de fusion de données appliquées à la reconnaissance de la parole par modèles hybrides HMM/MLP », Faculté Polytechnique de Mons.
[2] Bergounioux, M., 2010, « Mathématiques pour le traitement du signal : cours et exercices corrigés », Dunod, Paris, pp. 270-279.
[3] Bellanger, M., 1987, « Traitement numérique du signal : théorie et pratique », Masson, p. 363.
[4] Nishinuma, Y., Duez, D., & Paboudjian, C., 1991, « Automatic classification of consonant clusters in French », Speech Communication, Vol. 10, pp. 395-403.
[5] Hoelterhoff, J., & Reetz, H., 2007, « Acoustic cues discriminating German obstruents in place and manner of articulation », Journal of the Acoustical Society of America, 121, 1142–1156.
[6] Jesus, L. M., & Jackson, P. J., 2008, « Frication and voicing classification », in A. Teixeira, V. L. S. de Lima, L. C. de Oliveira, & P. Quaresma (Eds.), Computational Processing of the Portuguese Language: The 8th International Conference on Computational Processing of Portuguese (PROPOR) (pp. 11–20). Berlin, Heidelberg: Springer-Verlag.
[7] Jesus, L. M., & Shadle, C. H., 2002, « A parametric study of the spectral characteristics of European Portuguese fricatives », Journal of Phonetics, 30(3), 437–464.
[8] Koenig, L. L., Shadle, C. H., Preston, J. L., & Mooshammer, C. R., 2013, « Toward improved spectral measures of /s/: Results from adolescents », Journal of Speech, Language, and Hearing Research, 56, 1175–1189.
[9] McMurray, B., & Jongman, A., 2011, « What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations », Psychological Review, 118, 219–246.
[10] Shadle, C. H., & Mair, S. J., 1996, « Quantifying spectral characteristics of fricatives », in Proceedings of the International Conference on Spoken Language Processing (ICSLP 96) (pp. 1517–1520). Philadelphia.
[11] Childers, D. G., Wu, K., Bae, K. S., & Hicks, D. M., 1988, « Automatic recognition of gender by voice », paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88).
[12] Farooq, O., & Datta, S., 2001, « Mel filter-like admissible wavelet packet structure for speech recognition », IEEE Signal Processing Letters, 8(7), 196–198.
[13] Halberstadt, A. K., & Glass, J. R., 1997, « Heterogeneous acoustic measurements for phonetic classification », EUROSPEECH.
[14] Kong, Y.-Y., Mullangi, A., & Kokkinakis, K., 2014, « Classification of fricative consonants for speech enhancement in hearing devices », PLoS ONE, 9(4), e95001, http://dx.doi.org/10.1371/journal.pone.0095001
[15] Todd, A., Edwards, J., & Litovsky, R., 2011, « Production of contrast between sibilant fricatives by children with cochlear implants », Journal of the Acoustical Society of America, 130, 3969–3979.
[16] Forrest, K., Weismer, G., Elbert, M., & Dinnsen, D. A., 1994, « Spectral-analysis of target-appropriate T and K produced by phonologically disordered and normally articulating children », Clinical Linguistics and Phonetics, 8, 267–281.
[17] Forrest, K., Weismer, G., Hodge, M., Dinnsen, D. A., & Elbert, M., 1990, « Statistical-analysis of word-initial K and T produced by normal and phonologically disordered children », Clinical Linguistics and Phonetics, 4, 327–340.
[18] Forrest, K., Weismer, G., Milenkovic, P., & Dougall, R., 1988, « Statistical analysis of word-initial voiceless obstruents: Preliminary data », Journal of the Acoustical Society of America, 84, 115–124.
[19] Flipsen, P., Shriberg, L., Weismer, G., Karlsson, H., & McSweeny, J., 1999, « Acoustic characteristics of /s/ in adolescents », Journal of Speech, Language, and Hearing Research, 42, 663–677.
[20] Jongman, A., Wayland, R., & Wong, S., 2000, « Acoustic characteristics of English fricatives », Journal of the Acoustical Society of America, 108(3 Pt 1), 1252–1263.
[21] Tomiak, G., 1990, « An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents » (Doctoral dissertation), State University of New York, Buffalo.
[22] Koenig, L. L., Shadle, C. H., Preston, J. L., & Mooshammer, C. R., 2013, « Toward improved spectral measures of /s/: Results from adolescents », Journal of Speech, Language, and Hearing Research, 56, 1175–1189.
[23] Lousada, M., Jesus, L., & Pape, D., 2012, « Estimation of stops' spectral place cues using multitaper techniques », DELTA, 28(1), 1–26.
[24] Zygis, M., Pape, D., & Jesus, L., 2012, « (Non) retroflex Slavic affricates and their motivation: Evidence from Czech and Polish », Journal of the International Phonetic Association, 42(3), 281–329.
[25] Spinu, L., & Lilley, J., 2016, « A comparison of cepstral coefficients and spectral moments in the classification of Romanian fricatives », Journal of Phonetics, Vol. 57, pp. 44-58.
[26] Feng, Y., Hao, G. J., Xue, S. A., & Max, L., 2011, « Detecting anticipatory effects in speech articulation by means of spectral coefficient analyses », Speech Communication, Vol. 53, pp. 842–854.
[27] Nittrouer, S., 1995, « Children learn separate aspects of speech production at different rates: evidence from spectral moments », J. Acoust. Soc. Am., Vol. 97, pp. 520–530.
[28] Stevens, K. N., 1998, « Acoustic Phonetics », The MIT Press, Cambridge, MA.
AUTHORS' PROFILES
Pr. Soufyane Mounir
Member of the research team "Signals and Systems" in the Laboratory of Mechanical Engineering, Industrial Management and Innovation; Professor at the National School of Applied Sciences, University Hassan 1st, Khouribga, Morocco.
PhD. Karim Tahiry
Member of the research team "Signals and Systems" in the Laboratory of Mechanical Engineering, Industrial Management and Innovation; PhD of the Science and Technical Faculty, University Hassan 1st, Settat, Morocco.
Pr. Abdelmajid Farchi
Head of the research team "Signals and Systems" in the Laboratory of Mechanical Engineering, Industrial Management and Innovation; Professor at the Science and Technical Faculty, University Hassan 1st, Settat, Morocco.