
Copyright © 2018 IJECCE, All rights reserved


International Journal of Electronics Communication and Computer Engineering

Volume 9, Issue 1, ISSN (Online): 2249–071X

Classification of the Fricative and Occlusive Consonants According to the Place and the Mode of Articulation

Soufyane Mounir*, Karim Tahiry and Abdelmajid Farchi

Date of publication (dd/mm/yyyy): 03/03/2018

Abstract – In this article, we study the classification of occlusive and fricative consonants in Modern Standard Arabic at three places of articulation: bilabial, alveolar (dental) / interdental, and velar. By computing the four spectral moments after pretreatment of the speech signal, we classify these consonants according to their place and manner of articulation.

Keywords – Fricative, Occlusive, Place of Articulation, Spectral Moments

I. INTRODUCTION

Several methods can be adopted to improve the speech recognition rate. One of them is feature extraction, in which the speech signal is represented by observation vectors computed with analysis methods such as linear predictive coding (LPC) or Mel-frequency cepstral coefficients (MFCC). The feature extraction phase is a key factor in the development of a recognition system [1, 2, 3].

Nishinuma studied French, with the aim of defining protocols for detecting consonant clusters on the basis of duration. He used bisyllabic and trisyllabic words into which he inserted the target syllables CCV, VCC, CV and VC, combined with the vowels /i, a, ã/ [4].


Using statistical analysis, he identified five parameters: voicing, the manner of articulation of the first element of the cluster, the ratio of vowel duration to consonant-segment duration, the duration of the consonant segment, and the position in the word. Rules based on these parameters correctly classified 90.13% of the consonant clusters.

Other researchers have exploited spectral moments to classify consonants. Spectral moments have been a popular topic over recent decades in the phonetic literature [5, 6, 7, 9, 10], in speech processing and automatic speech recognition [11, 12, 13], and in clinical phonetics [14, 15]. Forrest sought to classify occlusive consonants using spectral moment analysis [16, 17, 18] and obtained reliable results for certain categories, such as the place of articulation of fricatives [19, 20]. McMurray and Jongman used a broad combination of measures to model fricative perception and representation [9].

Other researchers have found it difficult to extract invariant cues that distinguish fricatives according to their place of articulation [18, 20, 9, 21]. The acoustic measurements reported for fricatives, namely spectral moments, F2 onset frequency, locus equations, spectral slope, location of spectral peaks, static and dynamic amplitude measures, and noise duration, were based on discrete Fourier transforms [22, 23, 24]. They concluded that there is no invariance in the acoustic signal and that, consequently, speech categorization by listeners requires a massive integration of cues as well as compensation mechanisms capable of handling contextual influences.

Spinu and Lilley studied the classification of fricatives on a corpus of Romanian fricatives, pursuing two aims [25]. The first was to compare two acoustic measures for coding speech: spectral moments and cepstral coefficients. The second was to extract measurements within regions of each segment, after comparing two techniques for determining those regions. In the first technique, each phonetic segment was divided into three zones of roughly equal duration; in the second, hidden Markov models (HMMs) were used to split each segment into three regions chosen so as to minimize the variance of the measurements within each region. Classifying the fricatives by place of articulation, voicing, palatalization status and speaker sex with logistic regression, they found that cepstral coefficients are more reliable than spectral moments for classification. They also concluded that the regions identified by HMMs yield a higher classification rate than regions of equal duration.

In our study, we classify the occlusive and fricative consonants of Modern Standard Arabic at three places of articulation, (bilabial, alveolar or dental, velar) and (bilabial, interdental, velar) respectively, by means of four spectral moments: spectral mean (m1), standard deviation (m2), skewness (m3) and kurtosis (m4). We also compare these two manners of articulation.

II. CORPUS

Acoustic data were collected from fifteen Moroccan male speakers, each of whom pronounced the CV syllables four times, where C is the consonant and V the vowel. The consonants studied are, for the occlusives, bilabial /b/ = /ب/, alveolar or dental /d/ = /د/ and velar /k/ = /ك/, and, for the fricatives, bilabial /f/ = /ف/, interdental /ð/ = /ذ/ and velar /ɣ/ = /غ/. The vowels are the short vowels /a, i, u/. The recordings were made in an isolated room with the Praat software, using a microphone (Labtec AM-232; sensitivity: 35 dB; impedance: 2.2 kΩ; bandwidth: 20 to 8500 Hz) placed at 20 cm and connected to a PC. The sound was digitized directly on the PC at a sampling frequency of 22050 Hz. The same software was used to segment the CV syllables.
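As a quick check on the size of the corpus, the sketch below enumerates the recorded tokens. It assumes that "four times" means every speaker repeated every CV syllable four times; the speaker labels S01 to S15 are hypothetical.

```python
from itertools import product

# Corpus design from the Corpus section.
# Assumption: every speaker pronounced every CV syllable four times.
occlusives = ["b", "d", "k"]
fricatives = ["f", "ð", "ɣ"]
vowels = ["a", "i", "u"]
speakers = [f"S{i:02d}" for i in range(1, 16)]  # hypothetical IDs S01..S15
repetitions = range(1, 5)

consonants = occlusives + fricatives
syllables = [c + v for c, v in product(consonants, vowels)]
tokens = [(spk, syl, rep) for spk, syl, rep in product(speakers, syllables, repetitions)]

print(len(syllables))  # 18 CV syllables
print(len(tokens))     # 15 speakers x 18 syllables x 4 repetitions = 1080
```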

III. SPEECH SIGNAL PROCESSING

A. Pretreatment

In automatic speech recognition, pretreatment of the speech signal compresses the data to make real-time estimation possible. The estimation itself can be carried out in the time domain or on the result of a short-term analysis performed during pretreatment. Since there is no point in processing the whole signal (speech and non-speech), we isolate the voice activity using a combination of two techniques: short-term energy and zero-crossing rate.
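The energy and zero-crossing combination described above can be sketched as follows with NumPy. The frame length (256 samples), hop (128 samples) and the three thresholds are hypothetical values chosen for illustration, since the paper does not report them. A frame is kept when its normalized short-term energy is high (voiced speech), or when its energy is moderate and its zero-crossing rate is high, which is the usual signature of unvoiced fricatives.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def detect_voice_activity(x, frame_len=256, hop=128,
                          high_energy=0.1, low_energy=0.01, zcr_thresh=0.3):
    """Return a boolean mask, one entry per frame (thresholds are hypothetical)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    energy = energy / max(energy.max(), 1e-12)   # normalize to [0, 1]
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    voiced = energy > high_energy
    unvoiced = (energy > low_energy) & (zcr > zcr_thresh)
    return voiced | unvoiced

# Synthetic test signal: faint noise, a 200 Hz tone, faint noise.
fs = 22050
rng = np.random.default_rng(0)
silence = 1e-4 * rng.standard_normal(4096)
tone = 0.5 * np.sin(2 * np.pi * 200 * np.arange(4096) / fs)
x = np.concatenate([silence, tone, silence])
mask = detect_voice_activity(x)
```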

B. Pre-emphasis

The amplitude of the speech spectrum decreases toward high frequencies, so the speech signal x(n) is pre-emphasized by computing x′(n) = x(n) − αx(n − 1). This first-order filter amplifies the high frequencies: the larger α is, the more the high frequencies are boosted. In our experiments, we chose α = 0.95, obtained from the following formula, where $F_s$ is the sampling frequency:

$$\alpha = e^{-2\pi \cdot 100 / F_s}$$
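A minimal NumPy sketch of the pre-emphasis filter and of the α formula follows. The function names are hypothetical, and x(−1) is taken as 0. Evaluating the quoted formula at the corpus sampling rate of 22050 Hz gives α ≈ 0.972 rather than 0.95; the sketch keeps 0.95 as the default because that is the value reported. The magnitude response |1 − αe^{−j2πf/F_s}| goes from 1 − α at 0 Hz to 1 + α at the Nyquist frequency, which is why a larger α gives a stronger high-frequency boost.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """x'(n) = x(n) - alpha * x(n - 1), with x(-1) taken as 0 (x must be non-empty)."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def alpha_from_fs(fs, f_cut=100.0):
    """alpha = exp(-2 * pi * f_cut / fs), the formula quoted in the text with f_cut = 100 Hz."""
    return np.exp(-2 * np.pi * f_cut / fs)

def pre_emphasis_gain(f, fs, alpha=0.95):
    """Magnitude response |1 - alpha * exp(-j 2 pi f / fs)| of the filter."""
    return np.abs(1 - alpha * np.exp(-2j * np.pi * np.asarray(f) / fs))

print(round(alpha_from_fs(22050), 4))  # 0.9719
```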

C. Windowing and FFT

Because the speech signal is non-stationary, it must be divided into short segments before its parameters are extracted. Multiplying each segment by a Hamming window attenuates the discontinuities at its ends. The windowing is given by:

$$x_1(n) = x_2(n) \cdot \left(0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)\right), \qquad 0 \le n \le N-1 \qquad (1)$$

where $x_2(n)$ is the segment, $x_1(n)$ the windowed segment, and $N$ the number of samples in the segment.
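The segmentation and Hamming windowing of Eq. (1) can be sketched as follows. The frame length of 256 samples and the hop of 128 samples (50% overlap) are assumptions, as the paper does not state them; the signal must be at least one frame long. The hand-written window is checked against NumPy's `numpy.hamming`, which uses the same formula.

```python
import numpy as np

def hamming(N):
    """w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), n = 0..N-1, as in Eq. (1)."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def windowed_frames(x, frame_len=256, hop=128):
    """Cut x into overlapping frames and multiply each frame by a Hamming window.

    frame_len and hop are assumed values; len(x) must be >= frame_len.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    frames = np.stack([x[s:s + frame_len] for s in starts])
    return frames * hamming(frame_len)

fs = 22050
x = np.random.default_rng(0).standard_normal(fs // 10)  # 0.1 s of noise as a stand-in signal
frames = windowed_frames(x)
print(frames.shape)  # (16, 256)
```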

The FFT step transforms the speech signal into the frequency domain [2]:

$$X_n = \sum_{k=0}^{N-1} x_k \, e^{-2\pi j k n / N}, \qquad n = 0, 1, \ldots, N-1 \qquad (2)$$
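The FFT computes Eq. (2) in O(N log N) operations. The direct O(N²) evaluation below is only a sketch to check the definition against `numpy.fft.fft`. The second function returns the one-sided power spectrum over bins 0 to n/2 with `numpy.fft.rfft`; reading the "power of the spectrum" P(f_i) used in the next subsection as |X_i|² is our assumption.

```python
import numpy as np

def dft(x):
    """Direct evaluation of X_n = sum_k x_k exp(-2 pi j k n / N), Eq. (2)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    k = np.arange(N)
    n = k[:, None]                       # row index n, column index k
    return np.exp(-2j * np.pi * n * k / N) @ x

def power_spectrum(frame, n_fft=256):
    """One-sided power spectrum |X_i|^2 for i = 0..n_fft/2 (assumed definition of P)."""
    X = np.fft.rfft(frame, n=n_fft)
    return np.abs(X) ** 2

x = np.random.default_rng(1).standard_normal(64)
print(np.allclose(dft(x), np.fft.fft(x)))  # True
```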

D. Calculating Spectral Moments

In our work, we use the spectral moments method after transforming the signal into the frequency domain. The four spectral moments considered are the spectral mean (m1), standard deviation (m2), skewness (m3) and kurtosis (m4).

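The spectral mean, standard deviation, skewness and kurtosis are given, respectively, by:

$$m_1 = \frac{\sum_{i=0}^{n/2} f_i \, P(f_i)}{\sum_{i=0}^{n/2} P(f_i)}, \qquad m_2 = \sqrt{\frac{\sum_{i=0}^{n/2} (f_i - m_1)^2 \, P(f_i)}{\sum_{i=0}^{n/2} P(f_i)}},$$

$$m_3 = \frac{\sum_{i=0}^{n/2} \left(\frac{f_i - m_1}{m_2}\right)^3 P(f_i)}{\sum_{i=0}^{n/2} P(f_i)}, \qquad m_4 = -3 + \frac{\sum_{i=0}^{n/2} \left(\frac{f_i - m_1}{m_2}\right)^4 P(f_i)}{\sum_{i=0}^{n/2} P(f_i)},$$

where $P(f_i)$ is the power spectrum, $f_i = \frac{2 f_{nyq} \, i}{n}$ for $i = 0, 1, \ldots, n/2$, $n = 256$, and $f_{nyq}$ is the Nyquist frequency [26].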
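The four moments can be computed from a one-sided power spectrum as follows. This is a sketch assuming NumPy and that P(f_i) is supplied for the n/2 + 1 bins i = 0, ..., n/2 (129 bins for n = 256); the function name is hypothetical. Dividing by ΣP(f_i) turns the spectrum into a distribution over frequency, so m1 to m4 are its mean, standard deviation, skewness and excess kurtosis.

```python
import numpy as np

def spectral_moments(power, f_nyq):
    """Return (m1, m2, m3, m4) of a one-sided power spectrum.

    power: P(f_i) for i = 0..n/2, so len(power) == n/2 + 1.
    f_nyq: Nyquist frequency in Hz.
    """
    power = np.asarray(power, dtype=float)
    n = 2 * (len(power) - 1)
    i = np.arange(len(power))
    f = 2.0 * f_nyq * i / n                  # f_i = 2 f_nyq i / n
    w = power / power.sum()                  # P(f_i) / sum of P
    m1 = np.sum(f * w)                       # spectral mean
    m2 = np.sqrt(np.sum((f - m1) ** 2 * w))  # standard deviation
    z = (f - m1) / m2
    m3 = np.sum(z ** 3 * w)                  # skewness
    m4 = np.sum(z ** 4 * w) - 3.0            # excess kurtosis
    return m1, m2, m3, m4

fs = 22050
m1, m2, m3, m4 = spectral_moments(np.ones(129), fs / 2)
print(m1)  # about 5512.5 Hz: a flat spectrum is centered at half the Nyquist frequency
```

For a flat spectrum, m3 is zero because the distribution is symmetric, and m4 is close to −1.2, the excess kurtosis of a uniform distribution.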

IV. RESULTS AND DISCUSSIONS

The spectral mean (m1) of /k/ is greater than that of /b/ and /d/, while among the fricatives /f/ has the largest spectral mean, followed by /ð/ and then /ɣ/ (Figures 1 and 2). The spectral mean therefore separates the velar occlusive from the other two occlusives, whereas it separates all three places of articulation of the fricatives studied. We also find that, for the occlusives, a consonant produced in the posterior part of the oral cavity has a higher spectral mean than one produced in the anterior part, which points to the role of the size of the cavity. These results agree with Nittrouer and Stevens, who reported that the spectral mean depends on the oral cavity [27, 28]. For the standard deviation (m2), the alveolar occlusive has the largest values, followed by the velar and then the bilabial. Among the fricatives, the interdental spectra are the most dispersed, followed by the bilabial and then the velar.

The third spectral moment (skewness) describes the asymmetry of the spectrum relative to a normal distribution. All occlusive consonants have positive skewness, meaning that their spectra are tilted to the left, with the acoustic energy concentrated in the low frequencies. More precisely, the velar occlusive is the closest to the axis of symmetry of the normal distribution, whereas the bilabial is the furthest from it. Among the fricatives, by contrast, the skewness takes both signs: the interdental spectrum is shifted to the right of the axis of symmetry (negative skewness) but remains close to it, whereas the other two places of articulation have spectra whose maximum energy lies in the low frequencies, the bilabial spectrum being the furthest from the axis.

The results for the last moment (kurtosis) show that, among the occlusives, the bilabial has the most peaked spectral distribution, since it has the largest kurtosis, whereas the velar has the flattest. Among the fricatives, the velar spectrum is likewise the flattest compared with the other places of articulation. We can therefore say that the moments m1, m2 and m3 make it possible to classify the occlusive consonants and the fricative consonants according to the place of articulation.
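The reading of m3 and m4 used in this section (positive skewness means energy concentrated in the low frequencies; higher kurtosis means a more peaked spectrum) can be checked on synthetic spectra. The shapes below are invented test spectra, not data from this study, evaluated on the frequency grid f_i for n = 256 and an assumed Nyquist frequency of 11025 Hz.

```python
import numpy as np

def moments(P, f):
    """Spectral mean, standard deviation, skewness and excess kurtosis."""
    w = P / P.sum()
    m1 = np.sum(f * w)
    m2 = np.sqrt(np.sum((f - m1) ** 2 * w))
    z = (f - m1) / m2
    return m1, m2, np.sum(z ** 3 * w), np.sum(z ** 4 * w) - 3.0

f = np.linspace(0, 11025, 129)                 # f_i = 2 f_nyq i / n for n = 256
low_heavy = np.exp(-f / 1500.0)                # energy concentrated at low frequencies
high_heavy = low_heavy[::-1]                   # mirror image: energy at high frequencies
peaked = np.exp(-np.abs(f - 5000.0) / 300.0)   # narrow, sharp spectral peak
flat = np.ones_like(f)                         # flat spectrum

skew_low = moments(low_heavy, f)[2]    # positive: spectrum tilted to the left
skew_high = moments(high_heavy, f)[2]  # negative: spectrum tilted to the right
kurt_peaked = moments(peaked, f)[3]    # large: peaked distribution
kurt_flat = moments(flat, f)[3]        # about -1.2: flattened distribution
```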





Fig. 1. The four spectral moments of the occlusive consonants for three places of articulation in the CV context

Fig. 2. The four spectral moments of the fricative consonants for three places of articulation in the CV context





Fig. 3. Comparison between the values of the four spectral moments of the occlusive and fricative consonants for three different places of articulation.

For the spectral mean, the bilabial and interdental fricatives have larger values than the bilabial and alveolar (dental) occlusives, respectively (Figure 3), whereas the velar fricative has a lower spectral mean than the velar occlusive. For the second moment, the spectral distributions of the bilabial and dental occlusives are less dispersed than those of the bilabial and interdental fricatives, respectively, but no difference is observed between the velar occlusive and the velar fricative.

For the third moment, the spectral distributions of the bilabial and velar occlusives and fricatives concentrate their energy in the low frequencies, the occlusives being more skewed than the fricatives. The main difference concerns the dentals and interdentals: the interdental fricative shows a right-hand asymmetry (negative skewness), whereas the energy of the dental occlusive's spectral distribution is concentrated in the low frequencies. This may be due to the slight difference in their place of articulation.

The fourth moment gives a notable result: the bilabial, interdental and velar fricative spectra are narrower than those of the bilabial, alveolar and velar occlusives, respectively.

We can say that m1 makes it possible to differentiate between occlusive and fricative consonants at a given place of articulation: the first moment is smaller for the bilabial and alveolar occlusives than for the bilabial and interdental fricatives, whereas it is larger for the velar occlusive than for the velar fricative. The moment m4 makes it possible to separate the occlusive consonants from the fricative consonants whatever the place of articulation studied, since at every one of these places the occlusives have larger values of this moment than the fricatives.

V. CONCLUSION

This study of Modern Standard Arabic, covering two manners of articulation (occlusive and fricative) at three places of articulation (bilabial, dental / interdental, velar), yielded the following results. The moments m1, m3 and m4 can classify consonants according to the place of articulation within a single manner of articulation. The moment m4 distinguishes occlusives from fricatives irrespective of the place of articulation.

REFERENCES

[1] Deroo, O., 1998, « Modèles dépendants du contexte et méthodes de fusion de données appliquées à la reconnaissance de la parole par modèles hybrides HMM/MLP », Faculté Polytechnique de Mons.

[2] Bergounioux, M., 2010, « Mathématiques pour le traitement du signal : cours et exercices corrigés », Dunod, Paris, pp. 270–279.

[3] Bellanger, M., 1987, « Traitement numérique du signal : théorie et pratique », Masson, p. 363.

[4] Nishinuma, Y., Duez, D., & Paboudjian, C., 1991, « Automatic classification of consonant clusters in French », Speech Communication, Vol. 10, pp. 395–403.

[5] Hoelterhoff, J., & Reetz, H., 2007, « Acoustic cues discriminating German obstruents in place and manner of articulation », Journal of the Acoustical Society of America, 121, 1142–1156.

[6] Jesus, L. M., & Jackson, P. J., 2008, « Frication and voicing classification », in A. Teixeira, V. L. S. de Lima, L. C. de Oliveira, & P. Quaresma (Eds.), Computational Processing of the Portuguese Language: The 8th International Conference on Computational Processing of Portuguese (PROPOR), pp. 11–20, Berlin, Heidelberg: Springer-Verlag.

[7] Jesus, L. M., & Shadle, C. H., 2002, « A parametric study of the spectral characteristics of European Portuguese fricatives », Journal of Phonetics, 30(3), 437–464.

[8] Koenig, L. L., Shadle, C. H., Preston, J. L., & Mooshammer, C. R., 2013, « Toward improved spectral measures of /s/: Results from adolescents », Journal of Speech, Language, and Hearing Research, 56, 1175–1189.

[9] McMurray, B., & Jongman, A., 2011, « What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations », Psychological Review, 118, 219–246.

[10] Shadle, C. H., & Mair, S. J., 1996, « Quantifying spectral characteristics of fricatives », in Proceedings of the International Conference on Spoken Language Processing (ICSLP 96), pp. 1517–1520, Philadelphia.

[11] Childers, D. G., Wu, K., Bae, K. S., & Hicks, D. M., 1988, « Automatic recognition of gender by voice », paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88).

[12] Farooq, O., & Datta, S., 2001, « Mel filter-like admissible wavelet packet structure for speech recognition », IEEE Signal Processing Letters, 8(7), 196–198.

[13] Halberstadt, A. K., & Glass, J. R., 1997, « Heterogeneous acoustic measurements for phonetic classification », EUROSPEECH.

[14] Kong, Y.-Y., Mullangi, A., & Kokkinakis, K., 2014, « Classification of fricative consonants for speech enhancement in hearing devices », PLoS ONE, 9(4), e95001, http://dx.doi.org/10.1371/journal.pone.0095001

[15] Todd, A., Edwards, J., & Litovsky, R., 2011, « Production of contrast between sibilant fricatives by children with cochlear implants », Journal of the Acoustical Society of America, 130, 3969–3979.

[16] Forrest, K., Weismer, G., Elbert, M., & Dinnsen, D. A., 1994, « Spectral-analysis of target-appropriate T and K produced by phonologically disordered and normally articulating children », Clinical Linguistics and Phonetics, 8, 267–281.

[17] Forrest, K., Weismer, G., Hodge, M., Dinnsen, D. A., & Elbert, M., 1990, « Statistical-analysis of word-initial K and T produced by normal and phonologically disordered children », Clinical Linguistics and Phonetics, 4, 327–340.

[18] Forrest, K., Weismer, G., Milenkovic, P., & Dougall, R., 1988, « Statistical analysis of word-initial voiceless obstruents: Preliminary data », Journal of the Acoustical Society of America, 84, 115–124.

[19] Flipsen, P., Shriberg, L., Weismer, G., Karlsson, H., & McSweeny, J., 1999, « Acoustic characteristics of /s/ in adolescents », Journal of Speech, Language, and Hearing Research, 42, 663–677.

[20] Jongman, A., Wayland, R., & Wong, S., 2000, « Acoustic characteristics of English fricatives », Journal of the Acoustical Society of America, 108(3 Pt 1), 1252–1263.

[21] Tomiak, G., 1990, « An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents », Doctoral dissertation, State University of New York, Buffalo.

[22] Koenig, L. L., Shadle, C. H., Preston, J. L., & Mooshammer, C. R., 2013, « Toward improved spectral measures of /s/: Results from adolescents », Journal of Speech, Language, and Hearing Research, 56, 1175–1189.

[23] Lousada, M., Jesus, L., & Pape, D., 2012, « Estimation of stops' spectral place cues using multitaper techniques », DELTA, 28(1), 1–26.

[24] Zygis, M., Pape, D., & Jesus, L., 2012, « (Non)retroflex Slavic affricates and their motivation: Evidence from Czech and Polish », Journal of the International Phonetic Association, 42(3), 281–329.

[25] Spinu, L., & Lilley, J., 2016, « A comparison of cepstral coefficients and spectral moments in the classification of Romanian fricatives », Journal of Phonetics, Vol. 57, pp. 44–58.

[26] Feng, Y., Hao, G. J., Xue, S. A., & Max, L., 2011, « Detecting anticipatory effects in speech articulation by means of spectral coefficient analyses », Speech Communication, Vol. 53, pp. 842–854.

[27] Nittrouer, S., 1995, « Children learn separate aspects of speech production at different rates: Evidence from spectral moments », Journal of the Acoustical Society of America, Vol. 97, pp. 520–530.

[28] Stevens, K. N., 1998, « Acoustic Phonetics », The MIT Press, Cambridge, MA.

AUTHORS’ PROFILES

Prof. Soufyane Mounir

Member of the research team "Signals and Systems" in the Laboratory of Mechanical Engineering, Industrial Management and Innovation; Professor at the National School of Applied Sciences, University Hassan 1st, Khouribga, Morocco.

Dr. Karim Tahiry

Member of the research team "Signals and Systems" in the Laboratory of Mechanical Engineering, Industrial Management and Innovation; PhD from the Science and Technical Faculty, University Hassan 1st, Settat, Morocco.

Prof. Abdelmajid Farchi

Head of the research team "Signals and Systems" in the Laboratory of Mechanical Engineering, Industrial Management and Innovation; Professor at the Science and Technical Faculty, University Hassan 1st, Settat, Morocco.
