Acoustic Segmentation and Analysis

1. Introduction

This study will analyse the formants and phonological characteristics of several distinctive phonemes in a 10-second utterance through spectrographic analysis. Since phonological attributes vary widely from speaker to speaker, the study will also examine how this particular speaker’s realisations resemble and differ from standardised descriptions of the phonemes concerned. Finally, it will evaluate the difficulties encountered in segmenting and analysing the notable features of the spectrogram.

Orthographic Transcription

There’s a weird muttering before tea I think. Get off; you’re caught in pure top tomato juice. Don’t bother kicking and drinking, strut to the church horse stable and like the kill.

Semi-Narrow Transcription

θez ə wɪəd mʌʔʔɜɹiŋ bəfoɹ tʰiː a fiŋkʰ. geʔ ɔːf; jə kʰʌʔ in pʰʊːɹ tʰɒp tʰəmaʔə dʒuːs. dounʔ bɑðə kʰɪkʰɪŋ ən dɹɪŋkɪŋ, stɹʌʔ tʰu ðə tʃɜːɹtʃ hɔɹs stɛːbəɫ æn laɪkʰ ðə kʰɪːɫ.


2. Consonants

This section will discuss the characteristics of several notable allophones and consonant variants evident in the utterance, compare them with several linguists’ descriptions of these sounds, and then consider the difficulties of segmenting and analysing them.

2.1. /l/ vs. /ɫ/


The velarised lateral approximant /ɫ/ is one of the distinctive features of the speaker’s speech. The formant differences between /ɫ/ and clear /l/ can be seen in Fig. 1 and Fig. 2. Although the two may seem very similar, their second formants differ: the second formant of /ɫ/ is lower in frequency than that of /l/. Fry states that the first and second formants of /ɫ/ have frequencies similar to those of a back vowel, which explains why they are lower, whereas those of /l/ are similar to those of a front vowel, which explains why they are higher (Fry 1979:120-121). In Fig. 1, the first and second formants are 466 Hz and 1060 Hz respectively; in Fig. 2 they are 778.7 Hz and 1341 Hz respectively.

Fig. 1. Velarised /l/
Fig. 2. Clear /l/

Ladefoged states that in lateral approximants such as /l/, the formant structure should be similar to that of vowels, but with formants at around 250, 1200 and 2400 Hz, and with the higher formants considerably reduced in intensity (Ladefoged 2001:185). In both instances the third to fifth formants are evidently less intense than the first two, so in this respect both closely resemble Ladefoged’s description.

Judged by these frequencies, /ɫ/ resembles the description reasonably closely, but clear /l/ fits it less well, because its F1 is much higher than the standardised value. Ladefoged also notes, however, that these sounds can have different formant structures depending on the phonetic context, since they are affected by the sounds before and after them (Ladefoged 2001:185). Fig. 2’s F1 and F2 are much higher than the standardised frequencies because the following vowel is /a/, an open vowel, so the /l/ exhibits more of the properties of an open vowel. The velarised /l/ therefore resembles the standardised description closely, and the clear /l/ less so, owing to the influence of the following vowel.
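
The comparison above can be made explicit with a few lines of Python. This is a minimal sketch, assuming that "resemblance" is judged simply by the difference in Hz between each measured formant and Ladefoged’s approximate values of 250, 1200 and 2400 Hz; only F1 and F2 are compared because F3 is not reported for Figs. 1 and 2. The helper names and the summed absolute deviation are hypothetical, chosen for illustration rather than taken from the sources.

```python
# Measured formants (Hz) for Figs. 1 and 2, as reported in the text.
MEASURED = {
    "velarised l (Fig. 1)": {"F1": 466.0, "F2": 1060.0},
    "clear l (Fig. 2)": {"F1": 778.7, "F2": 1341.0},
}

# Approximate lateral-approximant formants quoted from Ladefoged in the text.
REFERENCE = {"F1": 250.0, "F2": 1200.0, "F3": 2400.0}


def deviations(measured, reference):
    """Measured minus reference (Hz) for every formant present in both dicts."""
    return {f: round(measured[f] - reference[f], 1) for f in measured if f in reference}


def total_abs_deviation(measured, reference):
    """Hypothetical single-number distance: sum of absolute deviations (Hz)."""
    return round(sum(abs(d) for d in deviations(measured, reference).values()), 1)


if __name__ == "__main__":
    for label, formants in MEASURED.items():
        print(label, deviations(formants, REFERENCE), total_abs_deviation(formants, REFERENCE))
    # velarised l (Fig. 1) {'F1': 216.0, 'F2': -140.0} 356.0
    # clear l (Fig. 2) {'F1': 528.7, 'F2': 141.0} 669.7
```

The larger F1 deviation of the clear /l/ is the mismatch discussed above. A single summed figure hides which formant deviates, so it is best read alongside the per-formant values.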

Identifying the velarised /l/ posed no problem, since it occurs syllable-finally and is therefore easy to spot. The clear /l/ was more difficult because it sits between a nasal consonant and an open vowel, so it blended into its surroundings, particularly into the vowel, since the formant structure of lateral approximants is strongly affected by the following vowel. Even so, the intensity of the /l/’s formants showed that it is distinct from the nasal consonant, although it was definitely difficult to segment it from the open /a/ vowel.

2.2. Glottal stops /ʔ/

This speaker frequently produced glottal stops, which typically replaced /t/ when it was not word-initial. In Fig. 3, two consecutive glottal stops can be discerned, particularly from the two sudden bursts of intensity and the sharp onset of formant structure, each followed by a sudden near-silence visible as a gap of sharply reduced intensity, as annotated in Fig. 3. These characteristics readily distinguish glottal stops from the other consonants.

Fig. 3. Two consecutive glottal stops

According to Fry, the silent gap in voiced stops is much shorter than in voiceless stops, where the silence usually lasts around 70-140 ms (Fry 1979:122). This may explain why the gaps in these glottal stops last only 15 and 19 ms: the first is shorter because it runs directly into the second glottal stop, and the second is longer because of the open-mid vowel that follows it. Fry also states that in plosives with little or no aspiration the burst of sound is very short, lasting around 10-15 ms. The burst of the first glottal stop in Fig. 3 lasts only 5.8 ms. The glottal stops shown here therefore correspond quite well to the standardised description of stops, even though they clearly differ from other voiced stops.
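
Fry’s figures are ranges, so comparing the measurements with them amounts to checking where each value falls. The sketch below does this for the durations reported for Fig. 3; the helper `position` and the constant names are hypothetical, and the ranges are simply those quoted from Fry in the text.

```python
# Durations (ms) reported in the text for Fig. 3.
GAPS_MS = [15.0, 19.0]  # silent gaps after the two glottal stops
BURST_MS = 5.8          # burst of the first glottal stop

# Ranges quoted from Fry in the text (ms).
VOICELESS_GAP_RANGE = (70.0, 140.0)
UNASPIRATED_BURST_RANGE = (10.0, 15.0)


def position(value, bounds):
    """Classify a measurement as 'below', 'within' or 'above' an inclusive range."""
    low, high = bounds
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


if __name__ == "__main__":
    for gap in GAPS_MS:
        print(f"gap {gap} ms: {position(gap, VOICELESS_GAP_RANGE)} voiceless-stop range")
    print(f"burst {BURST_MS} ms: {position(BURST_MS, UNASPIRATED_BURST_RANGE)} unaspirated-burst range")
    # gap 15.0 ms: below voiceless-stop range
    # gap 19.0 ms: below voiceless-stop range
    # burst 5.8 ms: below unaspirated-burst range
```

On these figures both gaps lie well below the 70-140 ms quoted for voiceless stops, and the 5.8 ms burst is shorter than the 10-15 ms quoted for unaspirated plosives, so the agreement with Fry is one of short durations rather than of values falling inside his ranges.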

Identifying the second glottal stop posed no problem at all, since clear lines mark the burst of intensity and the gap of silence that follows it. The first glottal stop was harder to segment because of the voiced vowel /ʌ/ preceding it: its initial burst of intensity blended in with the vowel’s intense formants. It was still fairly easy to spot, however, thanks to the obvious gap of silence and the slightly more intense initial burst, which set it apart from the vowel.

2.3. Labiodentalised /θ/: /f/

Another notable feature of this speaker’s speech was the labiodentalised /θ/, which occurred between two vowels in rapid speech. As Figures 4 and 5 show, the two sounds differ markedly, which suggests that although the two should be pronounced alike, the speaker labiodentalised the /θ/ because of the rapid speech rate.

Fig. 4. /θ/
Fig. 5. /f/

Both Ladefoged and Fry agree that the patterns of these two phonemes are ‘pretty much the same’ (Ladefoged 2001:182). As Figures 4 and 5 show, both fricatives exhibit random noise patterns with no distinct formant bars; what distinguishes them is the movement of the second formant into the following vowel. As the red arrow in Fig. 4 shows, the second formant of /θ/ rises sharply and then falls immediately into the next vowel. In Fig. 5, by contrast, the second formant shows no movement into the following vowel at all.

According to Pulgram, voiceless fricatives are pure noises in which all features of glottal tone are absent (Pulgram 1959:72). Ladefoged describes these two sounds as having the same pattern but different formant transitions into the following vowel (Ladefoged 2001:182). Fry adds that the main noise energy of both sounds lies at high frequencies, around 6000-8000 Hz (Fry 1979:122). The two sounds fit the first two descriptions well: there is no indication of glottal tone, and, as discussed above, they differ in their formant transitions. However, the main energy of the two sounds lies at 4528 Hz and 4091 Hz respectively, well below the range Fry gives. These voiceless fricatives are therefore generally well described by the standardised characteristics, although their frequencies show some variability.
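
Reading off the "main noise energy" of a fricative amounts to locating the strongest component of its spectrum. The sketch below illustrates the idea with a plain discrete Fourier transform over a single frame. It is a simplification under stated assumptions: there is no windowing or averaging over frames, as spectrogram software would normally apply; it runs on a synthetic two-tone signal rather than the speaker’s recording; and the function names are hypothetical. The 6000-8000 Hz bounds are the range quoted from Fry above.

```python
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to (but excluding) the Nyquist bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest DFT bin: a crude 'main energy' estimate."""
    mags = dft_magnitudes(samples)
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / len(samples)


if __name__ == "__main__":
    # Synthetic test signal, NOT the speaker's recording: a strong 4500 Hz
    # component plus a weaker 1000 Hz component, sampled at 16 kHz.
    fs, n = 16000, 256
    test_signal = [math.sin(2 * math.pi * 4500 * i / fs)
                   + 0.3 * math.sin(2 * math.pi * 1000 * i / fs) for i in range(n)]
    peak = peak_frequency(test_signal, fs)
    fry_low, fry_high = 6000.0, 8000.0  # range quoted from Fry in the text
    print(peak, fry_low <= peak <= fry_high)  # prints: 4500.0 False
```

Applied to the measured peaks of 4528 Hz and 4091 Hz, the same range check returns False for both, which is the mismatch with Fry’s description noted above. Real fricative noise has a broad spectrum, so in practice the peak would be read from a smoothed or averaged spectrum rather than from a single bin.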

Deciding whether the /f/ was really a /θ/ was rather difficult because, as discussed, the two are very similar; the difference in formant transitions was the main reason for labelling it /f/. On the other hand, because these voiceless fricatives consist of random noise with no particular pattern, segmenting the /f/ from the neighbouring vowels was fairly easy: its noise is clearly distinct from the vowels’ formant bars and intensity.

3. Vowels

This section will discuss the characteristics of a notable vowel variant in the utterance, compare it with several linguists’ descriptions of the sound, and then consider the difficulties of segmenting and analysing it.

3.1. Nasalised vowels

One distinctive characteristic of the utterance is the frequent nasalisation of vowels. The nasalised vowel discussed here is the schwa: Figures 6 and 7 compare a nasalised schwa /ə/, occurring before the nasal consonant /m/, with a normal /ə/.

Fig. 6. Nasalised schwa
Fig. 7. Normal schwa

Although Ladefoged states that spectrograms cannot be used to measure degrees of nasalisation, they can still be used to distinguish a nasalised vowel from a normal one (Ladefoged 2001:193). The nasalised schwa shows a different formant transition: a decrease in the second formant can be observed towards the end of the vowel, and the intensity of the first formant, together with the rising intensity of the second and third formants, carries on into the following nasal consonant, as highlighted by the box. F1 and F2 of the schwa also indicate a central vowel: they are not as close together as in back vowels, nor as far apart as in front vowels.

Comparing a schwa with a standardised description is rather challenging, because the vowel is so variable that no single standardised description exists. According to Fry, the F1 and F2 frequencies of central vowels are higher, but also highly variable (Fry 1979:114); the F1 and F2 of the schwas presented here are 435 Hz and 591 Hz respectively. Ladefoged and Maddieson also state that in nasalised vowels the first formant is weak and the third formant is high from the beginning (Ladefoged and Maddieson 1996:299). In this nasalised schwa, however, although the third formant does remain high, the first formant remains strong throughout, which contradicts this statement.

Additionally, Ladefoged describes nasal vowels as having distinctive formant transitions at their end, characterised by a decrease in the vowel’s second formant (Ladefoged 2001:182), which, as discussed above, the nasalised schwa shows. The nasalised vowel therefore matches the standardised description quite closely, although there are some inconsistencies arising from the variability of central vowels, the uncertainty in describing them, and the limitations of spectrograms themselves in representing vowel sounds, since they can show only relative vowel quality at best (Ladefoged 2001:194).

Distinguishing a nasalised vowel from a normal one was not easy, because nasalisation is often quite subtle and has to be listened for meticulously. Since schwas usually occur in unstressed syllables, they are often articulated weakly and very quickly, which makes them harder to hear and analyse than other vowels, particularly because their formants are often unclear and hard to separate from those of neighbouring sounds. Analysing the formants was also challenging because there is no set way of analysing a central vowel, and there are many different degrees of nasalisation to take into account.

4. Conclusion


This study shows that although linguists try their best to describe sounds through spectrographic studies, they can give only a general idea at best, because of the many factors that contribute to variability. As Fry states, ‘no two speech sounds, or articulations, can be acoustically alike’, simply because no two people are alike (Fry 1979:145); this is especially true in spectrographic studies. It can therefore be concluded from these sounds that even though standardised descriptions of sounds may be accurate, some variability always remains, whether it stems from the speaker, from the relationship between individual sounds and their neighbours, or from the limitations of spectrograms themselves.

(1873 words)

References

Fry, D. (1979). The physics of speech (Cambridge textbooks in linguistics). Cambridge [Eng.]; New York: Cambridge University Press.

Ladefoged, P. (2001). A course in phonetics (4th ed.). Boston, Mass.: Heinle & Heinle.

Ladefoged, P., & Maddieson, I. (1996). The sounds of the world's languages (Phonological theory). Cambridge, Mass.; Oxford: Blackwell.

Pulgram, E. (1959). Introduction to the spectrography of speech (Janua linguarum; nr. 7). 's-Gravenhage: Mouton.
