LANGUAGE AND SPEECH, 1989, 32(3), 221-248
THE ACOUSTIC VOWEL SPACE OF MODERN GREEK AND GERMAN*
ALLARD JONGMAN,
MARIOS FOURAKIS
and JOAN A. SERENO
Central Institute for the Deaf, St. Louis
The spectral characteristics of vowels in Modern Greek and German were examined. Four speakers of Modern Greek and three speakers of German produced four repetitions of words containing each vowel of their native language. Measurements of the fundamental frequency and the first three formants were made for each vowel token. These measurements were then transformed into log frequency ratios and plotted as points in the three-dimensional auditory-perceptual space proposed by Miller (1989). Each vowel token was thus represented by one point, and the points corresponding to each vowel category were enclosed in three-dimensional target zones. For the present corpus, these zones differentiate the five vowels of Modern Greek with 100% accuracy, and the fourteen vowels of German with 94% accuracy. Implications for the distribution of common vowels across languages as a function of vowel density are discussed.
Key words: vowel space, Greek, German
INTRODUCTION
The process of defining a vowel space for a language, or a universal space for all languages, can be characterized as having three facets corresponding to three different stages in the communication process (Lindblom, 1986). The articulatory stage, at which the vocal tract is shaped so as to produce the intended vowel, defines an articulatory vowel space by the possible positions of the tongue and the jaw, and by the shape of the lips. The acoustic stage, at which the sound radiating from the lips propagates through air, defines an acoustic space by the relative distribution of energy in the time and frequency domains. Finally, the auditory stage, at which the sound is processed by the ear and perceived as a linguistic unit, defines an auditory vowel space.

* The first and third authors are presently at the Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. A partial report of the results was presented at the 113th Meeting of the Acoustical Society of America in Indianapolis, IN, May 11-15, 1987. The research was supported by NIH Grant ND21994 and AFOSR Grant 860335 to Central Institute for the Deaf. The authors wish to thank James D. Miller, John W. Hawks, Steven J. Sadoff, Frank E. Kramer, and Melissa P. Piasecki for invaluable assistance.
Address all correspondence to Allard Jongman, Max Planck Institute for Psycholinguistics, P.O. Box 310, 6600 AH Nijmegen, The Netherlands.

A traditional problem has been to uniquely describe and distinguish phonemically different vowels within this articulatorily, acoustically, and auditorily defined vowel space. Acoustically, it is often the case that vowel sounds perceived as representing the same phonemic category differ in terms of their spectral and/or temporal properties. Several sources of variability may account for these differences. The acoustic characteristics of a specific vowel can be affected by: (1) the sex, age, and size of a speaker (inter-speaker variation); (2) the phonetic context in which the vowel is produced (coarticulatory effects); (3) the rate of speech at the moment the vowel is produced; (4) the stress value assigned to the syllable that contains the vowel; and (5) intonation contour and phonation types. The last four sources collectively are usually referred to as intra-speaker variation. It is certainly a task for any theory of speech perception to account for the way in which a listener processes these differences during normal speech understanding. Both inter- and intra-speaker variations may cause instances of phonemically identical vowels to occupy different regions in the vowel space, or instances of phonemically different vowels to occupy overlapping or identical regions in the vowel space (Peterson and Barney, 1952).
Several normalization schemes have been proposed in the literature in order to map the vowels of a language onto unique, non-overlapping regions in the vowel space. Such normalization algorithms attempt to extract invariant features of vowel sounds by taking into account the frequency values of two or more formants and by transforming them in various ways, including mel scales (Fant, 1973), deviations from log means of vowel formant frequencies (Nearey, 1978), Bark versus sones/Bark representations (Lindblom, 1986), Bark differences (Syrdal and Gopal, 1986), and multidimensional scaling techniques (e.g., Wright, 1986). The majority of such normalization schemes have focused on American English vowels. However, even within a single language, normalizing for variability due to speaker, phonetic context, and speaking rate proves to be quite difficult. Most recently, Miller (1987a; 1989), in an attempt to account for intra- and inter-speaker differences, has proposed a general framework for the representation of speech sounds in an "auditory-perceptual space" in terms of log ratios of F0, F1, F2, and F3. The present paper is an attempt to extend this framework, which was based on the speech sounds of American English, to other languages. In particular, we address the problem of mapping vowels onto unique regions in the vowel space for Modern Greek, a language with five vowels, and for German, a language with 15 vowels, within this general approach.
The vowel space in the auditory-perceptual theory
In the auditory-perceptual theory (Miller, 1989), speech sounds, vowels in this particular case, are mapped onto an auditory-perceptual space (APS) of three dimensions. The vowel space proposed by Miller can be viewed as an outgrowth of proposals found in earlier work by Peterson (1952), Shepard (1972), and Pols (1977). The dimensions of the auditory-perceptual space are defined by the short-term spectrum of a vowel sound in terms of log frequency ratios, and specifically by the positions of the first three prominences in the spectrum relative to each other and to a low frequency reference.
The following equations define the auditory-perceptual space:

$$x = \log(\mathrm{SF3}/\mathrm{SF2}) \qquad \text{Eq. 1}$$

$$y = \log(\mathrm{SF1}/\mathrm{SR}) \qquad \text{Eq. 2}$$

$$z = \log(\mathrm{SF2}/\mathrm{SF1}) \qquad \text{Eq. 3}$$

SF1, SF2, and SF3 represent the frequency locations of the first three significant prominences of the short-term spectral envelope of the vowel waveform. SR is a reference frequency, which is shifted slightly by the average spectrum of the current speaker. This reference is calculated as follows:

$$\mathrm{SR} = 168\,(\mathrm{GMF0}/168)^{1/3} \qquad \text{Eq. 4}$$

where GMF0 is the geometric mean of the speaker's F0 for the utterance. The choice of these variables, and the motivation for the sensory reference, are discussed extensively in Miller (1989).
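As an illustration of Equations 1-4, the conversion from formant measurements to APS coordinates might be sketched as follows. The sketch assumes base-10 logarithms, reads the exponent in Equation 4 as 1/3, and uses measured F1-F3 as stand-ins for the spectral prominences SF1-SF3. The function names are hypothetical and the example frequencies are invented, not taken from the Appendix.

```python
import math

REFERENCE_HZ = 168.0  # constant appearing in Equation 4


def sensory_reference(gm_f0):
    """Equation 4: SR = 168 * (GMF0 / 168) ** (1/3).

    gm_f0 is the geometric mean of the speaker's F0 for the utterance (Hz).
    The 1/3 exponent is an assumed reading of the printed equation.
    """
    return REFERENCE_HZ * (gm_f0 / REFERENCE_HZ) ** (1.0 / 3.0)


def aps_coordinates(f1, f2, f3, gm_f0):
    """Equations 1-3: map formant frequencies (Hz) to APS coordinates."""
    sr = sensory_reference(gm_f0)
    x = math.log10(f3 / f2)  # Eq. 1 (base-10 log is an assumption)
    y = math.log10(f1 / sr)  # Eq. 2
    z = math.log10(f2 / f1)  # Eq. 3
    return x, y, z


# Invented token: F1 = 300 Hz, F2 = 2200 Hz, F3 = 2900 Hz, GMF0 = 120 Hz.
x, y, z = aps_coordinates(300.0, 2200.0, 2900.0, 120.0)
print(round(x, 3), round(y, 3), round(z, 3))
```

Because SR depends only on the cube root of GMF0/168, a large change in a speaker's pitch moves the reference only modestly, which fits the description of SR as being "shifted slightly" by the speaker.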
Miller (1989) presented formant values of American English vowels taken from the literature and from measurements in the C.I.D. laboratories. These values were converted into log frequency ratios, using Equations 1 through 4, and were then plotted in the three-dimensional space. The 406 data points representing the nine non-diphthongized vowels of English (/i, ɪ, ɛ, æ, ɑ, ʌ, ɔ, ʊ, u/) are shown plotted in Figure 1 (Panel A: front view; Panel B: side view). As is clear from Panel B, these points fall in a narrow slab ('the vowel slab') of the three-dimensional auditory-perceptual space, with the width of the slab being quite small.

The points in the auditory-perceptual space corresponding to the same phonetic vowel category can be grouped accordingly into target zones. Figure 2 shows the nine non-overlapping vowel target zones of American English which can account for 93% of the data. (See Miller, 1989, for a complete listing of the individual data points and corresponding target zones.)

These target zones exhibit no overlap even though the data points that define them represent vowels spoken by speakers of different ages and genders, in various phonetic contexts, and at different speaking rates. For expository purposes, the axes of the APS in Figure 2 have been rotated by a transformation to bring the vowel slab to a vertical position using the following set of equations:
$$x' = 0.70711\,(y - x) \qquad \text{Eq. 5}$$

$$y' = 0.81622\,z - 0.4081\,(x + y) \qquad \text{Eq. 6}$$

$$z' = 0.5772\,(x + y + z) \qquad \text{Eq. 7}$$

This transformation does not translate the origin, but only rotates the axes. We refer to the coordinates x', y', and z' defined by Equations 5-7 as slab coordinates.
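The rotation into slab coordinates can be sketched in a few lines. The coefficients below are copied from Equations 5-7 as they are read from the printed text (an assumption, since the digits are hard to make out); they lie close to 1/sqrt(2), sqrt(2/3), 1/sqrt(6), and 1/sqrt(3), which is what a pure rotation of the axes would require. The function name and the example point are hypothetical.

```python
import math


def slab_coordinates(x, y, z):
    """Rotate APS coordinates (x, y, z) into slab coordinates (x', y', z').

    Coefficients follow Equations 5-7 as printed (assumed reading).
    """
    xp = 0.70711 * (y - x)                 # Eq. 5
    yp = 0.81622 * z - 0.4081 * (x + y)    # Eq. 6
    zp = 0.5772 * (x + y + z)              # Eq. 7
    return xp, yp, zp


# Invented APS point, in log units.
point = (0.12, 0.40, 0.87)
rotated = slab_coordinates(*point)

# A rotation leaves the distance from the origin (nearly) unchanged;
# small differences come from the rounded coefficients.
print(math.hypot(*point), math.hypot(*rotated))
```

Since x + y + z reduces to log(SF3/SR) under Equations 1-3, the z' coordinate depends only on the third prominence relative to the sensory reference.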
These data suggest that the representation of vowels by means of the logarithmic ratios of the first three formants and a reference frequency can uniquely characterize the vowels of American English. Such a representation may thus account for much of the variability due to phonetic context and variations in speaking rate, as well as for differences among speakers in terms of gender and age.
Fig. 1. Panel A. Formant measurements of 406 American English monophthongal vowel tokens transformed into points in the three-dimensional auditory-perceptual space. x, y, and z axes are in 0.1 log units, and the point of origin is (0, 0, 0). The symbols are in Arpabet notation: IY = [i], IH = [ɪ], EH = [ɛ], AE = [æ], AO = [ɔ], AA = [ɑ], AH = [ʌ], UW = [u], UH = [ʊ].
Panel B. Side view of the APS showing how all vowel tokens fall into a narrow slab of the APS (the "vowel slab"). In this orientation, the x and y axes appear to fall on top of each other. (Figure taken from Figure 13 in "Auditory-perceptual interpretation of the vowel" by J.D. Miller, Journal of the Acoustical Society of America, 85, 2114-2134. Copyright 1989 by Acoustical Society of America. Used by permission.)
Fig. 2. Target zones for the nine monophthongal vowels of American English in slab coordinates (see Equations 5-7). In this front view, the z' axis is perpendicular to the x'y' plane. x', y', and z' axes are in 0.1 log units, and the point of origin is (0, 0, 0). (Figure based on Figure 14 in "Auditory-perceptual interpretation of the vowel" by J.D. Miller, Journal of the Acoustical Society of America, 85, 2114-2134. Copyright 1989 by Acoustical Society of America. Used by permission.)
A cross-language evaluation of the auditory-perceptual theory
The auditory-perceptual theory of speech perception was initially developed for, and applied to, the vowel sounds of American English. However, for the theory to be universal, an important challenge is to see how it accommodates vowel inventories that differ from those of English. For example, it is of interest to see how the vowels of languages with smaller vowel inventories are organized in the auditory-perceptual space. More importantly, the auditory-perceptual theory must also be able to account for more complex inventories. In a first attempt, the present study examines the vowel space for Modern Greek and German. Modern Greek has a simple five-vowel system, consisting of [i, e, a, o, u] (Householder, Kazazis, and Koutsoudas, 1964). German, on the other hand, has a complex vowel inventory with both tense and lax vowels, and rounded and unrounded vowels, consisting of [i:, i, ɛ:, e:, e, y:, y, ø:, ø, a:, a, o:, o, u:, u]¹ (Moulton, 1962; Jørgensen, 1969).
To our knowledge, no studies of the spectral characteristics of Greek vowels have been reported. For German, two studies were of interest for the purpose of the present analysis. In a spectrographic study, Jørgensen (1969) reported the formant values of the 15 German vowels as produced by six male speakers. In addition, these vowels were plotted in a linear F1 by F2 space for each speaker individually. In general, Jørgensen found a tendency for the lax vowels to be more centrally located than their tense counterparts. There was, however, considerable overlap among vowels even for a single speaker, and superimposing the data plots for all six speakers showed extensive overlap for some vowels. Unfortunately, Jørgensen did not include a plot or any statistical evaluation for the combined data of all six speakers. In a recent study, Iivonen (1987) reported formant measurements of the 15 German vowels produced by one male speaker (with exceptionally high F0, around 200 Hz). This study was not designed to uniquely characterize the German vowels; it primarily compared the accuracy of three methods of spectral analysis.

Given the lack of any data on the Greek vowels, and the scarcity of data on the German vowels, the present study serves two purposes. First, it provides basic vowel measurements for Greek and German by listing values of fundamental frequency and the first three formants for each vowel. Second, this study tries to uniquely characterize these vowels by converting the formant values into log frequency ratios and plotting them as points in the three-dimensional auditory-perceptual space.
METHOD
Subjects
Subjects were recruited from the student body and faculty of Washington University, St. Louis, MO, with the exception of one Greek speaker, who was visiting the United States. Recruitment of native speakers who had only recently arrived in the United States drastically restricted the subject pool. The Greek subjects were two graduate students, one faculty member, and one visiting scholar, all native male speakers of Modern Greek (three from Athens, one from Patras). All four subjects had been in the United States between 10 days and four months at the time of the recordings. The German speakers were three female graduate students, two from Cologne and one from Bonn, all native speakers of German. All three speakers can be classified as speaking the 'Central Franconian' dialect (Wiesinger, 1983). At the time of the recordings, they had been in the United States for two to six months.

¹ We have adopted the phonetic symbols used by Jørgensen (1969) so that his results can be directly compared to the present results.
TABLE 1
Greek and German test words containing the 5 vowels of Greek and 14 vowels of German, respectively. IPA symbols representing the vowel of the first syllable of each word are given to the right of each column.

Modern Greek   Gloss             IPA
Pita           pie               [i]
Peta           fly!              [e]
Pata           step!             [a]
Pote           when              [o]
Puse           where are you?    [u]

German         Gloss             IPA
Stiele         handles           [i:]
Stille         silence           [i]
stehle         (I) steal         [e:]
Stelle         place             [e]
fühle          (I) feel          [y:]
fülle          (I) fill          [y]
Höhle          cave              [ø:]
Hölle          hell              [ø]
Buhle          lover             [u:]
Bulle          bull              [u]
Sohle          sole              [o:]
solle          (I) ought to      [o]
fahle          pale (pl.)        [a:]
Falle          trap              [a]
Materials
Two word lists (see Table 1) were used: one for the Greek and one for the German speakers. For Greek, stimulus words were selected such that the five target vowels appeared in approximately the same context. For German, stimuli were taken from Moulton (1962) such that words containing a tense-lax vowel opposition formed minimal pairs (e.g., 'stehle' - 'Stelle'); the following consonant was always /l/. The low, front long vowel [ɛ:] was excluded, since it is subject to much dialectal variation (Moulton, 1962; Jørgensen, 1969). Thus, the number of German vowels in the present study was 14. Randomized lists were made that contained four repetitions of each word. The randomized Greek and German words were placed in the following carrier sentences, respectively:

Greek: [θa po ___ ksana]. "I will say ___ again."
German: Ich sage ___ noch einmal. "I say ___ one more time."
The subjects were instructed to familiarize themselves with the list of sentences before the recordings were made. The total number of tokens recorded was 80 (5 vowels × 4 speakers × 4 repetitions) for Greek, and 168 (14 vowels × 3 speakers × 4 repetitions) for German.
Recording procedure
Subjects were recorded in an anechoic chamber using a special, low-noise microphone/preamplifier combination (Bruel and Kjaer 4179/2660). The microphone was placed at a height equal to, and 0.5 m in front of, the speaker's mouth (0° angle of incidence). Conversational speech levels were used. The microphone output was fed directly to a Sony PCM-501ES digital audio recorder (16 bit mode) with a JVC 720 VCR serving as the storage medium. A reading timer device, designed and built in-house, was used to regulate subjects' speed for recitation of the sentences.
Analysis and further processing
The recordings were digitized at 20 kHz with a 10-kHz low-pass filter setting with 16 bit precision and stored as files to be processed by the commercial software package ILS (Interactive Laboratory System). The primary sampled data files were then high-pass filtered at 50 Hz to remove any incidental low frequency noise. First, LPC analysis was performed using a 24 msec Hamming window moving in 1 msec steps, a preemphasis factor of 98%, and 24 poles. In addition, a cepstrally based algorithm was used to determine the fundamental frequency. Next, a frequency spectrum was derived at each millisecond of signal by means of a Fast Fourier Transform of the coefficients resulting from the linear prediction analysis. A program written in-house was used to extract the F0, F1, F2, and F3 values from the ILS secondary file and store them in a table-format file along with the corresponding frame (msec) number. In the case of merged or missing formants, a root solving command was used to enhance the resonance calculations of the FFT.
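The front end of such an analysis (pre-emphasis followed by Hamming-windowed framing) can be sketched as below. This is not the ILS implementation: it assumes that the "98%" pre-emphasis factor corresponds to the usual first-order difference y[n] = x[n] - 0.98 x[n-1], all names are hypothetical, and the LPC, cepstral, and FFT stages are left out.

```python
import math

SAMPLE_RATE_HZ = 20_000  # digitization rate given in the text
WINDOW_MS = 24           # Hamming window length
STEP_MS = 1              # window advance per analysis frame
PREEMPHASIS = 0.98       # assumed meaning of the "98%" factor


def preemphasize(samples, factor=PREEMPHASIS):
    """First-order pre-emphasis: y[n] = x[n] - factor * x[n-1]."""
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - factor * prev)
    return out


def hamming(n):
    """Hamming window of n points (n > 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def windowed_frames(samples):
    """Yield successive Hamming-weighted frames, one per 1 msec step."""
    size = SAMPLE_RATE_HZ * WINDOW_MS // 1000  # 480 samples per window
    hop = SAMPLE_RATE_HZ * STEP_MS // 1000     # 20 samples per step
    window = hamming(size)
    for start in range(0, len(samples) - size + 1, hop):
        frame = samples[start:start + size]
        yield [s * w for s, w in zip(frame, window)]


# 50 msec of silence, used only to show how many frames result.
frames = list(windowed_frames(preemphasize([0.0] * 1000)))
print(len(frames), len(frames[0]))
```

At 20 kHz the 24 msec window spans 480 samples and advances by 20 samples, so consecutive frames overlap almost entirely; this is what makes a formant estimate at every millisecond possible.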
The values of F0, F1, F2, and F3 were converted into x, y, and z coordinates by means of Equations 1-3. These x, y, and z values were calculated and plotted at each millisecond of the waveform, thus generating a sequence of points, or path, through the three-dimensional space. The distance between any two points in the path corresponded to the amount of spectral change that had taken place within that time period.

For each vowel, we selected the 'steady-state' portion in the following manner. When we displayed a path in the APS on a graphics screen (Evans and Sutherland PS300), we could determine the portion of the path exhibiting very little movement in the space, i.e., corresponding to slow or no spectral change. Using cursors, we then were able to determine the beginning and end of these portions. We extracted the corresponding segment of the original waveform, smoothed it on both sides using a 12 msec half Kaiser window, and listened to it to verify that it sounded like the intended vowel. This procedure of excising the 'steady-state' portion was applied to all vowel utterances of both groups of speakers.
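In the study the steady-state portion was marked by eye with cursors. Purely as an illustration of the underlying idea (frame-to-frame distance in the APS as a measure of spectral change), an automated stand-in might look for the longest run of slow-moving points. The threshold value and the function name are hypothetical and not taken from the paper.

```python
import math


def steady_state_span(path, max_step=0.005):
    """Return (start, end) of the longest run of slow-moving points.

    path is a list of (x, y, z) points, one per millisecond. A run is a
    stretch of consecutive points whose frame-to-frame Euclidean distance
    stays below max_step (log units); end is exclusive. max_step is an
    illustrative value only.
    """
    best = (0, 1) if path else (0, 0)
    run_start = 0
    for i in range(1, len(path)):
        if math.dist(path[i - 1], path[i]) >= max_step:
            run_start = i  # large spectral change: start a new run here
        if i + 1 - run_start > best[1] - best[0]:
            best = (run_start, i + 1)
    return best


# Invented path: a fast transition, a quiet stretch, another transition.
path = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.101, 0.0, 0.0),
        (0.102, 0.0, 0.0), (0.3, 0.0, 0.0)]
print(steady_state_span(path))
```

A span found this way would still need the listening check described above, since slow spectral change alone does not guarantee that the segment sounds like the intended vowel.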
Once the tokens were verified, we took an average of the x, y, and z coordinates over that part of the vowel showing little change, yielding one point per vowel token for each speaker. Thus, each vowel token was represented by a single point in the APS, which was based on an average of the x, y, and z values corresponding to the 'steady-state' portion of the vowel and which had been perceptually verified to be the intended vowel.
Using the procedures outlined above, we collected 80 points for the Greek vowels, and 167 points for the German vowels. (One token of the German vowel [o:] was disqualified due to equipment malfunction during the recording session.) The Appendix lists all the absolute frequency values (F0, F1, F2, F3) for each token of each vowel for both the Greek and the German speakers. The values reported in the Appendix are geometric means of F0, F1, F2, and F3 over the 'steady-state' portion of the vowel tokens as defined above.
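A geometric mean over the frames of a steady-state portion can be computed in the log domain, as in the following sketch. The function names and the two-frame example are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize_token(frames):
    """Collapse per-frame (F0, F1, F2, F3) tuples into one tuple of
    geometric means, one value per track."""
    return tuple(geometric_mean(track) for track in zip(*frames))


# Invented frames, chosen so that the result is easy to check by hand.
frames = [(100.0, 200.0, 1000.0, 2000.0),
          (400.0, 800.0, 4000.0, 8000.0)]
print(summarize_token(frames))
```

Averaging a log ratio such as log(F3/F2) over frames gives the same value as the log ratio of the two geometric means, so reporting geometric means of the frequencies is consistent with averaging the coordinates of Equations 1 and 3 over the same frames (and Equation 2 as well, since SR is fixed for an utterance).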
Data points from the literature
In order to expand our data base, we incorporated measurements taken from the available literature. For German, we converted Jørgensen's (1969) measurements into points in the APS by means of Equations 1-3, using a constant F0 of 135 Hz (approximating the average male pitch of the Peterson and Barney (1952) data base) for all six male speakers. Jørgensen did not report F3 values for 14 out of 24 instances of mid and high back vowels. For these cases, we used as an F3 value the geometric mean of all his reported F3 values for that vowel for all speakers. These procedures yielded 84 points for the German vowels.
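The treatment of missing F3 values can be sketched as follows: within one vowel category, pooled across speakers, each missing F3 is replaced by the geometric mean of the F3 values that were reported. The data layout, the names, and the numbers are hypothetical; none of the values come from Jørgensen (1969).

```python
import math

ASSUMED_F0_HZ = 135.0  # constant F0 applied to all six male speakers


def fill_missing_f3(tokens):
    """Replace missing F3 values (None) by the geometric mean of the
    reported F3 values.

    tokens holds (f1, f2, f3) tuples for ONE vowel category, pooled
    across speakers; f3 is None where no value was reported.
    """
    reported = [f3 for _, _, f3 in tokens if f3 is not None]
    gm = math.exp(sum(math.log(v) for v in reported) / len(reported))
    return [(f1, f2, gm if f3 is None else f3) for f1, f2, f3 in tokens]


# Invented back-vowel tokens; the second speaker has no F3 value.
tokens = [(300.0, 800.0, 2000.0),
          (320.0, 850.0, None),
          (310.0, 820.0, 2420.0)]
filled = fill_missing_f3(tokens)
print(filled[1])
```

Each completed (F1, F2, F3) triple would then be converted to an APS point with the constant F0 of 135 Hz standing in for GMF0.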
In addition, we converted formant values reported by Iivonen (1987) into points in the APS. Iivonen included measurements for fundamental frequency and the three formant frequencies at three different locations in the vowel: one taken at the "target for F1", one at the "target for F2", and one at "maximum volume velocity" (Iivonen, 1987, p. 128). In converting these values to points in the APS, we used the geometric mean of these three measurements for each vowel. This yielded 14 more points for the German vowels.

In sum, the Greek data base consisted of 80 tokens based on our own measurements, while the German data base consisted of 265 vowel points, of which 167 were based on our own measurements, 84 were taken from Jørgensen (1969), and 14 were taken from Iivonen (1987).
RESULTS
Modern Greek
The vowel productions of the Greek speakers are shown plotted in the APS (front view) in Figure 3. Each symbol represents one vowel token by a speaker. The middle of each symbol represents the actual location of each token. Points belonging to the same phonetic category are enclosed in target zones. These target zones are, in fact, three-dimensional objects, the plane projections of which are shown in Figure 3.² All the present data points can be enclosed by distinct, non-overlapping zones. Thus, these zones enable us to describe the present corpus of Greek vowels with 100% accuracy.

² The target zones were drawn on a high-resolution color graphics terminal (Evans and Sutherland PS300) which emulates a third dimension. No constraints were placed on the shape of the target zones, except for the fact that they had to be continuous. All zones were drawn by hand, using a computer-aided method with a resolution of 0.01 log units. This resolution was chosen so that the present classification performance could be compared to that for American English as reported by Miller (1989).

Fig. 3. Data points and target zones for the five Greek vowels shown in front view using slab coordinates. See Figure 2 for axis labels and units. Each symbol represents one vowel token produced by one speaker.
It is remarkable that the target zones for [i] and [e], in particular, leave a lot of space unoccupied. One might expect much more variation in the vowel productions of Greek speakers since there are no competing phonemes in the area, as opposed to American English (see, for example, Lindblom, 1986). Of course, further analysis of vowels produced by both male and female speakers (since only male speakers were used in the present study) and in more phonetic contexts might result in target zones for Modern Greek which are less compact.
German
The German data present a more complex pattern. Our German corpus offers not only a greater vowel inventory, but also vowel tokens from both male and female speakers. The 265 points representing the German vowels were plotted in the APS. In an attempt to minimize the amount of overlap among the 14 vowels, each set of points belonging to a particular vowel category was enclosed by an outline of a corresponding target zone.

Fig. 4. Target zones for the 14 German vowels shown in front view using slab coordinates. See Figure 2 for axis labels and units. The zones associated with each of the vowels are indicated by the labels.
Figure 4 shows a front view of the auditory-perceptual space with the target zones for the 14 German vowels.³ As is apparent from Figure 4, these target zones are quite irregular in shape, thus accommodating some differences in speaker and phonetic context. For the present study, we have simply drawn the target zones such that they enclose as many appropriate data points as possible. However, with this relatively small sample, it remains an open question whether such zones will prove to have any explanatory value. In order to address this question, perceptual verification experiments must be conducted to establish the "psychological reality" of the target zone boundaries. Such experiments are currently being conducted in our laboratories to evaluate the target zones established for American English. However, until this issue has been settled, we believe that there is no a priori reason to prefer smooth zones (e.g., circles or ellipses) over the present irregular zones.⁴

³ For the sake of clarity, individual data points are not shown in this Figure but are shown for each vowel in Figures 5, 6, and 7.

For the 14 German vowels, the three-dimensional irregular zones were able to describe the present corpus of German vowels with 94% accuracy. The zones group the data points according to phonetic category without any overlap. Since Figure 4 is a two-dimensional representation of the three-dimensional space, some target zones may appear to overlap. However, none of the target zones shows any overlap when taking all three dimensions into account.
Table 2 summarizes the results of mapping the intended productions onto distinct target zones. The overall accuracy is approximately 94% correct. Only 16 vowel tokens out of a total of 265 tokens were misclassified, with the majority of those points falling in regions not belonging to any target zone. The vowel with the lowest score is [y] (53%), which exhibits considerably more scatter than any of the other vowels. With the exception of this vowel, the percent correct classifications range from 89% to 100%.
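The percentages quoted above follow directly from the counts in Table 2, as the short sketch below shows. The counts are transcribed from the table; the placement of the few off-diagonal entries reflects our reading of the flattened table and does not affect the accuracy figures. The dictionary layout and the names are hypothetical.

```python
# Rows: intended vowel -> {target zone: count}; "none" = unclaimed areas.
CONFUSIONS = {
    "i:": {"i:": 19},
    "i":  {"i": 17, "y": 1, "none": 1},
    "e:": {"e:": 19},
    "e":  {"e": 19},
    "y:": {"y:": 19},
    "y":  {"y": 10, "i": 2, "ø:": 1, "none": 6},
    "ø:": {"ø:": 18, "y": 1},
    "ø":  {"ø": 19},
    "u:": {"u:": 18, "none": 1},
    "u":  {"u": 18, "none": 1},
    "o:": {"o:": 17, "none": 1},
    "o":  {"o": 19},
    "a:": {"a:": 19},
    "a":  {"a": 18, "none": 1},
}


def percent_correct(vowel):
    """Share of tokens of `vowel` that fall in their own target zone."""
    row = CONFUSIONS[vowel]
    return 100.0 * row.get(vowel, 0) / sum(row.values())


total = sum(sum(row.values()) for row in CONFUSIONS.values())
correct = sum(row.get(v, 0) for v, row in CONFUSIONS.items())
unclaimed = sum(row.get("none", 0) for row in CONFUSIONS.values())

print(correct, total, round(100.0 * correct / total))  # 249 265 94
print(round(percent_correct("y")))                     # 53
print(total - correct, unclaimed)                      # 16 11
```

Every vowel has 19 tokens (12 from the present recordings, 6 from Jørgensen, 1 from Iivonen) except [o:], which has 18 because one token was disqualified.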
The individual target zones with their corresponding data points are shown in Figures 5, 6, and 7. Figure 5 shows a front view of the data points and target zones for the front unrounded vowels [i:, i, e:, e]. Figure 6 shows a front view of the data points and target zones for the front rounded vowels [y:, y, ø:, ø]. Figure 7 shows a front view of the data points and target zones for the back vowels [u:, u, o:, o, a:, a].

With respect to the front-back distinction, there is a clear separation between front vowels, on the one hand, and back vowels on the other. Front vowels occupy the APS quadrant with positive x' and y' values, while back vowels occupy quadrants with negative x' and either positive or negative y' values.
As for the unrounded-rounded distinction, the target zones for the front roundedvowels ([y:, y, 0:, 0]) fall behind those for the front unrounded ones ([i:, i, e:, e]).
* In addressing the issue of the irregular boundaries of the target zones. Miller (1987b)presented the 'third iteration' target zones for English, which classified 2051 voweldata points into the nine monophthongal categories of English with 95.8% accuracy.Of these 2051 points, 1420 were from Peterson and Barney (1952) and 631 werecollected at Central Institute for the Deaf. In order to examine the effect of highlyirregular borders on the percentage of correct classification. Miller (1987b) applied asmoothing algorithm to these target zones, which iteratively averaged the border linesevery three points. He applied this smoothing 50, 100, 400, and 1000 times. After100 times, for example, the target zone for [A] , which is highly irregular since in APSit is flanked by many adjacent target zones, became a three-dimensional oval-shapedobject with no appendages, about 0.25 log units long and 0.18 log units wide in frontview. The correct classifications for each of the smoothings are given in the followingtable as a percentage of the total 2051 points:
After about 1000 iterations, the zones collapsed into single points and the classifi-cation percentage approached zero. We did not apply this algorithm to the Gerniantarget zones because of the small number of data points. However, we are currently inthe process of extending our German database and we hope to be able to refine theGerman target zones as well as get some estimate of the effect of irregular boundariesusing perception experiments.
TABLE 2
Classification of German vowels by means of perceptual target zones. The rows show the intended vowels and the columns the target zones onto which they map

Produced  Mapped onto:                                                          Unclaimed  %
vowel     i:   i    e:   e    y:   y    ø:   œ    u:   u    o:   o    a:   a    areas      correct
i:        19   -    -    -    -    -    -    -    -    -    -    -    -    -    -          100
i         -    17   -    -    -    1†   -    -    -    -    -    -    -    -    1          89
e:        -    -    19   -    -    -    -    -    -    -    -    -    -    -    -          100
e         -    -    -    19   -    -    -    -    -    -    -    -    -    -    -          100
y:        -    -    -    -    19   -    -    -    -    -    -    -    -    -    -          100
y         -    2    -    -    -    10   1    -    -    -    -    -    -    -    6          53
ø:        -    -    -    -    -    1    18   -    -    -    -    -    -    -    -          95
œ         -    -    -    -    -    -    -    19   -    -    -    -    -    -    -          100
u:        -    -    -    -    -    -    -    -    18   -    -    -    -    -    1          95
u         -    -    -    -    -    -    -    -    -    18   -    -    -    -    1          95
o:        -    -    -    -    -    -    -    -    -    -    17   -    -    -    1          94
o         -    -    -    -    -    -    -    -    -    -    -    19   -    -    -          100
a:        -    -    -    -    -    -    -    -    -    -    -    -    19   -    -          100
a         -    -    -    -    -    -    -    -    -    -    -    -    -    18   1          95

Total correct: 249/265 = 94%
† The column of this single misclassified [i] token is not unambiguously recoverable from the source layout.
as shown in Figure 8. A comparison of the front unrounded and front rounded vowels in terms of z' values revealed that the front unrounded vowels have a mean z' value of approximately 0.70 log units, whereas the front rounded vowels are further back with a mean z' value of approximately 0.65 log units. The distinction between the front rounded and front unrounded vowels in terms of z' values eliminates the apparent overlap in the two-dimensional front view among the vowels [i:], [e:], and [y:]. As shown in Figure 4, the target zone for rounded [y:] seems to overlap with those for unrounded [i:] and [e:]. However, when the vowel slab is rotated 90 degrees into side view, it can be seen that these target zones do not overlap. Figure 9 shows such a side view of the data
Fig. 5. Data points and target zones for the German front unrounded vowels ([i:, i, e:, e]) shown in front view using slab coordinates. See Figure 2 for axis labels and units. Each symbol represents one vowel token produced by one speaker.
points for [i:] and [e:] (which overlap in side view only) and the points representing [y:]. As can be seen, all [y:] points are behind the data points for [i:] and [e:]. Thus, those unrounded-rounded vowel pairs that overlap in the front view can be seen to be nonoverlapping in the side view.
Finally, German tense and lax vowels map onto distinct target zones, with target zones for tense vowels generally having larger y' values than those for their lax counterparts. Since the value of y' is largely dependent on the value of z (see Equation 6), this finding is in accord with the traditional observation (e.g., Jørgensen, 1969) that tense (long) vowels tend to have a lower F1 (i.e., are less centralized) than their lax (short) counterparts.
In order to compare the performance of the APS to that of other classification schemes, we conducted linear discriminant analyses using combinations of F0, F1, F2, and F3, as well as the APS parameters x, y, and z, as the classificatory variables.* Linear
* We thank one of the reviewers, Terry Nearey, for making his linear discriminant analysis program available to us.
Fig. 6. Data points and target zones for the German front rounded vowels ([y:, y, ø:, œ]) shown in front view using slab coordinates. See Figure 2 for axis labels and units. Each symbol represents one vowel token produced by one speaker.
discriminant analysis is a technique to verify that apparent clusters are real, and to decide to which cluster a new data point should be assigned. The discriminant analysis method as applied to similar issues is described in Assmann, Nearey, and Hogan (1982), Syrdal (1985), and Syrdal and Gopal (1986). Briefly, this analysis uses group means to assign each individual token to a group on the basis of the estimated a posteriori probability of group membership. The results presented here are from the R (resubstitution) method of classification, which provides an index of the resolution into groups of the present data set. In addition, the a posteriori probability (APP) indicates the relative strength of group membership. For moderate sample sizes such as the present one, this index may be more informative than percent correct classification (Assmann et al., 1982).
The results of the discriminant analyses for the variables F0, F1, F2, and F3, as well as x, y, and z, are shown in Table 3. It can be seen that various combinations of these variables produce reasonably similar results. However, the addition of F0 to either the first two or the first three formants improves resolution, increasing the APP to 0.58 and 0.60, respectively. The x, y, z coordinate system yields a very similar performance,
Fig. 7. Data points and target zones for the German back vowels ([u:, u, o:, o, a:, a]) shown in front view using slab coordinates. See Figure 2 for axis labels and units. Each symbol represents one vowel token produced by one speaker.
TABLE 3
Results of discriminant analysis for German vowels

Variables         % correct     A posteriori probability (APP)
F1, F2            [illegible]   [illegible]
F1, F2, F3        [illegible]   [illegible]
F0, F1, F2        [illegible]   0.58
F0, F1, F2, F3    [illegible]   0.60
x, y, z           [illegible]   0.58
Fig. 8. Data points for the German front unrounded vowels ([i:, i, e:, e]) and the front rounded vowels ([y:, y, ø:, œ]) shown in side view using slab coordinates. The front unrounded vowels are represented by cubes, their rounded counterparts by spheres. In this orientation, the x' axis is perpendicular to the y'z' plane. The x', y', and z' axes are in 0.1 log units and the point of origin is (0, 0, 0). The front of the space is to the right, with the data points for the rounded vowels falling behind those for the unrounded vowels.
with an APP of 0.58. While the percent correct classification of the x, y, and z variables does not reach the level of the hand-drawn target zones, this may be due to the fact that tokens are assigned to groups on the basis of group means. For target zones that envelop other zones, the group means will be very similar. In this sense, then, linear discriminant analysis would not be able to differentiate points falling, for example, in the target zones for German [a] and [a:] (see Figure 7), since these zones would have similar means. Nevertheless, the present analysis shows that vowel classification on the basis of x, y, and z reaches a level comparable to that on the basis of either F0, F1, F2, or F0, F1, F2, and F3. For another approach to the comparison of the APT scheme to other normalization schemes (e.g., Koenig, mel, and Bark), see Miller (1989).
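The logic of the discriminant analysis discussed above, including the difficulty with zones that share a mean, can be sketched as follows. This is not the program used for Table 3: it assumes equal prior probabilities, a single pooled within-group covariance matrix, and an APP summarized as the mean posterior probability of the intended group. The function name `lda_resubstitution` and all data are hypothetical.

```python
import numpy as np

def lda_resubstitution(X, labels):
    """Classify every token with a discriminant built from the same tokens.

    Returns (percent correct, mean posterior of the intended group).
    Assumes equal priors and one pooled within-group covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)                      # sorted group names
    n, d = X.shape
    means = np.array([X[labels == g].mean(axis=0) for g in groups])

    # Pooled within-group covariance.
    pooled = np.zeros((d, d))
    for g, m in zip(groups, means):
        dev = X[labels == g] - m
        pooled += dev.T @ dev
    pooled /= n - len(groups)
    inv = np.linalg.inv(pooled)

    # Squared Mahalanobis distance of every token to every group mean.
    diff = X[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkd,de,nke->nk", diff, inv, diff)

    # A posteriori probabilities of group membership.
    log_post = -0.5 * d2
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    intended = np.searchsorted(groups, labels)
    pct = 100.0 * float(np.mean(post.argmax(axis=1) == intended))
    app = float(post[np.arange(n), intended].mean())
    return pct, app

rng = np.random.default_rng(1)

def cloud(centre, spread, n=200):
    """Hypothetical token cloud around a centre (log units)."""
    return rng.normal(loc=centre, scale=spread, size=(n, len(centre)))

# Three well-separated hypothetical categories.
separated = np.vstack([cloud([0.2, 0.5], 0.02),
                       cloud([-0.2, 0.5], 0.02),
                       cloud([0.0, 0.1], 0.02)])
separated_labels = np.repeat(["i", "u", "a"], 200)
pct_sep, app_sep = lda_resubstitution(separated, separated_labels)

# One zone enveloping another: same centre, different spread.
nested = np.vstack([cloud([0.0, 0.0], 0.02), cloud([0.0, 0.0], 0.10)])
nested_labels = np.repeat(["a", "a:"], 200)
pct_nest, app_nest = lda_resubstitution(nested, nested_labels)

print(pct_sep, round(app_sep, 3))
print(pct_nest, round(app_nest, 3))
```

In this sketch the separated categories are resolved almost perfectly, whereas the nested pair stays near chance because the two group means nearly coincide, which is the limitation noted above for [a] and [a:].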
Fig. 9. Data points for the German unrounded vowels [i:] and [e:] along with the data points for the German rounded vowel [y:] shown in side view using slab coordinates. See Figure 8 for axis labels and units.
DISCUSSION
The present study attempts to characterize the vowels of Greek and German by converting F0, F1, F2, and F3 values into log frequency ratios and plotting them in the APS. Representing the relatively simple vowel inventory of Greek in the auditory-perceptual space enabled us to describe the five Greek vowels with 100% accuracy. The data points for the five Greek vowels are enclosed by non-contiguous target zones. Representing the much more complex German vowel system in the three-dimensional space enabled us to describe the vowels with 94% accuracy. The data points for the 14 German vowels are enclosed by adjacent target zones, with little or no unclaimed space between them. This representation of the German vowels yielded good separation along the front-back, rounded-unrounded, and tense-lax dimensions.
In his spectrographic study of the German vowels, Jørgensen (1969) pointed out that instead of representing the vowels in an F1 by F2 space, it might be advantageous to take F3 into account, especially for languages with both unrounded and rounded front vowels. Since, according to Jørgensen, three-dimensional displays are usually quite hard to grasp, Jørgensen plotted the vowels in a two-dimensional space, using an effective F2' (Korlen and Malmberg, 1960), a measure that takes F3 into account. However, when Jørgensen plotted the German vowels in an F1 by F2' space, it did not enhance the separation of the unrounded and rounded front vowels at all, so that Jørgensen used an F1 by F2 space for the remainder of the paper (Jørgensen, 1969). Although Jørgensen did not quantify the amount of overlap in his two-dimensional space, it does seem that in the present analysis taking a third dimension (F3) into account as an independent dimension, rather than a derived dimension, substantially reduces the amount of overlap, particularly that between front unrounded and rounded vowels. Furthermore, the introduction of a low-frequency reference (SR) for speaker normalization and for the disambiguation of certain vowels may also have improved the classification of the 14 German vowels in the present study.
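The conversion of F0, F1, F2, and F3 into log frequency ratios, with SR as the low-frequency reference, can be sketched as below. The equations themselves are not reproduced in this section, so every formula here is an assumption based on the definitions usually attributed to Miller (1989): SR = 168(F0/168)^(1/3), x = log(F3/F2), y = log(F1/SR), z = log(F2/F1), base-10 logarithms, and slab coordinates obtained by a rotation that places z' along the main diagonal of the x, y, z space. The function names and the formant values are hypothetical.

```python
import math

def sensory_reference(f0_hz):
    """Assumed form of the SR equation: SR = 168 * (F0 / 168) ** (1/3)."""
    return 168.0 * (f0_hz / 168.0) ** (1.0 / 3.0)

def aps_coordinates(f0, f1, f2, f3):
    """Assumed log frequency ratios x, y, z (base-10 logs)."""
    sr = sensory_reference(f0)
    x = math.log10(f3 / f2)
    y = math.log10(f1 / sr)
    z = math.log10(f2 / f1)
    return x, y, z

def slab_coordinates(x, y, z):
    """Assumed rotation into slab coordinates x', y', z'.

    z' lies along the (1, 1, 1) diagonal, so it reduces to
    log(F3 / SR) / sqrt(3); y' is dominated by z.
    """
    xp = (y - x) / math.sqrt(2.0)
    yp = (2.0 * z - x - y) / math.sqrt(6.0)
    zp = (x + y + z) / math.sqrt(3.0)
    return xp, yp, zp

# Hypothetical tokens (Hz), not measurements from the appendix.
front = slab_coordinates(*aps_coordinates(120.0, 300.0, 2300.0, 3000.0))
back = slab_coordinates(*aps_coordinates(120.0, 320.0, 800.0, 2400.0))
print([round(v, 3) for v in front])
print([round(v, 3) for v in back])
```

Under these assumptions the hypothetical front token has a positive x' and the back token a negative x', and because z' depends only on F3 relative to SR, a lowered F3 moves a token toward the back of the slab.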
In addition, the three-dimensional space used in the auditory perceptual theory affords a way of comparing vowel systems across languages which is not subject to any of the objections brought forth by Disner (1980; 1986). Disner (1986) argues against normalization schemes which use mean formant frequencies as the correction factor when comparing vowel systems from different languages. She states that, when using overall formant frequency means for a language like French (or German) with front rounded vowels, the normalization procedure is likely to overnormalize the data, since there are more front vowels than back vowels. However, the auditory-perceptual theory does not use overall formant means as the correction factor for normalization. Thus, the vowels of each language can be mapped onto the same space, bounded by anatomical and physiological constraints, independent of vowel density in any area of the space.
It is possible, then, to represent and compare the vowels of all languages in a vowel space that is defined independently of vowel inventory. For example, the /i/ vowel of one language can be compared with the /i/ vowel of another language, regardless of the density of vowels in the language. It does seem to be the case that those vowel phonemes that the three languages discussed in this paper (Greek, German, and American English) have in common occupy similar locations in APS. Figure 10 shows a front view of APS with the three vowels that Greek, German, and American English have in common ([i, a, u]). For each language, each vowel is represented by a single point which is based on the average of the x', y', and z' values for all tokens of a particular vowel. The x', y', and z' values for American English [i, a, u] were taken from the database described in Miller (1989). As can be seen, these common vowels occupy similar regions in APS. However, it seems that the locations of these common vowels do, in fact, vary as a function of the phoneme inventory of each individual language, with the Greek vowels tending to be more centrally located and the German and American English vowels more peripherally located.
In general, it would seem advantageous for a given language to have vowels that are maximally distinct acoustically (see, for example, Liljencrants and Lindblom, 1972; Stevens, 1972; Lindblom, 1986) for reasons of communicative efficiency. Greek provides an example with its five vowels being quite far from each other in APS. Interestingly, five-vowel inventories similar to that of Greek are much more frequent than any other type of vowel inventory (Crothers, 1978; Maddieson, 1984). The question remains, then,
Fig. 10. Data points for three common vowels of Greek, German, and American English ([i, u, a]) shown in front view using slab coordinates. Each data point is the average of the x', y', and z' values of all tokens for each vowel in each language. See Figure 2 for axis labels and units.
how the vowels of languages with larger vowel inventories are organized in this same vowel space. The vowel spaces for German and American English are much more dense. It seems that the larger the vowel inventory, the more peripheral the location of the extreme vowels (in terms of x' and y'), relative to vowels of languages with smaller inventories.
As shown in Figure 10, this general trend holds for the vowels [i] and [u] of German, a language with 15 monophthongal vowels, as compared to those of American English, a language with 9 monophthongal vowels, which, in turn, are relatively more extreme than those of Greek, a language with 5 vowels. The exception is [a], which is most extremely located for American English. Of course, these preliminary findings will have to be replicated with a much larger sample.
In conclusion, the spectral characteristics of the five vowels in Modern Greek and 14 vowels in German were examined. For each vowel token, frequency values of F0, F1, F2, and F3 were obtained. A transformation was used to convert these measurements into log frequency ratios which were then plotted as points in the APS. Finally, target zones were drawn around points representing phonemically identical vowels. These target zones could correctly categorize the present Greek and German corpora with 100% and 94% accuracy, respectively. Using this preliminary set of data, the present approach appears to normalize for a variety of inter- and intra-speaker variables and allow a unique characterization of vowel sounds in different languages, as well as a way to compare vowels across languages. In this manner, such an approach may provide insights not only into the language-specific organization of vowel systems, but also into cross-language comparisons.
(Received March 6, 1989; accepted October 6, 1989)
REFERENCES
ASSMANN, P.F., NEAREY, T.M., and HOGAN, J.T. (1982). Vowel identification: Orthographic, perceptual, and acoustic aspects. Journal of the Acoustical Society of America, 71, 975-989.
CROTHERS, J. (1978). Typology and universals of vowel systems. In J.H. Greenberg, C.A. Ferguson, and E.A. Moravcsik (eds.), Universals of Human Language, Vol. 2: Phonology (pp. 93-152). Stanford: Stanford University Press.
DISNER, S.F. (1986). On describing vowel quality. In J.J. Ohala and J.J. Jaeger (eds.), Experimental Phonology (pp. 69-79). New York: Academic Press.
FANT, G. (1973). Speech Sounds and Features. Cambridge: MIT Press.
HOUSEHOLDER, F.W., KAZAZIS, K., and KOUTSOUDAS, A. (1964). Reference Grammar of Literary Dhimotiki. IJAL 30/2 Part II, Publications of the Indiana University Research Center in Anthropology, Folklore, and Linguistics, 31. Bloomington: Indiana University.
IIVONEN, A. (1987). A set of German stressed monophthongs analyzed by RTA, FFT, and LPC. In R. Channon and L. Shockey (eds.), In Honor of Ilse Lehiste (pp. 125-138). Dordrecht: Foris Publications.
JØRGENSEN, H. (1969). Die gespannten und ungespannten Vokale in der norddeutschen Hochsprache, mit einer spezifischen Untersuchung der Struktur ihrer Formantenfrequenzen. Phonetica, 19, 217-245.
KORLEN, G., and MALMBERG, B. (1960). Tysk Fonetik. Lund: Gleerups.
LILJENCRANTS, J., and LINDBLOM, B. (1972). Numerical simulation of vowel quality systems: The role of perceptual contrast. Language, 48, 839-862.
LINDBLOM, B. (1986). Phonetic universals in vowel systems. In J.J. Ohala and J.J. Jaeger (eds.), Experimental Phonology (pp. 13-44). New York: Academic Press.
MADDIESON, I. (1984). Patterns of Sounds. Cambridge: Cambridge University Press.
MILLER, J.D. (1987a). Auditory-perceptual processing of speech waveforms. In W.A. Yost and C.S. Watson (eds.), Auditory Processing of Complex Sounds (pp. 257-266). Hillsdale: Erlbaum.
MILLER, J.D. (1987b). Classification of vowel production by means of perceptual target zones: A response to Ladefoged and Studdert-Kennedy. Journal of the Acoustical Society of America, 82, Suppl. 1, S82.
MILLER, J.D. (1989). Auditory-perceptual interpretation of the vowel. Journal of the Acoustical Society of America, 85, 2114-2134.
MOULTON, W.G. (1962). The Sounds of German and English. Chicago: University of Chicago Press.
NEAREY, T.M. (1978). Phonetic Feature Systems for Vowels. Bloomington: Indiana University Linguistics Club.
PETERSON, G.E. (1952). The information bearing elements of speech. Journal of the Acoustical Society of America, 24, 629-637.
PETERSON, G.E., and BARNEY, H.L. (1952). Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 24, 175-184.
POLS, L.C.W. (1977). Spectral Analysis and Identification of Dutch Vowels in Monosyllabic Words. Soesterberg, The Netherlands: Institute for Perception TNO.
SHEPARD, R.N. (1972). Psychological representation of speech sounds. In E.E. David and P.B. Denes (eds.), Human Communication: A Unified View (pp. 67-113). New York: McGraw Hill.
STEVENS, K.N. (1972). The quantal nature of speech. In E.E. David and P.B. Denes (eds.), Human Communication: A Unified View (pp. 51-66). New York: McGraw Hill.
SYRDAL, A.K. (1985). Aspects of a model of the auditory representation of American English vowels. Speech Communication, 4, 121-135.
SYRDAL, A.K., and GOPAL, H.S. (1986). A perceptual model of vowel recognition based on the auditory representation of American English vowels. Journal of the Acoustical Society of America, 79, 1086-1100.
WIESINGER, P. (1983). Die Einteilung der deutschen Dialekte. In W. Besch, U. Knoop, W. Putschke, and H.E. Wiegand (eds.), Dialektologie: Ein Handbuch zur deutschen und allgemeinen Dialektforschung, Vol. 2 (pp. 807-900). Berlin: Walter de Gruyter.
WRIGHT, J.T. (1986). The behavior of nasalized vowels in the perceptual vowel space. In J.J. Ohala and J.J. Jaeger (eds.), Experimental Phonology (pp. 45-67). New York: Academic Press.
APPENDIX
Tables A1 and A2 list the fundamental frequency and formant frequency values for each of the vowel tokens plotted in the APS. Each measurement reported represents the geometric mean of the relevant parameter over the total duration of the steady-state portion of the vocalic section transcribed as the intended vowel. To replicate the plots in the main paper, one needs to convert the F0 values into SR values using Equation 4 and then use Equations 1 through 3, or Equations 5 through 7, to derive x, y, z or x', y', and z' coordinates, respectively. Table A1 lists the values for the five Greek vowels