Dept. for Speech, Music and Hearing, Quarterly Progress and Status Report. Word and sentence intonation: A quantitative model. Öhman, S. journal: STL-QPSR, volume: 8, number: 2-3, year: 1967, pages: 020-054. http://www.speech.kth.se/qpsr
B. WORD AND SENTENCE INTONATION: A QUANTITATIVE MODEL
In previous papers (1) we have suggested a functional model of pitch control in speech production (2). The present report summarizes the results of some further attempts to explore as well as to constrain the model by comparing it with empirical data. We shall concentrate here on the patterns of control that must be postulated to account for the Scandinavian word tones and that present certain neuro-motor implications (3). We will then briefly consider the model in relation to tone languages in general.
The discussions that we shall offer below are not meant as a definitive treatise of word and sentence intonation. The hypotheses to be proposed will probably have to be modified in the light of data that have not been available to the author and/or that are yet to be discovered. It is hoped that it will be possible to collect some of these data through experimental use of the tentative model outlined below. The purpose of this paper is thus to stimulate discussion and research.
Model
The main features of the model (4) are summarized in Fig. II-B-1. The fundamental frequency signal f0(t) is synthesized by a mechanism (labeled "larynx model") which accepts three types of input: 1) the time varying "vocal cord tension" g(t), which is the sum of two components, g_s(t) and g_w(t), where g_s(t) represents the sentence intonation and g_w(t) the word intonation, 2) an acoustic interaction signal arising from the secondary effects on f0 caused by fluctuations in sub- and supraglottal pressure (5), and 3) an articulatory interaction signal deriving from the non-phonatory movements of the hyo-thyroid lever system. These movements are due to certain articulatory gestures of the tongue (6).
The two signals g_s(t) and g_w(t) are the outputs of two filters that may have different properties. These filters are assumed to represent the dynamic characteristics of the mechanical and peripheral-neural components of the laryngeal system.
The inputs to the sentence and the word intonation filters are supposed to be built up from step functions with different amplitudes and onset times. These steps represent the discrete higher-level neural events that correspond to the linguistic intonation elements (levels, tones, etc.). In particular, it is assumed that only a finite "library" of step amplitude and timing configurations may be drawn upon in the construction of the f0-contour of an utterance. Empirical investigation involving systematic matching of the model to data must decide what these configurations are like.
[Fig. II-B-1. Functional model of larynx control. Block diagram: the sentence intonation inputs and the word intonation inputs pass through the sentence intonation filter and the word intonation filter, giving g_s(t) and g_w(t); their sum g(t), the acoustic interaction signal, and the articulatory interaction signal enter the larynx model, whose output is f0(t).]
It is necessary to formulate exactly the properties that should be incorporated into the boxes labeled "larynx model", "word intonation filter", and "sentence intonation filter" in Fig. II-B-1. The particular choices that have been made for the purposes of the present study are indicated in Fig. II-B-2. We will not discuss in detail the physiological facts that motivate these choices here (7). The following points should be mentioned, however.
1) The relationship f0 = e^g is suggested by the fact that the f0-contours of the same utterance spoken at different over-all pitch levels (pitch registers) give essentially parallel curves when plotted on a logarithmic frequency scale.
2) The effects of the articulatory and acoustic interactions are disregarded since we will avoid utterances containing voiceless and/or strongly obstructed sounds in the important parts of the intonation contours (8).
3) The form of the impulse responses of the word and sentence intonation filters is assumed to be t^n e^(-γt) in analogy with the general shape of the tension developed by striated muscle in a single twitch (9). It is to be expected that the mechanical interconnections in the larynx and the characteristics of the neuro-motor circuits do not cause the form of the response function to differ appreciably from that of muscle. This hypothesis is suggested by experience from movement analysis of lip activity in speech (10).
We repeat that the word and sentence intonation filters, which are jointly represented by g'(n, γ; t) in Fig. II-B-2, may have different impulse responses. This is to say that the constants γ and n must be determined separately for the sentence and word components through comparison with actual data (11). The bottom part of Fig. II-B-2 shows impulse responses for n=1 and n=2.
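The filter shape described above can be explored numerically. The following is a minimal sketch, not the program used in the study: it evaluates the impulse response t^n e^(-γt) and a closed-form step response. The scaling of the step response so that it tends to 1 is an assumption made here for convenience, and the function names are hypothetical.

```python
import math

def impulse_response(t, n, gamma):
    """Unscaled impulse response t**n * exp(-gamma * t); zero before t = 0."""
    if t < 0:
        return 0.0
    return t**n * math.exp(-gamma * t)

def peak_time(n, gamma):
    """Time of the maximum of t**n * exp(-gamma * t), from d/dt = 0."""
    return n / gamma

def step_response(t, n, gamma):
    """Response to a unit step, scaled (an assumption) to approach 1.

    Closed form of gamma**(n+1) / n! * integral_0^t x**n * exp(-gamma*x) dx
    for integer n >= 0.
    """
    if t <= 0:
        return 0.0
    tail = sum((gamma * t)**k / math.factorial(k) for k in range(n + 1))
    return 1.0 - math.exp(-gamma * t) * tail

if __name__ == "__main__":
    for n in (1, 2):
        print(f"n={n}: peak of impulse response at {peak_time(n, 8.0):.3f} s")
```

A larger γ gives an earlier peak and a faster step response; a larger n delays the peak for the same γ.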
Computer program
The measurements and calculations of the present study have been made by means of a Control Data 1700 as well as a PDP-7 (12) digital computer. The program used for measuring the f0-contours of human utterances has been described in STL-QPSR 1/1966 (13). A simplified version has also been written by Tjernlund for the CDC-1700. The numerical simulation of the model of Fig. II-B-2 was implemented by Öhman on the PDP-7 for visual analysis purposes. Another program described in the present issue of QPSR (sec. IV.B) has been written by Liljencrants (14)
[Fig. II-B-2. Functional model of laryngeal control in intonation. Detailed specification of one of the channels shown in Fig. II-B-1: a neuro-motor command (a step of amplitude A with onset time t_i) passes through the filter with impulse response g'(n, γ; t) to give the vocal cord "tension" g(t), which determines the fundamental frequency f0(t). The bottom part shows the impulse response g'(n, γ; t) as a function of time.]
acute accent is realized by a rising pitch during, and a glottal stop at the end of, the stressed vowel, and the grave accent consists of a pitch contour which is level during most of the first syllable and then rises abruptly at the end of the syllable (18). In Stockholm, on the other hand, the pitch is rising in acute accented vowels and falling in grave accented vowels, while the situation is almost reversed in Southern Swedish (Skåne) (19). In many of the dialects of Dalarna (Dalecarlia), finally, the pitch pattern is rising during the first vowel of both the acute and the grave accents, the difference being that the pitch drop starts earlier in the acute than in the grave accented words (cf., for instance, dialects No. 23 through No. 29 of Fig. II-B-3). Fig. II-B-3 gives a qualitative summary of the pitch patterns of Scandinavian one-word utterances, as established by Meyer (20).
In view of the phonological unity of the accent phenomenon it may be suspected that there is some common physiological basis behind its diverse phonetic realizations. If such a basis can be found, a study of its properties may suggest universal constraints on the intonation model.
Stockholm sentence intonation
The speech material of the present study consists mainly of non-compound Swedish accented words embedded in one of the two frames [seja igen], "to say again", or [de va ja sa:], "it was (that) I said". In an impressionistic description of the sentence intonation of these frames as spoken in the Stockholm dialect the pitch would be medium high on the first two syllables ([seja] or [de va]), high from the third syllable on, and low on the last and perhaps also on the second last syllable. Using a subjective scale with five levels (0-4) and assigning one level to each syllable of the frame we would write [334...40], and sometimes [334...20] or even [334...00], depending on the particular mode of pronunciation of the speaker (21).
These subjective data indicate that the sentence intonation part of the model should be adjusted in the following manner. The constant F0 is chosen so as to match the pitch of the first two syllables (subjective level 3). Somewhere in the neighborhood of the first syllable of the accented word a positive step is introduced into the sentence intonation filter. A second step of negative amplitude, representing the end-contour, is then added in the neighborhood of the penultimate syllable of the utterance (22). To go beyond this point, however, it is necessary to introduce the following TIMING ASSUMPTION: "The positive sentence intonation step starts at the beginning of the first syllable of the accented word" (23).
[Fig. II-B-3. Schematic acute and grave accent patterns of a hundred Scandinavian dialects according to E. A. Meyer: Die Intonation im Schwedischen, part II.]
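The two-step construction described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the original PDP-7 program: n = 1, a filter gain scaled so that a unit step tends to 1, f0 = F0·e^g, and step amplitudes and onset times (`STEPS`) that are hypothetical.

```python
import math

def step_response(t, gamma):
    """Unit-step response of the n = 1 filter, scaled to approach 1 (assumed)."""
    if t <= 0:
        return 0.0
    return 1.0 - (1.0 + gamma * t) * math.exp(-gamma * t)

def sentence_f0(t, F0, steps, alpha):
    """f0(t) = F0 * exp(g_s(t)), where g_s is the sum of the filtered steps.

    steps is a list of (amplitude, onset time in seconds) pairs.
    """
    g_s = sum(amp * step_response(t - onset, alpha) for amp, onset in steps)
    return F0 * math.exp(g_s)

# Hypothetical commands: a positive step at the start of the accented word
# (0.3 s) and a negative end-contour step near the penultimate syllable (1.0 s).
STEPS = [(0.3, 0.3), (-0.6, 1.0)]

if __name__ == "__main__":
    for ms in range(0, 1601, 200):
        t = ms / 1000.0
        print(f"{t:.1f} s  {sentence_f0(t, 100.0, STEPS, 8.0):6.1f} Hz")
```

Before the first step the contour stays at F0; it then rises toward a higher level and finally falls below F0 after the negative step.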
Model commands for Stockholm word intonation
The top part of Fig. II-B-4 shows a typical acute accent pattern (Stockholm dialect) corresponding to the utterance [de va mó:nen ja sa:] (24). Here the beginning of the first syllable of the accented word [mó:nen] coincides with the beginning of the closure for [m] as seen from the acoustic record. The superimposed smooth line in the top left part of the figure (curve marked M) shows the result of introducing two sentence intonation steps (one positive and one negative) in accordance with the considerations just discussed. The composite step configuration (marked I) is shown below the f0-contours.
It will be seen from the figure that the model contour fits the measured contour rather badly in the segment immediately to the right of the positive step (25). The error curve (marked E), which represents what is left in the measured contour after the model contour has been subtracted from it, displays a negative dip near the beginning of the first syllable of [mó:nen]. This dip cannot be removed without adding further steps into the model curve. It is a typical feature of the Stockholm acute accent pattern as indicated by an analysis of about a hundred contours, and it is often much more pronounced than the example of Fig. II-B-4 might suggest (26).
The situation is quite similar as regards the grave accent contours. This fact is demonstrated in the lower left part of Fig. II-B-4. Here a positive and a negative sentence intonation step have been introduced in the same manner as was done with the acute accent pattern. The error curve again displays a negative dip which occurs somewhat later in this case than in the previous case. Obviously, the mismatch between model and data cannot be removed without entering further steps into the model contour.
It was noted above that the impressionistic sentence intonation for both [de va mó:nen ja sa:] and [de va mò:nen ja sa:] would be recorded as /334410/ (or [3344x0], 0 ≤ x ≤ 4). Since the sentence intonation appears to be level in the accented syllables it may therefore be proposed that the two residual error contours of Fig. II-B-4 discussed above should be ascribed to the word intonation commands for the acute and the grave accents. The right part of Fig. II-B-4 shows the result obtained by entering a negative pulse (marked IW) of appropriate duration, amplitude and onset time into the word intonation filter simultaneously with the step configuration for the sentence intonation arrived at in the previous analysis (curve marked IS). The error has now decreased to 3 or 4 percent.
[Fig. II-B-4. Panels: ACUTE ACCENT: STOCKHOLM; GRAVE ACCENT: STOCKHOLM. Comparison of Stockholm accent patterns with curves calculated by means of intonation model. The pulses marked I, IS, and IW represent model outputs with the same input commands that were used to match the empirical data but with the model constants α and β both set to 1000.]
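As a rough numerical illustration of this construction, the sketch below adds a negative word intonation pulse to a positive sentence intonation step and converts the sum to fundamental frequency. It assumes n = 1 for both filters, a filter gain scaled so that a unit step tends to 1, f0 = F0·e^g with constant F0, and hypothetical amplitudes and timings; only the orders of magnitude of α and β follow the text.

```python
import math

def step_response(t, gamma):
    """Unit-step response of the n = 1 filter, scaled to approach 1 (assumed)."""
    if t <= 0:
        return 0.0
    return 1.0 - (1.0 + gamma * t) * math.exp(-gamma * t)

def pulse_response(t, onset, duration, gamma):
    """A rectangular pulse is a step up at onset and a step down at onset + duration."""
    return step_response(t - onset, gamma) - step_response(t - onset - duration, gamma)

def f0_contour(t, F0=100.0, A=0.3, alpha=8.0, B=-0.5, beta=15.0, T=0.05, D=0.15):
    """f0(t) for a sentence step at t = 0 and a word pulse starting at t = T.

    All default values are hypothetical; B < 0 makes the word pulse negative.
    """
    g_s = A * step_response(t, alpha)
    g_w = B * pulse_response(t, T, D, beta)
    return F0 * math.exp(g_s + g_w)

if __name__ == "__main__":
    for ms in range(0, 801, 100):
        t = ms / 1000.0
        print(f"{t:.1f} s  step only: {f0_contour(t, B=0.0):6.1f} Hz"
              f"  step + pulse: {f0_contour(t):6.1f} Hz")
```

The pulse produces a temporary dip below the step-only contour, after which the two contours converge again, which is the qualitative behavior ascribed to the word intonation command.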
Pulse theory of Scandinavian accents
The idea strongly suggests itself that the Stockholm tonal accents should be represented by a suitably tailored negative pulse fed to the word intonation filter at a certain moment relative to the sentence intonation step. This moment is early for the acute accent and late for the grave one (27). This scheme is summarized in Fig. II-B-5, the symbols of which should be self-explanatory.
The model response to inputs of the form of Fig. II-B-5 under systematic variation of the relevant parameters is shown in Fig. II-B-6. The labels indicate the parameters that were varied in each case shown. In the curves labeled A and α the value of B was zero (see Fig. II-B-5 and p. 22 for meanings of parameter names). In the curves labeled B, β, and D the value of A was zero. The remaining curves, finally, represent combinations of a typical sentence intonation step with a typical word intonation pulse, the relative timing of these two events being changed systematically. Fig. II-B-6 should give an idea of the possibilities of the scheme summarized in Fig. II-B-5.
A qualitative comparison of the patterns that can be synthesized using the scheme of Fig. II-B-5 with descriptions of the accentual pitch contours of a wide variety of Scandinavian dialects (cf. Fig. II-B-3) suggests that it may be possible to put all these dialectal manifestations on the same formula.
Model commands for Malmö word intonation
Fig. II-B-7 gives an example from Southern Swedish (Malmö) arranged in the same manner as that of Fig. II-B-4. The utterances of this example are [seja mó:nen ijen] and [seja mò:nen ijen]. Note that the end-contour of the sentence intonation differs from that of the Stockholm speaker. Also, a glottal stop occurs at the beginning of [ijen] in both utterances. The latter seems to be a general feature of Swedish phonology: a syllable beginning in a vowel may be preceded by a glottal stop (28).
[Fig. II-B-5. Input commands for Swedish accents: a sentence intonation step and a word intonation pulse, plotted against time.]
[Fig. II-B-6. Accent model: effects of six parameters. Model responses to input commands of Fig. II-B-5. Explanation in text.]
[Fig. II-B-7. Panels: MALMÖ: ACUTE ACCENT; MALMÖ: GRAVE ACCENT. Comparison of acute and grave accent patterns of the Malmö dialect with model generated curves (cf. text to Fig. II-B-4). The IS step starting at [ijen] is due to the fact that the latter word was stressed. The glottal stop at the beginning of [ijen], marked by the letter G both in the data curves and in the IW curves, is discussed in the text (p. 25).]
Evidently, the word intonation pulses for the Southern acute and grave accents differ from those of the Stockholm speaker in that the grave pulse starts earlier than the acute one. I.e., the temporal order of the Malmö acute and grave pulses is the reverse of that of the corresponding Stockholm pulses. Moreover, the Malmö acute has longer duration, D, and a steeper time course (greater value for β) than the Stockholm acute does.
These differences probably account for the fact that the subjective impressions of the directions of the accentual pitch movements are opposite in the two dialects (cf. Fig. II-B-8). In the Stockholm acute the pitch drop due to the negative word intonation pulse is completed during the initial consonant of the stressed syllable and the return of the pitch up to the sentence intonation level occurs during the vowel. This is heard as "rising pitch".
In the Malmö acute, on the other hand, the word intonation pulse starts only after the initial consonant. Thus the pitch will begin with an upward movement during the initial consonant and the very first portion of the vowel in response to the sentence intonation step. The negative word intonation pulse then makes the pitch turn in the downward direction during the later part of the vowel. Apparently, this configuration is perceived as "falling pitch".
Conversely, the onset of the Stockholm grave pulse is late enough to permit the pitch to rise in response to the sentence intonation step during the initial consonant and then to drop throughout most of the vowel so that the impression of a "falling pitch" results. At the end of the first vowel the pitch then turns back up towards the sentence intonation level. The relatively high pitch on the second syllable is correlated with an impression of "tertiary stress".
The Malmö grave, on the other hand, holds back the pitch rise due to the positive intonation step during the initial consonant and part of the initial vowel segment so that a "rising pitch" is heard. In fact, the Malmö grave accent of many speakers sounds very similar to the acute accent of many Stockholm speakers. These relationships are illustrated schematically in Fig. II-B-8.
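The dependence of the perceived pitch direction on pulse timing can be checked with a small sketch. It is only a schematic illustration: n = 1, the unit-gain scaling of the filters, the segment boundaries (consonant from 0 to 0.08 s, vowel from 0.08 to 0.28 s), and all amplitudes are assumptions, and the same pulse duration and β are used for both timings although the text reports different values for the two dialects. The net change of g across the vowel is used as a crude stand-in for "rising" versus "falling".

```python
import math

def step_response(t, gamma):
    """Unit-step response of the n = 1 filter, scaled to approach 1 (assumed)."""
    if t <= 0:
        return 0.0
    return 1.0 - (1.0 + gamma * t) * math.exp(-gamma * t)

def g_total(t, T, A=0.3, alpha=8.0, B=-0.5, beta=15.0, D=0.15):
    """Sentence step at t = 0 plus a negative word pulse starting at t = T."""
    g_s = A * step_response(t, alpha)
    g_w = B * (step_response(t - T, beta) - step_response(t - T - D, beta))
    return g_s + g_w

def vowel_pitch_change(T, vowel_start=0.08, vowel_end=0.28):
    """Net change of g (log frequency) across the vowel; > 0 means overall rise."""
    return g_total(vowel_end, T) - g_total(vowel_start, T)

EARLY = -0.10  # hypothetical: pulse begins before the initial consonant
LATE = 0.08    # hypothetical: pulse begins at the vowel onset

if __name__ == "__main__":
    print("early pulse, change over vowel:", round(vowel_pitch_change(EARLY), 3))
    print("late pulse,  change over vowel:", round(vowel_pitch_change(LATE), 3))
```

With these values the early pulse yields a net rise during the vowel and the late pulse a net fall, in line with the description of the Stockholm and Malmö acute accents.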
[Fig. II-B-8. Panels: ACUTE and GRAVE for each dialect, with consonant (C) and vowel (V) segments marked. Input commands suitable for Stockholm and Malmö accents. The pitch contour has been drawn with thick lines in the vowel segments.]
Perception of tones
It appears from this discussion that the perception of pitch movement in the Swedish accents may be based on a subjective measurement of fundamental frequency during the vowel segments only (29). This assumption appears reasonable also in view of the possibility of having accented words with all consonants voiceless. In a word such as [stik:at], about 700 msec long, the two voiced segments [i] and [a] may be about 80-150 msec each or even less. Yet the accent is immediately perceivable! (30)
Model commands for the Danish word intonation
The glottal stop typical of the Standard Danish counterpart of the Stockholm acute accent can be simulated in our model by introducing a fast and brief word intonation pulse late in the stressed syllable (31) (cf. Fig. II-B-9). The Danish "grave accent", on the other hand, describes a level pitch contour during most of the stressed syllable, followed by a rapid rise/fall pattern that starts just before the medial consonant and reaches its peak in the middle of the second vowel. This pattern is well matched by the command configuration of Fig. II-B-5, if the word intonation pulse is put at the beginning of the stressed syllable.
We must, accordingly, conclude that the order of the Danish word intonation pulses, like that observed in the Malmö dialect, is opposite to the order of the Stockholm word intonation pulses.
Model commands for North Swedish word intonation
It is interesting to contrast this situation with that of Northern Swedish (cf. Fig. II-B-10) where the acute accent pulse either is absent or is placed on an unstressed syllable immediately preceding the accented word. Many of these Northern dialects also have the so-called circumflex, i.e. monosyllabic words with the grave accent (cf. [bi:t] = "bite!" and [bî:t] = "to bite"). These forms have arisen through apocopation of the final vowel of a bisyllabic grave accented word (32).
That also the grave accent pulse is earlier in the North than in Stockholm follows from a comparison of the relative heights of the two pitch peaks of this accent: in the North the second peak is, in general, higher than the first, but in Stockholm the first peak is usually higher.
[Fig. II-B-9. Panels: "ACUTE ACCENT": COPENHAGEN; "GRAVE ACCENT": COPENHAGEN. Comparison of Danish accent patterns with model generated curves. Cf. Fig. II-B-4.]
[Fig. II-B-10. Panels: ACUTE ACCENT; GRAVE ACCENT. Comparison of North Swedish (Myckelgensjö) accents with model generated curves. Cf. Fig. II-B-4.]
[Fig. II-B-11. Panels: ACUTE ACCENT; GRAVE ACCENT. Comparison of Dalarna (Orsa) accent patterns with model generated curves. Cf. Fig. II-B-4.]
After having discussed the physiological correlates of the model elements we shall return to the Scandinavian tonal accents in a more systematic way and propose a tentative and necessarily incomplete theory as to their possible origins and development. Finally, a few comments will be made concerning tone languages outside of Scandinavia.
Glottal stop interpretation of word intonation pulses
The phonetic nature of the Danish "acute accent" suggests that all Scandinavian accents should be understood as variously timed glottal stops, only softer than the Danish ones (33). At the moment, this is an open question.
For one thing it is not quite clear what a glottal stop is, physiologically speaking. It is well known, however, that the muscles of the larynx are innervated by two distinct branches of the Xth cranial nerve, the recurrens and the laryngicus cranialis (34). The latter branch ends in the two parts of the crico-thyroid muscle (pars recta and pars obliqua) while the recurrens branch innervates the remaining intrinsic muscles of the larynx. It is believed that the crico-thyroid muscle is the one mainly responsible for pitch control in singing and in speech (34). It is thought to perform this function by causing a rotation of the thyroid cartilage about its joint on the lower posterior part of the cricoid cartilage, thereby stretching or relaxing the vocal cords.
The valve function of the larynx, on the other hand, evident in glottal stops and in the unvoicing gesture of truly voiceless consonants (this unvoicing gesture is apparently a sort of negative glottal stop!), is probably performed by the various adducing and abducing muscles. There is some evidence of reciprocal inhibition between the crico-thyroid system and parts of the adducing/abducing system (34).
In a small pilot experiment (35) the motor unit activity of the vocalis and the crico-thyroid muscles was recorded by means of thin concentric needle electrodes. The results will be summarized in a later section of this QPSR (sec. II.C). The main finding, however, is a brief phase of inhibition of crico-thyroid activity at the moment where the negative word intonation pulses would occur for the two accents (early for the acute accent and late for the grave). The ballistic character of this inhibitory phase is consistent with the glottal-stop theory of the accents.
In view of the relative sizes of the anatomical structures affected by the two laryngeal control systems, one would expect the pitch function to be more sluggish than the valve function. It is therefore interesting to observe that in matching the accent pitch patterns it has invariably been necessary to assign a value of about 8 to the model constant α and values exceeding 12 to β. Thus, in terms of these parameters, the word intonation filter of the model has to be at least 50 % faster than the sentence intonation filter.
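The "at least 50 %" figure can be read off from the impulse response. The short derivation below assumes, for illustration only, that both filters use the same exponent n; the symbols t_s and t_w for the two peak times are introduced here and are not in the original.

```latex
% Peak time of the impulse response t^n e^{-\gamma t}:
\frac{d}{dt}\bigl(t^{n} e^{-\gamma t}\bigr)
  = t^{\,n-1} e^{-\gamma t}\,(n - \gamma t) = 0
  \quad\Longrightarrow\quad t_{\text{peak}} = \frac{n}{\gamma}.

% Sentence filter (\gamma = \alpha \approx 8) versus word filter (\gamma = \beta > 12),
% with equal n assumed:
\frac{t_{s}}{t_{w}} = \frac{n/\alpha}{n/\beta} = \frac{\beta}{\alpha}
  > \frac{12}{8} = 1.5 .
```

Under this reading the word intonation filter reaches its peak in at most two thirds of the time taken by the sentence intonation filter.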
A relevant point in this connection is the pronunciation of the words meaning "yes" and "no" in the Stockholm dialect. Normally, these words are pronounced [ja':] and [nlj:] with the acute accent. But they can also be spoken bisyllabically and with the grave accent, [j;:a] and [&:El. In emphatic speech they are produced with glottal stops, [j&l and [n~E].
Whispered accents
Surprisingly enough, the word accent distinction does not disappear in whispered speech (36). Under this condition the resonances of the vocal tract are excited by a turbulent noise source generated by the incompletely closed glottis. At moderate Reynolds numbers the spectral peaks of this noise are tuned to the vocal tract formants (37). It is well known that the resonance frequencies of a tube open at both ends (both at the lips and at the glottis, in the case of the vocal tract) will be higher than those of a tube closed at one end, provided that the shape of the tube is unchanged. If the word accents involve a brief narrowing of the glottis orifice, a downward movement of the "noise pitch" would therefore be expected. Evidently, a decrease in noise intensity also accompanies the accent gesture. Effects of this type may convey the subjective word accent impression in whispered speech.
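The tube argument can be made concrete with the textbook formulas for a uniform, lossless tube. The sketch below is an idealization: the tube length of 0.17 m and the sound speed of 340 m/s are illustrative assumptions, not values from the report, and a real glottis is never fully open or fully closed.

```python
def open_open_resonances(length_m, count=3, c=340.0):
    """Resonances of a uniform tube open at both ends: f_k = k * c / (2 * L)."""
    return [k * c / (2.0 * length_m) for k in range(1, count + 1)]

def closed_open_resonances(length_m, count=3, c=340.0):
    """Resonances of a uniform tube closed at one end: f_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, count + 1)]

if __name__ == "__main__":
    L = 0.17  # assumed vocal tract length in metres
    print("open at both ends:", [round(f) for f in open_open_resonances(L)])
    print("closed at one end:", [round(f) for f in closed_open_resonances(L)])
```

Each resonance of the open-open tube lies above the corresponding resonance of the closed-open tube, so a narrowing of the glottis moves the resonances toward the lower set, consistent with the expected downward movement of the "noise pitch".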
Prominence of accented syllables
A brief comment is due also regarding the impressionistic assignment of prominence levels to the syllables affected by the word accents. Using a subjective scale of five levels (0-4), traditional phoneticians put [40] for acute accented words of the Stockholm dialect such as [mó:nen] and [32] for grave accented words such as [mò:nen] (38). I.e., the second syllable of the grave accented word is perceived as being more prominent than the corresponding syllable of the acute accented word.
In terms of the pulse theory (refer to upper part of Fig. II-B-8) the acute accent pulse occurs early in the first syllable while the grave accent pulse is late. The laryngeal effort needed to return the pitch to the sentence intonation level after the negative accent pulse has occurred will therefore be greater in the grave accent case, since the pitch contour without the accent pulse would be close to its asymptote near the end of the first vowel. Moreover, this effort must be expended during most of the second syllable in the grave case in contradistinction to the acute case where it occurs during the first syllable. It may be that the perceived degrees of prominence are at least partly based on these relationships (39).
PHYSIOLOGICAL INTERPRETATION OF SENTENCE INTONATION COMMANDS
We have as yet only studied the tonal accents in sentence positions characterized by essentially constant subglottal pressure, P_s, and constant "average pitch level". However, a sentence intonation contour may be subdivided into consecutive phrases each of which is grossly characterized by an initial rise, a flat peak medially, and (usually) a fast drop at the end of both the P_s- and the f0-contours (40). (The simple declarative sentence should be regarded as a single phrase.)
If one assumes, as we have done, that F0 is constant (cf. p. 22 et sqq.) then one is forced to conclude that while the sentence intonation steps are positive in the pre-peak portion of the phrase, they should be negative when occurring after the point where the peak is desired in order to bring about the typical pitch drop at the end of the over-all f0-contour. If this were so, the tonal accents could not be generated by means of a single negative word intonation pulse in the falling part of the f0-contour, for the accentual pitch patterns do not lose their familiar shape in these positions. In particular, the Stockholm grave accent always starts with a rise or at least a strong reduction in the rate of fall in pitch. But no combination of a negative sentence intonation step with a negative word intonation pulse could bring this effect about.
This problem can apparently be solved by assuming that
1) the model parameter F0 is not constant throughout the phrase but describes a slow rising-falling movement typical of the breath-group, henceforth to be called the basic phrase contour, the detailed properties of which are yet to be determined by empirical measurements.
2) the sentence intonation commands that are added to the basic phrase contour are not steps, as assumed previously, but pulses. These pulses, which we will call phonatory stress pulses (41), have a duration of the order of the syllable and are always of positive amplitude.
3) a phonatory stress pulse of some non-zero amplitude is entered at the beginning of every stressed syllable according to a strategy that will be described below.
4) the word intonation is simulated by means of negative word intonation pulses in the same manner as proposed previously (p. 25).
Note that requirement 2) guarantees that the appropriate word tone patterns can be synthesized with the model also in the falling part of the over-all intonation contour. Since the phonatory stress pulse always has positive amplitude it will always bring about an upward inflection of the f0-contour. On the other hand, since it is a pulse of finite duration it will only cause a local perturbation on the basic phrase contour so that the over-all form of the latter is not too drastically changed.
Properties of the basic phrase contour
As was noted above, the detailed properties of the deepest of the intonation processes, i.e. the basic phrase contour, must be established by means of phonetic measurements. Given sufficient information about the general properties of phonatory stress pulses and word intonation pulses it should be possible to take f0-contours from human utterances, subtract off the pitch modulations due to stress and to word tones, and thereby obtain the basic phrase contour as a residue (with a superimposed ripple due to noise as well as to articulatory and acoustic interactions). This procedure is illustrated in Fig. II-B-12.
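The residue procedure can be sketched as follows. This is one possible reading, not the procedure actually used for Fig. II-B-12: it assumes the relation f0 = F0(t)·e^g, so that removing the modeled stress and word tone modulations amounts to dividing the measured contour by e^g (a subtraction on a logarithmic scale). The function name and the sample values are hypothetical.

```python
import math

def basic_phrase_contour(measured_hz, g_model):
    """Estimate F0(t) from measured f0 samples and the modeled g = g_s + g_w.

    Under the assumed relation f0 = F0 * exp(g), dividing each measured
    sample by exp(g) leaves the basic phrase contour (plus any ripple that
    the model does not account for).
    """
    if len(measured_hz) != len(g_model):
        raise ValueError("contours must have the same number of samples")
    return [f / math.exp(g) for f, g in zip(measured_hz, g_model)]

if __name__ == "__main__":
    # Synthetic check: build a "measured" contour from a known phrase contour.
    phrase = [100.0, 110.0, 105.0]   # hypothetical F0(t) samples in Hz
    g = [0.0, 0.2, -0.1]             # hypothetical stress + word tone modulation
    measured = [p * math.exp(x) for p, x in zip(phrase, g)]
    print([round(v, 1) for v in basic_phrase_contour(measured, g)])
```

With real data the residue would also contain the superimposed ripple mentioned above, so some smoothing would probably be needed before interpreting it.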
Although the study of the basic phrase contour, as defined above, has not yet proceeded very far, it is nevertheless possible to distinguish three typical patterns: that of the terminative mode, that of the continuative mode, and that of the elicitative mode of pronunciation. The terminative mode is characterized by a steeply falling end-contour that starts close to the end of the latest stressed word of the utterance and is normally used in simple declarative sentences. The continuative mode has a level end-contour and is normally used to indicate that more phrases will follow. The elicitative mode, finally, has a rising end-contour (often followed by a fast drop) and is used in questions and to express surprise.
[Fig. II-B-12. Time axis in seconds. Curve I: Measured pitch contour of the utterance [:, dEn: &r:ancn va m&:m m&: ad], "and that morning the man was murdered", spoken in the elicitative mode (Stockholm dialect). Curve II: Model response to the four phonatory stress pulses shown in curve V. For all of these input pulses the amplitude was 0.5, the duration 0.3 sec, and the filter constant α = 3. Curve III shows the result of adding the word intonation pulses of curve VI to the input. The latter pulses are all identical, with B = 0.8, D = 0.15 sec, and β = 15. Curve IV is the difference between Curve I and Curve III and probably represents the basic phrase contour. Note the typical elicitative end-contour.]
Secondly, the increase of physiological intensity in the articulatory channel would bring about faster and more vigorous movements of the tongue, lips, velum, and jaw, as well as of the articulatory components of the larynx. As a consequence of this, the stressed syllables would be characterized by a lower degree of coarticulatory overlap between the abutting consonant and vowel gestures (45). Phonetic distinctions would therefore be sharper in these syllables.
Finally, the physiological intensity increase in the phonatory channel will, as already noted, bring about a positive inflection in the f0-contour through the introduction of a stress pulse at the input of the sentence intonation filter.
The pitch rise due to the phonatory stress pulse is positively
correlated with the length of the syllable on which it occurs. However,
as was noted above, the duration of equally stressed syllables is
smaller in the middle of the phrase, i.e., where $I_\varphi(t)$ is assumed to
have greater values, than at the end of the phrase, where $I_\varphi(t)$ has
smaller values. It is therefore surprising at first that an increase
in $I_\varphi(t)$, due to the introduction of a stress pulse, should increase
rather than decrease the duration of the stressed syllable.
This apparent contradiction may be explained in terms of auditory
prominence, a perceptual quality of syllables. It may be assumed that,
everything else being equal,
5) the faster the pitch rises in a syllable (or the less steeply it drops), the greater will the perceptual prominence be, and
6) a longer syllable is perceived as more prominent than a shorter syllable (46).
The lengthening of stressed syllables may now be seen as having
two functions, namely, to increase the perceptual prominence of the
stressed syllable, and to prevent the following syllable from becoming
more prominent than it should be. The second syllable would become
too prominent if the pitch rise due to the phonatory stress pulse were
not given time to be completed during the stressed syllable but
continued to rise in the following syllable (47).
Physiological energy
The integral of the physiological intensity function $I_\varphi(t)$ taken over
the time interval of the entire phrase might be given the name total
physiological energy, $E_T$. The inverse relationship between syllable
duration (disregarding the effects of stress) and the basic phrase
contour, commented on earlier, suggests that, for unstressed syllables,
the integral of $I_\varphi(t)$ over the time segment of the syllable is
approximately constant.
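The constant-energy idea for unstressed syllables can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the intensity function `hypothetical_intensity` (larger in the middle of the phrase than at its edges) and the function name `equal_energy_boundaries` are invented here and are not part of the model as published. It places syllable boundaries so that each syllable receives the same integral of the intensity function.

```python
import math


def equal_energy_boundaries(intensity, t_end, n_syllables, steps=10000):
    """Return boundary times such that every syllable spans the same
    integral of `intensity` (trapezoidal integration on a fixed grid)."""
    dt = t_end / steps
    cumulative = [0.0]
    for k in range(steps):
        left = intensity(k * dt)
        right = intensity((k + 1) * dt)
        cumulative.append(cumulative[-1] + 0.5 * (left + right) * dt)
    total = cumulative[-1]

    boundaries = [0.0]
    k = 0
    for j in range(1, n_syllables):
        target = total * j / n_syllables
        while cumulative[k + 1] < target:
            k += 1
        # Linear interpolation inside grid step k.
        frac = (target - cumulative[k]) / (cumulative[k + 1] - cumulative[k])
        boundaries.append((k + frac) * dt)
    boundaries.append(t_end)
    return boundaries


def hypothetical_intensity(t, t_end=2.0):
    """Assumed intensity: greatest in the middle of a 2-second phrase."""
    return 1.0 + math.sin(math.pi * t / t_end)


bounds = equal_energy_boundaries(hypothetical_intensity, 2.0, 5)
durations = [b - a for a, b in zip(bounds, bounds[1:])]
print([round(d, 3) for d in durations])
```

With this assumed intensity the middle syllable comes out shorter than the first and last ones, which is the qualitative relation between duration and phrase position described in the text.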
Moreover, preliminary experiments indicate that when one of the
syllables of an utterance is given emphatic stress two things happen:
the over-all pitch level of the remaining syllables drops in proportion
to the degree of emphasis, and the pitch modulation due to stress and
word intonation in the non-emphatic syllables becomes de-emphasized.
This suggests that the physiological energy that goes into the emphasis
is subtracted from the energy that otherwise would go into the
non-emphatic syllables.
A conservation principle of a similar kind is apparently implied by
one of the basic devices in Chomsky's and Halle's generative
formulation of American English phonology (48). According to this theory
the simplest way of describing the stress patterns possible in the
American English phonological phrase is to postulate a (probably
universal) type of rule that (a) changes the stress of a syllable to primary
stress and at the same time (b) reduces each of the stresses of the
remaining syllables of the phrase by one degree. The order in which the
primary stresses are introduced into the phrase thus determines the
stress pattern as a whole, and this order is a unique function of the
morphemic and syntactic structure of the phrase and of the rules.
Thus the difference in stress pattern on compound Nouns as compared
with Noun Phrases consisting of an Adjective followed by a Noun, as well
as a great many other prosodic phenomena, may be described in a very
simple and elegant fashion (49).
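The rule type described above (promote one syllable to primary stress, weaken every other stressed syllable by one degree) is easy to state as a toy procedure. The following sketch is an illustration only, not the Chomsky-Halle rule system itself: the function names are hypothetical, stress degrees are integers with 1 as primary, and `None` marks an unstressed syllable that the rule leaves alone.

```python
def assign_primary_stress(stresses, index):
    """Give syllable `index` primary stress (1) and weaken every other
    stressed syllable by one degree; unstressed syllables (None) stay as is."""
    return [
        1 if i == index else (s + 1 if s is not None else None)
        for i, s in enumerate(stresses)
    ]


def derive(initial, order):
    """Apply the rule once for each position in `order`, in sequence."""
    pattern = list(initial)
    for index in order:
        pattern = assign_primary_stress(pattern, index)
    return pattern


# Three syllables that each start with primary stress.
words = [1, 1, 1]
pattern_a = derive(words, [2, 0])  # [1, 1, 1] -> [2, 2, 1] -> [1, 3, 2]
pattern_b = derive(words, [0, 2])  # [1, 1, 1] -> [1, 2, 2] -> [2, 3, 1]
print(pattern_a, pattern_b)
```

The two orders of application give different final patterns, which is the sense in which the order of introduction determines the stress pattern as a whole.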
On the basis of these hints, the following hypothesis may be proposed.
7) In normal conversation the total physiological energy $E_T$ of a phonological phrase, including the energies of stressed syllables, depends mainly on the number of syllables of the phrase.
Synthesis strategy for intonation contours
We are presently experimenting with strategies for the synthesis
of f0-contours that obey assumptions (1-7) of pp. 31, 32, 34, and 35.
One of these strategies, summarized below, is a phonetic analogue of
the Chomsky-Halle sequential procedure mentioned earlier.
We have, so far, only been working with simple declarative
sentences in the terminative mode. It is assumed that the basic phrase
contour starts with a slow rise from about 90 Hz toward 110 Hz. This
rise is turned into a slow drop at a point corresponding to the boundary
between the last syllable of the noun-phrase constituent and the first
syllable of the verb-phrase constituent of the sentence. This point will
henceforth be called the turn-over point. The end-contour, finally, is
assumed to start at the beginning of the syllable that immediately
follows the latest stressed syllable of the sentence. If the last syllable
of the sentence is stressed, the end-contour starts in its middle.
The main steps of the procedure are as follows.
SI 1) Assume initially that all syllables are unstressed, calculate a basic phrase contour $F_0(t)$ for this condition, and calculate the locations of the syllable boundaries on the contour.
This step is carried out using the two assumptions that (with
$f_0(t) = e^{g(t)}$, $g(t) = g_b(t) + g_s(t) + g_w(t)$, i.e., $F_0(t) = e^{g_b(t)}$)
$g(t)$ is proportional to the physiological intensity $I_\varphi(t)$, and that, initially,
each syllable is given the same physiological energy $E_s$. If the
noun-phrase constituent contains $n_1$ syllables and if the latest stress of the
sentence occurs on the $n_2$'th syllable, then the turn-over point, $t_1$,
and the time coordinate, $t_2$, of the onset of the end-contour can be
found by solving the equations
...; $E_s = \text{constant}$
for $t_1$ and $t_2$. Here $F_0(t) = e^{g_b(t)}$ is the basic phrase contour of step
SI 1) synthesized according to the principles already described.
The following steps involve a perturbation of the basic phrase
contour obtained from SI 1), both with respect to frequency values and
with respect to the durations of the syllables, according to the
distribution of stresses within the phrase. The main condition is that the
total energy of the phrase remains unchanged. Thus,
SI 2) Specify the sequence in which stresses are to be introduced into the phrase.
SI 3) Calculate the energies of the stresses according to a procedure that will be described in a moment.
SI 4) Calculate the amplitudes of the phonatory stress pulses using the result of SI 3).
SI 5) Recalculate the basic phrase contour and resegment it into syllables.
SI 6) Introduce phonatory stress and word intonation pulses.
The procedure referred to in step SI 3) is as follows. After step
SI 1) each syllable has been given the constant energy $E_s$. If the phrase
contains n syllables, then the total energy $E_T$ is equal to $nE_s$. To
obtain the amplitudes of the phonatory stress pulses and the new durations
of the syllables, we introduce the concept of stress. A stress is a
quantum of energy, $E_i^k$, associated with a syllable at a certain stage of
the calculation. The stresses are introduced one at a time. For $k \ge i$,
$E_i^k$ refers to the value (at the k'th stage) of the energy quantum
introduced at the i'th stage. The subscript zero, however, is reserved
for the energy of the basic phrase contour. Thus $E_0^k$ is the energy of
this contour at the k'th stage.
The procedure indicated in SI 3) may now be summarized by means
of the following formulas (a sample derivation is given on p. 38).
(SI 3:2) $E_k^k = \text{constant}$ for $k > 0$
(SI 3:4) $E_0^{k+1} = E_T - \sum_{i=1}^{k+1} E_i^{k+1}$
Thus, as stresses are introduced one by one into the phrase, the
energies of the basic phrase contour and of the stresses introduced
earlier are successively reduced. In particular, the energy $E_i^{k+1}$
of the i'th stress after the (k+1)'st stage (k+1 > i) is only a fraction
$c(1 - 2^{-k})$, $0 \le c \le 1$, of the value it had after the k'th stage.
Suppose the calculation is terminated after the m'th stage. We
may now determine for any given syllable of the phrase how many
stresses have been associated with it during the calculation as well as
the energies of these stresses. The sum of these energies is the
energy that goes into the phonatory stress pulse to be introduced on the
syllable in question. This is what we need to carry out steps SI 4)
and SI 5).
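The bookkeeping of step SI 3) can be sketched in a few lines. This is a tentative reading rather than the author's program: it assumes that each newly entered stress receives the same quantum of energy, that every earlier stress is scaled by the fraction c(1 - 2^-k) when the (k+1)'st stress is entered, and that the basic phrase contour keeps whatever is left of the total energy. The numerical values of `e_s`, `quantum`, and `c` are hypothetical, as is the function name.

```python
def stress_energies(n_syllables, stress_sequence, e_s=1.0, quantum=1.0, c=0.5):
    """Return (energy left in the basic phrase contour,
    {syllable number: summed stress energy}) after all stages.

    Assumed reading of SI 3): the total energy n * e_s is conserved, every
    new stress enters with `quantum`, and earlier stresses shrink by the
    factor c * (1 - 2**-k) when stage k + 1 is entered.
    """
    e_total = n_syllables * e_s
    energies = []  # energies[i - 1]: current value of the quantum of stage i
    for stage in range(1, len(stress_sequence) + 1):
        if stage > 1:
            k = stage - 1
            factor = c * (1.0 - 2.0 ** (-k))
            energies = [e * factor for e in energies]
        energies.append(quantum)

    e_basic = e_total - sum(energies)
    per_syllable = {}
    for syllable, energy in zip(stress_sequence, energies):
        per_syllable[syllable] = per_syllable.get(syllable, 0.0) + energy
    return e_basic, per_syllable


# Seven syllables; stresses entered on S3, S6, S1, and S3 again.
e_basic, per_syllable = stress_energies(7, [3, 6, 1, 3])
print(e_basic, per_syllable)
```

A syllable that is stressed at more than one stage (S3 in this run) accumulates the surviving parts of all its quanta, which corresponds to the summation described in the paragraph above.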
Table. Example of Derivation of Stress Energies (with c = 49). Columns: the sequence of syllables S1 to S7 in the phrase. Rows: the successive stages of the calculation (enter stress on S3; enter stress on S6; enter stress on S1; enter stress on S3), followed by the energy to be added to each stressed syllable. The individual cell entries are not recoverable here.
The Scandinavian accent orbit
Let us now reconsider the Scandinavian word tones in the light of
the modified concept of sentence intonation that was discussed in the
previous section. Our model now provides us with three types of
building block in terms of which the f0-contours of human utterances
are to be analyzed. These building blocks are a) the basic phrase
contour, b) the (positive) phonatory stress pulses, and c) the
(negative) word intonation pulses. We have also suggested that the basic
phrase contour is related to the breathing cycle, that the stress pulses
reflect the insertion of a quantum of physiological energy into the speech
production system as a whole, and that the word intonation pulses
correspond to more or less tense glottal stops.
Since we may assume that the basic phrase contour is level in the
neighborhood of the test word of utterances like [ d ~ va mb:n&n ja sa:], etc., the command configuration of Fig. 11-B-5 that was suggested as
the basic formula for all Scandinavian word tones need be changed
only in one respect. The "sentence intonation step" of amplitude +A
is now replaced by a phonatory stress pulse. This change is equivalent
to adding a step of amplitude -A to the sentence intonation command of
Fig. 11-B-5 near the second consonant (cluster) of the syllable.
In all cases that we have analyzed (Figs. 11-B-4, 11-B-7, 11-B-9,
11-B-10, and 11-B-11) the off-set of the phonatory stress pulse would
be very close to the end-contour of the basic phrase pattern. The
negative sentence intonation step that was previously used to generate
the terminal contour may therefore be regarded as including both the
off-set of the phonatory stress pulse and the end command of the basic
phrase contour. The modification of the model just introduced
consequently does not alter our earlier conclusion that the Scandinavian
word tones may be simulated by means of appropriately timed word
intonation pulses of negative amplitude.
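The interplay of a positive phonatory stress pulse and a negative word intonation pulse can be sketched as follows. This is a rough illustration and not the larynx model itself: it assumes that each channel behaves as a first-order low-pass filter, that the basic phrase contour is level at 100 Hz, and that f0(t) = exp(g(t)). The pulse amplitudes, durations, and filter constants are the values quoted in the caption of Fig. 11-B-12 (.5, .3 sec, 3 for the stress pulse; .8, .15 sec, 15 for the word intonation pulse); the onset times and function names are hypothetical.

```python
import math


def pulse_response(t, onset, duration, amplitude, rate):
    """Response at time t of an assumed first-order low-pass filter
    (rate constant `rate`) to a rectangular pulse."""
    if t < onset:
        return 0.0
    if t < onset + duration:
        return amplitude * (1.0 - math.exp(-rate * (t - onset)))
    peak = amplitude * (1.0 - math.exp(-rate * duration))
    return peak * math.exp(-rate * (t - onset - duration))


def f0(t, word_pulse_onset=None, base_hz=100.0):
    """f0(t) = exp(g_b + g_s(t) + g_w(t)) with a level basic contour."""
    g = math.log(base_hz)                       # assumed level phrase contour
    g += pulse_response(t, 0.2, 0.3, 0.5, 3.0)  # phonatory stress pulse
    if word_pulse_onset is not None:            # negative word intonation pulse
        g += pulse_response(t, word_pulse_onset, 0.15, -0.8, 15.0)
    return math.exp(g)


times = [i * 0.01 for i in range(101)]
plain = [f0(t) for t in times]
accented = [f0(t, word_pulse_onset=0.3) for t in times]
print(round(max(plain), 1), round(min(accented), 1))
```

Shifting `word_pulse_onset` earlier or later moves the dip relative to the stress-induced rise, which is the kind of timing manipulation the text uses to distinguish the accent patterns.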
The pulse character of the phonatory stress command suggests a
very natural interpretation of the difference between the Stockholm and
the North Swedish grave accent patterns. As was noted earlier, the
second peak of the Stockholm grave accent has a lower frequency value
than the first peak, whereas this relationship is reversed in many
North Swedish dialects. This difference may be obtained in the model
by putting the accent pulse slightly earlier or slightly later than the
pitch peak that would result from the phonatory stress pulse alone.
This effect is illustrated in Fig. 11-B-13.
We have already noted that the difference between the Orsa acute
and grave accent patterns may be simulated by entering a word
intonation pulse close to the beginning of the second syllable of the acute
accented word while the grave pattern is synthesized by means of the
phonatory stress pulse and the basic phrase contour only. It is
interesting that the same statement holds for some of the (geographically
distant) Gotland dialects. According to most authorities, the Central
Dalarna and the Gotland dialects are closer to Proto-Scandinavian (53)
than any of the other modern Scandinavian tongues, also as regards
phonological structure and vocabulary.
To judge from Meyer's data some of the Dalarna dialects show a
different type of contrast between the acute and the grave patterns (cf.
e.g. dialect No. 18 of Fig. 11-B-3) than those observed in Orsa. Here
it is more probable that the acute pattern should be synthesized by
means of the phonatory stress pulse and the basic phrase contour only,
while the grave pattern would be obtained by also entering a word
intonation pulse at the beginning of the first syllable of the accented word.
If we take the Rättvik dialect (No. 27 of Fig. 11-B-3) as our
starting point we may accordingly visualize two ways of sharpening the
accent contrast. One way is to enter a word intonation pulse on the
falling tail of the acute contour. We may then gradually move this
pulse from the right into the first syllable of the word. The other
way is to enter a word intonation pulse on the rising ramp of the
Rättvik grave pattern. We may then gradually move this pulse from
the left toward the end of the first syllable of the word.
In Fig. 11-B-14 we have rearranged a part of Meyer's data so as
to demonstrate that dialects may be found that correspond to the
successive stages of these two gradual processes. The Rättvik dialect
will be found in the leftmost part of the "accent orbit". Following
the orbit downward from this point on corresponds to the
first-mentioned of these two processes, and the second process is represented
by the opposite direction. (In each subgraph of Fig. 11-B-14 the left
curve represents the acute accent and the right curve represents the
grave pattern.)
Fig. 11-B-13. Model outputs for different combinations of input commands. Curve I: Phonatory stress pulse + terminative end-contour. Curve II: Terminative end-contour only. Curve III: Word intonation pulse + terminative end-contour. Curves IV-VIII: Phonatory stress pulse + terminative end-contour + word intonation pulse moved stepwise to the right. Note difference in height of the two pitch peaks of curves V and IV.
Fig. 11-B-14 was arranged on purely phonetic principles suggested
by our model, and not all dialect pairs that are adjacent in the figure
are also contiguous geographically. In fact, the dialects do not appear
to be distributed along a single geographic orbit, but along several.
It is too early to make definitive statements about the exact geographic
course of these orbits since a sufficiently dense sampling of the
dialects has not yet been carried out over the whole area.
However, Meyer himself emphasized (54) the dialectal continuity
along a line that starts in Central Dalarna and moves South-East into
Uppland. Examination of Meyer's data suggests that this line divides
itself into a Northern and a Southern branch in Uppland. Branch
points are also indicated in other parts of the area.
The phonetic importance of investigating these relationships in
detail should be stressed. The hypothesis of an accent orbit together
with the possibility that there are chains of geographically contiguous
dialects that correspond to various sectors of the accent orbit of Fig.
11-B-14 implies, for instance, that the acute and the grave accent
pulses do not move independently of each other as the dialects develop,
but that certain temporal and other qualitative constraints are obeyed
so that dialects not in contact may develop in similar ways for
intrinsic reasons. The determination of these constraints may aid our
understanding of speech communication in general.
On the origin of the Scandinavian accents
We must ask now if our intonation model can throw any light on the
historical problem as to how the Scandinavian tonal accents may have
arisen. Needless to say, the remarks that we shall make on this
topic are speculative. Moreover, historical linguists disagree about the
date and the exact phonological circumstances of the formation of the
accent distinction. Below we shall, however, accept the essentials of
the views set forth by Oftedal (55). The discussion offered here may
be viewed as an illustration of the central details of his theory in terms
of the model proposed in previous sections of this paper.
At the time of Scandinavian linguistic unity (about 500 B.C.) the pitch of stressed syllables (normally the first syllable of the word)
would be characterized simply by a phonatory stress pulse. The
language distinguished between two types of syllable, namely, long
(CV:C or CVCC) and short (CVC). Both types could be stressed. In
the interior of the phonological phrase the tonal difference between
stressed short and stressed long syllables would probably be slight.
As illustrated in Fig. 11-B-15 (parts I and II) the difference would be
more marked in phrase final position, however.
In drawing Fig. 11-B-15, it has been assumed that the phonatory
stress pulse (marked SP) has a fixed duration and amplitude, and that
it starts at the beginning of the first consonant of the stressed syllable.
Moreover, it has been assumed that the tail of the basic phrase
contour (marked BPC) starts on the first consonant of the syllable
immediately following the latest stressed word of the phrase. That is, the
end-contour is tied to the final word boundary of the latest stressed
word.
At the stage illustrated by parts I and II of Fig. 11-B-15 the second
syllable of a bisyllabic word with a short first syllable was probably
perceived as more prominent than the second syllable of a word with
long first syllable. In the former case the pitch is rising and in the
latter case it is falling on the second syllable. It is likely, however,
that the speakers felt the difference in duration of the first syllable
as being more important, since this difference would dominate in the
interior of the phrase.
After this period three developments took place in the following
order.
1) Syncopation. A syllable following a long stressed syllable inside a word was shortened. In particular, if the second syllable was long it became short, and if it was short its vowel disappeared. Later the same kind of second syllable shortening took place after a stressed short syllable. In phrase final position this caused the end-contour of the basic phrase pattern to move with the word boundary closer to the first syllable of the word.
2) Word boundary shift. In noun phrases the enclitic definite article, having been a free morpheme (a postponed definite pronoun), became part of the word, i.e., the word boundary was moved one syllable to the right. Also, syllabic word final consonants or consonant clusters that had formed as a result of the syncopation were amplified by means of a svarabhakti vowel. This also had the effect of delaying the word boundary and the end-contour.
3) Leveling. Short stressed syllables were lengthened.
Fig. 11-B-16. Hypothetical pitch patterns to be compared with East and South-East Asian tones. Panels: low falling, low rising, falling/rising, high falling, high rising, rising/falling.
opposition, then a word intonation pulse may have been entered near
the beginning of the second syllable of these words as illustrated in
part V of Fig. 11-B-15. On the other hand, if the relatively high pitch
on the second syllable of polysyllabic words (cf. Part I of Fig. 11-B-15)
was perceived as the marked feature, then a word intonation pulse may
have been inserted at the beginning of the first syllable of these words
as illustrated in part VI of Fig. 11-B-15.
Different dialects would have chosen different combinations of
these possibilities. In later stages of the historical development,
perhaps in conjunction with the leveling of short syllables, the word
intonation pulses must have started to drift in the various dialects,
the acute, "monosyllabic" pulse towards the beginning of the first
syllable, and the grave, "polysyllabic" pulse towards the end of the first
syllable along the lines of the Scandinavian accent orbit. (Cf. parts
VII and VIII of Fig. 11-B-15.)
It should be noted that our theory also postulates a certain tonal
distinction at various stages (before the leveling period) between acute
accented words with long and short initial syllable as well as between
grave accented words with long and short initial syllable. In certain
modern dialects that preserve the syllable length distinction (Sollerön,
for instance) the accent on short syllabic words is somewhat different
from the ordinary grave and acute patterns. It is possible that this
feature is related to the processes described in Fig. 11-B-15.
Other tone languages
It may at last be worth while to consider very briefly the
possibilities of our intonation model in relation to tone languages other than
those of Scandinavia (56). It is reasonable to expect that the word
tones of these languages also are superimposed on an underlying basic
phrase contour (57). Whether or not the tones of all tone languages
are best described by means of a simple negative pulse fed to the
input of the word intonation filter is, of course, an open question at the
moment. Even with this restriction imposed on the model, however,
a very great number of sharply different tone patterns can be generated.
Suppose that the following constraints were valid for any given tone
language:
1) The phonatory stress pulse can only start at the onset of the syllable.
2) Only two amplitudes are allowed for the phonatory stress pulse, zero and a certain non-zero value, A.
The negative intonation pulse can only have the following discrete
properties:
3) the amplitude is zero or has a fixed non-zero value, -B;
4) the duration is either short or long;
5) the model constant β is either large or small;
6) the word intonation pulse can only occur near the initial consonant(s), near the middle of the vowel(s), or near the final consonant(s) of the syllable.
A language employing all the possibilities that are open under these
constraints would have 26 different tones (when B = 0 the distinctions
under 4), 5), and 6) above become irrelevant). Six of these are shown
in Fig. 11-B-16. They seem approximately to fit the description of
some of the more common tones of the East and South-East Asian
language area (58).
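The count of 26 follows directly from constraints 1) to 6) and can be checked by enumeration. The sketch below is only a bookkeeping aid; the label strings and the function name `enumerate_tones` are invented for the illustration. Each stress-pulse amplitude (zero or A) combines either with no word intonation pulse (B = 0) or with one of the 2 x 2 x 3 pulse variants, giving 2 x (1 + 12) = 26.

```python
from itertools import product


def enumerate_tones():
    """List every (stress amplitude, word pulse) combination allowed by
    constraints 1) to 6); a word pulse of None stands for B = 0."""
    durations = ("short", "long")
    constants = ("small", "large")               # model constant of the filter
    positions = ("initial", "middle", "final")   # placement in the syllable
    tones = []
    for stress in ("0", "A"):
        tones.append((stress, None))             # B = 0: 4), 5), 6) irrelevant
        for pulse in product(durations, constants, positions):
            tones.append((stress, pulse))
    return tones


tones = enumerate_tones()
print(len(tones))
```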
It is interesting in this connection to note certain systematic
relationships between the Hakka, Foochow, and Pekingese dialects of
modern Chinese. In a group of historically related words, the final
consonant is a voiceless stop in Hakka, a glottal stop in Foochow, and
zero in Pekingese (59). However, these words have a low falling tone
in the Pekingese dialect, and specialists believe that this tone has
developed from an earlier glottal stop which in turn developed from a
(glottalized) final stop consonant. A process like this, which seems to
be partly inverse to the development of the Danish glottal stop (stød)
from a word tone, is nevertheless entirely consistent with the
intonation model proposed in this paper.
SUMMARY
In the present paper a quantitative model of larynx control during
speech production has been described. The input commands are
configurations of simple step functions fed to the model over two channels,
the sentence intonation filter and the word intonation filter.
In order to find further constraints to impose on the model for
purposes of empirical adequacy, the Scandinavian grave/acute accent
opposition was analyzed by fitting curves generated with the model to
empirically measured f0-contours.
It was found that the salient features of these intonation patterns
in simple utterances of a number of dialects can be simulated by means
of a single positive step as input to the sentence intonation filter and
an appropriately timed negative pulse as input to the word intonation
filter (cf. Fig. 11-B-5). It was proposed, tentatively, that this analysis
is valid for all Scandinavian dialects.
We next turned to the question as to how the model elements should
be interpreted in physiological terms. Regarding the word intonation
channel the hypothesis was proposed that the Scandinavian tonal accents
are a sort of laryngeal consonants, not unlike glottal stops, that are
coarticulated with the sentence intonation as well as with the
"segmental" gestures of stressed syllables. The sentence intonation commands,
on the other hand, turned out to be decomposable into a basic phrase
contour and a sequence of phonatory stress pulses. It was suggested
that these constructs reflect an underlying process termed
physiological intensity and that stress should be understood as the addition
of a quantum of physiological energy to the speech production system
as a whole. This energy is distributed (possibly unevenly) over the
pulmonary, phonatory, and articulatory channels. In the phonatory
channel the stress energy manifests itself as a phonatory stress pulse
at the input of the sentence intonation filter.
In this connection, possible energy conserving principles regarding
the phonological phrase as a whole were considered. Also, a synthesis
strategy for sentence intonation was discussed. This strategy has
certain properties in common with the transformational cycle of Chomsky
and Halle's theory of phonology.
Having developed these concepts we returned to the Scandinavian
accents. An examination of Meyer's data indicated that a series of
dialects may be found that display certain systematic relationships
with respect to the relative locations of the hypothetical acute and
grave accent command pulses. When adjacent members in this series
are compared, the accent pulses appear to be cotranslated a small
step either to the left or to the right. This relationship was termed
the Scandinavian accent orbit.
Consideration of the accent orbit together with known facts of
Scandinavian linguistic history suggested a hypothesis about the
origin of the word tone distinction. Briefly, this distinction may have
arisen in phrase final position as a result of the successive
movements of the terminative end-contour.
Finally, a few remarks were made about the generative power of
the model with respect to tones observed in non-Scandinavian languages.
The intonation model summarized in the present paper makes it
possible to collect systematic quantitative information on the tonal as
well as other prosodic events. Work involving close comparison of
the model with empirical data is in progress.
ACKNOWLEDGMENTS
I have profited from valuable discussions with G. Fant, B.
Lindblom, J. Lindqvist, E. and L. Gårding, K-H. Dahlstedt,
R. Leanderson, G. Malmqvist, and A. Ellegård, to all of whom I
want to express my sincere thanks. I am alone responsible for
the ideas and hypotheses presented in this paper, however. I am
also grateful to Mrs. S. Felicetti for her expert editorial assistance.
FOOTNOTES
(1a) Öhman, S.: "On the Coordination of Articulatory and Phonatory Activity in the Production of Swedish Tonal Accents", STL-QPSR 2/1965, pp. 14-19.
(1b) Öhman, S. and Lindqvist, J.: "Analysis-by-Synthesis of Prosodic Pitch Contours", STL-QPSR 4/1965, pp. 1-6; to appear in Proc. of Seminar on Speech Production and Perception, Z. f. Phonetik usw., Berlin, DDR.
(2) I am grateful to Dr. Philip Lieberman who in many interesting discussions during the academic year 1963-64 drew my attention to the problems of intonation. Cf. Lieberman, P.: Intonation, Perception, and Language (Cambridge, Mass. 1967).
(3) The phonetics of the word tones has been described by Meyer, E. A.: Die Intonation im Schwedischen, Teil I (Stockholm 1937) and Die Intonation im Schwedischen, Teil II (Stockholm 1954); Malmberg, B.: Sydsvensk ordaccent (Lund 1953); Hadding-Koch, K.: Acoustico-Phonetic Studies in the Intonation of Southern Swedish (thesis, Lund 1961); Elert, C-C.: Phonologic Studies of Quantity in Swedish (thesis, Stockholm 1964). Kloster Jensen, M.: Tonemicity, Norwegian Universities Press (Oslo 1961).
(4) The model described here is a slightly revised version of the one presented in fn (1b). A second revision will be proposed as a result of discussions to follow later in this paper.
(5) The pitch/pressure-dependency has been dealt with in the reference of fn (1b) as well as by Ladefoged, P.: "Physiological Studies of Speech", STL-QPSR 3/1961, pp. 16-21; Lieberman, P., loc. cit.; Ventsov, A. V.: "The Relationship Between the Pitch Period and the Intra-Aural Pressure", paper 24-3, p. 345 in Digest of the 7th ICMBE, Stockholm 1967.
(6) This factor has been insufficiently studied. Cf. fn (1b) and Faaborg-Andersen, K. and Sonninen, A.: "The Function of the Extrinsic Laryngeal Muscles at Different Pitch", Acta oto-Laryng. 2 (1960), pp. 89-93.
(7) A fuller treatment will be given in a forthcoming publication.
(8) The measurements of Ladefoged, Öhman and Lindqvist, and Ventsov (cf. fn (5)) indicate that the fundamental frequency may vary by 0.16 Hz/cm H2O if only the pressure-drop across the glottis is changed and everything else is constant. Lieberman gives a somewhat higher value.
According to our own measurements, during stressed syllables of normal conversational speech, the subglottal pressure increases by at most a few cm H2O and the increase in pitch due to this factor would therefore be negligible in comparison with the pitch movements caused by larynx muscle adjustments. The effect of pressure fluctuations may be considerable in voiced obstruents, semi-aspiratives, and during the terminal phase of the sentence intonation, however.
The tonal configurations studied in the present paper have been embedded in a sentence frame in a position where the subglottal pressure may be assumed to be essentially constant except for minor fluctuations due to stress. In a more complete treatment a pressure dependent correction factor must be introduced into the model calculations in the form of an "acoustic interaction signal".
(9) Ruch, T. C., Patton, H. D., Woodbury, J. W., and Towe, A. L.: Neurophysiology (1962), pp. 103-105.
Lindblom, B. : "Studies of Labial Articulation", STL-CPSR 4/1965, pp. 7-9. ohman, S. , Pe r s son , A , , and Leanderson, R. : "Speech Production a t the Neuro-Motor Level", forthcoming ar t ic le i n J. of the Acoustical Society of America.
It appears that the best resul ts a r e obtained with n 2 2 fo r most speakers . All examples of the present paper have been cal- culated with n = l , however.
The PDP-7 computer belongs to the Department of Automatic Control, Royal Institute of Technology (KTH), Stockholm. Prof. L. von Hbmos' kind cooperation i s gratefully acknow- ledg ed . ahman, S. : "Computer P r o g r a m fo r Pi tch Measurements", STL-GPSR 1/1966, p. 11.
I wish to express my grati tude to J. Liljencrants whose co- operation i n the programming phase of this work simplified my efforts substantially.
The uniqueness question for solutions obtained by means of the automatic i terat ive procedure will be dealt with in a l a t e r publication.
ahman, S. : "Generative Rules fo r the Phonology and Prosody of the Swedish Verb", (Generativa reg ler fo r det svenska verbets fonologi och prhsodi), in Swedish, Forhandlingar vid Sammankomst fo r att d ryf ta Friigor Rarande Svenskans Be- skrivning I11 (Goteborg 1966), pp. 7 1-87.
Kock, A. : Sprzkhistoriska undersokningar om svensk accent, P a r t I (Lund 1878) and P a r t I1 (Lund 1884).
\
Hesselman, B. : Huvudlinjer i nordisk spriikhistoria ( ~ p p s a l a 1948 - 1953). bftedal, M. : "On the Origin of the Scandinavian Tone Distinc- tion", Norsk Tids skrift fo r Sprogvidenskap, - 16 (1 952), pp. 201-225.
The Danish glottal stop has been described from various points of view by Smith, S.: Stødet i dansk rigssprog (Copenhagen 1944). Martinet, A.: La phonologie du mot en Danois. Hansen, A.: "Stødet i dansk", Det Kgl. Da. Vidensk. Selsk. Hist.-Fil. Meddelelser, XXIX:5 (Copenhagen 1943).
Malmberg, B.: op. cit. Hadding-Koch, K.: op. cit. gives spectrographic illustrations.
Meyer, E.A.: op. cit., Teil II.
(21) This analysis, which is an account of my own impressions, purposely disregards the tonal contours of the word accents.
(22) The end-contour can of course be different in different dialects and in different sentence types. Any such contour can be synthesized by means of an appropriately chosen step configuration fed into the sentence intonation filter. In order to purify the effects of the word intonation we have tried to choose frames with maximally simple sentence intonation, however. It so happens that a negative step introduced at the end of the penultimate syllable of the frame referred to in the text suffices to match the Stockholm data satisfactorily. Questions regarding the end-contours will be discussed in more detail on p. 30 et sqq.
The sentence intonation source of the model is assumed here to generate steps only. As a result of the discussion of the physiological meaning of sentence intonation on p. 30 et sqq., we will later replace this assumption by one stating that the sentence intonation source generates positive pulses only. These pulses will also be assumed to start at the beginning of the stressed syllable.
The acute accent will be denoted by "´", and the grave accent by "`".
A sharp and brief dip in the measured fo-contour usually occurs during the [v] of [d t va ...] of the frame. This dip is probably caused by the increased intraoral pressure during this consonant (fn (1b)). During the [s] of [... ja sa:] fo is of course undefined.
Meyer, E.A.: op. cit., states that the pitch drop at the beginning of acute accented syllables of the Stockholm dialect represents an influence from Southern Swedish.
E. Haugen, in personal communication, has suggested to me that the East Norwegian acute accent should be identified with the sentence intonation and that the grave accent is a delayed version of the sentence intonation. For the Swedish dialects that I have had experience with so far, it seems better to postulate that the sentence intonation step is fixed at the beginning of the syllable and that the timing (and other parameters) of the word intonation pulse is responsible for the tonal contrast. Cf. Haugen, E. and Joos, M.: "Tone and Intonation in East Norwegian", Acta Philologica Scandinavica 22 (1952), pp. 41-64.
The role of this as well as of other types of juncture has been extensively studied by Gårding, E.: "Internal Juncture in Swedish" (thesis, Lund 1967).
In synthesis experiments B. Malmberg has noted that the impression of a grave accent or an acute accent can be obtained by moving a fo-peak within the span of the first syllable of the accented word. Malmberg, B.: "Observations on the Swedish Word Accent", mimeographed report from Haskins Laboratories, New York and Lund 1955.
Öhman, S.: "On the Coordination of Articulatory and Phonatory Activity in the Production of Swedish Tonal Accents", STL-QPSR 2/1965, pp. 14-19.
The word intonation pulse at the end of the second syllable of the two Danish test words represents a clearly audible juncture at the beginning of [ja sa:]. This juncture is similar to the one observed in the Malmö utterances of Fig. II-B-7. Cf. also fn (28).
(32) Dahlstedt, K-H.: Det svenska Vilhelminamålet, 2 (Uppsala 1962). This book gives a detailed phonological discussion of a great number of North Scandinavian dialects.
(33) Swedes imitating Danish tend to exaggerate the glottal stops, however. This may be because, as was noted above, tense glottal stops occur in Swedish before stressed syllables beginning in a vowel. The accents are lax in comparison with these glottal stops.
(34) Sonesson, B. : "The Mechanisms of the Human Vocal Folds", forthcoming article in American Lecture Series in Anatomy.
The conception of larynx function briefly summarized here was explained to me by B. Sonesson. I am grateful to him for many illuminating discussions on this topic.
(35) The experiment was carried out at the Central Neurophysiological Laboratory, Karolinska Sjukhuset, Stockholm, by Drs. A. Mårtensson, R. Leanderson, and A. Persson. I am grateful to them for their willingness to cooperate. A more complete description of the methods and procedures used will be given by Leanderson in a forthcoming publication.
(36) Hadding-Koch, K.: op. cit. Segerbäck, B.: "La Réalisation d'une Opposition de Tonèmes dans des Dissyllabes Chuchotés" (Lund 1966). Meyer-Eppler, W.: "Realization of Prosodic Features in Whispered Speech", J. Acoust. Soc. Am. 29 (1957), pp. 104-106.
(37) Fant, G.: Acoustic Theory of Speech Production ('s-Gravenhage 1960), p. 272 et sqq.
(38) This notation, which has been adopted by Svenska Akademiens Ordbok, has been discussed by C-C. Elert (see fn (3)).
(39) Elert, C-C.: op. cit., p. 139, states that "the difference in duration between words with (acute) accent I and (grave) accent II is analogous to the differences in intensity and the fundamental pitch in the two types of word. The rapid decrease in over-all intensity and the fall in the fundamental pitch in the stressed syllable of (grave) accent II words are accompanied by a shorter duration of the vowels in that syllable".
It may be added that the final syllable of grave words is somewhat longer than that of acute accented words. It is as if the speaker waits for the pitch to return to the sentence intonation level and therefore prolongs the syllable in which this return occurs (first syllable of acute and second syllable of grave words). The relative lengths of corresponding syllables of acute and grave words may of course also contribute to the perceived differences in prominence as well as to the perceived difference in accent in whispered speech.
It is interesting in this connection to note that North Swedish dialects with monosyllabic grave accented words (circumflex) display a duration relationship that is opposite to that of the Stockholm dialect. I.e., the syllable with circumflex is longer than that with the acute accent (Dahlstedt, K-H.: op. cit., p. 156), yet the native speakers do not "feel" a phonological length contrast. These facts are consistent with the idea that the syllable is lengthened because the speaker waits for the pitch to return to the sentence intonation level.
(40) The over-all intonation contour of the phonological phrase is discussed in great detail by P. Lieberman, op. cit. Although certain aspects of this underlying contour probably are universal, it is not unlikely that the details of its shape could vary from language to language (and even from speaker to speaker), being constant for any given language (or speaker).
(41) These laryngeal commands should not be confused with the pulmonary stress pulses that may be observed in the intercostal muscle activity (cf. fn (44)).
(42) Lieberman, P.: op. cit., and von Euler, C.: "Proprioceptive Control in Respiration", in Nobel Symposium I, Muscular Afferents and Motor Control, ed. by R. Granit (Stockholm 1966), pp. 197-207.
(43) Lindblom, B.: Wenner-Gren Foundation Report, Studies of Human Speech; also "Some Temporal Correlates of Stress Contours", to be published.
(44) Draper, M.H., Ladefoged, P., and Whitteridge, D.: "Respiratory Muscles in Speech", J. of Speech and Hearing Res. 2 (1959), pp. 16-27.
(45) Lindblom, B.: "On Vowel Reduction", STL, KTH, Report No. 29 (Stockholm 1963).
(46) Fry, D.: "Duration and Intensity as Physical Correlates of Linguistic Stress", J. Acoust. Soc. Am. 27 (1955), pp. 765-768.
(47) In other words, the two syllables immediately following the onset of the phonatory stress pulse may be given different relative prominence simply by adjusting their durations.
(48) Chomsky, N. and Halle, M.: Sound Pattern of English (forthcoming). Prof. Halle has been kind enough to let me see parts of the manuscript of this book before the publication.
(49) In Swedish, the difference between [var:m kor:v]NP, with the stress pattern 2 1 (Adjective + Noun), and [var:m + kor:v] (Compound Noun), where 1 denotes primary stress and 2 secondary stress, would be due to the circumstance that the last primary stress introduced in the transformational cycle was put on [kor:v] in the first case, and on [var:m] in the second case.
(50) Future research will show whether ET depends on the number of syllables only or on the number of syllables plus the number of primary stresses introduced at the beginning of the transformational cycle.
(51) Chomsky, N. and Halle, M.: op. cit.
(52) It is in my opinion quite premature to conclude that it is impossible to expect a complete correspondence between the records of modern phonetics and the elements and processes postulated in a systematic linguistic theory. In fact, some of the most recent developments in phonetics indicate that correlations of this sort may be successfully established.
As I see it, it is not only possible but necessary to continue work along these lines. First the introspective skills of the auditory phonetician must be translated into objective physical measurement techniques. Then it may be possible to disambiguate and sharpen these skills beyond the limits of subjective intuition. In this way we may succeed in establishing a scientific instrument by means of which phonological theories can be put to objective test. Naturally, in the initial stages of this work our phonetic experiments must be guided by phonological theory. As phonetic theory develops, however, it should be increasingly feasible to substitute objective phonetic measurement for impressionistic methods wherever the latter are indeterminate.
(53) The chronology of the Scandinavian languages has been discussed by E. Haugen in Language 1949, p. 307.
(54) Meyer, E.A.: op. cit., Teil I, p. 232 et sqq.
(55) Ref. in fn (17).
(56) Pike, K.: "Tone Languages", Univ. of Michigan Publications, Linguistics 4 (1948).
(57) Chang, N.C.T.: "Tones and Intonation in the Chengtu Dialect", Phonetica 2 (1958), pp. 59-85. Abramson, A.: "The Vowels and Tones of Standard Thai" (thesis, Columbia Univ., New York 1960).
(58) Forrest, R.A.D.: The Chinese Language (Faber and Faber).
(59) Forrest, R.A.D.: op. cit.