Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

Word and sentence intonation: A quantitative model

Öhman, S.

journal: STL-QPSR
volume: 8
number: 2-3
year: 1967
pages: 020-054

http://www.speech.kth.se/qpsr

http://www.speech.kth.se

B. WORD AND SENTENCE INTONATION: A QUANTITATIVE MODEL

In previous papers (1) we have suggested a functional model of pitch control in speech production (2). The present report summarizes the results of some further attempts to explore as well as to constrain the model by comparing it with empirical data. We shall concentrate here on the patterns of control that must be postulated to account for the Scandinavian word tones and that present certain neuro-motor implications (3). We will then briefly consider the model in relation to tone languages in general.

The discussions that we shall offer below are not meant as a definitive treatise of word and sentence intonation. The hypotheses to be proposed will probably have to be modified in the light of data that have not been available to the author and/or that are yet to be discovered. It is hoped that it will be possible to collect some of these data through experimental use of the tentative model outlined below. The purpose of this paper is thus to stimulate discussion and research.

Model

The main features of the model (4) are summarized in Fig. II-B-1. The fundamental frequency signal f0(t) is synthesized by a mechanism (labeled "larynx model") which accepts three types of input: 1) the time varying "vocal cord tension" g(t), which is the sum of two components, g_s(t) and g_w(t), where g_s(t) represents the sentence intonation and g_w(t) the word intonation, 2) an acoustic interaction signal arising from the secondary effects on f0 caused by fluctuations in sub- and supraglottal pressure (5), and 3) an articulatory interaction signal deriving from the non-phonatory movements of the hyo-thyroid lever system. These movements are due to certain articulatory gestures of the tongue (6).

The two signals g_s(t) and g_w(t) are the outputs of two filters that may have different properties. These filters are assumed to represent the dynamic characteristics of the mechanical and peripheral-neural components of the laryngeal system.

The inputs to the sentence and the word intonation filters are supposed to be built up from step functions with different amplitudes and onset times. These steps represent the discrete higher-level neural events that correspond to the linguistic intonation elements (levels, tones, etc.). In particular, it is assumed that only a finite "library" of step amplitude and timing configurations may be drawn upon in the construction of the f0-contour of an utterance. Empirical investigation involving systematic matching of the model to data must decide what these configurations are like.

[Fig. II-B-1. Functional model of larynx control. Block labels: sentence intonation inputs; sentence intonation filter, output g_s(t); word intonation inputs; word intonation filter, output g_w(t); summed signal g(t); acoustic interaction signal; articulatory interaction signal; larynx model, output f0(t).]

STL-QPSR 2-3/1967 21.

It is necessary to formulate exactly the properties that should be incorporated into the boxes labeled "larynx model", "word intonation filter", and "sentence intonation filter" in Fig. II-B-1. The particular choices that have been made for the purposes of the present study are indicated in Fig. II-B-2. We will not discuss in detail the physiological facts that motivate these choices here (7). The following points should be mentioned however.

1) The relationship f0 = e^g is suggested by the fact that the f0-contours of the same utterance spoken at different over-all pitch levels (pitch registers) give essentially parallel curves when plotted on a logarithmic frequency scale.

2) The effects of the articulatory and acoustic interactions are disregarded since we will avoid utterances containing voiceless and/or strongly obs[...] sounds in the important parts of the intonation contours.

3) The form of the impulse responses of the word and sentence intonation filters is assumed to be t^n e^(-γt) in analogy with the general shape of the tension developed by striated muscle in a single twitch. It is to be expected that the mechanical interconnections in the larynx and the characteristics of the neuro-motor circuits do not cause the form of the response function to differ appreciably from that of muscle. This hypothesis is suggested by experience from movement analysis of lip activity in speech.

We repeat that the word and sentence intonation filters, which are jointly represented by g'(n, γ; t) in Fig. II-B-2, may have different impulse responses. This is to say that the constants γ and n must be determined separately for the sentence and word components through comparison with actual data (11). The bottom part of Fig. II-B-2 shows impulse responses for n=1 and n=2.
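The filter specification above can be made concrete with a short numerical sketch. It is an illustration only and not the program used in the study: the function names are hypothetical, time is assumed to be measured in seconds, and no normalization of the filter gain is implied. The code evaluates the impulse response t^n e^(-γt), its step response (in closed form for n = 1, by trapezoidal integration for any n), and the time n/γ at which the impulse response reaches its maximum.

```python
import math

def impulse_response(t, gamma, n=1):
    """t^n * exp(-gamma*t) for t >= 0, and zero before the impulse."""
    if t < 0.0:
        return 0.0
    return t ** n * math.exp(-gamma * t)

def step_response_n1(t, gamma):
    """Closed-form integral of x*exp(-gamma*x) from 0 to t (the n = 1 case)."""
    if t <= 0.0:
        return 0.0
    return (1.0 - (1.0 + gamma * t) * math.exp(-gamma * t)) / gamma ** 2

def step_response_numeric(t, gamma, n=1, dt=1e-4):
    """Trapezoidal integral of the impulse response; works for any n."""
    if t <= 0.0:
        return 0.0
    steps = max(1, int(round(t / dt)))
    h = t / steps
    total = 0.5 * (impulse_response(0.0, gamma, n) + impulse_response(t, gamma, n))
    for k in range(1, steps):
        total += impulse_response(k * h, gamma, n)
    return total * h

def peak_time(gamma, n=1):
    """The impulse response t^n * exp(-gamma*t) is largest at t = n / gamma."""
    return n / gamma
```

With an example value of γ = 8 per second and n = 1, the impulse response peaks at 0.125 sec and the step response settles towards 1/γ², so the filter output follows a step command smoothly rather than abruptly.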

Computer program

The measurements and calculations of the present study have been made by means of a Control Data 1700 as well as a PDP-7 (12) digital computer. The program used for measuring the f0-contours of human utterances has been described in STL-QPSR 1/1966 (13). A simplified version has also been written by Tjernlund for the CDC-1700. The numerical simulation of the model of Fig. II-B-2 was implemented by Öhman on the PDP-7 for visual analysis purposes. Another program described in the present issue of QPSR (Sec. IV.B) has been written by Liljencrants (14)

[Fig. II-B-2. Functional model of laryngeal control in intonation. Detailed specification of one of the channels shown in Fig. II-B-1. Panel labels: neuro-motor command (a step of amplitude A with onset time t_i); filter impulse response g'(n, γ; t); vocal cord "tension" g(t); fundamental frequency f0(t). Horizontal axes: time.]

acute accent is realized by a rising pitch during, and a glottal stop at the end of, the stressed vowel, and the grave accent consists of a pitch contour which is level during most of the first syllable and then rises abruptly at the end of the syllable (18). In Stockholm, on the other hand, the pitch is rising in acute accented vowels and falling in grave accented vowels, while the situation is almost reversed in Southern Swedish Skåne (19). In many of the dialects of Dalarna (Dalecarlia), finally, the pitch pattern is rising during the first vowel of both the acute and the grave accents, the difference being that the pitch drop starts earlier in the acute than in the grave accented words (cf., for instance, dialects No. 23 through No. 29 of Fig. II-B-3). Fig. II-B-3 gives a qualitative summary of the pitch patterns of Scandinavian one-word utterances, as established by Meyer (20).

In view of the phonological unity of the accent phenomenon it may be suspected that there is some common physiological basis behind its diverse phonetic realizations. If such a basis can be found, a study of its properties may suggest universal constraints on the intonation model.

Stockholm sentence intonation

The speech material of the present study consists mainly of non-compound Swedish accented words embedded in one of the two frames [seja igen], "to say again" or [de va ja sa:], "it was (that) I said". In an impressionistic description of the sentence intonation of these frames as spoken in the Stockholm dialect the pitch would be medium high on the first two syllables ([seja] or [de va]), high from the third syllable on, and low on the last and perhaps also on the second last syllable. Using a subjective scale with five levels (0-4) and assigning one level to each syllable of the frame we would write [334 ... 40], and sometimes [334 ... 20] or even [334 ... 00] depending on the particular mode of pronunciation of the speaker (21).

These subjective data indicate that the sentence intonation part of the model should be adjusted in the following manner. The constant F0 is chosen so as to match the pitch of the first two syllables (subjective level 3). Somewhere in the neighborhood of the first syllable of the accented word a positive step is introduced into the sentence intonation filter. A second step of negative amplitude, representing the end-contour, is then added in the neighborhood of the penultimate syllable of the utterance (22). To go beyond this point, however, it is necessary to introduce the following TIMING ASSUMPTION: "The positive sentence intonation step starts at the beginning of the first syllable of the accented word" (23).

[Fig. II-B-3. Schematic acute and grave accent patterns of a hundred Scandinavian dialects according to E. A. Meyer: Die Intonation im Schwedischen, part II.]
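The adjustment procedure just described can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the author's program: the relation f0 = F0 · e^g (the constant F0 multiplying the exponential) is assumed, the filter is taken with n = 1 and scaled so that a unit step settles at 1, and all function names, onset times, amplitudes and the filter constant are hypothetical.

```python
import math

def unit_step_response(t, gamma):
    """Response at time t to a unit step applied at t = 0, for a filter with
    impulse response gamma^2 * t * exp(-gamma*t) (n = 1, final value 1)."""
    if t <= 0.0:
        return 0.0
    return 1.0 - (1.0 + gamma * t) * math.exp(-gamma * t)

def sentence_contour(t, base_hz, steps, alpha):
    """f0(t) = base_hz * exp(g_s(t)), with g_s the sum of the step responses.
    `steps` is a list of (onset_time_s, amplitude) pairs."""
    g_s = sum(a * unit_step_response(t - onset, alpha) for onset, a in steps)
    return base_hz * math.exp(g_s)

# Hypothetical values, chosen only for illustration.
BASE_HZ = 100.0                          # stands in for the constant F0
STEPS = [(0.30, 0.25), (0.90, -0.60)]    # positive step, then the end-contour step
ALPHA = 8.0                              # sentence intonation filter constant

contour = [sentence_contour(k * 0.01, BASE_HZ, STEPS, ALPHA) for k in range(301)]
```

Before the first step the contour stays at the base value; it then rises smoothly after the positive step and falls below the base value after the negative one, which is the qualitative shape of the [334 ... 40] description.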

Model commands for Stockholm word intonation

The top part of Fig. II-B-4 shows a typical acute accent pattern (Stockholm dialect) corresponding to the utterance [de va mó:nen ja sa:] (24). Here the beginning of the first syllable of the accented word [mó:nen] coincides with the beginning of the closure for [m] as seen from the acoustic record. The superimposed smooth line in the top left part of the figure (curve marked M) shows the result of introducing two sentence intonation steps (one positive and one negative) in accordance with the considerations just discussed. The composite step configuration (marked I) is shown below the f0-contours.

It will be seen from the figure that the model contour fits the measured contour rather badly in the segment immediately to the right of the positive step (25). The error curve (marked E), which represents what is left in the measured contour after the model contour has been subtracted from it, displays a negative dip near the beginning of the first syllable of [mó:nen]. This dip cannot be removed without adding further steps into the model curve. It is a typical feature of the Stockholm acute accent pattern as indicated by an analysis of about a hundred contours, and it is often much more pronounced than the example of Fig. II-B-4 might suggest (26).

The situation is quite similar as regards the grave accent contours. This fact is demonstrated in the lower left part of Fig. II-B-4. Here a positive and a negative sentence intonation step have been introduced in the same manner as was done with the acute accent pattern. The error curve again displays a negative dip which occurs somewhat later in this case than in the previous case. Obviously, the mismatch between model and data cannot be removed without entering further steps into the model contour.

It was noted above that the impressionistic sentence intonation for both [de va mó:nen ja sa:] and [de va mò:nen ja sa:] would be recorded as /334410/ (or [3344x0], 0 ≤ x ≤ 4). Since the sentence intonation appears to be level in the accented syllables it may therefore be proposed that the two residual error contours of Fig. II-B-4 discussed above should be ascribed to the word intonation commands for the acute and the grave accents. The right part of Fig. II-B-4 shows the result obtained by entering a negative pulse (marked IW) of appropriate duration, amplitude and onset time into the word intonation filter simultaneously with the step configuration for the sentence intonation arrived at in the previous analysis (curve marked IS). The error has now decreased to 3 or 4 percent.

[Fig. II-B-4. Comparison of Stockholm accent patterns with curves calculated by means of intonation model. Panels: acute accent, Stockholm; grave accent, Stockholm. The pulses marked I, IS, and IW represent model outputs with the same input commands that were used to match the empirical data but with the model constants α and β both set to 1000.]
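The bookkeeping behind the error curves can be sketched as follows. The sketch is hypothetical throughout: the "measured" contour is a synthetic stand-in generated by the model itself (it is not data from the study), the relation f0 = F0 · e^g is assumed, the filters are taken with n = 1 and unit final value, and every number and name is chosen only for illustration. A rectangular pulse is represented as a positive step followed by an equal negative step.

```python
import math

def unit_step_response(t, rate):
    """Unit-step response of an n = 1 filter scaled to settle at 1."""
    if t <= 0.0:
        return 0.0
    return 1.0 - (1.0 + rate * t) * math.exp(-rate * t)

def pulse_response(t, onset, duration, rate):
    """Response to a unit rectangular pulse, built from two opposite steps."""
    return (unit_step_response(t - onset, rate)
            - unit_step_response(t - onset - duration, rate))

def model_f0(t, base_hz, sentence_steps, alpha, word_pulses, beta):
    """f0(t) = base_hz * exp(g_s(t) + g_w(t)).
    sentence_steps: (onset_s, amplitude) pairs.
    word_pulses: (onset_s, B, D) triples, each entered with negative sign."""
    g_s = sum(a * unit_step_response(t - t0, alpha) for t0, a in sentence_steps)
    g_w = sum(-b * pulse_response(t, t0, d, beta) for t0, b, d in word_pulses)
    return base_hz * math.exp(g_s + g_w)

# All values below are hypothetical.
TIMES = [k * 0.01 for k in range(151)]
STEPS = [(0.30, 0.25), (1.00, -0.60)]
PULSES = [(0.30, 0.30, 0.15)]

# Synthetic stand-in for a measured contour (NOT data from the study).
measured = [model_f0(t, 100.0, STEPS, 8.0, PULSES, 15.0) for t in TIMES]
steps_only = [model_f0(t, 100.0, STEPS, 8.0, [], 15.0) for t in TIMES]
full_model = [model_f0(t, 100.0, STEPS, 8.0, PULSES, 15.0) for t in TIMES]

# Error curve: what is left of the measured contour after subtracting the model.
error_steps_only = [m - s for m, s in zip(measured, steps_only)]
error_full = [m - s for m, s in zip(measured, full_model)]
dip_time = TIMES[error_steps_only.index(min(error_steps_only))]
```

With sentence steps alone the error curve shows a negative dip after the pulse onset; once the negative word intonation pulse is entered the residual vanishes, trivially so here because the stand-in contour was generated with the same commands.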

Pulse theory of Scandinavian accents

The idea strongly suggests itself that the Stockholm tonal accents should be represented by a suitably tailored negative pulse fed to the word intonation filter at a certain moment relative to the sentence intonation step. This moment is early for the acute accent and late for the grave one (27). This scheme is summarized in Fig. II-B-5, the symbols of which should be self-explanatory.

The model response to inputs of the form of Fig. II-B-5 under systematic variation of the relevant parameters is shown in Fig. II-B-6. The labels indicate the parameters that were varied in each case shown. In the curves labeled A and α the value of B was zero (see Fig. II-B-5 and p. 22 for meanings of parameter names). In the curves labeled B, β, and D the value of A was zero. The remaining curves, finally, represent combinations of a typical sentence intonation step with a typical word intonation pulse, the relative timing of these two events being changed systematically. Fig. II-B-6 should give an idea of the possibilities of the scheme summarized in Fig. II-B-5.

A qualitative comparison of the patterns that can be synthesized using the scheme of Fig. II-B-5 with descriptions of the accentual pitch contours of a wide variety of Scandinavian dialects (cf. Fig. II-B-3) suggests that it may be possible to put all these dialectal manifestations on the same formula.

Model commands for Malmö word intonation

Fig. II-B-7 gives an example from Southern Swedish (Malmö) arranged in the same manner as that of Fig. II-B-4. The utterances of this example are [seja mó:nen ijen] and [seja mò:nen ijen]. Note that the end-contour of the sentence intonation differs from that of the Stockholm speaker. Also, a glottal stop occurs at the beginning of [ijen] in both utterances. The latter seems to be a general feature of Swedish phonology: a syllable beginning in a vowel may be preceded by a glottal stop (28).

[Fig. II-B-5. Input commands for Swedish accents: sentence intonation step and word intonation pulse, plotted against time.]

[Fig. II-B-6. Accent model: effects of six parameters. Model responses to input commands of Fig. II-B-5. Explanation in text.]

[Fig. II-B-7. Malmö: acute accent; Malmö: grave accent. Comparison of acute and grave accent patterns of the Malmö dialect with model generated curves (cf. text to Fig. II-B-4). The IS step starting at [ijen:] is due to the fact that the latter word was stressed. The glottal stop at the beginning of [ijen:], marked by the letter G both in the data curves and in the IW curves, is discussed in the text (p. 25).]

Evidently, the word intonation pulses for the Southern acute and grave accents differ from those of the Stockholm speaker in that the grave pulse starts earlier than the acute one. I.e., the temporal order of the Malmö acute and grave pulses is the reverse of that of the corresponding Stockholm pulses. Moreover, the Malmö acute has longer duration, D, and a steeper time course (greater value for β) than the Stockholm acute does.

These differences probably account for the fact that the subjective impressions of the directions of the accentual pitch movements are opposite in the two dialects (cf. Fig. II-B-8). In the Stockholm acute the pitch drop due to the negative word intonation pulse is completed during the initial consonant of the stressed syllable and the return of the pitch up to the sentence intonation level occurs during the vowel. This is heard as "rising pitch".

In the Malmö acute, on the other hand, the word intonation pulse starts only after the initial consonant. Thus the pitch will begin with an upward movement during the initial consonant and the very first portion of the vowel in response to the sentence intonation step. The negative word intonation pulse then makes the pitch turn in the downward direction during the later part of the vowel. Apparently, this configuration is perceived as "falling pitch".

Conversely, the onset of the Stockholm grave pulse is late enough to permit the pitch to rise in response to the sentence intonation step during the initial consonant and then to drop throughout most of the vowel so that the impression of a "falling pitch" results. At the end of the first vowel the pitch then turns back up towards the sentence intonation level. The relatively high pitch on the second syllable is correlated with an impression of "tertiary stress".

The Malmö grave, on the other hand, holds back the pitch rise due to the positive intonation step during the initial consonant and part of the initial vowel segment so that a "rising pitch" is heard. In fact, the Malmö grave accent of many speakers sounds very similar to the acute accent of many Stockholm speakers. These relationships are illustrated schematically in Fig. II-B-8.
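The dependence of the heard pitch direction on pulse timing can be illustrated with a small sketch. It is not a fit to any of the recordings: the syllable is assumed to begin at t = 0 with a consonant of 80 msec followed by a vowel lasting until 250 msec, the relation f0 = F0 · e^g is assumed, the filters are taken with n = 1 and unit final value, and all amplitudes, durations and filter constants are hypothetical. The sketch compares the net pitch change across the vowel for an early pulse (as in the Stockholm acute) and a late pulse (as in the Stockholm grave).

```python
import math

def unit_step_response(t, rate):
    """Unit-step response of an n = 1 filter scaled to settle at 1."""
    if t <= 0.0:
        return 0.0
    return 1.0 - (1.0 + rate * t) * math.exp(-rate * t)

def tension(t, step_amp, alpha, pulse_onset, pulse_amp, pulse_dur, beta):
    """g(t) for a positive sentence step at t = 0 plus one negative word pulse."""
    g_s = step_amp * unit_step_response(t, alpha)
    g_w = -pulse_amp * (unit_step_response(t - pulse_onset, beta)
                        - unit_step_response(t - pulse_onset - pulse_dur, beta))
    return g_s + g_w

def vowel_pitch_change(pulse_onset, vowel_start=0.08, vowel_end=0.25):
    """Net f0 change in Hz from vowel onset to vowel end (hypothetical values:
    A = 0.3, alpha = 8, B = 0.5, D = 0.10 sec, beta = 15, base level 100 Hz)."""
    args = (0.3, 8.0, pulse_onset, 0.5, 0.10, 15.0)
    f_start = 100.0 * math.exp(tension(vowel_start, *args))
    f_end = 100.0 * math.exp(tension(vowel_end, *args))
    return f_end - f_start

early = vowel_pitch_change(-0.10)   # pulse is over when the syllable begins
late = vowel_pitch_change(0.08)     # pulse begins together with the vowel
```

Under these assumptions `early` comes out positive (the pitch climbs through the vowel) and `late` negative (the pitch drops through the vowel), in line with the "rising" and "falling" impressions described above.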

[Fig. II-B-8. Input commands suitable for Stockholm and Malmö accents (panels: acute and grave for each dialect). The pitch contour has been drawn with thick lines in the vowel segments.]

Perception of tones

It appears from this discussion that the perception of pitch movement in the Swedish accents may be based on a subjective measurement of fundamental frequency during the vowel segments only (29). This assumption appears reasonable also in view of the possibility of having accented words with all consonants voiceless. In a word such as [stik:at], about 700 msec long, the two voiced segments [i] and [a] may be about 80-150 msec each or even less. Yet the accent is immediately perceivable! (30)

Model commands for the Danish word intonation

The glottal stop typical of the Standard Danish counterpart of the Stockholm acute accent can be simulated in our model by introducing a fast and brief word intonation pulse late in the stressed syllable (31) (cf. Fig. II-B-9). The Danish "grave accent", on the other hand, describes a level pitch contour during most of the stressed syllable, followed by a rapid rise/fall pattern that starts just before the medial consonant and reaches its peak in the middle of the second vowel. This pattern is well matched by the command configuration of Fig. II-B-5, if the word intonation pulse is put at the beginning of the stressed syllable.

We must, accordingly, conclude that the order of the Danish word intonation pulses, like that observed in the Malmö dialect, is opposite to the order of the Stockholm word intonation pulses.

Model commands for North Swedish word intonation

It is interesting to contrast this situation with that of Northern Swedish (cf. Fig. II-B-10) where the acute accent pulse either is absent or is placed on an unstressed syllable immediately preceding the accented word. Many of these Northern dialects also have so-called circumflex, i.e. monosyllabic words with the grave accent (cf. [bi:t] = "bite!" and [bî:t] = "to bite"). These forms have arisen through apocopation of the final vowel of a bisyllabic grave accented word (32).

That also the grave accent pulse is earlier in the North than in Stockholm follows from a comparison of the relative heights of the two pitch peaks of this accent: in the North the second peak is, in general, higher than the first, but in Stockholm the first peak is usually higher.

[Fig. II-B-9. "Acute accent": Copenhagen; "grave accent": Copenhagen. Comparison of Danish accent patterns with model generated curves. Cf. Fig. II-B-4.]

[Fig. II-B-10. Myckelgensjö: acute accent; grave accent. Comparison of North Swedish (Myckelgensjö) accents with model generated curves. Cf. Fig. II-B-4.]

[Fig. II-B-11. Orsa: acute accent; grave accent. Comparison of Dalarna (Orsa) accent patterns with model generated curves. Cf. Fig. II-B-4.]

After having discussed the physiological correlates of the model elements we shall return to the Scandinavian tonal accents in a more systematic way and propose a tentative and necessarily incomplete theory as to their possible origins and development. Finally, a few comments will be made concerning tone languages outside of Scandinavia.

Glottal stop interpretation of word intonation pulses

The phonetic nature of the Danish "acute accent" suggests that all Scandinavian accents should be understood as variously timed glottal stops, only softer than the Danish ones (33). At the moment, this is an open question.

For one thing it is not quite clear what a glottal stop is, physiologically speaking. It is well known, however, that the muscles of the larynx are innervated by two distinct branches of the Xth cranial nerve, the recurrens and the laryngicus cranialis (34). The latter branch ends in the two parts of the crico-thyroid muscle (pars recta and pars obliqua) while the recurrens branch innervates the remaining intrinsic muscles of the larynx. It is believed that the crico-thyroid muscle is the one mainly responsible for pitch control in singing and in speech (34). It is thought to perform this function by causing a rotation of the thyroid cartilage about its joint on the lower posterior part of the cricoid cartilage, thereby stretching or relaxing the vocal cords.

The valve function of the larynx, on the other hand, evident in glottal stops and in the unvoicing gesture of truly voiceless consonants (this unvoicing gesture is apparently a sort of negative glottal stop!), is probably performed by the various adductor and abductor muscles. There is some evidence of reciprocal inhibition between the crico-thyroid system and parts of the adductor/abductor system (34).

In a small pilot experiment (35) the motor unit activity of the vocalis and the crico-thyroid muscles was recorded by means of thin concentric needle electrodes. The results will be summarized in a later section of this QPSR (sec. II.C). The main finding, however, is a brief phase of inhibition of crico-thyroid activity at the moment where the negative word intonation pulses would occur for the two accents (early for the acute accent and late for the grave). The ballistic character of this inhibitory phase is consistent with the glottal-stop theory of the accents.

In view of the relative sizes of the anatomical structures affected by the two laryngeal control systems, one would expect the pitch function to be more sluggish than the valve function. It is therefore interesting to observe that in matching the accent pitch patterns it has invariably been necessary to assign a value of about 8 to the model constant α and values exceeding 12 to β. Thus, in terms of these parameters, the word intonation filter of the model has to be at least 50 % faster than the sentence intonation filter.
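The 50 % figure follows directly from the ratio of the two constants, as the following small sketch shows. It assumes that "speed" is read off the rate constant of the impulse response t^n e^(-γt), whose maximum lies at t = n/γ, that n = 1 for both filters, and that time is in seconds; the function names are hypothetical.

```python
def time_to_peak(rate, n=1):
    """Time at which the impulse response t^n * exp(-rate*t) is largest."""
    return n / rate

def percent_faster(slow_rate, fast_rate):
    """How much larger the fast rate constant is, in percent of the slow one."""
    return (fast_rate / slow_rate - 1.0) * 100.0

ALPHA = 8.0       # sentence intonation filter constant quoted in the text
BETA_MIN = 12.0   # lower bound quoted for the word intonation filter constant

speedup = percent_faster(ALPHA, BETA_MIN)   # 12 / 8 = 1.5, i.e. 50 percent
peak_sentence = time_to_peak(ALPHA)         # 0.125 sec under the assumptions above
peak_word = time_to_peak(BETA_MIN)          # about 0.083 sec
```

Any value of β above 12 only widens the gap, which is why the text speaks of "at least" 50 %.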

A relevant point in this connection is the pronunciation of the words meaning "yes" and "no" in the Stockholm dialect. Normally, these words are pronounced [ja':] and [nlj:] with the acute accent. But they can also be spoken bisyllabically and with the grave accent, [j;:a] and [&:E]. In emphatic speech they are produced with glottal stops, [j&] and [n~E].

Whispered accents

Surprisingly enough, the word accent distinction does not disappear in whispered speech (36). Under this condition the resonances of the vocal tract are excited by a turbulent noise source generated by the incompletely closed glottis. At moderate Reynolds numbers the spectral peaks of this noise are tuned to the vocal tract formants (37). It is well known that the resonance frequencies of a tube open at both ends (both at the lips and at the glottis, in the case of the vocal tract) will be higher than those of a tube closed at one end, provided that the shape of the tube is unchanged. If the word accents involve a brief narrowing of the glottis orifice, a downward movement of the "noise pitch" would therefore be expected. Evidently, a decrease in noise intensity also accompanies the accent gesture. Effects of this type may convey the subjective word accent impression in whispered speech.
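The tube argument can be checked with the textbook formulas for an idealized uniform, lossless tube. This is only a rough sketch: the real vocal tract is neither uniform nor lossless, the whispering glottis is only partly open, and the tube length of 0.175 m and sound speed of 350 m/s are assumed round values, not figures from the text. For a tube open at both ends the resonances lie at k·c/(2L); for a tube closed at one end they lie at (2k-1)·c/(4L).

```python
def open_open_resonances(length_m, count, sound_speed=350.0):
    """Resonances (Hz) of an ideal uniform tube open at both ends."""
    return [k * sound_speed / (2.0 * length_m) for k in range(1, count + 1)]

def closed_open_resonances(length_m, count, sound_speed=350.0):
    """Resonances (Hz) of an ideal uniform tube closed at one end."""
    return [(2 * k - 1) * sound_speed / (4.0 * length_m)
            for k in range(1, count + 1)]

# Assumed tube length of 0.175 m (hypothetical round value).
wide_glottis = open_open_resonances(0.175, 3)      # about 1000, 2000, 3000 Hz
narrow_glottis = closed_open_resonances(0.175, 3)  # about 500, 1500, 2500 Hz
```

Each resonance of the open-open tube lies above the corresponding resonance of the closed-open tube, so a momentary narrowing of the glottis moves the boundary condition towards the closed case and lowers the resonances, consistent with the expected downward movement of the "noise pitch".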

Prominence of accented syllables

A brief comment is due also regarding the impressionistic assignment of prominence levels to the syllables affected by the word accents. Using a subjective scale of five levels (0-4), traditional phoneticians put [40] for acute accented words of the Stockholm dialect such as [mó:nen] and [32] for grave accented words such as [mò:nen] (38). I.e., the second syllable of the grave accented word is perceived as being more prominent than the corresponding syllable of the acute accented word.

In terms of the pulse theory (refer to upper part of Fig. II-B-8) the acute accent pulse occurs early in the first syllable while the grave accent pulse is late. The laryngeal effort needed to return the pitch to the sentence intonation level after the negative accent pulse has occurred will therefore be greater in the grave accent case, since the pitch contour without the accent pulse would be close to its asymptote near the end of the first vowel. Moreover, this effort must be expended during most of the second syllable in the grave case, in contradistinction to the acute case where it occurs during the first syllable. It may be that the perceived degrees of prominence are at least partly based on these relationships (39).

PHYSIOLOGICAL INTERPRETATION OF SENTENCE INTONATION COMMANDS

We have as yet only studied the tonal accents in sentence positions characterized by essentially constant subglottal pressure, P_s, and constant "average pitch level". However, a sentence intonation contour may be subdivided into consecutive phrases each of which is grossly characterized by an initial rise, a flat peak medially, and (usually) a fast drop at the end of both the P_s and the f0-contours (40). (The simple declarative sentence should be regarded as a single phrase.)

If one assumes, as we have done, that F0 is constant (cf. p. 22 et sqq) then one is forced to conclude that while the sentence intonation steps are positive in the pre-peak portion of the phrase, they should be negative when occurring after the point where the peak is desired in order to bring about the typical pitch drop at the end of the over-all f0-contour. If this were so, the tonal accents could not be generated by means of a single negative word intonation pulse in the falling part of the f0-contour, for the accentual pitch patterns do not lose their familiar shape in these positions. In particular, the Stockholm grave accent always starts with a rise or at least a strong reduction in the rate of fall in pitch. But no combination of a negative sentence intonation step with a negative word intonation pulse could bring this effect about.

This problem can apparently be solved by assuming that

1) the model parameter F0 is not constant throughout the phrase but describes a slow rising-falling movement typical of the breath-group, henceforth to be called the basic phrase contour, the detailed properties of which are yet to be determined by empirical measurements.

2) the sentence intonation commands that are added to the basic phrase contour are not steps, as assumed previously, but pulses. These pulses, which we will call phonatory stress pulses (41), have a duration of the order of the syllable and are always of positive amplitude.

3) a phonatory stress pulse of some non-zero amplitude is entered at the beginning of every stressed syllable according to a strategy that will be described below.

4) the word intonation is simulated by means of negative word intonation pulses in the same manner as proposed previously (p. 25).

Note that requirement 2) guarantees that the appropriate word tone patterns can be synthesized with the model also in the falling part of the over-all intonation contour. Since the phonatory stress pulse always has positive amplitude it will always bring about an upward inflection of the f0-contour. On the other hand, since it is a pulse of finite duration it will only cause a local perturbation on the basic phrase contour so that the over-all form of the latter is not too drastically changed.
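The revised scheme of assumptions 1) to 4) can be sketched as below. The sketch is hypothetical in every detail: the shape of the basic phrase contour is a pure placeholder (the text states that its properties are yet to be determined), the relation f0(t) = F0(t) · e^g is assumed, the filters are taken with n = 1 and unit final value, and all names and numbers are chosen for illustration only.

```python
import math

def unit_step_response(t, rate):
    """Unit-step response of an n = 1 filter scaled to settle at 1."""
    if t <= 0.0:
        return 0.0
    return 1.0 - (1.0 + rate * t) * math.exp(-rate * t)

def pulse_response(t, onset, duration, rate):
    """Response to a unit rectangular pulse, built from two opposite steps."""
    return (unit_step_response(t - onset, rate)
            - unit_step_response(t - onset - duration, rate))

def basic_phrase_contour(t, phrase_dur=3.0, floor_hz=90.0, span_hz=40.0):
    """Placeholder rising-falling contour in Hz; its true shape is unknown."""
    return floor_hz + span_hz * math.sin(math.pi * t / phrase_dur)

def f0(t, stress_pulses, alpha, word_pulses, beta):
    """Basic phrase contour modulated by positive stress pulses and negative
    word intonation pulses; both lists hold (onset_s, amplitude, duration_s)."""
    g_s = sum(a * pulse_response(t, t0, d, alpha) for t0, a, d in stress_pulses)
    g_w = sum(-b * pulse_response(t, t0, d, beta) for t0, b, d in word_pulses)
    return basic_phrase_contour(t) * math.exp(g_s + g_w)

# One stress pulse placed in the falling part of the placeholder contour.
STRESS = [(2.0, 0.5, 0.3)]

def lift(t):
    """Ratio of the contour with the stress pulse to the bare phrase contour."""
    return f0(t, STRESS, 8.0, [], 15.0) / basic_phrase_contour(t)
```

In this sketch `lift(2.2)` is well above 1 even though the basic phrase contour is falling there, and `lift(2.9)` has decayed most of the way back towards 1, which mirrors the two properties claimed for the phonatory stress pulse: an upward inflection everywhere, and only a local perturbation of the phrase contour.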

Properties of the basic phrase contour

As was noted above, the detailed properties of the deepest of the intonation processes, i.e. the basic phrase contour, must be established by means of phonetic measurements. Given sufficient information about the general properties of phonatory stress pulses and word intonation pulses it should be possible to take f0-contours from human utterances, subtract off the pitch modulations due to stress and to word tones, and thereby obtain the basic phrase contour as a residue (with a superimposed ripple due to noise as well as to articulatory and acoustic interactions). This procedure is illustrated in Fig. II-B-12.
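The residue procedure can be sketched as follows. Two points are assumptions rather than statements of the text: the modulations are removed by dividing by e^g, which amounts to subtraction on a logarithmic frequency scale and follows from the assumed relation f0(t) = F0(t) · e^g, and the filters are taken with n = 1 and unit final value. The pulse amplitudes, durations and filter constants are those quoted in the caption of Fig. II-B-12; the onset times, the "measured" contour and the function names are hypothetical.

```python
import math

def unit_step_response(t, rate):
    """Unit-step response of an n = 1 filter scaled to settle at 1."""
    if t <= 0.0:
        return 0.0
    return 1.0 - (1.0 + rate * t) * math.exp(-rate * t)

def pulse_response(t, onset, duration, rate):
    """Response to a unit rectangular pulse, built from two opposite steps."""
    return (unit_step_response(t - onset, rate)
            - unit_step_response(t - onset - duration, rate))

def command_tension(t, stress_onsets, word_onsets):
    """g(t) for identical stress pulses (amplitude .5, duration .3 sec,
    alpha = 3) and identical word pulses (B = .8, D = .15 sec, beta = 15)."""
    g_s = sum(0.5 * pulse_response(t, t0, 0.3, 3.0) for t0 in stress_onsets)
    g_w = sum(-0.8 * pulse_response(t, t0, 0.15, 15.0) for t0 in word_onsets)
    return g_s + g_w

def basic_contour_residue(times, measured_hz, stress_onsets, word_onsets):
    """Divide the stress and word-tone modulations out of a measured contour."""
    return [f / math.exp(command_tension(t, stress_onsets, word_onsets))
            for t, f in zip(times, measured_hz)]

# Synthetic check with hypothetical onsets and a made-up phrase contour.
TIMES = [k * 0.02 for k in range(91)]
STRESS_ONSETS = [0.2, 0.6, 1.0, 1.4]
WORD_ONSETS = [0.25, 0.65, 1.05, 1.45]
phrase = [100.0 + 20.0 * math.sin(math.pi * t / 1.8) for t in TIMES]
measured = [p * math.exp(command_tension(t, STRESS_ONSETS, WORD_ONSETS))
            for t, p in zip(TIMES, phrase)]
residue = basic_contour_residue(TIMES, measured, STRESS_ONSETS, WORD_ONSETS)
```

On real data the residue would still carry the ripple from noise and from the articulatory and acoustic interactions mentioned above; in this synthetic check it reproduces the made-up phrase contour to within rounding error.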

Although the study of the basic phrase contour, as defined above, has not yet proceeded very far, it is nevertheless possible to distinguish three typical patterns: that of the terminative mode, that of the continuative mode, and that of the elicitative mode of pronunciation. The terminative mode is characterized by a steeply falling end-contour that starts close to the end of the last stressed word of the utterance and is normally used in simple declarative sentences. The continuative mode has a level end-contour and is normally used to indicate that more phrases will follow. The elicitative mode, finally, has a rising end-contour (often followed by a fast drop) and is used in questions and to express surprise.

[Fig. II-B-12. Curve I: Measured pitch contour of the utterance [:, dEn: &r:ancn va m&:m m&: ad], "and that morning the man was murdered", spoken in the elicitative mode (Stockholm dialect). Curve II: Model response to the four phonatory stress pulses shown in curve V. For all of these input pulses the amplitude was .5, the duration .3 sec, and the filter constant α = 3. Curve III shows the result of adding the word intonation pulses of curve VI to the input. The latter pulses are all identical and B = .8, D = .15 sec, and β = 15. Curve IV is the difference between Curve I and Curve III and probably represents the basic phrase contour. Note the typical elicitative end-contour. Time axis in sec.]

Secondly, the increase of physiological intensity in the articulatory channel would bring about faster and more vigorous movements of the tongue, lips, velum, and jaw, as well as of the articulatory components of the larynx. As a consequence of this, the stressed syllables would be characterized by a lower degree of coarticulatory overlap between the abutting consonant and vowel gestures (45). Phonetic distinctions would therefore be sharper in these syllables.

Finally, the physiological intensity increase in the phonatory channel will, as already noted, bring about a positive inflection in the $f_0$-contour through the introduction of a stress pulse at the input of the sentence intonation filter.

The pitch rise due to the phonatory stress pulse is positively correlated with the length of the syllable on which it occurs. However, as was noted above, the duration of equally stressed syllables is smaller in the middle of the phrase, i.e., where $I_\varphi(t)$ is assumed to have greater values, than at the end of the phrase, where $I_\varphi(t)$ has smaller values. It is therefore surprising at first that an increase in $I_\varphi(t)$, due to the introduction of a stress pulse, should increase rather than decrease the duration of the stressed syllable.

This apparent contradiction may be explained in terms of auditory prominence, a perceptual quality of syllables. It may be assumed that, everything else being equal,

5) the faster the pitch rises in a syllable (or the less steeply it drops), the greater will the perceptual prominence be, and

6) a longer syllable is perceived as more prominent than a shorter syllable (46).

The lengthening of stressed syllables may now be seen as having two functions, namely, to increase the perceptual prominence of the stressed syllable, and to prevent the following syllable from becoming more prominent than it should be. The second syllable would become too prominent if the pitch rise due to the phonatory stress pulse were not given time to be completed during the stressed syllable but continued to rise in the following syllable (47).

Physiological energy

The integral of the physiological intensity function $I_\varphi(t)$ taken over the time interval of the entire phrase might be given the name total physiological energy, $E_T$. The inverse relationship between syllable duration (disregarding the effects of stress) and the basic phrase contour, commented on earlier, suggests that, for unstressed syllables, the integral of $I_\varphi(t)$ over the time segment of the syllable is approximately constant.
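The constancy claim can be written out as a short worked relation. The symbols $t_j$ (onset of the $j$'th unstressed syllable), $d_j$ (its duration), and $\bar{I}_{\varphi,j}$ (the mean intensity over it) are introduced here only for illustration, with $I_\varphi(t)$ denoting the physiological intensity and $E_s$ the constant per-syllable energy:

```latex
\int_{t_j}^{t_j + d_j} I_\varphi(t)\,dt \approx E_s
\quad\Longrightarrow\quad
d_j \approx \frac{E_s}{\bar{I}_{\varphi,j}},
\qquad
\bar{I}_{\varphi,j} = \frac{1}{d_j}\int_{t_j}^{t_j + d_j} I_\varphi(t)\,dt .
```

Read this way, a syllable lying where the intensity is high (the middle of the phrase) comes out shorter than one lying where it is low (the end of the phrase), which is the inverse relationship referred to above.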

Moreover, preliminary experiments indicate that when one of the syllables of an utterance is given emphatic stress two things happen: the over-all pitch level of the remaining syllables drops in proportion to the degree of emphasis, and the pitch modulation due to stress and word intonation in the non-emphatic syllables becomes de-emphasized. This suggests that the physiological energy that goes into the emphasis is subtracted from the energy that otherwise would go into the non-emphatic syllables.

A conservation principle of a similar kind is apparently implied by one of the basic devices in Chomsky's and Halle's generative formulation of American English phonology (48). According to this theory the simplest way of describing the stress patterns possible in the American English phonological phrase is to postulate a (probably universal) type of rule that (a) changes the stress of a syllable to primary stress and at the same time (b) reduces each of the stresses of the remaining syllables of the phrase by one degree. The order in which the primary stresses are introduced into the phrase thus determines the stress pattern as a whole, and this order is a unique function of the morphemic and syntactic structure of the phrase and of the rules. Thus the difference in stress pattern on compound Nouns as compared with Noun Phrases consisting of an Adjective followed by a Noun, as well as a great many other prosodic phenomena, may be described in a very simple and elegant fashion (49).
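The rule type described above can be illustrated with a minimal sketch. It assumes a numeric convention that is not stated in this paper: stress degrees are integers with 1 as primary stress and larger numbers as weaker stress. The function name `assign_primary_stress` is hypothetical.

```python
def assign_primary_stress(degrees, index):
    """Apply one rule step: the syllable at `index` receives primary
    stress (1), and every other syllable is reduced by one degree.

    Assumed convention: 1 = primary stress, larger integer = weaker stress.
    """
    return [1 if i == index else d + 1 for i, d in enumerate(degrees)]


# Two constituents that each carry primary stress in isolation.
isolated = [1, 1]

# Introducing the primary stress on the first or on the second element
# gives the two contrasting patterns (compound vs. adjective + noun).
compound_like = assign_primary_stress(isolated, 0)      # [1, 2]
noun_phrase_like = assign_primary_stress(isolated, 1)   # [2, 1]

# The order of application determines the overall pattern.
pattern = assign_primary_stress(assign_primary_stress([1, 1, 1], 2), 0)
print(compound_like, noun_phrase_like, pattern)
```

Because each application weakens every earlier stress, the final pattern records the order in which primary stresses were introduced, which is the property the conservation analogy relies on.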

On the basis of these hints, the following hypothesis may be proposed.

7) In normal conversation the total physiological energy $E_T$ of a phonological phrase, including the energies of stressed syllables, depends mainly on the number of syllables of the phrase.

Synthesis strategy for intonation contours

We are presently experimenting with strategies for the synthesis of $f_0$-contours that obey assumptions (1-7) of pp. 31, 32, 34, and 35. One of these strategies, summarized below, is a phonetic analogue of the Chomsky-Halle sequential procedure mentioned earlier.


We have, so far, only been working with simple declarative sentences in the terminative mode. It is assumed that the basic phrase contour starts with a slow rise from about 90 Hz toward 110 Hz. This rise is turned into a slow drop at a point corresponding to the boundary between the last syllable of the noun-phrase constituent and the first syllable of the verb-phrase constituent of the sentence. This point will henceforth be called the turn-over point. The end-contour, finally, is assumed to start at the beginning of the syllable that immediately follows the latest stressed syllable of the sentence. If the last syllable of the sentence is stressed, the end-contour starts in its middle.

The main steps of the procedure are as follows.

SI 1) Assume initially that all syllables are unstressed, calculate a basic phrase contour $F_0(t)$ for this condition, and calculate the locations of the syllable boundaries on the contour.

This step is carried out using the two assumptions that (with $f_0(t) = e^{g(t)}$, $g(t) = g_b(t) + g_s(t) + g_w(t)$, i.e., $F_0(t) = e^{g_b(t)}$) $g(t)$ is proportional to the physiological intensity $I_\varphi(t)$, and that, initially, each syllable is given the same physiological energy $E_s$. If the noun-phrase constituent contains $n_1$ syllables and if the latest stress of the sentence occurs on the $n_2$'th syllable, then the turn-over point, $t_1$, and the time coordinate, $t_2$, of the onset of the end-contour can be found by solving the equations

[…]; $E_s = \text{constant}$

for $t_1$ and $t_2$. Here $F_0(t) = e^{g_b(t)}$ is the basic phrase contour of step SI 1) synthesized according to the principles already described.
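The segmentation part of step SI 1) can be sketched numerically. The following is only an illustration under stated assumptions, not the program used for this work: the contour is taken as already given on a time grid, the intensity is taken to be ln F0 (any proportionality constant cancels when the total is split into equal shares), and the break points and Hz values of the example contour, as well as the names `equal_energy_boundaries` and `piecewise_linear`, are hypothetical.

```python
import math


def equal_energy_boundaries(times, intensity, n_syllables):
    """Return n_syllables + 1 boundary times such that the integral of
    `intensity` over every syllable is the same (assumes intensity > 0)."""
    # Cumulative trapezoidal integral of the intensity.
    cumulative = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        cumulative.append(cumulative[-1] + 0.5 * (intensity[i] + intensity[i - 1]) * dt)
    e_s = cumulative[-1] / n_syllables  # equal energy share per syllable
    boundaries = [times[0]]
    j = 1
    for k in range(1, n_syllables):
        target = k * e_s
        while cumulative[j] < target:
            j += 1
        # Linear interpolation inside the grid cell that contains the target.
        frac = (target - cumulative[j - 1]) / (cumulative[j] - cumulative[j - 1])
        boundaries.append(times[j - 1] + frac * (times[j] - times[j - 1]))
    boundaries.append(times[-1])
    return boundaries


def piecewise_linear(x, xs, ys):
    """Evaluate a piecewise-linear curve through the points (xs, ys)."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside the range of the curve")


# Hypothetical contour: slow rise 90 -> 110 Hz, slow drop, steep end-contour.
times = [i / 1000 for i in range(2001)]
f0 = [piecewise_linear(t, [0.0, 1.0, 1.6, 2.0], [90.0, 110.0, 100.0, 70.0]) for t in times]
boundaries = equal_energy_boundaries(times, [math.log(f) for f in f0], 6)
durations = [b - a for a, b in zip(boundaries, boundaries[1:])]
print([round(d, 3) for d in durations])
```

In this sketch the syllables that fall where the contour is high come out slightly shorter than the final syllable, in line with the inverse relationship between duration and the basic phrase contour. Finding $t_1$ and $t_2$ themselves would require iterating, since the contour shape depends on the boundaries it is used to compute.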

The following steps involve a perturbation of the basic phrase contour obtained from SI 1), both with respect to frequency values and with respect to the durations of the syllables, according to the distribution of stresses within the phrase. The main condition is that the total energy of the phrase remains unchanged. Thus,

SI 2) Specify the sequence in which stresses are to be introduced into the phrase.

SI 3) Calculate the energies of the stresses according to a procedure that will be described in a moment.

SI 4) Calculate the amplitudes of the phonatory stress pulses using the result of SI 3).

SI 5) Recalculate the basic phrase contour and resegment it into syllables.

SI 6) Introduce phonatory stress and word intonation pulses.

The procedure referred to in step SI 3) is as follows. After step SI 1) each syllable has been given the constant energy $E_s$. If the phrase contains $n$ syllables, then the total energy $E_T$ is equal to $nE_s$. To obtain the amplitudes of the phonatory stress pulses and the new durations of the syllables, we introduce the concept of stress. A stress is a quantum of energy, $E_i^k$, associated with a syllable at a certain stage of the calculation. The stresses are introduced one at a time. For $k \ge i$, $E_i^k$ refers to the value (at the $k$'th stage) of the energy quantum introduced at the $i$'th stage. The subscript zero, however, is reserved for the energy of the basic phrase contour. Thus $E_0^k$ is the energy of this contour at the $k$'th stage.

The procedure indicated in SI 3) may now be summarized by means of the following formulas (a sample derivation is given on p. 38).

(SI 3:2) […] = constant for $k > 0$

(SI 3:4) $E_0^{k+1} = E_T - \sum_{i=1}^{k+1} E_i^{k+1}$

Thus, as stresses are introduced one by one into the phrase, the energies of the basic phrase contour and of the stresses introduced earlier are successively reduced. In particular, the energy $E_i^{k+1}$ of the $i$'th stress after the $k+1$'st stage ($k+1 > i$) is only a fraction $c(1 - 2^{-k})$, $0 \le c \le 1$, of the value it had after the $k$'th stage.

Suppose the calculation is terminated after the $m$'th stage. We may now determine for any given syllable of the phrase how many stresses have been associated with it during the calculation as well as the energies of these stresses. The sum of these energies is the energy that goes into the phonatory stress pulse to be introduced on the syllable in question. This is what we need to carry out steps SI 4) and SI 5).
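The bookkeeping of step SI 3) can be sketched as follows. This is a tentative illustration that rests on several assumptions: that each earlier stress is scaled by the factor c(1 - 2^-k) when the (k+1)'st stress is entered, that every new stress enters with a fixed quantum (the parameter `quantum` and its value are hypothetical, as is the value of `c`), and that the basic phrase contour receives whatever remains of the total energy. The function name `stress_energies` is likewise hypothetical.

```python
from collections import defaultdict


def stress_energies(n_syllables, sequence, e_s=1.0, quantum=1.0, c=0.49):
    """Enter stresses one at a time on the syllables listed in `sequence`.

    Returns (energy left for the basic phrase contour,
             {syllable: summed stress energy}).
    """
    e_total = n_syllables * e_s           # E_T = n * E_s
    energies = []                         # energies[i]: current energy of stress i+1
    for k, _syllable in enumerate(sequence):
        if k > 0:
            # Assumed reduction factor applied to all earlier stresses.
            factor = c * (1.0 - 2.0 ** (-k))
            energies = [e * factor for e in energies]
        energies.append(quantum)          # the newly entered stress
    e_basic = e_total - sum(energies)     # total energy is conserved
    per_syllable = defaultdict(float)
    for syllable, energy in zip(sequence, energies):
        per_syllable[syllable] += energy  # several stresses may share a syllable
    return e_basic, dict(per_syllable)


# Seven syllables, stresses entered on S3, S6, S1, and S3 again.
e_basic, per_syllable = stress_energies(7, [3, 6, 1, 3])
print(round(e_basic, 4), {s: round(e, 4) for s, e in sorted(per_syllable.items())})
```

The summed value per syllable is the quantity that steps SI 4) and SI 5) would consume; a syllable that receives two stresses, as S3 does here, simply accumulates both contributions.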

Example of Derivation of Stress Energies (with c = 49). Columns: sequence of syllables in the phrase, S1 to S7. Rows (stages): enter stress on S3; enter stress on S6; enter stress on S1; enter stress on S3; energy to be added to each stressed syllable. [Table entries not recoverable.]


The Scandinavian accent orbit

Let us now reconsider the Scandinavian word tones in the light of the modified concept of sentence intonation that was discussed in the previous section. Our model now provides us with three types of building block in terms of which the $f_0$-contours of human utterances are to be analyzed. These building blocks are a) the basic phrase contour, b) the (positive) phonatory stress pulses, and c) the (negative) word intonation pulses. We have also suggested that the basic phrase contour is related to the breathing cycle, that the stress pulses reflect the insertion of a quantum of physiological energy into the speech production system as a whole, and that the word intonation pulses correspond to more or less tense glottal stops.

Since we may assume that the basic phrase contour is level in the neighborhood of the test word of utterances like [ d ~ va mb:n&n ja sa:], etc., the command configuration of Fig. II-B-5 that was suggested as the basic formula for all Scandinavian word tones need be changed only in one respect. The "sentence intonation step" of amplitude +A is now replaced by a phonatory stress pulse. This change is equivalent to adding a step of amplitude -A to the sentence intonation command of Fig. II-B-5 near the second consonant (cluster) of the syllable.

In all cases that we have analyzed (Figs. II-B-4, II-B-7, II-B-9, II-B-10, and II-B-11) the off-set of the phonatory stress pulse would be very close to the end-contour of the basic phrase pattern. The negative sentence intonation step that was previously used to generate the terminal contour may therefore be regarded as including both the off-set of the phonatory stress pulse and the end command of the basic phrase contour. The modification of the model just introduced consequently does not alter our earlier conclusion that the Scandinavian word tones may be simulated by means of appropriately timed word intonation pulses of negative amplitude.

The pulse character of the phonatory stress command suggests a very natural interpretation of the difference between the Stockholm and the North Swedish grave accent patterns. As was noted earlier, the second peak of the Stockholm grave accent has a lower frequency value than the first peak, whereas this relationship is reversed in many North Swedish dialects. This difference may be obtained in the model by putting the accent pulse slightly earlier or slightly later than the pitch peak that would result from the phonatory stress pulse alone. This effect is illustrated in Fig. II-B-13.

We have already noted that the difference between the Orsa acute and grave accent patterns may be simulated by entering a word intonation pulse close to the beginning of the second syllable of the acute accented word while the grave pattern is synthesized by means of the phonatory stress pulse and the basic phrase contour only. It is interesting that the same statement holds for some of the (geographically distant) Gotland dialects. According to most authorities, the Central Dalarna and the Gotland dialects are closer to Proto-Scandinavian than any of the other modern Scandinavian tongues also as regards phonological structure and vocabulary (53).

To judge from Meyer's data some of the Dalarna dialects show a different type of contrast between the acute and the grave patterns (cf. e.g. dialect No. 18 of Fig. II-B-3) than those observed in Orsa. Here it is more probable that the acute pattern should be synthesized by means of the phonatory stress pulse and the basic phrase contour only, while the grave pattern would be obtained by also entering a word intonation pulse at the beginning of the first syllable of the accented word.

If we take the Rättvik dialect (No. 27 of Fig. II-B-3) as our starting point we may accordingly visualize two ways of sharpening the accent contrast. One way is to enter a word intonation pulse on the falling tail of the acute contour. We may then gradually move this pulse from the right into the first syllable of the word. The other way is to enter a word intonation pulse on the rising ramp of the Rättvik grave pattern. We may then gradually move this pulse from the left toward the end of the first syllable of the word.

In Fig. II-B-14 we have rearranged a part of Meyer's data so as to demonstrate that dialects may be found that correspond to the successive stages of these two gradual processes. The Rättvik dialect will be found in the leftmost part of the "accent orbit". Following the orbit downward from this point on corresponds to the first mentioned of these two processes, and the second process is represented by the opposite direction. (In each subgraph of Fig. II-B-14 the left curve represents the acute accent and the right curve represents the grave pattern.)

Fig. II-B-13. Model outputs for different combinations of input commands. Curve I: Phonatory stress pulse + terminative end-contour. Curve II: Terminative end-contour only. Curve III: Word intonation pulse + terminative end-contour. Curves IV-VIII: Phonatory stress pulse + terminative end-contour + word intonation pulse moved stepwise to the right. Note difference in height of the two pitch peaks of curves V and IV.


Fig. II-B-14 was arranged on purely phonetic principles suggested by our model, and not all dialect pairs that are adjacent in the figure are also contiguous geographically. In fact, the dialects do not appear to be distributed along a single geographic orbit, but along several. It is too early to make definitive statements about the exact geographic course of these orbits since a sufficiently dense sampling of the dialects has not yet been carried out over the whole area.

However, Meyer himself emphasized (54) the dialectal continuity along a line that starts in Central Dalarna and moves South-East into Uppland. Examination of Meyer's data suggests that this line divides itself into a Northern and a Southern branch in Uppland. Branch points are also indicated in other parts of the area.

The phonetic importance of investigating these relationships in detail should be stressed. The hypothesis of an accent orbit together with the possibility that there are chains of geographically contiguous dialects that correspond to various sectors of the accent orbit of Fig. II-B-14 implies, for instance, that the acute and the grave accent pulses do not move independently of each other as the dialects develop, but that certain temporal and other qualitative constraints are obeyed so that dialects not in contact may develop in similar ways for intrinsic reasons. The determination of these constraints may aid our understanding of speech communication in general.

On the origin of the Scandinavian accents

We must ask now if our intonation model can throw any light on the historical problem as to how the Scandinavian tonal accents may have arisen. Needless to say, the remarks that we shall make on this topic are speculative. Moreover, historical linguists disagree about the date and the exact phonological circumstances of the formation of the accent distinction. Below we shall, however, accept the essentials of the views set forth by Oftedal (55). The discussion offered here may be viewed as an illustration of the central details of his theory in terms of the model proposed in previous sections of this paper.

At the time of Scandinavian linguistic unity (about 500 B.C.) the pitch of stressed syllables (normally the first syllable of the word) would be characterized simply by a phonatory stress pulse. The language distinguished between two types of syllable, namely, long (CV:C or CVCC) and short (CVC). Both types could be stressed. In the interior of the phonological phrase the tonal difference between stressed short and stressed long syllables would probably be slight. As illustrated in Fig. II-B-15 (parts I and II) the difference would be more marked in phrase final position, however.

In drawing Fig. II-B-15, it has been assumed that the phonatory stress pulse (marked SP) has a fixed duration and amplitude, and that it starts at the beginning of the first consonant of the stressed syllable. Moreover, it has been assumed that the tail of the basic phrase contour (marked BPC) starts on the first consonant of the syllable immediately following the latest stressed word of the phrase. I.e., the end-contour is tied to the final word boundary of the latest stressed word.

At the stage illustrated by parts I and II of Fig. II-B-15 the second syllable of a bisyllabic word with a short first syllable was probably perceived as more prominent than the second syllable of a word with long first syllable. In the former case the pitch is rising and in the latter case it is falling on the second syllable. It is likely, however, that the speakers felt the difference in duration of the first syllable as being more important, since this difference would dominate in the interior of the phrase.

After this period three developments took place in the following order.

1) Syncopation. A syllable following a long stressed syllable inside a word was shortened. In particular, if the second syllable was long it became short, and if it was short its vowel disappeared. Later the same kind of second syllable shortening took place after a stressed short syllable. In phrase final position this caused the end-contour of the basic phrase pattern to move with the word boundary closer to the first syllable of the word.

2) Word boundary shift. In noun phrases the enclitic definite article, having been a free morpheme (a postponed definite pronoun), became part of the word, i.e., the word boundary was moved one syllable to the right. Also, syllabic word final consonants or consonant clusters that had formed as a result of the syncopation were amplified by means of a svarabhakti vowel. This also had the effect of delaying the word boundary and the end-contour.

3) Leveling. Short stressed syllables were lengthened.

Fig. II-B-16. Hypothetical pitch patterns to be compared with East and South East Asian tones. Panels: low falling, low rising, falling/rising, high falling, high rising, rising/falling.


opposition, then a word intonation pulse may have been entered near the beginning of the second syllable of these words as illustrated in part V of Fig. II-B-15. On the other hand, if the relatively high pitch on the second syllable of polysyllabic words (cf. Part I of Fig. II-B-15) was perceived as the marked feature, then a word intonation pulse may have been inserted at the beginning of the first syllable of these words as illustrated in part VI of Fig. II-B-15.

Different dialects would have chosen different combinations of these possibilities. In later stages of the historical development, perhaps in conjunction with the leveling of short syllables, the word intonation pulses must have started to drift in the various dialects, the acute, "monosyllabic" pulse towards the beginning of the first syllable, and the grave, "polysyllabic" pulse towards the end of the first syllable along the lines of the Scandinavian accent orbit. (Cf. parts VII and VIII of Fig. II-B-15.)

It should be noted that our theory also postulates a certain tonal distinction at various stages (before the leveling period) between acute accented words with long and short initial syllable as well as between grave accented words with long and short initial syllable. In certain modern dialects that preserve the syllable length distinction (Sollerön, for instance) the accent on short syllabic words is somewhat different from the ordinary grave and acute patterns. It is possible that this feature is related to the processes described in Fig. II-B-15.

Other tone languages

It may at last be worth while to consider very briefly the possibilities of our intonation model in relation to tone languages other than those of Scandinavia (56). It is reasonable to expect that the word tones of these languages also are superimposed on an underlying basic phrase contour (57). Whether or not the tones of all tone languages are best described by means of a simple negative pulse fed to the input of the word intonation filter is, of course, an open question at the moment. Even with this restriction imposed on the model, however, a very great number of sharply different tone patterns can be generated.

Suppose that the following constraints were valid for any given tone language:

1) The phonatory stress pulse can only start at the onset of the syllable.

2) Only two amplitudes are allowed for the phonatory stress pulse, zero and a certain non-zero value, A.

The negative intonation pulse can only have the following discrete properties:

3) the amplitude is zero or has a fixed non-zero value, -B;

4) the duration is either short or long;

5) the model constant β is either large or small;

6) the word intonation pulse can only occur near the initial consonant(s), near the middle of the vowel(s), or near the final consonant(s) of the syllable.

A language employing all the possibilities that are open under these constraints would have 26 different tones (when B = 0 the distinctions under 4), 5), and 6) above become irrelevant). Six of these are shown in Fig. II-B-16. They seem approximately to fit the description of some of the more common tones of the East and South-East Asian language area (58).
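The count of 26 can be checked by enumerating the combinations allowed by constraints 2) to 6). The sketch below is only a bookkeeping aid; the labels used for the options and the function name `enumerate_tones` are chosen here for illustration.

```python
from itertools import product


def enumerate_tones():
    """List every (stress pulse, word intonation pulse) combination."""
    stress_amplitudes = ["zero", "A"]                 # constraint 2)
    # Constraint 3): amplitude zero, in which case 4)-6) are irrelevant ...
    intonation_pulses = [None]
    # ... or amplitude -B, combined freely with 4), 5), and 6).
    intonation_pulses += list(product(
        ["short", "long"],                            # 4) duration
        ["large", "small"],                           # 5) model constant
        ["initial", "middle", "final"],               # 6) position in syllable
    ))
    return [(s, p) for s in stress_amplitudes for p in intonation_pulses]


tones = enumerate_tones()
print(len(tones))  # 2 * (1 + 2 * 2 * 3) = 26
```

The single `None` entry is what keeps the total at 26 rather than 2 x 2 x 2 x 2 x 3 = 48: a pulse of zero amplitude has no duration, filter constant, or position to distinguish.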

It is interesting in this connection to note certain systematic relationships between the Hakka, Foochow, and Pekingese dialects of modern Chinese. In a group of historically related words, the final consonant is a voiceless stop in Hakka, a glottal stop in Foochow, and zero in Pekingese (59). However, these words have a low falling tone in the Pekingese dialect, and specialists believe that this tone has developed from an earlier glottal stop which in turn developed from a (glottalized) final stop consonant. A process like this, which seems to be partly inverse to the development of the Danish glottal stop (stød) from a word tone, is nevertheless entirely consistent with the intonation model proposed in this paper.

SUMMARY

In the present paper a quantitative model of larynx control during speech production has been described. The input commands are configurations of simple step functions fed to the model over two channels, the sentence intonation filter and the word intonation filter.

In order to find further constraints to impose on the model for purposes of empirical adequacy, the Scandinavian grave/acute accent opposition was analyzed by fitting curves generated with the model to empirically measured $f_0$-contours.

It was found that the salient features of these intonation patterns in simple utterances of a number of dialects can be simulated by means of a single positive step as input to the sentence intonation filter and an appropriately timed negative pulse as input to the word intonation filter (cf. Fig. II-B-5). It was proposed, tentatively, that this analysis is valid for all Scandinavian dialects.

We next turned to the question as to how the model elements should be interpreted in physiological terms. Regarding the word intonation channel the hypothesis was proposed that the Scandinavian tonal accents are a sort of laryngeal consonants, not unlike glottal stops, that are coarticulated with the sentence intonation as well as with the "segmental" gestures of stressed syllables. The sentence intonation commands, on the other hand, turned out to be decomposable into a basic phrase contour and a sequence of phonatory stress pulses. It was suggested that these constructs reflect an underlying process termed physiological intensity and that stress should be understood as the addition of a quantum of physiological energy to the speech production system as a whole. This energy is distributed (possibly unevenly) over the pulmonary, phonatory, and articulatory channels. In the phonatory channel the stress energy manifests itself as a phonatory stress pulse at the input of the sentence intonation filter.

In this connection, possible energy conserving principles regarding the phonological phrase as a whole were considered. Also, a synthesis strategy for sentence intonation was discussed. This strategy has certain properties in common with the transformational cycle of Chomsky and Halle's theory of phonology.

Having developed these concepts we returned to the Scandinavian accents. An examination of Meyer's data indicated that a series of dialects may be found that display certain systematic relationships with respect to the relative locations of the hypothetical acute and grave accent command pulses. When adjacent members in this series are compared, the accent pulses appear to be cotranslated a small step either to the left or to the right. This relationship was termed the Scandinavian accent orbit.


Consideration of the accent orbit together with known facts of Scandinavian linguistic history suggested an hypothesis about the origin of the word tone distinction. Briefly, this distinction may have arisen in phrase final position as a result of the successive movements of the terminative end-contour.

Finally, a few remarks were made about the generative power of

the model with respect to tones observed in non-Scandinavian languages.

The intonation model summarized in the present paper makes it possible to collect systematic quantitative information on the tonal as well as other prosodic events. Work involving close comparison of the model with empirical data is in progress.

ACKNOWLEDGMENTS

I have profited from valuable discussions with G. Fant, B. Lindblom, J. Lindqvist, E. and L. Gårding, K-H. Dahlstedt, R. Leanderson, G. Malmqvist, and A. Ellegård, to all of whom I want to express my sincere thanks. I am alone responsible for the ideas and hypotheses presented in this paper, however. I am also grateful to Mrs. S. Felicetti for her expert editorial assistance.

FOOTNOTES

(1a) Öhman, S.: "On the Coordination of Articulatory and Phonatory Activity in the Production of Swedish Tonal Accents", STL-QPSR 2/1965, pp. 14-19.

(1b) Öhman, S. and Lindqvist, J.: "Analysis-by-Synthesis of Prosodic Pitch Contours", STL-QPSR 4/1965, pp. 1-6; to appear in Proc. of Seminar on Speech Production and Perception, Z. f. Phonetik usw., Berlin, DDR.

(2) I am grateful to Dr. Philip Lieberman who in many interesting discussions during the academic year 1963-64 drew my attention to the problems of intonation. Cf. Lieberman, P.: Intonation, Perception, and Language (Cambridge, Mass. 1967).


(3) The phonetics of the word tones has been described by Meyer, E. A.: Die Intonation im Schwedischen, Teil I (Stockholm 1937) and Die Intonation im Schwedischen, Teil II (Stockholm 1954); Malmberg, B.: Sydsvensk ordaccent (Lund 1953); Hadding-Koch, K.: Acoustico-Phonetic Studies in the Intonation of Southern Swedish (thesis, Lund 1961); Elert, C-C.: Phonologic Studies of Quantity in Swedish (thesis, Stockholm 1964); Kloster Jensen, M.: Tonemicity, Norwegian Universities Press (Oslo 1961).

(4) The model described here is a slightly revised version of the one presented in fn (1b). A second revision will be proposed as a result of discussions to follow later in this paper.

(5) The pitch/pressure-dependency has been dealt with in the reference of fn (1b) as well as by Ladefoged, P.: "Physiological Studies of Speech", STL-QPSR 3/1961, pp. 16-21; Lieberman, P., loc. cit.; Ventsov, A. V.: "The Relationship Between the Pitch Period and the Intra-Aural Pressure", paper 24-3, p. 345 in Digest of the 7th ICMBE, Stockholm 1967.

(6) This factor has been insufficiently studied. Cf. fn (1b) and Faaborg-Andersen, K. and Sonninen, A.: "The Function of the Extrinsic Laryngeal Muscles at Different Pitch", Acta oto-Laryng. 2 (1960), pp. 89-93.

(7) A fuller treatment will be given in a forthcoming publication.

(8) The measurements of Ladefoged, Öhman and Lindqvist, and Ventsov (cf. fn (5)) indicate that the fundamental frequency may vary by 0.16 Hz/cm H2O if only the pressure-drop across the glottis is changed and everything else is constant. Lieberman gives a somewhat higher value.

According to our own measurements, during stressed syllables of normal conversational speech, the subglottal pressure increases by at most a few cm H2O and the increase in pitch due to this factor would therefore be negligible in comparison with the pitch movements caused by larynx muscle adjustments. The effect of pressure fluctuations may be considerable in voiced obstruents, semi-aspiratives, and during the terminal phase of the sentence intonation, however.

The tonal configurations studied in the present paper have been embedded in a sentence frame in a position where the subglottal pressure may be assumed to be essentially constant except for minor fluctuations due to stress. In a more complete treatment a pressure dependent correction factor must be introduced into the model calculations in the form of an "acoustic interaction signal".

(9) Ruch, T.C., Patton, H.D., Woodbury, J.W., and Towe, A.L.: Neurophysiology (1962), pp. 103-105.


Lindblom, B. : "Studies of Labial Articulation", STL-CPSR 4/1965, pp. 7-9. ohman, S. , Pe r s son , A , , and Leanderson, R. : "Speech Production a t the Neuro-Motor Level", forthcoming ar t ic le i n J. of the Acoustical Society of America.

It appears that the best resul ts a r e obtained with n 2 2 fo r most speakers . All examples of the present paper have been calculated with n = l , however.

The PDP-7 computer belongs to the Department of Automatic Control, Royal Institute of Technology (KTH), Stockholm. Prof. L. von Hbmos' kind cooperation i s gratefully acknow- ledg ed . ahman, S. : "Computer P r o g r a m fo r Pi tch Measurements", STL-GPSR 1/1966, p. 11.

(13) I wish to express my gratitude to J. Liljencrants, whose cooperation in the programming phase of this work simplified my efforts substantially.

(14) The uniqueness question for solutions obtained by means of the automatic iterative procedure will be dealt with in a later publication.

(15) Öhman, S.: "Generative Rules for the Phonology and Prosody of the Swedish Verb" (Generativa regler för det svenska verbets fonologi och prosodi), in Swedish, Förhandlingar vid Sammankomst för att dryfta Frågor rörande Svenskans Beskrivning III (Göteborg 1966), pp. 71-87.

(16) Kock, A.: Språkhistoriska undersökningar om svensk accent, Part I (Lund 1878) and Part II (Lund 1884).

(17) Hesselman, B.: Huvudlinjer i nordisk språkhistoria (Uppsala 1948-1953). Oftedal, M.: "On the Origin of the Scandinavian Tone Distinction", Norsk Tidsskrift for Sprogvidenskap 16 (1952), pp. 201-225.

(18) The Danish glottal stop has been described from various points of view by Smith, S.: Stødet i dansk rigssprog (Copenhagen 1944). Martinet, A.: La phonologie du mot en danois. Hansen, A.: "Stødet i dansk", Det Kgl. Da. Vidensk. Selsk. Hist.-Fil. Meddelelser, XXIX:5 (Copenhagen 1943).

(19) Malmberg, B.: op. cit. Hadding-Koch, K.: op. cit. gives spectrographic illustrations.

(20) Meyer, E.A.: op. cit., Teil II.

(21) This analysis, which is an account of my own impressions, purposely disregards the tonal contours of the word accents.

(22) The end-contour can of course be different in different dialects and in different sentence types. Any such contour can be synthesized by means of an appropriately chosen step configuration fed into the sentence intonation filter. In order to purify the effects of the word intonation we have tried to choose frames with maximally simple sentence intonation, however. It so happens that a negative step introduced at the end of the penultimate syllable of the frame referred to in the text suffices to match the Stockholm data satisfactorily. Questions regarding the end-contours will be discussed in more detail on p. 30 et sqq.
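The idea of feeding a step configuration into a smoothing filter can be sketched as follows. This is an illustration only: a discrete first-order low-pass filter is used here as a stand-in and is not the sentence intonation filter of the model, and the sample counts, step amplitudes, and time constant are hypothetical values chosen for the example.

```python
def step_train(n_samples, steps):
    """Sum of step functions; steps is a list of (onset_index, amplitude)."""
    return [
        sum(amp for onset, amp in steps if i >= onset)
        for i in range(n_samples)
    ]


def first_order_lowpass(signal, dt, tau):
    """Smooth a signal with a discrete first-order low-pass (stand-in filter)."""
    alpha = dt / (tau + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)  # move a fraction alpha toward the input
        out.append(y)
    return out


dt = 0.01  # hypothetical sample interval, seconds
# A positive step at sample 10 and a negative step at sample 60; the latter
# plays the role of the negative step at the end of the penultimate syllable.
command = step_train(100, [(10, 1.0), (60, -1.0)])
contour = first_order_lowpass(command, dt, tau=0.05)
```

The smoothed contour rises after the positive step and falls back toward the baseline after the negative one, which is the qualitative shape of a simple falling end-contour.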


(23) The sentence intonation source of the model is assumed here to generate steps only. As a result of the discussion of the physiological meaning of sentence intonation on p. 30 et sqq., we will later replace this assumption by one stating that the sentence intonation source generates positive pulses only. These pulses will also be assumed to start at the beginning of the stressed syllable.

(24) The acute accent will be denoted by "´", and the grave accent by "`".

(25) A sharp and brief dip in the measured f0-contour usually occurs during the [v] of [d t v a ...] of the frame. This dip is probably caused by the increased intraoral pressure during this consonant (fn (1 b)). During the [s] of [... ja sa:], f0 is of course undefined.

(26) Meyer, E.A.: op. cit., states that the pitch drop at the beginning of acute accented syllables of the Stockholm dialect represents an influence from Southern Swedish.

(27) E. Haugen, in personal communication, has suggested to me that the East Norwegian acute accent should be identified with the sentence intonation and that the grave accent is a delayed version of the sentence intonation. For the Swedish dialects that I have had experience with so far, it seems better to postulate that the sentence intonation step is fixed at the beginning of the syllable and that the timing (and other parameters) of the word intonation pulse is responsible for the tonal contrast. Cf. Haugen, E. and Joos, M.: "Tone and Intonation in East Norwegian", Acta Philologica Scandinavica 22 (1952), pp. 41-64.

(28) The role of this as well as of other types of juncture has been extensively studied by Gårding, E.: "Internal Juncture in Swedish" (thesis, Lund 1967).

(29) In synthesis experiments B. Malmberg has noted that the impression of a grave accent or an acute accent can be obtained by moving an f0-peak within the span of the first syllable of the accented word. Malmberg, B.: "Observations on the Swedish Word Accent", mimeographed report from Haskins Laboratories, New York and Lund 1955.

(30) Öhman, S.: "On the Coordination of Articulatory and Phonatory Activity in the Production of Swedish Tonal Accents", STL-QPSR 2/1965, pp. 14-19.

(31) The word intonation pulse at the end of the second syllable of the two Danish test words represents a clearly audible juncture at the beginning of [ja sas:]. This juncture is similar to the one observed in the Malmö utterances of Fig. 11-B-7. Cf. also fn (28).

(32) Dahlstedt, K-H.: Det svenska Vilhelminamålet, 2 (Uppsala 1962). This book gives a detailed phonological discussion of a great number of North Scandinavian dialects.


(33) Swedes imitating Danish tend to exaggerate the glottal stops, however. This may be because, as was noted above, tense glottal stops occur in Swedish before stressed syllables beginning in a vowel. The accents are lax in comparison with these glottal stops.

(34) Sonesson, B. : "The Mechanisms of the Human Vocal Folds", forthcoming article in American Lecture Series in Anatomy.

The conception of larynx function briefly summarized here was explained to me by B. Sonesson. I am grateful to him for many illuminating discussions on this topic.

(35) The experiment was carried out at the Central Neurophysiological Laboratory, Karolinska Sjukhuset, Stockholm, by Drs. A. Mårtensson, R. Leanderson, and A. Persson. I am grateful to them for their willingness to cooperate. A more complete description of the methods and procedures used will be given by Leanderson in a forthcoming publication.

(36) Hadding-Koch, K.: op. cit. Segerbäck, B.: "La Réalisation d'une Opposition de Tonèmes dans des Dissyllabes Chuchotés" (Lund 1966). Meyer-Eppler, W.: "Realization of Prosodic Features in Whispered Speech", J. Acoust. Soc. Am. 29 (1957), pp. 104-106.

(37) Fant, G.: Acoustic Theory of Speech Production ('s-Gravenhage 1960), p. 272 et sqq.

(38) This notation, which has been adopted by Svenska Akademiens Ordbok, has been discussed by C-C. Elert (see fn (3)).

(39) Elert, C-C.: op. cit., p. 139, states that "the difference in duration between words with (acute) accent I and (grave) accent II is analogous to the differences in intensity and the fundamental pitch in the two types of word. The rapid decrease in over-all intensity and the fall in the fundamental pitch in the stressed syllable of (grave) accent II words are accompanied by a shorter duration of the vowels in that syllable".

It may be added that the final syllable of grave words is somewhat longer than that of acute accented words. It is as if the speaker waits for the pitch to return to the sentence intonation level and therefore prolongs the syllable in which this return occurs (first syllable of acute and second syllable of grave words). The relative lengths of corresponding syllables of acute and grave words may of course also contribute to the perceived differences in prominence as well as to the perceived difference in accent in whispered speech.

It is interesting in this connection to note that North Swedish dialects with monosyllabic grave accented words (circumflex) display a duration relationship that is opposite to that of the Stockholm dialect. I.e., the syllable with circumflex is longer than that with the acute accent (Dahlstedt, K-H.: op. cit., p. 156), yet the native speakers do not "feel" a phonological length contrast. These facts are consistent with the idea that the syllable is lengthened because the speaker waits for the pitch to return to the sentence intonation level.


(40) The over-all intonation contour of the phonological phrase is discussed in great detail by P. Lieberman, op. cit. Although certain aspects of this underlying contour probably are universal, it is not unlikely that the details of its shape could vary from language to language (and even from speaker to speaker), being constant for any given language (or speaker).

(41) These laryngeal commands should not be confused with the pulmonary stress pulses that may be observed in the intercostal muscle activity (cf. fn (44)).

(42) Lieberman, P.: op. cit., and von Euler, C.: "Proprioceptive Control in Respiration", in Nobel Symposium I, Muscular Afferents and Motor Control, ed. by R. Granit (Stockholm 1966), pp. 197-207.

(43) Lindblom, B.: Wenner-Gren Foundation Report, Studies of Human Speech; also "Some Temporal Correlates of Stress Contours", to be published.

(44) Draper, M.H., Ladefoged, P., and Whitteridge, D.: "Respiratory Muscles in Speech", J. of Speech and Hearing Res. 2 (1959), pp. 16-27.

(45) Lindblom, B.: "On Vowel Reduction", STL, KTH, Report No. 29 (Stockholm 1963).

(46) Fry, D.: "Duration and Intensity as Physical Correlates of Linguistic Stress", J. Acoust. Soc. Am. 27 (1955), pp. 765-768.

(47) In other words, the two syllables immediately following the onset of the phonatory stress pulse may be given different relative prominence simply by adjusting their durations.

(48) Chomsky, N. and Halle, M.: Sound Pattern of English (forthcoming). Prof. Halle has been kind enough to let me see parts of the manuscript of this book before the publication.

(49) In Swedish, the difference between [²var:m ¹kor:v] (Adjective + Noun) and [¹var:m + ²kor:v] (Compound Noun), where 1 denotes primary stress and 2 secondary stress, would be due to the circumstance that the last primary stress introduced in the transformational cycle was put on [kor:v] in the first case, and on [var:m] in the second case.

(50) Future research will show whether ET depends on the number of syllables only or on the number of syllables plus the number of primary stresses introduced at the beginning of the transformational cycle.

(51) Chomsky, N. and Halle, M.: op. cit.

(52) It is in my opinion quite premature to conclude that it is impossible to expect a complete correspondence between the records of modern phonetics and the elements and processes postulated in a systematic linguistic theory. In fact, some of the most recent developments in phonetics indicate that correlations of this sort may be successfully established.

As I see it, it is not only possible but necessary to continue work along these lines. First the introspective skills of the auditory phonetician must be translated into objective physical measurement techniques. Then it may be possible to disambiguate and sharpen these skills beyond the limits of subjective intuition. In this way we may succeed in establishing a scientific instrument by means of which phonological theories can be put to objective test. Naturally, in the initial stages of this work our phonetic experiments must be guided by phonological theory. As phonetic theory develops, however, it should be increasingly feasible to substitute objective phonetic measurement for impressionistic methods wherever the latter are indeterminate.

(53) The chronology of the Scandinavian languages has been discussed by E. Haugen in Language 1949, p. 307.

(54) Meyer, E.A. : op.cit., Teil I, p. 232 et sqq.

(55) Ref. in fn (17).

(56) Pike, K.: "Tone Languages", Univ. of Michigan Publications, Linguistics 4 (1948).

(57) Chang, N.C.T.: "Tones and Intonation in the Chengtu Dialect", Phonetica 2 (1958), pp. 59-85. Abramson, A.: "The Vowels and Tones of Standard Thai" (thesis, Columbia Univ., New York 1960).

(58) Forrest, R.A.D.: The Chinese Language (Faber and Faber).

(59) Forrest, R.A.D.: op. cit.

Fig. 11-B-14. The Scandinavian accent orbit. Selection of accent patterns from material of Fig. 11-B-3 to suggest cotranslation of acute and grave word intonation pulses.
