PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/15650 Please be advised that this information was generated on 2018-07-07 and may be subject to change.
THE PROSODIC STRUCTURE OF INITIAL SYLLABLES IN ENGLISH
Anne Cutler and David Carter
ABSTRACT
Studies of human continuous-speech recognition suggest that listeners use a strategy of postulating a word boundary, and initiating a lexical access procedure, at each metrically strong syllable. The likely success of this strategy was here estimated against the characteristics of the English vocabulary. Computerised dictionaries of English were found to list approximately three times as many words beginning with strong syllables (i.e. syllables containing a full vowel) as beginning with weak syllables (i.e. syllables containing a reduced vowel). Furthermore, the mean frequency of occurrence of words beginning with strong syllables is nearly twice as great as that of words beginning with weak syllables. These findings motivated an estimate for everyday speech recognition that approximately 85% of lexical words (i.e. excluding function words) will begin with strong syllables. In fact, in a large corpus of spontaneous conversation 90% of lexical words were found to begin with strong syllables.
INTRODUCTION
Word recognition in continuous speech is complicated by the absence of reliable word boundary correlates. Human listeners nevertheless recognise words in running speech at least as efficiently as they recognise words in isolation, if not more efficiently (ref. 1). Recent studies of human speech processing have suggested that listeners may use heuristic strategies for overcoming the absence of word boundary information. Such strategies may allow listeners to guide their lexical access attempts by postulating word onsets at what linguistic experience suggests are the most likely locations for word onsets to occur.
Cutler and Norris (ref. 2) have proposed such a strategy based on prosodic structure. In a stress language like English, syllables can be either strong or weak; strong syllables contain full vowels, while weak syllables contain reduced vowels (usually schwa). Cutler and Norris found that listeners were slower to detect the embedded real word in mintayf (in which the second vowel is strong) than in mintef (in which the second vowel is schwa). They suggested that listeners were segmenting mintayf prior to the second syllable, so that detection of mint required combining speech material from parts of the signal which had been segmented from one another. No such difficulty would arise for the detection of mint in mintef, since the weak second syllable would not be segmented from the preceding material. Cutler and Norris proposed that, in English, listeners use strong syllables as the basis for a segmentation strategy in continuous speech processing. Strong syllables are taken to be likely word onsets, and the continuous speech stream is segmented at strong syllables so that lexical access attempts can be initiated.
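
The segmentation strategy described above can be illustrated with a minimal sketch. It assumes the input has already been divided into syllables and that each syllable carries a strong/weak label; the function name, the data representation and the syllabification of the two nonsense items are assumptions made for illustration, not part of the proposal by Cutler and Norris.

```python
from typing import List, Tuple

# Each syllable is a (text, is_strong) pair. The strong/weak labels are
# assumed to be given; deriving them from the speech signal is outside
# the scope of this sketch.
Syllable = Tuple[str, bool]


def segment_at_strong_syllables(syllables: List[Syllable]) -> List[List[str]]:
    """Start a new chunk (a candidate word onset) at every strong syllable.

    Weak syllables are attached to the chunk that precedes them. A weak
    syllable at the very start of the input opens the first chunk.
    """
    chunks: List[List[str]] = []
    for text, is_strong in syllables:
        if is_strong or not chunks:
            chunks.append([text])
        else:
            chunks[-1].append(text)
    return chunks


# Hypothetical syllabifications of the two items discussed in the text.
mintayf = [("min", True), ("tayf", True)]   # second vowel is strong
mintef = [("min", True), ("tef", False)]    # second vowel is schwa

print(segment_at_strong_syllables(mintayf))  # [['min'], ['tayf']]
print(segment_at_strong_syllables(mintef))   # [['min', 'tef']]
```

In the first case the material making up mint is split across two chunks, which is the situation in which detection was slower; in the second case it stays within one chunk.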
MRC Applied Psychology Unit, 15 Chaucer Rd., Cambridge CB2 2EP.
Computer Laboratory, University of Cambridge, Corn Exchange St., Cambridge CB2 3QG.
The success rate of such a strategy, however, depends at least in part on how realistically it reflects the characteristics of the vocabulary. Hypothesising that strong syllables may be word onsets is unlikely to be a very efficient strategy for detecting actual word onsets if most actual words do not begin with strong syllables. The present study estimates the likely success rate of the strategy proposed by Cutler and Norris against the characteristics of the English vocabulary, and then tests it on an actual corpus of English conversation.
WORD-INITIAL SYLLABLES IN ENGLISH
The MRC Psycholinguistic Database (ref. 3) is a lexicon of over 98000 words, based on the Shorter Oxford Dictionary. Over 33000 entries have phonetic transcriptions. Fig. 1 shows the prosodic characteristics of the initial syllables of the transcribed words, divided into four categories: monosyllables (such as bone or splint), polysyllables with primary stress on the first syllable (such as lettuce or splendour), polysyllables with secondary stress on the first syllable (such as trombone or polysyllabicity), and polysyllables with weak initial syllables (in which the vowel in the first syllable is usually schwa, as in annoy or trapeze, but may also be a reduced form of another vowel, as in invest or external). Any of the first three categories would satisfy the segmentation strategy proposed by Cutler and Norris. It can be seen that these categories together account for 73% of the words analysed.
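
The four-way classification can be sketched as follows. The encoding is an assumption made for illustration: each word is represented by one stress code per syllable (1 = primary stress, 2 = secondary stress, 0 = weak), which is not necessarily how the MRC Database stores its transcriptions. The toy lexicon uses example words from the text and does not reproduce the proportions reported for the database.

```python
from collections import Counter
from typing import Dict, List

# Categories whose initial syllable is strong, i.e. those that would
# satisfy the segmentation strategy.
STRONG_INITIAL = {"mono", "poly1", "poly2"}


def prosodic_category(stress: List[int]) -> str:
    """Classify a word by its per-syllable stress codes (hypothetical encoding)."""
    if not stress:
        raise ValueError("a word needs at least one syllable")
    if len(stress) == 1:
        return "mono"
    # Polysyllables are classified by the stress of the first syllable only.
    return {1: "poly1", 2: "poly2", 0: "poly0"}[stress[0]]


def category_proportions(lexicon: Dict[str, List[int]]) -> Dict[str, float]:
    """Proportion of word types falling into each prosodic category."""
    counts = Counter(prosodic_category(s) for s in lexicon.values())
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def strong_initial_share(proportions: Dict[str, float]) -> float:
    """Summed proportion of the three strong-initial categories."""
    return sum(p for cat, p in proportions.items() if cat in STRONG_INITIAL)


# Toy lexicon built from examples in the text; stress codes are assumed.
toy_lexicon = {
    "bone": [1],
    "lettuce": [1, 0],
    "splendour": [1, 0],
    "trombone": [2, 1],
    "annoy": [0, 1],
    "trapeze": [0, 1],
    "invest": [0, 1],
    "external": [0, 1, 0],
}

proportions = category_proportions(toy_lexicon)
print(proportions)
print(strong_initial_share(proportions))
```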
Since the proposed strategy is aimed at the efficient initiation of lexical access, however, it is reasonable to exclude from our analysis those words whose interpretation in a speech context relies not upon a lexical lookup but upon strictly contextual factors; that is, it is reasonable to exclude grammatical words (such as articles, conjunctions and pronouns). The distribution of the prosodic characteristics of the initial syllables of lexical words (nouns, verbs, adjectives and most adverbs) in the MRC Database is, however, virtually identical to Fig. 1, since exclusion of grammatical words reduced the total corpus size by less than 1%.
Fig. 1. Prosodic categories as proportions of the MRC Database.
Fig. 2. Mean frequency of occurrence for lexical items by prosodic category (categories: mono, poly 1, poly 2, poly 0).
WORD PROSODY AND FREQUENCY OF OCCURRENCE
The most common word type in English is clearly a polysyllable with initial stress. However, individual word types differ in the frequency with which they occur. Frequency of occurrence statistics (ref. 4) are listed in the MRC Database. Fig. 2 shows the mean frequency for the four prosodic word-categories (lexical words only). It can be seen that monosyllables occur on average far more frequently than other prosodic types. Thus although there are more than seven times as many polysyllables in the language as there are monosyllables, average speech contexts are likely to contain almost as many monosyllables as polysyllables. Fig. 3 shows an estimate of the likely distribution of prosodic categories in a real speech context, derived from a combination of the data in Figs. 1 and 2; this suggests that only 17% of lexical tokens will begin with weak syllables.
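
One plausible way to combine the two kinds of data is to weight each category's share of word types by its mean frequency of occurrence and then normalise. The sketch below assumes that this is how the estimate is formed; the exact computation is not spelled out in the text. All numbers in it are placeholders chosen for easy arithmetic, not the values behind Figs. 1 to 3.

```python
from typing import Dict


def predicted_token_distribution(
    type_share: Dict[str, float],
    mean_frequency: Dict[str, float],
) -> Dict[str, float]:
    """Estimate each category's share of running-speech tokens.

    Each category is weighted by (share of word types) x (mean frequency),
    and the weights are normalised so that they sum to 1.
    """
    if type_share.keys() != mean_frequency.keys():
        raise ValueError("both inputs must cover the same categories")
    weights = {cat: type_share[cat] * mean_frequency[cat] for cat in type_share}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


# Placeholder inputs (hypothetical, NOT the paper's data).
type_share = {"mono": 0.1, "poly1": 0.5, "poly2": 0.1, "poly0": 0.3}
mean_frequency = {"mono": 40.0, "poly1": 10.0, "poly2": 5.0, "poly0": 5.0}

predicted = predicted_token_distribution(type_share, mean_frequency)
for category, share in predicted.items():
    print(f"{category}: {share:.1%}")
```

With these placeholder inputs the monosyllables make up a tenth of the word types but a much larger share of the predicted tokens, which is the kind of shift the paragraph above describes.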
Fig. 3. Predicted distribution of prosodic categories in real speech (categories: mono, poly 1, poly 2, poly 0).
WORD PROSODY IN A NATURAL SPEECH SAMPLE
We tested the estimate shown in Fig. 3 against a natural speech sample, the London-Lund Corpus of English Conversation (ref. 5), using the frequency count of this corpus prepared by Brown (ref. 6). The London-Lund corpus consists of approximately 190,000 words of spontaneous British English conversation. Fig. 4 shows the distribution of prosodic categories for lexical words in this corpus. The three categories with strong initial syllables account for 90% of the tokens; only 10% of the lexical words have weak initial syllables.
Fig. 4. Distribution of prosodic categories in the Corpus of English Conversation (one segment is labelled 59.54%).
CONCLUSION
The distribution of word types in the English vocabulary, combined with relative frequency of occurrence across types, provides an adequate basis for the implementation of a segmentation strategy in continuous speech recognition whereby strong syllables are assumed to be the onsets of lexical words.
ACKNOWLEDGEMENTS
This research was supported by a grant from the Alvey Directorate (MMI-069) to Cambridge University, the Medical Research Council and Standard Telecommunications Laboratories. We thank Gordon Brown for making available the machine-readable version of his frequency count of the London-Lund corpus.
REFERENCES
1. E.C. Schwab, H.C. Nusbaum & D.B. Pisoni, Hum. Factors, 27, 395 (1985).
2. A. Cutler & D.G. Norris, J. Exp. Psychol.: Hum. Perc. Perf. (in press).
3. M. Coltheart, Quart. J. Exp. Psychol. 33A, 497 (1981).
4. H. Kucera & W.N. Francis, Computational Analysis of Present-Day American English (Brown Univ. Press, Providence, 1967).
5. J. Svartvik & R. Quirk, A Corpus of English Conversation (Gleerup, Lund, 1980).
6. G.D.A. Brown, Beh. Res. Meth. Instr. & Comp. 16, 502 (1984).