
Institute for Perception Research

IPO Annual Progress Report 23 1988

Institute for Perception Research

Den Dolech 2, 5612 AZ Eindhoven, The Netherlands

Postal address:

Instituut voor Perceptie Onderzoek, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

Telephone: National (040) 472485/756605; International +31 40 472485/756605

Telefax: National (040) 758885; International +31 40 75 88 85


Contents

Page

4 Introduction

6 Research Programme 1988/1989

9 Organization IPO

12 Hearing and Speech

13 R. Collier and A.J.M. Houtsma: Developments

15 J. Smurzynski and A.J.M. Houtsma: J.F. Schouten revisited: Pitch of complex tones having many high-order harmonics

24 A.M.L. van Dijk-Kappers: Comparison of parameter sets for temporal decomposition of speech

34 P.A. van Rijnsoever: A multilingual text-to-speech system

42 Vision and Reading

43 J.A.J. Roufs: Developments

45 F.J.J. Blommaert: Perceptually optimal sampling of images

55 H. de Ridder and G.M.M. Majoor: Subjective assessment of impairment in scale-space-coded images

65 J.A.J. Roufs and A.M.J. Goossens: Perceived quality and contrast as a function of gamma

72 Cognition and Communication

73 D.G. Bouwhuis: Developments

75 F.L. Engel and M.P.W. Geerings: Question presentation methods for paired-associate learning

IPO annual progress report 23 1988


85 C.D.J.M. van der Pol and H.H. Ellermann: MIR: A Monitor for Initial Reading

92 I.J. Grainger: Neighbourhood frequency effects in visual word recognition and naming

102 Information Ergonomics

103 F.L. van Nes: Developments

104 F.L. van Nes: Multimedia workstations for the office

112 Communication Aids

113 H.E.M. Melotte: Developments

115 R.J.H. Deliege, I.M.A.F. Speth-Lemmens and R.P. Waterham: Realization and evaluation of two speech communication aids

122 Instrumentation and Software

123 L.F. Willems: Developments

124 J.G. Jonker: A new experimentation set-up for psychoacoustic research

129 Publications 1988

144 Papers accepted for publication

153 Colophon

For information on the use of material from this report, see the last page.


Introduction

The Institute

The 'Stichting Instituut voor Perceptie Onderzoek' (IPO Foundation, Institute for Perception Research) constitutes a formal cooperation between Eindhoven University of Technology and Philips Research Laboratories. The Supervisory Board has two members from Philips, two from the University and one from the Netherlands Organization for Scientific Research NWO. Scientists from several disciplines serve as members of the Scientific Board.

The Institute, located on University premises, welcomes guest researchers.

Events 1988

Supervisory Board

Mr C.P.M. Pijnen, member of the Executive Board of the University, has succeeded the late Dr Nijman as chairman. Professor W.A.T. Meuwese left the Board upon his retirement. For over 15 years, Professor Meuwese has quietly supported the institute, for which IPO is very grateful. He is succeeded by Professor N.H. Douben, Dean of the Faculty of Philosophy and Social Sciences.

Professor Anthony Cohen

On December 31, 1988 Professor Cohen retired as scientific adviser to IPO on Linguistics and Phonetics. He served IPO for a period extending over three decades, during the first decade as founder of speech research and after 1967 as scientific adviser. He was always stimulating, and a wise adviser as well, over a broad range of issues. IPO expresses its deep gratitude for his participation in our research endeavours.

New Chairs

Dr Don G. Bouwhuis has been appointed professor to the new part-time Chair of Technology and Psychonomics (August).

Dr Floris L. van Nes has been appointed professor to the new part-time Chair of Information Ergonomics (September).

Miscellaneous

• The compact disc with Auditory Demonstrations, prepared in 1987 by IPO in cooperation with Northern Illinois University and the Acoustical Society of America, will soon see its second, unchanged edition.


• The Proceedings of the workshop 'Working Models of Human Perception', held in August 1987 on the occasion of IPO's 30th anniversary, have been published by Academic Press.


Research Programme 1988/1989

General

IPO research is concerned with the understanding of sensory and cognitive information processing by humans interacting with flexible information equipment, both hardware and software. Cooperation with third parties is indicated by the numbers after the section titles, which refer to the list of cooperating institutions at the end of this programme.

1 Hearing and Speech

Relations between sound stimuli and auditory sensations, and between speech sounds and speech recognition. Systems for the analysis and synthesis of speech.

1.1 Pitch and timbre in speech and music

Pitches of complex sounds. Timbre cues in relation to spectral profiles. Perceptual integration and dissociation of noise in relation to target sounds.

1.2 Speech processing [1, 2]

Speech analysis for high-quality synthesis. Physical correlates of speaker characteristics. Exploration of new techniques, such as scale-space coding and network models.

1.3 Speech synthesis from keyboard [1, 3, 4]

High-quality speech from diphones for Dutch, English and German. Prosodic rules for sentence intonation and timing. Implementation of rules for grapheme-phoneme conversion.

1.4 Speech recognition [3, 4]

Psychoacoustic, phonetic and cognitive models for speech recognition.

2 Vision and Reading

Visual information processing, in particular relations between physical and perceptual aspects of image quality, including those of text and graphics.

2.1 Brightness and brightness contrast

Characteristic visual transfer functions in both space and time.

2.2 Image quality [5]

Assessment of image quality and visual performance in relation to display technology and High-Definition TV (EUREKA project).

2.3 Image processing [6]

Efficient coding of static and dynamic images using visual coding principles, in particular the newly developed Scale-Space Coding and Hermite transformations.

2.4 Reading and search

Processes of reading and search at visual work stations. Network models of visual word recognition. Learning aspects. Reading processes in the case of magnified texts. Influence of illumination spectrum.


3 Cognition and Communication

Cognitive processes and modelling involved in communication. Knowledge representation. Language usage and language learning. Interactive training. Combination of natural language and graphics in human-computer interfaces. Multimodal communication.

3.1 Information dialogues [3, 7, 8]

Syntactic, semantic and pragmatic aspects of man-machine dialogues in restricted natural language, both speech and print. User-adaptive dialogue algorithms. Formulation of machine output.

3.2 Interactive instruction [8, 9]

Acquisition of language skills (reading, speaking) using interactive systems. Acquisition of user knowledge about interactive equipment. Monitoring functions.

3.3 Multimodal communication

Integration of perceptual information and motor skills.

4 Information Ergonomics

Study of actual interactive use of information equipment. Usability aspects of new information systems.

4.1 Work station [10]

Speech, print, graphics and direct manipulation as communication modes of work stations. Rules for use of colour and for dialogue design. Contributions to specific industrial projects. Work station of the future. Standardization.

4.2 Consumer Electronics

Remote controls and car equipment. Test beds. Contributions to specific industrial projects.

5 Communication Aids

Development and evaluation of prototypes, in particular for handicapped persons.

5.1 Speech synthesis; reading aids [11, 12, 13]

Practical speech synthesizers for the speech-impaired. Magnifiers plus illumination.

Cooperation outside Philips Electronic Industries (Netherlands) and Eindhoven University of Technology:

1 Research Programme Analysis and Synthesis of Speech (SPIN Programme)
other participants:
Research Institute for Language and Speech (OTS), Utrecht University
Department of General Linguistics, Phonetics Laboratory, Leyden University
Institute of Phonetics, Nijmegen University
Institute of Phonetics, Amsterdam University
Dr Neher Laboratory (PTT), Leidschendam

2 Instituut voor Doven, St. Michielsgestel


3 SPICOS project
other participants:
Philips Forschungslabor, Hamburg/Aachen, BRD
Philips Research Laboratories, Brussels, Belgium
Siemens Zentrallabor, München, BRD

4 Polyglot Project (ESPRIT-II programme EC)
other participants:
Olivetti, Ivrea, Italy
Centre for Speech Technology Research, Edinburgh University, Great Britain
LIMSI/CNRS, Orsay, France
Department of Linguistics, Nijmegen University
Patras University, Patras, Greece
Philips Kommunikations Industrie AG, Nürnberg, BRD
Ruhr University, Bochum, BRD
Triumph Adler AG, Nürnberg, BRD
Polytechnical University of Madrid, Madrid, Spain

5 Project High-Definition TV (EUREKA programme EC)
other participants:
RAI Research Centre, Turin, Italy
IBA, Winchester, Great Britain
CCETT, Cesson Sévigné Cedex, France
Thomson CSF-LER, Cesson Sévigné, France
Heinrich Hertz Institut, Berlin, BRD
BBC Research Department, Kingswood Warren, Great Britain
Philips Consumer Electronics, Eindhoven
Philips Research Laboratories
DTB, Hannover, BRD
ITV Laboratories, Manchester, Great Britain

6 Department of Ophthalmology, Nijmegen University

7 Institute for Language and Technology and Artificial Intelligence (ITK), Tilburg University

8 Research Programme Man-Machine Interface (SPIN programme)
other participants:
Research Laboratories Océ v.d. Grinten, Venlo
Institute for Language and Technology and Artificial Intelligence (ITK), Tilburg University
Department of Psychology, Nijmegen University

9 Department of Educational Psychology, Tilburg University

10 HUFIT project (ESPRIT-I programme EC)
other participants:
Fraunhofer Institut für Information und Organisation, Stuttgart, BRD
HUSAT Research Centre, Loughborough, Great Britain
ICL, Stevenage, Great Britain
Bull Transac, Massy, France
Olivetti, Ivrea, Italy
Siemens AG, Erlangen, BRD

11 Instituut voor Revalidatievraagstukken, Hoensbroek

12 Eschenbach Optik, Nürnberg, BRD

13 Voice Data Systems, Utrecht


Organization IPO

Supervisory Board (31.12.1988)

C.P.M. Pijnen (chairman)
Prof. dr N.H. Douben
Dr ir N. Hazewindus
Drs J. Smits
Ir F. Valster

Scientific Board (31.12.1988)

Prof. dr W.J.M. Levelt (chairman), Nijmegen
Prof. dr P.C. Baayen, Amsterdam
Prof. dr J.F.A.K. van Benthem, Amsterdam
Prof. dr ir R.T. Boute, Nijmegen
Prof. dr S.C. Dik, Amsterdam
Prof. dr ir P. Eykhoff, Eindhoven
Prof. dr L.F.W. de Klerk, Tilburg
Dr ir A. van Meeteren, Soesterberg
Prof. ir O. Rademaker, Eindhoven
Prof. dr R.J. Ritsma, Groningen
Prof. dr ir H. Spekreijse, Amsterdam
Prof. dr ir K. Teer, Waalre

Director

Prof. dr H. Bouma

Advisers

Prof. dr A. Cohen* (Utrecht University)
Prof. dr H.C. Bunt (Tilburg University)
Prof. dr S.G. Nooteboom (Utrecht University)

Group Leaders and Coordinators

Hearing and Speech: Prof. dr R. Collier, Dr A.J.M. Houtsma
Vision and Reading: Prof. dr ir J.A.J. Roufs
Cognition and Communication: Prof. dr D.G. Bouwhuis
Information Ergonomics: Prof. dr ir F.L. van Nes
Communication Aids: H.E.M. Melotte
Instrumentation and Software: Ir L.F. Willems

Research Associates

Drs L.M.H. Adriaens* (Free University, Brussels, Belgium)
Ms Drs U. Adriaens-Porzig (Siemens, München, BRD)
Drs ing. J.G. Beerends°
Dr D. Beroule* (LIMSI-CNRS, Paris, France)
Ir L.G.M. Beuk
Ir R.J. Beun
Dr ir F.J.J. Blommaert
Drs L.F.M. ten Bosch* (Nijmegen University)
Ir D.G. Broeder* (Utrecht University)
Dr ir D.S.J. van Compernolle* (Leuven University, Belgium)
Drs C.J. van Deemter
Ir R.J.H. Deliege
Ir J. Douma* (Philips Data Systems, Apeldoorn)
Drs R. Drullman
Ms Drs A.M.L. van Dijk-Kappers°
Ir J.H. Eggen
Drs H.H. Ellermann
Dr B.A.G. Elsendoorn
Dr ir F.L. Engel
Ir J.F. Gerrissen
Dr I.J. Grainger (René Descartes University, Paris)
Ms Drs J.M.M. Hanssen†
J. 't Hart
Ir J.P. van Hemert†
Dr D.J. Hermes
Ing. J.M. den Hertog
Ms Drs J.E. Hofhuis†
Ing. Th.A. de Jong
Ir W. Kraaij
Ir H.C. van Leeuwen
Dr Ch.P. Legein* (Catharina Hospital, Eindhoven)
Dr ir J.B.O.S. Martens
Ing. G.J.J. Moonen†
Ms Drs M.G.P. Mulders
Dr ir J.J. Neve
Ms Drs C. Ode°
Dr K. O'Regan* (Lab. de Psych. Exp., Paris, France)
Dr J.R. de Pijper
Dr H. de Ridder
Ms P. Romero*, M.Sc.
Ms Drs M.J. Sanders* (Utrecht University)
Ass. Prof. Z. Shu (Xi'an Jiaotong University, Xi'an, China)
Dr J. Smurzynski† (Academy of Music, Warsaw, Poland)
Drs G.W.G. Spaai
Dr W. Stanioch† (Polam, Warsaw, Poland)
Dr J.M.B. Terken
Dr ir W.D.E. Verhelst
Ir J.H.M. de Vet
Dr ir L.L.M. Vogten
Ir R.P. Waterham
Ms Drs J.H.D.M. Westerink
Drs J.M. Westhoff*

Research Staff

Ing. P. Allain
Ing. M.C. Boschman
B.C.W.M. van den Braak
Ms Ing. T. Broekema†
Ing. A.L.G. Buijsen
K.W. de Graaf
Ing. H. van der Griendt†
Ing. H.Th. de Groot
A.M.F. Heuvelmans
J.P.M. van Itegem
Ing. J.G. Jonker
Ms Ing. G.M.M. Majoor
Ms I.M.A. de Meyere°
Ms A.R. Olijslagers
Ir J.A. Pellegrino van Stuyvenberg*
C.D.J.M. van der Pol
Ing. P.A. van Rijnsoever
Ing. J.J.B. Stakenborg°
Ing. C. Teunissen
Ing. L.J.C. Theelen*
Ing. W.M. Wagenaars

Graduate Students

Ir E.J.J. Bierens
B. Escalante Ramirez, M.Sc.
Ir E.L. Free
Drs A.J.M. van Heijnsbergen
C. Ma, M.Sc.
Ir M.R.M. Nijenhuis
Ir G. Schouten
Ir A. Storm
Ir N.J. Versfeld°
Drs A.L.J. de Zitter (Ministry of Education, Belgium)

Secretaries

Ms P.J. Evers (head)
Ms W.P.C.M. van Casteren*°
Ms M.W. Brouwer
Ms Y.G.J. Huyberts-van Zuiden
Ms I.Th.M. van Loon-Soesman*°
Ms A.M. Manders*°
Ms J. Oostindjer°
Ms C.E.A.L. van de Water*

Librarian

Ms R.M. Smith

Workshop

J.H. Bolkestein (head)
A. Aarts
A.J.J. Bruurs°
H.A.C.M. Compen
P.C.J. Koster°
J.C. van de Laar
M.J.J. van der Lubbe°
L.P.H. van den Reek*

° Part time
* Left during 1988
† Netherlands Organization for Scientific Research NWO

Page 12: Institute for Perception Researchalexandria.tue.nl/tijdschrift/IPO 23.pdf · Institute for Perception Research IPO Annual Progress Report ... Reading processes in the case of magnified

HEARING AND SPEECH


Developments

R. Collier and A.J.M. Houtsma

This has been a year of changes in personnel. Dr R. Collier has joined the group in succession to Prof. S.G. Nooteboom. The vacancy left by Dr S.M. Marcus has been filled by Dr W.D.E. Verhelst and Dr D.S.J. van Compernolle (part-time). Funding by national research programmes has brought us Dr J.M.B. Terken, Messrs N.J. Versfeld, C. Ma, R. Drullman and A.L.J. de Zitter as new colleagues. The group is the poorer for the departure of Mr J.P. van Hemert and Ms T. Broekema. Ms U. Adriaens-Porzig (Siemens), Dr J. Smurzynski (Academy of Music, Warsaw), Mr L.F.M. ten Bosch (Univ. of Nijmegen), Dr A. Kohlrausch (Univ. of Göttingen) and Dr K. Slethei (Univ. of Bergen) were guest researchers for some time.

Pitch and timbre in speech and music

Research effort is being devoted to the effect of duration on the pitch percept(s) of single and simultaneous complex tones (Beerends, Houtsma), the salience of fundamental pitch and spectral edge pitches evoked by clusters of high-order unresolved harmonics (Smurzynski, Houtsma, this issue; Kohlrausch), spectral profile analysis (Versfeld) and sound-source localization in a reverberant room (Wagenaars, Houtsma).

Speech processing

A major research effort is being directed towards improving the quality and the naturalness of the excitation function (Willems, Eggen, Ma, Verhelst). The study carried out on temporal decomposition was rounded off with a comparison of different parameter sets (Van Dijk, this issue). Research on automatic detection of vowel onsets was completed (Hermes).

Speech synthesis from keyboard

In the domain of multilingual speech synthesis by means of diphone concatenation, work is being continued on the development of temporal adjustment rules (De Pijper, Van Hemert, Adriaens-Porzig, Vogten, Elsendoorn), the use of diphones excised from unaccented syllables (De Pijper, Drullman) and the integration of diphone and allophone synthesis schemes (Vogten, Ten Bosch).

In the prosodic domain there has been progress in the following areas: melodic models for English ('t Hart, Collier), German (Adriaens) and Russian (Ode); recipes for a more lively intonation in Dutch (Terken, Collier; De Zitter, Swerts and Teeuwen, graduate students of the University of Antwerp); and micro-intonation ('t Hart, Terken; Damen and Mijnsbergen, graduate students of the Eindhoven University of Technology). Research into automatic stylization of pitch contours was continued (Hermes).

Our growing expertise has led to extensions of our experimental text-to-speech system (Van Rijnsoever, this issue; Van Leeuwen) and to improved applications in a stand-alone keyboard-to-speech system (Deliege).

Cooperation

Our research efforts have continued to form part of major national and international programmes and projects, such as the 'Speech analysis and synthesis' programme funded by the Dutch government, and the SPICOS project (Philips and Siemens). The cooperation with the Instituut voor Doven (St. Michielsgestel) has been continued.


J.F. Schouten revisited: Pitch of complex tones having many high-order harmonics

J. Smurzynski* and A.J.M. Houtsma

* permanent address: Fred. Chopin Music Academy, Okolnik 2-4, Warsaw, Poland.

Abstract

Four experiments are reported which deal with pitch perception of harmonic complex tones containing many high-order, aurally unresolvable partials. Melodic-interval identification performance in the case of sounds with increasing harmonic order remains significantly above chance level, even if the range of harmonics extends from the 20th to the 30th. Just-noticeable differences in pitch of the missing fundamental increase with harmonic order, but level off at about 5 Hz when the lowest harmonic is the 12th or higher. These results suggest the existence of two separate pitch mechanisms in the auditory system. A primary mechanism matches low-order, resolved harmonics to a harmonic template, as described in Goldstein's optimum processor theory or Terhardt's virtual pitch theory. A secondary mechanism operates on clusters of high-order unresolved harmonics in a manner described by Schouten's residue theory.

Introduction

The working of the ear as a frequency analyser on the one hand, and as a periodicity detector on the other, has been the focus of much research and speculation since the middle of the nineteenth century. Seebeck (1841) had shown that a periodic sound with only a very weak fundamental still evoked a relatively strong sensation of pitch at that fundamental. Ohm (1843) dismissed Seebeck's observation as an illusion not worth studying. Subsequent place theorists (Helmholtz, 1863; Von Békésy, 1944) explained the phenomenon by nonlinear distortion in the middle ear, which would resupply the fundamental as a difference tone. Schouten (1940), on the other hand, postulated an entirely different theory in which the ear performs a frequency analysis with only limited resolving power. High-order harmonics, which the ear fails to resolve, create a 'residue' signal at the cochlear output which has an envelope periodicity equal to the frequency of the fundamental, regardless of whether this fundamental is physically present. De Boer (1956) and Schouten et al. (1963) were later able to show experimentally that the difference-tone theory was totally inadequate, so that Schouten's residue theory remained the only likely explanation of Seebeck's original observations. It came to be regarded as a general theory of pitch perception for pure as well as complex tones of any harmonic order.
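The residue idea can be illustrated numerically: a tone made only of high harmonics still repeats at the period of its absent fundamental. The sketch below is our own illustration, not Schouten's model and not the authors' analysis. It assumes harmonics 20 to 30 of 200 Hz (the range mentioned in the abstract), a 20 kHz sampling rate, and a plain autocorrelation, with no cochlear filtering; all names are hypothetical.

```python
import math

FS = 20_000                # sampling rate in Hz (as used for the stimuli in Experiment I)
F0 = 200.0                 # fundamental in Hz; it is NOT among the components below
HARMONICS = range(20, 31)  # harmonics 20 to 30 only (4000-6000 Hz)


def complex_tone(duration_s: float) -> list[float]:
    """Equal-amplitude, sine-phase sum of HARMONICS of F0, sampled at FS."""
    n_samples = int(FS * duration_s)
    return [
        sum(math.sin(2 * math.pi * h * F0 * i / FS) for h in HARMONICS)
        for i in range(n_samples)
    ]


def autocorrelation(x: list[float], lag: int) -> float:
    """Unnormalized autocorrelation of x at one lag (in samples)."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def dominant_period_ms(x: list[float], min_lag: int, max_lag: int) -> float:
    """Lag (converted to ms) with the largest autocorrelation in [min_lag, max_lag]."""
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))
    return 1000.0 * best_lag / FS


signal = complex_tone(0.05)  # 50 ms
# Lags of 2-8 ms (40-160 samples) are searched; lag 0 is excluded because
# every signal trivially correlates best with itself at zero lag.
print(dominant_period_ms(signal, min_lag=40, max_lag=160))  # 5.0 ms, i.e. 1/200 Hz
```

The strongest repetition lies at 5 ms, the period of the missing 200 Hz fundamental, even though no component below 4000 Hz is present.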

The first indications that something was wrong with Schouten's residue theory came from Ritsma (1967) and Plomp (1967), who found that low-order harmonics (e.g. harmonics 3 to 5) are the dominant ones in mediating residue pitch. These harmonics, however, are known to be resolved in the cochlea and, according to the residue theory, cannot contribute to a missing-fundamental pitch sensation. A later study by Houtsma and Goldstein (1972) provided direct evidence for a central, rather than a peripheral, origin of residue pitch. By using dichotic two-tone complexes, they showed that pitch sensations of a missing fundamental must be mediated by a mechanism which operates on neural signals derived from cochlearly resolved partials. Modelling efforts since the early 1970s have therefore focused on central processing of aurally resolved harmonics (Terhardt, 1972; Goldstein, 1973), rather than on physical interaction effects that might occur in the auditory periphery as a result of the limited frequency-resolving power of the cochlear filter.

It is becoming increasingly evident, however, that central frequency-template models like those of Terhardt and Goldstein cannot account for all observed pitch phenomena and behaviour. If, for instance, one passes a periodic impulse train through a high-pass filter, the sensation of (fundamental) pitch is retained even for very high cut-off frequencies, well beyond the known limit of resolution of harmonics in the cochlea. The same is observed with high-pass-filtered speech vowels. It is therefore quite possible that the replacement of Schouten's residue model by modern central frequency-template models has been too rigorous and that, in fact, both kinds of models are needed to explain the whole range of observed pitch phenomena associated with complex tones. Similar observations prompted Moore (1982) to propose a dual pitch theory, with a spectral template-matching scheme operating on low, resolved partials and a periodicity-detection scheme on high-order, unresolved partials. This dual model has remained largely a qualitative statement, however; it has never been worked out quantitatively, nor has it ever been systematically tested. The purpose of this study is to present such a test by assessing the salience of pitches evoked by clusters of high-order unresolved harmonics compared with those evoked by low-order resolved harmonics.
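For contrast with the time-domain residue idea, the spectral template-matching principle can be sketched as a search for the fundamental whose harmonic series best fits a set of resolved partials. This is a toy matcher under our own assumptions: a 50 to 500 Hz search grid in 0.5 Hz steps, a 3% relative tolerance, and a tie-break toward the higher candidate. It is much simpler than Goldstein's optimum processor or Terhardt's virtual-pitch model, and the function name is hypothetical.

```python
def estimate_f0(partials_hz: list[float], f_min: float = 50.0, f_max: float = 500.0,
                step: float = 0.5, tol: float = 0.03) -> float:
    """Toy harmonic-template matcher: return the candidate fundamental whose
    harmonic template fits the most partials (within relative tolerance tol),
    preferring smaller mistuning and then the higher candidate."""
    best_key, best_f0 = None, f_min
    n_candidates = int(round((f_max - f_min) / step)) + 1
    for k in range(n_candidates):
        f0 = f_min + k * step
        matched, error = 0, 0.0
        for p in partials_hz:
            n = max(1, round(p / f0))          # nearest harmonic number
            dev = abs(p - n * f0) / (n * f0)   # relative mistuning of that partial
            if dev <= tol:
                matched += 1
                error += dev ** 2
        # Ranking: more partials matched, then smaller squared mistuning, then
        # higher f0 (so an exact fit at 200 Hz beats the equally exact 100 Hz).
        key = (matched, -round(error, 9), f0)
        if best_key is None or key > best_key:
            best_key, best_f0 = key, f0
    return best_f0


# Harmonics 3, 4 and 5 of 200 Hz: the fundamental itself is absent.
print(estimate_f0([600.0, 800.0, 1000.0]))  # 200.0
```

The tie-break is needed because every subharmonic (100 Hz, 50 Hz, ...) fits exactly resolved harmonics just as well; real template models handle this ambiguity in more principled ways.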

Experiment I

The first experiment was designed as a melodic-interval identification task. Musically experienced subjects listened to sequential pairs of complex tones that contained only upper harmonics. The missing fundamental of the first tone complex was always 200 Hz, and that of the second sound was a random choice from seven possible frequencies, beginning with 211.9 Hz (one semitone above the fundamental frequency of the first sound) and extending in six semitone steps up to 299.7 Hz (an equally tempered fifth above 200 Hz). Thus, if the (first) reference sound is referred to as the note C, the (second) test sound was chosen from the notes C♯, D, D♯, E, F, F♯ and G. After presentation of each pair of sounds, subjects had to identify the melodic interval or, equivalently, the last note they heard, by pressing the appropriate key on a keyboard.
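The seven test fundamentals follow from equal temperament, in which each semitone multiplies frequency by 2^(1/12). The short calculation below reproduces the 211.9 Hz and 299.7 Hz end points quoted in the text; the variable names are our own.

```python
REFERENCE_HZ = 200.0
TEST_NOTES = ["C#", "D", "D#", "E", "F", "F#", "G"]  # 1 to 7 semitones above C


def equal_tempered(reference_hz: float, semitones: int) -> float:
    """Frequency `semitones` equal-tempered steps above reference_hz."""
    return reference_hz * 2 ** (semitones / 12)


test_f0 = {note: round(equal_tempered(REFERENCE_HZ, k), 1)
           for k, note in enumerate(TEST_NOTES, start=1)}
print(test_f0)
# {'C#': 211.9, 'D': 224.5, 'D#': 237.8, 'E': 252.0, 'F': 267.0, 'F#': 282.8, 'G': 299.7}
```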

All complex tones contained 11 successive harmonics, which were presented in sine phase and had the same amplitude. The lowest harmonic number was chosen at random for each sound from a range of 3 successive integers. The middle of this range of lowest harmonic numbers, which in the remainder of this paper will be referred to as N, was the independent variable. The duration of each sound was 512 ms, including a 40-ms linear rise and fall. Between the sounds of a pair there was a 500-ms silent period, and after each pair there was an open response interval. The response triggered presentation of the next tone pair after a short delay.
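The stimulus recipe above can be sketched as follows. This is an illustrative reconstruction, not the authors' MicroVAX software. It assumes unit amplitude per component, no overall level scaling, and a Python `random.Random` generator for drawing the lowest harmonic from N - 1, N and N + 1; the function name is hypothetical.

```python
import math
import random

FS = 20_000          # sampling rate in Hz
DURATION_S = 0.512   # tone duration, including the ramps
RAMP_S = 0.040       # linear rise and fall time
N_COMPONENTS = 11    # successive harmonics per tone


def make_tone(f0: float, n_mid: int, rng: random.Random) -> tuple[int, list[float]]:
    """One 11-component, equal-amplitude, sine-phase complex tone.

    The lowest harmonic is drawn at random from n_mid - 1, n_mid, n_mid + 1,
    so n_mid plays the role of N in the text. Returns (lowest, samples).
    """
    lowest = rng.choice([n_mid - 1, n_mid, n_mid + 1])
    harmonics = range(lowest, lowest + N_COMPONENTS)
    n_samples = round(FS * DURATION_S)   # 10240 samples
    n_ramp = round(FS * RAMP_S)          # 800 samples
    samples = []
    for i in range(n_samples):
        t = i / FS
        x = sum(math.sin(2 * math.pi * h * f0 * t) for h in harmonics)
        # Linear onset and offset ramps: gain rises 0 -> 1 and falls 1 -> 0.
        gain = min(1.0, i / n_ramp, (n_samples - 1 - i) / n_ramp)
        samples.append(gain * x)
    return lowest, samples


rng = random.Random(1988)
lowest, tone = make_tone(200.0, n_mid=7, rng=rng)
print(lowest, len(tone))
```

With N = 19 the highest possible component is harmonic 30, i.e. 6000 Hz for the 200 Hz reference and about 8991 Hz for the 299.7 Hz test note, both below the 10 kHz Nyquist limit of a 20 kHz sampling rate.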

The reason why the lowest harmonic number was randomized rather than fixed is to prevent the subjects from using spectral pitch cues for the performance of the melodic-interval identification task. If the lowest harmonic were fixed during an experimental run, the spectral edges formed by the lowest and highest harmonic and, in fact, every single harmonic would trace out the same melodic interval as the missing fundamental. One could then never be sure that the subjects' behaviour was based on the percept of a missing fundamental. When harmonic numbers are randomized, spectral pitch cues become useless and correct identification of the interval must be based on a pitch related to the missing fundamental. Randomization of harmonic order does cause some 'smear' in the value of the independent variable, but as long as the randomization range is kept small, the average of this range (N) remains a meaningful parameter.

Stimuli were synthesized digitally with a MicroVAX II computer and a 16-bit D/A converter, using a sampling frequency of 20 kHz. Stimuli were presented through Etymotic ER-2 insert earphones. These earphones have a flat Zwislocki-coupler frequency response (±1 dB between 200 and 10 000 Hz, ±5 dB between 50 and 15 000 Hz). This measure is indicative of the average eardrum-pressure response of real ears (Killion, 1984). A constant pink-noise signal at 30 dB SL was presented to mask possible aural combination tones. The subject, who was seated in a double-walled sound-insulated chamber, first adjusted the intensity of a typical complex sound until it was barely audible in the noise. Tone complexes were presented 20 dB above this threshold level. Subjects were told that the first 'standard' note was to be regarded as C, and were instructed to identify the second note by pressing keys marked C♯, D, D♯, E, F, F♯ and G. Feedback of the correct answer was provided after each response, followed by a 3-s silent interval and presentation of the next stimulus pair.

Four subjects, three male and one female, participated, including both authors. All subjects had professional musical training and worked individually in sessions of approximately 30 min.

The experiment was performed for the conditions N = 7, 10, 13, 16 and 19. The value N = 7, for instance, represents a condition of complex tones whose harmonics ran from the 6th to the 16th, the 7th to the 17th, or the 8th to the 18th. For each value of N, five runs of 63 trials were taken per subject. Results are presented in Figure 1, where percentage correct identification is plotted against the average lowest harmonic number N for each subject. Each data point represents the average score of five runs. The average score of the four subjects is represented by the dashed function. For N = 7 all subjects scored perfectly. For N = 10 scores drop to 75 to 90%, the average being 80.7%. For higher values of N, subjects score between 53 and 67% correct, with a rather constant average of 60%. These scores are significantly above chance level, which is one-seventh, or 14.3%, correct.
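The text does not state which significance test was used. One simple way to see how far 60% lies above the 14.3% chance level is an exact one-sided binomial test on one subject's 315 trials (five runs of 63) for a single value of N. The sketch assumes independent trials and pure guessing under the null hypothesis. It is our own illustration, not the authors' analysis.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_ALTERNATIVES = 7            # C#, D, D#, E, F, F#, G
TRIALS = 5 * 63               # five runs of 63 trials per subject and value of N
chance = 1 / N_ALTERNATIVES   # 14.3% correct by pure guessing
k = round(0.60 * TRIALS)      # about 60% correct, the plateau average for N >= 13

print(f"chance level: {chance:.1%} ({chance * TRIALS:.0f} of {TRIALS} trials)")
print(f"P(>= {k} correct by guessing) = {binomial_upper_tail(TRIALS, k, chance):.1e}")
```

The tail probability is vanishingly small. The same holds for the lowest individual score of 53%, which is about 167 of 315 trials.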

Examination of confusion matrices reveals a rather clustered distribution of the errors made by the subjects: 88% of all responses were correct or fell within a semitone of the correct response. This shows that the pitch evoked by groups of 11 harmonics is sufficiently clear to recognize musical intervals with an accuracy no worse than a semitone. More precise measurements of this accuracy were carried out in Experiment III.


Figure 1: Interval identification scores of four subjects (NV, JS, AH, MH) for seven melodic intervals represented by 11-tone complexes, plotted as percentage correct against the average lowest harmonic number N (7, 10, 13, 16, 19). Dashed line represents subject-averaged scores.

Figure 2: Melodic interval identification scores of two subjects (JS, MH), plotted as percentage correct against the total number of harmonics M (2 to 11), with N (10 and 16) as parameter. Solid functions represent two-subject averages.

Experiment II

In this experiment the total number of successive harmonics M, which was fixed at 11 in the previous experiment, was taken as the independent variable. Using the same procedure as in Experiment I, recognition of musical intervals was examined as a function of the number of harmonics in the complex for the two conditions N = 10 and N = 16. The numbers of harmonics used were M = 2, 3, 5 and 7, while data for M = 11 were available from Experiment I. Two subjects, who also participated in the previous experiment, performed five runs of 63 trials for each of the eight conditions.

The results of this experiment are shown in Figure 2. For the condition N = 10, performance seems fairly independent of the number of harmonics, as long as M is five or more. For M = 3 performance falls to around 80% correct, and for M = 2 to about 63%. The relatively high identification score for M ≥ 5 and the sudden drop for M < 5 suggest that pitch perception in this case is mediated by barely resolved harmonics, of order 8 to 10, which are available as aural combination tones and whose salience may grow when the number of stimulus harmonics M is increased.

For the case N = 16, scores appear to increase rather monotonically with M. This gradual rise in score suggests that, when stimulus partials are not resolvable in the cochlea and there are no resolvable combination tones available either, another pitch mechanism operates which is less effective (given the generally lower scores) but whose effectiveness does increase when the number of stimulus harmonics increases. This mechanism may very well operate in the time domain, since increasing the number of unresolved upper harmonics has the effect of better defining the temporal envelope and periodicity of the signal.
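The argument that adding unresolved harmonics sharpens the temporal envelope can be illustrated numerically. The sketch below sums unit-amplitude sine-phase harmonics starting at the 16th and reports the peak factor (peak over RMS) of one period for the values of M used in this experiment. Using the peak factor as a stand-in for how well the envelope is defined is an assumption made here for illustration; it is not a measure used in the report.

```python
import numpy as np

def one_period(harmonics, phases=None, points=8192):
    """One period of a sum of unit-amplitude harmonics sin(n * theta + phase_n)."""
    harmonics = np.asarray(harmonics, dtype=float)
    if phases is None:
        phases = np.zeros(harmonics.size)  # sine phase
    theta = 2 * np.pi * np.arange(points) / points
    return np.sin(np.outer(theta, harmonics) + phases).sum(axis=1)

def peak_factor(x):
    """Peak amplitude divided by RMS value."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

# Harmonics 16, 17, ... (N = 16 as an example) for the M values of Experiment II.
peak_factors = {m: peak_factor(one_period(np.arange(16, 16 + m))) for m in (2, 3, 5, 7, 11)}
for m, pf in peak_factors.items():
    print(f"M = {m:2d}: peak factor {pf:.2f}")
```

The RMS value grows only as the square root of M, while the pulse-like peaks of a sine-phase complex grow nearly in proportion to M, so the peak factor rises with the number of harmonics.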


Experiment III

In this experiment, differential thresholds for the fundamental pitch of 11-tone harmonic complexes were measured with an adaptive two-interval forced-choice (two-down, one-up) procedure. The value of N, defined as in the previous experiments, was the independent variable in a single test. The two values of the lowest harmonic number N, randomly chosen for each of the complex tones in a pair, were not allowed to be the same. The purpose of this was to prevent the subjects from using spectral edges as cues for (missing) fundamental pitch discrimination.

One of the complex tones of each pair had a fixed fundamental frequency of 200 Hz. The other had a variable fundamental of 200 + Δf Hz. The temporal order of fixed and variable tone was random with equal probabilities. Subjects were instructed to indicate which way the missing fundamental had moved by pressing one of two keys, labelled 'up' and 'down', after each trial. Visual feedback of the correct answer was provided immediately following each response.

The temporal structure of stimuli and pairs was identical with that of Experiment I. The frequency difference Δf of the initial pair was chosen as about twice the difference limen (DL) estimated in pilot experiments. The initial step size of 0.2 Hz was halved after the first five reversals. The ultimate DL was estimated from the mid-run average of ten reversals made with the step size of 0.1 Hz. Five DL estimates were collected from each subject for N-values of 7, 10, 13, 16, 19 and 25. The same four subjects as in Experiment I participated in the test.
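The adaptive rule described above can be sketched as a simulation. Only the two-down one-up rule, the 0.2-Hz initial step halved after five reversals, and the average over ten reversals at 0.1 Hz come from the text; the simulated observer (d' = Δf / true_dl, 2IFC proportion correct Φ(d'/√2)) and all function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def run_staircase(true_dl, start_df, initial_step=0.2, final_step=0.1,
                  reversals_before_halving=5, reversals_to_average=10, seed=0):
    """Two-down one-up staircase on the fundamental-frequency difference (Hz).

    Hypothetical observer: d' = df / true_dl, 2IFC proportion correct Phi(d' / sqrt(2)).
    Returns the mean df at the reversals made with the final step size.
    """
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    df, step = start_df, initial_step
    correct_in_row = 0
    last_move = 0                  # +1 = up, -1 = down, 0 = no move yet
    n_reversals = 0
    final_reversals = []
    while len(final_reversals) < reversals_to_average:
        p_correct = cdf(df / true_dl / math.sqrt(2))
        if rng.random() < p_correct:
            correct_in_row += 1
            if correct_in_row < 2:
                continue           # one correct answer: no change yet
            move = -1              # two correct in a row: make the task harder
        else:
            move = +1              # any error: make the task easier
        correct_in_row = 0
        if last_move and move != last_move:   # change of direction = reversal
            n_reversals += 1
            if n_reversals == reversals_before_halving:
                step = final_step
            elif n_reversals > reversals_before_halving:
                final_reversals.append(df)
        last_move = move
        df = max(df + move * step, step)      # keep df positive
    return sum(final_reversals) / len(final_reversals)

estimate = run_staircase(true_dl=1.0, start_df=1.6)
print(f"staircase estimate: {estimate:.2f} Hz")
```

A two-down one-up rule converges on the 70.7%-correct point; with this particular observer model that is a Δf of roughly 0.77 times `true_dl`, so individual runs scatter around that value.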

Results averaged over these four subjects are given in Figure 3 by the solid function. Each data point along this function represents an average of 20 adaptive runs. Vertical bars indicate ranges of plus/minus one standard deviation of the five-run average scores of individual subjects. One can see that for N = 7, DLs are between 0.4 and 0.9 Hz, with an average of 0.6 Hz. This value is quite consistent with typical frequency DLs for a 200-Hz pure tone (Moore, 1973). For N = 10, fundamental frequency DLs are nearly three times the DLs found at N = 7 for all subjects. For higher positions of harmonics (13 ≤ N ≤ 25) DLs show a greater variability, but average DLs are almost constant with values of about 5 Hz.

[Figure 3 plot: DL (Hz, 0 to 12) versus average lowest harmonic number N (7 to 25); curves labelled exp. III and exp. IV.]

Figure 3: Subject-averaged DLs as a function of the average lowest harmonic number N from Experiments III and IV. Bars indicate standard deviations of mean DLs of individual subjects.


Experiment IV

The fundamental pitch percept of a complex tone which comprises only low-order harmonics is known to be rather insensitive to the phase relations between its harmonics (Houtsma & Goldstein, 1971; Terhardt, 1972; Wightman, 1973). Now that we have established the existence of a secondary pitch mechanism which handles clusters of high-order unresolved harmonics, it appears likely that such a mechanism is much more sensitive to phase shifts between those harmonics. For this reason another experiment was performed, similar to the one just discussed, to investigate the effect of phase on the salience of fundamental pitch. The criterion for salience was the size of the difference limen DL measured in an adaptive discrimination procedure.

The experiment was in every respect identical with Experiment III, except for the phase relations between stimulus harmonics. Instead of sine-phase relations, used in all previous experiments, phases in this experiment were calculated according to Schroeder's formula (Smith et al., 1986):

$$\varphi_n = -\frac{\pi n (n+1)}{M} \qquad (1)$$

where φn represents the phase of the nth-order harmonic and M the total number of harmonics in the stimulus, which was 11 in our case. This phase condition will be referred to as the 'Schroeder phase'. It will cause the time signals at the output of basilar membrane filters to have a minimum peak factor, i.e., a minimum difference between peaks and valleys in the amplitude envelope.
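Equation (1) is easy to evaluate directly. The sketch below computes Schroeder phases for an 11-harmonic complex and compares the peak factor of one period of the stimulus waveform with that of the sine-phase version. It measures the acoustic waveform itself, not the output of basilar-membrane filters, and the choice of harmonics 16 to 26 is only an example.

```python
import numpy as np

def schroeder_phases(harmonics, m):
    """Phases phi_n = -pi * n * (n + 1) / M from Eq. (1); n is the harmonic number."""
    n = np.asarray(harmonics, dtype=float)
    return -np.pi * n * (n + 1) / m

def one_period(harmonics, phases, points=8192):
    """One period of a sum of unit-amplitude harmonics sin(n * theta + phi_n)."""
    theta = 2 * np.pi * np.arange(points) / points
    return np.sin(np.outer(theta, harmonics) + phases).sum(axis=1)

def peak_factor(x):
    """Peak amplitude divided by RMS value."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

harmonics = np.arange(16, 27)          # 11 harmonics, lowest = 16 (example only)
m = harmonics.size
sine_pf = peak_factor(one_period(harmonics, np.zeros(m)))
schroeder_pf = peak_factor(one_period(harmonics, schroeder_phases(harmonics, m)))
print(f"sine phase: {sine_pf:.2f}, Schroeder phase: {schroeder_pf:.2f}")
```

Both versions have the same power spectrum and RMS value; only the phases, and therefore the envelope shape, differ.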

Subject-averaged results of this experiment are also shown in Figure 3 by means of the dashed curve. For N = 7 and N = 10, DLs for zero-phase and Schroeder-phase conditions are almost identical, with slightly higher thresholds for the latter condition. For higher positions of harmonics, however, DLs for the Schroeder-phase condition increase progressively, until for N = 19 or N = 25 they are 70% higher than thresholds of zero-phase complexes.

The data of Experiments III and IV therefore show quite unambiguously that, for low-order resolved harmonics, the effect of phase is negligible, whereas for high-order unresolved harmonics the effect is substantial. It is quite possible that there are phase functions other than the zero-phase and Schroeder-phase relations for which the fundamental frequency DL difference is still greater than the one we found and have shown in Figure 3. No further attempt was made, however, to find such phase functions.

Discussion and Conclusions

When all the results of Experiments I through IV are considered, a picture emerges of two complementary pitch mechanisms which operate in different domains of aural resolution and are characterized by different properties. This picture is summarized in the following paragraphs.

For complex tones comprising many successive harmonics, with the lowest around 7 or less, pitch sensations of the missing fundamental are very clear and salient. Such sounds contain a number of aurally resolvable components. Recognition of musical intervals is usually perfect, and the differential threshold for the missing fundamental is of the same order as that of a pure tone around the missing fundamental's frequency. The pitch percept appears to be independent of phase relations between components, i.e., independent of the waveform of the stimulus.

When the lowest harmonic number N is around 10, DLs are two to three times greater than for N = 7. Musical interval scores drop to a level significantly less than perfect. Differential thresholds, however, still appear to be fairly independent of phase relations between partials. Reduction of the total number of harmonics does not affect identification performance until this number falls below five. Even so, for M = 2, subjects still scored better than 60% correct.

With only high-order harmonics present in the stimulus (N ≥ 13), identification scores level off to a constant value independent of N, whereas reduction of the number of harmonics present in the stimulus has a monotonically negative effect on pitch identification performance. Changing sine-phase relations into Schroeder-phase relations increases the differential pitch threshold significantly. All this suggests that, besides a central pitch mechanism which operates on neural transformations of resolved stimulus partials, there is a secondary mechanism which operates by performing some kind of temporal interference pattern detection on neural signals derived from clusters of unresolved partials, similar to the one described in Schouten's Residue Theory. The cross-over point between the two mechanisms is somewhere around the 10th harmonic, which corresponds to an aural resolution power of about 10%.
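The correspondence between the 10th harmonic and a resolution of about 10% follows from the spacing of adjacent harmonics relative to their own frequency:

$$\frac{\Delta f}{f} = \frac{f_0}{n f_0} = \frac{1}{n}, \qquad n = 10 \;\Rightarrow\; \frac{\Delta f}{f} = 0.1 = 10\%$$

Harmonics above the 10th are thus spaced more closely than 10% of their own frequency.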

Results of identification (Experiment I) and discrimination (Experiment III) are qualitatively in good agreement with one another. Identification performance drops from perfect at N = 7 to a constant level of 60% correct for N ≥ 13, whereas discrimination thresholds rise from a low of about 0.25 Hz at N = 7 to a constant value of about 5 Hz for N ≥ 13. A more formal quantitative comparison between identification and discrimination data was made using a Thurstonian decision model, as was done by Braida and Durlach (1970). From confusion matrices for the conditions N = 10, 13, 16 and 19, with the data of all four subjects pooled, the average sensitivity d' between all intervals of successive notes used in Experiment I was calculated (i.e., intervals of stimuli 1 vs 2, 2 vs 3, etc.). This d' represents the typical sensitivity for a melodic interval of a semitone played with complex tones, and yields an estimate of the corresponding DL by simple interpolation. Discrimination data of Experiment III were also used to obtain independent estimates of DLs for a non-adaptive, one-interval two-alternative forced-choice procedure. For N = 7, no d' estimate could be obtained from Experiment I since performance was perfect, whereas only discrimination data were available for N = 25. Both kinds of DLs are therefore plotted in Figure 4 for the range 10 ≤ N ≤ 19. They appear to be reasonably consistent with one another except, perhaps, for N = 16. The good agreement of the remaining points of the two functions suggests that identification behaviour is completely determined by discrimination limits, i.e., mistakes in melodic interval recognition during Experiment I can be accounted for by sensation noise in the pitch percepts of the missing fundamentals, as represented by the frequency DLs found in Experiment III. If short-term memory noise had played a role in absolute identification, the DLs derived from the data of Experiment I should have been greater than those computed from Experiment III. Our subjects appear to exhibit minimal memory noise in the identification experiments, which is consistent with the amount of their musical training and experience.

[Figure 4 plot: equivalent DL (Hz, 0 to 12) versus average lowest harmonic number N (7 to 25), computed from Experiments I and III.]

Figure 4: Equivalent one-interval forced-choice DLs computed from the data of Experiments I and III.
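A simplified version of the Thurstonian step can be sketched as follows. It treats one pair of adjacent stimuli as a yes/no task with d' = z(H) - z(F), and converts d' to a DL by assuming that d' grows linearly with frequency difference and defining the DL at d' = 1. These choices, the 200-Hz reference for the semitone step and the response rates in the usage lines are all assumptions for illustration and may differ from the procedure actually used.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """One-interval (yes/no) sensitivity: z(H) - z(F)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def dl_from_d_prime(step_hz, d, criterion=1.0):
    """Frequency step giving d' = criterion, assuming d' grows linearly with delta-f."""
    return step_hz * criterion / d

# Hypothetical adjacent pair: P('higher' | higher) = 0.80, P('higher' | lower) = 0.30.
semitone_hz = 200 * (2 ** (1 / 12) - 1)   # one semitone at a 200-Hz reference (assumption)
d = d_prime(0.80, 0.30)
print(f"d' = {d:.2f}, DL = {dl_from_d_prime(semitone_hz, d):.1f} Hz")
```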

In conclusion, there is strong evidence that central pattern matching of aurally resolved frequencies and central periodicity detection on residues of unresolved harmonics both play a role in the establishment of a fundamental pitch sensation from a harmonic tone complex. The two mechanisms each have their own domain of operation, depending on whether harmonics are of low or high order, and act independently of one another. The periodicity mechanism working on high-order harmonics produces a pitch which is considerably less salient than the pitch evoked by low-order harmonics. It can therefore play only an insignificant role when a stimulus contains both low- and high-order harmonics, as is the case with most natural sounds that one encounters in speech and music.

References

Békésy, G. von (1944) Über die Frequenzauflösung in der menschlichen Schnecke. Acta Oto-Laryngologica, 32, 60-84.

Boer, E. de (1956) On the Residue in Hearing. Doctoral dissertation, University of Amsterdam.

Braida, L.D. & Durlach, N.I. (1970) Intensity perception II: Resolution in one-interval paradigms. Journal of the Acoustical Society of America, 51, 483-502.

Goldstein, J.L. (1973) An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 54, 1496-1516.

Helmholtz, H.L.F. von (1863) Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Braunschweig: F. Vieweg & Sohn.

Houtsma, A.J.M. & Goldstein, J.L. (1971) Perception of musical intervals: evidence for the central origin of the pitch of complex tones. Technical Report 484, Massachusetts Institute of Technology, Research Laboratory of Electronics.

Houtsma, A.J.M. & Goldstein, J.L. (1972) The central origin of the pitch of complex tones: evidence from musical interval recognition. Journal of the Acoustical Society of America, 51, 520-529.

Killion, M.C. (1984) New insert earphones for audiometry. Hearing Instruments, 35, 45-46.

Moore, B.C.J. (1973) Frequency difference limens for short-duration tones. Journal of the Acoustical Society of America, 54, 610-619.

Moore, B.C.J. (1982) An Introduction to the Psychology of Hearing. London: Academic Press.

Ohm, G. (1843) Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Annalen der Physik und Chemie, 59, 513-565.

Plomp, R. (1967) Pitch of complex tones. Journal of the Acoustical Society of America, 41, 1526-1533.

Ritsma, R.J. (1967) Frequencies dominant in the perception of pitch of complex sounds. Journal of the Acoustical Society of America, 42, 191-198.

Schouten, J.F. (1940) The residue and the mechanism of hearing. Proceedings of the Koninklijke Akademie van Wetenschappen, 43, 991-999.

Schouten, J.F., Ritsma, R.J. & Cardozo, B.L. (1962) Pitch of the residue. Journal of the Acoustical Society of America, 34, 1418-1424.

Seebeck, A. (1841) Beobachtungen über einige Bedingungen der Entstehung von Tönen. Annalen der Physik und Chemie, 53, 417-436.

Smith, B.K., Sieben, U.K., Kohlrausch, A. & Schroeder, M.R. (1986) Phase effects in masking related to dispersion in the inner ear. Journal of the Acoustical Society of America, 80, 1631-1637.

Terhardt, E. (1972) Zur Tonhöhenwahrnehmung von Klängen II: ein Funktionsschema. Acustica, 26, 187-199.

Wightman, F.L. (1973) Pitch and stimulus fine structure. Journal of the Acoustical Society of America, 54, 397-406.


Comparison of parameter sets for temporal decomposition of speech

A.M.L. Van Dijk-Kappers

Abstract

Temporal decomposition of a speech utterance results in a description of speech parameters in terms of overlapping target functions and associated target vectors. Although developed for economical speech coding, this method also provides an interesting tool for deriving phonetic information from the acoustic speech signal. The target vectors may correspond to idealized articulatory targets; the target functions describe the temporal evolution of these targets.

The speech parameters used by Atal when he proposed this method (1983) are the log-area parameters. Our modified temporal decomposition method (1987) also works with these parameters as input. However, in principle, most commonly used parameter sets can be used. In this paper we compare the results for six different sets of speech parameters.

The main performance criterion will be the phonetic relevance of the target functions. The phonetic interpretation of the target vectors and the resynthesis of the speech signal will also be considered as criteria.

From our experiments, we will conclude that the filter bank output parameters and the log-area parameters are the most suitable parameter sets available for temporal decomposition.

Introduction

The temporal decomposition method, proposed by Atal (1983) for economical speech coding, decomposes the speech signal into overlapping units, each described by a target function and a target vector. No use is made of any explicit phonetic knowledge. Our aim is to see to what extent these units can be interpreted phonetically. The target vectors may correspond to idealized articulatory targets whose temporal evolution is described by the target function. We have shown (Van Dijk-Kappers & Marcus, 1987, 1988) that promising results can be obtained with some modifications and extensions of the original method.

Both Atal's original and our modified method use log-area parameters as input. Although these parameters have yielded reasonably satisfying results, it is not inconceivable that better candidates exist. In fact, temporal decomposition results obtained with different parameter sets have been reported in recent papers (Chollet et al., 1986; Ahlborn et al., 1987; Bimbot et al., 1987).

In this paper we compare temporal decomposition results based on six different sets of speech parameters. The main performance criterion will be the correspondence of target functions to phonemes or subphonemes, since this gives a good indication of the phonetic relevance of the decomposition. In addition, the phonetic interpretation of the associated target vectors is considered. We also evaluate the quality of the resynthesis after temporal decomposition for those parameter sets for which resynthesis is possible.

The work reported here is part of a project for studying the relationship between the target functions and vectors obtained and a phonetic transcription of the same utterance. The results may provide deeper insight into the structure of the speech signal. Such knowledge can be applied to economical speech coding or speech synthesis. Applications of the method as a preprocessor for automatic speech recognition or transcription may also be feasible.

In the following sections we give a brief description of the temporal decomposition method, introduce the speech parameters used and analyse the performance of the speech parameters according to the above-mentioned criteria. Finally, we discuss the results and draw some conclusions about the most convenient parameter spaces in which the target functions and vectors should be determined.

Temporal decomposition

Temporal decomposition of speech is based on the assumption that, given some suitable parametric representation of the input speech, coarticulation can be described by simple linear combinations of the underlying targets. If we represent the kth target by a target vector a(k), and the temporal evolution of this target by a target function φk(n), the observed speech parameters y(n) can be approximated by the linear combination of target vectors and functions

$$\hat{y}(n) = \sum_{k=1}^{K} a(k)\,\phi_k(n), \qquad 1 \le n \le N \qquad (1)$$

where ŷ(n) is the approximation of y(n). The frame number n represents discrete times and varies between 1 and the total number of frames N of the utterance. The total number of targets within the utterance is given by K. The target vectors and functions, their number and locations are unknown in this equation. In solving the equation, the target functions are determined first by the method described by Van Dijk-Kappers and Marcus (1987, 1988). The second step consists in determining the optimal acoustic target vectors by minimizing the difference between y(n) and ŷ(n).
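Equation (1) can be written in matrix form as Ŷ = AΦ, with the target vectors as the columns of A and the target functions as the rows of Φ. Once Φ is fixed, the second step reduces to a linear least-squares problem. The sketch below assumes an ordinary least-squares criterion (the text only says that the difference is minimized), and the Gaussian-shaped target functions in the usage lines are synthetic.

```python
import numpy as np

def target_vectors(y, phi):
    """Least-squares target vectors for fixed target functions.

    y   : (P, N) array, P speech parameters in each of N frames
    phi : (K, N) array, one target function per row
    Returns A with shape (P, K) minimising ||y - A @ phi|| (Frobenius norm).
    """
    # Transposing gives the standard form phi.T @ A.T ~= y.T.
    a_t, *_ = np.linalg.lstsq(phi.T, y.T, rcond=None)
    return a_t.T

# Synthetic example: two overlapping target functions over 40 frames.
n = np.arange(40)
phi = np.vstack([np.exp(-0.5 * ((n - 12) / 5.0) ** 2),
                 np.exp(-0.5 * ((n - 28) / 5.0) ** 2)])
a_true = np.array([[1.0, -0.5],
                   [0.3,  0.8],
                   [2.0,  1.0]])
y = a_true @ phi
a_est = target_vectors(y, phi)
y_hat = a_est @ phi   # reconstruction according to Eq. (1)
```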

In principle, any suitable set of speech parameters y(n) can be used. In this paper we compare the performance of six different parameter sets, namely log-area parameters (LA), reflection coefficients (RC), area coefficients (A), log-area ratios (LAR) (e.g. Viswanathan & Makhoul, 1975), formants (F) (Willems, 1986) and filter bank output parameters (FB) (Sekey & Hanson, 1984). The first five sets are derived from the prediction coefficients of an LPC analysis; the source parameters do not play a role in temporal decomposition. The last set is based on the output of a filter bank.
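Several of the LPC-derived sets are related by simple formulas. The sketch below uses one common convention, LAR_i = log((1 + k_i)/(1 - k_i)) and A_i / A_(i+1) = (1 + k_i)/(1 - k_i); sign conventions and the direction of section numbering differ between texts, and the reference area of 1 is arbitrary because areas are only defined up to a scale factor. The coefficients in the usage lines are hypothetical.

```python
import numpy as np

def log_area_ratios(k):
    """Log-area ratios from reflection coefficients (requires |k_i| < 1)."""
    k = np.asarray(k, dtype=float)
    return np.log((1 + k) / (1 - k))

def areas_from_reflection(k, reference_area=1.0):
    """Tube section areas, using A_i / A_(i+1) = (1 + k_i) / (1 - k_i)."""
    areas = [reference_area]
    for ki in np.asarray(k, dtype=float):
        areas.append(areas[-1] * (1 - ki) / (1 + ki))
    return np.array(areas)

k = np.array([0.6, -0.2, 0.1, -0.4])   # hypothetical reflection coefficients
lar = log_area_ratios(k)
areas = areas_from_reflection(k)
log_areas = np.log(areas)               # the LA parameter set
```

With |k_i| < 1 every area stays positive, and in this convention the LAR are simply differences of successive log areas.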

An example of the decompositions of the utterance /dapala/ using different sets of input parameters can be seen on the left side of Figure 2. From the top downwards, RC, LAR, A and LA have been used. As can be seen, not only the number of target functions but also their locations can vary considerably from one set to the other. Consequently, the corresponding target vectors can be very distinct. Thus the performance of the temporal decomposition method is indeed sensitive to the choice of input parameters. The following sections will deal with these phenomena.


Phonetic relevance of the target functions

Most phonemes are assumed to contain one target position. Clear exceptions are plosives and diphthongs, both containing two target positions. In the following, we will use the term phone for a speech unit containing one target position. Though this gives a simplified view of reality, it makes it possible to judge the phonetic relevance of the decomposition. An important criterion for good performance is a one-to-one relation between target functions and phones. Thus each target function will be associated with a particular phone, and for each phone the number of target functions associated with it will be counted.

Experimental procedure

A small data base was constructed consisting of CVC combinations embedded in a neutral context: /daC1VC2a/. The consonants C1 and C2 were one of the phonemes /l/, /m/, /b/ or /p/ and V was one of the short vowels /a/, /i/ or /ɔ/. For practical reasons the size of the data base had to be restricted. Although the phonemes used in this data base are not representative of all possible phoneme classes, they will suffice for this experiment.

Each of the 48 possible combinations was produced by a single male speaker. Phonetic labelling of the CVC combinations was carried out by hand. Closure and burst of the plosives were labelled separately. Temporal decomposition analysis with the six parameter sets was carried out for all 48 utterances. Subsequently every target function was assigned to a phone.

Results

The percentage of phones associated with 0, 1, 2 or more target functions was determined for each set of input parameters. The results are shown in Table 1A.

Table 1: Percentages of phones associated with 0, 1, 2 or more than 2 target functions. The results given are averaged over all phones, including (A) and excluding (B) the bursts, respectively.

              A. including bursts         B. excluding bursts
parameters    0     1     2     >2        0     1     2     >2
FB            19.5  64.9  15.1  0.5       0.0   79.0  20.3  0.7
LA            17.8  63.2  18.4  0.5       1.4   73.2  24.6  0.7
F             14.1  58.9  23.2  3.8       0.0   63.8  31.2  5.1
A             21.1  47.0  29.7  2.2       3.6   53.6  39.9  2.9
LAR           16.2  43.2  34.1  6.5       0.0   45.7  45.7  8.7
RC            14.1  41.6  37.3  7.0       0.0   40.6  50.0  9.4

As we aim at a one-to-one correspondence of target functions to phones, the second column gives the best indication of performance. The best sets are FB and LA, with about 64% of phones being described by only one target function and vector. The other columns do, however, also reveal important information. A high percentage of phones associated with precisely one target function is useless if the remaining phones are not associated with any target function and are thus not detected. As can be seen in the first column, all sets show an unacceptably high percentage of missed phones. Since the bursts are only of very short duration, the question arises as to whether these phones can be correctly modelled by a target function which, of necessity, has a longer duration. It might be expected that a fair number of them will not be detected.

In Table 1B the results are shown for the same phones, but excluding the bursts of the plosives. It can be clearly seen that the overall results are much improved, all percentages in the first column showing a dramatic decrease, while all percentages in the second column are increased as compared to the results in Table 1A. Especially the relatively good parameter sets of Table 1A, such as FB and LA, profit from this alternative presentation of the results.
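Each row of Table 1 amounts to a tally over the labelled phones. A minimal sketch, assuming each phone is stored as a (label, number of associated target functions) pair; the labels and counts in the usage lines are hypothetical.

```python
from collections import Counter

def association_percentages(phones, exclude=()):
    """Percentages of phones with 0, 1, 2 or >2 associated target functions.

    phones  : iterable of (label, n_target_functions) pairs
    exclude : labels to leave out, e.g. bursts for the Table 1B variant
    """
    counts = [n for label, n in phones if label not in exclude]
    bins = Counter(str(n) if n <= 2 else ">2" for n in counts)
    return {b: 100.0 * bins[b] / len(counts) for b in ("0", "1", "2", ">2")}

phones = [("d", 1), ("a", 1), ("p-closure", 1), ("p-burst", 0),
          ("a", 2), ("l", 1), ("a", 3)]
print(association_percentages(phones))                       # Table 1A style
print(association_percentages(phones, exclude={"p-burst"}))  # Table 1B style
```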

It should be noted that the remaining percentages of missed phones do not always indicate a gap in the sequence of overlapping target functions. They can rather be attributed to strong coarticulation, so that two consecutive phones are associated with the same target function. Only in the case of the closures of voiceless stop consonants of relatively long duration is a real gap sometimes found. This is easily understandable as, in these particular cases, hardly any speech signal exists.

The order in which the results of the parameter sets are presented in this table is an indication of their performance. The FB are therefore the most suitable input parameters for temporal decomposition if the criterion is a one-to-one correspondence of target functions to phones. The LA, historically the most often used, occupy a respectable second place. The F still give reasonable results, but the A, LAR and RC turn out to be unsuitable in this respect.

[Figure 1 plot: F2 (Hz, 400 to 2000) versus F1 (Hz, 200 to 700) for vowel target vectors and middle frames.]

Figure 1: Plot of the first two formants F1 and F2 for target vectors and actually realized middle frames of vowels. The target vectors are associated with the short vowels /a/ (●), /i/ (■) and /ɔ/ (▲). The middle frames are taken from the same vowels: /a/ (○), /i/ (□) and /ɔ/ (△).

Phonetic relevance of the target vectors

The target vectors have the same dimension as a frame of the input parameters. As we started our research using the LA, we will first investigate the interpretation of the acoustic target vectors determined with the LA, restricting ourselves to target vectors belonging to vowels associated with precisely one target function. We will then extend our findings to the other sets of parameters.

Target vectors of log-area parameters

In order to obtain more vowels associated with precisely one target function and vector, the data base was extended by two more productions of the same utterances by the same speaker. For all vowels associated with only one target function, the target vectors in the LA space were transformed to the formants and bandwidths space. Subsequently, the first two formants (F1 and F2) of all these vectors were plotted, since these two formants are known to be perceptually most relevant for vowels. The result is shown in Figure 1, where the target vectors associated with an /a/, /i/ and /ɔ/ are represented by filled circles (●), filled squares (■) and filled triangles (▲), respectively. As a reference, the actually realized middle frames of the same vowels are also shown. These /a/, /i/ and /ɔ/ are represented by open circles (○), open squares (□) and open triangles (△), respectively.

From this figure a few observations can be made. The target vectors cluster in three separate groups of points. The actually realized phonemes form separate groups which are slightly more compact. The two clusters belonging to the same phoneme do not coincide, although there is a fair amount of overlap. This can be observed most clearly for the phoneme /ɔ/. One might argue that this is due to the fact that the target vectors represent idealized targets and are thus not necessarily realized in the acoustic speech signal. However, in that case one would expect more compact clusters, as all different realizations of the same phoneme are supposed to belong to the same target. A more plausible explanation is that the normalization of the target functions, which directly influences the length of the target vectors and thus also the values of F1 and F2, is not optimal for all cases (Van Dijk-Kappers, 1988). Unfortunately, this cannot be solved without making use of phonetic knowledge, which we explicitly excluded.
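The transformation of target vectors to formants and bandwidths is not spelled out in the text. Its final step is typically root-finding on the all-pole predictor polynomial; the sketch below covers only that step and assumes the target vector has already been converted to predictor coefficients. The sampling frequency and the synthetic test polynomial are hypothetical.

```python
import numpy as np

def formants_from_lpc(a, fs):
    """Formant frequencies and bandwidths (Hz) from an all-pole model.

    a  : [1, a_1, ..., a_p], coefficients of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
    fs : sampling frequency in Hz
    """
    poles = np.roots(a)                        # roots of z^p A(z) = poles of 1/A(z)
    poles = poles[np.imag(poles) > 0]          # one pole from each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]

# Synthetic check: poles at 500 and 1500 Hz with radius 0.97, fs = 10 kHz.
fs = 10000.0
poles = [0.97 * np.exp(sign * 2j * np.pi * f / fs) for f in (500.0, 1500.0) for sign in (1, -1)]
a = np.real(np.poly(poles))
f, bw = formants_from_lpc(a, fs)
```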

Target vectors of the remaining parameters

The conclusions with respect to the phonetic relevance of the LA target vectors can be extended to RC, LAR and F. As a result, a comparative analysis of the target vectors in the F1-F2 plane makes no sense. Changing the length of A or FB target vectors has no effect on the formants. However, the values of the A parameters sometimes turned out to be negative and thus unphysical. Unphysical values were also found for RC and F. Clearly, target vectors consisting of one or more physically uninterpretable parameters can never model a target position. Implementing phonetic knowledge in the temporal decomposition method will not solve this problem.

The FB parameters have not been transformed into formants and bandwidths, since that requires special techniques which were not available for our research. The target vectors transformed to log-amplitude spectra showed the same kind of spread as LA vectors. This is, however, difficult to quantify.

Resynthesis

The temporal decomposition method of Atal was originally proposed for economical speech coding. Although we have a different purpose, it remains useful to evaluate the quality of the resynthesized speech signal. There are two ways to test the quality of the resynthesis. First, the speech signal can be evaluated perceptually. However, a suitable synthesizer was not available for all parameter sets, which makes an extensive perception experiment impossible. The second way to evaluate the speech quality consists of determining a physical error defined as the difference between the original and the resynthesized speech parameters. Here we will confine ourselves to the latter possibility.

Physical errors in the resynthesis

Comparison of the physical errors of different parameter sets is only meaningful if these errors are computed in the same parameter space. Since not all the sets are transformable into one another, only LA, A, LAR and RC will be compared in this way. We will use the LA as a reference; all reconstructed speech parameters will be transformed to the LA space.

[Figure 2 plot: target functions for RC, LAR, A and LA (200-ms time scale) beside the corresponding resynthesis error signals, with Err = 81.2 (RC), 75.5 (LAR), 79.9 (A) and 49.2 (LA).]

Figure 2: Target functions belonging to the reflection coefficients (RC), the log-area ratios (LAR), the areas (A) and the log areas (LA), next to the resynthesis error of the CVC utterance /dapala/. A further explanation of this figure is given in the text.

We will use a simple Euclidian error measure. The error Er(n) for one particularframe is defined as

$$E_r(n) = \sqrt{\sum_{i} \bigl( y_i(n) - \hat{y}_i(n) \bigr)^2} \qquad (2)$$

Both $y_i(n)$ and $\hat{y}_i(n)$ consist of LA parameters, but the optimization of $\hat{y}_i(n)$ (i.e. the determination of the target functions and vectors) has taken place in the various parameter spaces. We sum this error $E_r(n)$ over 50 frames in the middle of the utterance, yielding an error measure Err. This is, of course, a relative measure, which can only be used to compare reconstructions of the same utterance.
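Equation (2) and the summation into Err can be sketched as follows. Frames are lists of LA values; the function names and the choice of the 50 frames centred on the middle of the utterance are assumptions for illustration, since the text only says that 50 frames in the middle are used.

```python
import math

def frame_error(original, reconstructed):
    """E_r(n): Euclidean distance between one original and one reconstructed LA frame."""
    if len(original) != len(reconstructed):
        raise ValueError("frames must contain the same number of LA parameters")
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(original, reconstructed)))

def error_signal(original_frames, reconstructed_frames):
    """E_r(n) for every frame n of the utterance."""
    return [frame_error(y, y_hat) for y, y_hat in zip(original_frames, reconstructed_frames)]

def total_error(original_frames, reconstructed_frames, n_frames=50):
    """Err: sum of E_r(n) over n_frames frames centred on the middle of the utterance."""
    errors = error_signal(original_frames, reconstructed_frames)
    if len(errors) < n_frames:
        raise ValueError("utterance is shorter than the evaluation window")
    start = (len(errors) - n_frames) // 2
    return sum(errors[start:start + n_frames])
```

Because Err depends on the utterance length and content, it is only comparable between reconstructions of the same utterance, as noted above.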

In Figure 2, the Euclidean error signal $E_r(n)$ is shown next to the target functions. The vertical bars under the A error signal indicate the locations where unphysical (i.e. negative) values were obtained. The number at the right-hand side of the figure represents Err.

Temporal decomposition attempts to describe the speech parameters with a linear model. A parameter set is suitable for linear modelling only if the error signal is small and varies little in time; peaks in the error signal indicate locations where this model is not satisfactory. In the example of Figure 2, the error signal of the A shows high peaks, confirming once more that the A are not very convenient parameters for temporal decomposition. Although none of the other parameter sets produces a consistently flat error signal, the LA perform most satisfactorily in this respect. The error measure Err is also the smallest in the LA case. Because the decompositions are almost identical, the error signals of the RC and the LAR are very much alike in this example. Even here, however, the error of the RC is the larger of the two.

Most of these observations hold for all examples studied, even when there is a considerable difference in the number of target functions; only in a few cases is the error signal of the LAR smaller than that of the LA. The decoded LAR always describe the speech signal better than the RC, and of the four sets the A usually perform worst, although that is not visible in this particular example.

Resynthesis using mixed parameter spaces

So far, the determination of the target functions and the subsequent computation of the target vectors have always taken place in the same parameter space. However, since the target functions are dimensionless, they can also be used in a parameter space other than the one in which they were determined, thus possibly combining the advantages of two spaces.

Given the results of the previous sections, the LA space is an obvious choice for the determination of the target functions. The acoustic target vectors can then be computed in the various parameter spaces, and the resulting resynthesis errors can be compared using the same error measure as before. An example of this approach can be seen in Figure 3A.

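The fact that the error signals of the LA and the LAR are identical is due to the specific relationship between the two spaces: each LAR is the difference of two consecutive LA values, so the LAR are a linear transformation of the LA. This means that these two spaces are equivalent for the computation and interpretation of the target vectors.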
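The equivalence can be checked numerically. The sketch assumes the linear model of temporal decomposition, in which a parameter track Y (parameters by frames) is approximated by A Φ, with Φ the target functions and A the target vectors, and that for fixed target functions the target vectors are the least-squares solution. The random data and the difference matrix `D` (which treats the log area beyond the last tube section as a zero reference) are hypothetical stand-ins.

```python
import numpy as np

def target_vectors(params, target_functions):
    """Least-squares target vectors A minimizing ||params - A @ target_functions||^2.

    params: (p, N) parameter track; target_functions: (K, N) dimensionless weights.
    """
    # Solve target_functions.T @ A.T ~= params.T in the least-squares sense.
    solution, *_ = np.linalg.lstsq(target_functions.T, params.T, rcond=None)
    return solution.T

def la_to_lar_matrix(p):
    """Hypothetical invertible map: LAR_i = LA_{i+1} - LA_i, with LA_{p+1} = 0 as reference."""
    d = -np.eye(p)
    d[np.arange(p - 1), np.arange(1, p)] = 1.0
    return d

rng = np.random.default_rng(0)
p, k, n = 8, 5, 50
phi = np.abs(rng.normal(size=(k, n)))   # stand-in target functions, e.g. found in LA space
la_track = rng.normal(size=(p, n))      # stand-in LA parameter track
D = la_to_lar_matrix(p)
lar_track = D @ la_track

# Reconstruct directly in LA space ...
la_hat = target_vectors(la_track, phi) @ phi
# ... and in LAR space, then transform the reconstruction back to LA space.
lar_hat = target_vectors(lar_track, phi) @ phi
la_from_lar = np.linalg.solve(D, lar_hat)

print(np.allclose(la_hat, la_from_lar))  # True
```

Because the least-squares solution is linear in the data, any invertible linear map gives the same reconstruction. The RC and the A are nonlinear functions of the LA, so this argument does not carry over to them.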

Another decomposition of the same utterance is given in Figure 3B. There the target functions are derived in the RC space, which yields considerably more target functions. However, comparison of the LA error signals demonstrates that the error Err is greater if the RC functions are used. Since the LA and LAR error signals are identical, this also holds for the LAR. The A error signal, too, is smaller when the LA target functions are used. In this particular example the RC error signal is smallest if RC functions are used, but quite often the opposite is true, which signifies that RC functions are not always optimal, even in RC space.

[Figure 3 plot, panels A and B: target functions and error signals for RC, LAR, A and LA on a 200 ms time scale, with Err values at the right of each panel]

Figure 3: A) Target functions of the utterance /dijlola/ determined in the LA space. Using these functions, the optimal target vectors are computed in the RC, LAR, A and LA space, yielding the plotted error signals. B) Same as A, only in this case the target functions are determined in the RC space.

A final possibility that we have examined is the use of FB target functions for reconstruction in the LA space, in the hope of achieving still better results. Although this sometimes led to better descriptions of the original speech signal, more often the error signals obtained were larger. As it is impossible to transform the reconstructed FB parameters to the LA space, we have not been able to compare the error signals of both spaces.

Discussion and conclusions

From the previous sections it can be concluded that, so far, the LA and the FB are the most suitable parameter sets available for temporal decomposition. The main criterion has been the phonetic relevance of the target functions, measured as the number of target functions per phone. If weight is given to the criterion of speech quality after resynthesis, the FB cannot be classified as a suitable set, because it was impossible to resynthesize the speech signal from them.

Although all parameter sets describe the same speech signal, there is a striking difference in the number of target functions obtained in the various parameter spaces. There seems to be a direct relation between the time variations of a parameter set and the number of target functions found. This is understandable, since the target functions are constructed by imposing boundary conditions on linear combinations of the parameters within a certain window. If the parameters exhibit capricious behaviour in time, as is the case for the RC and the LAR, the window has to be small in order to find a target function satisfying the imposed restrictions. This, of course, results in more target functions than in the case of more slowly varying parameters.

The phonetic interpretation of the target vectors is somewhat dubious. Probably due to a non-optimal normalization of the target functions, they do not represent idealized target positions. However, if LA parameters are used, the target vectors belonging to the same vowel form more or less distinct clusters in the F1-F2 plane. This indicates that they could still be used for classification.

As regards resynthesis, the LA also turned out to be one of the most suitable parameter sets, often yielding the smallest resynthesis error even though, compared to the other sets, the signal was reconstructed with the lowest number of target functions and vectors. If LA target functions were used, better results were also obtained in other parameter spaces, especially in the LAR space, where the results are even identical to those in the LA space. Although the RC are almost identical to the LAR, they invariably perform worse, mainly due to the occurrence of unphysical values. Viswanathan and Makhoul (1975) already reported that, for speech transmission, the optimal transformation of the RC is the LAR.

In recent literature, successful temporal decomposition results are reported using the LAR (Ahlbom et al., 1987; Bimbot et al., 1987; Chollet et al., 1986; Marteau et al., 1988; Niranjan & Fallside, 1987), which is not in agreement with our results. However, in those experiments the target vectors are assumed to be known, leaving only the target functions to be determined. As we have shown (Van Dijk-Kappers, 1988), identical target functions yield identical target vectors in the LA and LAR space. This also holds the other way round: identical target vectors yield identical target functions. Thus, when temporal decomposition is used in this way, the LAR are indeed good candidates, although the LA will perform equally well.

In this paper we have shown that the choice of the parametrization of the speech signal is very important for temporal decomposition and has far-reaching consequences. Possibly due to their close relationship to the positions of the articulators, the parameters originally proposed by Atal (1983), namely the log-area parameters (LA), turned out to be a suitable choice. If resynthesis is not required, the FB are even more suitable.

Acknowledgement

This research was supported by the Foundation for Linguistic Research, which is funded by the Netherlands Organization for Scientific Research, NWO. The author wishes to thank E. van Mierlo of the University of Utrecht for making available computer programs for the filter bank analyses, and the students F.J. Benning and L.W. Lemmens from the Eindhoven University of Technology for carrying out part of the experiments. Thanks are also due to many IPO colleagues for their comments on various versions of this manuscript.

References

Ahlbom, G., Bimbot, F. & Chollet, G. (1987) Modeling spectral speech transitions using temporal decomposition techniques, Proceedings ICASSP, 13-16.

Atal, B.S. (1983) Efficient coding of LPC parameters by temporal decomposition, Proceedings ICASSP, 81-84.

Bimbot, F., Ahlbom, G. & Chollet, G. (1987) From segmental synthesis to acoustic rules using temporal decomposition, Proceedings 11th ICPhS, 5, Tallinn, 31-34.

Chollet, G., Grenier, Y. & Marcus, S.M. (1986) Temporal decomposition and non-stationary modeling of speech, Proceedings 3rd EUSIPCO, 365-368.

Dijk-Kappers, A.M.L. van (1988) Comparison of parameter sets for temporal decomposition, IPO Manuscript 652, submitted to Speech Communication.

Dijk-Kappers, A.M.L. van & Marcus, S.M. (1987) Temporal decomposition of speech, IPO Annual Progress Report, 22, 41-50.

Dijk-Kappers, A.M.L. van & Marcus, S.M. (1988) Temporal decomposition of speech, IPO Manuscript 608, submitted to Speech Communication.

Marteau, P.F., Bailly, G. & Janot-Giorgetti, M.T. (1988) Stochastic model of diphone-like segments based on trajectory concepts, Proceedings ICASSP, 615-618.

Niranjan, M. & Fallside, F. (1987) On modelling the dynamics of speech patterns, Proceedings European Conference on Speech Technology, Edinburgh, 71-74.

Sekey, A. & Hanson, B.A. (1984) Improved 1-bark bandwidth auditory filter, Journal of the Acoustical Society of America, 75, 1902-1904.

Viswanathan, R. & Makhoul, J. (1975) Quantization properties of transmission parameters in linear predictive systems, IEEE Transactions on Acoustics, Speech and Signal Processing, 23, 309-321.

Willems, L.F. (1986) Robust formant analysis, IPO Annual Progress Report, 21, 34-40.

A multilingual text-to-speech system

P.A. van Rijnsoever

Abstract

A tool that allows researchers to combine and evaluate their contributions to the overall system is needed for the development of high-quality speech synthesis. To this end, a program has been developed that flexibly integrates various modules, such as grapheme-to-phoneme conversion, phoneme-to-diphone conversion, duration and intonation control, and synthesis. A command structure has also been provided to obtain help, change program settings or display the results of the different stages of the program. The system handles diphone speech synthesis for Dutch, English and German. This paper describes the characteristics of the experimental system in some detail.

Introduction

Human conversion of an arbitrary text into speech sounds implies considerable implicit knowledge of language. Most people know how to translate characters (graphemes) into corresponding speech sounds (phonemes or allophones) and, with an understanding of the context, they can resolve ambiguities in word pronunciation. Furthermore, humans can convey information about the meaning of the text by means of word or sentence accentuation and by indicating major syntactic boundaries. These properties, referred to as prosody, determine the sentence melody, to which the temporal structure of the utterance is closely related. These processes and properties have to be simulated in an automatic text-to-speech system.

To be able to generate arbitrary utterances from text, a basic speech unit must be chosen. The basic unit could be the phoneme, but experience has shown that synthetic speech generated by merely concatenating phonemes is not acceptable as such: the transitions between phonemes are of critical importance for speech perception, and much effort must be put into generating them correctly. Another basic concatenation unit is the diphone, which consists of the second half of one phoneme and the first half of the next. The transitions between phonemes are then encoded within the diphone, and the diphone boundaries lie in the steady-state parts of the phonemes. This facilitates concatenation and gives better results. A language with about 50 phonemes will have somewhat fewer than 50 × 50 = 2500 diphones, because not every phoneme pair occurs.
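The segmentation of an utterance into diphones can be sketched as follows: each pair of adjacent phonemes, with a silence symbol added at both ends, names one diphone. The silence symbol `_` and the `left-right` naming scheme are hypothetical; the inventory's actual naming is not described here.

```python
SILENCE = "_"  # hypothetical symbol padding the utterance at both ends

def diphone_names(phonemes):
    """Diphones covering a phoneme sequence, e.g. ['d', 'a'] -> ['_-d', 'd-a', 'a-_']."""
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]

def max_diphones(n_phonemes):
    """Upper bound on the inventory size: every ordered pair of phonemes."""
    return n_phonemes ** 2

print(diphone_names(["d", "a", "p", "a"]))  # ['_-d', 'd-a', 'a-p', 'p-a', 'a-_']
print(max_diphones(50))                     # 2500
```

An utterance of n phonemes thus needs n + 1 diphones, and each interior phoneme is split across two of them at its steady-state part.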

The source-filter model, representing the human vocal tract, has been found to be a satisfactory model for the acoustic representation of speech sounds (Vogten, 1983). The model parameters are: the amplitude (gain), a parameter indicating whether the speech signal is voiced (periodic) or unvoiced (noisy), the repetition frequency (pitch) of the source signal (if voiced), and filter parameters that characterize the spectral envelope of the speech signal. As the speech signal remains perceptually constant within an interval of about 10 ms, a diphone may be constructed from speech frames with an update interval (frame duration) of 10 ms. The utterance is generated by concatenating the frames of the various diphones. The frame duration is used to control the temporal structure of the diphones. Figure 1 presents an example of the representation of the speech parameters of an utterance.
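A diphone as a sequence of parameter frames might be represented as below. The field names, the use of (frequency, bandwidth) pairs as filter parameters (Figure 1 shows formant frequencies and selectivities), and stretching a diphone by scaling its frame durations are illustrative assumptions, not the system's actual data format.

```python
from dataclasses import dataclass, replace

FRAME_MS = 10.0  # nominal frame duration given in the text

@dataclass(frozen=True)
class Frame:
    gain: float            # amplitude of the frame
    voiced: bool           # periodic (True) or noisy (False) source
    pitch_hz: float        # source repetition frequency; irrelevant when unvoiced
    formants: tuple        # hypothetical filter parameters: ((freq_hz, bandwidth_hz), ...)
    duration_ms: float = FRAME_MS

def total_duration_ms(frames):
    return sum(f.duration_ms for f in frames)

def stretch(frames, target_ms):
    """Change the temporal structure by scaling every frame duration to reach target_ms."""
    factor = target_ms / total_duration_ms(frames)
    return [replace(f, duration_ms=f.duration_ms * factor) for f in frames]

diphone = [Frame(1.0, True, 120.0, ((500.0, 60.0),)) for _ in range(12)]
longer = stretch(diphone, 180.0)
```

Frames are immutable here, so a duration rule produces a new frame list and the stored diphone inventory is never modified.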

[Figure 1 plot titled "text to speech conversion, using diphones": amplitude, pitch and formant windows against time t (s), 0.0 to 5.0 s]

Figure 1: Display of speech parameters. In the top and middle windows, the amplitude and the pitch, respectively, are displayed on a logarithmic scale; in the bottom window the formant frequencies and selectivities are displayed on a linear scale. A dot (.) above the top window indicates the first frame of a diphone. A dot (or line) above the middle window marks an unvoiced frame.

The purpose of our text-to-speech system is to allow researchers to do experiments and to evaluate their contributions to the whole text-to-speech conversion for Dutch, English and German, without putting too much effort into pre- and post-processing. Therefore our text-to-speech system contains the following modules: a prosody module for deriving prosodic features from the input sentence, a grapheme-to-phoneme conversion module which converts the letters into speech sounds, a phoneme-to-diphone module to select the diphones and collect the frame data, a duration module which controls the temporal structure of the utterance, an intonation module to generate a sentence melody, and a synthesis module which converts the frame data into sampled data that can be sent to a D/A converter and a loudspeaker.

As a research tool, the system must be able to contain several parallel routines for the same task, so that their merits can be compared and evaluated. When users want to evaluate a specific part of the system, functions must be available that let them concentrate on that part of the program. This requires functions for controlling the program flow and for displaying and storing results at particular program stages. To this end a command handler has been implemented, providing a large set of commands, including an extensive 'help' library. An interrupt can be given during processing without leaving the program; this cancels the current processing as soon as possible and returns control to the input level. Moreover, an error recovery procedure has been implemented to prevent the system from crashing on severe errors, so that the user does not lose data and program settings.
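The interrupt and error-recovery behaviour can be sketched as a loop around the processing of each input line. Python's `KeyboardInterrupt` (raised by Ctrl-C) stands in for the interrupt, and the `process` function and `settings` dictionary are hypothetical; the sketch only shows the control flow: the current line is abandoned, control returns to the input level, and the settings survive.

```python
def run_session(lines, process, settings):
    """Process each input line, surviving interrupts and errors.

    Returns a log of outcomes; `settings` is shared across lines and never reset.
    """
    log = []
    for line in lines:
        try:
            process(line, settings)
            log.append(("done", line))
        except KeyboardInterrupt:
            # Interrupt: abandon this line and return to the input level.
            log.append(("interrupted", line))
        except Exception as exc:
            # Error recovery: report the error instead of terminating the program.
            log.append(("error", f"{line}: {exc}"))
    return log

# Hypothetical processing step used to exercise the loop.
def process(line, settings):
    if line == "slow":
        raise KeyboardInterrupt  # simulates the user interrupting a long computation
    if line == "bad":
        raise ValueError("unknown phoneme")
    settings["last"] = line

settings = {"language": "Dutch"}
log = run_session(["hallo", "slow", "bad", "dag"], process, settings)
```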

Description of the system

The system provides text-to-speech conversion for the Dutch, English and German languages. The basic structure of the system is shown in Figure 2.

[Figure 2 flow chart, boxes from top to bottom: start; initializations; help; input; command handler; prosodic analysis; graphemes to phonemes; phonemes to diphones; duration control; intonation control; synthesis; stop]

Figure 2: Flow chart of the system (explanation in text).

A module can contain more than one task routine, and the user can select a routine by means of commands. Timer functions can be set to keep track of the performance of each module. Figure 2 shows the modules that are currently available and implemented. However, an ideal system should contain additional, more refined modules from which the existing modules can benefit. For instance, a morphological parser would greatly facilitate grapheme-to-phoneme conversion, and a syntactic analyser would do likewise for the generation of intonation contours. At present, some grapheme-to-phoneme converters implemented in the system have additional features, such as word stress assignment, generation of accent markers or conversion of numbers and abbreviations into plain words. This could lead to an overlap with existing or future modules. The command structure within the system (analogous to the VAX/VMS¹ command structure) therefore allows the user to define a path through the text-to-speech system by selecting, enabling or disabling modules.
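Modules that each hold several interchangeable routines, with a selected routine, an enable/disable switch and a timer, can be sketched as follows. All class and routine names, and the toy routines themselves, are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Module:
    """A processing stage holding several interchangeable routines."""
    name: str
    routines: Dict[str, Callable] = field(default_factory=dict)
    selected: str = ""
    enabled: bool = True

    def add(self, routine_name: str, fn: Callable) -> None:
        self.routines[routine_name] = fn
        if not self.selected:
            self.selected = routine_name  # the first routine registered is the default

    def run(self, data):
        # A disabled module passes its input on unchanged.
        return self.routines[self.selected](data) if self.enabled else data

def run_path(modules: List[Module], text):
    """Run the modules in order, timing each one."""
    data, timings = text, {}
    for module in modules:
        start = time.perf_counter()
        data = module.run(data)
        timings[module.name] = time.perf_counter() - start
    return data, timings

# Toy routines: spell out the letters, or accept space-separated phonetic input.
g2p = Module("graphemes to phonemes")
g2p.add("letters", lambda s: list(s))
g2p.add("bypass", lambda s: s.split())
result, timings = run_path([g2p], "dag")
```

Selecting `bypass` corresponds to the converter that just accepts phonetic input; disabling a module removes it from the path without deleting it.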

Initializations

At the start of the program some initializations have to be done. Dutch is chosen by default, resulting in the selection of the proper modules for Dutch text-to-speech conversion and of a Dutch diphone inventory. The same actions take place each time a different language is chosen.

Input module

In the input module, text can be read either from the terminal or from a file. A command for the system is distinguished from normal text by typing a slash at the beginning of the input line, followed by the command. The input lines to be processed are automatically stored in a number of temporary buffers to allow editing or recalling previous sentences. These buffers contain the last fifteen text lines. For longer storage, the temporary buffers can be copied to more permanent buffers (for the duration of the program) or to files on disc.
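The input handling can be sketched as below: a leading slash routes a line to the command handler, and other lines are passed on as text and kept in a history of the last fifteen lines. Using a `collections.deque` with `maxlen=15` is our choice, buffering only text lines is an assumption, and the handler callables are hypothetical.

```python
from collections import deque

HISTORY_SIZE = 15  # the text states that the buffers hold the last fifteen lines

class InputModule:
    def __init__(self, on_command, on_text):
        self.history = deque(maxlen=HISTORY_SIZE)  # oldest lines drop out automatically
        self.on_command = on_command
        self.on_text = on_text

    def feed(self, line):
        if line.startswith("/"):
            return self.on_command(line[1:].strip())  # command without its slash
        self.history.append(line)                     # only text lines are buffered
        return self.on_text(line)

    def recall(self, back=1):
        """Text line entered `back` lines ago (1 = most recent)."""
        return self.history[-back]
```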

Command handler

If a line is preceded by a slash, it is interpreted as a command and sent to the command handler, which parses the command line and interprets and executes the command. The large set of commands includes commands which display the results of modules (display of phonemes, parameters, etc.), which show the current state of program settings (which module or diphone inventory is used, etc.), which list the available items of a particular part of the program (the modules available or the diphones present in the inventory, etc.), which change program settings (diphone inventory, modules, etc.), which read and edit text (files), and which write and store results (of each module). A help function is provided to obtain information on commands or program topics. The structure of the commands and help functions is the same as that of VAX/VMS, so that users quickly become accustomed to the program.
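Command parsing with abbreviated verbs, loosely in the spirit of VAX/VMS, can be sketched as follows: the first word is resolved to the unique command it abbreviates, and the remaining words are its arguments. The command table and the exact abbreviation rule are hypothetical, not the system's actual command set.

```python
COMMANDS = {  # hypothetical command table: verb -> description
    "help": "information on commands or program topics",
    "show": "display results or current program settings",
    "set": "change program settings",
    "select": "choose a routine within a module",
    "list": "list available modules or diphones",
}

def resolve(verb, table=COMMANDS):
    """Resolve a possibly abbreviated verb to the unique command it abbreviates."""
    verb = verb.lower()
    if verb in table:
        return verb
    matches = sorted(name for name in table if name.startswith(verb))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unrecognized command: {verb}")
    raise ValueError(f"ambiguous command {verb}: {', '.join(matches)}")

def parse(command_line):
    """Split a command line (slash already removed) into a resolved verb and arguments."""
    words = command_line.split()
    if not words:
        raise ValueError("empty command")
    return resolve(words[0]), words[1:]
```

An ambiguous abbreviation such as `se` (set or select) is rejected with a message rather than guessed.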

Prosodic analysis

The prosody module carries out a prosodic analysis of the input sentence. At the moment a prosodic analysis is only available for Dutch (Kager & Quené, 1987). The prosody module enriches the input sentence with markers for major syntactic boundaries and sentence accents, which are derived automatically from orthographic sentence properties. Essentially, both phrasing and accentuation are based on the distinction between function words (e.g. articles, pronouns, auxiliary verbs, prepositions) and content words (e.g. nouns, adjectives, most verbs), in combination with the theoretical framework provided by nonlinear sentence phonology. Function words and inherently (de)accentuated words are stored in separate lexica.

¹ VAX/VMS is an operating system of Digital Equipment Corporation.

Grapheme-to-phoneme conversion

The conversion of graphemes into phonemes can be accomplished by using a dictionary containing the phonetic representation of each graphemic entry (including stress marks, syllable boundaries, etc.). Although a limited dictionary can cover a large number of frequent words, grapheme-to-phoneme conversion using a lexicon can never be complete. Therefore another method is also needed, namely grapheme-to-phoneme conversion by means of rules (Kerkhof, Wester & Boves, 1984; Van Leeuwen, forthcoming). In this approach the input text is always converted into a phonetic representation but, because of irregularities in the languages, such conversions will not be error-free. For optimal results, a combination of the two methods should be used: the dictionary is consulted for words that are exceptions to the rules, and the rules are used for entries not included in the dictionary. Our text-to-speech system contains, for each language, several grapheme-to-phoneme converters using rules, a lexicon, or both. They range from converters that just accept phonetic input (for bypassing any conversion) to converters which include number grammars, rules that handle abbreviations and rules for word stress assignment.
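The dictionary-plus-rules combination can be sketched as a lookup that falls back on letter-to-sound rules applied longest grapheme first. The one-word exception lexicon, the rule table and the phoneme symbols are toy assumptions loosely based on Dutch spelling, not the converters in the system.

```python
# Hypothetical exception lexicon: words the rules below would get wrong.
LEXICON = {"thee": ["t", "e:"]}

# Hypothetical grapheme rules; longer graphemes are tried first.
RULES = {"sch": ["s", "x"], "ch": ["x"], "ee": ["e:"], "oo": ["o:"], "aa": ["a:"],
         "ij": ["Ei"], "g": ["x"]}
LONGEST = max(len(g) for g in RULES)

def rules_to_phonemes(word):
    phonemes, i = [], 0
    while i < len(word):
        for size in range(min(LONGEST, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += size
                break
        else:
            phonemes.append(word[i])  # default rule: the letter stands for itself
            i += 1
    return phonemes

def graphemes_to_phonemes(word):
    """Dictionary for exceptions, rules for every other word."""
    word = word.lower()
    return LEXICON.get(word) or rules_to_phonemes(word)
```

Without the lexicon entry, the rules would keep the silent h in "thee"; the lookup therefore has to come first.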

Phoneme-to-diphone conversion

As mentioned above, diphones essentially contain the transition between two phonemes, i.e. from the steady state of one phoneme to that of the following phoneme. Thus a language with roughly 50 phonemes requires about 2500 diphones. However, many possible phoneme combinations do not occur in a language, which reduces the number of diphones. Problems can arise when consonants appear in clusters: the consonants follow each other rapidly with no defined steady-state part, and representing each consonant within the cluster by diphones does not result in speech of good quality. Therefore the phoneme inventory is extended to include consonant clusters, so that complete consonant clusters are stored as one unit. This approach is applied to the English and German diphones. Another shortcoming is a consequence of the way the diphones are prepared: they are extracted from stressed syllables. This often results in a staccato-like rhythm and overarticulation. In normal speech, reduction takes place, especially of vowels in unaccented syllables: the vowels do not reach their spectral targets and become shorter. A first attempt has been made to include 'reduced' diphones (excised from unaccented syllables) in the English diphone inventory, and the first results sound promising. Routines have been provided for constructing utterances with or without diphones containing consonant clusters and with or without reduced diphones.

The phoneme-to-diphone conversion module first derives the diphone names from the phoneme string. In some exceptional cases the diphone name cannot be found by simply concatenating two phonemes; the program then automatically handles these (language-dependent) exceptions. After the phoneme string has been converted into a diphone string, the diphones are looked up in the diphone inventory and their frames are concatenated. As a rule, the amplitude at the diphone borders is smoothed, except where no interpolation is allowed, for instance in the case of plosives. The system allows fast switching of diphone inventories. At present two diphone inventories are available for Dutch (Elsendoorn & 't Hart, 1982; Elsendoorn, 1984), one for English, and one for German (a new diphone inventory is in preparation for both Dutch and German).
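Lookup, concatenation and border smoothing can be sketched as follows, with each frame reduced to its amplitude. The inventory contents, the plosive set, the exception table and the smoothing method (setting the two frames that meet at a border to their mean) are hypothetical, since the text does not specify them.

```python
PLOSIVES = {"p", "t", "k", "b", "d"}   # hypothetical: no interpolation inside these
EXCEPTIONS = {("n", "k"): "N-k"}       # hypothetical language-dependent name exception

def diphone_name(left, right):
    return EXCEPTIONS.get((left, right), f"{left}-{right}")

def concatenate(phonemes, inventory):
    """Concatenate the amplitude tracks of the diphones covering `phonemes`.

    `phonemes` must already include boundary symbols such as silence. The border
    between two diphones lies in the phoneme they share; unless that phoneme is a
    plosive, the two frames meeting there are both replaced by their mean.
    """
    track = []
    for left, right in zip(phonemes, phonemes[1:]):
        frames = list(inventory[diphone_name(left, right)])  # KeyError if missing
        if track and left not in PLOSIVES:
            mean = (track[-1] + frames[0]) / 2.0
            track[-1] = frames[0] = mean
        track.extend(frames)
    return track

inventory = {"_-d": [0.0, 0.2], "d-a": [0.9, 1.0], "a-_": [0.5, 0.0]}
track = concatenate(["_", "d", "a", "_"], inventory)
```

Here the border inside /d/ is left untouched, while the border inside /a/ is smoothed.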

Duration control

In normal speech, vowels are shortened or lengthened, e.g. as a function of the phonetic context. Lengthening may also take place in a syllable that carries a sentence accent or precedes a major syntactic boundary. Such effects can be controlled by means of duration rules, and the duration module provides a tool for implementing sets of such rules. The module functions at the phoneme level in the absolute as well as in the relative time domain. In the absolute domain the phoneme duration is expressed in milliseconds; in the relative domain it is a percentage of the original phoneme duration. The module allows rules that combine both types of expression, e.g. 'make this phoneme 150% of the original duration but no longer than 300 ms'. The parameters in the duration rules can be adjusted without leaving the program. The sets of rules make use of features: each phoneme can be represented as a list of distinctive features, each of which has one of two values (+ or -). There are two sorts of features: first, the fixed features that are constant in a language (such as whether a phoneme is a vowel or not), and second, the variable features (labels) that depend on the context (such as whether a phoneme is in an accented position or not). Currently the duration-control module contains a set of rules similar to those described by Klatt (1979), who originally developed them for American English. Additional parameters have been added to control the overall speech rate and to select the standard phoneme duration, either from a table containing the inherent and minimum durations, or from the diphone library, i.e. the phoneme duration after diphone concatenation.
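Rules of this kind can be sketched as follows. The first function uses the form of Klatt's model, in which a rule percentage scales only the compressible part of the inherent duration, DUR = MINDUR + (INHDUR - MINDUR) × PRCNT / 100; the second implements the combined relative/absolute rule quoted above. The duration table, feature names and rule set are hypothetical values chosen for illustration.

```python
# Hypothetical table: phoneme -> (inherent duration, minimum duration) in ms.
DURATIONS = {"a:": (230.0, 80.0), "I": (135.0, 40.0), "t": (75.0, 50.0)}

def klatt_duration(phoneme, percent):
    """Klatt-style rule output: DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100."""
    inherent, minimum = DURATIONS[phoneme]
    return minimum + (inherent - minimum) * percent / 100.0

def relative_capped(original_ms, percent, max_ms):
    """Combined rule, e.g. 150% of the original duration but at most 300 ms."""
    return min(original_ms * percent / 100.0, max_ms)

def apply_rules(features, original_ms):
    """Hypothetical rule set driven by fixed and variable (+/-) features."""
    duration = original_ms
    if features.get("vowel") == "+" and features.get("accented") == "+":
        duration = relative_capped(duration, 150.0, 300.0)
    if features.get("pre_boundary") == "+":
        duration = relative_capped(duration, 120.0, 400.0)
    return duration
```

Here "vowel" is a fixed feature, while "accented" and "pre_boundary" are context-dependent labels that the prosody module would supply.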

Intonation

Acoustically speaking, the pitch contour in natural speech is rather capricious. However, it can be replaced by straight-line pieces without a noticeable difference in perception. An intonation grammar automatically produces a pitch contour on the basis of intonation markers, using only standard intonation patterns. These intonation markers are generated by the prosodic analysis module or are manually inserted into the input string. Each language has its own intonation grammar (Adriaens, 1984; 't Hart & Cohen, 1973; 't Hart & Collier, 1975; Willems, Collier & 't Hart, 1988). Since no semantic analysis can be performed and the meaning of the sentences thus remains unknown, the intonation contour must be 'neutral', which can result in a rather boring speech melody. For Dutch, an attempt has been made to generate a more lively intonation contour by using declination resets at major syntactic boundaries and by varying the excursion size of the pitch movements. Moreover, different pitch contours may be generated for structurally identical sentences.
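A straight-line stylization means that a pitch contour can be specified by a few breakpoints and interpolated for every 10 ms frame. The sketch assumes breakpoints given as (time in s, F0 in Hz) pairs and interpolates linearly in Hz for simplicity; the breakpoint values are invented, and the intonation grammar that would generate them is not shown.

```python
FRAME_S = 0.010  # 10 ms frames, as used for the diphone parameters

def pitch_at(t, breakpoints):
    """Linear interpolation between (time_s, f0_hz) breakpoints, held constant outside them."""
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, f0), (t1, f1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

def frame_pitches(breakpoints, n_frames):
    """One pitch value per frame, sampled at the frame start times."""
    return [pitch_at(i * FRAME_S, breakpoints) for i in range(n_frames)]

contour = [(0.0, 120.0), (0.2, 180.0), (0.3, 100.0)]  # invented rise and fall
pitches = frame_pitches(contour, 31)
```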

Synthesis

The coded speech frames of the diphones allow manipulation of the parameters, such as smoothing of the amplitude, temporal adjustment and pitch change. Converting these speech parameters into the speech signal (synthesis) produces the sampled data, which are sent to the digital-to-analog converter and subsequently to a loudspeaker.

Conclusions

The system has been found to be convenient as a research tool for text-to-speech conversion for Dutch, English and German. It can be used for research on any single part of the text-to-speech process without having to pay much attention to the preceding and following processes. The system has been made available to several laboratories and incorporates parts of research done elsewhere. A prototype of a stand-alone Dutch text-to-speech system is now under development; it will contain an optimal path through the system described here.

References

Adriaens, L.M.H. (1984) A preliminary description of German intonation. IPO Annual Progress Report, 19, 36-41.

Elsendoorn, B.A.G. & Hart, J. 't (1982) Exploring the possibilities of speech synthesis with Dutch diphones. IPO Annual Progress Report, 17, 63-65.

Elsendoorn, B.A.G. (1984) Heading for a diphone speech synthesis system for Dutch. IPO Annual Progress Report, 19, 32-35.

Hart, J. 't & Cohen, A. (1973) Intonation by rule, a perceptual quest. Journal of Phonetics, 1, 309-327.

Hart, J. 't & Collier, R. (1975) Integrating different levels of intonation analysis. Journal of Phonetics, 3, 235-255.

Kager, R. & Quené, H. (1987) Deriving prosodic sentence structure without exhaustive syntactic analysis. Proceedings of the European Conference on Speech Technology, Vol. 1, 243-246.

Kerkhof, J., Wester, J. & Boves, L. (1984) A compiler for implementing the linguistic phase of a text-to-speech conversion system. In: H. Bennis & W.U.S. van Lessen Kloeke (Eds), Linguistics in the Netherlands. Dordrecht: Foris Publications, 111-117.

Klatt, D.H. (1979) Synthesis by rule of segmental durations in English sentences. Proceedings of the Ninth International Congress of Phonetic Sciences, 290-297.

Leeuwen, H.C. van (forthcoming) A development tool for linguistic rules. To be published in Computer Speech and Language.

Vogten, L.L.M. (1983) Analyse, zuinige codering en resynthese van spraakgeluid [Analysis, economical coding and resynthesis of speech sound]. Helmond: Wibro.

Willems, N., Collier, R. & Hart, J. 't (1988) A synthesis scheme for British English intonation. Journal of the Acoustical Society of America, 84(4), 1250-1261.

VISION AND READING


Developments

J.A.J. Roufs

Our group has been joined by Messrs M. Nijenhuis and G. Schouten. Prof. Z. Shu was seconded to us for one year by the Chinese government. Dr W. Stanioch (Poland) stayed in the group for three months as a Research Fellow.

Brightness and Brightness Contrast

The parameters of a spatiotemporal threshold model of the transient system have been estimated. A physical membrane serves as a spatiotemporal operator. Its parameters change characteristically with luminance level, consistent with a closed-loop adaptive filter (Den Brinker). The amplitude of the internally generated noise, manifest in threshold measurements, is being studied systematically by different methods for different stimulus areas and background levels, because peculiar values are found for large stimulus areas and high luminance levels (Pellegrino van Stuyvenberg, Piceni, Roufs). A new method for determining the shapes of perceptive fields, each of which detects a certain detail, has been tried out experimentally. It is based on contrast-trade-off functions. For the smallest stimuli, the dominant perceptive field was identical to the point-spread function obtained with perturbation (Roufs in cooperation with Mortensen; Schouten, student).

The results of different methods of brightness scaling and equisection have been compared. The results of category scaling, of non-metric scaling and of scaling brightness distances of pairs were mutually consistent. However, they differ from the results of magnitude estimation and bisection, which are also mutually consistent. Models of central processing that might explain this difference are being tested (De Ridder, Theelen).

A new and promising theory concerning the spatial aspects of brightness-luminance relations is being developed. It is based on properties of receptive fields and embedded in a scale-space approach (Blommaert, Martens, Schouten).

Perceptual image quality and visual performance

The effect of level-dependent visual acuity on the perceptual quality-sharpness relations was as expected. The effect of locally sharp or unsharp areas is being investigated (Westerink). Within the framework of the working group 'Perception of TV systems' (see IPO Annual Progress Report 22, 1987, 53), the effect of scene movement on sharpness perception is being studied (Teunissen, Westerink).

In connection with the CEC project Eureka '95 and in cooperation with five other laboratories, subjective assessments were made of seven transmission algorithms for High-Definition (MAC) TV systems. The results made it possible to select the best algorithm (Westerink).

IPO annual progress report 23 1988


The work being done in cooperation with the Display Principles group at Philips Research Laboratories to adapt Liquid Crystal TV systems to the eye is being continued (Blommaert, this issue; Nijenhuis).

New experiments concerning the effect of contour on the perceptual quality-gamma relations have been started (Van Tongeren, student; Roufs). See a related article in this issue (Roufs, Goossens).

In the framework of the study of methods for perceptual quality assessment, it was found that thresholds of impairments due to quantisation steps in coded images are rather insensitive to the exact nature of the method. The learning effects usually found in threshold measurements seem to be caused by the observer finding new details in the picture to which he is more sensitive (Majoor, De Ridder). Scaled impairments due to quantisation steps did not show learning effects and vary almost linearly with step size. Impairments seem to be additive (De Ridder).

In connection with visual comfort in high-performance VDU tasks, reaction time in word identification was studied as a possible candidate for the list of comfort-sensitive variables. However, the results so far are more complex than expected (Boschman, Roufs).

Image Processing and Coding

Digital coding gives rise to quantisation errors. Considerable attention has been paid to their relation with subjective impairments in the luminance and chrominance domains (see above). Thresholds of impairments due to quantisation errors at different levels of scale-space coding seem to be independent (Majoor, Martens, De Ridder, Shu, Van den Braak).

Practical imaging systems are often modelled by a low-pass filter followed by sampling in two (or more) dimensions. Starting from these sample values, the image is to be reconstructed as accurately as possible. This is not easy with arbitrary signals. In deriving Hermite-transform processing, it had already been shown that images can be locally approximated with great accuracy by polynomials (Martens). This approximation can be used to solve the deblurring problem.
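The idea of reconstructing an image from its samples by local polynomial approximation can be illustrated in one dimension. The sketch below is not Martens' Hermite transform: it is a generic Gaussian-weighted least-squares fit of a quadratic around each output position, and the function names, the polynomial degree (2) and the window width `sigma` are assumptions chosen for illustration. It only addresses reconstruction between sample positions, not deblurring.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def local_quadratic(samples, spacing, x0, sigma):
    """Estimate the signal at position x0 from samples taken at k * spacing.

    Fits c0 + c1*d + c2*d**2 (d = x - x0) by weighted least squares with Gaussian
    weights exp(-d**2 / (2 * sigma**2)) and returns c0, the fitted value at x0.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for k, s in enumerate(samples):
        d = k * spacing - x0
        w = math.exp(-d * d / (2.0 * sigma * sigma))
        basis = (1.0, d, d * d)
        for i in range(3):
            atb[i] += w * basis[i] * s
            for j in range(3):
                ata[i][j] += w * basis[i] * basis[j]
    return solve3(ata, atb)[0]


if __name__ == "__main__":
    f = lambda x: 0.5 * x * x - 2.0 * x + 1.0   # a test signal that is exactly quadratic
    samples = [f(0.5 * k) for k in range(21)]   # samples at x = 0, 0.5, ..., 10
    print(local_quadratic(samples, 0.5, 3.3, sigma=1.0), f(3.3))
```

Because the fit is exact for polynomials up to the chosen degree, samples of a quadratic are reproduced without error at any intermediate position; for real images the local polynomial is only an approximation.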

Within the framework of a project on processing noisy medical images, in cooperation with the University of Nijmegen, the effect of nonlinear luminance transfer on the detectability of objects of interest has been studied. A power-law transform with exponent gamma ranging from 1 to 5 had little effect on the detectability of objects, nor did the application of decorrelation filters. A new approach is being worked out (Blankers, student; Escalante, Martens).

Reading

Experiments on reading, with the exception of those in connection with VDUs, were carried out this year in the 'Cognition and Communication' group (see Grainger, this issue) and within the activities of 'Communication Aids'. In the latter, the interest was in the effect of the spectral content of the illuminant on the reading comfort of observers with vision problems. The techniques used are predominantly those applied to VDUs (Van Heijnsbergen).


Perceptually optimal sampling of images

F.J.J. Blommaert

Abstract

The sampling of images may evoke undesirable perceptual phenomena, such as spatial structure, field flicker, line flicker and Moiré patterns. The visibility of these artefacts depends both on the nature of the sampling strategy and on the characteristics of vision. The present paper shows how sampling strategies can be optimized from a perceptual point of view. The computational theory described is based on lattice theory and uses the concept of a 'visibility window' in the frequency domain.

Introduction

Display principles are often based on sampling. This holds for the cathode ray tube (CRT) as well as for liquid crystal (LC), electroluminescent (EL), light-emitting diode (LED) and gas-discharge (GD) displays. Images should not be undersampled, yet oversampling is wasteful. Between these two limits there is room for optimization.

The problem of optimal image sampling involves many different aspects. First, technical and economic constraints limit the design of the display and hence also the feasible sampling structures. Second, the viewing situation and the imagery are determined by the observer, so that they are usually not well defined at the design stage. Third, given the sampling structure, viewing situation and imagery, the perceptual system determines the visibility of sampling artefacts or, more generally, how genuine an image looks.

In this paper we are interested in the visibility of sampling artefacts introduced by different sampling structures. Hence, we are looking for the necessary and sufficient requirements that a sampling structure has to meet so that the displayed images look genuine. The necessary requirement is that artefacts are invisible; it thus concerns what the visual system should not see. The sufficient requirement is that artefacts are just invisible and no more, which addresses the question of the most economic sampling structure that satisfies this criterion. Both aspects will be touched upon in this contribution.

We restrict ourselves to monochrome imaging in order to make the analysis more transparent.

Concept

As we do not intend to discuss processing prior to the receipt of the video signal, we start the imaging chain at the level of the video signal (Figure 1).


Figure 1: Imaging chain from the video signal via the display to the observer's 'internal representation'.

First of all, the video signal should be adapted to the specific imaging properties of the display. In Figure 1 this is denoted by 'preprocessing stage'. It may consist of simple low-pass filtering to reduce the visibility of aliasing. More sophisticated processing, such as inter-field, intra-field and motion-adaptive filtering, may also take place at this stage.

After preprocessing, the modified video signal is displayed and the image is viewed by the observer. Viewing an image can be formalized as a mapping of the image into a (time-varying) internal representation. An observer's image-quality judgements are then based on the properties of the image at the level of this internal representation.

Description of sampled imagery

In the case of 3-D sampling, the luminance distribution is described in three discrete dimensions. A basic contribution to the luminance distribution can be written as

$$\mathcal{L}_s(x, y, t) = \{\mathcal{L}(x, y, t) \times \delta(x - x_1)\,\delta(y - y_1)\,\delta(t - t_1)\} * T(x, y, t), \tag{1}$$

where $*$ symbolizes convolution. The expression denotes that the contribution is taken from the original luminance distribution $\mathcal{L}(x, y, t)$ at the point $(x_1, y_1, t_1)$ in space-time. This sample should be convolved with the spatiotemporal point-spread function $T(x, y, t)$ of a single display element. In the case of a CRT, for instance, $T(x, y, t)$ equals the product of the spot profile and the temporal impulse response of the phosphor.

In order to display a complete image sequence, this sampling should be repeated regularly in the three directions according to some appropriate sampling strategy.

The luminance distribution of the displayed image can then be formalized by

$$\mathcal{L}_s(x, y, t) = \{\mathcal{L}(x, y, t) \times \Lambda(x, y, t)\} * T(x, y, t). \tag{2}$$

In this expression, $\Lambda(x, y, t)$ is called the sampling lattice and consists of a collection of three-dimensional Dirac functions. A cubic lattice with sampling distances $\Delta x$, $\Delta y$ and $\Delta t$ can, for instance, be written as

$$\Lambda(x, y, t) = \sum_i \sum_j \sum_k \delta(x - i\Delta x)\,\delta(y - j\Delta y)\,\delta(t - k\Delta t). \tag{3}$$


Visual characteristics

The visual process performs a complex information-processing task which favours the visibility of certain aspects of an image at the cost of other image properties. The system is, for instance, rather sensitive to abrupt changes in luminance (edges) and relatively insensitive to shallow gradients (in both space and time).

The visual system is also sensitive to structure and repetition. This is probably effected by that part of the process which is usually referred to as 'perceptual grouping', and has been nicely demonstrated by Glass patterns (Glass, 1969). It also follows from the existence of hyperacuity and from the fact that the visibility of sinusoidal gratings (line patterns) is much higher than predicted from the visibility of single details (cf. Du Buf, 1987).

Since sampling is, by definition, a superposition of repetitions in different directions, it is a rather unfortunate way of representing an image: because the visual system uses perceptual grouping as one of its strategies for interpreting an image, it is also very good at detecting these sampling artefacts.

In order to optimize sampling strategies from a perceptual point of view, we need a computational theory in which the visual transformation of an image to the sensory space is adequately represented. At the present state of knowledge on vision, this is not yet possible. Although mechanistic schemes have been developed (cf. Marr, 1982), a quantitative description of the transformation is (still) lacking.

An obvious way to reduce the problem is to examine the nature of the artefacts that will be introduced by sampling. Since repetitive structures are used, harmonic luminance variations will be introduced with frequency components in both space and time. Such luminance variations can make spatial structure (line and dot structure), flicker and combinations of these two (line flicker) visible.

A sensible way of treating the problem is to divide the aspects of sampled imagery into two classes:

• Harmonic luminance variations introduced by the specific characteristics of the sampling strategy. Typical examples are flicker and spatial structure. The perceptual consequences are relatively independent of the image presented, although luminance and size do play a role.

• Effects that are strongly influenced by the luminance distribution of the displayed image. Resolution loss, the staircase effect and motion smear are typical examples of this class. The occurrence of these artefacts is image-specific.

These two aspects of image deformation will be treated in turn, with some emphasis on the first class.

Visibility of harmonic sampling artefacts

The visual system is not a frequency analyser, although simple operations on the frequency representation of an image can mimic properties of the visual system. These include the sensitivity to repeated luminance variations and an easy description of channels that are tuned to specific frequencies (Campbell & Robson, 1968). Furthermore, the Fourier language provides simple expressions for related visual properties such as size invariance and the separation of global shape from fine local detail (Cavanagh, 1984).

One of the main features in which visual processing differs from frequency analysis is the strictly local and nonlinear behaviour of its neural units. Ganglion cells, for instance, possess highly nonlinear operating characteristics (Shapley & Enroth-Cugell, 1984). Therefore, a frequency description of the visual process, which is inherently linear, is only meaningful for a rather restricted set of images. An obvious restriction is to consider only slight deviations from a constant and steady background luminance. An alternative approach might be to restrict the analysis to small local areas within an image.

Nevertheless, we will describe the visibility of harmonic sampling artefacts in frequency space, for three reasons. First, the Fourier language provides a transparent and precise way of handling the problem of the visibility of harmonic sampling artefacts. Second, it has been shown that, for high spatial and temporal frequencies, some degree of generalization is possible (Campbell & Robson, 1968; De Lange, 1952). Third, over the last few decades, much experimental material has been gathered on the visibility of harmonics in space and time.

Let us start the description in frequency space by taking the Fourier transform of the sampled image given by equation 2, so that

$$\mathcal{F}\{\mathcal{L}_s\} = [\mathcal{F}\{\mathcal{L}\} * \mathcal{F}\{\Lambda\}] \times \mathcal{F}\{T\}. \tag{4}$$

The Fourier transform of a lattice $\Lambda(x, y, t)$ is called a reciprocal lattice $\mathcal{F}\{\Lambda\}$ and is again a regular collection of three-dimensional Dirac functions $\delta(u - u_1)\,\delta(v - v_1)\,\delta(w - w_1)$, now in the basic frequency variables $u$, $v$ and $w$. Reciprocal lattices of regular space-time structures can be calculated fairly easily (cf. Dubois, 1986). For example, the space-time lattice for a cubic structure in three dimensions and its reciprocal counterpart are shown in Figure 2.


Figure 2: Three-dimensional cubic lattice in space-time (left) and its reciprocal lattice in the Fourier domain (right).
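A reciprocal lattice can be computed with a few lines of exact arithmetic. The sketch below assumes the common convention (cf. Dubois, 1986) that a lattice consists of the points $V\mathbf{n}$, where the columns of the sampling matrix $V$ are basis vectors in $(x, y, t)$ and $\mathbf{n}$ runs over integer vectors; the reciprocal lattice is then spanned by the columns of $U = (V^{T})^{-1}$. The function names, the spacings and the 2:1 line-interlaced example lattice are illustrative assumptions, not taken from this paper.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def reciprocal_basis(v):
    """Return U = (V^T)^-1 for a sampling matrix V whose columns are lattice basis
    vectors in (x, y, t); the columns of U span the reciprocal lattice in (u, v, w)."""
    d = det3(v)
    if d == 0:
        raise ValueError("basis vectors are linearly dependent")

    def cofactor(i, j):
        rows = [r for r in range(3) if r != i]
        cols = [c for c in range(3) if c != j]
        minor = (v[rows[0]][cols[0]] * v[rows[1]][cols[1]]
                 - v[rows[0]][cols[1]] * v[rows[1]][cols[0]])
        return (-1) ** (i + j) * minor

    # inv(V) = cofactor(V)^T / det(V), hence (inv V)^T = cofactor(V) / det(V)
    return [[Fraction(cofactor(i, j)) / d for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    dx, dy, dt = Fraction(1, 2), Fraction(1, 2), Fraction(1, 50)
    cubic = [[dx, 0, 0], [0, dy, 0], [0, 0, dt]]             # equation 3
    # 2:1 line interlace: basis columns (dx, 0, 0), (0, 2dy, 0), (0, dy, dt)
    interlaced = [[dx, 0, 0], [0, 2 * dy, dy], [0, 0, dt]]
    for name, v in (("cubic", cubic), ("interlaced", interlaced)):
        u = reciprocal_basis(v)
        columns = [[str(u[r][c]) for r in range(3)] for c in range(3)]
        print(name, columns)
```

For the cubic lattice of equation 3 the reciprocal basis vectors are simply $1/\Delta x$, $1/\Delta y$ and $1/\Delta t$ along the axes. For the interlaced lattice one reciprocal basis vector, $(0, 1/(2\Delta y), -1/(2\Delta t))$, is oblique in the $(v, w)$ plane: a component at half the frame line frequency and half the field rate, associated with the line flicker of interlaced displays.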

In order to calculate the spectrum of the sampled image according to equation 4, the reciprocal lattice has to be convolved with the frequency spectrum of the original image, denoted by $\mathcal{F}\{\mathcal{L}\}$. This introduces repeated spectra around every point of the reciprocal lattice, an effect usually referred to as 'aliasing'.
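In one dimension, convolving with the reciprocal lattice (points at multiples of $1/\Delta x$) means that a cosine component of frequency $f$ cannot be distinguished, at the sample positions, from components at $f + k/\Delta x$ for any integer $k$. The sketch below, with a hypothetical helper name, computes the frequency at which such a component reappears in the base band $0 \le u \le 1/(2\Delta x)$.

```python
import math


def baseband_alias(f, spacing):
    """Return the frequency in [0, 1/(2*spacing)] at which a cosine of frequency f
    reappears after sampling at the given spacing."""
    fs = 1.0 / spacing          # sampling frequency = spacing of the reciprocal lattice
    r = math.fmod(f, fs)        # shift by an integer multiple of fs
    if r < 0:
        r += fs
    return min(r, fs - r)       # fold negative frequencies onto positive ones


print(baseband_alias(7.0, 0.1))
```

For example, with $\Delta x = 0.1$ (a sampling frequency of 10), a component at frequency 7 produces exactly the same sample values as one at frequency 3.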


To complete the calculation, the convolution has to be multiplied by the Fourier transform of the point-spread function of the display element, $\mathcal{F}\{T\}$. This means that all spectral components are weighted according to the Fourier transform of the pixel spread in space and time. This quantity thus acts as a spatiotemporal transfer function of the display.

Window of visibility

In order to analyse the visibility of harmonic sampling artefacts, we will use the concept of a 'visibility window' in frequency space, which was introduced by Watson et al. (1986); its basic ideas were already present in the work of Fahle and Poggio (1981). In its global form, the concept uses a boundary of critical frequencies, both temporal and spatial, beyond which the visual system is unable to resolve luminance variations.

These critical frequencies can be determined psychophysically as detection thresholds of suitably chosen stimulus patterns. Such patterns consist of spatial gratings modulated with harmonic time functions (counterphase gratings). A few illustrative experimental results taken from the literature (Campbell et al., 1966; Robson, 1966; Kulikowski, 1971; Koenderink & Van Doorn, 1979) are shown in Figure 3.

[Figure 3. Axes: reduced spatial frequency $u^*$ (left); reduced spatial frequency $u^*$ and reduced temporal frequency $w^*$ (right). Data: Koenderink and Van Doorn (1979); Kulikowski (1971); Robson (1966).]

Figure 3: Detection thresholds for gratings plotted as reduced critical frequencies. Left: spatial critical frequencies as a function of orientation, after Campbell et al. (1966). Right: critical frequency boundaries derived from experiments with counterphase gratings, plotted as a function of reduced spatial and temporal frequencies.

The critical frequency boundary is interpreted in terms of the frequency regions it includes or excludes. The hypothesis is that two images will appear identical to an observer if their spectra, after passing through the window of visibility, are identical. It follows from this hypothesis that if sampling artefacts are manoeuvred outside the window of visibility, the sampled image will appear indistinguishable from the original one.

The question now is: what parts of the Fourier spectrum of the sampled image of equation 4 are included within the visibility window? In the first place, this will be the spectrum of the original image itself (see Figure 4). In the second place, it will be parts of the repeated spectra around those points of the reciprocal lattice that are nearest to the origin. In Figure 4, this is illustrated for a cubic lattice in the two-dimensional spatial frequency plane.

Figure 4: Repeated spectra, schematically depicted by shaded areas that contain non-zero frequency components, in a cubic reciprocal lattice. The spatial window of visibility is indicated. For details see text.

In general, it is not clear what the perceptual consequences are if repeated spectral components appear within the window of visibility, except for single frequency components, which give rise to Moiré patterns. However, if the lattice points themselves lie within the window, the interpretation is straightforward. The lattice points 1 and 2 in Figure 4, for instance, form a spatial structure of the form $\cos(2\pi x/\Delta x)$, which will give the impression of a vertical line pattern. In order to make this pattern invisible, the sampling frequency $1/\Delta x$ in the $x$-direction should be raised to at least the critical visual frequency in that direction.
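The step from lattice points 1 and 2 to a line pattern can be made explicit. Assuming, as Figure 4 suggests, that these points lie on the $u$-axis at $u = \pm 1/\Delta x$ (with $v = 0$), and leaving out the common weighting factor, the inverse Fourier transform of the pair is

$$\delta\!\left(u - \frac{1}{\Delta x}\right) + \delta\!\left(u + \frac{1}{\Delta x}\right) \;\xrightarrow{\;\mathcal{F}^{-1}\;}\; e^{\,i2\pi x/\Delta x} + e^{-i2\pi x/\Delta x} = 2\cos\!\left(\frac{2\pi x}{\Delta x}\right).$$

The result does not depend on $y$, so in the spatial plane of Figure 4 it is a pattern of vertical lines with period $\Delta x$. Writing $u_c$ for the critical visual frequency in the $x$-direction (a symbol introduced here only for this illustration), the pattern stays outside the window of visibility only if $1/\Delta x \ge u_c$, that is, $\Delta x \le 1/u_c$.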

Economic sampling structures

At this point one might ask what is required from a sampling structure to make sure that all harmonic sampling artefacts introduced by the lattice points themselves are invisible in any image. The question can be rephrased as follows: what are the requirements on a sampling structure to make sure that these artefacts are invisible in the image that is most sensitive to them?

Since critical spatial and temporal frequencies increase monotonically with luminance and target size, the most sensitive image will be a homogeneous, steady image presented at the peak luminance of the display system. What is required from the sampling structure is then that, for this image, all repeated lattice points are excluded from the window of visibility. Note that, for any sampling structure, this can always be achieved by choosing the sample distances short enough.


We are, however, interested in the most economic sampling structure that satisfies this criterion. In order to satisfy the criterion, it is necessary and sufficient that all repeated lattice points are just excluded from the window of visibility. The optimal solution is the structure with the lowest sampling density that still satisfies the criterion. Since the visual critical frequencies are lowest in oblique directions (both for the purely spatial case and for the combined spatiotemporal case), the repeated lattice points should lie in these directions. It is then not hard to see that the most economic sampling structure should have a double quincunx lattice, that is, a spatial quincunx lattice displayed in 2:1 interlace (see Figure 5). It can even be proved mathematically that, at least for the class of 'visibility windows' in which the critical frequencies in oblique directions are equal to or lower than those in the directions of the axes, the quincunx sampling structure is the most economical one. An additional advantage is that this sampling structure is also used in some of the new TV transmission standards, such as HD-MAC (cf. Annegarn et al., 1986), and hence provides a good match between transmission and display.

Figure 5: Perceptually optimal lattice structure.

The sample distances to be chosen depend on peak luminance, screen size and viewing distance. Note that the required sampling structure has the same lattice as the closest-sphere packing (cf. Legault, 1973). However, since the three-dimensional window of visibility is less extended in the oblique directions than a sphere, more visibility windows than spheres can be packed per unit volume. Therefore, to satisfy the criterion on the invisibility of sampling structures, the required sample density is lower than for the closest-sphere packing.
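The economy argument can be checked numerically in two spatial dimensions. The sketch assumes, purely for illustration, a diamond-shaped window $|u| + |v| < R$, whose critical frequency in the oblique directions ($R/\sqrt{2}$) is lower than along the axes ($R$), and treats points on the boundary as just excluded. For a reciprocal basis $U$, the number of samples per unit area equals $|\det U|$. The code checks that an orthogonal and a quincunx reciprocal lattice both keep all non-zero lattice points out of the window, and compares their densities; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def inside_window(u, v, r):
    """Assumed diamond-shaped window |u| + |v| < r; boundary points count as excluded."""
    return abs(u) + abs(v) < r


def excludes_lattice(b1, b2, r, n=6):
    """True if no non-zero point m*b1 + k*b2 with |m|, |k| <= n lies inside the window.
    A finite neighbourhood suffices for the short, nearly orthogonal bases used here."""
    for m, k in product(range(-n, n + 1), repeat=2):
        if (m, k) == (0, 0):
            continue
        u = m * b1[0] + k * b2[0]
        v = m * b1[1] + k * b2[1]
        if inside_window(u, v, r):
            return False
    return True


def sample_density(b1, b2):
    """Samples per unit area of the spatial lattice = |det U| of its reciprocal basis."""
    return abs(b1[0] * b2[1] - b1[1] * b2[0])


R = Fraction(1)                                # critical frequency along the u and v axes
orthogonal = ((R, 0), (0, R))                  # reciprocal basis of a square sampling grid
quincunx = ((R / 2, R / 2), (R / 2, -R / 2))   # reciprocal basis of a quincunx grid

for name, (b1, b2) in (("orthogonal", orthogonal), ("quincunx", quincunx)):
    print(name, "artefacts excluded:", excludes_lattice(b1, b2, R),
          "density:", sample_density(b1, b2))
```

Under this assumed window, both lattices keep every repeated lattice point just outside the window, but the quincunx grid does so with half the sample density of the square grid. The advantage rests on the window being narrower in the oblique directions; the diamond is only a stand-in for measured windows such as those in Figure 3.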

Influence of the optical transfer function

Up to now, we have ignored the optical transfer function of the display, although its influence is sometimes far from negligible. In this section we show that, within certain limits, it can be used to minimize the visibility of harmonic sampling artefacts.

If display elements possess infinitesimal area and display time, the optical transfer function $\mathcal{F}\{T\}$ (see equation 4) is independent of frequency, which means that all spectral components are displayed unattenuated. It can easily be verified that the harmonic luminance variations in space and time, evoked by the lattice points of the reciprocal lattice, will then be displayed with a modulation depth of 2 (the Fourier series of a Dirac comb with period $\Delta$ consists of a mean value $1/\Delta$ plus harmonics of amplitude $2/\Delta$). Any finite extent of the display elements in one of the three spatiotemporal directions reduces the modulation depth of the harmonics, particularly of those in that direction.

This property offers a simple recipe for reducing the visibility of harmonic artefacts: since the visibility of these harmonics grows monotonically with modulation depth, the modulation depth should be decreased as much as possible. This can be accomplished by increasing the pixel spread in those directions in which the visible harmonics occur.
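The modulation depth of 2, and its reduction by pixel spread, can be worked out in one dimension. The sketch assumes, for illustration only, that each display element has a rectangular profile of width $a$ (in space, or equivalently in time) and that elements are repeated at spacing $\Delta$. Convolving the Dirac comb with a unit-area rectangle multiplies its $n$-th harmonic by $\mathrm{sinc}(na/\Delta)$, with $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$, while leaving the mean unchanged, so the modulation depth of that harmonic becomes $2\,|\mathrm{sinc}(na/\Delta)|$. The rectangular profile and the function names are assumptions; a Gaussian CRT spot would give a Gaussian attenuation factor instead.

```python
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def harmonic_modulation(width, spacing, n=1):
    """Modulation depth (harmonic amplitude divided by mean luminance) of the n-th
    harmonic for display elements with an assumed rectangular profile of the given
    width, repeated at the given spacing. width = 0 corresponds to Dirac samples."""
    return 2.0 * abs(sinc(n * width / spacing))


for fill in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"width/spacing = {fill:.2f}: "
          f"fundamental modulation = {harmonic_modulation(fill, 1.0):.3f}")
```

With this profile the fundamental vanishes when the elements just fill the spacing; the price is that the same attenuation also acts on the spectrum of the original image, which is the conflict noted in the text.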

The recipe, however, conflicts with the demand that the original image should be displayed undamaged. This cannot be achieved if the frequency spectrum of the original image is attenuated or cut off at too low frequencies, which may lead to resolution losses and motion artefacts. Rules for optimization still have to be developed.

Invisibility of aliasing: artefact-free imaging

If all repeated lattice points are excluded from the window of visibility, parts of the repeated spectra may still be included (see Figure 4). The nature and strength of these aliasing effects depend exclusively on the specific image displayed: spatially repeated structures (fences, woods), for instance, may lead to Moiré patterns, while single lines may evoke the staircase effect and line flicker.

In order to avoid visibility of aliasing altogether, one should demand that all repeated spectra are excluded from the visibility window. This can again be accomplished by sufficiently increasing the sampling density. Note that the spectrum of the original image must then also be band-limited, which is one of the tasks of the preprocessing stage (see Figure 1). The obvious way to filter the image is to limit the frequency content to the region bounded by the critical frequencies. In order to satisfy the criterion for invisibility of all aliasing, the sampling structure should then be chosen so that no spectral overlap between original and repeated spectra is possible. The sufficient condition is that the spectra just do not overlap. It follows that the most economic sampling structure should have a Voronoi cell or Brillouin zone that just spans the window of visibility (the Voronoi cell or Brillouin zone of a lattice is the collection of points closer to the origin than to any other lattice point). It can easily be verified that this condition can be approached fairly well by again using the quincunx lattice structure with 2:1 interlace (see Figure 5). Sample densities should be increased by a factor of 2 in each direction compared with the solution for the weaker criterion.
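The factor of 2 can be made plausible with a short argument. Assume, as an idealization, that the window of visibility $W$ is convex and symmetric about the origin, and that preprocessing confines the original spectrum to $W$. A repeated spectrum centred on a reciprocal-lattice point $\mathbf{p}$ then overlaps the original spectrum exactly when

$$W \cap (W + \mathbf{p}) \neq \emptyset \iff \mathbf{p} \in W - W = 2W,$$

where the last equality uses convexity and symmetry. Artefact-free imaging therefore requires every non-zero reciprocal-lattice point to lie outside $2W$, whereas the weaker criterion only required it to lie outside $W$. Stretching a reciprocal lattice that just meets the weaker criterion by a factor of 2 makes it just meet the stronger one; since reciprocal-lattice distances are inversely proportional to sample distances, the sample distances halve, which doubles the sample density in each direction (a factor of $2^3 = 8$ in $x$, $y$ and $t$ together).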

Conclusions

In this contribution, some global rules for the optimal sampling of images from a perceptual point of view have been formulated. The basic idea in the analysis is that if the image is perceptually free from artefacts, and the original image is transferred undamaged, then the sampled image will be indistinguishable from the original scene. From this point of view, surprisingly little information on vision is needed to answer the question of optimal sampling.


An important result is that perceptually optimal sampling structures can be derived straightforwardly. Furthermore, the computational theory provides a simple means of calculating the perceptual consequences of new or unusual sampling strategies.

Although the description in Fourier language provides a powerful tool for analysing the visibility of harmonic sampling artefacts, its validity beyond these aspects is rather limited. The description obviously fails when aspects concerning the nonlinear behaviour of the visual system are evaluated. One important failure is that it cannot describe the local processing performed by the visual system, notably its local adaptive properties. For an analysis of the perceptual consequences of non-harmonic display artefacts, a description of this behaviour is indispensable.

Acknowledgement

The author wishes to thank the Philips Research Laboratories for financial support, and Dr C.J. Gerritsma, Prof. dr ir J.A.J. Roufs, Ir K.E. Kuijk and Dr J. Bruinink, who contributed to this paper in one way or another.

References

Annegarn, M.J.J.C., Arragon, J.P., Haan, G. de, Heuven, J.H.C. van & Jackson, R.N. (1986) HD-MAC: een stap vooruit in de evolutie van de televisietechniek [HD-MAC: a step forward in the evolution of television technology]. Philips Technisch Tijdschrift, 49, 213-230.

Buf, J.M.H. du (1987) Spatial Characteristics of Brightness and Apparent-contrast Perception. Doctoral dissertation, Eindhoven University of Technology.

Campbell, F.W. & Robson, J.G. (1968) Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197, 551-566.

Campbell, F.W., Kulikowski, J.J. & Levinson, J. (1966) The effect of orientation on the visual resolution of gratings. Journal of Physiology, 187, 427-436.

Cavanagh, P. (1984) Image transforms in the visual system. In: P.C. Dodwell and T. Caelli(Eds): Figural Synthesis. Hillsdale, New Jersey: Erlbaum.

Dubois, E. (1986) The sampling and reconstruction of time-varying imagery with application on video systems. Proceedings of the IEEE, 73, 502-522.

Fahle, M. & Poggio, T. (1981) Visual hyperacuity: spatiotemporal interpolation in human vision. Proceedings of the Royal Society of London, Series B, Biological Sciences, 213, 451-477.

Glass, L. (1969) Moiré effect from random dots. Nature, 223, 578-580.

Koenderink, J.J. & Doorn, A.J. van (1979) Spatiotemporal contrast detection threshold surface is bimodal. Optics Letters, 4, 32-34.

Kulikowski, J.J. (1971) Some stimulus parameters affecting spatial and temporal resolution of human vision. Vision Research, 11, 83-93.

Lange, H. de (1952) Experiments on flicker and some calculations on an electrical analogue for the eye. Physica, 18, 935-950.

Legault, R. (1973) The aliasing problem in two-dimensional sampled imagery. In: L.M. Biberman (Ed.): Perception of Displayed Information. New York: Plenum Press.

Marr, D. (1982) Vision. San Francisco: Freeman and Co.

Robson, J.G. (1966) Spatial and temporal contrast sensitivity functions of the visual system. Journal of the Optical Society of America, 56, 1141-1142.


Shapley, R. & Enroth-Cugell, C. (1984) Visual adaptation and retinal gain controls. Progress in Retinal Research, 3, 263-346.

Watson, A.B., Ahumada, A.J. & Farrell, J.E. (1986) Window of visibility: a psychophysical theory of fidelity in time-sampled visual motion displays. Journal of the Optical Society of America A, 3, 300-307.


Subjective assessment of impairment in scale-space-coded images

H. de Ridder and G.M.M. Majoor

Abstract

Direct category scaling and a scaling procedure in accordance with Functional Measurement Theory (Anderson, 1982) have been used to assess impairment in scale-space-coded images displayed on a black-and-white TV monitor. The image of a complex scene was passed through a Gaussian filter of limited bandwidth. A 'prediction image' of the original image was made from this band-limited signal. The degree of quantisation of the 'prediction error image', obtained by subtracting the prediction image from the original one, determined the impairment in the reconstructed image. Category scaling of impairment and category scaling of differences in impairment (the latter created by presenting the images in a factorial design as dictated by Functional Measurement Theory) gave the same monotonic S-shaped relation between impairment and quantisation step size. The impairment consisted of unsharpness and the occurrence of speckles. A second series of experiments was carried out to scale these percepts. The relation between unsharpness, occurrence of speckles and impairment is discussed.
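The coding chain summarized in the abstract can be sketched in one dimension. This is an assumption-laden illustration, not the authors' implementation: the prediction image is taken simply as the Gaussian-filtered signal itself (how the paper derives the prediction from the band-limited signal is not specified here), the filter width `sigma` is arbitrary, the prediction error is quantised uniformly with step `q`, and all function names are hypothetical.

```python
import math


def gaussian_blur(signal, sigma):
    """Convolve a 1-D signal with a normalised Gaussian kernel (edges clamped)."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-k * k / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)   # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out


def code_and_reconstruct(signal, sigma, q):
    """Prediction plus uniformly quantised prediction error."""
    prediction = gaussian_blur(signal, sigma)
    error = [s - p for s, p in zip(signal, prediction)]
    quantised = [q * round(e / q) for e in error]      # larger q: coarser error image
    return [p + e for p, e in zip(prediction, quantised)]


if __name__ == "__main__":
    image_row = [0, 0, 0, 10, 200, 210, 205, 40, 0, 0, 0, 0]
    for step in (1, 20, 80):
        rec = code_and_reconstruct(image_row, sigma=1.0, q=step)
        worst = max(abs(a - b) for a, b in zip(image_row, rec))
        print(f"q = {step:3d}: max reconstruction error {worst:.1f}")
```

In this sketch the reconstruction error never exceeds half the quantisation step, so the step size directly controls how far the reconstructed image may deviate from the original; how such deviations are perceived (as unsharpness or speckles) is what the experiments measure.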

Introduction

The goal of image coding is to reduce the amount of information needed to store and transmit images. The reduction can be pushed up to the point where the coded images are no longer usable or acceptable to the observer. For applications of images in nonperformance environments such as TV broadcasting (Hunt & Sera, 1978), the amount of admissible data reduction depends on the demands made upon the subjective quality of the coded images. Here, subjective quality must be interpreted as 'the ability to please the eye' (Roufs & Bouma, 1980). The demand might be that no differences are perceived between the original image and its coded version, i.e. that no visible coding artefacts are accepted (Watson, 1987; Martens & Majoor, 1988). Watson (1987) referred to this condition as perceptually lossless coding. A less stringent demand may be that only the less annoying impairments are (slightly) visible. In that case, it becomes important to know which impairments are relevant to the observer, how different impairments combine to affect the overall quality of the image, and what degree of impairment the observer tolerates (Allnatt, 1983). In this connection, Sjoberg (1987) pointed out that, for digital television in particular, little is known about the components responsible for overall image quality. This knowledge is nevertheless relevant in applications of digital television where good subjective image quality is costly, e.g. teleconferencing via satellite.

To answer these questions, image quality and impairment have to be measured. This is usually done by means of direct category scaling (Allnatt, 1983; Hagenzieker & Wagenaar, 1985), in which subjects are instructed to classify an impaired image into one of a small number of quality or impairment categories (e.g. CCIR, 1986). A fundamental problem with direct scaling methods like this one is that they cannot be used to determine the relation between the stimulus and the subjective impression it evokes unless the transformation of the subjective impression into the overt response is known (e.g. Gescheider, 1988). A possible solution to this problem is offered by Anderson's Functional Measurement Theory (Anderson, 1982; Birnbaum, 1982). According to this theory, subjective impressions evoked by different stimuli are combined to form an internal or psychological response. This psychological response is subsequently transformed into an overt response, e.g. a number on a category scale (the judgment function; Birnbaum, 1982). Furthermore, it is assumed that subjects combine subjective impressions into a psychological response by simple rules such as addition, subtraction and multiplication. Anderson (1982) argued that the relation between the stimuli and their subjective impressions, the combination rule and the judgment function can be determined simultaneously if the stimuli are presented in a factorial design, so that the subjective impression evoked by each stimulus is compared with that of every other stimulus involved. This can be illustrated as follows. Suppose that differences between subjective impressions have to be judged. There is empirical evidence that, in such a case, subjects use a subtractive rule (Anderson, 1982; Birnbaum, 1982). If so, the factorial plot of the overt responses will consist of a set of parallel curves only if the judgment function is linear. This parallelism can be tested by means of analysis of variance. If it is not rejected, the marginal means of the factorial design represent the subjective impressions on an interval scale (Anderson, 1982). Birnbaum (1982) pointed out that additional constraints are needed to specify the combination rule and the judgment function, since parallelism in a factorial plot can be obtained with many combinations of combination rule and judgment function. We will return to this problem when the results of the present study are discussed.
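The parallelism argument can be illustrated with a small numerical sketch. It is not part of the original analysis. The subjective impressions `s` are hypothetical, the combination rule is assumed to be subtractive, and the nonlinear judgment function (a logistic curve) is an arbitrary choice made only for contrast. With a linear judgment function, every pair of curves in the factorial table differs by the same constant in every column, so the curves are parallel. With the nonlinear judgment function they are not.

```python
import math

# Hypothetical subjective impressions for four stimuli (arbitrary units).
s = [0.0, 0.5, 1.5, 3.0]

def linear_judgment(x):
    # Linear judgment function: psychological response shifted to a 1-9 scale.
    return 5.0 + x

def logistic_judgment(x):
    # Arbitrary monotone but nonlinear judgment function, used only for contrast.
    return 1.0 + 8.0 / (1.0 + math.exp(-x))

def factorial_table(judge):
    # Row i = left stimulus, column j = right stimulus; subtractive rule s[j] - s[i].
    return [[judge(s[j] - s[i]) for j in range(len(s))] for i in range(len(s))]

def max_nonparallelism(table):
    # Curves are parallel when the difference between any two rows is the same
    # in every column; return the largest spread of those row differences.
    worst = 0.0
    for a in range(len(table)):
        for b in range(len(table)):
            diffs = [x - y for x, y in zip(table[a], table[b])]
            worst = max(worst, max(diffs) - min(diffs))
    return worst

print(max_nonparallelism(factorial_table(linear_judgment)))    # parallel curves
print(max_nonparallelism(factorial_table(logistic_judgment)))  # curves diverge
```

In real data the spread is never exactly zero because of response noise, which is why the parallelism is assessed statistically rather than by inspection alone.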

As far as we know, Functional Measurement Theory has never been used to evaluate the visible effects of image coding, although it is recognized as applicable to the analysis of subjective image quality (Sjoberg, 1987). Because the functional measurement approach consists of comparisons between subjective impressions, it might even be preferable to direct category scaling for evaluating slight effects of image coding (Allnatt, 1983). On the other hand, direct category scaling is much less time-consuming and should be preferred wherever both methods can be applied. Before such practical decisions can be made, however, an important methodological question must be answered: do functional measurement and direct category scaling lead to the same results?

To answer this question, we carried out an experiment in which both methods were used to assess impairment in a set of coded images displayed on a black-and-white TV monitor. The set was generated by means of the so-called scale-space-coding algorithm (e.g. Martens, 1987) and consisted of twelve coded versions of the picture of a complex scene. To explain how impairment was varied within this set, a brief description of scale-space coding is given first.

Image coding according to the principle of scale-space filtering (Martens, 1987; Martens & Majoor, 1986, 1988) implies that an image is passed through a number of Gaussian filters of decreasing bandwidth, thus creating a set of filtered versions of the original image that contain progressively fewer high spatial frequencies. Subsequently, from each of these Gaussian-filtered images a prediction is made for the Gaussian-filtered image of the next higher bandwidth. It can be shown that only the prediction errors and the Gaussian-filtered image of lowest bandwidth are needed to reconstruct the original image (Martens, 1987). The reconstruction proceeds as follows. Starting from the lowest-bandwidth version of the original image, the above-mentioned prediction is made for the next higher bandwidth version. The prediction error image belonging to this prediction is then added to recover the higher-resolution image. This recovered image is used to make a prediction for the next higher bandwidth version, after which the corresponding prediction error is added. This process is repeated until the original image is obtained. In this algorithm, data reduction is accomplished by quantising the prediction errors before transmission. The degree of quantisation of the different prediction error images determines not only the data reduction but also the impairment in the reconstructed image. This last consequence of quantisation has been used to make a set of images in which impairment varies systematically.
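The coding and reconstruction loop described above can be sketched in one dimension. This is a minimal sketch under stated assumptions, not a reproduction of the coder of Martens (1987). The signal is a short list of integer samples. The Gaussian filters are sampled kernels with hypothetical widths `sigmas` (a larger sigma means a lower bandwidth). There is no subsampling. The hypothetical `predict` function simply takes the lower-bandwidth version as the prediction of the next higher-bandwidth version. Because every filtered level is rounded to integers, the prediction errors are integers, so a quantisation step of 1 reproduces the input exactly.

```python
import math

def gaussian_blur(signal, sigma):
    """Filter a 1-D signal with a sampled, normalised Gaussian kernel (edges clamped)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    n = len(signal)
    return [sum(w * signal[min(max(i + k, 0), n - 1)] for k, w in zip(offsets, kernel)) / total
            for i in range(n)]

def quantise(values, q):
    """Uniform mid-tread quantiser with step q."""
    return [q * math.floor(v / q + 0.5) for v in values]

def predict(lower):
    # Hypothetical predictor: the lower-bandwidth version itself serves as the
    # prediction of the next higher-bandwidth version.
    return list(lower)

def encode(signal, sigmas, steps):
    # levels[0] is the original; each further level has a lower bandwidth.
    # Rounding keeps every level, and hence every prediction error, integer-valued.
    levels = [list(signal)] + [[round(v) for v in gaussian_blur(signal, s)] for s in sigmas]
    errors = [quantise([a - b for a, b in zip(levels[k], predict(levels[k + 1]))], steps[k])
              for k in range(len(sigmas))]
    return levels[-1], errors   # lowest-bandwidth version plus quantised prediction errors

def decode(lowest, errors):
    # Start from the lowest-bandwidth version, predict the next level up and add
    # the corresponding prediction error; repeat until level 0 is reached.
    recon = list(lowest)
    for err in reversed(errors):
        recon = [p + e for p, e in zip(predict(recon), err)]
    return recon

signal = [10, 10, 12, 40, 200, 205, 198, 60, 20, 18, 15, 15]   # hypothetical 8-bit samples
low, errs = encode(signal, sigmas=[1.0, 2.0], steps=[1, 1])
print(decode(low, errs) == signal)   # step 1 at every level reproduces the input
```

With this trivial predictor, the reconstruction error is the sum of the quantisation errors of the individual levels, so it can never exceed the sum of half the quantisation steps. A real coder would use a better predictor and subsampling to obtain actual data reduction.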

In our experiment, the original image was passed through one Gaussian filter only, namely the one with the highest bandwidth. Accordingly, only one prediction error image was generated, by subtracting the prediction image from the original one. In the terminology of scale-space coding, this prediction error image is at scale S0 (Martens, 1987). It mainly responds to fine details in the picture, e.g. lines and edges. The impairment caused by uniformly quantising this prediction error image consists of different degrees of unsharpness and quantisation noise (Martens & Majoor, 1986). Since these percepts seem unrelated, this set of images can also be used to determine how different kinds of impairment combine to form an overall impression of impairment. This gives us an opportunity to test Allnatt's 'law of subjective addition' (Allnatt, 1983), which states that different impairments add when they appear simultaneously. This was examined in an additional experiment in which impairment, unsharpness and the occurrence of speckles were assessed in separate sessions.

Both experiments are described below: first, the one comparing functional measurement with direct category scaling; second, the one determining the combination rule for different impairments. Since they were carried out on a set of scale-space-coded images, these experiments can also be regarded as an evaluation of scale-space coding. In this sense they extend the detection experiment described by Martens and Majoor (1988).

Method

Stimuli

The coded picture was the portrait of a female model (Wanda01), one of the complex scenes used by Martens and Majoor (1988). It was digitized with 8 bits/pixel on a grid of 512 by 512 pixels. As described in the introduction, the coding consisted of uniformly quantising the prediction error image at scale S0. Eleven values of the quantisation step q0 were taken in the range from 1 to 56. An image with q0 equal to 127 was added to ensure that, in one case, the prediction error image had almost completely disappeared. Note that a quantisation step of 1 implies that the original image is recovered. For further details of the scale-space coder, as well as examples of coded images, the reader is referred to Martens and Majoor (1988).
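How the step q0 thins out the prediction error image can be illustrated with a uniform mid-tread quantiser, q0 · floor(e / q0 + 1/2). The quantiser form and the error values below are assumptions for illustration, because the report does not state which uniform quantiser was used. Under this assumption, integer errors pass unchanged at q0 = 1. At q0 = 127, every error smaller than 63.5 in magnitude is set to zero.

```python
import math

def quantise(e, q0):
    # Uniform mid-tread quantiser: replace e by the nearest multiple of q0.
    return q0 * math.floor(e / q0 + 0.5)

def code_errors(errors, q0):
    return [quantise(e, q0) for e in errors]

# Hypothetical integer prediction errors at scale S0: small in smooth areas,
# large at a few edges.
errors = [0, 1, -2, 3, -5, 8, 12, -20, 35, -60, 90, -140]

for q0 in (1, 10, 56, 127):
    coded = code_errors(errors, q0)
    nonzero = sum(1 for c in coded if c != 0)
    print(f"q0={q0:3d}: {nonzero:2d} of {len(errors)} errors survive -> {coded}")
```

Because the reconstructed image is the prediction image plus the quantised errors, fewer surviving errors means that fewer fine details are restored. This is consistent with the unsharpness described above.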

Procedure

Functional measurement requires that stimuli be compared. Therefore the twelve coded images were factorially combined to form 144 pairs of images. The two images of a pair were displayed simultaneously on a black-and-white monitor (CONRAC 2400 High Resolution Monochrome), one on each half of the screen. The viewing conditions were in accordance with CCIR Recommendation 500 (1986), except that the peak luminance was increased to 115 cd/m². The monitor was placed in a dark room in front of a dimly lit 'white' background. The subject viewed the monitor from a distance of 1.80 m, i.e. 6 times the height of the monitor. At this distance, the sample spacing subtended a visual angle of about 1 min of arc. The size of a single image was 4.5 by 9.5 degrees. The subjects were instructed to rate the difference in impairment on a 9-point numerical category scale ranging from 1 (the left image is much more impaired than the right one) to 9 (the right image is much more impaired than the left one). A rating of 5 implies that no difference in impairment is perceived.
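The viewing geometry can be checked with the usual visual-angle formula θ = 2 · arctan(s / 2d). The sketch assumes that the 512 image lines span the full monitor height implied by the stated distance (1.80 m = 6 × height, so the height is about 0.30 m). Under that assumption, the image height subtends about 9.5 degrees, in line with the 9.5 degrees quoted above. The sample spacing comes out at roughly 1.1 min of arc, in line with "about 1 min of arc".

```python
import math

def visual_angle_deg(size_m, distance_m):
    # Full angle subtended by an object of the given size, centred on the line of sight.
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

distance = 1.80          # viewing distance from the text (m)
height = distance / 6    # monitor height implied by "6 times the height" (m)
lines = 512              # assumption: the 512 image lines span the full height

height_deg = visual_angle_deg(height, distance)
spacing_arcmin = 60 * visual_angle_deg(height / lines, distance)
print(f"image height: {height_deg:.1f} deg, sample spacing: {spacing_arcmin:.2f} arcmin")
```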

In the direct category scaling experiments, subjects assessed impairment, unsharpness and the occurrence of speckles in a single image presented in the middle of the screen. Each percept was assessed in a separate session. In all cases, a 9-point numerical category scale was used.

[Figure 1 plot: 'Difference in impairment, 2 subjects'; horizontal axis: q0, right image; separate curves for q0, left image = 1, 5, 10, 15, 20, 24, 30, 35, 40, 48, 56, 127]

Figure 1: Mean judgments of difference in impairment between left and right images, as a function of quantisation step q0 of the right image. Separate curves represent different quantisation steps q0 of the left image.



Subjects

In both experiments, the authors participated as subjects. They were experienced in the sense that they had previously carried out the detection experiment described by Martens and Majoor (1988). Besides the authors, six inexperienced subjects participated in the second experiment: two scaled impairment, two unsharpness and two the occurrence of speckles. The authors scaled all three percepts. All subjects had normal or corrected-to-normal vision.

Results

Functional measurement versus direct category scaling

[Figure 2 plot: 'Impairment, 2 subjects'; upper panel: mean rating vs. q0 (left = right); lower panel: scale value of impairment vs. q0]

Figure 2: Lower panel: scale values of the impairment in left and right images, derived from the data of Figure 1, as a function of quantisation step q0. Upper panel: mean category ratings of difference in impairment as a function of q0 when the quantisation steps of the left and right images are equal. In this and the following figures, the vertical bars denote twice the standard error of the mean.

Figure 1 shows the mean judged differences in impairment, averaged across the two experienced subjects, as a function of quantisation step q0 of the right-hand image, with a separate curve for each quantisation step q0 of the left image. The twelve curves have about the same shape and show no systematic deviations from parallelism. To check for nonparallelism, the data were subjected to a two-way analysis of variance. No significant row × column interaction was found at the p < 0.01 level (F(121, 1008) = 1.27), which supports the parallelism seen in Figure 1.
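The interaction test can be written out directly from the sums of squares of a balanced two-way design with replication. The sketch below uses simulated ratings, not the report's data. The 12 scale values, the noise level and the eight replicates per cell are assumptions. Eight replicates per cell reproduce the reported degrees of freedom, since (12 - 1)(12 - 1) = 121 and 144 × (8 - 1) = 1008, but the report does not state how its replicates arose. The p-value would follow from the F distribution with these degrees of freedom; it is omitted here to keep the sketch free of dependencies.

```python
import random

def interaction_f(data):
    """Row x column interaction F for a balanced two-way design with replication.

    data[i][j] is the list of replicate ratings for row level i (left image) and
    column level j (right image); every cell must hold the same number of ratings.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / n for j in range(b)] for i in range(a)]
    grand = sum(map(sum, cell)) / (a * b)
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    ss_int = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                     for i in range(a) for j in range(b))
    ss_err = sum((x - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in data[i][j])
    df_int, df_err = (a - 1) * (b - 1), a * b * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err

def simulate(scale, n_rep, noise_sd, interaction=0.0, seed=1):
    # Hypothetical ratings: 5 + impairment(right) - impairment(left), an optional
    # multiplicative term that makes the curves non-parallel, plus Gaussian noise.
    rng = random.Random(seed)
    k = len(scale)
    return [[[5 + scale[j] - scale[i] + interaction * scale[i] * scale[j]
              + rng.gauss(0, noise_sd) for _ in range(n_rep)]
             for j in range(k)] for i in range(k)]

scale = [0.0] * 4 + [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # 12 hypothetical levels
print(interaction_f(simulate(scale, n_rep=8, noise_sd=1.0)))                    # F near 1
print(interaction_f(simulate(scale, n_rep=8, noise_sd=1.0, interaction=1.0)))  # F far above 1
```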

According to Functional Measurement Theory, a simple explanation for this result would be that (1) the subjects, instructed to judge differences in impairment, indeed used a subtractive rule and (2) the judgment function is linear (Anderson, 1982; Birnbaum, 1982).

The parallelism in the data of Figure 1 also implies that the marginal row and column means of the factorial design are estimates of the impairment in the left and right images on interval scales. The lower panel of Figure 2 shows these estimates as a function of quantisation step q0. The values presented were obtained by a linear transformation of the marginal means that gives the impairment in the original image (q0 = 1) a scale value of zero. The two monotone relations between impairment and quantisation step thus obtained are almost identical. They do not start to deviate from zero until q0 equals 10, indicating that for values of q0 up to and including 10 the subjects did not notice any difference between the original and the coded image. This is consistent with the results of the detection experiment described by Martens and Majoor (1988).
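Reading an interval scale off the margins of the factorial table can be sketched as follows. The sketch assumes the rating convention of the Procedure section, where 5 means no difference and higher ratings mean that the right image is more impaired. The table is noise-free and hypothetical, generated by the subtractive model; with real data, the same computation is applied to the mean ratings. Only differences between scale values are meaningful, so both scales are shifted to give q0 = 1 a value of zero.

```python
def marginal_scales(table):
    """Interval-scale estimates of impairment from a square factorial table.

    table[i][j] is the mean rating with the left image at level i and the right
    image at level j, rated as 5 + impairment(right) - impairment(left).
    """
    k = len(table)
    col_means = [sum(table[i][j] for i in range(k)) / k for j in range(k)]
    row_means = [sum(table[i]) / k for i in range(k)]
    # Right-image impairment grows with the column mean, left-image impairment
    # with the negated row mean; shift both so that level 0 (q0 = 1) scores zero.
    right = [m - col_means[0] for m in col_means]
    left = [row_means[0] - m for m in row_means]
    return left, right

true_impairment = [0.0, 0.0, 0.4, 1.1, 2.0, 2.6]   # hypothetical, level 0 = original image
table = [[5 + true_impairment[j] - true_impairment[i] for j in range(6)] for i in range(6)]
left, right = marginal_scales(table)
print([round(v, 3) for v in left])
print([round(v, 3) for v in right])
```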

The subjects did not show a preference for the left or right image. This can be concluded, among other things, from the data in the upper panel of Figure 2. These data are the mean category ratings when the quantisation steps of the left and right images are equal, so that no difference in impairment should be perceived. As predicted, these ratings do not deviate systematically from a scale value of five.

[Figure 3 plot: 'Impairment, 2 subjects'; vertical axis: scale value (0 to 10); horizontal axis: q0; filled symbols: cat. scale; open symbols: funct. m.]

Figure 3: Comparison between impairment scales obtained by direct category scaling (filled symbols) and by functional measurement (open symbols).

The results of the direct category scaling of impairment, averaged across the same two subjects, are shown in Figure 3 (filled symbols). The data were linearly transformed so that the impairments at q0 equal to 1 and 127 received scale values of 1 and 9, respectively. The results show approximately the same monotone S-shaped relation between impairment and the size of the quantisation step as was found with functional measurement (Figure 2). To emphasize this resemblance, the average of the two functions in the lower panel of Figure 2 was linearly transformed to fit the category ratings (Figure 3, open symbols). From the close correspondence between the two functions in Figure 3 (r = 0.98), it can be concluded that direct category scaling and functional measurement lead to the same result, i.e. the same relation between impairment and the size of the quantisation step q0.
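The linear fit and the correlation quoted above can be computed as follows. This is a sketch that assumes an ordinary least-squares fit of the form a + b · x, which the report does not specify. The input values are hypothetical.

```python
import math

def fit_linear(x, y):
    """Ordinary least-squares a, b such that a + b*x approximates y, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Hypothetical functional-measurement scale values and category ratings at the same q0.
fm_scale = [0.0, 0.0, 0.1, 0.6, 1.3, 1.9, 2.2, 2.4]
ratings = [1.0, 1.1, 1.4, 3.2, 5.6, 7.4, 8.3, 9.0]

a, b, r = fit_linear(fm_scale, ratings)
fitted = [a + b * v for v in fm_scale]   # transformed function to plot against the ratings
print(round(a, 2), round(b, 2), round(r, 3))
```

A linear transformation does not change r, so the correlation measures the agreement in shape between the two functions, independently of the fit.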

[Figure 4 plot: 'Impairment / unsharpness, 4 subjects'; vertical axis: scale value (0 to 10); horizontal axis: q0; open symbols: impair.; filled symbols: unsharp.]

Figure 4: Impairment (open symbols) and unsharpness (filled symbols) as a function of quantisation step q0. Data were obtained by direct category scaling.

Impairment, unsharpness and occurrence of speckles

Experienced as well as inexperienced subjects rated impairment, unsharpness and the occurrence of speckles on a 9-point numerical category scale. Since no significant differences were found between the two groups of subjects, the results were averaged across all subjects. Figure 4 gives the mean results for impairment (open symbols) and unsharpness (filled symbols), after the category ratings had been linearly transformed so that impairment and unsharpness receive the same scale values at q0 equal to 1 as well as at q0 equal to 127. The scale values of impairment and unsharpness were equalized at q0 equal to 127 because all subjects reported that, at this quantisation step, the impairment consisted of unsharpness only. The data in Figure 4 show a systematic difference between impairment and unsharpness at the other quantisation steps. This difference is plotted again in Figure 5 (open symbols), where it has been linearly transformed to fit the mean category ratings of the occurrence of speckles (filled symbols). A detailed analysis of the physical parameters determining the occurrence of speckles would be needed to explain the observed nonmonotone relation between q0 and the occurrence of speckles; such an analysis, however, is beyond the scope of the present study. The close correspondence between the two functions in Figure 5 (r = 0.93) suggests that the difference between impairment and unsharpness can be attributed to the occurrence of speckles. This would imply that unsharpness and the occurrence of speckles add up to form the overall impression of impairment, in agreement with Allnatt's 'law of addition' (Allnatt, 1983).
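The two-point linear transformation used above can be sketched as follows. It gives impairment and unsharpness the same scale values at q0 = 1 and q0 = 127, and the difference between them is then compared with the speckle ratings. The target scale values (1 and 9) and the ratings are hypothetical, because the report does not give them. By construction the difference is zero at both ends. Any remaining difference at intermediate q0 is therefore what the text attributes to speckles.

```python
def anchor(values, lo_target, hi_target):
    """Linearly map values so the first entry becomes lo_target and the last hi_target."""
    lo, hi = values[0], values[-1]
    scale = (hi_target - lo_target) / (hi - lo)
    return [lo_target + (v - lo) * scale for v in values]

# Hypothetical mean ratings at increasing q0; first entry q0 = 1, last entry q0 = 127.
impairment = [1.2, 1.5, 3.0, 5.5, 7.0, 8.6]
unsharpness = [1.0, 1.1, 1.9, 3.8, 5.9, 8.8]

imp = anchor(impairment, 1.0, 9.0)
uns = anchor(unsharpness, 1.0, 9.0)
difference = [i - u for i, u in zip(imp, uns)]   # zero at both ends by construction
print([round(d, 3) for d in difference])
```

This difference would then be fitted to the speckle ratings with a linear least-squares transformation, in the same way as the impairment scales were compared above.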

[Figure 5 plot: 'Speckles / difference, 4 subjects'; vertical axis: scale value (0 to 10); horizontal axis: q0; filled symbols: speckles; open symbols: 1.4 + 6.5 × difference]

Figure 5: Filled symbols: mean category ratings of the occurrence of speckles as a function of quantisation step q0. Open symbols: difference between impairment and unsharpness, calculated from the data in Figure 4 and linearly transformed to fit the category ratings of the occurrence of speckles. After this transformation the average standard error of the difference is 2.25.

Discussion

Roufs and Bouma (1980) describe an experiment in which the functional measurement approach was successfully used to assess the subjective quality of photographic pictures as a function of objective contrast. The present study shows that functional measurement can also be applied to assess impairment in digitized images (Figure 2). Both experiments support the suggestion of Sjoberg (1987) that this method is appropriate for the evaluation of image quality. The present study also shows that functional measurement and direct category scaling lead to the same results. In the present case, this is the same monotone S-shaped relation between impairment and the size of the quantisation step q0 (Figure 3). This monotone relation can be compared with the discrimination threshold between the original and coded image, determined for the same scene by means of a two-alternative forced-choice experiment (Martens & Majoor, 1988). That experiment indicated that the two subjects who took part in the first experiment of the present study distinguished the coded image from the original one in 79% of the presentations at q0 equal to about ten. In the present study, this is the quantisation step beyond which the scale values of impairment start to deviate from the value given to the impairment in the original image (Figures 2, 3). The results of the detection and scaling experiments are therefore mutually consistent, implying that perceived impairment increases as soon as it is detected.

The marginal row and column means of the factorial design are considered to be estimates of the impairment in the left and right images because of the parallelism in the data of Figure 1. To explain this parallelism, it was assumed that subjects used a subtractive rule, implying that the judgment function is linear. However, parallelism can also be obtained in other ways, e.g. with a dividing (ratio) rule in combination with a logarithmic judgment function. The impossibility of deciding from the data which combination rule and judgment function are used is known as the problem of monotone indeterminacy (Anderson, 1982). Birnbaum (1982) has proposed two additional constraints to solve this problem. One of these constraints is that the judgment function is the same when the same subjects use the same response procedure to judge the same set of stimuli (response scale convergence; Birnbaum, 1982). In the present study, both impairment and difference in impairment were assessed by means of a 9-point numerical category scale. Thus, according to Birnbaum (1982), the judgment function should be the same for both tasks. It was implicitly assumed in the present study that direct category scaling gives the perceived impairment on an interval scale, implying a linear judgment function for direct category scaling. This assumption was tested and confirmed by analysing the results of Figure 3 for each subject by the method of successive interval scaling (Edwards, 1957). From this it is concluded that both subjects used a linear judgment function and, consequently, a subtractive rule for assessing differences in impairment. The similarity of the two functions in Figure 3 is consistent with the other constraint proposed by Birnbaum (1982), namely that the relation between the stimuli and the subjective impressions they evoke is independent of the task the subject has to carry out (stimulus scale convergence; Birnbaum, 1982).
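The indeterminacy can be reproduced numerically. In the hypothetical sketch below, model A combines impressions s by subtraction and applies a linear judgment function. Model B describes the same stimuli by impressions t = exp(s), combines them by division and applies a logarithmic judgment function. Because log(t_j / t_i) = s_j - s_i, the two models predict identical factorial tables, so the ratings alone cannot tell them apart.

```python
import math

s = [0.2, 0.5, 1.1, 1.9, 2.4]   # hypothetical impressions under model A
t = [math.exp(v) for v in s]    # the same stimuli described on a ratio scale (model B)

def model_a(i, j):
    # Subtractive rule followed by a linear judgment function.
    return 5.0 + (s[j] - s[i])

def model_b(i, j):
    # Dividing (ratio) rule followed by a logarithmic judgment function.
    return 5.0 + math.log(t[j] / t[i])

k = len(s)
largest_gap = max(abs(model_a(i, j) - model_b(i, j)) for i in range(k) for j in range(k))
print(largest_gap)   # only floating-point noise: both models predict the same table
```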

The present study also suggests that impairments in digital images add up to form the overall impression of impairment (Figures 4, 5). A similar result was recently obtained by Ohtsuka et al. (1988). Both results imply that Allnatt's 'law of addition', originally established for analog systems (Allnatt, 1983; Hagenzieker & Wagenaar, 1985), also holds for digital systems. However, there are also indications of nonadditive interactions (Bennett, 1981), implying that results obtained with analog systems cannot automatically be generalized to digital ones. Further research is needed to resolve the question of additivity in digital systems.

Conclusions

The present study has shown that category scaling of impairment and category scaling of differences in impairment lead to the same results; for the latter, scale-space-coded images were presented in the factorial design required by Functional Measurement Theory. It is concluded that functional measurement can be used to evaluate impairment in digital images, at least when a relatively large range of impairments is used, as in the present study. It remains to be examined whether it can also be used for small ranges of impairment. Direct category scaling is assumed to fail there (e.g. Sjoberg, 1987), but functional measurement may still work because it consists of comparisons between subjective impressions. Functional measurement also provides a means of establishing how different subjective impressions are combined. It should therefore be an appropriate method for checking the additivity of impairments observed in the present study.

Acknowledgement

The research of Dr H. de Ridder has been made possible by a fellowship of the Royal Netherlands Academy of Arts and Sciences.

References

Allnatt, J.W. (1983) Transmitted-picture assessment. New York: John Wiley & Sons.

Anderson, N.H. (1982) Cognitive algebra and social psychophysics. In: B. Wegener (Ed.): Social attitudes and psychophysical measurement. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Bennett, D. (1981) SMPTE component-coded digital video-picture quality assessments. SMPTE Journal, 90, 960-967.

Birnbaum, M.H. (1982) Controversies in psychological measurement. In: B. Wegener (Ed.): Social attitudes and psychophysical measurement. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

CCIR (1986) Method for the subjective assessment of the quality of television pictures, Recommendation 500-3. In: Recommendations and Reports of the CCIR. International Telecommunication Union, Geneva.

Edwards, A.L. (1957) Techniques of attitude scale construction. New York: Appleton-Century-Crofts.

Gescheider, G.A. (1988) Psychophysical scaling. Annual Review of Psychology, 39, 169-200.

Hagenzieker, M. & Wagenaar, W.A. (1985) Picture quality in digital television. Department of Psychology, University of Leiden, The Netherlands.

Hunt, B.R. & Sera, G.F. (1978) Power-law stimulus-response models for measures of image quality in nonperformance environments. IEEE Transactions on Systems, Man and Cybernetics, 11, 781-791.

Martens, J.B.O.S. (1987) Applications of scale space to coding. IPO Manuscript 566, submitted to IEEE Transactions on Communication Theory.

Martens, J.B.O.S. & Majoor, G.M.M. (1986) Scale-space coding and its perceptual relevance. IPO Annual Progress Report, 21, 63-71.

Martens, J.B.O.S. & Majoor, G.M.M. (1988) The perceptual relevance of scale-space image coding. IPO Manuscript 619, to appear in Signal Processing.

Ohtsuka, S., Inoue, M. & Watanabe, K. (1988) Quality evaluation of pictures with multiple impairments based on visually weighted error. SID 88 Digest, 428-431.

Roufs, J.A.J. & Bouma, H. (1980) Towards linking perception research and image quality. Proceedings of the SID, 21, 247-270.

Sjoberg, L. (1987) Psychometric considerations in the dimensional analysis of subjective picture quality. Displays, 8, 210-212.

Watson, A.B. (1987) Efficiency of a model human image code. Journal of the Optical Society of America A, 4, 2401-2417.



Perceived quality and contrast as a function of gamma

J.A.J. Roufs and A.M.J. Goossens

Abstract

The luminance reproduction curve of a TV system is characterized by a power function over a significant range. The effect of the exponent gamma on perceptual image quality, and on its dominant factor, brightness contrast, has been studied for still, complex black-and-white scenes displayed on a TV monitor. Gamma was found to have a scene-dependent optimum value which is higher than expected. This optimum was found to be uniquely determined by (subjective) brightness contrast. The results also show that the ratio of the highest to the lowest luminance in the scene is inadequate as a measure of the brightness contrast of complex pictures.

Introduction

It has been known for many years that scenes are not displayed most realistically when their luminances, or the ratios between them, are reproduced exactly. On the contrary, nonlinear luminance transfer has been found to give better-looking pictures. The luminance transfer of photographic prints, slides and movies is characterized by the so-called Hurter-Driffield curve, a plot of optical density as a function of the logarithm of the illuminance of the photosensitive layer. The middle part of this S-shaped curve can be approximated by a straight line with slope gamma, which is the exponent of the power function describing the luminance transfer. In photography it has been shown that, in most cases, the best results are obtained when gamma is greater than 1 (Breneman, 1962): about 1.1 for prints and about 1.6 for transparencies (slides, movies). Bartleson and Breneman (1967) argued that this difference is due to the different viewing conditions. Slides and other transparencies are usually viewed almost in the dark, so that the scene is surrounded by darkness. Prints, by contrast, are usually looked at in moderately brightly lit environments and consequently have a relatively light surround. Because of the lateral effects of the surround on the perceived brightness of the scene's luminance pattern, the surround would cause a brightness reproduction curve that differs considerably from that elicited by the original scene. For example, the preferred gamma of about 1.6 for slides, viewed under their specific conditions, would produce the same pattern of subjective brightnesses as the real scene does under normal conditions.

The luminance reproduction curve of a TV monitor is also an S-shaped curve, and its most relevant middle part is usually also characterized by the exponent gamma of a power function.

Bartleson and Breneman (1967) argued on the same grounds that, for black-and-white TV, which is usually viewed with a dim surround, gamma should be about 1.2. This is in spite of the fact that TV engineers have traditionally argued that the overall gamma of the transmission chain should be 1 (e.g. Schroter et al., 1956; Van de Poel & Valeton, 1954). In fact, in the present situation the gamma-correction network just behind the camera is set to 0.4 or 0.45. Together with a typical display gamma of 2.8, this leads to an overall value of 1.1 to 1.3. However, as practitioners know, the gamma corrector is frequently adjustable over a short range. This enables the studio manager to optimize image quality by changing the overall gamma, albeit within a limited range of about 1.0 to 1.5.
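Because each stage of the chain is a power function, the exponents multiply: (x^a)^b = x^(ab). A short check of the figures quoted above, using the camera-side correction of 0.4 or 0.45 and a display gamma of 2.8, is sketched below. The function name is illustrative only.

```python
def overall_gamma(*stage_gammas):
    # A chain of power-law stages L_out = L_in ** g has an overall exponent equal
    # to the product of the individual exponents.
    result = 1.0
    for g in stage_gammas:
        result *= g
    return result

for correction in (0.4, 0.45):
    print(correction, round(overall_gamma(correction, 2.8), 2))
```

The two products, 1.12 and 1.26, are the bounds of the "1.1 to 1.3" quoted in the text.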

Since Bartleson and Breneman's work, the characteristics of TV monitors have changed considerably. Now that classical techniques are about to be radically improved and new display techniques are rapidly being realized, the question of the optimal value of gamma is topical once more.

To gain more insight into the factors that determine perceptual quality as a function of gamma, quality and its underlying dimensions, brightness contrast and sharpness, have been scaled as a function of gamma.

Methods

Complex scenes, carefully selected as suitable for quality judgement, were photographed with a very good camera on high-resolution film, from which slides were made. Every shot was immediately followed by an identical one, except that a grey staircase and resolution gratings were added to the scene in order to calibrate the transfer properties of the film. To a first approximation, the transmission could be described by a power function; the gammas of the slides were found to range from 1.2 to 2.1. A video signal was generated with a Bosch slide scanner having a controllable gamma correction. Only black-and-white images were used here.
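Estimating the gamma of a slide from the grey staircase amounts to fitting a straight line in log-log coordinates, since T = c · L^γ implies log T = log c + γ · log L. The report does not describe its fitting procedure. The sketch below therefore assumes an ordinary least-squares fit and uses hypothetical, noise-free staircase readings.

```python
import math

def estimate_gamma(inputs, outputs):
    """Least-squares slope of log(output) against log(input)."""
    lx = [math.log(v) for v in inputs]
    ly = [math.log(v) for v in outputs]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

steps = [0.02, 0.05, 0.1, 0.2, 0.4, 0.8]      # hypothetical relative staircase luminances
readings = [0.9 * v ** 1.6 for v in steps]    # synthetic slide transmittances, gamma 1.6
print(round(estimate_gamma(steps, readings), 3))
```

With real readings, only the middle, straight part of the S-shaped curve should be included in the fit.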

A Gould-deAnza image processor was used to change the effective gamma of the scene under different ancillary conditions. In the first set of experiments, for instance, the mean luminance of the scenes was kept constant while gamma was varied. This was done in order to obtain an approximately constant mean brightness, since mean brightness is itself an identified quality factor (Nakayama et al., 1980; Van der Zee & Boesten, 1981). At the same time, the maximum displayed luminance was kept below the saturation knee in order to avoid clipping.
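The report does not say how the image processor achieved constant mean luminance. A minimal sketch, assuming each normalized video value v is mapped to luminance gain·v^γ and the gain is chosen so that the picture's mean luminance stays at a target value; the knee value and pixel values are hypothetical.

```python
def constant_mean_gamma(pixels, gamma, target_mean, knee):
    """Map normalized video values (0 < v <= 1) to luminances gain * v**gamma.

    The gain is chosen so that the mean luminance equals target_mean.
    A ValueError is raised if the brightest pixel would exceed the
    saturation knee, i.e. if this gamma setting would clip.
    """
    shaped = [v ** gamma for v in pixels]
    gain = target_mean * len(shaped) / sum(shaped)
    luminances = [gain * s for s in shaped]
    if max(luminances) > knee:
        raise ValueError("clipping: peak luminance exceeds the saturation knee")
    return luminances


pixels = [0.1, 0.3, 0.5, 0.9]  # hypothetical normalized video values
for g in (1.0, 2.0, 3.0):
    lum = constant_mean_gamma(pixels, g, target_mean=20.0, knee=200.0)
    print(g, round(max(lum), 1))
```

At a fixed mean, raising gamma pushes the peak luminance up, so the clipping check becomes the binding constraint at high gamma values.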

In a second set of experiments, both the mean luminance and the luminance contrast, i.e. the ratio of the highest to the lowest pixel luminance present in the scene, were kept constant. Again, clipping was prevented. The values of the mean luminance and the luminance contrast are both found by an iterative process; they are fully determined by the scene and are therefore no longer free to vary. These experiments are part of a larger set which will be published elsewhere.
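With the contrast ratio also fixed, a pure gain no longer suffices and a second free parameter is needed. Purely as an illustration (the report mentions an iterative procedure but does not describe it), assume the processor adds a luminance offset a to a gain b, so that L = a + b·v^γ. Both constraints are then linear in a and b: a + b·m = M for the mean, and a + b·hi = C·(a + b·lo) for the ratio, where m, hi and lo are the mean, maximum and minimum of v^γ. The sketch solves them directly; the target values are hypothetical.

```python
def offset_and_gain(pixels, gamma, target_mean, target_ratio):
    """Solve L = a + b * v**gamma for (a, b) such that
    mean(L) == target_mean and max(L) / min(L) == target_ratio.

    A negative offset a means the offset lowers the black level; the
    darkest pixel stays positive whenever a solution exists.
    """
    shaped = [v ** gamma for v in pixels]
    m = sum(shaped) / len(shaped)
    hi, lo = max(shaped), min(shaped)
    c = target_ratio
    if c <= 1 or hi == lo:
        raise ValueError("need target_ratio > 1 and a non-uniform image")
    # From a + b*hi = c*(a + b*lo) it follows that a = b*(hi - c*lo)/(c - 1).
    offset_per_gain = (hi - c * lo) / (c - 1)
    # Substituting into a + b*m = target_mean gives b.
    b = target_mean / (m + offset_per_gain)
    a = b * offset_per_gain
    return a, b


a, b = offset_and_gain([0.1, 0.5, 1.0], 2.0, target_mean=20.0, target_ratio=50.0)
```

Because both targets are met exactly for every gamma, only the distribution of mid-tones changes as gamma varies, which is the situation examined in the second set of experiments.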

The processed image was displayed on a Barco colour monitor (30 cm × 40 cm) calibrated with a Pritchard luminance meter. The measurements on a central area of 100 × 100 pixels were taken as typical; the deviations found over the screen area were relatively small. The luminance is a power function of the video input over almost 3 decades, with a gamma of 2.5. The processed image covered 27 cm × 30 cm.

For the data reported here, the subjects viewed the monitor from a distance of 2.4 m, i.e. 8 times the height of the monitor. The subjects' visual acuity was measured with a Landolt chart at 5 m and ranged from 1.25 to 2.00.


Quality, brightness contrast and sharpness were measured by psychometric scaling techniques, always in separate sessions. The data reported here were obtained with a 10-point numerical category scale.

In every session, the main series of randomized stimuli was preceded by 5 stimuli covering the entire range of parameters, making it easier for the subjects to keep their ratings in the desired range.

When the same subjects scaled both perceived quality and brightness contrast, half the group scaled quality in the first session and brightness contrast in the second, while the order was reversed for the other half.

In the instructions to the subjects, it was emphasized that only the relevant psychological attribute was to be rated, excluding all other possible aspects. The stimuli were usually presented 4 times in randomized blocks, the second and fourth series being counterbalanced against the first and third.
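The session structure described above (5 range-spanning stimuli first, then 4 randomized blocks with the second and fourth counterbalanced against the first and third) can be sketched as follows. The report does not define "counterbalance" further; this sketch assumes it means presenting the preceding block in reverse order, which is one common choice. Stimulus labels are hypothetical.

```python
import random


def session_order(test_stimuli, anchor_stimuli, n_blocks=4, rng=None):
    """Presentation order for one scaling session (sketch).

    The anchor stimuli, which span the whole parameter range, come first.
    Then n_blocks blocks of all test stimuli follow: the 1st, 3rd, ...
    blocks are freshly shuffled, and each following block repeats the
    preceding block in reverse order (an assumed form of counterbalancing).
    """
    rng = rng or random.Random()
    order = list(anchor_stimuli)
    previous = []
    for block in range(n_blocks):
        if block % 2 == 0:
            current = list(test_stimuli)
            rng.shuffle(current)
        else:
            current = previous[::-1]
        order.extend(current)
        previous = current
    return order


# Hypothetical labels: (scene, gamma) combinations and five anchor stimuli.
stimuli = [("TIE 121", g) for g in (1.0, 2.0, 3.0, 4.0, 5.0)]
anchors = [("DEMER", 1.0), ("GROEN", 2.0), ("TIE 121", 3.0), ("DEMER", 4.0), ("GROEN", 5.0)]
order = session_order(stimuli, anchors, rng=random.Random(0))
```

Reversing the order cancels linear drift in a subject's criterion over the session, which is the usual motivation for this kind of counterbalancing.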

The subjects had about 15 s to inspect each image. Between stimuli, a homogeneous field with a luminance equal to the average of all test stimuli was presented for about 4 s.

Figure 1: Quality ratings by three subjects as a function of the (overall) value of gamma for three different scenes. Every scene has a constant mean luminance.

[Figure 1 plot: QUALITY SCALING, d = 2.4 m; one panel per scene (TIE 121, DEMER, ...); subjects JB, HR and JW; quality rating (0 to 10) vs. gamma]


[Figures 2 and 3 plots, d = 2.4 m: QUALITY SCALING (quality, 0 to 10, vs. gamma, 1 to 5) and CONTRAST SCALING (subjective contrast, 0 to 10, vs. gamma, 1 to 5) for scenes TIE 121, DEMER and GROEN; subjects HR, JW and TG and their average]

Figure 2: Mean quality ratings by the 3 subjects in Figure 1 as a function of gamma for 3 different scenes. The mean luminance of the scenes was kept constant.

Figure 3: Mean values of (subjective) brightness contrast scaled by the subjects in Figure 2 as a function of gamma for 3 different scenes (constant mean luminance of the scenes).

Results

Results of the quality judgements by the subjects in experiment 1 are presented in Figure 1 for three different scenes: TIE 121 is a portrait of a woman, DEMER a street scene and GROEN a greengrocer's shop. Scenes are clearly a larger source of variance than subjects. In Figure 2, the averages of these three subjects are plotted, showing the pronounced optimum more clearly. A certain scene dependence is also obvious.
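The report reads the optimum off the averaged curves without stating a numerical procedure. One simple way to locate it between sampled gamma values, shown here as a hypothetical sketch, is to average the ratings over subjects and fit a parabola through the highest point and its two neighbours. The test ratings are synthetic (a parabola peaking at gamma = 2.5), not data from the study.

```python
def mean_over_subjects(ratings_by_subject):
    """Average per-gamma ratings over subjects (one list of ratings per subject)."""
    n = len(ratings_by_subject)
    return [sum(values) / n for values in zip(*ratings_by_subject)]


def parabolic_optimum(gammas, ratings):
    """Gamma at the peak of a rating curve.

    Fits a parabola through the highest rating and its two neighbours and
    returns its vertex; falls back to the best sampled gamma when the peak
    lies at either end of the range or the three points are not concave.
    """
    i = max(range(len(ratings)), key=ratings.__getitem__)
    if i == 0 or i == len(ratings) - 1:
        return gammas[i]
    x0, x1, x2 = gammas[i - 1:i + 2]
    y0, y1, y2 = ratings[i - 1:i + 2]
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2 ** 2 * (y0 - y1) + x1 ** 2 * (y2 - y0) + x0 ** 2 * (y1 - y2)) / denom
    if a >= 0:
        return gammas[i]
    return -b / (2 * a)
```

Interpolating this way gives an optimum between the tested gamma steps, but with only a few steps per scene it should be treated as a rough estimate.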

[Figures 4 and 5 plots, d = 2.4 m: QUALITY versus CONTRAST (quality, 0 to 10, vs. subjective contrast, 0 to 10) for scenes TIE 121, DEMER and GROEN, subjects HR, JW and TG and their average; QUALITY SCALING (quality, 0 to 10, vs. gamma, 1 to 5), average of 6 subjects]

Figure 4: Mean scaled image quality by 3 subjects as a function of their scaled (subjective) brightness contrast for 3 different scenes (constant mean scene luminance).

Figure 5: Mean quality ratings by 6 subjects as a function of gamma for 5 scenes. Mean luminance and luminance contrast of the individual scenes were kept constant.

The scaled brightness contrast of the same scenes, as experienced by the same subjects, is plotted in Figure 3. These curves show scene dependence, too. Since brightness contrast is known to be an important quality factor (Nakayama et al., 1980), this suggests a relation between the two. Indeed, if scaled perceived quality is plotted as a function of scaled perceived contrast, the curves for the different scenes lie surprisingly close together (Figure 4). The optimum of perceived quality seems to be determined almost entirely by a single value of perceived contrast, corresponding to scale value 6. This suggests that, under the present conditions, this factor is dominant.

So far, constant mean luminance and prevention of clipping were the main ancillary conditions for the luminance reproduction curve. As already pointed out, it is possible to go one step further and also keep constant the ratio of the highest pixel luminance to the lowest found in the scene. Unfortunately, because of the loss of degrees of freedom, the mean luminance can then no longer be changed.

Figure 5 shows the mean quality ratings by 6 subjects judging 5 different scenes. Again, the optimum is clear and scene-dependent, even though the judgements of the three different pictures taken of the same model (Wanda) are close together. The brightness contrast ratings of the same scenes by the same subjects are plotted in Figure 6.

If the quality judgements are plotted against the brightness contrast ratings, the curves come together again (Figure 7). Since luminance contrast was kept constant while gamma was varied, whereas perceived contrast changed, Figures 6 and 7 demonstrate that luminance contrast is a poor measure of the subjective contrast of complex scenes. Furthermore, they confirm the dominant effect of brightness contrast on perceived quality. Sharpness, which is also a relevant, gamma-dependent quality dimension, does not seem to affect the optimal gamma (Roufs & Goossens, 1988).

[Figures 6 and 7 plots, d = 2.4 m, average of 6 subjects: CONTRAST SCALING (subjective contrast, 0 to 10, vs. gamma, 1 to 5) and QUALITY versus CONTRAST (quality, 0 to 10, vs. subjective contrast, 0 to 10) for three WANDA pictures (including WANDA 03 and 16), DEMER and GROEN]

Figure 6: Mean values of scaled (subjective) brightness contrast by 6 subjects as a function of gamma for 5 scenes. Mean luminance and luminance contrast of the individual scenes were kept constant.

Figure 7: Mean scaled perceived image quality by 6 subjects as a function of their scaled (subjective) brightness contrast for 5 different scenes (constant mean luminance and luminance contrast).


Conclusions

• The quality of a picture is rather strongly affected by gamma, even if the mean luminance is kept constant.

• The optimum value of gamma, although scene-dependent under the present conditions, was always found to be greater than 1, confirming Breneman's conclusion that, under TV viewing conditions, nonlinear luminance transfer gives more attractive results.

The results suggest that, on average, gamma should be even higher than the frequently used value of 1.2. However, studio managers can change gamma over a limited range, and they actually seem to do so.

• The optimal value of gamma was found to be determined predominantly by the change in (perceived) brightness contrast induced by variations of gamma, even when the luminance contrast was kept constant.

• From the preceding, one would conclude that luminance contrast cannot be an adequate measure of the brightness contrast of complex scenes.

References

Bartleson, C.J. & Breneman, E.J. (1967) Brightness reproduction in the photographic process. Photographic Science and Engineering, 11, 254-262.

Boesten, M.H.W.A. & Zee, E. van der (1981) Psychophysical versus psychometrical methods in image quality measurements. IPO Annual Progress Report, 16, 67-71.

Breneman, E.J. (1962) The effect of level of illumination and relative surround luminance on the appearance of black-and-white photographs. Photographic Science and Engineering, 6, 172-179.

Nakayama, T., Masaaki, K., Honjyo, K. & Nishimoto, K. (1980) Evaluation and prediction of displayed image quality. Proceedings of the Society for Information Display, Digest, 1980, 180-181.

Poel, F.H.J. van de & Valeton, J.J.P. (1954) The flying spot scanner. Philips Technical Review, 15, 221-232.

Roufs, J.A.J. & Goossens, A.M.J. (1988) The effect of gamma on perceived image quality. Proceedings International Display Research Conference, San Diego, 1988.

Schroter, F., Theile, R. & Wendt, E. (1956) Lehrbuch der drahtlosen Nachrichtentechnik. In: N. Koshenewsky and W.T. Runge (Eds): Fernsehtechnik 1. Berlin: Springer Verlag, 39-47.

Zee, E. van der & Boesten, M.H.W.A. (1980) The influence of luminance and size on the image quality of complex scenes. IPO Annual Progress Report, 15, 69-75.

COGNITION AND COMMUNICATION

Developments

D.G. Bouwhuis

Dialogues

Human-computer dialogues in natural language are of central concern in the Siemens-Philips cooperative SPICOS project. Two activities received special emphasis in this period: first, the development of a Dialog Handler; second, efforts to resolve ambiguities in anaphoric references in a theoretically satisfactory manner.

The Dialog Handler deals specifically with a sequence of dialogue exchanges that follows from a single query. In the earlier SPICOS-I system, only a single query, spoken and formulated in natural language, could be handled at a time. In the analysis of the input sentence, however, various sources of ambiguity, or even uncertainty, can be, and frequently are, encountered. This may happen in the signal-processing stages of speech recognition, during syntactic and semantic analysis, and in query evaluation. For the SPICOS-II system, now under development together with other partners within Siemens and Philips research, it was decided to have potential ambiguities resolved by the user from whom the query originates. To that end, the system must formulate specific questions that express the current problem in a way the user can understand.

One problem in particular that gives rise to ambiguities is that of anaphoric reference. In the example 'Did A write to B? Did he write to C?', the pronoun he is most readily taken to refer back to A, but with some other verbs that need not be the case. Current theoretical descriptions cannot resolve this phenomenon satisfactorily in a number of cases. Moreover, apart from anaphoric (backward) references, forward (cataphoric) ones, which these descriptions do not cover, may also occur. A new, more comprehensive theory that can handle a remarkable range of the reference problems occurring in natural language has now been developed at IPO. This theory will serve as the framework for implementing natural-language query evaluation in the SPICOS-II system.

Empirical research has also been carried out on how dialogue utterances are interpreted beyond their informational content per se. A declarative sentence with an interrogative intonation, for example, seems to express at least some uncertainty on the part of the speaker. In experiments, listeners indeed judged, among other things, that speakers had a weaker belief in their spoken statement when it was expressed interrogatively.

Interactive instruction

Evaluative trials were continued in the 'Reading Board' project, which is intended to devise optimal reading assistance for first-graders in a reading environment with fully interactive speech. Two main topics underlying the design of this system gave rise to detailed fundamental research: the monitoring of reading performance and the nature of feedback. The work on monitoring is described in a paper in this section by Ellermann and Van der Pol.

The problem of feedback is a long-standing one in educational research. Many consider it to be the single most important variable in learning optimization. We have observed, however, that there are very few experimental data on exactly how to structure feedback in terms of quantity, content and, not least, timing. We may note here, too, that human factors play an important part in the application of feedback, especially in temporally extended training and rehearsal.

A third topic actively explored in the past year is the optimization of paired-associate learning. Here too, we find that realizing its inherent application value requires considerable fundamental research, as well as ergonomic design. Two papers in this section, one by Engel and Geerings and the other by Ellermann and Free, detail some of these issues.

Multimodal interaction

By no means all human interactive behaviour takes place by way of spoken or written information. Gestures, pictures, hesitations, facial expressions, embarrassed silences and motor actions are natural in human expressive behaviour. Since most computer systems have graphical output as a standard option, the visual system plays an active part in human-computer interaction. This year we have explored the ways in which visual selective attention can be deployed, both experimentally (in collaboration with the Dept. of Psychology, University of Kansas, Lawrence) and theoretically. The results provide ever more evidence that visual attention can be flexibly tuned to small foveal areas or to very wide ones, even at the expense of foveal information.

In the field of visual word recognition, we have specifically studied the effect of orthographic neighbourhood. This essentially concerns the way the human perceptual system deals with the recognition of visually similar words. The experiments are described in a paper by Grainger, who is now at the CNRS 'Groupe Regard' at the Université René Descartes, Paris.

This word-recognition study was carried out in close connection with a theoretical study on neural networks, supported by interdisciplinary seminars organized regularly at IPO. Neural network modelling is also the main activity in a cooperation agreement with LIMSI-CNRS, Orsay, France, supported by the Netherlands Organization for Scientific Research (NWO).

The special demands that current and future information systems will place on Human Factors research have motivated a stronger theoretical approach, in addition to the above-mentioned experimental activities (see also the section on Information Ergonomics). Specifically, we are exploring layered-protocol models and combined linguistic-graphic interaction possibilities. In this field we pursue an active policy of scientific exchanges with institutes specializing in the same fields.


Question presentation methods for paired-associate learning

F.L. Engel and M.P.W. Geerings*

* M.P.W. Geerings is from Philips Research Labs Eindhoven.

Abstract

Four different methods of question presentation in interactive computer-aided learning of Dutch-English word pairs are evaluated experimentally. These methods are: 1) the 'open-question method', 2) the 'multiple-choice method', 3) the 'sequential method' and 4) the 'true/false method'. When consistently applied over a learning session, the true/false method provided the highest learning rate.

In the true/false method, questions consist of the relevant (Dutch) stimulus word together with a possibly correct (English) translation. The student indicates whether the displayed combination is 'true' or 'false'. For reinforcement, the correct combination is then displayed. In all four cases, the question selection strategy depended on the student's response history.

The learning rate obtained for the true/false method was found to be almost the same as that resulting from the traditional way of noninteractive vocabulary learning, that is, the 'paper-and-pencil' method. Students preferred the true/false method to the paper-and-pencil method because of its vividness and administrative support. It is suggested that, for optimal results, the presentation method applied should also vary, depending on the student's mastery of the word pairs.

Introduction

Objects and their names, printed words and their pronunciation, the multiplication tables, foreign vocabulary, and foreign countries and their capital cities are examples of paired associates to be learned in daily life. For most students, paired-associate learning is an important, frequently occurring but tedious task. It is therefore of interest to determine the potential of the computer as an aid in this area, and to optimize its performance accordingly.

In a pilot experiment (Engel, Andriessen & Welles, 1977), we examined the strategies that adult students spontaneously apply in learning Dutch-English vocabulary when provided with paper, pencil and a list of word pairs on paper. We also examined the situation in which students were provided with small cards, each containing one of the word pairs. These experiments showed that most of the available study time, about eight of the ten minutes available, was spent on repeated self-testing. We also observed that the faster-learning students concentrated on the items not yet mastered, either by marking those already known on the list or by sorting the cards into specific stacks (for instance, items already known, items difficult to learn, etc.). This behaviour contrasted with that of the slower-learning students, who mainly scanned the original list (or stack) repeatedly from top to bottom. As they acknowledged afterwards, memorizing itself already required so much effort that they could not afford the administrative work needed for an adaptive style of learning, although they admitted that it would be advisable. Obviously, a computer can easily perform such an administrative task.

When a computer is applied, optimization of its user interface is a major design aspect. Accordingly, in computer-aided learning of paired associates, an important question is the optimal presentation of the training questions. We have therefore designed four methods of question presentation, which rely to different degrees on the two nonequivalent measures of learning, namely recall and recognition. As indicated, for instance, by Brown (1976), the two steps functionally involved in recall are retrieval of an answer and recognition of its correctness. Our goal in the learning of foreign vocabulary is recall rather than recognition of the correct translation. Accordingly, this paper describes our experimental evaluation of the effectiveness of four different methods of computerized question presentation with respect to recall.

Method

Presentation methods

Two well-known methods, related to recall and recognition respectively, are available for presenting test questions to a student: the 'open-question method' and the 'multiple-choice method'.

With open questions, the student is expected to give the full answer, which requires recall of the correct translation as well as the correct spelling. In computerized testing, this method also calls for typing skill and, on the computer side, preferably some capability of handling correct alternative answers, spelling, etc.
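The text says only that the computer side should preferably handle correct alternative answers and spelling; it does not say how, or whether, this was implemented. A hypothetical sketch: normalize case and spacing, accept a list of alternative correct answers, and classify answers within one edit (Levenshtein distance) of a target as misspelled rather than wrong. The one-edit tolerance and the example words are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def grade_open_answer(typed, correct, alternatives=(), max_typos=1):
    """Return 'correct', 'misspelled' or 'wrong' for a typed open answer.

    An empty answer (the student pressed carriage return at once) is 'wrong'.
    """
    answer = normalize(typed)
    targets = [normalize(correct)] + [normalize(t) for t in alternatives]
    if answer in targets:
        return "correct"
    if answer and any(edit_distance(answer, t) <= max_typos for t in targets):
        return "misspelled"
    return "wrong"
```

A tolerance of one edit is a trade-off: it forgives typing slips in long words, but for short words it can also accept a genuinely different word (for example "sag" for "sad"), so a real system might scale the tolerance with word length.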

The drawback of typing can largely be avoided by using the multiple-choice method. However, multiple-choice questions have the disadvantage that they require only recognition, rather than recall, of the correct answer, while a correct answer can also be obtained by eliminating improbable alternatives, or simply by guessing.

Besides these two well-known methods, multiple-choice questions can be presented in two other ways, which are perhaps better adapted to the student's learning goal, that is, recall of the foreign equivalents.

In the 'true/false method', each question is presented in combination with only one of the multiple-choice answers from the related 'answer list'. The correct combination is presented in half of the trials. The student's task is to indicate whether the displayed question-answer combination is 'true' or 'false'. The true/false method prevents the student from deriving answers through elimination of improbable alternatives, as can be done in the multiple-choice method. In general, however, the probability of guessing the correct answer is higher.
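The trade-off in the last sentence can be made concrete. With four answers per response list, a blind guess is right with probability 1/4 in the multiple-choice method and 1/2 in the true/false method. Under the adaptive strategy described below, where an error resets an item's counter to zero and learning stops at a score of three, passing an item by guessing alone requires three consecutive lucky guesses: (1/2)³ = 1/8 for true/false versus (1/4)³ = 1/64 for multiple choice. The sketch below builds true/false questions by drawing the correct pairing with probability 1/2; the report says only that the correct combination appears in half of the trials, so it may instead have balanced this exactly.

```python
import random


def true_false_question(stimulus, correct, alternatives, rng):
    """Pair the stimulus with its correct translation with probability 1/2,
    otherwise with one randomly chosen incorrect alternative.

    Returns (stimulus, shown_answer, is_true).
    """
    if rng.random() < 0.5:
        return stimulus, correct, True
    return stimulus, rng.choice(alternatives), False


def chance_correct(method, n_answers=4):
    """Probability of answering a single question correctly by pure guessing."""
    if method == "multiple-choice":
        return 1 / n_answers
    if method == "true/false":
        return 1 / 2
    raise ValueError(f"unknown method: {method}")


def streak_by_guessing(p, k):
    """Probability of k consecutive correct guesses, each with probability p."""
    return p ** k
```

The higher single-trial guessing rate of the true/false method is thus partly offset by the reset-on-error scoring rule.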

The desired recall property of the open-question method might be approximated even more closely by the 'sequential method'. In the sequential method, the open question is presented first and, after a fixed pause that enables the student to recall the correct answer, the related multiple-choice answers are presented successively, one by one in random order. The student has to hit a 'stop' key as soon as the correct answer is presented. The display timing permits the student to react correctly if he is capable of recalling the correct answer in time, i.e. during the initial pause. Because of the time pressure applied, deriving the correct answer by elimination of improbable alternatives is less productive.

We employed the traditional 'paper-and-pencil method' for control purposes. In this method, the students are provided with a list of Dutch-English vocabulary words on a sheet of paper, together with a pencil and a scribbling pad for making notes.

[Figure 1 flow diagrams, recovered as text:
Open-question method: adaptive question selection → display open question → (S) type answer → confirmative feedback
Multiple-choice method: adaptive question selection → display question + answers 1, 2, 3, 4 → (S) number of the correct answer → confirmative feedback
True/false method: adaptive question selection → display question + answer x → (S) true or false → confirmative feedback
Sequential method: adaptive question selection → display question; display question + answer 1; display question + answer 2; etc. → confirmative feedback]

Figure 1: Flow diagrams of the test-study cycle of the four interactive question-presentation methods. Encircled S's indicate locations where action by the student is required.

Figure 1 gives flow diagrams of the test-study cycle of the four question-presentation methods. The corresponding question is displayed after an item has been selected by means of the adaptive strategy described in the next section. In the open-question method, the question remained on the screen until the student indicated the end of his response by means of a 'carriage return'. Typing errors could be corrected before giving this sign. If the student did not know the answer, he could indicate this immediately by means of the carriage return.

To stimulate a quick response, the computer erased the question after T1 seconds (default value 3 s) in the multiple-choice and true/false methods. The question could be displayed again for periods of T1 s by pressing the 'repeat' button.

In the sequential method, the open question was first displayed for 2·T1 seconds, thus providing some time for recall of the correct answer. Then the question reappeared in combination with answers from the related response list, the answers being presented one by one in random order for T1 seconds each. The sequence stopped the moment the student indicated, by means of the 'stop' button, what he thought was the correct answer. The sequence also stopped automatically on presentation of the correct answer. This was done to increase speed and to prevent unnecessary confrontation of the student with false stimulus-response associations.
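The display sequence of one sequential-method trial can be sketched as an event list with onset times. This is a hypothetical simulation of the timing rules stated above (open question alone for 2·T1, then question and answer pairs for T1 each in random order, ending on 'stop' or automatically on the correct answer); it does not model the feedback period or the 'faster'/'slower' buttons, and the function name and example words are illustrative.

```python
import random


def sequential_trial(question, answers, correct, t1=3.0, stop_at=None, rng=None):
    """Simulate the display sequence of one sequential-method trial.

    The open question is shown alone for 2*t1 seconds; then question +
    answer pairs follow in random order, t1 seconds each. The sequence ends
    when the student presses 'stop' (stop_at = position in the shuffled
    order, or None if the student never presses), or automatically once the
    correct answer has been presented.

    Returns (events, chosen): events is a list of (onset_seconds, text), and
    chosen is the answer the student stopped on, or None.
    """
    rng = rng or random.Random()
    order = list(answers)
    rng.shuffle(order)
    events = [(0.0, question)]
    onset = 2 * t1
    for i, answer in enumerate(order):
        events.append((onset, f"{question} = {answer}"))
        if stop_at == i:
            return events, answer        # student pressed 'stop'
        if answer == correct:
            return events, None          # automatic stop on the correct answer
        onset += t1
    return events, None


events, chosen = sequential_trial("treurig", ["sad", "threat", "sage", "regretful"],
                                  "sad", t1=3.0, rng=random.Random(0))
```

Because the automatic stop comes at the correct answer, a student who does not respond never sees the incorrect pairings that would otherwise follow it.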


Immediately after receiving the student's response, the computer indicated in all four cases a) whether the response was correct, and b) the correct combination of question and answer, for the purpose of learning. To make the feedback more pronounced, this message was displayed for a longer period (T2 instead of T1 seconds) in the case of an incorrect answer. Then the next question appeared, or the system stopped, indicating that the required final score had been achieved.

The initial values of T1 and T2 were 3 and 5 seconds, respectively. Students could decrease or increase both values simultaneously in steps of 30% by pressing the 'faster' and 'slower' buttons. Accordingly, the training speed could be adapted to the student's requirements. Moreover, this enabled the student to adjust the difficulty of the question phase of the sequential method.

With the exception of the sequential method, question display was student-paced: the corresponding pause ended with the student's response. In the sequential method, this pause could not be placed within the question-answer period, since timing was an essential ingredient of the question there. It was therefore placed in the feedback period, and it ended when the 'start' button for the next question was pressed.

Adaptive strategy

The adaptive strategy, which selects the items to be tested during learning in accordance with the student's current level of lesson knowledge, is based on the one described by Atkinson and Paulson (1972). It works as follows.

1. The computer operates a 'score counter' for each item to be learned. All counters are reset to zero at the beginning of learning. After a correct answer, the relevant counter is increased by one step. However, if an incorrect answer is given, the related counter is reset to zero. Errors made in the final stage are therefore penalized more heavily than errors made at the initial stage of learning, thus diminishing the chance of achieving high scores by guessing the answers.

2. A 'sublist' is compiled from the four items with the lowest scores, which are then presented sequentially to the student. After the sublist has been completed, a new one is composed, again containing the four items with the lowest scores at that time, and so on. Because of this procedure, incorrectly answered items are repeated within four questions.

3. As soon as all score counters are greater than or equal to a given threshold level, the learning procedure is stopped. We applied a 'final score level' of three.

Like the strategy that Karush and Dear (1966) derived mathematically to be optimal for the 'all-or-none model' (Atkinson, Bower & Crothers, 1965) of paired-associate learning, our strategy chooses in each trial the item whose current probability of being in the learned state is lowest.
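The three steps of the adaptive strategy can be sketched as a small scheduler. This is a minimal illustration, not the program used in the experiment: the report does not say how ties between equal scores were broken, so the sketch assumes lesson order (Python's `sorted` is stable), and it presents the items within a sublist in ascending score order. The class and function names are hypothetical.

```python
class AdaptiveScheduler:
    """Score-counter item selection following steps 1 to 3 above (sketch)."""

    def __init__(self, items, sublist_size=4, final_score=3):
        self.items = list(items)
        self.scores = {item: 0 for item in self.items}  # step 1: all counters start at zero
        self.sublist_size = sublist_size
        self.final_score = final_score
        self._sublist = []

    def finished(self):
        # Step 3: stop once every counter has reached the final score level.
        return all(score >= self.final_score for score in self.scores.values())

    def next_item(self):
        if not self._sublist:
            # Step 2: compose a new sublist of the lowest-scoring items.
            # sorted() is stable, so ties keep lesson order (an assumption).
            ranked = sorted(self.items, key=self.scores.__getitem__)
            self._sublist = ranked[: self.sublist_size]
        return self._sublist.pop(0)

    def record(self, item, correct):
        # Step 1: +1 for a correct answer; any error resets the counter to zero.
        self.scores[item] = self.scores[item] + 1 if correct else 0


def run_session(scheduler, answer_fn):
    """Ask questions until the scheduler is finished; return the number asked."""
    asked = 0
    while not scheduler.finished():
        item = scheduler.next_item()
        scheduler.record(item, answer_fn(item))
        asked += 1
    return asked
```

For a twelve-item lesson, a student who never errs needs 12 × 3 = 36 questions; every error adds at least three more presentations of that item, because its counter must climb back from zero.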

Subjects

Some days before the experiment, candidate subjects were jointly introduced to the experimental set-up by being shown the different methods of learning and informed about our aim of finding optimal methods with regard to learning support and efficiency. They then had to decide whether or not to take part in the experiment; some two or three opted out. The twenty volunteers who actually participated in our experiment were young technicians (between 20 and 30 years of age) from the electronics workshop of our laboratory.

Material

Five different lessons were prepared, each consisting of twelve different items, so that 60 items had to be learned in all. Each item consisted of a Dutch word together with its response list, containing the correct English translation and three (incorrect) 'alternative answers' (see Figure 2).

onschuldig = innocent (innocuous, hypocritical, impeccable)
gerechtigheid = justice (virtue, justness, judging)
bedriegen = to deceive (to lie, to flatter, to decease)
geest (verstand) = mind (sense, ghost, mood)
teder = tender (sensitive, tending, soft)
hemel (godsdienstig) = heaven (sky, air, paradise)
gids = guide (director, guardian, adviser)
treurig = sad (threat, sage, regretful)
evangelie = gospel (sermon, troth, belief)
lui = lazy (Qee, dull, torpid)
geest (ziel) = spirit (ghost, mind, science)
rechtvaardig = just (straight, right, loyal)

Figure 2: The lesson material used in lesson 2. From left to right, each item consists of the Dutch stimulus word, its English translation and, between brackets, the three related incorrect alternative answers.

An attempt was made to equate the difficulty levels of the five lessons by selecting, from the first 2000 high-frequency English words in the category 'inner life' as given by De Groot (1970), words with little similarity to their Dutch equivalents, and by distributing them randomly over the five lessons. The alternative answers were chosen so as to represent 'attractive' false translations; however, in order to prevent interaction among the items to be learned, no correct response words from the other 59 items were used for them.

Procedure

We asked the students to study all five lessons in a single morning, using the above-mentioned methods. As an introduction to learning and for the purpose of control, the first combination in each sequence was fixed, namely the paper-and-pencil method combined with lesson 1. A 4×4 'Graeco-Latin square' design was used for the remaining method/lesson combinations. Accordingly, four different sequences of method/lesson combinations were used. With the twenty students available, each sequence could be applied 5 times. Method/lesson trials consisted of the following five steps:

1. For assessing the initial lesson knowledge, the student had to write down theEnglish translations of the twelve words of the given lesson. The number ofcorrect answers was called the student's 'initial score'.

2. With the exception of the paper-and-pencil method, there followed a brief training session with the learning method to be tested, so that the student could become familiar with it. Special demonstration lessons of four items were available for that purpose.

3. Next, the specific method/lesson learning session was performed. In the paper-and-pencil method, the students were asked to stop learning as soon as they thought they had mastered the lesson. In this way we tried to prevent 'saturated scores' among the faster-learning students, which could result from a fixed learning period. With the interactive computer methods, the adaptive strategy determined when the session finished, viz. when all items in the lesson had reached a final score level of 3. In order to complete the full series of five methods per student in a single morning, a maximum of 30 minutes per method/lesson trial was maintained.

4. Immediately after the learning period, a written end test was given, consisting of open questions on the twelve items just studied. To cancel possible positional effects, the order of the twelve words was permuted relative to the initial test. The number of correct answers achieved in this test is called the student's 'end score'.

5. Finally, after one week, each student did a written retention test, consisting of open questions on all 5 × 12 = 60 words studied. The test list was randomized again to prevent positional effects. The number of correct answers, reassigned to the corresponding lessons, is called the 'retention score'.
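The counterbalancing described in the Procedure can be illustrated with a short sketch. It builds one possible 4×4 Graeco-Latin square by superimposing two orthogonal Latin squares over the finite field GF(4): every method and every lesson appears once per sequence (row) and once per trial position (column), and every method/lesson pair occurs exactly once. The construction, the ordering of the method and lesson labels and the function names are illustrative assumptions; the report does not give the square actually used.

```python
# Illustrative only: one way to build a 4x4 Graeco-Latin square.
# The assignment actually used in the experiment is not given in the text.

# Multiplication table of GF(4); elements 0..3 encode the polynomials
# 0, 1, x, x+1 over GF(2), reduced modulo x^2 + x + 1.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]

# Assumed labels: the four interactive methods and the lessons 2..5.
METHODS = ["true/false", "multiple choice", "sequential", "open question"]
LESSONS = [2, 3, 4, 5]


def latin_square(a):
    """Latin square L_a(i, j) = a*i + j over GF(4); addition is XOR."""
    return [[GF4_MUL[a][i] ^ j for j in range(4)] for i in range(4)]


def graeco_latin_square():
    """Superimpose two orthogonal Latin squares (a = 1 and a = 2)."""
    greek, latin = latin_square(1), latin_square(2)
    return [[(METHODS[greek[i][j]], LESSONS[latin[i][j]]) for j in range(4)]
            for i in range(4)]


def session_plan(sequence):
    """Five-trial plan for one sequence (row 0..3)."""
    fixed = ("paper-and-pencil", 1)  # first trial is fixed in every sequence
    return [fixed] + graeco_latin_square()[sequence]


if __name__ == "__main__":
    for row in range(4):
        # 20 students / 4 sequences = 5 students per sequence
        print(f"sequence {row + 1} (5 students):", session_plan(row))
```

Orthogonality follows from the field structure: for a fixed pair of values, the difference (1 − 2)·i over GF(4) determines the row i uniquely, and the row then fixes the column.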

[Figure: bar chart of score (0 to 12) per method (Open Question, Sequential, True/False, Paper & Pencil, Multiple Choice), with bars for initial score, end score and retention score.]

Figure 3: Initial, end and retention scores averaged over the twenty students for the four interactive methods of computerized question presentation, as well as for the paper-and-pencil method. Thin vertical bars indicate the 95% confidence intervals of the means.

Experimental results

Besides the initial, end and retention scores obtained from the handwritten tests, our computer logs supplied data on the duration of each learning session, the total number of questions answered before finishing, and the percentage of correct answers given during learning.

Figure 3 summarizes the average scores obtained with the different presentation methods. The interactive open-question method and the passive paper-and-pencil method were the most favourable with regard to the number of associations learned; the other three methods produced poorer results. The differences in retention scores are not statistically significant. Nevertheless, although roughly 55% lower than the end scores, the retention scores show the same tendency, suggesting that the methods are similar as far as forgetting is concerned.

Quite a different picture emerges when we consider the learning rate, defined as the difference between end score and initial score divided by the duration of the related learning session. Figure 4 shows learning rate and study duration for the four interactive methods, as well as for the paper-and-pencil method. To compensate for the proportionality found between the calculated averages and their variances, logarithmic scaling has been used in Figure 4.
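The learning-rate definition and the logarithmic scaling can be made concrete with a small sketch. It computes (end score − initial score) / duration per session and summarizes a group of rates on a log scale, which yields a geometric mean. The report does not state how its confidence intervals were computed; the Student-t interval on log-transformed rates, the critical value 2.093 (two-sided 95%, 19 degrees of freedom, i.e. n = 20) and the function names are assumptions.

```python
import math
import statistics


def learning_rate(initial_score, end_score, minutes):
    """Items learned per minute: (end - initial) / study duration."""
    if minutes <= 0:
        raise ValueError("study duration must be positive")
    return (end_score - initial_score) / minutes


def log_scale_summary(rates, t_crit=2.093):
    """Geometric mean and approximate 95% interval of positive rates.

    Assumption: the interval is a Student-t interval on log(rate);
    t_crit = 2.093 holds only for n = 20 (19 degrees of freedom).
    """
    if len(rates) < 2 or any(r <= 0 for r in rates):
        raise ValueError("need at least two positive rates (log(0) is undefined)")
    logs = [math.log(r) for r in rates]
    mean = statistics.mean(logs)
    half_width = t_crit * statistics.stdev(logs) / math.sqrt(len(logs))
    # Back-transforming the log-scale mean gives the geometric mean.
    return math.exp(mean), math.exp(mean - half_width), math.exp(mean + half_width)
```

For example, a student who goes from an initial score of 1 to an end score of 11 in 10 minutes has `learning_rate(1, 11, 10) == 1.0` item per minute, the order of magnitude reported for the two fastest methods. Because the interval is symmetric on the log scale, it is asymmetric after back-transformation, consistent with the proportionality between means and variances noted above.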

The true/false method and the paper-and-pencil method were the fastest methods, yielding a learning rate of about one item per minute. Of the two, the paper-and-pencil method gave the higher end score, but also needed the longest study time.

[Figure: two panels with logarithmic vertical axes, one showing learning rate (items/min, axis marks 0.50, 1.00, 1.60) and one showing study duration (min, axis marks 4 to 30); horizontal axes: Paper & Pencil, True/False, Multiple Choice, Sequential, Open Question.]

Figure 4: The mean values of learning rate and study duration on a logarithmic scale for the four interactive question presentation methods, as well as for the paper-and-pencil method. The 95% confidence intervals of the means are indicated by thin vertical lines.

Discussion

The main finding of our experiments is that the best interactive presentation strategy, that is, the true/false method, yielded a learning speed almost identical to that obtained with the passive paper-and-pencil method.

In less stressful situations, such as at home, where the curiosity aroused by an experiment is absent, it will be more difficult to keep one's attention focused while studying with the paper-and-pencil method than with the true/false method. In line with this view, Engel and Andriessen (1981) found in field experiments at home and at school that students preferred learning by means of the true/false method to learning from paper. They indicated that this preference stems from the temporal aspects of the interaction, the fair chance of giving a correct answer during initial learning, the vividness of the stimuli with ever-changing responses presented as questions, the adaptivity of the questions posed, and the relief from having to keep track of the items not yet mastered. We assume that these aspects carry more weight as learning requires more effort, which makes computer-assisted learning of paired associates a desirable tool, especially for students who learn slowly.

Nevertheless, this result raises the question why the true/false method, although interactive and adapted to the student's current level of mastery, does not produce higher learning rates than the paper-and-pencil method. In principle, the cause must be sought in some inefficiency of the strategy for presenting and selecting the items. It is indeed possible that the chosen adaptive strategy selects the items not yet mastered less quickly than students do themselves with the paper-and-pencil method. The system needs several tests before it has built up a sufficiently accurate picture of the student's current knowledge of the paired associates to be learned: in our experimental setting, an item is assumed to be known only after at least three presentations. However, since the average initial score was only about one item, few presentations can have been spent on confirming items that were already known, so this start-up effect was not the main cause. Furthermore, some excess can occur near the end of a learning session: if a few difficult items remain, the strategy's presentation sublist consists of these hard items together with a number of items already mastered. Presenting these mastered items again no doubt involves a certain amount of 'over-learning'. Hence, adjusting the final score level and the sublist size might well improve learning speed. As indicated by Atkinson (1972), it might also be worthwhile to adapt the presentation strategy to a priori differences in item difficulty.
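The over-learning effect described above can be reproduced with a toy simulation. The sketch keeps a fixed-size presentation sublist, fills it with unmastered items first and pads it with mastered items when too few unmastered ones remain. Only the final score level of 3 comes from the text; the update rule (a correct answer adds one, an error resets the score to zero), the sublist size, the random ordering and the presentation cap standing in for the 30-minute limit are hypothetical.

```python
import random

FINAL_SCORE = 3  # from the text: every item must reach score level 3


def next_sublist(scores, size, rng):
    """Fill a fixed-size sublist with unmastered items first.

    When fewer unmastered items remain than `size`, already mastered
    items fill the gap; those presentations are the 'over-learning'.
    """
    unmastered = [it for it, s in scores.items() if s < FINAL_SCORE]
    mastered = [it for it, s in scores.items() if s >= FINAL_SCORE]
    rng.shuffle(unmastered)
    rng.shuffle(mastered)
    sublist = unmastered[:size]
    return sublist + mastered[: size - len(sublist)]


def run_session(items, answers_correctly, size=4, max_presentations=500, seed=0):
    """Present sublists until all items are mastered or the cap is reached.

    Hypothetical update rule: correct -> score + 1, error -> score 0.
    `max_presentations` stands in for the 30-minute limit per trial.
    """
    rng = random.Random(seed)
    scores = {it: 0 for it in items}
    presented = overlearned = 0
    while any(s < FINAL_SCORE for s in scores.values()):
        if presented >= max_presentations:
            break
        for it in next_sublist(scores, size, rng):
            if scores[it] >= FINAL_SCORE:
                overlearned += 1  # presenting an item already mastered
            scores[it] = scores[it] + 1 if answers_correctly(it) else 0
            presented += 1
    return scores, presented, overlearned
```

With a perfect learner and twelve items, every item needs three presentations (36 in total); any presentations beyond that are padding with mastered items, which grows with the sublist size. This is the trade-off that tuning the final score level and sublist size would address.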

In view of the less adaptive behaviour of students with a slower learning rate (Engel, Andriessen & Welles, 1977), we are less inclined to believe in a 'free' item-selection strategy as the main sequencing procedure for a learning aid, as advocated by Ellermann and Vloet (1987). They studied computer-aided learning of associations between Japanese Katakana characters and their written phonological equivalents, and concluded that the strategy with the greatest freedom of item selection gave the best learning results. However, their subjects were highly educated and therefore probably also proficient learners, requiring no 'book-keeping' assistance. On the other hand, free selection can indeed be relevant for inhomogeneous and/or context-rich paired-associate lesson material in computer-assisted instruction; see Engel, Andriessen and Schmitz (1983). In the past, such computer-controlled environments in fact contained unnecessary sequencing constraints.

Although appropriate for the final state of mastery, the open-question method is comparatively difficult for the initial learning of paired associates, because the answer has to be correct to the letter from the start. In some of the open-question sessions, this even led to negative emotional reactions from the students. Initial learning of the items seems to be related more to the recall of broad aspects, such as the initial part of the response word, its length, etc., than to the exact spelling of the English word. Typing errors were not the main source of errors in the open-question method, since they could be corrected before pressing the carriage return key.

With regard to the sequential method, students stated that the moment and speed at which the successive alternatives were presented often took them by surprise, especially if the first alternative happened to be the correct one. A 300 ms interval had already been deliberately introduced between the open question and the sequence of successively presented alternatives, enabling the student to prepare and to direct his eyes to the relevant part of the screen, that is, the location where the alternatives were to appear. Our students nevertheless considered the sequential method 'confusing', requiring much of their attention during the answering period. Perhaps it would be better to move the existing pacing pause, which now coincides with the confirmative feedback (see Figure 1), from before the open question to after it, so that the student controls the time available for recalling the answer. It was also thought that 'attractive' false alternatives made students doubt the answer they had in mind. Because of the rapid successive presentation of the alternatives, they were not in a position to consider at length why the false alternatives were incorrect. This confused the students in particular during 'initial' learning. It is assumed that both the open-question method and the sequential method are better candidates for 'final' learning.

For optimal results, the type of question presentation may have to be adapted to the state of mastery of the items. The true/false method seems quite suitable for initial learning, since the probability of giving the correct answer is high (0.5) and the number of questions posed per minute is also high, which makes it a motivating and playful setting. It could then be followed by methods with successively less chance of guessing, viz. the multiple-choice, the sequential and the open-question method.

Conclusions

This paper has shown that, with the introduction of the computer as an aid in learning paired associates, the investigation of item-presentation methods as part of the user interface is a worthwhile subject, in addition to the established topics of item-sequencing strategies, the use of voice output (Spaai, Ellermann & Reitsma, 1986) and mnemonics. The presentation methods applied showed clear differences in learning and motivation. We propose applying the true/false method in the initial phase of learning because of its relatively high chance of correctly guessing the answer and its speed of testing. It could be followed by the multiple-choice, the sequential and the open-question methods, in that order. The conditions for optimal switching between them, however, are unknown.

Acknowledgements

The ideas on the different presentation methods originated from close cooperation with J.J. Andriessen in the earlier stages of our work. Helpful comments by D.G. Bouwhuis and H.H. Ellermann on an earlier version of this paper are gratefully acknowledged.

References

Andriessen, J.J. & Engel, F.L. (1978) First experiences with computer-aided learning of paired associates: The sound on slide system. Philips Research Eindhoven: Nat. Lab. Report 5429.

Atkinson, R.C. (1972) Optimizing the learning of a second-language vocabulary. Journal of Experimental Psychology, 96, 124-129.

Atkinson, R.C., Bower, G.H. & Crothers, E.J. (1965) An Introduction to Mathematical Learning Theory. New York: John Wiley & Sons.

Atkinson, R.C. & Paulson, J.A. (1972) An approach to the psychology of instruction. Psychological Bulletin, 78, 49-61.

Brown, J. (1976) An analysis of recognition and recall and of problems in their comparison. In: J. Brown (Ed.): Recall and Recognition. New York: John Wiley & Sons.

Ellermann, H.H. & Vloet, P.J.C. (1987) An experimental evaluation of three computer environments for the learning of paired associates. IPO Annual Progress Report, 22, 97-108.

Engel, F.L. & Andriessen, J.J. (1981) Educational technology research: Computer-aided learning of a foreign vocabulary. Educational Technology, 21, 46-53.

Engel, F.L., Andriessen, J.J. & Schmitz, H.J.R. (1983) What, where and whence: Means for improving electronic data access. International Journal of Man-Machine Studies, 18, 145-160.

Engel, F.L., Andriessen, J.J. & Welles, P.W.G. (1977) Onderzoek naar individuele strategieën bij het leren van Engelse woorden. Philips Project Centre Geldrop: PCG Report 7009.

Groot, G. de (1970) Engelse Woordenschat, Alphabetische Basis-Vocabulaire met Systematische Uitbreiding. Groningen: Wolters-Noordhoff-Longman.

Karush, W. & Dear, R.E. (1966) Optimal stimulus presentation strategy for a stimulus sampling model of learning. Journal of Mathematical Psychology, 3, 19-47.

Spaai, G.W.G., Ellermann, H.H. & Reitsma, P. (1986) Effects of two forms of sound feedback on learning to read words. IPO Annual Progress Report, 21, 77-85.

MIR: A Monitor for Initial Reading

C.D.J.M. van der Pol and H.H. Ellermann

Abstract

The development of our Monitor for Initial Reading (MIR) is based on two global ideas. The first is that the software has to be developed on an incremental basis: MIR must be built so that it can be tested and modified at almost any phase of development. The second idea underlying MIR is that individual reading exercises, each dedicated to certain subskills of reading, should be the most elementary constituents of MIR, rather than descriptions of what is learned in initial reading. The motivations for these two guidelines and some of their consequences are presented in this paper. A short description of the structure of the program is also given, and the possible generalization to domains other than initial reading is discussed.

Introduction

The ability to read words presupposes a certain proficiency in global word recognition and/or the ability to recognize individual graphemes (spelling) and synthesize them into words. Many exercises for initial reading (designed for children aged five to seven) aim to develop these abilities. They have been developed on the basis of much experience and research (Caesar, 1980). Some exercises train picture-word associations or grapheme-phoneme relations; in others, the synthesis of phonemes into words is learned. The aim of all the exercises is to learn to read single words. Although there is some controversy as to whether this emphasis on single words is appropriate, it seems fair to say that word-recognition skills are perhaps the most important ones in initial reading, because so many other reading skills (like comprehension) correlate highly with them (Feenstra & Seegers, 1985).

Despite the amount of research done in this area, it is not always clear why the reading exercises are effective, probably because there is at present no clear insight into the processes underlying reading, not to mention the problem of how those processes interact in initial reading (Lesgold & Perfetti, 1981). It is a fact, however, that the set of reading exercises prescribed in modern reading methods does work, in general (Caesar, 1980). It can be said, therefore, that there is better knowledge of how reading has to be taught than of what is being taught.

This simple fact has important consequences for the way a program for initial reading has to be designed, at least if it is to be roughly as ambitious as the modern Intelligent Tutoring Systems (ITSs); see Sleeman and Brown (1982b). As Wenger (1987) notes, knowledge about how subjects have to be taught was more characteristic of the older Computer-Assisted Instruction (CAI) literature, whereas in modern ITSs this knowledge is (at least in principle) more a by-product of the system's more general knowledge about the domain taught, the skills of the student and pedagogical principles. The system described in what follows will simply be called Monitor for Initial Reading (MIR). In this respect it resembles a CAI system more than an ITS.

IPO annual progress report 23 1988

The goal of MIR is to improve the guidance of learning by adapting the word material as well as the exercises to the needs of the individual child. For that purpose various tests have been applied, viz. pre-tests, online tests and post-tests of reading performance. In these tests, reading performance is measured by testing whether the specific word material is read correctly. Training in specific words can be given in the various exercises. The ultimate goal is to adapt the type of exercise applied to the online scores. In view of the erratic reading behaviour of young children, a serious problem is how to deal with the high variance in test scores: at one moment a certain word may be read correctly and very fast, at another it may be read incorrectly, or correctly but very slowly. Sleeman and Brown (1982b) note that accounting for such noisy behaviour is of great importance in a practical setting. In MIR this problem is tackled by making a sharp distinction between the exercises and the words used in the separate exercises. We believe that this distinction yields a more stable measure of the progress of an individual student. The average performance per word over all kinds of exercises indicates the 'familiarity' of the particular word to the particular child. The average reading performance per exercise over all trained words and children indicates the 'effectiveness' of specific exercises in training. To make the type of exercise applied adaptive, we aim at a reading performance factor per type of exercise for a specific child. For the time being, MIR records a performance measure for each word over all exercises and a performance measure for each exercise over all words, and uses these to select exercises and words.
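The two averages described above, 'familiarity' per word and 'effectiveness' per exercise, plus the per-child exercise factor the authors aim at, can be sketched as simple aggregations over trial records. The record layout (child, word, exercise, correct) and the use of the proportion of correct readings as the performance measure are assumptions; the report does not specify MIR's exact measure, which may also take reading speed into account.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial record: (child, word, exercise, correct).
# Performance is taken here as the proportion of correct readings,
# pooled over all trials in the group.


def familiarity(records, child):
    """Per word: one child's average performance over all exercises."""
    per_word = defaultdict(list)
    for c, word, _exercise, correct in records:
        if c == child:
            per_word[word].append(1.0 if correct else 0.0)
    return {word: mean(s) for word, s in per_word.items()}


def effectiveness(records):
    """Per exercise: average performance over all words and children."""
    per_exercise = defaultdict(list)
    for _child, _word, exercise, correct in records:
        per_exercise[exercise].append(1.0 if correct else 0.0)
    return {ex: mean(s) for ex, s in per_exercise.items()}


def exercise_factor(records, child):
    """Per exercise: one child's average performance (the adaptive goal)."""
    return effectiveness([r for r in records if r[0] == child])
```

Averaging over many exercises (for a word) or over many words and children (for an exercise) reduces the influence of any single erratic response, which is the stability argument made in the text.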

A final point is worth noting. When any learning activity is computerized, it is sensible to ask whether the computer is really needed and whether learning cannot be done (better) with traditional means, e.g. with the help of a teacher or by using pencil-and-paper exercises. This question leads to experimental work in which several media, such as school television, video, pencil-and-paper, language laboratories, etc., are compared as to their effectiveness in improving learning. Before such a test is made, the new medium should be optimized by concentrating on its strong points and, where possible, circumventing its weaknesses. Therefore, an initial within-medium comparison (Clark, 1983) has been chosen as the first step in the development of MIR, and any comparison between the various media is omitted for the time being.

Description of MIR

What is presented here is a global description of the system. As stated, reading exercises are the main building blocks of MIR. As an example, consider a picture-word association exercise, which is also used in MIR. After a spoken instruction about how the task has to be performed, the program draws a picture in the middle of the screen. Then three alternative words and the correct word appear in random order on the right side of the screen. The child has a limited time (typically 10 seconds) to point out the correct word with a mouse. After the response, feedback is given and the next trial starts. More detailed descriptions of this experimental work and of the developed software can be found elsewhere (Reitsma, 1987; Spaai, Reitsma & Ellermann, 1988; Ellermann & Spaai, 1988; Ellermann, Van den Buys & Van Dongen, 1987).

The rest of this section gives a short sketch of MIR. For details of the implementation, the reader is referred to Verwegen (1988), Van der Pol (1988) and Van der Pol (1989). The structure is such that it can easily be generalized to educational situations other than initial reading. We will first go into a little more detail about the software development process as such, and then discuss the structure of MIR.

Incremental software development

Since precise system specifications cannot be given in advance but have to be determined experimentally, the system has to be tested and modified not only in its final form but from the earliest stages of development. The required flexibility is not easily obtained with the conventional functional approach (e.g. in Pascal), nor with the data-flow-oriented approach (e.g. in Cobol). The 'object-oriented' programming approach may overcome the problem (Cox, 1986). In this approach, 'objects' (modules) are defined that contain both data and related procedures. Objects communicate with each other by means of 'messages', which are in fact requests to execute certain actions (procedures) on the related data. The results of these actions are returned to the calling object by means of messages, or appear as screen updates and/or changes in data. For instance, an object 'square' might receive the message 'display_yourself' and, as a result, display an image of itself on the screen. The data of an object can only be manipulated by passing it a message that requests the desired manipulation. We used 'Object-Pascal' (Schmucker, 1986) on an Apple Macintosh microcomputer.
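The 'square' example can be rendered as a minimal sketch of objects and messages. Python stands in here for Object-Pascal, which the authors actually used. The `send` helper, the `resize` message and the text-based 'screen image' are illustrative assumptions. Python does not enforce data hiding, so the leading underscore only marks the side length as private by convention.

```python
class Square:
    """An object bundling its data (the side length) with its procedures."""

    def __init__(self, side):
        self._side = side  # private by convention: changed only via messages

    def display_yourself(self):
        # Stand-in for drawing on the screen: return a text image.
        return "\n".join("#" * self._side for _ in range(self._side))

    def resize(self, new_side):
        self._side = new_side
        return self  # result returned to the sender of the message


def send(receiver, message, *args):
    """Deliver a message: look up the named procedure and execute it."""
    return getattr(receiver, message)(*args)
```

Usage: `send(Square(2), "display_yourself")` returns the two-line image `"##\n##"`. A sender that knows only the message name, not the receiver's internals, is what lets objects be replaced or extended during incremental development.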

Little is known from the literature about methods of deriving objects for a specific application. To define the objects, we applied a kind of entity-relationship analysis, as known from relational database design (Date, 1986). This analysis yielded entity-relationship clusters that corresponded closely with the tasks that can be distinguished in MIR.

Tasks of MIR

The tasks of MIR are:

1. Selecting the words to be exercised;

2. Presentation of the reading exercises, together with registration of the responses made by a child;

3. Diagnosis of the progress of this individual child;

4. Adapting the exercises (their type and the time needed) and the words to be presented to the child. To accomplish this, the progress per word practised in a session is determined first. Second, the decrease in the time needed to finish a particular exercise is estimated. At this moment, little is known about adaptive schedules for the optimal sequencing of reading exercises. For the time being, MIR uses a fixed, non-adaptive sequence of reading exercises;

5. Generating for the teacher a survey of the progress of specific children.

These tasks agree with the structural model of an Intelligent Tutoring System proposed by O'Shea (1984).

Objects of MIR

The software objects applied and their relations are shown in Figure 1. The lines in Figure 1 represent the flow of information between the objects; for the sake of simplicity, only the main flow of messages is shown. MIR starts by activating the Tutor object. Each user has his or her own diskette with user data.

Figure 1: Simplified flow of information between the different objects of MIR.

Tutor: By reading the inserted user diskette, this object decides who is using the machine. There are two kinds of users: the teacher or experimenter, and the student (child). When it is a teacher, the object Teacher (see below) takes control; when it is a student, the object Student (see below) becomes active.

Teacher: This object is an administrative tool used by the teacher or experimenter. Students can be entered into, or deleted from, the system. It is possible to obtain a 'full report' (date, time, kind of words, type of exercise, reading performance and time needed) of the progress of any student or school class, including a list of words that are known poorly or not at all. It is also possible to determine the complete schedule of exercises to be administered for any individual student or school class. This object is not fully implemented yet, but access to all the above-mentioned data, which are stored in the object Agenda, is provided via an editor.

Student: This module consists of a number of objects, namely the object Agenda, the set of objects Exercise and the object Vocabulary. Student sends a message to Agenda, asking which exercise and kind of words have to be practised. Student then sends a message to Exercise to start the particular exercise and a message to Vocabulary to select the desired words indicated by Agenda. In future, Student will have to perform more of such tasks, especially more refined decisions, based on the student's results, about which exercises and kinds of words are suitable for a particular student.

Agenda: In the current version, this object contains the full predefined Schedule of the exercises and the kind of words to be used. It also contains the student's History.

Schedule: At present, this contains a fixed sequence of prescribed exercises. In the future this module should be adaptive.

History: A summary description of the performance of each child, which can be inspected by the object Teacher. After execution of an exercise, the full report is added to History.

Exercise: This set contains the different exercises for controlling the presentation of individual items. Using Schedule, the object Student determines, via the object Agenda, which exercise is to be selected, and sends a message to that particular exercise to become active. The screen layout and timing of the individual exercises are described in Van der Pol (1989). Feedback is given in all situations: the student is always told whether the answer was correct, after which the correct answer is pronounced and indicated visually. After completion, Exercise sends a diagnosis in the form of a full report to Agenda. In the final version of MIR the diagnosis will be more refined.

Vocabulary: This functions as a kind of dictionary of all the words that could possibly be presented. The dictionary is organized in sections, each containing words of the same kind of difficulty, as defined by Caesar (1980). In an exercise, words belonging to the same section are trained. Each word is represented in a separate object called Item. The object Vocabulary asks Agenda, via Student, for a selection criterion by which the practice words have to be selected. The schedules implemented in the current version (Van der Pol, 1989) select words either sequentially or randomly. Alternatively, words can be selected that have not yet been answered or have been answered incorrectly.

Item: Besides the words and graphemes to be trained, this set contains the alternative answers for the different exercises. Each separate object Item has information about the possible ways an item can be presented: printed words, graphemes, pictures or spoken. Item keeps track of the detailed history of the student's actions on the item, that is, the number of presentations, whether the answer was correct at each presentation and, of course, the response time.
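The division of labour between Vocabulary and Item can be sketched as follows: each Item records its own presentation history, and Vocabulary applies one of the three selection criteria named above to a section of equally difficult words. The class layout, the criterion names and the reading of 'answered incorrectly' as 'the most recent answer was incorrect' are assumptions for illustration; this is a Python analogue, not the Object-Pascal code of MIR.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Item:
    """One word plus the history of the student's actions on it."""
    word: str
    history: list = field(default_factory=list)  # (correct, response_time_s)

    def record(self, correct, response_time):
        self.history.append((correct, response_time))

    @property
    def presentations(self):
        return len(self.history)

    def needs_practice(self):
        # Assumption: not yet answered, or the most recent answer was wrong.
        return not self.history or not self.history[-1][0]


class Vocabulary:
    """Dictionary of words, organized in sections of equal difficulty."""

    def __init__(self, sections):
        self.sections = {name: [Item(w) for w in words]
                         for name, words in sections.items()}

    def select(self, section, criterion, n, rng=random):
        items = self.sections[section]
        if criterion == "sequential":
            return items[:n]
        if criterion == "random":
            return rng.sample(items, min(n, len(items)))
        if criterion == "adaptive":  # the primitive adaptive strategy
            return [it for it in items if it.needs_practice()][:n]
        raise ValueError(f"unknown selection criterion: {criterion}")
```

Because the criterion is passed in as a request, new selection strategies can be added to Vocabulary without changing the exercises that consume the selected Items.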

Discussion

The clear separation between the modules containing the exercises, the material to be learned and the control mechanisms constitutes the strength of MIR. MIR can be applied wherever (sub)skills can be tested and trained by means of well-defined exercises. The structure of the domain taught has to be similar to the one used here, i.e. a set of items to learn and a set of exercises by means of which the items can be practised. Accordingly, other fields in which a MIR-like system could be useful are, for instance, second-language learning and topography.

The control mechanisms might depend on the application. In most cases the response given during training can be relatively simple, which also has implications for the student model (History) used by the control mechanism. At this moment, MIR contains two non-adaptive strategies (random and sequential item selection) and a primitive adaptive strategy that handles incorrectly answered items. More advanced adaptive item-selection strategies are known (Groen & Atkinson, 1966) and can be incorporated quite easily into our object-oriented environment. Quite a different point is the sequencing strategy for lessons (type of exercise plus vocabulary). This aspect has to do with transfer of learning and is still a subject for future research.

Although we had no prior experience with object-oriented programming, the structure illustrated in Figure 1 proved useful. We feel that an object-oriented approach is quite beneficial. Experience has shown that, after a certain learning period, developing a program certainly takes no longer than with traditional programming languages. The different exercises in the set Exercise are much alike, and within an object-oriented programming environment the 'inheritance' facility (Cox, 1986) can be applied quite profitably. With inheritance, code is easily reused without sacrificing the independence of the objects.
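How inheritance lets similar exercises share code can be shown with a sketch built around the picture-word exercise described earlier (a picture, the target word plus three alternatives in random order, a typical 10-second limit, and feedback that always reports correctness and then gives the correct word). The class names, the scoring rule and the word-only variant are hypothetical; MIR itself was written in Object-Pascal.

```python
import random


class Exercise:
    """Behaviour shared by all exercises: scoring and feedback."""

    time_limit_s = 10.0  # typical limit quoted for the picture-word exercise

    def build_trial(self, target, distractors, rng):
        raise NotImplementedError  # each exercise defines its own display

    def score(self, trial, response, response_time_s):
        """Assumed rule: correct only if the target is chosen in time."""
        in_time = response_time_s <= self.time_limit_s
        return in_time and response == trial["target"]

    def feedback(self, trial, correct):
        # Correctness is always reported, then the correct answer is given.
        verdict = "correct" if correct else "wrong"
        return f"{verdict}; the word is '{trial['target']}'"


class PictureWordExercise(Exercise):
    """A picture plus the target word and three alternatives, shuffled."""

    def build_trial(self, target, distractors, rng):
        choices = [target] + list(distractors[:3])
        rng.shuffle(choices)
        return {"picture": f"<picture of {target}>", "choices": choices,
                "target": target}


class WordOnlyExercise(PictureWordExercise):
    """Hypothetical variant: inherits everything but shows no picture."""

    def build_trial(self, target, distractors, rng):
        trial = super().build_trial(target, distractors, rng)
        trial["picture"] = None
        return trial
```

Each subclass overrides only what differs (here, the display), while scoring and feedback are written once in the base class; this is the kind of reuse without loss of object independence that the text attributes to inheritance.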

In the near future we plan to implement the object Teacher completely. Furthermore, we intend to extend Schedule with more advanced adaptivity. At present there is no distinction between the teacher and the experimenter. During the experimental phase it makes sense to distinguish between the facilities offered to the teacher and to the experimenter. For that purpose, we plan to move some facilities from Schedule to Student, and to restrict the access of Teacher to History.

Where can MIR be situated among existing computer-based training systems? Computer Based Training (CBT) evolved from Computer-Assisted Instruction/Learning (CAI/CAL; Suppes, 1979) to Intelligent Tutoring Systems (Sleeman & Brown, 1982a). In general, these systems were concerned with more complex educational material than MIR is. The central problem with the CAI/CAL systems was that they did not incorporate an explicit student model. Accordingly, individualization and extensive feedback were available only in a rudimentary form. Recent ITSs are distinguished from the earlier CAI systems by the extended representation of student and lesson knowledge, and by the possibility of presenting lessons in an adaptive way. We hope to extend MIR towards an ITS by improving the student model and the related adaptive strategies.



Acknowledgements

We wish to thank D. Bouwhuis, F. Engel and G. Spaai for helpful comments on an earlier version of this paper.

References

Caesar, F.B. (1980) Veilig Leren Lezen. Tilburg: Zwijsen.

Clark, R.E. (1983) Reconsidering Research on Learning from Media. Review of Educational Research, 53, 445-459.

Cox, B.J. (1986) Object Oriented Programming: An Evolutionary Approach. London: Addison-Wesley.

Date, C.J. (1986) An introduction to Database Systems. London: Addison-Wesley.

Ellermann, H.H., Buys, J.J.M. van den & Dongen, A.W.A. van (1987) Documentatie van programma's van het LeesBord project. IPO Report 569.

Ellermann, H.H. & Spaai, G.W.G. (1988) Lernsoftware für Leseanfänger. In: J. Stoffers (Ed.): Neue Techniken zum Erwerb der Schriftsprache - Lesen und Schreiben mit Hilfe computerunterstützter Medien. Aachen: RWTH.

Feenstra, H. & Seegers, G. (1985) Een componenttheorie van leesvaardigheden: een poging tot validering. Tijdschrift voor Onderwijsresearch, 10, 97-106.

Groen, G.J. & Atkinson, R.C. (1966) Models for optimizing the learning process. Psychological Bulletin, 66, 309-320.

Lesgold, A.M. & Perfetti, C.A. (1981) Interactive processes in reading. Hillsdale, New Jersey: Lawrence Erlbaum.

O'Shea, T. (1984) Tools for creating intelligent computer tutors. In: A. Elithorn and R. Banerji (Eds): Artificial and Human Intelligence. Amsterdam: Elsevier.

Pol, C. van der (1988) De ontwikkeling van een LeesTutor in een object-georiënteerde omgeving. Master's thesis, Hogeschool Eindhoven.

Pol, C. van der (1989) Beschrijving van het programma MIR versie 2x. IPO Report (in preparation).

Reitsma, P. (1987) Een sprekende computer als oefenmiddel bij leesmoeilijkheden. In: A. van der Leij and J. Hamers (Eds): Dyslexie 1987. Lisse: Swets & Zeitlinger.

Schmucker, K.J. (1986) Object-Oriented Programming for the Macintosh. London: Hayden Book Company.

Sleeman, D. & Brown, J.S. (1982a) Introduction: Intelligent Tutoring Systems. In: D. Sleeman & J.S. Brown (Eds): Intelligent Tutoring Systems. New York: Academic Press.

Sleeman, D. & Brown, J.S. (1982b) Intelligent Tutoring Systems. New York: Academic Press.

Spaai, G.W.G., Reitsma, P. & Ellermann, H.H. (1988) Effects of segmented and whole-word sound feedback on learning to read single words. (Submitted for publication to Journal of Educational Psychology.)

Suppes, P. (1979) Current trends in computer-assisted instruction. Advances in Computers, 18, 173-229.

Verwegell, A. (1988) Dokumentatie van het programma Monitor. IPO Report 657.

Wenger, E. (1987) Artificial Intelligence and Tutoring Systems. Los Altos, California: Morgan Kaufmann.



Neighbourhood frequency effects in visual word recognition and naming

J. Grainger

Abstract

Two experiments are reported that examine the influence of a given word's orthographic neighbours (orthographically similar words) on the recognition and pronunciation of that word. In Experiment 1 (lexical decision), neighbourhood frequency, as opposed to stimulus-word frequency, was shown to have a strong influence on recognition latencies and errors. Words with at least one higher-frequency neighbour took longer to recognize and resulted in more errors than words with no higher-frequency neighbours. Increasing the number of higher-frequency neighbours, however, did not increase interference further. Interference was also independent of the position of the letter change between the stimulus word and its higher-frequency neighbour. In Experiment 2 (word naming), on the other hand, neighbourhood frequency had little effect on pronunciation latencies, but these latencies did correlate with the total number of orthographic neighbours (independently of their frequency relative to the stimulus word). The results are discussed in terms of processes of activation and competition operating in visual word recognition.

Introduction

Word recognition involves selecting the correct lexical representation from a subset of candidates defined by the available sensory and possibly contextual information. On this view, a given stimulus word contacts multiple representations in the mental lexicon, only one of which is eventually selected for conscious identification. Although most current models of visual word recognition accept such initial multiple access (or activation), they propose widely different mechanisms for selecting the correct representation from the candidate set. The present paper provides data that place strong constraints on how such selection mechanisms may be formulated.

If one accepts that visual word recognition is at least partly subtended by recognition of the component letters, then it can be assumed that the amount of activation received by a given lexical representation will be a function of the number of letters it shares with the stimulus (orthographic similarity). The words that are orthographically closest to a given word are referred to as that word's orthographic neighbours. Obviously, the composition of this set of neighbours will depend on the specific definition of orthographic similarity that is adopted. Coltheart, Davelaar, Jonasson and Besner (1977) provided a preliminary definition of a word's orthographic neighbours as all other words of the same length that can be generated by changing just one letter, preserving letter positions. Thus, for example, the English word FORK has neighbours such as: work, cork,

IPO annual progress report 23 1988



folk and ford. Coltheart et al. (1977) varied the number of neighbours of word and nonword stimuli in a lexical decision task. They observed a significant effect of neighbourhood size on latencies to nonwords: nonwords with more neighbours produced longer reaction times. However, no effect of neighbourhood size was observed on lexical decision latencies to words. There is nevertheless some evidence from other research for an effect of neighbourhood size on word-naming latencies, words with larger neighbourhoods being named more rapidly (Gunther & Greese, 1985; Scheerer, 1987). However, this result may simply be due to the fact that words with more orthographic neighbours have more frequent spelling-to-sound correspondences, the frequency of such correspondences presumably influencing naming speed (Brown, 1987). The same type of criticism can be levelled at the observed absence of neighbourhood-size effects in lexical decision latencies to words. As neighbourhood size is typically confounded with bigram frequency (words with more neighbours are generally composed of more frequent letter combinations), it may be that increasing orthographic-neighbourhood size does retard recognition, but that the effect is cancelled by a facilitatory effect of higher bigram frequency (Massaro, Taylor, Venezky, Jastrzembski, & Lucas, 1980; Venezky & Massaro, 1987).
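Coltheart et al.'s (1977) neighbour definition translates directly into a filter over a word list. This sketch assumes the lexicon is held as a Python set of lower-case strings; the miniature English lexicon is illustrative only.

```python
def neighbours(word, lexicon):
    """Orthographic neighbours in the sense of Coltheart et al. (1977):
    other words of the same length differing in exactly one letter position."""
    return sorted(
        other
        for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )


lexicon = {"fork", "work", "cork", "folk", "ford", "form", "fort", "park", "fold"}
print(neighbours("fork", lexicon))       # ['cork', 'folk', 'ford', 'form', 'fort', 'work']
print(len(neighbours("fork", lexicon)))  # neighbourhood size N = 6
```

Words differing in two positions (park, fold) are excluded, as is the word itself, which differs in none.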

It is also possible that it is not neighbourhood size per se, but the frequencies of the neighbours relative to the stimulus word, that is the important factor here. This point makes obvious sense when one considers that both major classes of word-recognition model (serial search and interactive activation) predict that interference will be a function of the frequencies of the competitors. In serial search models (Forster, 1976; Paap, Newsome, McDonald & Schvaneveldt, 1982), candidate selection typically operates via a frequency-ordered search, the most frequent candidates being checked first. Interference will be a function of the number of candidates checked before the actual stimulus word, and will therefore depend on the frequencies of the stimulus word's neighbours compared to its own frequency. In the interactive-activation model (McClelland & Rumelhart, 1981), the lexical representation corresponding to the stimulus word 'emerges' from among its competitors by a process of mutual inhibition. The inhibitory output of a given node is a function of its activation level, and thus of its printed frequency (translated into resting-level activation) and its similarity to the stimulus. Higher-frequency neighbours will therefore exert a stronger inhibitory effect on the stimulus word than lower-frequency ones.
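The serial-search prediction can be made concrete with a toy frequency-ordered search. This is a deliberate simplification, not an implementation of Forster's (1976) or Paap et al.'s (1982) models: it assumes that the candidate set consists of the stimulus word and its neighbours, that candidates are verified strictly in descending frequency order, and that each verification costs one step. The candidate labels and frequencies are invented.

```python
def verifications(stimulus, frequencies):
    """Candidates checked, in descending frequency order, up to and including
    the stimulus. `frequencies` maps each candidate (the stimulus and its
    neighbours) to a printed frequency."""
    ordered = sorted(frequencies, key=frequencies.get, reverse=True)
    return ordered.index(stimulus) + 1


# Hypothetical candidate sets for a target of 13 occurrences per million.
no_higher = {"target": 13, "n1": 5, "n2": 2}
one_higher = {"target": 13, "n1": 40, "n2": 2}
two_higher = {"target": 13, "n1": 40, "n2": 90}
for label, candidates in [("none", no_higher), ("one", one_higher), ("two", two_higher)]:
    print(label, verifications("target", candidates))  # 1, 2, 3 verifications
```

In this toy search each additional higher-frequency neighbour adds one verification, which is why this class of model predicts a cumulative effect.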

A recent series of experiments by Grainger, O'Regan, Jacobs and Segui (1989) provides clear evidence that neighbourhood frequency rather than neighbourhood size is the critical factor in determining word-recognition latencies. In these experiments, lexical-decision latencies and eye-gaze durations on single words were not influenced by the number of orthographic neighbours of the stimuli, but were adversely affected by the presence of at least one neighbour of higher frequency than the stimulus word itself. The results also showed that increasing the number of higher-frequency neighbours did not lengthen recognition latencies further. No firm conclusions could be drawn from the latter result, however, as the words in these two categories were not matched for bigram frequency (the words with several higher-frequency neighbours had higher bigram frequencies than the words with only one higher-frequency neighbour). Nevertheless, the observed neighbourhood-frequency effect is an important demonstration of multiple activation and competition in visual word recognition.
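The distinction between words with no, one, or several higher-frequency neighbours follows from counting the neighbours whose printed frequency exceeds that of the stimulus word. A minimal sketch, assuming a dictionary mapping words to frequencies per million; the words and frequencies below are invented, not CELEX counts, and the category numbers match those used in Experiment 1.

```python
def higher_frequency_neighbours(word, freq):
    """Orthographic neighbours of `word` (same length, exactly one letter
    substituted) whose frequency exceeds that of `word`."""
    return sorted(
        other
        for other in freq
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
        and freq[other] > freq[word]
    )


def category(word, freq):
    """1 = no higher-frequency neighbour, 2 = exactly one, 3 = more than one."""
    return min(len(higher_frequency_neighbours(word, freq)), 2) + 1


# Invented frequencies per million (not CELEX values).
freq = {"wand": 30.0, "hand": 400.0, "band": 60.0, "wind": 50.0, "want": 5.0, "land": 20.0}
for w in ["hand", "want", "wand"]:
    print(w, category(w, freq), higher_frequency_neighbours(w, freq))
```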



Whether or not there is a cumulative effect of the number of higher-frequency neighbours is an important question for further investigation. Serial search models (Forster, 1976; Paap et al., 1982) predict such a cumulative effect: the more higher-frequency neighbours the stimulus word has, the greater the number of verifications to be computed. The interactive-activation model (McClelland & Rumelhart, 1981), on the other hand, predicts a neighbourhood-frequency effect with no additional interference caused by increasing the number of higher-frequency neighbours. These predictions are derived from simulations run using the version of the interactive-activation model provided in McClelland and Rumelhart (1988). For the purposes of these simulations and the others presented here, a threshold on activation levels (0.70) was implemented in the model in order to provide a measure of word-recognition latencies.

Another important question relating to the neighbourhood-frequency effect observed by Grainger et al. (1989) concerns the position of the letter change between the stimulus word and its higher-frequency neighbour, a factor that was not controlled in those experiments. If letters are scanned from left to right during word recognition (Forster, 1976), then little, if any, interference should be observed when the higher-frequency neighbour differs from the stimulus word in its initial letter. If, on the other hand, letter information is extracted in parallel within a single fixation (Morton, 1970; Bouwhuis & Bouma, 1979; McClelland & Rumelhart, 1981), then neighbourhood-frequency effects should occur independently of the position of the letter change.
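The contrast between the two hypotheses can be illustrated with a toy left-to-right comparison (an assumption made for illustration, not Forster's actual procedure): if a candidate is rejected at the first mismatching letter, a neighbour differing in its initial letter is discarded after one comparison, whereas one differing in its final letter survives until the last. Under parallel extraction all positions are compared at once, so the position of the change should not matter. The function names are hypothetical.

```python
def change_position(word, neighbour):
    """1-based position of the single substituted letter."""
    diffs = [i for i, (a, b) in enumerate(zip(word, neighbour), start=1) if a != b]
    if len(diffs) != 1:
        raise ValueError("not a one-letter neighbour")
    return diffs[0]


def serial_comparisons(word, candidate):
    """Letters compared left to right before the first mismatch ends the
    comparison (toy model); identical strings need every position."""
    for i, (a, b) in enumerate(zip(word, candidate), start=1):
        if a != b:
            return i
    return len(word)


print(change_position("wand", "hand"), serial_comparisons("wand", "hand"))  # 1 1
print(change_position("wand", "want"), serial_comparisons("wand", "want"))  # 4 4
```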

Experiment 1: Lexical decision

The results of Grainger et al. (1989) leave open two major questions concerning orthographic-neighbourhood effects in visual word recognition. 1) Does increasing the number of higher-frequency neighbours increase the total interference? 2) Does the position of the letter change between the stimulus word and its higher-frequency neighbour have any effect on interference?

Experiment 1 was designed to address these two questions using the lexical decision task, with which neighbourhood-frequency effects had previously been observed.

Method

Design and stimuli

Stimulus selection was performed using Coltheart et al.'s (1977) definition of an orthographic neighbour. A table of all Dutch 4-letter words and their printed frequencies was created from the Dutch lexical database CELEX. The printed frequencies were calculated from a 44-million-word corpus of Dutch (the INL corpus). The positional bigram frequencies of all Dutch 4-letter words were calculated using a token count averaged over the three bigrams (initial, medial and terminal). Using these bigram frequencies and the printed frequency counts, stimuli were selected for three distinct categories: 1) words with no higher-frequency neighbours (e.g. BIJL); 2) words with only one higher-frequency neighbour (e.g. BOUW); 3) words with more than one higher-frequency neighbour (e.g. BOEG). These three categories represent the three levels of the main experimental factor, which we shall refer to as 'Neighbourhood Frequency'. The stimuli were matched for printed



frequency and bigram frequency across these three categories. Stimulus-Word Frequency was crossed with Neighbourhood Frequency, creating 6 categories of stimuli: 3 containing low-frequency words with a mean frequency of 13 occurrences per million, and the remaining 3 containing medium-frequency words with a mean frequency of 72 occurrences per million. The average bigram frequency for the low- and medium-frequency words was 1774 and 8015 occurrences per million, respectively. A total of 12 Dutch 4-letter words were selected for each of these 6 categories. The position of the letter change between the stimulus word and its higher-frequency neighbour was also controlled in this experiment. It was varied only for the words with one higher-frequency neighbour (category 2): in half of these words the letter change was word-initial, and in the other half it was word-terminal. Seventy-two orthographically legal and pronounceable pseudowords, all four letters long, were constructed for the lexical decision task.
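The positional bigram measure used for matching can be sketched as follows, assuming that each word contributes its corpus frequency (a token count) to the bigram it contains at each position, and that a word's score is the mean over its initial, medial and terminal bigrams. The toy frequencies are invented, and whether the original computation included the word's own contribution is not stated; this sketch includes it.

```python
from collections import defaultdict


def positional_bigram_counts(freq):
    """Token counts per (position, bigram): every word adds its frequency to
    the bigram it contains at each position (0, 1, 2 for 4-letter words)."""
    counts = defaultdict(float)
    for word, f in freq.items():
        for pos in range(len(word) - 1):
            counts[(pos, word[pos:pos + 2])] += f
    return counts


def mean_bigram_frequency(word, counts):
    """Mean of the initial, medial and terminal positional bigram counts."""
    values = [counts[(pos, word[pos:pos + 2])] for pos in range(len(word) - 1)]
    return sum(values) / len(values)


freq = {"hand": 400.0, "band": 60.0, "land": 20.0, "hond": 80.0}  # invented
counts = positional_bigram_counts(freq)
print(mean_bigram_frequency("band", counts))  # (60 + 480 + 560) / 3
```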

Procedure

Stimuli were presented using a Philips P3202 personal computer linked to a Philips P2728-200 monitor. Hardware adjustments were made to measure reaction times to the nearest millisecond. Each trial began with a fixation point in the centre of the screen for 500 ms, which was then replaced by the stimulus (either a word or a pseudoword), presented in lower-case letters and centred on the fixation point. The stimulus remained on the screen until the subject responded 'word' or 'nonword' by pressing one of two response buttons. Subjects were instructed to decide, as rapidly and as accurately as possible, whether or not the letter string was a Dutch word. They responded 'yes' by pressing one response button with the forefinger of their preferred hand and 'no' by pressing the other button with the forefinger of the other hand. Stimulus order was randomized separately for each subject. Subjects were given 40 practice trials (20 words and 20 pseudowords) before the experiment proper. The experiment was completed in a single session lasting approximately 15 minutes.

Subjects

Twenty-five staff members and students of the Institute for Perception Research, Eindhoven, volunteered for this experiment. All were native speakers of Dutch with normal or corrected-to-normal vision.

Results

Mean lexical decision latencies and percent errors for each stimulus category are given in Table 1. The reaction time data were submitted to an analysis of variance with both subjects (F1) and items (F2) as random variables. There was a significant main effect of Neighbourhood Frequency (F1(2,48) = 8.36, p < 0.01; F2(2,66) = 3.83, p < 0.05). The main effect of Stimulus-Word Frequency, however, was significant only by subjects (F1(1,24) = 5.62, p < 0.05; F2 < 1). The interaction between these two factors was not significant (F1 < 1; F2 < 1).

Table 1: Mean lexical decision latencies (ms), with percent errors in parentheses, for the different stimulus categories in Experiment 1.

                                              Stimulus-Word Frequency
                                              low          medium       average
No higher-frequency neighbours                622 (1.7)    610 (3.4)    616 (2.6)
Only one higher-frequency neighbour           649 (4.3)    630 (6.0)    640 (5.2)
More than one higher-frequency neighbour      651 (5.3)    645 (2.7)    648 (4.0)

Planned comparisons were performed on the different combinations of the three levels of Neighbourhood Frequency, independently of Stimulus-Word Frequency. There was a significant effect of the presence of one higher-frequency neighbour (F1(1,24) = 6.83, p < 0.025; F2(1,44) = 4.30, p < 0.05). On the other hand, the slight trend toward a cumulative effect of the number of higher-frequency orthographic neighbours was not significant (F1(1,24) = 1.83; F2 < 1), although latencies to category-3 words were significantly longer than those to category-1 words (F1(1,24) = 13.88, p < 0.005; F2(1,44) = 8.64, p < 0.01). In order to test for the influence of letter-change position on the observed neighbourhood-frequency effects, the interaction between Neighbourhood Frequency and this letter-change factor was examined for categories 1 and 2. This interaction was not significant (F1 < 1; F2 < 1), category-2 words having latencies 26 and 22 ms longer than category-1 words for word-initial and word-terminal change, respectively.

An analysis of the error data indicated trends similar to the reaction time results. There was a significant effect of Neighbourhood Frequency in the analysis by subjects (F1(2,48) = 3.54, p < 0.05), which was not significant by items (F2(2,66) = 2.43). There was no significant effect of Stimulus-Word Frequency (F1 < 1; F2 < 1), and the interaction between these factors failed to reach the standard level of statistical significance (F1(2,48) = 3.17; F2(2,66) = 2.48). Planned comparisons again demonstrated a significant effect of the presence of one higher-frequency orthographic neighbour (F1(1,24) = 8.85, p < 0.01; F2(1,44) = 5.42, p < 0.025) and a nonsignificant effect of increasing the number of higher-frequency neighbours (F1(1,24) = 1.04; F2(1,44) = 1.22).

Discussion

The results of Experiment 1 clearly confirm and extend the neighbourhood-frequency effect observed by Grainger et al. (1989). Longer lexical decision latencies and more errors were observed for words with at least one neighbour of higher frequency than themselves. Because bigram frequency was carefully controlled across the different categories, one can now also conclude that there is no apparent cumulative interference effect of increasing the number of higher-frequency neighbours: the latencies and errors for words with more than one higher-frequency neighbour did not differ significantly from those for words with only one higher-frequency neighbour. This important result contradicts the predictions of serial search models of word recognition and supports the interactive-activation model (McClelland & Rumelhart, 1981, 1988).

In order to provide a more thorough test of the interactive-activation model, simulations were run on all the test words used in Experiment 1. A vocabulary of 4-letter Dutch words was installed, and their resting-level activations were calculated from log frequency such that the range and distribution of these values were comparable to those of the English vocabulary. Predicted reaction times (ms) were obtained from the model's output (number of cycles) using a linear function derived from running the model on a set of 4-letter English words whose lexical decision latencies were known (taken from Coltheart et al., 1977). Predicted and observed values for each of the three Neighbourhood Frequency conditions in Experiment 1 are provided in Figure 1.
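The conversion from processing cycles to milliseconds amounts to fitting a straight line to calibration pairs. A minimal ordinary-least-squares sketch; the calibration data are invented stand-ins for the English words with known latencies, and the slope and intercept actually used in the study are not reported here.

```python
def fit_line(cycles, latencies):
    """Ordinary least-squares fit: latency = intercept + slope * cycles."""
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(latencies) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, latencies))
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented calibration data: model cycles vs. observed latency (ms).
cycles = [18, 20, 22, 24]
latencies = [560, 580, 600, 620]
intercept, slope = fit_line(cycles, latencies)
predicted = intercept + slope * 21  # predicted latency for a word needing 21 cycles
print(intercept, slope, predicted)  # 380.0 10.0 590.0
```

Because the calibration came from a different laboratory, only the pattern of differences between conditions, not the absolute values, would be expected to carry over.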

[Figure 1: predicted and obtained mean lexical decision latencies (ms; vertical axis from 600 to 700) for word categories 1, 2 and 3.]

Figure 1: Predicted and obtained lexical decision latencies to the three categories of word stimuli defined by neighbourhood frequency. Simulations were run on a version of the interactive-activation model implemented with a vocabulary of Dutch 4-letter words.

Absolute values have been adjusted, so the important aspect of Figure 1 is the strikingly similar pattern of differences between conditions in the predicted and observed data. The adjustment of absolute values was necessary because the function relating reaction time to the number of processing cycles was calculated from data obtained in a completely different laboratory setting. The interactive-activation model thus appears to reflect neighbourhood-frequency effects in visual word recognition with considerable accuracy.

The absence of an interaction between the effects of neighbourhood frequency and position of letter change confirms earlier evidence that letter information is extracted in parallel during the visual processing of short words (within a single eye fixation). The results therefore contradict the predictions of Forster's (1976) letter-parsing model and support an activation-based model in which all component letter representations receive sensory input simultaneously (McClelland & Rumelhart, 1981).

One area of research closely related to the question of left-to-right versus parallel processing in word recognition concerns the generation of the phonological code of a written word for its pronunciation. If there is no left-to-right letter parsing, then it seems unlikely that a word's phonology is generated by sequential grapheme-phoneme conversion. This leaves two possibilities: either the phonological code is read out directly from the word's lexical representation (addressed phonology), or its phonology is generated from sublexical units activated in parallel (an alternative form of assembled phonology). If the first hypothesis is correct, one should observe neighbourhood-frequency effects in naming latencies to words.

Experiment 2: Word naming

Method

Design and Stimuli

These remained the same as in Experiment 1, except that no pseudowords were required, so only the word stimuli were presented. It should be noted that the words had been selected so that stimuli were matched for initial phoneme across categories. All the words had regular pronunciations in Dutch.

Procedure

Subjects were required to read aloud, as rapidly as possible, words presented individually on a computer screen. Each word was preceded by a fixation point that remained on the screen for 500 ms. The words were presented in lower-case letters and centred on the fixation point. They remained on the screen until the subject responded. Naming latencies were recorded using a voice key set up with a Philips P3202 microcomputer and an AKG Q15 headset microphone. Timing was accurate to the nearest millisecond.

Subjects

Twenty-five staff members and students of the Institute for Perception Research, Eindhoven, volunteered as subjects for this experiment. None of them had taken part in the previous experiment.

Results

Mean naming latencies in each experimental condition are given in Table 2. These latencies were submitted to an analysis of variance with subjects (F1) and items (F2) as random variables.

Table 2: Mean naming latencies (ms) for the different stimulus categories in Experiment 2.

                                              Stimulus-Word Frequency
                                              low      medium   average
No higher-frequency neighbours                523      500      512
Only one higher-frequency neighbour           518      506      512
More than one higher-frequency neighbour      510      501      506

There was a significant effect of Stimulus-Word Frequency (F1(1,24) = 46.92, p < 0.001; F2(1,60) = 6.94, p < 0.025), more frequent words being named more rapidly than less frequent words. There was, however, no significant effect of Neighbourhood Frequency (F1(2,48) = 2.39; F2 < 1), and this factor did not interact significantly with Stimulus-Word Frequency (F1(2,48) = 2.43; F2 < 1). There was a tendency, however, for the words with several higher-frequency neighbours (category 3) to be pronounced more rapidly than the words with no higher-frequency neighbours (category 1) (F1(1,24) = 4.58, p < 0.05; F2(1,40) = 0.81). In the analysis by subjects this difference interacted with Stimulus-Word Frequency (F1(1,24) = 4.62, p < 0.05), the effect being significant only for low-frequency words (F1(1,24) = 7.47, p < 0.025). Thus the pronunciation of low-frequency words tends to be facilitated if these words have several higher-frequency neighbours. Planned comparisons taking into account the position of the letter change between the stimulus word and its higher-frequency neighbour showed that this factor did not significantly affect the observed absence of a difference in latencies between categories 1 and 2 (all Fs < 1).

Correlational analyses were also computed between the item means and the relevant stimulus- and neighbourhood-frequency values for each word. There was a significant negative correlation between the mean naming latency for each word and the number of orthographic neighbours (r(58) = -0.30, p < 0.02), confirming the results of Gunther and Greese (1985) and Scheerer (1987). As expected from the analysis of variance, naming latencies also correlated significantly with Stimulus-Word Frequency (r(58) = -0.26, p < 0.05). A significant correlation was also observed between the naming latencies and bigram frequency (r(58) = -0.24, p < 0.01). Naming latencies did not, however, correlate with initial bigram frequency (r(58) = -0.03).
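The item-level correlations are Pearson coefficients, and the value in parentheses after r conventionally gives the degrees of freedom, n - 2. A minimal sketch of the computation, including the usual t statistic for testing r against zero; the item values are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


neighbourhood_size = [2, 4, 5, 7, 9]         # invented item values
naming_latency = [530, 522, 515, 510, 505]   # invented item means (ms)
r = pearson_r(neighbourhood_size, naming_latency)
df = len(naming_latency) - 2
print(round(r, 3), df, round(t_statistic(r, len(naming_latency)), 2))
```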

Discussion

The absence of any neighbourhood-frequency effects in the naming task suggests that the naming response is not entirely the result of addressed phonology. If phonology were always read out from a lexical representation, one would have expected neighbourhood-frequency effects comparable to those observed with lexical-decision latencies and eye-gaze durations. This suggests that a word can be named using phonological information that is not directly read out from that word's lexical representation. On the other hand, the fact that stimulus-word frequency effects were obtained with the pronunciation latencies suggests some lexical involvement in word naming. These apparently contradictory results can be accommodated by analogy models of word naming (Glushko, 1979; Taraban & McClelland, 1987). According to this type of model, on presentation of the stimulus 'dish', the simultaneous activation of the representations for dish, wish and fish supports the vowel and final consonant cluster 'ish', which is then synthesized with the initial phoneme 'd' supported by the activation of the representations for dish, disk and dash. This explains the observed correlation between naming latencies and neighbourhood size: the more neighbours there are supporting a pronunciation, the easier it will be to synthesize. It also explains the tendency for low-frequency words with several higher-frequency neighbours to be named more rapidly than words of comparable frequency with no higher-frequency neighbours. The lower the stimulus word's frequency, the greater the possible influence of its higher-frequency neighbours, in that the stimulus word itself will provide relatively less support for its phonological features compared to the support provided by its neighbours.

This explanation of the results of Experiment 2 agrees with the proposed account of the results of Experiment 1. It was shown that the data of Experiment 1 do not support serial search models (Forster, 1976; Paap et al., 1982) of visual word recognition, which predict a cumulative effect of the number of higher-frequency neighbours. The data indicate, rather, that the necessary and sufficient condition for interference to occur is that the stimulus word has at least one higher-frequency neighbour; increasing the number of higher-frequency neighbours does not produce additional interference. This pattern was predicted by simulations run on the interactive-activation model of word recognition (McClelland & Rumelhart, 1981, 1988). The process of mutual inhibition between activated word-level nodes provides an apparently accurate account of the processes whereby word candidates compete for identification. Inhibition does not increase when words have many higher-frequency neighbours because these neighbours also inhibit each other, so that at any given time their activation levels are lower than that of an isolated higher-frequency neighbour. The total inhibition directed toward the stimulus word, being a function not only of the number of activated candidates but also of their activation levels, therefore tends to be comparable in both cases.
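The argument about mutual inhibition can be made quantitative in a deliberately simplified linear setting; this is an assumption for illustration only, since the interactive-activation model itself is nonlinear and includes inhibition from the stimulus word back onto its neighbours. Suppose n higher-frequency neighbours each receive the same bottom-up drive I and inhibit one another with weight g. At steady state each settles at a = I - g(n - 1)a, that is a = I / (1 + g(n - 1)), so the total inhibition they send to the stimulus word is g·n·a. This total grows much more slowly than n, and with g = 1 it does not grow at all. The function name and parameter values are hypothetical.

```python
def total_inhibition(n, drive=1.0, g=0.5):
    """Inhibition reaching the stimulus word from n mutually inhibiting
    neighbours in a linear steady state (toy assumption): each neighbour
    settles at a = drive / (1 + g * (n - 1)) and sends g * a to the target."""
    if n == 0:
        return 0.0
    a = drive / (1 + g * (n - 1))
    return g * n * a


for n in range(5):
    print(n, round(total_inhibition(n), 3))  # 0.0, 0.5, 0.667, 0.75, 0.8
```

In this toy setting the step from zero to one competitor accounts for most of the total inhibition, which is qualitatively in line with the data, but it is not a simulation of the model used in this study.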

Another important aspect of the results of the two experiments reported here is the absence of any letter-positional bias in neighbourhood interference. Neighbours that differed from the stimulus word in their initial letter produced just as much interference as those that differed in their final letter. This is therefore firm evidence against a left-to-right letter-parsing process in word recognition (Forster, 1976). Further evidence against such a position was obtained in Experiment 2, where no correlation was found between naming latencies and initial bigram frequency, although a significant correlation was observed with mean positional bigram frequency.

To conclude, the data presented here support parallel-activation models of visual word recognition, in which information from several letter positions is extracted simultaneously and the word units activated by this information compete for identification. One way of describing these competition processes that successfully accounts for the neighbourhood-frequency effects observed here is in terms of mutually inhibitory links between word representations.

Acknowledgements

None of this work would have been possible without the help of Ad Buijsen, who set up the latency measurement procedures for the response buttons and voice key and also assisted with stimulus selection. I should also like to thank Don Bouwhuis for arranging my visit to IPO and for his comments on an earlier version of this paper. Many thanks to Hans Kerkman (CELEX), who helped me discover the wonders of programming in SQL and installed the table of Dutch 4-letter words. Finally, the simulations were run on a 'Dutch' version of the interactive-activation model implemented by Dominique Beroule.



References

Bouwhuis, D.G. & Bouma, H. (1979) Visual word recognition of three-letter words as derived from the recognition of the constituent letters. Perception and Psychophysics, 25, 12-22.

Brown, G.D.A. (1987) Resolving inconsistency: A computational model of word naming. Journal of Memory and Language, 26, 1-23.

Coltheart, M., Davelaar, E., Jonasson, J.T. & Besner, D. (1977) Access to the internal lexicon. In: S. Dornic (Ed.): Attention and Performance VI. New York: Academic Press.

Forster, K.I. (1976) Accessing the mental lexicon. In: R.J. Wales and E.W. Walker (Eds): New approaches to language mechanisms. Amsterdam: North Holland.

Glushko, R.J. (1979) The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674-691.

Grainger, J., O'Regan, K., Jacobs, A. & Segui, J. (1989) On the role of competing units in visual word recognition: The neighborhood frequency effect. Perception and Psychophysics, in press.

Gunther, H. & Greese, B. (1985) Lexical hermits and the pronunciation of visually presented words. Forschungsberichte des Instituts für Phonetik und sprachliche Kommunikation der Universität München, 21, 25-52.

Massaro, D.W., Taylor, G.A., Venezky, R.L., Jastrzembski, J.E. & Lucas, P.A. (1980) Letter and word perception. Amsterdam: North Holland.

McClelland, J.L. & Rumelhart, D.E. (1981) An interactive activation model of context effects in letter perception, Part 1: An account of basic findings. Psychological Review, 88, 375-407.

McClelland, J.L. & Rumelhart, D.E. (1988) Explorations in parallel distributed processing: A handbook of models, programs and exercises. Cambridge, MA: MIT Press.

Morton, J. (1970) A functional model for memory. In: D.A. Norman (Ed.): Models of human memory. New York: Academic Press.

Paap, K.R., Newsome, S.L., McDonald, J.E. & Schvaneveldt, R.W. (1982) An activation-verification model for letter and word recognition: The word superiority effect. Psychological Review, 89, 573-594.

Scheerer, E. (1987) Visual word recognition in German. In: D.A. Allport, D. Mackay, W. Prinz and E. Scheerer (Eds): Language perception and production: Shared mechanisms in listening, speaking, reading and writing. London: Academic Press, 227-244.

Taraban, R. & McClelland, J.L. (1987) Conspiracy effects in word pronunciation. Journal of Memory and Language, 26, 608-631.

Venezky, R.L. & Massaro, D.W. (1987) Orthographic structure and spelling-sound regularity in reading English words. In: D.A. Allport, D. Mackay, W. Prinz and E. Scheerer (Eds): Language perception and production: Shared mechanisms in listening, speaking, reading and writing. London: Academic Press, 159-179.


INFORMATION ERGONOMICS


Developments

F.L. van Nes

Realization of the importance of applying information-ergonomics principles in the design and development of both professional and consumer products is growing in industrial circles. One of the driving forces behind this realization is technological progress, especially in electronics and information technology. From a technological viewpoint, there are at present few limitations on the functionality that can be built into a consumer product. The real limits to a product's usefulness now often lie in its usability by the people for whom it is made, whose capacity to understand and use it tends to be overtaxed. Part of the product functionality may then remain hidden. Knowledge about the way people think about and deal with such products is therefore increasingly desirable, and is in fact in demand.

Office systems and speech interfaces

Cooperation with Philips' Telecommunication and Data Systems and the Applied Ergonomics Group of Philips' Corporate Industrial Design (CID) on future applications of information technology in offices was intensified. Structured user observations on various types of office systems, as well as detailed expert interviews, served as input for a number of design and development workshops.

Research on electronic annotation systems, now controlled by voice commands, and on spoken versus written instructions continued this year. A new research project focussed on the use of voice commands in speech-to-text conversion systems. This speech-interface research is being carried out within the framework of ESPRIT project no. 385: Human Factors in Information Technology.

Car electronics

Cooperation with Philips' Consumer Electronics (CE) as well as CID in the areas of car radio and car information systems has continued. An example of the 'hidden functionality' mentioned above emerged from interviews with owners of a current car radio: three out of ten persons interviewed did not know that their radio had the so-called autostore function, which, after just one key press, can automatically search for and store the four or five strongest radio transmitters in the area through which the vehicle passes.
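The behaviour of such an autostore function can be sketched as "scan the band, keep the strongest stations". The report does not describe the radio's actual algorithm; the scan results (frequency in MHz mapped to a field-strength reading in arbitrary units) and the preset count are hypothetical inputs for this sketch.

```python
import heapq


def autostore(scan_results, presets=4):
    """Return the frequencies of the `presets` strongest stations found in a
    band scan, strongest first. `scan_results` maps frequency (MHz) to a
    measured field strength; both are hypothetical inputs."""
    strongest = heapq.nlargest(presets, scan_results.items(),
                               key=lambda item: item[1])
    return [freq for freq, _ in strongest]


# Hypothetical scan while driving through one area.
scan = {88.3: 12, 92.6: 40, 95.1: 33, 98.9: 7, 101.2: 51, 104.4: 28}
stored = autostore(scan)
```

The point of the example is how little the user has to know: a single key press triggers the whole scan-and-store sequence, which is exactly why the function can stay hidden from users who never press that key.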

Audio/video equipment

Joint projects with CE and CID have led to several proposals for interactive control of combinations of audio/video equipment. Another example of 'hidden functionality' comes from this area: through lack of knowledge, the majority of users of a teletext system with a memory for favourite page numbers do not use this memory at all. An alternative user interface, tested experimentally, showed improved usability.

IPO annual progress report 23 1988


Multimedia workstations for the office

F.L. van Nes

Abstract

Human factors research was carried out on the application of speech in three areas of man-computer communication: instruction, voice commands for system control, and annotation of documents. As to instruction, learning was found to proceed equally fast with speech and text; a number of subjects preferred speech to text. Secondly, in speech-to-text conversion, subjects preferred voice to manual commands for layout and typographic control, although text input was slower with voice than with manual commands. Thirdly, voice annotations are more readily made than script annotations, but processing times may be longer for voice than for script annotations. In conclusion, speech is a valuable medium for human-computer interaction, provided the applications are carefully chosen and a proper user interface is designed.

Introduction

In contrast with human dialogues, human-computer interaction is still predominantly monomedial: generally a keyboard, i.e. a manual medium, is used for computer input, and the resulting system output is nearly always presented on screen, i.e. a visual medium (Edwards, 1988). In view of this impoverished communication, which still prevails after several decades of computer use, it is understandable that, spurred by technological progress, numerous efforts are now being made to investigate, develop and design multimedia interfaces.

Speech is the first candidate as an alternative or second medium, at both the input and the output side of a computer, because speech is the easiest, fastest and most natural mode of communication between human beings. We therefore welcomed the opportunity provided by the European Community's ESPRIT¹ program to intensify our research on the human factors of speech interfaces, as part of the ESPRIT-HUFIT (Human Factors in Information Technology) 'Office Automation' project. Office tasks are an interesting domain for the application of speech interfaces, in the first place because of their socio-economic importance. This paper reports the results of this research on the use of speech in three areas: (1) provision of information on system control to the user, both before and during task execution: spoken instructions and help messages; (2) application of voice commands for system control purposes; (3) addition of content information, such as comments or criticisms, to other such information presented visually.

Content information is defined here as the variable messages chosen by the user in the application concerned. In contrast, control information is defined as the invariable control messages that are a prerequisite for data to be interchanged between user and computer system (Van Nes, 1987).

To assess its relative value properly, speech was contrasted with an alternative medium in all three areas investigated.

IPO annual progress report 23 1988

¹ European Strategic Program for Research and Development in Information Technology


Spoken Instructions

Definition, research motive and experiments

'Spoken instructions' are understood as all the directions for use that should be studied and learned before use and, where necessary, consulted during use.

Traditionally, such instructional information has always been in printed form, i.e. as text on paper or on an electronic display. But people generally dislike reading bulky printed manuals, whereas the space available for instructional text on a display may be limited, especially during tasks such as word processing. An investigation of speech as an alternative medium for instructions is held to be justified for these and a number of other reasons (Nakatani et al., 1986).

Three experiments on spoken versus written instructions were carried out in this study; two of them have been published so far (Potosnak & Van Nes, 1984; Van Nes, 1987). 'Written' here and in the rest of this paper means displayed on a CRT.

All three experiments showed, in the first place, that a variety of tasks (word processing, electronic mail handling and annotating an electronic text) could in fact be learned with spoken as well as written instructions. Learning was determined by measuring the knowledge gained about operating the system or by directly monitoring performance with it.

Performance

In the word-processing and electronic mail-handling experiments, subjects had to work first with one instruction medium, then with the other. There was a considerable learning effect in both experiments. This effect, measured in task-execution time and number of help messages requested, was greater when spoken instructions were received first, which possibly implies that speech is a better medium for the initial learning of tasks of this type. The word-processing task in particular showed a larger decrement from the first to the second medium experienced by the subjects, in both execution time and number of help messages, when spoken instructions were given first. However, both of these performance measures were themselves also greater when spoken instructions were given first, so there was in any case more scope for improvement.

[Figure 1: bar chart; x-axis: study phases I, A, T]

Figure 1: Average initial times needed by subjects to study an instruction set before answering questions about it by heart (I); average additional study times needed to answer the remaining questions (A); and total study times (T = I + A). Black bars refer to written and hatched bars to spoken instructions.

[Figure 2: bar chart; x-axis: study phases I, T]

Figure 2: Average number of questions about an instruction set correctly answered by heart after an initial study period (I), and average total number of questions correctly answered (T), both after written (black bars) or spoken (hatched bars) instructions.

The annotation-system instructions were on average learned equally fast by subjects who received them in spoken or in written form. This involved a complete set of instructions, with corresponding questions afterwards, pertaining to all aspects of dealing with the annotation system. Learning the instructions is defined here as the ability to answer the questions correctly. Figure 1 shows that when about 20 minutes had initially been spent studying the hierarchically structured instruction sets, a certain fraction of the questions could be answered by heart. After that, a somewhat longer period of study was needed to answer the remaining questions. As regards the duration of both the initial and the subsequent learning periods, there were no significant differences between read and heard instructions.

Not all answers given by heart were correct. Figure 2 shows, firstly, that on average subjects answered about 8 questions correctly by heart after spoken instructions, compared with about 5 after written ones. This difference was not significant, however. Secondly, Figure 2 shows that the total number of questions answered correctly was larger for the subjects who had received spoken instructions than for those who had received written ones. In this case the difference was significant (t-test: α < .05).
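The report does not say which form of t-test was used. Assuming an independent-samples Student t-test (separate groups received spoken (S) and written (W) instructions), the statistic comparing the mean numbers of correctly answered questions would be computed as below, with n_S + n_W - 2 degrees of freedom; the difference counts as significant at the .05 level when |t| exceeds the corresponding critical value.

```latex
t = \frac{\bar{x}_S - \bar{x}_W}{s_p \sqrt{\dfrac{1}{n_S} + \dfrac{1}{n_W}}},
\qquad
s_p^2 = \frac{(n_S - 1)\, s_S^2 + (n_W - 1)\, s_W^2}{n_S + n_W - 2}
```

Here \bar{x}, s^2 and n denote each group's mean, sample variance and number of subjects; none of these values is given in the text.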

Preference

The word-processing and electronic mail-handling experiments allowed subjects to compare spoken with written instructions. For the word-processor tasks, five subjects preferred speech and three preferred text. The reasons given for their respective preferences show that subjects assign different importance to inherent aspects of the two media, such as volatility versus permanence, and therefore express different preferences (Van Nes, 1987). Take, for example, a word-processing command divided into four steps, i.e. four consecutive key presses. With spoken instructions, subjects could look at the keyboard and press the required keys in sequence while listening, without having to look at their screen in the interim, possibly several times, to read the instruction. An interview (Potosnak & Van Nes, 1984) revealed that the subjects of the electronic mail-handling task slightly favoured written instructions. However, those subjects who used the mail system with spoken instructions before they did so with written ones rated it as more interesting, more useful and more fun than subjects who used the mail system with the instruction media in reverse order.

Conclusion

The operation of relatively complicated systems, such as those for electronic text annotation, can be learned from spoken instructions. Learning may proceed just as fast with speech as with text and may possibly be more thorough. Whether speech or text is preferred for learning a task depends on the task as well as on the relative importance that users attach to properties of the instruction media.


Voice commands for system control

Definition, research motive and experiment

Voice commands are an alternative to manual commands. The dominant computer input medium is still the keyboard, but voice input may reduce training requirements and increase input speed compared with typing (Bailey, 1982). Also, typing errors remain a problem for a number of keyboard users (Ogozalek & Van Praag, 1986), whereas it has been shown that subjects may prefer voice to key-press commands, even with a considerable percentage of misrecognitions (Van Nes & Van der Heijden, 1978). With recent progress in speech-recognition technology, voice input now seems feasible, certainly for the limited vocabularies commonly employed for control purposes. However, knowledge about the human factors of this input medium is still limited, so experiments were carried out with voice commands, using real and simulated speech recognition.

Only one experiment will be reported here: one with simulated speech recognition and a large vocabulary of the kind that may be encountered in a speech-to-text conversion system. Simulated recognition has been used before, for instance to determine whether a voice-activated typewriter would be useful in composing letters (Gould, Conti & Hovanyecz, 1983). In principle, such conversion systems involve intertwined content and control speech input, e.g. for the correct spelling of homonyms, to distinguish punctuation marks from text words with the same spelling, or for shifting from lower to upper case. However, the control input may also be given by manual means in such a system, leaving the content input to voice. We investigated both the selection of command buttons on a screen with a mouse-actuated cursor and voice commands for control input, in an experiment where the content input, that is the messages to be converted to text, was always spoken.

Professional secretaries served as subjects in two experimental sessions, one with voice commands and the other with manual commands. They had to read a prepared text word by word from paper, together with a few simple layout commands, e.g. 'centre' and 'new line'. The subjects used commands such as 'text word' if they wanted to escape the default interpretation of words such as 'period'. The experimenter, who sat in a different room from the subject, pressed a single key for every correct input, thus displaying the pre-formed words, punctuation marks, etc. to the subject. 'Cognitive errors' made by the subjects, such as omitting a command like 'text word', were recorded. The simulated speech recognition included simulated recognition errors that had to be noted and acted upon by the subjects.

Performance

The average time per correct text-unit input was significantly lower for voice control than for manual control (see Figure 3). A 'text unit' is defined here as a word or a punctuation mark; a capitalized word (first letter or entire word) is counted as two text units. However, subjects made significantly more errors in the voice-commands part, where error correction took 43% of the total text-input time, than in the manual-commands part, where this percentage was 27. Figure 3 shows that this higher error rate led to a longer overall average entry time per text unit for voice commands. The entry times were high in any case because (1) the entered words were produced and displayed by the experimenter only after he had perceived them, and (2) the subjects had not yet completely mastered their task in the preceding training session. The latter can also be concluded from the rather high number of errors.

[Figure 3: bar chart; x-axis: control, manual (M) and voice (V)]

Figure 3: Total input times spent in a speech-to-text conversion task, split up into correct input (open bars) and erroneous input (hatched bars), both averaged over all subjects and text units, for voice (V) and manual (M) control.

[Figure 4: bar chart; x-axis: level, letter, word and sentence]

Figure 4: Average processing times for script (black bars) and voice (hatched bars) annotations, at letter, word and sentence level.
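The 'text unit' used for the speech-to-text timing results can be made explicit with a small counter. The tokenization is an assumption, since the report only gives the counting rule: a word is taken to be a run of letters or digits, and every other non-space character is a separate punctuation mark.

```python
import re

# Assumed tokenization: words are runs of word characters; every other
# non-space character counts as a separate punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def text_units(text):
    """Count text units: 1 per word or punctuation mark, plus 1 extra for a
    capitalized word (initial capital or entire word in capitals)."""
    units = 0
    for token in TOKEN.findall(text):
        units += 1
        if token[0].isupper():
            units += 1  # the capitalization counts as a second unit
    return units


example = text_units("Dear Sir, thank you.")
```

Under this rule the example sentence counts as 8 units: "Dear" and "Sir" contribute two each, and the remaining two words and two punctuation marks one each.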

Preference

Notwithstanding the substantial difference in error rate, 10 of the 12 subjects preferred the voice mode to the manual mode. One of the reasons given for this preference was: 'it seems to be faster'. In view of the objective results, this judgment must be based either on the time needed for correct text-unit input actions alone, or on an underestimation of the time spent in error correction for spoken commands.

Conclusion

The speech-to-text conversion system tested had a lower overall text-input time with manual than with voice commands. However, the reverse is true when only correct input actions are taken into account.
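The trade-off between faster correct input and more error correction can be quantified from the two percentages reported under Performance. If c is the time per text unit spent on correct input and f the fraction of total input time spent on error correction, the total time per unit is T = c / (1 - f). Voice is then slower overall whenever c_voice / c_manual exceeds (1 - 0.43) / (1 - 0.27), about 0.78. The report does not give the per-unit times themselves, so this sketch only computes that break-even ratio.

```python
def total_time_per_unit(correct_time, error_share):
    """Total time per text unit when a fraction `error_share` of all input
    time is spent on error correction: T = c / (1 - f)."""
    if not 0.0 <= error_share < 1.0:
        raise ValueError("error_share must lie in [0, 1)")
    return correct_time / (1.0 - error_share)


def break_even_ratio(error_share_voice, error_share_manual):
    """Largest c_voice / c_manual ratio for which voice control is still not
    slower overall than manual control."""
    return (1.0 - error_share_voice) / (1.0 - error_share_manual)


ratio = break_even_ratio(0.43, 0.27)  # about 0.78
```

Read this way, the two results are consistent: correct voice input was faster, but not by the roughly 22% that would have been needed to offset the extra error-correction time.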

A considerable majority of subjects preferred the voice commands to manual ones. Two factors have to be taken into account here, however: the subjects were already using speech for content input in the conversion system in any case, and they were not faced with (simulated) misrecognition of the same utterance over and over again, as may happen in real recognition systems. In our system, when a 'misrecognition' occurred, the next text unit entered was 'correctly recognized'.
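The error-injection rule of the simulated recognizer can be sketched as follows, reading "the next text unit entered" as the subject's re-entry of the misrecognized unit. The error rate and random seed are hypothetical parameters; the report does not state the misrecognition rate used.

```python
import random


def simulate_entry(units, error_rate, seed=0):
    """Return, for each text unit, the recognition outcomes the subject saw.

    A unit is misrecognized with probability `error_rate`; the subject then
    re-enters it and the re-entry is always 'recognized', so no utterance is
    misrecognized twice in a row. Parameters are hypothetical.
    """
    rng = random.Random(seed)
    log = []
    for unit in units:
        if rng.random() < error_rate:
            # Simulated misrecognition, followed by a successful re-entry.
            log.append((unit, ["misrecognized", "recognized"]))
        else:
            log.append((unit, ["recognized"]))
    return log


trace = simulate_entry(["Dear", "Sir", ","], error_rate=0.2, seed=1)
```

This is the property that distinguishes the simulation from real recognizers: each unit costs at most one correction, whereas a real system may fail on the same utterance repeatedly.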


Voice annotations on texts

Definition, research motive and experiments

The presentation of text on computer-driven displays provides a number of extra options compared with print on paper. For instance, notes to the text may be added, by the original writer or by others, and stored separately so that they can be displayed together with the text or apart from it. Furthermore, these notes may be in written or spoken form, as typed or spoken annotations, respectively.

Voice may be a more suitable medium than text for some types of annotation. For example, the author found that voice annotations on scientific manuscripts were an effective means of transferring long and/or subtle comments, even from several annotators who might disagree. This is partly because, through its temporal structure and intonation, a spoken message conveys more information than a written one with the same wording. However, speech messages have their limitations (Bailey, 1982; Van Nes, 1982; Aucella et al., 1987), hence a study of the relative merits of voice and text annotations seemed desirable, both from the point of view of the producer or sender of the annotations and from that of their consumer or receiver.

Three experiments were carried out, one on producing and two on receiving both typed and spoken annotations.

Performance

Production. When subjects were given a free annotation task in which they had to use either text or voice, they made about the same number of text and voice annotations, but in the voice mode about twice as many words were used to convey approximately the same information. Making text annotations took almost three times as long as making voice annotations, the difference being caused by the differing production times of typing and speaking and by specific technical properties of the respective interfaces (Van Nes, 1987).

Reception. In a first experiment on receiving and subsequently processing rather complex, partly conflicting annotations from four different persons, two male and two female, both annotation types had roughly equal processing times. In a second reception experiment, designed with all the findings from the first one in mind, the annotation tasks consisted of typical secretarial correction work. One male person made all the annotations; 16 female subjects, all professional secretaries, had to process them. At the level of letter and word corrections, both annotation types were processed equally fast. For corrections involving whole sentences, text annotations led to significantly shorter overall processing times (defined as the period between selection of an annotation and selection of the following annotation), as may be observed in Figure 4. This is probably due to the need to replay long voice annotations in order to process them, i.e. to the memory limitations of the subjects.
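The processing-time measure defined above can be computed from a log of annotation-selection times, as in the sketch below. The log format (ascending selection times in seconds) is an assumption. Under this definition the last annotation in a session has no following selection and therefore no processing time, unless an end-of-task timestamp is appended.

```python
def processing_times(selection_times):
    """Processing time per annotation: the interval between selecting one
    annotation and selecting the next (times in seconds, ascending)."""
    return [later - earlier
            for earlier, later in zip(selection_times, selection_times[1:])]


# Hypothetical selection log for one subject.
log = [0.0, 12.5, 30.0, 41.0]
times = processing_times(log)
```

Because the interval runs until the next selection, it includes any replaying of a voice annotation, which is where the sentence-level disadvantage of voice would show up.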

Preference

Production. Of the 12 subjects who had to produce text as well as voice annotations, 8 preferred the voice version for a variety of reasons, for example: 'because it is faster' or 'complete sentences are used more easily'. The 4 remaining subjects preferred the text version for a variety of other reasons, for example: 'when typing after reading (the text to be annotated) one stays in the same mental framework', or 'it is easier to make changes'.


Reception. In the first experiment of this type, where complex annotations from four persons had to be processed, 3 subjects preferred voice, 6 preferred text and 5 had a mixed preference or none at all. A rather striking result was that the subjects were clearly influenced by the perceived authority of the respective annotators; they did not really know what to do with annotations that were formulated as well as spoken in a doubtful manner, for instance: 'In my opinion this may be viewed as ...'. On the other hand, some subjects explicitly objected to being ordered, so to speak, to change the text in a certain way. This was especially true when they did not know the annotator concerned.

In the second experiment, preference for voice or text tended to depend on the level of the annotations to be processed. At letter level, 7 subjects preferred voice, 3 preferred script and 8 had no preference. This picture gradually changed at the higher text levels: at word level, 5 subjects still preferred voice against 6 who preferred script, with 7 having no preference. But at sentence level, only 3 subjects (conditionally) preferred voice ('provided that the speaking rate is slowed down'), whereas 12 preferred script and 3 had no preference. Thus, overall preference tended to be for text when annotations from somebody else had to be processed.

Conclusion

From the point of view of the annotator, voice appears to be the more efficient medium, because voice annotations took less time to make than text annotations, even though they were longer. Two-thirds of the annotators also preferred the voice version. For receiving and processing annotations the picture is more or less reversed. With regard to performance, the advantage of voice has disappeared, as total processing times are the same or, for corrections involving sentences, even longer in the voice mode. This is reflected in the preference scores: taking all results from both reception experiments together, roughly three quarters of the subjects preferred text to voice. Making as well as processing voice annotations in more complex situations, such as writing or refereeing a scientific manuscript, needs to be investigated systematically, because some evidence suggests that voice is especially useful there.
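The report does not show how the "roughly three quarters" was obtained. One reading that reproduces it, assumed here, counts only subjects who expressed a preference and takes the sentence-level counts for the second reception experiment. Including subjects with a mixed or no preference would instead give 18 of 32.

```python
from fractions import Fraction

# (prefers voice, prefers text) counts quoted for the two reception
# experiments; subjects with a mixed or no preference are left out, and the
# second experiment is taken at sentence level (both are assumptions).
first_experiment = (3, 6)
second_experiment_sentence_level = (3, 12)

voice = first_experiment[0] + second_experiment_sentence_level[0]
text = first_experiment[1] + second_experiment_sentence_level[1]
share_text = Fraction(text, voice + text)  # 18 of 24 stated preferences
```

This yields exactly three quarters, but it is only one way of combining the counts; other ways of combining them would give a different proportion.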

Discussion

In general, the foregoing data are not unfavourable to the application of speech in human-computer interaction. An interesting aspect of the voice-preference data is that people vary in their positive or negative evaluation of inherent properties of speech, such as accentuation and volatility, and therefore judge its value differently relative to other input and output media. In view of the fact that speech interfaces, although available for many years, are only very slowly gaining ground (Aucella et al., 1987), prudence seems justified when generalizing our research results to practice. Moreover, in practical environments other factors not investigated here may be important, for instance disturbing others with audible speech to or from a machine. However, it seems clear that speech can be a valuable medium for computer input as well as output in appropriate applications with a proper user interface. If speech is used to complement manual input and visual output while making a distinction between different types of information, full advantage may be taken of the favourable properties of such a speech channel.


Acknowledgements

The work reported in this paper was carried out by a team in which all members contributed to the results. Many thanks are extended to Leo Beuk, Jan Douma, Joe Hary, Theo de Jong, all the subjects, and especially to Wessel Kraaij, Luisella Kraak, Piet van Lingen, Martijn de Loor, Henk Sprenkels and Elly van Veghel, who carried out the experiments from which results were quoted.

References

Aucella, A., Kinkead, R., Schmandt, C. & Wichansky, A. (1987) Voice: technology searching for communication needs. Proceedings CHI'87 Human Factors in Computing Systems and Graphics Interface (Toronto, April 5-9, 1987). New York: ACM, 41-44.

Bailey, R.W. (1982) Human Performance Engineering. Englewood Cliffs, New Jersey: Prentice-Hall Inc., 304.

Edwards, A.D.N. (1988) The design of auditory interfaces for visually disabled users. Proceedings CHI'88 Human Factors in Computing Systems (Washington D.C., May 15-19, 1988). New York: ACM, 83-88.

Gould, J.D., Conti, J. & Hovanyecz, T. (1983) Composing letters with a simulated listening typewriter. Communications of the ACM, 26(4), 295-308.

Nakatani, L.H., Egan, D.E., Ruedisueli, L.W., Hawley, P.M. & Lewart, D.K. (1986) TNT: a talking tutor 'N' trainer for teaching the use of interactive computer systems. Proceedings CHI'86 Human Factors in Computing Systems (Boston, April 13-17, 1986). New York: ACM, 29-34.

Nes, F.L. van & Heijden, J. van der (1978) The use of computers by ordinary people. IPO Annual Progress Report, 13, 102-107.

Nes, F.L. van (1982) Perceptive, cognitive and communicative aspects of data processing equipment. Proceedings 1982 International Zurich Seminar on Digital Communications - Man-Machine Interaction (Zurich, March 9-11, 1982), 259-262.

Nes, F.L. van (1987) Human factors engineering of interfaces for speech and text in an office environment. Proceedings 4th Annual ESPRIT Conference (Brussels, September 28-29, 1987), 1452-1457.

Ogozalek, V.Z. & Praag, J.C. van (1986) Comparison of elderly and younger users on keyboard and voice-input computer-based composition tasks. Proceedings CHI'86 Human Factors in Computing Systems (Boston, April 13-17, 1986). New York: ACM, 205-211.

Potosnak, K.M. & Nes, F.L. van (1984) Effects of replacing text with speech output in an electronic mail application. IPO Annual Progress Report, 19, 123-129.


COMMUNICATION AIDS


Developments

H.E.M. Melotte

Speech for the handicapped

Pocketstem

This project, aimed at the development of a pocket-size communication aid for the vocally handicapped (Waterham), was formally rounded off in August this year with an evaluated basic model of the Pocketstem (Deliege et al., this issue). An interested industrial partner has already been contacted to develop the basic model into a commercially available aid. Current work mainly concerns the development of alternatively operated versions (Bierens).

Tiepstem

Since the current portable keyboard-to-speech communication aid 'Tiepstem' is apparently a good first step towards a useful aid for people with a vocal handicap, we decided on further development in cooperation with an industrial partner. Thanks to financial support (collective demand), this work can be started in January 1989 (Deliege et al., this issue). Additionally, a software program has been developed for easy operation of the Tiepstem.

Aids for the visually handicapped

Optical magnifiers for reading

After several investigations into the influence of retinal magnification and of the horizontal and vertical reading-field width of loupes on reading performance, the practical aspects of the use of loupes have been examined. A group of 10 partially sighted subjects (visual acuity of up to 0.1) rated 35 commercially available hand-held magnifiers for their practical usefulness. These magnifiers differed in focal distance, dimensions, shape and weight. The results of this work are being prepared for publication (Neve). We intend to round off the project by publishing the main results in a form that is easily accessible and comprehensible to manufacturers, distributors, prescribers and users of optical magnifiers.

Task illumination

After two literature studies on possible links between eye diseases and the intensity and spectral distribution of light, the first phase of this project has started with pilot experiments to determine the effect of luminance, contrast and type size of nonsense text on search performance. To investigate a possible wavelength dependency, experiments with different narrow wavelength bands are in preparation (Van Heijnsbergen).

IPO annual progress report 23 1988


Advice

Much attention was paid this year to preparations for the establishment of a Visual Advice Centre in Eindhoven. Part of the training of two future employees of the centre was carried out at our institute. We hope that these efforts will lead to an improvement of the technical aspects of low-vision care (Neve, Jorritsma).

Other communication aids

On the basis of users' experiences with a 1977 version of a head-controlled writing apparatus for motorically handicapped people, a software program has been developed with which an Atari PC can be operated by means of a generally available sonar head-control device. A first evaluation of this system has started with people suffering from multiple sclerosis and amyotrophic lateral sclerosis (Heuvelmans).


Realization and evaluation of two speech communication aids

R.J.H. Deliege, I.M.A.F. Speth-Lemmens* and R.P. Waterham (in alphabetical order)

* Institute for Rehabilitation Research, Hoensbroek.

Abstract

This paper discusses two projects exploring the possibilities of speech communication aids for the speech impaired. Because the speech-impaired population is very diverse, the two projects differ in target group, in the speech technology used and, consequently, in the complexity of the input the aids require. Experimental models were realized in both projects (Tiepstem and Pocketstem), and prototypes in one of them (Pocketstem). All models were evaluated with potential users. This paper describes the aids, their evaluation, the evaluation results and, finally, some conclusions and future plans.

Introduction

For some years, research has been going on at our institute into the application of synthetic speech in aids for the handicapped. Two projects investigate the possibilities of a speech communication aid using synthetic speech (Deliege & Waterham, 1986). These projects differ in the speech technology used (and consequently in the complexity of the input and the speech quality) and in their target groups. One project uses speech resynthesis in an attempt to realize an aid that is very easy to operate but has a limited vocabulary (the 'Pocketstem'). This aid should be useful for speech-impaired people who have additional handicaps (e.g., motor or language handicaps). The other project uses speech synthesis (diphone concatenation) in an attempt to realize an aid with an unlimited vocabulary (the 'Tiepstem'). This aid will necessarily require some linguistic and typing skills on the part of the user.

Because of the complex nature of the problem (developing aids for the handicapped) and the human-factors aspects involved, the assistance of potential users (and their therapists) is required. This assistance is used both in formulating the user demands for our devices and in evaluating the realized models. The aids were developed at the Institute for Perception Research (IPO), and the evaluation was coordinated by the Institute for Rehabilitation Research (IRV).

The field evaluation of the Pocketstem and the Tiepstem was recently finished. After a short description of both devices, this paper presents the evaluation and its results. Finally, the future plans for both projects are explained.


Functional description

Pocketstem

The Pocketstem is a speech communication aid whose speech output has a capacity of 28 messages (Waterham & Verhoeven, 1987). The messages were read aloud by a speaker, recorded, coded to lower the data rate and stored. Participants in the evaluation can select these messages in advance from an available set of over 200 messages, supplemented where necessary with personal messages. Operating the Pocketstem consists of selecting the desired message; the selected message is made audible by pressing a key. To facilitate selection, the keys are labelled with pictograms specially designed for this purpose. Every message is stored in two different versions (e.g., with different intonation). When a key is pressed twice, the message is spoken differently the second time, in order to improve intelligibility. The audio volume is adjustable by means of a switch at the bottom of the Pocketstem.
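The selection behaviour just described can be modelled as a small state machine. The sketch below is a hypothetical illustration: the class and method names are ours, and two details are assumptions not stated in the report, namely that a third press of the same key returns to the first version and that pressing a different key always starts with the first version.

```python
N_KEYS = 28  # message capacity stated for the Pocketstem


class MessageBank:
    """Hypothetical model of Pocketstem message selection.

    Each key holds two recordings of one message (e.g. with different
    intonation). A repeated press of the same key plays the other version.
    """

    def __init__(self, messages):
        # messages: list of (version_a, version_b) pairs, one per key
        if len(messages) > N_KEYS:
            raise ValueError(f"at most {N_KEYS} messages can be stored")
        self.messages = list(messages)
        self._last_key = None
        self._last_version = 1

    def press(self, key):
        """Return the recording that is played when `key` is pressed."""
        if not 0 <= key < len(self.messages):
            raise IndexError(f"no message stored on key {key}")
        if key == self._last_key:
            version = 1 - self._last_version  # repeat press: other version
        else:
            version = 0  # new key: start with the first version
        self._last_key, self._last_version = key, version
        return self.messages[key][version]
```

For example, with `MessageBank([("a1", "a2"), ("b1", "b2")])`, pressing key 0 three times yields `a1`, `a2`, `a1`.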

Tiepstem

The Tiepstem is a speech communication aid using limited text-to-speech conversion. Input is via a QWERTY keyboard and has to be in a pseudophonetic notation (e.g., 'poolietsie' instead of 'politie'). The typed message is made audible when the key labelled 'spreek' (speak) is pressed. A liquid crystal display shows what is typed. To facilitate input, a simple text editor has been incorporated, allowing characters in messages already typed to be changed, added or deleted (Deliege, 1986). Numbers need not be spelled out but can be entered by means of number keys. A synthetic intonation contour is generated on the basis of the punctuation marks and a special accent mark which the user has to provide in the input sentence (Menting, 1984). These intonation contours add to the naturalness and intelligibility of the synthetic speech. The user also has the option of storing frequently used sentences (or parts of sentences) under an arbitrary alphanumeric character sequence, which can increase communication speed. The user can display these stored sentences and delete them when they are no longer needed.
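Functionally, the sentence memory is an abbreviation table. A minimal sketch, assuming that a stored code is expanded wherever it occurs as a separate alphanumeric word in the typed input; the report does not say how codes are delimited or recalled on the device, and the `PhraseStore` class is hypothetical.

```python
import re


class PhraseStore:
    """Hypothetical abbreviation memory in the spirit of the Tiepstem feature."""

    def __init__(self):
        self._phrases = {}  # code -> stored (pseudophonetic) text

    def store(self, code, text):
        if not (code.isascii() and code.isalnum()):
            raise ValueError("code must be an alphanumeric character sequence")
        self._phrases[code] = text

    def delete(self, code):
        self._phrases.pop(code, None)

    def listing(self):
        """Stored sentences sorted by code (the 'display' function)."""
        return sorted(self._phrases.items())

    def expand(self, typed):
        """Replace every whole word that is a stored code by its text."""
        return re.sub(r"[A-Za-z0-9]+",
                      lambda m: self._phrases.get(m.group(0), m.group(0)),
                      typed)
```

Because the expansion is a single pass, expanded text is not expanded again; under this assumption a code that coincides with an ordinary word would always be replaced, so short codes should be chosen with care.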

Evaluation

General

Both the Pocketstem and the Tiepstem have been evaluated in the field to obtain information about their technical durability, usefulness and acceptability. The evaluations were carried out by lending the devices to various potential users all over the country. For uniformity and comparability of the results, questionnaires were used to guide the collection of the evaluation data (Speth-Lemmens & Oostinjen, 1986a, 1986b). The results are divided into five categories:

1. Review of usefulness:
   - Technical usefulness (e.g., malfunctions, battery capacity);
   - Practical usefulness (e.g., operation, intelligibility, portability);
   - Personal usefulness (e.g., communication speed, frequency of use);

2. Impressions as to application (e.g., attracting attention, taking initiative);

3. Survey of the selection criteria of the users (e.g., physical abilities, cognition);

4. Therapy aspects (e.g., training phase, determination of the vocabulary);

5. Survey of environmental reactions (e.g., reactions to speech output).

The participants in the evaluation were selected so as to represent a wide variety of diagnoses, ages, physical abilities and communication handicaps.

Pocketstem

An early experimental model of the Pocketstem, called the 'Compacte Spraakhulp', was the subject of a short evaluation in 1985 (Oostinjen et al., 1985). The results of this evaluation were used in the design and development of the Pocketstem. For the evaluation of the Pocketstem, 5 specimens were available for a period of 8 months. In all, 24 persons with a temporary or permanent communication handicap used the Pocketstem, for periods ranging from 1 day to 24 weeks (see also Table 1).

Table 1: Survey of participants in the evaluation of the Pocketstem and the respective periods of use.

Participants                                        Periods of use (weeks)
5 cerebral paresis                                  24, 13, 12, 7 and 2
3 cerebral vascular accident                        6, 5 and 3
2 cerebral contusion                                23 and 10
1 laryngectomy                                      7
group of 13 medium-care/intensive-care patients     15, ranging from 1 day to several weeks per person

After this formal evaluation period, the evaluation was continued, using the same procedure, whenever potential users were available. Furthermore, after the formal evaluation period the first specimens of the Pocketstem with a new speech synthesizer (Philips PCF8200) became available, which made a Pocketstem with a female voice possible. These specimens (some with a female voice) were also made available for use in practice. The users of the Pocketstem after the formal evaluation period are summarized in Table 2.

Tiepstem

For the evaluation of the Tiepstem, two specimens were available for a period of 3 months. During this period 8 persons used the Tiepstem, for periods ranging from a few days to 8 weeks (see also Table 3). For this evaluation we tried to select subjects who had at least the physical and cognitive abilities required to operate this aid.


Table 2: Survey of participants in the evaluation of the Pocketstem and the respective periods of use (after the formal period).

Participants                                          Periods of use (weeks)
1 cerebral paresis                                    40 (1)
1 cerebral vascular accident                          16
2 cerebral contusion                                  28 (1) and 67 (1)
group of ~30 medium-care/intensive-care patients      17 (1, 2), ranging from 1 day to several weeks per person

(1) Still in use on 1-8-88.
(2) Time used per Pocketstem, one with a male voice, one with a female voice.

Table 3: Survey of participants in the evaluation of the Tiepstem and the respective periods of use.

Participants                          Periods of use (weeks)
4 cerebral contusion                  1, 1, 2, 8
1 cerebral vascular accident          4
2 laryngectomy                        1, 8
2 post-operative patients             1, 1

Evaluation results and conclusions

Pocketstem

The results of the formal evaluation period (Table 1) are summarized in an evaluation report (Speth-Lemmens, 1987a). The variety of participants and the nature of the evaluation (including personal observations) make it impossible to draw statistically valid conclusions. The results presented are therefore based both on the answers given by the users and on our own impressions. The general results of the evaluation are as follows.

Various groups of speech-impaired persons (e.g., with cerebral contusion or mentally disabled), their therapists and their environment consider this aid useful for communication. Persons with a temporary communication handicap (intensive-care/medium-care) reacted positively to this possibility of expressing something quickly. This was especially true during the first phase of a temporary speech handicap, which is often a difficult phase. The Pocketstem was found to provide a good stimulus to communication. Persons with a permanent communication handicap mostly used the Pocketstem together with a communication aid already in use (e.g., Sharp EL-7100, Canon communicator, Bliss board or the 'Taalzakboek'). In general, reactions to this aid were enthusiastic.

From the evaluation we can draw the following conclusions:

• The Pocketstem in its present form is useful for various groups of speech-impaired people.

• Expanding and/or updating the set of messages is desirable for some users.

• Alternative input methods are desirable for some users.

• A female or child's voice is wanted by some of the users.

We conclude that the Pocketstem satisfies the needs of a considerable number of potential users.

Tiepstem

The results of the evaluation of the Tiepstem are also summarized in an evaluation report (Speth-Lemmens, 1987b). As intended, this evaluation gives a first impression of the possibilities of the Tiepstem. In general, the speech synthesis was received with enthusiasm. However, the following points were noted during the evaluation:

• Operation of the aid requires a high level of physical ability (for operating a keyboard).

• The pseudophonetic input requires knowledge of this spelling and explicit awareness of Dutch pronunciation.

• The input was often misspelled, partly for the reasons mentioned above. This caused less intelligible speech and required much editing.

Future plans

In an attempt to combine the advantages of the Pocketstem and the Tiepstem, a communication link between the two aids has been realized. This makes it possible to load the Pocketstem with messages prepared on the Tiepstem. The advantage of this link is more flexible programming of the Pocketstem: a Pocketstem user can have his message set updated easily, for instance to accommodate changing daily needs. In this case the programming has to be carried out by someone other than the user, since the user is not able to operate the Tiepstem. Another application of the link is for users who are capable of operating the Tiepstem but do not achieve a sufficiently high communication speed. They may use the Tiepstem to prepare their messages and the Pocketstem to reproduce them; for reproduction, the Pocketstem is still faster and easier than the memory facility implemented in the Tiepstem. Using the Tiepstem as a loading device for the Pocketstem implies, however, that the speech produced by the Pocketstem will have the quality of the speech-synthesis technique. An evaluation still has to be carried out to see whether this new facility indeed offers the advantages we expect.

As far as the Pocketstem alone is concerned, we have already concluded that the current device is useful for a considerable number of users. An industrial enterprise has therefore been contacted to make it a commercial product.

A prototype series will first be produced and the devices evaluated in order to make a good estimate of the market for them. An aspect that deserves special attention when turning the Pocketstem into a commercial product is the way in which the set of messages will be provided. Besides the attempt to commercialize the basic model of the Pocketstem, research is continuing to realize different versions of the device adapted to the needs of specific sections of our potential user group. For instance, a one-key operated version will allow speech-impaired people with severe motor handicaps to operate the Pocketstem, and a version using a symbol language (such as Bliss or Dominolex) as input will allow users familiar with that language to select a message from an extended set of messages with the help of only a limited number of keys.

The evaluation results of the Tiepstem showed ample room for improvement. A new project will therefore be started in which a new model of this kind of aid will be developed and evaluated. The new model will incorporate full grapheme-to-phoneme conversion software, so that input can be in normal Dutch spelling. An effort will also be made to design a more user-friendly input system. The speech-synthesis technique will be brought up to date (e.g., a new speech synthesizer, durational control). An industrial partner will participate in this project to facilitate making the aid commercially available.

References

Deliege, R.J.H. (1986) Technische beschrijving van de Tiepstem. IPO Report 548.

Deliege, R.J.H. & Waterham, R.P. (1986) Application of speech synthesis and resynthesis in two speech communication aids. IPO Annual Progress Report, 21, 110-115.

Menting, P.G. (1984) Towards a keyboard-to-speech system. IPO Annual Progress Report, 19, 42-45.

Oostinjen, E., Balkom, H. van & Soede, M. (1985) Evaluatie 'Compacte Spraakhulp I.P.O.'. IRV internal report, October 1985.

Speth-Lemmens, I.M.A.F. & Oostinjen, E. (1986a) Evaluatie 'Compacte Spraakhulp', evaluatielijsten. IRV internal publication.

Speth-Lemmens, I.M.A.F. & Oostinjen, E. (1986b) Evaluatie 'Tiepstem', evaluatielijsten. IRV internal publication.

Speth-Lemmens, I.M.A.F. (1987a) Evaluatie Pocketstem. IRV internal report, IRV/8 doc.(87).

Speth-Lemmens, I.M.A.F. (1987b) Evaluatie Tiepstem. IRV internal report, IRV/1 doc.(87).

Waterham, R.P. & Verhoeven, M.W.C. (1987) Technische beschrijving van de Pocketstem. IPO Report 606.


INSTRUMENTATION AND SOFTWARE


Developments

L.F. Willems

The staff of our instrumentation group has changed considerably. At the beginning of this year Jan Tiesinga, who specialized in microprocessor-controlled equipment, left our lab. At the end of the year Gerard Moonen retired from our lab after many years of dedicated service; his speciality was the design of computer interfaces. So far one new member has joined our group: Klaas de Graaf, who has experience in the design of electronic equipment, mainly analogue circuits.

Our computer installation has not changed considerably since the description in last year's Progress Report. Having adequate computer facilities available for the laboratory is a matter of constant concern. On the one hand, one is tempted by the many novelties on the market; on the other, it is important to avoid the excessive turbulence that new products or new software can cause with their attendant growing pains.

Our instrumentation efforts are mainly directed towards the design of interfaces between computers and the equipment used in perception experiments. Such interfaces are often not commercially available. In this issue, some devices used in a psychoacoustic experimentation set-up and controlled through an IEEE bus are described.

We have also tackled the difficult problem of developing controllable light sources. Visual stimulus presentation and tachistoscopic experiments very often require light sources that can be switched on and off in a controlled way, with transition times of milliseconds, while providing considerable luminance levels. We are investigating two possibilities: a CRT system based on tubes used in projection TV, and a system based on fluorescent gas-discharge tubes with current control.


A new experimentation set-up for psychoacoustic research

J.G. Jonker

Abstract

A new IEEE-488 controlled experimentation set-up for psychoacoustic research has been put into use. Three devices of this set-up were developed in-house, since no suitable ones were commercially available. The 'Parallel Timers' and the 'Fourth-dB Attenuators' were developed to manipulate the stimuli presented to the subjects. The first device is a set of four timer modules which produce programmable pulses to an accuracy of 0.1 ms. They are used to enable and disable audio switches and to control events in the stimulus presentation. The 'Fourth-dB Attenuators' provide programmable attenuation of stimuli in 0.25-dB steps to an accuracy of 0.1 dB.

Another part of the set-up is the 'Reaction Recorder', a device that collects reactions, as well as reaction times, from subjects. One recorder can handle a maximum of eight subjects.

The experimentation set-up

The purchase of a new computer to control psychoacoustic experiments was the main reason for developing a new set-up. The new computer is a MicroVAX with an IEEE-488 interface bus for communication with peripherals. The main purpose of the new set-up is to control complete experiments from one host via this IEEE-488 interface. The set-up consists of equipment for generating acoustic stimuli, such as programmable oscillators; devices for manipulating the stimuli, such as filters and attenuators; and interfaces to the subjects for collecting responses (see Figure 1). All devices are programmable via the IEEE-488 interface.

Figure 1: The experimentation set-up (blocks: stimulus generator, stimulus manipulator, subject, subject-reaction interface and experiment-control host, linked by the IEEE-488 bus)

Three devices of the set-up had to be developed by the instrumentation group, since they were not commercially available. The 'Fourth-dB Attenuators' and the 'Parallel Timers' are needed for manipulating the acoustic stimuli, while the 'Reaction Recorder' provides an 'intelligent' IEEE-488 interface to the subjects.


The Fourth-dB Attenuators

This is a set of four programmable audio attenuators with one common IEEE-488 interface. The required specifications are an attenuation range of 60 dB or more, with a step size of 0.25 dB and an accuracy of 0.1 dB.

Digitally controlled attenuators

Since the attenuators have to be controlled via the IEEE bus, a circuit with a digitally controlled attenuator has to be used. All available solutions use multiplying DACs. These are usually voltage-to-current converters, used in combination with an op-amp and a feedback resistor acting as a current-to-voltage converter (see Figure 2). In the case of a 16-bit multiplying DAC, the attenuation of the circuit is

$$\frac{U_o}{U_i} = \frac{L}{65536}, \qquad L = \text{input code}, \quad 0 \le L \le 65535$$

Figure 2: The diagram of a linearly coded digital attenuator

A basic problem of this approach is that multiplying DACs are linear, whereas audio signals call for attenuation in equal dB steps. Some manufacturers supply logarithmically coded multiplying DACs, but none of these has a smallest step of 0.25 dB and a range of 60 dB. Translating every 0.25-dB step into a linearly coded attenuation and using a linearly coded multiplying DAC is no solution either, because the resulting accuracy is too low.

Att.       Single 16-bit DAC                  12-bit combination
           L       Approx. (dB)  Acc. (dB)    K      L      Approx. (dB)  Acc. (dB)
0.00 dB    65535   -0.0001       0.0003       102    4061   +32.001       0.044
0.25 dB    63677   -0.2498       0.0001       104    4023   +31.750       0.043
0.50 dB    61870   -0.5000       0.0001       108    4061   +31.500       0.041
...
63.50 dB   44      -63.461       0.24         4059   108    -31.500       0.041
63.75 dB   43      -63.660       0.29         4023   104    -31.750       0.043
64.00 dB   41      -64.074       0.29         4061   102    -32.001       0.044

Table 1: The codes, approximation and accuracy of the two attenuator circuits

The two most important sources of error of the attenuator of Figure 2 are the finite accuracy of the linearly coded DAC, usually expressed in multiples of one least significant bit (1 LSB), and the error introduced by rounding off to the nearest integer code word. These constant absolute errors lead to relative errors that are too high for large attenuations. Table 1 shows the codes for the attenuations between 0 and 64 dB for a 16-bit multiplying DAC. A 16-bit multiplying DAC with 1-LSB accuracy is therefore not suitable for an attenuator circuit with a 64-dB range and 0.1-dB accuracy: near 64 dB its error reaches 0.29 dB.

Instead of using a multiplying DAC with higher accuracy, we preferred an alternative solution, whose principle is shown in Figure 3. This attenuator was used in an earlier development (Dobek, Van Nes & Willems, 1979) and consists of two 12-bit multiplying DACs, one used as an attenuator and the other as programmable feedback. The attenuation of this circuit is the quotient of two attenuations. An additional advantage is that the total range is doubled.

$$U_o = c \cdot U_i \cdot \frac{L}{4096} \cdot \frac{4096}{K} = c \cdot U_i \cdot \frac{L}{K}$$

Figure 3: The circuit of the logarithmic attenuator

The absolute error introduced by round-off can be largely eliminated by an adequate selection of the numerator and denominator for a given ratio.

Higher code words are preferable because they achieve a higher accuracy. The lowest accuracy is obtained for -32 dB of attenuation or +32 dB of gain (see Table 1). The multiplying DACs used are 12-bit ones with an accuracy of 1/4 LSB. With this circuit it is possible to create a logarithmic DAC that has the same accuracy as a circuit with a single 18-bit multiplying DAC (more exactly, 18.4-bit resolution and 1/2-LSB accuracy) over a range of 78 dB. A fixed attenuation of 32 dB in series with the attenuator compensates for the 32 dB of gain.
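The selection of numerator and denominator can be done by an exhaustive search over the 12-bit codes. The sketch below assumes a circuit gain of L/K (constant c omitted) and realises a nominal attenuation a between 0 and 64 dB as a circuit gain of 32 - a dB followed by the fixed 32-dB attenuation. Each pair is scored by its round-off error plus a 1/4-LSB tolerance on both DACs; this scoring rule and the function names are our assumptions, not the procedure actually used.

```python
import math

MAX_CODE = 4095   # largest code of a 12-bit multiplying DAC
TOL_LSB = 0.25    # accuracy of each DAC in LSB, as stated in the text
FIXED_DB = 32.0   # fixed attenuation in series with the circuit


def best_pair(gain_db):
    """Return (K, L, score_db) with L/K closest to a gain of gain_db dB.

    score_db = |round-off error| + effect of TOL_LSB on both codes,
    an assumed worst-case error budget.
    """
    target = 10 ** (gain_db / 20)
    best = None
    for code_k in range(1, MAX_CODE + 1):
        code_l = round(code_k * target)   # best numerator for this denominator
        if not 1 <= code_l <= MAX_CODE:
            continue
        rounding = abs(20 * math.log10(code_l / code_k) - gain_db)
        tolerance = (20 * math.log10(1 + TOL_LSB / code_k)
                     + 20 * math.log10(1 + TOL_LSB / code_l))
        score = rounding + tolerance
        if best is None or score < best[2]:
            best = (code_k, code_l, score)
    return best


def codes_for_attenuation(att_db):
    """(K, L, score) for a nominal attenuation of 0..64 dB in 0.25-dB steps."""
    if not (0 <= att_db <= 64 and float(att_db * 4).is_integer()):
        raise ValueError("attenuation must be 0..64 dB in 0.25-dB steps")
    # the circuit supplies 32 - att dB; the fixed stage removes 32 dB
    return best_pair(FIXED_DB - att_db)
```

With this scoring the largest errors occur at the ends of the range, where one of the two codes must be about 100, in line with the statement that the lowest accuracy is found at +32 dB or -32 dB of circuit gain.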

The PARallel TIMERS

Another stimulus manipulator in the experimentation set-up is an audio switch with preprogrammed turn-on and turn-off envelopes. These switches can be turned on and off by a trigger pulse. To provide programmable turn-on and turn-off pulses, the PARallel TIMERS were developed to control four audio switches. The timers have two programmable events: Ton and Toff (see Figure 4). At Ton, measured from the trigger event T0, the output of the timer module becomes active (low). At Toff, the other event, the output becomes passive (high). The events Ton and Toff are programmable from 0.0 ms to 6553.4 ms in 0.1-ms steps. The timers are called parallel because the four timer modules, controlled by one IEEE interface, all share the same trigger moment T0. The host defines the trigger event T0 by sending the 'trigger' command, which starts the internal timer. A timer module can be made passive by setting both events to 6553.5 ms, a value the internal timer never reaches.

Figure 4: The timer events (trigger T0, and the Ton and Toff events of timer modules 1 to 4)
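The behaviour of one timer module can be sketched as follows, with event times held as 16-bit counts of 0.1-ms ticks (so that 6553.5 ms is count 65535). The tick encoding, the sentinel name `NEVER` and the handling of coinciding events are our assumptions; the report specifies only the resolution, the range and the active-low output.

```python
TICKS_PER_MS = 10        # 0.1-ms resolution of the timers
NEVER = 0xFFFF           # 6553.5 ms: a count the internal timer never reaches
ACTIVE, PASSIVE = 0, 1   # active output is low, passive output is high


def ms_to_ticks(ms):
    """Encode an event time (0.0-6553.4 ms, or 6553.5 ms = never) as a 16-bit count."""
    ticks = round(ms * TICKS_PER_MS)
    if not 0 <= ticks <= NEVER:
        raise ValueError("event time must lie between 0.0 and 6553.5 ms")
    return ticks


def output_level(t_ticks, on_ticks, off_ticks):
    """Output of one timer module t_ticks after the common trigger T0.

    Assumptions: the output is passive until its first event, an event
    at NEVER is ignored, and if Ton equals Toff the module ends passive.
    """
    level = PASSIVE
    # Apply events in time order; at equal times ACTIVE (0) sorts first,
    # so the PASSIVE event is applied last.
    for when, new_level in sorted([(on_ticks, ACTIVE), (off_ticks, PASSIVE)]):
        if when != NEVER and when <= t_ticks:
            level = new_level
    return level
```

A module programmed with Ton = 10 ms and Toff = 250 ms is thus low from 10 ms until 250 ms after the trigger, and a module with both events at `NEVER` stays high.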

The PARallel TIMERS can report the status of each module. The host can ask for a status byte, which contains a logical 'one' for every module that has passed its Toff. The host can also request the programmed Ton and Toff of every module by sending a specified command, after which the PARallel TIMERS send a small record of all the programmed events.
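The status byte can be illustrated in the same style, again counting in 0.1-ms ticks. The assignment of module 1 to the least significant bit, and treating a module as having passed Toff from the tick at which Toff occurs, are assumptions.

```python
NEVER = 0xFFFF  # 6553.5 ms in 0.1-ms ticks: the "disabled" event time


def status_byte(t_ticks, modules):
    """Status byte of the PARallel TIMERS, t_ticks after the trigger T0.

    modules: (on_ticks, off_ticks) for modules 1..4, in order.
    Bit i is set when module i+1 has passed its Toff (assumed bit order);
    a disabled module (Toff == NEVER) never sets its bit.
    """
    byte = 0
    for i, (_on, off) in enumerate(modules):
        if off != NEVER and t_ticks >= off:
            byte |= 1 << i
    return byte
```

For modules programmed as `[(100, 200), (100, 5000), (NEVER, NEVER), (0, 50)]`, the status byte 30 ms after the trigger (300 ticks) has bits 0 and 3 set.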

The Reaction Recorder

The main purpose of the Reaction Recorder is to relieve the host computer of continuously checking for reactions of subjects. This subject interface is able to 'record' the responses of a maximum of eight subjects simultaneously. In most experiments so far, the subject interfaces have been digital and consist of a few switches or a small keyboard. Some acoustic experiments use an analog potentiometer as the subject interface, in combination with an analog-to-digital converter that generates a digital output for the host.

Only the input at the moment of the subject's reaction is of interest. When a keyboard is used, the reaction consists of the key pressed by the subject as well as the time at which it is pressed. Since the inputs are digital, a reaction is a change from all input bits 'zero' to one or more input bits 'one'.
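The reaction criterion just described, a transition from an all-zero input word to a non-zero one, can be written as a small filter over successive samples of one 16-bit input, taken here once per millisecond as in the service request mode. This is a sketch of the criterion only, not of the recorder firmware; the function name and sampling model are hypothetical.

```python
def detect_reactions(samples, period_ms=1):
    """Yield (time_ms, word) for each reaction in a stream of 16-bit input words.

    samples: input words read at a fixed interval of period_ms, starting
    at the trigger. A reaction is a change from all bits 'zero' to one or
    more bits 'one'; further changes while keys are held, and the release
    back to zero, are not reported.
    """
    previous = 0
    for index, word in enumerate(samples):
        word &= 0xFFFF                      # keep the 16 input bits
        if previous == 0 and word != 0:
            yield index * period_ms, word
        previous = word
```

For the sample stream `[0, 0, 0b100, 0b100, 0, 0, 0, 0b1, 0b11, 0]` this reports a reaction of key 3 at 2 ms and of key 1 at 7 ms; the additional key at 8 ms is ignored because the input was not all-zero before it.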

The reaction recorder is essentially an IEEE-488 interface to eight separate 16-bit digital inputs. These inputs can be used to connect a small keyboard with 16 keys, or an analog-to-digital converter. The recorder has a timer to measure the moment at which an input changes. This timer is enabled by a trigger command that is also used to start the experiment. The reaction recorder is also equipped with eight 16-bit digital outputs that can be used to give feedback to the subject. The reaction recorder can operate in several modes:

• The tracking mode: In the tracking mode the recorder acts as a direct connection between the host and the inputs (see Figure 5, mode switch c). In this mode the recorder only supplies the data of a specified input when the host requests it. The disadvantage of collecting reactions in the tracking mode is that it is very time-consuming, because most of the time there is no reaction at all.

• The service request mode: In the service request mode the recorder acts as a reaction filter (see Figure 5, mode switch b). The recorder is able to recognize reactions and reports only these to the host. The recorder reads all eight inputs every millisecond, and if there is a reaction, i.e. a change in one of the inputs, an interrupt, called a service request, is generated. The host can respond to that request and ask for the data. Although the processing load on the host is greatly reduced, this mode has one serious disadvantage. If the host waits several milliseconds after a request, because of jobs with higher priority, a new reaction may occur in the meantime. Since the recorder cannot buffer reactions in this mode, the previous reaction will then be overwritten.

• The memory mode: In the memory mode the reaction recorder stores the filtered reactions in a local memory (see Figure 5, mode switch a). This mode can be used when the host does not need the reactions during the experiment to generate new stimuli. The recorder can store a maximum of 64 reactions per input. A stored reaction consists of the 16-bit input data at the moment of an input change and the time in milliseconds since the start of the experiment. Although 64 reactions suffice for most experiments, the buffer may overflow. The reaction recorder therefore generates a service request when there is room left for only two reactions in the buffer.
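The trade-off between the service request mode and the memory mode can be illustrated with two toy models of one input: a single register, in which an unread reaction is overwritten, and a 64-entry buffer that raises a service request when room for only two reactions remains. The class names and the overflow exception are hypothetical; only the capacities and the trigger condition come from the text.

```python
from collections import deque

CAPACITY = 64   # reactions stored per input in memory mode
SRQ_ROOM = 2    # raise a service request when only this much room is left


class ServiceRequestInput:
    """Service request mode: a single register per input."""

    def __init__(self):
        self.pending = None
        self.lost = 0           # reactions overwritten before the host read them

    def record(self, time_ms, word):
        if self.pending is not None:
            self.lost += 1
        self.pending = (time_ms, word)
        return True             # every reaction raises a service request

    def fetch(self):
        reaction, self.pending = self.pending, None
        return reaction


class MemoryModeInput:
    """Memory mode: a buffer of CAPACITY reactions per input."""

    def __init__(self):
        self.buffer = deque()

    def record(self, time_ms, word):
        """Store a reaction; return True if a service request is raised."""
        if len(self.buffer) >= CAPACITY:
            raise OverflowError("reaction buffer full")
        self.buffer.append((time_ms, word))
        return CAPACITY - len(self.buffer) <= SRQ_ROOM

    def fetch_all(self):
        reactions = list(self.buffer)
        self.buffer.clear()
        return reactions
```

In the first model two reactions in quick succession lose the first one unless the host fetches in between; in the second, the first 61 reactions pass silently and the 62nd raises the request, leaving room for two more before the host must read the buffer.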

Unlike the PARallel TIMERS, the reaction recorder has no time limit. When its timer reaches the maximum, it is reset and an identifier is stored in the data buffer instead of a reaction, to mark the timer lap. The host can then calculate the real reaction time by adding an offset of 6553.6 ms for each such identifier.
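On the host side, the lap identifiers can be turned into true reaction times by accumulating the 6553.6-ms offset. The sketch assumes that the buffer is read in storage order and represents the identifier by a hypothetical marker value `WRAP`; the actual record format of the recorder is not described in the report.

```python
WRAP = "WRAP"            # hypothetical stand-in for the timer-lap identifier
WRAP_PERIOD_MS = 6553.6  # offset per lap, as stated in the text


def absolute_times(records):
    """Convert a memory-mode buffer into (time_ms, word) pairs with true times.

    records: items in storage order, each either WRAP or (time_ms, word).
    Every WRAP means the recorder's timer reached its maximum and was reset.
    """
    offset = 0.0
    result = []
    for item in records:
        if item == WRAP:
            offset += WRAP_PERIOD_MS
        else:
            time_ms, word = item
            result.append((offset + time_ms, word))
    return result
```

A reaction stored as 30.0 ms after one lap identifier is thus reported as 6583.6 ms after the start of the experiment.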

Figure 5: Illustration of the input modes: a) memory mode; b) service request mode; c) tracking mode (blocks: digital inputs, reaction filter, memory and mode switch, with output to the host)

Conclusions

The purchase of a new computer for the control of psychoacoustic experiments resulted in the development of three IEEE-488 controlled devices with properties that suit the demands of the set-up. Together with some other IEEE-programmable devices, they make it possible to control psychoacoustic experiments completely from the host. Especially when the memory mode of the reaction recorder is used, host interrupts are reduced to a minimum.

References

Dobek, J.J.G.M., Nes, A.C. van & Willems, L.F. (1979) Some applications of the multiplying DAC. IPO Annual Progress Report, 14, 142-146.

Publications 1988

P.622 F.J.J. Blommaert and H. Timmers

Letter recognition at low contrast levels: effects of letter size

In: Perception, 1987, 16, 421-432

Contrast variation was used to measure recognition thresholds for lower-case letters, with the aim of obtaining a better understanding of the role that early stages of visual processing play in letter recognition. Frequency-of-recognition curves were measured for alphabets differing in letter size. Since variation of the adaptational state of the eye quantifiably changes the characteristics of primary visual processing, recognition thresholds were measured at both a high (150 cd/m²) and a low (0.9 cd/m²) adaptation level. Thresholds decreased as letter size increased, in a way comparable with data on visual acuity. At the lower adaptation level, recognition thresholds became higher, which is also in accordance with visual acuity data. Furthermore, the slopes of the frequency-of-recognition curves for alphabets as a function of log contrast decreased with decreasing letter size. It is argued that this is mainly caused by an increasing dispersion of internal representations of individual letters on the internal psychological scale as letter size decreases.

P.623 D.J. Hermes

Measurement of pitch by subharmonic summation

In: Journal of the Acoustical Society of America, 1988, 83(1), 257-264

In order to account for the phenomenon of virtual pitch, various theories assume, implicitly or explicitly, that each spectral component introduces a series of subharmonics. The spectral-compression method for pitch determination can be viewed as a direct implementation of this principle. The widespread application of this principle in pitch determination is, however, impeded by numerical problems with respect to accuracy and computational efficiency. A modified algorithm is described that solves these problems. Its performance is tested for normal speech and 'telephone' speech, i.e., speech high-pass filtered at 300 Hz. The algorithm outperforms the harmonic-sieve method for pitch determination, while its computational requirements are about the same. The algorithm is described in terms of nonlinear system theory, i.e., as subharmonic summation. It is argued that the favourable performance of the subharmonic-summation algorithm stems from its corresponding more closely with current pitch-perception theories than the harmonic sieve does.
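
The core idea of subharmonic summation can be sketched in a few lines. Each spectral component adds evidence to every pitch candidate of which it could be a harmonic, so each candidate f is scored by a weighted sum of spectral amplitudes at its harmonics, H(f) = Σ h^(n-1)·|X(n·f)|. The sketch below is a simplified illustration, not the published algorithm, and it leaves out the refinements described in the paper. It uses a linear frequency axis, evaluates the spectrum directly at each harmonic frequency, and picks the candidate range, the step, h = 0.84 and N = 15 as assumptions for illustration.

```python
import cmath
import math


def hann(frame):
    """Apply a Hann window to reduce spectral leakage."""
    n = len(frame)
    return [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def amplitude_at(frame, fs, freq):
    """Magnitude of the frame's Fourier transform evaluated at `freq` Hz."""
    step = -2j * math.pi * freq / fs
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(frame)))


def shs_pitch(frame, fs, f_min=60.0, f_max=400.0, step=2.0,
              n_harmonics=15, h=0.84):
    """Return the candidate F0 (Hz) with the largest subharmonic sum."""
    windowed = hann(frame)
    best_f, best_score = None, float("-inf")
    n_candidates = int(round((f_max - f_min) / step)) + 1
    for i in range(n_candidates):
        f = f_min + i * step
        score = 0.0
        for n in range(1, n_harmonics + 1):
            if n * f >= fs / 2:  # stay below the Nyquist frequency
                break
            # Higher harmonics get geometrically decreasing weights h**(n-1).
            score += h ** (n - 1) * amplitude_at(windowed, fs, n * f)
        if score > best_score:
            best_f, best_score = f, score
    return best_f


# Missing-fundamental test tone: harmonics 2-6 of 200 Hz, no energy at 200 Hz.
fs = 8000
tone = [sum(math.sin(2 * math.pi * 200 * k * t / fs) for k in range(2, 7))
        for t in range(256)]
print(shs_pitch(tone, fs))
```

For this tone the sketch reports 200 Hz, although the spectrum contains nothing at 200 Hz. The candidate 400 Hz, which matches the lowest component exactly, scores lower because it accounts for fewer of the components.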

P.624 D. Beroule

Guided propagation inside a topographic memory

In: M. Caudill and Ch. Butler (Eds): IEEE first international conference on neural networks, 1987. New York: IEEE, 1987, session IV, 469-476

Recognition of a signal which propagates freely in the outside world can be represented by guided propagation (or flow) through a three-dimensional network. The present paper describes this theory and applies it to the treatment of acoustic and visual patterns.

P.625 D. Beroule

The never-ending learning

In: R. Eckmiller and Chr. v.d. Malsburg (Eds): Neural computers. Berlin: Springer, 1988, 219-230

A processing principle is presented that is supported by a dynamic memory, which causes learning to be involved in the overall treatment. By emphasizing the operational constraints of this principle and taking into account the concrete tasks to be performed, a modular and parallel architecture is gradually defined. It is shown that this architecture arises in the course of processing by means of two complementary mechanisms: the long-term reinforcement or dissolution of memory pathways, and the episodic sprouting of new pathways. The resulting system basically detects coincidences between a cross flow of internal signals and an afferent flow of incoming signals.

P.626 B.A.G. Elsendoorn

Communicatiehulpmiddelen bij spraakstoornis (Communication aids for the vocally handicapped)

In: De Ingenieur, 1988, 4, 76-77

Various disturbances may occur in the communication process between human beings. These may present themselves on the 'sender' as well as on the 'receiver' side. Much research effort has been invested in trying to solve the problems occurring with the former. As an example, two communication aids will be presented that may help to reduce these disturbances in the communication process to acceptable proportions.

P.627 R.J.H. Deliege, I.M.A.F. Speth-Lemmens and R.P. Waterham

Ontwikkeling en evaluatie van twee communicatiehulpmiddelen met spraakuitvoer (Development and evaluation of two speech communication aids)

In: Nederlands Tijdschrift voor Ergotherapie, 1988, 2, 37-40

Recent developments in the field of speech technology and microelectronics can be applied in communication aids for the speech-impaired. This article describes two projects in which the usefulness of synthetic speech in a communication aid is investigated. The two projects differ in the speech technology used, and consequently in the vocabulary, speech quality and operation of the aids developed, and in their target group. In both projects, experimental models are built (IPO) and evaluated in practice (IRV).
In the first project this resulted in a small communication aid (Pocketstem) that is capable of speaking 28 prestored messages. Selection takes place through 28 picture-labelled keys.
In the second project this resulted in a portable keyboard-to-speech system (Tiepstem). Input to this system is in a pseudophonetic notation. The system uses an almost normal QWERTY keyboard for input and an LCD display for feedback. Text can be edited, and frequently used messages can be stored in a memory.
Both aids have been evaluated in practice by potential users. The evaluation of the Pocketstem showed that this aid meets the needs of many users. Production facilities are therefore currently being sought. The evaluation of the Tiepstem showed possibilities for such an aid, but indicated some problems with the pseudophonetic input used. This problem will be addressed in a forthcoming model.

P.628 J.M.B. Terken and S.G. Nooteboom

Opposite effects of accentuation and deaccentuation on verification latencies for Given and New information

In: Language and Cognitive Processes, 1987, 2, 145-163

Accentuation results in faster recognition of words expressing new (focal) information. To find out whether accentuation also speeds up the comprehension of words expressing given information, the presence or absence of accents was varied independently for these categories in three experiments. Degree of Givenness was varied across experiments. Listeners verified spoken descriptions of pictures. Accentuation was found to interact with the Given/New variable: Given information was verified faster when the word expressing it was unaccented, and New information was verified faster when the word expressing it was accented. These findings suggest that listeners do not simply give more attention to accented words, but rather process accented and unaccented words in different ways. It is hypothesized that the presence of an accent leads the listener to give primary attention to the acoustic/phonetic properties of the word and to construct an interpretation from the bottom up, and that the absence of an accent on a word leads him to map it onto the limited set of discourse entities which are currently activated, with less attention to the acoustic/phonetic properties.

P.629 H. Schmidt, G.W.G. Spaai and W. de Grave

Opsporen van misconcepties bij middelbare scholieren (A technique for exploring High School students' misconceptions)

In: Tijdschrift voor Onderwijsresearch, 1988, 13(3), 129-140

In two experiments, a method was tested for discovering misconceptions in students' knowledge and ideas about science subject matter as taught in secondary schools. The method entails presenting small groups of students with a description of a set of natural phenomena. They are asked to produce as many explanations as possible for these phenomena and to discuss these explanations critically. The discussions were audiotaped, and verbatim protocols were screened for misconceptions. The method appeared to be successful. In experiment 1, on osmosis, twelve qualitatively distinct misconceptions were established. Experiment 2, on the behaviour of a plane taking off, revealed three misconceptions concerning aerodynamics. In both experiments, performances of novices and experts were compared. Only where the subject matter was explicitly part of the curriculum of the experts (osmosis) could clear differences between novices and experts be observed. As differences in performance between novices and experts on the aerodynamics problem were largely absent, it is suggested that subjects have little ability to elaborate on relevant knowledge available in their memory if they do not recognize a problem as a special case of something else. In addition to characteristics reviewed in the literature, three other attributes of misconceptions emerged from the experiments. First, misconceptions often do not seem to be simply available in memory when needed, but are actively constructed in response to a novel situation. Second, misconceptions appear to be highly metaphorical in nature. And third, while constructing a misconception, students tend to be mainly sensitive to the perceptual aspects of a situation.

P.630 J.H.D.M. Westerink and J.A.J. Roufs

A local basis for perceptually relevant resolution measures

In: Proceedings 1988 SID International Symposium, Digest of Technical Papers, Anaheim, California, May 24-26, 1988. Volume XIX, 360-363

Two experiments concerning the perception of image quality are described. The results show that a local measure of resolution reflects the quality sensation more adequately than one based on the description in the spatial domain. This throws a different light on a vast range of perceptually relevant resolution measures proposed in the literature.

P.631 F.L. van Nes

Auditel: a field trial of telephonic data retrieval with voice recognition

In: Proceedings 12th Human Factors in Telecommunication Symposium, HFT '88, May 24-27, The Hague

In 1983 the Dutch PTT and Philips set up a joint working party under the name of 'Speech Processing', 'to allow reliable statements to be made on whether speech processing could be used in subscriber services'. To this end, the working party investigated the feasibility of an information service for the general public with voice commands and voice output.
In order to be able to use speaker-independent voice recognition, samples of all words needed in the service were collected from about 180 male and female speakers and used to make recognition templates.
In order to accommodate the needs of new as well as experienced users, several dialogue structures were conceived and tested in a Wizard-of-Oz experiment with simulated voice recognition. These structures were then used in a field experiment with 122 subjects. The results of this experiment showed that a simple public-network information service with speaker-independent voice recognition is possible, both technically and in terms of user performance, although not without certain problems for these users. Error correction proved awkward. In systems with voice recognition where this is the case, the overall recognition score should be higher than 80% to ensure that users are confronted with the difficult correction procedures only in exceptional cases.

P.632 C. Ode

Rising pitch accents in Russian intonation: an experiment

In: A.A. Barentsen, B.M. Groen and R. Sprenger (Eds): Studies in Slavic and General Linguistics. Amsterdam: Rodopi, 1988, 11, 421-439

This article discusses a listening experiment which was conducted in order to find out how many types of rising pitch accent must be distinguished in the intonation of spontaneous Russian. The linguistic material was taken from a spontaneous monologue in which pitch phenomena had earlier been analysed using the stylization method. The experiment was set up as an individual sorting test for twenty native subjects. Twenty utterances were selected for this test. Subjects were instructed to classify rises on the basis of melodic resemblance. The results are presented, followed by a discussion.

P.633 R.J.H. Deliege, I.M.A.F. Speth-Lemmens and R.P. Waterham

Ontwikkeling en evaluatie van twee communicatiehulpmiddelen met spraakuitvoer (Development and evaluation of two communication aids with speech output)

In: Logopedie en Foniatrie, 1988, 60, 220-224

Available speech technology and microelectronics can be applied in speech communication aids for the vocally handicapped. Such aids can be divided into two classes according to the speech technology used. One uses 'stored speech', that is, some coded form of previously spoken natural speech. The other uses 'speech synthesis', which can generate utterances that need not have been spoken beforehand. These two classes differ in various aspects, such as vocabulary, speech quality, complexity of input and, consequently, the potential user group.
Our goal is to investigate the usefulness of synthetic speech in such communication aids. For this purpose we did not limit ourselves to one of the above-mentioned classes, but approached the problem with two projects: one uses 'stored speech' and the other 'speech synthesis'. In both projects experimental devices are designed, built (IPO) and evaluated (IRV) in order to incorporate suggestions and user feedback.
In the first project this resulted in a small (hand-held) communication aid (Pocketstem), capable of speaking 28 prestored messages. Selection is done by pressing one of 28 picture-labelled keys on a membrane keyboard.
The other project resulted in a portable keyboard-to-speech system (Tiepstem). Text is entered in semiphonetic notation, using an almost normal QWERTY keyboard. An LCD screen, together with editing facilities, adds to user-friendliness. A storage facility can be used to store and recall often-used messages.
Both devices have been evaluated by potential users. Evaluation of the Pocketstem has shown that this communication aid meets the needs of many speech-impaired persons. Production possibilities are currently being sought. Evaluation of the Tiepstem has shown application possibilities for such a device, but has also indicated problems with the current semiphonetic input. A forthcoming model will incorporate full text-to-speech conversion to overcome these problems.

P.634 H.E. Henkes, H. Bouma, L.H. van der Tweel and G. Verriest

Leesbaarheid van teksten in musea (Legibility of texts in museums)

In: Museumvisie, 1988, 2, 59-61

Legibility of texts in museums tends to be poor. Difficulties for elderly visitors are even greater, because their visual needs are in a more critical range. The paper gives some rules of thumb which, if observed, ensure moderate to good legibility. The rules pertain to (a) letter size in relation to reading distance, (b) typefaces, (c) the use of colour and contrast, and (d) illumination. It is advisable to have a panel of elderly people judge legibility, for which purpose reading charts are defined.

P.635 P. Reitsma

Reading practice for beginners: Effects of guided reading, reading-while-listening, and independent reading with computer-based speech feedback

In: Reading Research Quarterly, 1988, 23, 219-235

The purpose of this study was to determine which of three ways of practising reading best facilitates the development of efficient reading skills in beginners: guided reading, reading-while-listening, or independent reading with computer-generated speech feedback available for students to use at will. Seventy-two first-grade students in the three experimental conditions and a control condition read a passage of text each day for five consecutive days. Except in the control condition, the five texts repeated 20 target words that were relatively hard for beginners to read. Students were tested on the 20 words before and after treatment, and changes in rate and accuracy were analysed. Both guided reading and independent reading with self-selected speech feedback were found to be significantly more effective than the control and reading-while-listening conditions. The findings suggest that increases in reading efficiency depend largely on the amount of independent, autonomous reading activity of young readers. If such independent activity is included, computer-aided practice with speech feedback seems promising as a means of improving the reading skills of beginners.

P.636 J.P. van Hemert

Different time models in pitch tracking

In: Proceedings Speech '88, 7th FASE Symposium, Edinburgh, 22-26 August, 1988, 113-120

The traditional method of estimating pitch in speech is first to determine a suitable measurement criterion, which reflects the harmonic match between a pitch candidate and the speech signal in the time window, and then to optimize this criterion. Local optimization methods choose the fundamental frequency with the best harmonic match for each frame and therefore treat all frames independently. However, the pitch in consecutive frames is strongly correlated. Single-side methods therefore use the pitch in previous frames to help the estimation procedure, whereas in global optimization methods the pitch values of all frames are interdependent. Such global methods often combine the measurement criterion with a smoothness criterion that reflects the deviation of the pitch from a predicted value. The article describes and compares the three methods (local, single-side and global optimization) and evaluates their results.
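
A toy problem shows how the three strategies differ. In it, the measurement criterion has already been evaluated for a few pitch candidates per frame. This minimal Python sketch assumes a smoothness criterion of λ·|log2(f_t / f_(t-1))|, a penalty per octave of pitch jump, and finds the global optimum by dynamic programming (a Viterbi-style search). The jump_cost function, the penalty form and the scores are illustrative assumptions and are not necessarily those used in the paper.

```python
import math


def jump_cost(f_prev, f_next, lam):
    """Smoothness penalty: lam per octave of pitch change (an assumption)."""
    return lam * abs(math.log2(f_next / f_prev))


def local_track(scores, cands):
    """Local optimization: best-matching candidate in each frame independently."""
    return [cands[max(range(len(cands)), key=row.__getitem__)] for row in scores]


def single_side_track(scores, cands, lam):
    """Single-side: each frame is chosen given the pitch already chosen before it."""
    track = local_track(scores[:1], cands)
    for row in scores[1:]:
        prev = track[-1]
        j = max(range(len(cands)),
                key=lambda j: row[j] - jump_cost(prev, cands[j], lam))
        track.append(cands[j])
    return track


def global_track(scores, cands, lam):
    """Global: maximize total score minus total jump cost over all frames."""
    n_c = len(cands)
    total = list(scores[0])  # best cumulative value ending in each candidate
    backptrs = []
    for row in scores[1:]:
        new_total, ptr = [], []
        for j in range(n_c):
            i_best = max(range(n_c),
                         key=lambda i: total[i] - jump_cost(cands[i], cands[j], lam))
            new_total.append(total[i_best]
                             - jump_cost(cands[i_best], cands[j], lam) + row[j])
            ptr.append(i_best)
        total = new_total
        backptrs.append(ptr)
    # Trace the optimal path back from the best final candidate.
    j = max(range(n_c), key=total.__getitem__)
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    return [cands[j] for j in reversed(path)]


cands = [100.0, 200.0, 400.0]
# Frame 2 has a spurious octave-up match that slightly beats the true 200 Hz.
scores = [[0.1, 1.0, 0.2], [0.1, 1.0, 0.3], [0.1, 0.8, 0.9], [0.1, 1.0, 0.2]]
print(local_track(scores, cands))  # [200.0, 200.0, 400.0, 200.0]: octave error
print(single_side_track(scores, cands, lam=0.5))
print(global_track(scores, cands, lam=0.5))
```

Both context-aware variants remove the octave error here. The single-side search cannot revise frames it has already decided, whereas the global search can trade a locally better match against a smoother track anywhere in the utterance.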

P.637 D.J. Hermes

Vowel-onset detection

In: Proceedings Speech '88, 7th FASE Symposium, Edinburgh, 22-26 August, 1988, 787-793

Various fields in speech research show that vowel onsets are perceptually among the most important phonetic speech-signal events. An algorithm will be presented that correctly predicts the large majority of vowel onsets in running speech. It is based on the simple assumption that vowel onsets are characterized by the occurrence of rapidly increasing resonance peaks in the amplitude spectrum. Based on the neurophysiology of the auditory nervous system, arguments will be presented that identify vowel onsets with those speech segments that induce a strong short-term adaptation, and point to those parts of the syllables where most of their significant information is concentrated. Implications for speech analysis will be indicated.

P.638 J.H. Eggen

The evaluation of speech quality resulting from differences in speech-coding schemes

In: Proceedings Speech '88, 7th FASE Symposium, Edinburgh, 22-26 August, 1988, 1203-1208

This paper discusses the evaluation of speech quality resulting from differences in speech-coding schemes. Articulation tests are not sensitive enough to discriminate between speech samples that are highly intelligible but differ noticeably in quality. One way to improve the sensitivity of such tests is to present the speech stimuli in noise. We used speech as the interfering noise. The speech interference test (Nakatani & Dukes, 1973) provides a functional and sensitive method for measuring speech quality. The quality of a speech sample is expressed as a quality factor Q. We made some modifications to the speech interference test. The method of constant stimuli was replaced by a 'simple up-and-down' adaptive procedure for measuring the speech interference threshold. With monosyllabic CVC words as test material, we used both an articulation test and a newly developed monosyllabic adaptive speech interference test (MASIT) to evaluate the quality of nine different speech-coding techniques. This set was intended to be a representative sample of the various speech-coding techniques currently used for the analysis, manipulation and resynthesis of speech in Dutch speech-research laboratories. Our results indicate that MASIT is, indeed, much more sensitive in discriminating high-quality speech samples than the standard articulation test. However, the results suggest that the quality factor Q has no general validity as a measure of speech communication quality.
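
A 'simple up-and-down' procedure makes the task harder after each correct response and easier after each incorrect one. The tracked level therefore oscillates around the point of 50% correct responses. The threshold is then commonly estimated from the levels at which the track changes direction. The minimal Python sketch below uses a simulated, noise-free listener. The tracked quantity (a speech-to-interference ratio in dB), the step size, the number of reversals and the averaging rule are all assumptions for illustration, not the settings used in MASIT.

```python
def simple_up_down(respond, start_db, step_db, n_reversals=8, n_discard=2):
    """Track the 50%-correct point; return the mean of the later reversal levels.

    respond(level_db) -> True if the listener's answer was correct.
    """
    level = start_db
    direction = 0          # -1 = last step went down, +1 = up, 0 = no step yet
    reversals = []
    while len(reversals) < n_reversals:
        new_direction = -1 if respond(level) else +1
        if direction and new_direction != direction:
            reversals.append(level)  # the track changed direction at this level
        direction = new_direction
        level += new_direction * step_db
    used = reversals[n_discard:]  # ignore the initial approach to threshold
    return sum(used) / len(used)


# Noise-free simulated listener: correct whenever the ratio is at least 9.5 dB.
estimate = simple_up_down(lambda snr: snr >= 9.5, start_db=20.0, step_db=2.0)
print(estimate)
```

With this listener the track alternates between 8 and 10 dB, so the estimate is 9 dB: halfway between the last level that failed and the first level that succeeded. A real listener responds probabilistically, which is why several reversals are averaged.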

P.639 A.M.L. van Dijk-Kappers

Temporal decomposition of speech: compactness measures compared

In: Proceedings Speech '88, 7th FASE Symposium, Edinburgh, 22-26 August, 1988, 1343-1350

Speech production can be considered as a sequence of overlapping articulatory gestures, each of which may be thought of as a movement towards an ideal, but often unattained, articulatory position corresponding to a given phoneme. These articulatory gestures result in spectral variation in the acoustic speech signal.
Atal (1983) has proposed a method for speech coding based on so-called temporal decomposition of speech into a sequence of target functions and associated target vectors. The former may correspond to articulatory gestures and the latter to ideal articulatory positions. Although developed for economical speech coding, this method also provides an interesting tool for deriving phonetic information from the acoustic speech signal. In this paper we are especially interested in the correspondence between target functions and phonemes. Within our modified and extended version of the temporal decomposition method, several compactness measures can be used to determine the target functions. Here we describe the performance of three of these measures, one of which is Atal's original compactness measure.
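
Temporal decomposition is usually stated as follows. The spectral parameter vector of each frame is approximated by a sum of target vectors, each weighted by its target function. The notation below is chosen for this note rather than taken from the paper: y(n) is the parameter vector at frame n, a_k and φ_k(n) are the k-th target vector and target function, there are K targets, and N is the number of frames.

```latex
\mathbf{y}(n) \approx \hat{\mathbf{y}}(n) = \sum_{k=1}^{K} \mathbf{a}_k \, \phi_k(n), \qquad 1 \le n \le N
```

Each φ_k(n) is meant to be nonzero only over a limited stretch of frames. Roughly speaking, the compactness measures are criteria for how concentrated in time a candidate target function is.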

P.640 J. 't Hart

Spraakgeluid (The sound of speech)

In: M.P.R. van den Broecke (Ed.): Ter Sprake. Dordrecht: Foris Publications, 1988, 40-72

'Spraakgeluid' (The sound of speech) is a contribution to a book about speech meant to be comprehensible to a fairly wide public. The chapter on the sound of speech tries to describe just as much of the characteristics and properties of sound in general as is necessary for understanding the acoustic composition of speech in terms of spectral representations of vowels, diphthongs and consonants, in isolation as well as in connected speech.

P.641 R. Collier

Spraakmelodie (Speech melody)

In: M.P.R. van den Broecke (Ed.): Ter Sprake. Dordrecht: Foris Publications, 1988, 124-131

Speech melody is an intriguing feature of spoken language because it is associated with the great expressiveness of the human voice. The paper describes research that aims at unravelling those physical changes in the fundamental frequency of speech which are important for the perception of speech melody. Secondly, attention is paid to the communicative relevance of perceptually salient pitch variations.

P.642 H.J. Bullinger, E.N. Protonotarios, D.G. Bouwhuis and F. Reim

Information technology for organizational systems. Concepts for increased competitiveness

In: H.J. Bullinger, E.N. Protonotarios, D.G. Bouwhuis and F. Reim (Eds): Information technology for organizational systems. Proceedings of the First European Conference on Information Technology for Organizational Systems-EURINFO '88, Athens, Greece, 16-20 May, 1988. Amsterdam: North Holland, 1988

This is a collection of papers presented at the EURINFO '88 congress in Athens, 16-20 May, 1988. The objective of EURINFO '88 was to initiate a broad dialogue between the Providers and Users of Information Technology. The book is divided into three sections.
Information Management is devoted to organizational systems and the impact of information technology on offices and logistic systems. Special emphasis is given to social aspects and user-centred design.
Advances in Information Technology gives a survey of Local and Wide Area Communication Networks, of Information System Development, and of User Interfaces and Human-Computer Interaction.
Applications highlights systems that have been designed for specific purposes in a number of fields. These encompass Health Industry, Public Administration and Services, Education, Banking, Insurance and Production. This section also treats Emerging Information Technology for small enterprises and for people with disabilities.

P.643 D.J. Hermes and J. 't Hart

Visual feedback of intonation for the deaf by means of automatically stylized contours

In: Research News - International Journal of Rehabilitation Research, 1987, 10(4), 457-458

A short description is presented of the research project that aims at solving some problems encountered when deaf people are provided with visual feedback of the intonation of their speech. The directions in which solutions to these problems are sought, viz. close-copy stylization and detection of vowel onsets, are briefly indicated.

P.644 J.A.J. Roufs, M.C. Boschman and M.A.M. Leermakers

Visual comfort as a criterion for designing display units

In: G.C. van der Veer and G. Mulder (Eds): Human-computer interaction: Psychonomic aspects. Springer Verlag, 1988, 53-74

Scaling of reading comfort, eye-movement characteristics and performance speed are measured during a visual search task on a VDU. They are highly correlated and sufficiently sensitive to physical screen parameters to be used as design criteria. Luminance contrast, physical sharpness and character fonts are shown to be important parameters.

P.645 F.L. van Nes

The legibility of visual display texts

In: G.C. van der Veer and G. Mulder (Eds): Human-computer interaction: Psychonomic aspects. Springer Verlag, 1988, 14-25

The legibility of a text, on paper or VDTs, is determined by text properties that affect the visual reading processes, namely the reader's search for certain text parts and their subsequent recognition. Such properties are: (1) layout, e.g. line lengths and distances, text density, and the make-up of tables; (2) colour, i.e. the number of different letter and background colours on a screen and the related general and specific effects on reading, such as perceptual grouping and accentuation; (3) typography, i.e. letter type and font. From the description of the influence of text properties on legibility, some rules can be deduced which aim at optimizing display legibility.

P.646 A.J.M. Houtsma, W.M. Wagenaars and T.D. Rossing

Auditory demonstrations: Een serie audio demonstraties over het menselijk gehoor op CD (Auditory demonstrations: A series of audio demonstrations about hearing on Compact Disc)

In: Nederlands Akoestisch Genootschap, 1988, 92, 65-70

In 1978 a set of auditory demonstration tapes was prepared at the Laboratory of Psychophysics at Harvard University under the supervision of Prof. David M. Green, with the cooperation of several of his colleagues in the USA and the Netherlands. Based on the popularity and success of this tape set, a new series of recorded demonstrations on auditory phenomena has been prepared. A total of 39 demonstrations, grouped under 7 subjects, has recently become available on Compact Disc. The disc comes with a booklet in which each subject is introduced, each demonstration is explained in detail and modern literature references are provided on each topic.

P.647 A.J.M. Houtsma, G.J. Kleinhoonte van Os, A.J. van der Kolk, N.A.M. Merks, A. Verheijen and J.J. Vlaskamp

Proposal for amendment on IEC 268-13: Listening tests on loudspeakers

In: Committee Methodology Loudspeaker Testing (CML): Report of a series of discussions to improve listening test procedures for evaluation of loudspeakers for domestic use. The Hague, March 1988

The IEC 268-13 recommendation, developed in the early 1980s, deals with measurements and tests on loudspeakers. Its section on listening tests has become inadequate because of the rapid technical development of loudspeaker systems, which can range from directly radiating to omnidirectional systems. This report covers a series of proposed amendments to the most recent IEC 268-13 version of 1985. The amendments were drafted by a panel of Dutch, Belgian and English sound specialists from acoustic research, the loudspeaker industry and the Dutch Consumers Union.

P.648 D.R. Gentner, S. Larochelle and J. Grudin

Lexical, sublexical, and peripheral effects in skilled typewriting

In: Cognitive Psychology, 1988, 20, 524-548

It is generally accepted that expert typewriting performance is strongly affected by the sequence of letters being typed, but there is controversy about the importance of units larger than single letters, such as digraphs or words. We studied expert typists transcribing prose texts and random words. Analyses of interstroke intervals demonstrated the presence of digraph frequency, word frequency and syllable boundary effects, in addition to the expected effects of movement difficulty. Word frequency and syllable boundary effects function primarily at the perceptual level, whereas digraph frequency and physical difficulty effects function primarily at the motor level.

P.649 M.J. van der Vlugt

Spraakgeluid en woordherkenning: het relatieve gewicht van het begin en eind van een gesproken woord (Speech sound and word recognition: the relative weight of the initial and final part of a spoken word)

Doctoral dissertation, Eindhoven University of Technology, December 1987

The aim of the research reported in this thesis was to increase our knowledge of the processes underlying the recognition of spoken words by humans. More specifically, the object was to gain insight into the degree to which the word recognition process is more sensitive to word-initial than to word-final speech sounds. A number of experiments were carried out in an attempt to answer this question.
The first chapter describes the indications in the literature that have led many researchers to assume that the word recognition process is more sensitive to speech sounds from the beginning of a word than to speech sounds later in the word. It is argued that the results of the research discussed could also be explained by the presence or absence of listeners' uncertainty about the position of the speech sounds that were audible in the word to be recognized.
The second chapter goes into the recognition of words with a consonant-vowel-consonant (CVC) structure. These words are of interest because they contain little redundancy on the lexical level compared with longer words. In a recognition experiment performed with CVC words, no indication was found that speech sounds from the beginning of a word play a special role in the recognition process. The possibility that they do play a special role in the recognition of longer words cannot be excluded.
Chapter three concentrates on the recognition of longer polymorphematic words. The contributions of prefixes and suffixes to word recognition were compared in an experiment. These affixes carried widely varying amounts of lexical information and were added to somewhat degraded monomorphematic word stems in Dutch synthetic speech. Although there was a strong effect of lexical information on word recognition, no difference was found between the contributions of prefixes and suffixes.
The recognition of longer monomorphematic words is the issue in chapter four. In an experiment, the effects of masking either initial or final parts of polysyllabic, monomorphematic synthesized Dutch words with noise were compared. The amount of lexical redundancy carried by initial and final parts of words was the same. The results of the experiment showed that words were recognized equally well, no matter whether initial or final parts were masked.
The final chapter combines the results of the experiments of the previous three chapters and claims that speech sounds later in a word can contribute as much to recognition as speech sounds from the beginning of a word, if two conditions are satisfied: (1) the word must not yet have been recognized at the moment when speech sounds later in the word become audible, because otherwise it is trivial that those speech sounds do not contribute to recognition, and (2) there must be no uncertainty about the position that the intelligible speech sounds should occupy in the word to be recognized. The main conclusion was therefore that the word recognition process is not more sensitive to speech sounds from the beginning of a word than to speech sounds later in the word.

P.650 H. Bouma

Hoe sturen we de techniek in de richting van ouderen? (How to steer technology toward the service of elderly people?)

In: Symposiumboek Techniek - Vergrijzing: Hoe? KIVI, 1988, 82-86

A survey is provided of factors that determine the choice of industrial products and services. Next, the aspects in which the elderly population differs from the general population are analysed. The increasing proportion of elderly people calls for reconsideration of their influence on the choice and design of products and services. Concerted action by assertive elderly people themselves may well prove to be a most effective means of catalysing the necessary readjustment.

P.651 N.J. Willems, R. Collier and J. 't Hart

A synthesis scheme for British English intonation

In: Journal of the Acoustical Society of America, 1988, 84(4), 1250-1261

A synthesis scheme is proposed that provides British English utterances with a variety of acceptable artificial F0 contours. It is based on the acoustic analysis of a large corpus of (semi-)spontaneous utterances and on the perceptual evaluation of synthetic F0 contours that have been stylized and standardized. The scheme consists of three parts: (a) an explicit description of the perceptually relevant F0 changes, i.e., the pitch movements as found in the corpus; (b) combination rules that specify the possible sequences of pitch movements in contours at the clause level; and (c) rules of sequence that govern the concatenation of pitch contours at the level of the sentence. The algorithm has been implemented as a computer program that provides any input utterance with fully specified F0 values, which can be merged with its other source parameters in a linear predictive coding (LPC) speech file. The program can thus be incorporated in a text-to-speech system or be used as a research tool. A representative sample of synthetic F0 contours generated by the program has been evaluated in a formal listening experiment with 30 native British subjects. The results indicate that the artificial pitch contours sound as acceptable as their natural counterparts.

P.652 J.G. Beerends and A.J.M. Houtsma

The influence of duration on the perception of single and simultaneous two-tone complexes

In: H. Duifhuis, J.W. Horst and H.P. Wit (Eds): Basic issues in hearing: Proceedings of the 8th International Symposium on Hearing, Paterswolde. London: Academic Press, 1988, 380-385

The influence of duration on the perception of pitch in complex tones was quantified by measuring the 'Goldstein sigma' function. An important experimental finding was that subjects tend to switch to the analytic mode of pitch perception when stimuli are shortened. Identification experiments with two simultaneous pitches involved much less switching from the synthetic to the analytic mode of listening. Analysis showed that shortening the complex tones first affected identification performance for the less salient pitch. This effect was already noticeable at durations of about 100 ms, at which the identification of single notes was hardly affected.

P.653 J.A.J. Roufs and R.M. Smith

Classified bibliography on brightness-luminance relations

In: CIE Technical Committee Report, 1988, 78

The bibliography is a result of renewed activity of the CIE in the field of brightness and brightness contrast as psychological attributes and their relation to luminance. About 850 titles published up to 1986 have been gathered. The bibliography consists of two parts:

1. an alphabetical part listing all first authors for quick reference;

2. a classified part in 9 classes agreed on at the plenary meeting in Amsterdam (1983).

Special codes indicate the status of each entry, e.g. whether it was verified on the basis of the original or on the basis of an abstract. The bibliography is available from the CIE bureau in Vienna.

P.654 J.A.J. Roufs

Brightness-luminance relations

In: CIE Journal, 1988, 7(2), 50

Brightness and brightness contrast are probably the most important and traditionally cherished perceptual attributes for the lighting engineer. However, their relation to luminance is complicated. The CIE has recently renewed its activities in this field. The subject was raised again in Technical Committee TC 1.4 (Vision) and passed on to a special committee, 'Brightness-luminance relations'. It was decided to start an updating analysis of the state of the art with a classified bibliography. This was completed in 1987 under the title 'Classified bibliography on brightness-luminance relations' by J.A.J. Roufs and R.M. Smith. It contains 850 titles, ordered alphabetically and in 9 classes.

P.655 F. Bimbot, S.M. Marcus and G. Chollet

Localisation et représentation temporelle d'événements phonétiques: Applications en étiquetage, en segmentation et en synthèse (Temporal localization and representation of phonetic events: Applications to labelling, segmentation and synthesis)

In: Proceedings JEP, 15e Journées d'Etudes sur la Parole, Aix en Provence, 27-90 Mai 1986, 175-178

Speech is a complex code in which information units are highly context-dependent. Spectrally stable regions can, however, often be localized. Nonstationary segments (polysons) involving interaction between two or more units are found between these regions.

Atal proposed a technique for decomposing speech into temporally overlapping events. It was originally developed for speech coding, and its relationship with phonetic labelling was only suggested. Marcus modified Atal's technique in order to obtain robust functions that may be more plausibly related to phones.

This paper relates Atal's technique to a class of time-dependent models. It demonstrates how the combination of this technique with a spectral stability criterion can be used as a tool for phonetic decoding of speech segments. Emphasis is placed on applications to speech synthesis, and a framework for developing acoustic rules is proposed.

P.656 B.A.G. Elsendoorn and H. Bouma

Working models of human perception

In: B.A.G. Elsendoorn and H. Bouma (Eds): Working models of human perception. London: Academic Press, 1989.

This book is a record of a workshop organized to celebrate the 30th anniversary of the distinguished Institute for Perception Research (IPO), Eindhoven, The Netherlands. Research in sensory and cognitive information processing by humans covers a vast array of perceptual faculties. This volume brings together the work of scientists from diverse areas of human perception research with the aim of assessing the common relevance and usefulness of concepts developed in their different fields.

The information-processing faculties of hearing, vision, speech perception, reading, learning, recollection, computer control (and many more) occur in a single brain. It would seem probable that they have elements in common, and they are considered here in a single volume with the theme of integration uppermost.

The book addresses both theoretical and applied issues and is aimed at all those concerned with speech, hearing, vision, reading, phonetics, language and computing: the spectrum of information-processing and perception sciences.

P.657 A.J.M. Houtsma

Some remarks on Adrian Fourcin's Lecture about 'Links between voice pattern perception and production'

In: B.A.G. Elsendoorn and H. Bouma (Eds): Working models of human perception. London: Academic Press, 1989, 93-99

Three aspects of Adrian Fourcin's lecture are commented upon. One concerns the role of articulatory features in auditory speech discrimination or identification experiments. Another concerns the indispensable influence of auditory feedback on the production of speech and song. A final comment deals with the necessity of simplifying information that is derived from the speech of a hearing-impaired speaker and fed back in the form of a visual substitute.

P.658 S.G. Nooteboom

Speech coding, speech synthesis and voice quality

In: B.A.G. Elsendoorn and H. Bouma (Eds): Working models of human perception. London: Academic Press, 1989, 127-138

The analysis and resynthesis of speech, whether used for on-line speech coding or as a tool for studying speech perception and testing ideas on the generation of high-quality artificial speech, will ultimately profit from more basic insights into the nature of voice quality, and particularly into speaker-dependent differences in voice quality. Such insights are still largely lacking and remain to be obtained.

This paper mentions some factors that might contribute to perceived differences in voice quality and that seem worthwhile topics for further perceptual study. One is slight period-to-period disturbance of the phase angles, which can give a voice a somewhat rough or raspy character. Another is the multiplicative noise caused by the air-flow pulses during phonation. A third is a more-or-less constant noise due to glottal leakage.

The paper also pleads for the assessment of the perceptual limits of phase perception in complex, speech-like sounds, and for the study of the nature of perceptual integration and disintegration of such complex sounds that have both harmonic and noisy components.

Further explorations of the acoustic correlates of voice quality might profit from realistic models of the phonating glottis, including its coupling to the vocal tract.

P.659 D.G. Bouwhuis

Reading as goal-driven behaviour

In: B.A.G. Elsendoorn and H. Bouma (Eds): Working models of human perception. London: Academic Press, 1989, 341-362

Reading is routinely considered to be a largely automatic activity for which the conventional media are perfectly adequate. The evolution of reading shows that many aspects of the reading process have been subject to strong selective pressure before leading to the current, most widely distributed forms of script.

But even within existing alphabetic systems, many features seem at first sight counterproductive to the efficient and automatic processing that reading apparently is. It is suggested here that reading coevolved with its reading environment: print script, lines, pages, books and newspapers.

Subtle changes in the reading environment, such as those necessitated by electronic display systems, may therefore fundamentally affect reading performance. In this way, fundamental processing aspects and ergonomic aspects of reading seem to be intricately mixed. It is proposed that better insight into the reading process can be gained by modelling it more globally as an information-processing task with a specific goal. Typical properties of the reading strategy can be identified from this model and understood in terms of a detailed goal-driven activity.

P.660 H. Duifhuis

Current developments in peripheral auditory frequency analysis

In: B.A.G. Elsendoorn and H. Bouma (Eds): Working models of human perception. London: Academic Press, 1989, 59-65

How the peripheral ear processes sound has been studied from both physiological and biophysical points of view. This paper tries to indicate where these two approaches converge and where new developments can be foreseen in the near future.

Papers accepted for publication

MS.570 F.J.J. Blommaert

Early-visual factors in letter confusions

To appear in: Spatial Vision

For the purpose of quantifying models of letter recognition, similarities are often specified in terms of stimulus properties. This paper advocates an approach based on similarities between internal letter representations, or internal letter images; i.e., it is argued that optical and retinal factors play a more prominent role in letter confusions than is usually assumed. To illustrate this, letter images were calculated on the basis of previously determined point-spread functions (Barbur & Ruddock, 1980; Blommaert & Roufs, 1981; Blommaert, Heynen & Roufs, 1987). Next, confusion-matrix data from Bouma (1971) were used to evaluate different measures that might be useful in quantifying similarities between internal letter representations. Luce's (1959, 1963) choice model was used in the analysis of the experimental data. It was found that if similarities were expressed in terms of differences between image contours, a fair first-order approximation of Bouma's experimental results could be formulated (overall correlation coefficient of 0.95). Other measures, such as correlations between the spatial-frequency spectra of letter images, were found to be less successful. The method provides a means of quantitatively relating stimulus features and optical and early-visual factors to letter confusions.
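
Luce's choice model, used in the analysis above, predicts a confusion matrix from pairwise similarities and response biases. The following is a minimal sketch, not the author's implementation: the exponential mapping from a contour-difference distance to a similarity, the function names and the three-letter distance matrix are all illustrative assumptions.

```python
import math

def similarity_from_distance(distance, scale=1.0):
    # Assumed mapping: similarity falls off exponentially with a (hypothetical)
    # contour-difference distance between two internal letter images.
    return [[math.exp(-d / scale) for d in row] for row in distance]

def luce_confusions(similarity, bias):
    # Luce's choice rule:
    #   P(response j | letter i) = b_j * eta_ij / sum_k (b_k * eta_ik)
    matrix = []
    for row in similarity:
        weighted = [b * eta for b, eta in zip(bias, row)]
        total = sum(weighted)
        matrix.append([w / total for w in weighted])
    return matrix

# Illustrative distances between three letters (zero on the diagonal).
distance = [[0.0, 1.0, 3.0],
            [1.0, 0.0, 2.0],
            [3.0, 2.0, 0.0]]
P = luce_confusions(similarity_from_distance(distance), bias=[1.0, 1.0, 1.0])
for row in P:
    print([round(p, 3) for p in row])
```

Each row of the predicted matrix sums to one; comparing such rows with measured confusion frequencies (for example by correlation) is one way to score a candidate similarity measure.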

MS.582 P.G. Vos and H.H. Ellermann

Precision and accuracy in the reproduction of simple tone sequences

To appear in: Journal of Experimental Psychology: Human Perception and Performance

Four experiments investigated the precision and accuracy with which amateur musicians are able to reproduce sequences of tones varied only temporally, with tone and rest durations held constant over each sequence. The tempo varied over the musically meaningful range of between 5 and 0.5 tones per second. Experiments one and two supported the hypothesis of an attentional bias towards timing the attack moments, rather than the departure moments, precisely. Experiment three corroborated the hypothesis that inaccurate timing of short inter-attack intervals is manifested in a lengthening of rests rather than tones, as a result of more motor activity during the reproduction of rests. Experiment four gave some support to the hypothesis that the shortening of long inter-attack intervals is due to mnemonic constraints, affecting the rests rather than the tones. Both theoretical and practical consequences of the various findings are discussed, in particular with respect to timing in musical performance.

MS.583 J.G. Beerends and A.J.M. Houtsma

Pitch identification of simultaneous diotic and dichotic two-tone complexes

To appear in: Journal of the Acoustical Society of America

This study examines subjects' ability to recognize the pitches of the two missing fundamentals of two simultaneous two-tone complexes whose partials are distributed in various ways between the ears. The data show that identification performance is affected at different levels. Limited frequency resolution in the peripheral auditory system can degrade performance, but only if none of the four stimulus partials is aurally resolved. Identification performance depends only weakly on the manner in which the partials are distributed between the ears. In some cases it was found that, probably at a very central level (e.g. attention), the identification processes for the two simultaneous pitches interfere with one another. Some subjects are more likely to identify the pitch of one two-tone complex when the harmonic order of the other complex is higher than when it is lower. Finally, for some distributions of partials between the ears, some subjects tend to hear the complex tones analytically, i.e., they perceive the pitches of single partials instead of the missing fundamentals.

MS.587 A.C. den Brinker and J.A.J. Roufs

Nonlinear parameter estimation applied to psychophysically measured impulse responses

To appear in: IEEE Transactions on Biomedical Engineering

A technique is presented for estimating the impulse response from the data of a psychophysical experiment on threshold vision. A two-step method is used to estimate the model parameters: the first step is a Hankel-matrix approach, the second an unweighted least-squares method. Results of this estimation technique are presented. The model with the estimated parameters corroborates other psychophysical data. The estimates obtained are adequate for the intended purposes of simulation and modelling.
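
The two-step idea can be illustrated with a Prony-style sketch, assuming a sampled impulse response that is a sum of two real, decaying exponentials, h[n] = c1*z1**n + c2*z2**n. Step 1 fits a linear recurrence to the rows of a Hankel matrix of samples and takes the roots of its characteristic polynomial as the poles; step 2 fits the amplitudes by unweighted least squares. This is an assumed simplification, not the paper's procedure: the paper's estimation is nonlinear, whereas here step 2 only re-fits the amplitudes with the poles held fixed, and the function names are hypothetical.

```python
import math

def solve2(a11, a12, a22, b1, b2):
    # Solve the symmetric 2x2 system [[a11, a12], [a12, a22]] x = [b1, b2].
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def hankel_poles(h):
    # Step 1: every row (h[n], h[n+1]) of the Hankel matrix predicts h[n+2]
    # through h[n+2] = a0*h[n] + a1*h[n+1]; fit (a0, a1) by least squares.
    rows = [(h[i], h[i + 1]) for i in range(len(h) - 2)]
    y = h[2:]
    s00 = sum(r[0] * r[0] for r in rows)
    s01 = sum(r[0] * r[1] for r in rows)
    s11 = sum(r[1] * r[1] for r in rows)
    t0 = sum(r[0] * v for r, v in zip(rows, y))
    t1 = sum(r[1] * v for r, v in zip(rows, y))
    a0, a1 = solve2(s00, s01, s11, t0, t1)
    # Poles are the roots of z**2 - a1*z - a0 = 0 (assumed real here,
    # i.e. a non-oscillatory two-exponential response).
    disc = math.sqrt(a1 * a1 + 4.0 * a0)
    return ((a1 + disc) / 2.0, (a1 - disc) / 2.0)

def fit_amplitudes(h, poles):
    # Step 2: unweighted least squares for c1, c2 with the poles fixed.
    z1, z2 = poles
    cols = [(z1 ** n, z2 ** n) for n in range(len(h))]
    s11 = sum(u * u for u, _ in cols)
    s12 = sum(u * v for u, v in cols)
    s22 = sum(v * v for _, v in cols)
    b1 = sum(u * y for (u, _), y in zip(cols, h))
    b2 = sum(v * y for (_, v), y in zip(cols, h))
    return solve2(s11, s12, s22, b1, b2)

# Illustrative noise-free response, not data from the paper.
h = [0.9 ** n - 0.5 * 0.6 ** n for n in range(30)]
poles = hankel_poles(h)
amplitudes = fit_amplitudes(h, poles)
print(poles, amplitudes)
```

For a sampling interval dt, each discrete pole corresponds to a time constant tau = -dt / ln(z). With noisy threshold data, the step-1 estimates would typically serve only as starting values for a full nonlinear fit of poles and amplitudes together.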

MS.603 R.J.H. Deliege

An experimental Dutch keyboard-to-speech system for the speech-impaired

To appear in: Speech Communication

An experimental Dutch keyboard-to-speech system has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech-impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in pseudophonetic notation. Intonation contours using a declination line and various rises and falls are generated from an input consisting of punctuation and accent marks. The hardware design has resulted in a small, portable, battery-powered device. A short evaluation with potential users has been carried out; it showed the possibilities of such a device but also indicated some problems with the current pseudophonetic input.
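
The intonation step can be sketched as follows, purely as an illustration: a baseline that declines linearly on a semitone scale, a fixed excursion on every accented syllable, and an extra terminal rise when the input ends in a question mark. The input notation, the function names and all numerical parameters (start frequency, declination rate, syllable duration, excursion size) are hypothetical and are not taken from the paper; the sketch also reduces each rise and fall to a single F0 target per syllable.

```python
def parse_marked_input(text):
    # Hypothetical notation: one token per syllable, "'" marks an accent,
    # and the final punctuation mark decides statement versus question.
    stripped = text.strip()
    is_question = stripped.endswith("?")
    syllables = stripped.rstrip(".?!").split()
    accents = {i for i, s in enumerate(syllables) if s.startswith("'")}
    return len(syllables), accents, is_question

def declination_f0(n_syllables, accents, is_question, f_start=130.0,
                   decl_st_per_s=-2.0, syllable_s=0.2, excursion_st=6.0):
    # Return one F0 target in Hz per syllable (all parameters assumed).
    targets = []
    for i in range(n_syllables):
        t = (i + 0.5) * syllable_s              # syllable midpoint in seconds
        st = decl_st_per_s * t                  # declination line, in semitones
        if i in accents:
            st += excursion_st                  # accent: peak above the baseline
        if is_question and i == n_syllables - 1:
            st += excursion_st                  # terminal rise for a question
        targets.append(f_start * 2 ** (st / 12.0))
    return targets

n, accents, question = parse_marked_input("de 'trein is te 'laat.")
print([round(f, 1) for f in declination_f0(n, accents, question)])
```

Working in semitones keeps the declination and the excursions independent of the speaker's register, so the same rules can be reused with a different start frequency.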

MS.605 S.G. Nooteboom and M.J. van der Vlugt

A search for a word-beginning superiority effect

To appear in: Journal of the Acoustical Society of America

This paper reports two experiments examining whether or not auditory word recognition is more sensitive to word-initial than to word-final stimulus information. In a first experiment, the contributions of prefixes and suffixes to word recognition were compared. These affixes carried widely varying amounts of lexical information and were added to somewhat degraded monomorphematic word stems in Dutch synthetic speech. Although there was a strong effect of lexical information on word recognition, no difference was found between the contributions of prefixes and suffixes. In a second experiment, the effects of masking either the initial or the final parts of polysyllabic, monomorphematic synthesized Dutch words with noise were compared. The amount of lexical redundancy carried by the initial and final parts of the words was the same. Again, no difference was found. We conclude that the process of lexical activation during spoken-word recognition is equally sensitive to word-initial and word-final stimulus information. A special role for word onsets remains, because these can ensure proper temporal alignment between the stimulus and candidate word forms.

MS.610 L.L.M. Vogten and E. Berendsen

Review of 'From Text to Speech: The MITalk System' by J. Allen, M.S. Hunnicutt, D. Klatt, R.C. Armstrong and D.E. Pisoni

To appear in: Journal of Phonetics

Book review.

MS.612 H.C. van Leeuwen

A development tool for linguistic rules

To appear in: Computer Speech and Language

Linguistic rewrite rules are very popular in phonology, as they are well suited to describing phonological processes. These rules can also be used for string manipulation such as grapheme-to-phoneme (spelling-to-sound) conversion, which is needed in text-to-speech systems.

In this paper a tool is presented with which one can develop and test a set of linguistic rules that together define a scheme for converting an input string into an output string. One possible application is the design of a grapheme-to-phoneme conversion system. The system is approached from the point of view of linguists, since they are its main users. The linguist has to specify conversion data which, together with the development system, form a conversion system. First the basic system is discussed, viz. the format in which the linguist must present the information, followed by a discussion of how the system supports the development of linguistic rules.

A special characteristic of the system is that input-to-output relations are preserved. Given a set of rules that defines a grapheme-to-phoneme conversion, the system can therefore be used as an analysis tool for statistics on grapheme-to-phoneme relations. The paper concludes with a discussion of some additional characteristics of the system, which are compared with those of some other systems, and a survey of the applications in which it is used.
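
A context-sensitive rewrite rule of the form "focus -> output / left _ right" can be applied left to right while recording which graphemes produced which phonemes, which is what makes grapheme-to-phoneme statistics possible. The sketch below is a hypothetical toy, not the rule formalism of the paper: the Rule fields, the first-match-wins ordering, the "#" boundary symbol and the handful of simplified Dutch rules are all assumptions.

```python
from collections import Counter
from typing import NamedTuple

class Rule(NamedTuple):
    focus: str       # grapheme string to rewrite
    output: str      # phoneme string it becomes
    left: str = ""   # required left context ("#" marks a word boundary)
    right: str = ""  # required right context ("#" marks a word boundary)

# Hypothetical toy rules for Dutch; order matters (first match wins).
RULES = [
    Rule("ch", "x"),
    Rule("ij", "Ei"),
    Rule("oe", "u"),
    Rule("aa", "a:"),
    Rule("d", "t", right="#"),   # word-final devoicing
    Rule("g", "x"),
]

def convert(word, rules=RULES):
    """Return the phoneme string and the grapheme-to-phoneme alignment."""
    padded = "#" + word + "#"
    pos, alignment = 1, []
    while pos < len(padded) - 1:
        for rule in rules:
            end = pos + len(rule.focus)
            if (padded.startswith(rule.focus, pos)
                    and padded[:pos].endswith(rule.left)
                    and padded.startswith(rule.right, end)):
                alignment.append((rule.focus, rule.output))
                pos = end
                break
        else:
            # No rule applies: copy the grapheme unchanged.
            alignment.append((padded[pos], padded[pos]))
            pos += 1
    return "".join(out for _, out in alignment), alignment

def relation_counts(words, rules=RULES):
    # Because every output segment stays linked to its input graphemes,
    # grapheme-to-phoneme statistics can be read off the alignments.
    counts = Counter()
    for word in words:
        counts.update(convert(word, rules)[1])
    return counts

print(convert("tijd"))
print(relation_counts(["hond", "tijd", "dag"]).most_common(3))
```

Keeping the alignment as (grapheme, phoneme) pairs, rather than only the output string, is the design choice that preserves the input-to-output relations.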

MS.613 J.B.O.S. Martens and G.M.M. Majoor

The perceptual relevance of scale-space coding

To appear in: Signal Processing

In our research on image-coding algorithms we have adopted the following starting points. First, processing by coding algorithms should match, as closely as possible, what we know about the human visual system. Second, owing to the lack of acceptable objective criteria, proper evaluation of coding algorithms and parameter settings requires perceptual experiments.

In this paper we summarize the so-called scale-space model and describe its application to image coding. In the scale-space model, an image is passed through Gaussian filters of decreasing bandwidth. The variation between successively filtered responses is very systematic, so that little information is required to pass from one to the next. Starting from a low-bandwidth version of the original image, we make a prediction for a version with a higher bandwidth. Only the prediction errors need to be transmitted to recover this higher-resolution picture. The process is repeated at a number of levels (called scales) in order to arrive at the original image. For data-reduction purposes, several approximations of these prediction errors can be studied. The resulting coded images are evaluated by means of perceptual experiments. It is also shown in this paper that a one-to-one correspondence can be established between the different stages of the scale-space coder and a well-known model of the human visual system that is based on psychophysical data.
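
The coarse-to-fine prediction loop described above can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the authors' coder: it uses the simplest possible predictor (the current, blurrier reconstruction is taken as the prediction of the next sharper scale), a uniform quantizer with step size `step` as the approximation of the prediction errors, and hypothetical function names. Because the errors are computed against the reconstruction rather than the unquantized level, the final error stays within half a quantizer step.

```python
import math

def gaussian_blur(x, sigma):
    # 1-D Gaussian filter, kernel truncated at 3 sigma, edges clamped.
    if sigma == 0:
        return list(x)
    radius = max(1, int(math.ceil(3 * sigma)))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(x)
    return [sum(w * x[min(max(i + k - radius, 0), n - 1)] for k, w in enumerate(kernel)) / total
            for i in range(n)]

def encode(signal, sigmas, step):
    # sigmas decrease from the coarsest scale down to 0 (= the original).
    levels = [gaussian_blur(signal, s) for s in sigmas]
    base = levels[0]
    recon, residuals = base, []
    for level in levels[1:]:
        # Assumed predictor: the current reconstruction predicts the next,
        # sharper scale; only the quantized prediction error is kept.
        q = [step * round((a - b) / step) for a, b in zip(level, recon)]
        residuals.append(q)
        recon = [b + e for b, e in zip(recon, q)]
    return base, residuals

def decode(base, residuals):
    recon = base
    for q in residuals:
        recon = [b + e for b, e in zip(recon, q)]
    return recon

signal = [math.sin(i / 3) + (1.0 if 20 <= i < 30 else 0.0) for i in range(64)]
base, residuals = encode(signal, [8, 4, 2, 1, 0], step=0.05)
recon = decode(base, residuals)
print(max(abs(r - s) for r, s in zip(recon, signal)))
```

A real image coder would work in two dimensions, subsample the coarse scales and use a sharper predictor; the point here is only that transmitting the base plus the quantized prediction errors suffices to rebuild the finest scale.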

MS.614 M.M. Taylor, F. Neel and D.G. Bouwhuis

Introduction: Dialogue and multimodal dialogue

To appear in: M.M. Taylor, F. Neel and D.G. Bouwhuis (Eds): The Structure of Multimodal Dialogue. Amsterdam: North Holland, 1988

This introduction provides a survey of the main themes of the book. They are:

• Prologue: Dialogue and Useful Metaphors

• User Models and Belief Structures

• Discourse Structure and Processing

• Parallel Communication

• Properties of Human Dialogues

• Applications and Architectures

• An Integrative Part as an Overview

Five sound demonstrations on a sound sheet are included in the book. They cover aspects of speech synthesis, three examples of human communication with intelligent natural-language enquiry systems with voice I/O, and an example of immigrant conversation.

Global conclusions that can be drawn on the basis of the contributions are that

a) there is little experimental work on multimodal dialogue, i.e. on interaction by means other than voice alone;

b) there are many relevant disciplines and they address only part of the dialogue problem.

This means that the contributions have not resulted in a unified view of multimodal dialogue, even though it is still the form of dialogue most natural to us. Yet the complementary and multifaceted nature of many approaches implies that many basic tools are available and can be effectively combined to gain a better understanding of such dialogues.

MS.615 H.E.M. Melotte and J.J. Neve

Visuele hulpmiddelen bij slechtziendheid (Visual aids for the partially sighted)

To appear in: J.J. Vos and Ch.P. Legein (Eds): Oog en Werk. Een ergoftalmologische wegwijzer. 's Gravenhage: SDU uitgeverij

'Partially sighted' is interpreted as a situation in which the visual performance of the person in question cannot be improved, either by ophthalmological care or by conventional spectacle corrections or contact lenses, sufficiently to satisfy personal or societal needs. A broad classification on the basis of possible visual-field defects (no defect, central and paracentral defect, peripheral defect) has been adopted. The usefulness of visual aids such as magnifiers, telescopic systems, CCTVs and illumination is discussed in relation to the presence or absence of visual-field defects.

MS.617 J.H.D.M. Westerink and J.A.J. Roufs

Subjective image quality as a function of viewing distance, resolution and picture size

To appear in: SMPTE Journal

This paper describes two experiments on the subjective quality of complex scenes. Slide projections were used as stimuli and varied in viewing distance, resolution and picture size. Subjective quality was judged by a group of twenty subjects by means of categorical scaling.

The results show that the (angular) resolution, expressed in periods per degree, and the picture angle spanned by the display each have an independent influence on quality. Subjective quality improves with resolution, but saturates at a resolution (6 dB cut-off frequency) of approximately 25 periods per degree. There is also a linear relationship between subjective quality and the logarithm of the picture angle.

In the discussion, these results are compared with those of a number of experiments known from the literature. The results are also interpreted in terms of their consequences for High-Definition TV.

MS.620 R.J.H. Deliege, LM.A.F. Speth-Lemmens and R.P. Waterham

Development and evaluation of two speech communication aids

To appear in: Journal of Medical Engineering and Technology

In this paper, two projects (Tiepstem and Pocketstem) on the possibilities of speech communication aids for the speech-impaired are discussed. Since the group of speech-impaired people is very diverse, the two projects differ in target group, in the speech technology used and consequently in the complexity of input of the aids. Experimental Tiepstem and Pocketstem models and Pocketstem prototypes have been realized. All models were evaluated with potential users. This paper describes the aids, their evaluation, the evaluation results, and some conclusions and future plans.

MS.632 J. Smurzynski and A.J.M. Houtsma

Auditory discrimination of tone-pulse onsets

To appear in: Perception and Psychophysics

Two experiments are reported in which difference limens (DLs) were measured for onset times of a 1000-Hz tone pulse, using an adaptive 2AFC procedure and (mostly) well-trained subjects. In the first experiment, DLs were measured for the rise time of linear onset ramps at rise-time values of between 10 and 60 ms. The DLs follow Weber's law up to a rise time of about 50 ms, and do not support the notion that rise times are perceived in a categorical manner. In the second experiment, DLs were obtained for linear, exponential and raised-cosine onset envelopes at rise-time values of between 10 and 40 ms. When energy differences in the critical band around 1000 Hz are computed for just-discriminable onsets, values of between 0.7 dB (10 ms rise time) and 0.3 dB (40 ms rise time) are found. These 'equivalent intensity' DLs show the same 'near miss to Weber's law' behaviour as intensity DLs for pure tones.
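
An adaptive 2AFC track of the kind mentioned above can be sketched as a staircase. The abstract does not specify the adaptive rule, so this sketch assumes a common 2-down 1-up rule (which converges on the 70.7%-correct point of the psychometric function), a multiplicative step of sqrt(2) and a geometric-mean estimate over the last reversals; the function name, the callback interface and every parameter value are hypothetical. The deterministic observer used in the example is only there to make the behaviour easy to follow.

```python
import math

def two_down_one_up(respond, start, factor=math.sqrt(2.0),
                    n_reversals=8, n_average=6, max_trials=500):
    """Estimate a difference limen with a 2-down 1-up adaptive staircase.

    respond(delta) -> bool: True if the listener picks the correct interval
    in a 2AFC trial whose increment (e.g. rise-time difference) is delta.
    """
    delta, run, last_dir, reversals = start, 0, 0, []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(delta):
            run += 1
            if run < 2:
                continue                  # need two correct in a row to step down
            direction, run = -1, 0
        else:
            direction, run = +1, 0        # one error: step up
        if last_dir and direction != last_dir:
            reversals.append(delta)       # track changed direction at this level
        last_dir = direction
        delta = delta / factor if direction < 0 else delta * factor
    if not reversals:
        raise ValueError("no reversals; increase max_trials or change start")
    tail = reversals[-n_average:]
    # Geometric mean, matching the multiplicative step size.
    return math.exp(sum(math.log(r) for r in tail) / len(tail))

# Idealized observer that is always correct above 3 ms and always wrong below.
estimate = two_down_one_up(lambda d: d > 3.0, start=16.0)
print(round(estimate, 2))
```

Dividing such an estimate by the reference rise time gives a Weber fraction; the abstract reports that the rise-time DLs follow Weber's law, i.e. this fraction stays roughly constant, up to about 50 ms.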

MS.634 J.J. Neve

Reading with hand-held magnifiers

To appear in: Journal of Medical Engineering and Technology

In addition to retinal magnification, the reading field is an important concept for understanding the influence of hand-held magnifiers on the reading process. Three horizontal reading fields can be distinguished: the monocular reading field, the binocular reading field and the composite reading field. In order to learn about the strategies that subjects use when reading with the aid of a magnifier, magnifier displacement was measured while subjects read texts under conditions that provided a variety of reading-field widths and text widths. It was found that individual subjects use different strategies (i.e. they use the monocular, binocular or composite reading-field width).

In a subsequent experiment, the monocular field width was determined experimentally for a large number of reading magnifiers. In this experiment, subjects were asked to define the part of the reading-field width that is free of aberrations. From these data, the optimal width of reading magnifiers with different focal lengths can be specified.

MS.636 J. 't Hart, R. Collier and A. Cohen

A perceptual study of intonation: an experimental-phonetic approach to speech melody

Cambridge: Cambridge University Press

This book gives a survey of the work on intonation (mainly of Dutch, but also of British English) done at IPO over almost 25 years to date. After a chapter dealing with the phonetic aspects of intonation, viz. physiological, acoustical and perceptual, the general framework of our perceptually oriented experimental analysis of intonation is sketched in an exhaustive and explicit account of the experimental procedures used. Chapter 4 presents a theory of intonation in a number of propositions, each accompanied by its supporting experimental evidence. Chapter 5 is devoted to declination; chapter 6 examines the extent to which the results for Dutch intonation can be generalized further. The following chapter is about existing as well as potential applications. A brief final chapter looks back, by way of evaluation, at the aims and problems formulated in the introductory chapter. One important conclusion is that the perceptual study of intonation offers a way out of the multitude of linguistic and phonetic problems that apparently accompanied intonation research in the past. Since our ultimate aim is to understand the role of intonation in speech communication, we have considered it our first task to unravel the perceptual structure of intonation, captured in a model of the listener.

MS.637 R.J. Beun

The recognition of declarative questions

To appear in: Journal of Pragmatics

In this paper I will discuss how questions of a declarative sentence type can be recognized in isolation and in natural dialogue. Declarative questions were taken from telephone dialogues in which subjects tried to get information from an informant at Amsterdam airport. In previous experiments these questions were isolated from the original context and presented on tape to subjects together with a number of answers. A disadvantage of this method is that it is impossible to distinguish the influence of prosodic indicators from that of linguistic ones. Here an experiment is described in which utterances were presented on a screen, in order to eliminate prosodic characteristics and to concentrate on linguistic indicators only. The occurrence of certain pragmatic particles plays a decisive role in whether subjects interpret the declarative as a question.

MS.643 J.M.B. Terken and G. Lemeer

Effects of segmental quality and intonation on quality judgments for texts and utterances

To appear in: Journal of Phonetics

The appreciation of natural and synthesized dull sentence intonation was measured in speech with good and poor segmental quality, in texts and in individual utterances. In texts, natural intonation was preferred to dull intonation in speech with both good and poor segmental quality. In individual utterances, natural intonation was preferred to dull intonation in speech with good segmental quality, but not in speech with poor segmental quality. It is concluded that, as the quality of synthetic speech steadily improves, listeners will make higher demands on the naturalness of synthetic intonation.

MS.645 H. Bouma

Goed leesbaar drukwerk op papier en beeldscherm (Legible print on paper and display)

To appear in: J.J. Vos and Ch.P. Legein (Eds): Oog en Werk. Een ergoftalmologische wegwijzer (Eye and work: an ergophthalmological guide). 's Gravenhage: SDU uitgeverij

A survey is given of certain factors that determine the legibility of printed matter. The need to make such factors explicit has increased because many designers of print and screen layouts have no background in graphics as it has developed over centuries through trial and error. Perceptual research on reading processes has provided a firm basis. Factors considered are: letter configurations (fonts), letter size, luminous contrast, the use of colour, layout and illumination.

MS.649 J.J. Neve

Some characteristics of hand-held magnifiers

To appear in: Proceedings of the Workshop on Applications of Technology in Low Vision, London, 11-13 July 1988

The simple hand magnifier (or loupe) is one of the most widely used low-vision aids; in most cases it is used for reading. Low-vision patients have several complaints relating to the use of these magnifiers, for instance concerning the amount of magnification, the field of view, the quality of the image, the weight of the loupe and the fact that the magnifier has to be moved in order to read text. Obviating these complaints is not as easy as it may seem, since they are more or less inherent in the use of a magnifier. To gain a better understanding of the difficulties encountered in using a magnifier, we focus on the physical quantities that seem to be of major importance, such as the magnification of the retinal image, the field of view (or reading-field width) and the aberrations of the image.

MS.650 H.H. Ellermann and G.W.G. Spaai

Educatieve software voor het aanvankelijk lezen (Educational software for initial reading)

To appear in: J. Klep and P. Kommers (Eds): Didactische Systeemanalyse

A very detailed description of learning environments is necessary if the aim is to make good educational software. In the literature, these environments are not defined in sufficient detail for this purpose. Therefore a number of decisions that cannot be based on the relevant literature have to be made during the process of implementation. Another problem is that too little is known about the possibilities and difficulties associated with the use of computers in education. In this paper both issues are discussed for the domain of initial reading.

MS.653 H.H. Ellermann and G.W.G. Spaai

Lernsoftware für Leseanfänger (Educational software for initial reading)

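To appear in: Proceedings Symposium Neue Techniken zum Erwerb der Schriftsprache - Lesen und Schreiben lernen mit Hilfe computerunterstützter Medien (New techniques for acquiring written language: learning to read and write with computer-supported media), Aachen, 9 March 1988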

If the computer is to be used as a supplement to a standard initial reading curriculum, three conditions should be satisfied. First, the child should be able to work with the computer without much, or even any, help from the teacher. Second, instructions and questions should be given by the computer, which therefore has to be equipped with speech-production facilities. Third, the interaction between the child and the computer should be relatively intense: the child must play an active role rather than be a passive recipient of information. The 'Leesbord' (reading board), a system developed at IPO, aims to fulfil these three requirements. A variety of computer programs has been written to exercise skills such as letter-sound relationships, enlarging the sight vocabulary and reading words in a meaningful context. It became clear, however, that the development of computerized reading exercises requires a detailed definition of reading exercises that is not currently available. This insight has guided most of the research in the Leesbord project. Examples of the research and a discussion of the results obtained are given.

MS.656 J.A.J. Roufs and A.M.J. Goossens

The effect of gamma on perceived image quality

To appear in: Proceedings International Display Research Conference, San Diego, 1988

The effects of gamma on perceived image quality and on the strength of its underlying dimensions, brightness contrast and sharpness, have been studied for complex black-and-white still scenes displayed on a TV monitor. Gamma was found to have a scene-dependent optimum value that is higher than expected. This optimum was found to be uniquely determined by (subjective) brightness contrast. Although (subjective) sharpness is influenced by gamma, experiments show that no interaction is to be expected in the range of interest. The results also show that luminance contrast ratio is an inadequate measure of brightness contrast.

MS.660 A.J.M. Houtsma

Majeur carillons: mooi of lelijk? (Carillons with major-third bells: good or bad?)

To appear in: Psychologie

This poster-format paper describes the computation of profiles for major-third carillon bells by the finite-element method. It also presents the results of blind listening experiments with such bell sounds synthesized on a computer.

Reprints and preprints of IPO Publications

Single copies of material from this issue of the IPO Annual Progress Report may be made for personal, noncommercial use. Permission to make multiple copies must be obtained from the Institute for Perception Research. Illustrations may be used only with explicit mention of the source.

Requests for reprints or preprints of publications listed above should be addressed to:

Library
Institute for Perception Research
P.O. Box 513
5600 MB Eindhoven
The Netherlands

Colophon

The following persons contributed to the production and distribution of this issue of the IPO Annual Progress Report:

Editors: F.L. Engel, D.J. Hermes, J.B.O.S. Martens

Design: J.A. Pellegrino

Illustrations: C.G. Basten

Language correction: A. Smith-Hardy

Coordination and distribution: Ms P.J. Evers

Typing: R.L.H.M. Smits

Printing by the Reproduction and Photography Section of the Eindhoven University of Technology.

IPO Annual Progress Report 23 1988