

Journal of Experimental Psychology: Human Perception and Performance
1990, Vol. 16, No. 4, 742-754

Copyright 1990 by the American Psychological Association, Inc.
0096-1523/90/$00.75

Duplex Perception: A Comparison of Monosyllables and Slamming Doors

Carol A. Fowler
Haskins Laboratories, New Haven, Connecticut, and Dartmouth College

Lawrence D. Rosenblum
University of California, Riverside

Duplex perception has been interpreted as revealing distinct systems for general auditory perception and speech perception. The systems yield distinct experiences of the same acoustic signal, the one conforming to the acoustic structure itself and the other to its source in vocal-tract activity. However, this interpretation has not been tested by examining whether duplex perception can be obtained for nonspeech sounds that are not plausibly perceived by a specialized system. In five experiments, we replicate some of the phenomena associated with duplex perception of speech using the sound of a slamming door. Similarities between subjects' responses to syllables and door sounds are striking enough to suggest that some conclusions in the speech literature should be tempered, namely that (a) duplex perception is special to sounds for which there are perceptual modules and (b) duplex perception occurs because distinct systems have rendered different percepts of the same acoustic signal.

Liberman and Mattingly (1985, 1989) proposed that speech perception is subserved by a "module" (Fodor, 1983, 1985) distinct from the general auditory system. The phonetic module fits the defining characteristics of modules outlined by Fodor (1983). There is neurophysiological evidence for a distinct part of the nervous system responsible for language. In addition, listeners have little conscious awareness of speech processing in that they cannot describe speech as sounding like anything else but speech (in contrast to, say, Morse code, which can be heard as both a meaningful message and a series of meaningless "dots and dashes"); that is, speech perception is "cognitively impenetrable." Further, speech processing is "informationally encapsulated"; for example, listeners hear synthetic speech and even highly impoverished "sine-wave speech" (Remez, Rubin, Pisoni, & Carrell, 1981) as speech even though they are aware that such signals are not produced by a human vocal tract. Finally, there is evidence that speech processing is domain specific; that is, it is special to speech.

Evidence for this last claim comes from comparisons of the objects of speech perception with ostensible objects of nonspeech auditory perception. In speech, dimensions of the percept fail to correspond, in many cases at least, to obvious dimensions of structure in the acoustic speech signal and do correspond to the vocal-tract activity that might have produced the signal (see Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967, and Liberman & Mattingly, 1985, for reviews and discussion of the findings). In contrast, in at least some cases, objects of nonspeech auditory perception correspond more closely to acoustic structure. For example, in isolation, a frequency glide will sound to a listener like a pitch glide. However, integrated with an acoustic signal for a consonant-vowel (CV) syllable, it will contribute to the perception of a consonant of which the pitch glide is not experienced as a part. Moreover, rather different frequency glides that sound quite distinct in isolation may sound like indistinguishable tokens of a given consonant when they are integrated into different syllabic frames. Apparently, they sound alike just when they signal the same consonantal gesture of the vocal tract (Liberman et al., 1967). The acoustic signals for the same gesture are different in different syllables because of coarticulatory overlap of consonant and vowel production.

This research was supported by National Institute for Child and Human Development Grant HD 01994 to Haskins Laboratories.

We thank Alvin Liberman for his comments on the manuscript and Bruno Repp for supplying the synthetic-speech stimuli used in the experiments.

Correspondence concerning this article should be addressed to Carol A. Fowler, Haskins Laboratories, 270 Crown Street, New Haven, Connecticut 06511.

These findings and compatible others led to the development of the motor theory of speech perception. In the theory, phonetic gestures of the vocal tract, and not acoustic structure per se, are perceptual primitives of speech, whereas acoustic structure itself provides the primitives for auditory perception of other sounds. This implies that perceptual processing of speech must be distinct from general auditory processing, and it must somehow recover the articulatory source of the acoustic speech signal. According to the motor theory, to help unravel effects of coarticulation on the acoustic signal, listeners engage their own speech motor systems in perceiving the speech of others. The talker's acoustic signal serves as the basis for a hypothesis as to the sequence of consonants and vowels that, when coarticulated, would give rise to the signal. In the theory, listeners use their speech motor systems (in Liberman & Mattingly, 1985, an "innate vocal tract synthesizer") to test the hypothesis. This gives the percept its motor character. Not unexpected from this perspective are findings that talkers' idiosyncratic methods of articulating some phonetic segments (Bell-Berti, Raphael, Pisoni, & Sawusch, 1979) appear to affect their perception of those segments as produced by a speech synthesizer.

The phenomenon of duplex perception (e.g., Liberman, Isenberg, & Rakerd, 1981; Mann & Liberman, 1983; Rand, 1974) has been interpreted as providing particularly strong evidence in favor of two claims of the motor theory: that speech perception is subserved by a specialized perceptual system, and that the speech module yields a perceptual object that is not immediately based on acoustic structure. In duplex perception, listeners report hearing a single piece of acoustic input as both speech and nonspeech simultaneously. In one paradigm (e.g., Mann & Liberman, 1983), the first and second formants of a synthetic consonant-vowel speech syllable and the steady-state part of its third formant (the "base") are presented to one ear, while the remaining portion of the signal, the third formant transition, is presented to the other ear. The third formant transitions chosen for isolation in the experiment determine place of articulation for the syllable-initial consonant. However, in isolation, the transition sounds like a nonspeech pitch glide or "chirp." When the transition and base are presented dichotically in the appropriate temporal relationship, listeners report hearing an unambiguous integrated syllable in the ear receiving the base, and, at the same time, a nonspeech "chirp" in the ear receiving the transition.

For the motor theorist, the duplex phenomenon provides strong support for a distinct speech module. The fact that the transition is heard simultaneously both as part of a speech syllable and as a nonspeech chirp implies that two distinct perceptual mechanisms give rise to the perceptual experience: one yielding perception of the speech syllable and one yielding perception of the pitch glide. Heard as nonspeech, the transition sounds like the frequency glide that it is, whereas, integrated with a syllable, it sounds like a consonant of which the glide is not experienced as a part. Liberman and Mattingly (1989) suggested that although the general auditory perceiving system yields a percept of the acoustic input itself, the speech module and a small number of other highly specialized modules of the auditory system do not. Whereas the outputs of the general auditory system are "homomorphic" with respect to the structure of stimulation at the sense organ (in that, for example, a frequency glide sounds like a pitch glide), the output of the speech module is "heteromorphic."

A question is why listeners do not always experience duplex perception. That is, how does the speech module know what, in stimulation, belongs to its domain and how does it ordinarily prevent the general auditory system from yielding a distinct homomorphic percept of stimulation in the speech module's domain? Recently, Whalen and Liberman (1987) proposed an answer obtained using a new methodology for observing duplex perception. Their answer is that the speech module is preemptive: It gets initial access to acoustic input, and the general auditory system gets its leftovers.

For their experiment, Whalen and Liberman used a synthetic speech base but a sinusoidal third formant transition, and sounds were presented diotically rather than dichotically. The transitions were sinusoids to enhance the distinctness of the two parts of the duplex percept, which now would be heard in the same spatial location. In addition, however, a finding of duplex perception with sinusoidal transitions is all the more remarkable, because base and transition are discordant.

In the experiment, subjects were first asked to label the transitions presented in isolation as either "da" or "ga." On this task, although subjects were able to discriminate the transitions, as evidenced by their consistent assignment of one transition to the category "da" and the other to "ga," they heard neither as particularly "da"- or "ga"-like; averaged over subjects, accuracy was random on the task. Next, a "duplexity threshold" was determined for each subject. The base, which sounded like an ambiguous "da" to most subjects, was presented diotically with one of the transitions. Subjects were instructed to adjust the intensity level of the transition until they just started to hear both an ambiguous syllable ("da" or "ga" depending on which transition was presented) and a nonspeech "chirp." At these critical intensity levels, tests of duplexity were conducted by having subjects match the duplexed transition to transitions presented in isolation. Subjects performed well above chance on this task, indicating that true duplexity was occurring. Likewise, subjects were very accurate in identifying the consonant as "da" or "ga" both at the duplexity threshold and below it. There was in fact no significant difference between subjects' syllable-labeling performance when the transitions were presented at intensities above and below the duplexity threshold.

The results of the experiment by Whalen and Liberman can be summarized as follows. When the base was presented alone, subjects generally reported hearing an ambiguous "da." When the transition was present, but at intensities below the duplexity threshold, subjects reported hearing an unambiguous "da" or "ga," depending on the transition. Finally, at transition intensities at or above the duplexity threshold, subjects reported hearing both the unambiguous syllable and a distinct nonspeech chirp, which corresponded to whichever transition was present.

Because the transition contributes to the speech percept at intensity levels lower than those at which it is heard as nonspeech, Whalen and Liberman concluded that the processing of the transition as speech has priority. In other words, the speech module "preempts" the input and then passes any remainder of the signal to the general auditory system. Whalen and Liberman concluded that such preemption reflects "the profound biological significance of speech" (p. 171).

We interpret the earlier research findings that led to the motor theory differently than did Liberman and his colleagues, and therefore are disposed to look for an alternative account of duplex perception. Evidence that speech percepts frequently fail to reflect the acoustic structure in an obvious way, but do reflect the vocal tract behaviors producing the signal, led Liberman and his colleagues to conclude that perception of speech is different from perception of other sound-producing events, and led them to look inside the perceiver for an explanation for the difference. From a different perspective, however, evidence that listeners to speech recover the physical sound-producing activity of the vocal tract suggests that, in its public aspect, perceiving speech is very much like perceiving other environmental events, and that there is no need to look inside the perceiver to explain how the percept acquires its motor character (Fowler, 1986; Fowler & Rosenblum, in press; Rosenblum, 1987).

By the "public aspect" of perception, we refer to the events in an environment that perceivers can be shown to recover in perception and the information in stimulation available to perceptual systems that supports that recovery. Accordingly, the public aspect of perception is complementary to its covert or private aspect, which includes whatever processing the nervous system may undertake in subserving perceptual recovery of environmental events. When we suggest that, in its public aspect, speech perception is like perception of other environmental events, we are proposing that the kinds of things that can serve as perceptual objects (physical events in the environment) and the role that stimulation at the sense organ plays in perception are alike, but we are not making a claim that, necessarily, covert nervous system activities are the same.

Visual perceivers experience seeing a world outside of themselves; they do not experience seeing the structured optic array that stimulates the retina. In visual perception, that is, optical structure serves as information for its causal environmental source (e.g., Gibson, 1966, 1979). Accordingly, in Liberman and Mattingly's terms, visual perception, like speech perception, is generally "heteromorphic"; dimensions of perceptual experience are not dimensions of the optical structure at the sense organ. More positively, however, dimensions of perceptual experience are those of the environmental causes of structure. Optic arrays are not objects of perception themselves; rather, they are the means by which observers perceive the environment. By the same token, if a perceiver sees, for example, a glass on a table and reaches out to pick it up, most likely he or she will pick up an object conforming in haptic experience to the object experienced visually. That is, in haptic perception, perceivers feel real-world objects that cause complex deformations of the skin (cf. Gibson, 1962); they do not perceive the skin deformations themselves; like vision, haptic perception is heteromorphic.

More generally, in its public aspect, perception is the means by which organisms come to know their environment by using structure in stimulation at the sense organs, not as an object of perception itself, but as information about its causes in the environment. The kind of perception that Liberman and Mattingly call "heteromorphic" allows perceivers to know just one world, the one out there. That kind of perception is "homomorphic," mostly, with respect to environmental causes of structure in stimulation. In speech, listeners use an acoustic signal as information for its causal source: the gestures of the vocal tract that realize the talker's phonetic message. However, we claim, they do the same thing when they receive acoustic products of any sound-producing event; speech perception is not special in this regard.

What then of duplex perception? Does it not, in any case, suggest that there is a distinct mechanism subserving speech perception? We hypothesize that it has another interpretation. The purpose of our experiments was to explore that possibility.

Experiment 1

The research by Whalen and Liberman suggested a starting point for our research. In that study, listeners only reported a duplex percept when the intensity of the transition was rather high. Possibly, the sine-wave transition was wholly integrated with the syllabic base until its intensity was too high to constitute a plausible syllable-initial transition for the base. Excess energy in the transition was perceived as the acoustic product of a distal source distinct from the syllable. If that manner of "parsing" the acoustic structure underlies the duplex percept, then it should be possible to obtain duplex perception, and "preemptiveness," for the acoustic product of almost any sound-producing event if part of the product is made unnaturally intense in relation to the remainder.

There have been reports of duplex perception in the literature for stimuli other than speech. However, in our view, some of the reports are mistaken and others do not challenge Whalen and Liberman's interpretation of their findings. Bregman (1987) reported instances of duplexity in visual perception. His examples are of occlusion of part of one object by another. In one example, part of a square is occluded by a transparent object with opaque stripes; one of the stripes overlays most of one side of the square. Even so, most viewers identify the partially occluded object as a complete square. In our view, this kind of example lacks the essential properties of duplex perception. The observer does not see one fragment of the pictorial display in qualitatively different ways at the same time. The observer sees that a stripe hides a side of the square. Moreover, the observer finds it completely acceptable if he or she is told that, in fact, the square is incomplete. In contrast, the isolated formant transition does not include a /g/. Rather, the transition gives rise to two qualitatively very distinct perceptual experiences, and the two perceived objects can even be heard as emanating from different locations in space. Finally, listeners cannot hear the syllable base unintegrated with the transition in the way that they can imagine an incomplete square behind the occlusion. There are also reports of duplex perception for musical stimuli (Collins, 1985; Pastore, Schmuckler, Rosenblum, & Szczesiul, 1983). We chose to look for duplex perception for other nonspeech sounds, however, because we wanted to use sounds for which it could not plausibly be supposed that listeners have a specialized perceptual module. A safe category of sounds in that regard would seem to be the category of products of human artifacts. We chose the sound of a door slamming.

Our view of perception requires us to begin our investigation with sounds that are causal consequences of sound-producing events rather than arbitrarily synthesized sound patterns. This creates two difficulties for an attempt to replicate Whalen and Liberman's procedures exactly or, to a lesser degree, to replicate other procedures in which duplex perception is achieved for speech. One is that we do not have the means currently to substitute a caricature of a fragment of a door-slam sound for the fragment itself analogous to Whalen and Liberman's sine-wave formant transitions. This makes it more difficult to detect true duplexity, because the duplexed fragment is not as acoustically distinct from the integrated sound as it would be if it could be caricatured. A second difficulty is that nonspeech sounds must rarely, if ever, occur in da/ga-like pairs in which the sounds share most of their acoustic structure and differ by an easily parsable fragment of the spectrum. Whereas Whalen and Liberman and other investigators could show that perception of the base is changed appropriately depending on which transition is presented simultaneously with it, and could test for a duplex percept by having subjects identify the nonspeech chirp on trials designed to elicit duplex perception, we cannot. However, we can ask whether duplex responses increase systematically with intensity of the chirp analogue, and we can compare the relative frequency of duplex responses with responses indicating that the sound fragments do not integrate.

Method

Subjects. Subjects were 16 students at Dartmouth College, Hanover, New Hampshire, who participated in the experiment for course credit. All reported normal hearing.

Materials. We recorded the slamming of a metal door to a sound-attenuating chamber. Next, we low-pass filtered the signal at 10 kHz and sampled it at 20 kHz using a 16-bit A to D converter. Figure 1 (top left) shows the waveform of the acoustic signal, which was 169 ms in duration, and (top right) a spectral cross-section taken 10 ms from stimulus onset. We then filtered the signal digitally in two ways. We low-pass filtered the signal at 3 kHz to make a "base" analogue (bottom right of Figure 1), and we high-pass filtered it, also at 3 kHz, to make a "chirp" analogue, henceforth, the "excerpt" (bottom left). (The filtering distorts the signals somewhat. This may move our manipulation in the direction of Whalen and Liberman's use of sinusoidal transitions in making the fit between the parts somewhat discordant. However, we found the effects of filtering unnoticeable when the fragments were recombined in their appropriate intensity relations.) To us, the base sounded like a door slam; however, it was discriminable from the original metal-door sound because the metallic clanging sound was largely absent. The excerpt did not sound like a door slam, but rather like the sound of something being shaken.
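As a rough illustration of the base/excerpt split described above, the following Python sketch divides a signal into complementary parts below and above a 3 kHz cutoff. It is only a sketch under stated assumptions: the article does not specify the digital filter design, so an idealized brick-wall split computed with a naive discrete Fourier transform stands in for it; a two-tone test signal stands in for the door-slam recording; and the names `dft`, `idft_real`, and `split_at` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short demo signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_at(signal, fs, cutoff_hz):
    """Split a signal into (low, high) parts at cutoff_hz.

    Assumption: an idealized brick-wall split, not the authors' actual filter.
    """
    n = len(signal)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(dft(signal)):
        # Bins in the upper half of the spectrum mirror the lower half,
        # so fold them back to find the frequency each bin represents.
        freq = min(k, n - k) * fs / n
        if freq < cutoff_hz:
            low_spec.append(coeff)
            high_spec.append(0)
        else:
            low_spec.append(0)
            high_spec.append(coeff)
    return idft_real(low_spec), idft_real(high_spec)


FS = 20_000  # sampling rate reported in the article (Hz)
N = 200      # short hypothetical test signal (10 ms), not the 169 ms recording

# Hypothetical stand-in signal: a 1 kHz tone plus a weaker 5 kHz tone.
low_tone = [math.sin(2 * math.pi * 1000 * t / FS) for t in range(N)]
high_tone = [0.5 * math.sin(2 * math.pi * 5000 * t / FS) for t in range(N)]
signal = [a + b for a, b in zip(low_tone, high_tone)]

base, excerpt = split_at(signal, FS, 3000)  # "base" below 3 kHz, "excerpt" above
```

With this test signal, `base` recovers the 1 kHz tone and `excerpt` the 5 kHz tone (up to floating-point error), and the two parts sum back to the original signal.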

Finally, we made new versions of the high-pass filtered signal by attenuating or amplifying it. Sampled digitized voltages constituting the excerpt were multiplied by 0, 0.05, 0.1, 0.15, and 0.2 to make a low-intensity series, by 0.9, 0.95, 1, 1.05, and 1.1 to make an intermediate-intensity series, and by 4, 4.5, 5, 5.5, and 6 to make a high-intensity series.
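The three intensity series can be sketched in Python as amplitude multipliers applied sample by sample. The multipliers are those given in the text; the conversion to decibels uses the standard relation of 20 times the base-10 logarithm of the amplitude ratio, which the article itself does not report, and the function names `gain_to_db` and `scale_excerpt` are hypothetical.

```python
import math

# Amplitude multipliers reported in the article.
LOW_GAINS = [0.0, 0.05, 0.1, 0.15, 0.2]
INTERMEDIATE_GAINS = [0.9, 0.95, 1.0, 1.05, 1.1]
HIGH_GAINS = [4.0, 4.5, 5.0, 5.5, 6.0]
ALL_GAINS = LOW_GAINS + INTERMEDIATE_GAINS + HIGH_GAINS


def gain_to_db(gain):
    """Level change in dB for an amplitude multiplier; a gain of 0 is silence."""
    if gain == 0:
        return float("-inf")
    return 20 * math.log10(gain)


def scale_excerpt(samples, gain):
    """Multiply every sample of the excerpt by the same gain."""
    return [gain * s for s in samples]


# Hypothetical excerpt samples, standing in for the digitized voltages.
excerpt = [0.0, 0.2, -0.4, 0.1]
versions = {gain: scale_excerpt(excerpt, gain) for gain in ALL_GAINS}
```

On this reading, the high-intensity series spans roughly +12 dB (a multiplier of 4) to about +15.6 dB (a multiplier of 6) relative to the original excerpt, and the multiplier of 0 removes the excerpt altogether.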

We made four test sequences from the set of originals. The first, meant to familiarize listeners with the sounds, presented the original metal-door slam, the base, and the excerpt in sequence five times. There was 1 s between items in a sequence of three and 3 s between sequences. The second test order, an identification test, presented each of the three signals (original metal door, base, and excerpt) eight times each in random order. There were 3 s between trials. In both of these sequences, stimuli were presented diotically at a comfortable listening level.

Each of 45 trials in both final tests presented the base paired with 1 of the 15 attenuated or amplified excerpts. Each version of the excerpt appeared 3 times on the test. Trials were randomized, and there were 3 s between trials. The tests differed only in that, in one, stimuli were presented diotically, whereas in the other, they were presented dichotically.
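The trial structure of the two final tests (15 excerpt versions, 3 presentations each, in random order) can be sketched as follows. The article does not say how the randomization was carried out, so a simple shuffle is assumed here, and the function name `make_trials` and its `seed` parameter are hypothetical.

```python
import random

# The 15 excerpt multipliers from the low, intermediate, and high series.
EXCERPT_GAINS = [
    0.0, 0.05, 0.1, 0.15, 0.2,
    0.9, 0.95, 1.0, 1.05, 1.1,
    4.0, 4.5, 5.0, 5.5, 6.0,
]
REPETITIONS = 3  # each excerpt version appears 3 times per test


def make_trials(seed=None):
    """Return a randomized list of excerpt gains, one entry per trial.

    Assumption: an unconstrained shuffle; the authors' method is not stated.
    """
    rng = random.Random(seed)
    trials = [gain for gain in EXCERPT_GAINS for _ in range(REPETITIONS)]
    rng.shuffle(trials)
    return trials


trials = make_trials(seed=1)  # 45 trials, each pairing the base with one excerpt
```

Passing a fixed `seed` makes the order reproducible, which would allow the diotic and dichotic tests to share or differ in trial order as desired.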

Procedure. Subjects were run in groups of 1 to 3 students. They took the tests in the order described previously except that the order of the diotic and dichotic versions of the test was counterbalanced.

Subjects were first told that they would be listening to three different sounds, and that we wanted to know how people would identify or describe them. As we played the repeating sequence of signals (metal door, base, excerpt), they were to try to identify each sound. Failing that, they were to describe the sound as best they could.

Having collected their written identifications or descriptions, we next gave them names to call each sound. We told them that the first sound (the original metal-door sound) was the sound of a metal door slamming, the second (the base) was the sound of a wooden door slamming, and the third (the excerpt) was the sound of some small metal pellets being shaken in a cup. We played the five sequences of the three sounds over to the subjects and asked them to associate the names "metal door" (M), "wooden door" (W), and "shaking sound" (S) with the three sounds, respectively.

Figure 1. (Clockwise from top left) Waveform of the metal-door sound; spectral cross-section from the door sound; cross-section after low-pass filtering; cross-section after high-pass filtering. (Waveform plotted against time in seconds; cross-sections plotted against frequency in kHz.)

Next, they took the 24-item identification test to determine whether they could distinguish and label the three sounds. On each trial, they were asked to identify the sounds as the metal door, the wooden door, or the shaking sound by writing an identifying letter on the answer sheet. They were asked to guess if they were uncertain.

On the diotic test, we told subjects that they would either hear one of the three sounds on each trial or else they would hear two of the sounds presented simultaneously. On their answer sheet next to each trial number, they were to write one identifying letter if they heard just one of the three sounds or two different letters if they heard two, guessing at the identity of either one or both if necessary. They were not told that when they heard two sounds, one would always be a shaking sound; accordingly, any pairing of the three sounds constituted an acceptable answer on those trials on which listeners heard two sounds. Instructions on the dichotic test were the same except that subjects were asked to assign their identifications to the ear in which the identified stimulus had been presented. On their answer sheet, the left-most of two blank spaces for each trial was for identification of stimuli heard in the left ear and the right-most space for identifications of stimuli heard in the right ear.

Figure 2. Major response categories in the diotic (top) and dichotic (bottom) conditions of Experiment 1. (Horizontal axis: excerpt intensity, from low through middle to high; plotted response categories: wooden door, metal door, and duplex.)

Results

Identifications and descriptions. Six subjects identified the metal-door slam as the sound of a door slamming or being closed. All subjects identified the sound as a hard collision of some sort. (For example, 4 subjects reported hearing the sound of a drum beat and 2 subjects reported the sound of a footfall, such as "footsteps on stairs" and "boot clomping on floor.")

Four subjects identified the base as a door closing and, as for the metal door, the remainder specified some form of hard collision. (For example, 2 heard a drum beat, 1 heard a book being dropped, and another heard a heavy box being dropped.)

No one associated the excerpt with the sound of a door closing. Almost all reported something being shaken (e.g., maracas, castanets, tambourine, keys, baby rattle).

Although listeners were not particularly good at identifying the door slams as such, they were very good at narrowing the possibilities down to a highly restricted class of sound-producing events (hard collisions involving rather heavy objects). It is not possible to tell from the literature whether listeners' accuracy in identification of the door sound as such is or is not typical of sounds of this duration and familiarity. In any case, important for our purposes, listeners associated the excerpt with a rather different class of events than the door sound and the base.

Identification test. No subject made an error identifying the shaking sound. All errors were confusions between the metal door and the base. On average, however, performance was high (91% correct) on the 16 trials of the identification test in which the sound was either the metal door or the base. The errors were evenly divided between those conditions.

Duplex tests. The most frequently used response categories on these tests were M, W, and MS (metal door and shaking sound). We present responses in those categories in Figure 2 for the diotic (top) and the dichotic (bottom) tests.

In the figure, responses are collapsed across intensities within each of the three intensity ranges (low, middle, and high) of the excerpt. In general, there were no obvious trends in use of the response categories within each intensity range.

On the diotic test, by analogy with the findings of Whalen and Liberman (1987), we might expect subjects to hear just the base when the intensity of the excerpt is far below its natural intensity relation to the base, to hear the integrated metal door at intermediate-intensity levels (which surround and include the natural intensity relation of the excerpt to the base), and to hear the metal door plus a residual shaking sound at high-intensity levels of the excerpt. That is exactly what we found.

The W responses were predominant at low intensities of the excerpt (76% of total responses to stimuli in that intensity range), but they were rare at medium (13.8%) and high (1.3%) intensities. The decrease with intensity is highly significant, F(2, 28) = 138.8, p < .0001. (The Fs here and on other duplex identification tests are based on an arc-sine transformation of the data, required because variances were very different across the intensity ranges. In particular, they decreased markedly with decreases in response frequency. Analyses of variance [ANOVAs] using the arc-sine transformed data did not change the outcome in any important way. Effects that were significant using untransformed data remained significant under transformation. Likewise, effects that were nonsignificant in the one analysis were nonsignificant in the other.)
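For readers unfamiliar with the arc-sine transformation mentioned in the parenthetical, the following is a minimal sketch of one common form, asin(sqrt(p)). The text does not say which variant was used (some authors use 2·asin(sqrt(p))), so the formula here is an assumption, as is the function name. The group percentages are taken from the paragraph above purely for illustration; the ANOVAs would have been run on transformed per-subject proportions, which are not reported.

```python
import math


def arcsine_transform(proportion: float) -> float:
    """Variance-stabilizing transform for a proportion in [0, 1] (assumed form)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Illustration only: group mean W proportions on the diotic test by range.
w_by_range = {"low": 0.76, "middle": 0.138, "high": 0.013}
transformed = {name: arcsine_transform(p) for name, p in w_by_range.items()}
```

The transform stretches proportions near 0 and 1, which is why it is often applied when variances shrink as response frequencies approach the floor.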

The M responses were uncommon at low (12.4%) and high intensities (11.7%), but predominant at intermediate intensities (64.8%). The effect of intensity was significant, F(2, 28) = 34.44, p < .001.

Duplex responses (MS, that is, "metal door," "shaking sound") were rare at low-intensity (.4%) and intermediate-intensity (8.8%) levels, but predominant at high intensities (60.8%). The effect of intensity was highly significant, F(2, 28) = 48.46, p < .001.


One other response category, not depicted in the figure, is also of interest. The WS ("wooden door," "shaking sound") responses are those in which listeners heard two sounds, but apparently failed to integrate the excerpt with the base. This category of response is rare in experiments in which stimuli are fragments of speech syllables. On the diotic test of the present experiment, this response category is somewhat more common, accounting for 11.7% of all responses on average. There was no significant change in the frequency of usage of this category across intensity conditions. However, numerically, there were twice as many WS responses at the highest intensity levels, where a two-sound percept is expected, than at the low and intermediate levels.

In the present analysis, there was a significant interaction of intensity and condition (whether subjects took the diotic test before or after taking the dichotic test), F(2, 28) = 4.75, p = .02; however, the patterning of response frequencies was the same regardless of the order in which the diotic test was taken. Wooden-door responses had frequencies of 65% and 87% at low chirp intensities for subjects taking the diotic test first and second, respectively; they were 15% and 12.5% at intermediate intensities and 2.5% and 0% at high intensities. This interaction was nonsignificant for the other response categories.

As shown in Figure 2, the dichotic results exhibited both similarities to and differences from the diotic test. As for similarities, W responses are predominant (49.2%) at low intensities and rare at intermediate (4.2%) and high (1.7%) intensities. The difference in response frequency across conditions is highly significant, F(2, 28) = 111.25, p < .0001. In an ANOVA on the data from both the diotic and dichotic tests, however, there is a significant main effect of test, F(1, 14) = 9.21, p = .009, because there are more W responses on the diotic test. In addition, there is a Test × Intensity interaction because that difference is largest at the low range of intensities, F(2, 28) = 7.72, p < .01.

Like W responses, the duplex response, MS, patterned ordinally on the dichotic test as it did on the diotic test. It is uncommon at low intensities (10%) and increasingly common at intermediate (39.9%) and high (64.4%) intensities. The change with intensity is highly significant, F(2, 28) = 46.29, p < .001. There is a difference in the outcome between the diotic and dichotic tests, however, that gives rise both to a main effect of test in an ANOVA including data from both tests, F(1, 14) = 7.89, p = .014, and a Test × Intensity interaction, F(2, 28) = 5.91, p = .007. The major difference is that MS is considerably more frequent at intermediate levels of intensity on the dichotic (39.9%) than on the diotic test (8.8%).

The M responses showed a different pattern on the dichotic than on the diotic test. These responses already account for about one quarter (27.2%) of responses on the dichotic test in the low-intensity ranges, whereas they accounted for just 12% of responses in the low-intensity ranges on the diotic test. Moreover, on the dichotic test, M responses decrease in frequency monotonically over the three intensity ranges, F(2, 28) = 11.25, p = .003, whereas they peak in the middle range of intensities on the diotic test. Here, as for the other responses, both the main effect of test, F(1, 14) = 17.25, p < .001, and the interaction of test and intensity, F(2, 28) = 21.9, p < .001, are significant in an ANOVA.

A final difference in the outcome of the diotic and dichotic tests concerns the response category WS. These occur on 12% of all trials in the diotic test, but on 28% of dichotic trials. This difference is significant, F(1, 14) = 18.9, p = .0007; it does not interact with intensity. In the dichotic data only, there is a significant change in the frequency of use of this response category with intensity, F(2, 28) = 6.13, p = .006, because the response category is used less frequently in the low intensity range than in the others.

Discussion

The most important finding of the experiment is that subjects made a large number of responses suggestive of a duplex percept on both the diotic and dichotic tests. Moreover, they did so, as predicted, predominantly at the higher intensity ranges of the nonspeech excerpt. Although it is premature in the absence of further tests to conclude that listeners are experiencing a duplex percept rather than integrating the two sound fragments cognitively, the experiment does replicate the major response pattern associated with the duplex speech phenomenon in the literature. Explorations of the bases for the duplex responses are presented as Experiments 3-5.

A second important finding is a response pattern for the diotically presented door sounds replicating the one for diotically presented speech that led Whalen and Liberman to identify speech perception as "preemptive." Their finding was that, under conditions of diotic presentation of a base syllable and a /g/ transition, for example, at intensities at which the transition first became detectable in any form, listeners integrated it with the base and reported hearing /ga/, but did not yet report a duplex percept. It was only at higher intensities of the transitions that listeners reported a duplex percept. Whalen and Liberman's interpretation was that the speech system gets initial access to incoming acoustic signals. Stimulation is passed to the general auditory system only if the speech system rejects it. For Whalen and Liberman, preemption illustrates the special biological importance of speech.

Our findings with filtered door sounds weaken this interpretation. With diotic presentation of the base and excerpt, our results replicate those of Whalen and Liberman. When the upper frequencies of the door sound first became detectable in any form, subjects reported hearing the metal door, but did not yet report hearing a duplex percept. It is only at higher intensities of the excerpt that MS is reported. It is implausible, however, that there is a specialized perceptual system for door sounds that is given initial access to the acoustic signal, and so an alternative interpretation of the findings is required.

That preemptiveness requires a different accounting than Whalen and Liberman offered is also suggested by recent findings of Ciocca and Bregman (1989). They presented a syllable base to one ear and a transition (for /da/ or /ba/ in one stimulus set and for /da/ or /ga/ in another) to the other ear. However, they embedded the critical transition in several different sequences designed either to "capture" it, and hence prevent its integration with the base, or else not to capture it. The capturing sequences were either of transitions identical to the critical transition or else forming an ascending or descending series into which the transition fit as an integral part. Capturing patterns compared with control patterns significantly reduced the likelihood that the critical transition integrated with the syllable base. In some instances, the capturing sequence itself was preemptive, so that listeners were more likely to report hearing the syllable base than the integrated syllable. Ciocca and Bregman rejected the idea that duplex perception reflects independent processing of the transition by distinct perceptual systems and suggest instead that it reflects perceptual strategies for allocation of parts of a complex acoustic signal to their probable sources.

Our interpretation of our findings agrees largely with that conclusion. With diotic presentation, we found that (a) M responses predominate precisely at the intensities of the excerpt that reflect its natural intensity relation to the base, and (b) above those intensities, listeners report hearing two sounds, not one. We interpret these findings as showing that listeners are sensitive to the cospecification of a coherent slamming event by excerpt and base. When two sound fragments jointly specify a sound-producing event, in general, such an event will be perceived. Any residual acoustic signal will be perceived as having a distinct origin. This account is not complete. In his review and investigation of six types of dichotic fusion, Cutting (1976) identified a type of fusion that he called "spectral," in which the first formant of a CV syllable is presented to one ear and the higher formants are presented to the other ear. The formants do "fuse" under these conditions. However, when the upper formants are attenuated relative to the first formant, listeners do not report duplexity (i.e., they do not hear the residual F1 amplitude as a distinct sound; e.g., Rand, 1974). Instead (now with F1 and F2 presented to one ear and F3 to the other), Liberman and Mattingly (1989) reported that manipulations of relative intensity affect the perceived location in space of the fused syllable. However, duplexity may occur with spectral fusion when the fundamental frequencies of the dichotic syllable fragments are made discordant by as little as 2 Hz (Cutting, 1976).

Our findings on dichotic presentation of the door parts are not identical in pattern to those on diotic presentation. One set of differences suggests that, presented to a different ear from the base, the excerpt sounds louder to subjects than it does presented to the same ear, perhaps reflecting a release from masking under conditions of dichotic presentation (cf. Rand, 1974, who found release from masking by F1 of consonant place of articulation information in upper formants when formants are presented dichotically). This has three consequences. One is that M responses are rather frequent even at the lowest intensities of the excerpt. Second, those responses decrease in frequency as intensity increases, because duplex, MS, responses become predominant. The third is that the duplex response occurs predominantly even when the two door parts are in their natural intensity relation. Perhaps because of this apparent difference in loudness of the dichotically presented excerpts, the dichotic test provides no evidence of preemptiveness of the "metal door" percept in the sense that listeners hear two sound sources as soon as they hear the excerpt at all. With the ranges of intensities that we used (and so with a gap in the intensities examined between 0.2 and 0.9 of the excerpt's natural intensity relation to the base), responses shift from predominantly W responses to predominantly MS responses.

The finding that duplex responses come in early with dichotic presentation may alternatively or in addition reflect the extra information the listener gets with dichotic presentation (in the different apparent locations of the two door parts) that there are two distinct sound sources. This location information may also explain why WS responses occur more frequently with dichotic than diotic presentation. Integration may be less likely when the acoustic fragments are differently localized.

The next experiment further explores duplex perception for the door-closing sounds. Experiment 2 examines dichotic presentation of the base and excerpt when intensities of the excerpt fill in the range between the low and intermediate intensities of Experiment 1. In this way, we can see whether, with dichotic presentation, there is a range of intensities in which M responses predominate in the absence of a duplex percept. There should be such a range if the reason for the differences between diotic and dichotic presentation in Experiment 1 is a release from masking with dichotic presentation of the excerpt. An alternative possibility is that the location information provided by dichotic presentation eliminates any range in which a single, integrated, sound is reported.

Because evidence of a range of intensities in which listeners experience integration without duplexity had not been sought yet with dichotic presentation of syllabic bases and chirps, we conducted parallel experiments with door and speech sounds. If speech is truly preemptive, as Whalen and Liberman proposed, then preemption should be evident with dichotic as well as with diotic presentation of base and chirp fragments.

Experiment 2

Method

Subjects. Subjects were 12 students at Dartmouth College who participated for course credit. All were native speakers of English who reported normal hearing.

Materials. Nonspeech stimuli were those of Experiment 1 except that, in the present experiment, the multipliers for the voltages of the excerpt were 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.5, 3, and 3.5.
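To relate these voltage multipliers to the more familiar decibel scale, the following minimal sketch converts each multiplier to a level change relative to the excerpt's natural intensity relation to the base, using the standard amplitude-ratio formula 20·log10(multiplier). The function names and the representation of the excerpt as a list of sample values are assumptions made for illustration; the original stimuli were not necessarily prepared this way.

```python
import math

# Voltage multipliers for the excerpt in Experiment 2, as listed in the text.
EXCERPT_MULTIPLIERS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
                       0.8, 0.9, 1.0, 2.5, 3.0, 3.5]


def multiplier_to_db(multiplier: float) -> float:
    """Level change in dB for an amplitude (voltage) ratio."""
    return 20.0 * math.log10(multiplier)


def scale_excerpt(samples, multiplier):
    """Hypothetical helper: scale waveform samples by a voltage multiplier."""
    return [sample * multiplier for sample in samples]


levels_db = {m: multiplier_to_db(m) for m in EXCERPT_MULTIPLIERS}
```

On this conversion, a multiplier of 0.1 corresponds to a 20 dB attenuation of the excerpt and a multiplier of 1.0 leaves the natural relation unchanged.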

Speech stimuli were the /ga/ syllable, base, and /g/ transition used by Repp, Milburn, and Ashkenas (1983), with the transition ("chirp") ranging in its intensity relation to the base in the same way as the door excerpt described previously. As described by Repp et al., the speech stimuli were created on the Haskins Laboratories parallel-resonance synthesizer. The full syllable and base were 250 ms in duration with 50-ms transitions; the chirp was the 50-ms transition of the third formant of the full syllable. In the full syllable and base, the first formant rose from 285 to 771 Hz while the second formant fell from 1770 to 1233 Hz. The steady-state value of the third formant was 2525 Hz. The /ga/ transition rose from 2018 Hz to the steady-state value.

Five stimulus sequences were made using each stimulus set. Results on just the first three sequences are reported here. For the door stimuli, the first two sequences were also the first two sequences that subjects experienced in Experiment 1. First, the three sounds (full door sound, base, and excerpt) were presented in sequence five times. Second was a 30-item identification test in which 10 tokens of each sound were presented in random order for identification. Third was a 130-item dichotic test consisting of 10 pairings of base plus excerpt at each of the excerpt's 13 intensities. There were 3 s between trials on the identification and dichotic tests. The fourth sequence was a discrimination test presented as Experiment 4.

Test sequences using speech stimuli were analogous to the nonspeech sequences.
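The composition of the 130-item dichotic test (10 pairings at each of 13 excerpt intensities) can be sketched as follows. This is an illustration under stated assumptions: the text does not describe how trial order was determined for the dichotic test, so the simple seeded shuffle and the function name `build_dichotic_sequence` are hypothetical.

```python
import random

# The 13 excerpt intensities (voltage multipliers) of Experiment 2.
EXCERPT_MULTIPLIERS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
                       0.8, 0.9, 1.0, 2.5, 3.0, 3.5]
REPETITIONS = 10  # pairings of base plus excerpt at each intensity


def build_dichotic_sequence(seed: int = 0) -> list:
    """Return 130 excerpt multipliers in an (assumed) random trial order."""
    rng = random.Random(seed)  # seeded so the order is reproducible
    trials = [m for m in EXCERPT_MULTIPLIERS for _ in range(REPETITIONS)]
    rng.shuffle(trials)
    return trials


sequence = build_dichotic_sequence()
```

Each entry stands for one trial on which the base is paired with the excerpt at that multiplier.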

Procedure. Subjects were run in groups of 1 to 3 students. Subjects took all 10 test sequences. The order of speech and nonspeech tests was counterbalanced across subjects. Instructions to subjects were the same as those in Experiment 1.

Results

Door sounds. Performance averaged 96% on the identification test, and, as in Experiment 1, essentially all of the errors were confusions between the "wooden" and "metal" doors. Performance on the dichotic test is shown in Table 1. Even at the lowest intensities of the excerpt, subjects tended to report two sounds. The modal response at intensities 0.1 and 0.2 was WS. Thereafter, it was MS; hence, duplex responses appear considerably before the excerpt reached its natural intensity relation to the base. The frequency of that response increases with intensity of the excerpt. In this test, there is no range of intensities at which subjects reported hearing "metal door" predominantly. The response WS, in which the listener reports two sounds but fails to integrate the excerpt with the base, occurred on 30% of trials, about as frequently as it did in the analogous dichotic test of Experiment 1.

Speech sounds. Identification performance on speech stimuli averaged 92.3% with all errors confusions between the base and the full syllable. Performance on the dichotic test is shown in Table 1.

Table 1
Performance on the Dichotic Test in Experiment 2

             Auditory perception          Speech perception
Intensity    W     M     WS    MS         D     G     DC    GC
0.1          28    13    43*   15         11    47*   9     33
0.2          25    5     37*   33         9     31    15    45*
0.3          21    4     37    38*        8     37    10    45*
0.4          10    10    34    46*        7     29    13    52*
0.5          11    10    36    43*        7     27    13    52*
0.6          8     7     28    57*        4     22    16    57*
0.7          4     13    28    54*        4     28    13    54*
0.8          2     7     29    61*        6     21    13    61*
0.9          1     4     21    74*        7     15    19    58*
1.0          2     4     27    67*        2     13    15    70*
2.5          -     2     19    78*        2     7     17    74*
3.0          -     -     28    72*        3     2     16    79*
3.5          -     2     23    74*        2     2     17    78*

Note. All category values represent percentage of responses. W = wooden door (or base); M = metal door (integrated base and excerpt); WS = wooden door, shaking sound (two sounds, unintegrated); MS = metal door, shaking sound (two sounds, integrated); D = da (base); G = ga (integrated base and chirp); DC = da, chirp (two sounds, unintegrated); GC = ga, chirp (two sounds, integrated).
* Modal response.

Performance on that test is similar to that on the nonspeech sounds with two exceptions. Whereas the response "metal door" is rare on the dichotic nonspeech test, the analogous response "ga" is not. It is the modal response at the lowest intensity of the chirp and the second most frequent response at intensities between 0.2 and 0.9. Likewise, whereas the response WS is fairly common on the nonspeech test, occurring on 30% of trials, "da, chirp" (DC) is less common on this test, occurring on just 14% of trials. Otherwise, performance on the dichotic speech and nonspeech tests is very similar. At all intensity relations higher than 0.1, the modal response on the speech test is the duplex response "ga, chirp" (GC), and the frequency of that response increases with chirp intensity.
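The overall percentages quoted for the unintegrated two-sound responses (30% for WS, 14% for DC) can be checked against the per-intensity values in Table 1. The following is a minimal sketch; it assumes that the overall figures are unweighted means across the 13 intensities (reasonable here because each intensity contributed the same number of trials), and the lists are transcribed from the WS and DC columns of Table 1.

```python
# Percentages transcribed from Table 1, ordered by excerpt/chirp intensity
# (0.1, 0.2, ..., 1.0, 2.5, 3.0, 3.5).
WS_PERCENT = [43, 37, 37, 34, 36, 28, 28, 29, 21, 27, 19, 28, 23]
DC_PERCENT = [9, 15, 10, 13, 13, 16, 13, 13, 19, 15, 17, 16, 17]


def mean_percent(values):
    """Unweighted mean of per-intensity percentages (assumed aggregation)."""
    return sum(values) / len(values)


ws_overall = mean_percent(WS_PERCENT)  # about 30
dc_overall = mean_percent(DC_PERCENT)  # about 14
```

Under that assumption, the tabled values reproduce the two percentages given in the text after rounding.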

Discussion

The results of Experiment 2 replicate findings of Experiment 1 that duplex responses come in at lower excerpt intensities with dichotic than with diotic presentation of door sounds. Duplex responses come in early with dichotic presentation of speech fragments as well; possibly there is an analogous diotic/dichotic difference there.

In contrast with diotic presentation of speech and door sounds, with dichotic presentation, there is no range of intensity relations of excerpt to base (excepting 0.1 for speech) at which a single integrated percept is reported predominantly. With dichotic presentation, evidence that the two parts of a speech syllable or door sound arise in different locations may increase the likelihood that subjects report two sounds. Remarkably, however, it does not prevent integration of the two fragments so that, at most intensities, the modal response is the duplex response.

Although the response patterns are quite similar for speech and nonspeech stimuli, there is a major difference evident in Experiment 2. The most frequent response other than the duplex response in the speech tests is the integrated response "ga," whereas in the nonspeech test it is WS. This suggests that integration of the parts of the door sound is less compelling than integration of the parts of a speech syllable.

Possibly this difference is fundamental. Possibly integration of the parts of a door sound is cognitive, whereas integration of speech fragments is perceptual. That is, whereas integration of speech fragments may occur, as Whalen and Liberman supposed, in a special module for perceiving speech sounds, integration of door parts may be an intellectual endeavor on the part of a perceiver who hears the parts as separate and infers how they would sound were they integrated. The remaining experiments are designed to test for that difference.

Nusbaum, Schwab, and Sawusch (1983) proposed that when listeners receive a syllable base in one ear and a transition in the other, they may provide duplex responses because they can identify the consonant based on the transition itself. The consonant identification based on the chirp is ascribed to the ear receiving the ambiguous-sounding base. This integration of consonant information from the chirp with syllable information from the base, then, is not perceptual but rather cognitive in origin.

Repp et al. (1983) tested two predictions of this proposal using a three-formant /da/-/ga/ continuum, the base and /ga/ transition endpoint of which we used in Experiment 2. One prediction is that subjects should be able to identify a consonant based on the critical formant transition presented alone. This prediction was disconfirmed. A second prediction was that, asked to attend to the ear receiving the base in a dichotic duplex trial, subjects should be able to match the base to itself presented in isolation. Repp et al. tested this prediction using an AXB discrimination test. In this paradigm, listeners hear three syllables in succession, designated A, X, and B. Their task is to decide whether the middle syllable X is more like A or more like B. On critical trials in the experiment by Repp et al., the A (or B) syllable was the base speech syllable that we used in Experiment 2, here presented to the listeners' left ear; the X syllable was the base presented to the left ear and a /g/ transition presented to the right ear; the B (or A) syllable was full /ga/ presented to the left ear. Listeners were asked to attend just to the sounds presented to the left ear and to base their judgments only on those sounds. In the left ear, then, subjects received "base, base, /ga/" or else "/ga/, base, base" on critical trials. However, overwhelmingly, they judged the middle syllable (paired on the right channel with an appropriately timed /g/ transition) as more like the full /ga/ than like the base. This finding, along with evidence that listeners cannot identify /g/ from its third formant transition, shows that integration of base and transition is impenetrable by listeners and hence not able to be achieved cognitively.

Experiments 3 and 4 test these hypotheses for the door sounds. Experiment 3 tests whether listeners can identify the metal-door sound from its upper frequencies.

Experiment 3

We already have some information from Experiment 1 that the upper frequencies of the door do not sound like a door-slamming sound. When asked to identify the sound, none of the 16 subjects reported hearing a door sound. The present experiment asked for the same identification, but provided subjects with more information about the sounds they would be listening to than we had provided in the first identification test of Experiment 1. In particular, on each trial of an identification test, subjects heard either a high-pass filtered metal-door sound (the excerpt of Experiments 1 and 2) or a high-pass filtered wooden-door sound. They knew that one sound was a filtered metal-door slam, and that the other was a filtered wooden-door sound; their task was to decide which was the metal and which was the wooden door sound.

Method

Subjects. Subjects were 32 students at Dartmouth College who participated for course credit. All reported normal hearing.

Materials. Stimuli were the original and filtered metal door sounds used in Experiment 1 as well as an original wooden-door slam and its upper frequencies (3 kHz and greater). The wooden-door slam was recorded in the same way as the metal-door slam of Experiment 1 except that we placed plastic tape over the latch of the wooden door. The tape did not prevent the door from latching, but it did eliminate the sound of metal hitting metal during latching.

Two tests were made using these stimuli. In one, meant to familiarize listeners with the unfiltered door sounds, the original metal-door slam and the new wooden-door slam were presented in alternation five times. There was 1 s between sounds within an alternating pair and 3 s between pairs. A second, identification, test order presented just the excerpts 10 times each in random order. There were 3 s between trials.

Procedure. Two groups of subjects were run. By analogy to the procedure of Repp et al. (1983), we asked one group of 16 subjects to identify the excerpts as the filtered sound of a metal or wooden door without their having prior experience with the unfiltered door sounds. (That is, subjects in this group heard the second test order described previously without having heard the first.) However, we ran a second group of subjects whose members were given prior experience with the original door sounds. We added this new condition because we thought that subjects may have had more general familiarity with the syllables used by Repp et al. than with the door sounds of the present experiment.

Subjects in this latter group were told that the first sound of each pair presented in the first test order was the sound of a metal door slamming and the second was the sound of a wooden door slamming. They were to listen to the repeating pair and to learn to associate the label "metal door" to the first sound and "wooden door" to the second. They were invited to listen to the five repetitions of the pair a second time. No subject requested to do so, however; the door sounds were quite distinct and hence easy to learn to label.

To identify the excerpts in the second test sequence as the filtered metal or wooden door, subjects in both groups wrote an identifying letter on their answer sheet. They were instructed to guess if they were unsure as to the correct answer.

Results

Subjects performed poorly on the test. On average, they identified the metal-door excerpts as the filtered sound of a metal door on 35% of trials and the wooden-door excerpt as the filtered sound of a metal door on 71% of trials. This difference is significant, F(1, 30) = 71.73, p = .009, but it reflects a systematic misapprehension on the part of listeners. There was also a significant main effect of group (that is, whether or not listeners had prior experience hearing the unfiltered door sounds). Those with no prior experience reported hearing the metal door on 50% of trials compared with 55% for those with experience, F(1, 30) = 8.76, p = .006. The interaction of group and excerpt type was nonsignificant.

Individual subjects were highly consistent in their labeling performance; some were consistently correct and others were consistently wrong. On average, subjects with no prior experience hearing the filtered sounds gave the same label (either metal door or wooden door) to the metal door excerpt on 81% of trials and to the wooden door excerpt on 84% of trials; the corresponding values for subjects with prior experience were 91% and 89%. This outcome suggests that the excerpts were highly discriminable; however, clearly the metal door excerpt does not sound like the metal door to subjects. Hence, we conclude that listeners cannot cognitively integrate the door excerpt with the base and infer the sound of the metal door in the duplex experiments already presented.
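The consistency figures above separate how reliably a subject labels an excerpt from whether the label is correct. One way such a measure might be computed is sketched below: for a given subject and excerpt, take the proportion of trials that received that subject's most frequent label. This reading of "gave the same label" is an assumption, as is the function name; the article does not give a formula.

```python
from collections import Counter


def label_consistency(labels):
    """Proportion of trials receiving the most frequent label (assumed measure)."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical subject: calls the metal-door excerpt "W" on 8 of 10 trials.
# The subject is consistent (0.8) even though the label is usually wrong.
example = label_consistency(["W"] * 8 + ["M"] * 2)
```

On this measure a subject who is consistently wrong scores as high as one who is consistently correct, which is the dissociation the paragraph relies on.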


Experiment 4

The next experiment tests the second prediction of a viewthat subjects' duplex responses arise in a cognitive integrationof base and excerpt sounds. It is that subjects can hear thebase as the base on a trial in which they report a duplexpercept. In this experiment we use the AXB procedure ofRepp et al. (1983) and compare the speech and door stimuliof Experiment 2.

Method

Subjects. Subjects were 12 students at Dartmouth College whoparticipated for course credit. They are the same students who partic-ipated in Experiment 2.

Materials. Stimuli were the full /ga/, the base syllable and the/ga/ transition, and the metal door, the base and the excerpt ofExperiment 2. These stimuli were used to construct two AXB tests.The tests were the same except that one presented the speech stimuliand the other the door sounds.

In the speech test, half of the trials presented "base, base,/ga/" tothe left ear; the /ga/ transition was presented to the right ear simul-taneous with the middle, base syllable, in the left ear. The transitionwas presented at three times its natural intensity relation to the base.The other trial type was identical to the first except that it presentedsyllables in the order "/ga/, base, base" to the left ear. Ten tokens ofeach trial type were presented in random order to subjects. There was1 s between syllables in a trial and a 4-s interval between trials. Thenonspeech test was analogous to the speech test, differing only in thestimuli presented to listeners. That is, "base, base, metal door" or thereverse order was presented to the left ear, while the metal-doorexcerpt was presented to the right ear simultaneous with presentationof the middle base sound to the left ear.

Procedure. Subjects were told that they would hear three sounds in succession on each trial in the left ear. Simultaneous with the middle sound, they would also hear a sound in the right ear that they were to ignore. Their task was to decide whether the middle left-ear sound was more like the first or more like the third sound on each trial. They were to indicate their choices by writing A (the middle sound is more like the first sound) or B (the middle sound is more like the third sound) on their answer sheets.

As noted in the Procedure of Experiment 2, the order in which subjects took the speech and nonspeech tests was counterbalanced. Subjects took the AXB tests after taking the speech or nonspeech tests described as Experiment 2.

Results

On the speech tests, subjects matched the base sound paired with the /ga/ transition to the full /ga/ (and hence not to the isolated base) on 68% of trials. Although this performance level is lower than that found by Repp et al. (1983), it is significantly different from a chance value of 50%, t(11) = 4.32, p = .001, with 11 of 12 subjects matching the duplex stimulus to the full /ga/ more than 50% of the time.

On the nonspeech test, the outcome was different. On this test, only 3 of 12 subjects matched the base with the full door sound more than 50% of the time, whereas 8 of 12 matched it with the isolated base sound more than 50% of the time. On average, subjects matched the duplex base with the full door sound on 32% of trials. This value does not differ significantly from 50%, t(11) = -1.78, p = .09, and, of course, the tendency to differ that it does exhibit suggests a failure to integrate the excerpt with the base in a way that is impenetrable to introspection.

Performance on the speech and nonspeech sounds differed significantly, t(11) = 2.95, p = .013.
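For readers who want to see how statistics of this form are computed, the following is a minimal sketch of a one-sample t test against the 50% chance value and of a paired comparison between two tests taken by the same subjects. The per-subject percentages are HYPOTHETICAL placeholders (the article reports only group values), so the sketch will not reproduce the reported t values. The 11 degrees of freedom reported above are consistent with a paired test on 12 subjects, but the article does not name the test, so that reading is an assumption.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t test of mean(values) against mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


def paired_t(first, second):
    """Paired t test, computed as a one-sample test on the differences."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)


# HYPOTHETICAL percentages of "matched to full sound" responses, one per subject.
speech_pct = [70, 60, 80, 65, 75, 55, 90, 70, 60, 45, 85, 65]
door_pct = [30, 20, 60, 10, 40, 25, 70, 15, 35, 20, 55, 10]

t_speech, df_speech = one_sample_t(speech_pct, 50.0)   # speech test vs. chance
t_door, df_door = one_sample_t(door_pct, 50.0)         # door test vs. chance
t_diff, df_diff = paired_t(speech_pct, door_pct)       # speech vs. door
```

A p value would then be read from the t distribution with the returned degrees of freedom; that step is omitted here to keep the sketch within the standard library.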

Discussion

Experiment 3 and the identification test of Experiment 1 indicate quite clearly that listeners cannot identify the metal-door sound from its high-pass filtered fragment. In this respect, duplex perception for the door sounds is no more subject to cognitive integration than is duplex perception of speech sounds. Even so, the present experiment does reveal a difference between the duplex speech and door percepts. In particular, integration of the two syllable fragments across the ears is less penetrable than analogous integration of the door fragments; subjects do hear the door base in the left ear under conditions of dichotic presentation of base and chirp and under instructions to attend to that ear.

This difference in outcome between the door and speech findings may help to satisfy concerns a reviewer of this article raised about the AXB procedure as applied to questions of integration. The reviewer suggested that the outcome of an AXB procedure can mimic effects of integration in the absence of true perceptual integration of dichotic sound fragments. In particular, in the procedure used by Repp et al. (1983) and us, listeners may fail to integrate the base and chirp or excerpt during the X phase of the trial, but may match the signal anyway with the acoustically more complex of A or B. Because the original syllable or door sound will be more complex than its base, the response pattern will mimic integration. However, if our subjects were adopting that strategy, inexplicably they adopt it only when listening to speech sounds and not to door sounds. In fact, the difference in outcome is associated with a noticeable difference in experience listening to the speech and door tests. Whereas we find it impossible to hear anything but "ga" on X trials, we do not find it hard, when attending to the ear receiving the door base, to hear the base on X trials. A failure to integrate compellingly in perceptual experience is associated in these tests with a failure to match the X trial to the original whole sound and a tendency to match it instead to the base.

The difference we found between speech and door percepts in fact is consistent with another difference obtained in Experiment 2. Whereas reports of DC ("da," "chirp") were rare in the speech test, occurring on just 14% of trials, the analogous response WS ("wooden door," "shaking sound") was fairly common, occurring on 30% of trials. This implies a less compelling integration of door-sound parts than of speech-sound parts.

There is an apparent discrepancy in outcome on tests involving door fragments across the experiments. Whereas the duplex response MS predominates in the dichotic identification tests of Experiments 1 and 2 when the high-pass filtered door sound is as high in intensity as it is on the AXB tests of the present experiment, in the AXB dichotic test subjects did not match the duplex stimulus with the full metal door on most of the trials. Possibly, the important difference was the instruction in the AXB test to attend to the left-ear stimuli. Whereas that instruction is ineffective when stimuli are speech sounds, it may be effective for the door sounds.

We have found a clear difference in performance on speech and door sounds. In the last experiment, we ask whether the difference can be ascribed not to the sound sources themselves but to the means by which we created them. Whereas the speech sounds were synthesized, the door fragments necessarily were made by filtering a natural door sound. In addition, whereas the chirp spans just the first 50 ms of the 250-ms syllable, the door excerpt spans the whole temporal extent of the door sound. With dichotic presentation, this latter situation comes closer to Cutting's (1976) category of "spectral fusion" (in which, for example, F1 of a syllable may be presented to one ear and F2 to the other) rather than the "spectral/temporal fusion" that commonly gives rise to duplex percepts. To determine whether these differences between the speech and door stimuli determine the differences in response patterns to the two classes of sound, in Experiment 5, we look at duplex perception of a filtered speech syllable using the AXB procedure of Experiment 4.

Experiment 5

Method

Subjects. Subjects were 16 students at Dartmouth College who participated for course credit. All were native speakers of English who reported normal hearing.

Materials. We recorded a variety of CV monosyllables produced by Carol A. Fowler. After some exploration, we chose a /ga/ syllable 279 ms in duration. Low-pass filtered at 1,100 Hz, the syllable was identified by us as /ba/ (the base). High-pass filtered also at 1,100 Hz, the syllable sounded like a distorted isolated vowel.

Using these stimuli, we made three test sequences. In one, the full /ga/, the base, and the high-pass filtered syllable were presented in order five times. Next was a 30-item identification test in which the three sounds appeared 10 times each in random order. There were 3 s between trials. The final test was an AXB discrimination test analogous to that in Experiment 4. In one type of trial, subjects heard "base, base, /ga/" in the left ear; the high-pass filtered sound was presented to the right ear simultaneously with the middle left-ear sound at 1.2 times its natural intensity relation to the base. (At higher intensities, the sound peak clipped.) In the other trial type, the order of left-ear sounds was "/ga/, base, base." There was 1 s between syllables of a trial and 4 s between trials.
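The idea of dividing one sound into a low-pass base and a complementary high-pass part, then raising the level of the high-pass part until clipping becomes a concern, can be illustrated with a toy example. This is a sketch under stated assumptions, not the filtering actually used: the article does not describe the filter, so an idealized brick-wall split in the frequency domain stands in for it. The toy two-tone signal, the 8,000-Hz sample rate, the full-scale value of 1.0, the treatment of the 1.2 factor as an amplitude gain, and all function names are assumptions.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def split_at_cutoff(signal, sample_rate, cutoff_hz):
    """Split a signal into complementary low-pass and high-pass parts.

    ASSUMPTION: an ideal brick-wall split, chosen only for illustration.
    """
    n = len(signal)
    low_bins, high_bins = [], []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k) * sample_rate / n  # fold negative-frequency bins
        if freq <= cutoff_hz:
            low_bins.append(coeff)
            high_bins.append(0)
        else:
            low_bins.append(0)
            high_bins.append(coeff)
    low = [v.real for v in idft(low_bins)]
    high = [v.real for v in idft(high_bins)]
    return low, high


def scale_and_check_clipping(samples, gain, full_scale=1.0):
    """Apply an amplitude gain and report whether any sample exceeds full scale."""
    scaled = [gain * s for s in samples]
    return scaled, any(abs(s) > full_scale for s in scaled)


# Toy signal: one component below and one above the 1,100-Hz cutoff.
sample_rate = 8000
n_samples = 64
signal = [0.5 * math.sin(2 * math.pi * 500 * t / sample_rate)
          + 0.3 * math.sin(2 * math.pi * 2000 * t / sample_rate)
          for t in range(n_samples)]

base, excerpt = split_at_cutoff(signal, sample_rate, 1100)
louder_excerpt, clipped = scale_and_check_clipping(excerpt, 1.2)
```

Because the two parts are complementary, adding `base` and `excerpt` sample by sample recovers the original toy signal, which is one simple way to picture the spectral fit between the parts.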

Procedure. Subjects first heard the three sounds in order. They were asked to identify each one in writing or, failing that, to describe it as best they could. Next, we told subjects to call the first sound /ga/, the second, /ba/, and the third, a "chirp." We then replayed the repeating sequence of sounds so that subjects could learn to attach our labels to them. Next, subjects took the 30-item identification test and finally the AXB test. Instructions on the AXB test were the same as those used in Experiment 4.

Results and Discussion

On the first open-ended identification test, all subjects heard the base as beginning with a bilabial consonant. Twelve subjects reported a syllable beginning with "b"; 3 heard a syllable-initial "w"; 1 reported "p." Just 2 subjects heard any consonant at all in the high-pass filtered sound. One subject reported "b" and one "w." Thirteen subjects heard an isolated vowel; 1 subject reported a "spring sound." Disappointingly, just 2 subjects reported that the full /ga/ syllable started with "g"; 2 reported "d," 2 reported no initial consonant, and 10 reported "b." Of the 10 subjects in the last category, 6 reported hearing the same consonant at the beginning of the base and the full /ga/. Even so, performance was high on the identification test in which we tested subjects' ability to apply our labels to the stimuli. Performance averaged 97.3% on that test.

On the AXB test, 73% of responses associated the duplex syllable with the full /ga/. This value differs significantly from the chance value of 50%, t(15) = 3.28, p = .001.

Of the 16 subjects, 13 exceeded 50% selection of /ga/ as more like the duplex syllable. This outcome does not differ from the outcome using synthetic speech. Accordingly, we cannot ascribe the difference on AXB performance that we found between synthetic speech and filtered door sounds to the filtering process itself.

General Discussion

A major goal of these experiments was to determine whether we could obtain duplex perception for sounds whose perception could not be ascribed to a specialized "module." We have been partially, but not wholly, successful.

Our experiments indicate that for diotic presentations the door sounds show the same general effects as for diotically presented speech. This includes evidence analogous to that which led Whalen and Liberman to conclude that the speech module is preemptive. That is, at intermediate levels of intensity of door excerpt, listeners reported hearing the metal-door sound, whereas at higher intensities they reported the duplex response predominantly. Clearly, however, our findings do not signify that there is a door module that is preemptive with respect to the general auditory system.

The results with dichotic presentation of door and speech sounds did reveal some differences. Responses indicative of integration without duplexity were rare for door sounds but not for speech; in contrast, responses suggestive of a failure to integrate across the ears were considerably more common for door than for speech sounds; finally, under instructions to do so, listeners to the door sounds, but not the speech syllables, could focus their attention on the base when the excerpt or chirp was presented in the other ear.

There are at least two interpretations of these differences. First, speech perception may be the product of a specialized module as Liberman and his colleagues suppose. However, this leaves at least one finding with the door sounds unexplained. With dichotic presentation of the base and excerpt, and no instructions to attend to the ear receiving the base, subjects reported MS predominantly, not WS, despite evidence that they do not recognize the excerpt as part of the metal-door sound.

An alternative interpretation is that the differences we see between diotic and dichotic presentation of door sounds and between dichotic presentation of door and speech sounds reflect differences in integrability of the sound fragments. In turn, those differences arise for reasons that we understand only in part. Consider the differences between door sounds diotically and dichotically presented. The main differences on the duplex identification test are a considerably lower intensity of the excerpt at which MS responses begin to predominate with dichotic presentation and a greater frequency of WS responses with dichotic presentation. Both findings suggest that with separation of the parts of the door sounds between the ears, and hence with strong information that the sound parts arise in different locations in space, listeners are more likely to report hearing two sounds, not one, and they are less likely to integrate the sounds across the ears. As to why any integration occurs at all when the sounds are localized differently, we speculate that the provocation is the compelling spectral fit between the parts.

As for the difference between speech and door sounds under dichotic presentation, we speculate that it also reflects differences in the integrability of the parts, although our one attempt to eliminate such a difference by filtering the speech did not succeed. We do not know all of the conditions that promote integration or separation of distinct sounds (but see, e.g., Bregman, Abramson, Doehring, & Darwin, 1985; Bregman & Dannenbring, 1973). However, conditions include spectral similarity (e.g., Darwin & Gardner, 1986; Darwin & Sutherland, 1984) and continuity (Bregman & Dannenbring, 1973; Ciocca & Bregman, 1989). Possibly, speech syllables are associated with more richly structured acoustic signals than are door sounds, and so acoustic syllable parts exhibit more compelling evidence of fitting together to specify a common sound-producing source than do acoustic door parts.

If the findings of duplex perception for the door sounds are real, then we must conclude that duplex perception itself does not index involvement of two distinct perceptual systems inside a perceiver. There can be no specialized modules for perceiving door slams. Perhaps, instead, duplexity can be used as a tool to uncover the acoustic information that listeners use to determine the number and kinds of sound-producing events in the environment (cf. Bregman, 1987). Further investigation in that area may help explain the greater tendency for speech than door-sound fragments to integrate.

There is a final important point to make concerning inferences about perception of speech and nonspeech that investigators draw from studies of duplex perception. Whereas researchers have written as if they are studying perception of both speech and nonspeech using the duplex paradigms, the comparison can be described differently. It is also a comparison of perception based on acoustic signals that can be ascribed to coherent physical events in the environment (vocal-tract activity) and others (transitions) that cannot be ascribed to a definite source. That is, the distinction between speech and nonspeech is confounded with that between signals having and not having definite causal sources in the environment. A better way to compare speech and nonspeech perception using duplex paradigms is to substitute sounds from natural nonspeech events for the syllables in the duplex paradigms. Is speech the only event that when jointly specified by distinct sound fragments leads to duplex perception and preemption? We think not. Rather, when there is acoustic information spanning distinct acoustic signals that specifies any coherent sound-producing source, that cospecification leads to preemption and, under the proper circumstances, to duplex perception.

References

Bell-Berti, F., Raphael, L., Pisoni, D., & Sawusch, J. (1979). Some relationships between speech production and speech perception. Phonetica, 36, 373-383.

Bregman, A. (1987). The meaning of duplex perception: Sounds as transparent objects. In M. E. H. Schouten (Ed.), The psychophysics of speech perception (pp. 95-111). Dordrecht, The Netherlands: Martinus Nijhoff.

Bregman, A., Abramson, J., Doehring, P., & Darwin, C. (1985). Spectral integration based on common amplitude. Perception & Psychophysics, 37, 483-493.

Bregman, A., & Dannenbring, G. (1973). The effect of continuity on auditory stream segregation. Perception & Psychophysics, 13, 308-312.

Ciocca, V., & Bregman, A. (1989). The effects of auditory streaming on duplex perception. Perception & Psychophysics, 46, 39-48.

Collins, S. (1985). Duplex perception with musical stimuli: A further investigation. Perception & Psychophysics, 38, 172-177.

Cutting, J. (1976). Auditory and linguistic processes in speech perception: Inferences from six fusions in dichotic listening. Psychological Review, 83, 114-140.

Darwin, C., & Gardner, R. (1986). Mistuning a harmonic of a vowel: Grouping and phase effects on vowel quality. Journal of the Acoustical Society of America, 79, 838-845.

Darwin, C., & Sutherland, N. (1984). Grouping frequency components of vowels: When is a harmonic not a harmonic? Quarterly Journal of Experimental Psychology, 36A, 193-208.

Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.

Fodor, J. (1985). Précis of The Modularity of Mind. The Behavioral and Brain Sciences, 8, 1-42.

Fowler, C. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.

Fowler, C. A., & Rosenblum, L. D. (in press). The perception of phonetic gestures. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Erlbaum.

Gibson, J. J. (1962). Observations on active touch. Psychological Review, 69, 477-491.

Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton-Mifflin.

Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton-Mifflin.

Liberman, A. M., Cooper, F., Shankweiler, D., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.

Liberman, A. M., Isenberg, D., & Rakerd, B. (1981). Duplex perception of cues for stop consonants: Evidence for a phonetic module. Perception & Psychophysics, 30, 133-143.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.

Liberman, A. M., & Mattingly, I. G. (1989). A specialization for speech perception. Science, 243, 489-494.

Mann, V., & Liberman, A. M. (1983). Some differences between phonetic and auditory modes of perception. Cognition, 14, 211-235.

Nusbaum, H., Schwab, E. C., & Sawusch, J. (1983). The role of "chirp" identification in duplex perception. Perception & Psychophysics, 33, 323-332.

Pastore, R., Schmuckler, M., Rosenblum, L. D., & Szczesiul, R. (1983). Duplex perception with musical stimuli. Perception & Psychophysics, 33, 469-479.

Rand, T. (1974). Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55, 678-680.

Remez, R., Rubin, P., Pisoni, D., & Carrell, T. (1981). Speech perception without traditional speech cues. Science, 212, 947-950.

Repp, B., Milburn, C., & Ashkenas, J. (1983). Duplex perception: Confirmation of fusion. Perception & Psychophysics, 33, 333-338.

Rosenblum, L. D. (1987). Towards an ecological alternative to the motor theory of speech perception. PAW Review, 2, 25-29.

Whalen, D., & Liberman, A. (1987). Speech perception takes precedence over nonspeech perception. Science, 237, 169-171.

Received July 17, 1989
Revision received December 7, 1989
Accepted December 11, 1989
