
Exp Brain Res (2005) 160: 273–282
DOI 10.1007/s00221-004-2005-z

RESEARCH ARTICLES

Nadia Bolognini · Francesca Frassinetti · Andrea Serino · Elisabetta Làdavas

“Acoustical vision” of below threshold stimuli: interaction among spatially converging audiovisual inputs

Received: 25 June 2003 / Accepted: 8 June 2004 / Published online: 13 November 2004
© Springer-Verlag 2004

Abstract Crossmodal spatial integration between auditory and visual stimuli is a common phenomenon in space perception. The principles underlying such integration have been outlined by neurophysiological and behavioral studies in animals; this study investigated whether the integrative effects observed in animals also apply to humans. In this experiment we systematically varied the spatial disparity (0°, 16°, and 32°) and the temporal interval (0, 100, 200, 300, 400, and 500 ms) between the visual and the auditory stimuli. Normal subjects were required to detect visual stimuli presented below threshold either in unimodal visual conditions or in crossmodal audiovisual conditions. Signal detection measures were used. An enhancement of the perceptual sensitivity (d′) for luminance detection was found when the audiovisual stimuli followed a simple spatial and temporal rule governing multisensory integration at the neuronal level.

Keywords Crossmodal integration · Visual-auditory interaction · Perceptual sensitivity

Introduction

A stable representation of external space requires the integration of information from multiple sensory modalities. In the real world our attention must often be coordinated crossmodally so that we can select information from a common external source across several modalities. Crossmodal integration may be the rule rather than the exception in real-world perception. It seems adaptive that multiple sources of information, derived from the different modalities, can be combined to yield the best estimate of the external properties (Driver and Spence 2000).

Audition is important in orienting (Knudsen and Brainard 1995), and we depend upon sound localization for orientation toward significant distal events which occur outside the field of view and during visual occlusion and darkness. Recent studies have investigated the possibility that sound also improves visual perception. In particular, it has been documented that an uninformative peripheral auditory cue enhances perceptual processing of a subsequent visual stimulus (Driver and Spence 1998; McDonald et al. 2000; Spence and Driver 1997). These findings have been interpreted as evidence of the existence of crossmodal spatial attention: the exogenous shift of attention in one modality (audition) leads to a corresponding shift of attention in another modality (vision; Driver and Spence 1998; Macaluso et al. 2001).

Physiological mechanisms underlying the enhancement of visual processing by an auditory stimulus have been extensively investigated in animal work. This has demonstrated the existence of special rules underlying this integration at the cellular level. Nevertheless, few studies have investigated in humans the behavioral effects of these rules. Neurophysiological studies have documented, in several brain structures, the existence of multisensory neurons responding to stimuli in different modalities, for example, vision and audition. At the single-cell level, inputs from different modalities are integrated by multisensory neurons according to three main principles (Stein and Meredith 1993): the first concerning the spatial proximity of the stimuli (spatial rule), the second the temporal interval between the stimuli (temporal rule), and the third the nature of the multimodal response (inverse effectiveness).

According to the spatial rule, only spatially coincident stimuli from different modalities are integrated, producing neuronal response enhancement, whereas spatially disparate stimuli produce either response depression or else are not integrated, producing no interaction at the single-cell level. The spatial property of multisensory integration depends on the organization in zones of excitation and inhibition that define the receptive fields of multisensory neurons. The receptive fields of these neurons are organized in two areas: a central excitatory area, called the best area, surrounded by an inhibitory area. Because the auditory and visual receptive fields of bimodal neurons overlap (Jay and Sparks 1987; King and Hutchings 1987; King and Palmer 1985; Knudsen 1982; Middlebrooks and Knudsen 1984), spatially coincident audiovisual stimuli fall within their excitatory receptive fields and enhance one another’s effects. However, if the stimuli are spatially disparate, one may fall within the inhibitory receptive region and depress the effects of the other, or it may be processed as a separate event (Meredith and Stein 1986a; Stein and Meredith 1993). More precisely, when one of the two stimuli falls within the inhibitory region of its receptive field, a bimodal neuron’s response is depressed, whereas when both stimuli fall within the inhibitory regions of their respective receptive fields they are not integrated and the neuron’s response is no different from the response to the single-modality stimuli presented alone.

N. Bolognini · F. Frassinetti · A. Serino · E. Làdavas (*)
Dipartimento di Psicologia, Università degli Studi,
Viale Berti Pichat 5, 40127 Bologna, Italy
e-mail: [email protected]
Tel.: +39-051-2091347
Fax: +39-051-243086

N. Bolognini · F. Frassinetti · A. Serino · E. Làdavas
Centro Studi e Ricerche di Neuroscienze Cognitive, Cesena, Italy

According to the temporal rule, maximal levels of bimodal enhancement are achieved when two inputs are presented simultaneously, without temporal disparity. This has been found to be the “optimal” interactive period for most multisensory neurons (Meredith et al. 1987), but it is not always the rule. In some neurons, for example, combinations of unimodal auditory and visual stimuli at some intervals (50 and 150 ms) also produce response enhancement. However, at longer intervals (200 and 300 ms) audiovisual stimulation produced response depression of a neuron’s activity or else no interaction (Meredith et al. 1987; Stein and Meredith 1993).

Finally, according to the inverse effectiveness rule, there is an inverse relationship between the effectiveness of the stimuli and the neural response they evoke. The combination of weak unimodal stimuli produces greater response enhancement than combinations of potent stimuli; that is, combining two unimodal stimuli, neither of which alone is capable of evoking an obvious effect on the neuron’s activity, can dramatically enhance responses in multisensory neurons (Meredith and Stein 1986b; Stein and Meredith 1993).
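As a concrete illustration, the spatial and temporal rules above can be sketched as a toy decision function for a single bimodal neuron. All function names and numeric values (receptive-field widths, temporal window) are illustrative placeholders, not figures from the studies cited here:

```python
def zone(pos_deg, center_deg=0.0, best_deg=10.0, surround_deg=30.0):
    """Classify a stimulus position relative to one receptive field:
    a central excitatory "best area" ringed by an inhibitory surround.
    All widths are illustrative placeholders."""
    distance = abs(pos_deg - center_deg)
    if distance <= best_deg:
        return "excitatory"
    if distance <= surround_deg:
        return "inhibitory"
    return "outside"

def bimodal_interaction(visual_deg, auditory_deg, delay_ms, window_ms=150):
    """Toy spatial-and-temporal rule: enhancement requires both stimuli
    in the excitatory best area AND a delay inside the temporal window;
    one stimulus in the inhibitory surround depresses the response;
    two stimuli in the surround are treated as separate events."""
    zv, za = zone(visual_deg), zone(auditory_deg)
    if zv == "inhibitory" and za == "inhibitory":
        return "no interaction"   # processed as separate events
    if "inhibitory" in (zv, za):
        return "depression"
    if zv == za == "excitatory" and delay_ms <= window_ms:
        return "enhancement"
    return "no interaction"
```

For example, `bimodal_interaction(0, 5, 0)` yields enhancement, while moving the sound into the inhibitory surround, as in `bimodal_interaction(0, 20, 0)`, yields depression.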

Multisensory enhancement has been demonstrated by numerous electrophysiological studies (e.g., Bell et al. 2001; Frens and van Opstal 1998; King and Palmer 1985; Peck 1987). In addition, the behavioral predictions from the electrophysiological data have been confirmed in a host of species; the combination of crossmodal cues produces a significantly better behavioral response than either of the individual unimodal stimuli. Conversely, some authors (Populin and Yin 2002) have confirmed the basic electrophysiological effects of bimodal interaction but did not find support for the rule of “inverse effectiveness.” They tested for bimodal enhancement in the superior colliculus of behaving cats trained to orient to auditory, visual, and bimodal stimuli. Surprisingly, they never observed the large enhancement responses to bimodal stimulation reported in anesthetized (Meredith and Stein 1983; Newman and Hartline 1981) and awake animals (Bell et al. 2001; Frens and van Opstal 1998; Wallace et al. 1998) or in the superior colliculus of awake humans using fMRI (Calvert et al. 2001). One possible explanation of their findings is that Populin and Yin changed the definition of multisensory enhancement from a “multisensory response that is better than the best unimodal response” (Meredith and Stein 1983, 1986a, 1996) to one that is “better than the sum of the two unimodal responses.” Therefore it may be that their findings would not differ substantially from those of preceding studies (Meredith and Stein 1983, 1986a, 1996) if the same definition were adopted. In conclusion, whatever the reason for Populin and Yin’s reported failure to find the same magnitude of enhancement as other studies, the existence of a crossmodal audiovisual integration system is not in question.
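The two competing definitions of enhancement can be made explicit. The following sketch computes the percentage index relative to the best unimodal response (Meredith and Stein's definition) alongside the stricter superadditivity criterion; the function names and the example firing rates are hypothetical:

```python
def enhancement_index(crossmodal, visual, auditory):
    """Percent multisensory enhancement relative to the BEST unimodal
    response: positive values indicate enhancement under Meredith and
    Stein's definition."""
    best_unimodal = max(visual, auditory)
    return 100.0 * (crossmodal - best_unimodal) / best_unimodal

def is_superadditive(crossmodal, visual, auditory):
    """Stricter criterion: the crossmodal response must exceed the SUM
    of the two unimodal responses."""
    return crossmodal > visual + auditory

# Hypothetical responses (spikes per trial): 12 to the bimodal stimulus,
# 8 to visual alone, 5 to auditory alone.
# enhancement_index(12, 8, 5) -> +50.0: enhanced by the first definition,
# yet 12 < 8 + 5, so not "enhanced" by the superadditive one.
```

The same neural data can therefore pass one criterion and fail the other, which is the crux of the discrepancy discussed above.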

An experiment was conducted to investigate whether the integrative effects observed in animals can also be found in humans (Frassinetti et al. 2002a). The spatial disparity (0°, 16°, and 32°) and the temporal interval (0 and 500 ms) between the visual and the auditory stimuli were systematically varied; moreover, due to the inverse relationship between the effectiveness of the stimuli and the response evoked in bimodal cells, the visual target was degraded by using a visual mask. Subjects were instructed to detect only the presence of visual targets and to ignore the sounds. The results show that an auditory stimulus presented at one spatial location facilitates responses to a visual target at that location. Conversely, when the same visual and auditory stimuli were presented at spatially disparate loci, the detectability of the visual stimulus did not improve. Moreover, the capacity of an auditory stimulus to enhance the detectability of a visual stimulus was evident only when the two stimuli were presented simultaneously. When the auditory stimulus preceded the visual stimulus by 500 ms no improvement in visual detectability was found.

Neurophysiological studies (Meredith et al. 1987; Stein and Meredith 1993) have shown that maximal multisensory interactions are achieved when periods of peak activity of unimodal discharge trains overlap. This situation is usually achieved when stimuli are presented simultaneously, although a temporal window for multisensory interactions might overcome small temporal discrepancies between stimuli. It has been found that in some neurons combinations of unimodal auditory and visual stimuli at intervals of 50 and 150 ms also produce response enhancement. To determine whether this is also the case in humans, we systematically varied the temporal interval (0, 100, 200, 300, 400, and 500 ms) between the visual and the auditory stimuli as well as the spatial disparity (0°, 16°, and 32°). Due to the inverse relationship between the effectiveness of the stimuli and the response evoked in bimodal cells, the visual target was degraded, as in the previous study by Frassinetti et al. (2002a), using a visual mask.


In some of the detection experiments on spatial attentionthe dependent measure is visual simple reaction time, butit has long been argued (Duncan 1980; Shaw 1980) thatspatial cuing effects on simple reaction time may reflectcriterion shifts rather then genuine effects of attentionupon perception: subjects may reduce the amount ofevidence required for deciding whether a target hasoccurred on the cued side. Therefore in the present studywe used a more direct determination of visual detectionprocesses, using signal detection measures. These enablethe separation of the two components involved inperception processes: the d′ parameter, which reflectssubject’s perceptual sensitivity to discern a sensory eventfrom its background (perceptual level) and the β param-eter, which reflects subject’s decision criterion of response(postperceptual level). These measures allow us toexamine whether the expected improvement in visualdetection induced by an auditory stimulus is due to a trulyperceptual effect rather than a postperceptual effect.According to the crossmodal integration hypothesis, thed′ parameter should vary with the degree of spatial andtemporal correspondence between the auditory and thevisual stimuli. If crossmodal integration facilitates visualperceptual processing, an increased perceptual sensitivity(d′) is expected when presenting temporally overlappingvisual and auditory stimuli in the same spatial position.The effect, although it might be present at short temporalinterval, should disappear at longer intervals. Conversely,if crossmodal integration affects postperceptual decisionprocesses, a reduction of the decision criterion parameterβ is expected when a simultaneous sound is presented inthe same location.
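The two signal detection measures can be computed directly from hit and false-alarm rates under the standard equal-variance Gaussian model; this is a sketch of the textbook formulas, since the paper does not spell out its computational procedure:

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def signal_detection(hit_rate, fa_rate):
    """Return (d_prime, beta) under the equal-variance Gaussian model.

    d' = z(H) - z(F): perceptual sensitivity, independent of response bias.
    beta = likelihood ratio at the criterion: beta < 1 indicates a liberal
    criterion, i.e., a bias toward responding "yes"."""
    zh, zf = _z(hit_rate), _z(fa_rate)
    d_prime = zh - zf
    beta = math.exp((zf * zf - zh * zh) / 2.0)
    return d_prime, beta

# Symmetric rates give an unbiased criterion:
# signal_detection(0.69, 0.31) -> (about 0.99, 1.0)
# Raising the false-alarm rate more than the hit rate lowers beta
# (a more liberal criterion) without necessarily changing d'.
```

This separation is what licenses the inference in the text: a spatially and temporally specific rise in d′ points to a perceptual effect, whereas a blanket drop in β points to a shift in the decision criterion.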

Materials and methods

Participants

Fifteen participants, all students from the University of Bologna, received course credits for their participation (mean age 22 years). All had normal or corrected-to-normal vision and normal hearing, and all were right-handed. The participants were naive as to the purpose of the experiment, and they gave informed consent to participate in the study in accordance with the Declaration of Helsinki and with ethics committee approval.

Apparatus and stimuli

The apparatus consisted of a plastic horizontal arc (height 30 cm, length 200 cm) fixed on the table surface, in which the visual and the auditory stimuli were positioned. Eight piezoelectric loudspeakers (0.4 W, 8 Ω) were located horizontally at the subject’s ear level at eccentricities of 8°, 24°, 40°, and 56° to the right and to the left of the central fixation point. They were covered by a strip of black fabric attached to the plastic arc, preventing any visual cues about their position. The auditory stimuli were created by a white-noise generator (80 dB). Visual stimuli were located directly in front of every loudspeaker: four light displays poking out of the black fabric were placed at eccentricities of 40° and 56° to either side of the fixation point. Note that we refer to the auditory positions by labels A1–A8 moving from left to right and, similarly, to the corresponding visual stimulus positions by labels V1–V4 (see Fig. 1). Each light display contained four red light-emitting diodes (LEDs) arranged to form a 1° square and one green LED positioned at the center of the square.

The visual target consisted of the illumination of the central green LED (luminance 90 cd/m²). After the visual target the visual mask appeared, i.e., a simultaneous flash of all four red LEDs (luminance 80 cd/m² each). The target and the mask always appeared in the same position. The duration of the visual target was gradually reduced from 100 to 60 ms to maintain the subject’s overall hit rate at 60–70%. For each subject, at the end of each block of trials, the number of correct responses in the unimodal visual condition was calculated. If the subject’s hit rate was higher than 70%, the duration of the visual target was reduced by 10 ms or more in order to lower the hit rate.

The duration of the visual mask increased in proportion to the decrease of the visual target, so that the visual stimulation (visual target + visual mask) always had the same total duration of 110 ms. The auditory stimulus had the same duration as the visual target. Timing of stimuli and subjects’ responses were controlled by an ACER 711TE laptop computer, using a custom program (XGen Experimental Software, http://www.psychology.nottingham.ac.uk/staff/cr1/) and a custom-made hardware interface.
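The between-block titration described above can be sketched as follows; the paper states only the downward adjustment rule, the 70% ceiling, and the 100–60 ms range, so the helper names and the fixed 10-ms step are our assumptions:

```python
def adjust_target_duration(duration_ms, unimodal_hit_rate,
                           ceiling=0.70, step_ms=10, floor_ms=60):
    """One between-block step: shorten the target when the unimodal hit
    rate exceeds 70%, keeping the duration within the reported
    100-60 ms range. (The paper allows "10 ms or more"; a fixed 10-ms
    step is assumed here.)"""
    if unimodal_hit_rate > ceiling:
        return max(duration_ms - step_ms, floor_ms)
    return duration_ms

def mask_duration(target_ms, total_ms=110):
    """The mask fills the remainder so target + mask is always 110 ms."""
    return total_ms - target_ms

# E.g., a block with an 80% hit rate at a 100-ms target shortens the
# target to 90 ms and lengthens the mask from 10 to 20 ms.
```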

Procedure

Participants sat on a chair in a dimly lit and sound-attenuated room, 70 cm in front of the apparatus, facing straight ahead with the body midline aligned with the center of the apparatus. For the entire duration of the experiment they had to fix their gaze on the central fixation point, which was a small white triangle (1°) located in the center of the apparatus. Fixation was monitored visually by the experimenter standing behind

Fig. 1 Overview of the position of light displays and loudspeakers


the apparatus, facing the subject. The task was to press a button with the index finger of the right hand to indicate that the visual target was present and to refrain from pressing when the target was absent.

Four different kinds of sensory stimulation were presented: (a) unimodal visual condition, i.e., a visual target followed by the mask; (b) unimodal visual catch-trial condition, i.e., only a visual mask without any visual target; (c) crossmodal condition, i.e., an auditory stimulus presented together with a visual stimulus (visual target plus visual mask); (d) crossmodal catch-trial condition, i.e., an auditory stimulus presented with a visual mask, without any visual target. In the crossmodal condition the auditory stimulus could be presented either in the same position as the visual stimulus (visual target plus visual mask, or only visual mask) or in a different position, that is, at 16° or 32° of spatial disparity from the visual stimulus.

Furthermore, the auditory stimulus could be simultaneous with the visual one, that is, temporally coincident, or separated by a temporal interval, that is, the auditory stimulus preceded the visual stimulus by 100, 200, 300, 400, or 500 ms.

Participants were instructed to detect only the presence of the visual target (green LED) by pressing a button with the index finger of the right hand, and to ignore the sounds.

There were the following trials: 72 unimodal visual trials and 72 unimodal visual catch trials (totalling 36 trials for each of the four visual positions); 144 spatially coincident crossmodal trials and 144 spatially coincident crossmodal catch trials (totalling 12 trials for each of the four visual positions and each of the six temporal intervals); 360 spatially disparate crossmodal trials and 360 spatially disparate crossmodal catch trials (totalling 12 trials for each of the ten crossmodal spatially disparate stimulations and each of the six temporal intervals).

Fig. 2 Mean ±SD of d′ values for each temporal interval (0, 100, 200, 300, 400, and 500 ms) for V1–V4. White bars unimodal visual conditions; black bars crossmodal visual-auditory conditions (sp same position, 16n 16° of nasal disparity, 32n 32° of nasal disparity); *P≤.05, pairwise comparisons between unimodal and crossmodal conditions


The total number of trials (1,152) was equally distributed in 36 experimental blocks (32 trials each), given in pseudorandom order and run on consecutive days.
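The design's bookkeeping is internally consistent, as a quick check confirms (reading each "totalling" figure as counting target and catch trials together):

```python
# 4 visual positions x 36 trials (half with a target, half catch)
unimodal = 4 * 36                 # 72 target + 72 catch trials
# 4 coincident positions x 6 temporal intervals x 12 trials
coincident = 4 * 6 * 12           # 144 target + 144 catch trials
# 10 disparate audiovisual pairings x 6 intervals x 12 trials
disparate = 10 * 6 * 12           # 360 target + 360 catch trials

total = unimodal + coincident + disparate
assert total == 1152 == 36 * 32   # 36 blocks of 32 trials each
```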

Results

To investigate whether auditory cuing affects visual detection performance, the d′ values and the β values obtained in each spatial position (V1–V4) were collapsed and analyzed separately. Because the numbers of nasal and temporal disparate crossmodal conditions were not the same, two analyses of variance were carried out. In one the main factors were: temporal interval (0, 100, 200, 300, 400, and 500 ms) and condition (unimodal conditions, in which only a visual stimulus was presented in V1–V4, and crossmodal conditions, in which an auditory stimulus was presented either in the same position as the visual one, i.e., the spatially coincident crossmodal condition, or at 16° or 32° nasal from the visual stimulus, i.e., the spatially disparate conditions). The second analysis of variance was similar to the first, with the exception that in the crossmodal disparate conditions only trials in which the auditory stimulus was presented at 16° temporal from the visual stimulus were analyzed. Thus here the two main factors were: temporal interval (0, 100, 200, 300, 400, and 500 ms) and condition (unimodal conditions, in which only a visual stimulus was presented in V2 and V3, and crossmodal conditions, in which an auditory stimulus was presented either in the same position as the visual one, i.e., the spatially coincident crossmodal condition, or at 16° temporal from the visual stimulus, i.e., the spatially disparate conditions). Whenever necessary, pairwise comparisons were conducted using Scheffé’s test.

Fig. 3 Mean ±SD of d′ values for each temporal interval (0, 100, 200, 300, 400, and 500 ms) for V2 and V3. White bars unimodal visual conditions; black bars crossmodal visual-auditory conditions (sp same position; 16t 16° of temporal disparity); *P≤.05, pairwise comparisons between unimodal and crossmodal conditions


Signal detection analyses

Compared to the unimodal visual condition, there was a selective increase in perceptual sensitivity (d′) only when the auditory stimulus was presented at the same spatial position as the visual stimulus and the two stimuli were presented simultaneously. In contrast, when the same two stimuli were presented at spatially disparate loci, or the auditory stimulus preceded the visual stimulus by 100–500 ms, no visual enhancement was found in the crossmodal conditions.

The first analysis of variance revealed a significant effect of the main factors temporal interval (F(5,70)=7.49, P≤.00001) and condition (F(3,42)=9.16, P≤.00009). For temporal interval a significant difference was found between the condition in which the stimuli were temporally coincident (d′=1.26) and the condition in which they were presented at a temporal disparity of 500 ms (d′=.91, P=.0002). Moreover, there was a significant increase in d′ in the spatially coincident condition (d′=1.26) compared to the unimodal condition (d′=1.08, P≤.02) and the crossmodal condition in which the two stimuli were presented at a spatial disparity of 32° (d′=.98, P≤.0001). More interestingly, there was a significant interaction between temporal interval and condition (F(15,210)=7.93, P≤.00001). A significant difference was found in d′ values between the spatially coincident crossmodal condition and the unimodal condition (d′=1.08) only when the temporal interval between the two stimuli was 0 ms (d′=1.93, P≤.0001). In contrast, for the other intervals the same comparisons were not significant: 100 ms (d′=1.27, P=1.0), 200 ms (d′=1.41, P=.99), 300 ms (d′=1.23, P=1.0), 400 ms (d′=1.02, P=1.0), and 500 ms (d′=.68, P=.92; see Fig. 2).

Fig. 4 Mean ±SD of β values for each temporal interval (0, 100, 200, 300, 400, and 500 ms) for V1–V4. White bars unimodal visual conditions; black bars crossmodal visual-auditory conditions (sp same position; 16n 16° of nasal disparity; 32n 32° of nasal disparity); *P≤.05, pairwise comparisons between unimodal and crossmodal conditions


The second analysis of variance revealed a significant effect of the main factor temporal interval (F(5,70)=7.8, P≤.000007) but not of condition (F(2,28)=2.55, P≤.09). For temporal interval a significant difference was found between the condition in which the stimuli were temporally coincident (d′=1.6) and those in which the stimuli were presented at a temporal disparity of 400 ms (d′=1.2, P≤.01) and 500 ms (d′=1.07, P≤.00004). Moreover, there was a significant interaction between temporal interval and condition (F(10,140)=4.14, P≤.00005). The difference in d′ values between the spatially coincident crossmodal condition and the unimodal condition (d′=1.22) was significant when the temporal interval between the two stimuli was 0 ms (d′=1.97, P≤.03). In contrast, no difference was found for the other intervals: 100 ms (d′=1.39, P=.99), 200 ms (d′=1.61, P=.95), 300 ms (d′=1.43, P=.99), 400 ms (d′=1.23, P=1.0), and 500 ms (d′=.83, P=.95; see Fig. 3).

Response criterion analyses

Analyses of the response criterion (β) data showed that subjects’ responses to visual stimuli were influenced by the presence of the auditory stimulus, but not in a spatially specific way: β values were lower in all crossmodal conditions than in unimodal ones, independently of both the spatial position of the auditory stimulus and the temporal interval between the two stimuli.

The first ANOVA revealed a significant effect of the main factors temporal interval (F(5,70)=6.17, P≤.00009) and condition (F(3,42)=94.47, P≤.000001). For temporal interval a significant difference was found between the condition in which the stimuli were temporally coincident (β=1.27) and those in which the stimuli were presented at a temporal disparity of 300 ms (β=1.41, P≤.007), 400 ms (β=1.4, P≤.01), and 500 ms (β=1.43, P≤.001). Moreover, there was a significant decrease in β in the spatially coincident condition (β=1.04, P≤.000001) compared to the unimodal condition (β=2.1), and in all crossmodal conditions, i.e., when the two stimuli were presented at a spatial disparity of 16° (β=1.13, P≤.000001) and 32° (β=1.22, P≤.000001). No other comparisons were statistically significant.

Fig. 5 Mean ±SD of β values for each temporal interval (0, 100, 200, 300, 400, and 500 ms) for V2 and V3. White bars unimodal visual conditions; black bars crossmodal visual-auditory conditions (sp same position; 16t 16° of temporal disparity); *P≤.05, pairwise comparisons between unimodal and crossmodal conditions

There was also a significant interaction between temporal interval and condition (F(15,210)=3.7, P≤.000009). Compared to the unimodal visual condition, there was a decrease in β values in all crossmodal conditions, independently of the temporal interval between the auditory and the visual stimulus (P≤.05 in all comparisons; see Fig. 4). Moreover, when the two stimuli were temporally coincident, there was a significant difference between the spatially coincident condition and the crossmodal conditions in which the two stimuli were presented at a disparity of 32° (.74 vs. 1.17, P≤.003).

The second ANOVA revealed a significant effect of the main factor condition (F(2,28)=57.44, P≤.000001). There was a significant increase in β in the unimodal condition (β=2.28) compared to the spatially coincident condition (β=1.01, P=.000001) and the crossmodal condition in which the two stimuli were presented at a spatial disparity of 16° (β=1.06, P≤.000001). No other comparisons were significant. The interaction between temporal interval and condition was not statistically significant (F(10,140)=1.76, P≤.07; see Fig. 5).

Discussion

Neurophysiological studies in cats and monkeys by Stein and colleagues have revealed the existence of multimodal neurons in the superior colliculus which synthesize visual, auditory, and somatosensory inputs. These studies have also clearly stressed the functional characteristics of these multimodal neurons and how spatiotemporal correlation is important for such integration, at both behavioral and neural levels. Stimuli in different modalities occurring at the same time and place tend to be processed as referring to the same external event, thus producing an enhancement of neuronal responses. Stimuli coming from spatially disparate positions or at different times are processed as separate events (Meredith and Stein 1986a; Stein and Meredith 1993).

The results of the present study clearly show the existence of an audiovisual integration system also in humans and the relevance of spatiotemporal correlation for the integration of the auditory and visual modalities. In particular, we found that an auditory stimulus presented at one spatial location facilitates the detection of a visual target at that location. However, when the auditory stimulus was presented at a spatial disparity of 16° or 32° from the visual stimulus, the detectability of the visual stimulus did not improve. Moreover, the capacity of an auditory stimulus to enhance the detectability of a visual stimulus depends on the temporal lag between the two events. A visuoperceptual enhancement was found when the two stimuli were presented simultaneously. In contrast, when the auditory stimulus preceded the visual stimulus by 100–500 ms, the improvement was not found.

Moreover, the results of the signal detection analyses showed that a sound influences both the perceptual and the postperceptual processing of a visual stimulus, but in different ways. The perceptual sensitivity (d′) increased in a spatially and temporally specific way. For all spatial positions considered in the present study there was a selective enhancement of d′ when the auditory stimulus appeared at the same time, and in the same spatial position, as the visual one. In contrast, when there was a spatial disparity (16°, 32°) and/or a temporal disparity (100, 200, 300, 400, 500 ms) between the auditory and the visual stimulus, no improvement in the d′ parameter was observed; under such conditions the detectability of the visual stimulus was not influenced by the presence of a sound. At variance with d′, the results for the decision criterion parameter showed a decrease in β in the crossmodal compared to the unimodal stimulation. A decreased β value in the crossmodal conditions means that a spatial auditory cue modified the subject’s willingness to respond to the visual target by reducing the uncertainty of the subject’s decision. In other words, the decrease in β suggests that the sound biased the subject towards making a “yes” response. In the present study this decrease was independent of both the spatial position of the auditory stimulus and the temporal interval between the visual and the auditory stimulus. Thus the beneficial effect of sound on visual perception was not spatially and/or temporally specific, and consequently the outcomes related to β values cannot be explained by the spatial and temporal rules governing multisensory integration, although in some circumstances response bias can play a substantial role in crossmodal enhancement of perceived visual stimuli (Odgaard et al. 2003). Rather, they can be explained by considering the possibility that the auditory stimulus acted as an alerting signal (Robertson et al. 1998).
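For readers less familiar with signal detection theory, the two measures discussed here can be computed as follows. This is a minimal sketch under the standard equal-variance Gaussian model; the log-linear correction and the example counts are our own illustrative choices, not the authors' analysis code.

```python
import math
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) from raw yes/no trial counts.

    A log-linear correction (+0.5 per cell) keeps the z-transform
    finite when a rate would otherwise be exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_h, z_f = z(hit_rate), z(fa_rate)
    d_prime = z_h - z_f               # perceptual sensitivity
    # beta is the likelihood ratio at the criterion: beta < 1 indicates
    # a liberal bias toward "yes" responses, beta > 1 a conservative one.
    beta = math.exp((z_f ** 2 - z_h ** 2) / 2.0)
    return d_prime, beta
```

With, say, 45 hits, 5 misses, 10 false alarms, and 40 correct rejections, d′ comes out near 2.1 and β near 0.65. A sound that lowers β without raising d′, as in the spatially or temporally disparate crossmodal conditions here, shifts the decision criterion without improving sensitivity.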

However, the results related to d′ are in strict accordance with the principles governing multisensory integration at the cellular level as described by Stein and Meredith in animal studies. During electrophysiological recording of multimodal neurons’ activity in the superior colliculus, the authors noted that multisensory enhancement and depression were determined by the spatial and temporal characteristics of the stimuli: by manipulating these parameters the same combinations of stimuli produced response enhancement or depression in the same neurons. Therefore the present results, in line with previous studies (Frassinetti et al. 2002a, 2002b; Lovelace et al. 2003), indicate that sensory perception can be affected directly by crossmodal audiovisual integration.

The findings of the present behavioral study confirm in humans the response enhancement due to the integration of the two modalities when the two stimuli were simultaneous. Since auditory stimuli require about 80 ms less time than visual stimuli to activate auditory-visual bimodal or unimodal neurons in the primate superior colliculus (Wallace et al. 1996), it is also possible that the temporal “window” for the interaction of inputs is longer. As there is evidence that we are more tolerant of auditory delays than of visual ones (Van de Par and Kohlrausch 2000), it would be interesting to repeat our experiment with the auditory stimulus lagging behind the visual one.

Previous studies in normal subjects failed to demonstrate the validity of the rules governing audiovisual integration. Stein and coworkers (1996) examined the possibility that auditory stimuli alter the judgment of visual intensity in a manner consistent with the spatial rule of multisensory integration. They found that perceived visual intensity was enhanced regardless of whether an auditory stimulus was spatially coincident with, or displaced 45° to the right or left of, the visual stimulus. One possible explanation of the findings by Stein et al. is that multisensory neurons are not involved in functions for which stimulus localization in space is not essential; the perceptual task of assessing stimulus intensity is likely to be one of these.

Many studies have shown that the detection of a visual target can be facilitated by the prior or simultaneous presentation of a spatially nonpredictive auditory cue on the same side as the visual target. Such crossmodal cueing results have often been attributed to a beneficial shift of covert exogenous crossmodal attention toward the cued position (i.e., to crossmodal attentional capture). In the majority of experiments on crossmodal attentional capture (or exogenous crossmodal spatial orienting), participants are instructed to maintain central fixation while making a speeded detection or discrimination response to a visual target presented on either side of fixation. In these studies a spatially nonpredictive peripheral cue (such as a short noise burst) is presented shortly before the target (typically at stimulus onset asynchronies of 0–1000 ms) on either the same or the opposite side (for an exhaustive review of the literature see Spence 2001). The results show that responses are often faster and/or more accurate for visual targets presented on the same side as the auditory cue than for visual targets appearing on the uncued side (McDonald et al. 2000; Spence and Driver 1997). [Using a similar procedure, many studies have demonstrated that the direction of an upcoming saccade can affect discrimination of visual (Deubel and Schneider 1996; Hoffman and Subramaniam 1995; Kowler et al. 1995) and auditory (Rorden and Driver 1999) targets.]

Thus it seems that for the crossmodal attentional effect observed when the competition is between two stimuli presented above threshold in opposite hemifields (left and right), the temporal interval between the two stimuli can vary from 100 to 500 ms (McDonald et al. 2001). In contrast, for the phenomenon described in the present study approximate temporal synchrony is required to produce the integrative effects. Indeed, the enhancement of the visual response was found only when the two stimuli were presented simultaneously and disappeared at longer intervals (100–500 ms). Thus it seems that the integrative effect manifests itself when more than one position within each visual field is stimulated and/or mainly when at least one of the two stimuli is presented below threshold. This means that the results of the present study are not directly comparable with those of the McDonald et al. study, in which only two spatial positions in opposite hemifields were stimulated, or with most of the typical spatial-cueing studies, where the stimuli are presented above threshold.

In conclusion, the results of the present study show that combining two seemingly unimodal stimuli enhances visual responses when the visual stimulus alone is not capable of evoking an obvious response. This suggests that this integrated visuoauditory system is commonly used in everyday life when at least one of the two sources of information is weak, for example, during partial occlusion of vision, in darkness, or in a pathological situation such as when one modality has been damaged by a cerebral lesion (Frassinetti et al. 2002b).

Acknowledgements This work was supported by grants from MURST to E.L.

References

Calvert GA, Hansen PC, Iversen SD, Brammer MJ (2001) Detection of multisensory integration sites by application of electrophysiological criteria to the BOLD response. Neuroimage 14:427–438

Deubel H, Schneider WX (1996) Saccade target selection and object recognition: evidence for a common attentional mechanism. Vision Res 36:1827–1837

Driver J, Spence C (1998) Crossmodal attention. Curr Opin Neurobiol 8:245–253

Driver J, Spence C (2000) Beyond modularity and convergence. Curr Biol 10:R731–R735

Duncan J (1980) The demonstration of capacity limitation. Cogn Psychol 12:75–96

Frassinetti F, Bolognini N, Làdavas E (2002a) Enhancement of visual perception by crossmodal visuo-auditory interaction. Exp Brain Res 147:332–343

Frassinetti F, Pavani F, Làdavas E (2002b) Acoustic vision of neglected stimuli: interaction among spatially converging audiovisual inputs in neglect patients. J Cogn Neurosci 14:62–64

Frens MA, Van Opstal AJ (1998) Visual-auditory interactions modulate saccade-related activity in monkey superior colliculus. Brain Res Bull 46:211–224

Hoffman J, Subramaniam B (1995) The role of visual attention in saccadic eye movements. Percept Psychophys 57:787–795

Jay MF, Sparks DL (1987) Sensorimotor integration in the primate superior colliculus. II. Coordinates of auditory signals. J Neurophysiol 57:35–55

King AJ, Hutchings ME (1987) Spatial response properties of acoustically responsive neurons in the superior colliculus of the ferret: a map of auditory space. J Neurophysiol 57:596–624

King AJ, Palmer AR (1985) Spatial response properties of visual and auditory information in bimodal neurons in the guinea-pig superior colliculus. Exp Brain Res 60:492–500

Knudsen EI (1982) Auditory and visual maps of space in the optic tectum of the owl. Neuroscience 2:1177–1194

Knudsen EI, Brainard MS (1995) Creating a unified representation of visual and auditory space in the brain. Annu Rev Neurosci 18:19–43

Kowler E, Anderson E, Dosher B, Blaser E (1995) The role of attention in programming saccades. Vision Res 35:1897–1916

Lovelace CT, Stein BE, Wallace MT (2003) An irrelevant light enhances auditory detection in humans: a psychophysical analysis of multisensory integration in stimulus detection. Cognit Brain Res 17:447–453

Macaluso E, Frith C, Driver J (2001) Multisensory integration and crossmodal attention effects in the human brain. Science 292:1791a

McDonald JJ, Teder-Sälejärvi WA, Hillyard SA (2000) Involuntary orienting to a sound improves visual perception. Nature 407:906–908

Meredith MA, Stein BE (1983) Interactions among converging sensory inputs in the superior colliculus. Science 221:389–391

Meredith MA, Stein BE (1986a) Spatial factors determine the activity of multisensory neurons in cat superior colliculus. Brain Res 365:350–354

Meredith MA, Stein BE (1986b) Visual, auditory and somatosensory convergence on cells in superior colliculus results in multisensory integration. J Neurophysiol 56:640–662

Meredith MA, Stein BE (1996) Spatial determinants of multisensory integration in cat superior colliculus neurons. J Neurophysiol 75:1843–1857

Meredith MA, Nemitz JW, Stein BE (1987) Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. J Neurosci 10:3215–3229

Middlebrooks JC, Knudsen EI (1984) A neural code for auditory space in the cat’s superior colliculus. J Neurosci 4:2621–2634

Newman EA, Hartline PH (1981) Integration of visual and infrared information in bimodal neurons in the rattlesnake optic tectum. Science 213:789–791

Odgaard EC, Arieh Y, Marks LE (2003) Cross-modal enhancement of perceived brightness: sensory interaction versus response bias. Percept Psychophys 65:123–132

Populin LC, Yin TC (2002) Bimodal interactions in the superior colliculus of the behaving cat. J Neurosci 22:2826–2834

Robertson IH, Mattingley JB, Rorden C, Driver J (1998) Phasic alerting of neglect patients overcomes their spatial deficit in visual awareness. Nature 395:169–172

Rorden C, Driver J (1999) Does auditory attention shift in the direction of an upcoming saccade? Neuropsychologia 37:357–377

Shaw ML (1980) Identifying attentional and decision-making components in information processing. In: Nickerson RS (ed) Attention and performance VIII. Erlbaum, Hillsdale, pp 277–296

Spence C, Driver J (1997) Audiovisual links in exogenous covert spatial orienting. Percept Psychophys 59:1–22

Stein BE, Meredith MA (1993) Merging of the senses. MIT Press, Cambridge

Stein BE, London N, Wilkinson LK, Price DD (1996) Enhancement of perceived visual intensity by auditory stimuli: a psychophysical analysis. J Cogn Neurosci 8:497–506

Van de Par S, Kohlrausch A (2000) Sensitivity to auditory-visual asynchrony and to jitter in auditory-visual timing. In: Rogowitz BE, Pappas TN (eds) Human vision and electronic imaging V. Proceedings of SPIE, vol 3959. SPIE Press, Bellingham, pp 234–242

Wallace MT, Wilkinson LK, Stein BE (1996) Representation and integration of multiple sensory inputs in primate superior colliculus. J Neurophysiol 76:1246–1266

Wallace MT, Meredith MA, Stein BE (1998) Multisensory integration in the superior colliculus of the alert cat. J Neurophysiol 80:1006–1010

Bell AH, Corneil BD, Meredith MA, Munoz DP (2001) The influence of stimulus properties on multisensory processing in the awake primate superior colliculus. Can J Exp Psychol 55:123–132

Peck CK (1987) Visual-auditory interactions in cat superior colliculus: their role in the control of gaze. Brain Res 420:162–166

Spence C (2001) Crossmodal attentional capture: a controversy resolved? In: Folk CL, Gibson BS (eds) Attraction, distraction, and action: multiple perspectives on attentional capture. Elsevier, Amsterdam
