Vision Research 40 (2000) 3121–3144

Signal detection theory in the 2AFC paradigm: attention, channel uncertainty and probability summation

Christopher W. Tyler *, Chien-Chung Chen
Smith–Kettlewell Eye Research Institute, 2318 Fillmore St., San Francisco, CA 94115, USA

Received 17 March 1999; received in revised form 3 December 1999
Abstract

Neural implementation of classical High-Threshold Theory reveals fundamental flaws in its applicability to realistic neural systems and to the two-alternative forced-choice (2AFC) paradigm. For 2AFC, Signal Detection Theory provides a basis for accurate analysis of the observer's attentional strategy and effective degree of probability summation over attended neural channels. The resulting theory provides substantially different predictions from those of previous approximation analyses. In additive noise, attentional probability summation depends on the attentional model assumed. (1) For an ideal attentional strategy in additive noise, summation proceeds at a diminishing rate from an initial level of fourth-root summation for the first few channels. The maximum improvement asymptotes to about a factor of 4 by a million channels. (2) For a fixed attention field in additive noise, detection is highly inefficient at first and approximates fourth-root summation through the summation range. (3) In physiologically plausible root-multiplicative noise, on the other hand, attentional probability summation mimics a linear improvement in sensitivity up to about ten channels, approaching a factor of 1000 by a million channels. (4) Some noise sources, such as noise from eye movements, are fully multiplicative and would prevent threshold determination within their range of effectiveness. Such results may require reappraisal of previous interpretations of detection behavior in the 2AFC paradigm. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Psychophysics; Summation; Probability summation; 2AFC; Attention; Uncertainty; Signal detection theory; Additive noise; Multiplicative noise
1. Introduction

A principal function of early human vision is to analyze the spatial structure of images of the visual world. This information is then used to develop a representation of the properties of the objects before us and their layout in 3D space and in time. Despite previous attempts, a valid analytic framework has yet to be applied to the variety of spatial integration phenomena measured in laboratory studies. The analysis provided in this paper will demonstrate the deficiencies in previous approaches and form the basis for a comprehensive analysis of spatial summation based on the tenets of Signal Detection Theory, specifically in the context of detection and discrimination tasks measured by the two-alternative forced-choice (2AFC) paradigm. The analysis is valid for summation in any stimulus domain, but it will be illustrated with specific reference to summation in one- and two-dimensional spatial vision.
Detailed analysis of summation behavior requires accurate models of the kinds of summation principles that can operate in psychophysics. The kind of summation performed by physiological receptive fields will be termed physiological summation (whether linear or nonlinear), to distinguish it from probability summation performed on the outputs of a set of decision variables (even though the latter operation must also ultimately be a physiological process in the brain). The primary theoretical analysis will be developed under the assumptions of Signal Detection Theory: that the main source of noise is external, Gaussian and independent of stimulus contrast. The theory also encompasses conditions where threshold is dominated by internal Gaussian noise and other forms of the noise distribution.
* Corresponding author. Fax: +1-415-3458455. E-mail address: [email protected] (C.W. Tyler).

0042-6989/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0042-6989(00)00157-7
Summation is quantified over arrays of processing mechanisms that are equal in sensitivity, although the theory could be extended to arbitrary sets of processes. Extension of the analysis to cases where the internal noise properties are some function of internal signal strength reveals major departures from the behavior with independent noise.
No complete account of summation behavior under the 2AFC paradigm has been published despite its widespread use for several decades. The most extensive published analysis of these issues is by Pelli (1985), which provides the basis for much of the present treatment, although many of our conclusions differ from the approximations derived in that paper. One of Pelli's goals was to show that Weibull High-Threshold Theory could approximate the full predictions of Signal Detection Theory for the 2AFC paradigm. The approximations were valid over a limited range under the assumptions Pelli made, but he did not develop the theory in more general cases. A key assumption was that the human observer is always operating under conditions of high uncertainty. This interpretation seems inherently implausible in practiced observers, and we show that there are conditions under which this assumption is violated. Hence the 2AFC predictions need to be developed in accurate and usable form for a full treatment of psychophysical data.
The paper is divided into four main sections. The first section considers the implications of previous analyses of 2AFC probability summation through High Threshold Theory and finds these approaches to be fundamentally flawed in several respects. The second section develops the analysis of 2AFC summation through Signal Detection Theory limited by additive noise (from either external or internal sources). In the third section, the implications of a variety of non-ideal attentional strategies are spelled out for this additive noise case. The final section expands the analysis to cases where the internal noise properties are some multiplicative function of internal signal strength, revealing major departures from the behavior with signal-independent noise.
1.1. Assumptions of the 2AFC analysis

The assumptions of the main 2AFC analysis (Sections 3–5) are generally straightforward. There are also subsidiary issues that arise from considering alternatives to some of the assumptions. These alternatives are noted in brackets. (A note on terminology: the term 'distribution' is used here to imply a probability density function, PDF, to which some noise variable conforms, as in 'Gaussian distribution'. The cumulative integral of such a function is termed its 'cumulative distribution function', or CDF.)
1. In the 2AFC paradigm, the observer is presented with two defined stimulus events, both containing some background condition, while one also contains a test stimulus to be detected. The observer's task is to indicate which of the two events included the test stimulus.
2. There are sources of noise present in the stimulus events. Any component of the noise that is correlated between the two events forms part of the background from which the test is to be discriminated. We therefore consider 'noise' to include all sources of trial-to-trial variation that are uncorrelated between the stimulus events.
3. The noise is assumed to be white in space and time (for a fixed stimulus level) and Gaussian in its probability density function (PDF). The Gaussian assumption is plausible because of the Central Limit Theorem, by which the PDF for combinations of non-Gaussian noise is asymptotically Gaussian. If there are many sources of external and internal noise impinging at the decision site, therefore, the resulting noise is most likely to be Gaussian. [Alternatively, the PDF is assumed to take the form of a Poisson noise distribution.] [The Ideal Observer formulation makes the restrictive assumption that there are no noise sources except those present in the stimulus.]
4. The noise is assumed to be additive and independent of the strength of the test stimulus. [Alternatively, the noise variance is assumed to vary as some function of stimulus strength.]
5. Without the noise, the internal signals for each mechanism on which a decision is based are assumed to vary linearly with stimulus strength. [Alternatively, the internal signal is assumed to increase directly with stimulus strength above some level but be limited by a threshold such that the internal signal remains at zero below that level. If the threshold occurs at or above the level of the system noise in the absence of a test stimulus, it is known as a 'high threshold'.]
6. The visual system is assumed to consist of some (large) number of local mechanisms that transmit independent signals concerning the state of the outside world. The mechanisms are independent in the sense that their noise sources are statistically independent.
7. Each local mechanism is assumed to summate linearly over space within some weighting function known as its summation field. The summation may be over signals that are preprocessed for some stimulus attribute (such as orientation) by earlier neural mechanisms. [The Ideal Observer formulation assumes that there is a summation field matching the profile of each stimulus presented.] [The inefficient Ideal Observer formulation assumes that each summation field is incompletely sampled to a similar extent, with the loss of a constant proportion of information for all fields.]
8. The local mechanisms are assumed to draw from the same local noise sources at all sizes of summation fields.
9. The signals from the local mechanisms are assumed to be combined by some nonlinear process known as an 'attention field' that is able to survey the local signals and isolate the largest signal. The implications of several types of control over the size of the attention field are considered. [The Ideal Attention formulation assumes that the attention field matches the stimulus extent, even when the local summation fields do not.]
10. The observer's 2AFC decision is assumed to derive from the larger of the signals from the attention field for the two stimulus events.
2. Problems with High Threshold Theory in the presence of additive noise

This section considers the implications of previous analyses of 2AFC probability summation in relation to High Threshold Theory and finds inherent problems with such approaches in several respects. These flaws indicate that High Threshold Theory does not provide a firm basis for the analysis of attentional integration of neural information in the presence of additive noise. To explain these problems, we first review High-Threshold Theory, but the source references should be consulted for full details.
2.1. Overview of High Threshold Theory
High Threshold Theory (Quick, 1974) is an analysis of the detection of signals that assumes that detection is limited by a noise-free, or fixed, threshold, below which no stimulus information is transmitted (Fig. 1a). The theory gets its name because the threshold is assumed to be high with respect to any noise in the signal arriving at the decision site. The goal of High Threshold Theory is to define the properties of summation over independent channels, which has come to be known as 'probability summation'. In spatial vision, the probability summation hypothesis implies that the mechanism of attention is distributed over many spatial channels rather than being focal, since one cannot monitor many channels without attending to them. It is then assumed that, on every trial, the attention mechanism can select the maximum channel response over the monitored range for use in the detection decision and ignore all other channels. Probability 'summation' is thus a max operator rather than a summing operator in the normal sense, and has generally been considered as the minimal combination rule among independent mechanisms.

The psychometric function Ψ is the theoretical form of the observer's proportion correct in a detection task as a function of stimulus strength. In Quick's (1974) version of High Threshold Theory, the psychometric functions Ψ_i for each individual channel with mean response R_i are given by the Weibull function:

Ψ_i = 1 − e^{−(R_i)^β}, where R_i = f(s/α_i) for stimulus strength s,   (1a,b)

with f being in general any monotonic function, α_i determining the sensitivity of the ith local mechanism and β controlling the steepness of the psychometric function. For the remainder of this treatment, we will assume that f is a linear function, such that

R_i = s/α_i, s > 0.   (1c)

Fig. 1. (a) High-threshold analysis, in which the noise distribution in the absence of signal (left distribution) lies below some threshold level (R_u). The signal distribution varies in its position as the signal varies (arrow), and passes across the threshold as R increases to reveal some proportion of the signal distribution as correct responses (shaded area). (b) Weibull predictions for probability summation over number of samples (in time, area or any other stimulus parameter) for assumed psychometric exponents of β = 4 and 1.3 (d′ powers of 3.2 and 1; see below).
How the function behaves for negative s depends on the stimulus domain. For luminance, and for (Michelson) contrast, there are no negative signals, so the function does not exist in the negative region. For other stimulus domains, the negative portion will have to be analyzed according to its particular properties.
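For concreteness, the Weibull psychometric function of Eq. (1), with the linear transducer of Eq. (1c), can be sketched numerically. This is a minimal illustration; the function name and parameter values are ours, not part of the original analysis:

```python
import math

def weibull_psi(s, alpha, beta):
    """Weibull psychometric function of Eq. (1): proportion correct
    Psi = 1 - exp(-(s/alpha)**beta), assuming a zero guessing rate."""
    r = s / alpha  # linear transducer R_i = s/alpha_i of Eq. (1c)
    return 1.0 - math.exp(-(r ** beta))

# At s = alpha the function predicts 1 - 1/e of the responses correct,
# the conventional threshold criterion for this parameterization.
print(weibull_psi(1.0, 1.0, 4.0))  # ~0.63
```

Larger β values steepen the transition from chance to perfect performance around s = α, which is the sense in which β controls the psychometric slope.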
The Weibull function is derived from the theory of Failure Analysis and represents the combination of exponentially decaying failure functions. The effect on the overall psychometric function of probability summation over the set of channels (assuming equal sensitivities α_i for the mean responses R_i of the individual channels to the stimulus) is based on the standard statistical formula that probabilities of not detecting multiple events should be multiplied together:

Ψ = 1 − e^{−R^β} = 1 − ∏_{i=1}^{n} (1 − Ψ_i) = 1 − ∏_{i=1}^{n} (1 − e^{−(R_i)^β}) = 1 − e^{−(n^{1/β} R_i)^β}.   (2)

Psychophysical threshold is estimated by solving Eq. (2) for Ψ = 0.5 (this basic version of the theory assumes that the observer's guessing rate is zero). For a criterion level of the output function Ψ, the similarity in form of the first and last expressions makes it clear that the effective mean response over the set of channels is R = n^{1/β} R_i (see Robson & Graham, 1979, for details). Thus, as the stimulus extent is increased to sample more of the local mechanisms, the internal response increases in proportion to the βth root of the number of mechanisms sampled by the attention mechanism.
Fig. 1b depicts the degree of probability summation over the number of mechanisms sampled for the typical case of β = 4 and for the hypothetical case of a linear psychometric function, when β = 1.3 (Pelli, 1987). Empirically, the exponent β of the Weibull approximation to psychometric data may take values from 1.3 to 6 (Mayer & Tyler, 1986). Under the assumption that f is a linear function (Eq. (1c)), the low value represents the theoretical low limit on the expected slope if the stimulus is present in all the channels that the visual system is monitoring (and there is no phase uncertainty; Pelli, 1985). A high value for β represents a high degree of channel uncertainty. When there is minimal uncertainty, probability summation effects are predicted to be large relative to the possible contrast measurement range (Fig. 1). If β = 1.3, for example, a sensitivity improvement of as much as a factor of 200 is predicted for probability summation over n = 1000 equally stimulated channels. (Such a result would be predicted for an increase in stimulus diameter by a factor of ≈30 on homogeneous retina, if n represents the number of local retinal filters.) Under such conditions, probability summation could not be dismissed as a minor, near-threshold effect. The generation of such large summation effects from purely attentional processes would cloud the issue of what physiological summation might be taking place, because the two effects are of comparable magnitude.
2.2. High-threshold analysis of probability summation assumes non-Gaussian additive noise
Suppose a signal with intensity s can produce an internal response distribution D(r; R, σ), where r represents the dimension of the random internal response variable, with mean R (which is assumed to be a monotonic function of s) and standard deviation σ. Under the assumption of a high threshold, this noise distribution is progressively revealed as the signal intensity moves up beyond the threshold level. Thus, if the noise distribution is additively independent of the mean response, the probability of detecting signal s is the integral of the internal signal-plus-noise distribution from the threshold R_u to infinity. The Weibull formulation of the psychometric function (Eq. (1a)) must correspond to this integral for some particular noise distribution D_β(r − R) around the mean signal R,

Ψ = 1 − e^{−R^β} = ∫_{R_u}^{∞} D_β(r − R) dr,   (3)

where R_u is the mean internal response level at threshold.

For the assumption that the PDF of the signal-plus-noise distribution generating the Weibull function is of fixed form, D_β(r), it can be solved by taking the derivative of both sides of Eq. (3) for each integration limit:

βR^{β−1} e^{−R^β} = lim_{ε→∞} D_β(ε − R) − D_β(R_u − R),   (4a)

from which

D_β(ρ) = βρ^{β−1} e^{−ρ^β}, with ρ = R − R_u − r, for r < R − R_u.   (4b)
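The implied density of Eq. (4b) has the form of a Weibull PDF, so it should integrate to unity for any exponent; a quick numerical check (our own sketch, using a simple midpoint rule) confirms this for the exponent values plotted in Fig. 2:

```python
import math

def implied_pdf(rho, beta):
    """Noise density implied by the Weibull psychometric function,
    Eq. (4b): D_beta(rho) = beta * rho**(beta-1) * exp(-rho**beta)."""
    if rho <= 0:
        return 0.0
    return beta * rho ** (beta - 1) * math.exp(-(rho ** beta))

def integral(beta, hi=20.0, steps=100000):
    # midpoint rule over [0, hi]; the density should integrate to 1
    h = hi / steps
    return sum(implied_pdf((k + 0.5) * h, beta) * h for k in range(steps))

for beta in (1.3, 2.0, 4.0, 8.0):
    print(beta, round(integral(beta), 4))
```

Plotting `implied_pdf` for these β values shows the marked shape changes of Fig. 2b, from a sharply skewed density at β = 1.3 to a compressed, quasi-Gaussian form near β = 4.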
On the assumption that the mean response R is linear with the external stimulus strength s, Eq. (4) defines the implied PDF that would have generated the Weibull expression for the measured psychometric function. The forms of the psychometric functions and the implied noise distributions D_β(r) for values of β from 1.3 to 8 (corresponding to d′ exponents from 1 to 6.5; see following sections) are shown in Fig. 2 for a Yes/No experiment (assuming zero false alarm rate). It is evident that the implied noise distributions in the lower panel are generally far from approximating a Gaussian form except in the mid-range of parameter values, the special case where β ≈ 4 (the value for which
Fig. 2. (a) Psychometric functions with R_u = 1 predicted by High Threshold Theory for values of 1.3, 2, 4 and 8 for the exponent β. Functions are corrected for guessing. (b) Implied noise distribution functions according to Eq. (4), plotted relative to the mean response (i.e. as the additive noise distribution). Note the marked change in distribution shapes as the exponent varies.

Pelli, 1983, established that the Gaussian is a good approximation). Thus, Weibull analysis is not an accurate theory for the description of systems with a high threshold and Gaussian noise unless the psychometric slope happens to fall at this mid-range value. In practice, empirical slopes have been found to approximate this value in many situations (Robson & Graham, 1981; Williams & Wilson, 1981; Pelli, 1985), but there may be substantial inter-observer differences (Mayer & Tyler, 1986) and large changes in slope under certain circumstances (Tyler, 1997). Thus, there is a need for a comprehensive and accurate theory of probability summation when the assumptions of High Threshold Theory are violated.

2.3. High-threshold probability summation fails for additive noise

High Threshold Theory has been widely used to predict the effects of probability summation, the improvement in sensitivity with an increasing number of chances to detect the presence of a signal. This theory assumes that probability summation occurs because the observer can identify the max of the samples of signal-plus-noise distributions provided by each of n stimulated channels. If we assume a physiological version of the high-threshold system that has Gaussian noise added to the signal, the internal response after probability summation is provided by the distribution of such max values over trials. Note that, for such probability summation to occur, the threshold has to be applied after the max operator. For this analysis, the max operator is assumed to function like an ideal attention mechanism, in that it samples from all of, and only from, the relevant channels.

In general, it is a well-known statistical rule that the cumulative distribution of maximum values for a set of samples D(r_i) from a parent distribution D(r) (where r is the instantaneous internal response) is given by the integral of the parent distribution to the power of the number of values within each sample:

∫_{−∞}^{r} max_{i=1:n} [D(r_i)] dr = [∫_{−∞}^{r} D(r′) dr′]^n.   (5)

Thus, the expected distribution M_n(r; R, σ) of the max of a set of samples is given by taking the derivative of both sides of Eq. (5) with respect to their independent variables:

M_n(r; R, σ) = max_{i=1:n} [D(r_i)] = (d/dr) [∫_{−∞}^{r} D(r′) dr′]^n.   (6)

The mean R and standard deviation σ parameters in the expression for the max distribution M_n imply that we are deriving the form of the expected function of the resulting probability distribution, which may be characterized by the parameters of its location and spread. It does not imply that these are the only parameters of the distribution (as they would be for a Gaussian distribution), merely that we restrict our consideration to these two parameters.

To obtain the new threshold signal level, the signal can be reduced until the max distribution M_n reaches the original threshold criterion again. The extent to which the signal has to be reduced constitutes the improvement in sensitivity attributable to probability summation on the basis of the max rule. If the noise is assumed to be additive, however, this process creates the fatal problem that, for a large enough number of channels, the mean signal needs to be set to a negative value in order to bring the signal-plus-noise distribution down to threshold. Fig. 3a depicts the case for such summation over 100 channels, where the initial signal is assumed to have a mean of two times the internal threshold level and a σ of 0.67 (so as to provide 75% correct performance at this signal level). The max distribution for 100 channels from Eq. (6) has a mean of
about 4.5, giving essentially 100% correct performance. Fig. 3b shows how the signal has to be readjusted to bring the tail of the max distribution down below threshold so as to reattain 75% correct performance. On the linear assumption of Eq. (1c), the level of the mean internal response in the null interval corresponds to an external signal of zero (the internal scale is arbitrary, so we choose zero to represent the mean null level for analytic convenience). Thus, since the internal signal needs to be reduced from 4.5 back to the threshold level of 1.0 (dashed arrow), the external signal represented by the filled arrow has to go substantially negative before 75% correct performance is achieved.
The problem is fatal in some domains, such as the amplitude of light, because negative signals do not exist. Other domains, such as contrast, may be defined in such a way that there are negative signals, but the problem reasserts itself because the system contains negative-sensitive elements (e.g. off-center cells) that respond positively to the negative signal. Thus, rather than becoming less detectable by its max value, the signal becomes more detectable as the corresponding minimum of the set of samples (at the left-hand tail of the distribution in Fig. 3) passes above the corresponding negative threshold before the max falls below the positive threshold. Once again, therefore, it is impossible to return to the 75% performance level after the max operator has taken effect.
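The negative-signal problem of Fig. 3 can be reproduced directly from the CDF power rule of Eq. (5). The sketch below uses the illustrative parameters from the text (channel mean 2, σ = 0.67, threshold 1, 100 channels); the bisection search is our own addition:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

N_CHAN, SIGMA, THRESH = 100, 0.67, 1.0

def p_max_above(mean):
    """P(max of 100 independent Gaussian channel responses > threshold),
    using the CDF power rule of Eq. (5): P(max <= t) = Phi(z)**n."""
    return 1.0 - Phi((THRESH - mean) / SIGMA) ** N_CHAN

print(p_max_above(2.0))  # essentially 1.0: near-perfect detection

# Bisect for the mean that puts 75% of the max distribution above
# threshold again; p_max_above is increasing in the mean.
lo, hi = -5.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if p_max_above(mid) > 0.75:
        hi = mid
    else:
        lo = mid
print(round(hi, 3))  # about -0.48: the required mean signal is negative
```

The required channel mean comes out well below zero, which is exactly the impossibility described above for domains without negative signals.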
High-threshold analysis is immune to this problem only if the noise on the signal is multiplicative with signal strength rather than additive, and hence can be reduced indefinitely by appropriate signal reductions without the signal going negative. Thus, if the noise is purely multiplicative, the max level on the noise distribution may be freely reduced to the threshold level to provide a measure of the threshold sensitivity for the input signal. High-threshold analysis is self-consistent in that the noise implied by the Weibull formulation has the property of being multiplicative. Because this property is rarely made explicit, it should be mentioned that the property follows from Eq. (2), which shows that the Weibull psychometric function has a constant form when plotted on log coordinates, i.e. is scaled in proportion to signal amplitude. This implies that the limiting noise is similarly scaled through the probability summation operator. To reiterate, Fig. 3 goes further in showing that the assumption of additive noise is incompatible with Weibull analysis in general.
Note that, for mixed additive and multiplicative noise sources, reducing the signal will tend to reduce the multiplicative noise to the point where additive noise dominates. Since there are always sources of additive noise in any physical signal-detection system (e.g. thermal noise, and quantal noise considered with respect to modulation variables, such as a sinusoidal grating, which keep the mean signal constant), any noise-limited threshold is likely to be limited by its additive component. The only amelioration of this problem is if the high threshold is so high that it sits at or above the level for the max of the additive noise from all monitored channels (which one might term an 'ultra-high' threshold). Were it any lower, the negative signal problem would be encountered. Thus, for Weibull analysis to operate, the system must be functioning with thresholds so high as to be quite inefficient, especially considering that the degree of probability summation required by the quantitative application of Uncertainty Theory may be of the order of many thousands or even millions of channels (Pelli, 1985). The Weibull analysis of probability summation is thus implausible in realistic threshold systems.
Fig. 3. (a) Threshold signal-plus-noise distribution for 75% correct detection (left distribution, with 75% of the area above the threshold level of 1) together with the distribution of maxes over 100 channels (right distribution). Since the max distribution is effectively all above the threshold level, the signal would be detectable close to 100% of the time if probability summation were in operation. (b) Thus, the signal level has to be reduced (dashed arrow) until the max distribution sits at the 75% level above threshold. The problem is that this reduction produces a negative value for the mean signal in each channel (filled arrow), which is likely to be unobtainable in typical vision paradigms.
The conclusions from the analysis of High Threshold Theory are:

1. The high-threshold analysis developed by Quick (1974) implies a reciprocal relationship between the exponent of the psychometric function and the log slope of its probability summation behavior (Fig. 1b). If there are conditions where psychometric functions are empirically found to be shallow (and the noise sources locally independent), steep summation slopes would be predicted. In practice, such conditions have been found to show shallow summation slopes, calling the theoretical framework into question. For suprathreshold masking conditions, a range of studies such as Foley and Legge (1981) and Kersten (1984) report exponents of the d′ function close to 1 for the 2AFC paradigm, even when the mask is a noise background that is randomly independent at all locations within the stimulus. Nevertheless, Kersten (1984) showed that summation is negligible under suprathreshold, noise-masked conditions. Both because such near-unity exponents imply strong summation behavior, and because it is hard to conceptualize a threshold operating under 'suprathreshold' conditions, High Threshold Theory cannot be applied to such data. There is thus need for a theory that can be used to analyze suprathreshold discrimination experiments.
2. The form of the Weibull function implies bizarre variations in the noise distribution (Fig. 2b) if it is assumed that the neural noise is additive in the threshold range. Since noise is asymptotically Gaussian (such as quantal noise in the light, thermal noise in the photoreceptors or retinal noise in the ganglion-cell outputs), High Threshold Theory is incompatible with plausible assumptions about the noise distribution.
3. Quick's High Threshold analysis through the Weibull function assumes the performance is limited by a high threshold rather than by noise of any kind. However, noise is an unavoidable component of the analysis of the 2AFC paradigm. In order to adapt the high-threshold analysis to the 2AFC paradigm, Pelli (1985) made the assumption that the observer was monitoring a much larger number of channels than were stimulated as a means of obtaining a steep psychometric function that approximated threshold behavior. Thus, Pelli's approximation fails if the stimulus is structured so as to stimulate as many channels as the observer is monitoring, because the predicted psychometric function is then shallow and violates the high-threshold assumptions. No theoretical analysis for these conditions has been published.
3. Signal Detection Theory for the Ideal Observer and its Bayesian approximation

Following from the inadequacies of High Threshold Theory, this section develops the analysis of summation properties in the two-alternative forced-choice (2AFC) task. The analysis is approached by specification of the psychometric function through Signal Detection Theory as limited by additive noise. When the only source of this additive noise is quantum fluctuations (in a contrast detection task), the Signal Detection Theory analysis amounts to a single-channel Ideal Observer model. The implications of an attentional strategy approximating Ideal Observer behavior are also spelled out for this additive noise case.
3.1. Specification of the psychometric function

The first step to understanding the psychometric function in a 2AFC task is to specify the proportion correct of the observer's responses. The 2AFC task typically involves the presentation of two stimulus intervals (or spatial stimulus regions), one of which contains the stimulus to be detected while both contain the background condition from which the stimulus is to be distinguished. The observer's task is to estimate which interval contains the discriminative stimulus. Traditionally, the observer is assumed to exhibit ideal behavior in three ways:

1. to have exact knowledge of the stimulus and to view it with a matched filter, excluding all irrelevant information;
2. to be noise-free, so that performance is limited only by noise in the physical stimulus;
3. to respond according to the maximum output of the filter in the two intervals, with no confusion.
When the first assumption is violated by ignorance of the correct filter, the observer may still adopt an ideal attentional strategy across a set of filters, to make the best guess as to which is the optimal filter to select on each trial. The second assumption may be violated by the introduction either of early noise before the filter or of late noise at the decision stage. In the case of early noise, the observer's performance will still reflect the form of the Ideal Observer, but at reduced efficiency. In the case of late noise, the threshold will become independent of stimulus extent as long as the late noise dominates other sources of noise.
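The efficiency cost of violating the matched-filter assumption can be illustrated with a toy linear-filter computation. The Gaussian 'stimulus' and templates here are hypothetical, chosen only to show the effect in unit-variance white noise:

```python
import math

def snr(template, signal):
    """Output signal-to-noise ratio of a linear filter in white noise
    of unit variance: (template . signal) / ||template||. By the
    Cauchy-Schwarz inequality, a template proportional to the signal
    (the matched filter) maximizes this quantity."""
    dot = sum(t * s for t, s in zip(template, signal))
    norm = math.sqrt(sum(t * t for t in template))
    return dot / norm

# hypothetical 1-D "stimulus": a small Gaussian blob
signal = [math.exp(-((x - 5.0) ** 2) / 2.0) for x in range(11)]
# a mismatched template: the right shape but twice too broad
wide = [math.exp(-((x - 5.0) ** 2) / 8.0) for x in range(11)]

print(snr(signal, signal))  # matched filter: the maximum attainable SNR
print(snr(wide, signal))    # mismatched template: reduced SNR
```

The mismatched template still detects the signal, but at a lower output SNR, which is the sense in which early noise plus an imperfect filter leaves the observer with the form of the Ideal Observer at reduced efficiency.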
In Signal Detection Theory (SDT), the proportion correct is conceptualized through an imaginary ROC (receiver operating characteristic) curve of proportion of hits versus proportion of false alarms (Green & Swets, 1966), treating each trial as a separate Yes/No task with a different criterion. Not only is this instantaneous criterion inaccessible, but the 2AFC proportion correct is defined as the area under the ROC curve,
Fig. 4. Derivation of the 2AFC psychometric function. (a) Internal responses for signal levels from zero (distribution x_n) through various mean levels R_1 to R_m. (b) Difference distributions between the distributions for each signal level and that for no signal. Proportion correct is derived by integrating areas p_1 to p_m to the right of the vertical line at zero difference indicating the response criterion. (c) Psychometric function derived from plotting the area of each difference distribution lying above the zero criterion from (b). (d) Log(z) transform of the cumulative function to provide a straight line in probit coordinates.
The 2AFC task is amenable to a simpler form of analysis based on the difference distribution of the internal responses (Macmillan & Creelman, 1993). On each trial, the observer responds by indicating whichever observation interval produces a larger internal response, which amounts to taking the difference between the two internal signals and picking the interval according to the sign of this difference signal (see Fig. 4). The criterion is therefore fixed in this difference space, at a difference of zero (whereas it can range from trial to trial over the whole extent of the internal response distribution). Stated formally:

For signal intensity s, let r1 be the internal response to the first observation interval, r2 be the internal response to the second interval and v be the interval the observer chose as the signal interval. (Note that here s serves both as a scalar for signal strength and as an index for the signal interval. Similarly, n is the index for the null interval whereas n defines the number of stimulated channels elsewhere.) We assume that there is a fixed signal level throughout the test interval.

The observer indicates the first interval as the signal interval (v = 1) if r1 − r2 > 0 and indicates the second interval (v = 2) if r2 − r1 > 0. The response is correct either if v = 1 when the signal is in the first interval, denoted by ⟨sn⟩, or if v = 2 when the signal is in the second interval, denoted by ⟨ns⟩. The proportion correct in terms of the internal difference response d is:

pcorr(d) = p(v = 1 | ⟨sn⟩)·p(⟨sn⟩) + p(v = 2 | ⟨ns⟩)·p(⟨ns⟩)
= p(r1 − r2 > 0 | ⟨sn⟩)·p(⟨sn⟩) + p(r2 − r1 > 0 | ⟨ns⟩)·p(⟨ns⟩)   (7)

If rn is the internal response to the null interval and rs is the internal response to the signal interval, Eq. (7) can be rewritten in terms of the psychometric function:

Ψ(s) = p(rs − rn > 0 | ⟨sn⟩)·p(⟨sn⟩) + p(rs − rn > 0 | ⟨ns⟩)·p(⟨ns⟩)
= p(rs − rn > 0) = p(d > 0) = ∫_0^∞ Zs(d; D) dd   (8)

where d = rs − rn is the difference between signal and null interval internal responses and Zs(d; D) is the PDF of the difference distribution for signal strength s (normalized in units of its standard deviation), with mean D.

Eq. (8) describes the relation between the proportion correct and the observer's internal responses to the signal and null intervals. The psychometric function can be obtained by repeating the computation of Eq. (8) for all relevant signal intensities. Fig. 4 illustrates the relations between the internal responses and the psychometric function based on Gaussian additive noise.

The probit transform (Finney, 1952) is the appropriate representation of the psychometric function, on the basis of the additive Gaussian assumption. It normalizes proportion correct to its standard deviation unit (z-score) through the inverse cumulative Gaussian function Φ−1. That is

Zs(d; D) = Φ−1(Ψ(s))   (9a)

In the psychophysical literature, the normalized signal Zs(d; D) represents the detectability of the signal at stimulus level s, defined by

d′ = Zs(d; D)   (9b)

which requires a further level of abstraction from this two-dimensional distribution of signal strength and criterion level (see Macmillan & Creelman, 1993 for details).
3.2. Comparison of a Gaussian and a non-Gaussian example

Suppose the system response is dominated by one channel whose internal response in the null interval carries additive Gaussian noise with expected value 0 and standard deviation σ, so that its distribution is denoted DN(r) ~ G(r; 0, σ). At signal intensity s, the internal response in the signal interval will also have a Gaussian distribution, but with mean R and standard deviation σ, denoted DR(r) ~ G(r; R, σ). From the properties of the Gaussian distribution, the difference distribution of d = rs − rn is another Gaussian distribution with mean R and standard deviation √2·σ. From Eq. (8), the proportion correct is

Ψ(s) = 1 − Φ(0; D, √(σN² + σR²)) = 1 − Φ(0; R, √2·σN) = Φ(R; 0, √2·σN)   (10)

where Φ denotes the Gaussian cumulative distribution function. Eq. (10) is commonly used in fitting the psychometric function to 2AFC data (Macmillan & Creelman, 1993).
In general, however, it is important to avoid the implication that the psychometric function matches the cumulative distribution function of its underlying probability distribution. The match is valid only if the noise is additive to the mean internal signal strength R and if its distribution is symmetric (as revealed by Eq. (10)). In general, different signal levels may produce different rs distributions if the noise is non-additive, and in turn access different DR(d). Thus, the general 2AFC psychometric function would not be a cumulative function of any particular difference distribution. Only when the noise is additive and symmetric (e.g. Gaussian) will the difference distributions at different signal levels all have the same variance, making the psychometric function equivalent to its CDF (e.g. the cumulative Gaussian, or erf). On the other hand, if the noise distribution is Poisson rather than Gaussian (a common alternative assumption), the noise is no longer additive but varies with the mean level, and is also asymmetric. Thus, the psychometric function derived from Eq. (8) will not exactly match the cumulative distribution function (Fig. 5).
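The Poisson case is easy to simulate with Eq. (8)'s recipe of comparing the two interval responses directly. The sketch below is our own construction (the base rate and signal values are arbitrary illustrations); note that the variance of the signal-interval response grows with the signal, which is why no single fixed-variance difference distribution can generate the whole curve:

```python
import numpy as np

rng = np.random.default_rng(0)

def pc_2afc_poisson(base_rate, signal, n_trials=200_000):
    """Monte Carlo 2AFC proportion correct for Poisson channel noise:
    null interval ~ Poisson(base_rate), signal interval ~
    Poisson(base_rate + signal); ties in the counts are broken by a
    fair guess."""
    r_null = rng.poisson(base_rate, n_trials)
    r_sig = rng.poisson(base_rate + signal, n_trials)
    return np.mean(r_sig > r_null) + 0.5 * np.mean(r_sig == r_null)

# Psychometric function rises from chance as signal strength grows
pcs = [pc_2afc_poisson(5.0, s) for s in (0.0, 1.0, 2.0, 4.0, 8.0)]
```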
3.3. Overview of Ideal Observer analysis

The Ideal Observer formalism assumes that the observer has complete knowledge of the stimulus and uses a single matched filter to detect its presence (Wiener, 1949). The Ideal Observer is therefore effectively a Bayesian detector with a prior probability of 1.0 on the matched filter and zero elsewhere. Optimal performance with an ideal filter is assumed to occur with linear summation over the noisy filter inputs sampled by the field. The summation properties of the filters will vary with respect to a large number of stimulus attributes. For simplicity, we consider the case of spatial summation over two-dimensional stimuli S(x,y) varying in one dimension of overall size. This variable size dimension could be the height, the width, the area, or any parameter that is linear with the number of sources of input to each summing field over the domain (x,y). The input for the matched filter is provided by discrete sensors with independent noise sources drawn from the same underlying distribution. When the local regions have identical sources of independent Gaussian noise with standard deviation σ, the summed output of each field is given by summing over the product of the stimulus profile and the matching ideal filter. We can show that the signal-to-noise ratio in such a matched filter is proportional to the square root of the stimulus area.
In general, the response of the matched filter can be approximated as the weighted sum of its responses to the samples

R = Σ S(x,y) · I(x,y)   (11a)

and the signal variance as the weighted sum of the local variances

σR² = Σ S(x,y) · I(x,y) · σ²   (11b)

Hence

σR = σ·[Σ S(x,y) · I(x,y)]^(1/2)   (11c)

The discriminability of the ith stimulus, d′i, therefore, can be approximated as the reciprocal of the standard deviation times the sampling interval, as shown in Appendix A.
Fig. 5. Theoretical psychometric function for Poisson noise (full curve), showing failure to match the cumulative distribution in this case of non-Gaussian noise (dashed curve).
Fig. 6. Physiological implementation of ideal observer behavior. Thin curves: summation behavior for five individual Gaussian filters as the extent of a Gaussian test stimulus is varied (arrowheads indicate filter extents, or number of input samples in each filter at half-height). Thick curve: fourth-power attentional summation over the individual channels approximates ideal observer summation behavior (a log slope of −0.5, dashed line) in the range where physiological filters are available, with departures below and above. The ideal strategy is to read out from that filter only when it matches the stimulus extent. At the two ends of the range, only one filter dominates detection behavior, and hence system performance (thick line) departs from the ideal slope of −0.5 to follow the function for the most sensitive filter in that region.
If the task is summation over a range of stimulus sizes, the Ideal Observer model requires a summing receptive field matching every size of stimulus for which the summation behavior is exhibited. A physiological implementation of such behavior is depicted in Fig. 6, where the attention mechanism is assumed to switch to the receptive field size matching the stimulus presented in each condition. This behavior is possible only if the stimuli are presented in blocks of trials, so that the form of the next stimulus on each trial is known. Thus, if human observers exhibit a log-log summation slope of −1/2 (dashed curve in Fig. 6) they may be said to manifest Ideal Observer behavior, in the sense of using ideal matched filters to improve in the way an Ideal Observer would, even if the absolute sensitivity is less than predicted for an Ideal Observer (i.e. lower than ideal efficiency). Such (inefficient) Ideal Observer behavior may be taken as evidence that the brain has access to summing fields matching the sizes of all the tested stimuli, either present and selectable by attention as in the central region of Fig. 6, or alternatively as an adaptive mechanism re-forming itself for each new stimulus condition.

If the system has access to only a limited range of summing field sizes, the summation slope should asymptote to −1 for stimulus sizes below that of the smallest summing field and should asymptote to 0 for sizes above that of the largest summing field, as depicted by the bold curve in Fig. 6. Thus, the form of the summation function in any stimulus domain carries important information about the range of summing field sizes operating in that domain (see Gorea & Tyler, 1997, for an example in the temporal domain and Kersten, 1984, for an example in the spatial domain). The model that the brain contains an adaptive filter re-forming itself for each new stimulus condition seems to be incompatible with the occurrence of a limited summation range, for why would such adaptive capability fail at a particular point?
Appendix A shows how characterization of the stimulus size in terms of the sampling density within the stimulus envelope allows the discriminability to be expressed in terms of the effective area Ai of the stimulus

d′i = Ri/σRi ∝ Ai^(1/2)/σ   (12)

showing that ideal discriminability is proportional to the square root of stimulus area.

Note, however, that there is a problem with applying this model in practice, since the psychometric function in this model is based on a linear relation between d′ and signal strength. This linear relation is violated by most d′ measurements, which typically show an exponent of about 2 (e.g. Stromeyer & Klein, 1974). Similarly, translation of this prediction into the Weibull format yields a predicted Weibull exponent of 1.3 in Eq. (2) (whereas most measurements show exponents of 3–4). Extension of the theory to non-ideal attention behavior, which encompasses steeper exponents, is left to the next section. First we consider an approximation to ideal behavior that can be used if the observer knows the set of stimulus types that may be presented in a block of trials, even if the particular stimulus is not known in advance on each trial.

3.4. The concept of probability summation

Probability summation is an option available to a decision mechanism with access to a number of independent signals reflecting the occurrence of a stimulus. The analogy is with a group of human monitors looking out for an approaching plane, for example. The probability of detecting the plane is higher if detection is considered to have occurred when any one of the monitors spots the plane than by relying on a lone observer. In other words, probability summation corresponds to a decision rule in which the group decision is defined by a response from any single member of the group. This decision rule corresponds to defining a detection event when the signal in any one of m monitored channels reaches a criterion level. This decision
structure is implemented by applying a max rule to all the channel outputs and defining the detection event when the max reaches some preset criterion level (Pelli, 1985; Kontsevich & Tyler, 1999a). The neural implementation of such a decision rule may be designated as 'attentional summation'.

Although probability summation is often considered as a purely mathematical operation, it is meaningless in the context of human vision (in a single observer) unless it is mediated by some neural hardware. This raises the issue of whether there are independent neural channels and what is meant by a max operator in a neural system. In terms of detection theory, two channels are considered independent when they are governed by sources of noise that are statistically independent. There is plenty of evidence for a high degree of statistical independence among even neighboring cortical neurons (e.g. Freeman, 1994, 1996; Shadlen & Newsome, 1998), so cortical neurons can be considered to be separate channels for this purpose. What would constitute the probability summation or max operator? It needs to be a neural system receiving signals from an array of channels (or axons) that have statistically independent noise up to that point. It then needs to respond when any of these inputs manifests a signal, but not otherwise. A neural system would have this property if it transmitted a spike that initiated a detection response on receiving a spike from any one of its inputs. The threshold characteristic of cortical neurons with wide-field input sampling thus provides the requisite hardware for a max operator.
In terms of the detection of signals in additive noise, the optimal strategy is to use a matched filter, i.e. to convolve the stimulus input with a linear filter exactly matching the stimulus profile. It is possible to approximate the ideal observer strategy by performing probability summation over the full set of filters in the form of the max of the signal-to-noise ratios (Pelli, 1985). This approach may be considered an ideal (or Bayesian) attentional strategy in that the observer knows the set of likely filters to survey on each trial. This strategy will have the effect of isolating the most efficient filter under any condition, and hence mimics ideal observer behavior without requiring prior knowledge of the stimulus. However, implementation of this strategy does require the neural system to have an accurate representation of the noise level, in order to compute the signal-to-noise ratios. Simply taking the max over raw signals will tend to emphasize the noisiest fields. But if it is plausible that the neural system normalizes to the prevailing (long-term) noise level, then a max operator would provide a mechanism for implementing Ideal Observer behavior.
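The point about noise normalization can be demonstrated with a toy simulation (entirely our own construction, with arbitrary parameter choices): two monitored channels with unequal noise, a max decision rule, and 2AFC accuracy compared for raw versus noise-normalized responses.

```python
import numpy as np

rng = np.random.default_rng(3)

# Channel 0 carries the signal (R = 1, sigma = 1); channel 1 is signal-free
# but three times noisier. Decision: pick the interval with the larger max.
R, sigmas, n_trials = 1.0, np.array([1.0, 3.0]), 200_000

sig_int = rng.normal(0.0, sigmas, (n_trials, 2))
sig_int[:, 0] += R
null_int = rng.normal(0.0, sigmas, (n_trials, 2))

# Max over raw responses: often dominated by the noisy, signal-free channel
acc_raw = np.mean(sig_int.max(axis=1) > null_int.max(axis=1))
# Max over noise-normalized responses: the noisy channel no longer dominates
acc_norm = np.mean((sig_int / sigmas).max(axis=1) >
                   (null_int / sigmas).max(axis=1))
```

With these settings the normalized max rule is reliably more accurate than the raw max rule, illustrating why an accurate representation of the noise level matters for this strategy.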
It is common practice to combine the response outputs in neural network models by a Minkowski summation rule:

R = (Σ_{i=1..n} Ri^p)^(1/p)   (13)

where the summing exponent is often set at p = 4. Note that such fourth-power summation (thick curve in Fig. 6) produces a completely smooth curve in the range where the filters are present, even though in this example the assumed filters are separated by factors of two in size. It is thus possible to approximate Ideal Observer behavior with relatively coarse physiological sampling in a particular domain if there is some way to implement in the cortex the Minkowski summation of Eq. (13) with a high summation exponent.
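Eq. (13) is simple enough to transcribe directly (a sketch of our own): with p = 4 the combined response is dominated by, but slightly exceeds, the strongest channel; p = 1 reduces to linear summation, and large p approaches the max rule.

```python
import numpy as np

def minkowski_sum(responses, p=4.0):
    """Minkowski combination of channel responses, Eq. (13):
    R = (sum_i R_i^p)^(1/p)."""
    r = np.asarray(responses, dtype=float)
    return float((r ** p).sum() ** (1.0 / p))

channels = [0.2, 0.5, 1.0]
r4 = minkowski_sum(channels, p=4.0)   # close to, but above, the max response
r1 = minkowski_sum(channels, p=1.0)   # linear summation
```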
3.5. Attentional summation in 2AFC experiments does not conform to high threshold analysis, but derives from the σ of the difference distribution
For 2AFC detection using more than one channel, attentional (or 'probability') summation effects should be analyzed through Signal Detection Theory. For a tractable analysis, we assume n stimulated channels of equal sensitivity with additive Gaussian noise. For the full analysis, we will consider the situation where the observing system is monitoring more channels (m) than are being stimulated. The statistical combination rule for attentional summation of the responses over channels is derived again from the maximum value of the set of m monitored channel responses in each stimulus interval (Pelli, 1985; Palmer, Ames, & Lindsey, 1993). For the null stimulus of the pair, which by definition contains only noise, the combined response distribution Mm(R, σR) is based on the noise-alone distributions in the responses of all m channels. Mathematically, this combined distribution is given in terms of the expected values of the distributions by the derivatives, in a similar fashion to Eq. (6), omitting the distribution variables for clarity:

Mm(R, σR) = max_{i=1:m}[DN] = (d/dr)[∫_{−∞}^{r} DN dr′]^m = m·DN·[∫_{−∞}^{r} DN dr′]^(m−1)   (14)

The two parameters in the expression Mm(R, σR) for the max distribution imply that we are deriving the form of the expected function of the resulting probability distribution, which may be characterized by the parameters of its location and spread (as for the High Threshold Theory of Eqs. (5) and (6)).
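For a standard Gaussian DN, Eq. (14) can be evaluated directly by numerical integration. The sketch below is our own (it uses log-space evaluation so that large m does not underflow); it also reproduces the roughly four-fold shrinkage of σ over six decades of m that is plotted in Fig. 7:

```python
import math
import numpy as np

# Max-of-m distribution for a standard Gaussian D_N, Eq. (14):
# pdf_m(r) = m * phi(r) * PHI(r)**(m-1), evaluated in log space.
r = np.linspace(-6.0, 8.0, 20_001)
dr = r[1] - r[0]
log_phi = -0.5 * r**2 - 0.5 * math.log(2.0 * math.pi)
log_PHI = np.log([0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) for x in r])

def max_dist_sigma(m):
    """Standard deviation of the max of m i.i.d. standard Gaussian samples."""
    pdf = np.exp(math.log(m) + log_phi + (m - 1) * log_PHI)
    mean = (r * pdf).sum() * dr
    return math.sqrt((((r - mean) ** 2) * pdf).sum() * dr)

# sigma shrinks by only a factor of ~4 from m = 1 to m = 1 million
shrinkage = max_dist_sigma(1) / max_dist_sigma(10**6)
```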
With the inclusion of n signal channels for the signal interval of the stimulus pair, the max must be taken over the maxes of the separate n signal+noise and m−n noise-alone distributions:

Mn,m(R, σR) = max[ max_{i=1:n}[DR(ri)], max_{i=n+1:m}[DN(ri)] ]
= (d/dr){[∫_{−∞}^{r} DR dr′]^n · [∫_{−∞}^{r} DN dr′]^(m−n)}   (15)

In the general case, Eq. (15) does not simplify in the manner of Eq. (14).

The simplest case of 2AFC attentional summation is the case where m = n, so there is no uncertainty as to which of the monitored channels contain the stimulus, and the two distributions differ only in their mean level of internal response. This situation corresponds to assuming that the observer employs an ideal attention window that always matches the stimulus extent, so that no unstimulated channels are monitored. Nevertheless, it is assumed that the observer cannot perform ideal summation over the stimulus area, but is forced to monitor a set of n local channels to find which gives the max response in any test interval (Pelli, 1985).

Fig. 7. Max distributions for a Gaussian probability density function for numbers of samples increasing in factors of 10 from n=1 to 1 million. Note the decreasing standard deviation and small asymmetry of these max distributions.

Fig. 7 shows the numerical distributions for samples of maxes computed according to the derivation of Eq. (14) for noise alone (or Eq. (15) for signal+noise with m = n) in factors of ten from n=1 to 1 million channels of equal sensitivity. The σ of these max distributions decreases by a factor of about four (in contrast to the factor of 200 decrease predicted for only 1000 channels under High-Threshold Theory with no uncertainty). In each case, the observer's task is to distinguish between sample stimuli drawn from the max distributions of noise-alone and signal+noise for summation over a given number of channels. Discriminability therefore improves with the reciprocal of the reduction in σ in these max distributions (Fig. 7), as shown in the leftward shift of the d′ functions of Fig. 8a. The consequent improvement in sensitivity at the level of d′=1 is depicted in Fig. 8b. Because the function in Fig. 8b defines 'ideal' probability summation for the 2AFC paradigm, we provide the values in tabular form in the Appendix for ready reference. Note that the signal+noise max distributions have to be computed by time-intensive numerical integration. We have therefore developed an approximation method (Chen & Tyler, 1999) that captures this function within 1% accuracy. (Pelli, 1985, had also considered this
Fig. 8. (a) Theoretical d′ functions under 2AFC probability summation assumptions. Note that the exponent (or steepness) is almost invariant with the number of equally-sensitive channels monitored, from n=1 to 1 million (assuming no uncertainty). 2AFC summation behavior is therefore essentially invariant with the d′ criterion selected. (b) 2AFC probability summation over six decades on (unequal) double-log coordinates, compared with summation slopes for full summation (−1), for ideal observer summation (−0.5) and for Weibull summation assuming β=4 (−0.25). Note that the 2AFC summation function is never steeper than a slope of −0.25, and becomes extremely shallow for more than about ten samples.
function and provided an approximation formula that is accurate to within 20%.)

Thus, the complete analysis of 2AFC attentional summation over channels of equal sensitivity shows that the Ideal Attention operator provides dramatically different 'probability summation' behavior than that implied by Pelli's (1985) high-uncertainty approximation to High-Threshold Theory. At its steepest, this 2AFC function exhibits a slope of only about −0.25 (from one to four samples) and soon produces negligible summation for larger numbers of samples. The key reason for the difference between this prediction and that for the Weibull approximation is that the tails of the Gaussian distribution fall much more rapidly than the exponential tail of the Weibull distribution. A summation mechanism that focuses on the information in this tail region will necessarily give different results for the two distributions. Justification for the ubiquity of the Gaussian distribution is discussed in the assumptions section of Section 1.
Consider the practical implication of the summation function of Fig. 8b. For most reported psychophysical tasks, the smallest stimulus might plausibly stimulate many local mechanisms. The expected starting point for a probability summation prediction would then be some way down this curve, say at the 10² level, beyond which little improvement is evident. Under the ideal attention assumption, the only way to achieve summation exponents even close to the reported values of around −0.25 (Watson, 1979; Robson & Graham, 1981; Williams & Wilson, 1981; Pelli, 1985) would be to assume that attention can be focused onto a single neural channel for the smallest stimulus in the series.

A major prediction of the High Threshold theory of probability summation is that the summation exponent can be predicted from the empirical exponent of the psychometric function measured during the summation experiment (Quick, 1974). This prediction has been borne out in several studies (Watson, 1979; Robson & Graham, 1981; Williams & Wilson, 1981; Pelli, 1985), but the result may be coincidental because none has varied the psychometric exponent to determine whether the summation exponent varies as predicted. Nevertheless, this analysis shows that the extent of 2AFC attentional summation varies even where the exponent of the psychometric function is invariant at a value close to one (Fig. 8a) and provides a much smaller improvement in sensitivity than is predicted by High-Threshold analysis for conditions yielding shallow exponents (Fig. 8b). Even the early part of the 2AFC attentional summation slope is never steeper than −0.25 (although it must be said that this corresponds to a value commonly assumed for the Weibull exponent, β). Studies that have assumed such a slope, therefore, would seem to have a valid estimate of the probability summation effects as long as the number of elements of equal sensitivity that they are summing remains less than about four. The analysis of Fig. 8 could therefore be regarded as validating the use of Minkowski summation with an exponent of 4 as long as the number of channels remains small and the other assumptions of the analysis are met.

Conversely, there is a major situation in which the summation slope remains unaffected while the psychometric steepness varies. This behavior can occur when the observing system monitors more channels than are being stimulated. This situation is conventionally described as the system having uncertainty as to which channels are being stimulated and is the topic of the next section.
4. Signal Detection Theory with channel uncertainty (and additive noise)

This section develops the implications of a variety of non-ideal attentional strategies for 2AFC in the additive noise case.
4.1. Channel uncertainty effects and their elimination by rescaling

Channel Uncertainty Theory is an elaboration of Signal Detection Theory in which the number of neural channels m monitored in the brain is greater than the number of channels n stimulated (in the ratio M = m/n) (derived formally in Pelli, 1985). The level of uncertainty would then be defined as log10 M (assumed to be 0 up to this point in the treatment). (An equivalent theory of attentional distraction among the m channels, even where the observer is certain which channel is being attended, has been developed by Kontsevich & Tyler, 1999a.) For the present analysis, we assume that only one channel is being stimulated and that the decision is mediated by attention to successively larger numbers of channels in a non-ideal attentional strategy. Such behavior has been offered as an explanation for the relatively steep psychometric functions that are often measured in practice (Pelli, 1985; Kontsevich & Tyler, 1999a). The full lines in Fig. 9a show the d′ functions obtained through the 2AFC derivation of Eqs. (14) and (15) for the certain condition (monitoring only one stimulated channel) and uncertain conditions (in which from 10 up to one million channels are monitored, with only one stimulated). The d′ functions get progressively steeper in this operating range as channel uncertainty increases. The dashed lines in Fig. 9a show an analytic approximation to these d′ functions that was fitted over the full set within the range from d′=0.5 to 2 (i.e. within the practical measurement range). The approximation is a power function whose log slope U (straight dashed lines) is related simply to uncertainty (log10 M) by the expression:
Fig. 9. (a) Log–log d′ functions under various degrees of channel uncertainty (M = 1–1,000,000 in factors of ten running left to right) when one channel is stimulated. Dotted lines represent the least-squares fit of Eq. (16) within the readily measurable range of −0.5 < log d′ < 0.5 (horizontal dashed lines). (b) Summation behavior with an attention window of increasing extent, due to increase of channel uncertainty as stimulus size increases over the array of local filter channels, at log d′ levels of −0.5, 0 and 0.5 (three curves corresponding to the horizontal criterion lines in a). Threshold rises gradually at first as the number of monitored samples is increased, then shows little further effect.
d′ = A(s/s0)^U,  with  U = C + B·log10 M   (16a,b)

where M is the ratio of monitored to stimulated channels and A, B, C and s0 take values of 7.9862, 0.4468, 1.0779 and 9.5414.
The point of presenting this analytic approximation is that it allows reverse inference of the level of uncertainty from the log slope of the psychometric function, fitted to the data as a straight line on double-logarithmic d′ coordinates. Pelli (1985) had provided a similar approximation to a Monte Carlo simulation of the theoretical curves that we derive analytically, but his approximation was formulated in terms of a Weibull analysis and consequently appeared to emphasize the lower range of d′ values, which are unmeasurable in practice. Our reanalysis focuses on the most accessible range of the psychometric function, that between log d′ values of −0.5 and 0.5 (or percent-correct values between about 60 and 90%). Fitting in this range generates fits at high levels of uncertainty that are substantially shallower than Pelli's. One can use our fitted function to derive the inferred uncertainty directly from Eq. (16), within the accuracy of the slope determination. (Empirically, slopes may be determined with an accuracy of about 0.1 log units in 300 trials using the efficient Bayesian maximum likelihood algorithm proposed by Kontsevich & Tyler, 1999b; cf. Cobo-Lewis, 1997. This accuracy would imply a practical resolution of about 6 discriminable slopes in the slope range from 1 to 4.)
The inverse equations for sensitivity at the criterion level of d′=1 are straightforward:

M = 10^((U−C)/B)  and  s = s0·A^(−1/U)   (17a,b)
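Eqs. (16) and (17) are simple enough to transcribe directly. In this sketch (our own; the constant names follow the text), the fitted slope U of the log-log d′ function yields both the inferred channel ratio M and the uncertainty-free threshold:

```python
import math

# Fitted constants of the analytic approximation in Eq. (16)
A, B, C, S0 = 7.9862, 0.4468, 1.0779, 9.5414

def dprime(s, M):
    """Eq. (16): d' = A*(s/s0)^U, with log-log slope U = C + B*log10(M)."""
    U = C + B * math.log10(M)
    return A * (s / S0) ** U

def uncertainty_from_slope(U):
    """Eq. (17a): monitored/stimulated channel ratio M inferred from slope U."""
    return 10.0 ** ((U - C) / B)

def threshold_at_dprime_1(U):
    """Eq. (17b): stimulus level giving d' = 1 for a function of slope U."""
    return S0 * A ** (-1.0 / U)
```

With no uncertainty (M = 1) the slope reduces to U = C ≈ 1.08, while a fitted slope of U = C + B corresponds to M = 10 monitored channels per stimulated channel.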
Equivalently, channel uncertainty effects may be removed by extrapolating the measured slope of the log d′ function up to d′=8, then extrapolating back down a slope of U=C to provide an estimate of the sensitivity that would have been obtained with no channel uncertainty. The extrapolation back to the level of d′=1 may be approximated by dividing the measured threshold by a value of 8. This simplified procedure allows compensation of channel uncertainty effects with minimal computation, merely from knowledge of the log d′ slope. For complete accuracy, the computed d′ functions as depicted in Fig. 9 may be used to model the psychometric function with no approximation. If a threshold estimate is required to be more accurate than the proposed approximation formula allows, the data for the psychometric function may be fitted over the family of computed d′ functions to refine the compensation for channel uncertainty.
Of course, removing channel uncertainty does not imply eliminating measurement error in the estimates, only eliminating the bias in the threshold estimate introduced by channel uncertainty. The adjusted threshold estimates are no less variable, but threshold changes due to varying uncertainty levels are eliminated. In situations where the channel uncertainty remains constant across conditions, such bias reduction is not needed. But in cases where it may vary, such as summation functions over any stimulus domain, it is critical to partition the threshold variations between the underlying sensitivity variations and the effects of probability summation, as described in the following sections.
4.2. 2AFC attentional summation with a fixed attention window
The previous section considered the general case of estimating the degree of uncertainty from the psychometric function. With this analysis in hand, we may evaluate the particular case of the effect on threshold of varying stimulus extent with a fixed attention window. For studies that do not expend the effort required to measure the psychometric slope, it is important to have a model of the effects of uncertainty under plausible assumptions. Clearly, if the attention window can be matched to the stimulus extent, the uncertainty (or opportunities for distraction, see Kontsevich & Tyler, 1999a) will remain constant at zero and have no effect on the measured summation function. However, in this case the slope of the psychometric function should be low (assuming a linear transducer), which is known to be invalid in many situations.

Contrary to Robson and Graham's (1981) claim for this situation, 2AFC spatial probability summation effects with a fixed attention window are not proportional to 1/β (β being the exponent of the Weibull approximation, Eq. (1)). Such summation effects are controlled by the change in the exponent as uncertainty is reduced by increasing stimulus area (Fig. 10a); as seen in Fig. 10b, the summation effects at d′=1 approximate a log slope of −1/4 over most of the range of ratios of stimulated to monitored samples. This result may be considered a justification for the widespread use of the 4th-power Minkowski rule to approximate probability summation. It is a quite different analysis from that developed by Williams and Wilson (1983), Robson and Graham (1981) and even Pelli (1985), since those analyses all assumed a fixed form of the log psychometric function. In contrast, the shape varies substantially in the fixed-attention-window model of Fig. 10a. Nevertheless, it may correspond to a plausible set of assumptions, so tabular values for the example depicted in Fig. 10 are provided in Table 1. The fixed-attention-window model is the main theoretical alternative to the probability summation effects of the ideal attention window of Fig. 8.
Thus, the curve of 2AFC attentional (or 'probability') summation in double-log coordinates may have either a concave or an approximately linear form according to whether the attention window is assumed to match the stimulus extent (Fig. 8) or to remain fixed (Fig. 10). The two forms are empirically distinguishable from threshold measurements alone. Note that, to provide the fourth-root approximation, the fixed attention window must be at least as large as the largest stimulus, and detection efficiency will necessarily be extremely low for the smallest stimuli. Because summation is only probabilistic within this large attention field, efficiency will still be low for stimuli filling the attention field. Thus, the assumption of the fourth-power approximation to attentional summation carries the implication that the neural system is operating at low efficiency, and is not applicable to situations where efficient detection performance is demonstrable.

Fig. 10. Probability summation for varying numbers of samples within a fixed attention window (assumed here to allow a maximum of 1000 samples). (a) Psychometric functions in log d′ versus log stimulus strength. Note similarity in shape to those in Fig. 9 but with extra shifts at high uncertainties. (b) Summation as a function of ratio of number of samples to total number monitored, at the three d′ criteria indicated by the horizontal lines in (a). Thick dashed line in (b) depicts a slope of −1/4, which provides a good approximation to fixed-window probability summation over most of the computed range.

Table 1
2AFC max summation effects in relative threshold values^a

                   Additive noise                                        Multiplicative noise
Number of          Ideal attention   Fixed attention   Increasing        Ideal attention
samples            window            window            attention window  window
1                  1.0000            3.9311            1.0000            1.0000
2                  0.8235            3.2978            1.2235            0.4522
3                  0.7442            2.9836            1.3524            0.2896
4                  0.6963            2.7823            1.4421            0.2143
5                  0.6632            2.6354            1.5104            0.1713
6                  0.6385            2.5202            1.5653            0.1437
7                  0.6191            2.4278            1.6107            0.1244
8                  0.6032            2.3503            1.6500            0.1103
9                  0.5900            2.2823            1.6845            0.0995
10                 0.5787            2.2245            1.7142            0.0910
20                 0.5149            1.8695            1.9046            0.0530
30                 0.4847            1.6827            2.0110            0.0402
40                 0.4658            1.5566            2.0826            0.0336
50                 0.4523            1.4626            2.1377            0.0294
60                 0.4421            1.3889            2.1826            0.0265
70                 0.4339            1.3278            2.2183            0.0244
80                 0.4271            1.2763            2.2492            0.0227
90                 0.4213            1.2313            2.2762            0.0213
100                0.4163            1.1924            2.3005            0.0203
200                0.3871            0.9475            2.4538            0.0147
300                0.3725            0.8158            2.5398            0.0124
400                0.3629            0.7275            2.6003            0.0111
500                0.3560            0.6620            2.6441            0.0102
1000               0.3367            0.4762            2.7793            0.0080
2000               0.3201            –                 2.9078            0.0064
5000               0.3015            –                 3.0720            0.0049
10 000             0.2893            –                 3.1868            0.0041
20 000             0.2785            –                 3.3005            0.0035
50 000             0.2658            –                 3.4437            0.0029
100 000            0.2573            –                 3.5475            0.0025
200 000            0.2495            –                 3.6524            0.0022
500 000            0.2403            –                 3.7793            0.0019
1 000 000          0.2339            –                 3.8743            0.0017

a Results for fixed attention window assume a window size of 1000 samples (no entries beyond 1000 samples).
4.3. Two-component summation and channel analysis
A classic case in both spatial and color vision is the summation for the detection of two stimulus components as their intensities are varied relative to each other. The results of this paradigm are plotted on a dual-axis plot of the contrast threshold for the pair of components when combined in a variety of ratios (Guth, 1967; Graham & Nachmias, 1971; Stromeyer & Klein, 1974; Yager, Kramer, Shaw, & Graham, 1984). These references should be consulted for the theoretical development, but various outcomes are summarized in Fig. 11a. If the two components are detected entirely independently (by a noiseless max rule), the detection contour forms a square corner (independent channels); if they are added linearly in a single mechanism limited by late noise (linear summation), the detection contour is a negative oblique line; if they are combined linearly but detection is limited by independent sources of Gaussian noise in the two channels, the detection contour is a circular arc (squaring).
Two-component summation is an important case for channel analysis in general, because it represents the combination rule between adjacent channels for detection by sets of channels in any domain. Channel summation is often modeled as a fourth-power Minkowski rule (or pth norm) for combination over channels (Graham & Nachmias, 1971; Stromeyer & Klein, 1974; Williams & Wilson, 1981; Wilson, McFarlane, & Phillips, 1983; Yager et al., 1984). The justification for this rule is usually expressed in terms of Weibull analysis, which we show to be on shaky grounds, but the situation may be reanalyzed for the 2AFC paradigm with Gaussian noise.
The strict 2AFC probability summation prediction is based on the case where the system takes the max on every trial, for two channels with additive Gaussian noise, as the relative signal strength is varied between the channels (Fig. 12). The analysis of this situation is essentially an uncertainty analysis because the observer always monitors both channels as the stimulation progresses from one channel alone through both together to the other alone. Even at the extremes, where only one channel is stimulated, the joint signal always has to exceed the max of the noises in both channels. The applicable distributions are plotted in Fig. 12 in terms of both the max response over the two channels in each test interval and the difference response d between the signals in each pair of 2AFC intervals. The response distributions for the two intervals are set so that the detectability for an individual channel falls at the 75% correct position. The 2AFC uncertainty prediction may be developed for a full range of component ratios, as is shown by the full curve in Fig. 11a. This prediction is part-way between the linear summation rule and the square-law rule for linear summation over sources with independent additive noise. In fact, it is well described by a pth norm (Minkowski summation rule) with a power of 1.5 in the case of linear d′ functions (i.e. involving no additional uncertainty about the stimulus properties).
Although most reported cases of two-component summation under the 2AFC paradigm show less complete summation than this probability summation prediction, their analysis requires consideration of the free parameter of the slope of the psychometric function, which is rarely specified in published studies of two-component summation. When the slope is steep, one interpretation is that there is much additional uncertainty, i.e. the observer is monitoring many more channels of whatever kind than are being stimulated (Fig. 9a). Quantitatively speaking, most 2AFC studies report the exponent of the d′ function to be close to 2 in the fovea (e.g. Stromeyer & Klein, 1974), which implies a ratio of monitored to stimulated channels of M=116 (with n=1). As can be seen in Fig. 11b, this power-of-2 assumption produces a curve matching a Minkowski exponent between 3 and 4.
Another commonly reported value of the d′ exponent is 3 (approximating reports of the Weibull β from 3.5 to 4). This slope requires a channel monitoring ratio of M=20 000, but this large increase generates a curve that is sharper than that for the exponent of 4. Beyond this range, double-precision computation was no longer capable of computing the required max distributions for the Gaussian function, but we could use the analytic approximation in the form of the Poisson distribution that we developed for this purpose (Chen & Tyler, 1999). The resulting curves for d′ exponents of 4 and 6 show that, again, there is very little change in the shape of the curve. Thus, one can conclude that there is no plausible degree of steepness of the psychometric function that will push the curve to the corner of the box if probability summation is operating. Finally, it should be noted that the effect of uncertainty in increasing the exponent is essentially equivalent to the same change in exponent from an accelerating threshold nonlinearity.

In conclusion, the analysis of the two-component summation paradigm in terms of the Minkowski summation rule (Eq. (13)) provides an adequate approximation to the 2AFC behavior in additive Gaussian noise, as long as the Minkowski exponent is not misinterpreted according to High-Threshold Theory.

Fig. 11. (a) Theoretical functions for two-component summation under various combination rules, assuming linearity of the psychometric functions. Thin curves: Minkowski summation with exponents of one, two and four and independent channels. Thick curve: max rule. (b) Thin curves show theoretical functions for two-component summation assuming accelerating psychometric functions with the levels of uncertainty required to approximate psychometric slopes of 2, 3, 4 and 6. Dashed curves show Minkowski summation with exponents of 1 (linear), 2 (squaring) and 4 for comparison. Note that even a steep slope of six departs substantially from the corner prediction for independent channels, so that one can expect to determine accurately whether two channels are fully independent or subject to some kind of combination rule.

Fig. 12. Derivation of the 2AFC probability summation prediction for two-component summation. Dash-dotted curves: max distribution for the two channels for the noise-alone interval; full curve: max distribution for the two channels for the signal+noise interval; heavy dashed curve: distribution of the differences between the intervals. (a) Both channels equally stimulated. (b) Only one channel stimulated.

5. Effects of multiplicative noise

5.1. Multiplicative noise makes the psychometric function shallower

Instead of the classical assumption of additive noise, analysis of the noise in cortical neurons suggests that it may have a multiplicative component, with σ_R in Eq. (10) increasing according to some function of the strength of the mean signal R (Tolhurst et al., 1981). In a general expression, the total noise in the signal may be expressed by a power relation:

σ_R ∝ kR^q + σ_N    (18)

The additive constant σ_N represents some irreducible level of noise that is present even when there is no signal, when the multiplicative component kR^q will fall to zero. Such additive noise is a physiological requisite because no real system is noise-free.

Other than scaling the noise according to Eq. (18), the multiplicative analysis employs the identical analytic structure of Signal Detection Theory developed in Section 3, but the results are very different. The presence of multiplicative noise radically alters the expected shape of the psychometric function derived by inserting Eq. (18) into Eq. (15), in both the absence and the presence of channel uncertainty. Even in the absence of uncertainty, the log-log steepness of the psychometric function changes according to the rate of increase of noise with stimulus intensity. If the exponent q of this rate of increase is 0.5, as in Poisson noise (which governs the quantal fluctuations of light, for example), the psychometric slope for a single channel goes to about 0.5 (Fig. 13a, bold curve), a striking deviation from what is seen empirically in psychophysical measurements. (Eq. (18) assumes that the noise distribution is Gaussian rather than strictly Poisson, a good approximation for high mean levels of quantal events.) Note that a slope of q=0.5 represents a tremendously shallow increase of d′ with stimulus strength, implying that the measurable range of the psychometric function extends over as much as two log units, the entire visible contrast range.

5.2. Dramatic probability summation with multiplicative noise

If we evaluate the effects on sensitivity of taking the max of n equally-sensitive channels in the presence of root-multiplicative Gaussian noise, the results are also profound (Fig. 13b). Summation over the first ten channels actually exceeds the amount expected for linear summation in additive noise (see tabular values specified for this condition in Table 1). This result seems counterintuitive, but it arises because any decrease in the signal provides a concomitant decrease in the accompanying noise level. Since the effect of probability summation is to allow a small decrease in the signal to begin with, the multiplicative reduction in the noise provides a further enhancement of the signal/noise ratio, resulting in the net improvement depicted in Fig. 13b.
This dramatic degree of probability summation under root-multiplicative noise conditions has powerful implications for neural processing, which seems to be governed by this type of principle throughout the cortex (Tolhurst, Movshon, & Thompson, 1981; Tolhurst, Movshon, & Dean, 1983; Vogels, Spileers, & Orban, 1989). Shadlen and Newsome (1998) point out that the multiplicative behavior makes the signals at individual neurons so noisy that they cannot account for the discriminative behavior of the animal as a whole, even if the neuron's response is optimal for the local stimulation employed. They estimate that the activity must be integrated over 50–100 neurons to account for the observed behavior, implying that the signal/noise ratio of the optimal neuron is about a log unit below the required level.
However, the plot in Fig. 13b implies that a different strategy is available under root-multiplicative noise conditions. Instead of integrating the activity of 100 neurons, and losing the potential specificity available from the elements of that assemblage, the cortex could monitor the activity of just ten relevant neurons. Taking the max of the ten responses gives the required boost of a factor of 10 in signal/noise ratio, equivalent to summing over 100 neurons. Thus, a much smaller pool is required for the same gain in detectability, if the brain is capable of implementing a max rule. Such implementation seems plausible because it is the core operation of an attentional process, for which there is much behavioral and increasing neurophysiological evidence. In fact, a simple neural threshold has the effect of implementing a max rule in a psychophysical task where the stimulus is reduced until the last response of the most sensitive neuron carries it. (The detailed effects of a hard threshold on 2AFC performance, which are beyond the scope of the present treatment, are discussed in Kontsevich & Tyler, 1999c.)
5.3. Disambiguating multiplicative noise and uncertainty
Inclusion of channel uncertainty in the case where the noise is somewhat multiplicative has a similar effect to the case of additive noise (see Fig. 9a), except that the entire fan of uncertainty functions is rotated to become shallower. Fig. 14 plots sample psychometric functions for square-root-multiplicative noise (the case of q=0.5 in Eq. (18)). At first sight, it might seem that this result implies that the empirical effects of the noise multiplier and uncertainty would be hard to disentangle. However, notice that the steepening effects of uncertainty in Fig. 14 are much reduced at high levels of d′. Thus, the steepness at high d′ (say, above the level of d′=2) is highly diagnostic of the degree to which the noise is multiplicative. If the fitted slope in this region of the unmasked psychometric function is 1 or above, as in Fig. 9a, the implication is that the major component of the noise is additive. A high-d′ slope significantly less than 1 (Fig. 14), on the other hand, is strong evidence for multiplicative noise operating in the near-threshold region.

There is a curious crossover in the functions in Fig. 14 at low d′, where the curve for no uncertainty actually shows a slightly higher threshold than the curves for low uncertainty. This result may seem counterintuitive, but it arises from the necessary assumption that there is an additive component to the noise (Eq. (18)), which tends to reduce sensitivity as the multiplicative component approaches the level of the additive component at low contrasts. As uncertainty increases, it de-emphasizes the role of the additive noise, effectively increasing the sensitivity. Without this additive component, the log slope fitted to the psychometric function would be at 0.5 even with no uncertainty. However, there must always be a noise component that is additive with respect to contrast due to the existence of quantal noise in the stimulus and thermal noise in the receptors. Because these curves are governed by two free parameters, the particular summation function at some level on the curves is not of canonical interest, and summation functions are therefore not plotted for this case. If such multiplicative noise is implicated in detection behavior, the role of additive and multiplicative noise components must be estimated by measurement of full psychometric functions.

Fig. 13. Effects on probability summation of assuming square-root-multiplicative noise according to Eq. (18) with q=0.5. (a) Shallower d′ functions. (b) Dramatically enhanced summation behavior for threshold stimulation at the criterion of d′=1, which is even supralinear for small numbers (upper and lower criterion levels are omitted for clarity).

Fig. 14. Effects of uncertainty on the steepness of the d′ function with the root-multiplicative noise assumption, with various values of M. Unlabeled curves have monitoring ratios in factors of ten from ten to 100 000. Note that a high level of uncertainty is required before the fitted slope approaches 1.

5.4. Fully multiplicative noise introduces psychometric saturation

A more extreme form of multiplicative noise is the case where the noise σ_R is directly proportional to the stimulus strength (q=1 in Eq. (18)). Direct proportionality is not implausible, as such a form occurs in the case of noise due to eye movement fluctuations over the image being viewed. Whatever the distribution of eye movements, they will introduce some level of fluctuation in the response of a local filter viewing any kind of contrast stimulus. The resulting fluctuation is a form of noise that is necessarily in direct proportion to the stimulus contrast (assuming that the eye movements are independent of contrast). This property of direct proportionality may be shown analytically in terms of the temporal waveform of the signal fluctuation of the output of each linear filter I_ki(x,y) responding to some stimulus S(x,y), such as a sinusoidal grating, projected on to the moving retina:

r_i(t) = I_ki(x,y) ⊗ s·S(x−Δx(t), y−Δy(t))
       = s·[I_ki(x,y) ⊗ S(x−Δx(t), y−Δy(t))]    (19)

where ⊗ is the convolution operator, Δx(t), Δy(t) is the retinal shift over time and s is the scaling constant of stimulus strength.

Thus, for a given filter and eye-movement sequence, the filter output r_i(t) is directly proportional to the contrast of the stimulus, because convolution is a linear operation. We may treat the response of the filter derived from such eye movements as a noise source by determining its standard deviation σ_E computed over some temporal window t1:t2 according to

σ_E = [∫t1^t2 (r_i(t) − ∫t1^t2 r_i(t)dt)² dt]^0.5 ∝ r_i(t) ∝ s    (20)

The standard deviation of this source of noise is thus directly proportional to the contrast of the background display. Such proportional noise will tend to overtake other sources of noise that do not increase so rapidly with contrast, and will therefore tend to dominate at high contrast. We are not aware of any previous consideration of such noise.

The effect of proportional multiplicative noise (q=1 in Eq. (18)) on the form of the psychometric functions is shown in Fig. 15. The fitted slopes become even shallower than in the case of square-root noise (Fig. 14) when the signal rises out of the additive noise regime, where the slopes approximate unity. As stimulus strength increases, the effect is to make the functions asymptote to a constant d′ level, with no further improvement in sensitivity at high stimulus strengths. This horizontal asymptote thus becomes a conspicuous signature of the presence of full multiplicative noise. Such behavior has rarely been seen in psychometric functions for contrast detection (e.g. it is not evident in the high-contrast study of Foley & Legge, 1981), suggesting that this type of multiplicative noise is not a usual feature of contrast detection tasks. However, it is not clear that previous workers have designed their studies for careful evaluation of this high-d′ region of the psychometric function, so there is room for further evaluation of particular situations of interest before the case may be considered to be settled. For example, although such noise might not be expected in a simple detection task, it might plausibly be found in a difficult discrimination task where the contrast threshold is being measured as a function of a slight spatial difference between two stimuli, with long-duration presentations allowing eye-movement-generated noise in the pedestal stimulus to become a significant factor limiting discrimination performance.

For summation over increasing numbers of mechanisms, the curves of Fig. 15 show that the additive noise regime (approximating a slope of 1) tends to dominate the measurable (and computable) range of d′ functions. As a result, derivation of a summation curve for this case is relatively meaningless because its form would depend on the exact ratio of additive to multiplicative noise assumed. When in the domain dominated by the fully multiplicative noise (horizontal leg of the curves), summation is indeterminate because reduction of the signal would be accompanied by a proportionate reduction of the noise, and signal/noise ratio (discriminability) would be maintained at a constant level. In the full-multiplicative noise regime, therefore, discriminability is insensitive to the signal level, and threshold cannot be determined. Only when the signal level is finally reduced into the domain dominated by additive noise (the left-hand region of Fig. 15) would summation revert toward the form depicted in Fig. 8b.
6. Conclusion

Psychophysical measures of summation are widely used as indexes of underlying integrative mechanisms in visual processing. The preceding analysis provides a rigorous approach to the universe of such mechanisms, detailing the properties of physiological summation, ideal observer summation and attentional summation in the 2AFC detection paradigm, for situations of both additive and multiplicative noise limiting the detection task. The key difference between these three types of summation is the type of attention process accessing an array of filters. If attention accesses a single filter, the physiological summation within that filter predominates; if attention can switch among filters matching each stimulus, ideal observer behavior occurs; if attention can access the max response of an array of filters, the result has been described as probability summation, but we fav