
Attending to moving objects

Alex O. Holcombe

Updated on 2022-03-30


Contents

Preface

Objects that move!

Bottlenecks, resources, and capacity

The biggest myth of object tracking

Which aspect(s) of tracking determine performance?

Spatial interference

Unitary cognition (C = 1 processes)

Objects and attentional spread

Grouping

Two brains or one?

Knowing where but not what

Abilities and individual differences

Towards the real world

Progress and recommendations


Preface

Cite this as:

Holcombe, A.O. (to appear). Attending to moving objects. Cambridge University Press.

This book, posted with chapter hyperlinks, cross-references, and embedded movies at tracking.whatanimalssee.com, reviews some of what we know about multiple object tracking by humans. It is expected to be published by Cambridge University Press in their Cambridge Elements series.

This is a highly abridged version of an earlier draft, which greatly exceeded the publisher’s word limit. I am re-purposing the deleted chapters on temporal limits and on serial versus parallel processing into a separate manuscript. Let me know if you’re interested in seeing that material.

Contact me (he/him) with any comments via twitter or email - [email protected]

I thank Hrag Pailian and Lorella Battelli for helpful comments, and also thank Hrag Pailian for providing high-resolution figures of his work.

© Alex O. Holcombe 2022


Objects that move!

Attention was one of the earliest topics of scientific psychology. By “scientific psychology”, I mean the tradition of laboratory studies that arguably began in the late 19th century. In 1890, William James famously provided a definition of attention that began with “It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought.” At this time, plenty of experiments were being done, using custom apparatus such as devices that measured response time and presented auditory and visual stimuli.

Figure 1: A ‘complication apparatus’ from the Harvard laboratory of Hugo Münsterberg. This instrument was used to measure the effect of attention to one stimulus on responses to another. A subject who focused on one of the numbers on the large dial would have a delayed reaction to the sound of the bell, and vice versa.

Hugo Münsterberg, one of the first with a laboratory that studied attention, was highly interested in attention and moving stimuli. His 1916 book The Photoplay: A Psychological Study described his theory of the cinema, and included a twenty-page chapter on attention.


When psychology and the study of attention grew rapidly following World War II, the study of visual attention was dominated by static stimuli, often stimuli presented briefly with a tachistoscope. Very few researchers used moving objects, and this continued until the 1980s. This was, in part, a technology issue. Scientific laboratories, or psychology laboratories at least, lagged behind the technology being introduced to arcades and even homes. In 1979, Atari, maker of the first popular home game console, introduced the game Asteroids.

Figure 2: Asteroids was released by Atari in 1979.

Playing the game meant shooting and dodging asteroids that came from all directions. Avoiding a collision seems to require monitoring more than one asteroid at a time. Nothing like this would be studied scientifically, however, until much later.

Psychologists were slow to take up the programming of computers to present stimuli, and even slower to use them to create moving stimuli. In the 1970s, Zenon Pylyshyn was pondering the possibility of a primitive visual mechanism capable of “indexing and tracking features or feature-clusters” (as he put it in Pylyshyn and Storm (1988); I haven’t been able to get copies of the 1970s reports that he mentions) as they moved. By the mid 1980s he was programming an Apple II+ computer, when it also became an exciting addition to my own household. I rapidly accumulated a library of games on floppy disks.


Pylyshyn and Storm (1988) announced that they had programmed their Apple II+ to create a display with ten identical objects moving on random trajectories, connected to a telegraph key with a timer to record response times. They also used an early eyetracker, and a movement of the eyes away from fixation triggered termination of a trial. Thus they were able to investigate the ability to covertly, without eye movements, keep track of moving objects.

In the associated experiments, up to five of the ten moving objects were designated as targets by flashing at the beginning of the trial. The targets then became identical to the remaining moving objects, the distractors, and moved about randomly. It is immediately apparent that one can do this. While viewing the display, people report having the experience of being aware, seemingly continually, of which objects are the targets and how they are moving about. In the movie embedded below, one is first asked to track a single target to become familiar with the task, and then four targets are indicated, at different speeds.

Figure 3: A demonstration of the multiple object tracking (MOT) task, created by Jiri Lukavsky.

Aside from the important demonstration that people could do the basic task, the main result of Pylyshyn and Storm (1988) was that the processes that underlie tracking are limited in how many targets they can faithfully track.


In their experiments, Pylyshyn and Storm (1988) periodically flashed one of the moving objects, and if that object was a target, the participant was to press the telegraph key. On trials with more targets, errors were much more common, from just 2% of target flashes missed when only one of the ten objects was a target, to 14% missed when five of the objects were targets.

The notion of keeping track of moving objects is familiar to many of us from everyday life. If you’ve ever been responsible for more than one child while at the beach or at the park, you know the feeling of continuously monitoring the locations of multiple moving objects. If you’ve ever played a team sport, you may know the feeling of monitoring the positions of multiple opponents at the same time, perhaps including a player with the ball and a player they might pass the ball to. If you’ve ever been to a scientific conference, you may know the feeling of monitoring the position and posture of one researcher relative to a few others they are chatting with, in order to best time your approach.


In other books or chapters about visual attention, you won’t find much about object tracking. The study of visual attention is still dominated by stimuli that don’t move. Most cognitive psychology researchers have, however, heard a few things about object tracking. But much of what they probably heard is wrong. In this book, we’ll bust a few myths about tracking and see that studying it has yielded some unique insights about our limited capacities.

This book is far from a comprehensive review. I originally wrote a much longer book, but had to delete entire chapters to keep this below the word limit. In a later chapter I list some of what’s left out and where to learn more.


Bottlenecks, resources, and capacity

Quickly, what is fourteen times eleven? Calculating that in your head would probably take you at least a few seconds. And if I set you two problems rather than just one, for example by asking you to also divide sixty-eight by seventeen, I’m sure you would do those problems one at a time. Indeed, our minds may be completely incapable of doing two such problems simultaneously (Oberauer, 2002; Zylberberg et al., 2010).

Such limitations are remarkable given that each of our brains contains more than 80 billion neurons. The problem is not a lack of neurons, really, but how they are arranged - our mental architecture.

Multiplying and dividing two-digit numbers feels like a rather abstract problem, and not something we do each day. So you might think that if we were doing lots of such problems each day, we could do more than one at a time. This is probably wrong - consider that a task we do have daily practice with is reading. Yet despite years of reading dozens if not hundreds of words a day, a large body of evidence suggests that humans can read at most only a few words at a time, and much research further indicates that we can really only read one word at a time (White et al., 2018; Reichle et al., 2009). It seems, then, that at least some of the bottlenecks of human information processing are a fixed property of our processing architecture.

To flesh out the way I’m using the word ‘bottleneck’, consider a standard soft drink bottle. Most of the volume of the soft drink is in the bottom, wider part of the bottle. If you invert the bottle, most of its liquid contents will press down on the neck. The narrowness of the neck restricts the rate at which the contents can exit the bottle. Similarly, a large volume of signals from sensory cortex ascending the cortical hierarchy would press up against higher areas that are more limited in capacity. The large volume of visual signals is partly a result of the retinotopic organization of the retina and sensory cortex - each has multiple neurons dedicated to each bit of the visual field, all working in parallel.

The parallel processing happening in sensory cortices gets a number of tasks done, so that higher stages can save the trouble. These tasks include the encoding of motion direction throughout the visual field, as well as color and orientation. A local and regional differencing operation happens for these features, resulting in salience, such that odd features become conspicuous in our visual awareness. In the display below, for example, you should be able to find the blue objects very quickly.

Figure 4: Thanks to featural attention, you should be able to find the blue circles very quickly.

Orienting mechanisms direct attention to colors or orientations that differ from their surroundings. This is only possible because massively-parallel retinotopic brain areas have orientation- and color-tuned neurons. For other judgments, higher, post-bottleneck brain areas that are very limited in capacity are critical. The visual word form area in the occipitotemporal sulcus of the left hemisphere, needed to recognize words, is one example (White et al., 2019). Being limited in processing capacity to just one stimulus, the word-recognizing process will not recognize a word in a crowded scene until something directs the visual signals from that word to it. We often use the term selective attention to refer to this “something” that directs particular visual signals through the bottleneck to limited-capacity processes. If there were no bottlenecks, there would be no need for selection, until the point of action (Allport, 1987).

So far the picture I have painted is one of entirely feed-forward processing, with a torrent of visual signals impinging on a narrow bottleneck of signals that continue onward. But cortical processing is rarely a one-way street, and the way visual attention works is no exception. Visual attention seems to work in large part by biasing processing within visual cortices, rather than leaving that unchanged and blocking all but a few signals at a later bottleneck stage. It may be the limits on the control signals from high-level (possibly parietal) cortex that restrict processing capacity as much as does a structural convergence of visual signals as they penetrate the brain.

A resource metaphor is more apt if limited capacity is a result of a finite pool of neural resources in parietal cortex that biases which visual signals are cognitively processed. I suspect that neither way of thinking is entirely correct. Restricted neural populations for controlling visual processing may be a limiting factor, but so may be neural convergence, amounting to a bottleneck-like aspect of neural architecture. Thus I will sometimes use the term “limited resource” when referring to how we are restricted in how many visual representations are processed.

The word “resource” has the useful connotation that people can choose how to apply their finite processing capacity, as a resource is something that is available and can be used in different ways. For example, it suggests that one can use three-quarters of one’s processing capacity for one target while using the other quarter for a second target. As mentioned previously, there is evidence that people can favor one target over another when tracking both (Chen et al., 2013; Crowe et al., 2019).

While word recognition seems to be able to process only one stimulus at a time, other visual judgments may be limited in capacity relative to sensory processing, but have a capacity greater than one. Object tracking seems to be one such ability. Certainly by the coarse performance measures typically used in the literature, the processing involved in tracking multiple objects seems to be simultaneous, although researchers haven’t fully ruled out the possibility that tracking multiple objects happens via a one-by-one process that rapidly switches among the tracked objects.

The existence of very limited-capacity processes illuminates why object tracking is important. By having the location of moving objects of interest in mind, we are able to rapidly deploy all our attentional resources to any one of the objects. This allows more limited-capacity processes to deliver rapid results.


The biggest myth of object tracking

The issue of how many objects we can track is an important one for understanding tracking’s role in natural behavior as well as the performance of laboratory tasks. Unfortunately, some major misconceptions about this have become widespread. As examples, when writing about the “object tracking system”, Piazza (2010) stated that “One of the defining properties of this system is that it is limited in capacity to three to four individuals at a time”, and Fougnie and Marois (2006) wrote that “People’s ability to attentively track a number of randomly moving objects among like distractors is limited to four or five items”. This is a myth, one that is perpetuated by more recent papers with statements like “participants can track about four objects simultaneously” (Van der Burg et al., 2019).

Two claims are often behind these statements, as is evident in the more explicit statements of Doran and Hoffman (2010), who wrote that “the main finding” from the object tracking literature “is that observers can accurately track approximately four objects and that once this limit is exceeded, accuracy declines precipitously.” The first claim is that there is a constant limit of around four objects, and the second is that accuracy declines precipitously when the limit is reached. The vaguer statements in other papers, such as “researchers have consistently found that approximately 4 objects can be tracked” (Alvarez and Franconeri, 2007) and “people typically can track four or five items” (Chesney and Haladjian, 2011), tend to support both beliefs in the mind of an unsuspecting reader. For each of the quotations in this paragraph and the preceding paragraph, I have checked the evidence provided and the papers cited, as well as the papers those cited papers cite. None of these papers provides evidence supporting the claim that performance decreases very rapidly once the number of targets is increased above some value. A gradual decrease in performance is seen as the number of targets is increased, with no discontinuity; not even a conspicuous inflection point. For example, Oksama and Hyönä (2004), one of the papers sometimes cited in this context, tested from two targets to six targets. After the five-second phase of random motion of the objects, one object was flashed repeatedly and participants hit a key to indicate whether they thought it was one of the targets. The proportion of trials in which participants were wrong increased steadily with the number of targets, from 3% incorrect with two targets to 16% incorrect with six targets.

Pylyshyn and Storm (1988) may be the paper most frequently invoked when a limit of four objects is claimed. But Pylyshyn and Storm (1988) found a quite gradual decrease in performance (their Figure 1) as the number of targets was increased from one to five, and five was the most targets that they tested. And nowhere did Pylyshyn and Storm (1988) state that there is a value beyond which performance declines rapidly. In 1994, early in a paper, Pylyshyn et al. (1994) wrote that people can track “at least four” objects, but later in the paper they made a statement that could be interpreted as suggesting a limit of four objects, writing that it is “possible to track about four randomly moving objects” (Pylyshyn et al., 1994). I suspect that this sort of slide toward backing a hard limit is caused in part by the desire for a simple story. It may also stem from an unconscious oversimplification of one’s own data, as well as the theoretical commitment of Pylyshyn to the idea that tracking is limited by a set of discrete mental pointers.

Recall that this myth is associated with two claims: that there is a “limit” after which performance decreases rapidly, and that this limit is consistently found to be about four. Because the published evidence indicates that the first claim is completely false, let’s put that aside and consider a soft version of the second claim, one that seems plausible. Specifically, is it the case that tracking performance falls to some particular level, such as 75% correct, at about four targets, even if this does not mark a hard limit or inflection point? Instead of a performance criterion like 75%, one might alternatively use a criterion like the halfway point from ceiling to chance (Holcombe and Chen, 2013), or the “effective number of items tracked”, calculated by applying a formula to percent correct together with the number of targets and distractors (Scholl et al., 2001). Charitably, this may be what Alvarez and Franconeri (2007) meant when they wrote: “researchers have consistently found that approximately 4 objects can be tracked (Intriligator & Cavanagh, 2001; Pylyshyn & Storm, 1988; Yantis, 1992)”, and the early studies they cited are arguably compatible with this statement. However, work published over the last fifteen years has revealed this to be an artifact of researchers using similar display and task characteristics. That is, findings that approximately four objects can be tracked based on some performance criterion are arguably just an accident of researchers using similar display parameters. One of the most salient of these parameters is object speed.
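To make the idea of an “effective number of items tracked” concrete, here is a minimal sketch of one way such a formula can be set up. It assumes a simple high-threshold guessing model for a full-report task (my illustration, not necessarily the exact formula of Scholl et al., 2001): some number m of the n targets are tracked perfectly, and the remaining responses are guesses among the other objects in a display of N objects. The function names and numbers are hypothetical.

```python
def predicted_accuracy(m: float, n: int, N: int) -> float:
    """Expected proportion of targets reported correctly if m of the n targets
    are tracked perfectly and the remaining n - m responses are random guesses
    among the N - m not-yet-chosen objects (hypergeometric mean)."""
    guessed_hits = (n - m) * (n - m) / (N - m)
    return (m + guessed_hits) / n


def effective_number_tracked(p: float, n: int, N: int, tol: float = 1e-6) -> float:
    """Invert predicted_accuracy by bisection to estimate m from observed accuracy p."""
    lo, hi = 0.0, float(n)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predicted_accuracy(mid, n, N) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example: 4 targets among 8 objects, 80% of targets reported correctly.
print(round(effective_number_tracked(0.80, n=4, N=8), 2))  # ~3.0
```

Under this model, 80% correct with four targets among eight objects corresponds to roughly three effectively tracked items; the point of this chapter is that such estimates shift with speed and spacing, not that the model itself is right.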

The enormous influence of object speed in conventional tracking displays was made clear by Alvarez and Franconeri (2007), who tested participants with a display of sixteen wandering discs. The fewer the number of discs designated as targets, the faster the participants could track each target. In other words, when there were only a few targets, participants could track them even when they moved at high speeds. At very high speeds, participants could track only one object with reasonable accuracy. This indicated that the truth of the idea that participants can track four objects is entirely dependent on the speed of those objects. Additional evidence for this was found by my lab, among others (Holcombe and Chen, 2012; Feria, 2013). Soon, a few other display parameters that strongly affect the number of objects that can be tracked were discovered, in particular object spacing (Franconeri et al., 2008; Holcombe et al., 2014).

In summary, it is incorrect to say that people can track about four moving objects, or even that once some number of targets is reached, performance declines very rapidly with additional targets. The number that can be tracked is quite specific to the display arrangement, object spacing, and object speeds. If a researcher is tempted to write that “people can track about four objects”, they should stipulate that this refers to certain tasks, display characteristics, and performance measures.

This same issue of how to characterize a human cognitive limit has long bedeviled the study of short-term memory, a literature in which the most famous paper is arguably “The magical number seven, plus or minus two: Some limits on our capacity for processing information” (Miller, 1956). At an international workshop in 2013, two dozen working memory researchers met to develop “benchmarks for models of working memory”. One of the issues they grappled with was how to characterize the limit on how many items people can remember. In the paper that came out of the workshop, the researchers pointed out that “observed item limits vary substantially between materials and testing procedures” (Oberauer et al., 2018). They suggested, however, that much of this variability could be explained by humans’ ability to store groups of items as “chunks”, and thus the group endorsed a statement that there is a limit of “three to four chunks” (Cowan, 2001). In the case of short-term memory, then, the observed variability in experiments’ results can potentially be explained by a common underlying limit of three to four chunks that manifests as different observed item limits depending on circumstances (in particular, the opportunity for chunking).

In the case of MOT, it is possible that researchers will be able to identify a set of circumstances that consistently yield a mean tracking limit, in the modal human, of three or four targets, if “limit” is defined as performance falling to a particular level on some performance metric. Probably these circumstances will simply be certain spacings, speeds, object trajectories, and numbers of objects in a display. It would be nice if some underlying construct, the counterpart of memory’s “chunks”, could be identified to explain how performance changes when other circumstances are used. That would constitute real progress in theory development. However, I don’t see anything like that in the literature currently.


Different tasks, same limit?

Even with the idea that there is a particular number of objects one can track discarded, there remains a related claim, one common in the literature, that might remain viable. This claim is frequently tangled up in the myth reviewed above; for example, it may be stated as the idea that there is a “magical number four”. After discarding the attachment to the idea of a number that does not vary with circumstances, the remaining notion is that very different tasks have the same number-of-objects limit when tested in comparable circumstances. For example, Bettencourt et al. (2011) stated that visual short-term memory and MOT show “an equivalent four-object limit”, and Piazza (2010) similarly claimed that visuo-spatial short-term memory, ultra-rapid counting (subitizing), and multiple object tracking share a common limit of “three or four items”.

Perhaps the leading candidate for a hard limit in cognitive psychology is the “subitizing” or near-exact numerosity perception limit mentioned in the previous chapter. Many researchers consider the idea of a sudden decrease in accuracy when the number of objects shown goes from less than four to more than four to be well-supported (Revkin et al., 2008). Four objects and fewer is frequently referred to as the “subitizing range”, with performance approximately as good for rapidly counting four objects as it is for two or one. Note that this is very different from tracking, for which speed thresholds decline rapidly from one to two targets, as well as to three and four. For visual working memory, which as we mentioned several experts have characterized as being limited to three or four chunks, whether there is a discontinuity after four objects or at any point remains highly debated (e.g. Robinson et al., 2020).

Overall then, it is doubtful that tasks such as object tracking, visual working memory, and subitizing can be said to have a common limit. Ideally someone would assess this by measuring the limits for all three tasks using the same stimuli, but it is unclear how to equate the information available across tasks. Especially difficult is comparing performance with the briefly-presented static stimuli used in subitizing and working memory tasks to the extended exposures of moving stimuli needed to assess object tracking. A stronger understanding of the processes mediating tracking would be required to model performance of the two tasks using a common framework via which they could be compared at an underlying psychological construct level. However, another approach is to measure the tasks of interest in large numbers of individuals and see whether the different task limits strongly co-vary between individuals. The findings from this approach are reviewed in a later section.

To summarize this chapter, there are three common misconceptions about what has been shown about object tracking. The one that we have just discussed is that different tasks show the same limit. This may actually be true, but I know of no research that has used the same display and task settings to adequately back it up. In the previous section of this chapter, we saw that it is not justified to say that as the number of targets is increased to about four, performance falls to a certain criterion level. One must specify particular display and task characteristics in order to make that statement true. A final misconception is that performance falls very rapidly when one increases the number of targets past a particular number (the “limit”).

Given that tracking performance does depend greatly on circumstances and falls gradually rather than displaying a discontinuity at a particular target number, what are the implications for how tracking works?


Which aspect(s) of tracking determine performance?

While Pylyshyn theorized that tracking is accomplished by a fixed and discrete set of pointers, the dependence of the number of objects one can track on display characteristics hints that the underlying process may be continuous and flexible rather than determined by a fixed, discrete set. Based on the flexible resource metaphor, a person might be able to apply more resource to particular targets to reduce the deleterious influence of object speed or another factor. There is good evidence for this (e.g. Chen et al., 2013), which will be discussed later, but here I would like to explain the resource concept more, and make an important distinction.

To understand why we can track several objects in some circumstances, but only a few in others, we must distinguish between display factors that impose data limitations on tracking, and display factors that impose resource limitations.

The “data” in data limitation refers to sensory data (Norman and Bobrow, 1975). If a target moving on an unpredictable trajectory moves outside the edge of our visual field, it is the absence of data that prevents it from being tracked. No amount of mental resources can overcome having no data, at least for an unpredictable stimulus. Data limitations may also occur when sensory signals are impoverished, even if not entirely absent. For example, it is a data limitation that prevents tracking when an object travels at such a fast rate that our neurons hardly register it. So, by data here we are referring not to the raw activation pattern of our photoreceptors, but rather to the visual signals available after additional preattentive sensory processing.

People with poor visual acuity may perform less well on a task than people with high visual acuity, due to differences in the sensory data that they have to work with. Thus, some individual differences are almost certainly due to data limitations rather than variation in tracking processes between people. When performance is data-limited, bringing more mental resources to bear provides little to no benefit. The most popular way of investigating this is by varying the number of stimuli one needs to process, as in visual search studies. If the number of stimuli one must evaluate does not affect how well a person can perform a task, this suggests that the task is data-limited rather than resource-limited, because performance is evidently the same regardless of what proportion of the putative resource can be devoted to each stimulus.

Resource-limited processing is typically more interesting for those interested in visual attention and the capacity limits on mental processing. If response time or error rate increases with the number of distractors presented in a visual search task, that is classically interpreted as meaning that a resource-limited process is required for success at the task. However, science is hard - an elevation in, say, error rate can also occur even if there is no resource limitation, if each additional distractor has a non-zero probability of being mistaken for the target, yielding more errors with more distractors even if the probability of successfully evaluating each individual stimulus remains unchanged (Palmer, 1995).
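A back-of-the-envelope calculation illustrates this logic (my own sketch with made-up numbers, not Palmer’s actual model): if each distractor is independently mistaken for the target with probability q, the error rate rises with the number of distractors even though each item is processed just as well as before.

```python
# Toy illustration of the unlimited-capacity point attributed to Palmer (1995):
# each distractor is independently mistaken for the target with probability q,
# so errors grow with set size even with no resource limit.
def error_rate(q: float, n_distractors: int) -> float:
    return 1 - (1 - q) ** n_distractors

print(round(error_rate(0.02, 3), 3))   # ~0.059 with 3 distractors
print(round(error_rate(0.02, 11), 3))  # ~0.199 with 11 distractors
```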

The particular number of objects that can be tracked with reasonable accuracy is thus highly dependent on stimulus conditions, and some of these conditions may reflect data limitations rather than a resource limitation. Still, even in ideal conditions it seems clear that the number of objects that can be tracked is much less than the number of objects that are simultaneously processed by early visual areas. In other words, there is some sort of resource limitation.

We’d like to know what factors consume the resource. I’ll also be using the term “resource-intensive” - if a deleterious stimulus factor is resource-intensive, that means that increasing the amount of resource devoted to the stimulus can compensate for that stimulus factor. One example is the speed of the moving targets. An increase in targets’ speed can hinder performance, but reducing the number of targets, which provides more resource to the remaining targets, can make up for that. Moreover, if one object moves faster than another, it consumes more resource. The evidence for that is that the addition of a fast-moving target hurts tracking performance for a first target more than does the addition of a slow-moving target.

Speed, then, appears to be resource-intensive. Speed can also result in a data limitation, at very high speeds, but long before such speeds are reached, speed is resource-intensive.

One should not assume, however, that when one manipulates something about a display, that something is the only thing that changes. Increasing the speed of the objects in a display can also result in more close encounters between targets and distractors, unless one shortens the duration of the trial to equate the total distance the objects travel. Thus, it could be that dealing with spatial interference is what consumes resource, rather than speed. This brings us to the next chapter, which is all about spatial interference.


Spatial interference

Details of the world that are much smaller than ourselves, like the fibers that make up paper, or the individual ink blotches laid down by a printer, are inaccessible to the naked eye. This is a familiar limit on our visual abilities, one that is measured when we go to the optometrist. Line segments or objects that are too close together are experienced as a single unit.

Even when two objects are spaced far enough apart that we perceive them as two objects rather than one, they are not processed entirely separately by the brain. Receptive fields grow larger and larger as visual signals ascend the visual hierarchy, and this can result in a degraded representation for objects that are near each other. This is an example of spatial interference.

A large body of psychophysical work has investigated how spatial interference impairs object perception at high display densities (e.g., Wolford, 1975; Korte, 1923; Strasburger, 2014).

In the display shown in Figure 5, if you gaze at the central dot, you likely will be able to perceive the middle letter to the left fairly easily as a ‘J’. However, if while keeping your eyes fixed on the central dot you instead try to perceive the central letter to the right, the task is much more difficult. This spatial interference phenomenon is called “crowding” in the perception literature.

Most studies of crowding ask participants to identify a letter or other stationary target when flanking stimuli are placed at various separations from the target. The separation needed to avoid crowding varies depending on the display’s spatial arrangement, but on average is about half the eccentricity of the target; the interference diminishes rapidly as separation increases beyond that (Bouma, 1970; Gurnsey et al., 2011). In Figure 5, for example, the letters on the same side as the ‘J’ are separated from it by more than half the ‘J’s distance from the fixation point, so they have little to no effect on its identification. Setting the targets and distractors in motion has little effect (Bex et al., 2003).

When flankers are presented close to a target, they not only prevent identification of the target, they can also prevent the target from being individually selected by attention, and apparently this holds for multiple object tracking as well (Intriligator and Cavanagh, 2001). When a target and distractor are too close to be distinguished by tracking processes, the tracking process may end up tracking a distractor rather than the target.

Figure 5: When one gazes at the central dot, the central letter to the left is not crowded, but the central letter to the right is.

Crowding happens frequently in the displays typically used by MOT researchers, in that in most experiments, objects are not prevented from entering the targets’ crowding zones (which, as mentioned above, extend to about half the stimulus’ eccentricity). It is not surprising, then, that in typical MOT displays, greater proximity of targets and distractors is associated with poorer performance (Shim et al., 2008; Tombu and Seiffert, 2008).
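For intuition, here is a sketch of how one might flag crowded targets in such a display, using the rough rule of thumb mentioned above that the crowding zone extends to about half the target’s eccentricity; the function and parameter names are my own illustrative choices, not anything standard in the tracking literature.

```python
import math

# Illustrative check of whether a distractor sits inside a target's crowding zone,
# using the rough rule that the zone extends to about half the target's eccentricity.
# Coordinates are in degrees of visual angle, with fixation at the origin.
def in_crowding_zone(target_xy: tuple[float, float],
                     distractor_xy: tuple[float, float],
                     bouma_fraction: float = 0.5) -> bool:
    eccentricity = math.hypot(*target_xy)
    critical_spacing = bouma_fraction * eccentricity
    separation = math.dist(target_xy, distractor_xy)
    return separation < critical_spacing

# A target 8 deg right of fixation: distractors nearer than ~4 deg crowd it.
print(in_crowding_zone((8.0, 0.0), (5.5, 0.0)))  # True  (2.5 deg apart)
print(in_crowding_zone((8.0, 0.0), (3.0, 0.0)))  # False (5.0 deg apart)
```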

Spatial interference does not explain why tracking many targets is more difficult than tracking only a few

In 2008, Steven Franconeri and colleagues claimed that spatial interference is the only reason why performance is worse when more targets are to be tracked (Franconeri et al., 2008). In the previous chapter, we introduced the idea of a resource that, divided among more targets, results in worse tracking performance. Franconeri suggested that this does not exist, or at least that for tracking, the only thing that becomes depleted with more targets is the area of the visual field not undergoing inhibition - the inhibition stemming from the inhibitory surround of each tracked target (Franconeri et al., 2008, 2013a, 2010). The inhibitory surrounds of nearby targets interfere with each other. A provocative implication of their theory, they pointed out, was that “there is no limit on the number of trackers, and no limit per se on tracking capacity” (Franconeri et al. (2010), p.920); a very large number of targets could be tracked, as long as the targets could be kept far from each other. Joining Franconeri in making this claim was Zenon Pylyshyn himself, as well as other senior visual cognition researchers including James Enns, George Alvarez, and Patrick Cavanagh, my PhD advisor.

In testing this theory, Franconeri et al. (2010) unfortunately did not isolate the separation between objects from other display variables. MOT studies rarely control the retinal separation among the objects in a display, even when they require participants to fixate at the display center. Franconeri et al. (2010), for example, kept object trajectories nearly constant in their experiments but varied the total distance traveled by the objects (by varying both speed and trial length), on the basis that if close encounters were the only cause of errors, errors should be proportional to the total distance traveled. The result was that performance did decrease with distance traveled, but there was little effect of the different object speeds and trial durations that they used. This was taken as strong support for the theory that only spatial proximity mattered. As Franconeri et al. (2010) stated, their hypothesis that “barring object-spacing constraints, people could reliably track an unlimited number of objects as fast as they could track a single object” constituted a “simple and falsifiable hypothesis”.
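One way to formalize the prediction they were testing (my own sketch with made-up numbers, not Franconeri et al.’s analysis): assume a fixed probability of losing a target per degree of travel, via close encounters. Then equal distance traveled should yield equal accuracy, regardless of how speed and duration trade off against each other.

```python
# Sketch of the "only close encounters matter" prediction (my formalization,
# not Franconeri et al.'s analysis): a fixed probability of losing a target per
# degree of travel makes accuracy depend only on total distance (speed x duration).
def accuracy_if_only_encounters_matter(speed_deg_per_s: float, duration_s: float,
                                       p_loss_per_deg: float = 0.002) -> float:
    distance = speed_deg_per_s * duration_s
    return (1 - p_loss_per_deg) ** distance

print(accuracy_if_only_encounters_matter(4.0, 10.0))  # 40 deg traveled
print(accuracy_if_only_encounters_matter(8.0, 5.0))   # 40 deg traveled at twice the speed
```

That pattern - accuracy tied to distance traveled rather than to speed per se - is roughly what Franconeri et al. (2010) reported.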

To test Franconeri’s simple and falsifiable hypothesis, in 2012 my student Wei-Ying Chen and I conducted several experiments. We kept the objects in a display widely separated to avoid the spatial interference that Franconeri claimed was the only factor that could prevent people from tracking many targets. In one experiment, we used an ordinary computer screen but created a wide-field display by having participants bring their noses quite close to it. This allowed us to keep targets and distractors dozens of degrees of visual angle from each other (Holcombe and Chen, 2012). The basic display configuration is shown in Figure 6.

We found that even when all the objects in the display were extremely widely spaced, speed thresholds declined dramatically with the number of targets. To us, this appeared to falsify the theory of Franconeri et al. (2010). For the 2011 Vision Sciences Society conference where we reported these findings, we entitled our poster “The resource theory of tracking is right! - at high speeds one may only be able to track a single target (even if no crowding occurs)”. What we meant was that each target takes up some of a very limited processing capacity - a resource that was attentional in that it could be applied anywhere in the visual field, or at least anywhere within a hemifield. The amount of this resource applied to a target determines the fastest speed at which a target can be tracked.

Figure 6: In experiments by Holcombe and Chen (2012), after the targets were highlighted in white, all the discs became red and revolved about the fixation point. During this interval, each pair of discs occasionally reversed their direction. After 3–3.8 s, the discs stopped, one ring was indicated, and the participant clicked on one disc of that ring.

Unconvinced by the findings of Wei-Ying Chen and me, Franconeri et al. (who did not say why) continued to push their spatial interference theory of tracking, and moreover they took the idea of spatial interference much further, suggesting that the same basic idea could also explain capacity limits on object recognition, visual working memory, and motor control, writing that “competitive interactions are the roots of capacity limits for tasks such as object recognition and multiple object tracking” (p.2) and that capacity limits arise only because “items interact destructively when they are close enough for their activity profiles to overlap” (p.2) (Franconeri et al., 2013a). Yet to explain the Holcombe and Chen (2012) results, the spatial interference posited by Franconeri would have to extend over a very long distance, farther than anything that had been reported in behavioral studies. Furthermore, if there were such long-range spatial gradients of interference present, they should have shown up in the results of Holcombe and Chen (2012) as worse performance for the intermediate spatial separations we tested than for the largest separations.

Franconeri et al. (2013b) addressed these problems with their theory in a reply to a letter I wrote in response to Franconeri et al. (2013a). Franconeri et al. (2013b) pointed to neurophysiological recordings in the lateral intraparietal area (LIP) of rhesus macaque monkeys. A study by Falkner et al. (2010) presented monkeys with a stimulus, and the monkey was subsequently cued to execute a saccade to that stimulus by the offset of the fixation point. In some trials another stimulus was flashed 50 ms prior to the offset of the fixation point. That flashed stimulus, which was positioned in the receptive field of an LIP cell the researchers were recording from, allowed the researchers to show that the LIP cell’s response was suppressed relative to trials with no saccade target. This was true even when the saccade target was very distant - statistically significant impairment was found for separations as large as 40 deg for some cells. There was a spatial gradient to this interference, but the data suggested it could be quite shallow, allowing Franconeri et al. (2013b) to write that “levels of surround suppression are strong at both distances, and thus no difference in performance is expected” for the separations tested by Holcombe and Chen (2012). Franconeri et al. (2013b) was published as an “online comment” at Trends in Cognitive Sciences, as a reply to my letter, which the editor had also suggested I post as an online comment. Unfortunately, some time later both comments were lost by the publisher, Elsevier, when they migrated their system. In the case of my comment, I found an old draft on my computer, updated it slightly, and posted it (Holcombe, 2019).

The neural suppression documented by Falkner et al. (2010) has one property that strongly suggests it is not one of the processes that limit our ability to track multiple objects. Specifically, Falkner et al. (2010) found that nearly as often as not, the location in the visual field that yielded the most suppression was not in the same hemifield as the receptive field center. But as we will see in a later chapter, the cost of additional targets in attentional tracking is largely independent in the two hemifields. Evidently, then, LIP suppression is not what causes worse performance when there are more targets. Instead, as Falkner et al. (2010) themselves concluded, these LIP cells may help mediate a global (not hemifield-specific) salience computation for prioritizing saccade or attentional targets, wherever they are in the visual field.

In light of all the above, it seemed the evidence ruled against the idea that spatial interference was the sole reason that people perform worse with more targets. Moreover, to accommodate the results we reported in Holcombe and Chen (2012), the spatial interference account advocated by Franconeri et al. (2013b) seemed to have been watered down until it was practically indistinguishable from a conventional resource theory - if spatial interference extended over an entire visual field (or hemifield) with no detectable diminution at large separations relative to small separations, then it no longer seemed appropriate to refer to it as “spatial” interference. Instead, finite processing capacity might be both a more parsimonious and straightforward description.

Having failed to find evidence for long-range spatial interference, I decided to investigate the form of spatial interference that I was confident actually existed: short-range interference. Previous studies of tracking did not provide much evidence about how far that interference extended - either they did not control eccentricity (e.g., Feria (2013)) or they only tested a few separations (e.g., Tombu and Seiffert (2011)).

In experiments published in 2014, we assessed tracking performance for two targets using several configurations and separations between the targets’ trajectories (Holcombe et al., 2014). Performance improved with separation, but only up to a distance of about half the target’s eccentricity, similar to what was found in the literature on crowding (Strasburger, 2014). In a few of our experiments there was a trend for better performance even as separation increased beyond the crowding zone, but this effect was small and not statistically significant. These findings supported my suspicion that spatial interference is largely confined to the crowding range, and that the performance deficit when there are more targets to track is caused by a limited processing resource.

The experiments we reported in Holcombe et al. (2014) did yield one finding that surprised us. In the one-target conditions, outside the crowding range we found that performance decreased with separation from the other pair of (untracked) objects. This unexpected cost of separation was only statistically significant in one experiment, but the trend was present in all four experiments that varied separation outside the crowding range. This might potentially be explained by the configural or group-based processing documented by Bill et al. (2020) and others, as grouping of distant elements is usually weaker than for nearby elements (e.g., Kubovy et al., 1998).

The mechanisms that cause spatial interference

As explained at the beginning of this chapter, one cause of short-range spatial interference is simply a lack of spatial resolution in the processes that mediate tracking. If a process cannot distinguish between two locations, either because of a noisy representation of those locations or because the two locations are treated as one, then a target may often be lost when it comes too close to a distractor. This would be true of any imperfect mechanism, regardless of whether it is biological or man-made. The particular way that the human visual system is put together, however, may result in forms of spatial interference that do not occur in, for example, computer algorithms engineered for object tracking.

As described earlier, our visual processing architecture has a pyramid-like structure, with processing at the retina being local and massively parallel, and then gradually converging such that neurons at higher stages have receptive fields responsive to large regions. At these higher stages, processes critical to tasks like tracking or face recognition occur. Face-selective neurons, for example, are situated in temporal cortex and have very large receptive fields. For tracking, while the parietal cortex is thought to be more important than the temporal cortex, the neurons in these parietal areas again have large receptive fields.

A large receptive field is a problem when the task is to recognize an object in clutter or to track a moving target in the presence of nearby moving distractors. In the case of object recognition, without a mechanism to prevent processing of the other objects that share the receptive field, object recognition would have access to only a mishmash of the objects’ features. Indeed, this indiscriminate combining of features is thought to be one reason for perceptual illusory conjunctions of features from different objects. Similarly, for object tracking, isolating the target is necessary to keep it distinguished from the distractors.

In principle, our visual systems might include attentional processes that, when selecting a target, can completely exclude distractors’ visual signals from reaching the larger receptive fields. Actually implementing such a system using realistic biological mechanisms with our pyramid architecture, however, looks to be difficult (Tsotsos et al., 1995). Neural recordings reveal that while the signals from stimuli irrelevant to the current task are suppressed somewhat, they still have some effect on neural responses. The particular mechanism used by our visual system does appear to include active suppression of a region around a target, even if these effects are not large or spatially extensive enough to explain why we can only track a limited number of objects. The computer scientist John Tsotsos has long championed surround suppression as a practical way for high-level areas of the brain to isolate a stimulus in their large receptive fields. Such suppression likely involves descending connections from high-level areas and possibly recurrent processing (Tsotsos et al., 2008).

Note that on Tsotsos’ account, it is only targets, not distractors, that have a region of suppression surrounding them. While the attempt of Franconeri et al. (2010) to attribute all of the cost of tracking additional targets to surround suppression among targets appears to have been misguided, in Holcombe et al. (2014) we did find some tentative evidence supporting a greater range of interference in the two-target condition compared to the one-target condition. Again, the effect was small relative to the total additional-target performance cost. It appears that overlapping surround suppression associated with targets may impair tracking, but the spatial range of this does not extend much beyond the classic crowding range.

Crowding, the impairment of object identification by nearby objects, has been studied more extensively than the spatial interference associated with object tracking. As with tracking, however, the possibility of a suppression zone specific to targets remains understudied, as very few studies of crowding have varied the number of targets. I have found one study of the identification of briefly-presented stimuli which found that attending to additional gratings within the crowding range of a first grating resulted in greater impairment for identifying a letter (Mareschal et al., 2010). This is consistent with the existence of surround suppression around each target.

As many MOT experiments involve targets and distractors coming very close to each other, spatial interference likely contributes to many of the errors. This may be mediated partly by surround suppression around targets, as well as by the inherent ambiguity, during close encounters, about which object is the target and which is the distractor - an ambiguity faced by any system that has limited spatial resolution. When objects are kept widely separated, it appears that spatial interference plays little to no role in tracking. Some other factor or factors, such as some sort of attentional resource, is needed to explain the dramatic decline in tracking performance that can be found with more targets even in widely-spaced displays (Holcombe et al., 2014; Holcombe and Chen, 2012, 2013).


Unitary cognition (C = 1 processes)

Successful performance of a multiple object tracking task may be assisted by two resources. This worries me greatly.

One resource is what researchers believe they are studying. It can process multiple targets simultaneously, even if it processes them more poorly than it processes a lone target. This is what most researchers are interested in. However, the mind also has a resource that in some ways is more powerful.

The processes that support our ability to explicitly reason, referred to as System 2 in some contexts in cognitive psychology, can assist performance in many tasks. But they are very limited in capacity - some cognitive researchers think they can only operate on one thing at a time (Oberauer, 2002). This may be what prevents us from doing more than one 2-digit mental multiplication problem at a time. When applied to tracking, it means you can apply your powers of reasoning to a single target, for example using what you’ve learned about object trajectories to predict its future positions. I will refer to this as a C=1 process because its processing capacity may be only one object.

An inconvenient possibility

A lot of the results trumpeted by tracking papers are rendered uninteresting because they could be due to our cognitive abilities operating on a single target, rather than the results speaking to the tracking resource that we can distribute to multiple targets. Many researchers who come out of the experimental psychology tradition content themselves with showing that a factor makes some difference to performance, p < 0.05. Never mind how much the factor matters. But in a task involving tracking several targets, a factor that has only a small effect could be explained by the unitary C=1 resource working on just one target. As an example of evidence that such processes may contribute to visual cognition, Xu and Franconeri (2015) found that, for a multi-colored shape, participants could only mentally rotate a single part.


Let’s say that a paper finds that people track multiple objects more accurately if they move on predictable trajectories than on unpredictable trajectories. I have to think that this may simply be due to our central thought processes operating on just one target. Ruling this out requires sophisticated methods, such as showing that the predictable-trajectory advantage applies independently in each hemifield, as we will see in a later chapter, or that the effect shows some other idiosyncrasy of tracking, such as the inability to work with individual locations within a moving object, as described in a later chapter.

In case you are not entirely clear about my point, another way to put it is that by using our capacity for reasoning and symbol manipulation, we can perform a wide array of arbitrary tasks. We therefore should not be surprised by our ability to track a single target. We know that we have a visual system that makes the position and direction of motion of objects on our retina available to cognition, and that using our ability to think about where an object is going and deliberately move our attention to a future anticipated location, we might muddle through to success at tracking a single object.

Even when participants are asked to track several targets, then, one can expect that C=1 processes contribute to overall performance, even if they are only involved in the processing of one of the targets. Thus, when researchers contrast tracking performance with different numbers of targets, one reason for the decline in performance may be that C=1 processes are, in each condition, processing only a single target, so their contribution to performance declines in inverse proportion to the number of targets.
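To see how a single-target contribution can masquerade as a capacity cost, consider a toy model (my illustration, with made-up numbers, not a model from the tracking literature): every target gets the same baseline accuracy from a parallel process, and a C=1 process boosts exactly one target.

```python
# Toy model of the dilution argument (illustrative numbers only): a parallel
# process gives every target the same baseline accuracy, and a C=1 process
# boosts exactly one target, so its benefit to the average shrinks as 1/n.
def mean_accuracy(n_targets: int, baseline: float = 0.70, c1_boost: float = 0.25) -> float:
    boosted = min(1.0, baseline + c1_boost)  # the single target helped by the C=1 process
    return (boosted + (n_targets - 1) * baseline) / n_targets

for n in (1, 2, 4, 8):
    print(n, round(mean_accuracy(n), 3))
# 1 0.95, 2 0.825, 4 0.762, 8 0.731 - mean accuracy falls with the number of
# targets even though the parallel process is unaffected by target load.
```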


Objects and attentional spread

In my rather cluttered living room, sunlight streaming through the window illuminates dozens of objects. Hugo, our family dog, had been napping, but now stands up and ambles into the kitchen. As he moves, I track him with my attention. To do so, something in my brain has to identify as a single object a changing set of neurons, representing Hugo’s different parts as they shuffle across my retina. How is this done?

It starts in the retina, but the processing that occurs there is far from enough to segment a dog, or almost any object, from my living room background. Processing in the thalamus and early visual cortex, at the very least, continues the job. Much, or perhaps all, of this early cortical processing occurs regardless of where one is attending - in other words, it is pre-attentive. Exactly how extensive this preattentive processing is, and what sorts of representations it results in, has been studied for years, but much is still not understood (Neisser, 1963; Treisman, 1964; Kimchi and Peterson, 2008).

Attentive tracking is typically conceptualized as a process wherein a limited resource simply selects one or more of the preattentively-created representations. It is unclear, however, whether processing is so neatly divided, with preattentive representations simply selected rather than attention participating in modifying or even creating the representation that is tracked. A popular view is that attending to a location results in binding some of the features there, such as color and orientation, but attention likely also contributes to figure-ground segregation, which is fundamental to defining an object (Peterson, 2014). As another intriguing possibility for the effect of attention, to explain the twinkle-goes illusion Nakayama and Holcombe (2021) suggested that attentional tracking could cause the representation of a moving object to persist after the object has disappeared.

Studying the relationship of tracking to preattentive processing can be difficult, but a relevant and straightforward issue is which sorts of stimuli can be tracked and which cannot. However, the initial deployment of attention likely occurs more via a spatial or featural index than through some sort of index of the objects in the scene. We cannot think "car" or "tree" to ourselves and expect our attention to immediately deploy to any cars or trees in the scene. In contrast, our ability to deploy attention to a cued, static location is well-established; it very rapidly facilitates perceptual performance for that location and neural activation in the associated parts of retinotopic cortices. We also have the ability to deploy attention directly to certain other features, such as color or motion direction (Sàenz et al., 2003; White and Carrasco, 2011).

As a result of our featural selection capability, if a moving target differs from distractors in certain ways, then featural selection can be relied on to keep attention on the moving target. For example, if the targets are the only yellow objects in the scene, and all the distractors are blue or green, then one can think "yellow" and that is enough to keep attention on the targets and off the distractors (see also Chapter ). It is when the targets are identical to the distractors, or not distinguishable from the distractors by one of the features that feature selection acts on, that a different process is needed to keep attention on a moving target.

When the targets and distractors are identical, spatial location selection may initiate the selection of a target, but if it were the only process operating, when an object moved, attention would be left behind. A striking characteristic of the experience of tracking is that the movement of attention along with an object often seems to take no more effort than attending to a static object. Indeed, one might say that attention seems to be positively pulled along - when the targets in an MOT trial begin to move, I have never had the experience of my attention staying behind, remaining at one of the original target locations. It feels unnatural to un-latch attention from a target and fix it to the target’s current location while the target moves on. This is likely related to the ability of visual transients to attract attention - a moving object is essentially a rapid sequence of transients along a continuous array of locations.

The term "object-based attention" is sometimes bandied about as an explanation of why attention seems to automatically move along with a selected object, the idea being that the units of attentional selection are objects rather than locations (Pylyshyn, 2006; Clark, 2009). But no one seems to think that direct selection of objects is a thing, in the way that color selection is. That is, one cannot think "chair" and have all the locations of chairs in the scene become rapidly attended. Selection of chairs, or another object type, seems to require a search first, based on locations and simpler features. And even color selection may work via location, with thinking of a color resulting in availability of its locations, and attention then being deployed to those locations (Shih and Sperling, 1996).


Stationary object selection

One notion that fits with these considerations is that attention is deployed first to a location or locations, but any object present can cause spatial attention to spread throughout that object. This "attentional spread" theory seems to be a popular one, in part due to the evidence from the paradigm I will describe next.

Dozens, possibly hundreds, of published papers have investigated the relationship of objects to attention using a cuing paradigm developed by Egly et al. (1994). Egly et al. (1994) presented two static objects (rectangles), with a cue at one end of one of the objects. They found that the cue resulted in a performance enhancement not only for probes at the location of the cue, but also at the cued object’s other end. The comparison condition was performance at locations equidistant from the cue but not on the cued object. Most of the many follow-up papers similarly find that participants are fastest and most accurate when the stimulus is presented in the same location as the cue, or on the same object but at a different part of that object. I used the qualifier "most" when referring to the follow-up papers because some papers did not find this (Davis and Holmes, 2005; Shomstein and Behrmann, 2008; Shomstein and Yantis, 2002; Lou et al., 2020), and a major concern is that there may be many more such null findings that ended up in the proverbial file drawer. Indeed, the effect sizes in the literature are often quite small and the studies not highly powered, which can be a red flag that publication bias may have created the illusion of a real effect (Button et al., 2013).

Based on the pattern of sample sizes, effect sizes, and p-values in three dozen published object-based attention studies, Francis and Thunell (2022) argued that publication bias and/or p-hacking in the literature is rife. This is plausible in part because substantial proportions of researchers in psychology and other fields admit to such practices (John et al., 2012; Rabelo et al., 2020; Chin et al., 2021). Francis and Thunell (2022) further point out that the only previously-published study with a large sample (120 participants) found a non-significant effect, of only a 6.5 ms response time advantage (Pilz et al., 2012), and in Francis et al.’s own study with 264 participants, the effect was also quite small, at 14 ms. For an effect of this size, Francis et al. calculated that the sample sizes typically used in the published literature were unlikely to yield statistical significance without some help from p-hacking or another questionable research practice. As a result, many papers in the literature make conclusions about objects and attention based on results that unfortunately cannot be trusted.
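
To get a feel for the power problem with an effect of roughly 14 ms, one can run a rough sample-size calculation. The sketch below uses the standard normal-approximation formula for a paired comparison; the assumed 40 ms standard deviation of each participant's cueing-effect difference is a hypothetical value chosen purely for illustration, not a figure from Francis and Thunell (2022) or any of the other studies discussed.

from math import ceil

def n_for_power(effect_ms, sd_ms, z_alpha=1.96, z_beta=0.84):
    """Approximate N for a paired (within-participant) test via the normal
    approximation n = ((z_alpha + z_beta) / d)^2, where d is the mean
    difference divided by the SD of the differences.  Defaults correspond
    to two-tailed alpha = .05 and 80% power."""
    d = effect_ms / sd_ms
    return ceil(((z_alpha + z_beta) / d) ** 2)

print(n_for_power(14, 40))   # about 64 participants

With these assumptions, roughly 64 participants are needed for 80% power, considerably more than the sample sizes of most studies in this literature.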

Publication bias and p-hacking are less of an issue when the effects being studied are large, because in those cases studies are more likely to be adequately powered, resulting in fewer false positives and false negatives. Some effects are so large that spending a period of seconds looking at a display is enough to convince oneself that an effect is real. Fortunately, those large effects include some that relate to how variation in objects affects tracking, as we will see in the next section.

The end of the line

Many objects, such as the letter ‘T’, have salient parts. While normally we think of ‘T’ as a single object, we can also see that it is made up of a horizontal segment and a vertical segment. In conscious awareness, then, we have access to both the whole-object level and to an individual-parts level. You are able to focus attention on individual bits of the vertical segment, even though there are no visual characteristics that differentiate them. But what kind of representation(s) do our object tracking processes operate on?

In early visual cortex, different populations of neurons respond to the horizontal and to the vertical stroke of a ‘T’. But having neurons that respond to a thing does not suffice to be able to track that thing, as tracking operates on only some sorts of representations. Scholl et al. (2001) shed light on this by asking participants to try to track the ends of lines. The notion that one end of an undifferentiated shape (such as a line) is an object is somewhat unnatural, but it can be useful. When paying attention to someone holding a rifle, for example, it may be important to continuously monitor the location of the front of its barrel.

Four moving lines were presented in the Scholl et al. (2001) study, with one end of each line designated as a target. At the end of the trial, the lines stopped moving and participants were to click with a mouse on the line ends that were targets. During a trial, each line grew, shrank, and rotated as each of its ends wandered about the screen randomly.

Figure 7: A schematic of the display used by Scholl et al. (2001)

The results were striking. Performance on the task was abysmal relative to a control condition in which the two ends of the line were not connected. Simply by viewing an example trial, one very quickly gets a sense of how difficult the task is - the effect is very large.


Figure 8: Using this display, Scholl et al. (2001) asked participants to track the end of each of several lines.


The task of tracking line ends in the Scholl et al. (2001) experiment was complicated by the fact that the objects frequently crossed over each other, and also their length changed over time. But Howe et al. (2012) showed that these complications were not the main reason for the poor performance. It simply is the case, it seems, that one cannot confine one’s tracking processes to one bit of an undifferentiated object. This inability to track the ends of lines fits in with the view this chapter opened with, that preattentive processes define objects that become what tracking operates on.

Maintaining attention on a part of the visual scene in the absence of anything in the image to delineate that part feels like it requires concentration, as if we must continually think about what we are supposed to be attending to. If lots of cognitive resources are needed to maintain the "object" representation when it is not provided by preattentive processes, then for such objects we may only be able to track one. This idea of C = 1 (capacity of one) processes being involved or required for some forms of tracking was introduced in Chapter .

Object creation and object tracking: Distinct processes?

Researchers usually make a strong distinction between the processes that determine how many objects one can track and those that determine what kinds of objects can be tracked. An assumption of separate processing stages is popular in the study of visual cognition generally. Visual search, for example, is usually conceptualized this way (Wolfe and Bennett, 1997; Nakayama et al., 1995), and it appears to be an implicit assumption in two reviews of objects and tracking (Scholl, 2001; Pylyshyn, 2006). It would be convenient if object creation and object tracking occurred at distinct processing stages, as that is more straightforward to study than an interactive system (Simon, 1969; Sternberg, 1969).

In accord with the schematic in Figure 9, there certainly is evidence that tracking is high-level; for example, Maechler et al. (2021) found evidence that tracking operates on perceived object position rather than a more low-level representation of position. Nevertheless, attention and object creation may be interactive. For example, the way stimulus elements are organized by attention can determine what illusory contours are created and perceived, as well as the lightness and depth that is perceived (Harrison and Rideaux, 2019; Harrison et al., 2019; Peter, 2005). Our ability to perceive the complex motion of a human body from only several points of light highlights that object perception can involve hierarchical motion segmentation that reflects an interaction between Gestalt grouping and top-down knowledge of the overall shape of objects and the relative motion pattern of their parts (Johansson, 1973; Wang et al., 2010) (see the Grouping chapter).


Figure 9: A schematic of the idea that objects are created prior to the action of tracking processes, which then point to the already-formed object representations but do not change them.

Using a paradigm based on the attentional spread literature reviewed above, Ongchoco and Scholl (2019) asked participants to practice "imagining" a shape in a uniform grid of lines until they felt they could actually see the shape, which happened fairly readily. The detection of flashed probes was enhanced for those presented on the same imagined object, compared with equidistant ones presented on different objects. In summary, a variety of evidence suggests a role for neural feedback in object segmentation, with some role for attention, but the extent of its importance remains unclear (Papale et al., 2021; Wyatte et al., 2014; Harrison and Rideaux, 2019).

Potentially, the same attentional resources that mediate tracking may also contribute to the creation of object representations. One consequence would be a trade-off between the involvement of attention in constructing object representations and the number of objects that can be tracked. Informal experience with tracking the line ends in the Scholl et al. (2001) display seems to support this. If, when you watch the SCHOLL MOVIE, you concern yourself with keeping track of the end of only one object, you are likely to succeed. But recall that it is difficult or impossible to accurately track four object ends - indeed, Scholl et al. (2001) found that participants’ performance was approximately that predicted if they could track one line end, but not more.

Possibly the ability to track only one line end is a result of consuming all of one’s tracking resources to create a representation of the end of the line as a single object. An important alternative, however, is that tracking the end of a line uses a C=1 process rather than the tracking processes that allow us to track more objects in most circumstances. This would mean that covert tracking of multiple objects is qualitatively different from covert tracking of a single object. Because the participants in the Ongchoco and Scholl (2019) study imagined only a single object, it is possible that their results reflect a C=1 process rather than the processes we use to track multiple objects, as with the mental rotation study of Xu and Franconeri (2015).

What tracking sticks to

Even when all of our cognitive resources are brought to bear on a single entity, some kinds of entities still can’t be tracked. Anstis and Ito (2010) asked participants to track the intersection of two shapes moving in a configuration that elicits the "chopsticks illusion" (Anstis, 1990). A horizontal and a vertical line slide over each other in the chopsticks illusion, with each line following a clockwise circular trajectory. Viewers perceive the intersection of the two lines to also be moving clockwise (demo here), but in fact the intersection moves counterclockwise only. This error may be, in part, a failure of object tracking, because if participants had been able to attentionally track the intersection, they should have been able to judge its trajectory. Anstis (1990) also found that participants could not accurately pursue the intersection with their eyes.

The true counterclockwise trajectory of the intersection becomes obvious perceptually if one views the display through a window so that the ends of the lines are occluded rather than visible, and in that condition participants were able to smoothly pursue the intersection accurately, in its true counterclockwise direction. So the problem is likely a pre-attentive interpretation of the motion that we cannot overcome. Anstis (1990) suggested that the reason the intersection is ordinarily perceived to move in the wrong direction is that the clockwise motion of the ends of the lines is mistakenly assigned to the intersection, similar to how the motion of the ends of lines can veto the barber-pole illusion. The illusion is an example of a failure to track a rather simply-defined point. Evidently, tracking cannot operate on that point, possibly because a particular interpretation of motion and form is created before tracking operates. As schematized in Figure 9, some processing of motion and form appears to occur prior to the operation of tracking.

As we saw in "The end of the line" section above, maintaining the representation of an undifferentiated part of an object is not something that our multiple object tracking resources are capable of. What sort of differentiation is needed? Scholl et al. (2001) varied how distinct the end of an object was. In a "dumbbell" condition, the objects used were two squares connected by a line. In that condition, participants’ accuracy was lower than in a standard separated-squares condition, but not statistically significantly so - any detriment to tracking appeared to be small, suggesting that participants are able to track a dumbbell end. However, Howe et al. (2012) tested a dumbbell condition that was rather similar to that of Scholl et al. (2001), but they found performance was substantially lower than when the objects (discs in their case) were not connected. The reason for the discrepancy is not clear, and it could reflect the noisiness of the data of the two studies. Howe et al. (2012) also tested a "luminance" condition, pictured in Figure 10, and found that performance (80% correct) was substantially lower than in their baseline condition (96% correct), although not as low as for undifferentiated bar ends (72% correct). They were surprised that the clear difference in luminance between the targets and the connector in the luminance condition was not enough to keep tracking from being so adversely affected by the connectors.

Figure 10: Some stimuli used by Howe et al. (2012). CC-BY

These results suggest that multiple object tracking uses a different segmentation of objects than what is available to us when we focus our attention on a single object. The findings have some similarity to those of conjunction visual search studies. Wolfe and Bennett (1997) asked participants to search for conjunctions of features, such as red and vertical. If the vertical red part of an object was physically connected to a horizontal and green part, then participants were much slower to find the red vertical target segment in the display, among the green vertical and red horizontal distractors. In other words, it seemed that physically connecting one feature to another lumped them together as an undifferentiated collection of features from the perspective of search processes, what Wolfe and Bennett (1997) termed a "preattentive object file". No researcher seems to have tested displays of this nature for both tracking and search, but for now the parsimonious account has to be that multiple object tracking and search operate on the same object representations.

Growth, shrinkage, and tracking

Some objects and substances change shape. When one opens a faucet in a kitchen, for example, a jet of water shoots into the sink and flattens on the sink’s bottom as it expands into a puddle. As one pours beer into a glass, a froth forms, which gradually thickens as the top of the liquid rises. These are examples of non-rigid motion, specifically substances that change shape as they move.

To investigate tracking of non-rigid substances, VanMarle and Scholl (2003) devised an object that moved a bit like an inchworm or a slinky. In a condition I’ll refer to as the "slinky" condition, each object began as a square. It would then move by extending its leading edge until it had the shape of a long and thin rectangle. Subsequently, the trailing edge of the slinky, which was still at its original location, would move forward until the slinky was a square again, now entirely at a new location. VanMarle and Scholl (2003) found that slinky tracking performance was very poor.

What is it about tracking that causes such difficulty with slinkys? Howe et al. (2013) tested a number of conditions that seemed to rule out possibilities such as the faster speed of the slinky’s edges relative to the control conditions. Scholl (2008) suggested that the reason the slinky was difficult to track was, as they put it, that "there was no unambiguous location for attention to select on this shrinking and growing extended object" because "each object’s location could no longer be characterized by a single point" (p.63). There may be something to this, but it is not entirely clear what is meant by an object’s location not being characterizable by a single point. The objects typically used for MOT, discs with no internal features, also have no unambiguous internal locations, because their insides are a completely undifferentiated mass. If one wishes to refer to a single point for their location, their centroid could be used, but this seems just as true for an object changing in size and shape. Additionally, in the chopsticks illusion reviewed above, the target was defined by a single point (the intersection of two lines), yet it could not be tracked.

While the reason or reasons that tracking a slinky is difficult remain obscure, it seems that object expansion and contraction disrupt both tracking and localization. After Howe et al. (2013) replicated the tracking findings of VanMarle and Scholl (2003), they probed the effect of size changes on localization. Participants were presented with a rectangle for 200 ms at a random location on the screen, and were asked to click on the location of the center of the rectangle. In a baseline condition, the rectangle did not change in size, shape, or location during its 200 ms presentation. In the size-change condition, the length of the object increased due to expansion for half of the interval and shrank due to contraction during the other half. Participants’ localization errors were about 14% larger in this changing-size condition. This appeared to be driven by errors along the axis of the object’s expansion and contraction, as errors in the orthogonal direction were not significantly different from the baseline condition.

The substantial localization impairment documented by Howe et al. (2013) may possibly be the cause of the poor performance during MOT. However, that is still not clear. An important next step is to measure localization errors when the task is to monitor multiple objects changing in size rather than just one. If the localization deficit caused by change in size worsens with object load, this would help implicate the processes underlying tracking.


Could tracking work by attentional spreading?

The relationship between object representations and tracking schematized in Figure 9 suggests that attention selects entire object representations. It has been suggested, however, that the process instead begins with selection of a particular location, which then spreads up to the edges of the object. A gradual growth of the area of attentional activation to encompass an entire object has been observed neurophysiologically in certain tasks (e.g., Wannig et al., 2011).

Although I have not seen anyone discuss this possibility, the spreading of attention may actually contribute to the ability to track moving objects. When an object moves, its leading edge will occupy new territory while its trailing edge continues to occupy an old location. If spreading of attention up to object boundaries continually occurs, then attention should spread to the new locations near the leading edge. In such a fashion, attention could, by continually expanding to the new location of a leading edge and contracting with a trailing edge, stay on a moving object. Spreading of activation has been documented neurophysiologically in V1 of rhesus macaques given the task of evaluating whether two points are on the same curved line segment (Roelfsema et al., 1998).

Presenting probes during a task of tracking multiple lines, Alvarez and Scholl (2005) found that probes presented at the center of objects were detected much more accurately than end probes, suggesting that attentional resources were concentrated on the centers of the lines. The spreading account seems to instead predict that accuracy would be highest near the trailing end of an object. It appears, however, that the researchers did not analyze the data to check whether, of the two object ends, accuracy was higher for probes at the trailing end. Clearly, much more work is needed to reveal the nature of attentional spread while an object moves and any role it has in facilitating tracking.


Grouping

Carving the scene into objects is not the only segmentation task conducted by our visual systems. We also perceive groups, as defined by a variety of cues. Can tracking follow entire groupings of unconnected objects? Alzahabi and Cain (2021) used clusters of discs as targets and as distractors. These clusters maintained a constant spatial arrangement as they wandered about the display. Participants seemed to do well at tracking these clusters. However, these researchers did not rule out the possibility that participants were tracking just one disc of each cluster, and I am not aware of any work providing strong evidence that a tracking focus can track an entire group.

Yantis (1992) hypothesized that in MOT experiments, participants track an imaginary shape formed by the targets, specifically a polygon whose vertices are the target positions. Progress has been slow in understanding whether all participants do this or just a minority do, and in what circumstances. Merkel et al. (2014) found a result that they took as evidence that some participants track a shape defined by the targets while others do not. In their task, at the end of the trial when the targets and distractors stopped moving, four of the objects were highlighted and the task was to press one button if all four were targets (match), and to press a different button otherwise (non-match). Error rates were lowest when none of the highlighted objects were targets, and errors were progressively more common as the number of highlighted objects that were targets increased. This was unsurprising. For trials where all four of the highlighted objects were targets (match), however, error rates were much lower than when only three were targets (a non-match). Merkel et al. (2014) suggested that this reflected a "perceptual strategy of monitoring the global shape configuration of the tracked target items." They went on to split the participants based on whether they had a relatively low error rate in the match condition, investigated the neural correlates of that, and made various conclusions about the neural processing that underlies virtual shape tracking.

The inferences of Merkel et al. (2014) rest on splitting participants according to whether their error rate in the match condition was low compared to the condition where none of the highlighted objects were targets. The idea seems to be that if participants weren’t using a shape-tracking strategy, error rates would steadily increase from the trials where none of the highlighted objects were targets to the trials where all of the highlighted objects were targets. However, there are other possible reasons for this pattern of results. One aspect of the Merkel et al. (2014) experiment is that one of the two response choices (non-match) was the correct answer in 80% of trials. Because participants were not told that, some of them surely expected that they should press the two buttons approximately equally often. By pressing the "match" button more often (close to 50% of trials) than the appropriate 20% of trials, they would artificially produce the relatively low error rate for the full-match condition that was observed. Merkel et al. (2014) mention the possibility of a response bias but suggest that this would result in the opposite (a high error rate) of what I have suggested. The problem seems to be that they didn’t consider that the participants may have expected the match stimulus to occur more often than it did.

A response bias is not the only alternative to the virtual grouping account of Merkel et al. (2014). The low error rate in the full-match condition might also occur for other reasons. Imagine that a participant tracked only one target and checked that target at the end of each trial for whether it was highlighted. If that one target is not highlighted, the participant presses the non-match button; otherwise they press the match button. For such a participant, the chance of getting the answer wrong when none of the probed objects are targets is zero. It is higher when one of the probed objects is a target (25% if the participant tracks one target perfectly on every trial and makes no finger errors), and still higher when two or three probed objects are targets. When three probed objects are targets, in three out of four cases one of them is the target the participant tracked, so the participant frequently gets it wrong. But when all four of the probed objects are targets, the participant will always respond correctly. Thus, a low error rate in this all-targets-probed condition that is very close to the error rate in the no-targets-probed condition can be a sign of a participant who only monitors one target. But Merkel et al. (2014) interpreted this result as instead meaning that the participant was tracking a virtual shape formed by all four of the targets.
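
The error rates just described can be computed directly. The sketch below is a toy calculation of this one-target strategy, assuming four targets, perfect tracking of a single randomly chosen target, no motor errors, and highlighted targets that are a random subset of the targets; the function name is mine, not from Merkel et al. (2014).

from fractions import Fraction

def one_target_error_rate(k_highlighted_targets, n_targets=4):
    """Error rate for a participant who tracks exactly one of the n_targets
    targets and presses 'match' only if that tracked target is among the
    four highlighted objects."""
    p_tracked_highlighted = Fraction(k_highlighted_targets, n_targets)
    if k_highlighted_targets == n_targets:
        # correct answer is 'match', and the tracked target is always highlighted
        return 1 - p_tracked_highlighted          # = 0
    # correct answer is 'non-match'; an error occurs whenever the tracked
    # target happens to be among the highlighted objects
    return p_tracked_highlighted

for k in range(5):
    print(k, "highlighted targets -> error rate", one_target_error_rate(k))

The output climbs from 0 to 3/4 as the number of highlighted targets goes from zero to three, then drops to 0 when all four are targets, reproducing the pattern that Merkel et al. (2014) attributed to virtual shape tracking.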

Hierarchical relations

In the real world, the movement in a scene relative to our retinas is rarely as independent as the movement of the objects in a typical multiple object tracking experiment. Often there is a strong motion element throughout the visual field created by the movement of the observer, and recovering true object movement likely requires detecting deviations from that (Warren and Rushton, 2007). Even when the observer and their eyes do not move, hierarchical motion relationships are common. When one views a tree on a windy day, the largest branches sway slowly, while the smaller limbs attached to the larger branches move with the larger branches but also, being more flexible and lighter, have their own, more rapid motion.


These aspects of the structure of the visual world may be one reason that our visual systems are tuned to relative motion (Tadin et al., 2002; Maruya et al., 2013). When we see a wheel roll by, we experience individual features on the wheel as moving forward, reflecting the global frame of the entire wheel, but also as moving in a circle, reflecting the motion relative to the center of the wheel.

This decomposition of a wheel rim’s movement is so strong that people systematically mis-report the trajectory of the points on the wheel (Proffitt et al., 1990). The red curve in the animation below reveals that a point on a rolling wheel traces out a curve that involves up, down, and forward motion, but no backward motion. The trajectory reported by participants is very different and tends to include a period of backward motion.

Figure: The red curve is that traced out by a point on a rolling wheel, by Zorgit (https://commons.wikimedia.org/wiki/File:Cycloid_f.gif)
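
The claim that the physical trajectory contains no backward motion follows from the standard parametrization of a cycloid, the curve traced by a point on the rim of a wheel of radius r rolling to the right at angular rate \(\omega\) (a brief derivation, not taken from Proffitt et al. (1990)):

\[
x(t) = r\,(\omega t - \sin \omega t), \qquad y(t) = r\,(1 - \cos \omega t),
\qquad \frac{dx}{dt} = r\,\omega\,(1 - \cos \omega t) \ge 0 .
\]

Because the horizontal velocity is never negative, the point momentarily pauses when it touches the ground but never moves backward, despite the backward loops that observers tend to report.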

Bill et al. (2020) varied the structured motion pattern of the moving discs of an MOT task to show that hierarchical relations are extracted and used to facilitate tracking. This should be studied further. For example, the attentional demands, if any, of such hierarchical motion decomposition have not been explored much. Thus it remains unclear to what extent the hierarchical relations are calculated by the application of tracking or other attentional resources, versus tracking operating on a representation of hierarchical relations that was determined preattentively.

Eyes to the center

The human visual system represents scenes as more than just a collection of objects. It performs a rapid global analysis of visual scenes, giving us summary information sometimes referred to as "ensemble statistics" (Alvarez and Oliva, 2009). One such ensemble statistic is the location of the center or centroid of the objects. This is useful for eye movement planning, among other things — to monitor a group of objects, it is helpful to look at the center of the group, as that can minimize how far into peripheral vision the objects are situated.

Zelinsky and Neider (2008) and Fehd and Seiffert (2008) independently reported that during multiple object tracking, the eyes of many participants frequently are directed at blank locations near the center of the array of targets, in addition to participants sometimes gazing directly at targets. This finding has been replicated by subsequent work (Hyönä et al., 2019). The nature of the central point that participants tend to look at is not entirely clear. Researchers have suggested that it may be the average of the targets’ locations, or the average location of all the moving objects (both targets and distractors). Another possibility that has been investigated is that participants tend to look at the centroid of the shape formed by the targets, which recalls the Yantis (1992) hypothesis that what is tracked is the shape defined by the targets. Lukavský (2013) introduced the idea of an "anti-crowding point", which minimizes the ratio between each target’s distance from the gaze point and its distance from every distractor. The idea was that participants move their gaze closer to a target when it is near a distractor, to avoid confusing targets with distractors. Note, however, that the Lukavský metric does not take into account the limited range of the empirical crowding zone, which is about half the eccentricity of an object.

In a comparison of all these metrics against the eyetracking data, Lukavský (2013) found that the anti-crowding point best predicted participants’ gaze in his experiment, followed by the average of the target locations. These points both matched the data better than the centroid of the targets. This undermines the Yantis (1992) hypothesis that a virtual polygon is tracked, and the superior performance of the anti-crowding point is consistent with other results indicating that participants tend to look closer to targets that are near other objects (Vater et al., 2017; Zelinsky and Todor, 2010).

More work must be done to understand the possible role of the anti-crowding eye movement strategy suggested by Lukavský (2013). Spatial interference does not seem to extend further than half an object’s eccentricity, in both static identification tasks (Pelli and Tillman, 2008; Gurnsey et al., 2011) and multiple object tracking (Holcombe et al., 2014), but the anti-crowding point devised by Lukavský (2013) does not incorporate such findings. Its performance could be compared to a measure that is similar but excludes from the calculation distractors further than about half an object’s eccentricity from the target.
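
To make that comparison concrete, here is a minimal sketch of how such a crowding-limited gaze metric could be scored. It treats the cost of a candidate gaze point as the sum, over targets, of the target's distance from the gaze point divided by its distance to each nearby distractor, which is only one reading of the description above; the exact formula in Lukavský (2013) may differ. The cutoff of half the target's eccentricity, the coordinate convention (degrees of visual angle, with eccentricity measured from the gaze point), and the example positions are all illustrative assumptions.

import numpy as np

def crowding_aware_cost(gaze, targets, distractors):
    """Cost of a candidate gaze point: for each target, add
    (gaze-to-target distance) / (target-to-distractor distance), but only for
    distractors lying within half the target's eccentricity (its distance
    from the gaze point).  Lower cost = better gaze location."""
    cost = 0.0
    for t in targets:
        ecc = np.linalg.norm(t - gaze)          # target eccentricity from gaze
        for d in distractors:
            sep = np.linalg.norm(t - d)         # target-distractor separation
            if sep < 0.5 * ecc:                 # only distractors inside the crowding zone
                cost += ecc / sep
    return cost

# toy example: one target has a distractor nearby, the other does not
targets     = [np.array([-5.0, 3.0]), np.array([6.0, -2.0])]
distractors = [np.array([-4.0, 4.0]), np.array([10.0, 8.0])]
centroid = np.array([0.5, 0.5])     # midpoint of the two targets
shifted  = np.array([-2.0, 2.0])    # gaze shifted toward the crowded target
print(crowding_aware_cost(centroid, targets, distractors))   # about 4.3
print(crowding_aware_cost(shifted,  targets, distractors))   # about 2.2, i.e. preferred

Consistent with the anti-crowding idea, the metric favors a gaze position pulled toward the target that has a distractor nearby.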


Two brains or one?

A human brain has two halves, a left and a right, that are anatomically connected, but there are fewer cross-hemisphere connections than there are within-hemisphere connections. Much of sensory and perceptual processing runs rather independently in the two halves of the cortex, but more cognitive functions such as declarative memory benefit from a very tight integration. This integration is extensive enough that the comparison of our two hemispheres to our two hands or our two legs is misleading.

Our conscious experience, too, is quite unified. We experience no discontinuity when the movement of our eyes, or of an object, causes an object to shift from one hemifield, where it is processed predominantly by one hemisphere, to the other hemifield. Communication between the two hemispheres happens rapidly and continuously. Despite the claims of salespeople who prey on well-meaning schools and parents, there is no good evidence that exercises designed to ensure both hemispheres process stimuli have any benefit for learning.

In "split-brain" patients, many of the connections between the hemispheres have been lost. Yet such patients can still perform tasks such as visual search in both hemifields, suggesting that both hemispheres have the mechanisms needed to do such tasks. When split-brain patients are asked to search for a target among many distractor objects, spreading the load by distributing the distractors across the two hemifields can yield a large benefit, suggesting that the two hemispheres in these patients carry out their searches independently (Luck et al., 1994). For intact individuals, no such advantage is seen, suggesting that in a normal brain, the processes that evaluate each stimulus for whether it is the target are integrated across the hemispheres into a single attentional focus (Luck et al., 1989).

Although the hemispheres show tight integration during many tasks, each hemisphere does specialize in certain functions. The left hemisphere has greater competence in language functions such as reading, while the right hemisphere is better at recognizing faces. One behavioral consequence is that response times for a face recognition task are slightly faster when the stimulus is presented wholly in the left hemifield (to the right hemisphere) than when it is presented wholly in the right hemifield, whereas the opposite is found for word reading (Rizzolatti et al., 1971). With extended time to process a stimulus, however, such behavioral asymmetries can disappear, because eventually the information from one hemisphere gets communicated to the other.

From the performance of most perceptual and attentional tasks, then, in typically-developing humans there is little overt sign that the brain is divided into two halves. Multiple object tracking, however, is a major exception to this. The pattern of performance found indicates that the limited resource that determines how many objects one can keep track of resides largely with a process that operates independently in the two hemispheres.

The extraordinary hemifield independence of object tracking

In 2005, George Alvarez & Patrick Cavanagh reported a stunning finding from their MOT task. Using objects that resembled spinning pinwheels, they designated an individual bar of a pinwheel as a target. Performance in a two-target condition (the targets were one bar of each of two different pinwheels) was contrasted with that for a one-target condition (Alvarez and Cavanagh, 2005). When the second target was positioned in the same hemifield as the first target, accuracy in the two-target condition was much worse (89% vs. 63%). Remarkably, however, when the second target belonged to a pinwheel located in the opposite hemifield, there was very little performance decrement - accuracy was 93% in the one-target condition, and 90% correct in the two-target condition. This suggests that the processes that limit successful tracking in this task are largely specific to each hemifield.

It was already known that sensory processing and quite a lot of perceptual processing occurs independently in each hemisphere. What is somewhat surprising is that a higher-level, limited-capacity process would be hemisphere-independent. Such capacities were traditionally thought to be among the processes that are tightly integrated across the two hemispheres, forming a single resource "pool", not two independent limits. We will get back to this point; first, we’ll examine more extensively the evidence for hemispheric independence of object tracking.

Quantitative estimates of independence

The hemispheric independence of a task can be quantified. Imagine that adding a second stimulus to a hemifield reduces performance by 20 percentage points, but adding that stimulus to the other hemifield reduces performance by only 5 percentage points. One can quantify the hemispheric independence, then, as (20-5) / 20 = 75% hemifield independence. Ideally, however, one would not use raw accuracy but instead would correct for the accuracy one can achieve by guessing. When applying such a calculation to the Alvarez and Cavanagh (2005) results, the estimated level of independence is very high: 88% independence in one of their experiments, and 92% in the other.
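
Here is a minimal sketch of how such a figure might be computed from published accuracies. The guessing correction (rescaling accuracy so that chance maps to 0) and the assumed 50% chance level are my own simplifying assumptions, and the exact calculation used here may differ; the example numbers are those reported for Alvarez and Cavanagh (2005) in the previous section.

def above_chance(p, chance):
    """Rescale accuracy so that chance performance maps to 0 and perfect
    performance maps to 1 (a simple guessing correction)."""
    return (p - chance) / (1 - chance)

def hemifield_independence(one_same, two_same, one_diff, two_diff, chance=0.5):
    """Proportion of the same-hemifield cost of adding a second target that
    disappears when the second target is instead placed in the opposite
    hemifield.  1.0 = fully independent hemifields, 0.0 = one shared pool."""
    cost_same = above_chance(one_same, chance) - above_chance(two_same, chance)
    cost_diff = above_chance(one_diff, chance) - above_chance(two_diff, chance)
    return (cost_same - cost_diff) / cost_same

print(hemifield_independence(one_same=0.89, two_same=0.63,
                             one_diff=0.93, two_diff=0.90))   # about 0.88

With these assumptions the result lands near the 88% figure quoted above, but that agreement should not be taken as confirmation of the exact method used.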

Alvarez and Cavanagh (2005) themselves, like other researchers who have investigated this question, did not do these calculations. Alvarez and Cavanagh (2005) calculated expected performance if the hemifields are in fact completely independent, and reported that performance was not statistically significantly worse than that figure. They then suggested that there was complete independence, but this is based on the common fallacy of concluding a null hypothesis is true when the evidence does not reject it at a p<.05 level (Aczel et al., 2018). That is, for the null hypothesis they inappropriately started with their conclusion (complete independence), and then affirmed this conclusion on the basis of not finding much evidence against it. Nevertheless, their data do suggest hemispheric independence of approximately 90%. In a study with similar methods, Hudson et al. (2012) found 65% independence (they did not calculate a number, so this is my calculation).

Some of the follow-up studies in this area have not included enough conditions to quantify the degree of independence, or confounded the distribution of the targets across the two hemifields with greater distance among them, such that any benefit might have been due to less spatial crowding interference, a phenomenon discussed in Chapter .

Holcombe and Chen (2012) and Chen et al. (2013) also found evidence for a high degree of independence, using a slightly different approach based on speed thresholds. The findings were compatible with approximately 100% hemifield independence or a bit less. Shim et al. (2010) and Störmer et al. (2014) also found evidence for a substantial bilateral advantage compared to adding a target in the same hemifield.

These findings of hemispheric independence have not replicated in all circumstances (e.g., Shim et al., 2008), but the successful replications strongly suggest that at least in some circumstances, tracking does occur mostly independently in the two hemispheres. I say "mostly independently" rather than suggesting complete independence because each individual study has too much statistical uncertainty to rule out a figure such as 75% independence, even when the point estimates I’ve reported above suggest a higher degree of independence.

Shim et al. (2008) suggested that the reason they did not find evidence for hemifield independence is that they used only two targets, whereas according to them the original Alvarez and Cavanagh (2005) report of hemifield independence used four targets. This is unlikely to be the reason for the discrepancy, however, because in their E1 and E2 Alvarez and Cavanagh (2005) did find evidence for hemifield independence using just two targets, as did Holcombe and Chen (2012) and Störmer et al. (2014). The Shim et al. (2008) data may have been afflicted by a ceiling effect, as accuracy was over 85% correct in all conditions in their experiment.

A limitation of deriving hemispheric independence estimates from accuracy is that they depend on the assumption that if a person can only track one target, then in a condition where the person is also trying to track a second target, the person will succeed just as often in tracking one of the two. My introspective experience, however, indicates that in some circumstances, trying to track both targets causes one to fail at both, and thus one is better off only trying to track one. The reason for this may be that a particular threshold amount of resource is needed to track a target, and so if neither target is allocated that much resource, tracking will fail for both. Evidence for this was provided by Chen et al. (2013). In the terminology of the Norman and Bobrow (1975) framework introduced in Chapter , the resource function that relates attentional resource proportion to accuracy falls below a straight line. This means that quantitative estimates of hemispheric independence will be overestimates, particularly in circumstances where the participants do not realize they may be better off focusing their efforts on tracking fewer targets than the number they have been told to track.

Carlson et al. (2007) found evidence not only for hemifield independence but also quadrant-level independence, which they attributed to the partial separation of the retinotopic quadrant representations in areas V2 and V3. Using different stimuli, Shim et al. (2008) and Holcombe et al. (2014) did not, however, find evidence for quadrantic independence. Specifically, they did not find evidence for a deficit when two targets were positioned in the same quadrant compared to different quadrants but in the same hemifield. More work on this topic should be done.

Some tracking resources are NOT hemifield-specific

One attentional process that is not hemifield-specific is feature attention, for example attention to color. When a participant is told to look for a red target, they are able to use feature attention to enhance all red objects, no matter where they are in the visual field (White and Carrasco, 2011). The decision to look for red originates with cognitive processes and remains hemifield-unified rather than hemifield-specific at the level of visual cortex (Saenz et al., 2002). Indeed, people seem to be unable to confine the enhancement of red objects to one hemifield (Lo and Holcombe, 2014). In real-world tracking, where objects are at least somewhat heterogeneous and thus targets often have a different average color and other features than distractors, feature attention will facilitate tracking, and this facilitation is not hemifield-specific.

A previous chapter introduced the idea of C=1 cognitive processes that can support tracking of a single target but perhaps not multiple targets. Such processing likely is not hemisphere-specific, being aligned with "central executive" processes that integrate processing in both hemispheres.

Chen et al. (2013) found evidence for both hemifield-specific tracking processes and processes not specific to a hemifield, operating in the same MOT task. Two targets were used, and on some trials they moved at different speeds. When a slow-moving target was paired (presented in the same trial) with a speedier target, accuracy was lower for the slow-moving target than if it was paired with a target that was slower. This suggests that participants allocate more tracking resources to the faster of two targets, presumably because slower targets do not require much resource to track well. This trade-off was most pronounced when the two targets were in the same hemifield, but seemed to occur to some degree even when the two targets were in different hemifields, implicating a cross-hemifield resource that plays a small role. This cross-hemifield resource may be a C=1 process. Furthermore, as discussed in the next section, perturbing one parietal lobe can affect performance in both hemifields, which suggests that each hemisphere can in some circumstances mediate tracking in either hemifield.

The underlying mechanisms

The evidence reviewed above for hemifield independence suggests that hemisphere-specific processes determine how many targets one can track. This raises the question of what sort of processes those are, and how they interact with the cognitive processes that are more integrated across the hemispheres.

Steve Franconeri and colleagues have championed the idea that the hemisphere independence stems from spatial interference processes, by suggesting that these processes occur largely within a hemisphere (Franconeri, 2013). The idea is that when an object is tracked, the neurons representing that target in retinotopic cortical areas activate inhibitory connections to nearby neurons in the cortical map, suppressing the responses to neighboring objects. To explain the findings of hemifield specificity, an additional detail of the account is the idea that the inhibitory neural connections do not extend from one hemisphere’s retinotopic map to another (Franconeri et al., 2013a). That in classic crowding tasks spatial interference shows a discontinuity across the boundary between the left and right visual fields lends some plausibility to this idea (Liu et al., 2009). However, Holcombe et al. (2014) found evidence against spatial interference extending any further than the classic crowding range, which is only half the eccentricity of an object - for example, an object placed six degrees of visual angle from the point the eyes are looking at would be interfered with only by other objects closer to it than three degrees of visual angle (Bouma, 1970). Because in most studies of hemifield independence the stimuli are not close to the vertical midline, there would not be an artifactual finding of hemifield independence due to less crowding across the midline. The more viable theory of hemifield independence, then, is that of two neural resources, one spanning each hemifield.


A number of studies have found that the activity of some parietal and frontal areas of cortex increases steadily with the number of targets in MOT (Culham et al., 2001; Howe et al., 2009; Jovicich et al., 2001; Alnaes et al., 2014; Nummenmaa et al., 2017), consistent with the importance of a pool of attentional resources. Unfortunately, these studies did not focus on the extent to which these activations are specific to target load within a hemifield, so we cannot be sure whether the brain activation measured reflects the hemifield-specific resource or a more global resource. The only imaging study I am aware of that investigated the issue is Shim et al. (2010), who found an activation difference when the objects designated as targets were in opposite hemifields compared to when they were in the same hemifield. This was found for the superior parietal lobule and transverse parieto-occipital area, suggesting that they may be part of the hemifield-specific resource. The difference was not found for the anterior intraparietal sulcus, which could mean its activation reflects a global resource.

Störmer et al. (2014) used EEG to investigate the hemifield-specific resource. They found that the SSVEP activation for targets was higher than that for distractors, especially when the two targets were positioned in different (left and right) hemifields. In contrast, an ERP component known as the P3, thought to reflect more cognitive identification and decision processes, was similar in the two conditions. This is consistent with the theory that tracking depends on hemisphere-specific attentive processing followed by some involvement of higher-order processes that are not hemisphere-specific.

Battelli et al. (2009) found they could disrupt MOT performance in a hemifield by stimulating the contralateral intraparietal sulcus (IPS) using repetitive transcranial magnetic stimulation. Importantly, this only occurred when the moving targets were present in both hemifields. When the targets were all in the left or all in the right hemifield, TMS to the left or to the right IPS had no effect on tracking accuracy, and they replicated this in a second experiment. These findings bring to mind the competition between the two hemifields evident in the "extinction" symptom observed in parietal neglect patients. In extinction, responding to stimuli in the hemifield contralateral to parietal injury only shows significant impairment if there are also stimuli presented to the ipsilateral hemifield. This inspired Battelli et al. (2009) to explain their findings with two propositions. The first is that the IPS in each hemisphere can mediate the tracking of targets in either visual hemifield. The second is that under normal conditions, inhibitory processes reduce the amount of ipsilateral processing by each IPS, causing tracking capacity to effectively be hemifield-specific.

A rather complicated relationship of the hemispheres is also suggested by evidence from patients. Battelli et al. (2001) found that in patients with damage to their right parietal lobe, MOT performance only in the left visual field was impaired relative to control participants. Evidently the right parietal lobe does not normally mediate tracking in the right visual field, so losing it did not hurt right visual field tracking performance. For another task, however, these right parietal patients had substantial impairments in both hemifields. Impairment on that task, an apparent motion task, is believed to be a result of a deficit for registering the relative timing of visual events. The involvement of the right parietal lobe, but not the left parietal lobe, in judging the temporal order of stimuli in both hemifields was further supported in an additional study with both patients and with TMS applied to intact brains (Agosta et al., 2017).

In summary, while there is evidence that each parietal lobe is involved in field-wide processing for some tasks, parietal areas also likely mediate the hemifield independence evident in some circumstances. Using ERP, Drew et al. (2014) found evidence that when a target crosses the vertical midline, say from the left to the right hemifield, the left hemisphere becomes involved shortly before the target reaches the right hemifield, and the right hemisphere remains involved for a short time after the crossing. Because this was modulated by predictability of the motion, it did not appear to be wholly mediated by the well-known overlap of the two hemispheres’ receptive fields at the midline. This phenomenon may reflect the normally-inhibited ipsilateral representation of the visual field by parietal cortices highlighted by Battelli et al. (2009), although this remains uncertain as the location the ERP signals originated from was not clear.

Regarding the cooperation between the hemifields necessary to keep tracking a target that travels from one hemifield to another, both Strong and Alvarez (2020) and Minami et al. (2019) found evidence for a tracking performance cost when a target in MOT crossed the vertical midline. Evidently the handoff of control from one hemisphere to another is somewhat inefficient. Saiki (2019) also found some evidence in a memory paradigm that when two objects moved between hemifields, memory for their features was more disrupted than when they moved from one quadrant to another within the same hemifield. Strong and Alvarez (2020) found no cost when targets moved between quadrants while remaining within a hemifield, an important finding given that other work raised the prospect of quadrant-specific resources (Carlson et al., 2007).

In summary, areas of the parietal cortex may subserve both the hemifield-specific tracking resource that dominates most MOT tasks and the possible resource that is not specific to a hemifield. More work must be done, however, to determine the role of frontal lobe regions. Such regions could potentially play a role in the hemifield-specific resource, the hemifield-independent resource, or both.

What else are hemifield-specific resources used for?

Multiple object tracking appears to be much like the spatial selection studied extensively in the visual attention literature, just applied to moving objects. Thus MOT suggests hemifield specificity of spatial attentional selection, and one should expect to find strong hemifield-specificity of other visual cognition tasks in addition to MOT.


Prior to the discovery that MOT is highly hemifield-specific, many researchers had already, for various tasks, compared performance when two stimuli are presented in the same hemifield to performance when they are presented in different hemifields. For example, Dimond and Beaumont (1971) found that the reporting of two briefly presented digits is more accurate when the digits are presented in different hemifields than in the same hemifield. However, this study and many like it did not include a single-stimulus condition, so we do not know how close the higher performance in the split condition comes to the one-target level of performance, and thus the degree of the different-hemifield advantage cannot be quantified. Moreover, many studies use response time as a measure, which can be difficult to interpret quantitatively (Awh and Pashler, 2000; Sereno and Kosslyn, 1991; Dimond and Beaumont, 1971).

Delvenne (2005) used both dual-target and single-target conditions in a visual working memory task. For spatial working memory, he estimated 40% hemifield independence, although unfortunately he used the discredited A' measure of performance (Zhang and Mueller, 2005) and did not space the stimuli widely enough to reduce the possibility of spatial interference. Nevertheless, the advantage appears to be large, and it did not occur for color working memory (Delvenne, 2012). More generally, only tasks with spatial demands seem to show much of a different-hemifield advantage (Holt and Delvenne, 2015; Umemoto et al., 2010).
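To make the logic of such estimates concrete, below is a minimal sketch of one way the degree of hemifield independence can be quantified once a single-target baseline is available, expressing the different-hemifield advantage as a percentage of the maximum advantage possible. The function and the numbers are my own illustration; they are not necessarily the exact calculation used by Delvenne (2005) or by the MOT studies discussed earlier.

```python
def percent_hemifield_independence(acc_single, acc_same, acc_diff):
    """Express the different-hemifield advantage as a percentage of the
    largest advantage possible (recovering the single-target level).

    0%   : two targets in different hemifields are as costly as two in the same hemifield
    100% : two targets in different hemifields are tracked as well as a single target
    """
    maximum_advantage = acc_single - acc_same
    observed_advantage = acc_diff - acc_same
    return 100 * observed_advantage / maximum_advantage

# Hypothetical accuracies, for illustration only
print(percent_hemifield_independence(acc_single=0.95, acc_same=0.75, acc_diff=0.87))  # about 60
```

On this kind of index, a study without a single-target condition simply cannot be scored, which is the problem with Dimond and Beaumont (1971) and similar early studies.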

Alvarez et al. (2012) studied visual search, with the stimuli to search arrayed bilaterally or unilaterally. They found only a small advantage of the vertical meridian split in a standard search task. However, in a subset search task, where participants knew the target would be located in one of several locations designated by a pre-trial cue, they found a large bilateral advantage. When the relevant locations were visually salient (due to a color difference), this advantage largely disappeared. These results, and those in the rest of the literature, suggest that hemifield advantages are strongest when spatial selection is critical.

Strong and Alvarez (2020) investigated working memory for stimuli that moved either within a hemifield or between hemifields. For between-hemifield movement, they found a substantial decrease in accuracy for the spatial task of remembering which positions of a 2x2 grid contained dots at the beginning of the trial, before the (empty) grid moved: 79% correct for between-hemifield movement versus 85% correct for within-hemifield movement. No such cost was found for color or identity memory tasks. This between-hemifield cost for spatial memory was similar to the cost they found for MOT itself.

This association between spatial tasks and a different-hemifield advantage may reflect a large-scale difference in how the brain processes spatial versus identity information. Famously, the dorsal stream to the parietal cortices is more concerned with spatial information than is the ventral stream, which is more involved in object recognition (Goodale and Milner, 1992). Neural responses in the dorsal pathway to parietal cortex are largely contralateral (Sereno et al., 2001), although as we have seen, this may depend on having stimuli in both hemifields. Largely contralateral responding is also found for other brain areas thought to contribute to a "saliency map" (Fecteau and Munoz, 2006), such as the frontal eye fields (Hagler Jr and Sereno, 2006), the superior colliculus (Schneider and Kastner, 2005), and the pulvinar (Cotton and Smith, 2007). In contrast, identity-related processing seems to involve more bilateral neural responses and connectivity between hemispheres (Cohen and Maunsell, 2011; Hemond et al., 2007).

Multiple identity tracking, which is discussed further in a later chapter, combines the location-updating demand of multiple object tracking with an additional requirement to maintain knowledge of which features belong to each of the objects. Across four experiments, Hudson et al. (2012) consistently found partial hemifield independence for this task, ranging from 26 to 37% (my calculations are here), with a paradigm that yielded 65% independence for MOT. This is consistent with the findings listed above in suggesting that spatial selection and/or location updating processes are much more hemisphere-specific than processes that require maintenance of non-spatial features.

Hemispheric differences

MOT and spatial selection appear to be limited by independent processing in the two hemispheres. Are these two hemispheres doing exactly the same thing, or do they have different strengths?

Functional differences between the left and right cerebral hemispheres can be attenuated at the behavioral level by the cross-hemisphere integration that occurs, for many tasks, prior to the stage that determines behavioral performance. Because tracking performance reflects a greater degree of hemispheric independence, it has a higher potential to reveal hemifield differences than most tasks do. As it turns out, however, while differences are observed, they do not seem to be large.

In each of four experiments conducted in 2013, we found either a trend toward or a statistically significant advantage for targets in the right hemifield (Figure A2) (Holcombe et al., 2014). This was also found by Strong and Alvarez (2020). The right hemifield advantage could be explained by the idea that stimuli presented to the right hemifield are processed by both hemispheres to a greater degree than are stimuli presented to the left hemifield — this is thought to be why left neglect is more common than right neglect. Specifically, it is thought that the right hemisphere mediates attention to both hemifields (Mesulam, 1999), such that the right hemifield is doubly represented. However, while Strong and Alvarez (2020) confirmed a right hemifield advantage in their MOT experiments, they found a left visual field advantage in their spatial working memory experiments, even though spatial working memory is also thought to be mediated by parietal cortex. Most strikingly, Matthews and Welch (2015) found a large advantage for temporal order judgments and simultaneity judgments for stimuli presented to the left hemifield. Neuropsychological evidence suggests that the left hemifield advantage for timing tasks reflects a specialization for timing in the right parietal cortex (Battelli et al., 2003), even though human specialization for language, which is thought to be in part an enhancement of timing and sequencing abilities, has occurred more in the left hemisphere.

The situation becomes more complex once one considers that subtle interactions between the two hemispheres seem to affect attention in each hemifield, as highlighted in the "underlying mechanisms" section above. One illustration of this is a recent finding by Edwards et al. (2021). They had participants perform MOT in one hemifield for a 30-minute session, and afterwards found that performance in the untrained hemifield had improved significantly. The reason for this is not clear, but it could reflect "fatigue" of the hemisphere contralateral to the trained hemifield, causing it to reduce its inhibition of the other hemisphere. Alternatively, the mechanism could be potentiation (an increase in gain) of the untrained hemisphere as a result of the deprivation, which may be the reason why depriving an eye results in increased cortical activity when that eye is stimulated later (Lunghi et al., 2011).


Knowing where but not what

What does a lay person mean when they say they are keeping track of something? If they are a parent referring to the rest of their family during an outing to a museum, that might mean knowing where their son is, where their daughter is, and where their spouse is. Notice that this suggests that they are aware of which person is where. But the conventional laboratory multiple object tracking task does not test people's knowledge of which target is where. The targets are typically all identical to each other, and people need to report where the targets are, but not which is which.

This chapter is about what people know about the objects they are tracking. The basic answer is: surprisingly little. We should break the question down, however, into two questions. The first question is about how position updating works - does it use differences between the distractors' and targets' features to help keep track of the targets? The second question is about the extent to which the features of targets are available to conscious awareness.

The first question: Does position updating benefit from differences in object identities?

Motion correspondence

For decades now, computer algorithms have been developed for object tracking and used to improve detection of intrusions and safety threats in industrial settings. They are also used in sports to analyze the movements of players in an opposing team's previous games, and in animal labs to monitor the movements of study subjects. When developing their tracking algorithms, engineers do not confine themselves to using only the locations and motions of objects - they also use the appearance of those objects, for example their shapes and colors. This helps the algorithm match objects across video frames (the correspondence problem, sometimes known in engineering as the "data association problem") (Yilmaz et al., 2006). This allows successful tracking in situations where location and motion alone would result in losing a target.
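As an illustration of the correspondence problem, here is a minimal sketch of one frame of data association in which candidate matches are scored by a cost that combines spatial distance with color similarity, in the spirit of the engineering approaches just cited. It is not the algorithm of any particular system; the weighting and the example coordinates are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, w_color=0.5):
    """Match existing tracks to the detections in a new video frame.

    tracks and detections are arrays of shape (n, 5): x, y, r, g, b.
    Each candidate pairing is scored by spatial distance plus a weighted
    color difference, and the Hungarian algorithm picks the cheapest
    overall matching.
    """
    pos_cost = np.linalg.norm(tracks[:, None, :2] - detections[None, :, :2], axis=-1)
    col_cost = np.linalg.norm(tracks[:, None, 2:] - detections[None, :, 2:], axis=-1)
    cost = pos_cost + w_color * col_cost
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Two targets cross paths; color resolves an ambiguity that position alone would get wrong.
tracks = np.array([[0.0, 0.0, 255, 0, 0],      # red target, previously on the left
                   [1.0, 0.0, 0, 0, 255]])     # blue target, previously on the right
detections = np.array([[0.9, 0.1, 255, 0, 0],  # red detection, now on the right
                       [0.1, 0.1, 0, 0, 255]]) # blue detection, now on the left
print(associate(tracks, detections))  # [(0, 0), (1, 1)]: matched by color, not proximity
```

With the color term set to zero, the nearest-neighbor solution would instead swap the two targets, which is exactly the kind of error that location-and-motion-only tracking makes when objects pass close to each other.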

Of course, the fact that object features would be useful for the brain to use in tracking does not necessarily mean that the brain does use them. The division of cortical visual signal processing into two streams, dorsal and ventral, hints that it might not. The dorsal stream, sometimes called the "where" pathway, specializes in motion and position processing, while the ventral stream, the "what" pathway, specializes in object recognition (Goodale and Milner, 1992). While the two pathways do interact, this division raises the possibility that position updating might not involve much processing of objects' features.

Over a century ago, Gestalt psychologists such as Max Wertheimer found that apparent motion was equally strong whether the objects in successive frames were identical or different (Wertheimer, 1912). Later studies found some effect of similarity, but the effect was weak (Kolers and Pomerantz, 1971; Burt and Sperling, 1981). These findings contributed to the now-dominant view that the visual system does not use feature similarity much when computing motion correspondence to update a moving object's position. Some caution is justified, however, because when the successively presented frames of an object touch or overlap with each other, rather than being presented in non-contiguous locations, the results can be different. The study of such displays, with a different object appearance (usually, shape) in two successive frames, is known as line motion or transformational apparent motion. These studies have found that feature similarity, especially contour continuity but also color, can decide which tokens are matched (Faubert and Von Grunau, 1995; Tse et al., 1998). Thus, feature similarity is involved in motion processing, even though in many situations motion correspondence is determined by spatiotemporal luminance relationships. An important characteristic of this process that does not seem to have been studied, however, is whether the more complex cues characterized by Tse et al. (1998) and others are processed in parallel. Short-range spatiotemporal luminance relationships ("motion energy") are known to be processed in parallel, by local detectors, yielding parallel visual search for a target moving in an odd direction defined by small-displacement apparent motion (Horowitz and Treisman, 1994). I am not aware of any studies that have investigated this for transformational apparent motion in a situation where the perceived motion direction is determined by feature similarity. Thus, the possibility remains that feature similarity has its influence through what I have called a C=1 process.

Feature differences, but not feature conjunction differences, benefit tracking

While only spatiotemporal luminance information, not other features, typically influences motion correspondence, another way that object featural information might benefit position tracking is via the action of feature attention. A clear case is when the targets differ in color from the distractors. It is well established that attention can select stimulus representations by their color. One can, for example, enhance the selection of all red objects in the visual field. Makovski and Jiang (2009) confirmed that this process can benefit MOT. They used eight moving objects, four of which were targets. MOT performance was better when the eight objects all differed in color than when they were identical. This was also true when the objects were all different in shape.

Apart from the usefulness of attending to an individual feature when the targets differ in that feature from the distractors, do feature differences benefit tracking in any other way? The answer seems to be no, because tests using feature conjunctions have found evidence that they do not help. In other paradigms, a large body of evidence has supported Treisman's theory that feature pairing information, in contrast to individual features, cannot efficiently guide attention to targets (Treisman and Gelade, 1980; Wolfe, 2021). The splitting of attention among multiple locations that occurs in MOT seems to be similar to how attention is diffused over multiple stimuli in visual search and other paradigms, because Makovski and Jiang (2009) found that targets having unique feature pairings do not benefit tracking performance. In their "feature conjunction" condition, each object had a unique pair of features, while it shared its individual features with at least one other object. Performance was no better in this condition than when the objects were all identical.

The second question: Are we aware of the identities of objects we are tracking?

We are of course aware of object features when the only task we are engaged in is tracking a single target. In that situation, our limited-capacity processes can all be applied to that target and, as there is only a single target, there is no need to maintain bindings between particular targets and particular features. However, when we are tracking multiple targets, the evidence indicates that we have little ability to report the objects' features, other than their locations, as we will see below.

A common view among lay people seems to be that we are simultaneously aware of the identities of all the objects in the central portion of our visual field, so unless an object actually disappears, moves to the edge of our visual field, or hides behind something or someone, we should always know where everything in the scene is, and we should immediately detect any changes to these objects. Change blindness demonstrations are often the first experience that disrupts this belief.

Experiments suggest that during change blindness tasks, although people cannot simultaneously monitor a very large number of objects for change, they are able to monitor several, perhaps four or five (Rensink, 2000). They appear to do this by loading the objects into working memory and then, in the second frame of a change blindness display, checking whether any are different from what is held in memory. People certainly can store several objects and rapidly compare these stored representations to the visual scene. However, loading the features of objects into memory for storage and subsequently comparing them to a new display with the objects in the same locations may have different demands than maintaining awareness of the changing features of such objects. For one thing, it appears that hundreds of milliseconds are needed to encode several objects into memory (Vogel et al., 2006; Ngiam et al., 2019). Second, it appears that when objects are in motion, updating of their features is particularly poor, as we will see.

When Zenon Pylyshyn published the first theory of multiple object tracking, he had already devised the concept of FINSTs (Fingers of Instantiation), a small set of discrete pointers allocated to tracked targets. The idea was that each discrete pointer allows other mental processes to individuate and link up with an object representation, with the continued assignment of a pointer to a target facilitating the representation of that object as the same persisting individual (Pylyshyn, 1989).

Pylyshyn's theory implied that when tracking multiple targets, people should know which target is which. Pylyshyn tested this and other predictions of his theory, and when the results turned out differently than expected, he published them in two papers. The first paper was entitled "Some puzzling findings in multiple object tracking: I. Tracking without keeping track of object identities". In one study in that paper, targets were assigned identities either by giving them names or by giving them distinct and recognizable starting positions: the four corners of the screen (Pylyshyn, 2004). At the end of a trial, participants had the usual task of indicating which objects were targets, but they were also asked about the identity of each target - which one it was. Accuracy at identifying which target was which was very low, even when accuracy at reporting the targets' positions was high.

More evidence for a disconnect between knowledge of what one is tracking and success at the basic MOT task was found by Horowitz et al. (2007), who had participants track targets with unique appearances - in one set of experiments, they were cartoon animals. All the targets moved behind occluders at the end of each trial so that their identities were no longer visible. Participants were asked where a particular target (say, the rabbit) had gone - that is, which occluder it was hiding behind. This type of task was dubbed "multiple identity tracking" by Oksama and Hyönä (2004). Performance was better than chance, but it was much worse than performance for reporting the target locations irrespective of which target was which. The effective number of objects tracked, as reflected in a standard MOT question, was about four, but for responses about the final location of a particular animal, capacity was estimated as closer to two objects.
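Estimates like these "effective numbers of objects tracked" are usually obtained by correcting raw accuracy for guessing. The corrections differ across papers, so the sketch below shows only one common high-threshold version; it illustrates the general logic and is not necessarily the formula used by Horowitz et al. (2007).

```python
def effective_number_tracked(accuracy, n_targets, guess_rate):
    """High-threshold guessing correction.

    Assumes k of the n targets are truly tracked (and reported correctly)
    while the rest are guessed at the chance rate:
        accuracy = k/n + (1 - k/n) * guess_rate
    Solving for k gives the estimate returned below.
    """
    return n_targets * (accuracy - guess_rate) / (1 - guess_rate)

# Illustrative numbers only: four targets, the probed animal could be behind
# any of eight occluders (guess rate 1/8), and observed accuracy is 60%.
print(round(effective_number_tracked(0.60, n_targets=4, guess_rate=1/8), 2))  # 2.17
```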

These MIT results suggest that our ability to update the location of one of multiple objects of interest is much better than our ability to maintain knowledge of what that object is. This harks back to Pylyshyn's idea that tracking is mediated by pointers that, in and of themselves, only point to locations and do not contain other featural information. Pylyshyn thought that these pointers, being unique and distinct, do provide us with knowledge of which target location at the end of a trial corresponded to a particular target at the beginning of a trial. However, his own experiments ruled against that - the tracking process seems to deploy something to the moving targets that carries absolutely no information about those targets other than their positions.

If we are given enough time, we certainly can update our representation not only of the locations of targets but also of their features. For example, in visual short-term memory experiments, on successive trials people memorize different location-feature mappings for several objects. Thus, if moving objects were simply to move very slowly, we should be able to update our awareness of what is where before any target travels more than a trivial distance. However, a recent study yielded some results that further showcase the limitations of our identity updating abilities.

Beaten by a bird brain

Pailian et al. (2020) conducted a test that was, at its core, similar to Pylyshyn's experiments, with identical objects assigned unique identities. In their experiments, however, Pailian et al. (2020) used a format like a hustler's "shell game". The engaging nature of the shell game format made it suitable for testing children and an African grey parrot as well as human adults. This led to a few surprises.

For their stimuli, Pailian et al. (2020) used colored balls of wool. Real balls of wool, not pictures on a screen. Between one and four of the balls were shown to a participant by the experimenter. The experimenter then covered the balls with inverted opaque plastic cups and began to move them, swapping the positions of first one pair, then another. After a variable number of pairs were swapped, the experimenter produced a probe ball of one of the target colors, and the participant's task was to point to (or peck on!) the cup containing the probed color.

At any one time, only two objects were in motion. The participants were responsible for knowing the final location of all the colors - there were no distractors. I think that many would have predicted that people would perform this task with high accuracy, especially given that only two objects were in motion at any one time and the experimenter paused for a full second between swaps, which ought to give people sufficient time to update their memory of the locations of those two colors.

When only two balls were used, the results were unsurprising: over 95% accuracy, even for four swaps, the highest number of swaps tested. This was true of all three types of participants tested: the human adults, the parrot, and the (6- to 8-year-old) children. In the three-ball condition, the children's performance was near ceiling in the zero-swap (no movement) condition, but fell to close to 80% correct with one swap and to around 70% correct for two and three swaps. The adults did better, but their performance still fell substantially with the number of swaps, to about 80% correct for four swaps. Remarkably, the parrot actually outperformed not only the children but also the human adults. Importantly, the parrot had not been trained extensively on the task, learning it primarily by simply viewing the experimenter and a confederate perform three example trials (the parrot was experienced with a simpler version of the task involving only one object presented under one of the three cups).

Figure 11: An African grey parrot participates in a shell game. From Pailian et al. (2020), CC-BY.

The biggest surprise here is that an African grey parrot was able to remember and update small sets of hidden moving objects to a level of accuracy similar to humans', despite having a much smaller brain than ours, less than one-fiftieth the size of our own in fact. Of course, this was an above-average parrot (selection bias surely was part of the reason it has been studied extensively), but still. Large parts of the parrot brain evolved after their lineage split from ours (Iwaniuk et al., 2005), so the fact that a parrot can do this task like us (or even better than us) appears to be an example of convergent evolution.

A second surprise was that the adult humans (in this case, Harvard undergraduates, who almost surely had above-average intelligence) displayed levels of accuracy that were not very high for the conditions that involved more than a few swaps. Remember that in these experiments, only two balls were moved at a time, and there was a one-second pause between swaps. Prior to the publication of this study, I had assumed that the reason for poor performance in multiple identity tracking was the difficulty of updating the identity of three or four targets simultaneously while they moved. I would have predicted that changing positions exclusively by swapping the positions of two objects, and providing a one-second pause between swaps, would keep performance very high. The Pailian et al. (2020) results suggest that updating the memory of object locations is quite demanding.

This finding was also surprising based on the long-popular concept of "object files" developed by Kahneman et al. (1992). The idea was that all the features of an object are associated with a representation in memory, the object file, that is maintained even as the object moves. Kahneman et al. (1992) showed a preview display with two rectangles, with a feature (in most experiments, a letter) presented in each. The featural information then disappeared, and the rectangles moved to a new location. The observer's representation of the display was then probed, for example by presenting a letter again in one of the rectangles and asking participants to identify it. Kahneman et al. (1992) found that if the letter was the same as the one presented in that rectangle at the beginning of the display, observers were faster to respond than if it had appeared in another rectangle at the beginning of the display, indicating that this aspect of the rectangle's initial properties was maintained, with its location updated. The focus in these studies was on simply demonstrating that this response time priming occurred at all, not on assessing how often it occurred.

Many researchers may have made the same mistake that I did of assuming that several object files could easily be maintained and updated. However, even in the original experiments of Kahneman et al. (1992), the amount of priming was greatly diminished when four letters were initially presented in different rectangles, indicating that fewer objects than that had their letter information maintained and updated. They concluded that there may be a severe capacity limit on object files or object file updating. This was also supported by a pioneering study by Saiki (2002), who had participants view a circular array of colored discs that rotated about the center of the screen. Occasionally discs swapped color when they briefly went behind occluders, and the participants' task was to detect these color switches. Performance decreased dramatically with speed and number of discs, even though the motion was completely predictable, and Saiki (2002) concluded that "even completely predictable motion severely reduces our capacity of object representations, from four to only one or two." Because we now understand that simple MOT does not work well across occluders, however, that interpretation of the study is limited by the absence of an MOT-type control. Nevertheless, the evidence from the studies in this chapter overall suggests that identity updating is very poor in a range of circumstances.

Some dissociations between identity and location processing reflect poor visibility in the periphery

Why is it that participants cannot update the identities of the moving objects that they are tracking nearly as well as they can update their positions? The results of an eye-tracking study by the Finnish researchers Lauri Oksama and Jukka Hyönä led them to conclude that identities are updated by a serial, one-by-one process. Eye movements during MOT were contrasted with eye movements during MIT, in which the targets and distractors were line drawings. During MIT, participants frequently looked directly at targets, for more than 50% of the trial duration, and frequently moved their eyes from one target to another. In contrast, during MOT, the participants moved their eyes infrequently, and their gaze usually was not on any of the moving objects; rather, they were more often looking somewhere close to the center of the screen. Oksama and Hyönä (2016) took these results to mean that the targets' identity-location bindings that must be updated during MIT are updated by a serial, one-by-one process, whereas target positions during MOT are updated by a parallel process.

A problem for interpreting the Oksama and Hyönä (2016) results is that participants may have had to update target identity information one by one purely due to limitations on human peripheral vision. That is, the targets (line drawings of different objects) likely were difficult to identify in the periphery. Thus, participants may have had to move their eyes to each object to refresh their representation of which was which. Indeed, in a subsequent study, Li et al. (2019) tested discriminability of the objects in the periphery and found that accuracy was poor. When colored discs were used as stimuli instead of line drawings, accuracy was higher in the periphery and participants did not move their eyes as often to individual targets. This suggests at least some degree of parallel processing, leaving the amount of serial processing, if any, in doubt, at least for simple colors.

Unfortunately, many findings of differences between MIT and MOT performance may be explained by poor recognition of the targets in the periphery. Because most studies of MIT do not include an assessment of how recognizable their stimuli are in the periphery (Li et al. (2019) is the only study I know of that did this), it is hard to say how much of the difference between MIT and MOT can be attributed to this. I am not sure how one should equate object localization with object identifiability. One could blur the objects to impair localization, but it is not clear what degree of spatial uncertainty is comparable to a particular level of object identifiability; this is the old apples-and-oranges problem.

One dissociation between identity and location tracking performance seems to remain valid regardless of the difficulty of perceiving object identities in the periphery. This is the original finding by Pylyshyn (1989), replicated by Cohen et al. (2011), that if targets are actually identical but are assigned different nominal identities, participants are very poor at knowing which is which at the end of the trial. Because in this paradigm there is no visible identity information, and participants knew this, poor vision in the periphery is not an issue.

Evidence from two techniques suggests parallel updating of identities

Howe and Ferguson (2015) used two techniques to investigate the possibility that serial processes are involved in multiple identity tracking. First, Howe et al. applied a simultaneous-sequential presentation technique that had previously yielded evidence for no serial processing in MOT (Howe et al., 2010a). In this technique, originally developed by Shiffrin and Gardner (1972) to investigate briefly presented stationary stimuli, the stimuli are presented either all at once (simultaneously) or in succession (sequentially) during a trial, with half the stimuli presented in the first interval and the other half in the second interval. The idea is that if a serial process is required to process each stimulus, performance should be better in the sequential condition: the presentation duration of each stimulus is equated across the two conditions, but in the simultaneous condition a one-by-one process would not have enough time to get through all the stimuli. The technique has been applied extensively to the detection of a particular alphanumeric character among other alphanumeric characters, and researchers have found that processing in the simultaneous condition is equal to or better than in the sequential condition (Shiffrin and Gardner, 1972; Hung et al., 1995), suggesting that at least four alphanumeric characters can be recognized in parallel.

For the MIT task, four targets of different colors moved among four distractors. Each of the four distractors was the same color as one of the targets, so that the targets as a group could not be distinguished from the distractors by color. In the simultaneous condition, all the objects moved for 500 ms and then paused for 500 ms, with this cycle repeating throughout the trial, whose duration varied randomly between 8 and 16 s. In the sequential condition, half the targets moved for 500 ms while the other half were stationary, and subsequently the other half of the targets moved for 500 ms while the first half remained stationary. This cycle repeated throughout the length of the trial. In two different versions of the experiment, performance was similar in the simultaneous and sequential conditions, supporting the conclusion that no serial process was required for the task. This conclusion from Howe and Ferguson (2015) is limited, however, by its assumption that any serial process could respond efficiently to the pause of half the targets by shifting its resources to the moving targets, while not causing any forgetting of the locations and identities of the temporarily stationary targets. To support this assumption, Howe and Ferguson (2015) pointed out that Hogendoorn et al. (2007) had shown that attention could move at much faster rates than 500 ms per shift. However, the Hogendoorn et al. (2007) studies did not assess the attention-shifting time between unrelated targets; rather, their shifts involved attention stepping along with a single target disc as it moved about a circular array. Thus, it is unclear how much the results of Howe and Ferguson (2015) undermine the serial, one-by-one identity updating idea embedded in the theories of Oksama & Hyönä and Lovett et al. (2019).

Howe and Ferguson (2015) further investigated serial versus parallel processing in MIT by using another technique: the systems factorial technology of Jim Townsend and colleagues (Townsend, 1990). Two targets were designated for tracking and presented in the same hemifield, so that any independence observed could not be attributed to the independence of the two hemispheres (Alvarez and Cavanagh, 2005). The participants were told to monitor both targets as they moved and, if either of them darkened, to press the response button as quickly as possible, after which all the disks stopped moving and the participant was asked to identify the location of a particular target, for example the green one (the objects were identical during the movement phase of the trial, but initially each was shown in a particular color). To ensure that participants performed the identity tracking task as well, only trials in which the participant reported the target identity correctly were included in the subsequent analysis. Detection of the darkening events was very accurate (95% correct). On different trials, either both targets darkened, one of them darkened, or neither of them darkened, and each could darken either by a small amount or by a large amount. Based on certain assumptions, the pattern of the response time distributions across the various conditions ruled out serial processing and implicated limited-capacity parallel processing. This suggests that participants can process luminance changes of two moving targets in parallel while also maintaining knowledge of the identity of the moving targets. One reservation is that it is unclear how often the participants needed to update the target locations and refresh their identities, because the rate at which the targets needed to be sampled to solve the correspondence problem is unclear for the particular trajectories used. It also would be good to see these techniques applied to targets defined only by distinct feature conjunctions, with no differences in features between the targets and the distractors. This would prevent any contribution of feature attention, and with processing of feature pairs likely to be more limited in capacity than the identification of individual features, the results might provide less evidence for parallel processing.
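For readers unfamiliar with systems factorial technology, the key quantity in such analyses is the survivor interaction contrast, computed from the response-time survivor functions S(t) in the four conditions created by factorially combining the salience (low, L, or high, H) of the change on each of the two targets:

$$\mathrm{SIC}(t) = \left[S_{LL}(t) - S_{LH}(t)\right] - \left[S_{HL}(t) - S_{HH}(t)\right]$$

In the standard framework of Townsend and colleagues, a serial self-terminating process predicts SIC(t) = 0 at all times, a serial exhaustive process predicts a negative-then-positive curve with equal areas, a parallel self-terminating process predicts an entirely positive curve, and a parallel exhaustive process an entirely negative one. I am summarizing the general method here rather than the specific analysis choices of Howe and Ferguson (2015).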

Eye movements can add a serial component to tracking

Partially in response to the evidence of Howe and Ferguson (2015) against serial processing in tracking, Oksama and Hyönä, with their colleague Jie Li, revised their Model of Multiple Identity Tracking (MOMIT) to involve more parallel processing, creating MOMIT 2.0. MOMIT 2.0 proposes that the "outputs of parallel processing are not non-indexed locations but proto-objects that contain both location and basic featural information, which can be sufficient for tracking in case no detailed information is required" (Li et al., 2019). This is a reasonable response to the evidence, even if it unfortunately means the theory no longer makes such strong predictions, as the role of serial processing is now more vague. In MOMIT 2.0, serial processing is tied to eye movements and is used to acquire detailed visual information for refreshing working memory representations. Li et al. (2019) wrote that this "prevents the resolution of the active representations from declining. This is vital for tracking targets that require high-resolution information to be identified and kept distinguishable from other targets." The theory seems to be silent on whether serial processing would be involved if fixation were enforced and the stimuli were easily identifiable in the periphery.

Here it is sensible to step back and consider the role of eye movements in more everyday behavior. People move their eyes on average three times a second, in part because, like many other animals, our retina has a specialized part (the fovea) that it is adaptive to direct at whatever object we are most interested in at the moment. In natural tasks, rarely is it the case that all visual signals of interest are clustered together enough that they can be processed adequately without eye movements. Moreover, animals such as ourselves have strong drives for exploration and vigilance because we evolved in changing environments.

Eye movements normally contribute a serial, one-by-one component to processing because, as Li et al. (2019) highlighted, high-resolution information comes from only a single region of the screen - that currently falling on the fovea. Near-continual scanning of the visual scene is a deeply ingrained habit, and is also necessary for many artificial tasks, like reading. Not only are saccades frequent during reading, one influential theory of reading proposes that an internal rhythm drives saccades from one word to the next, rather than them being triggered by the completion of a process such as word recognition (Engbert et al., 2002). Perhaps, then, one should expect frequent eye movements to occur and contribute a serial component of processing to a range of tasks even when eye movements are not necessary. People are cognitively "lazy" in that they seem to structure eye movements and other actions so as to minimize short-term memory requirements (Hayhoe et al., 1998). Thus, even when saccading to different targets is inefficient, because people could instead keep the information in memory and update it in the periphery, people move their eyes anyway.

To move the debate regarding the roles of serial and parallel processing forward, we should recognize that the frequent eye movements associated with natural behavior should be expected in tracking, as for any task involving multiple relevant stimuli, and this can contribute a serial processing component, even if people can perform the same task in a much more parallel fashion when eye movements are constrained. The most interesting evidence for serial processing, then, may be that found when eye movements are prohibited. The steep increase with load in apparent sampling frequency discovered by Holcombe and Chen (2013) constitutes some evidence for that.

To summarise this chapter, there is plenty of evidence that both the use of object identities in tracking and the updating of target identities for awareness are quite poor. This fits with a much broader set of findings over the last thirty years indicating that the mind maintains fewer explicit visual representations than we intuitively believe. Early failures to detect changes that occurred during eye movements (McConkie, 1979) led O'Regan (1992) to suggest that "the world is an outside memory" and later to help discover change blindness (Rensink et al., 1997). The idea was that the impression that one has a rich representation of all the objects in the visual field is an illusion; instead, one has only more limited knowledge, but it is quickly supplemented by attentional processing when one becomes interested in a particular location or object. It appears that O'Regan's big idea goes further than he anticipated. While O'Regan suggested that only when objects were attended would they be fully processed, he did not suggest that one might be able to track the changing locations of multiple targets without becoming aware of what they are. Such lack of awareness has been documented not only in the context of tracking multiple objects, as reviewed above, but even in a change blindness task that involved eyetracking of just a single target as part of an ongoing, more complex task (Triesch et al., 2003).


Abilities and individual differences

For understanding the abilities that underlie multiple object tracking, so far we have discussed only the classic experimental approach of manipulating different factors within participants. This has led to our present understanding of the roles of spatial interference, temporal interference (Holcombe and Chen, 2013), and the relationship to the processes underlying other tasks. However, a small but growing literature uses the individual-differences approach. In the individual-differences approach, the pattern of variation in scores on multiple tests is examined to see which abilities tend to go together. Those abilities that co-vary the most are thought likely to share more processes in common than those that don't.

Do people vary much in how many objects they can track?

Individual-difference studies can require more than ten times as many participants as a within-participants experimental design investigating a large effect (Schönbrodt and Perugini, 2013), but some studies do not use large samples. In addition to this shortcoming of the literature, there are two other very common pitfalls in MOT and MIT individual-difference studies.

Meyerhoff and Papenmeier (2020) tested fifty participants and for each one calculated the effective number of items tracked, for a display with four targets and four distractors. The modal effective number of items tracked was around two, but a substantial proportion of participants came in at three targets or one target tracked, and a few scored close to zero items tracked. Meyerhoff and Papenmeier (2020) concluded that some participants could track only one target or none, while others could track more. Unfortunately, however, there is no way to know how much of the variation between individuals is due to motivation rather than ability. Measuring motivation reliably is difficult or impossible. Researchers can, however, include attention checks or catch trials to allow exclusion of participants who show clear evidence of not reading the instructions carefully, or of frequently not paying attention. This is one pitfall — failing to include a measure that checks whether each participant is blowing off the task.

Oksama and Hyönä (2004) were also interested in how many objects people can track. They managed to test over two hundred participants, and like Meyerhoff and Papenmeier (2020) they found what appeared to be substantial variation in capacity, with some people able to track six objects, while many could track only two or even just one. However, no analyses were reported regarding the reliability of the MOT test. Their participants, who were provided by an air pilot recruitment program, were made up entirely of those who had scored in the top 11% on intelligence tests in a larger group. This provides some confidence that the participants were motivated.

While we can be fairly confident that Oksama and Hyönä (2004) used motivated participants, the study suffers from what I think of as the second pitfall - the failure to assess task reliability. On any test, a participant will tend to get somewhat different scores when tested on two different occasions, even if they did not learn anything from their first experience with the test. The extent to which participants' scores are similar when taking a test twice is known as test-retest reliability. Ideally, this is measured with two tests administered at very different times, but a more limited assessment is provided by dividing a single session's trials into two groups and calculating the correlation between those two groups, which is known as split-half reliability. Knowing the reliability allows us to calculate how much of the variation in scores between participants is expected based on the noisiness of the test. Without knowing the reliability, there remains the possibility that the extreme variation in scores, with some participants' data indicating that they could only track one target, could be due to limited reliability - extensive testing of these participants might reveal that their low score was merely a fluke.
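For concreteness, below is a minimal sketch of how a split-half reliability might be computed from a single session's data, with the Spearman-Brown correction applied because each half contains only half the trials. The particular split (odd versus even trials) and the accuracies are made up for illustration.

```python
import numpy as np

def split_half_reliability(scores_half1, scores_half2):
    """Split-half reliability with the Spearman-Brown correction.

    scores_half1 and scores_half2 hold each participant's mean accuracy
    on the two halves of the session (e.g., odd- and even-numbered trials).
    The raw correlation reflects a half-length test, so the Spearman-Brown
    formula steps it up to the reliability of the full-length test.
    """
    r_half = np.corrcoef(scores_half1, scores_half2)[0, 1]
    return 2 * r_half / (1 + r_half)

# Hypothetical accuracies for five participants, for illustration only
odd_trials = np.array([0.91, 0.72, 0.55, 0.80, 0.63])
even_trials = np.array([0.88, 0.75, 0.58, 0.77, 0.66])
print(round(split_half_reliability(odd_trials, even_trials), 2))  # about 0.99 for these made-up numbers
```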

Subsequent studies have assessed reliability, albeit only with the split-half measure rather than on separate days. Still, the reliabilities they have found for MOT are extremely impressive - 0.96 (Huang et al., 2012), 0.85 (Wilbiks and Beatteay, 2020), 0.92 (Treviño et al., 2021), and 0.87 (Eayrs and Lavie, 2018), near the highest of all the tests administered. Moreover, many basic cognitive and attentional tasks have notoriously low reliabilities (Hedge et al., 2018). Tasks with low reliabilities are not well suited for individual-differences studies - as mentioned above, individual-difference studies are largely based on measuring the pattern of correlations between tasks to reveal the relationships among abilities. The lower the reliability of a task, the harder it is to reliably measure its correlation with another task.

What do these high reliabilities mean for tracking? First, they suggest that the large individual differences observed by Oksama and Hyönä (2004) and others are real. Evidently, some young, healthy, high-intelligence people really can track only one target. Second, the high task reliability of MOT means that individual-difference studies are a viable avenue for gaining new insights about tracking and its relation to other mental abilities.

In the general population, ageing is likely a major source of individual differences in MOT, as older participants perform much worse than younger participants (Trick et al., 2005; Sekuler et al., 2008; Roudaia and Faubert, 2017). Using a task requiring participants to detect which of multiple objects had changed its trajectory, Kennedy et al. (2009) found a steep performance decline between 30 and 60 years — the effective number of trajectories tracked in a multiple trajectory tracking task dropped by about 20% with each decade of aging, which could not be explained by a drop in visual acuity. This is interesting in itself, and is something that theories of aging and attention ought to explain. This result must also color our interpretation of individual-difference studies using samples with a wide age range — some of the correlations with other tasks will likely be due to those abilities declining together rather than being linked in people of the same age. That is still useful for drawing inferences, but the inferences should perhaps be different than those from individual-difference studies of undergraduates.

Researchers have taken what one might call a wide-angle approach to MOT individual-difference studies, testing participants with a wide variety of tasks to see which mental abilities are linked. However, the first large-scale study concentrated on tasks typically thought of as attentional (Huang et al., 2012). Liqiang Huang and his colleagues used tests of conjunction search, configuration search, counting, feature access, spatial pattern, response selection, visual short-term memory, change blindness, Raven's test of intelligence, visual marking, attentional capture, consonance-driven orienting, inhibition of return, task switching, mental rotation, and Stroop. In a sample of Chinese university students (age not reported, but presumably mostly young adults), many of these tasks showed high reliabilities of over 0.9, meaning that there was potential for high inter-task correlations (inter-task correlations are limited by the reliabilities of the two tasks involved). However, the highest correlation of any task with MOT was 0.4. That task was counting, which required judging whether the number of dots in a brief (400 ms) display was odd or even. Change blindness, feature access, visual working memory, and visual marking were runners-up, with correlations of around 0.3.
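The sense in which reliabilities cap inter-task correlations is captured by the classical attenuation formula. Assuming the standard classical test theory model (my framing, not necessarily how Huang et al. reasoned), the correlation one observes between two tasks relates to the correlation between the underlying abilities as

$$r_{\text{observed}} = r_{\text{true}} \sqrt{r_{xx}\, r_{yy}},$$

where $r_{xx}$ and $r_{yy}$ are the two tasks' reliabilities. With reliabilities above 0.9, observed correlations can come close to the true correlations, so the modest correlations with MOT reported by Huang et al. (2012) are unlikely to be mere artifacts of measurement noise.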

That no task had a higher correlation is very interesting, but also disappointing. It is interesting because it suggests that MOT involves abilities distinct from those of several other tasks that have previously been lumped together with MOT as "attentional". It is disappointing, first, because it suggests that our theoretical understanding of these tasks is sorely lacking. It is also disappointing because the low correlations mean that it is hard to discern the pattern of correlations, e.g. from most correlated to least correlated with MOT - when the highest correlation is 0.4, one needs very narrow confidence intervals to be confident of the ordering of the tasks.

Treviño et al. (2021) reported data from a test of more than 400 participants, an opportunity sample of people aged 18 to 89. They tested a set of cognitive, attentional, and common neuropsychological tasks: arithmetic word problems, the trail-making task, digit span, digit symbol coding, letter cancellation, spatial span, approximate number sense, flanker interference, gradual-onset continuous performance, spatial configuration visual search, and visual working memory, as well as MOT. MOT had among the highest reliabilities, at 0.92. MOT performance had little correlation with performance on the task designed to measure sustained attention over an extended period of about five minutes, the gradual-onset continuous performance task (Fortenbaugh et al., 2015). This supports the tentative conclusion that the ability to sustain attention without lapses is not an important determinant of tracking performance.

In the Treviño et al. (2021) inventory, the task that most resembled the counting task found by Huang et al. (2012) to have a high correlation with MOT was an approximate number sense task, which had a moderate correlation of 0.3. The approximate number sense task differed from the counting task of Huang et al. (2012) by not testing the subitizing range (fewer than five items), which might help explain any discrepancy. Indeed, Eayrs and Lavie (2018) found, using hierarchical regression, that subitizing made a contribution to predicting MOT performance that was somewhat separate from that of an estimation task using larger set sizes.

The tasks with the highest correlations with MOT in the data of Treviño et al. (2021) were visual working memory, spatial span, letter cancellation, and digit symbol coding, all at around 0.5. As the authors pointed out, the letter cancellation and digit symbol coding tasks are complex tasks believed to reflect a number of abilities. This makes it hard to interpret their correlation with MOT. Spatial span and visual working memory are quite different from MOT, but similar to each other in that they both involve short-term memory for multiple visual stimuli.

Overall, there is a reasonable level of agreement across these individual-differences studies, as well as others not reviewed here, such as Trick et al. (2012). They agree that visual working memory has a robust correlation with MOT performance, which is particularly interesting because, superficially, MOT imposes little to no memory demand. Many researchers conceive of tracking as simply the simultaneous allocation of multifocal attention to multiple objects, with a process independent of memory causing the foci of attention to move along with the moving targets.

From the consistently strong correlation of MOT performance with visual working memory, it is tempting to conclude that mechanistically the two tasks are tightly linked. However, it must be remembered that working memory tasks are among the best predictors of performance on a wide range of tasks, including intelligence tests as well as the Stroop task, spatial cuing, and task switching (e.g., Redick and Engle, 2006).


Going deeper

Variation in multiple object tracking is unlikely to be caused by just one ability. From three decades of work, we now understand that tracking performance can be limited by spatial interference and temporal interference, as well as by less task-specific factors such as lapses of attention.

Unfortunately, no individual-difference study to date seems to have attempted to partial out the possible components of MOT (e.g., spatial interference versus temporal interference, as in Holcombe and Chen (2013)) to see whether they show different patterns of correlations with other tasks. In the realm of spatial interference with static stimuli, even studies with small sample sizes have revealed substantial individual differences (Petrov and Meleshkevich, 2011), such as larger crowding zones in some types of dyslexia (Joo et al., 2018). It is possible that these differences form a large part of inter-individual differences in MOT. There is also evidence that training with action video games can reduce spatial interference and improve reading ability (Bertoni et al., 2019), making it especially important to investigate spatial interference further.

With the growth of online testing, the sample sizes required for individual-difference studies have become easier to obtain, so individual differences are a promising future direction. However, researchers who are more familiar with the issues and analyses of within-subjects studies must be aware of the different issues that matter for individual-differences studies, such as the pitfalls reviewed at the beginning of this chapter.


Towards the real world

Some of the findings on multiple object tracking are useful for understanding real-world situations. A naive view of visual perception and attention is that we are simultaneously aware of the identities of all the objects in a scene, and some people, such as sports coaches, may think that unless a player actually disappears or hides behind something or someone, players should know where everyone in front of them on the basketball court or the soccer field is at all times. Similarly, during driving many people seem to assume that they are aware of all hazards in their visual field. The fact that this is not accurate has been incorporated into driver education (Fisher et al., 2006).

Change blindness demonstrations can rapidly dispel naive beliefs about visual awareness of change, but people still assume that if they are actively attending to a moving object, they will be aware of its features. As we saw in the chapter on knowing where but not what, this is not true, although tracking something does make it more likely that a change will be detected. This was supported in a simulated driving study by Lochner and Trick (2014), who found that changes were detected more accurately and rapidly when the change was made to a target vehicle rather than a distractor vehicle.

Few empirical studies, unfortunately, have made strong links between real-world situations and laboratory MOT tasks or their underlying abilities. Bowers et al. (2013) found that laboratory MOT performance did not predict driving test performance as well as the Montreal Cognitive Assessment, a trail-making task, or a subtest of a useful field-of-view task did. The aforementioned driving simulator study by Lochner and Trick (2014) found that drivers were more accurate at localizing which of multiple lead vehicles braked if it was a tracking target, but there was no advantage in terms of braking response time.

Mackenzie et al. (2021) used a multiple object avoidance (MOA) task in which the participant used a mouse to control one ball and had to prevent it from colliding with the other balls. The task is reminiscent of the venerable Asteroids game mentioned at the beginning of this book. Mackenzie et al. (2021) found strong correlations with performance on a driving simulator and with years of driving experience. In an earlier paper, some of these authors found that the MOA task correlated better with driving performance than conventional MOT did (Mackenzie and Harris, 2017).

In sports, some teams of researchers have repeatedly suggested that MOT performance predicts in-game performance in soccer and other sports, and have further reported evidence that training on MOT tasks can enhance sporting skill. Unfortunately, the associated evidence is not strong (Vater et al., 2021). Given the poor record of computer-based training tasks (sometimes called “brain training”) in improving skills in other real-world domains, one should remain skeptical that MOT training has benefits until rigorous evidence is provided (Simons et al., 2016).


Progress and recommendations

The study of multiple object tracking has traditionally been done in the laboratory, with only a few dozen participants in an experiment. Only recently, then, has one of the greatest virtues of the MOT task become clear: its high test-retest reliability, of approximately .8 or .9, among the highest of attentional tasks (see Chapter ). The outlook for future work, then, is positive, both because new results should be credible (with a non-noisy task, less data is needed to achieve high statistical power) and because MOT is well suited as a tool to study individual differences.
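
A reason this reliability matters so much for individual-differences work is the classical attenuation formula from psychometrics: the correlation one can observe between two measures is capped by the square root of the product of their reliabilities. A small illustration follows (the .60 reliability for the second task is a hypothetical value, used only for the arithmetic):

```python
# Classical attenuation formula (textbook psychometrics, not specific to MOT):
# r_observed = r_true * sqrt(rel_x * rel_y)
import math

def max_observable_r(rel_x, rel_y, r_true=1.0):
    return r_true * math.sqrt(rel_x * rel_y)

# With MOT reliability around .85 and a second task at .60 (hypothetical),
# even a perfect true relationship could not appear stronger than about .71.
print(round(max_observable_r(0.85, 0.60), 2))
```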

To further highlight how much we have learned in the last thirty-odd years, next I will describe how the findings have allowed us to disconfirm multiple aspects of the first and most influential theory of tracking, Zenon Pylyshyn's FINST theory.

The decline of Pylyshyn’s FINST theory

At the time of writing, Pylyshyn’s FINST theory is the featured theory on the“multiple object tracking” Wikipedia page (Editors, 2021), and it is frequentlyinvoked in the scholarly literature as well. However, some of the main featuresof the theory have been rebutted (Scholl, 2008).

Core to the theory is the idea that tracking is mediated by a fixed set of discrete, preattentive indices. As we have seen, however, as speed increases, the number of targets that can be tracked decreases, eventually to just one, which is hard to explain with a fixed set of discrete indices (Alvarez and Franconeri, 2007; Holcombe and Chen, 2012).

Adding to the evidence that tracking draws on attentional resources are dual-task studies. I did not have space to review them in this Element, but they indicate that tracking draws on resources shared with some other tasks (Oksama and Hyönä, 2016; Alnaes et al., 2014). One should add, however, that such studies do not seem to have ruled out the possibility that these findings were caused entirely by a C=1 resource rather than by the hemifield-specific tracking processes; unfortunately, none of these studies appear to have tested for the hemifield specificity of their findings.

One prediction of Pylyshyn's theory was that participants should be aware of which target is which among the targets they are tracking. Pylyshyn himself found strong evidence against this, and the evidence for poor updating of target identities has increased since then.

Topics not covered by this book

Many topics that I originally planned to cover could not be included in this Element (I was given a word limit by the publisher). Some of the most important are whether tracking operates in a retinotopic, spatiotopic, or configural representation (see Yantis, 1992; Bill et al., 2020; Howe et al., 2010b; Meyerhoff et al., 2015; Liu et al., 2005; Maechler et al., 2021), the role of distractors and possible distractor suppression, the role of surface features (Papenmeier et al., 2014), and the literature on dual-task paradigms, although a few dual-task papers were mentioned along the way (e.g., Alvarez et al., 2005).

If your favorite topic is not covered, you can perhaps take some consolation in the fact that my own favorite topic, the temporal limits on tracking (Holcombe and Chen, 2013; Roudaia and Faubert, 2017; Howard et al., 2011), also was not covered. I hope to address this topic, which has major implications for what the tracking resource actually does during tracking and for whether processing is serial or parallel, in a separate manuscript.

• For computational modelling, I don’t recommend modelling only datafrom standard, fairly unconstrained trajectories, as such data may notconstrain models enough. Show your model successfully mimics both spa-tial crowding and temporal interference effects, including that temporalinterference is more resource-intensive than spatial interference. Anotherstrategy is to model large sets of data at an individual trial level; previousefforts seem to have modelled data from only one or a few different MOTexperiments.

Recommendations

Here are a few recommendations for multiple object tracking researchers that have emerged from our journey through the literature:

• To dilute the influence of C=1 processes, use several targets, not just two or three. But remember that even with several targets, a small effect could be explained by a C=1 process, so consider that possibility. Test for hemifield specificity, as that can also rule out C=1 processes.

• Always test for hemifield specificity! Besides helping to rule out the possibility that a factor has its effect only on C=1 processes, such tests are valuable because we know very little about which limited-capacity brain processes are hemisphere-specific, so any results here are likely to be interesting.

• For computational modelling, I don't recommend modelling only data from standard, fairly unconstrained trajectories, as such data may not constrain models enough. Show that your model successfully mimics both spatial crowding and temporal interference effects, including that temporal interference is more resource-intensive than spatial interference. Another strategy is to model large sets of data at the individual-trial level; previous efforts seem to have modelled data from only one or a few different MOT experiments. A minimal sketch of what trial-level fitting could look like follows this list.
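
The sketch below shows one way to fit a model at the individual-trial level by maximum likelihood. The logistic link and the predictors (per-trial spacing and speed) are my assumptions for illustration, not a model proposed in the literature, and the simulated trials stand in for real data.

```python
# Illustrative trial-level fitting only; the logistic form and the predictors
# 'spacing' and 'speed' are assumptions, not a published MOT model.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, spacing, speed, correct):
    b0, b_spacing, b_speed = params
    # Probability correct on each trial via a logistic link
    p = 1.0 / (1.0 + np.exp(-(b0 + b_spacing * spacing + b_speed * speed)))
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against log(0)
    return -np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))

# Hypothetical data: one row per trial
rng = np.random.default_rng(1)
n_trials = 2000
spacing = rng.uniform(1, 6, n_trials)    # deg of visual angle
speed = rng.uniform(0.2, 2.0, n_trials)  # revolutions per second
true_p = 1 / (1 + np.exp(-(-2 + 0.5 * spacing - 1.0 * speed)))
correct = rng.binomial(1, true_p)

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0],
               args=(spacing, speed, correct), method="Nelder-Mead")
print(fit.x)  # recovered parameters should land near (-2, 0.5, -1.0)
```

Fitting every trial rather than condition means exploits all the information about how spacing and speed jointly constrain performance, which is exactly where competing models tend to disagree.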


Bibliography

Aczel, B., Palfi, B., Szollosi, A., Kovacs, M., Szaszi, B., Szecsi, P., Zrubka, M., Gronau, Q. F., van den Bergh, D., and Wagenmakers, E.-J. (2018). Quantifying Support for the Null Hypothesis in Psychology: An Empirical Investigation. Advances in Methods and Practices in Psychological Science, 1(3):357–366.

Agosta, S., Magnago, D., Tyler, S., Grossman, E., Galante, E., Ferraro, F., Mazzini, N., Miceli, G., and Battelli, L. (2017). The Pivotal Role of the Right Parietal Lobe in Temporal Attention. Journal of Cognitive Neuroscience, 29(5):805–815.

Allport, A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. Perspectives on perception and action, 15:395–419.

Alnaes, D., Sneve, M. H., Espeseth, T., Pieter, S. H., and Laeng, B. (2014). Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. Journal of Vision, 14:1–20.

Alvarez, G., Horowitz, T., Arsenio, H., DiMase, J., and Wolfe, J. (2005). Do multielement visual tracking and visual search draw continuously on the same visual attention resources? Journal of Experimental Psychology, 31(4):643–667.

Alvarez, G. and Scholl, B. J. (2005). How Does Attention Select and Track Spatially Extended Objects? New Effects of Attentional Concentration and Amplification. Journal of Experimental Psychology: General, 134(4):461–476.

Alvarez, G. A. and Cavanagh, P. (2005). Independent resources for attentional tracking in the left and right visual hemifields. Psychological Science, 16(8):637–643.

Alvarez, G. A. and Franconeri, S. L. (2007). How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. Journal of Vision, 7(13):14.1–10.


Alvarez, G. A., Gill, J., and Cavanagh, P. (2012). Anatomical constraints on attention: Hemifield independence is a signature of multifocal spatial selection. Journal of Vision, 12(5):9–9.

Alvarez, G. A. and Oliva, A. (2009). Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proceedings of the National Academy of Sciences of the United States of America, 106(18):7345–7350.

Alzahabi, R. and Cain, M. S. (2021). Ensemble perception during multiple-object tracking. Attention, Perception, & Psychophysics, 83(3):1263–1274.

Anstis, S. (1990). Imperceptible intersections: The chopstick illusion. In AI and the Eye. John Wiley, London.

Anstis, S. and Ito, H. (2010). Eyes pursue moving objects, not retinal motion signals. Perception, 39(10):1408–1411.

Awh, E. and Pashler, H. (2000). Evidence for split attentional foci. Journal of experimental psychology. Human perception and performance, 26(2):834–46.

Battelli, L., Alvarez, G. A., Carlson, T., and Pascual-Leone, A. (2009). The role of the parietal lobe in visual extinction studied with transcranial magnetic stimulation. Journal of cognitive neuroscience, 21(10):1946–55.

Battelli, L., Cavanagh, P., Intriligator, J., Tramo, M. J., and Barton, J. J. S. (2001). Unilateral Right Parietal Damage Leads to Bilateral Deficit for High-Level Motion. Neuron, 32(1992):985–995.

Battelli, L., Cavanagh, P., Martini, P., and Barton, J. J. S. (2003). Bilateral deficits of transient visual attention in right parietal patients. Brain: A Journal of Neurology, 126(Pt 10):2164–74.

Bertoni, S., Franceschini, S., Ronconi, L., Gori, S., and Facoetti, A. (2019). Is excessive visual crowding causally linked to developmental dyslexia? Neuropsychologia, 130:107–117.

Bettencourt, K. C., Michalka, S. W., and Somers, D. C. (2011). Shared filtering processes link attentional and visual short-term memory capacity limits. Journal of Vision, 11(10):22–22.

Bex, P. J., Dakin, S. C., and Simmers, A. J. (2003). The shape and size of crowding for moving targets. Vision Research, 43(27):2895–2904.

Bill, J., Pailian, H., Gershman, S. J., and Drugowitsch, J. (2020). Hierarchical structure is employed by humans during visual motion perception. Proceedings of the National Academy of Sciences.

Bouma, H. (1970). Interaction effects in parafoveal letter recognition. Nature, 226(5241):177–178.


Bowers, A. R., Anastasio, R. J., Sheldon, S. S., O'Connor, M. G., Hollis, A. M., Howe, P. D., and Horowitz, T. S. (2013). Can we improve clinical prediction of at-risk older drivers? Accident Analysis & Prevention.

Burt, P. and Sperling, G. (1981). Time, distance, and feature trade-offs in visual apparent motion. Psychological review, 88(2):171.

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., and Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(May).

Carlson, T., Alvarez, G., and Cavanagh, P. (2007). Quadrantic deficit reveals anatomical constraints on selection. Proceedings of the National Academy of Sciences of the United States of America, 104(33):13496–500.

Chen, W.-Y., Howe, P. D., and Holcombe, A. O. (2013). Resource demands of object tracking and differential allocation of the resource. Attention, perception & psychophysics, 75(4):710–25.

Chesney, D. L. and Haladjian, H. H. (2011). Evidence for a shared mechanism used in multiple-object tracking and subitizing. Attention, Perception, & Psychophysics, 73(8):2457–2480.

Chin, J. M., Pickett, J. T., Vazire, S., and Holcombe, A. O. (2021). Questionable Research Practices and Open Science in Quantitative Criminology. Journal of Quantitative Criminology.

Clark, A. (2009). Location, Location, Location. In Cognition, Computation, and Pylyshyn. MIT Press.

Cohen, M., Pinto, Y., Howe, P. D. L., and Horowitz, T. S. (2011). The what-where trade-off in multiple-identity tracking. Attention, perception & psychophysics, 73(5):1422–34.

Cohen, M. R. and Maunsell, J. H. (2011). Using neuronal populations to study the mechanisms underlying spatial and feature attention. Neuron, 70(6):1192–1204.

Cotton, P. L. and Smith, A. T. (2007). Contralateral Visual Hemifield Representations in the Human Pulvinar Nucleus. Journal of Neurophysiology, 98(3):1600–1609.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and brain sciences, 24(1):87–114.

Crowe, E. M., Howard, C. J., Attwood, A. S., and Kent, C. (2019). Goal-directed unequal attention allocation during multiple object tracking. Attention, Perception, & Psychophysics, 81(5):1312–1326.


Culham, J. C., Cavanagh, P., and Kanwisher, N. G. (2001). Attention response functions: Characterizing brain areas using fMRI activation during parametric variations of attentional load. Neuron, 32(4):737–45.

Davis, G. and Holmes, A. (2005). Reversal of object-based benefits in visual attention. Visual Cognition, 12(5):817–846.

Delvenne, J. (2012). Visual short-term memory and the bilateral field advantage. In Short-Term Memory: New Research. Nova Publishers.

Delvenne, J.-F. (2005). The capacity of visual short-term memory within and between hemifields. Cognition, 96(3):B79–88.

Dimond, S. and Beaumont, G. (1971). Use of two cerebral hemispheres to increase brain capacity. Nature, 232(5308):270–271.

Doran, M. M. and Hoffman, J. E. (2010). The role of visual attention in multiple object tracking: Evidence from ERPs. Attention, Perception, & Psychophysics, 72(1):33–52.

Drew, T., Mance, I., Horowitz, T. S., Wolfe, J. M., and Vogel, E. K. (2014). A soft handoff of attention between cerebral hemispheres. Current Biology, 24(10):1133–1137.

Eayrs, J. and Lavie, N. (2018). Establishing individual differences in perceptual capacity. Journal of Experimental Psychology: Human Perception and Performance, 44(8):1240.

Editors, W. (2021). Multiple object tracking. Wikipedia.

Edwards, G., Berestova, A., and Battelli, L. (2021). Behavioral gain following isolation of attention. Scientific Reports, 11(1):19329.

Egly, R., Driver, J., and Rafal, R. D. (1994). Shifting visual attention between objects and locations: Evidence from normal and parietal lesion subjects. Journal of Experimental Psychology: General, 123(2):161.

Engbert, R., Longtin, A., and Kliegl, R. (2002). A dynamical model of saccade generation in reading based on spatially distributed lexical processing. Vision Research, 42(5):621–636.

Falkner, A. L., Krishna, B. S., and Goldberg, M. E. (2010). Surround suppression sharpens the priority map in the lateral intraparietal area. The Journal of neuroscience: the official journal of the Society for Neuroscience, 30(38):12787–97.

Faubert, J. and Von Grunau, M. (1995). The influence of two spatially distinct primers and attribute priming on motion induction. Vision Research, 35(22):3119–30.


Fecteau, J. and Munoz, D. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10(8):382–390.

Fehd, H. M. and Seiffert, A. E. (2008). Eye movements during multiple object tracking: Where do participants look? Cognition, 108(1):201–209.

Feria, C. S. (2013). Speed has an effect on multiple-object tracking independently of the number of close encounters between targets and distractors. Attention, perception & psychophysics, 75(1):53–67.

Fisher, D. L., Pollatsek, A. P., and Pradhan, A. (2006). Can novice drivers be trained to scan for information that will reduce their likelihood of a crash? Injury Prevention, 12(suppl 1):i25–i29.

Fortenbaugh, F. C., DeGutis, J., Germine, L., Wilmer, J. B., Grosso, M., Russo, K., and Esterman, M. (2015). Sustained Attention Across the Life Span in a Sample of 10,000: Dissociating Ability and Strategy. Psychological Science, 26(9):1497–1510.

Fougnie, D. and Marois, R. (2006). Distinct Capacity Limits for Attention and Working Memory: Evidence From Attentive Tracking and Visual Working Memory Paradigms. Psychological Science, 17(6):526–534.

Francis, G. and Thunell, E. (2022). Excess Success in Articles on Object-Based Attention. Attention, Perception & Psychophysics.

Franconeri, S. L. (2013). The nature and status of visual resources. In Reisberg, D., editor, Oxford Handbook of Cognitive Psychology, volume 8481. Oxford University Press.

Franconeri, S. L., Alvarez, G. A., and Cavanagh, P. (2013a). Flexible cognitive resources: Competitive content maps for attention and memory. Trends in Cognitive Sciences, 17(3):134–141.

Franconeri, S. L., Alvarez, G. A., and Cavanagh, P. (2013b). Resource theory is not a theory: A reply to Holcombe.

Franconeri, S. L., Jonathan, S. V., and Scimeca, J. M. (2010). Tracking multiple objects is limited only by object spacing, not by speed, time, or capacity. Psychological science, 21(7):920–925.

Franconeri, S. L., Lin, J. Y., Pylyshyn, Z. W., Fisher, B., and Enns, J. T. (2008). Evidence against a speed limit in multiple-object tracking. Psychonomic Bulletin & Review, 15(4):802–808.

Goodale, M. A. and Milner, A. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15(1):20–25.

Gurnsey, R., Roddy, G., and Chanab, W. (2011). Crowding is size and eccentricity dependent. Journal of vision, 11:1–17.


Hagler Jr, D. J. and Sereno, M. I. (2006). Spatial maps in frontal and prefrontal cortex. Neuroimage, 29(2):567–577.

Harrison, W. J., Ayeni, A. J., and Bex, P. J. (2019). Attentional selection and illusory surface appearance. Scientific Reports, 9(1):2227.

Harrison, W. J. and Rideaux, R. (2019). Voluntary control of illusory contour formation. Attention, Perception, & Psychophysics, 81(5):1522–1531.

Hayhoe, M. M., Bensinger, D. G., and Ballard, D. H. (1998). Task constraints in visual working memory. Vision research, 38(1):125–137.

Hedge, C., Powell, G., and Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3):1166–1186.

Hemond, C. C., Kanwisher, N. G., and Op de Beeck, H. P. (2007). A Preference for Contralateral Stimuli in Human Object- and Face-Selective Cortex. PLoS ONE, 2(6):e574.

Hogendoorn, H., Carlson, T. A., and Verstraten, F. A. (2007). The time course of attentive tracking. Journal of Vision, 7(14):2–2.

Holcombe, A. O. (2019). Comment: Capacity limits are caused by a finite resource, not spatial competition.

Holcombe, A. O., Chen, W., and Howe, P. D. L. (2014). Object tracking: Absence of long-range spatial interference supports resource theories. Journal of Vision, 14(6):1–21.

Holcombe, A. O. and Chen, W.-Y. (2012). Exhausting attentional tracking resources with a single fast-moving object. Cognition, 123(2).

Holcombe, A. O. and Chen, W.-Y. (2013). Splitting attention reduces temporal resolution from 7 Hz for tracking one object to <3 Hz when tracking three. Journal of Vision, 13(1):1–19.

Holt, J. L. and Delvenne, J.-F. (2015). A bilateral advantage for maintaining objects in visual short term memory. Acta Psychologica, 154:54–61.

Horowitz, T. and Treisman, A. (1994). Attention and apparent motion. Spatial Vision, 8(2):193–220.

Horowitz, T. S., Klieger, S. B., Fencsik, D. E., Yang, K. K., Alvarez, G. A., and Wolfe, J. M. (2007). Tracking unique objects. Perception & psychophysics, 69(2):172–84.

Howard, C. J., Masom, D., and Holcombe, A. O. (2011). Position representations lag behind targets in multiple object tracking. Vision research, pages 1–13.


Howe, P. D., Horowitz, T. S., Wolfe, J., and Livingstone, M. S. (2009). Using fMRI to distinguish components of the multiple object tracking task. Journal of Vision, 9(4):1–11.

Howe, P. D., Incledon, N. C., and Little, D. R. (2012). Can attention be confined to just part of a moving object? Revisiting target-distractor merging in multiple object tracking. PloS one, 7(7):e41491.

Howe, P. D. L., Cohen, M. A., and Horowitz, T. S. (2010a). Distinguishing between parallel and serial accounts of multiple object tracking. Journal of Vision, 10:1–13.

Howe, P. D. L. and Ferguson, A. (2015). The Identity-Location Binding Problem. Cognitive Science, 39(7):1622–1645.

Howe, P. D. L., Holcombe, A. O., Lapierre, M. D., and Cropper, S. J. (2013). Visually tracking and localizing expanding and contracting objects. Perception, 42(12):1281–1300.

Howe, P. D. L., Pinto, Y., and Horowitz, T. S. (2010b). The coordinate systems used in visual tracking. Vision research, 50(23):2375–80.

Huang, L., Mo, L., and Li, Y. (2012). Measuring the interrelations among multiple paradigms of visual attention: An individual differences approach. Journal of experimental psychology: human perception and performance, 38(2):414.

Hudson, C., Howe, P. D., and Little, D. R. (2012). Hemifield effects in multiple identity tracking. PloS one, 7(8):e43796.

Hung, G. K., Wilder, J., Curry, R., and Julesz, B. (1995). Simultaneous better than sequential for brief presentations. Journal of the Optical Society of America. A, Optics, image science, and vision, 12(3):441–9.

Hyönä, J., Li, J., and Oksama, L. (2019). Eye Behavior During Multiple Object Tracking and Multiple Identity Tracking. Vision, 3(3):37.

Intriligator, J. and Cavanagh, P. (2001). The spatial resolution of visual attention. Cognitive Psychology, 43(3):171–216.

Iwaniuk, A. N., Dean, K. M., and Nelson, J. E. (2005). Interspecific allometry of the brain and brain regions in parrots (Psittaciformes): Comparisons with other birds and primates. Brain, Behavior and Evolution, 65(1):40–59.

Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, pages 201–211.

John, L. K., Loewenstein, G., and Prelec, D. (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science, 23(5):524–532.


Joo, S. J., White, A. L., Strodtman, D. J., and Yeatman, J. D. (2018). Optimizing text for an individual's visual system: The contribution of visual crowding to reading difficulties. Cortex, 103:291–301.

Jovicich, J., Peters, R. J., Koch, C., Braun, J., Chang, L., and Ernst, T. (2001). Brain areas specific for attentional load in a motion-tracking task. Journal of cognitive neuroscience, 13(8):1048–58.

Kahneman, D., Treisman, A., and Gibbs, B. J. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24(2):175–219.

Kennedy, G. J., Tripathy, S. P., and Barrett, B. T. (2009). Early age-related decline in the effective number of trajectories tracked in adult human vision. Journal of Vision, 9(2):21–21.

Kimchi, R. and Peterson, M. A. (2008). Figure-ground segmentation can occur without attention. Psychological Science, 19(7):660–668.

Kolers, P. A. and Pomerantz, J. R. (1971). Figural change in apparent motion. Journal of Experimental Psychology, 87(1):99.

Korte, W. (1923). Über die Gestaltauffassung im indirekten Sehen. Zeitschrift für Psychologie, 93:17–82.

Kubovy, M., Holcombe, A. O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity. Cognitive psychology, 35(1):71–98.

Li, J., Oksama, L., and Hyönä, J. (2019). Model of Multiple Identity Tracking (MOMIT) 2.0: Resolving the serial vs. parallel controversy in tracking. Cognition, 182:260–274.

Liu, G., Austen, E. L., Booth, K. S., Fisher, B. D., Argue, R., Rempel, M. I., and Enns, J. T. (2005). Multiple-Object Tracking Is Based on Scene, Not Retinal, Coordinates. Journal of experimental psychology. Human perception and performance, 31(2):235–247.

Liu, T., Jiang, Y., Sun, X., and He, S. (2009). Reduction of the crowding effect in spatially adjacent but cortically remote visual stimuli. Current biology, 19(2):127–32.

Lo, S.-Y. and Holcombe, A. O. (2014). How do we select multiple features? Transient costs for selecting two colors rather than one, persistent costs for color–location conjunctions. Attention, Perception, & Psychophysics, 76(2):304–321.

Lochner, M. J. and Trick, L. M. (2014). Multiple-object tracking while driving: The multiple-vehicle tracking task. Attention, perception & psychophysics.


Lou, H., Lorist, M. M., and Pilz, K. S. (2020). Individual differences in the temporal dynamics of object-based attention and the rhythmic sampling of visual space.

Lovett, A., Bridewell, W., and Bello, P. (2019). Selection enables enhancement: An integrated model of object tracking. Journal of Vision, 19(14):23.

Luck, S. J., Hillyard, S. A., Mangun, G. R., and Gazzaniga, M. S. (1989). Independent hemispheric attentional systems mediate visual search in split-brain patients. Nature, 342(6249):543–545.

Luck, S. J., Hillyard, S. A., Mangun, G. R., and Gazzaniga, M. S. (1994). Independent attentional scanning in the separated hemispheres of split-brain patients. Journal of cognitive neuroscience, 6(1):84–91.

Lukavský, J. (2013). Eye movements in repeated multiple object tracking. Journal of Vision, 13(7):9–9.

Lunghi, C., Burr, D. C., and Morrone, C. (2011). Brief periods of monocular deprivation disrupt ocular balance in human adult visual cortex. Current Biology, 21(14):R538–R539.

Mackenzie, A. K. and Harris, J. M. (2017). A link between attentional function, effective eye movements, and driving ability. Journal of experimental psychology: human perception and performance, 43(2):381.

Mackenzie, A. K., Vernon, M. L., Cox, P. R., Crundall, D., Daly, R. C., Guest, D., Muhl-Richardson, A., and Howard, C. J. (2021). The Multiple Object Avoidance (MOA) task measures attention for action: Evidence from driving and sport. Behavior Research Methods.

Maechler, M. R., Cavanagh, P., and Tse, P. U. (2021). Attentional tracking takes place over perceived rather than veridical positions. Attention, Perception, & Psychophysics.

Makovski, T. and Jiang, Y. V. (2009). Feature binding in attentive tracking of distinct objects. Visual Cognition, 17(1-2):180–194.

Mareschal, I., Morgan, M. J., and Solomon, J. A. (2010). Attentional modulation of crowding. Vision Research, 50(8):805–809.

Maruya, K., Holcombe, A. O., and Nishida, S. (2013). Rapid encoding of relationships between spatially remote motion signals. Journal of Vision, 13(4):1–20.

Matthews, N. and Welch, L. (2015). Left visual field attentional advantage in judging simultaneity and temporal order. Journal of Vision, 15(2):7.

McConkie, G. W. (1979). On the role and control of eye movements in reading. In Processing of Visible Language, pages 37–48. Springer.


Merkel, C., Stoppel, C. M., Hillyard, S. A., Heinze, H.-J., Hopf, J.-M., and Schoenfeld, M. A. (2014). Spatio-temporal patterns of brain activity distinguish strategies of multiple-object tracking. Journal of cognitive neuroscience, 26(1):28–40.

Mesulam, M.-M. (1999). Spatial attention and neglect: Parietal, frontal and cingulate contributions to the mental representation and attentional targeting of salient extrapersonal events. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 354(1387):1325–1346.

Meyerhoff, H. S. and Papenmeier, F. (2020). Individual differences in visual attention: A short, reliable, open-source, and multilingual test of multiple object tracking in PsychoPy. Behavior Research Methods, 52(6):2556–2566.

Meyerhoff, H. S., Papenmeier, F., Jahn, G., and Huff, M. (2015). Distractor Locations Influence Multiple Object Tracking Beyond Interobject Spacing: Evidence From Equidistant Distractor Displacements. Experimental Psychology, 62(3):170–180.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological review, 63(2):81.

Minami, T., Shinkai, T., and Nakauchi, S. (2019). Hemifield Crossings during Multiple Object Tracking Affect Task Performance and Steady-State Visual Evoked Potentials. Neuroscience, 409:162–168.

Nakayama, K., He, Z. J., and Shimojo, S. (1995). Visual surface representation: A critical link between lower-level and higher-level vision. Visual cognition: An invitation to cognitive science, 2:1–70.

Nakayama, R. and Holcombe, A. O. (2021). A dynamic noise background reveals perceptual motion extrapolation: The twinkle-goes illusion. Journal of Vision, 21(11):14.

Neisser, U. (1963). Decision-Time without Reaction-Time: Experiments in Visual Scanning. The American Journal of Psychology, 76(3):376.

Ngiam, W. X., Khaw, K. L., Holcombe, A. O., and Goodbourn, P. T. (2019). Visual working memory for letters varies with familiarity but not complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(10):1761.

Norman, D. A. and Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7:44–64.

Nummenmaa, L., Oksama, L., Glerean, E., and Hyönä, J. (2017). Cortical Circuit for Binding Object Identity and Location During Multiple-Object Tracking. Cerebral Cortex, 27(1):162–172.


Oberauer, K. (2002). Access to information in working memory: Exploring the focus of attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(3):411.

Oberauer, K., Lewandowsky, S., Awh, E., Brown, G. D., Conway, A., Cowan, N., Donkin, C., Farrell, S., Hitch, G. J., and Hurlstone, M. J. (2018). Benchmarks for models of short-term and working memory. Psychological bulletin, 144(9):885.

Oksama, L. and Hyönä, J. (2004). Is multiple object tracking carried out automatically by an early vision mechanism independent of higher-order cognition? An individual difference approach. Visual Cognition, 11(5):631–671.

Oksama, L. and Hyönä, J. (2016). Position tracking and identity tracking are separate systems: Evidence from eye movements. Cognition, 146:393–409.

Ongchoco, J. D. K. and Scholl, B. J. (2019). How to Create Objects With Your Mind: From Object-Based Attention to Attention-Based Objects. Psychological Science, 30(11):1648–1655.

O'Regan, J. K. (1992). Solving the "real" mysteries of visual perception: The world as an outside memory. Canadian Journal of Psychology/Revue canadienne de psychologie, 46(3):461.

Pailian, H., Carey, S. E., Halberda, J., and Pepperberg, I. M. (2020). Age and species comparisons of visual mental manipulation ability as evidence for its development and evolution. Scientific reports, 10(1):1–7.

Palmer, J. (1995). Attention in Visual Search: Distinguishing Four Causes of a Set-Size Effect. Current Directions in Psychological Science, 4(4):118–123.

Papale, P., Zuiderbaan, W., Teeuwen, R. R. M., Gilhuis, A., Self, M. W., Roelfsema, P. R., and Dumoulin, S. O. (2021). The influence of objecthood on the representation of natural images in the visual cortex.

Papenmeier, F., Meyerhoff, H. S., Jahn, G., and Huff, M. (2014). Tracking by location and features: Object correspondence across spatiotemporal discontinuities during multiple object tracking. Journal of Experimental Psychology: Human Perception and Performance, 40(1):159.

Pelli, D. G. and Tillman, K. A. (2008). The uncrowded window of object recognition. Nature neuroscience, 11(10):1129–1135.

Peter, U. T. (2005). Voluntary attention modulates the brightness of overlapping transparent surfaces. Vision research, 45(9):1095–1098.

Peterson, M. A. (2014). Low-level and High-level Contributions to Figure-Ground Organization. Oxford University Press.

Petrov, Y. and Meleshkevich, O. (2011). Asymmetries and idiosyncratic hot spots in crowding. Vision research, 51(10):1117–1123.


Piazza, M. (2010). Neurocognitive start-up tools for symbolic number representations. Trends in Cognitive Sciences, 14(12):542–551.

Pilz, K. S., Roggeveen, A. B., Creighton, S. E., Bennett, P. J., and Sekuler, A. B. (2012). How Prevalent Is Object-Based Attention? PLoS ONE, 7(2):e30693.

Proffitt, D. R., Kaiser, M. K., and Whelan, S. M. (1990). Understanding wheel dynamics. Cognitive Psychology, 22(3):342–373.

Pylyshyn, Z. (1989). The role of location indexes in spatial perception: A sketch of the FINST spatial-index model. Cognition, 32(1):65–97.

Pylyshyn, Z. (2004). Some puzzling findings in multiple object tracking: I. Tracking without keeping track of object identities. Visual cognition, 11(7):801–822.

Pylyshyn, Z., Burkell, J., Fisher, B., Sears, C., Schmidt, W., and Trick, L. (1994). Multiple parallel access in visual attention. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 48(2):260.

Pylyshyn, Z. W. (2006). Seeing and Visualizing: It's Not What You Think. Life and Mind. MIT Press, Cambridge, Mass., first MIT Press paperback edition.

Pylyshyn, Z. W. and Storm, R. W. (1988). Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision, 3(3):179–197.

Rabelo, A. L. A., Farias, J. E. M., Sarmet, M. M., Joaquim, T. C. R., Hoersting, R. C., Victorino, L., Modesto, J. G. N., and Pilati, R. (2020). Questionable research practices among Brazilian psychological researchers: Results from a replication study and an international comparison. International Journal of Psychology, 55(4):674–683.

Redick, T. S. and Engle, R. W. (2006). Working memory capacity and attention network test performance. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 20(5):713–721.

Reichle, E. D., Liversedge, S. P., Pollatsek, A., and Rayner, K. (2009). Encoding multiple words simultaneously in reading is implausible. Trends in Cognitive Sciences, 13(February):115–119.

Rensink, R. (2000). Visual Search for Change: A Probe into the Nature of Attentional Processing. Visual Cognition, 7(1):345–376.

Rensink, R. A., O'Regan, J. K., and Clark, J. J. (1997). To see or not to see: The Need for Attention to Perceive Changes in Scenes. Psychological Science, 8(5):1–6.

Revkin, S. K., Piazza, M., Izard, V., Cohen, L., and Dehaene, S. (2008). Does subitizing reflect numerical estimation? Psychological science, 19(6):607–614.


Rizzolatti, G., Umiltà, C., and Berlucchi, G. (1971). Opposite superiorities of the right and left cerebral hemispheres in discriminative reaction time to physiognomical and alphabetical material. Brain: A Journal of Neurology.

Robinson, M. M., Benjamin, A. S., and Irwin, D. E. (2020). Is there a K in capacity? Assessing the structure of visual short-term memory. Cognitive Psychology, 121:101305.

Roelfsema, P. R., Lamme, V. A., and Spekreijse, H. (1998). Object-based attention in the primary visual cortex of the macaque monkey. Nature, 395(6700):376–381.

Roudaia, E. and Faubert, J. (2017). Different effects of aging and gender on the temporal resolution in attentional tracking. Journal of Vision, 17(11):1.

Saenz, M., Buracas, G. T., and Boynton, G. M. (2002). Global effects of feature-based attention in human visual cortex. Nature neuroscience, 5(7):631–2.

Sàenz, M., Buraĉas, G. T., and Boynton, G. M. (2003). Global feature-based attention for motion and color. Vision research, 43(6):629–37.

Saiki, J. (2002). Multiple-object permanence tracking: Limitation in maintenance and transformation of perceptual objects. Progress in brain research, 140:133–148.

Saiki, J. (2019). Robust color-shape binding representations for multiple objects in visual working memory. Journal of Experimental Psychology: General, 148(5):905–925.

Schneider, K. A. and Kastner, S. (2005). Visual Responses of the Human Superior Colliculus: A High-Resolution Functional Magnetic Resonance Imaging Study. Journal of Neurophysiology, 94(4):2491–2503.

Scholl, B. (2001). Objects and attention: The state of the art. Cognition, 80(1/2):1–46.

Scholl, B. J. (2008). What Have We Learned about Attention from Multiple-Object Tracking (and Vice Versa)? In Computation, Cognition, and Pylyshyn, pages 49–78. MIT Press.

Scholl, B. J., Pylyshyn, Z. W., and Feldman, J. (2001). What is a visual object? Evidence from target merging in multiple object tracking. Cognition, 80(1-2):159–77.

Schönbrodt, F. D. and Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47(5):609–612.

Sekuler, R., McLaughlin, C., and Yotsumoto, Y. (2008). Age-related changes in attentional tracking of multiple moving objects. Perception, 37(6):867–876.


Sereno, A. B. and Kosslyn, S. M. (1991). Discrimination within and between hemifields: A new constraint on theories of attention. Neuropsychologia, 29(7):659–675.

Sereno, M. I., Pitzalis, S., and Martinez, A. (2001). Mapping of contralateral space in retinotopic coordinates by a parietal cortical area in humans. Science, 294(5545):1350–1354.

Shiffrin, R. M. and Gardner, G. T. (1972). Visual processing capacity and attentional control. Journal of experimental psychology, 93(1):72.

Shih, S.-I. and Sperling, G. (1996). Is there feature-based attentional selection in visual search? Journal of Experimental Psychology: Human Perception and Performance, 22(3):758.

Shim, W. M., Alvarez, G. A., and Jiang, Y. V. (2008). Spatial separation between targets constrains maintenance of attention on multiple objects. Psychonomic Bulletin & Review, 15(2):390–397.

Shim, W. M., Alvarez, G. A., Vickery, T. J., and Jiang, Y. V. (2010). The number of attentional foci and their precision are dissociated in the posterior parietal cortex. Cerebral cortex, 20(6):1341–9.

Shomstein, S. and Behrmann, M. (2008). Object-based attention: Strength of object representation and attentional guidance. Perception & psychophysics, 70(1):132–144.

Shomstein, S. and Yantis, S. (2002). Object-based attention: Sensory modulation or priority setting? Perception & psychophysics, 64(1):41–51.

Simon, H. A. (1969). The Sciences of the Artificial, Reissue of the Third Edition with a New Introduction by John Laird. MIT Press Academic, Cambridge, MA, 3rd edition.

Simons, D. J., Boot, W. R., Charness, N., Gathercole, S. E., Chabris, C. F., Hambrick, D. Z., and Stine-Morrow, E. A. (2016). Do “brain-training” programs work? Psychological Science in the Public Interest, 17(3):103–186.

Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta psychologica, 30:276–315.

Störmer, V. S., Alvarez, G. A., and Cavanagh, P. (2014). Within-hemifield competition in early visual areas limits the ability to track multiple objects with attention. The Journal of neuroscience: the official journal of the Society for Neuroscience, 34(35):11526–33.

Strasburger, H. (2014). Dancing letters and ticks that buzz around aimlessly: On the origin of crowding. Perception, 43(9):963–976.


Strong, R. W. and Alvarez, G. A. (2020). Hemifield-specific control of spatial attention and working memory: Evidence from hemifield crossover costs. Journal of Vision, 20(8):24.

Tadin, D., Lappin, J. S., Blake, R., and Grossman, E. D. (2002). What constitutes an efficient reference frame for vision? Nature Neuroscience, 5(10):1010–1015.

Tombu, M. and Seiffert, A. E. (2008). Attentional costs in multiple-object tracking. Cognition, 108:1–25.

Tombu, M. and Seiffert, A. E. (2011). Tracking planets and moons: Mechanisms of object tracking revealed with a new paradigm. Attention, Perception, & Psychophysics.

Townsend, J. T. (1990). Serial vs. parallel processing: Sometimes they look like Tweedledum and Tweedledee but they can (and should) be distinguished. Psychological Science, 1(1):46–54.

Treisman, A. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12:97–136.

Treisman, A. M. (1964). Verbal Cues, Language, and Meaning in Selective Attention. The American Journal of Psychology, 77(2):206.

Treviño, M., Zhu, X., Lu, Y. Y., Scheuer, L. S., Passell, E., Huang, G. C., Germine, L. T., and Horowitz, T. S. (2021). How do we measure attention? Using factor analysis to establish construct validity of neuropsychological tests. Cognitive Research: Principles and Implications, 6(1):51.

Trick, L. M., Mutreja, R., and Hunt, K. (2012). Spatial and visuospatial working memory tests predict performance in classic multiple-object tracking in young adults, but nonspatial measures of the executive do not. Attention, Perception, & Psychophysics, 74(2):300–311.

Trick, L. M., Perl, T., and Sethi, N. (2005). Age-related differences in multiple-object tracking. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 60(2):P102–P105.

Triesch, J., Ballard, D. H., Hayhoe, M. M., and Sullivan, B. T. (2003). What you see is what you need. Journal of Vision, 3(1):9–9.

Tse, P., Cavanagh, P., and Nakayama, K. (1998). The role of parsing in high-level motion processing. High-level motion processing: Computational, neurobiological, and psychophysical perspectives, pages 249–266.

Tsotsos, J. K., Culhane, S. M., Wai, W., Lai, Y., Davis, N., and Nuflo, F. (1995). Modeling visual attention via selective tuning. Artificial Intelligence, 78:507–545.


Tsotsos, J. K., Rodríguez-Sánchez, A. J., Rothenstein, A. L., and Simine, E. (2008). The different stages of visual recognition need different attentional binding strategies. Brain research, 1225(2007):119–32.

Umemoto, A., Drew, T., Ester, E. F., and Awh, E. (2010). A bilateral advantage for storage in visual working memory. Cognition, 117(1):69–79.

Van der Burg, E., Cass, J., and Theeuwes, J. (2019). Changes (but not differences) in motion direction fail to capture attention. Vision Research, 165:54–63.

VanMarle, K. and Scholl, B. J. (2003). Attentive tracking of objects versus substances. Psychological Science, 14(5):498–504.

Vater, C., Gray, R., and Holcombe, A. O. (2021). A critical systematic review of the Neurotracker perceptual-cognitive training tool. Psychonomic Bulletin & Review.

Vater, C., Kredel, R., and Hossner, E.-J. (2017). Disentangling vision and attention in multiple-object tracking: How crowding and collisions affect gaze anchoring and dual-task performance. Journal of vision, 17(5):21–21.

Vogel, E. K., Woodman, G. F., and Luck, S. J. (2006). The time course of consolidation in visual working memory. Journal of experimental psychology. Human perception and performance, 32(6):1436–51.

Wang, L., Zhang, K., He, S., and Jiang, Y. (2010). Searching for Life Motion Signals: Visual Search Asymmetry in Local but Not Global Biological-Motion Processing. Psychological Science, 21(8):1083–1089.

Wannig, A., Stanisor, L., and Roelfsema, P. R. (2011). Automatic spread of attentional response modulation along Gestalt criteria in primary visual cortex. Nature neuroscience, 14(10):1243–1244.

Warren, P. A. and Rushton, S. K. (2007). Perception of object trajectory: Parsing retinal motion into self and object movement components. Journal of Vision, 7(11):2.

Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie, 61:161–165.

White, A. L. and Carrasco, M. (2011). Feature-based attention involuntarily and simultaneously improves visual performance across locations. Journal of Vision, 11(6):1–10.

White, A. L., Palmer, J., and Boynton, G. M. (2018). Evidence of serial processing in visual word recognition. Psychological science, 29(7):1062–1071.

White, A. L., Palmer, J., Boynton, G. M., and Yeatman, J. D. (2019). Parallel spatial channels converge at a bottleneck in anterior word-selective cortex. Proceedings of the National Academy of Sciences, 116(20):10087–10096.


Wilbiks, J. M. P. and Beatteay, A. (2020). Individual differences in multiple object tracking, attentional cueing, and age account for variability in the capacity of audiovisual integration. Attention, Perception, & Psychophysics.

Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4):1060–1092.

Wolfe, J. M. and Bennett, S. C. (1997). Preattentive object files: Shapeless bundles of basic features. Vision research, 37(1):25–43.

Wolford, G. (1975). Perturbation model for letter identification. Psychological review, 82(3):184.

Wyatte, D., Jilk, D. J., and O'Reilly, R. C. (2014). Early recurrent feedback facilitates visual object recognition under challenging conditions. Frontiers in Psychology, 5:674.

Xu, Y. and Franconeri, S. L. (2015). Capacity for Visual Features in Mental Rotation. Psychological Science, 26(8):1241–1251.

Yantis, S. (1992). Multielement visual tracking: Attention and perceptual organization. Cognitive psychology, 24(3):295–340.

Yilmaz, A., Javed, O., and Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys, 38(4):13.

Zelinsky, G. J. and Neider, M. B. (2008). An eye movement analysis of multiple object tracking in a realistic environment. Visual Cognition, 16(5):553–566.

Zelinsky, G. J. and Todor, A. (2010). The role of “rescue saccades” in tracking objects through occlusions. Journal of Vision, 10(14):29–29.

Zhang, J. and Mueller, S. T. (2005). A note on ROC analysis and non-parametric estimate of sensitivity. Psychometrika, 70(1):203–212.

Zylberberg, A., Fernández Slezak, D., Roelfsema, P. R., Dehaene, S., and Sigman, M. (2010). The brain's router: A cortical network model of serial processing in the primate brain. PLoS computational biology, 6(4):e1000765.