Machine Vision & Applications manuscript No. (will be inserted by the editor)

Computer Vision-Based Gesture Tracking, Object Tracking, and 3D Reconstruction for Augmented Desks

Thad Starner(1), Bastian Leibe(2), David Minnen(1), Tracy Westyn(1), Amy Hurst(1), and Justin Weeks(1)

(1) Contextual Computing Group, GVU Center, Georgia Institute of Technology
    e-mail: {thad,dminn,turtle,sloopy,joostan}@cc.gatech.edu
(2) Perceptual Computing and Computer Vision Group, ETH Zurich, Haldeneggsteig 4, CH-8092 Zurich, Switzerland
    e-mail: [email protected]

The date of receipt and acceptance will be inserted by the editor

Abstract The Perceptive Workbench endeavors to create a spontaneous and unimpeded interface between the physical and virtual worlds. Its vision-based methods for interaction constitute an alternative to wired input devices and tethered tracking. Objects are recognized and tracked when placed on the display surface. By using multiple infrared light sources, the object's 3D shape can be captured and inserted into the virtual interface. This ability permits spontaneity since either preloaded objects or those objects selected at run-time by the user can become physical icons. Integrated into the same vision-based interface is the ability to identify 3D hand position, pointing direction, and sweeping arm gestures. Such gestures can enhance selection, manipulation, and navigation tasks. The Perceptive Workbench has been used for a variety of applications, including augmented reality gaming and terrain navigation. This paper focuses on the techniques used in implementing the Perceptive Workbench and the system's performance.

Key words gesture – 3D object reconstruction – tracking – computer vision – virtual reality

1 Introduction

Humans and computers have interacted primarily through devices that are constrained by wires. Typically, the wires limit the distance of movement and inhibit freedom of orientation. In addition, most interactions are indirect. The user moves a device as an analogue for the action to be created in the display space. We envision an untethered interface that accepts gestures directly and can accept any objects the user chooses as interactors. In this paper, we apply our goal to workbenches, large tables, which serve simultaneously as projection display and as interaction surface. Originally proposed in 1995 by Krueger et al. [15], they are now widely used in virtual reality and visualization applications.

Computer vision can provide the basis for untethered interaction because it is flexible, unobtrusive, and allows direct interaction. Since the complexity of general vision tasks has often been a barrier to widespread use in real-time applications, we simplify the task by using a shadow-based architecture.

An infrared light source is mounted on the ceiling. When the user stands in front of the workbench and extends an arm over the surface, the arm casts a shadow on the desk's surface, which can be easily distinguished by a camera underneath.

The same shadow-based architecture is used in the Perceptive Workbench [19,18] to reconstruct 3D virtual representations of previously unseen real-world objects placed on the desk's surface. In addition, the Perceptive Workbench can illuminate objects placed on the desk's surface to identify and track the objects as the user manipulates them. Taking its cues from the user's actions, the Perceptive Workbench switches between these three modes automatically. Computer vision controls all interaction, freeing the user from the tethers of traditional sensing techniques.

In this paper, we will discuss implementation and performance aspects that are important to making the Perceptive Workbench a useful input technology for virtual reality. We will examine performance requirements and show how our system is being optimized to meet them.

2 Related Work

While the Perceptive Workbench [19] is unique in its ability to interact with the physical world, it has a rich heritage of related work [1,14,15,23,26,34,35,37,43]. Many augmented desk and virtual reality designs use tethered props, tracked by electromechanical or ultrasonic means, to encourage interaction through gesture and manipulation of objects [3,1,26,32,37]. Such designs tether the user to the desk and require the time-consuming ritual of donning and doffing the appropriate equipment.

Fortunately, the computer vision community has taken up the task of tracking hands and identifying gestures. While generalized vision systems track the body in room- and desk-based scenarios for games, interactive art, and augmented environments [2,44], the reconstruction of fine hand detail involves carefully calibrated systems and is computationally intensive [22]. Even so, complicated gestures such as those used in sign language [31,38] or the manipulation of physical objects [28] can be recognized. The Perceptive Workbench uses such computer vision techniques to maintain a wireless interface.

Most directly related to the Perceptive Workbench, Ullmer and Ishii's "Metadesk" identifies and tracks objects placed on the desk's display surface using a near-infrared computer vision recognizer, originally designed by Starner [34]. Unfortunately, since not all objects reflect infrared light and since infrared shadows are not used, objects often need infrared-reflective "hot mirrors" placed in patterns on their bottom surfaces to aid tracking and identification. Similarly, Rekimoto and Matsushita's "Perceptual Surfaces" [23] employ 2D barcodes to identify objects held against the "HoloWall" and "HoloTable." In addition, the HoloWall can track the user's hands (or other body parts) near or pressed against its surface, but its potential recovery of the user's distance from the surface is relatively coarse compared to the 3D pointing gestures of the Perceptive Workbench. Davis and Bobick's SIDEshow [6] is similar to the HoloWall except that it uses cast shadows in infrared for full-body 2D gesture recovery. Some augmented desks have cameras and projectors above the surface of the desk; they are designed to augment the process of handling paper or to interact with models and widgets through the use of fiducials or barcodes [35,43]. Krueger's VideoDesk [14], an early desk-based system, uses an overhead camera and a horizontal visible-light table to provide high-contrast hand gesture input for interactions which are then displayed on a monitor on the far side of the desk. In contrast with the Perceptive Workbench, none of these systems address the issues of introducing spontaneous 3D physical objects into the virtual environment in real time and combining 3D deictic (pointing) gestures with object tracking and identification.

3 Goals

Our goal is to create a vision-based user interface for VR applications. Hence, our system must be responsive in real time and be suitable for VR interaction. In order to evaluate the feasibility of meeting this goal we need to examine the necessary performance criteria.

3.1 System Responsiveness

System responsiveness, the time elapsed between a user's action and the response displayed by the system [41], helps determine the quality of the user's interaction. Responsiveness requirements vary with the tasks to be performed. An acceptable threshold for object selection and manipulation tasks is typically around 75 to 100 ms [39,41]. System responsiveness is directly coupled with latency. It can be calculated with the following formula:

System Responsiveness = System Latency + Display Lag    (1)

System latency, often also called device lag, is the time it takes our sensor to acquire an image, calculate and communicate the results, and change the virtual world accordingly. Input devices should have low latency, ideally below 50 ms. Ware and Balakrishnan measured several common magnetic trackers and found them to have latencies in the range of 45 to 72 ms [39].

In our situation, system latency depends on the time it takes the camera to transform the scene into a digital image, image processing time, and network latency to communicate the results. An average delay of 1.5 frame intervals at 33 ms per interval to digitize the image results in a 50 ms delay. In addition, we assume a 1.5 frame interval delay in rendering the appropriate graphics. Assuming a constant 60 frame per second (fps) rendering rate results in an additional 25 ms delay for system responsiveness. Since we are constrained by a 75 ms overhead in sensing and rendering, we must minimize the amount of processing time and network delay in order to maintain an acceptable latency for object selection and manipulation. Thus, we concentrate on easily computed vision algorithms and a lightweight UDP networking protocol for transmitting the results.
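To make this budget concrete, the following short calculation (a sketch in Python; all figures are the ones quoted above) reproduces the roughly 75 ms sensing-and-rendering overhead and shows how little of the 100 ms responsiveness threshold remains for vision processing and networking.

```python
# Latency budget sketch based on the figures quoted above.
CAMERA_INTERVAL_MS = 33.0     # camera frame interval (~30 fps)
DIGITIZE_FRAMES = 1.5         # average delay to digitize an image
RENDER_FPS = 60.0             # assumed constant rendering rate
RENDER_FRAMES = 1.5           # average delay to render the result

digitize_ms = DIGITIZE_FRAMES * CAMERA_INTERVAL_MS      # ~50 ms
render_ms = RENDER_FRAMES * (1000.0 / RENDER_FPS)       # 25 ms
fixed_overhead_ms = digitize_ms + render_ms              # ~75 ms

RESPONSIVENESS_TARGET_MS = 100.0
budget_for_processing_ms = RESPONSIVENESS_TARGET_MS - fixed_overhead_ms

print(f"fixed sensing/rendering overhead: {fixed_overhead_ms:.1f} ms")
print(f"remaining for vision code + network: {budget_for_processing_ms:.1f} ms")
```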

3.2 Accuracy

With the deictic gesture tracking, we estimate that absolute accuracy will not need to be very high. Since the pointing actions and gestures happen in the three-dimensional space high above the desk's surface, discrepancies between a user's precise pointing position and the system's depiction of that position are not obvious or distracting. Instead, it is much more important to capture the trend of movement and allow for quick correctional motions.

For the object tracking, however, this is not the case. Here, the physical objects placed on the desk already give strong visual feedback and any system response differing from this position will be very distracting. This constraint is relatively easy to satisfy, though, since the task of detecting the position of an object on the desk's surface is, by nature, more accurate than finding the correct arm orientation in 3D space.

4 Apparatus

The display environment for the Perceptive Workbench builds on Fakespace's immersive workbench [40]. It consists of a wooden desk with a horizontal frosted glass surface on which an image can be projected from behind the workbench.

Fig. 1 Light and camera positions for the Perceptive Workbench. The top view shows how shadows are cast and the 3D arm position is tracked.

We placed a standard monochrome surveillance camera under the projector to watch the desk's surface from underneath (see Figure 1). A filter placed in front of the camera lens makes it insensitive to visible light and to images projected on the desk's surface. Two infrared illuminators placed next to the camera flood the desk's surface with infrared light that is reflected back toward the camera by objects placed on the desk.

We mounted a ring of seven similar light sources on the ceiling surrounding the desk (Figure 1). Each computer-controlled light casts distinct shadows on the desk's surface based on the objects on the table (Figure 2a). A second infrared camera and another infrared light source are placed next to the desk to provide a side view of the user's arms (Figure 3a). This side camera is used solely for recovering 3D pointing gestures.

Note that at any time during the system's operation, either the ceiling lights or the lights below the table are active, but not both at the same time. This constraint is necessary in order to achieve reliable detection of shadows and reflections.

We decided to use near-infrared light since it is invisible to the human eye. Thus, illuminating the scene does not interfere with the user's interaction. The user does not perceive the illumination from the infrared light sources underneath the table, nor the shadows cast from the overhead lights. On the other hand, most standard charge-coupled device (CCD) cameras can still see infrared light, providing an inexpensive method for observing the interaction. In addition, by equipping the camera with an infrared filter, the camera image can be analyzed regardless of changes in (visible) scene lighting.

We use this setup for three different kinds of interaction:

– Recognition and tracking of objects placed on the desk surface based on their contour
– Tracking of hand and arm gestures
– Full 3D reconstruction of object shapes from shadows cast by the ceiling light sources.

For display on the Perceptive Workbench we use OpenGL, the OpenGL Utility Toolkit (GLUT), and a customized version of a simple widget package called microUI (MUI). In addition, we use the workbench version of VGIS, a global terrain visualization and navigation system [40], as an application for interaction using hand and arm gestures.

5 Object Tracking & Recognition

As a basic precept for our interaction framework, we want to let users manipulate the virtual environment by placing objects on the desk surface. The system should recognize these objects and track their positions and orientations as they move over the table. Users should be free to pick any set of physical objects they choose.

The motivation behind this is to use physical objects in a "graspable" user interface [9]. Physical objects are often natural interactors as they provide physical handles to let users intuitively control a virtual application [11]. In addition, the use of real objects allows the user to manipulate multiple objects simultaneously, increasing the communication bandwidth with the computer [9,11].

To achieve this tracking goal, we use an improved version of the technique described in Starner et al. [30]. Two near-infrared light sources illuminate the desk's underside (Figure 1). Every object close to the desk surface (including the user's hands) reflects this light, which the camera under the display surface can see. Using a combination of intensity thresholding and background subtraction, we extract interesting regions of the camera image and analyze them. We classify the resulting blobs as different object types based on a 72-dimensional feature vector reflecting the distances from the center of the blob to its contour in different directions.
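As an illustration of this classification step, the sketch below (Python/NumPy) computes a 72-bin radial-distance signature from a blob contour and matches it against stored models with a nearest-neighbor rule; the 5-degree binning, the normalization, and the nearest-neighbor matcher are our assumptions rather than details given in the paper.

```python
import numpy as np

def radial_feature(contour_xy, num_bins=72):
    """Distances from the blob centroid to its contour, sampled in num_bins
    angular directions (72 bins -> one sample per 5 degrees of arc)."""
    contour_xy = np.asarray(contour_xy, dtype=float)
    center = contour_xy.mean(axis=0)                     # blob center
    d = contour_xy - center
    angles = np.arctan2(d[:, 1], d[:, 0])                # direction to each contour point
    dists = np.hypot(d[:, 0], d[:, 1])
    bins = ((angles + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    feature = np.zeros(num_bins)
    for b, r in zip(bins, dists):
        feature[b] = max(feature[b], r)                  # farthest contour point per direction
    return feature / (feature.max() + 1e-9)              # crude scale normalization (assumption)

def classify(feature, models):
    """Nearest-neighbor match against stored object signatures (dict: name -> feature)."""
    return min(models, key=lambda name: np.linalg.norm(models[name] - feature))
```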

Fig. 2 (a) Arm shadow from overhead IR lights; (b) resulting contour with recovered arm direction.

Note that the hardware arrangement causes several complications. The foremost problem is that our two light sources under the table can only provide uneven lighting over the whole desk surface. In addition, the light rays are not parallel, and the reflection on the mirror surface further exacerbates this effect. To compensate for this, we perform a dynamic range adjustment. In addition to a background image, we store a "white" image that represents the maximum intensity that can be expected at any pixel. This image is obtained by passing a bright white (and thus highly reflective) object over the table during a one-time calibration step and instructing the system to record the intensity at each point. The dynamic range adjustment helps to normalize the image so that a single threshold can be used over the whole table. An additional optimal thresholding step is performed for every blob to reduce the effects of unwanted reflections from users' hands and arms while they are moving objects. Since the blobs only represent a small fraction of the image, the computational cost is low.
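A minimal sketch of this dynamic range adjustment, assuming a per-pixel linear normalization between the stored background image and the calibrated "white" image (the exact formula is our assumption):

```python
import numpy as np

def normalize_frame(frame, background, white, eps=1e-3):
    """Map each pixel into [0, 1] relative to the expected dark (background)
    and bright ("white" calibration image) intensities at that location."""
    span = np.maximum(white.astype(float) - background.astype(float), eps)
    return np.clip((frame.astype(float) - background) / span, 0.0, 1.0)

def extract_blob_mask(frame, background, white, threshold=0.4):
    """A single global threshold works on the normalized image even though the
    raw infrared illumination is uneven across the desk."""
    return normalize_frame(frame, background, white) > threshold
```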

In order to handle the remaining uncertainty in the recognition process, two final steps are performed: detecting the stability of a reflection and using tracking information to adjust and improve recognition results. When an object is placed on the table, there will be a certain interval when it reflects enough infrared light to be tracked but is not close enough to the desk's surface to create a recognizable reflection. To detect this situation, we measure the change in size and average intensity for each reflection over time. When both settle to a relatively constant value, we know that an object has reached a steady state and can now be recognized. To further improve classification accuracy, we make the assumption that objects will not move very far between frames. Thus, the closer a blob is to an object's position in the last frame, the more probable it is that this blob corresponds to the object and the less reliable the recognition result has to be before it is accepted. In addition, the system remembers and collects feature vectors that caused some uncertainty (for example, by an unfamiliar orientation that caused the feature vector to change) and adds them to the internal description of the object, thus refining the model.
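The two stabilization steps can be sketched as follows (Python; the window length and tolerance values are placeholders, not the values used in the actual system):

```python
def is_stable(history, window=5, size_tol=0.05, intensity_tol=0.05):
    """A reflection is considered settled once its size and mean intensity stay
    within a small relative band over the last few frames.
    history: list of (area, mean_intensity) tuples, newest last."""
    if len(history) < window:
        return False
    recent = history[-window:]
    def spread(values):
        return (max(values) - min(values)) / (max(values) + 1e-9)
    return (spread([a for a, _ in recent]) < size_tol and
            spread([i for _, i in recent]) < intensity_tol)

def accept_match(distance_to_last_pos, match_score,
                 base_threshold=0.5, slack_per_pixel=0.002):
    """The closer a blob is to the object's last known position, the weaker the
    recognition score that is still accepted (higher score = better match)."""
    return match_score > base_threshold + slack_per_pixel * distance_to_last_pos
```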

In this work, we use the object recognition and tracking capability mainly for cursor or place-holder objects. We focus on fast and accurate position tracking, but the system may be trained on a different set of objects to serve as navigational tools or physical icons [34]. A future project will explore different modes of interaction based on this technology.

6 Deictic Gesture Tracking

Following Quek's taxonomy [21], hand gestures can be roughly classified into symbols (referential and modalizing gestures) and acts (mimetic and deictic gestures). Deictic (pointing) gestures depend strongly on location and orientation of the performing hand. Their meaning is determined by the location at which a finger is pointing, or by the angle of rotation of some part of the hand. This information acts not only as a symbol for the gesture's interpretation, but also as a measure of the extent to which the corresponding action should be executed or to which object it should be applied.

For navigation and object manipulation in a virtual environment, many gestures will have a deictic component. It is usually not enough to recognize that an object should be rotated – we will also need to know the desired amount of rotation. For object selection or translation, we want to specify the object or location of our choice just by pointing at it. For these cases, gesture recognition methods that only take the hand shape and trajectory into account will not suffice. We need to recover 3D information about the users' hands and arms in relation to their bodies.

In the past, this information has largely been obtained by using wired gloves or suits, or magnetic trackers [3,1]. Such methods provide sufficiently accurate results but rely on wires tethered to the user's body or to specific interaction devices, with all the aforementioned problems. We aim to develop a purely vision-based architecture that facilitates unencumbered 3D interaction.

Fig. 3 (a) Image from side camera (without infrared filter); (b) arm contour from similar image with recovered arm direction.

With vision-based 3D tracking techniques, the first issue is to determine what information in the camera image is relevant – that is, which regions represent the user's hand or arm. What makes this difficult is the variation in user clothing or skin color and background activity. Previous approaches to vision-based gesture recognition used marked gloves [8], infrared cameras [25], or a combination of multiple feature channels, like color and stereo [13], to deal with this problem, or they just restricted their system to a uniform background [36]. By analyzing a shadow image, this task can be greatly simplified.

Most directly related to our approach, Segen and Kumar [27] derive 3D position and orientation information of two fingers from the appearance of the user's hand and its shadow, co-located in the same image. However, since their approach relies on visible light, it requires a stationary background and thus cannot operate on a highly dynamic back-projection surface like the one on our workbench. By using infrared light for casting the shadow, we can overcome this restriction.

The use of shadows solves, at the same time, another problem with vision-based architectures: where to put the cameras. In a virtual workbench environment, there are only a few places from where we can get reliable hand position information. One camera can be set up next to the table without overly restricting the available space for users. In many systems, in order to recover three-dimensional information, a second camera is deployed. However, the placement of this second camera restricts the usable area around the workbench. Using shadows, the infrared camera under the projector replaces the second camera. One of the infrared light sources mounted on the ceiling above the user shines on the desk's surface where it can be seen by the camera underneath (see Figure 4). When users move an arm over the desk, it casts a shadow on the desk surface (see Figure 2a). From this shadow, and from the known light-source position, we can calculate a plane in which the user's arm must lie.

Simultaneously, the second camera to the right of the table (Figures 3a and 4) records a side view of the desk surface and the user's arm. It detects where the arm enters the image and the position of the fingertip. From this information, the computer extrapolates two lines in 3D space on which the observed real-world points must lie. By intersecting these lines with the shadow plane, we get the coordinates of two 3D points – one on the upper arm, and one on the fingertip. This gives us the user's hand position and the direction in which the user is pointing. We can use this information to project an icon representing the hand position and a selection ray on the workbench display.
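The underlying geometry is a ray-plane intersection: the arm's shadow and the known light position define the plane containing the arm, and each side-camera viewing ray is intersected with it. A minimal sketch (Python/NumPy) with hypothetical coordinates purely for illustration:

```python
import numpy as np

def plane_from_points(p0, p1, p2):
    """Plane (point, unit normal) through three 3D points, e.g. the IR light
    position and two points of the arm's shadow on the desk surface."""
    n = np.cross(p1 - p0, p2 - p0)
    return p0, n / np.linalg.norm(n)

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Intersect a camera viewing ray with the shadow plane."""
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                       # ray parallel to the plane
    t = np.dot(plane_normal, plane_point - origin) / denom
    return origin + t * direction

# Hypothetical numbers (metres) purely to illustrate the computation:
light = np.array([0.0, -0.5, 2.0])                    # ceiling IR light
shadow_shoulder = np.array([0.1, 0.0, 0.0])           # on the desk plane z = 0
shadow_tip = np.array([0.5, 0.4, 0.0])
plane_pt, plane_n = plane_from_points(light, shadow_shoulder, shadow_tip)

cam = np.array([1.2, 0.0, 0.3])                       # side camera position
ray_to_fingertip = np.array([-0.8, 0.4, 0.1])         # from its calibration
fingertip_3d = ray_plane_intersection(cam, ray_to_fingertip, plane_pt, plane_n)
```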

Obviously, the success of the gesture-tracking capability relies heavily on how fast the image processing can be done. Fortunately, we can make some simplifying assumptions about the image content. We must first recover arm direction and fingertip position from both the camera and the shadow image. Since the user stands in front of the desk and the user's arm is connected to the user's body, the arm's shadow should always touch the image border. Thus, our algorithm exploits intensity thresholding and background subtraction to discover regions of change in the image. It also searches for areas in which these regions touch the desk surface's front border (which corresponds to the shadow image's top border or the camera image's left border). The algorithm then takes the middle of the touching area as an approximation for the origin of the arm (Figures 2b and 3b). Similar to Fukumoto's approach [10], we trace the shadow's contour and take the point farthest away from the shoulder as the fingertip. The line from the shoulder to the fingertip reveals the arm's 2D direction.
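The contour analysis itself reduces to two steps: find where the region touches the border, and find the contour point farthest from that spot. A sketch under the assumption that the relevant border is at x = 0 (as for the side camera image):

```python
import numpy as np

def arm_origin_and_fingertip(contour_xy, border_x=0.0, tol=1.0):
    """Approximate the arm origin as the middle of where the contour touches
    the image border, and the fingertip as the contour point farthest from it.
    Returns (origin, fingertip, unit direction), or (None, None, None)."""
    contour_xy = np.asarray(contour_xy, dtype=float)
    touching = contour_xy[np.abs(contour_xy[:, 0] - border_x) <= tol]
    if len(touching) == 0:
        return None, None, None                    # arm does not reach the border
    origin = touching.mean(axis=0)                 # middle of the touching area
    dists = np.linalg.norm(contour_xy - origin, axis=1)
    fingertip = contour_xy[np.argmax(dists)]       # farthest contour point ~ fingertip
    direction = (fingertip - origin) / (dists.max() + 1e-9)
    return origin, fingertip, direction
```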

Fig. 4 Principle of pointing direction recovery.

In our experiments, the point thus obtained was coincident with the pointing fingertip in all but a few extreme cases (such as the fingertip pointing straight down at a right angle to the arm). The method does not depend on a pointing gesture; it also works for most other hand shapes, including a hand held horizontally, vertically, or in a fist. These shapes may be distinguished by analyzing a small section of the side camera image and may be used to trigger specific gesture modes in the future.

The computed arm direction is correct as long as the user's arm is not overly bent (see Figure 3). In such cases, the algorithm still connects the shoulder and fingertip, resulting in a direction somewhere between the direction of the arm and the one given by the hand. Although the absolute resulting pointing position does not match the position towards which the finger is pointing, it still captures the trend of movement very well. Surprisingly, the technique is sensitive enough so that users can stand at the desk with their arm extended over the surface and direct the pointer simply by moving their index finger without any arm movement.

6.1 Limitations and Improvements

Figure 3b shows a case where segmentation based on color background subtraction in an older implementation detected both the hand and the change in the display on the workbench. Our new version replaces the side color camera with an infrared spotlight and a monochrome camera equipped with an infrared-pass filter. By adjusting the angle of the light to avoid the desk's surface, the user's arm is illuminated and made distinct from the background. Changes in the workbench's display do not affect the tracking.

One remaining problem results from the side camera's actual location. If a user extends both arms over the desk surface, or if more than one user tries to interact with the environment simultaneously, the images of these multiple limbs can overlap and merge into a single blob. Consequently, our approach will fail to detect the hand positions and orientations in these cases. A more sophisticated approach using previous position and movement information could yield more reliable results, but at this stage we chose to accept this restriction and concentrate on high frame rate support for one-handed interaction. In addition, this may not be a serious limitation for a single user for certain tasks. A recent study shows that for a task normally requiring two hands in a real environment, users have no preference for one versus two hands in a virtual environment that does not model effects such as gravity and inertia [26].

Fig. 5 Real object inserted into the virtual world. The figure shows a reconstruction of the doll in the foreground.

7 3D Reconstruction

To complement the capabilities of the Perceptive Workbench, we want to be able to insert real objects into the virtual world and share them with other users at different locations (see Figure 5). An example application for this could be a telepresence or computer-supported collaborative work (CSCW) system. This requires designing a reconstruction mechanism that does not interrupt the interaction. Our focus is on providing a nearly instantaneous visual cue for the object, not necessarily on creating a highly accurate model.

Several methods reconstruct objects from silhouettes [29,33] or dynamic shadows [5] using either a moving camera or light source on a known trajectory or a turntable for the object [33]. Several systems have been developed for reconstructing relatively simple objects, including some commercial systems.

However, the necessity to move either the camera or the object imposes severe constraints on the working environment. Reconstructing an object with these methods usually requires interrupting the user's interaction with it, taking it out of the user's environment, and placing it into a specialized setting. Other approaches use multiple cameras from different viewpoints to avoid this problem at the expense of more computational power to process and communicate the results.

In this project, using only one camera and multiple infrared light sources, we analyze the shadows cast by the object from multiple directions (see Figure 6). Since the process is based on infrared light, it can be applied independently of the lighting conditions and with minimal interference with the user's natural interaction with the desk.

Fig. 6 Principle of the 3D reconstruction.

To obtain the different views, we use a ring of seven infrared light sources in the ceiling, each independently switched by computer control. The system detects when a user places a new object on the desk surface, and renders a virtual button. The user can then initiate reconstruction by touching this virtual button. The camera detects this action, and in approximately one second the system can capture all of the required shadow images. After another second, reconstruction is complete, and the newly reconstructed object becomes part of the virtual world. Note that this process uses the same hardware as the deictic gesture-tracking capability discussed in the previous section, and thus requires no additional cost.
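The capture sequence can be summarized as one loop over the seven computer-controlled illuminators; the hardware and segmentation hooks in the sketch below are hypothetical callables, not part of the actual system's API.

```python
from typing import Callable, List, Sequence

def capture_shadow_contours(switch_light: Callable[[int, bool], None],
                            grab_frame: Callable[[], object],
                            extract_contour: Callable[[object], Sequence],
                            num_lights: int = 7) -> List[Sequence]:
    """Turn on one ceiling illuminator at a time, grab a frame from the camera
    under the desk, and extract the object's shadow contour from that frame.
    Only one light is active at a time, so each shadow is unambiguous."""
    contours = []
    for light_id in range(num_lights):
        switch_light(light_id, True)
        contours.append(extract_contour(grab_frame()))
        switch_light(light_id, False)
    return contours
```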

Figure 7 shows a series of contour shadows and a visualization of the reconstruction process. By approximating each shadow as a polygon (not necessarily convex) [24], we create a set of polyhedral "view cones" extending from the light source to the polygons. The intersection of these cones creates a polyhedron that roughly contains the object.
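A simplified alternative to the exact polyhedral intersection is voxel carving: a 3D point lies inside a light's view cone exactly when its projection from that light onto the desk plane falls inside the corresponding shadow polygon, so a point belongs to the approximate visual hull only if this holds for every light. The sketch below (Python, assuming the desk surface is the plane z = 0 and using matplotlib's Path only for the point-in-polygon test) illustrates the idea; it is not the method used in the system, which intersects the polyhedral cones directly.

```python
import numpy as np
from matplotlib.path import Path

def project_to_desk(point, light):
    """Project a 3D point from the light source onto the desk plane z = 0
    (assumes the point lies below the light, i.e. point[2] < light[2])."""
    t = light[2] / (light[2] - point[2])          # parameter where the ray hits z = 0
    return light[:2] + t * (point[:2] - light[:2])

def in_visual_hull(point, lights, shadow_polygons):
    """A point survives only if every light's projection of it falls inside the
    corresponding shadow polygon (each given as an (N, 2) vertex array)."""
    for light, poly in zip(lights, shadow_polygons):
        if not Path(poly).contains_point(project_to_desk(point, light)):
            return False
    return True
```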

Intersecting nonconvex polyhedral objects is a complex problem, further complicated by numerous special cases. Fortunately, this problem has already been extensively researched and solutions are available. For the intersection calculations in our application, we use Purdue University's TWIN Solid Modeling Library [7]. Recently, a highly optimized algorithm has been proposed by Matusik et al. that can perform these intersection calculations directly as part of the rendering process [20]. Their algorithm provides a significant improvement on the intersection code we are currently using, and we are considering it for a future version of our system.

Figure 8c shows a reconstructed model of a watering can placed on the desk's surface. We chose the colors to highlight the different model faces by interpreting the face normal as a vector in RGB color space. In the original version of our software, we did not handle holes in the contours. This feature has since been added by constructing light cones for both the object contours and for those representing holes. By inspecting the pixels adjacent to the outside of the contour, we can distinguish between the two types of borders. Then, rather than intersecting the light cone with the rest of the object, we perform a boolean differencing operation with the cones formed from the hole borders.

Fig. 7 Steps of the 3D reconstruction of the doll from Figure 5, including the extraction of contour shapes from shadows and the intersection of multiple view cones (bottom).

7.1 Limitations

An obvious limitation to our approach is that we are confined to a fixed number of different views from which to reconstruct the object. The turntable approach permits the system to take an arbitrary number of images from different viewpoints. In addition, not every nonconvex object can be exactly reconstructed from its silhouettes or shadows. The closest approximation that can be obtained with volume intersection is its visual hull, that is, the volume enveloped by all the possible circumscribed view cones. Even for objects with a polyhedral visual hull, an unbounded number of silhouettes may be necessary for an exact reconstruction [17]. However, Sullivan's work [33] and our experience have shown that usually seven to nine different views suffice to get a reasonable 3D model of the object.

Exceptions to this heuristic are spherical or cylindrical objects. The quality of reconstruction for these objects depends largely on the number of available views. With only seven light sources, the resulting model will appear faceted. This problem can be solved either by adding more light sources, or by improving the model with the help of splines.

In addition, the accuracy with which objects can be reconstructed is bounded by another limitation of our architecture. Since we mounted our light sources on the ceiling, the system cannot provide full information about the object's shape. There is a pyramidal blind spot above all flat, horizontal surfaces that the reconstruction cannot eliminate. The slope of these pyramids depends on the angle between the desk surface and the rays from the light sources. Only structures with a greater slope will be reconstructed entirely without error. We expect that we can greatly reduce the effects of this error by using the image from the side camera and extracting an additional silhouette of the object. This will help keep the error angle well below 10 degrees.

8 Performance Analysis

8.1 Object and Gesture Tracking

Both object and gesture tracking currently perform at an average of between 14 and 20 frames per second (fps). Frame rate depends on both the number of objects on the table and the size of their reflections. Both techniques follow fast motions and complicated trajectories.

To test latency, we measured the runtime of our vision code. In our current implementation with an image size of 320x240 pixels, the object tracking code took around 43 ms to run with a single object on the desk surface and scaled up to 60 ms with five objects. By switching from TCP to UDP, we were able to reduce the network latency from a previous 100 ms to approximately 8 ms. Thus, our theoretical system latency is between 101 and 118 ms. Experimental results confirmed these values.

For the gesture tracking, the results are in the same range since the code used is nearly identical. Measuring the exact performance, however, is more difficult because two cameras are involved.

Even though the system responsiveness (system latency plus display lag) exceeds the envisioned threshold of 75 to 100 ms, it still seems adequate for most (navigational) pointing gestures in our current applications. Since users receive continuous feedback about their hand and pointing positions, and most navigation controls are relative rather than absolute, users adapt their behavior readily to the system. With object tracking, the physical object itself provides users with adequate tactile feedback. In general, since users move objects across a very large desk, the lag is rarely troublesome in the current applications.

Nonetheless, we are confident that some improvements in the vision code can further reduce latency. In addition, Kalman filters may compensate for render lag and will also add to the tracking system's stability.

8.2 3D Reconstruction

Calculating the error from the 3D reconstruction process requires choosing known 3D models, performing the reconstruction process, aligning the reconstructed model and the ideal model, and calculating an error measure. For simplicity, we chose a cone and pyramid. We set the centers of mass of the ideal and reconstructed models to the same point in space, and aligned their principal axes.

Table 1 Reconstruction errors averaged over three runs (in meters and percentage of object diameter).

                     Cone              Pyramid
Maximal Error        0.0215 (7.26%)    0.0228 (6.90%)
Mean Error           0.0056 (1.87%)    0.0043 (1.30%)
Mean Square Error    0.0084 (2.61%)    0.0065 (1.95%)

To measure error, we used the Metro tool developed by Cignoni, Rocchini, and Scopigno [4]. It approximates the real distance between the two surfaces by choosing a set of 100,000 to 200,000 points on the reconstructed surface, then calculating the two-sided distance (Hausdorff distance) between each of these points and the ideal surface. This distance is defined as $\max(d(S_1, S_2), d(S_2, S_1))$, with $d(S_1, S_2)$ denoting the one-sided distance between the surfaces $S_1$ and $S_2$:

$$d(S_1, S_2) = \max_{p \in S_1} d(p, S_2) = \max_{p \in S_1} \min_{p' \in S_2} \| p - p' \| \qquad (2)$$

The Hausdorff distance corresponds directly to the reconstruction error. In addition to the maximum distance, we also calculated the mean and mean-square distances. Table 1 shows the results. In these examples, the relatively large maximal error was caused by the difficulty in accurately reconstructing the tip of the cone and the pyramid.
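For reference, the distances in Equation (2) and the statistics reported in Table 1 can be approximated on point samples with a brute-force computation (a sketch in Python/NumPy; Metro itself measures point-to-surface distances rather than point-to-point distances):

```python
import numpy as np

def one_sided_stats(points_a, points_b):
    """For each point in A, find the distance to the closest point in B;
    return (max, mean, root-mean-square) of those nearest distances."""
    diffs = points_a[:, None, :] - points_b[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return nearest.max(), nearest.mean(), np.sqrt((nearest ** 2).mean())

def hausdorff_distance(points_a, points_b):
    """Two-sided (symmetric) Hausdorff distance between two point samples."""
    return max(one_sided_stats(points_a, points_b)[0],
               one_sided_stats(points_b, points_a)[0])
```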

Improvements may be made by precisely calibrating the camera and lighting system, adding more light sources, and obtaining a silhouette from the side camera to eliminate ambiguity about the top of the surface. However, the system meets its goal of providing virtual presences for physical objects in a timely manner that encourages spontaneous interactions.

8.3 User Experience

To evaluate the current usability of the system, we performed a small user study with the goal of determining the relative efficiency and accuracy of the object tracking capability. We designed a task that required users to drag virtual balls of various sizes to specified locations on the table's surface with the help of physical "cursor" objects. The system recorded the time required to complete the task of correctly moving four such balls.

Although the number of participants was too small to yield significant quantitative results, we discovered several common problems users had with the interface. The main difficulties arose from selecting smaller balls, both because of an imprecise "hot spot" for physical interactors, and because the physical object occluded its virtual representation. By designing a context-sensitive "crosshair" cursor that extended beyond the dimensions of the physical object, we were able to significantly increase performance in those cases. In the future, we plan to conduct a more thorough user study, with more participants, that also measures the usability of the gesture tracking subsystem.

9 Putting It to Use: Spontaneous Gesture Interfaces

All the components of the Perceptive Workbench – deictic gesture tracking, object recognition, tracking, and reconstruction – can be seamlessly integrated into a single, consistent framework. The Perceptive Workbench interface detects how users want to interact with it and automatically switches to the desired mode.

When users move a hand above the display surface, the system tracks the hand and arm as described in Section 6. A cursor appears at the projected hand position on the display surface, and a ray emanates along the projected arm axis. These can be used in selection or manipulation, as in Figure 8a. When users place an object on the surface, the cameras recognize this and identify and track the object. A virtual button also appears on the display (indicated by the arrow in Figure 8b). By tracking the reflections of objects near the table surface, the system determines when the hand overlaps the button, thus selecting it. This action causes the system to capture the 3D object shape, as described in Section 7.

Since shadows from the user's arms always touch the image border, it is easy to decide whether an object lies on the desk surface. If the system detects a shadow that does not touch any border, it can be sure that an object on the desk surface was the cause. As a result, the system will switch to object-recognition and tracking mode. Similarly, the absence of such shadows, for a certain period, indicates that the object has been taken away, and the system can safely switch back to gesture-tracking mode. Note that once the system is in object-recognition mode, it turns off the ceiling lights and activates the light sources underneath the table. Therefore users can safely grab and move objects on the desk surface, since their arms will not cast any shadows that could disturb the perceived object contours.
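The mode-switching rule can be written as a small decision function. The sketch below captures the logic described above; the blob descriptor and the time-out length are our assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Blob:
    touches_border: bool      # does the detected region reach the edge of the desk image?

def next_mode(current_mode: str, blobs: List[Blob],
              empty_frames: int, timeout_frames: int = 30) -> str:
    """Decide between 'gesture' mode (ceiling IR on, arm shadows tracked) and
    'object' mode (under-desk IR on, reflections recognized and tracked).
    A shadow blob that touches no image border cannot be an arm, so it must be
    an object on the desk; a sustained absence of detections means the desk is clear."""
    if current_mode == "gesture" and any(not b.touches_border for b in blobs):
        return "object"
    if current_mode == "object" and empty_frames >= timeout_frames:
        return "gesture"
    return current_mode
```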

These interaction modes provide the elements of a perceptual interface that operates without wires and without restrictions on the objects. For example, we constructed a simple application where the system detects objects placed on the desk, reconstructs them, and then places them in a template set where they are displayed as slowly rotating objects on the workbench display's left border. Users can grab these objects, which can act as new icons that the user can attach to selection or manipulation modes or use as primitives in a model building application.

9.1 An Augmented Billiards Game

We have developed a collaborative interface that combines the Perceptive Workbench with a physical game of pool in a two-player telepresence game. The objective of the game is for the player at the billiards table to sink all of the balls while avoiding a virtual obstacle controlled by the other player at the workbench. A previous system [12] concentrated on suggesting shots for the player using a head-up display and camera, as opposed to the projected display used here.

Fig. 9 (a) The Augmented Billiards Table; (b) Workbench player placing an obstacle; (c) Virtual obstacle overlaid on the real pool table.

The billiard table is augmented with a setup resembling the Perceptive Workbench apparatus (see Figure 9a). A camera positioned above the table tracks the type and position of the pool balls, while a projector in a similar location can create visual feedback directly on the playing surface. The billiard table's current state is transmitted to the workbench client and rendered as a 3D model. As the game progresses, the workbench updates this model continuously using streaming data from the billiards client.

During the workbench player's turn, he places a physical object on the surface of the workbench (Figure 9b). The workbench derives a 2D representation from the outline of the object and transmits the shape to the billiards client. The outline is projected onto the surface of the billiards table and acts as a virtual obstacle (Figure 9c). If any of the balls pass through the obstacle while the billiards player makes his shot, the workbench player is awarded a point. If the pool player can successfully sink a ball without this happening, he is awarded a point. The workbench player is completely free to choose any object as an obstacle as long as it fits certain size constraints. Thus, the Perceptive Workbench's ability to use previously unknown physical objects enhances the users' possibilities for gameplay.

Fig. 8 (a) Pointing gesture with hand icon and selection ray; (b) Virtual button rendered on the screen when an object is detected on the surface; (c) Reconstruction of this watering can.

In addition, this "tangible" interface is apparent to a novice user, as it involves manipulating a physical object as a representation of the virtual obstruction on a display similar in size to the billiards table itself.
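A minimal sketch of the obstacle-violation test implied by these rules follows, assuming the obstacle outline arrives as a polygon in table coordinates and ball positions are sampled each frame; the ray-casting point-in-polygon test is standard, and this is not the billiards client's actual code.

from typing import List, Tuple

Point = Tuple[float, float]

def inside(p: Point, polygon: List[Point]) -> bool:
    # Ray-casting point-in-polygon test against the obstacle outline.
    x, y = p
    hit = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):                      # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit

def ball_violates_obstacle(track: List[Point], obstacle: List[Point]) -> bool:
    # A ball "passes through" the obstacle if any tracked position lies inside it.
    return any(inside(p, obstacle) for p in track)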

9.2 An Augmented Reality Game

We created a more elaborate collaborative interface using the Perceptive Workbench in an augmented reality game. Two or more game masters can communicate with a person in a separate space wearing an augmented reality headset (Figure 10a). The workbench display surface acts as a top-down view of the player's space. The game masters place different objects, which appear to the player as distinct monsters at different vertical levels in the game space. While the game masters move the objects around the display surface, this motion is replicated by monsters in the player's view, which move in their individual planes. The player's goal is to dispel these monsters by performing Kung Fu gestures before they can reach him. Since it is difficult for the game master to keep pace with the player, two or more game masters may participate (Figure 10a). The Perceptive Workbench's object tracker scales naturally to handle multiple, simultaneous users. For a more detailed description of this application, see Starner et al. [30,19].

9.3 3D Terrain Navigation

In another application, we use the Perceptive Workbench's deictic gesture tracking capability to interface with VGIS, a global terrain navigation system that allows continuous flight from outer space to terrain at 1 foot or better resolution. Main interactions include zooming, panning, and rotating the map. Previously, interaction took place by using button sticks with 6-DOF electromagnetic trackers attached.

We employed deictic gesture tracking to remove this constraint. Users choose the direction of navigation by pointing and can change the direction continuously (Figure 10b). Moving the hand toward the display increases the speed toward the earth, and moving it away increases the speed away from the earth. Panning and rotating can be accomplished by making lateral gestures in the direction to be panned or by making a rotational arm gesture. Currently, users choose these three modes by keys on a keyboard attached to the workbench, while the extent of the action is determined by deictic tracking. In the future, this selection could be made by analyzing the user's hand shape, or by reacting to spoken commands. In a recent paper, Krum et al. propose such a navigation interface that implements a combination of speech and user-centered gestures, recognized by a small camera module worn on the user's chest [16].
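As a rough illustration of this mapping, the sketch below assumes the tracker reports the hand height above the desk and a unit pointing vector; the rest height and gain are illustrative values, not those used with VGIS.

import numpy as np

def navigation_velocity(hand_height, point_dir, rest_height=0.4, gain=5.0):
    # Direction of flight comes from the pointing ray; dropping the hand below the
    # rest height increases speed toward the earth, raising it reverses the motion.
    speed = gain * (rest_height - hand_height)
    d = np.asarray(point_dir, float)
    return speed * d / np.linalg.norm(d)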

9.4 Telepresence and CSCW

Last but not least, we built a simple telepresence system. Using the same interaction framework described at the beginning of this section, users can point to any location on the desk, reconstruct objects, and move them across the desk surface. All of their actions are immediately applied to a VR model of the workbench mirroring the current state of the real desk (Figure 10c). Thus, when performing deictic gestures, the current hand and pointing position appear on the model workbench as a red selection ray. Similarly, the reconstructed shapes of objects on the desk surface are displayed at the corresponding positions in the model. This makes it possible for coworkers at a distant location to follow the user's actions in real-time, while having complete freedom to choose a favorable viewpoint.

10 Integration and Interface Design Issues

The Perceptive Workbench was designed with application integration in mind. Applications implement a simple interface protocol and can take advantage of those parts of the workbench functionality they need. However, for successful application integration, several issues have to be addressed.

From an interface design standpoint, limitations are imposed by the physical attributes, the hardware restrictions, and the software capabilities of the workbench. While the workbench size permits the display of a life-size model of, for example, the billiards table, the user's comfortable reaching range limits the useful model size.

Fig. 10 Applications: (a) Two game masters controlling virtual monsters; (b) Terrain navigation using deictic gestures; (c) A virtual instantiation of the workbench.

In addition, interface design is restricted in that one side of the workbench is inaccessible due to the placement of the projector and camera. If gesture tracking is to be used, the available range is even more limited, to just one side of the workbench.

The sensing hardware places restrictions on the potential use of tracking information for a user interface. Precise positioning of objects and pointing gestures is limited by the camera resolution. If the application requires watching the whole workbench surface, our current camera resolution of 320×240 pixels limits single-pixel accuracy to about 5 mm. By interpolating the contours with polygons and thus averaging over several samples, we can, however, arrive at a much higher precision. In a related issue, the contour of a moving object on the workbench is not necessarily stable over time, especially when the motion is so fast that the camera image is blurred. To deal with this, the billiards system detects when an object first comes to rest, determines the object's contour, and simply translates it instead of trying to recompute it. In both cases, the error can be reduced by increasing the resolution or switching to a more expensive progressive scan camera.
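The freeze-and-translate strategy can be sketched as follows, assuming the tracker supplies a centroid and a contour (an N×2 array of points) for the object each frame; the rest thresholds are illustrative, not the billiards system's actual values.

import numpy as np

class StableContour:
    def __init__(self, rest_eps=1.5, rest_frames=5):
        self.rest_eps = rest_eps        # max centroid motion (pixels) to count as "at rest"
        self.rest_frames = rest_frames  # consecutive still frames before freezing the contour
        self.still = 0
        self.last = None
        self.frozen = None              # (contour, centroid) captured once the object rests

    def update(self, centroid, contour):
        centroid = np.asarray(centroid, float)
        if self.frozen is not None:
            ref_contour, ref_centroid = self.frozen
            return ref_contour + (centroid - ref_centroid)   # translate, do not recompute
        moved = 0.0 if self.last is None else np.linalg.norm(centroid - self.last)
        self.last = centroid
        self.still = self.still + 1 if moved < self.rest_eps else 0
        if self.still >= self.rest_frames:
            self.frozen = (np.asarray(contour, float).copy(), centroid)
        return np.asarray(contour, float)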

On the software side, the question is how to use the information the Perceptive Workbench provides to create a compelling user interface. For example, there are two conceivable types of gestural interactions. The first uses deictic gestures for relative control, for example for directing a cursor or for adjusting the speed of movement. The other detects gestures that cause a discrete event, like pushing a virtual button to start the reconstruction process, or assuming a specific hand shape to switch between interaction modes. Which of these interaction types is appropriate, and which hand shapes make sense, depends largely on the application.
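A minimal sketch of how a client might route these two interaction styles, with hypothetical event names and callbacks:

def handle_sample(sample, app):
    if sample["kind"] == "deictic":          # continuous: relative cursor / speed control
        app.move_cursor(sample["x"], sample["y"])
    elif sample["kind"] == "hand_shape":     # discrete: fire an event exactly once
        if sample["shape"] == "push":
            app.start_reconstruction()
        elif sample["shape"] == "flat":
            app.switch_interaction_mode()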

Another question is how much information to transmit from the workbench to its clients. If too much information about static objects is transmitted to the display client, the time needed to read out and process the corresponding network messages can reduce the effective display frame rate. In our applications, we found it useful to transmit updates only for objects whose positions had changed.
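A sketch of this update policy, assuming objects are keyed by an identifier with 2D table positions and that send() delivers a message to the display client; the message format and tolerance are illustrative assumptions.

import json

def send_changed(objects, last_sent, send, tol=2.0):
    # objects / last_sent map object ids to (x, y) positions in table coordinates.
    for oid, (x, y) in objects.items():
        px, py = last_sent.get(oid, (None, None))
        if px is None or abs(x - px) > tol or abs(y - py) > tol:
            send(json.dumps({"id": oid, "x": x, "y": y}))   # transmit only what changed
            last_sent[oid] = (x, y)
    return last_sent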

On the client side, sensor integration needs to be addressed. For gesture tracking, information from two cameras is integrated. If the application requires lower latencies than those currently provided, Kalman filtering may be used. Since the cameras are not explicitly synchronized, asynchronous filters, like the single-constraint-at-a-time method by Welch and Bishop [42], may also prove useful.
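For concreteness, a constant-velocity Kalman filter of the kind that could smooth and extrapolate the 2D hand position is sketched below; the noise parameters are illustrative, not tuned for our cameras.

import numpy as np

class HandFilter:
    def __init__(self, dt=1 / 30, q=50.0, r=4.0):
        self.x = np.zeros(4)                         # state: [px, py, vx, vy]
        self.P = np.eye(4) * 1e3
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)     # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)     # cameras measure position only
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r

    def update(self, z):
        # Predict forward one frame, then correct with the new measurement z = [px, py].
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

    def predict_ahead(self, lag):
        # Extrapolate the filtered position by `lag` seconds to hide latency.
        return self.x[:2] + self.x[2:] * lag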

11 Maintenance of Perceptual Interfaces

One detriment to perceptual interfaces is that the underlying sensor platform needs to be maintained. Video cameras may be bumped, lights may burn out, or the entire structure may need to be moved. The Perceptive Workbench has served as an experimental platform for several years and has undergone several major revisions. In addition, the underlying Fakespace hardware is often used for virtual environment demonstrations. Such heavy use stresses an experimental system. Thus, the system must be self-calibrating wherever possible.

Object identification and tracking on the surface of the desk is one of the most valued services of the Perceptive Workbench. Fortunately, it is also the easiest to maintain. This service requires only one camera and the infrared lights under the desk. This system is easy to install and realign when necessary. In addition, the computer vision software automatically adjusts to the lighting levels available each time the system is initialized, making the system relatively robust to changes that occur on a day-to-day basis.
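One way such startup self-calibration can work is sketched below, under the assumption that a few frames of the empty, infrared-lit desk are captured at initialization; the particular statistics (a mean reference image plus a noise-derived threshold) are an assumption, not a description of the deployed code.

import numpy as np

def calibrate(reference_frames, k=3.0):
    # Average a few empty-desk frames into a reference image and derive a global
    # threshold a few standard deviations above the observed sensor noise.
    stack = np.stack([np.asarray(f, float) for f in reference_frames])
    background = stack.mean(axis=0)
    threshold = k * stack.std(axis=0).mean() + 2.0
    return background, threshold

def segment(frame, background, threshold):
    # Pixels that differ sufficiently from the empty-desk reference are foreground.
    return np.abs(np.asarray(frame, float) - background) > threshold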

The Perceptive Workbench's gesture tracking software is also used extensively. While the infrared light above the table is relatively protected in everyday use, the side-view camera is not. If a user bumps the side camera out of position, its calibration procedure must be redone. Fortunately, this procedure is not difficult. Embedding the camera into a wall near the side of the workbench may reduce this problem.

Three-dimensional reconstruction on the Perceptive Workbench requires the positions of the overhead lights to be known to within centimeters. The position of each light constrains the positions of the other lights, due to the limited surface of the desk on which a reconstruction subject can be placed and still cast a shadow that does not intersect the desk's edge. In addition, reconstruction requires the most pieces of apparatus and the most careful alignment. Thus, reconstruction proves the biggest challenge to physically moving the Perceptive Workbench.

Fortunately, the Perceptive Workbench stays in one place for extended periods of time, and the overhead lights are out of the way of most experiments and other apparatus. However, the overhead lights do burn out over time and must be replaced.

12 Future Work

Many VR systems use head-tracked shutter glasses and stereoscopic images to get a more immersive effect. In order to make these systems fully wireless, we need to apply vision-based methods to also track the user's head. At present, we are researching inexpensive and robust ways to do this that still meet the performance criteria. Results from Ware and Balakrishnan [39] suggest that, in contrast to fully immersive systems, where users wear a head-mounted display and relatively small head rotations can cause large viewpoint shifts, semi-immersive systems do not impose such high restrictions on head-movement latency. In fact, since the head position is much more important than the head orientation in these systems, latency can even be slightly larger than with the gesture and object tracking.

In addition, we will work on improving the latency of the gesture-rendering loop through code refinement and the application of Kalman filters. For the recognition of objects on the desk's surface, we will explore the use of statistical methods that can give us better ways of handling uncertainties and distinguishing new objects. We will also employ hidden Markov models to recognize symbolic hand gestures [31] for controlling the interface. Finally, as hinted at by the multiple game masters in the gaming application, several users may be supported through careful, active allocation of resources.

13 Conclusion

The Perceptive Workbench uses a vision-based system to enable a rich set of interactions, including hand and arm gestures, object recognition and tracking, and 3D reconstruction of objects placed on its surface. Latency measurements show that the Perceptive Workbench's tracking capabilities are suitable for real-time interaction.

All elements combine seamlessly into the same interface and can be used in various applications. In addition, the sensing system is relatively inexpensive, using standard cameras and lighting equipment plus a computer with one or two video digitizers, depending on the functions desired. As seen from the multiplayer gaming, terrain navigation, and telepresence applications, the Perceptive Workbench provides an untethered and spontaneous interface that encourages the inclusion of physical objects in the virtual environment.

Acknowledgements

This work is supported in part by funding from Georgia Institute of Technology's Broadband Institute. We thank Brad Singletary, William Ribarsky, Zachary Wartell, David Krum, and Larry Hodges for their help building the Perceptive Workbench and interfacing it with the applications mentioned above. In addition, we thank Paul Rosin and Geoff West for their line segmentation code, the Purdue CADLab for TWIN, and Paolo Cignoni, Claudio Rocchini, and Roberto Scopigno for Metro.

References

1. O. Bimber. Gesture controlled object interaction: A virtual table case study. In 7th Int'l Conf. in Central Europe on Computer Graphics, Visualization, and Interactive Digital Media (WSCG'99), volume 1, Plzen, Czech Republic, 1999.

2. A. Bobick, S. Intille, J. Davis, F. Baird, C. Pinhanez, L. Campbell, Y. Ivanov, and A. Wilson. The KidsRoom: A perceptually-based interactive and immersive story environment. PRESENCE: Teleoperators and Virtual Environments, 8(4):367–391, August 1999.

3. R. Bolt and E. Herranz. Two-handed gesture in multi-modal natural dialogue. In ACM Symposium on User Interface Software and Technology (UIST'92), pages 7–14, 1992.

4. P. Cignoni, C. Rocchini, and R. Scopigno. Metro: Measuring error on simplified surfaces. Computer Graphics Forum, 17(2):167–174, June 1998.

5. D. Daum and G. Dudek. On 3-d surface reconstruction using shape from shadows. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'98), pages 461–468, 1998.

6. J.W. Davis and A.F. Bobick. Sideshow: A silhouette-based interactive dual-screen environment. Technical Report TR-457, MIT Media Lab, 1998.

7. Computer Aided Design and Graphics Laboratory (CADLAB). TWIN Solid Modeling Package Reference Manual. School of Mechanical Engineering, Purdue University, http://cadlab.www.ecn.purdue.edu/cadlab/twin, 1995.

8. K. Dorfmueller-Ulhaas and D. Schmalstieg. Finger tracking for interaction in augmented environments. In Proceedings of the 2nd ACM/IEEE International Symposium on Augmented Reality (ISAR'01), 2001.

9. G.W. Fitzmaurice, H. Ishii, and W. Buxton. Bricks: Laying the foundations for graspable user interfaces. In Proceedings of CHI'95, pages 442–449, 1995.

10. M. Fukumoto, K. Mase, and Y. Suenaga. Real-time detection of pointing actions for a glove-free interface. In Proceedings of IAPR Workshop on Machine Vision Applications, Tokyo, Japan, 1992.

11. H. Ishii and B. Ullmer. Tangible bits: Towards seamless interfaces between people, bits, and atoms. In Proceedings of CHI'97, pages 234–241, 1997.

12. T. Jebara, C. Eyster, J. Weaver, T. Starner, and A. Pentland. Stochasticks: Augmenting the billiards experience with probabilistic vision and wearable computers. In Proceedings of the First Intl. Symposium on Wearable Computers, Cambridge, MA, 1997.

13. C. Jennings. Robust finger tracking with multiple cameras. In Proc. of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 152–160, 1999.

14. M. Krueger. Artificial Reality II. Addison-Wesley, 1991.

15. W. Krueger, C.-A. Bohn, B. Froehlich, H. Schueth, W. Strauss, and G. Wesche. The responsive workbench: A virtual work environment. IEEE Computer, 28(7):42–48, July 1995.

16. D.M. Krum, O. Ometoso, W. Ribarsky, T. Starner, and L. Hodges. Speech and gesture multimodal control of a whole earth 3d virtual environment. Submitted to IEEE Virtual Reality 2002 Conference, 2002.

17. A. Laurentini. How many 2d silhouettes does it take to reconstruct a 3d object? Computer Vision and Image Understanding (CVIU), 67(1):81–87, July 1997.

18. B. Leibe, D. Minnen, J. Weeks, and T. Starner. Integration of wireless gesture tracking, object tracking, and 3d reconstruction in the perceptive workbench. In Proceedings of 2nd International Workshop on Computer Vision Systems (ICVS 2001), volume 2095 of Lecture Notes in Computer Science, pages 73–92. Springer, Berlin, July 2001.

19. B. Leibe, T. Starner, W. Ribarsky, Z. Wartell, D. Krum, B. Singletary, and L. Hodges. Toward spontaneous interaction with the perceptive workbench. IEEE Computer Graphics & Applications, 20(6):54–65, Nov. 2000.

20. W. Matusik, C. Buehler, S. Gortler, R. Raskar, and L. McMillan. Image based visual hulls. In Proceedings of SIGGRAPH 2000, 2000.

21. F.K.H. Quek. Eyes in the interface. Image and Vision Computing, 13(6):511–525, Aug. 1995.

22. J.M. Rehg and T. Kanade. Visual tracking of high dof articulated structures: an application to human hand tracking. In Third European Conference on Computer Vision (ECCV'94), pages 35–46, 1994.

23. J. Rekimoto and N. Matsushita. Perceptual surfaces: Towards a human and object sensitive interactive display. In Workshop on Perceptual User Interfaces (PUI'97), 1997.

24. P.L. Rosin and G.A.W. West. Non-parametric segmentation of curves into various representations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(12):1140–1153, 1995.

25. Y. Sato, Y. Kobayashi, and H. Koike. Fast tracking of hands and fingertips in infrared images for augmented desk interface. In Proc. of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 462–467, 2000.

26. A.F. Seay, D. Krum, W. Ribarsky, and L. Hodges. Multimodal interaction techniques for the virtual workbench. In Proceedings CHI'99 Extended Abstracts, pages 282–283, 1999.

27. J. Segen and S. Kumar. Shadow gestures: 3d hand pose estimation using a single camera. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'99), volume 1, pages 479–485, 1999.

28. R. Sharma and J. Molineros. Computer vision based augmented reality for guiding manual assembly. PRESENCE: Teleoperators and Virtual Environments, 6(3):292–317, 1997.

29. S.K. Srivastava and N. Ahuja. An algorithm for generating octrees from object silhouettes in perspective views. Computer Vision, Graphics, and Image Processing: Image Understanding (CVGIP:IU), 49(1):68–84, 1990.

30. T. Starner, B. Leibe, B. Singletary, and J. Pair. Mind-warping: Towards creating a compelling collaborative augmented reality gaming interface through wearable computers and multi-modal input and output. In IEEE International Conference on Intelligent User Interfaces (IUI'2000), 2000.

31. T. Starner, J. Weaver, and A. Pentland. Real-time american sign language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1371–1375, 1998.

32. D. Sturman. Whole-hand Input. PhD thesis, MIT Media Lab, 1992.

33. S. Sullivan and J. Ponce. Automatic model construction, pose estimation, and object recognition from photographs using triangular splines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10):1091–1097, 1998.

34. B. Ullmer and H. Ishii. The metaDESK: Models and prototypes for tangible user interfaces. In ACM Symposium on User Interface Software and Technology (UIST'97), pages 223–232, 1997.

35. J. Underkoffler and H. Ishii. Illuminating light: An optical design tool with a luminous-tangible interface. In Proceedings of CHI'98, pages 542–549, 1998.

36. A. Utsumi and J. Ohya. Multiple-hand-gesture tracking using multiple cameras. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'99), volume 1, pages 473–478, 1999.

37. R. van de Pol, W. Ribarsky, L. Hodges, and F. Post. Interaction in semi-immersive large display environments. In Proceedings of Virtual Environments '99, pages 157–168, 1999.

38. C. Vogler and D. Metaxas. ASL recognition based on coupling between hmms and 3d motion analysis. In Sixth International Conference on Computer Vision (ICCV'98), pages 363–369, 1998.

39. C. Ware and R. Balakrishnan. Reaching for objects in vr displays: Lag and frame rate. ACM Transactions on Computer-Human Interaction, 1(4):331–356, 1994.

40. Z. Wartell, W. Ribarsky, and L.F. Hodges. Third-person navigation of whole-planet terrain in a head-tracked stereoscopic environment. In IEEE Virtual Reality '99 Conference, pages 141–149, 1999.

41. B. Watson, N. Walker, W. Ribarsky, and V. Spaulding. The effects of variation of system responsiveness on user performance in virtual environments. Human Factors, 40(3):403–414, 1998.

42. G. Welch and G. Bishop. SCAAT: Incremental tracking with incomplete information. In Conference Proceedings, Annual Conference Series, ACM SIGGRAPH, Aug. 1997.

43. P. Wellner. Interacting with paper on the digital desk. Communications of the ACM, 36(7):86–89, 1993.

44. C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):780–785, 1997.