Alfred: The Robot Waiter Who Remembers You

Bruce A. Maxwell, Lisa A. Meeden, Nii Addo, Laura Brown, Paul Dickson, Jane Ng, Seth Olshfski, Eli Silk, Jordan Wales

Swarthmore College
500 College Ave.
Swarthmore, PA 19081
[email protected], [email protected]

Abstract

Alfred the robot won first place in the Hors d'Oeuvres Anyone? event and also received an award for the best integrated effort at the 1999 American Association of Artificial Intelligence robot competition. The three unique features that Alfred displayed were: the ability to nudge his way through a crowd to cover a large serving area, a strong personality--that of a proper British butler--and the ability to recognize people he had served before. This paper describes the integrated navigation, natural language processing, and vision system that enabled these capabilities.

1 Introduction and task definition

The American Association of Artificial Intelligence (AAAI) holds a national robot competition at their annual conference. This competition draws schools from around the country, and the event is judged by researchers and academics in the fields of artificial intelligence and robotics. This year's competitions included the Hors D'oeuvres Anyone? event, which required robots to serve hors d'oeuvres to conference attendees during the main conference reception. The primary objective of this event was to have the robots unambiguously demonstrate interaction with the spectators. To evaluate the event, the judges not only observed how the robots did during the final round at the reception, but also interacted with the robots during a preliminary round. This initial round gives the judges a chance to systematically test the full capabilities of each entrant in a more controlled setting.

In 1999, this event was held at the Orlando Convention Center. The area where the robots were required to serve was extremely large--approximately 45m x 45m--and the ceilings were 15-20m high. There was no sound dampening material on the walls, ceiling, or floor. The illumination for the event consisted of directional lights shining down on the floor that alternated between "cool" and "warm" colors every 5m. All of these factors made the evening event extremely challenging for vision sensing and speech interaction.

Alfred the robot was designed and constructed during a 10 week period prior to the competition by the authors, all of whom were either faculty or students at Swarthmore College. Two students worked primarily on the speech interaction, three on the visual sensing, and two on navigation and integration. Complete integration of the parts took four weeks to accomplish. Prior to the competition we had one "live" test run, which gave us a benchmark and allowed us to focus our efforts on particular areas highlighted by the test. One of the major lessons of this experience was the need to begin integration even earlier in order to have a base platform from which to work.

The remainder of this paper outlines Alfred's technical details. Section 2 highlights the physical design, section 3 the navigation and decision-making algorithms, section 4 the speech interaction, and section 5 the vision system. Section 6 presents an overall discussion and future directions of research and development.

2 Physical design

The heart of Alfred's physical design was a Nomad Super Scout II Mobile Robot, manufactured by Nomadics, Inc. The base configuration comes with a 233 MHz Pentium II processor, built-in sound, and a Captivator video frame grabber. The computer runs Linux--Red Hat 6.0 distribution--and links to the robot's microcontroller through the serial port.

On top of the base robot we built a penguin-shaped structure out of plywood, screen, and black & white felt. A shelf inside the penguin held the amplified speakers, microphone power supply, and the interface box for the nudging device. Alfred appears in Figure 1.

The sensors attached to Alfred included a Costar CCD color camera with an 8mm lens, a Shure MX418S supercardioid (highly uni-directional) gooseneck microphone, and a rigid aluminum bumper with 5 contact switches for sensing feet. We installed the bumper on the front of the robot at a height of 4cm off the ground and interfaced the contact switches to the computer through the parallel port. Note that the bumper also kept the robot from pitching forward, as there is no support wheel in front of the robot.

The other modification we made was to add a fan to the top of the Scout base to lower the internal temperature. This proved to be essential to running the robot under the load we gave it.



3 High-level control and navigation

Our task was to develop a software architecture that would respond in an intelligent manner based upon sensor data gathered from its environment. The goal was to create an autonomous system that would serve hors d'oeuvres while covering a large area of the room and seeking out and interacting with people.

The robot had to be able to differentiate people from obstacles, offer food to people it encountered, cover a wide area, detect when more food was needed, and navigate to the refill station. Based on the situation in which the robot found itself, we wanted it to make appropriate commentary and engage people in some form of conversation. We extended the robot-human interaction to include nudging people out of the way when they blocked the robot's path.

When choosing the architecture for Alfred's behavior, there were several possible approaches. A common technique in mobile robotics is to develop a subsumption architecture, where a set of task-achieving behavior modules are constructed and layered so that higher priority behaviors subsume control of lower priority behaviors by suppressing their outputs [2]. Traditionally, each behavior module in a subsumption architecture is a finite state machine (FSM). The subsumption style of architecture is quite robust and reacts quickly to changes in the environment. However, development must be staged, starting from the simplest behaviors and gradually adding more complex behaviors.
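The layering idea can be illustrated with a minimal Python sketch: the highest-priority behavior that requests an action suppresses everything below it. The behaviors and sensor fields here are invented for illustration; they are not Alfred's actual modules.

```python
# Minimal subsumption-style arbitration: behaviors are ordered by
# priority, and the highest-priority behavior that wants control
# suppresses the outputs of those below it.

def avoid(sensors):
    # Highest priority: back away if something is too close.
    if sensors["sonar_min"] < 0.3:
        return "back_up"
    return None  # does not want control

def serve(sensors):
    # Middle priority: stop and serve when a person is detected.
    if sensors["person_ahead"]:
        return "offer_food"
    return None

def wander(sensors):
    # Lowest priority: default behavior, always active.
    return "move_forward"

BEHAVIORS = [avoid, serve, wander]  # ordered high -> low priority

def arbitrate(sensors):
    # Return the action of the first (highest-priority) behavior
    # that requests control.
    for behavior in BEHAVIORS:
        action = behavior(sensors)
        if action is not None:
            return action
```

The staged development the text mentions corresponds to adding entries to `BEHAVIORS` one at a time, simplest first.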

Due to the short time frame we had to prepare, we chose to construct a single, high-level FSM instead. This was also the technique used by the 1998 contest winner, Rusty the B.E.A.R. from the University of North Dakota [7]. An FSM lends itself well to the situation of controlled interaction for which we were developing the robot.

By integrating the components of speech, vision, and navigation through the FSM, we were able to accomplish our goal of a basic serving behavior. Upon startup, the robot moves away from its refill station toward a guidance point set for it in advance. Upon reaching this point, the robot attempts to detect people in its field of view. When it finds them, it moves toward them and engages in conversation. If it has seen the person before and recognizes them, the robot acknowledges this. New people are given nicknames that the robot uses later when speaking to them again. It asks them if they would like an hors d'oeuvre, and demands proper decorum in their reply. If necessary, the robot avoids, ignores, or nudges people in order to cover a broader area. When the robot has served a predetermined number of people, it navigates back to the refill station by looking for a landmark, asking directions, and using dead reckoning. After refilling, it moves back onto the floor to continue serving at the next unvisited guidance point.

3.1 Algorithms & theory

Navigation and most localization of the robot is accomplished using wheel encoders. Obstacle detection is accomplished by sonar, and people are detected and recognized visually. As the robot navigates through a crowd, the FSM directs its general path toward a 'guidance point'. There are three types of guidance points--general, intermediate, and imperative. The general guidance points form a path around a room. They are set for the robot prior to initialization by a person and depend upon the serving environment. Intermediate guidance points are dynamically calculated during the processes of the FSM. They are used to direct the robot toward a certain point from which it can resume its course toward the general guidance points. When traveling to non-imperative guidance points, the robot will stop to engage people it detects along the way. Imperative guidance points are points to which the robot moves while ignoring people it meets along the way, avoiding them or nudging if necessary. These guidance points allow the robot to move somewhere even when it is hemmed in by people that it would usually stop to serve. The robot avoids inanimate objects by testing if the thing directly in front of the robot displays visual human characteristics.

Figure 1. Alfred the Robot (left), and Alfred serving hors d'oeuvres at the 1999 AAAI Conference Reception (right)

The FSM keeps track of the robot's position on a map stored in memory. When checking to ensure that the robot covers a certain area, the bounding box of the path covered on the map is calculated. The pixel size of the map is determined by a predefined 'resolution' value which indicates the size in inches of a pixel on each side. This value can be changed in the program code so that maps can be highly detailed if necessary.
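The map bookkeeping can be sketched as follows. The resolution value, coordinates, and coverage threshold are illustrative; the paper does not give the program's actual constants.

```python
# Sketch of the coverage check: world positions (in inches) map to
# pixels at a configurable resolution (inches per pixel), and the
# bounding box of the path is computed from the visited pixels.

RESOLUTION = 12  # inches per map pixel; adjustable for map detail

def to_pixel(x_in, y_in, resolution=RESOLUTION):
    # Convert a world position in inches to a map pixel.
    return (int(x_in // resolution), int(y_in // resolution))

def bounding_box(path_inches, resolution=RESOLUTION):
    # Bounding box, in pixels, of the path covered so far.
    pixels = [to_pixel(x, y, resolution) for x, y in path_inches]
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

def area_exceeded(path_inches, min_pixels, resolution=RESOLUTION):
    # True if the path's bounding box spans at least min_pixels in
    # both dimensions -- i.e. the robot has left its local area.
    x0, y0, x1, y1 = bounding_box(path_inches, resolution)
    return (x1 - x0) >= min_pixels and (y1 - y0) >= min_pixels
```

Lowering `RESOLUTION` yields the more detailed maps the text describes, at the cost of a larger map array.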

A diagram of the overall FSM is given in Figure 2. The robot begins in state Start, from which the vision fork is initiated. Vision processes are run independently, with commands being passed to Vision from the FSM via a shared memory structure.

From state Start, the FSM goes into state Move_Engage, in which the robot moves away from the refill station, but will stop and serve any people encountered along the way (or hands that are detected in the serving tray). If the robot reaches its general guidance point, the FSM will go into state Sense, which searches for people to serve and then goes into state Move_Engage to move in the direction of a possible person with an intermediate guidance point.

Every three minutes, the FSM checks to ensure that the robot is moving outside a certain predetermined area. If that area has not been exceeded, the FSM goes into state Move_Ignore to navigate toward a dynamically-determined imperative guidance point that takes the robot outside its previous area. Upon reaching this point, the robot re-enters state Sense.

After a serve, if the FSM detects that the maximum number of serves has been exceeded, it goes into state Refill, which uses state Move_Ignore to navigate back to the refill station without engaging people. At the station, the FSM goes into state Wait, from which the robot can shut down if such a command is issued. Otherwise, when the tray has been refilled, the FSM goes into state Move_Engage to move away from the refill station, and the cycle repeats as before.
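The serving cycle can be summarized as a transition table. The state and event names follow Figure 2, but the table itself is our simplified reading of the diagram, not the actual control code.

```python
# Simplified transition table for the high-level FSM of Figure 2.
# Each (state, event) pair maps to the next state.
TRANSITIONS = {
    ("Start", "initialized"): "Move_Engage",
    ("Move_Engage", "at_guidance_point"): "Sense",
    ("Move_Engage", "hand_in_tray"): "Serve",
    ("Sense", "person_found"): "Move_Engage",
    ("Sense", "too_long_in_area"): "Move_Ignore",
    ("Serve", "max_serves_exceeded"): "Refill",
    ("Serve", "serve_complete"): "Move_Engage",
    ("Move_Ignore", "target_point_reached"): "Sense",
    ("Refill", "at_station"): "Wait",
    ("Wait", "tray_refilled"): "Move_Engage",
    ("Wait", "quit_command"): "Quit",
}

def step(state, event):
    # Stay in the current state for unrecognized events.
    return TRANSITIONS.get((state, event), state)
```

A table-driven FSM like this makes the primary action pattern (Move_Engage, Sense, Serve, Refill, Wait) easy to audit against the diagram.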

3.2 Integrating vision, speech, and navigation

We took a practical approach to integrating the three main components of the robot: vision, speech, and navigation. Since we did not take a reactive or behavior-based approach to navigation, the FSM directly controlled the navigation component. The Move_Engage and Move_Ignore states, in particular, controlled the motion of the robot based on the current goal location and sensor readings. Each of the other states either required the robot to be still, or to be moving in a particular manner--such as rotating to look for the refill station.

The vision and speech components, on the other hand, were developed independently of the FSM and navigation, in part because a modular approach permits more efficient development and testing. Both vision and speech ran as separate processes--vision as a fork, and speech as a set of independently called programs.

Figure 2. The overall finite state machine for the robot. The primary action pattern is highlighted. (States: Start, Move_Engage, Serve, Hand, Sense, Move_Ignore, Refill, Wait, Quit, and Vision [Fork]; transitions: if at a guidance point, if target point reached, if too long in one area, if max_serves exceeded, else if serve is complete, if hand in tray, if quit command given.)


To engage speech, the FSM would start the appropriate speech program and then read the results of that call to determine the speaker's response if the program engaged in recognition. This worked well because the speech generation and recognition could not be running simultaneously due to conflicts with the sound resources. The modularity of this approach allowed us to test and refine each speech interaction independently.

The vision process, by contrast, was a simple state machine of its own, running in parallel with the main FSM. The FSM controlled the vision process by setting a command field in a shared memory structure. This command field would trigger the vision process to grab image(s), execute the requested operator, and return the results in the shared memory field. Some operators--like testing for hands in front of the camera--were free running and would continue processing images until told to stop. Other operators--such as searching for a badge--executed only once. Because of the modularity of this approach, we could run the vision process independently of the main FSM, which facilitated development and testing.
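A command-field protocol of this kind might be sketched as follows. The command codes, field names, and placeholder result values are our invention; the paper does not give the actual shared-memory layout.

```python
import time

# Command codes the FSM writes into the shared command field; the
# vision process polls the field, runs the requested operator, and
# writes results back. (Codes and results are illustrative only.)
CMD_IDLE, CMD_FIND_BADGE, CMD_WATCH_HANDS, CMD_STOP = 0, 1, 2, 3

def vision_process(command, result):
    """Vision side of the protocol: `command` and `result` are shared
    cells with a `.value` field (e.g. multiprocessing.Value)."""
    while True:
        cmd = command.value
        if cmd == CMD_FIND_BADGE:
            # One-shot operator: execute once, then go idle.
            result.value = 42        # stand-in for a detection result
            command.value = CMD_IDLE
        elif cmd == CMD_WATCH_HANDS:
            # Free-running operator: keep processing until told to stop.
            result.value = 1
        elif cmd == CMD_STOP:
            break
        time.sleep(0.01)
```

On the FSM side, one would create the shared cells (for instance with `multiprocessing.Value("i", 0)`), fork the vision process, and then request operators simply by assigning to `command.value`.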

Overall, the major principle of integration we followed was modularity. We also gave the robot the ability to accomplish a sequence of complex tasks by using the central FSM for high-level control. The weakness of this approach was the speed and reactivity of the robot to its surroundings, since the sensing processes started and stopped based on the FSM state. These general observations are echoed by Bryson's comparative analysis of robot architectures, which claims that they should have:

• a modular structure,
• a means to control action and perception sequences for complex tasks, and
• a means for reacting quickly to its environment [3].

In response to the weaknesses of the pure FSM approach, we have since evolved to a completely modular architecture with sensing and acting processes that run continuously in parallel with a central controller.

3.3 Experiments & results

To test the integrated functioning of all elements within the FSM, we put the robot into several situations involving varying numbers of people with whom the robot would have to interact. Early in the testing we found that vision was the slowest process, so we coordinated speech to overlap while vision was running, camouflaging the delay. Note that a part of this delay was due to the fact that in most of the test situations we were logging images to disk to keep track of the accuracy of the vision processes.

We first tested the robot under indirect natural lighting in a large room containing 7 people spread throughout. The robot was able to navigate to its guidance points and engage people along the way. These tests did not include a situation where the robot was completely surrounded by a group of people. Also, there wasn't any background noise to confuse the robot's speech recognition. The robot achieved 70% accuracy in person recognition, 90% accuracy in person detection, and 75-85% accuracy in speech recognition, which varied from person to person.

The robot was also tested in an incandescently-lit room with about 50 people. Again, the room was very quiet, and the people gave the robot space so it could move with ease. This test, too, was successful; however, because we did not log images for this exercise (logging images slowed down the robot), this is a purely qualitative assessment.

The conditions of the competition were quite different. The lighting was more bluish, which dulled many of the colors picked up by the camera, and varied from place to place across the competition area. In the first round of the competition, where only the judges and a few onlookers were involved, the robot worked almost flawlessly from the point of view of the audience--again, we did not log images because of speed considerations. Although the robot was presented with a larger number of people, they were all attentive to it and there was relatively little background noise. The robot was given plenty of room to move, which allowed it to navigate freely.

In the second round of the competition, during the AAAI conference reception, it was much more difficult for the robot to be successful. The robot was often hemmed in by jostling people, causing it to rotate in circles searching for a way out. The nudging algorithm turned out to be too nice, not sustaining its nudging long enough to get the robot out of the circle if the people were not accommodating. As noted below, the worst of the robot's problems, however, was background noise, which greatly inhibited the speech recognition and conversation aspects of the interaction.

In most of the trial runs, navigation encountered significant errors in dead reckoning, caused by wheel skids while nudging and occasionally bumping objects. Obstacle avoidance was still good; there was only one collision with an inanimate object. The errors in dead reckoning, however, were offset by the robot's ability to find the refill station landmark, go to it, and then reset its world coordinate system before heading out again. This way it was able to correct its dead reckoning errors every 10-15 minutes.

The greatest problem revealed in the methodology we employed was that of speed of interaction. Many people walked quickly past the robot and took an hors d'oeuvre without stopping. This triggered the speech interaction, but by the time the robot spoke, people were already long gone. Thus, the robot would wait for an answer from a person who was no longer present or willing to speak to it.

4 Speech interaction

Alfred's speech system was developed primarily to act as the interface between human and machine. It was through speech that all the various human interactions were carried out. We decided to augment this interaction by making it more lifelike. As such, the entire speech system served to build Alfred's "British butler" personality. These goals were all achieved using IBM's beta version of the ViaVoice software development kit (SDK) for Linux [6], and standard audio playback software that comes with Redhat release version 6.0.

4.1 Speech recognition system

ViaVoice for Linux is available for public download from IBM's website [6]. We had the choice of using IBM's speech kit or the Center for Spoken Language Understanding (CSLU) speech toolkit developed at the Oregon Graduate Institute of Science & Technology (OGI). ViaVoice was chosen because of its simplicity to implement and high-level interface that focused more on the abstract features of speech recognition. Unlike CSLU, ViaVoice did not require the programmer to specify any low-level preprocessing techniques of the audio file before recognition was performed; the ViaVoice engine performed all this preprocessing. Another factor that contributed to our choice of ViaVoice was the ease of development of the grammar file. In speech recognition, an utterance is a stream of speech that represents a command. A grammar file is a set of words and phrases governed by rules that define all the utterances that are to be recognized at run-time. In ViaVoice there was no need to make additional inputs of pronunciations, since there was a built-in dictionary of pronunciations. On the other hand, CSLU required this additional input. The major drawback of ViaVoice was that it relied greatly on the quality of the spoken utterance, and therefore the environment needed to be reasonably quiet to achieve high recognition rates. This was in part due to the fact that all of the preprocessing was performed by the engine, and therefore we were unable to modify the filters to suit our environment. Furthermore, the ViaVoice input came directly from the microphone and not an audio file. We obtained help in understanding the ViaVoice SDK from the accompanying documentation and the developers at IBM.

4.2 Speech interaction method

All of the speech interactions were pre-defined and based on scripts that we wrote. Each of these scripts was associated with a stand-alone speech program, and it contributed to the development of Alfred's personality. The stand-alone programs had a specified finite state grammar (FSG) file, which contained the words and the phrases to be recognized by the ViaVoice recognition engine. These FSG files were the compiled output of Backus-Naur Form (BNF) files. These BNF files are simple, but structured, text files, written in a speech recognition control language (SRCL, pronounced "circle"). The Speech Recognition API Committee and Enterprise Computer Telephony Forum jointly developed SRCL. The general form of a SRCL grammar file consists of production rules in the form of (1)

<rule> = words or "phrase" (1)

The left side of the production rule is synonymous to a variable name, and the right side specifies the individual words or phrases (given in quotes) that are to be recognized. An example taken from one of Alfred's BNF files is given in Figure 3. This example annotates each recognized word and phrase with an integer, so that our speech programs could more easily parse them. More information on SRCL grammars can be found in the ViaVoice SDK documentation.
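The integer annotations can then drive a production-system style dispatch, as a rough sketch. The integer codes follow the grammar of Figure 3, but the category names and the butler's replies are our invention.

```python
# The grammar in Figure 3 tags each recognized phrase with an integer;
# the speech program only needs to branch on that tag.
CATEGORIES = {
    1: "affirmative",
    2: "negative",
    3: "vernacular",   # casual assent -- the butler demands decorum
    4: "proper_yes",
    5: "proper_no",
    6: "thanks",
}

def classify(tag):
    # Map a grammar tag to a response category.
    return CATEGORIES.get(tag, "unrecognized")

def respond(tag):
    # Production-system style dispatch, as described in the text.
    category = classify(tag)
    if category in ("proper_yes", "affirmative"):
        return "Very good. Do enjoy."
    elif category == "vernacular":
        return "I beg your pardon? A 'yes, please' would be proper."
    elif category in ("proper_no", "negative"):
        return "As you wish."
    return "I did not quite catch that."
```

This keeps the grammar (what can be said) separate from the interaction tree (what is said back), which matches the FSG-file-per-interaction design described above.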

An FSG file was written for each of Alfred's primary interactions, including the "serving interaction," the "search for the refill-station interaction," and the "at the refill-station interaction." Each FSG file had many variations of responses to questions like "Would you like an hors d'oeuvre?" and "Where is the refill station?" We also developed a generic FSG file to interpret yes/no type responses. The associated program for this FSG file served to confirm output from the vision system. No explicitly defined speech algorithms were used in developing these programs. However, each speech interaction tree was based on production systems, with if-then-else and case statements specifying what response was made based on a recognized utterance.

<<root>> = <affirmative_1> | <negative> | <vernacular> | <properYes> | <properNo> | <affirmative_2>.
<affirmative_1> = yes: 1 | yeah: 1 | "yes i really would like a tasty snack": 1.
<affirmative_2> = "thank you very much": 6 | "why thank you": 6.
<negative> = no: 2.
<vernacular> = okay: 3 | sure: 3 | "why not": 3 | "i guess so": 3 | "of course": 3 | cope: 3.
<properYes> = "yes please": 4 | "but of course": 4 | certainly: 4 | "of course": 4 | "yes" "please": 4.
<properNo> = "no thank you": 5 | "I'm fine thank you": 5 | "I'll pass": 5 | "I'll pass thanks": 5.

Figure 3. An example of a BNF file. This file was the grammar file for the serving interaction.

Work by Clifford Nass proposes that people tend to respond psychologically to computer personalities in the same way that they respond to human personalities [8], so we decided to make recorded human responses for Alfred as opposed to using a text-to-speech synthesizer, thereby achieving a more convincing "human personality". To add to Alfred's anthropomorphic nature, we made a minimum of five audio files for each response, and one was selected randomly at runtime of the speech program. Consequently, no two runs of the same speech program were alike, since different audio files were played back at runtime.
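The randomized playback reduces to a simple selection; a sketch, with hypothetical file names:

```python
import random

# At least five recorded variants exist for each response; one is
# chosen at random so no two runs of a speech program sound alike.
RESPONSES = {
    "greeting": ["greet1.wav", "greet2.wav", "greet3.wav",
                 "greet4.wav", "greet5.wav"],
}

def pick_response(kind, rng=random):
    # Select a recorded variant of the requested response at random.
    return rng.choice(RESPONSES[kind])
```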

4.3 Experiments & results

Tests of the speech interaction were quite good in the laboratory, achieving approximately an 85% recognition rate. This number takes into consideration the fact that all the speech programs were designed to make three attempts at recognition per question asked, given that the first attempt failed. However, this was not the case at the AAAI reception where the final round of the hors d'oeuvres competition took place. Recognition rates dropped significantly, to about 35%, due to the very loud background noise in the conference hall, in spite of the unidirectional microphone used. Another factor that may have contributed to this drop was Alfred's onboard sound system. The built-in audio system, developed by ESS Technologies, was perceptibly low in quality compared to the 64-bit Creative Labs sound card used in the laboratory.

Our decision to use recorded human responses proved successful, and Alfred was referred to by his given name and not treated like a machine. In fact, some guests proceeded to talk casually to him as if he were a real person. Consequently, they talked to him in full sentences instead of the short phrases or single words which Alfred was designed to understand.

5 Visual sensing

5.1 Detecting conference VIPs

With a single color camera, Alfred used blob-detection to identify conference VIPs by detecting colored ribbons on their badges. For example, note the ribbon hanging below the badge of the person in the center of Figure 1. The color blob detection process searched over an image and compared single pixels with the target color, so calibration for specific lighting conditions was necessary.

5.1.1 Relevant work

Blob detection is a standard task in vision and robotics. In a project similar to ours, a NASA mobile robot that strives to recognize faces, Wong et al. [13] used just color information to detect blobs. The blob detection simplified the search for people by requiring people in the testing environment to wear a sweatshirt of a specific color. The robot used a chromaticity comparison technique to detect the color of the sweatshirt.

Chromaticity is dependent on color and not intensity. For our ribbon detection, instead of using chromaticity we used RGB color bounds. The reason for this was that the specific range of target "colors" for detection was a non-linear mix of intensity and brightness, since some color bands had greater variation than others. Furthermore, the RGB color space worked well for this task, and we avoided the extra computation by not using a different color space.

5.1.2 Algorithms & theory

The blob detection function takes as input the pixels of an image, and the RGB color bounds of the blob it is searching for. First a loop is run through the pixels, counting the number of pixels which fall within the color bounds in each column and summing the results into bins. A window of specified width is then scanned across the bins, finding where the most target pixels are within a localized region. If the result is above a given threshold, then the function returns a 1 and the leftmost column location of the blob; otherwise, 0 is returned. This function is called when there is a person directly in front of the robot. The image is, therefore, already entirely that of the person's torso. This method is significantly faster than scanning a box across the image, because each pixel is only processed once.
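The column-binning search can be sketched as follows; the image format, color bounds, and threshold values are illustrative, not the paper's actual calibrated parameters.

```python
def find_blob(image, low, high, window=3, threshold=4):
    """Search for a color blob by counting in-bound pixels per column.

    image: list of rows, each a list of (r, g, b) pixels.
    low, high: per-channel inclusive RGB bounds of the target color.
    Returns (1, leftmost_column) if some window of columns holds at
    least `threshold` matching pixels; otherwise (0, -1).
    """
    width = len(image[0])
    bins = [0] * width
    # One pass over the pixels: count per-column matches. This is the
    # reason the method beats scanning a box across the image.
    for row in image:
        for col, (r, g, b) in enumerate(row):
            if (low[0] <= r <= high[0] and low[1] <= g <= high[1]
                    and low[2] <= b <= high[2]):
                bins[col] += 1
    # Slide a fixed-width window across the bins to localize the blob.
    best, best_col = 0, -1
    for col in range(width - window + 1):
        total = sum(bins[col:col + window])
        if total > best:
            best, best_col = total, col
    if best >= threshold:
        return 1, best_col
    return 0, -1
```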

The RGB color bounds were determined by using a linear search algorithm. The program needed seven parameters for simple blob detection: the low range and the high range of each color band of the target color, as well as the cutoff threshold. The linear search algorithm searches for the specified number of iterations over all of the parameters, one at a time, for the best solution, as specified by an evaluation function. The evaluation function takes as arguments the number of parameters and the value of the parameters, and returns a value that should increase as the solution improves. A training set of twenty images containing both positive and negative images is taken under the lighting conditions of the test site and run through the evaluation function. Since the RGB values of the target color may vary under different lighting situations, a calibration using the linear search function should be run before detection is needed in a new location.
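The coordinate-wise search might look like the following sketch; the step size, iteration count, and the evaluation function used in the test are stand-ins for the real scoring over the twenty training images.

```python
def linear_search(params, evaluate, step=5, iterations=3):
    """Optimize parameters one at a time, keeping each change that
    improves the evaluation score (higher is better).

    params: initial parameter values (e.g. six RGB bounds + threshold).
    evaluate: callable scoring a parameter vector on the training set.
    """
    params = list(params)
    best = evaluate(params)
    for _ in range(iterations):
        for i in range(len(params)):
            # Try nudging one parameter down, then up; keep improvements.
            for delta in (-step, step):
                trial = list(params)
                trial[i] += delta
                score = evaluate(trial)
                if score > best:
                    params, best = trial, score
    return params, best
```

Re-running this calibration at each new venue is what the text prescribes, since the lighting shifts the target color's RGB range.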

5.1.3 Experiments & results One of the biggest hurdles of computer vision with color is its dependence on illumination. As expected, the blob detection processes had to be calibrated at the operation site. The pink ribbon detection was extremely accurate after appropriate calibration, with no false positives, and it found all visible ribbons in our


logged images. During the competition we only looked for pink ribbons, since the other important ribbon color, white, could not be consistently detected. Figure 4 shows example images of successful badge detections. Note that in Figure 4(b) the badge is barely visible in the lower right of the image, but the system was still able to detect it because of its large vertical extent.

5.2 Recognizing landmarks

When Alfred ran out of food on its tray, it used vision along with confirmation from the handler to recognize a distinctive black and white landmark placed above its initial starting point to guide it back to the refill station.

5.2.1 Relevant work The landmark detection method we used was designed by D. Scharstein and A. Briggs [10] at Middlebury College. They developed a robust algorithm that recognizes self-similar intensity patterns and works under a wide range of viewing and lighting conditions in near-real time.

5.2.2 Algorithms & theory Self-similar intensity patterns are based on self-similar functions. The graph of such a function is identical to itself scaled by a constant p in the horizontal direction. A property of self-similar functions is that they are also self-similar at a scale of p^k, meaning that the self-similar property is invariant to viewing distance.
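The invariance claim can be spelled out in one line. If g is self-similar with ratio p in (0, 1), applying the definition k times gives:

```latex
g(x) = g(p\,x)\ \ \forall x
\quad\Longrightarrow\quad
g(x) = g(p\,x) = g(p^{2}x) = \cdots = g(p^{k}x)
```

Since changing the viewing distance only rescales the observed pattern horizontally, a pattern built from such a function matches itself at every scale p^k, which is why detection tolerates a wide range of distances.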

This method operates reliably on single scanlines without any preprocessing and runs in near-real time. Since the method uses single scanlines, it successfully recognizes the landmark even when part of the pattern is occluded.

We used a pre-compiled program obtained from Middlebury College which takes as input any PGM image and, if a self-similar landmark is found, returns the pixel locations of the two Xs marking the vertical height of the rightmost strip of the landmark. After some experiments, as described below, we were able to convert the pixel locations to a bearing and approximate distance to the refill station. We used knowledge of the camera's field of view to calculate bearing, and an empirically-calculated equation to find the distance based on the vertical height of the detected landmark. The distance equation was derived by taking pictures of the landmark at known distances and fitting a function to the data, knowing that the vertical height is inversely proportional to distance.
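Since height is inversely proportional to distance, the fit reduces to estimating a single constant k in d = k / h. A minimal sketch, with function names and calibration numbers that are illustrative rather than taken from the paper:

```python
def fit_distance_constant(samples):
    """Least-squares fit of k in  distance = k / pixel_height.

    samples: (known_distance, pixel_height) pairs from calibration images.
    Minimizing sum (d_i - k/h_i)^2 gives k = sum(d_i/h_i) / sum(1/h_i^2).
    """
    num = sum(d / h for d, h in samples)
    den = sum(1.0 / (h * h) for _, h in samples)
    return num / den

def estimate_distance(k, pixel_height):
    """Distance implied by the measured vertical height of the landmark."""
    return k / pixel_height
```

This also shows why occlusion hurt the distance estimate more than the bearing: a few missing pixels in `pixel_height` shift d by a factor, while the column position used for bearing is barely affected.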

5.2.3 Experiments & results The landmark recognition worked remarkably well. In analyzing the capabilities of the self-similar pattern recognition program, we determined that if we used the 8.5" x 11" pattern provided, we could get reliable results--better than 90% correct detection and localization--from up to 10 feet away using 320x240 images. To use this method for refill station recognition, we needed to customize it so it recognized the landmark at least 40 feet away. Since the detection of the landmark is limited by the number of landmark pixels per scanline, we doubled the size of the landmark and captured 640x480 grayscale images for this purpose, increasing the detectable range to about 50 feet.

During the competition, the landmark detection worked well enough that, although the landmark was sometimes partially blocked by a head or a hand at the conference, it still returned a reliable bearing, as judged by the direction the robot headed after each successful recognition. The approximate distance returned, however, was not as reliable, since a few occluded pixels meant several feet of miscalculation. To compensate for this, Alfred would ask whoever was nearby if it was at the refill station. If the reply was negative, the robot would repeat looking for the landmark. Figure 4 shows an example image from a successful landmark detection during the final competition.

5.3 Locating people

As Alfred's primary task was to serve people, he had to

have a robust, fast, and accurate person detection algorithm. In addition, to make the interaction more interesting, we developed a short-term recognition algorithm based on people's clothes. The person detection combined two independent methods: one used movement detection, the other used skin-region detection combined with eye template matching. The combination of these two methods provided more robust and accurate results than either method by itself.

Figure 4. Images from the final round of the competition. The left and center images were successful badge detections, while the right-most image was a successful landmark detection.

5.3.1 Relevant work The human locator based on movement detection was modeled after the vision system used in Rusty the B.E.A.R., the 1998 hors d'oeuvres serving robot from the University of North Dakota [7]. We considered a neural network-based detector [9], but the movement detector was chosen for its speed and because it does not require an extensive search through an image. In addition, the movement detector is simpler, since there is no explicit training needed for this type of system.

The person detection based on skin detection combined work on prefiltering images [4] and fuzzy pattern matching [14]. The fuzzy pattern detection was used as a fast method of filtering for skin color. These filtered images were then used for template matching similar to that described by Chan and Lewis [4].

5.3.2 Algorithms and theory The person locator based on motion used two different modes: general detection and close-person detection. Both modes require the robot to be stopped, and thus are activated by the FSM only in the appropriate states. For general person detection, Alfred searched the entire image for concentrated movement and returned an approximate distance and heading for the movement. Three 320x240 color images were captured in quick succession. For each image, a 3x3 Sobel operator [11] was applied to the pixel values in order to identify edge pixels. Consecutive images were then subtracted to form two difference images that represented edges present in the second of the two images but not in the first [5]. See Figure 5 for an example of the captured and processed images

involved in the movement detection system.

Then, three passes were made through each of the difference images to determine whether there existed localized movement and to identify at what distance and heading this movement occurred. In order to improve speed, Alfred calculated table values for each column of the difference images so that repeated calculations were eliminated. Each pass corresponded to an approximate distance away from the robot by running an appropriately sized box over the bottom of the image that looked for different concentrations of movement pixels. The large box corresponded to a distance approximately 4 feet away; the medium size box to a distance approximately 8 feet away; and the small box approximately 13 feet away. Note that a person far away would generally trigger a response for each box filter. To help select the appropriate distance, the large and medium size boxes were broken into horizontal regions such that each horizontal region had to satisfy a certain threshold. Thus, the large box filter would only detect a person if they filled each horizontal region (i.e., the whole box). Finally, if a human was detected in one of the boxes, a heading was calculated based on the column number in the center of the search box. Specifically, we determined experimentally that the lens had a field of view approximately equal to 42.74 degrees. Therefore, the following equation determined the angle from the center of the field of view to the person.

heading = (columns / 2 – column number) * (FOV / columns)    (2)

The resulting heading is in the same units as the FOV. For Alfred's physical setup, we used 320 as the number of columns and 42.74° as the FOV. Figure 6 shows a person detected at each of the three distances and the search boxes

Figure 5. Person Locator -- successive captured images (left), calculated edge images (middle), and one difference image (right).


that were used in the detection.

The second mode--close person detection--was activated after detecting an obstacle which might be a human to be served. The close-person detector captured two successive 240x320 images, performed edge calculations, and created a single difference image, all in the same manner as in the first phase. In this phase, however, Alfred searched only the center half of the image to see if enough movement pixels were present to distinguish its forward obstacle as a person and not as a static object. Alfred returned a true value only if the forward obstacle displayed significant motion.
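The edge-differencing step shared by both modes can be sketched as follows; the edge threshold is an illustrative value, not the one Alfred used.

```python
import numpy as np

def sobel_edges(gray, thresh=120):
    """Binary edge map via a 3x3 Sobel operator (thresh is illustrative)."""
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    # Horizontal and vertical Sobel responses on the interior pixels.
    gx[1:-1, 1:-1] = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
                      - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    gy[1:-1, 1:-1] = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
                      - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])
    return np.hypot(gx, gy) > thresh

def edge_difference(prev_gray, next_gray):
    """Edge pixels present in the second image but not in the first."""
    return sobel_edges(next_gray) & ~sobel_edges(prev_gray)
```

General detection would then count the surviving movement pixels inside the three distance-calibrated boxes, while close-person detection would simply check the center half of a single difference image against a threshold.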

The person detection algorithm based on color worked in two phases: filtering and template matching. The filtering pass used a trained fuzzy histogram specifying the set Skin to filter the pixel values into likely and unlikely face locations. The template pass consisted of one or more eye templates scanned over the regions of the image selected in the prefiltering stage.

To create the fuzzy histogram, we took a series of pictures of people in the area where Alfred was to serve, and then edited them to replace all non-skin color areas of the picture with black. The program then went through each picture, using all non-black pixels to generate the fuzzy histogram. The training program normalized each non-black pixel color using equation (3),

c_i = C_i / Σ_n C_n    (3)

where C_i ∈ {R, G, B} are the three color components found in the original 320x240 color images, and c_i ∈ {r, g, b} are the three normalized colors. The program then used the r and g values (b values were too noisy) to index into a 32x32 histogram and increment the appropriate cell. This same procedure was followed for the entire training set of images. The program then located the final histogram's largest value and divided each of the cells in the histogram by that value, scaling the histogram values to the range [0, 1]. We can consider the resulting histogram to be a

fuzzy membership set for pixels belonging to the fuzzy set Skin [14].

Once the Skin set is trained, we can use it to filter a new image for skin tones. The computer accomplished this by normalizing the color of each pixel using equation (3) and then indexing into the appropriate cell of the Skin histogram to transform the image into skin tone membership values. It then reduced the new image by a factor of four in each axis to speed up skin block detection. Using an appropriately sized block, it located all potential face regions by comparing the average skin membership value against a threshold.
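The training and filtering steps can be sketched together; this is a minimal reconstruction of the described procedure, not the authors' code, and the function names are illustrative.

```python
import numpy as np

def train_skin_histogram(images):
    """Build the 32x32 fuzzy Skin histogram from training images in which
    every non-skin pixel has been blacked out, as described above."""
    hist = np.zeros((32, 32))
    for img in images:
        px = img.reshape(-1, 3).astype(float)
        px = px[px.sum(axis=1) > 0]                  # skip blacked-out pixels
        rgb = px / px.sum(axis=1, keepdims=True)     # normalize: c_i = C_i / sum
        r = np.minimum((rgb[:, 0] * 32).astype(int), 31)
        g = np.minimum((rgb[:, 1] * 32).astype(int), 31)
        np.add.at(hist, (r, g), 1)                   # increment (r, g) cells
    return hist / hist.max()                         # scale to [0, 1]

def skin_membership(img, hist):
    """Map each pixel to its fuzzy Skin membership value in [0, 1]."""
    px = img.reshape(-1, 3).astype(float)
    s = np.maximum(px.sum(axis=1), 1.0)              # avoid divide-by-zero
    r = np.minimum((px[:, 0] / s * 32).astype(int), 31)
    g = np.minimum((px[:, 1] / s * 32).astype(int), 31)
    return hist[r, g].reshape(img.shape[:2])
```

Only the r and g chromaticity coordinates index the histogram, reflecting the observation above that the b values were too noisy to be useful.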

If the average was greater than the threshold, the program considered it a possible face and began the second phase of detection, template matching. The template was created by cropping a test image down to a block the size of a pair of eyes and then shrinking it by a factor of four in each axis so that it would match the shrunk image. By running the template across the top half of the hypothesized head, the program calculated the sum of the squares of the differences of the pixel values in the image and template. If this value was less than a preset threshold, the area was considered to be a person, and the program returned the horizontal location of the person in the image.
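The sum-of-squared-differences scan can be sketched as follows; the interface is hypothetical and the threshold would be the preset value mentioned above.

```python
import numpy as np

def match_eye_template(region, template, threshold):
    """Scan an eye template across a region; return the column of the best
    sum-of-squared-differences match, or None if no match beats threshold."""
    th, tw = template.shape
    best_ssd, best_col = None, None
    for y in range(region.shape[0] - th + 1):
        for x in range(region.shape[1] - tw + 1):
            patch = region[y:y + th, x:x + tw].astype(float)
            ssd = np.sum((patch - template) ** 2)    # lower is better
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_col = ssd, x
    return best_col if best_ssd is not None and best_ssd < threshold else None
```

In Alfred's pipeline this scan ran only over the top half of each hypothesized head in the factor-of-four-reduced image, which keeps the double loop cheap.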

To increase the accuracy of the algorithm, we used two different head-sized blocks and two corresponding eye templates. Using two different sized templates helped us ensure that people at different distances from the camera could be found reliably. Note that in this implementation the algorithm stopped once it found a likely candidate, rather than searching for all possible candidates, in order to reduce computation time. Since Alfred only needed one target to head towards, this decision worked well.

To combine these two independent algorithms we used the following rule: if only one of the two person-detection methods found a person, the robot would follow that result; if both methods found a person, the robot would use the face-detection result, as it tended to give a more accurate heading. As the two methods are complementary--the face detection will work when a person is standing still, while the motion detection will work if the person's face is not detectable--this combination provided


Figure 6. Distance Finder -- close size search box approximately 1.3m away (left), medium size search box ~2.5m away (middle), and far size search box ~4m away (right).


better performance than either method by itself.

5.3.3 Experiments and results During the final round of the competition, Alfred logged images for a 15-20 minute portion of its time serving in the conference hall. This involved approximately eight interactions. The movement-based person locator logged a total of 15 images: correctly detecting a person at a proper distance and heading 12 times; correctly detecting a person at an improper distance and heading 1 time; incorrectly detecting a person 1 time when no person existed in the image; and not detecting one person when it should have. The result was a success rate of 80%. The close person identifier logged 31 total images, correctly identifying the forward obstacle as a person 22 times and incorrectly identifying it as a person 9 times. Thus, the success rate for the close-human detector was approximately 71%. (Note, the close-person detector only logged an image upon detecting a person.)

As regards the histogram-based detection, in the Swarthmore Engineering building where the algorithm was initially tested, it performed successfully over 90% of the time. Upon arriving at the robotics competition, however, the system experienced several difficulties. First, the creme-colored walls were similar enough to skin tone to appear in the probability histogram. This problem was compounded by the lights in the convention center, which cast shadows on the walls that could fool the eye template at certain distances. Also, the convention center lighting used light bulbs with two different spectrums that alternated in their coverage of the room. The wide variance between the spectrums of the different types of light would throw off the person detection unless the histogram was trained with a large data set. We took over 50 training images in a variety of locations around the convention center in order to provide a robust training set for the fuzzy histogram. When we logged images in the final round of judging, the robot detected four different people using the histogram-based skin detection algorithm. Of these four, three were correct detections, while the fourth was a wall, for a success rate of

75%. The small number of detections was due to the fact that when Alfred was logging images, fewer people were at the reception and paying attention to the robots.

5.4 Recognizing people

Color and texture histogram matching was chosen for

person recognition, using the standard histogram matching criteria described in [12]. Alfred's recognition system focused on the color and texture of clothing, as the physical placement of the camera allowed Alfred's field of view to see only the torso portion of a person once the person was conversing with the robot. We decided to use both color and texture to increase the size of the search space, since we had to deal with arbitrary colors and textures, unlike the work described in [13] where people of interest wore specific, differentiable colors.

5.4.1 Algorithms & theory Alfred attempted recognition whenever he entered the Serve behavior of the FSM and subsequently detected a close person. Alfred captured a single 240x320 image and cropped it to include only the middle third, so that extraneous colors corresponding to the environment surrounding the person of interest were not included in the image used for processing. A texture image was created from the RGB image based on three different properties of calculated edges in the color image. The red band of the texture image corresponded to edge orientation, in which orientations from 0 to 180 degrees were assigned values from 0 to 255 accordingly. Similarly, the green band of the texture image corresponded to the amount of contrast, which is characterized by edge magnitudes. Last, the blue band corresponded to coarseness, which is defined by edge density, or the number of surrounding edge pixels in a 5x5 area. Together, they create an image with RGB values that can be manipulated using histogram matching in the same manner as the original color image. Example texture images are shown in Figure 7.

The three-dimensional histograms are compared by adding to a running total for each comparison. Each axis,

Figure 7. Person Recognition -- original image (far left), calculated texture image (left), texture band corresponding to edge orientation (middle), texture band corresponding to edge magnitude (right), and texture band corresponding to edge density (far right).


red, green, and blue, is divided into 8 buckets, so that there are 512 buckets in each histogram. Every pixel in the RGB image is put into a bucket corresponding to the amount of red, green, and blue it contains. A histogram comparison consists of comparing each of the buckets to the corresponding bucket in the other histogram and adding the lower of the two values to the total. The higher the total value, the closer the match.
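This is the histogram intersection of [12], which can be sketched directly; the function names are illustrative.

```python
import numpy as np

def rgb_histogram(img, bins=8):
    """512-bucket (8 per axis) RGB histogram of an image."""
    idx = (img.astype(int) * bins) // 256        # bucket index 0..7 per channel
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (idx[..., 0].ravel(),
                     idx[..., 1].ravel(),
                     idx[..., 2].ravel()), 1)
    return hist

def intersection(h1, h2):
    """Histogram intersection: sum of bucket-wise minima.
    Higher totals mean closer matches."""
    return float(np.minimum(h1, h2).sum())
```

The same comparison applies unchanged to the texture images, since they are themselves three-band images with values in 0-255.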

The comparison took place by dividing both the original color image and the calculated texture image into three equal horizontal regions to distinguish different areas of the torso. In total, each person is defined by six histograms, which are stored in a dynamically-created database to which Alfred adds throughout the time he is serving. Whenever a person is served, Alfred goes through the same process of capturing their image, creating the six histograms, and then sequentially comparing the histograms to all those currently in the database. Alfred returns the best match and a level of certainty as to whether he believes that he has served that person before. Three levels were used: 0 meant no best match was found, 1 meant an unsure match was found, and 2 meant a confident match was found.

5.4.2 Experiments & results A test run for the recognition system conducted before the preliminary round of the competition yielded the following results on 7 subjects, with a total of 20 test pictures taken: Alfred determined the correct best match 13 times; an incorrect best match 2 times; correctly found no good match 1 time; and incorrectly found no best match 4 times. Thus, the success rate for the recognition system was 70%. It should be noted that in this test the subjects were all aware of when Alfred was capturing test images. This allowed Alfred to take accurate image representations of the subjects, which was not always easy to accomplish in the dynamic environment of the competition's final round.

During the preliminary judging round, the robot correctly identified the one judge who interacted with Alfred twice. Likewise, each of the other judges received a unique name from the robot. Alfred also correctly identified the wall twice when people moved out of the camera's field of view as Alfred took their picture.

In the final round, most people interacting with the robot seemed to avoid the camera. Thus, Alfred took many pictures of the wall and classified them as one of two labels. It did correctly identify two of the robot team members who stood directly in front of the robot. However, it also incorrectly mistook one member of the audience for another because both were wearing similarly colored and textured shirts. Since the robot asks whether it got the identification correct, this actually gave it a chance to be apologetic about the mistake.

6 Future directions

6.1 Navigation and integration

Although the Finite State Machine worked well, in the

future a less-rigid model would be better. A subsumption architecture would enable the robot to exhibit a much more dynamic set of behaviors that could respond more fluidly to events (such as being trapped in a group of people). Although this approach will probably require more development time, we believe it will be worth the effort.

6.2 Speech and personality

There are several modifications to the speech and personality system that we want to implement prior to next year's competition. First, we intend to implement some method of noise cancellation, perhaps by using an adaptive noise cancellation (ANC) filter [1]. Adaptive filters allow only the desired signal to be processed and are constantly self-updating to account for environmental changes. Two algorithms that can be used to implement adaptive filters are least mean squares (LMS), which is robust and easy to implement, and recursive least squares (RLS), which is faster but whose convergence is less reliable. Two microphones working simultaneously are used in ANC: one is unidirectional while the other is omni-directional. The noise input comes from the omni-directional microphone, and this signal is passed to the adaptive filter. The unidirectional microphone would then be used to record the subject's utterances, and adaptive noise cancellation is performed on them with the adaptive filter. The noise is thus removed, leaving the error signal as the cleaned speech estimate.
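The LMS variant of this scheme can be sketched in a few lines; this is a generic textbook LMS canceller under assumed tap count and step size, not a design Alfred has implemented.

```python
import numpy as np

def lms_cancel(reference, primary, taps=8, mu=0.01):
    """LMS adaptive noise canceller (a sketch; taps and mu are assumptions).

    reference: noise-only signal from the omni-directional microphone;
    primary: speech + noise from the unidirectional microphone.
    Returns the error signal, i.e. the noise-reduced speech estimate.
    """
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]  # current + past ref samples
        y = w @ x                                # filter's estimate of noise
        e = primary[n] - y                       # error = cleaned sample
        w += 2 * mu * e * x                      # LMS weight update
        out[n] = e
    return out
```

Because the speech is uncorrelated with the reference noise, the filter converges toward cancelling only the noise component, which is exactly the behavior the ANC arrangement above relies on.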

A second modification that we intend to implement next year is to enable the speech processing system to adapt to a situation. If there is a lot of background noise, for example, Alfred might listen less and just make one-sided conversation.

Finally, we also intend to implement some auditory speaker-recognition features to supplement the visual person recognition output from the vision system. A crude implementation of this would be to have each guest say a particular phrase the first time we meet them, and extract unique features from their speech waveform that would be stored as their voice template. When vision later reports that a person has been recognized, we would confirm this by asking the person to repeat the same phrase, to carry out the template recognition.

6.3 Vision system

The high-level vision processing was relatively accurate, but was often not fast enough to be effective in the rapidly-changing environment experienced in a crowded exhibition hall. A person detected at one time may move to a completely different location by the time that Alfred processes the image information and navigates to the calculated destination. Similarly, the timing involved in


capturing images to be used in the recognition system was vital, in order to accurately assess only those colors and textures associated with the person, and not those associated with the background of the exhibition hall. Therefore, a more useful system would have the ability to track people in real time, so that only relevant information is processed and updated dynamically along with the changing behavior of the humans to be served. Overall, the system was very reliable and performed well in smaller, more controlled environments. In order to make the system more robust, a method for continually updating the information and for properly segmenting the image to include only relevant information must be added.

With respect to the blob detection, the color-based bin counter method is a fast and reliable method of blob detection in a homogeneous illumination environment. The RGB bound is suitable for bright colors saturated in one of the color bands, but if detection of "mixed" colors is needed, an approach using histogram matching would be more appropriate. The use of self-similar landmarks turned out to be an excellent choice, and future work may want to incorporate the use of bar codes to provide more specific navigational information [10].

Finally, the fuzzy histogram-based method of face detection turned out to be a good choice. Future work in this area will be to combine this prefilter with active tracking techniques and better structural matching techniques than a simple template.

6.4 New capabilities

Based on our competition experience and our experience with Alfred in a variety of situations, there are at least three new capabilities that we believe a truly successful waiter robot needs to possess, and which will be the main focus of our work for the 2000 competition.

The first of these is the ability to track a person that it is trying to serve from at least 4-5 meters away. This ability is necessary in order to avoid the situation where the robot heads in a direction, only to find that there is no one there when it arrives. It would also enable the robot to demonstrate dynamic adaptability to its environment.

The second new ability is that the robot needs to be able to adapt to the sensor noise levels in its environment, particularly with regard to speech. As noted above, a robot waiter needs to know both when it can be understood and when it can understand others. Only then can it derive an appropriate interaction for a given situation.

Finally, a robot waiter needs to display more biomimetic behavior--mimicking human reactions physically--than Alfred could. Small characteristics such as eyes that track the person being served, the ability to raise a tray up and down, or the ability to turn its head in response to stimuli would make the robot's interaction more natural and endow it with more perceived intelligence. Some of these capabilities appeared in early stages at the 1999 competition, but bringing them together into a single, robust robot structure is the challenge of the coming year.

References

[1] S. T. Alexander, Adaptive Signal Processing: Theory and Applications, New York: Springer-Verlag, 1986.

[2] R. Brooks, "A Robust Layered Control System for a Mobile Robot", IEEE Journal of Robotics and Automation, vol. RA-2, no. 1, March 1986.

[3] J. Bryson, "Cross-Paradigm Analysis of Autonomous Agent Architecture", J. of Experimental and Theoretical Artificial Intelligence, vol. 12, no. 2, pp. 165-190, 2000.

[4] S. C. Y. Chan and P. H. Lewis, "A Pre-filter Enabling Fast Frontal Face Detection", in Visual Information and Information Systems, D. Huijsmans and A. Smeulders (ed.), Springer-Verlag, Berlin, pp. 777-784, June 1999.

[5] E. R. Davies, Machine Vision: Theory, Algorithms, and Practicalities, 2nd Ed., Academic Press: London, 1997.

[6] IBM ViaVoice SDK for Linux, http://www.software.ibm.com/is/voicetype/dev_linux.html

[7] B. Maxwell, S. Anderson, D. Gomez-Ibanez, E. Gordon, B. Reese, M. Lafary, T. Thompson, M. Trosen, and A. Tomson, "Using Vision to Guide an Hors d'Oeuvres Serving Robot", IEEE Workshop on Perception for Mobile Agents, June 1999.

[8] B. Reeves and C. Nass, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places, Cambridge Univ. Press, June 1999.

[9] H. A. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, January 1998.

[10] D. Scharstein and A. Briggs, "Fast Recognition of Self-Similar Landmarks", IEEE Workshop on Perception for Mobile Agents, June 1999.

[11] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision, 2nd Ed., PWS Publishing, Pacific Grove, CA, 1999.

[12] M. J. Swain and D. H. Ballard, "Color Indexing", Int'l Journal of Computer Vision, vol. 7, no. 1, pp. 11-32, 1991.

[13] C. Wong, D. Kortenkamp, and M. Speich, "A Mobile Robot That Recognizes People", in IEEE International Conference on Tools with Artificial Intelligence, April 1995.

[14] H. Wu, Q. Chen, and M. Yachida, "Face Detection From Color Images Using a Fuzzy Pattern Matching Method", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 6, June 1999.