
To appear in International Journal of Artificial Intelligence in Education, 2000.

Received January 1999; revised July 1999.

Animated Pedagogical Agents: Face-to-Face Interaction in Interactive Learning Environments

W. Lewis Johnson and Jeff W. Rickel
Center for Advanced Research in Technology for Education (CARTE)
USC Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292 USA
[email protected], [email protected]
http://www.isi.edu/isd/carte

James C. Lester
Department of Computer Science
North Carolina State University
Raleigh, NC 27695-7534 USA
[email protected]
http://multimedia.ncsu.edu/imedia

July 15, 1999

Abstract

Recent years have witnessed the birth of a new paradigm for learning environments: animated pedagogical agents. These lifelike autonomous characters cohabit learning environments with students to create rich, face-to-face learning interactions. This opens up exciting new possibilities; for example, agents can demonstrate complex tasks, employ locomotion and gesture to focus students' attention on the most salient aspect of the task at hand, and convey emotional responses to the tutorial situation. Animated pedagogical agents offer great promise for broadening the bandwidth of tutorial communication and increasing learning environments' ability to engage and motivate students. This article sets forth the motivations behind animated pedagogical agents, describes the key capabilities they offer, and discusses the technical issues they raise. The discussion is illustrated with descriptions of a number of animated agents that represent the current state of the art.

1 Introduction and Background

This paper explores a new paradigm for education and training: face-to-face interaction with intelligent, animated agents in interactive learning environments. The paradigm joins two previously distinct research areas. The first area, animated interface agents (André & Rist 1996; André 1997; Ball et al. 1997; Hayes-Roth & Doyle 1998; Laurel 1990; Maes 1994; Nagao & Takeuchi 1994; Thorisson 1996), provides a new metaphor for human-computer interaction based on face-to-face dialogue. The second area, knowledge-based learning environments (Carbonell 1970; Sleeman & Brown 1982; Wenger 1987), seeks instructional software that can adapt to individual learners through the use of artificial intelligence. By combining these two ideas, we arrive at a new breed of software agent: an animated pedagogical agent (Lester et al. 1999a; Lester, Stone, & Stelling 1999; Rickel & Johnson 1999a; Shaw, Johnson, & Ganeshan 1999).

Animated pedagogical agents share deep intellectual roots with previous work on knowledge-based learning environments, but they open up exciting new possibilities. As in previous work, students can learn and practice skills in a virtual world, and the computer can interact with students through mixed-initiative, tutorial dialogue (Carbonell 1970) in the role of a coach (Goldstein 1976; Burton & Brown 1982) or learning companion (Chan 1996). However, the vast majority of work on tutorial and task-oriented dialogues has focused on verbal interactions, even though the earliest studies clearly showed the ubiquity of nonverbal communication in similar human dialogues (Deutsch 1974). An animated agent that cohabits the learning environment with students allows us to exploit such nonverbal communication. The agent can demonstrate how to perform actions (Rickel & Johnson 1997a). It can use locomotion, gaze, and gestures to focus the student's attention (Lester et al. 1999a; Noma & Badler 1997; Rickel & Johnson 1997a). It can use gaze to regulate turn-taking in a mixed-initiative dialogue (Cassell et al. 1994a). Head nods and facial expressions can provide unobtrusive feedback on the student's utterances and actions without unnecessarily disrupting the student's train of thought. All of these nonverbal devices are a natural component of human dialogues. Moreover, the mere presence of a lifelike agent may increase the student's arousal and motivation to perform the task well (Lester et al. 1997a; Walker, Sproull, & Subramani 1994). Thus, animated pedagogical agents present two key advantages over earlier work: they increase the bandwidth of communication between students and computers, and they increase the computer's ability to engage and motivate students.

Animated pedagogical agents share aspects in common with synthetic agents developed for entertainment applications (Elliott & Brzezinski 1998): they need to give the user an impression of being lifelike and believable, producing behavior that appears to the user as natural and appropriate. There are two important reasons for making pedagogical agents lifelike and believable. First, lifelike agents are likely to be more engaging, making the learning experience more enjoyable. Second, unnatural behaviors typically call attention to themselves and distract users. As Bates et al. (Bates, Loyall, & Reilly 1992) have argued, it is not always necessary for an agent to have deep knowledge of a domain in order for it to generate behavior that is believable. To some extent the same is true for pedagogical agents. We frequently find it useful to give our agents behaviors that make them appear knowledgeable, attentive, helpful, concerned, etc. These behaviors may or may not reflect actual knowledge representations and mental states and attitudes in the agents. However, the need to support pedagogical interactions generally imposes a closer correspondence between appearance and internal state than is typical in agents for entertainment applications. We can create animations that give the impression that the agent is knowledgeable, but if the agent is unable to answer student questions and give explanations, the impression of knowledge will be quickly destroyed.

Animated pedagogical agents also share issues with work on autonomous agents, i.e., systems that are capable of performing tasks and achieving goals in complex, dynamic environments. Architectures such as RAP (Firby 1994) and Soar (Laird, Newell, & Rosenbloom 1987) have been used to create agents that can seamlessly integrate planning and execution, adapting to changes in their environments. They are able to interact with other agents and collaborate with them to achieve common goals (Müller 1996; Tambe 1997). Pedagogical agents must likewise exhibit robust behavior in rich, unpredictable environments; they must coordinate their behavior with that of other agents; and they must manage their own behavior in a coherent fashion, arbitrating between alternative actions and responding to a multitude of environmental stimuli. Their environment includes both students and the learning environment in which the agents are situated. Student behavior is by nature unpredictable, since students may exhibit a variety of aptitudes, levels of proficiency, and learning styles. However, the need to support instruction imposes additional requirements that other types of agents do not always satisfy; in order to support instructional interactions, a pedagogical agent requires a deeper understanding of the rationales and relationships between actions than would be needed simply to perform the task (Clancey 1983).

This paper lays out the motivations behind animated pedagogical agents, the key capabilities they offer, and the technical issues they raise. Full technical accounts of individual methods and systems can be found in the cited references.

2 Example Pedagogical Agents

This paper will make frequent reference to several implemented animated pedagogical agents. These agents will be used to illustrate the range of behaviors that such agents are capable of producing and the design requirements that they must satisfy. Some of these behaviors are similar to those found in intelligent tutoring systems, while others are quite different and unique.

The USC Information Sciences Institute's Center for Advanced Research in Technology for Education (CARTE) has developed two animated pedagogical agents: Steve (Soar Training Expert for Virtual Environments) and Adele (Agent for Distance Learning: Light Edition). Steve (Figure 1) is designed to interact with students in networked immersive virtual environments, and has been applied to naval training tasks such as operating the engines aboard US Navy surface ships (Johnson et al. 1998; Johnson & Rickel 1998; Rickel & Johnson 1999a; 1997b). Immersive virtual environments permit rich interactions between humans and agents; the students can see the agents in stereoscopic 3D and hear them speak, and the agents rely on the virtual environment's tracking hardware to monitor the student's position and orientation in the environment. Steve is combined with 3D display and interaction software by Lockheed Martin (Stiles, McCarthy, & Pontecorvo 1995), simulation authoring software by the USC Behavioral Technologies Laboratory (Munro et al. 1993), and speech recognition and generation software by Entropic Research to produce a rich virtual environment in which students and agents can interact in instructional settings.

Adele (Figure 2), in contrast, was designed to run on desktop platforms with conventional interfaces, in order to broaden the applicability of pedagogical agent technology. Adele runs in a student's Web browser and is designed to integrate into Web-based electronic learning materials (Shaw, Johnson, & Ganeshan 1999; Shaw et al. 1999). Adele-based courses are currently being developed for continuing medical education in family medicine and graduate-level geriatric dentistry, and further courses are planned for development both at the University of Southern California and at the University of Oregon.

Figure 1: Steve

Figure 2: Adele

North Carolina State University's IntelliMedia Initiative has developed three animated pedagogical agents: Herman the Bug (Lester, Stone, & Stelling 1999), Cosmo (Lester et al. 1999a), and WhizLow (Lester et al. 1999b). Herman the Bug inhabits Design-A-Plant, a learning environment for the domain of botanical anatomy and physiology (Figure 3). Given a set of environmental conditions, children interact with Design-A-Plant by graphically assembling customized plants that can thrive in those conditions. Herman is a talkative, quirky insect that dives into plant structures as he provides problem-solving advice to students. As students build plants, Herman observes their actions and provides explanations and hints. In the process of explaining concepts, he performs a broad range of actions, including walking, flying, shrinking, expanding, swimming, fishing, bungee jumping, teleporting, and acrobatics.

Figure 3: Herman the Bug

Cosmo provides problem-solving advice in the Internet Protocol Advisor (Figure 4). Students interact with Cosmo as they learn about network routing mechanisms by navigating through a series of subnets. Given a packet to escort through the Internet, they direct it through networks of connected routers. At each subnet, they may send their packet to a specified router and view adjacent routers. By making decisions about factors such as address resolution and traffic congestion, they learn the fundamentals of network topology and routing mechanisms. Helpful, encouraging, and with a bit of an attitude, Cosmo explains how computers are connected, how routing is performed, and how traffic considerations come into play. Cosmo was designed to study spatial deixis in pedagogical agents, i.e., the ability of agents to dynamically combine gesture, locomotion, and speech to refer to objects in the environment while they deliver problem-solving advice.

The WhizLow agent inhabits the CPU City 3D learning environment (Figure 5). CPU City's 3D world represents a motherboard housing three principal components: the RAM, the CPU, and the hard drive. It focuses on architecture, including the control unit (which is reduced to a simple decoder) and an ALU; system algorithms such as the fetch cycle, page faults, and virtual memory; and the basics of compilation and assembly. WhizLow can carry out students' tasks by picking up data and instruction packets, dropping them off in specified locations such as registers, and interacting with devices that cause arithmetic and comparison operations to be performed. He manipulates address and data packets, which can contain integer-valued variables. As soon as task specification is complete, he begins performing the student's task in less than one second.

Figure 4: Cosmo

Figure 5: WhizLow

André, Rist, and Müller at DFKI (the German Research Center for Artificial Intelligence) have developed an animated agent for giving on-line help instructions, called the PPP Persona (André, Rist, & Müller 1999). The agent guides the learner through Web-based materials, using pointing gestures to draw the student's attention to elements of Web pages, and providing commentary via synthesized speech (Figure 6). The underlying PPP system generates multimedia presentation plans for the agent to present; the agent then executes the plan adaptively, modifying it in real time based on user actions such as repositioning the agent on the screen or asking follow-on questions.

Figure 6: PPP Persona

3 Enhancing Learning Environments with Animated Agents

This section lists the key benefits provided by animated pedagogical agents by describing the novel types of human-computer interaction they support. No current agent supports all of these types of interaction. Each type can significantly enhance a learning environment without the others, and different combinations will be useful for different kinds of learning environments. To provide a summary of achievements to date, we use existing agents to illustrate each type of interaction. At the end of the section, we discuss some early empirical results on the effectiveness of animated pedagogical agents.

3.1 Interactive Demonstrations

A simulated mock-up of a student's real work environment, coupled with an animated agent that inhabits the virtual world, provides new opportunities for teaching the student how to perform tasks in that environment. Perhaps the most compelling advantage is that the agent can demonstrate physical tasks, such as operation and repair of equipment. For example, Figures 1 and 7 depict Steve showing a student how to operate a High Pressure Air Compressor (HPAC) aboard a US Navy ship. Steve integrates his demonstrations with spoken commentary describing objectives and actions. Figure 1 shows him providing such commentary:

I will now perform a functional check of the temperature monitor to make sure that all of the alarm lights are functional. First, press the function test button. This will trip all of the alarm switches, so all of the alarm lights should illuminate.

Steve then proceeds with the demonstration, as shown in Figure 7. As the demonstration proceeds, Steve points out important features of the objects in the environment that relate to the task. For example, when the alarm lights illuminate, Steve points to the lights and says "All of the alarm lights are illuminated, so they are all working properly."

Figure 7: Steve pressing a button on the HPAC console

Demonstrating a task may be far more effective than trying to describe how to perform it, especially when the task involves spatial motor skills, and the experience of seeing a task performed is likely to lead to better retention. Moreover, an interactive demonstration given by an agent offers a number of advantages over showing students a videotape. Students are free to move around in the environment and view the demonstration from different perspectives. They can interrupt with questions, or even ask to finish the task themselves, in which case Steve will monitor the student's performance and provide assistance. Also, Steve is able to construct and revise plans for completing a task, so he can adapt the demonstration to unexpected events. This allows him to demonstrate the task under different initial states and failure modes, as well as help the student recover from errors.

The utility of agent demonstrations is not restricted to teaching physical tasks that the student must perform. Agents can also demonstrate procedures performed by complex devices by taking on the role of an actor in a virtual process. For example, WhizLow, the agent in the CPU City learning environment, demonstrates computational procedures to teach novices the fundamentals of computer architecture. As he transports data packets and address packets to the CPU, RAM, and hard drive, WhizLow teaches students how fetch-execute cycle algorithms work. In contrast to Steamer-style interactions (Hollan, Hutchins, & Weitzman 1984; Stevens, Roberts, & Stead 1983), in which knowledge-based simulations guide the actions in a simulated world, learning environments in which the instructions are provided by lifelike characters provide a visual focus and an engaging presence that are sometimes absent from their agentless counterparts.

3.2 Navigational Guidance

When a student's work environment is large and complex, such as a ship, one of the primary advantages of a virtual mock-up is to teach the student where things are and how to get around. In this context, animated agents are valuable as navigational guides, leading students around and preventing them from becoming lost. For example, Steve inhabits a complex shipboard environment, including multiple rooms. The engine room alone is quite complex, with the large turbine engines that propel the ship, several platforms and pathways around and into the engines, a console, and a variety of different parts of the engines that must be manipulated, such as valves. As Steve demonstrates tasks, he leads students around this environment, showing them where relevant objects are and how to get to them. Because Steve has an internal representation of the spatial layout of the ship (see Section 4), he is always able to plan the shortest path from his current location to the next relevant object. Leading someone down a hallway, up a flight of stairs, around a corner, and through some pipes to the valve they must turn is likely to be more effective than trying to tell them where the valve is located. Our experience in training people using immersive virtual reality has shown that students can easily become disoriented and lost in complex environments, so animated agents that can serve as guides are an important instructional aid.

By enabling students to participate in immersive experiences, 3D learning environments with navigational guides can help students develop spatial models of the subject matter, even if these environments present worlds that the student will never occupy. For example, the CPU City environment depicts a virtual computer that the student can travel through and interact with to acquire a mental model of the workings of a computer. Similar experiences could be provided by learning environments that offer students tours of civilizations long past, e.g., the wonders of ancient Greece, or of virtual museums housing the world's masterpieces. Accompanied by knowledgeable guides, students can travel through these virtual worlds to learn about a variety of domains that lend themselves to spatial exploratory metaphors.

Although Steve and WhizLow both inhabit 3D worlds, an animated navigational guide may even be useful in 2D environments. For example, the CAETI Center Associate (Murray 1997) serves as a Web-based guide to a large collection of intelligent tutoring system projects. A virtual building houses these projects in individual "rooms." When a user first enters the world, the CAETI guide interviews her about her interests to construct a customized itinerary. It then escorts her from room to room (project to project) based on her interests. While the guides described above help students navigate 3D worlds, the CAETI Associate demonstrates that 2D worlds may also benefit from the presence of animated agents.


3.3 Gaze and Gesture as Attentional Guides

Because of significant advances in the capabilities of graphics technologies in the past decade, tutoring systems increasingly incorporate visual aids. These range from simple maps or charts that are automatically generated (Mittal et al. 1995) to 3D simulations of physical phenomena such as electromagnetic interactions in physics (Towns, Callaway, & Lester 1998) and full-scale 3D simulated worlds such as the ship that Steve inhabits. To draw students' attention to a specific aspect of a chart, graphic, or animation, tutoring systems make use of many devices, such as arrows and highlighting by color. An animated agent, however, can guide a student's attention with the most common and natural methods: gaze and deictic gesture.

Steve uses gaze and deictic gestures in a variety of ways. He points at objects when discussing them. He looks at an object immediately before manipulating or pointing at it. He looks at objects when they are manipulated by students or other agents. He looks at an object when checking its state (e.g., to see whether a light is on or a reservoir is full). He looks at a student or another agent when waiting for them, listening to them, or speaking to them. Steve is even capable of tracking moving objects; for example, if something (e.g., the student) is moving counterclockwise around Steve, he will track it over his left shoulder until it moves directly behind him, at which point he will track it over his right shoulder.

Agents can employ deictic behaviors to create context-specific references to physical objects in virtual worlds. In the same manner that humans refer to objects in their environment through judicious combinations of speech, locomotion, and gesture, animated agents can move through their environment, point to objects, and refer to them appropriately as they provide problem-solving advice. An agent might include some or all of these capabilities. For example, to produce deictic references to particular objects under discussion, the Edward system (Claassen 1992) employs a stationary persona that "grows" a pointer to a particular object in the interface. Similarly, the PPP Persona is able to dynamically indicate various onscreen objects with an adjustable pointer (Figure 6). Adele is able to point toward objects on the screen, and can also direct her gaze toward them; Figure 8 shows her looking at the student's mouse selection. The Cosmo agent employs a deictic behavior planner that exploits a simple spatial model to select and coordinate locomotive, gestural, and speech behaviors. The planner enables Cosmo to walk to, point at, and linguistically refer to particular computers in its virtual world as it provides students with problem-solving advice.

Figure 8: Adele looking at the student's mouse selection

Noma and Badler's Presenter Jack (Noma & Badler 1997), shown in Figure 9, exhibits a variety of different deictic gestures. Like Steve and Cosmo, Presenter Jack can use his index finger to point at individual elements on his visual aid. He can also point with his palm facing towards the visual aid to indicate a larger area, and he can move his hand to indicate a flow on a map or chart. He also smoothly integrates these gestures into his presentation, moving over to the target object before his speech reaches the need for the deictic gesture, and dynamically choosing the best hand for the gesture based on a heuristic that minimizes both visual aid occlusion and the distance from the current body position to the next one in the presentation.

Figure 9: Presenter Jack pointing at a weather pattern
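
To make the trade-off concrete, a heuristic of this kind can be sketched as a simple cost minimization over the two hands. The fragment below is purely illustrative: the 2D screen coordinates, the crude occlusion proxy, and the weights are our own assumptions for the example, not Presenter Jack's implementation.

    import math

    def choose_hand(target, left_hand, right_hand, aid_center,
                    w_occlusion=1.0, w_distance=1.0):
        """Pick the hand whose pointing gesture has the lower combined cost:
        a crude occlusion proxy (how closely the hand-to-target path passes
        to the center of the visual aid) plus the distance the hand must
        travel to the target. All positions are (x, y) screen coordinates."""
        def dist(a, b):
            return math.hypot(a[0] - b[0], a[1] - b[1])

        def cost(hand_pos):
            midpoint = ((hand_pos[0] + target[0]) / 2.0,
                        (hand_pos[1] + target[1]) / 2.0)
            # The closer the arm's path passes to the aid's center, the more
            # of the aid it is assumed to cover (a stand-in for occlusion).
            occlusion = 1.0 / (1.0 + dist(midpoint, aid_center))
            travel = dist(hand_pos, target)
            return w_occlusion * occlusion + w_distance * travel

        return "left" if cost(left_hand) <= cost(right_hand) else "right"

    # Example: an element on the right edge of the aid is more naturally
    # reached with the right hand, which travels less and crosses less of it.
    print(choose_hand(target=(0.9, 0.5), left_hand=(0.3, 0.1),
                      right_hand=(0.7, 0.1), aid_center=(0.5, 0.5)))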

3.4 Nonverbal Feedback

One primary role of a tutor is to provide feedback on a student's actions. In addition to providing verbal feedback, an animated agent can also use nonverbal communication to influence the student. For example, Steve uses a nod of approval to show agreement with a student's actions and shakes his head to indicate disapproval. Adele nods or smiles to indicate agreement with the student's actions, presents a look of puzzlement when the student makes an error, and shows pleasant surprise when the student finishes their task. Moreover, body language can help indicate to students that they have just committed (or are on the verge of committing) a very serious error. This can make a strong impression on them.

The ability to use nonverbal feedback in addition to verbal comments allows an animated agent to provide more varied degrees of feedback than earlier tutoring systems. Nonverbal feedback through facial expressions may often be preferable because it is less obtrusive than a verbal comment. For example, a simple nod of approval can reassure a student without interrupting them. Similarly, human tutors often display a look of concern or puzzlement to make a student think twice about their actions in cases where either they are unsure that the student has actually made a mistake or they do not want to interrupt with a verbal correction yet. While some occasions call for these types of unobtrusive feedback, other occasions may call for more exaggerated feedback than a verbal comment can offer. For example, when students successfully complete design problems in the Design-A-Plant learning environment, the animated agent (Herman) sometimes congratulates them by cartwheeling across the screen. In the Internet Advisor, Cosmo employs "stylized" animations (Culhane 1988) (in contrast to "life-quality" animations) for nonverbal feedback. For example, when a student solves a problem, Cosmo smiles broadly and uses his entire body to applaud her success.

3.5 Conversational Signals

When people carry on face-to-face dialogues, they employ a wide variety of nonverbal signals to help regulate the conversation and complement their verbal utterances. While tutorial dialogue in most previous tutoring systems resembles Internet chat or a phone conversation, animated pedagogical agents allow us to more closely model the face-to-face interactions to which people are most accustomed. Some nonverbal signals are closely tied to spoken utterances, and could be used by any animated agent that produces speech output. For example, intonational pitch accents indicate the degree and type of salience of words and phrases in an utterance, including rhematic (i.e., new) elements of utterances and contrastive elements (Pierrehumbert & Hirschberg 1990); to further highlight such utterance elements, a pitch accent is often accompanied by a short movement of the eyebrows or head, a blink of the eyes, and/or a beat gesture (i.e., a short baton-like movement of the hands) (Cassell et al. 1994a). As another example, facial displays can provide the speaker's personal judgement of the accompanying utterance (e.g., a scrunched nose to indicate distaste for the subject) (Cassell et al. 1994a).

Other nonverbal signals help regulate the flow of conversation, and would be most valuable in tutoring systems that support speech recognition as well as speech output, such as Steve or the Circuit Fix-It Shop (Smith & Hipp 1994). This includes back-channel feedback, such as head nods to acknowledge understanding of a spoken utterance. It also includes the use of eye contact to regulate turn taking in mixed-initiative dialogue. For example, during a pause, a speaker will either break eye contact to retain the floor or make eye contact to request feedback or give up the floor (Cassell et al. 1994a). Although people can clearly communicate in the absence of these nonverbal signals (e.g., by telephone), communication and collaboration proceed most smoothly when they are available.

Several projects have made serious attempts to draw on the extensive psychological and sociological literature on human nonverbal conversational behavior. Pelachaud et al. (Pelachaud, Badler, & Steedman 1996) developed a computational model of facial expressions and head movements of a speaker. Cassell et al. (Cassell et al. 1994a; 1994b) developed perhaps the most comprehensive computational model of nonverbal communicative behavior. Their agents coordinate speech, intonation, gaze, facial expressions, and a variety of gestures in the context of a simple dialogue. However, their agents do not converse with humans; their algorithm simply generates an animation file for a face-to-face conversation between two computer characters, Gilbert and George (Figure 10), using the Jack human figure software (Badler, Phillips, & Webber 1993). In contrast, the Gandalf agent (Figure 11) supports full multi-modal conversation between human and computer (Thorisson 1996; Cassell & Thorisson 1999). Like other systems, Gandalf combines speech, intonation, gaze, facial expressions, and a few gestures. Unlike most other systems, Gandalf also perceives these communicative signals in humans; people talking with Gandalf wear a suit that tracks their upper body movement, an eye tracker that tracks their gaze, and a microphone that allows Gandalf to hear their words and intonation. Although none of these projects has specifically addressed tutorial dialogues, they contribute significantly to our understanding of communication with animated agents.

Figure 10: Animated Conversation

Figure 11: Gandalf speaking with a user

3.6 Conveying and Eliciting Emotion

Motivation is a key ingredient in learning, and emotions play an important role in motivation. By employing a computational model of emotion, animated agents can improve students' learning experiences in several ways (Elliott, Rickel, & Lester 1999). First, an agent that appears to care about a student's progress may encourage the student to care more about her own progress. Second, an emotive pedagogical agent may convey enthusiasm for the subject matter and thereby foster similar levels of enthusiasm in the learner. Finally, a pedagogical agent with a rich and interesting personality may simply make learning more fun. A learner that enjoys interacting with a pedagogical agent may have a more positive perception of the overall learning experience and may consequently opt to spend more time in the learning environment.

Perhaps as a result of the inherent psychosocial nature of student-agent interactions and of humans' tendency to anthropomorphize software (Reeves & Nass 1998), recent evidence suggests that tutoring systems with lifelike characters can be pedagogically effective (Lester et al. 1997b) while at the same time having a strong motivating effect on students (Lester et al. 1997a). It is even becoming apparent that particular features (e.g., personal characteristics) of lifelike agents can have an important impact on learners' acceptance of them (Hietala & Niemirepo 1998). As master animators have discovered repeatedly over the past century, the quality, overall clarity, and dramatic impact of communication can be increased through the creation of emotive movement that underscores the affective content of the message to be communicated (Noake 1988; Jones 1989; Lenburg 1993; Thomas & Johnston 1981). By carefully orchestrating facial expression, body placement, arm movements, and hand gestures, animated pedagogical agents could visually augment verbal problem-solving advice, give encouragement, convey empathy, and perhaps increase motivation. For example, the Cosmo agent employs a repertoire of "full-body" emotive behaviors to advise, encourage, and (appear to) empathize with students. When a student makes a sub-optimal problem-solving decision, Cosmo informs the student of the ill effect of her decision as he takes on a sad facial expression and slumping body language while dropping his hands. As computational models of emotion become more sophisticated, e.g., (Elliott 1992), animated agents will be well positioned to improve students' motivation.

3.7 Virtual Teammates

Complex tasks often require the coordinated actions of multiple team members. Team tasks are ubiquitous in today's society; for example, teamwork is critical in manufacturing, in an emergency room, and on a battlefield. To perform effectively in a team, each member must master their individual role and learn to coordinate their actions with their teammates. Distributed virtual reality provides a promising vehicle for training teams; students, possibly at different locations, cohabit a virtual mock-up of their work environment, where they can practice together in realistic situations. In such training, animated agents can play two valuable roles: they can serve as instructors for individual students, and they can substitute for missing team members, allowing students to practice team tasks when some or all human instructors and teammates are unavailable.

Steve supports this type of training (Rickel & Johnson 1999b). The team can consist of any combination of Steve agents and human students, each assigned a particular role in the team (e.g., officer of the watch or propulsion operator). Each student is accompanied by an instructor (human or agent) that coaches them on their role. Each person sees each other person in the virtual world as a head and two hands; the head is simply a graphical model, so each person can have a distinct appearance, possibly with their own face texture-mapped onto the graphical head. To distinguish different agents, each agent can be configured with its own shirt, hair, eye, and skin color, and its voice can be made distinct by setting its speech rate, baseline pitch, and vocal tract size. Thus, students can easily track the activities of their teammates. Team members communicate through spoken dialogue, and Steve agents also incorporate valuable nonverbal communication: they look at a teammate when waiting for them or speaking to them, they react to their teammates' actions, and they nod in acknowledgment when they understand something a teammate says to them. Each Steve agent's behavior is guided by a task representation that specifies the overall steps in the task as well as how various team members interact and depend upon each other.
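
As a rough illustration of how such per-agent configuration might be organized, the sketch below groups the appearance and voice parameters mentioned above into simple records. The field names, types, and example values are our own assumptions for illustration, not Steve's actual configuration format.

    from dataclasses import dataclass

    @dataclass
    class AgentAppearance:
        """Visual attributes used to tell one team member apart from another."""
        shirt_color: str
        hair_color: str
        eye_color: str
        skin_color: str

    @dataclass
    class AgentVoice:
        """Speech-synthesizer settings that make an agent's voice distinct."""
        speech_rate: float       # e.g., words per minute (assumed unit)
        baseline_pitch: float    # e.g., Hz (assumed unit)
        vocal_tract_size: float  # scaling factor (assumed)

    @dataclass
    class TeamMember:
        role: str        # e.g., "officer of the watch"
        controller: str  # "agent" or "human"
        appearance: AgentAppearance
        voice: AgentVoice

    # Example: a two-member team with one agent and one human student.
    team = [
        TeamMember("officer of the watch", "agent",
                   AgentAppearance("blue", "brown", "green", "tan"),
                   AgentVoice(speech_rate=160, baseline_pitch=110, vocal_tract_size=1.0)),
        TeamMember("propulsion operator", "human",
                   AgentAppearance("red", "black", "brown", "brown"),
                   AgentVoice(speech_rate=150, baseline_pitch=200, vocal_tract_size=0.9)),
    ]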

In addition to serving as teammates, animated pedagogical agents could serve as other types of companions for students. Chan and Baskin (Chan & Baskin 1990) developed a simulated learning companion, which acts as a peer instead of a teacher. Dillenbourg (Dillenbourg 1996) investigated the interaction between real students and computer-simulated students as a collaborative social process. Chan (Chan 1996) has investigated other types of interactions between students and computer systems, such as competitors or reciprocal tutors. Frasson et al. (Frasson et al. 1996) have explored the use of an automated "troublemaker," a learning companion that sometimes provides incorrect information in order to check, and improve, the student's self-confidence. None of these automated companions appears as an animated character, although recent work by Aïmeur et al. (Aïmeur et al. 1997) has explored the use of a 2D face with facial expressions for the troublemaker. However, since all these efforts share the perspective of learning as a social process, this seems like a natural direction for future research.

3.8 Adaptive Pedagogical Interactions

In addition to the types of interactions described above, animated pedagogical agents need to be capable of many of the same pedagogical abilities as other intelligent tutoring systems. For instance, it is useful for them to be able to answer questions, generate explanations, ask probing questions, and track the learners' skill levels. An animated pedagogical agent must be able to perform these functions while at the same time responding to the learners' actions. Thus the context of face-to-face interaction has a pervasive influence on the pedagogical functions incorporated in an animated pedagogical agent; pedagogy must be dynamic and adaptive, as opposed to deliberate, sequential, or preplanned. For example, Steve adapts his demonstrations in midstream if the student performs actions that interact with the demonstration; he also responds to student interruptions. Similarly, the PPP Persona seamlessly integrates reactive behaviors responding to user inputs with planned presentations.

The ability to deliver opportunistic instruction, based on the current situation, is a common trait of animated pedagogical agents. Herman the Bug, for example, makes extensive use of problem-solving contexts as opportunities for instruction. When the student is working on selecting a leaf to include in a plant, Herman uses this as an opportunity to provide instruction about leaf morphology. Adele constantly assesses the current situation, using the situation space model of Marsella and Johnson (Marsella & Johnson 1998), and dynamically generates advice appropriate to the current situation. Another type of opportunistic instruction provided by Adele is suggesting pointers to on-line medical resources that are relevant to the current stage of the case work-up. For example, when the student selects a diagnostic procedure to perform on the simulated patient, Adele may point the student to video clips showing how the procedure is performed.

3.9 Preliminary Empirical Results

Because animated pedagogical agent technologies are still very much in their infancy, little is known empirically about their effectiveness in learning environments. As discussed in the next section, nearly every major facet of their communicative abilities needs considerable research. For this reason, it is much too early in their development to conduct comprehensive, definitive empirical studies that demonstrate their effectiveness in learning environments. Because their communicative abilities are still very limited compared to what we expect they will be in the near future, the results of such studies will be skewed by the immaturity of the technology. Despite this caveat, it is essential to make an initial foray into assessing their impact on learning, and several studies have been undertaken with this objective in mind. Below we summarize the results of several representative studies.[1]

[1] Complete descriptions of the experimental methods and analyses are contained in the cited papers.

The largest formal empirical study of an animated pedagogical agent to date was conducted with Herman the Bug in the Design-A-Plant learning environment (Lester et al. 1997b). Researchers wanted to obtain a "baseline" reading on the potential effectiveness of animated pedagogical agents and examine the impact of various forms of agents' advice. They conducted a study with one hundred middle school students in which each student interacted with one of several versions of the Herman agent. The different versions varied along two dimensions. First, different versions of Herman employed different modalities: some provided only visual advice, some only verbal advice, and some provided combinations of the two. Second, different versions provided different levels of advice: some agents provided only high-level (principle-based) advice, others provided low-level (task-specific) advice, and some were completely mute. During the interactions, the learning environment logged all problem-solving activities, and the students were given rigorous pre-tests and post-tests. The results of the study were threefold:

Baseline Result: Students interacting with learning environments with an animated pedagogical agent show statistically significant increases from pre-tests to post-tests. Some critics have suggested that animated agents could distract students and hence prevent learning. This finding establishes that a well-designed agent in a well-designed learning environment can create successful learning experiences.

Multi-Level, Multi-Modality Effects: Animated pedagogical agents that provide multiple levels of advice combining multiple modalities yield greater improvements in problem solving than less expressive agents. This finding indicates that there may be important learning benefits from introducing animated agents that employ both visual (animated) and auditory (verbal) modalities to give both "practical" and "theoretical" advice.

Complexity Benefits: The benefits of animated pedagogical agents increase with problem-solving complexity. As students are faced with more complex problems, the positive effects of animated pedagogical agents on problem solving are more pronounced. This finding suggests that agents may be particularly effective in helping students solve complex technical problems (as opposed to simple "toy" problems).

The Design-A-Plant study also revealed the persona effect (Lester et al. 1997a): the very presence of a lifelike character in an interactive learning environment can have a strong positive effect on learners' perception of their learning experience. The study also demonstrated an important synergistic effect of multiple types of explanatory behaviors on students' perception of agents: agents that are more expressive (both in modes of communication and in levels of advice) are perceived as having greater utility and communicating with greater clarity.

In a separate study, the PPP research team conducted an experiment to evaluate the degree to which their PPP agent contributes to learning (André, Rist, & Müller 1999). To this end, they created two versions of their learning environment software, one with the PPP Persona and one without. The latter uses identical narration and uses an arrow for deictic reference. Each subject (all of them adults) viewed several presentations; some presentations provided technical information (descriptions of pulley systems) while others provided non-technical information (descriptions of office employees). Unlike the Design-A-Plant study, the subjects in this study did not perform any problem solving under the guidance of the agent. The results indicate that the presence of the animated agent made no difference to subjects' comprehension of the presentations. This finding neither supports nor contradicts the Design-A-Plant study, which did not involve an agent vs. no-agent comparison, and which involved a very different learning environment. However, 29 out of 30 subjects in the PPP study preferred the presentations with the agent. Moreover, subjects found the technical presentations (but not the non-technical presentations) significantly less difficult and more entertaining with the agent. This result is consistent with the persona effect found in the Design-A-Plant study.

It is important to emphasize that both of these studies were conducted with agents that employed "first generation" animated pedagogical agent technologies. All of their communicative capabilities were very limited compared to the level of functionality that is expected to emerge over the next few years, and Herman and the PPP Persona only employ a few of the types of interaction that have been discussed in this paper. As animated pedagogical agents become more sophisticated, it will be critical to repeat these experiments en route to a comprehensive, empirically-based theory of animated pedagogical agents and learning effectiveness.

4 Technical Issues

Animated pedagogical agents share many technical issues with previous work in intelligent tutoring systems and interactive learning environments, including representing and reasoning about domain knowledge, modeling and adapting to the student's knowledge, choosing appropriate pedagogical strategies, and maintaining a coherent tutorial dialogue. However, just as animated agents raise new instructional opportunities, as described in the last section, they also pose new technical challenges. This section outlines the key challenges and some of the relevant work to date in addressing them.

4.1 Interface to the Environment

Viewed as an autonomous agent, an animated pedagogical agent's "environment" includes the learning environment (e.g., anything from a 3D virtual world to a simple 2D Web interface), the student(s), and any other agents in the learning environment. Before discussing the inner workings of such an agent, it is helpful to discuss the interface between the agent and this environment. The interface can be divided into two parts: the agent's awareness of the environment (its "perception"), and its ability to affect the environment (its "motor actions"). One of the primary motivations for animated pedagogical agents is to broaden the bandwidth of human-computer interaction, so their perception and motor actions are typically more diverse than those of previous computer tutors and learning companions.

Animated pedagogical agents share some types of perception with earlier tutoring systems. Most track the state of the problem the student is addressing. For example, Steve tracks the state of the simulated ship, Adele tracks the state of the simulated patient, and Herman maintains a representation of the environment for which the student is designing a plant. Most track the student's problem-solving actions. For example, Steve knows when the student manipulates objects (e.g., pushes buttons or turns knobs), Adele knows when the student questions or examines the patient (e.g., inspects a lesion or listens to the heart), and Herman knows when the student extends the plant design (e.g., chooses the type of leaves). Finally, most allow the student to ask them questions. For example, students can ask Steve and Adele what they should do next and why, they can ask Herman and Cosmo for problem-solving assistance, and they can ask WhizLow to perform a task that they have designed for him.

In addition, some agents track other, more unusual events in their environment. Some track additional speech events. When an external speech synthesizer is used to generate the agent's voice, the agent must receive a message indicating when speech is complete, and the agent may receive interim messages during speech output specifying information such as the appropriate viseme for the current phoneme (for lip synchronization) or the timing of a pitch accent (for coordinated use of a beat gesture, a head movement, or raised eyebrows). To maintain awareness of when others are speaking, the agent may receive messages when the student begins and finishes speaking (e.g., from a speech recognition program) and when other agents begin or finish speaking (from their speech synthesizers), as well as a representation of what was said. Some agents, such as Steve, track the student's location in the virtual world, and agents for team training may track the locations of other agents. Some track the student's visual attention. For example, Steve gets messages from the virtual reality software indicating which objects are within the student's field of view, and he pauses his demonstrations when the student is not looking in the right place. Gandalf tracks the student's gaze as a guide to conversational turn taking, and he also tracks their gestures. It is very likely that future pedagogical agents will track still other features, such as students' facial expressions (Cohn et al. 1998) and emotions (Picard 1997).
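
One way to picture this stream of perceptual messages is as a small publish/subscribe loop in which the agent registers a handler for each kind of event it cares about. The sketch below is illustrative only; the event names, message fields, and object identifiers are invented for the example and do not reflect the actual protocols used by Steve, Adele, or Gandalf.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Event:
        kind: str   # e.g., "viseme", "pitch_accent", "speech_done", "field_of_view"
        data: dict = field(default_factory=dict)

    class PerceptionBus:
        """Routes messages from the simulator, speech software, and trackers
        to whatever handlers the agent registers for each kind of event."""
        def __init__(self) -> None:
            self.handlers: Dict[str, List[Callable[[Event], None]]] = {}

        def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
            self.handlers.setdefault(kind, []).append(handler)

        def publish(self, event: Event) -> None:
            for handler in self.handlers.get(event.kind, []):
                handler(event)

    # Example handlers: synchronize lips with the synthesizer's visemes, and
    # pause a demonstration when the relevant object (the invented identifier
    # "hpac_console") leaves the student's field of view.
    bus = PerceptionBus()
    bus.subscribe("viseme", lambda e: print("set mouth shape:", e.data["viseme"]))
    bus.subscribe("field_of_view",
                  lambda e: print("pause demo" if "hpac_console" not in e.data["visible"]
                                  else "continue demo"))

    bus.publish(Event("viseme", {"viseme": "AA"}))
    bus.publish(Event("field_of_view", {"visible": ["door", "stairs"]}))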

Interactions between an agent's body and its environment require spatial knowledge of that environment. As described in Section 3, such interactions are a key motivation for animated pedagogical agents, including the ability to look at objects, point at them, demonstrate how to manipulate them, and navigate around them. Relatively simple representations of spatial knowledge have sufficed to support the needs of animated pedagogical agents to date. For example, Herman maintains a simple representation of the student's "task bar" location in the Design-A-Plant environment so he can conduct his activities (e.g., standing, sitting, walking) appropriately on the screen. Agents such as the PPP Persona that point at elements of bitmapped images need the screen location of the referenced elements. Cosmo maintains a similar representation of the locations of objects on the screen so he can perform his deictic locomotion and gestural behaviors; he also uses this knowledge for selecting appropriate referring expressions.

Agents that inhabit 3D worlds require still richer representations. Steve relies on the virtual reality software to provide bounding spheres for objects, thus giving him knowledge of an object's position and a coarse approximation of its spatial extent for purposes of gaze and deictic gesture. Steve also requires a vector pointing at the front of each object (from which he determines where to stand) and, to support object manipulation, vectors specifying the direction to press or grasp each object. WhizLow maintains knowledge about the physical properties of various objects and devices. For example, the representation encodes knowledge that data packets can be picked up, carried, and deposited in particular types of receptacles and that levers can be pulled.
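
A minimal version of this kind of per-object spatial record might look like the following sketch. The field names, the example object, and the standing-distance calculation are assumptions made for illustration, not the representations actually used by Steve or WhizLow.

    from dataclasses import dataclass
    from typing import Tuple

    Vector = Tuple[float, float, float]

    @dataclass
    class ObjectSpatialInfo:
        """Spatial knowledge an agent might keep about one object: a bounding
        sphere for gaze and pointing, a 'front' direction that tells the agent
        where to stand, and a direction for pressing or grasping the object."""
        center: Vector          # center of the bounding sphere
        radius: float           # approximate spatial extent
        front: Vector           # unit vector pointing out from the object's front
        manipulate_dir: Vector  # direction in which to press or grasp

    # Example (invented): a button on a console.
    function_test_button = ObjectSpatialInfo(
        center=(2.0, 1.2, 0.5), radius=0.03,
        front=(0.0, 0.0, 1.0), manipulate_dir=(0.0, 0.0, -1.0))

    def standing_position(obj: ObjectSpatialInfo, distance: float = 0.6) -> Vector:
        """A point in front of the object at a comfortable working distance."""
        return tuple(c + distance * f for c, f in zip(obj.center, obj.front))

    print(standing_position(function_test_button))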

Agents in 3D environments may need additional knowledge to support collision-free locomotion. Steve represents the world as an adjacency graph: each node in the graph represents a location, and there is an edge between two nodes if there is a collision-free path directly between them. To move to a new location, he uses Dijkstra's shortest path algorithm (Cormen, Leiserson, & Rivest 1989) to identify a collision-free path. In contrast, WhizLow's navigation planner first invokes the A* algorithm to determine an approximate collision-free path on a 2D representation of the 3D world's terrain. However, this only represents an approximate path because it is found by searching through a discretized representation of the terrain. It is critical that control points, i.e., the coordinates determining the actual path to be navigated, be interpolated in a manner that (1) enables the agent's movement to appear smooth and continuous and (2) guarantees retaining the collision-free property. To achieve this natural behavior, the navigation planner generates a Bezier spline that interpolates the discretized path from the avatar's current location, through each successive control point, to the target destination.
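
Route planning over such an adjacency graph is essentially textbook Dijkstra. The sketch below shows the idea on a small invented layout; the location names and distances are made up for the example and are not Steve's actual ship model.

    import heapq

    def shortest_path(graph, start, goal):
        """Dijkstra's algorithm over an adjacency graph. graph maps each
        location to a dict of {neighbor: distance}, where an edge exists only
        if the direct path between the two locations is collision-free.
        Returns the list of locations to visit, or None if goal is unreachable."""
        frontier = [(0.0, start, [start])]
        visited = set()
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node == goal:
                return path
            if node in visited:
                continue
            visited.add(node)
            for neighbor, dist in graph.get(node, {}).items():
                if neighbor not in visited:
                    heapq.heappush(frontier, (cost + dist, neighbor, path + [neighbor]))
        return None

    # A toy layout of part of an engine room (distances in meters, invented).
    engine_room = {
        "console":  {"hallway": 4.0, "platform": 6.0},
        "hallway":  {"console": 4.0, "stairs": 3.0},
        "stairs":   {"hallway": 3.0, "platform": 5.0},
        "platform": {"console": 6.0, "stairs": 5.0, "valve": 2.0},
        "valve":    {"platform": 2.0},
    }

    print(shortest_path(engine_room, "console", "valve"))
    # -> ['console', 'platform', 'valve']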

To affect their environment, pedagogical agents need a repertoire of motor actions. These generally fall into three categories: speech, control of the agent's body, and control of the learning environment. Speech is typically generated as a text string to speak to a student or another agent. This string might be displayed as is or sent to a speech synthesizer. Control of the agent's body may involve playing existing animation clips for the whole body or may be decomposed into separate motor commands to control gaze, facial expression, gestures, object manipulations, and locomotion. (This issue is discussed in further detail in Section 4.2.) Finally, the agent may need to control the learning environment. For example, to manipulate an object, Steve sends a message to the virtual reality software to generate the appropriate motions of his body and then sends a separate message to the simulator to cause the desired change (e.g., to push a button). Actions in the environment are not restricted to physical behaviors directly performed by the agent. For example, Herman changes the background music to reflect the student's progress. To contextualize the score, he tracks the state of the task model and sequences the elements of the music so that the number of musical voices increases as the student progresses toward successful completion of subtasks.


For modularity, it is useful to insulate an agent's cognitive capabilities from the details of its motor capabilities. For example, Steve's cognitive module, which controls his behavior, outputs abstract motor commands such as look at an object, move to an object, point at an object, manipulate an object (in various ways), and speak to someone. A separate motor control module decomposes these into detailed messages sent to the simulator, the virtual reality software, and the speech synthesizer. This layered approach means that Steve's cognition is independent of the details of these other pieces of software, and even of the details of Steve's body. Because this architecture makes it easy to plug in different bodies, we can evaluate the tradeoffs among them. Steve uses a similarly layered approach on the perception side, to insulate the cognitive module from the particular types of input devices used.
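
The layering can be sketched as an abstract command interface whose realization is hidden behind a motor-control layer. In the illustrative sketch below, the class names, command set, and printed messages are our own simplifications, not Steve's actual interfaces; a real motor module would translate each command into the protocol expected by the simulator, the virtual reality software, or the speech synthesizer.

    class MotorControl:
        """Decomposes abstract motor commands into detailed messages for the
        simulator, the virtual reality software, and the speech synthesizer.
        Here the messages are simply printed for illustration."""

        def look_at(self, obj: str) -> None:
            print(f"[vr] orient head and eyes toward {obj}")

        def move_to(self, obj: str) -> None:
            print(f"[vr] walk along planned path to {obj}")

        def manipulate(self, obj: str, action: str) -> None:
            print(f"[vr] animate hand performing '{action}' on {obj}")
            print(f"[sim] apply effect of '{action}' on {obj}")

        def speak(self, listener: str, text: str) -> None:
            print(f"[tts] say to {listener}: {text}")

    class CognitiveModule:
        """Decides what to do next in terms of abstract commands only, so it
        stays independent of the body and of the other software components."""

        def __init__(self, motor: MotorControl) -> None:
            self.motor = motor

        def demonstrate_step(self, student: str, obj: str, action: str, commentary: str) -> None:
            self.motor.speak(student, commentary)
            self.motor.look_at(obj)
            self.motor.move_to(obj)
            self.motor.manipulate(obj, action)

    # Example: one step of a demonstration.
    agent = CognitiveModule(MotorControl())
    agent.demonstrate_step("student", "function test button", "press",
                           "First, press the function test button.")

Because the cognitive module only ever sees the abstract interface, a different body (or none at all) can be substituted by swapping the motor-control layer, which is the property the text above highlights.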

4.2 Behavioral Building Blocks

Designing the behavior of an agent requires addressing two issues. This section addresses the first issue: designing the building blocks from which the agent's behavior will be generated. The next section discusses the second issue: developing the code that will select and combine the right building blocks to respond appropriately to the dynamically unfolding tutorial situation.

4.2.1 Behavior Spaces

The behavior space approach is the most common method for generating the behavior of a pedagogical agent. A behavior space is a library of behavior fragments. To generate the behavior of the agent, a behavior sequencing engine dynamically strings these fragments together at runtime. When this is done well, the agent's behavior appears seamless to the student as it provides visually contextualized problem-solving advice.

Figure 12 illustrates the basic idea. It shows a behavior space with three types of behavior fragments: visual segments serving as the agent's repertoire of movements (depicted in the figure as a drawing of the character), audio clips serving as the agent's repertoire of utterances (depicted as an audio wave), and segments of background music (depicted as a musical note). The arrows in the behavior space represent the behavior fragments selected by the behavior sequencing engine for a particular interaction with the student, and the lower section of the figure shows how the engine combines them to generate the agent's behavior and accompanying music.

Creating the behavior fragments for a behavior space can range from simple to quite complex depending on the desired quality of animation. Musical segments are simply audio clips of different varieties of music to create different moods, and utterance segments are typically just voice recordings. A visual segment of the agent could be a simple bitmap image of the agent in a particular pose, a graphical animation sequence of the agent moving from one pose to another, or even an image or video clip of a real person. All three approaches have been used in existing pedagogical agents.

To allow the behavior sequencing engine to select appropriate behavior fragments at runtime, each fragment must be associated with additional information describing its content. For example, behavior fragments in the behavior space for Herman the Bug are indexed ontologically, intentionally, and rhetorically. An ontological index is imposed on explanatory behaviors. Each behavior is labeled with the structure and function of the aspects of the primary pedagogical object that the agent discusses in that segment.

[Figure 12: Behavior Space]

For example, explanatory segments in Herman's behavior space are labeled by (1) the type of botanical structures discussed, e.g., anatomical structures such as roots, stems, and leaves, and by (2) the physiological functions they perform, e.g., photosynthesis. An intentional index is imposed on advisory behaviors. Given a problem-solving goal, intentional indices enable the sequencing engine to identify the advisory behaviors that help the student achieve the goal. For example, one of Herman's behaviors indicates that it should be exhibited when a student is experiencing difficulty with a "low water table" environment. Finally, a rhetorical index is imposed on audio segments. This indicates the rhetorical role played by each clip, e.g., an introductory remark or interjection.

The following example of behavior sequencing in Herman the Bug illustrates this process. If Herman intervenes in a lesson, say because the student is unable to decide on a leaf type, the behavior sequencing engine first selects a topic to provide advice about, some component of the plant being constructed. The engine then chooses how direct a hint to provide: an indirect hint may talk about the functional constraints that a choice must satisfy, whereas a direct hint proposes a specific choice. The level of directness then helps to determine the types of media to be used in the presentation: indirect hints tend to be realized as animated depictions of the relationships between environmental factors and the plant components, while direct hints are usually rendered as speech. Finally, a suitable coherent set of media elements with the selected media characteristics is chosen and sequenced.
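
The sketch below illustrates, in simplified form, how indexed fragments might be selected from a behavior space; the fragment catalog, index fields, and selection policy are illustrative assumptions rather than Herman's actual machinery.

```python
# A minimal sketch of index-based fragment selection in a behavior space.
# Fragments carry ontological, intentional, and rhetorical indices, and a
# query picks an introduction plus an advisory fragment matching the goal.

from dataclasses import dataclass

@dataclass
class Fragment:
    clip: str                  # animation or audio asset name
    kind: str                  # "explanatory", "advisory", or "audio"
    structure: str = ""        # ontological index, e.g., "leaf"
    function: str = ""         # ontological index, e.g., "photosynthesis"
    goal: str = ""             # intentional index, e.g., "choose-leaf"
    rhetorical_role: str = ""  # rhetorical index, e.g., "introduction"

def select_fragments(space, goal, directness):
    """Pick advisory fragments for a goal, plus an introductory audio clip."""
    intro = [f for f in space if f.kind == "audio" and f.rhetorical_role == "introduction"]
    advice = [f for f in space if f.kind == "advisory" and f.goal == goal]
    # Indirect hints favor animated depictions; direct hints favor speech clips.
    preferred = "animation" if directness == "indirect" else "speech"
    advice.sort(key=lambda f: 0 if preferred in f.clip else 1)
    return intro[:1] + advice[:1]

space = [
    Fragment("intro-wave.wav", "audio", rhetorical_role="introduction"),
    Fragment("leaf-constraints-animation", "advisory", goal="choose-leaf"),
    Fragment("leaf-direct-speech", "advisory", goal="choose-leaf"),
]
print([f.clip for f in select_fragments(space, "choose-leaf", "indirect")])
# ['intro-wave.wav', 'leaf-constraints-animation']
```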

One of the biggest challenges in designing a behavior space and a sequencing engine is ensuring visual coherence of the agent's behavior at runtime. When done poorly, the agent's behavior will appear discontinuous at the seams of the behavior fragments. For some pedagogical purposes, this may not be serious, but it will certainly detract from the believability of the agent, and it may be distracting to the student. Thus, to assist the sequencing engine in assembling behaviors that exhibit visual coherence, it is critical that the specifications for the animated segments take into account continuity. One simple technique employed by some behavior sequencing engines is the use of visual bookending. Visually bookended animations begin and end with frames that are identical. Just as walk
cycles and looped backgrounds can be seamlessly composed, visually bookended animated behaviors can be joined in any order and the global behavior will always be flawlessly continuous. Although it is impractical for all visual segments to begin and end with the same frame, judicious use of this technique can greatly simplify the sequencing engine's job.
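
As a small illustration, the following sketch checks the bookending condition before splicing two clips; the clip representation and frame labels are illustrative assumptions.

```python
# A minimal sketch of the visual bookending idea: two animation clips can be
# spliced seamlessly only if the end frame of the first matches the start
# frame of the second.

def can_splice(clip_a, clip_b):
    """True if clip_a's final frame equals clip_b's first frame."""
    return clip_a["frames"][-1] == clip_b["frames"][0]

idle = {"name": "idle", "frames": ["rest", "blink", "rest"]}
point = {"name": "point", "frames": ["rest", "raise-arm", "point", "rest"]}
jump = {"name": "jump", "frames": ["crouch", "airborne", "land"]}

print(can_splice(idle, point))   # True: both clips are bookended by the "rest" pose
print(can_splice(point, jump))   # False: a visible discontinuity would result
```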

More generally, the design of behavior spaces can exploit lessons and methods from the film industry. Because the birth and maturation of the film medium over the past century has precipitated the development of a visual language with its own syntax and semantics (Monaco 1981), the "grammar" of this language can be employed in all aspects of the agent's behaviors. Careful selection of the agent's behaviors, its accouterments (e.g., props such as microscopes, jetpacks, etc.), and visual expressions of its emotive state (Bates 1994) can emphasize the most salient aspects of the domain for the current problem-solving context.

Many animated agents employ variants of the behavior space approach. Vincent (Paiva & Machado 1998), an animated pedagogical agent for on-the-job training, uses a very simple behavior space, consisting of 4 animation sequences (happy, friendly, sad, and impatient) and 80 utterances. Adele's animation is produced from a set of bitmap images of her in different poses, which were created from an artist's drawings. Herman's behavior sequencing engine orchestrates his actions by selecting and assembling behaviors from a behavior space of 30 animations and 160 audio clips. The animations were rendered by a team of graphic artists and animators. Herman's engine also employs a large library of runtime-mixable soundtrack elements to dynamically compose a score that complements the agent's activities.

The PPP Persona and Cosmo also use the behavior space approach. However, to achieve more flexibility in their behavior, they use independent behavior fragments for different visual components of the agent, and the behavior sequencing engine must combine these at runtime. Like Adele, the PPP Persona's behavior is generated from bitmaps of the agent in different poses. However, the PPP Persona can also use a dynamically generated pointer to refer to specific entities in the world as it provides advice; the sequencing engine must combine an image of the agent in a pointing pose with a pointer drawn from the agent's hand to the referenced entity.

Cosmo takes this approach much farther. Depending on the physical and pedagogical contexts in which Cosmo will deliver advice, at runtime each "frame" (at a rate of approximately 15 per second) is assembled from independent components for torsos, heads, and arms. Dynamic head assembly provides flexibility in gaze direction, while dynamic arm assembly provides flexibility in performing deictic and emotive gestures. Finally, Cosmo exhibits vocal flexibility by dynamically splicing referring expression phrases into voice clip sequences. For example, this technique enables Cosmo to take into account the physical and dialogue contexts to alternatively refer to an object or group of objects with a proximal demonstrative ("this"), a non-proximal demonstrative ("those"), or perhaps with pronominalization ("it"). Although it is more difficult to dynamically combine body fragments at runtime, the different possible combinations allow for a wider repertoire of behaviors. Cosmo still follows the behavior space approach, since he relies on behavior fragments created ahead of time by designers, but the granularity of his fragments is clearly smaller than that of an agent like Herman.
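
The following sketch illustrates the kind of context-sensitive choice of referring expression described above; the context features and decision rules are illustrative assumptions, not Cosmo's actual logic.

```python
# A minimal sketch of context-sensitive referring expression choice, followed
# by splicing the chosen phrase into a pre-recorded clip sequence.

def referring_expression(obj, context):
    """Choose a demonstrative or pronoun for obj given dialogue/physical context."""
    if context.get("last_mentioned") == obj:
        return "it"                      # already in focus: pronominalize
    if context.get("plural"):
        return "those" if context.get("distance", 0) > 1.0 else "these"
    return "this" if context.get("distance", 0) <= 1.0 else "that"

# Splice the chosen phrase into a voice clip sequence (asset names illustrative).
clips = ["advice-prefix.wav", referring_expression("router-3", {"distance": 2.5}), "advice-suffix.wav"]
print(clips)   # ['advice-prefix.wav', 'that', 'advice-suffix.wav']
```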

The behavior space approach to behavior generation offers an important advantage over the alternate techniques described below: it provides very high quality animations. The granularity of the "building block" is relatively high, and skilled animators have significant control over the process before runtime, so the overall visual impact can at times be quite striking. However, the behavior space suffers from several disadvantages.
It is labor intensive (requiring much development time by the animation staff), and because it involves 2D graphics, the student's viewpoint is fixed. Perhaps most lacking, however, is the degree of flexibility that can be exhibited by these agents. Because it is not a fundamentally generative approach, designers must anticipate all of the behavior fragments and develop robust rules for assembling them together.

4.2.2 Generating Behavior Dynamically

To achieve more flexibility, the alternative approach is to completely generate behavior as it is needed, without reusing any canned animation segments or even individual frames. This approach has been used for several systems described in Section 3, including Jack (Badler, Phillips, & Webber 1993), Steve, and WhizLow. These characters each include a 3D graphical model of the agent, segmented into its movable parts. In addition, each includes algorithms that can take a specification of a desired posture and generate the appropriate body motions to transition from the agent's current posture to the desired one. For example, given an object that Steve should look at, an algorithm generates an animation path for his head to follow. The difficulty lies in the fact that typically a number of body parts must move in concert. For instance, even in the simple gaze example, Steve may have to turn his eyes, head, neck, shoulders, and torso, all subject to constraints on their flexibility, and these must move at differential speeds to look natural.
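
The sketch below illustrates the coordination problem in miniature: a desired gaze rotation is distributed across a chain of joints with individual limits. The joint chain, limits, and weights are illustrative assumptions, not Steve's actual parameters.

```python
# A minimal sketch of distributing a gaze rotation across several joints,
# each with its own rotation limit; faster joints (eyes) absorb the rotation
# first, slower ones (torso) take up whatever remains.

def distribute_gaze(target_angle, joints):
    """Split a desired horizontal gaze angle (degrees) across a joint chain.

    joints: list of (name, limit_degrees, speed_weight). The speed weight is
    shown only to suggest that joints would also move at different rates.
    """
    remaining = target_angle
    plan = []
    for name, limit, _speed in joints:
        turn = max(-limit, min(limit, remaining))
        plan.append((name, turn))
        remaining -= turn
    return plan, remaining   # remaining != 0 means the target is unreachable

chain = [("eyes", 25, 1.0), ("head", 60, 0.5), ("torso", 45, 0.2)]
plan, leftover = distribute_gaze(95, chain)
print(plan)      # [('eyes', 25), ('head', 60), ('torso', 10)]
print(leftover)  # 0
```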

This generative approach works for speech as well as animation. While the behavior space approach pieces together pre-recorded voice clips, the text-to-speech synthesizers used by Steve, Adele, and WhizLow generate speech from individual phonemes. These synthesizers can also apply a wide variety of prosodic transformations. For example, the synthesizer could be instructed to speak an utterance in an angry tone of voice or a more polite tone depending on the context. A wide variety of commercial and public domain speech synthesizers with such capabilities are currently available.

The flexibility of this generative approach to animation and speech comes at a price: it is difficult to achieve the same level of quality that is possible within a handcrafted animation or speech fragment. For now, the designer of a new application must weigh the tradeoff between flexibility and quality. Further research on computer animation and speech synthesis is likely to decrease the difference in quality between the two approaches, making the generative approach increasingly attractive.

4.2.3 Tools for Creating Behavioral Building Blocks

Most of the projects described in this paper have involved people with graphics and animation expertise to create the behavioral building blocks. For projects without this luxury, tools are rapidly becoming available to allow designers to add animated characters to their learning environment even in the absence of such expertise. For example, Microsoft Agent (http://msdn.microsoft.com/workshop/imedia/agent/default.asp) and Adele's animated persona (http://www.isi.edu/isd/carte/carte-demos.htm) are both available free for download over the World Wide Web. Both provide animated characters with some existing behaviors as well as the ability to create new characters and add new behaviors. Both employ the behavior space approach for animation while using speech synthesizers for voice. In contrast, Jack (http://www.transom.com/), available as a commercial product, supports dynamically generated behavior. The increasing availability
of tools for creating animated characters will greatly simplify the development of animated pedagogical agents.

However, creating the behavioral building blocks for an animated character is only the first challenge in developing an animated pedagogical agent. The next challenge is developing the code that will select and combine the right building blocks to respond appropriately to the dynamically unfolding tutorial situation. We now turn to that issue.

4.3 Behavior Control

Controlling the behavior of an animated pedagogical agent requires attention to many issues. Like any other autonomous agent, the agent must be able to react to a dynamic environment. Additionally, like any intelligent tutoring system or learning companion, the agent must carry on a coherent dialogue with the student, and it must make pedagogical decisions, such as when to intervene and what types of information to provide. On top of these considerations, an animated agent must additionally provide appropriate control of its body, complementing its verbal utterances with appropriate nonverbal behavior. The presence of a body marks a significant shift in the problem of behavior control; while a typical tutoring system's behavior is relatively discrete, providing occasional, atomic interventions, nonverbal behavior necessitates more continuous control. In this section we focus on these additional control problems raised by animated agents, and we survey current approaches.

The key to maintaining coherent behavior in the face of a dynamic environment is to maintain a rich representation of context. The ability to react to unexpected events and handle interruptions is crucial for pedagogical agents, yet it threatens the overall coherence of the agent's behavior. A good representation of context allows the agent to be responsive while maintaining its overall focus. Animated pedagogical agents must maintain at least the following three types of context.

Pedagogical context The pedagogical context includes the instructional goals and a model of the student's knowledge. This area has been studied extensively by past researchers; work in animated pedagogical agents to date has contributed relatively little on this issue.

Task context The task context represents the state of the student's and agent's problem solving. This includes the goals of the task, the current state of the learning environment, and the actions that will be needed to complete the task. For example, Steve and Adele model tasks using a hierarchical partial-order plan representation, which they generate automatically using task decomposition planning (Sacerdoti 1977). As the task proceeds, they continually monitor the state of the virtual world, and they use the task model to maintain a plan for how to complete the task, using a variant of partial-order planning techniques (Weld 1994). Because Herman the Bug provides problem-solving advice for design-centered problem solving, he maintains a task model that includes knowledge about the active design constraints, the subtask the student is currently addressing, and a history of her design decisions. Cosmo maintains analogous knowledge about each of the factors bearing on students' problem-solving decisions in the Internet Protocol Advisor learning environment.

Dialogue context The dialogue context represents the state of the collaborative interaction between the student and the agent. This may include many types of information: a focus stack (Grosz & Sidner 1986) representing the hierarchy of tasks, subtasks,
and actions in which the agent and student are currently engaged; the state of their interaction on the current task step (for instance, the state might be that the agent has explained what must be done next but neither he nor the student has done it); a record of whether the agent or student is currently responsible for completing the task (this task initiative can change during a mixed-initiative interaction); the last answer the agent gave, in case the student asks a follow-up question; and the actions that the agent and student have already taken. While this list is not exhaustive, it captures the most important items used in current animated pedagogical agents. (A minimal sketch of one way such context might be represented follows this list.)
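
The sketch below bundles the three kinds of context into a single structure that a behavior controller could consult; the field names and example values are illustrative assumptions rather than any particular system's representation.

```python
# A minimal sketch of pedagogical, task, and dialogue context in one structure.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TutorialContext:
    # Pedagogical context
    instructional_goals: List[str] = field(default_factory=list)
    student_knows: List[str] = field(default_factory=list)
    # Task context
    remaining_steps: List[str] = field(default_factory=list)   # plan for finishing the task
    world_state: dict = field(default_factory=dict)
    # Dialogue context
    focus_stack: List[str] = field(default_factory=list)       # current task/subtask/action
    step_explained: bool = False
    task_initiative: str = "student"                            # who is responsible right now
    last_answer: Optional[str] = None
    actions_taken: List[str] = field(default_factory=list)

ctx = TutorialContext(
    instructional_goals=["start the compressor"],
    remaining_steps=["open valve", "press start button"],
    focus_stack=["start the compressor", "open valve"],
    task_initiative="agent",
)
print(ctx.focus_stack[-1])   # 'open valve' -- the action currently in focus
```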

Given a rich representation of context, much of an agent's nonverbal behavior can be generated dynamically in response to the current situation. In Steve, nonverbal behavior is generated in several different layers. Some elements are generated as deliberate acts in Steve's cognitive module. This includes such things as looking at someone when waiting for them or listening to them or releasing the conversational turn, nodding the head when Steve is informed of something or when the student takes an appropriate action, and shaking the head when the student makes a mistake. Other actions are generated in his motor control module to accompany motor commands from the cognitive module. For example, Steve looks where he is going, looks at an object immediately before manipulating or pointing at it, looks at someone immediately before speaking to them, and changes facial expression to a "speaking face" (i.e., mouth open and eyebrows slightly raised) when speaking. Finally, low-level behavior that requires a frame-by-frame update is implemented in the virtual reality software. This includes the animation of Steve's locomotion and arm movements, periodic blinking, slight periodic movement of the lips when speaking, and tracking abilities of Steve's gaze.

The approach to behavior generation discussed so far can be viewed as a mapping from a representation of context to the next appropriate behavioral action. The resulting behavior is coherent to the extent that the regularities of human conversation are built into the mapping. This approach is similar to the schemata approach to explanation generation pioneered by McKeown (McKeown 1985). The other common approach to explanation generation is to plan a coherent sequence of utterances by searching through alternative sequences until one is found that satisfies all coherence constraints (Hovy 1993; Moore 1995). This approach has been adapted to the problem of generating the behavior of an animated agent by André et al. (André & Rist 1996; André, Rist, & Müller 1999) and implemented in their PPP Persona.

Their approach is illustrated in Figure 13. The planning process starts with an abstract communicative goal (e.g., provide-information in the figure). The planner's presentation knowledge is in the form of goal decomposition methods called "presentation strategies." In the figure, each nonterminal node in the tree represents a communicative goal, and its children represent one possible presentation strategy for achieving it. For example, the goal provide-information can be achieved by introduce followed by a sequence of elaborate acts. Each presentation strategy captures a rhetorical structure found in human discourse, based largely on Rhetorical Structure Theory (Mann & Thompson 1987), and each has applicability conditions that specify when the strategy may be used and constrain the variables to be instantiated. Given the top-level communicative goal, the presentation planner tries to find a matching presentation strategy, and it posts the inferior acts of this strategy as new subgoals. If a subgoal cannot be achieved, the presentation planner backtracks and tries another strategy. The process is repeated until all leaves of the tree are elementary presentation acts.

[Figure 13: A Presentation Plan — a goal decomposition tree expanding Provide-Information through Introduce, Describe, and Elaborate acts into elementary presentation acts such as S-Speak, S-Point, S-Include-Photo, S-Include-Text, S-Include-Map, and S-Include-Link, with example utterances about a trip to Frankfurt]

(A variant of the PPP Persona called WebPersona allows some other types of leaves as well.) Thus, the leaves of the tree in Figure 13 represent the planned presentation, and the tree represents its rhetorical structure.
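
The sketch below illustrates decomposition-based presentation planning with backtracking in miniature; the strategy library, goal names, and applicability test are illustrative assumptions and greatly simplify the actual system.

```python
# A minimal sketch of strategy-based goal decomposition with backtracking.
# Each goal has alternative decompositions; if a subgoal fails, the planner
# backtracks to the next strategy for that goal.

STRATEGIES = {
    # goal: list of alternative decompositions (tried in order)
    "provide-information": [["introduce", "elaborate"]],
    "introduce": [["s-speak"]],
    "elaborate": [["s-include-photo", "s-speak"], ["s-speak"]],
}

PRIMITIVES = {"s-speak", "s-point", "s-include-photo"}

def plan(goal, available):
    """Return a sequence of elementary acts achieving goal, or None."""
    if goal in PRIMITIVES:
        return [goal] if goal in available else None   # simple applicability check
    for decomposition in STRATEGIES.get(goal, []):
        acts = []
        for subgoal in decomposition:
            sub = plan(subgoal, available)
            if sub is None:
                break                                   # backtrack: try the next strategy
            acts.extend(sub)
        else:
            return acts
    return None

# With no photo available, the planner backtracks to the speech-only strategy.
print(plan("provide-information", {"s-speak", "s-point"}))  # ['s-speak', 's-speak']
```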

This presentation script is forwarded to a Persona Engine, which executes it by dynamically merging it with low-level navigation acts (when the agent has to move to a new position on the screen), idle-time acts (to give the agent lifelike behavior when idle), and reactive behaviors (so that the agent can react to user interactions). The Persona Engine decomposes the persona behaviors at the leaves of the presentation plan into more primitive animation sequences and combines these with unplanned behaviors such as idle-time actions (breathing or tapping a foot) and reactive behaviors (such as hanging suspended when the user picks up and moves the persona with the mouse). When behavior execution begins, the persona follows the schedule in the presentation plan. However, since the Persona Engine may execute additional actions, this in turn may require the schedule to be updated, subject to the constraints of the presentation plan. The result is behavior that is adaptive and interruptible, while maintaining coherence to the extent possible.

One of the most difficult yet important issues in controlling the behavior of an animated agent is the timing of its nonverbal actions and their synchronization with verbal utterances. Relatively small changes in timing or synchronization can significantly change people's interpretation or their judgement of the agent. André et al. (André, Rist, & Müller 1999) address timing issues through explicit temporal reasoning. Each presentation strategy includes a set of temporal constraints over its inferior acts. Constraints may include Allen's qualitative temporal relations (Allen 1983) relating pairs of acts, as well as quantitative inequality constraints on the start and end times of the acts. Any presentation whose temporal constraints become inconsistent during planning is eliminated from further consideration.
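
As a small illustration of constraint-based timing, the sketch below checks a schedule of acts against simple temporal constraints; the acts, bounds, and constraint forms are illustrative assumptions, and a real planner would reason over the full set of Allen's qualitative relations as well as quantitative inequalities.

```python
# A minimal sketch of checking temporal constraints over a planned schedule.

def consistent(acts, constraints):
    """acts: {name: (start, end)}; constraints: list of (a, relation, b)."""
    for a, relation, b in constraints:
        if relation == "before" and not acts[a][1] <= acts[b][0]:
            return False
        if relation == "during" and not (acts[b][0] <= acts[a][0] and acts[a][1] <= acts[b][1]):
            return False
    return True

schedule = {"speak": (0.0, 2.0), "point": (0.5, 1.5), "nod": (2.5, 3.0)}
print(consistent(schedule, [("speak", "before", "nod"), ("point", "during", "speak")]))  # True
print(consistent(schedule, [("nod", "before", "speak")]))                                 # False
```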

26

Page 27: To appear in International Journal of Artificial ...people.ict.usc.edu/~traum/cs599f05/apa.pdf · To appear in International Journal of Artificial Intelligence in Education, 2000.

One important area for further research is the synchronization of nonverbal acts with speech at the level of individual words or syllables. This capability is needed to support many features of human conversation, such as the use of gestures, head nods, and eyebrow movements to highlight emphasized words. Most current animated agents are incapable of such precise timing. One exception is the work of Cassell and her colleagues (Cassell et al. 1994a). However, they achieve their synchronization through a multi-pass algorithm that generates an animation file for two synthetic, conversational agents. Achieving a similar degree of synchronization during a real-time dialogue with a human student is a more challenging problem that will require further research.

4.4 Believability

Because of the immediate and deep affinity that people seem to develop for these interactive lifelike characters, the direct pedagogical benefits that pedagogical agents provide are perhaps exceeded by their motivational benefits. By creating the illusion of life, dynamically animated agents have the potential to significantly increase the time that people seek to spend with educational software, and recent advances in affordable graphics hardware are beginning to make the widespread distribution of real-time animation technology a reality. Endowing animated agents with believable, lifelike qualities has been the subject of much recent research (Bates 1994; Tu & Terzopoulos 1994; Granieri et al. 1995; Blumberg & Galyean 1995; Kurlander & Ling 1995; Maes et al. 1995).

Believability is a product of two forces: (1) the visual qualities of the agent and (2) the computational properties of the behavior control system that creates its behaviors in response to evolving interactions with the user. The behavior canon of the animated film (Noake 1988; Jones 1989; Lenburg 1993) has much to say about aesthetics, movement, and character development, and the pedagogical goals of learning environments impose additional requirements on character behaviors. In particular, techniques for increasing the believability of animated pedagogical agents should satisfy the following criteria:

Situated Liveness Throughout problem-solving sessions, agents should remain "alive" by continuing to exhibit behaviors that indicate their awareness of events playing out in the learning environment, e.g., they can visually track students' activities and provide anticipatory cues (Thomas & Johnston 1981) to signal their upcoming actions.

Controlled Visual Impact Some behaviors such as moving from one location to another have high visual impact, while others, such as small head movements, have low visual impact. In general, the higher the visual impact, the more interesting a behavior will be, but agents must control the visual impact of their behaviors in such a manner that they do not divert students' attention at critical junctures.

Complex Behavior Patterns Because students will interact with animated pedagogical agents over extended periods of time, it is critical that agents' behavior patterns be sufficiently complex that they cannot be quickly induced. Easily recognized behavior patterns significantly reduce believability.

Natural Unobtrusive Behavior It is critical that students' attention not be drawn to agents because they behave unnaturally. For example, a common problem in early implementations of any pedagogical agent is that the designer has neglected to have them assume a reasonable stance or blink. Omissions such as these typically result in surprisingly odd behaviors.

27

Page 28: To appear in International Journal of Artificial ...people.ict.usc.edu/~traum/cs599f05/apa.pdf · To appear in International Journal of Artificial Intelligence in Education, 2000.

Achieving believability in animated pedagogical agents poses three major challenges. First, the primary goal of pedagogical agents is to promote learning, and any agent behaviors that would interfere with students' problem solving (no matter how much these behaviors might contribute to believability) would be inappropriate. For example, if the agent were to cartwheel across the screen when the student was grappling with a difficult problem, the student's concentration would be immediately broken. Second, believability-enhancing behaviors must complement (and somehow be dynamically interleaved with) the advisory and explanatory behaviors that pedagogical agents perform. Third, if observers see that an agent is acting like a simple automaton, believability is either substantially diminished or eliminated altogether.

To achieve believability, agents typically exhibit a variety of believability-enhancing behaviors that are in addition to advisory and "attending" behaviors. For example, the PPP Persona exhibits "idle-time" behaviors such as breathing and foot-tapping to achieve believability. To deal with the concerns of controlled visual impact for sensitive pedagogical situations in which the student must focus his attention on problem solving, a competition-based believability-enhancing technique is used by one version of the Herman agent. At each moment, the strongest eligible behavior is heuristically selected as the winner and is exhibited. The algorithm takes into account the probable visual impact of candidate behaviors so that behaviors inhabiting upper strata of the "impact spectrum" are rewarded when the student is addressing less critical sub-problems.

Throughout learning sessions, the agent attends to students' problem-solving activities. Believability-enhancing behaviors compete with one another for the right to be exhibited. When the agent is not giving advice, he is kept "alive" by a sequencing engine that enables him to perform a large repertoire of contextually appropriate, believability-enhancing behaviors such as visual focusing (e.g., motion-attracted head movements), re-orientation (e.g., standing up, lying down), locomotion (e.g., walking across the scene), body movements (e.g., back scratching, head scratching), restlessness (e.g., toe tapping, body shifting), and prop-based movements (e.g., glasses cleaning). When a student is solving an unimportant sub-problem, Herman is more likely to perform an interesting prop-based behavior such as cleaning his glasses or a locomotive behavior such as jumping across the screen. The net result of the ongoing competition is that the agent behaves in a manner that significantly increases its believability without sacrificing pedagogical effectiveness.
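
The sketch below illustrates such a competition in miniature: candidate behaviors are scored so that high-impact behaviors win only when the current sub-problem is not critical. The candidate behaviors, scores, and criticality scale are illustrative assumptions, not Herman's actual heuristics.

```python
# A minimal sketch of competition-based selection of believability-enhancing
# behaviors, weighted by visual impact and the criticality of the current
# sub-problem.

def select_idle_behavior(candidates, subproblem_criticality):
    """Pick the 'strongest' behavior; high-impact behaviors win only when the
    current sub-problem is not critical (criticality in [0, 1])."""
    def strength(behavior):
        interest = behavior["impact"]                          # more impact, more interesting
        penalty = behavior["impact"] * subproblem_criticality  # ...but distracting when stakes are high
        return interest - 2.0 * penalty
    return max(candidates, key=strength)

behaviors = [
    {"name": "small head movement", "impact": 0.2},
    {"name": "clean glasses",       "impact": 0.6},
    {"name": "jump across screen",  "impact": 0.9},
]
print(select_idle_behavior(behaviors, subproblem_criticality=0.1)["name"])  # 'jump across screen'
print(select_idle_behavior(behaviors, subproblem_criticality=0.9)["name"])  # 'small head movement'
```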

4.5 Emotion

Engaging, lifelike pedagogical agents that are visually expressive could clearly communicate problem-solving advice and simultaneously have a strong motivating effect on learners. If they could draw on a rich repertoire of emotive behaviors to exhibit contextually appropriate facial expressions and expressive gestures, they could exploit the visual channel to advise, encourage, and empathize with learners. However, enabling lifelike pedagogical agents to communicate the affective content of problem-solving advice poses serious challenges. Agents' full-body emotive behaviors must support expressive movements and visually complement the problem-solving advice they deliver. Moreover, these behaviors must be planned and coordinated in real time in response to learners' progress. In short, to create the illusion of life typified by well-crafted animated characters, animated pedagogical agents must be able to communicate through both visual and aural channels.

To be maximally entertaining, animated characters must be able to express many different kinds of emotion. As different social situations arise, they must be able to convey
emotions such as happiness, elation, sadness, fear, envy, shame, and gloating. In a similar fashion, because lifelike pedagogical agents should be able to communicate with a broad range of speech acts, they should be able to visually support these speech acts with an equally broad range of emotive behaviors. However, because their role is primarily to facilitate positive learning experiences, only a critical subset of the full range of emotive expression is essential for pedagogical agents. For example, they should be able to exhibit body language that expresses joy and excitement when learners do well, inquisitiveness for uncertain situations (such as when rhetorical questions are posed), and disappointment when problem-solving progress is less than optimal. The Cosmo agent, for instance, can scratch his head in wonderment when he poses a rhetorical question.

Cosmo illustrates how an animated pedagogical agent using the behavior space approach can employ contextually appropriate emotive behaviors. Cosmo employs an emotive-kinesthetic behavior sequencing framework for dynamically sequencing his full-body emotive expressions. Creating an animated pedagogical agent with this framework consists of three phases, each of which is a special case of the phases in the general behavior space approach described above. First, designers add behavior fragments representing emotive behavior to the behavior space. For example, Cosmo includes emotive behavior fragments for his facial expressions (with eyes, eyebrows, and mouth) and gestures (with arms and hands). Second, these behavior fragments must be indexed by their emotional intent (i.e., which emotion is exhibited) and their kinesthetic expression (i.e., how it is exhibited). Third, the behavior sequencing engine must integrate the emotive behavior fragments into the agent's behavior in appropriate situations. For example, Cosmo's emotive-kinesthetic behavior sequencing engine dynamically plans full-body emotive behaviors in real time by selecting relevant pedagogical speech acts and then assembling appropriate visual behaviors. By associating appropriate emotive behaviors with different pedagogical speech act categories (e.g., empathy when providing negative feedback), it can weave small expressive behaviors into larger visually continuous ones that are then exhibited by the agent in response to learners' problem-solving activities.
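
As a rough illustration, the sketch below pairs pedagogical speech act categories with emotive behavior fragments indexed by emotional intent; the categories, emotion labels, and fragment names are illustrative assumptions rather than Cosmo's actual indices.

```python
# A minimal sketch of pairing pedagogical speech acts with emotive behavior
# fragments indexed by emotional intent and kinesthetic expression.

EMOTION_FOR_ACT = {
    "positive-feedback": "joy",
    "negative-feedback": "empathy",
    "rhetorical-question": "inquisitiveness",
}

EMOTIVE_FRAGMENTS = {
    # emotional intent -> (kinesthetic expression, clip)
    "joy": ("arms-raised", "cheer.anim"),
    "empathy": ("shoulders-dropped", "sympathetic-nod.anim"),
    "inquisitiveness": ("head-scratch", "wonder.anim"),
}

def emotive_overlay(speech_act):
    """Return the emotive fragment to weave into the delivery of a speech act."""
    emotion = EMOTION_FOR_ACT.get(speech_act)
    return EMOTIVE_FRAGMENTS.get(emotion, ("neutral", "idle.anim"))

print(emotive_overlay("rhetorical-question"))   # ('head-scratch', 'wonder.anim')
print(emotive_overlay("negative-feedback"))     # ('shoulders-dropped', 'sympathetic-nod.anim')
```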

Both emotive behavior sequencing and its counterpart, affective student modeling, in which users' emotive state is tracked (Picard 1997), will play important roles in future pedagogical agent research. There is currently considerable research activity on computational models of emotion, and a variety of useful frameworks are now available. Research on applying such models to interactive learning environments, on the other hand, has only begun (Elliott, Rickel, & Lester 1999).

4.6 Platform and Networking Issues

All successful animated pedagogical agent designs must take into account the capabilities of the platform and network that are intended to be used. At the present time, high-fidelity interactive agents with dynamically generated behavior can only run on configurations with high processor speed, powerful graphics acceleration, and low latency. For applications where such power is not guaranteed to be available, compromises must be made. The behavior space approach can be used in place of the dynamic behavior generation approach in order to reduce real-time rendering requirements. Reducing the repertoire of gestures can also reduce processing requirements. For example, the "Verbots" created by Virtual Personalities, Inc. (http://www.vperson.com) have limited gestures other than lip movement; these agents can run on most Pentium personal computers without graphics acceleration.

The problem of integrating pedagogical agents into Web-based learning materials is an interesting case in point. The Web has become the delivery mechanism of choice for on-line courses. At the same time, Web-based instruction can be very impersonal, with limited ability to adapt and respond to the user. An agent that can integrate with Web-based materials is desirable both because it can be applied to a range of course materials and because it can improve the interactivity and responsiveness of such materials.

The most difficult technical problem associated with Web-based agents is reconciling the highly interactive nature of face-to-face interaction with the slow response times of the Web. In typical Web-based courseware delivery systems, the student must choose a response, submit it to a remote server, and wait for the server to send back a new page. Animated pedagogical agents, on the other hand, need to be able to respond to a continuous stream of student actions, watching what the student is doing, nodding in agreement, interrupting if the student is performing an inappropriate action, and responding to student interruptions. It is difficult to achieve such interactivity if every action must be routed through a central HTTP server.

Two Web-based architectures for animated pedagogical agents, the PPP Persona and Adele, both address this problem by moving reactive agent behavior from the server to the client. The PPP Persona compiles the agent behavior into an efficient state machine that is then downloaded to the client for execution. The presentation planning capability, on the other hand, resides on the central server. In the case of Adele, a solution plan for the given case or problem is downloaded, and is executed by a lightweight student monitoring engine. This approach requires a more sophisticated engine to run on the client side, capable of a range of different types of pedagogical interactions. Nevertheless, the engine remains simple enough to execute on a client computer with a reasonable amount of memory and processor speed. Focusing on one case or problem at a time ensures that the knowledge base employed by the agent at any one time remains small.

The latencies involved in Web-based interaction also become significant when one attempts to coordinate the activities of multiple students on different computers. Adele must address this problem when students work together on the same case at the same time. Separate copies of Adele run on each client machine. Student events are shared between Adele engines using Java's RMI protocol. Each Adele persona then reacts to student events as soon as they arrive at each client machine. This gives the impression at each station of rapid response, even if events are not occurring simultaneously at all client computers.

In summary, integration of animated pedagogical agents into Web-based learning materials inevitably entails developing ways of working around the latencies associated with the HTTP and CGI protocols to some extent. Nevertheless, such agents can take advantage of the Web browser environment as appropriate. They point students to relevant Web sites and can respond to browsing actions. Thus, they can be easily integrated into a Web-based curriculum, providing a valuable enhancement.

5 Conclusion

Animated pedagogical agents offer enormous promise for interactive learning environments. Though still in the early stages of development, it is becoming apparent that this new generation of learning technologies will have a significant impact on education and training. By broadening the bandwidth of communication to include many of the modalities of human-human tutoring, pedagogical agents are slowly but surely becoming something
akin to what ITS founders envisioned at the inception of the field. Now, rather than being restricted to textual dialogue on a terminal, pedagogical agents are beginning to perform a variety of tasks in surprisingly lifelike ways. What began as complex but nevertheless small prototype systems has quickly become practical. Some of the systems described here will soon be used in on-line courses; others have been (and continue to be) subject to large-scale empirical studies.

Despite the great strides made in honing the communication skills of animated pedagogical agents, much remains to be done. In many ways, the current state of the art represents the early developmental stages of what promises to be a fundamentally new and interesting species of learning technology. This article has set forth the key functionalities that lifelike agents will need to succeed at face-to-face communication. While the ITS community benefits from the confluence of multidisciplinary research in cognition, learning, pedagogy, and AI, animated pedagogical agents will further require the collaboration of communication theorists, linguists, graphics specialists, and animators. These efforts could well establish a new paradigm in computer-assisted learning, glimpses of which we can already catch on the horizon.

6 Acknowledgments

Support for this work was provided by the Office of Naval Research under grant N00014-95-C-0179 and AASERT grant N00014-97-1-0598; the Air Force Research Laboratory under contract F41624-97-C-5018; a gift from Mitsubishi Electric Research Laboratory; an internal research and development grant from the USC Information Sciences Institute; the National Science Foundation under grants CDA-9720395 (Learning and Intelligent Systems Initiative) and IRI-9701503 (CAREER Award Program); the William S. Kenan Institute for Engineering, Technology and Science; the North Carolina State University IntelliMedia Initiative; an industrial gift from Novell; and equipment donations from Apple and IBM. We are grateful to Brad Mott and the members of CARTE for their valuable comments on an earlier draft.

References

Aïmeur, E.; Dufort, H.; Leibu, D.; and Frasson, C. 1997. Some justifications for the learning by disturbing strategy. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education, 119–126. IOS Press.

Allen, J. 1983. Maintaining knowledge about temporal intervals. Communications of the ACM 26(11):832–843.

André, E., and Rist, T. 1996. Coping with temporal constraints in multimedia presentation planning. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 142–147. Menlo Park, CA: AAAI Press/MIT Press.

André, E.; Rist, T.; and Müller, J. 1999. Employing AI methods to control the behavior of animated interface agents. Applied Artificial Intelligence 13:415–448.

André, E., ed. 1997. Proceedings of the IJCAI Workshop on Animated Interface Agents: Making Them Intelligent.


Badler, N. I.; Phillips, C. B.; and Webber, B. L. 1993. Simulating Humans. New York: Oxford University Press.

Ball, G.; Ling, D.; Kurlander, D.; Miller, J.; Pugh, D.; Skelly, T.; Stankosky, A.; Thiel, D.; van Dantzich, M.; and Wax, T. 1997. Lifelike computer characters: The Persona project at Microsoft. In Bradshaw, J., ed., Software Agents. Menlo Park, CA: AAAI/MIT Press.

Bates, J.; Loyall, A.; and Reilly, W. 1992. Integrating reactivity, goals, and emotion in a broad agent. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 696–701.

Bates, J. 1994. The role of emotion in believable agents. Communications of the ACM 37(7).

Blumberg, B., and Galyean, T. 1995. Multi-level direction of autonomous creatures for real-time virtual environments. In Computer Graphics Proceedings, 47–54.

Burton, R. R., and Brown, J. S. 1982. An investigation of computer coaching for informal learning activities. In Sleeman, D., and Brown, J., eds., Intelligent Tutoring Systems. Academic Press. 79–98.

Carbonell, J. R. 1970. AI in CAI: An artificial-intelligence approach to computer-assisted instruction. IEEE Transactions on Man-Machine Systems 11(4):190–202.

Cassell, J., and Thorisson, K. R. 1999. The power of a nod and a glance: Envelope vs. emotional feedback in animated conversational agents. Applied Artificial Intelligence 13:519–538.

Cassell, J.; Pelachaud, C.; Badler, N.; Steedman, M.; Achorn, B.; Becket, T.; Douville, B.; Prevost, S.; and Stone, M. 1994a. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Proceedings of ACM SIGGRAPH '94.

Cassell, J.; Steedman, M.; Badler, N.; Pelachaud, C.; Stone, M.; Douville, B.; Prevost, S.; and Achorn, B. 1994b. Modeling the interaction between speech and gesture. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum Associates.

Chan, T.-W., and Baskin, A. 1990. Learning companion systems. In Frasson, C., and Gauthier, G., eds., Intelligent Tutoring Systems: At the Crossroads of Artificial Intelligence and Education. Ablex.

Chan, T.-W. 1996. Learning companion systems, social learning systems, and the global social learning club. Journal of Artificial Intelligence in Education 7(2):125–159.

Claassen, W. 1992. Generating referring expressions in a multimodal environment. In Dale, R.; Hovy, E.; Rosner, D.; and Stock, O., eds., Aspects of Automated Natural Language Generation. Berlin: Springer-Verlag. 247–62.

Clancey, W. 1983. The epistemology of a rule-based expert system: A framework for explanation. Artificial Intelligence 3(3):215–251.

Cohn, J. F.; Lien, J. J.-J.; Kanade, T.; Hua, W.; and Zlochower, A. 1998. Beyond prototypic expressions: Discriminating subtle changes in the face. In Proceedings of the IEEE Workshop on Robot and Human Communication (ROMAN '98).

Cormen, T. H.; Leiserson, C. E.; and Rivest, R. L. 1989. Introduction to Algorithms. New York: McGraw-Hill.


Culhane, S. 1988. Animation from Script to Screen. New York: St. Martin's Press.

Deutsch, B. G. 1974. The structure of task oriented dialogs. In Proceedings of the IEEE Speech Symposium. Pittsburgh, PA: Carnegie-Mellon University. Also available as Stanford Research Institute Technical Note 90.

Dillenbourg, P. 1996. Some technical implications of distributed cognition on the design of interactive learning environments. Journal of Artificial Intelligence in Education 7(2):161–179.

Elliott, C., and Brzezinski, J. 1998. Autonomous agents as synthetic characters. AI Magazine 19(2):13–30.

Elliott, C.; Rickel, J.; and Lester, J. 1999. Lifelike pedagogical agents and affective computing: An exploratory synthesis. In Wooldridge, M., and Veloso, M., eds., Artificial Intelligence Today, volume 1600 of Lecture Notes in Computer Science. Springer-Verlag. 195–212.

Elliott, C. 1992. The Affective Reasoner: A Process Model of Emotions in a Multi-agent System. Ph.D. Dissertation, Northwestern University.

Firby, R. J. 1994. Task networks for controlling continuous processes. In Proceedings of the Second International Conference on AI Planning Systems.

Frasson, C.; Mengelle, T.; Aïmeur, E.; and Gouarderes, G. 1996. An actor-based architecture for intelligent tutoring systems. In Proceedings of the Third International Conference on Intelligent Tutoring Systems (ITS '96), number 1086 in Lecture Notes in Computer Science, 57–65. Springer.

Goldstein, I. P. 1976. The computer as coach: An athletic paradigm for intellectual education. Artificial Intelligence Laboratory Memo 389, Massachusetts Institute of Technology, Cambridge, MA.

Granieri, J. P.; Becket, W.; Reich, B. D.; Crabtree, J.; and Badler, N. I. 1995. Behavioral control for real-time simulated human agents. In Proceedings of the 1995 Symposium on Interactive 3D Graphics, 173–180.

Grosz, B. J., and Sidner, C. L. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3):175–204.

Hayes-Roth, B., and Doyle, P. 1998. Animate characters. Autonomous Agents and Multi-Agent Systems 1(2):195–230.

Hietala, P., and Niemirepo, T. 1998. The competence of learning companion agents. International Journal of Artificial Intelligence in Education 9:178–192.

Hollan, J. D.; Hutchins, E. L.; and Weitzman, L. 1984. Steamer: An interactive inspectable simulation-based training system. AI Magazine 5(2):15–27.

Hovy, E. 1993. Automated discourse generation using discourse structure relations. Artificial Intelligence 63:341–385.

Johnson, W. L., and Rickel, J. 1998. Steve: An animated pedagogical agent for procedural training in virtual environments. SIGART Bulletin 8:16–21.

Johnson, W. L.; Rickel, J.; Stiles, R.; and Munro, A. 1998. Integrating pedagogical agents into virtual environments. Presence: Teleoperators and Virtual Environments 7(6):523–546.


Jones, C. 1989. Chuck Amuck: The Life and Times of an Animated Cartoonist. New York: Avon.

Kurlander, D., and Ling, D. T. 1995. Planning-based control of interface animation. In Proceedings of CHI '95, 472–479.

Laird, J. E.; Newell, A.; and Rosenbloom, P. S. 1987. Soar: An architecture for general intelligence. Artificial Intelligence 33(1):1–64.

Laurel, B. 1990. Interface agents: Metaphors with character. In Laurel, B., ed., The Art of Human-Computer Interface Design. New York: Addison-Wesley.

Lenburg, J. 1993. The Great Cartoon Directors. New York: Da Capo Press.

Lester, J. C.; Converse, S. A.; Kahler, S. E.; Barlow, S. T.; Stone, B. A.; and Bhogal, R. S. 1997a. The persona effect: Affective impact of animated pedagogical agents. In Proceedings of CHI '97, 359–366.

Lester, J. C.; Converse, S. A.; Stone, B. A.; Kahler, S. E.; and Barlow, S. T. 1997b. Animated pedagogical agents and problem-solving effectiveness: A large-scale empirical evaluation. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education, 23–30. IOS Press.

Lester, J. C.; Voerman, J. L.; Towns, S. G.; and Callaway, C. B. 1999a. Deictic believability: Coordinating gesture, locomotion, and speech in lifelike pedagogical agents. Applied Artificial Intelligence 13:383–414.

Lester, J. C.; Zettlemoyer, L. S.; Gregoire, J.; and Bares, W. H. 1999b. Explanatory lifelike avatars: Performing user-designed tasks in 3D learning environments. In Proceedings of the Third International Conference on Autonomous Agents.

Lester, J. C.; Stone, B. A.; and Stelling, G. D. 1999. Lifelike pedagogical agents for mixed-initiative problem solving in constructivist learning environments. User Modeling and User-Adapted Interaction 9:1–44.

Maes, P.; Darrell, T.; Blumberg, B.; and Pentland, A. 1995. The ALIVE system: Full-body interaction with autonomous agents. In Proceedings of Computer Animation '95, 11–18. Geneva, Switzerland: IEEE Press.

Maes, P. 1994. Agents that reduce work and information overload. Communications of the ACM 37(7).

Mann, W. C., and Thompson, S. A. 1987. Rhetorical structure theory: A theory of text organization. In Polanyi, L., ed., The Structure of Discourse. Norwood, NJ: Ablex Publishing Corporation. Also available as USC/Information Sciences Institute RS-87-190.

Marsella, S. C., and Johnson, W. L. 1998. An instructor's assistant for team-training in dynamic multi-agent virtual worlds. In Proceedings of the Fourth International Conference on Intelligent Tutoring Systems (ITS '98), number 1452 in Lecture Notes in Computer Science, 464–473. Springer.

McKeown, K. R. 1985. Text Generation. Cambridge University Press.

Mittal, V.; Roth, S.; Moore, J. D.; Mattis, J.; and Carenini, G. 1995. Generating explanatory captions for information graphics. In Proceedings of the International Joint Conference on Artificial Intelligence, 1276–1283.

Monaco, J. 1981. How To Read a Film. New York: Oxford University Press.


Moore, J. D. 1995. Participating in Explanatory Dialogues. Cambridge, MA: MIT Press.

Müller, J. P. 1996. The Design of Intelligent Agents: A Layered Approach. Number 1177 in Lecture Notes in Artificial Intelligence. Springer.

Munro, A.; Johnson, M.; Surmon, D.; and Wogulis, J. 1993. Attribute-centered simulation authoring for instruction. In Proceedings of the World Conference on Artificial Intelligence in Education (AI-ED '93), 82–89. Association for the Advancement of Computing in Education.

Murray, W. R. 1997. Knowledge-based guidance in the CAETI center associate. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education, 331–339. IOS Press.

Nagao, K., and Takeuchi, A. 1994. Social interaction: Multimodal conversation with social agents. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 22–28. Menlo Park, CA: AAAI Press.

Noake, R. 1988. Animation Techniques. London: Chartwell.

Noma, T., and Badler, N. I. 1997. A virtual human presenter. In Proceedings of the IJCAI Workshop on Animated Interface Agents: Making Them Intelligent, 45–51.

Paiva, A., and Machado, I. 1998. Vincent, an autonomous pedagogical agent for on-the-job training. In Proceedings of the Fourth International Conference on Intelligent Tutoring Systems (ITS '98), 584–593. Springer.

Pelachaud, C.; Badler, N. I.; and Steedman, M. 1996. Generating facial expressions for speech. Cognitive Science 20(1).

Picard, R. W. 1997. Affective Computing. MIT Press.

Pierrehumbert, J., and Hirschberg, J. 1990. The meaning of intonational contours in the interpretation of discourse. In Cohen, P.; Morgan, J.; and Pollack, M., eds., Intentions in Communication. MIT Press. Chapter 14, 271–311.

Reeves, B., and Nass, C. 1998. The Media Equation: How People Treat Computers, Television and New Media Like Real People and Places. New York: CSLI.

Rickel, J., and Johnson, W. L. 1997a. Integrating pedagogical capabilities in a virtual environment agent. In Proceedings of the First International Conference on Autonomous Agents. ACM Press.

Rickel, J., and Johnson, W. L. 1997b. Intelligent tutoring in virtual reality: A preliminary report. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education, 294–301. IOS Press.

Rickel, J., and Johnson, W. L. 1999a. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence 13:343–382.

Rickel, J., and Johnson, W. L. 1999b. Virtual humans for team training in virtual reality. In Proceedings of the Ninth International Conference on Artificial Intelligence in Education. IOS Press.

Sacerdoti, E. 1977. A Structure for Plans and Behavior. New York: Elsevier North-Holland.


Shaw, E.; Ganeshan, R.; Johnson, W. L.; and Millar, D. 1999. Building a case for agent-assisted learning as a catalyst for curriculum reform in medical education. In Proceedings of the Ninth International Conference on Artificial Intelligence in Education. IOS Press.

Shaw, E.; Johnson, W. L.; and Ganeshan, R. 1999. Pedagogical agents on the Web. In Proceedings of the Third International Conference on Autonomous Agents.

Sleeman, D., and Brown, J., eds. 1982. Intelligent Tutoring Systems. Academic Press.

Smith, R. W., and Hipp, D. R. 1994. Spoken Natural Language Dialog Systems. Cambridge, Massachusetts: Oxford University Press.

Stevens, A.; Roberts, B.; and Stead, L. 1983. The use of a sophisticated graphics interface in computer-assisted instruction. IEEE Computer Graphics and Applications 3:25–31.

Stiles, R.; McCarthy, L.; and Pontecorvo, M. 1995. Training studio: A virtual environment for training. In Workshop on Simulation and Interaction in Virtual Environments (SIVE-95). Iowa City, IA: ACM Press.

Tambe, M. 1997. Towards flexible teamwork. Journal of Artificial Intelligence Research 7:83–124.

Thomas, F., and Johnston, O. 1981. The Illusion of Life: Disney Animation. New York: Walt Disney Productions.

Thorisson, K. R. 1996. Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Ph.D. Dissertation, Massachusetts Institute of Technology.

Towns, S. G.; Callaway, C. B.; and Lester, J. C. 1998. Generating coordinated natural language and 3D animations for complex spatial explanations. In Proceedings of the Fifteenth National Conference on Artificial Intelligence.

Tu, X., and Terzopoulos, D. 1994. Artificial fishes: Physics, locomotion, perception, and behavior. In Computer Graphics Proceedings, 43–50.

Walker, J. H.; Sproull, L.; and Subramani, R. 1994. Using a human face in an interface. In Proceedings of CHI-94, 85–91.

Weld, D. S. 1994. An introduction to least commitment planning. AI Magazine 15(4):27–61.

Wenger, E. 1987. Artificial Intelligence and Tutoring Systems. Los Altos, CA: Morgan Kaufmann.
