Human Robot Teaming: Approaches from Joint Action and Dynamical Systems

    Tariq Iqbal and Laurel D. Riek

Abstract As robots start to work alongside people, they are expected to coordinate fluently with humans in teams. Many researchers have explored the problems involved in building more interactive and cooperative robots. In this chapter, we discuss recent work and the main application areas in human-robot teaming. We also shed light on some practical challenges to achieving fluent human-robot coordination, and conclude the chapter with future directions for approaching these problems.

Key words: Human-Robot Interaction, Human-Robot Teaming, Joint Action, Dynamical Group Modeling, Coordination

    1 Introduction

As robots are becoming more ubiquitous, they will be expected to interact with people in a range of settings, from dyads to groups. To be effective and functional teammates, robots need the ability to perceive and understand the activities performed by other group members. For example, if a robot can interpret various actions performed by people around it during a social event, then it can make efficient decisions about its own actions. However, it is difficult to automatically perceive and understand all the different tasks people engage in to make effective decisions as a teammate.

If a robot could make better sense of how humans interact among themselves in a group, its interactions with humans would reach a higher level of coordination, resulting in a fluent meshing of actions [66, 28, 14, 16, 30, 58, 26]. When two or more agents work together, Hoffman and Breazeal [16] defined fluency as the quality of achieving a high level of mutual coordination and adaptation.

Tariq Iqbal and Laurel D. Riek
University of California San Diego, La Jolla, CA, USA, e-mail: tiqbal,[email protected]


This quality is particularly important when the agents are well-accustomed to the task and to each other.

This chapter discusses the existing methods and applications of human-robot interaction (HRI) in cooperative tasks. In many of these situations, robots are expected to work with people to achieve a common goal through the process of human-robot joint action. Thus, we start this chapter by giving a brief introduction to joint action, both in the context of human-human and human-robot joint action. We then summarize recent applications of human-robot cooperative interaction from the literature. Finally, we conclude the chapter by briefly presenting the challenges to realizing effective human-robot coordination with respect to hardware, software, and usability.

    2 Background

    2.1 Approaches from Cognitive Science to Model Joint Action

When a person acts alone, their behavior is very different from when they coordinate in a group [33]. When two or more people coordinate in a group, it is important to understand the different ways they can interact among themselves and to generate suitable interactive behaviors [31]. Many researchers from psychology and cognitive science have investigated the underlying mechanisms of joint action tasks, including how people interact together, how they understand the intentions of other individuals, and how they coordinate to perform a joint action. Curioni et al. [8] presented a detailed review of joint action in human teams.

Sebanz et al. defined joint action as a form of social interaction where two or more participants coordinate their actions in space and time while making changes to their environment [71, 34]. Sebanz and Knoblich [70] described three important components of the successful performance of a joint action task. The first involves predicting the intentions of the other interaction partners. The second involves understanding when to perform actions jointly, which is essential for temporal coordination. The last involves understanding where and how to perform the joint action. The authors described these as the "what", "when", and "where" components of joint action.

Vesper et al. [81] suggested an architecture for joint action which focuses on planning, action monitoring, and action prediction processes, as well as on ways of simplifying coordination. The architecture describes the minimal requirements for an individual agent to engage in joint action, and aims to fill the gap between approaches that focus on language and propositional attitudes and dynamical systems approaches.

Many researchers have explored the underlying mechanisms that people may employ to perform a successful joint action task [46]. To perform joint actions successfully in a group, each individual needs to simultaneously integrate their own behavior with predictions about others' behavior [51].


For example, Novembre et al. [51] investigated whether this integration of self- and other-related behavior is underpinned by a neural process associated with motor simulation. They explored this through a music performance experiment. Their results suggested that motor simulation underpins temporal coordination during joint action.

Other researchers took a group-perspective approach to model successful joint action. For example, Valdesolo et al. [79] investigated whether coordinated action in a group has any influence on the ability of the group members to pursue a joint goal together. Their results suggested that rocking in synchrony with others enhanced a person's perceptual sensitivity to the motion of other group members, and that this synchrony in turn increased their success in a joint action task.

Słowiński et al. [75] explored whether coordination between two people performing a joint action task is higher when they exhibit similar motion features. To explore this, they proposed an index of motion variability, called the individual motor signature (IMS), to capture the subtle differences in human movements. They investigated the validity of this index via a mirror game. Their results suggested that when two people shared a similar IMS value, their synchronization level was higher.

    2.2 Dynamical Modeling of Groups

In this subsection, we discuss the contrasting perspective, which is more bottom-up and non-linear, and explores coordination dynamics as a mechanism for realizing joint action. In group interactions, the activities of each member continually influence the activities of other group members. Most groups create a state of interdependence, where each member's outcomes and actions are determined in part by other members of the group [10]. This process of influence can result in coordinated group activity over time.

Many disciplines have approached the problem of how to assess coordination in a system, including robotics, physics, neuroscience, psychology, dance, and music. Many of these techniques take a bottom-up approach, first measuring low-level signals and then inferring high-level behavior from them [30, 23, 43, 24]. These low-level signals can include physical motion features, physiological features (e.g., heart rate), eye gaze behavior, or activity features. High-level behaviors, such as coordination within a group, are then inferred from these low-level signals.

For example, Richardson et al. [60] proposed a method to assess group synchrony by analyzing the phase synchronization of rocking chair movements. A group of six participants rocked in their chairs with their eyes either open or closed, and the authors used a cluster-phase method to quantify phase synchronization. Their results suggested that their group-level synchrony measure could successfully distinguish between synchronous and asynchronous conditions. Similarly, Néda et al. [45] investigated the development of synchronized clapping in a naturalistic environment.


They quantitatively described how asynchronous group applause suddenly transforms into synchronized clapping.
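To make this kind of bottom-up measurement concrete, the following Python sketch estimates a cluster-phase style group synchrony score from several movement time series, in the spirit of the rocking-chair analysis above. The function name, the use of the Hilbert transform for phase extraction, and the simulated signals are illustrative choices on our part rather than the authors' implementation.

import numpy as np
from scipy.signal import hilbert

def cluster_phase_synchrony(signals):
    # signals: N x T array, one movement time series per group member
    phases = np.angle(hilbert(signals, axis=1))                # instantaneous phase of each member
    cluster = np.angle(np.mean(np.exp(1j * phases), axis=0))   # group ("cluster") phase at each time step
    relative = np.exp(1j * (phases - cluster))                 # each member's phase relative to the group
    individual_sync = np.abs(np.mean(relative, axis=1))        # 1.0 = perfectly locked to the group
    return individual_sync, float(np.mean(individual_sync))

# Example: six simulated rockers sharing a tempo but with small random phase offsets
t = np.linspace(0, 30, 3000)
rockers = np.array([np.sin(2 * np.pi * 0.55 * t + 0.2 * np.random.randn()) for _ in range(6)])
print(cluster_phase_synchrony(rockers)[1])   # close to 1 for near-synchronous rocking

A score near 1 indicates that members' movement phases stay locked to the group phase, while a score near 0 indicates independent movement.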

Coordination among explicit and implicit behaviors has also been explored in human-human interaction. Varni et al. [80] presented a system for real-time analysis of nonverbal, affective social interaction in a small group. In their study, several pairs of violin players performed while conveying four different emotions. The authors then used recurrence quantification analysis to measure the synchronization of the performers' affective behavior. In follow-on work, the researchers developed a system capable of analyzing the interaction patterns in a group of dancers.

Konvalinka et al. [36] explored coordination among implicit physiological signals and performed a study to measure the synchronous arousal between performers and observers during a Spanish fire-walking ritual. This synchronous arousal was derived from the heart rate dynamics of the active participants and the audience.

Taking a non-linear, dynamical systems approach, Iqbal and Riek developed a method to measure the degree of synchronous joint action in a group [21, 23, 28, 25, 30]. Their method takes multiple types of task-level events into account while measuring synchronization; in contrast to most other methods in the literature, which consider only a single type of event, it can handle heterogeneous events and can also measure asynchronous situations in a group. The authors validated their method by applying it to both human-human and human-robot teaming scenarios. Their results suggested that the method can successfully measure a degree of coordination in a group that matches the collective perception of group members. Extending this work, the authors designed a new approach to enable robots to perceive human group behavior in real time, anticipate future actions, and synthesize their own motion accordingly (see Fig. 1) [30, 25].
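As a rough illustration of event-based synchrony measurement, the sketch below scores how often two streams of discrete task events (e.g., footsteps or drum hits) co-occur within a small time window. This is a simplified, generic index of our own; the authors' actual method additionally fuses multiple heterogeneous event types at the task level, which is not reproduced here.

import numpy as np

def event_synchronization(events_a, events_b, tau=0.25):
    # events_a, events_b: lists of event times in seconds; tau: coincidence window
    a, b = np.asarray(events_a, dtype=float), np.asarray(events_b, dtype=float)
    if len(a) == 0 or len(b) == 0:
        return 0.0
    coincidences = sum(np.any(np.abs(b - t) <= tau) for t in a)   # events in a matched by b
    coincidences += sum(np.any(np.abs(a - t) <= tau) for t in b)  # events in b matched by a
    return coincidences / (len(a) + len(b))                       # 1.0 when every event has a close partner

# Example: two team members' step events, the second slightly lagged
steps_p1 = [0.0, 1.0, 2.0, 3.0, 4.0]
steps_p2 = [0.1, 1.1, 2.2, 3.1, 4.05]
print(event_synchronization(steps_p1, steps_p2))   # near 1.0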

Lorenz et al. [40] also investigated movement coordination in human-human and human-robot teams. Their study involved human-human and human-robot dyads tapping at two positions on a table at certain times. The authors explored whether goal-directed, but unintentional, coordination of movements occurred during these interactions. Their results suggested that humans synchronized their movements with the movements of the robots.

    3 Recent Applications

As robots increasingly work with people, they need to perform joint actions with people efficiently. To achieve this, many of the aforementioned approaches have been employed in human-robot teams. This section outlines four main application areas where robots cooperatively perform joint action tasks with humans. We summarize the approaches used in these areas in Table 1 at the end of this section.


    Fig. 1 People and robots are engaged in cooperative tasks (from [40, 30])

    3.1 Proximate Human-Robot Teaming

In many interactions, robots and humans need to share a common physical space. Various methods enable robots to work efficiently in close proximity while avoiding collisions, such as models learned from human demonstration and anticipatory action planning [77].

To build policies for robots that share a space with humans, many approaches in the literature first build models from human demonstrations; after training, robots use these models to collaborate with people. For example, Ben Amor et al. [2] collected human motion trajectories as Dynamic Movement Primitives (DMPs) from a human-human task. The authors then used dynamic time warping to estimate the robot's DMP parameters and, using these parameters, modeled human-robot joint physical activities with a new representation called Interaction Primitives (IPs). Their experimental results suggested that a robot successfully completed a joint physical task with a person when IPs were used.
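The temporal alignment step in such demonstration-based approaches can be illustrated with a plain dynamic time warping routine, sketched below in Python. The function and the toy trajectories are assumptions for illustration; the actual Interaction Primitives formulation also maintains probability distributions over DMP parameters, which is beyond this sketch.

import numpy as np

def dtw_cost(query, reference):
    # Classic dynamic time warping between two 1-D trajectories.
    # In an Interaction-Primitive style pipeline, alignment like this maps a
    # partially observed human motion onto a common phase before inferring
    # the robot's movement parameters.
    n, m = len(query), len(reference)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Example: a partially observed reach aligned against a full demonstrated reach
demo = np.sin(np.linspace(0, np.pi, 100))           # full demonstration
partial = np.sin(np.linspace(0, 0.6 * np.pi, 55))   # human mid-way through the motion
print(dtw_cost(partial, demo))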

Nikolaidis et al. [49] proposed a two-phase framework to adapt a robot's collaborative policy to a human collaborator. They first grouped human activities into clusters and then learned a reward function for each cluster using inverse reinforcement learning. This learned model was incorporated into a Mixed Observability Markov Decision Process (MOMDP) policy with the human type as the partially observable variable. The robot then used this model to infer the human type and to generate appropriate policies.
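A minimal sketch of the runtime inference step in such a framework is shown below: the robot keeps a belief over latent human "types" and updates it with Bayes' rule as actions are observed, then conditions its policy on that belief. The discrete action space, the likelihood tables, and the function name are illustrative assumptions rather than the authors' implementation.

import numpy as np

def update_type_belief(belief, observed_action, action_likelihoods):
    # belief: prior probability of each human type
    # action_likelihoods[k][a]: P(action a | human type k), learned offline
    likelihood = np.array([action_likelihoods[k][observed_action] for k in range(len(belief))])
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Example: two human types and three discrete actions
belief = np.array([0.5, 0.5])                         # uniform prior over types
likelihoods = [{0: 0.7, 1: 0.2, 2: 0.1},              # type 0 mostly chooses action 0
               {0: 0.1, 1: 0.2, 2: 0.7}]              # type 1 mostly chooses action 2
belief = update_type_belief(belief, observed_action=0, action_likelihoods=likelihoods)
print(belief)                                         # belief shifts toward type 0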

Many researchers try to achieve successful human-robot collaboration in a shared space by modeling human activities and using that knowledge as an input to a robot's anticipatory action planning mechanism [77]. This approach enables robots to generate movement strategies to collaborate efficiently with people.

For instance, Hoffman and Weinberg [19, 18] developed an autonomous jazz-improvising robot, Shimon, which played the marimba (see Fig. 2). To play in real time with a person, the robot needed an anticipatory action plan. The authors divided the actions into preparation and follow-through steps.


Based on the anticipatory plans, their robot could simultaneously perform and react to shared activities with people.

Koppula et al. [37] also developed a method to anticipate a person's future actions. The anticipated actions were then used to plan appropriate actions for a robot performing collaborative tasks in a shared environment. In their method, they modeled humans through low-level kinematics and high-level intent, as well as contextual information, and modeled the human's and robot's behavior through a Markov Decision Process (MDP). Their results suggested that this approach performed better than various baseline methods for collaborative planning.

Mainprice and Berenson [41] presented a framework to allow a human and a robot to perform a manipulation task together in close proximity. This framework used early prediction of human motion to generate a prediction of human workspace occupancy. A motion planner then generated robot trajectories by minimizing a penetration cost with respect to the predicted occupancy. They validated their framework via simulation of a human-robot collaboration scenario.
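The planning objective can be illustrated with a simple cost function, sketched below, that scores a candidate robot trajectory by the predicted human occupancy it passes through; a planner would prefer trajectories with a low accumulated cost. The grid representation and the function signature are our own simplification of the idea, not the authors' planner.

import numpy as np

def trajectory_penetration_cost(trajectory, occupancy_grid, origin, resolution):
    # trajectory: sequence of (x, y) waypoints; occupancy_grid: predicted probability
    # that the human will occupy each workspace cell.
    cost = 0.0
    for point in trajectory:
        idx = np.floor((np.asarray(point) - origin) / resolution).astype(int)
        if np.all(idx >= 0) and np.all(idx < occupancy_grid.shape):
            cost += occupancy_grid[tuple(idx)]
    return cost

# Example: a 10x10 grid where the human is predicted to occupy the center
grid = np.zeros((10, 10))
grid[4:6, 4:6] = 0.9
safe_path = [(0.5, i * 0.9) for i in range(10)]        # skirts the edge of the workspace
risky_path = [(i * 0.9, i * 0.9) for i in range(10)]   # cuts through the predicted occupancy
print(trajectory_penetration_cost(safe_path, grid, origin=np.zeros(2), resolution=1.0))   # low cost
print(trajectory_penetration_cost(risky_path, grid, origin=np.zeros(2), resolution=1.0))  # higher cost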

Along these lines, Pérez-D'Arpino and Shah [55] proposed a data-driven approach which used human motions to predict the target of a reaching motion. Unhelkar et al. [78] extended this concept to a human-robot co-navigation task. Their model used "human turn signals" during walking as anticipatory indicators of human motion; these indicators were then used to plan motion trajectories for a robot.

    3.2 Human-Robot Handovers

A particular kind of activity often conducted in the proximate human-robot interaction space is a handover, an active application space in robotics research [77]. Most of the work on handovers focuses on designing algorithms for robots to successfully hand objects to people, as well as to receive objects from them. Researchers working in this area use many methods to achieve their goals, including nonverbal signal analysis, human-human handover models, and legible trajectory analysis.

Many researchers have used people's non-verbal signals, including eye gaze, body pose, and head orientation, to facilitate fluent object handover during human-robot interaction [77]. For example, Shi et al. [74] focused on building a model for a robot to hand over leaflets in a public space, looking specifically at the relationship between gaze, arm extension, and approach. They used a pedestrian detector in their implementation on a small humanoid robot. Their results showed that pedestrians accepted more leaflets from the robot when their approach was employed than with another state-of-the-art approach.

Similarly, Grigore et al. [11] demonstrated that integrating an understanding of joint action into human-robot interaction can significantly improve the success rate of robot-to-human handover tasks. The authors introduced a higher-level cognitive layer which models human behavior in a handover situation.


They particularly focused on the inclusion of eye gaze and head orientation into the robot's decision making.

Other researchers have investigated human-human handover scenarios for inspiration in building models for human-robot handovers [77]. Along this line of research, Huang et al. [20] analyzed data from human dyads performing a common household handover task, unloading a dish rack. They identified two coordination strategies that enabled givers to adapt to receivers' task demands, namely proactive and reactive methods, and implemented these strategies on a robot to perform the same task in a human-robot team. Their results suggested that neither the proactive nor the reactive strategy could achieve both better team performance and better user experience. To address this challenge, they developed an adaptive method that achieved a better user experience with improved team performance compared to the other methods.

To improve the fluency of a robot's actions during a handover task, Cakmak et al. [5] found that a robot's failure to convey its intention to hand over an object causes delays during the handover process. To address this challenge and to achieve fluency, the authors tested two separate approaches on a robot: performing distinct handover poses, and performing unambiguous transitions between poses during the handover task. They performed an experiment where a robot used these two approaches while handing over an object to a person. Their findings suggested that unambiguous transitions between poses reduced human waiting time, resulting in a smoother object handover, whereas distinct handover poses had no such effect.

Other researchers perform trajectory analysis to achieve smooth handover of objects. For example, Strabala et al. [76] proposed a coordination structure for human-robot handovers based on human-human handovers. The authors first studied how people perform handovers with their partners, and from this study characterized how people approach, move their hands, and transfer objects. Taking inspiration from this structure, the authors then developed a similar handover structure for human-robot handovers, which concerned the what, when, and where aspects of a handover. They experimentally validated this design structure.

    3.3 Fluent Human-Robot Teaming

Many researchers in the robotics community try to build fluent human-robot teams. To achieve this goal, many approaches have been taken, including insights from human-human teams, cognitive modeling for robots, understanding the coordination dynamics of teams, and adaptive future prediction methods [77].

To achieve fluency in human-robot teams, many researchers have investigated how people achieve fluent interaction in human-only teams. This knowledge is then used to develop strategies that allow robots to interact fluently with people.


    Fig. 2 A live performance of a robotic marimba player (from [18])

Taking insights from human-human teaming, Shah et al. [72, 73] developed a robot plan execution system, called Chaski, for use in human-robot teams. The system enables a robot to collaboratively execute a shared plan with a person; it can schedule the robot's actions and adapt to the human teammate to minimize the human's idle time. Through a human-robot teaming experiment, the authors validated that Chaski can reduce a person's idle time by 85%.
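The intuition behind minimizing a teammate's idle time can be illustrated with a tiny scheduling heuristic, sketched below: the robot serves the part the human will need soonest first. This is only an earliest-needed-first illustration under strong simplifying assumptions; Chaski itself reasons over full shared plans with temporal constraints.

def choose_next_robot_task(pending_tasks, needed_at):
    # pending_tasks: names of tasks the robot could do next
    # needed_at: predicted time at which the human will need each task's result
    return min(pending_tasks, key=lambda task: needed_at[task])

# Example: the human will need the 'bracket' at t=4 s and the 'panel' at t=9 s
print(choose_next_robot_task(["panel", "bracket"], {"bracket": 4.0, "panel": 9.0}))   # -> 'bracket'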

To build cognitive models for robots, researchers build on many other fields, including cognitive science, neuroscience, and psychology. For example, Hoffman and Breazeal [13] addressed the issue of planning and execution through a framework for collaborative activity in human-robot groups, building on various notions from the cognitive science and psychology literature. They presented a hierarchical, goal-oriented task execution system which integrated human verbal and nonverbal actions, as well as robot nonverbal actions, to support the shared activity requirements.

Iqbal, Rack, and Riek developed two anticipation algorithms for robots to coordinate their movements with people in teams by taking team coordination dynamics into account [30, 58]. One of the anticipation algorithms (SIA) relied on high-level group behavior understanding, whereas the other method (ECA) did not. The results indicated that the robot was more synchronous with the team and exhibited more contingent and fluent motion when the SIA method was used than when the ECA method was used. These findings suggested that the robot performed better when it had an understanding of high-level group behavior than when it did not.

Additionally, Iqbal and Riek [25] investigated how the presence of robots affects group coordination when both their behavior and their number (single robot or multi-robot) vary. Their results indicated that group coordination is significantly affected when a robot joins a human-only group, and is further affected when a second robot joins the group and behaves differently from the first.


These results indicated that heterogeneous robot behavior in a multi-human multi-robot group can play a major role in how the group's coordination dynamics stabilize.

Drawing inspiration from the neuroscience and cognitive science literature, Iqbal et al. [29] developed algorithms for robots which leverage a human-like understanding of temporal changes during the coordination process, with a particular eye toward understanding rhythmic tempo change. In their work, a robot employed two separate processes while coordinating with people: a temporal adaptation process and a temporal anticipation process. The robot used the temporal adaptation process to compensate for temporal errors that occurred while coordinating with people, and used the anticipation process to predict the timing of its next action so that it coincided with the timing of the next external rhythmic signal. They applied these processes to a robot drumming synchronously with a group of people (see Fig. 3).
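A minimal sketch of how adaptation and anticipation can be combined for rhythmic timing is shown below. The linear correction gains (alpha, beta) and the update form are assumptions for illustration; they convey the two processes described above rather than the authors' exact algorithm.

def next_onset(last_robot_onset, last_human_onset, period_estimate, alpha=0.5, beta=0.3):
    # Temporal adaptation: nudge the robot by a fraction of the last asynchrony.
    # Temporal anticipation: update the period estimate so the next onset is
    # scheduled where the next external beat is expected.
    asynchrony = last_robot_onset - last_human_onset           # > 0 means the robot was late
    corrected_period = period_estimate - beta * asynchrony
    onset = last_robot_onset + corrected_period - alpha * asynchrony
    return onset, corrected_period

# Example: the robot drummed 60 ms late (t = 2.06 s vs. a human beat at 2.00 s) with a ~0.5 s period
onset, period = next_onset(2.06, 2.00, 0.5)
print(round(onset, 3), round(period, 3))   # the next hit is scheduled slightly earlier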

Building adaptive models based on predictions of future actions is another approach to achieving fluent human-robot collaboration. Hoffman and Breazeal [16] developed a cognitive architecture for robots, taking inspiration from neuropsychological principles of anticipation and perceptual simulation. In this architecture, fluency in joint action is achieved through two processes: 1) anticipation based on a model of repetitive past events, and 2) the modeling of the resulting anticipatory expectation as perceptual simulation. They implemented this architecture on a non-anthropomorphic robotic lamp, which performed a human-robot collaborative task. Their results suggested that the sense of team fluency and the robot's contribution to the fluency significantly increased when the robotic lamp used their architecture.

In other work, Hoffman and Breazeal [15] proposed an adaptive action selection mechanism for a robot in the context of human-robot joint action. This model made anticipatory decisions based on the confidence in their validity and their relative risk. They validated their model through a study involving human subjects working with a simulated robot, using two versions of robot behavior: one fully reactive, and one using their proposed anticipation model. Their results suggested a significant improvement in best-case task efficiency, and a significant difference in the perceived commitment of the robot to the team and its contribution to the team's fluency and success.
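The confidence/risk trade-off behind such anticipatory action selection can be illustrated with a one-line expected-gain rule, sketched below; the specific benefit and penalty terms are illustrative assumptions, not the authors' cost model.

def should_act_anticipatorily(confidence, benefit, penalty):
    # Act early only when the expected gain from being right outweighs
    # the expected cost of being wrong.
    expected_gain = confidence * benefit - (1.0 - confidence) * penalty
    return expected_gain > 0.0

# Example: 80% confident, saving 2 s if right but losing 5 s of recovery if wrong
print(should_act_anticipatorily(confidence=0.8, benefit=2.0, penalty=5.0))   # True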

    3.4 Robot as a Partner

There are still many open questions regarding the social interaction capabilities that a robot should have before it can fluently and naturally interact with people as a partner. Many researchers have tried to tackle these open questions by building models for robots to understand social situations and to act appropriately as a partner in them [50].

For example, Leite et al. [38] conducted an ethnographic study to investigate how a robot's capability of recognizing and responding empathically can influence an interaction.


    Fig. 3 A human-robot drumming team (from Iqbal et al. [29])

The authors performed the study in an elementary school where children interacted with a social robot. The robot had the capability of recognizing and responding empathically to some of the children's affective states. The results suggested that the robot's empathic behavior had a positive effect on how children perceived the robot.

Many researchers have also explored how a robot's explicit behavior can influence its interaction with people [39, 69]. For example, Riek et al. [65, 64] investigated how imitation by a robot affects human-robot teaming. They designed a study where a robot performed three head gesture conditions while interacting with a person: full head gesture mimicking, partial mimicking, and no mimicking. The authors found that in many cases people nodded back in response to the robot's nodding during interactions. They suggested incorporating more gestures, along with head nods, when studying affective human-robot teaming.

In another study, Riek et al. [62] explored the effect of cooperative gestures performed by a humanoid robot in a teaming scenario. The authors performed an experiment where they manipulated the gesture type, gesture style, and gesture orientation performed by the robot while interacting with people. Their results suggested that people cooperated more quickly when the robot performed abrupt ("robot-like") gestures and when it performed front-oriented gestures. Moreover, the speed of people's ability to decode robot gestures was strongly correlated with their ability to decode human gestures.

In HRI, eye gaze can provide important non-verbal information [77]. For example, Moon et al. [42] performed an experiment where a robot exhibited human-like gaze behavior during a handover task.


In their experiment, a PR2 robot performed three different gaze behaviors while handing over a water bottle to a person. The results indicated that the timing of the handover and the perceived quality of the handover event improved when the robot showed human-like gaze behavior.

Admoni et al. [1] explored whether a deviation from a robot's standard behavior can influence the interaction. The authors claimed that people oftentimes overlooked a robot's standard non-verbal signals (e.g., eye gaze) if they were not related to the primary task. In their experiment, the authors manipulated the handover behavior of a robot to deviate slightly from the standard expected behavior. The results suggested that a simple manipulation of a robot's standard handover timing made people more aware of the robot's other nonverbal behaviors, such as eye gaze.

Another well-investigated approach in the field is to teach a robot appropriate behaviors through demonstration, i.e., learning from demonstration (LfD) [3]. For instance, Niekum et al. [47] developed a method to discover semantically grounded primitives in a demonstrated task and, from these primitives, build a finite-state representation of the task. The authors used a Beta Process Autoregressive Hidden Markov Model to automatically segment demonstrations into motion categories, which were then grounded as states in a finite automaton. The model was trained on a robot from many demonstrated examples.

Hayes et al. [12] looked at mutual feedback as an implicit learning mechanism during an LfD scenario. The authors explored grounding sequences as a feedback channel for mutual understanding. In their study, both a person and a robot provided nonverbal feedback to communicate their mutual understanding. The results showed that people provided implicit positive and negative feedback to the robot during the interaction, such as by smiling or by averting their gaze from the robot. The results of this work can help build adaptable robot policies in the future.

Brys et al. [4] explored how to merge reinforcement learning (RL) and LfD to achieve better and faster learning. One key limitation of reinforcement learning is that it often requires a huge amount of training data to achieve a desirable level of performance; for an LfD approach, there is no guarantee about the quality of the demonstrations, which can contain many errors. Brys et al. investigated the intersection of these two approaches and sped up the learning phase of RL methods using an approach called reward shaping.
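One common way to realize this combination is potential-based reward shaping, where a potential function derived from the demonstrations adds a bonus for moving toward demonstrated states without changing the optimal policy. The sketch below illustrates that mechanism; the distance-based potential is an assumed example, not necessarily the shaping function used by Brys et al.

import numpy as np

def shaping_reward(state, next_state, potential, gamma=0.99):
    # Potential-based shaping term F = gamma * phi(s') - phi(s),
    # added to the environment reward at every step.
    return gamma * potential(next_state) - potential(state)

# Example: potential from similarity to a demonstrated state sequence
demo_states = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])

def demo_potential(state):
    dists = np.linalg.norm(demo_states - np.asarray(state), axis=1)
    return float(np.exp(-dists.min()))        # high when near a demonstrated state

r_env = 0.0                                   # sparse task reward
r_total = r_env + shaping_reward([0.1, 0.1], [0.45, 0.5], demo_potential)
print(round(r_total, 3))                      # positive: the step moved toward the demonstration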

    4 Challenges

When a robot leaves controlled spaces and begins to work alongside people, many things taken for granted in terms of perception and action do not apply, because people act unpredictably and little can be known about human environments in advance [63, 48, 52]. These challenges include difficulties in human action detection, understanding of team dynamics, limitations in robot hardware and software design, and egocentric perception.


    Table 1 Application areas of human-robot collaboration

Application Areas and Approaches

Proximate Human-Robot Teaming: Models from human demonstration ([2], [49]); Anticipatory action planning ([19], [18], [37], [41], [55])

Human-Robot Handovers: Non-verbal signal analysis ([74], [11]); Modeling based on human-human handovers ([20], [5]); Trajectory analysis ([76])

Fluent Human-Robot Teaming: Insights from human-human teams ([72], [73]); Cognitive modeling ([13], [30], [25], [29], [58]); Predicting actions ([16], [15])

Robot as a Partner: Explicit behavior analysis ([38], [65], [64], [62]); Eye gaze analysis ([42], [1]); Learning from demonstration ([47], [12], [4])

This section introduces some of the challenges that researchers face while incorporating robots into human environments to coordinate with people, and briefly discusses some solutions to these problems.

    4.1 Uncertainty in Human Action Detection

One of the main challenges to detecting human actions is the unpredictability of human behavior. Sometimes it can be difficult for a robot to perceive and understand the different types of events involved in these activities and to make effective decisions, due to sensor occlusion, sensor fusion error, unanticipated motion, narrow field of view, cluttered backgrounds, etc. [66, 7, 6, 59].

One approach to addressing the challenge of human action detection is to use classification algorithms to detect actions from video data. However, this approach has major challenges, including intra- versus inter-class variations between action classes, environment and recording settings, temporal variations of actions, and obtaining and labeling training data [56]. Moreover, using a classifier for action detection has several computational bottlenecks, including generalizability, abnormality detection, and classifier training [59].

Most of the approaches available in the literature cannot handle these challenges. Moreover, in most action recognition work, researchers assume that camera positions are static; this is not the case for mobile robots [6].

Ryoo and Matthies tried to address the challenge of action detection from a first-person point of view [68]. In their work, the authors attempted to detect seven classes of commonly observed activities during human-human interaction from a first-person perspective.


Ryoo et al. [67] further extended this approach to detect human activities early from a robot. Using their method, a robot can detect human activities early, in real time, in real-world environments. However, these methods still do not address other practical challenges, such as occlusion.

    4.2 Unpredictable Changes in Team Dynamics

If a robot has some ability to model team dynamics, it can anticipate future actions in a team and adapt to those actions to be an effective teammate. However, understanding team dynamics is not trivial. If a robot has an understanding of its environment, then its interactions within the team might reach a higher level of coordination.

In many human-human team situations, team members are explicitly assigned various roles [53]. In other human-human teams, roles emerge over time across the team members to achieve a common goal [35]. Oftentimes these roles change dynamically based on necessity. For example, a person who begins to lead a team in moving a table may follow another teammate's lead later during the moving process. How people coordinate and cooperate among themselves in these situations is an important indicator that robots can use to understand the various roles in a group.

In human-robot interaction scenarios, various role distribution models are used. High-level role distribution models in the HRI paradigm include master-slave, supervisor-subordinate, partner-partner, teacher-learner, and leader-follower [53, 22, 27, 61]. However, these well-defined role distributions are rarely seen in real-world situations, and distributed roles change dynamically in many situations. Therefore, if roles are not predefined for an interaction, the robot needs to make predictions about the roles of co-present people in order to infer its own role in the group.

Understanding the roles of other people in a group is not easy for a robot. Thus, most human-robot teams are designed using some prior distribution of roles to achieve their goals. However, a dynamic understanding of role distributions in a human-robot team can enable a robot to understand team dynamics more appropriately, which can lead to fluent interaction in the group.

    4.3 Limited Behavioral Versatility on Robots

Another challenge of incorporating robots in human teams is robots' lack of behavioral versatility. Most robots are designed to perform a specific task; therefore, most of the time they are limited in their behavioral abilities because they are restricted by their physical capabilities. For example, some robots are designed to perform manipulation tasks, some are good at recognizing and tracking people, and some are good at mobility.


However, a robot often needs to exercise more than one of these abilities simultaneously to interact fluently with, and establish trust with, people. For example, to socially interact with people, a robot needs to be able to identify them, approach them while avoiding obstacles, understand verbal and non-verbal messages, communicate verbally and non-verbally, and work alongside them. Thus, researchers need robots with versatile behaviors and abilities to build more efficient and functional human-robot teams.

Anthropomorphic robots are widely used in social environments to interact with people. These robots can engage people in social interaction by perceiving various social cues from verbal and nonverbal channels, and by communicating with people verbally and non-verbally. However, these types of robots are often not designed with capabilities to perform other tasks, such as mobility and manipulation. Kismet [32] was one of the first anthropomorphic robots with an expressive face used to interact with people in social environments using gaze, facial expression, body posture, and vocal babbling. However, due to a lack of other physical parts, such as hands, this social robot cannot perform hand gestures to interact with people fluently.

The Nao robot is a widely used humanoid robot for research [44]; it can walk, show expressive gestures, and verbally communicate with people. Because of its expressive body gestures and verbal communication capabilities, it has become a popular platform that enables researchers to design a wide variety of interactions with people. However, it lacks facial expressions and is incapable of performing manipulation tasks, which limits its utility.

There are also non-anthropomorphic robots that interact and collaborate with people. These robots can show various verbal and nonverbal responses, and can also generate animated gestures while collaborating with people. For example, Hoffman and Ju [17] designed a non-humanoid robot with expressive movement in mind. This robot can perform human-like gestures, such as a head nod to express agreement and a head shake to express disagreement. However, it can express only selected gestures, and cannot express many of the gestures that are possible with an expressive face.

At the other end of the spectrum, there are many robots strictly designed to perform manipulation tasks, e.g., the Fetch and Freight robots by Fetch Robotics [9]. Their arms are capable of performing dexterous manipulation tasks. However, these robots are not particularly functional in social situations, as oftentimes they are not safe around people and cannot easily generate expressive behaviors.

The PR2 robot by Willow Garage [57] is another widely used robotic platform, particularly for manipulation and handover research. This robot has two manipulation arms with grippers and can perform many dexterous tasks, which has made it a widely used manipulator in the research community. However, it lacks the capability to perform expressive behaviors towards people, and is not well suited to human social environments.


    4.4 Lack of Infrastructure to Support Replicability

Because of the wide range of platforms used on various robots, it is very challenging for researchers to replicate studies across different robots. This limitation prevents human-robot collaboration researchers from exploring the effects of using various kinds of robots in similar situations.

These difficulties include changes in sensor modalities across various platforms, variation in on-board processing units, and variation in physical structure. For example, if a robot has a high-definition RGB-D camera and an onboard graphics processing unit, then it can detect facial expressions precisely. On the other hand, if another robot has only a low-definition RGB camera with no onboard processing unit, then the same algorithms will not perform consistently.

The Robot Operating System (ROS) is a commonly used platform in the academic community [54]. However, as it is open-source software, there are challenges in using it due to limited software support and maintenance.

Moreover, similar algorithms need to be implemented on different platforms, as not all robots use a unified platform. This requires researchers to reimplement pre-existing algorithms to accommodate different platforms, which oftentimes delays progress. Having common infrastructure would greatly help the research community achieve replicability and explore new robotic behaviors for coordinating with people.

    5 Discussion

In this chapter, we discussed some exciting recent work on human-robot coordination. We briefly described recent approaches from the literature to model human-human and human-robot joint action. These approaches include neural process modeling, group-perspective approaches, bottom-up approaches, nonlinear dynamical systems approaches, and analysis of implicit and explicit physiological signals.

We also discussed four main application areas in the human-robot cooperation domain, namely human-robot handovers, interaction in close physical proximity, fluent human-robot teaming, and the robot as a partner. Many approaches have been taken to incorporate robots in these application domains, including dynamic trajectory analysis, anticipatory action planning, cognitive modeling, explicit and implicit behavior analysis, affective behavior analysis, and learning from demonstration (LfD).

Although there exist many applications of human-robot coordination, there also exist many practical issues that must be addressed to achieve a higher level of fluency in interaction. These practical issues include a lack of work on detecting and recognizing co-present human actions and on understanding team dynamics, limitations in robot design, and a lack of infrastructure to support replicability. Computational fields, like computer vision and machine learning, are trying to address specific robotic problems related to real-world scenarios, such as using egocentric vision, computationally inexpensive object proposal algorithms, and so on [7, 6].


Along with improvements in these technologies and existing algorithms, social robots will be able to cooperate better with co-present people in human social environments in the future.

    References

[1] Admoni H, Dragan A, Srinivasa S, Scassellati B (2014) Deliberate Delays During Robot-to-Human Handovers Improve Compliance With Gaze Communication. Int Conf Human-Robot Interact

[2] Amor HB, Neumann G, Kamthe S, Kroemer O, Peters J (2014) Interaction primitives for human-robot cooperation tasks. In: IEEE Conf. on Robotics and Automation

[3] Argall B, Chernova S, Veloso M (2009) A Survey of Robots Learning from Demonstration. Robotics and Autonomous Systems

[4] Brys T, Harutyunyan A, Brussel VU, Taylor ME (2015) Reinforcement Learning from Demonstration through Shaping. Proc Twenty-Fourth Int Jt Conf Artif Intell

[5] Cakmak M, Srinivasa SS, Lee MK, Kiesler S, Forlizzi J (2011) Using spatial and temporal contrast for fluent robot-human hand-overs. In: Proc. of ACM/IEEE HRI

[6] Chan D, Riek LD (2017) Object proposal algorithms in the wild: Are they generalizable to robot perception? In: Review

[7] Chan D, Taylor A, Riek LD (2017) Faster robot perception using salient depth perception. In: IROS

[8] Curioni A, Knoblich G, Sebanz N (2017) Joint Action in Humans - A Model for Human-Robot Interactions? Section: Human-Humanoid Interaction, Humanoid Robotics: a Reference

[9] Fetch Robot (2017) https://www.fetchrobotics.com

[10] Forsyth DR (2009) Group dynamics, 4th edn. T. Wadsworth

[11] Grigore EC, Eder K, Pipe AG, Melhuish C, Leonards U (2013) Joint action understanding improves robot-to-human object handover. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

[12] Hayes CJ, Moosaei M, Riek LD (2016) Exploring implicit human responses to robot mistakes in a learning from demonstration task. In: Robot and Human Interactive Communication (RO-MAN)

[13] Hoffman G, Breazeal C (2007) Collaboration in human-robot teams. In: AIAA Intelligent Systems Technical Conference

[14] Hoffman G, Breazeal C (2007) Cost-based anticipatory action selection for human-robot fluency. IEEE Transactions on Robotics

[15] Hoffman G, Breazeal C (2007) Effects of anticipatory action on human-robot teamwork efficiency, fluency, and perception of team. Human-Robot Interact

[16] Hoffman G, Breazeal C (2008) Anticipatory Perceptual Simulation for Human-Robot Joint Practice: Theory and Application Study. AAAI


[17] Hoffman G, Ju W (2014) Designing robots with movement in mind. Journal of Human-Robot Interaction

[18] Hoffman G, Weinberg G (2010) Synchronization in human-robot Musicianship. Int Symp Robot Hum Interact Commun

[19] Hoffman G, Weinberg G (2011) Interactive improvisation with a robotic marimba player. Auton Robots

[20] Huang CM, Cakmak M, Mutlu B (2015) Adaptive Coordination Strategies for Human-Robot Handovers. In: Robot. Sci. Syst.

[21] Iqbal T, Riek L (2014) Assessing group synchrony during a rhythmic social activity: A systemic approach. In: Proc. of the conference of the International Society for Gesture Studies (ISGS)

[22] Iqbal T, Riek LD (2014) Role distribution in synchronous human-robot joint action. In: Proc. of IEEE RO-MAN, Towards a Framework for Joint Action

[23] Iqbal T, Riek LD (2015) A Method for Automatic Detection of Psychomotor Entrainment. IEEE Transactions on Affective Computing

[24] Iqbal T, Riek LD (2015) Detecting and Synthesizing Synchronous Joint Action in Human-Robot Teams. In: International Conference on Multimodal Interaction

[25] Iqbal T, Riek LD (2017) Coordination dynamics in multi-human multi-robot teams. IEEE Robotics and Automation Letters (RA-L)

[26] Iqbal T, Gonzales MJ, Riek LD (2014) A Model for Time-Synchronized Sensing and Motion to Support Human-Robot Fluency. In: ACM/IEEE Human-Robot Interaction, Workshop on Timing in HRI

[27] Iqbal T, Gonzales MJ, Riek LD (2014) Mobile robots and marching humans: Measuring synchronous joint action while in motion. In: AAAI Fall Symp. on AI-HRI

[28] Iqbal T, Gonzales MJ, Riek LD (2015) Joint action perception to enable fluent human-robot teamwork. In: Proc. of IEEE Robot and Human Interactive Communication

[29] Iqbal T, Moosaei M, Riek LD (2016) Tempo Adaptation and Anticipation Methods for Human-Robot Teams. In: Robotics: Science and Systems, Planning for HRI: Shared Autonomy and Collab. Robotics Workshop

[30] Iqbal T, Rack S, Riek LD (2016) Movement coordination in human-robot teams: A dynamical systems approach. IEEE Transactions on Robotics 32(4):909-919

[31] Jarrassé N, Charalambous T, Burdet E (2012) A framework to describe, analyze and generate interactive motor behaviors. PLoS One

    [32] Kismet Robot (2017) https://www.ai.mit.edu/projects/humanoid-robotics-group/kismet/kismet.html

[33] Knoblich G, Jordan JS (2003) Action coordination in groups and individuals: learning anticipatory control. J Exp Psychol Learn Mem Cogn

[34] Knoblich G, Butterfill S, Sebanz N (2011) Psychological Research on Joint Action: Theory and Data. In: Res. Theory


[35] Konvalinka I, Vuust P, Roepstorff A, Frith CD (2010) Follow you, follow me: continuous mutual prediction and adaptation in joint tapping. J of Experimental Psychology

[36] Konvalinka I, Xygalatas D, Bulbulia J, Schjødt U, Jegindø EM, Wallot S, Van Orden G, Roepstorff A (2011) Synchronized arousal between performers and related spectators in a fire-walking ritual. P Natl Acad Sci USA

[37] Koppula HS, Jain A, Saxena A (2016) Anticipatory planning for human-robot teams. Springer Tracts Adv Robot

[38] Leite I, Castellano G, Pereira A, Martinho C, Paiva A (2012) Modelling empathic behaviour in a robotic game companion for children: an ethnographic study in real-world settings. ACM/IEEE Int Conf Human-Robot Interact

[39] Lohan KS, Lehmann H, Dondrup C, Broz F, Kose H (2017) Enriching the human-robot interaction loop with natural, semantic and symbolic gestures. Section: Human-Humanoid Interaction, Humanoid Robotics: a Reference

[40] Lorenz T, Mortl A, Vlaskamp B, Schubo A, Hirche S (2011) Synchronization in a goal-directed task: human movement coordination with each other and robotic partners. Proc IEEE RO-MAN

[41] Mainprice J, Berenson D (2013) Human-robot collaborative manipulation planning using early prediction of human motion. IEEE/RSJ Int Conf Intell Robot Syst

[42] Moon A, Troniak DM, Gleeson B, Pan MKXJ, Zheng M, Blumer BA, MacLean K, Croft EA (2014) Meet Me Where I'm Gazing: How Shared Attention Gaze Affects Human-robot Handover Timing. In: ACM/IEEE Int. Conf. Human-robot Interact.

[43] Mörtl A, Lorenz T, Hirche S (2014) Rhythm patterns interaction - synchronization behavior for human-robot joint action. PLoS ONE

[44] Nao Robot (2017) https://www.ald.softbankrobotics.com/en/cool-robots/nao

[45] Néda Z, Ravasz E, Brechet Y, Vicsek T, Barabási AL (2000) Self-organizing processes: The sound of many hands clapping. Nature

[46] Newman-Norlund RD, Noordzij ML, Meulenbroek RGJ, Bekkering H (2007) Exploring the brain basis of joint action: co-ordination of actions, goals and intentions. Soc Neurosci

[47] Niekum S, Chitta S (2013) Incremental Semantically Grounded Learning from Demonstration. Robot Sci Syst IX

[48] Nigam A, Riek LD (2015) Social context perception for mobile robots. In: IEEE/RSJ Intelligent Robots and Systems (IROS)

[49] Nikolaidis S, Gu K, Ramakrishnan R, Shah J (2014) Efficient Model Learning for Human-Robot Collaborative Tasks. pp 1-9, DOI 10.1145/2696454.2696455

[50] Nomura T (2017) Empathy as signaling feedback between (humanoid) robots and humans. Section: Human-Humanoid Interaction, Humanoid Robotics: a Reference

[51] Novembre G, Ticini LF, Schütz-Bosbach S, Keller PE (2014) Motor simulation and the coordination of self and other in real-time joint action. Soc Cogn Affect Neurosci


[52] O'Connor MF, Riek LD (2015) Detecting social context: A method for social event classification using naturalistic multimodal data. In: Automatic Face and Gesture Recognition (FG)

[53] Ong K, Seet G, Sim S (2008) An Implementation of Seamless Human-Robot Interaction for Telerobotics. Int J Adv Robot Syst

[54] Open Source Robotics Foundation (2017) https://www.osrfoundation.org/

[55] Pérez-D'Arpino C, Shah J (2015) Fast target prediction of human reaching motion for cooperative human-robot manipulation tasks using time series classification. In: IEEE Conf. on Robotics and Automation

[56] Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput

[57] PR2 Robot (2017) http://www.willowgarage.com/pages/pr2/overview

[58] Rack S, Iqbal T, Riek L (2015) Enabling synchronous joint action in human-robot teams. In: Proc. of ACM/IEEE Human-Robot Interaction

[59] Ramanathan M, Yau WY, Teoh EK (2014) Human Action Recognition With Video Data: Research and Evaluation Challenges. IEEE Transactions on Human-Machine Systems

[60] Richardson MJ, Garcia RL, Frank TD, Gergor M, Marsh KL (2012) Measuring group synchrony: a cluster-phase method for analyzing multivariate movement time-series. Frontiers in Physiology

[61] Rickert M, Gaschler A, Knoll A (2017) Applications in HHI Physical Cooperation. Section: Human-Humanoid Interaction, Humanoid Robotics: a Reference

[62] Riek L, Rabinowitch TC, Bremner P, Pipe A, Fraser M, Robinson P (2010) Cooperative gestures: Effective signaling for humanoid robots. ACM/IEEE Int Conf Human-Robot Interact

[63] Riek LD (2013) The Social Co-Robotics Problem Space: Six Key Challenges. In: Proc. of RSS, Robotics Challenges and Visions

[64] Riek LD, Robinson P (2008) Real-time empathy: Facial mimicry on a robot. In: International Conference on Multimodal Interfaces, Affective Interaction in Natural Environments (AFFINE)

[65] Riek LD, Paul PC, Robinson P (2010) When my robot smiles at me: Enabling human-robot rapport via real-time head gesture mimicry. J Multimodal User Interfaces

[66] Riek LD, Rabinowitch TC, Bremner P, Pipe AG, Fraser M, Robinson P (2010) Cooperative gestures: Effective signaling for humanoid robots. In: Proc. of ACM/IEEE HRI

[67] Ryoo M, Fuchs TJ, Xia L, Aggarwal J, Matthies L (2015) Robot-centric activity prediction from first-person videos: What will they do to me? In: Proc. of ACM/IEEE HRI

[68] Ryoo MS, Matthies L (2013) First-Person Activity Recognition: What Are They Doing to Me? In: Proc. of IEEE Computer Vision and Pattern Recognition

[69] Sandini G, Sciutti A, Rea F (2017) Movement-based communication for humanoid-human interaction. Section: Human-Humanoid Interaction, Humanoid Robotics: a Reference


[70] Sebanz N, Knoblich G (2009) Prediction in Joint Action: What, When, and Where. Top Cogn Sci

[71] Sebanz N, Bekkering H, Knoblich G (2006) Joint action: bodies and minds moving together. Trends Cogn Sci

[72] Shah J, Breazeal C (2010) An Empirical Analysis of Team Coordination Behaviors and Action Planning With Application to Human-Robot Teaming. Hum Factors J Hum Factors Ergon Soc

[73] Shah J, Wiken J, Williams B, Breazeal C (2011) Improved human-robot team performance using Chaski, a human-inspired plan execution system. Proc 6th Int Conf Human-robot Interact

[74] Shi C, Shiomi M, Smith C, Kanda T, Ishiguro H (2013) A Model of Distributional Handing Interaction for a Mobile Robot. Robot Sci Syst

[75] Słowiński P, Zhai C, Alderisio F, Salesse R, Gueugnon M, Marin L, Bardy BG, di Bernardo M, Tsaneva-Atanasova K (2015) Dynamic similarity promotes interpersonal coordination in joint action

[76] Strabala K, Lee MK, Dragan A, Forlizzi J, Srinivasa SS, Cakmak M, Micelli V (2012) Towards Seamless Human-Robot Handovers. J Human-Robot Interact

[77] Thomaz A, Hoffman G, Cakmak M (2016) Computational Human-Robot Interaction. Foundations and Trends in Robotics

[78] Unhelkar VV, Pérez-D'Arpino C, Stirling L, Shah J (2015) Human-robot co-navigation using anticipatory indicators of human walking motion. In: IEEE Conf. on Robotics and Automation

[79] Valdesolo P, Ouyang J, DeSteno D (2010) The rhythm of joint action: Synchrony promotes cooperative ability. J Exp Soc Psychol

[80] Varni G, Volpe G, Camurri A (2010) A System for Real-Time Multimodal Analysis of Nonverbal Affective Social Interaction in User-Centric Media. IEEE T Multimedia

[81] Vesper C, Butterfill S, Knoblich G, Sebanz N (2010) A minimal architecture for joint action. Neural Networks