Study/ Review of Speechless Interaction Techniques in Social
Robotics
Dr. Manoj Ramanathan
Institute for Media and Innovation
Nanyang Technological University, Singapore
26th March 2019
IMI Research Seminar
Outline
• Introduction & Motivation
• Speechless Techniques
  – Gazing
  – Affective Engine – Emotions
  – Social Media
  – Actions/Gestures
  – Reading
• Conclusion
Introduction & Motivation
- A social robot is an autonomous robot that interacts and communicates with humans or other autonomous physical agents by following the social behaviors and rules attached to its role. Like other robots, a social robot is physically embodied (avatars or on-screen synthetic social characters are not embodied and are thus distinct).
[1] Wikipedia - https://en.wikipedia.org/wiki/Social_robot
Introduction & Motivation
- Human interaction
- Environmental awareness
- Applications
Introduction & Motivation
Verbal – based on words
Non-Verbal – speechless
- How can we use these methods for social robots?
Gazing
• Eye contact plays an important role in interaction
• Based on the user's position, the robot's head and eye orientation have to be adjusted (see the sketch below)
• Where and when should the robot look?
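As an illustration of the second point, here is a minimal sketch of turning a tracked user position into head pan/tilt angles. The head-centered frame convention (x forward, y left, z up) is an assumption and is not tied to any particular robot.

```python
import math

def gaze_angles(x, y, z):
    """Pan (yaw) and tilt (pitch), in radians, that point the robot's head
    at a target (x, y, z) given in the head frame.
    Assumed convention: x forward, y left, z up."""
    pan = math.atan2(y, x)                   # rotate left/right toward the target
    tilt = math.atan2(z, math.hypot(x, y))   # raise/lower toward the target
    return pan, tilt

# Example: user's head tracked 1.5 m ahead, 0.3 m to the left, 0.1 m above
pan, tilt = gaze_angles(1.5, 0.3, 0.1)
print(f"pan={math.degrees(pan):.1f} deg, tilt={math.degrees(tilt):.1f} deg")
```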
Gazing
- One-to-one: do we maintain eye contact throughout? [1]
- Multiple people: which person should the robot look at?
- Presentation or other application contexts
- Interesting objects or events
[1] K. Ruhland, S. Andrist, J. B. Badler, C. E. Peters, N. I. Badler, M. Gleicher, B. Mutlu, and R. McDonnell, "Look Me in the Eyes: A Survey of Eye and Gaze Animation for Virtual Agents and Artificial Systems", Eurographics 2014: State of the Art Reports, http://dx.doi.org/10.2312/egst.20141036
Affective Engine - Emotions
• Understanding and processing human emotions
• Showing the full range of emotions in the robot's behavior (see the sketch below)
[1] J. Zhang, J. Zheng, and N. Magnenat-Thalmann, “PCMD: personality characterized mood dynamics model toward personalized virtual characters,” Computer Animation and Virtual Worlds, vol. 26, pp. 237–245, May 2015
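To make the idea concrete, here is a toy sketch of mood dynamics: a mood state decays toward a personality-dependent baseline and is pushed around by emotional stimuli. This is an illustrative simplification, not the PCMD model of [1]; all parameter names and values are assumptions.

```python
import numpy as np

class SimpleMood:
    """Toy mood dynamics: the mood state decays toward a personality-dependent
    baseline and is perturbed by emotional stimuli.
    Illustrative sketch only; NOT the PCMD model of [1]."""

    def __init__(self, baseline, decay=0.1, sensitivity=0.5):
        self.baseline = np.asarray(baseline, dtype=float)  # e.g. pleasure/arousal/dominance
        self.mood = self.baseline.copy()
        self.decay = decay              # pull toward the baseline per step
        self.sensitivity = sensitivity  # how strongly stimuli move the mood

    def step(self, stimulus):
        delta = self.decay * (self.baseline - self.mood)
        delta += self.sensitivity * np.asarray(stimulus, dtype=float)
        self.mood = np.clip(self.mood + delta, -1.0, 1.0)
        return self.mood

# Example: an upbeat personality hit by a negative event; the mood dips
# but drifts back toward its baseline on subsequent neutral steps.
mood = SimpleMood(baseline=[0.4, 0.1, 0.2])
print(mood.step([-0.8, 0.3, -0.2]))
print(mood.step([0.0, 0.0, 0.0]))
```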
Social Media
• Simple channels like email
• More difficult ones like Skype, Facebook, Twitter, etc.
• #AskSophia on Twitter (see the sketch below)
• Not only words: images, videos, etc. can also be shared
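As a concrete example of the #AskSophia use case, a minimal sketch that polls the hashtag with the tweepy library. This assumes the tweepy 3.x API; all credentials are placeholders.

```python
import tweepy

# Placeholder credentials from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Fetch recent questions addressed to the robot via the hashtag.
for tweet in api.search(q="#AskSophia", count=10):
    # Hand the question to the robot's dialogue/affective pipeline here;
    # replies could also attach images or videos via media upload.
    print(tweet.user.screen_name, "asked:", tweet.text)
```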
Actions/Gestures
• Simple gestures
• Subtle Actions/Gestures
• Which actions to react to?
• When and how to react?
Actions/Gestures
• [1] focused on action recognition in social robots
• Based on direction vectors of each skeleton joint's movement, or pose trajectories
• A histogram of direction vectors was used for representation (see the sketch below)
• A total of 18 activities were included
• Tested on the P3Dx mobile robot platform
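A rough sketch of the direction-vector histogram idea: compute frame-to-frame joint motion vectors and bin their directions. The binning scheme (azimuth only) and the array shapes are illustrative assumptions, not the exact descriptor of [1].

```python
import numpy as np

def direction_histogram(joints, n_bins=8):
    """Histogram of per-joint movement directions over a skeleton sequence.
    joints: array of shape (T, J, 3), i.e. T frames of J joints in 3D.
    Directions are binned by azimuth in the horizontal plane; this binning
    is an illustrative choice, not the descriptor of [1]."""
    vectors = np.diff(joints, axis=0)                  # (T-1, J, 3) motion vectors
    azimuth = np.arctan2(vectors[..., 1], vectors[..., 0]).ravel()
    hist, _ = np.histogram(azimuth, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                   # normalized descriptor

# Example with random data standing in for a tracked 20-joint skeleton
rng = np.random.default_rng(0)
print(direction_histogram(rng.normal(size=(30, 20, 3))))
```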
[1] A. Chrungoo, S. S. Manimaran, and B. Ravindran, "Activity Recognition for Natural Human Robot Interaction", in: M. Beetz, B. Johnston, M.-A. Williams (eds), Social Robotics, ICSR 2014, Lecture Notes in Computer Science, vol. 8755, Springer, Cham, 2014
[2] I. Keller, M. Schmuck, and K. S. Lohan, "Towards a Model for Automatic Action Recognition for Social Robot Companions", 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pg. 85–90, August 2016
• [2] proposed how robots could learn actions/tasks from users
• Two tasks were considered:
  – Salt-shaker
  – Cup-stacking
• The iCub robot was used
• The user teaches the tasks to the robot, and it differentiates between the two actions
Actions/Gestures
• [1] proposed Space-Time Occupancy Patterns (STOP)
• Uses depth maps
• Each depth sequence is divided along the spatial and time axes to create a 4D grid (sketched below)
• Preserves spatial and temporal contextual information between space-time cells
• Online action recognition is possible
• Used with a robot: https://www.youtube.com/watch?v=oPvO-7kDM0c
[1] A. W. Vieira, E. R. Nascimento, G. L. Oliveira, Z. Liu, and M. F. M. Campos, "STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences", CIARP 2012: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pg. 252–259
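A simplified sketch of the occupancy idea behind STOP: partition the (x, y, depth, time) volume into a 4D grid and saturate per-cell point counts. The grid sizes and saturation threshold here are illustrative assumptions, not the exact formulation of [1].

```python
import numpy as np

def stop_features(depth_seq, grid=(4, 4, 4, 3), tau=5):
    """Saturated space-time occupancy descriptor for a depth-map sequence.
    depth_seq: array of shape (T, H, W); zero means no depth reading.
    Points are binned into a 4D (x, y, depth, time) grid, in the spirit
    of STOP [1] but with illustrative grid sizes."""
    T, H, W = depth_seq.shape
    gx, gy, gz, gt = grid
    t_idx, y_idx, x_idx = np.nonzero(depth_seq)
    z = depth_seq[t_idx, y_idx, x_idx]
    cx = (x_idx * gx) // W                                   # spatial cells
    cy = (y_idx * gy) // H
    cz = np.minimum((z * gz / (z.max() + 1e-6)).astype(int), gz - 1)  # depth cell
    ct = (t_idx * gt) // T                                   # time cell
    counts = np.zeros(grid)
    np.add.at(counts, (cx, cy, cz, ct), 1)                   # occupancy per cell
    return np.minimum(counts, tau).ravel()                   # saturate and flatten

# Example on synthetic depth data
seq = np.random.default_rng(1).uniform(0.1, 4.0, size=(12, 32, 32))
print(stop_features(seq).shape)  # (4 * 4 * 4 * 3,) = (192,)
```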
Actions/Gestures
Skeleton Representation with CNN [1]:
• Transform 3D skeleton sequences into three clips
• Each clip is generated from one channel of the cylindrical coordinates of the skeleton sequence
• Each frame of a clip represents the temporal information of the entire skeleton sequence and incorporates one particular spatial relationship between the joints
• Multiple frames with different spatial relationships are included
[1] Q. Ke, M. Bennamoun, S. An, F. Sohel, and F. Boussaid, "A New Representation of Skeleton Sequences for 3D Action Recognition", in CVPR 2017, July 2017
(Figure: clip or feature generation pipeline)
Actions/Gestures
Skeleton Representation with CNN [1]:
• A deep CNN is used to learn long-term temporal information from the generated clips; a Multi-Task Learning Network then jointly processes all frames of the clips to incorporate spatial structural information (see the sketch below)
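A condensed sketch of the clip-generation step described above: express the joints in cylindrical coordinates relative to a few reference joints and lay each coordinate channel out as an image-like clip. The choice of reference joints and the exact layout are simplifying assumptions, not the published pipeline of [1].

```python
import numpy as np

def skeleton_to_clips(joints, ref_ids=(0, 1, 2, 3)):
    """Turn a skeleton sequence of shape (T, J, 3) into three image-like
    clips, one per cylindrical coordinate channel (r, theta, z).
    Each clip gets one frame per reference joint: rows are the joints'
    coordinates relative to that reference, columns are time steps.
    The reference joints and layout are simplifying assumptions."""
    frames = []
    for ref in ref_ids:
        rel = joints - joints[:, ref:ref + 1, :]        # (T, J, 3) relative vectors
        r = np.linalg.norm(rel[..., :2], axis=-1)       # radius in the xy-plane
        theta = np.arctan2(rel[..., 1], rel[..., 0])    # azimuth angle
        z = rel[..., 2]                                 # height
        frames.append(np.stack([r, theta, z]))          # (3, T, J)
    stacked = np.stack(frames)                          # (R, 3, T, J)
    return stacked.transpose(1, 0, 3, 2)                # (3 clips, R frames, J, T)

clips = skeleton_to_clips(np.random.default_rng(2).normal(size=(40, 25, 3)))
print(clips.shape)  # (3, 4, 25, 40): channel, reference joint, joints, time
```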
Actions/Gestures
Action recognition on Nadine. The current implementation recognizes 21 actions (shaking head, nodding, drinking water, clapping, waving, taking a selfie, reading, writing, etc.).
(Figures: "Action: Make phone call / Answer phone" and "Action: Check time")
M. Ramanathan, W.-Y. Yau, E. K. Teoh, and N. Magnenat Thalmann, "Pose-Invariant Kinematic Features for Action Recognition", APSIPA-ASC, IEEE, pg. 292–297, December 2017
A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, "NTU RGB+D: A Large Scale Benchmark for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
Discussion
• Actions to be recognized by the robot/virtual human (VH)
– Which actions need to be recognized?
– Depends on the application: classroom, office, shops, etc.
• Action as a social cue
– Reactions
• How to react after recognition?
• Several factors: user emotion, robot emotion, context, etc.
• Observe subtle, non-verbal actions
Discussion
• Reactions
– Movie scripts can be used to develop a reaction model [1]
– NLTK can identify verbs/actions in scripts (see the sketch after this list)
– Using the identified verbs, we can extract context and possible reactions – mainly for verbal responses
– Train a model to generate verbal responses based on action, context, objects, etc.
– Limitations
• Word meaning ambiguities
• Action meaning based on context
• Extracting clean and relevant data is difficult
[1] www.dailyscript.com, www.movie-page.com, www.weeklyscript.com
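A minimal sketch of the verb-extraction step mentioned above, using NLTK's tokenizer and POS tagger. The script line is a made-up example standing in for text scraped from the sites in [1].

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

script_line = "He waves at her, picks up the phone and starts reading."
tagged = nltk.pos_tag(nltk.word_tokenize(script_line))

# Keep verb tags (VB, VBD, VBG, VBN, VBP, VBZ) as candidate actions;
# disambiguating their meaning in context is the hard part noted above.
verbs = [word for word, tag in tagged if tag.startswith("VB")]
print(verbs)  # e.g. ['waves', 'picks', 'starts', 'reading']
```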
Reading
• OCR is widely available
• Which language?
• Lots of possible applications
• The challenge: make the robot understand what it has read and react accordingly
Reading
• Pioneer 2 robot [1] – reads simple textual messages from images
• Pioneer 2 robot [2] – understands symbols
• The focus was only on whether OCR performance is good, not on what to do after reading
[1] D. Letourneau, F. Michaud, J.-M. Valin, and C. Proulx, "Textual Message Read by a Mobile Robot", IROS 2003, pg. 2724–2729, October 2003
[2] F. Michaud and D. Letourneau, "Mobile Robot That Can Read Symbols", Proceedings 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation, pg. 338–343, August 2001
Reading
• Google Cloud Vision API [1]
• Pytesseract-OCR [2]
• Applications
– IC registration
– Parcel delivery (see the OCR sketch below)
[1] https://console.cloud.google.com/
[2] https://github.com/UB-Mannheim/tesseract/wiki
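A minimal reading sketch with pytesseract [2]. The image path is a placeholder, the Tesseract binary plus the relevant language data must be installed, and the lang parameter touches the language question above.

```python
from PIL import Image
import pytesseract

# Placeholder path to a camera frame grabbed by the robot.
image = Image.open("id_card.png")

# OCR the frame; lang="eng" assumes English, and other languages need
# the corresponding Tesseract language data installed.
text = pytesseract.image_to_string(image, lang="eng")
print(text)

# Downstream, the robot still has to decide what the text means and how
# to react (e.g. extract a name for IC registration).
```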
Reading
- Layout organization [1]
- Hand-written characters
- Lighting
- Language
- Can the robot understand?
- Application-based: e.g. showing emotions while reading stories
[1] Y. Pan, Q. Zhao, and S. Kamata, "Document Layout Analysis and Reading Order Determination for a Reading Robot", TENCON 2010, IEEE Region 10 Conference, pg. 1607–1612, November 2010
Conclusion
• Understanding user intentions, behavior
– Very difficult to quantify and perceive
– Too many possible variations
– Actions and emotions are visible cues, but they are not definitive
– Developing a reaction model is not easy
• Reading opens another avenue of communication
– A few limitations still exist in OCR that prevent a complete implementation for robots
Thank you!!
Q & A ??
Robots compared on Gazing, Affective System, Gestures/Actions, and Human-like Appearance:
- Nadine: https://en.wikipedia.org/wiki/Nadine_Social_Robot
- Sophia: https://en.wikipedia.org/wiki/Sophia_(robot)
- Receptionist Robot: https://tinyurl.com/yxmdgw8m
- PAL Robotics REEM-C: http://reemc.pal-robotics.com/en/reemc/
- Nancy: https://www.ece.nus.edu.sg/econnect/issue4.pdf
- Erica: https://robots.ieee.org/robots/erica/