Study/ Review of Speechless Interaction Techniques in Social
Robotics
Dr. Manoj Ramanathan
Institute for Media and Innovation
Nanyang Technological University, Singapore
26th March 2019
IMI Research Seminar
Outline
• Introduction & Motivation
• Speechless Techniques
  – Gazing
  – Affective Engine – Emotions
  – Social Media
  – Actions/Gestures
  – Reading
• Conclusion
Introduction & Motivation
- A social robot is an autonomous robot that interacts and communicates with humans or other autonomous physical agents by following the social behaviors and rules attached to its role. Like other robots, a social robot is physically embodied (avatars or on-screen synthetic social characters are not embodied and are thus distinct).
[1] Wikipedia - https://en.wikipedia.org/wiki/Social_robot
Introduction & Motivation
- Human interaction
- Environmental awareness
- Applications
Introduction & Motivation
Verbal – based on words
Non-Verbal – speechless
- How can we use these methods for social robots?
Gazing
• Eye contact plays an important role in interaction
• Based on the user's position, the robot's head and eye orientation have to be adjusted (see the sketch below)
• Where and when should the robot look?
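As an illustration of the second point, here is a minimal sketch of turning a tracked user position into head pan/tilt angles. The head-centered frame convention (x forward, y left, z up) is an assumption and is not tied to any particular robot.

```python
import math

def gaze_angles(x, y, z):
    """Pan (yaw) and tilt (pitch), in radians, that point the robot's head
    at a target (x, y, z) given in the head frame.
    Assumed convention: x forward, y left, z up."""
    pan = math.atan2(y, x)                   # rotate left/right toward the target
    tilt = math.atan2(z, math.hypot(x, y))   # raise/lower toward the target
    return pan, tilt

# Example: user's head tracked 1.5 m ahead, 0.3 m to the left, 0.1 m above
pan, tilt = gaze_angles(1.5, 0.3, 0.1)
print(f"pan={math.degrees(pan):.1f} deg, tilt={math.degrees(tilt):.1f} deg")
```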
Gazing
- One-to-one: do we maintain eye contact throughout? [1]
- Multiple people: which person should the robot look at?
- Presentation or other application contexts
- Interesting objects or events
[1] K. Ruhland, S. Andrist, J. B. Badler, C. E. Peters, N. I. Badler, M. Gleicher, B. Mutlu, and R. McDonnell, "Look Me in the Eyes: A Survey of Eye and Gaze Animation for Virtual Agents and Artificial Systems", Eurographics 2014: State of the Art Reports, http://dx.doi.org/10.2312/egst.20141036
Affective Engine - Emotions
• Understanding and processing human emotions
• Showing the full range of emotions in the robot's behavior (see the sketch below)
[1] J. Zhang, J. Zheng, and N. Magnenat-Thalmann, “PCMD: personality characterized mood dynamics model toward personalized virtual characters,” Computer Animation and Virtual Worlds, vol. 26, pp. 237–245, May 2015
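To make the idea concrete, here is a toy sketch of mood dynamics: a mood state decays toward a personality-dependent baseline and is pushed around by emotional stimuli. This is an illustrative simplification, not the PCMD model of [1]; all parameter names and values are assumptions.

```python
import numpy as np

class SimpleMood:
    """Toy mood dynamics: the mood state decays toward a personality-dependent
    baseline and is perturbed by emotional stimuli.
    Illustrative sketch only; NOT the PCMD model of [1]."""

    def __init__(self, baseline, decay=0.1, sensitivity=0.5):
        self.baseline = np.asarray(baseline, dtype=float)  # e.g. pleasure/arousal/dominance
        self.mood = self.baseline.copy()
        self.decay = decay              # pull toward the baseline per step
        self.sensitivity = sensitivity  # how strongly stimuli move the mood

    def step(self, stimulus):
        delta = self.decay * (self.baseline - self.mood)
        delta += self.sensitivity * np.asarray(stimulus, dtype=float)
        self.mood = np.clip(self.mood + delta, -1.0, 1.0)
        return self.mood

# Example: an upbeat personality hit by a negative event; the mood dips
# but drifts back toward its baseline on subsequent neutral steps.
mood = SimpleMood(baseline=[0.4, 0.1, 0.2])
print(mood.step([-0.8, 0.3, -0.2]))
print(mood.step([0.0, 0.0, 0.0]))
```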
Social Media
• Simple channels like email
• More difficult ones like Skype, Facebook, Twitter, etc.
• #AskSophia on Twitter (see the sketch below)
• Not only words: images, videos, etc. can also be shared
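As a concrete example of the #AskSophia use case, a minimal sketch that polls the hashtag with the tweepy library. This assumes the tweepy 3.x API; all credentials are placeholders.

```python
import tweepy

# Placeholder credentials from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Fetch recent questions addressed to the robot via the hashtag.
for tweet in api.search(q="#AskSophia", count=10):
    # Hand the question to the robot's dialogue/affective pipeline here;
    # replies could also attach images or videos via media upload.
    print(tweet.user.screen_name, "asked:", tweet.text)
```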
Actions/Gestures
• Simple gestures
• Subtle Actions/Gestures
• Which actions to react to?
• When and how to react?
Actions/Gestures
• [1] focused on action recognition in social robots
• Based on direction vectors of each skeleton joint's movement, or pose trajectories
• A histogram of direction vectors was used for representation (see the sketch below)
• A total of 18 activities were included
• Tested on the P3Dx mobile robot platform
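A rough sketch of the direction-vector histogram idea: compute frame-to-frame joint motion vectors and bin their directions. The binning scheme (azimuth only) and the array shapes are illustrative assumptions, not the exact descriptor of [1].

```python
import numpy as np

def direction_histogram(joints, n_bins=8):
    """Histogram of per-joint movement directions over a skeleton sequence.
    joints: array of shape (T, J, 3), i.e. T frames of J joints in 3D.
    Directions are binned by azimuth in the horizontal plane; this binning
    is an illustrative choice, not the descriptor of [1]."""
    vectors = np.diff(joints, axis=0)                  # (T-1, J, 3) motion vectors
    azimuth = np.arctan2(vectors[..., 1], vectors[..., 0]).ravel()
    hist, _ = np.histogram(azimuth, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                   # normalized descriptor

# Example with random data standing in for a tracked 20-joint skeleton
rng = np.random.default_rng(0)
print(direction_histogram(rng.normal(size=(30, 20, 3))))
```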
[1] A. Chrungoo, S. S. Manimaran, and B. Ravindran, "Activity Recognition for Natural Human Robot Interaction", in: M. Beetz, B. Johnston, M.-A. Williams (eds), Social Robotics, ICSR 2014, Lecture Notes in Computer Science, vol. 8755, Springer, Cham, 2014
[2] I. Keller, M. Schmuck, and K. S. Lohan, "Towards a Model for Automatic Action Recognition for Social Robot Companions", 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pg. 85–90, August 2016
• [2] proposed how robots could learn actions/tasks from users
• Two tasks were considered:
  – Salt-shaker
  – Cup-stacking
• The iCub robot was used
• The user teaches the tasks to the robot, and it differentiates between the two actions
Actions/Gestures
• [1] proposed Space-Time Occupancy Patterns (STOP)
• Uses depth maps
• Each depth sequence is divided along the spatial and time axes to create a 4D grid (sketched below)
• Preserves spatial and temporal contextual information between space-time cells
• Online action recognition is possible
• Used with a robot: https://www.youtube.com/watch?v=oPvO-7kDM0c
[1] A. W. Vieira, E. R. Nascimento, G. L. Oliveira, Z. Liu, and M. F. M. Campos, "STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences", CIARP 2012: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pg. 252–259
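A simplified sketch of the occupancy idea behind STOP: partition the (x, y, depth, time) volume into a 4D grid and saturate per-cell point counts. The grid sizes and saturation threshold here are illustrative assumptions, not the exact formulation of [1].

```python
import numpy as np

def stop_features(depth_seq, grid=(4, 4, 4, 3), tau=5):
    """Saturated space-time occupancy descriptor for a depth-map sequence.
    depth_seq: array of shape (T, H, W); zero means no depth reading.
    Points are binned into a 4D (x, y, depth, time) grid, in the spirit
    of STOP [1] but with illustrative grid sizes."""
    T, H, W = depth_seq.shape
    gx, gy, gz, gt = grid
    t_idx, y_idx, x_idx = np.nonzero(depth_seq)
    z = depth_seq[t_idx, y_idx, x_idx]
    cx = (x_idx * gx) // W                                   # spatial cells
    cy = (y_idx * gy) // H
    cz = np.minimum((z * gz / (z.max() + 1e-6)).astype(int), gz - 1)  # depth cell
    ct = (t_idx * gt) // T                                   # time cell
    counts = np.zeros(grid)
    np.add.at(counts, (cx, cy, cz, ct), 1)                   # occupancy per cell
    return np.minimum(counts, tau).ravel()                   # saturate and flatten

# Example on synthetic depth data
seq = np.random.default_rng(1).uniform(0.1, 4.0, size=(12, 32, 32))
print(stop_features(seq).shape)  # (4 * 4 * 4 * 3,) = (192,)
```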
Actions/Gestures
Skeleton Representation with CNN [1]:
• Transform 3D skeleton sequences into three clips
• Each clip is generated from one channel of the cylindrical coordinates of the skeleton sequence
• Each frame of a clip represents the temporal information of the entire skeleton sequence and incorporates one particular spatial relationship between the joints
• Multiple frames with different spatial relationships are included
[1] Q. Ke, M. Bennamoun, S. An, F. Sohel, and F. Boussaid, "A New Representation of Skeleton Sequences for 3D Action Recognition", in CVPR 2017, July 2017
(Figure: clip or feature generation pipeline)
Actions/Gestures
Skeleton Representation with CNN [1]:
• A deep CNN is used to learn long-term temporal information from the generated clips; a Multi-Task Learning Network then jointly processes all frames of the clips to incorporate spatial structural information (see the sketch below)
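A condensed sketch of the clip-generation step described above: express the joints in cylindrical coordinates relative to a few reference joints and lay each coordinate channel out as an image-like clip. The choice of reference joints and the exact layout are simplifying assumptions, not the published pipeline of [1].

```python
import numpy as np

def skeleton_to_clips(joints, ref_ids=(0, 1, 2, 3)):
    """Turn a skeleton sequence of shape (T, J, 3) into three image-like
    clips, one per cylindrical coordinate channel (r, theta, z).
    Each clip gets one frame per reference joint: rows are the joints'
    coordinates relative to that reference, columns are time steps.
    The reference joints and layout are simplifying assumptions."""
    frames = []
    for ref in ref_ids:
        rel = joints - joints[:, ref:ref + 1, :]        # (T, J, 3) relative vectors
        r = np.linalg.norm(rel[..., :2], axis=-1)       # radius in the xy-plane
        theta = np.arctan2(rel[..., 1], rel[..., 0])    # azimuth angle
        z = rel[..., 2]                                 # height
        frames.append(np.stack([r, theta, z]))          # (3, T, J)
    stacked = np.stack(frames)                          # (R, 3, T, J)
    return stacked.transpose(1, 0, 3, 2)                # (3 clips, R frames, J, T)

clips = skeleton_to_clips(np.random.default_rng(2).normal(size=(40, 25, 3)))
print(clips.shape)  # (3, 4, 25, 40): channel, reference joint, joints, time
```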
Actions/Gestures
Action recognition on Nadine. The current implementation recognizes 21 actions (shaking head, nodding, drinking water, clapping, waving, taking a selfie, reading, writing, etc.).
(Figures: "Action: Make phone call / Answer phone" and "Action: Check time")
M. Ramanathan, W.-Y. Yau, E. K. Teoh, and N. Magnenat Thalmann, "Pose-Invariant Kinematic Features for Action Recognition", APSIPA-ASC, IEEE, pg. 292–297, December 2017
A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, "NTU RGB+D: A Large Scale Benchmark for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
Discussion
• Actions to be recognized by the robot/virtual human (VH)
– Which actions need to be recognized?
– Depends on the application: classroom, office, shops, etc.
• Action as a social cue
– Reactions
• How to react after recognition?
• Several factors: user emotion, robot emotion, context, etc.
• Observe subtle, non-verbal actions
Discussion
• Reactions
– Movie scripts can be used to develop a reaction model [1]
– NLTK can identify verbs/actions in scripts (see the sketch after this list)
– Using the identified verbs, we can extract context and possible reactions – mainly for verbal responses
– Train a model to generate verbal responses based on action, context, objects, etc.
– Limitations
• Word meaning ambiguities
• Action meaning based on context
• Extracting clean and relevant data is difficult
[1] www.dailyscript.com, www.movie-page.com, www.weeklyscript.com
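A minimal sketch of the verb-extraction step mentioned above, using NLTK's tokenizer and POS tagger. The script line is a made-up example standing in for text scraped from the sites in [1].

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

script_line = "He waves at her, picks up the phone and starts reading."
tagged = nltk.pos_tag(nltk.word_tokenize(script_line))

# Keep verb tags (VB, VBD, VBG, VBN, VBP, VBZ) as candidate actions;
# disambiguating their meaning in context is the hard part noted above.
verbs = [word for word, tag in tagged if tag.startswith("VB")]
print(verbs)  # e.g. ['waves', 'picks', 'starts', 'reading']
```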
Reading
• OCR is widely available
• Which language?
• Lots of possible applications
• The challenge: make the robot understand what it has read and react accordingly
Reading
• Pioneer 2 robot [1] – reads simple textual messages from images
• Pioneer 2 robot [2] – understands symbols
• The focus was only on whether OCR performance is good, not on what to do after reading
[1] D. Letourneau, F. Michaud, J.-M. Valin, and C. Proulx, "Textual Message Read by a Mobile Robot", IROS 2003, pg. 2724–2729, October 2003
[2] F. Michaud and D. Letourneau, "Mobile Robot That Can Read Symbols", Proceedings 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation, pg. 338–343, August 2001
Reading
• Google Cloud Vision API [1]
• Pytesseract-OCR [2]
• Applications
– IC registration
– Parcel delivery (see the OCR sketch below)
[1] https://console.cloud.google.com/
[2] https://github.com/UB-Mannheim/tesseract/wiki
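A minimal reading sketch with pytesseract [2]. The image path is a placeholder, the Tesseract binary plus the relevant language data must be installed, and the lang parameter touches the language question above.

```python
from PIL import Image
import pytesseract

# Placeholder path to a camera frame grabbed by the robot.
image = Image.open("id_card.png")

# OCR the frame; lang="eng" assumes English, and other languages need
# the corresponding Tesseract language data installed.
text = pytesseract.image_to_string(image, lang="eng")
print(text)

# Downstream, the robot still has to decide what the text means and how
# to react (e.g. extract a name for IC registration).
```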
Reading
- Layout organization [1]
- Hand-written characters
- Lighting
- Language
- Can the robot understand?
- Application-based: e.g. showing emotions while reading stories
[1] Y. Pan, Q. Zhao, and S. Kamata, "Document Layout Analysis and Reading Order Determination for a Reading Robot", TENCON 2010, IEEE Region 10 Conference, pg. 1607–1612, November 2010
Conclusion
• Understanding user intentions, behavior
– Very difficult to quantify and perceive
– Too many possible variations
– Actions and emotions are visible cues, but they are not definitive
– Developing a reaction model is not easy
• Reading opens another avenue of communication
– A few limitations still exist in OCR that prevent a complete implementation for robots
Thank you!!
Q & A ??
Robots compared on Gazing, Affective System, Gestures/Actions, and Human-like Appearance:
- Nadine: https://en.wikipedia.org/wiki/Nadine_Social_Robot
- Sophia: https://en.wikipedia.org/wiki/Sophia_(robot)
- Receptionist Robot: https://tinyurl.com/yxmdgw8m
- PAL Robotics REEM-C: http://reemc.pal-robotics.com/en/reemc/
- Nancy: https://www.ece.nus.edu.sg/econnect/issue4.pdf
- Erica: https://robots.ieee.org/robots/erica/