Computer Vision in Human-Computer Interaction

MACHINE VISION GROUP

Computer Vision in Human-Computer Interaction

Matti Pietikäinen

Machine Vision Group

Department of Electrical and Information Engineering and Infotech Oulu

University of Oulu, Finland

http://www.ee.oulu.fi/mvg

Invited talk at the 2010 Autumn Seminar and Meeting of the Pattern Recognition Society of Finland, M/S Baltic Princess, 26.11.2010


Contents

• Introduction: Example applications

• Case project: Affective human-robot interaction

• Research on key computer vision methods for HCI in Oulu

• Case project: Vision-based mobile HCI

• Some challenges for future research


Example application: Microsoft’s Kinect

- Controller free interface for Xbox 360

- Range and color cameras

- Brings 3D imaging to mass markets


Example application: Human-robot interaction

- Communication must be easy and natural


Example application: Smart environments


Example application: Vision-based mobile HCI

– User authentication

– Vision-based interaction


Case project: Affective human-robot interaction

• Development of natural, affective human-computer interfaces (HCI) is of great interest in building future ubiquitous computing systems

• In 15 years, service robots (social robots) will become a part of our everyday lives

• We should be able to interact with robots in a natural way, as in human-human interaction

• Computer vision will play a key role in building affective HCI and HRI (human-robot interaction) systems of the future

• Joint research on affective HRI is in progress by the Machine Vision and Intelligent Systems groups of the University of Oulu

• Supported by the Academy of Finland, European Regional Development Fund and University of Oulu


Objectives

• To develop leading edge approaches for affective HRI

• Both ”face to face” interaction and remote interaction (using mobile phones or PDAs equipped with cameras) will be considered

• The robot should be able to

- detect and identify the user

- recognize user’s emotions

- communicate easily by understanding speech and gestures

- provide a natural response based on its observations

- learn its behavior and tasks it is supposed to do

- utilize its motion in different levels of interaction (robot embodiment)


Work packages

WP1: Machine vision for human-robot interaction

WP2: Human-robot interaction

WP3: Experimental validation of affective HRI


WP1: Machine vision for human-robot interaction

Person identification

• Face recognition, gender recognition, age estimation

• Speaker identification

• Gait recognition, clothing (recognition from a distance)

Recognition of emotions

• Facial expression recognition

• Speech, body movements

Methods for human-robot communication

• Hand gestures and speech (”face to face” communication)

• Body movements (e.g. waving hands) (communication from a distance)

• Using camera motion (communication with a mobile device)

• A talking avatar on robot’s flat panel display

On-line machine learning and adaptation


WP2: Human-robot interaction

Embodiment and learning in HRI:

• Our goal is to investigate how a robot could learn to interact with a person in a socially interesting way

• A possible approach is illustrated in the following figure


An example target system

The desired state in this example could be modified as a function of time as the person and the robot get acquainted, e.g. the robot could learn to change its behavior at the pace of the interaction.


WP3: Experimental validation of affective HRI

Different types of tasks of affective HRI are experimentally validated and demonstrated, e.g.

1. Recognizing a person and providing a personalized robot response

2. Recognizing a person’s emotions and providing an affective robot response based on them

3. HRI demonstration where the robot utilizes its learned behaviors in order to initiate and maintain natural interaction with a person

4. System-level demonstrations, e.g. a robot guide for visitors in a smart environment


Research on key computer vision methods for HCI in Oulu

1. Face recognition

2. Facial expression recognition

3. Visual speech recognition

4. Video synthesis for face animation

5. Object detection and recognition

6. Tracking of moving objects

7. Object re-identification in camera networks

8. Recognition of actions and gait


Face analysis: research challenges

The face is a dynamic and non-rigid object which is difficult to model because of the large variability in its appearance.

The appearance of the face varies due to changes in pose, facial expression, illumination, aging, occlusion, presence of glasses etc.


Face analysis using local binary patterns

• Face recognition is one of the major challenges in computer vision

• We proposed (ECCV 2004, PAMI 2006) a face descriptor based on LBPs

• Our method has been adopted by many leading scientists

• Excellent results in face recognition and authentication, face detection, facial expression recognition, gender classification

• LBP has a significant role in EU projects: ”Mobile Biometry” (2008-2010) coordinated by IDIAP (Switzerland), and a new EU project ”Trusted Biometrics under Spoofing Attacks” (Tabula Rasa)
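To make the idea of an LBP-based face descriptor concrete, the following is a minimal sketch in plain Python of the basic 8-neighbour LBP operator and a descriptor built by concatenating regional histograms. The function names, the 3x3 neighbourhood, the neighbour ordering and the grid size are illustrative assumptions, not the exact configuration used in the cited papers.

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP code of pixel (r, c): threshold the 8 neighbours at the centre value."""
    center = img[r][c]
    # Clockwise neighbour offsets starting at the top-left pixel (ordering is an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img, r0, r1, c0, c1):
    """256-bin histogram of LBP codes for interior pixels with r0 <= r < r1 and c0 <= c < c1."""
    hist = [0] * 256
    for r in range(max(r0, 1), min(r1, len(img) - 1)):
        for c in range(max(c0, 1), min(c1, len(img[0]) - 1)):
            hist[lbp_code(img, r, c)] += 1
    return hist


def face_descriptor(img, grid=2):
    """Concatenate regional LBP histograms over a grid x grid partition of the image."""
    rows, cols = len(img), len(img[0])
    desc = []
    for i in range(grid):
        for j in range(grid):
            desc += lbp_histogram(img,
                                  i * rows // grid, (i + 1) * rows // grid,
                                  j * cols // grid, (j + 1) * cols // grid)
    return desc
```

Two face images can then be compared by a histogram distance between their descriptors; keeping the histograms per region is what preserves coarse spatial layout in the descriptor.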


Case: Mobile biometry (MOBIO) 2008-2010

(www.mobioproject.org)

• The aim is to investigate multiple aspects of biometric authentication based on the face and voice in the context of mobile devices

• To increase security and user acceptance - using standard sensors already available on mobile phones

• Coordinator: IDIAP Research Institute (CH)

• Partners: University of Manchester (UK), University of Surrey (UK), Universite d’Avignon (FR), Brno University of Technology (CZ), University of Oulu (FI), IdeArk (CH), EyePmedia (CH), Visidon (FI)


Case: Trusted biometrics under spoofing attacks (TABULA RASA) 2010-2014 (http://www.tabularasa-euproject.org/)

• The project will address some of the issues of direct (spoofing) attacks on trusted biometric systems. This needs to be addressed urgently because it has recently been shown that conventional biometric techniques, such as fingerprints and face, are vulnerable to direct (spoof) attacks.

• Coordinated by IDIAP, Switzerland

• We will focus on face and gait recognition


Dynamic texture descriptors for motion analysis

• We proposed (PAMI 2007) simple spatiotemporal LBP descriptors for dynamic texture recognition outperforming the state-of-the-art

• They have been applied to facial expression recognition (PAMI 2007), face and gender recognition from video sequences (Pattern Recogn. 2009), visual speech recognition (IEEE T Multimedia 2009), and recognition of actions and gait (BMVC 2008, ICB 2009, MVA 2010) - with excellent results

• Our approach has potential for significant contributions in many applications and fundamental problems of motion and activity analysis
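A spatiotemporal LBP descriptor can be sketched as follows. This plain-Python example assumes the three-orthogonal-planes form, in which LBP codes are computed on the XY, XT and YT planes of a video volume and the three histograms are concatenated. The function names, the 3x3 neighbourhood with radius 1 on every axis, and the `vol[t][y][x]` indexing are illustrative assumptions rather than the exact settings of the cited work.

```python
# Clockwise 8-neighbour offsets on a 2D plane (ordering is an assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def plane_code(get, a, b):
    """LBP code at position (a, b) of a 2D plane accessed through get(a, b)."""
    center = get(a, b)
    code = 0
    for bit, (da, db) in enumerate(OFFSETS):
        if get(a + da, b + db) >= center:
            code |= 1 << bit
    return code


def lbp_three_planes(vol):
    """Concatenated XY, XT and YT LBP histograms of a volume indexed as vol[t][y][x]."""
    n_t, n_y, n_x = len(vol), len(vol[0]), len(vol[0][0])
    hists = [[0] * 256 for _ in range(3)]
    for t in range(1, n_t - 1):
        for y in range(1, n_y - 1):
            for x in range(1, n_x - 1):
                # XY plane: spatial appearance at a fixed time t.
                hists[0][plane_code(lambda a, b: vol[t][a][b], y, x)] += 1
                # XT plane: horizontal change over time at a fixed row y.
                hists[1][plane_code(lambda a, b: vol[a][y][b], t, x)] += 1
                # YT plane: vertical change over time at a fixed column x.
                hists[2][plane_code(lambda a, b: vol[a][b][x], t, y)] += 1
    return hists[0] + hists[1] + hists[2]
```

The XY histogram describes appearance while the XT and YT histograms describe motion, so a single 768-bin vector summarizes both for a short video volume.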


Demo for facial expression recognition

Future challenge:

How to recognize spontaneous expressions instead of acted ones?


Demo for visual speech recognition


Video synthesis for face animation

• Dynamic texture synthesis (ICIP 2009)

• Video-realistic speech animation (ICVGIP 2010)


Object detection and recognition


Tracking of moving objects


Object re-identification in camera networks

• Person re-identification using global color context (VS 2010)


Recognition of actions and gait


Dynamic textures for action recognition

• Formation of the feature histogram for an xyt volume of short duration

• HMM is used for sequential modeling

• State-of-the-art results for Weizmann and KTH databases (BMVC 2008, MVA 2010)

Feature histogram of a bounding volume
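The sequential modeling step can be illustrated with the HMM forward algorithm. The sketch below assumes that a per-state likelihood is already available for each short-duration volume (for example derived from histogram similarity); how the cited papers compute those likelihoods is not specified here, and the function names and the action labels in the usage example are hypothetical.

```python
def forward_likelihood(init, trans, obs_lik):
    """Likelihood of an observation sequence under an HMM (forward algorithm).

    init[i]       : initial probability of state i
    trans[i][j]   : probability of moving from state i to state j
    obs_lik[t][i] : likelihood of the observation at time t under state i
    """
    n = len(init)
    alpha = [init[i] * obs_lik[0][i] for i in range(n)]
    for t in range(1, len(obs_lik)):
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * obs_lik[t][j]
                 for j in range(n)]
    return sum(alpha)


def classify(models, obs_lik_per_model):
    """Return the label of the action model with the highest sequence likelihood."""
    best_label, best_score = None, -1.0
    for label, (init, trans) in models.items():
        score = forward_likelihood(init, trans, obs_lik_per_model[label])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-state models for two actions.
models = {
    "walk": ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]),
    "run": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]),
}
obs = {
    "walk": [[0.5, 0.1], [0.2, 0.4]],
    "run": [[0.1, 0.1], [0.1, 0.1]],
}
predicted = classify(models, obs)
```

For long sequences the unscaled recursion underflows, so a practical version would rescale `alpha` at each step or work with log-probabilities.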


Case project: Vision-based mobile HCI

• Hand motion can be used for controlling mobile devices

• Camera becomes a motion sensor

• A fast and efficient method for image motion estimation has been proposed (CVIU 2007)

– Uncertainty analysis is performed for each motion feature

• Symbian implementation

Local motion analysis: (a) 16 subregions and selected feature blocks, (b) feature motion estimates and associated error covariances.
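One way per-feature motion estimates and their error covariances can be combined into a single image translation is inverse-covariance weighting, sketched below in plain Python. This fusion rule is an assumption chosen for illustration and is not claimed to be the method of the CVIU 2007 paper; `inv2` and `fuse_motion` are hypothetical helper names.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular 2x2 matrix")
    return ((d / det, -b / det), (-c / det, a / det))


def fuse_motion(estimates):
    """Fuse per-block displacements into one translation estimate.

    estimates: list of ((dx, dy), cov) pairs, where cov is a 2x2 error covariance.
    Each block is weighted by its information matrix (inverse covariance), so
    uncertain directions of a block contribute little to the result.
    """
    info = [[0.0, 0.0], [0.0, 0.0]]  # sum of information matrices
    vec = [0.0, 0.0]                 # sum of information-weighted displacements
    for (dx, dy), cov in estimates:
        w = inv2(cov)
        for i in range(2):
            for j in range(2):
                info[i][j] += w[i][j]
            vec[i] += w[i][0] * dx + w[i][1] * dy
    fused_cov = inv2((tuple(info[0]), tuple(info[1])))
    fused = (fused_cov[0][0] * vec[0] + fused_cov[0][1] * vec[1],
             fused_cov[1][0] * vec[0] + fused_cov[1][1] * vec[1])
    return fused, fused_cov
```

A block lying on a vertical edge, for example, constrains horizontal motion well but vertical motion poorly; its covariance encodes that, and the fusion takes the vertical component mainly from other blocks.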


Interactive Panorama Builder

Motion estimation system calculates shift, rotation and scale in real time

When frame is suitable for stitching (high quality), the user receives feedback and instructions
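Shift, rotation and scale together form a four-parameter similarity transform. As a minimal sketch, assuming matched point pairs between two frames are already available, the transform can be fitted by least squares using complex arithmetic. The function name and the point-correspondence input are assumptions for illustration; the slides do not state how the system itself estimates these parameters.

```python
import cmath


def fit_similarity(src, dst):
    """Least-squares shift, rotation and scale that map src points onto dst points.

    Points are (x, y) tuples. With points written as complex numbers, the model
    is q = a * p + b, where abs(a) is the scale, the phase of a is the rotation
    in radians, and b is the shift.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    denom = sum(abs(pi - p_mean) ** 2 for pi in p)
    if denom == 0:
        raise ValueError("need at least two distinct source points")
    a = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q)) / denom
    b = q_mean - a * p_mean
    return {"scale": abs(a), "rotation": cmath.phase(a), "shift": (b.real, b.imag)}
```

In a panorama application, the fitted parameters between consecutive frames indicate how far the camera has moved and whether the new frame overlaps the previous one enough to be stitched.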


Scene Panorama Builder


Automatic Device Activation

The principle: the light is not on if no one is watching

Example case: recognizing context and sequences of actions


Virtual 3D Display

Face-tracking based 3D rendering

• Face-Tracker obtains 2D face position at 25 fps

• Face distance to the screen estimated with face size

• Face distance estimated using Motion Estimation Library
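Estimating face distance from face size can be sketched with the pinhole camera model, where distance is inversely proportional to the apparent face width. The focal length in pixels, the assumed real face width and the function name below are illustrative assumptions, not values from the demonstrated system.

```python
def face_position(face_cx, face_cy, face_width_px, image_w, image_h,
                  focal_px=500.0, real_face_width_m=0.15):
    """Approximate 3D head position in metres (camera frame) from a 2D face box.

    focal_px and real_face_width_m are assumed calibration values.
    """
    if face_width_px <= 0:
        raise ValueError("face width must be positive")
    # Pinhole model: apparent width = focal * real width / distance.
    z = focal_px * real_face_width_m / face_width_px
    # Back-project the face centre offset from the image centre at depth z.
    x = (face_cx - image_w / 2.0) * z / focal_px
    y = (face_cy - image_h / 2.0) * z / focal_px
    return x, y, z
```

The resulting viewpoint can drive the rendering camera so that the displayed scene shifts with the user's head, which is what creates the 3D impression on a flat screen.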


Camera Assisted Multimodal UI

• UI concepts that rely on multiple sensors of modern mobile devices

• Used for recognizing context and sequences of actions

• The key motivation is to hide start-up latencies of the functionalities from the user


Some future challenges for computer vision in HCI

• Face analysis in natural environments

- changing illumination, varying view and distance, different sensors etc.

• Recognition of spontaneous facial expressions

- instead of acted expressions

• Recognition of natural human actions

- instead of acted ones

• Person tracking, identification and activity recognition in multicamera networks

• Use of multimodal information

- e.g. emotions from facial expressions, body movements, speech, biosignals

• Etc.
