Finding Time Together: Detection and Classification of Focused Interaction in Egocentric Video
Sophia Bano, Jianguo Zhang, Stephen J. McKenna, {s.bano, j.n.zhang, s.j.z.mckenna}@dundee.ac.uk
Computer Vision and Image Processing Group, Computing, School of Science and Engineering, University of Dundee, United Kingdom
International Conference on Computer Vision 2017
1. Introduction
Focused Interaction (FI)
• Co-present individuals with a mutual focus of attention interact by establishing face-to-face engagement and direct conversation [1]
Hypothesis
• Fusion of multimodal features can improve overall FI detection
Challenges
• Face-to-face engagement often not maintained
• Conversational partner not always present in the video frame
• Varying illumination
• Varied scenes
Examples from our Focused Interaction Dataset
Existing methods
• Off-line processing of video clips or photo streams captured under fairly constrained conditions, with interacting people always in view [2, 3, 4]
2. Focused Interaction Detection using Multimodal Features
3. Evaluation
Focused Interaction Dataset
• 19 egocentric videos (378 mins) captured using a shoulder-mounted GoPro Hero4 at 18 different locations and with 16 conversational partners
Observations
• Fusion of multimodal features is useful for discriminating No FI from FI (walk) when using an SVM with an RBF kernel
• Face track and VAD scores are significant for discriminating FI (non-walk)
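For context on the kernel choice above: an RBF-kernel SVM compares feature vectors via k(x, y) = exp(-γ‖x - y‖²). A minimal NumPy sketch of the kernel computation (the function name and γ value are illustrative choices, not from the poster):

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma=0.1):
    """Pairwise RBF kernel: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    # Squared Euclidean distances via ||x||^2 - 2 x.y + ||y||^2.
    sq = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ Y.T + (Y ** 2).sum(axis=1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))

# Two toy 2-D feature vectors at distance 1.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel_matrix(X, X, gamma=1.0)
# K[0, 0] = 1 (zero distance); K[0, 1] = exp(-1)
```

In practice one would hand such a kernel (or simply `kernel='rbf'`) to an off-the-shelf SVM implementation such as scikit-learn's `svm.SVC`.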
Limitations
• Sound from nearby surroundings influenced the VAD
• Low illumination scenarios affected the face tracker
HOG: Histogram of Oriented Gradient
KLT: Kanade-Lucas-Tomasi Tracker
HOOF: Histogram of Oriented Optical Flow [5]
VAD: Voice Activity Detection [6]
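As a concrete illustration of the HOOF feature [5]: optical-flow vectors are binned by orientation, each weighted by its magnitude, and the histogram is L1-normalised. A minimal NumPy sketch (the bin count and function name are our own choices; the original formulation additionally folds symmetric directions, omitted here for brevity):

```python
import numpy as np

def hoof(flow, n_bins=8):
    """Sketch of a Histogram of Oriented Optical Flow (HOOF).

    flow: (H, W, 2) array of per-pixel (dx, dy) flow vectors.
    Returns an L1-normalised orientation histogram in which each
    vector contributes its magnitude to its orientation bin.
    """
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)                      # vector magnitudes
    ang = np.arctan2(dy, dx)                    # angles in [-pi, pi]
    idx = np.floor((ang + np.pi) / (2.0 * np.pi) * n_bins).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)           # fold ang == pi into last bin
    hist = np.bincount(idx, weights=mag, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Uniform rightward flow: all mass falls into a single bin.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
h = hoof(flow)
```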
References
[1] E. Goffman. Encounters: Two studies in the sociology of interaction. Bobbs-Merrill, 1961.
[2] M. Aghaei, M. Dimiccoli, P. Radeva. With whom do I interact? Detecting social interactions in egocentric photostreams. IEEE ICPR, 2016.
[3] S. Alletto, G. Serra, S. Calderara, F. Solera, R. Cucchiara. From ego to nos-vision: Detecting social relationships in first-person views. IEEE CVPRW, 2014.
[4] A. Fathi, J. K. Hodgins, J. M. Rehg. Social interactions: A first-person perspective. IEEE CVPR, 2012.
[5] R. Chaudhry, A. Ravichandran, G. Hager, R. Vidal. Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. IEEE CVPR, 2009.
[6] M. Van Segbroeck, A. Tsiartas, S. Narayanan. A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice. INTERSPEECH, 2013.
4. Conclusion and Future Work
• Automatic online classification of Focused Interaction in continuous, egocentric videos
• Multimodal features: face track, VAD and camera motion profile
• Best performance with multimodal feature fusion and SVM with RBF kernel
• Future work involves using recurrent neural networks for classification and extending this work to identify conversational partners
Acknowledgements
This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant EP/N014278/1: ACE-LP: Augmenting Communication using Environmental Data to drive Language Prediction.
An outdoor night-time FI scenario with weak visual cues due to low illumination
[Results: Linear Kernel SVM]
https://ace-lp.ac.uk/
FI in which conversational partners are in the field of view of the camera
FI in which the conversational partner is no longer in the field of view as the interaction occurred while walking
[Results: RBF Kernel SVM]
[Figure: example ground-truth timeline with segments Camera setup, Computer work (FI-NW), Searching for documents, and Walk / turn around / walk (FI-NW); classes: No Focused Interaction, Focused Interaction (non-walk), Focused Interaction (walk)]
[Figure: feature profiles over Time (0-200 sec): HOOF (bins), VAD scores, and face tracker score]
[Pipeline diagram: Input video → Video Stream and Audio Stream. Video Stream → Face detection (HOG) and tracking (KLT), and Camera motion feature (HOOF); Audio Stream → Audio-based feature (VAD). Feature concatenation → Temporal windowing → Classification using SVM (Linear/RBF) → No FI / FI-NW (non-walk) / FI-W (walk)]
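The Section 2 detection pipeline (per-frame face-track, VAD and HOOF features concatenated, temporally windowed, then classified by an SVM) can be sketched as follows; the window length, step size and mean pooling are our own assumptions, since the poster does not state them:

```python
import numpy as np

def concat_features(face_score, vad_score, hoof_hist):
    """Concatenate per-frame multimodal features into one vector per frame.

    face_score: (T,) face-tracker score;  vad_score: (T,) VAD score;
    hoof_hist:  (T, B) HOOF histogram.    Returns (T, 2 + B).
    """
    return np.column_stack([face_score, vad_score, hoof_hist])

def temporal_windows(features, win=30, step=15):
    """Mean-pool frame features over sliding temporal windows."""
    pooled = [features[s:s + win].mean(axis=0)
              for s in range(0, features.shape[0] - win + 1, step)]
    return np.stack(pooled)

# Toy example: 120 frames, each with an 8-bin HOOF histogram.
rng = np.random.default_rng(0)
T, B = 120, 8
X = concat_features(rng.random(T), rng.random(T), rng.random((T, B)))
W = temporal_windows(X, win=30, step=15)    # one descriptor per window
```

Each row of W would then be classified into No FI / FI-NW / FI-W by an SVM (e.g. scikit-learn's `svm.SVC` with a linear or RBF kernel).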