Université du Québec, École de technologie supérieure. Face Recognition in Video Using What-and-Where Fusion Neural Network. Mamoudou Barry and Eric Granger.

Université du Québec

École de technologie supérieure

Face Recognition in Video Using What-and-Where Fusion Neural Network

Mamoudou Barry and Eric Granger

Laboratoire d’imagerie, de vision et d’intelligence artificielle
École de technologie supérieure

Montreal, Canada



Overview

1. Introduction

2. What-and-Where fusion neural network

3. Experimental methodology

4. Results

5. Conclusion



1. Introduction

Challenges of video-based face recognition

Low quality and resolution of frames.

Uncontrolled environments: variation in pose, orientation, expression, illumination, occlusion, etc.




General system for face recognition in video




State of the art

1. Methods based on static images
- exploit a quality metric, and recognize only high-quality regions of interest (ROIs)

2. Spatiotemporal approaches
- track faces in the environment, and recognize individuals over several samples




Objectives

Observe the effectiveness of the What-and-Where fusion neural network in video-based face recognition

Robust operation in uncontrolled environments



2. What-and-Where Fusion Neural Network (Granger et al., 2001)

Division of data streams

1. What data: intrinsic properties of a face (to classifier)

2. Where data: contextual information (to tracker)

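[Figure: architecture of the What-and-Where fusion neural network. Diagram labels: WHAT data stream, WHERE data stream, Classifier, Tracker, track #, Evidence accumulation, fields Fe1, ..., Feh, ..., FeR (indexed 1, ..., h, ..., R), nodes 1, ..., k, ..., L, outputs y^ab and y^e.]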



Tracker: a bank of Kalman filters estimates the future position of faces in a scene

Classifier: fuzzy ARTMAP classifies faces detected in a scene
- neural network architecture capable of fast, stable, online, unsupervised or supervised, incremental learning, classification and prediction
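To make the tracker bullet concrete, here is a minimal sketch of a bank of Kalman filters, one pair of filters (x and y axis) per track. The constant-velocity motion model, the noise variances `q` and `r`, the class name `AxisKalman`, the helper `new_track` and the sample detections are all assumptions made for illustration; the slides do not specify the state model used by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (assumed model)."""
    pos: float
    vel: float = 0.0
    # 2x2 state covariance, stored as p[row][col]
    p: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])
    q: float = 1e-2  # process-noise variance (hypothetical value)
    r: float = 1.0   # measurement-noise variance (hypothetical value)

    def predict(self, dt=1.0):
        """Project state and covariance one frame ahead; return the position."""
        self.pos += dt * self.vel
        (a, b), (c, d) = self.p
        # P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.p = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.pos

    def update(self, z):
        """Correct the predicted state with a measured position z."""
        (a, b), (c, d) = self.p
        s = a + self.r          # innovation variance for H = [1, 0]
        k0, k1 = a / s, c / s   # Kalman gain
        resid = z - self.pos
        self.pos += k0 * resid
        self.vel += k1 * resid
        # P = (I - K H) P
        self.p = [
            [(1 - k0) * a, (1 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]


def new_track(x, y):
    """Create the pair of per-axis filters for one tracked face."""
    return {"x": AxisKalman(pos=x), "y": AxisKalman(pos=y)}


# Hypothetical detections for track 1: a face moving 2 px/frame in x, fixed in y.
detections = [(2.0 * t, 50.0) for t in range(20)]
bank = {1: new_track(*detections[0])}
for x, y in detections[1:]:
    for axis, z in (("x", x), ("y", y)):
        bank[1][axis].predict()
        bank[1][axis].update(z)

# Estimated position of the face in the next frame.
next_x = bank[1]["x"].predict()
next_y = bank[1]["y"].predict()
```

Keeping one filter per axis avoids matrix libraries in this sketch; a real tracker would more likely use a joint state vector and would also need a data-association step to decide which detection belongs to which track.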


Evidence accumulation

[Figure: evidence accumulation fields Fe1, ..., Feh, ..., FeR, each with nodes 1, ..., k, ..., L, producing the output y^e.]



Sequential evidence accumulation

Fusion of responses from classifier and tracker

1. accumulation rule:

$$T_h^{e\prime} = T_h^{e} + y^{ab}$$

2. prediction of the recognition system:

$$K_h^{e} = \arg\max_k \{\, T_{hk}^{e} : k = 1, 2, \ldots, L \,\}$$
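The fusion step can be sketched in a few lines, assuming the accumulation rule adds each classifier output vector to the evidence field of the track it was assigned to by the tracker, and the system predicts the class with the largest accumulated value for that track. The function names `accumulate` and `predict_identity`, the dictionary layout and the sample output vectors are hypothetical; they are not taken from the slides.

```python
def accumulate(evidence, track, y_ab):
    """Add one classifier output vector y_ab to the evidence field of a track."""
    field = evidence.setdefault(track, [0.0] * len(y_ab))
    for k, value in enumerate(y_ab):
        field[k] += value
    return field


def predict_identity(evidence, track):
    """Return the 1-based class index k with the most accumulated evidence."""
    field = evidence[track]
    return max(range(len(field)), key=field.__getitem__) + 1


# Hypothetical example: L = 3 classes, one face followed on track 1.
# The second frame is misclassified as class 1 by the classifier alone.
evidence = {}
for y_ab in ([0, 1, 0], [1, 0, 0], [0, 1, 0]):
    accumulate(evidence, track=1, y_ab=y_ab)

identity = predict_identity(evidence, track=1)
```

In this example the single poor prediction is outvoted by the two correct ones, which is the noise-attenuation effect that accumulating evidence along a track is meant to provide.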



3. Experimental methodology

Data set (D. Gorodnichy, CNRC, 2005)

Video-based framework for face recognition in video

Task: recognize the user of a PC

11 individuals: 2 video sequences per individual, one dedicated to training and the other to testing




Data set

Different scenarios: pose, expression, orientation, motion, proximity, resolution and partial occlusion.




Protocol for experiments

train: train fuzzy ARTMAP with What data, using two training strategies
- Hold-Out Validation (HV)
- Particle Swarm Optimization (PSO) to optimize hyperparameters (Granger et al., 2007)

test: classify What data with fuzzy ARTMAP and track Where data with Kalman filters




Performance measures

accuracy: average classification error (estimate of generalization error)

resource requirements:

compression: average number of training patterns per category

convergence time: average number of epochs required to complete learning.
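As a small illustration of the first two measures, the sketch below computes a classification error and a compression ratio. The function names and the sample numbers are hypothetical and are not results from the experiments; "category" is assumed to mean a category node created by the classifier during training.

```python
def classification_error(predicted, actual):
    """Fraction of test samples whose predicted label differs from the true one."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


def compression(n_training_patterns, n_categories):
    """Average number of training patterns represented by each category."""
    return n_training_patterns / n_categories


# Hypothetical labels for four test ROIs (one error out of four).
error = classification_error([1, 2, 2, 3], [1, 2, 3, 3])

# Hypothetical run: 200 training patterns summarized by 25 categories.
ratio = compression(200, 25)
```

Under this definition, a classifier that stores every training pattern, as k-NN does, has a compression of 1, so any value above 1 means a smaller memory footprint.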



4. Results

Examples of face detections




Average error and compression vs. ROI scaling size (with 100% of training data)




Average error and compression vs. training subset size (with |ROI| = 10x10)




Average convergence time

fuzzy ARTMAP with HV: ~1 epoch

fuzzy ARTMAP with PSO: ~543 epochs

(60 particles x ~8.9 iterations x 1 epoch)
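The PSO figure is a product of three factors, which can be checked with a few lines of arithmetic. The variable names are illustrative, and the factors are the rounded values quoted above.

```python
particles = 60             # swarm size
iterations = 8.9           # average number of PSO iterations (rounded value)
epochs_per_evaluation = 1  # one training epoch per particle per iteration

pso_epochs = particles * iterations * epochs_per_evaluation
hv_epochs = 1

# How many times more training epochs PSO needs than hold-out validation.
cost_ratio = pso_epochs / hv_epochs
```

With these rounded factors the product is about 534 epochs, close to the reported ~543; the slides do not state how the averages were rounded. Either way, PSO training costs roughly 500 times more epochs than HV.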



Average confusion matrix



Example of prediction errors over time




5. Conclusion

The What-and-Where fusion neural network is effective in improving accuracy on complex video data (about 50% over fuzzy ARTMAP alone, and k-NN).

The system is less sensitive to noise: poor fuzzy ARTMAP predictions are attenuated.

Optimizing the network's internal parameters with the PSO learning strategy improves the accuracy of the system.

Fuzzy ARTMAP yields higher compression than k-NN: suitable for real-time and resource-limited applications.



6. Future work

Explore different ARTMAP models to improve the classification rate.

Explore other representations (features) of faces based on biological vision perception.

Investigate more robust tracking algorithms, such as the extended Kalman filter and particle filters, for nonlinear tracking.
