FaceTrack: Tracking and summarizing faces from compressed video Hualu Wang, Harold S. Stone*, Shih-Fu Chang Dept. of Electrical Engineering, Columbia University.

Transcript
Page 1: FaceTrack: Tracking and summarizing faces from compressed video

Hualu Wang, Harold S. Stone*, Shih-Fu Chang
Dept. of Electrical Engineering, Columbia University
*NEC Research Institute

Presentation by Andy Rova
School of Computing Science, Simon Fraser University

Page 2: Introduction

FaceTrack is a system for both tracking and summarizing faces in compressed video data.
Tracking: detect faces and trace them through time within video shots.
Summarizing: cluster the faces across video shots and associate them with different people.
Working on compressed video avoids the costly overhead of decoding prior to face detection.

Page 3: System Overview

The FaceTrack system's goals are related to ideas discussed in previous presentations.
A face-based video summary can help users decide whether they want to download the whole video.
The summary also provides good visual indexing information for a database search engine.

Page 4: Problem definition

The goal of the FaceTrack system is to take an input video sequence, generate a list of the prominent faces that appear in the video, and determine the time periods where each of the faces appears.

Page 5: General Approach

Track faces within shots.
Once tracking is done, group faces across video shots into the faces of different people.
Output a list of faces for each sequence; for each face, list the shots where it appears, and when.
Face recognition is not performed: it is very difficult in unconstrained videos due to the broad range of face sizes, numbers, orientations, and lighting conditions.

Page 6: General Approach

Try to work in the compressed domain as much as possible, on MPEG-1 and MPEG-2 videos (used in applications such as digital TV and DVD).
Macroblocks and motion vectors can be used directly in tracking, giving greater computational speed compared to decoding.
Selected frames can always be decoded down to the pixel level for further analysis, for example when grouping faces across shots.

Page 7: MPEG Review

Three types of frame data: intra-frames (I-frames), forward predictive frames (P-frames), and bidirectional predictive frames (B-frames).
Macroblocks are coding units that combine pixel information via the DCT; luminance and chrominance are separated.
P-frames and B-frames are subjected to motion compensation: motion vectors are found and their differences are encoded.

Page 8: System Diagram

[Figure: FaceTrack system block diagram.]

Page 9: Face Tracking

Challenges:
Locations of detected faces may not be accurate, since the face detection algorithm works on 16x16 macroblocks.
There are false alarms and misses.
Multiple faces cause ambiguities when they move close to each other.
The motion approximated by the MPEG motion vectors may not be accurate.
A tracking framework that can handle these issues in the compressed domain is needed.

Page 10: The Kalman Filter

A linear, discrete-time dynamic system is defined by a pair of difference equations (given on the slide; the standard form is reproduced below).
We only have access to a sequence of noisy measurements.
Given this noisy observation data, the problem is to find the optimal estimate of the unknown system state variables.
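For reference, the standard linear state-space form referred to here, in textbook notation (the symbols are the common convention, not necessarily those used in the paper):

    x_{k+1} = F x_k + w_k        (state transition, with process noise w_k)
    z_k     = H x_k + v_k        (measurement, with observation noise v_k)

where x_k is the unknown state vector, z_k the measurement, F the state transition matrix, and H the observation matrix.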

Page 11: The Kalman Filter

The "filter" is actually an iterative algorithm that keeps taking in new observations.
The new states are successively estimated.
The error of the prediction of the observation is called the innovation.
The innovation is amplified by a gain matrix and used as a correction for the state prediction.
The corrected prediction is the new state estimate.
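As a concrete illustration of the predict/correct cycle just described, here is a minimal NumPy sketch; the matrix names (F, H, Q, R) follow textbook convention and are assumptions, not values from the paper.

    import numpy as np

    def kalman_step(x, P, z, F, H, Q, R):
        # Predict the next state and its covariance.
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q
        # Innovation: the error of the prediction of the observation.
        y = z - H @ x_pred
        S = H @ P_pred @ H.T + R                 # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # gain matrix
        # Correct the prediction with the gain-amplified innovation.
        x_new = x_pred + K @ y
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new                      # the new state estimate

Calling kalman_step once per observation reproduces the iteration above: predict, compute the innovation, amplify it by the gain, and correct.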

Page 12: The Kalman Filter

In the FaceTrack system, the state vector of the Kalman filter holds the kinematic information of the face: position, velocity (and sometimes acceleration).
The observation vector is the position of the detected face, which may not be accurate.
The Kalman filter lets the system predict and update the position and parameters of the faces.

Page 13: The Kalman Filter

The FaceTrack system uses a 0.1 second time interval for state updates.
This corresponds to every I-frame and P-frame for a typical MPEG GOP ("Group Of Pictures") structure, for example IBBPBBP...: at roughly 30 frames per second, an I- or P-frame occurs every three frames, i.e. about every 0.1 seconds.

Page 14: The Kalman Filter

For I-frames, the face detector results are used directly.
For P-frames, the face detector results are more prone to false alarms.
Instead, P-frame face locations are predicted (approximately) based on the MPEG motion vectors.
These locations are then fed into the Kalman filter as observations (in contrast with previous trackers, which simply assumed that the locations calculated from motion vectors were correct).
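A rough sketch of that idea, not the paper's exact procedure: shift the face box by the average motion vector of the 16x16 macroblocks it overlaps, and hand the result to the Kalman filter as an observation. The function and data layout are hypothetical, and MPEG motion-vector sign and reference-frame conventions are glossed over.

    import numpy as np

    def predict_face_from_motion_vectors(face_box, motion_vectors):
        """face_box: (x, y, w, h) in pixels.
        motion_vectors: dict mapping (mb_row, mb_col) -> (dx, dy) for this P-frame.
        Returns the box shifted by the mean motion of the macroblocks it covers."""
        x, y, w, h = face_box
        covered = [(r, c)
                   for r in range(int(y) // 16, int(y + h) // 16 + 1)
                   for c in range(int(x) // 16, int(x + w) // 16 + 1)]
        vecs = [motion_vectors.get(mb, (0.0, 0.0)) for mb in covered]
        dx, dy = np.mean(vecs, axis=0)
        return (x + dx, y + dy, w, h)   # fed to the Kalman filter as an observation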

Page 15: The Face Tracking Framework

How can new faces be discriminated from previous ones during tracking?
The Mahalanobis distance is a quantitative indicator of how close the new observation is to the prediction.
This helps separate new faces from existing tracks: if the Mahalanobis distance is greater than a certain threshold, the newly detected face is unlikely to belong to a particular existing track.
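A minimal sketch of such a gating test; the threshold is illustrative (a chi-square-style cutoff), not the paper's value.

    import numpy as np

    def mahalanobis_gate(z, z_pred, S, threshold=9.21):
        """Accept observation z for a track whose predicted observation is z_pred
        with innovation covariance S. 9.21 is roughly the chi-square 99% quantile
        for 2 dimensions; the paper's threshold is not given here."""
        d = np.asarray(z) - np.asarray(z_pred)
        dist2 = float(d @ np.linalg.inv(S) @ d)   # squared Mahalanobis distance
        return dist2 <= threshold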

Page 16: The Face Tracking Framework

When two faces move close together, the Mahalanobis distance alone cannot keep track of multiple faces.
Case where a face is missed or occluded: hypothesize the continuation of the face track.
Case of a false alarm, or of faces close together: hypothesize the creation of a new track.
The idea is to wait for new observation data before making the final decision about a track (a toy version of this bookkeeping is sketched below).
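A toy sketch of that wait-and-see bookkeeping, not the paper's algorithm. The gating function could be the Mahalanobis test above; the miss and confirmation thresholds are illustrative.

    from dataclasses import dataclass

    @dataclass
    class TrackHypothesis:
        state: tuple            # last matched or predicted face position
        misses: int = 0         # consecutive frames with no matching detection
        hits: int = 1
        tentative: bool = True  # new tracks stay tentative until confirmed

    def update_tracks(tracks, detections, matches_track, max_misses=3, confirm_hits=3):
        unmatched = list(detections)
        for t in tracks:
            m = next((d for d in unmatched if matches_track(d, t)), None)
            if m is not None:
                t.state, t.misses, t.hits = m, 0, t.hits + 1
                if t.hits >= confirm_hits:
                    t.tentative = False       # enough evidence to keep the track
                unmatched.remove(m)
            else:
                t.misses += 1                 # hypothesize the face is missed/occluded
        survivors = [t for t in tracks if t.misses <= max_misses]
        # Leftover detections may be false alarms or genuinely new faces:
        # start tentative tracks and let later frames decide.
        survivors += [TrackHypothesis(state=d) for d in unmatched]
        return survivors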

Page 17: Intra-shot Tracking Challenges

Multiple hypothesis method (illustrated by a figure on the slide).

Page 18: Kalman Motion Models

The Kalman filter is a framework that can model different types of motion, depending on the system matrices used.
Several models were tested for the paper, with varying results.
Intuition: who pays to research object tracking? The military! Hence many tracking models are based on trajectories unlike those that faces in video are likely to exhibit.
For example, in most commercial video, a human face will not maneuver like a jet or missile.

Page 19: Kalman Motion Models

Four motion models were tested for FaceTrack:
Constant Velocity (CV)
Constant Acceleration (CA)
Correlated Acceleration (AA)
Variable Dimension Filter (VDF)
The testing was done against ground truth consisting of manually identified face centers in each frame.

Page 20: Kalman Motion Models

Rather than go through the whole process in exact detail, the next several slides illustrate the differences between the CV and CA models.
The matrices are also expanded to show how the states are updated.

Page 21: Constant Velocity (CV) Model

[Slide: the CV state equations written in expanded matrix form; a textbook version follows below.]
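For reference, a textbook constant-velocity model for one position axis, with update interval T (0.1 s here); the paper's exact matrices may differ.

    state:        x_k = [ p_k, v_k ]^T          (position and velocity)

    transition:   x_{k+1} = [ 1  T ] x_k + w_k
                            [ 0  1 ]

    observation:  z_k = [ 1  0 ] x_k + v_k      (only the position is observed)

Expanded, this is simply p_{k+1} = p_k + T v_k + noise and v_{k+1} = v_k + noise.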

Page 22: Constant Velocity (CV) Model

[Slide: the expanded CV equations simplified back to compact form.]

Page 23: Constant Velocity (CV) Model

[Slide: further simplification and expansion of the CV model equations.]

Page 24: Constant Acceleration (CA) Model

Acceleration is now added to the state vector, and is explicitly modeled as constants disturbed by random noise.
[Slide: the CA state equations in expanded matrix form; a textbook version follows below.]
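Again for reference, the textbook constant-acceleration form for one axis (an assumption, not necessarily the paper's exact matrices):

    state:        x_k = [ p_k, v_k, a_k ]^T

    transition:   x_{k+1} = [ 1  T  T^2/2 ]
                            [ 0  1  T     ] x_k + w_k
                            [ 0  0  1     ]

    observation:  z_k = [ 1  0  0 ] x_k + v_k

The acceleration components are held constant apart from the random noise w_k.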

Page 25: Constant Acceleration (CA) Model

[Slide: the expanded CA equations simplified back to compact form.]

Page 26: The Correlated Acceleration Model

Replaces the constant accelerations with an AR(1) model.
AR(1): first-order autoregressive, a stochastic process in which the immediately previous value has an effect on the current value (plus some random noise).
Why? There is a strong negative autocorrelation between the accelerations of consecutive frames: positive accelerations tend to be followed by negative accelerations, implying that faces tend to "stabilize".
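In symbols, the acceleration then evolves roughly as an AR(1) process (a sketch; the paper's coefficient and noise model are not given here):

    a_{k+1} = rho * a_k + w_k,    |rho| < 1

with rho negative, so an acceleration in one frame tends to be followed by one of the opposite sign in the next, matching the observed negative autocorrelation.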

Page 27: The Variable Dimension Filter

A system that switches between CV (constant velocity) and CA (constant acceleration) modes.
The dimension of the state vector changes when a maneuver is detected, hence "VDF".
Developed for tracking highly maneuverable targets (probably military jets).

Page 28: Comparison of Motion Models

[Chart: average tracking error of each motion model over the first 16 tracking runs.]

Page 29: Comparison of Motion Models

Why does CV perform best?
The small sampling interval justifies viewing face motion as piecewise linear movements.
A face cannot achieve very high accelerations (as opposed to a jet fighter).
AA also performs well because it fits the nature of the face motion: faces in commercial video exhibit few persistent accelerations (negative autocorrelation).

Page 30: Summarization Across Shots

Select representative frames for the tracked faces; large, frontal-view faces are best.
Decode the representative frames into the pixel domain.
Use clustering algorithms to group the faces into different persons (a toy grouping sketch follows).
Make use of domain knowledge: for example, people do not usually change clothes within a news segment, but often do change outfits within a sitcom episode.
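A toy grouping sketch, only to make the idea concrete: the features, distance metric, and clustering scheme here are placeholders, not the paper's method. face_features is assumed to come from some face descriptor computed on the decoded representative frames.

    import numpy as np

    def group_faces(face_features, threshold):
        """Greedily assign each representative face (a NumPy feature vector) to the
        first group whose mean feature is within `threshold`; otherwise start a new
        group. Returns one group label per face."""
        groups, labels = [], []
        for f in face_features:
            for i, g in enumerate(groups):
                if np.linalg.norm(f - np.mean(g, axis=0)) < threshold:
                    g.append(f)
                    labels.append(i)
                    break
            else:
                groups.append([f])
                labels.append(len(groups) - 1)
        return labels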

Page 31: Simulation Results

[Slide: simulation result figures.]

Page 32: Conclusions & Future Research

FaceTrack is an effective face tracking (and summarization) architecture, within which different detection and tracking methods can be used; it could be updated to use new face detection algorithms or improved motion models.
Based on the results, the CV and AA motion models are sufficient for face motion in commercial video.
Summarization techniques need the most development, followed by optimizing tracking for adverse situations.