Lectureship Early Career Fellowship School of Technology, Oxford Brookes University 19/6/2008 Fabio Cuzzolin INRIA Rhone-Alpes.


Career path

Master’s thesis on gesture recognition at the University of Padova

Visiting student, ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)

Ph.D. thesis on belief functions and uncertainty theory (2001)

Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)

Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)

Marie Curie fellow at INRIA Rhone-Alpes

collaborations with several groups

Scientific production and collaborations

collaborations with journals: IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning

PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM

currently 4+10 journal papers and 31+8 conference papers

SIPTA

Setubal

CMU

Pompeu Fabra

EPFL-IDIAP

UBoston

My background

research

Discrete math

linear independence on lattices and matroids

Uncertainty theory

geometric approach

algebraic analysis

generalized total probability

Machine learning

Manifold learning for dynamical models

Computer vision gesture and action recognition

3D shape analysis and matching

Gait ID

pose estimation

Computer Vision Action and gesture recognition

Laplacian segmentation and matching of 3D shapes

Bilinear models for invariant gaitID

Machine learning Manifold learning for dynamical models

Discrete math

Uncertainty theory

Geometric approach to measures

Generalized total probability

Vision applications and developments

Unification of the notion of independence

New directions

HMMs for gesture recognition

transition matrix A -> gesture dynamics

state-output matrix C -> collection of hand poses

Hand poses were represented by “size functions” (BMVC'97)

Gesture classification

…

HMM 1

HMM 2

HMM n

EM to learn HMM parameters from an input sequence

the new sequence is fed to the learnt gesture models

they produce a likelihood

the most likely model is chosen (if above a threshold)

OR new model is attributed the label of the closest one (using K-L divergence or other distances)
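The classification step above can be sketched as follows. This is a minimal illustration, not the original system: it assumes discrete-output HMMs (the slides use a state-output matrix C over hand poses), the emission matrix is called `B` here, and the two toy models, their parameters and the threshold values are all hypothetical. Training by EM is omitted.

```python
import math

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log p(obs | model) for a discrete-output HMM.

    pi: initial state probabilities, A: transition matrix,
    B[i][o]: probability that state i emits symbol o (assumed emission model).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

def classify(obs, models, threshold):
    """Pick the model with the highest per-frame log-likelihood.

    Returns (label, scores); label is None when the best score is below threshold.
    """
    scores = {label: log_likelihood(obs, *params) / len(obs)
              for label, params in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Hypothetical gesture models: (pi, A, B) for two 2-state HMMs.
models = {
    "wave": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]),
    "point": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]),
}
sequence = [0, 0, 0, 1, 1, 1]
label, scores = classify(sequence, models, threshold=-1.0)
print(label, scores)
```

Normalizing the log-likelihood by the sequence length is a design choice of this sketch, so that one threshold can be applied to sequences of different lengths.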

Compositional behavior of HMMs

the model of the action of interest is embedded in the overall model → clustering

• “Cluttered” model for two overlapping motions

• Reduced model for the “fly” gesture after clustering

Volumetric action recognition

2D approaches: features extracted from images → viewpoint dependence

synchronized multi-camera systems are now available (Milano, BBC R&D)

volumetric approach: features are extracted from volumetric reconstructions of the body (ICIP'04)

• Locally linear embedding to find topological representation of the moving body

3D feature extraction

• Linear discriminant analysis (LDA) to estimate motion direction

• k-means clustering of bodyparts






Unsupervised coherent 3D segmentation

to recognize actions we need to extract features

segmenting moving articulated 3D bodies into parts

along sequences, in a consistent way

in an unsupervised fashion

robustly, with respect to changes of the topology of the moving body

as a building block of a wider motion analysis and capture framework

ICCV-HM'07, CVPR'08, to submit to IJCV

Clustering after Laplacian embedding

generates a lower-dim, widely separated embedded cloud

less sensitive to topology changes than other methods

less expensive than ISOMAP (refs. Jenkins, Chellappa)

[Figure: two rigid parts connected by a moving joint area; neighborhoods on the rigid parts are unaffected by the motion, neighborhoods around the joint are affected]

local neighbourhoods stable under articulated motion

Algorithm

K-wise clustering in the embedding space

Seed propagation along time

To ensure time consistency, the clusters' seeds have to be propagated over time

Old positions of clusters in 3D are added to new cloud and embedded

Result: new seeds
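The idea of propagating seeds so that cluster labels stay consistent from frame to frame can be illustrated with a seeded k-means. This is a simplified sketch under stated assumptions: the points are taken to be already in the embedding space (the Laplacian embedding itself, and the re-embedding of old cluster positions together with the new cloud, are omitted), and the toy 2D coordinates are hypothetical.

```python
def assign(points, seeds):
    """Label each point with the index of its nearest seed (squared Euclidean)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [min(range(len(seeds)), key=lambda k: d2(p, seeds[k])) for p in points]

def seeded_kmeans(points, seeds, iters=20):
    """k-means started from the given seeds; returns (labels, final seeds)."""
    seeds = [tuple(s) for s in seeds]
    for _ in range(iters):
        labels = assign(points, seeds)
        new_seeds = []
        for k in range(len(seeds)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_seeds.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_seeds.append(seeds[k])  # an empty cluster keeps its old seed
        if new_seeds == seeds:
            break
        seeds = new_seeds
    return assign(points, seeds), seeds

# Frame t: two well-separated groups (hypothetical embedded coordinates).
frame0 = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels0, seeds0 = seeded_kmeans(frame0, [(0.0, 0.0), (5.0, 5.0)])

# Frame t+1: the body has moved slightly and the points arrive in another order.
frame1 = [(5.4, 5.1), (0.5, 0.2), (0.6, 0.3), (5.5, 5.2)]
labels1, seeds1 = seeded_kmeans(frame1, seeds0)  # seeds propagated from frame t
print(labels0, labels1)
```

Because frame t+1 is initialized with the centroids found at frame t, cluster 0 keeps denoting the same group of points across the two frames.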

Results

Coherent clustering along a sequence

Example of model recovery

Results - 2

handling of topology changes

missing data

Laplacian matching of dense meshes or voxelsets

as embeddings are pose-invariant (for articulated bodies)

they can then be used to match dense shapes by simply aligning their images after embedding

ICCV '07 – NTRL, ICCV '07 – 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI

Eigenfunction Histogram assignment

Algorithm:

compute Laplacian embedding of the two shapes

find assignment between eigenfunctions of the two shapes

this selects a section of the embedding space

embeddings are orthogonally aligned there by EM

Results

Appls: graph matching, protein analysis, motion capture

To propagate bodypart segmentation in time

Motion field estimation, action segmentation






Bilinear models for gait-ID

To recognize the identity of humans from their gait (CVPR '06, book chapter in progress)

nuisance factors: emotional state, illumination, appearance, view invariance ... (literature: randomized trees)

each motion possesses several labels: action, identity, viewpoint, emotional state, etc.

bilinear models [Tenenbaum] can be used to separate the influence of “style” and “content” (to classify): $y^{SC} = A^{S} b^{C}$

Content classification of unknown style

given a training set in which persons (content=ID) are seen walking from different viewpoints (style=viewpoint)

an asymmetric bilinear model can be learned from it through SVD

when new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style)…

an iterative EM procedure can be set up to classify the content

E step -> estimation of p(c|s), the prob. of the content given the current estimate s of the style

M step -> estimation of the linear map for unknown style s

Three-layer model

each sequence is encoded as an HMM

its C matrix is stacked in a single observation vector

a bilinear model is learnt from those vectors

Three-layer model

Features: projections of silhouette's contours onto a line through the center

Results on CMU database

Mobo database: 25 people performing 4 different walking actions, from 6 cameras. Three labels: action, id, view

Compared performances with “baseline” algorithm and straight k-NN on sequence HMMs






Learning manifolds of dynamical models

Classify movements represented as dynamical models

for instance, each image sequence can be mapped to an ARMA, or AR linear model, or a HMM

Motion classification then reduces to finding a suitable distance function in the space of dynamical models

e.g.: Kullback-Leibler, Fisher metric [Amari]

when some a-priori info is available (training set)..

.. we can learn in a supervised fashion the “best” metric for the classification problem!

To submit to ECCV'08 – MLVMA Workshop

Learning pullback metrics

many algorithms take a dataset as input and map it to an embedded space, but fail to learn a full metric (LLE, ISOMAP)

consider then a family of diffeomorphisms F between the original space M and a metric space N

the diffeomorphism F induces on M a pullback metric

maximizing inverse volume finds the manifold which better interpolates the data (geodesics pass through “crowded” regions)

$$O(D) = \prod_{k=1}^{N} \frac{(\det g(m_k))^{-1/2}}{\int_M (\det g(m))^{-1/2} \, dm}$$

Pullback metrics - detail

• Diffeomorphism on M: $F: M \to M$, $m \mapsto F(m)$

• Push-forward map: $F_*: T_m M \to T_{F(m)} M$, $v \in T_m M \mapsto v' \in T_{F(m)} M$

• Given a metric on M, $g: TM \times TM \to \mathbb{R}$, the pullback metric is $g_{*m}(u, v) = g_{F(m)}(F_* u, F_* v)$

case of linear maps: Xing and Jordan'02, Shental'02
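For the linear case, the pullback construction reduces to a Gram matrix, which a few lines of code can check numerically. The sketch below assumes a Euclidean metric on the target space and a linear map F(m) = A m, so that the push-forward is A itself and the pullback metric is AᵀA; the matrix `A` and the vectors `u`, `v` are arbitrary illustrative values.

```python
def pullback_metric(A):
    """Gram matrix A^T A: pullback of the Euclidean metric under F(m) = A m."""
    rows, cols = len(A), len(A[0])
    return [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]

def inner(G, u, v):
    """Inner product u^T G v under the metric G."""
    return sum(u[i] * G[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def apply(A, v):
    """Matrix-vector product A v (the push-forward of v for a linear map)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2.0, 0.0], [1.0, 1.0]]   # hypothetical linear map
G = pullback_metric(A)

u, v = [1.0, 2.0], [3.0, -1.0]
lhs = inner(G, u, v)                                     # g*(u, v)
rhs = sum(a * b for a, b in zip(apply(A, u), apply(A, v)))  # <F_* u, F_* v>
print(G, lhs, rhs)
```

Both quantities agree, which is the defining property of the pullback metric; learning the entries of `A` from labelled data is what turns this into a metric-learning problem.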

Space of AR(2) models

given an input sequence, we can identify the parameters of the linear model which better describes it

autoregressive models of order 2 AR(2)

Fisher metric on AR(2)

Compute the geodesics of the pullback metric on M

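$$g(a_1, a_2) = \frac{1}{(1 + a_2)(1 - a_1 - a_2)(1 + a_1 - a_2)} \begin{pmatrix} 1 - a_2 & a_1 \\ a_1 & 1 - a_2 \end{pmatrix}$$

(written for the convention $x_t = a_1 x_{t-1} + a_2 x_{t-2} + e_t$ with unit noise variance)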
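Identifying the AR(2) parameters of a scalar sequence can be sketched with ordinary least squares. This is one possible estimator, not necessarily the one used in the original work; it assumes the convention x_t = a1·x_{t-1} + a2·x_{t-2} + noise, and the test sequence is synthetic.

```python
def fit_ar2(x):
    """Least-squares estimate of (a1, a2) in x_t = a1*x_{t-1} + a2*x_{t-2} + e_t."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(x)):
        s11 += x[t - 1] * x[t - 1]
        s12 += x[t - 1] * x[t - 2]
        s22 += x[t - 2] * x[t - 2]
        r1 += x[t] * x[t - 1]
        r2 += x[t] * x[t - 2]
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] [a1, a2]^T = [r1, r2]^T.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("sequence does not determine the AR(2) parameters")
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    return a1, a2

# Synthetic, noise-free sequence generated with known parameters.
true_a1, true_a2 = 0.5, -0.3
x = [1.0, 0.5]
for _ in range(10):
    x.append(true_a1 * x[-1] + true_a2 * x[-2])

a1, a2 = fit_ar2(x)
print(a1, a2)
```

Each sequence is thereby mapped to a point (a1, a2) of the model space, on which a distance between motions can then be defined.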

Results on action and ID recognition

scalar feature, AR(2) and ARMA models






assumption: not enough evidence to determine the actual probability describing the problem

second-order distributions (Dirichlet), interval probabilities

credal sets

Uncertainty measures: Intervals, credal sets

Belief functions [Shafer 76]: special case of credal sets

a number of formalisms have been proposed to extend or replace classical probability

Multi-valued maps and belief functions

suppose you have two different but related problems ...

... that we have a probability distribution for the first one

... and that the two are linked by a one-to-many map

[Dempster'68, Shafer'76]

the probability P on S induces a belief function on T
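The construction above can be made concrete with a short sketch: each element s of S passes its probability to the subset Γ(s) of T it is mapped to, and the belief of a set A sums the masses of all subsets contained in A. The spaces, probabilities and map below are hypothetical toy values, and the function names are illustrative.

```python
def induced_mass(P, Gamma):
    """Mass function on subsets of T induced by P on S through the map Gamma.

    P: dict s -> probability; Gamma: dict s -> iterable of elements of T.
    """
    m = {}
    for s, p in P.items():
        B = frozenset(Gamma[s])
        m[B] = m.get(B, 0.0) + p
    return m

def belief(m, A):
    """b(A) = sum of m(B) over all focal sets B contained in A."""
    A = frozenset(A)
    return sum(w for B, w in m.items() if B <= A)

# Toy problem: a probability on S and a one-to-many map into T = {t1, t2}.
P = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
Gamma = {"s1": {"t1"}, "s2": {"t1", "t2"}, "s3": {"t1", "t2"}}

m = induced_mass(P, Gamma)
b_t1 = belief(m, {"t1"})
b_t2 = belief(m, {"t2"})
b_T = belief(m, {"t1", "t2"})
print(b_t1, b_t2, b_T)
```

Here b({t1}) + b({t2}) is smaller than b(T): the induced measure is not additive, which is what distinguishes a belief function from a probability.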

Belief functions as random sets

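• probability on a finite set: function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function

• probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$

• belief function $b: 2^\Theta \to [0,1]$ s.t. $b(A) = \sum_{B \subseteq A} m(B)$, if $m: 2^\Theta \to [0,1]$ is a mass function s.t. $\sum_{B \subseteq \Theta} m(B) = 1$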

it has the shape of a simplex

IEEE Tr. SMC-C '08, Ann. Combinatorics '06, FSS '06, IS '06, IJUFKS'06

Geometric approach to uncertainty

belief functions can be seen as points of a Cartesian space of dimension $2^n - 2$

belief space: the space of all the belief functions on a given frame

Each subset is a coordinate in this space
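As a worked example of this geometric picture, consider the smallest non-trivial frame, with two elements. The notation $\mathcal{B}_2$ and the element names $x$, $y$ are introduced here for illustration only.

```latex
% Frame \Theta = \{x, y\}: n = 2, so the belief space has 2^2 - 2 = 2 coordinates,
% one for each non-empty proper subset of \Theta.
\[
  b \;\mapsto\; \bigl(b(\{x\}),\, b(\{y\})\bigr) = \bigl(m(\{x\}),\, m(\{y\})\bigr),
  \qquad m(\{x\}) + m(\{y\}) + m(\Theta) = 1 .
\]
% Since all masses are non-negative, the set of all belief functions is the triangle
\[
  \mathcal{B}_2 = \bigl\{ (b_x, b_y) \;:\; b_x \ge 0,\; b_y \ge 0,\; b_x + b_y \le 1 \bigr\},
\]
% with vertices (0,0) (all mass on \Theta), (1,0) and (0,1).
% Probabilities are the belief functions with m(\Theta) = 0, i.e. the edge b_x + b_y = 1.
```

In this case the belief space is a 2-dimensional simplex, and the probabilities form one of its faces.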

how to transform a measure of a certain family into a different uncertainty measure → can be done geometrically

Approximation problem

Probabilities, fuzzy sets, possibilities are special cases of b.f.s

IEEE Tr. SMC-B '07, IEEE Tr. Fuzzy Systems '07, AMAI '08, AI '08, IEEE Tr. Fuzzy Systems '08






generalization of the total probability theorem

Total belief theorem

introduces Kalman-like filtering for random sets

conditional constraint

a-priori constraint

Graph of all solutions

admissible solution is found by following a path on the graph

links to combinatorics and linear systems

to submit to JRSS-B

whole graph of candidate solutions






Model-free pose estimation

estimating the “pose” (internal configuration) of a moving body from the available images

[Figure: image sequence from t=0 to t=T, with a pose estimate $\hat{q}_k \in Q$ for each frame]

if you do not have an a-priori model of the object ..

Sun & Torr BMVC'06, Rosales, Urtasun, Brand, Grauman ICCV'03, Agarwal

Learning feature-pose maps

... learn a map between features and poses directly from the data

given pose and feature sequences acquired by motion capture ..

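[Figure: training poses $q_1, \dots, q_T$ with the corresponding features $y_1, \dots, y_T$, and the set $\tilde{Q}$]

• a multi-modal Gaussian density is set up on the feature space

• a map from each cluster to the set of poses whose feature values fall inside it (regression functions, EM)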

Evidential model

.. and approximate parameter space ..

.. form the “evidential model”

similar to propagation in qualitative Markov trees

MTNS'00, ISIPTA'05, to submit to Information Fusion

Information fusion by Dempster’s rule

several aggregation or elicitation operators proposed

original proposal: Dempster’s rule

• b1: m({a1})=0.7, m({a1,a2})=0.3

• b2: m(Θ)=0.1, m({a2,a3,a4})=0.9, on the frame Θ = {a1, a2, a3, a4}

• b1 ⊕ b2:

m({a1}) = 0.7*0.1/0.37 = 0.19

m({a2}) = 0.3*0.9/0.37 = 0.73

m({a1,a2}) = 0.3*0.1/0.37 = 0.08
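The numbers in this example can be reproduced with a short implementation of Dempster's rule: masses of intersecting focal sets are multiplied and accumulated on the intersection, the mass falling on empty intersections (the conflict) is discarded, and the result is renormalized. This is a minimal sketch; it assumes the 0.1 mass of b2 sits on the whole frame {a1, a2, a3, a4}, and the function and variable names are illustrative.

```python
def dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for B1, w1 in m1.items():
        for B2, w2 in m2.items():
            inter = B1 & B2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2  # mass assigned to the empty set
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("total conflict: the two belief functions cannot be combined")
    return {B: w / norm for B, w in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
m1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
m2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}

m12 = dempster(m1, m2)
for focal_set, mass in sorted(m12.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal_set), round(mass, 2))
```

The only conflicting pair is {a1} against {a2, a3, a4}, with mass 0.7*0.9 = 0.63, which gives the normalization factor 1 - 0.63 = 0.37 used above.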

Performances

comparison of three models: left view only, right view only, both views

pose estimation yielded by the overall model

estimate associated with the “right” model

ground truth

• “left” model

JPDA with shape info


robustness: clutter does not meet shape constraints

occlusions: occluded targets can be estimated

CDC'02, CDC'04

JPDA model: independent targets

shape model: rigid links

Dempster’s fusion

Belief graphical models

what happens when the original probability distribution belongs to a certain class?

In particular: belief functions induced by graphical models?

Imprecise classifiers

application of robust statistics to vision problems

“imprecise” classifiers

class estimate is a belief function or a credal set [Zaffalon, Cozman]

exploit only available evidence, represent ignorance

Credal networks

belief networks or credal networks [Shafer and Shenoy]

at each node a BF or a convex set of probs

similar to generalized belief propagation ...

message passing between nodes representing groups of variables

algorithms to reduce complexity already exist





Discrete math






independence can be defined in different ways in Boolean algebras, semi-modular lattices, and matroids

Boolean independence is important in uncertainty theory

Boolean independence

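• a set of sub-algebras $\{\mathcal{A}_t\}$ of a Boolean algebra B are independent (IB) if $\bigcap_t A_t \neq 0$ for every choice of $A_t \in \mathcal{A}_t$ with $A_t \neq 0$, where 0 is the bottom element of B

example: collection of power sets of the partitions of a given finite set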

Relation with matroids?

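matroid $(E, \mathcal{I} \subseteq 2^E)$: $\emptyset \in \mathcal{I}$; $A \in \mathcal{I}$, $A' \subseteq A$ then $A' \in \mathcal{I}$;

$A_1 \in \mathcal{I}$, $A_2 \in \mathcal{I}$, $|A_2| > |A_1|$ then $\exists x \in A_2 \setminus A_1$ s.t. $A_1 \cup \{x\} \in \mathcal{I}$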

graphic matroids: dependent sets are circuits
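The matroid axioms can be checked by brute force on small set systems. The sketch below tests them on the graphic matroid of a triangle, where the only dependent set is the full circuit of three edges; the function name and the edge labels are illustrative, and the exhaustive check is only practical for very small ground sets.

```python
from itertools import combinations

def is_matroid(ground, independent):
    """Brute-force check of the matroid axioms for a family of independent sets."""
    E = frozenset(ground)
    I = {frozenset(s) for s in independent}
    if any(not A <= E for A in I):
        return False
    # Axiom 1: the empty set is independent.
    if frozenset() not in I:
        return False
    # Axiom 2 (hereditary): every subset of an independent set is independent.
    for A in I:
        for r in range(len(A)):
            if any(frozenset(sub) not in I for sub in combinations(A, r)):
                return False
    # Axiom 3 (augmentation): a smaller independent set can be extended
    # by some element of a larger one.
    for A1 in I:
        for A2 in I:
            if len(A2) > len(A1) and not any(A1 | {x} in I for x in A2 - A1):
                return False
    return True

# Graphic matroid of a triangle: independent sets are the forests,
# i.e. every edge subset except the whole circuit {e1, e2, e3}.
edges = ["e1", "e2", "e3"]
forests = [c for r in range(3) for c in combinations(edges, r)]
triangle_ok = is_matroid(edges, forests)

# A hereditary family that fails augmentation: {a} cannot be extended from {b, c}.
not_matroid = is_matroid("abc", [(), ("a",), ("b",), ("c",), ("b", "c")])
print(triangle_ok, not_matroid)
```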

they have significant relationships BUT is Boolean independence a form of anti-matroidicity?

BCC'01, BCC'07, ISAIM'08, UNCLOG'08, subm.to Discrete Mathematics

Matroids → paradigm of abstract independence










New directions

A multi-layer framework for human motion analysis

feedbacks act between different layers (e.g. integrated detection, segmentation, classification and pose estimation)

action recognition

action segmentation

multiple views

3D reconstruction

unsupervised body-part segmentation

image data fusion

model fitting (stick-articulated)

motion capture

identity recognition

surveillance

HMI

Spatio-temporal action segmentation

problem: segmenting parts of the video(s) containing “interesting” motions

global approach: working on multidimensional volumes

previous works: object segmentation on the spatio-temporal volume for single frames [Collins, Natarajan]

idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume

proposal: smoothing using tensor voting [Medioni PAMI'05] + shape detection on the obtained manifold

Stereo correspondence based on local image structure

problem: finding correspondences between points in different views, using the local structure of the image

Markov random fields: disparity = hidden variable

one direction: using local direction of the gradient or structure tensor to help the correspondence [Zucker]

second option: FRAME -> large scale structures in MRF

general potential for MRFs, local texture for correspondence?

Other developments

3D markerless motion capture

Proposal: data-driven pose estimation based on 3D representations

unsupervised body model learning

shape classification/ recognition in embedding spaces

surveillance in crowded areas: impossible to recover a 3D model

→ information fusion techniques on multiple images

handle conflict between different pieces of evidence
