Lectureship Early Career Fellowship School of Technology, Oxford Brookes University 19/6/2008 Fabio Cuzzolin INRIA Rhone-Alpes
Mar 28, 2015
Career path
Master’s thesis on gesture recognition at the University of Padova
Visiting student, ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)
Ph.D. thesis on belief functions and uncertainty theory (2001)
Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)
Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)
Marie Curie fellow at INRIA Rhone-Alpes
collaborations with several groups
Scientific production and collaborations
collaborations with journals: IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning
PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM
currently 4+10 journal papers and 31+8 conference papers
collaborations: SIPTA, Setubal, CMU, Pompeu Fabra, EPFL-IDIAP, UBoston
My background
research
Discrete math
linear independence on lattices and matroids
Uncertainty theory
geometric approach
algebraic analysis
generalized total probability
Machine learning
Manifold learning for dynamical models
Computer vision
gesture and action recognition
3D shape analysis and matching
Gait ID
pose estimation
Computer Vision
Action and gesture recognition
Laplacian segmentation and matching of 3D shapes
Bilinear models for invariant gait ID
Machine learning
Manifold learning for dynamical models
Uncertainty theory
Geometric approach to measures
Generalized total probability
Vision applications and developments
Discrete math
Unification of the notion of independence
New directions
HMMs for gesture recognition
transition matrix A -> gesture dynamics
state-output matrix C -> collection of hand poses
Hand poses were represented by “size functions” (BMVC'97)
Gesture classification
[Figure: bank of learnt gesture models HMM 1, HMM 2, …, HMM n]
EM to learn HMM parameters from an input sequence
the new sequence is fed to the learnt gesture models
they produce a likelihood
the most likely model is chosen (if above a threshold)
OR new model is attributed the label of the closest one (using K-L divergence or other distances)
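A minimal sketch of the likelihood-based classification step, assuming discrete-output HMMs whose parameters have already been learnt (the EM training itself is not shown). The two toy models "hold" and "wave", the per-frame normalisation of the log-likelihood and the threshold values are illustrative assumptions, not the original system.

```python
import numpy as np

def log_likelihood(obs, pi, A, C):
    """Scaled forward algorithm: log p(obs | pi, A, C) for a discrete-output HMM.

    pi: initial state probabilities, A: transition matrix (gesture dynamics),
    C: state-output matrix (one row of output probabilities per state).
    """
    pi, A, C = (np.asarray(x, dtype=float) for x in (pi, A, C))
    alpha = pi * C[:, obs[0]]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * C[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf
        total += np.log(scale)
        alpha = alpha / scale
    return total

def classify(obs, models, threshold):
    """Pick the model with the highest per-frame log-likelihood, or None if below threshold."""
    scores = {name: log_likelihood(obs, *params) / len(obs)
              for name, params in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Hypothetical toy models: two states, two output symbols.
C = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "hold": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], C),  # states tend to persist
    "wave": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], C),  # states tend to alternate
}
label_wave, _ = classify([0, 1, 0, 1, 0, 1, 0, 1], models, threshold=-1.0)
label_hold, _ = classify([0, 0, 0, 0, 1, 1, 1, 1], models, threshold=-1.0)
label_none, _ = classify([0, 1, 0, 1, 0, 1, 0, 1], models, threshold=-0.01)
```

The last call illustrates rejection: with a very strict threshold no model is accepted. The alternative on the slide (fitting a new model and labelling it by K-L divergence to the closest learnt model) is not covered by this sketch.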
Compositional behavior of HMMs
the model of the action of interest is embedded in the overall model → clustering
• “Cluttered” model for two overlapping motions
• Reduced model for the “fly” gesture after clustering
Volumetric action recognition
2D approaches: features extracted from images → viewpoint dependence
synchronized multi-camera systems are now available (Milano, BBC R&D)
volumetric approach: features are extracted from volumetric reconstructions of the body (ICIP'04)
• Locally linear embedding to find topological representation of the moving body
3D feature extraction
• Linear discriminant analysis (LDA) to estimate motion direction
• k-means clustering of bodyparts
Unsupervised coherent 3D segmentation
to recognize actions we need to extract features
segmenting moving articulated 3D bodies into parts
along sequences, in a consistent way
in an unsupervised fashion
robustly, with respect to changes of the topology of the moving body
as a building block of a wider motion analysis and capture framework
ICCV-HM'07, CVPR'08, to submit to IJCV
Clustering after Laplacian embedding
generates a lower-dim, widely separated embedded cloud
less sensitive to topology changes than other methods
less expensive than ISOMAP (refs. Jenkins, Chellappa)
[Figure: two rigid parts joined by a moving joint area; neighborhoods within the rigid parts are unaffected, those around the joint are affected]
local neighbourhoods stable under articulated motion
Algorithm
K-wise clustering in the embedding space
Seed propagation along time
To ensure time consistency clusters’ seeds have to be propagated along time
Old positions of clusters in 3D are added to new cloud and embedded
Result: new seeds
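The two steps above (clustering in the embedding space, then seed propagation) can be sketched as follows. This is an illustration under stated assumptions: a plain unnormalised k-NN graph Laplacian stands in for the embedding actually used, k-means is used for the K-wise clustering, and the "bent arm" point cloud, `bent_arm`, `segment_frame` and all parameter values are hypothetical.

```python
import numpy as np

def laplacian_embedding(points, n_neighbors=4, dim=2):
    """Embed points with the first non-trivial eigenvectors of a k-NN graph Laplacian."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                            # symmetrise the graph
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]                         # skip the constant eigenvector

def kmeans(X, seeds, n_iter=20):
    """Plain k-means started from the given seeds; returns one label per row of X."""
    centers = seeds.copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def segment_frame(cloud, prev_centroids):
    """Add the old 3D cluster positions to the new cloud, embed, and use them as seeds."""
    augmented = np.vstack([cloud, prev_centroids])
    emb = laplacian_embedding(augmented)
    labels = kmeans(emb[:len(cloud)], emb[len(cloud):])
    centroids = np.array([
        cloud[labels == k].mean(axis=0) if np.any(labels == k) else prev_centroids[k]
        for k in range(len(prev_centroids))
    ])
    return labels, centroids

def bent_arm(theta, n=40):
    """Hypothetical articulated body: two unit-length bars joined at (1, 0, 0)."""
    s = np.linspace(0.0, 1.0, n)
    bar1 = np.column_stack([s, np.zeros(n), np.zeros(n)])
    bar2 = np.array([1.0, 0.0, 0.0]) + s[1:, None] * np.array([np.cos(theta), np.sin(theta), 0.0])
    return np.vstack([bar1, bar2])

frame1, frame2 = bent_arm(0.3), bent_arm(0.9)
labels1, centroids1 = segment_frame(frame1, frame1[[0, -1]])  # initial seeds: the two endpoints
labels2, _ = segment_frame(frame2, centroids1)                # seeds propagated in time
```

Because the old centroids are embedded together with the new cloud, the sign and rotation ambiguity of the eigenvectors does not affect which label each part receives.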
Results
Coherent clustering along a sequence
Example of model recovery
Results - 2
handling of topology changes
missing data
Laplacian matching of dense meshes or voxelsets
as embeddings are pose-invariant (for articulated bodies)
they can then be used to match dense shapes by simply aligning their images after embedding
ICCV '07 – NTRL, ICCV '07 – 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI
Eigenfunction Histogram assignment
Algorithm:
compute Laplacian embedding of the two shapes
find assignment between eigenfunctions of the two shapes
this selects a section of the embedding space
embeddings are orthogonally aligned there by EM
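The last step of the algorithm relies on an orthogonal alignment of two embedded clouds. A minimal sketch of that single step, assuming point correspondences are already known (the EM procedure that estimates them is not shown), is the classical orthogonal Procrustes solution; the function name `orthogonal_align` and the synthetic data are illustrative.

```python
import numpy as np

def orthogonal_align(X, Y):
    """Orthogonal matrix Q minimising ||X @ Q - Y||_F, for row-wise corresponding points."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Synthetic check: Y is X transformed by a random orthogonal matrix R.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Y = X @ R
Q = orthogonal_align(X, Y)
```

In an EM setting one would presumably alternate between soft assignments of points and a weighted version of this alignment step.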
Results
Appls: graph matching, protein analysis, motion capture
To propagate bodypart segmentation in time
Motion field estimation, action segmentation
Bilinear models for gait-ID
$y^{SC} = A^{S} b^{C}$
To recognize the identity of humans from their gait (CVPR '06, book chapter in progress)
nuisance factors: emotional state, illumination, appearance, view invariance ... (literature: randomized trees)
each motion possesses several labels: action, identity, viewpoint, emotional state, etc.
bilinear models [Tenenbaum] can be used to separate the influence of “style” and “content” (to classify)
Content classification of unknown style
given a training set in which persons (content=ID) are seen walking from different viewpoints (style=viewpoint)
an asymmetric bilinear model can be learned from it through SVD
when new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style)…
an iterative EM procedure can be set up to classify the content
E step -> estimation of p(c|s), the prob. of the content given the current estimate s of the style
M step -> estimation of the linear map for unknown style s
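A minimal sketch of the SVD fit of an asymmetric bilinear model, in the spirit of [Tenenbaum]: observations for each (style, content) pair are stacked into one matrix, which is then factorised. The function name, the array layout and the synthetic data are assumptions made for illustration; the EM classification of unknown-style data is not included.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim):
    """Fit y^{sc} ~ A^s b^c by truncated SVD.

    Y has shape (n_styles * K, n_contents): block row s, column c holds the
    K-dimensional (mean) observation of content c in style s.
    Returns A with shape (n_styles, K, dim) and B with shape (dim, n_contents).
    """
    U, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :dim] * sing[:dim]          # style-specific linear maps, stacked
    B = Vt[:dim]                         # one content vector per column
    K = Y.shape[0] // n_styles
    return A.reshape(n_styles, K, dim), B

# Synthetic data generated exactly by a rank-2 bilinear model.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((3, 5, 2))   # 3 styles (e.g. viewpoints), K = 5
B_true = rng.standard_normal((2, 4))      # 4 contents (e.g. identities)
Y = np.vstack([A_true[s] @ B_true for s in range(3)])
A, B = fit_asymmetric_bilinear(Y, n_styles=3, dim=2)
reconstruction = np.vstack([A[s] @ B for s in range(3)])
```

The recovered factors match the true ones only up to an invertible linear change of basis, which is why the check is made on the reconstruction rather than on `A` and `B` themselves.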
Three layer model
each sequence is encoded as an HMM
its C matrix is stacked in a single observation vector
a bilinear model is learnt from those vectors
Three-layer model
Features: projections of silhouette's contours onto a line through the center
Results on CMU database
Mobo database: 25 people performing 4 different walking actions, from 6 cameras. Three labels: action, id, view
Compared performances with “baseline” algorithm and straight k-NN on sequence HMMs
Learning manifolds of dynamical models
Classify movements represented as dynamical models
for instance, each image sequence can be mapped to an ARMA, or AR linear model, or a HMM
Motion classification then reduces to finding a suitable distance function in the space of dynamical models
e.g.: Kullback-Leibler, Fisher metric [Amari]
when some a-priori info is available (training set)..
.. we can learn in a supervised fashion the “best” metric for the classification problem!
To submit to ECCV'08 – MLVMA Workshop
Learning pullback metrics
many algorithms take a dataset as input and map it to an embedded space, but fail to learn a full metric (LLE, ISOMAP)
consider then a family of diffeomorphisms F between the original space M and a metric space N
the diffeomorphism F induces on M a pullback metric
maximizing inverse volume finds the manifold which best interpolates the data (geodesics pass through “crowded” regions)
inverse volume: $$O(D) = \prod_{k=1}^{N} \frac{\left(\det g(m_k)\right)^{-1/2}}{\int_M \left(\det g(m)\right)^{-1/2}\, dm}$$
Pullback metrics - detail
• Diffeomorphism on M: $F: M \to M$, $m \mapsto F(m)$
• Push-forward map: $F_*: T_m M \to T_{F(m)} M$, $v \in T_m M \mapsto v' \in T_{F(m)} M$
• Given a metric on M, $g: TM \times TM \to \mathbb{R}$, the pullback metric is $g_{*m}(u, v) = g_{F(m)}(F_* u, F_* v)$
case of linear maps: Xing and Jordan'02, Shental'02
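In coordinates, the pullback metric at m has matrix J^T G J, where J is the Jacobian of F at m and G is the matrix of g at F(m). A minimal numerical sketch, assuming F and g are given as Python callables and the Jacobian is approximated by central differences (the names `jacobian` and `pullback_metric` are illustrative):

```python
import numpy as np

def jacobian(F, m, eps=1e-6):
    """Central-difference approximation of the Jacobian of F at m."""
    m = np.asarray(m, dtype=float)
    J = np.empty((np.size(F(m)), m.size))
    for i in range(m.size):
        step = np.zeros(m.size)
        step[i] = eps
        J[:, i] = (F(m + step) - F(m - step)) / (2 * eps)
    return J

def pullback_metric(F, g, m):
    """Matrix of the pullback metric at m: J^T g(F(m)) J."""
    J = jacobian(F, m)
    return J.T @ g(F(m)) @ J

# Linear case: F(m) = A m with the Euclidean metric gives the matrix A^T A.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
G = pullback_metric(lambda m: A @ m, lambda p: np.eye(2), np.array([0.3, -0.7]))
```

For a linear map the result does not depend on m, which corresponds to the Mahalanobis-type metrics of the linear case cited above.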
Space of AR(2) models
given an input sequence, we can identify the parameters of the linear model which best describes it
autoregressive models of order 2: AR(2)
Fisher metric on AR(2)
Compute the geodesics of the pullback metric on M
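$$g(a_1, a_2) = \frac{1}{(1 - a_1 - a_2)(1 + a_1 - a_2)(1 + a_2)} \begin{pmatrix} 1 - a_2 & a_1 \\ a_1 & 1 - a_2 \end{pmatrix}$$
(with the convention $x_t = a_1 x_{t-1} + a_2 x_{t-2} + e_t$)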
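A minimal sketch of the two ingredients of this slide: identifying AR(2) parameters from a scalar sequence and evaluating the Fisher metric at the estimate. It assumes the sign convention x[t] = a1*x[t-1] + a2*x[t-2] + e[t] (other conventions flip signs in the formula), plain least squares for identification, and synthetic data; the function names are illustrative.

```python
import numpy as np

def fit_ar2(x):
    """Least-squares estimate of (a1, a2) in x[t] = a1*x[t-1] + a2*x[t-2] + e[t]."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[1:-1], x[:-2]])   # regressors x[t-1], x[t-2]
    coeffs, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
    return coeffs

def fisher_metric_ar2(a1, a2):
    """Fisher information matrix of a stationary AR(2) model (assumed convention above)."""
    factors = (1 - a1 - a2, 1 + a1 - a2, 1 + a2)
    if min(factors) <= 0:
        raise ValueError("parameters outside the stationarity region")
    return np.array([[1 - a2, a1], [a1, 1 - a2]]) / np.prod(factors)

# Synthetic sequence from a known stationary AR(2) model.
rng = np.random.default_rng(0)
true_a = (0.5, -0.3)
x = np.zeros(5000)
for t in range(2, len(x)):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] + rng.standard_normal()

a_hat = fit_ar2(x)
G = fisher_metric_ar2(*a_hat)
```

Inside the stationarity region the matrix is symmetric positive definite, so it defines a Riemannian metric on the space of AR(2) models.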
Results on action and ID rec
scalar feature, AR(2) and ARMA models
assumption: not enough evidence to determine the actual probability describing the problem
second-order distributions (Dirichlet), interval probabilities
credal sets
Uncertainty measures: Intervals, credal sets
Belief functions [Shafer 76]: special case of credal sets
a number of formalisms have been proposed to extend or replace classical probability
Multi-valued maps and belief functions
suppose you have two different but related problems ...
... that we have a probability distribution for the first one
... and that the two are linked by a one-to-many map
[Dempster'68, Shafer'76]
the probability P on S induces a belief function on T
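A minimal sketch of how a probability on S and a one-to-many map into T induce a belief function on T: each element of S transfers its probability to the subset of T it is mapped to, and the belief of a set A is the total mass of the subsets contained in A. The sets, probabilities and the map `Gamma` below are made-up illustrative values.

```python
from collections import defaultdict

def induced_mass(P, Gamma):
    """Mass on subsets of T induced by a probability P on S and a multi-valued map Gamma."""
    m = defaultdict(float)
    for s, p in P.items():
        m[frozenset(Gamma[s])] += p
    return dict(m)

def belief(m, A):
    """Belief of A: total mass of the focal sets contained in A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B <= A)

# Hypothetical example: three outcomes in S, mapped to subsets of T = {t1, t2, t3}.
P = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
Gamma = {"s1": {"t1"}, "s2": {"t1", "t2"}, "s3": {"t1", "t2", "t3"}}
m = induced_mass(P, Gamma)
b_t1 = belief(m, {"t1"})
b_t1_t2 = belief(m, {"t1", "t2"})
b_T = belief(m, {"t1", "t2", "t3"})
```

Note that `belief(m, {"t2"})` is zero here even though t2 is possible: no element of S is mapped to {t2} alone, so no evidence supports it specifically.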
Belief functions as random sets
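• probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$
• probability on a finite set: function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function
• belief function $b: 2^\Theta \to [0,1]$ s.t. $b(A) = \sum_{B \subseteq A} m(B)$, if $m$ is a mass function s.t. $\sum_{B \subseteq \Theta} m(B) = 1$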
Geometric approach to uncertainty
belief functions can be seen as points of a Cartesian space of dimension $2^n - 2$
belief space: the space of all the belief functions on a given frame
each subset is a coordinate in this space
it has the shape of a simplex
IEEE Tr. SMC-C '08, Ann. Combinatorics '06, FSS '06, IS '06, IJUFKS'06
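A minimal sketch of the belief-space picture, assuming a frame of n = 3 elements: a belief function becomes the vector of its values on the 2^n - 2 non-empty proper subsets, and it can be written as a convex combination of the vectors of the "categorical" belief functions that put all mass on one subset, which is what gives the space its simplex shape. The frame and masses are illustrative.

```python
from itertools import combinations
import numpy as np

def proper_subsets(frame):
    """Non-empty proper subsets of the frame, in a fixed order (one coordinate each)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(1, len(items)) for c in combinations(items, r)]

def belief_vector(m, frame):
    """Coordinates of the belief function with mass m: b(A) for each proper subset A."""
    return np.array([sum(v for B, v in m.items() if B <= A)
                     for A in proper_subsets(frame)])

frame = {"a1", "a2", "a3"}
m = {frozenset({"a1"}): 0.5, frozenset({"a1", "a2"}): 0.3, frozenset(frame): 0.2}
b = belief_vector(m, frame)

# Vertices: belief functions with all the mass on a single subset.
vertices = {B: belief_vector({B: 1.0}, frame) for B in m}
combo = sum(m[B] * vertices[B] for B in m)
```

The vacuous belief function (all mass on the whole frame) maps to the origin, so the mass values act as the simplicial coordinates of `b`.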
how to transform a measure of a certain family into a different uncertainty measure → can be done geometrically
Approximation problem
Probabilities, fuzzy sets, possibilities are special cases of b.f.s
IEEE Tr. SMC-B '07, IEEE Tr. Fuzzy Systems '07, AMAI '08, AI '08, IEEE Tr. Fuzzy Systems '08
generalization of the total probability theorem
Total belief theorem
introduces Kalman-like filtering for random sets
conditional constraint
a-priori constraint
Graph of all solutions
admissible solution is found by following a path on the graph
links to combinatorics and linear systems
to submit to JRSS-B
whole graph of candidate solutions
Model-free pose estimation
estimating the “pose” (internal configuration) of a moving body from the available images
[Figure: pose estimates $\hat{q} \in Q$ at times $t_k$, along an image sequence from t=0 to t=T]
if you do not have an a-priori model of the object ..
Sun & Torr BMVC'06, Rosales, Urtasun, Brand, Grauman ICCV'03, Agarwal
Learning feature-pose maps
... learn a map between features and poses directly from the data
given pose and feature sequences acquired by motion capture ..
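[Figure: training pose sequence $q_1, \dots, q_T$ and feature sequence $y_1, \dots, y_T$; approximate pose space $\tilde{Q}$]
• a multi-modal Gaussian density is set up on the feature space
• a map from each cluster to the set of poses whose feature values fall inside it (regression functions, EM)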
Evidential model
.. and approximate parameter space ..
.. form the “evidential model”
similar to propagation in qualitative Markov trees
MTNS'00, ISIPTA'05, to submit to Information Fusion
Information fusion by Dempster’s rule
several aggregation or elicitation operators proposed
original proposal: Dempster’s rule
frame: Θ = {a1, a2, a3, a4}
• b1: m({a1}) = 0.7, m({a1, a2}) = 0.3
• b2: m(Θ) = 0.1, m({a2, a3, a4}) = 0.9
• b1 ⊕ b2:
m({a1}) = 0.7*0.1/0.37 = 0.19
m({a2}) = 0.3*0.9/0.37 = 0.73
m({a1, a2}) = 0.3*0.1/0.37 = 0.08
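The worked example above can be reproduced with a short sketch of Dempster's rule: masses of intersecting focal sets are multiplied and assigned to the intersection, the mass falling on empty intersections (the conflict, here 0.7*0.9 = 0.63) is discarded, and the rest is renormalised by 1 - 0.63 = 0.37. The dictionary-of-frozensets representation is an implementation choice made for this illustration.

```python
def dempster(m1, m2):
    """Dempster's rule of combination for two mass functions given as {frozenset: mass}."""
    joint = {}
    conflict = 0.0
    for A, p in m1.items():
        for B, q in m2.items():
            C = A & B
            if C:
                joint[C] = joint.get(C, 0.0) + p * q
            else:
                conflict += p * q          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the combination is undefined")
    return {C: v / (1.0 - conflict) for C, v in joint.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
b1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
b2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}
combined = dempster(b1, b2)
rounded = {tuple(sorted(C)): round(v, 2) for C, v in combined.items()}
```

Here `rounded` holds the three focal sets of the slide with masses 0.19, 0.73 and 0.08.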
Performances
comparison of three models: left view only, right view only, both views
pose estimation yielded by the overall model
estimate associated with the “right” model
ground truth
• “left” model
JPDA with shape info
robustness: clutter does not meet shape constraints
occlusions: occluded targets can be estimated
CDC'02, CDC'04
JPDA model: independent targets
shape model: rigid links
Dempster’s fusion
Belief graphical models
what happens when the original probability distribution belongs to a certain class?
In particular: belief functions induced by graphical models?
Imprecise classifiers
application of robust statistics to vision problems
“imprecise” classifiers
class estimate is a belief function or a credal set [Zaffalon, Cozman]
exploit only available evidence, represent ignorance
Credal networks
belief networks or credal networks [Shafer and Shenoy]
at each node a BF or a convex set of probs
similar to generalized belief propagation ...
message passing between nodes representing groups of variables
algorithms to reduce complexity already exist
independence can be defined in different ways in Boolean algebras, semi-modular lattices, and matroids
Boolean independence is important in uncertainty theory
Boolean independence
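• a set of sub-algebras $\{\mathcal{A}_t\}$ of a Boolean algebra B are independent (IB) if $\bigcap_t A_t \neq \wedge$ whenever $A_t \in \mathcal{A}_t$, $A_t \neq \wedge$ for every $t$ (with $\wedge$ the bottom element of B)
example: collection of power sets of the partitions of a given finite set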
Relation with matroids?
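matroid $(E, \mathcal{I} \subseteq 2^E)$: $\emptyset \in \mathcal{I}$; $A \in \mathcal{I}$, $A' \subseteq A$ then $A' \in \mathcal{I}$;
$A_1 \in \mathcal{I}$, $A_2 \in \mathcal{I}$, $|A_2| > |A_1|$ then $\exists x \in A_2$ s.t. $A_1 \cup \{x\} \in \mathcal{I}$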
graphic matroids: dependent sets are circuits
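A minimal sketch that checks the three matroid axioms by brute force on a small ground set, and applies them to the graphic matroid of a triangle, where an edge set is taken as independent when it contains no cycle. The brute-force check is only meant for tiny examples; the function names and the two test families are illustrative.

```python
from itertools import combinations

def is_matroid(independent):
    """Brute-force check of the matroid axioms on a family of independent sets."""
    I = {frozenset(s) for s in independent}
    if frozenset() not in I:                                   # empty set is independent
        return False
    for A in I:                                                # hereditary property
        for r in range(len(A)):
            if any(frozenset(sub) not in I for sub in combinations(A, r)):
                return False
    for A1 in I:                                               # exchange (augmentation) property
        for A2 in I:
            if len(A2) > len(A1) and not any(A1 | {x} in I for x in A2 - A1):
                return False
    return True

def acyclic(edges):
    """True if the undirected edge set contains no cycle (union-find)."""
    parent = {}
    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

# Graphic matroid of a triangle: every edge set except the full cycle is independent.
triangle = [(1, 2), (2, 3), (1, 3)]
forests = [s for r in range(4) for s in combinations(triangle, r) if acyclic(s)]
graphic_ok = is_matroid(forests)

# A hereditary family that fails the exchange property.
not_matroid = is_matroid([(), ("a",), ("b",), ("c",), ("a", "b")])
```

In the second family, {c} cannot be extended by any element of {a, b}, so the augmentation axiom fails.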
they have significant relationships BUT Boolean independence a form of anti-matroidicity?
BCC'01, BCC'07, ISAIM'08, UNCLOG'08, subm.to Discrete Mathematics
Matroids → paradigm of abstract independence
A multi-layer framework for human motion analysis
feedbacks act between different layers (e.g. integrated detection, segmentation, classification and pose estimation)
action recognition
action segmentation
multiple views
3D reconstruction
unsupervised body-part segmentation
image data fusion
model fitting (stick-articulated)
motion capture
identity recognition
surveillance
HMI
Spatio-temporal action segmentation
problem: segmenting parts of the video(s) containing “interesting” motions
global approach: working on multidimensional volumes
previous works: object segmentation on the spatio-temporal volume for single frames [Collins, Natarajan]
idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume
proposal: smoothing using tensor voting [Medioni PAMI'05] + shape detection on the obtained manifold
Stereo correspondence based on local image structure
problem: finding correspondences between points in different views, using the local structure of the image
Markov random fields: disparity = hidden variable
one direction: using local direction of the gradient or structure tensor to help the correspondence [Zucker]
second option: FRAME -> large scale structures in MRF
general potential for MRFs, local texture for correspondence?
Other developments
3D markerless motion capture
Proposal: data-driven pose estimation based on 3D representations
unsupervised body model learning
shape classification/ recognition in embedding spaces
surveillance in crowded areas: impossible to recover a 3D model
→ information fusion techniques on multiple images
handle conflict between different pieces of evidence