Learning Spatiotemporal Graphs of Human Activities, Sinisa Todorovic, William Brendel

Iccv2011 learning spatiotemporal graphs of human activities

May 19, 2015


zukun
Transcript
Page 1: Iccv2011 learning spatiotemporal graphs of human activities

Learning Spatiotemporal Graphs of Human Activities

Sinisa Todorovic, William Brendel

Page 2: Iccv2011 learning spatiotemporal graphs of human activities

Our Goal

[Example videos: Long Jump, Triple Jump]

•  Recognize all occurrences of activities

•  Identify the start and end frames

•  Parse the video and find all subactivities

•  Localize actors and objects involved

Page 3: Iccv2011 learning spatiotemporal graphs of human activities

Weakly Supervised Setting

In training: ONLY class labels

Domain knowledge of temporal structure: NOT AVAILABLE

[Example videos: Weight Lifting, Large-Box Lifting]

Page 4: Iccv2011 learning spatiotemporal graphs of human activities

Learning What and How

Weak supervision in training

Need to learn from training videos:

•  What activity parts are relevant

•  How relevant they are for recognition

Page 5: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work vs. Our Approach

Typically, focus only on HOW

[Diagram labels: raw video, features, model, gap, semantic level]

Page 6: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work vs. Our Approach

[Two diagrams side by side, one marked "Typically...". Labels: raw video, features, mid-level features, model, gap, semantic level]

Page 7: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work – Video Representation

•  Space-time points – Laptev & Schmid 08, Niebles & Fei-Fei 08, …

•  Still human postures – Soatto 07, Ning & Huang 08, …

•  Action templates – Yao & Zhu 09, …

•  Point tracks – Sukthankar & Hebert 10, …

Page 8: Iccv2011 learning spatiotemporal graphs of human activities

Our Features: 2D+t Tubes

Sukthankar & Hebert 07,

Gorelick & Irani 08,

Pritch & Peleg 08, ...

•  Allow simpler:

- Modeling

- Learning (few examples)

- Inference

•  We are the first to use 2D+t tubes for building a statistical model of activities

Page 9: Iccv2011 learning spatiotemporal graphs of human activities

Our Features: 2D+t Tubes

•  Allow simpler:

- Modeling

- Learning (few examples)

- Inference

•  We use 2D+t tubes for building a statistical generative model of activities

Sukthankar & Hebert 07,

Gorelick & Irani 08,

Pritch & Peleg 08, ...

Page 10: Iccv2011 learning spatiotemporal graphs of human activities

Prior Work – Activity Representation

•  Graphical models, Grammars

- Ivanov & Bobick 00

- Xiang & Gong 06

- Ryoo & Aggarwal 09

- Gupta & Davis 09

- Liu & Zhu 09

- Niebles & Fei-Fei 10

- Lan et al. 11

•  Probabilistic first-order logic

- Tran & Davis 08

- Albanese et al. 10

- Morariu & Davis 11

- Brendel et al. 11...

Page 11: Iccv2011 learning spatiotemporal graphs of human activities

Approach

Input Video → Spatiotemporal Graph → Activity Model → Recognition, Localization

Page 12: Iccv2011 learning spatiotemporal graphs of human activities

Blocky Video Segmentation

Page 13: Iccv2011 learning spatiotemporal graphs of human activities

Activity as a Spatiotemporal Graph

Descriptors of nodes and edges:

•  Node descriptors: F

- Motion

- Object shape

•  Adjacency matrices: {Ai}

- Allen temporal relations

- Spatial relations

- Compositional relations
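
As an illustration of how one relation-specific adjacency matrix could be filled in, the sketch below classifies the Allen relation between the time spans of two tubes and marks the pairs that stand in a chosen relation. The (start, end) frame tuples, the function names, and the binary 0/1 encoding are assumptions made for this example; the slides do not state which Allen relations are used or how they are encoded.

```python
def allen_relation(a, b):
    """Allen relation of interval a with respect to interval b.

    Each interval is a (start, end) tuple with start < end (assumed).
    """
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only a proper partial overlap remains at this point.
    return "overlaps" if a0 < b0 else "overlapped-by"


def relation_matrix(spans, relation):
    """Binary adjacency matrix: entry [u][v] is 1 if tube u stands in
    `relation` to tube v, else 0. The diagonal is left at 0."""
    n = len(spans)
    return [
        [1 if u != v and allen_relation(spans[u], spans[v]) == relation else 0
         for v in range(n)]
        for u in range(n)
    ]
```

Calling `relation_matrix(spans, "meets")` and `relation_matrix(spans, "during")` on the same list of tube spans would give two of the matrices in {Ai}, one per edge type.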

Page 14: Iccv2011 learning spatiotemporal graphs of human activities

Activity as Segmentation Graph

G = (V, E, "descriptors") = (F, {A1, ..., An})

node descriptors

adjacency matrices of distinct relations between the tubes
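
A minimal sketch of the representation G = (F, {A1, ..., An}) as a data structure, assuming plain nested lists. The class name `ActivityGraph` and the method `permuted` are hypothetical names introduced here; the relabeling method shows why a node permutation has to be applied to F and to every adjacency matrix at once.

```python
from dataclasses import dataclass


@dataclass
class ActivityGraph:
    F: list  # F[v] is the descriptor vector of node (tube) v
    A: list  # A[i][u][v] is the edge value of type i from node u to node v

    def __post_init__(self):
        n = len(self.F)
        for Ai in self.A:
            if len(Ai) != n or any(len(row) != n for row in Ai):
                raise ValueError("each adjacency matrix must be n x n")

    def permuted(self, order):
        """Relabel nodes so that new node k is old node order[k]."""
        F = [self.F[v] for v in order]
        A = [[[Ai[u][v] for v in order] for u in order] for Ai in self.A]
        return ActivityGraph(F, A)
```

For example, `ActivityGraph([[1.0], [2.0]], [[[0, 1], [0, 0]]]).permuted([1, 0])` swaps the two nodes, so the single edge moves from entry (0, 1) to entry (1, 0).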

Page 15: Iccv2011 learning spatiotemporal graphs of human activities

Activity Graph Model

Probabilistic Graph Mixture

[Diagram: temporal, spatial, and compositional terms combined with "+" and "*". Labels: model node descriptors, mixture weights, model adjacency matrices]

Page 16: Iccv2011 learning spatiotemporal graphs of human activities

Activity Model

An activity instance: G = (F, {A1, ..., An})

Model adjacency matrices

Edge type: i = 1, 2, ..., n

Page 17: Iccv2011 learning spatiotemporal graphs of human activities

Activity Model

An activity instance: G = (F, {A1, ..., An})

Model adjacency matrices

Edge type: i = 1, 2, ..., n

Model matrix of node descriptors

Page 18: Iccv2011 learning spatiotemporal graphs of human activities

Inference

Input Video → Spatiotemporal Graph → Activity Model → Recognition, Localization

Page 19: Iccv2011 learning spatiotemporal graphs of human activities

Inference = Robust Least Squares

Goal:

•  For every activity model

•  Estimate the permutation matrix

subject to
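
The objective and constraints on this slide did not survive in the transcript. As a rough illustration of matching by least squares, the sketch below scores a node permutation by the squared difference between a model and the relabeled video graph, and searches all permutations by brute force. The cost function, the equal weighting of all terms, the assumption that both graphs have the same number of nodes, and the function names are choices made for this example only. It is a plain (non-robust) least-squares stand-in that is feasible only for very small graphs, not the solver used in the talk.

```python
from itertools import permutations


def sq_dist(X, Y):
    """Sum of squared entry-wise differences of two equally sized matrices."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def match_cost(model_F, model_A, F, A, order):
    """Least-squares cost of relabeling the video graph so that its
    new node k is its old node order[k]."""
    cost = sq_dist(model_F, [F[v] for v in order])
    for Mi, Ai in zip(model_A, A):
        Ai_p = [[Ai[u][v] for v in order] for u in order]
        cost += sq_dist(Mi, Ai_p)
    return cost


def best_match(model_F, model_A, F, A):
    """Return (cost, order) of the cheapest permutation (exhaustive search)."""
    n = len(F)
    return min(
        ((match_cost(model_F, model_A, F, A, p), p)
         for p in permutations(range(n))),
        key=lambda t: t[0],
    )
```

Under these assumptions, recognition would amount to running `best_match` against each activity model and picking the model with the lowest cost; the returned order says which tube plays which model node.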

Page 20: Iccv2011 learning spatiotemporal graphs of human activities

Learning the Activity Graph Model

Training videos → Training graphs → Graph model

Page 21: Iccv2011 learning spatiotemporal graphs of human activities

Learning

Adjacency matrix, Node descriptor

Edge type: i = 1, 2, ..., n

Given K training graphs,

Page 22: Iccv2011 learning spatiotemporal graphs of human activities

Learning

Model parameters

Given K training graphs, ESTIMATE

Adjacency matrix, Node descriptor

Page 23: Iccv2011 learning spatiotemporal graphs of human activities

Learning

Given K training graphs, ESTIMATE

Adjacency matrix, Node descriptor

Permutation matrix

Page 24: Iccv2011 learning spatiotemporal graphs of human activities

Learning = Robust Least Squares

Estimate: the model parameters and the permutation matrices

Given K training graphs:

Page 25: Iccv2011 learning spatiotemporal graphs of human activities

Learning = Structural EM

Estimation of model parameters

Estimation of permutation matrices

E-step → expected model structure

M-step → matching of the training graphs and model
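
The alternation between estimating model parameters and estimating permutation matrices can be illustrated with a generic alternating least-squares loop: align every training graph to the current model, then re-estimate the model as the average of the aligned graphs. This is a simplified stand-in, not the structural EM of the talk. The initialization from the first training graph, the brute-force alignment, the plain averaging, and all function names are assumptions of this sketch, and every graph is assumed to have the same number of nodes.

```python
from itertools import permutations


def permute(F, A, order):
    """Relabel a graph (F, A) so that its new node k is old node order[k]."""
    F_p = [F[v] for v in order]
    A_p = [[[Ai[u][v] for v in order] for u in order] for Ai in A]
    return F_p, A_p


def cost(model, graph):
    """Squared difference between model and graph, over F and all Ai."""
    (MF, MA), (F, A) = model, graph
    c = sum((x - y) ** 2 for r, s in zip(MF, F) for x, y in zip(r, s))
    for Mi, Ai in zip(MA, A):
        c += sum((x - y) ** 2 for r, s in zip(Mi, Ai) for x, y in zip(r, s))
    return c


def mean_graph(graphs):
    """Entry-wise average of aligned graphs: the least-squares model."""
    K = len(graphs)
    F0, A0 = graphs[0]
    n, d, m = len(F0), len(F0[0]), len(A0)
    F = [[sum(g[0][v][j] for g in graphs) / K for j in range(d)]
         for v in range(n)]
    A = [[[sum(g[1][i][u][v] for g in graphs) / K for v in range(n)]
          for u in range(n)] for i in range(m)]
    return F, A


def learn_model(graphs, max_iters=10):
    """Alternate matching and model update until the alignment is stable."""
    model = graphs[0]  # assumption: start from the first training graph
    n = len(model[0])
    aligned = None
    for _ in range(max_iters):
        # Matching step: best relabeling of each training graph.
        new_aligned = [
            min((permute(F, A, p) for p in permutations(range(n))),
                key=lambda g: cost(model, g))
            for F, A in graphs
        ]
        if new_aligned == aligned:
            break
        aligned = new_aligned
        # Update step: re-estimate the model from the aligned graphs.
        model = mean_graph(aligned)
    return model
```

Each pass cannot increase the total squared error, since both the matching and the averaging minimize it with the other held fixed, but the loop may stop in a local minimum that depends on the initialization.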

Page 26: Iccv2011 learning spatiotemporal graphs of human activities

Learning Results

Correctly learned activity-characteristic tubes

Page 27: Iccv2011 learning spatiotemporal graphs of human activities

Recognition and Segmentation

Activity “handshaking”

Detected and segmented characteristic tube

Page 28: Iccv2011 learning spatiotemporal graphs of human activities

Recognition and Segmentation

Activity “kicking”

Detected and segmented characteristic tube

Page 29: Iccv2011 learning spatiotemporal graphs of human activities

Classification on UTexas Dataset

Human interaction activities [18] Ryoo et al. ’10

Page 30: Iccv2011 learning spatiotemporal graphs of human activities

Conclusion

•  Fast spatiotemporal segmentation

•  New activity representation = graph model

•  Unified learning and inference = Least squares

•  Learning under weak supervision:

- WHAT activity parts are relevant and

- HOW relevant they are for recognition