A Hierarchical Model of Shape and Appearance for Human Action …vision.stanford.edu › posters › NieblesFeiFei_CVPR07_poster.pdf · 2009-05-06 · Bg Bg Appearance A j Bg Shape

A Hierarchical Model of Shape and Appearance for Human Action Classification

Ref: J.C. Niebles & L. Fei-Fei. A hierarchical model of shape and appearance for human action classification. CVPR 2007. Minneapolis, USA.

Highlights and Summary

• A novel model for human action categorization from video sequences.
• Our model can be characterized as a constellation of bags-of-features.
• Use of hybrid features: combines both static shape features and spatio-temporal features.

Hybrid features

• Static features: detected with the Canny edge detector and described with shape context [Belongie et al. 2000].
• Dynamic features: detected with a spatio-temporal interest point detector [Dollár et al. 2005] and described by concatenated brightness gradients.
• Membership assignment: each feature descriptor is assigned to an entry of the codebook for its feature type.

Final representation of a video frame: w = (x, a), with w_i = (x_i, a_i), where x_i is the i-th feature position and a_i is the i-th feature appearance.
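Membership assignment can be sketched as nearest-codeword vector quantization. This is a minimal sketch, assuming Euclidean distance and a codebook that has already been formed (the poster does not say how). In the hybrid setup it would be run once for static and once for dynamic features, each with its own codebook. The function names and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = -1, math.inf
    for k, center in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def frame_representation(
    positions: Sequence[Point],
    descriptors: Sequence[Vector],
    codebook: Sequence[Vector],
) -> List[Tuple[Point, int]]:
    """Build w = {w_i} with w_i = (x_i, a_i): feature position plus codeword index."""
    return [(x, nearest_codeword(d, codebook)) for x, d in zip(positions, descriptors)]


# Toy 2-D descriptors and a 3-word codebook (made-up numbers).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
positions = [(10.0, 20.0), (12.0, 22.0), (40.0, 8.0)]
descriptors = [[0.1, -0.2], [4.8, 5.3], [0.9, 1.2]]
w = frame_representation(positions, descriptors, codebook)
print(w)  # [((10.0, 20.0), 0), ((12.0, 22.0), 2), ((40.0, 8.0), 1)]
```

Keeping the position x_i alongside the codeword a_i is what lets the later shape terms use geometry, unlike a plain bag-of-features histogram.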

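Learning

Estimate model parameters using EM. The parameters consist of the mixture weights and the parameters of each mixture component:

$$\theta = \{\pi^{\omega}, \theta^{\omega}\}_{\omega=1}^{\Omega}, \qquad \theta^{\omega} = \{\mu_L^{\omega}, \Sigma_L^{\omega}, \theta_{X_1}^{\omega}, \ldots, \theta_{X_P}^{\omega}, \theta_{A_1}^{\omega}, \ldots, \theta_{A_P}^{\omega}, \theta_{X_0}^{\omega}, \theta_{A_0}^{\omega}\}$$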

(Figure: per-frame class likelihoods p(frame | class 1), p(frame | class 2), …, p(frame | class C) leading to an action class; an SVM also appears in this panel.)
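Deciding on the best model for a single frame can be sketched as an argmax over per-class log-likelihoods. This is a minimal sketch, assuming each class model has already produced log p(frame | class c). The class names and scores are made up, and the posterior normalization assumes a uniform class prior, which the poster does not state.

```python
import math
from typing import Dict, Tuple


def classify_frame(log_likelihoods: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return argmax_c log p(frame | class c) and the class posteriors
    p(class c | frame) under an assumed uniform prior over classes."""
    best = max(log_likelihoods, key=lambda c: log_likelihoods[c])
    top = log_likelihoods[best]
    # log-sum-exp, shifted by the maximum so exp() does not underflow
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_likelihoods.values()))
    posteriors = {c: math.exp(v - log_norm) for c, v in log_likelihoods.items()}
    return best, posteriors


# Hypothetical per-class log-likelihoods for one frame.
label, post = classify_frame({"walk": -120.4, "run": -118.9, "jump": -131.2})
print(label)  # run
```

Working in log space matters here because frame likelihoods are products over many features and underflow quickly.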

Conclusions

• The constellation of bags-of-features is able to capture semantic information of human action classes.
• It combines hybrid features: static shape features and dynamic motion features.
• It can classify actions in both a frame-based and a video-based manner.

Juan Carlos Niebles (1, 2), Li Fei-Fei (1, 3)

1 University of Illinois at Urbana-Champaign
2 Universidad del Norte
3 Princeton University

Algorithm

• Learning: input video sequences (Class 1, …, Class N) → feature extraction and description → form codebook → video frame representation → learn a model for each class (Model 1, …, Model N).
• Recognition: new sequence → feature extraction and description → decide on best model.

Previous Works

• Constellation model: small number of features; strong shape representation.
• Bag-of-features model: large number of features; no geometrical or shape information.

(Figure: part layer and feature layer; labels: parts P1 to P4, background Bg, features w.)

Hierarchical model

(Figure: layers of the model: mixture components θ^ω; part layer with parts P1 to P4 and background Bg; feature layer with features w; image.)

Action Models: mixture components

(Figure: learned mixture components ω = 1, 2, 3 shown for each action class: bend, jack, run, wave1, jump, pjump, wave2, side, walk.)

http://vision.cs.princeton.edu

[Weber et al. 2000, Csurka et al. 2004, Sudderth et al. 2006]

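Approximated data likelihood:

$$p(w, Y \mid \theta) \approx \sum_{\omega=1}^{\Omega} \sum_{h \in H} \pi^{\omega} \, \underbrace{p(Y_h \mid \theta^{\omega})}_{\text{Part layer}} \, \underbrace{p(w \mid Y, h, m^{*}, \theta^{\omega})}_{\text{Local feature layer}}$$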
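The double sum over mixture components ω and hypotheses h is best evaluated in log space. This is a minimal sketch, assuming the part-layer and local-feature-layer log terms have already been computed for every (ω, h) pair. The array layout and function names are hypothetical.

```python
import math
from typing import Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_data_likelihood(
    log_pi: Sequence[float],
    log_part: Sequence[Sequence[float]],
    log_feature: Sequence[Sequence[float]],
) -> float:
    """log p(w, Y | theta) ~= log sum_omega sum_h pi^omega * part term * feature term.

    log_pi[k]         = log pi^omega for component k
    log_part[k][h]    = log p(Y_h | theta^omega)            (part layer)
    log_feature[k][h] = log p(w | Y, h, m*, theta^omega)    (local feature layer)
    """
    terms = [
        lp + part + feat
        for lp, parts, feats in zip(log_pi, log_part, log_feature)
        for part, feat in zip(parts, feats)
    ]
    return log_sum_exp(terms)


# Two components, one hypothesis each: 0.5*0.2*0.1 + 0.5*0.4*0.05 = 0.02
value = log_data_likelihood(
    [math.log(0.5), math.log(0.5)],
    [[math.log(0.2)], [math.log(0.4)]],
    [[math.log(0.1)], [math.log(0.05)]],
)
print(round(math.exp(value), 6))  # 0.02
```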

Part layer term:

$$p(Y_h \mid \theta^{\omega}) = \mathcal{N}(Y_h \mid \mu_L^{\omega}, \Sigma_L^{\omega})$$
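The part-layer term is one multivariate Gaussian over the whole part configuration. This is a minimal plain-Python sketch. It assumes Y_h stacks the 2-D positions of all parts into a single vector, so that μ_L^ω and Σ_L^ω are its mean and full covariance. The stacking order and the example numbers are our assumptions.

```python
import math
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def gaussian_log_pdf(y: Sequence[float], mu: Sequence[float],
                     cov: Sequence[Sequence[float]]) -> float:
    """log N(y | mu, cov), e.g. log p(Y_h | theta^omega) with mu = mu_L, cov = Sigma_L."""
    n = len(y)
    L = cholesky(cov)
    diff = [yi - mi for yi, mi in zip(y, mu)]
    # Forward substitution solves L z = diff, so diff^T cov^{-1} diff = z^T z.
    z: List[float] = []
    for i in range(n):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


# Hypothetical: two parts in 2-D, stacked as Y_h = (x1, y1, x2, y2).
mu_L = [10.0, 20.0, 30.0, 20.0]
sigma_L = [[4.0, 0.0, 1.0, 0.0],
           [0.0, 4.0, 0.0, 1.0],
           [1.0, 0.0, 4.0, 0.0],
           [0.0, 1.0, 0.0, 4.0]]
print(gaussian_log_pdf([11.0, 19.0, 29.0, 21.0], mu_L, sigma_L))
```

A full covariance is what couples the parts to each other: the off-diagonal entries encode how one part's position co-varies with another's, which is the "strong shape" half of the model.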

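Local feature layer term:

$$p(w \mid Y, h, m^{*}, \theta^{\omega}) = \prod_{p=1}^{P} \prod_{i \in w_p} \underbrace{p(a_i \mid \theta_{A_p}^{\omega})}_{\text{Part appearance}} \, \underbrace{p(x_i \mid Y_{h_p}, \theta_{X_p}^{\omega})}_{\text{Part shape}} \prod_{j \in w_0} \underbrace{p(a_j \mid \theta_{A_0}^{\omega})}_{\text{Bg appearance}} \, \underbrace{p(x_j \mid \theta_{X_0}^{\omega})}_{\text{Bg shape}}$$

where w_p denotes the features assigned to part p and w_0 the background features.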
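The factorization above can be evaluated as a sum of log terms, one per feature. This is a minimal sketch under assumptions the poster does not specify:

- Appearance distributions θ_A are categorical over codewords.
- Part shape θ_{X_p} is a diagonal Gaussian on the feature's offset from the part location.
- Background shape θ_{X_0} is uniform over the image.
- The membership m* is given as one label per feature, so w_p = {i : m*_i = p}.

All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def log_normal_diag(v: Point, mean: Point, var: Point) -> float:
    """Log density of a 2-D Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * s) + (o - m) ** 2 / s)
               for o, m, s in zip(v, mean, var))


def log_feature_layer(
    features: Sequence[Tuple[Point, int]],        # w_i = (x_i, a_i)
    membership: Sequence[int],                    # m*_i: part 1..P, or 0 = background
    part_positions: Dict[int, Point],             # Y_{h_p}: location of part p under h
    part_appearance: Dict[int, List[float]],      # theta_{A_p}: codeword probabilities
    part_shape: Dict[int, Tuple[Point, Point]],   # theta_{X_p}: (mean offset, variances)
    bg_appearance: List[float],                   # theta_{A_0}: codeword probabilities
    image_area: float,                            # theta_{X_0}: uniform over the frame
) -> float:
    """log p(w | Y, h, m*, theta^omega) as a sum of part and background terms."""
    total = 0.0
    for (x, a), p in zip(features, membership):
        if p == 0:
            # Background appearance + background shape (uniform position)
            total += math.log(bg_appearance[a]) - math.log(image_area)
        else:
            y = part_positions[p]
            offset = (x[0] - y[0], x[1] - y[1])
            mean, var = part_shape[p]
            # Part appearance + part shape (offset from the part location)
            total += math.log(part_appearance[p][a]) + log_normal_diag(offset, mean, var)
    return total


score = log_feature_layer(
    features=[((10.0, 10.0), 1), ((50.0, 50.0), 0)],
    membership=[1, 0],
    part_positions={1: (9.0, 10.0)},
    part_appearance={1: [0.75, 0.25]},
    part_shape={1: ((1.0, 0.0), (1.0, 1.0))},
    bg_appearance=[0.5, 0.5],
    image_area=100.0,
)
print(score)
```

Each part contributes a bag of features whose appearances are unordered, while the shape term ties every feature to its part's location. This is the "constellation of bags-of-features" idea in miniature.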

The hierarchical model combines:

• a large number of features, from the bag-of-features model;
• a strong shape representation, from the constellation model.

Experimental Results

• 9 action classes, performed by 9 subjects [Blank et al. 2005]
• Leave-one-out cross-validation
• Video classification performance: 72.8%
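A leave-one-out protocol can be sketched as follows. This minimal sketch assumes the held-out unit is one subject; the poster says only "leave one out". The `train` callable is a hypothetical stand-in for learning one model per class. The toy 1-nearest-neighbour trainer only exercises the loop and is not the poster's classifier.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Video = Tuple[Hashable, str, object]          # (subject, action label, data)
Trainer = Callable[[List[Tuple[str, object]]], Callable[[object], str]]


def leave_one_subject_out(videos: Sequence[Video], train: Trainer) -> float:
    """Hold out each subject in turn, train on the rest, and return the
    fraction of held-out videos whose predicted label is correct."""
    subjects = sorted({s for s, _, _ in videos}, key=str)
    correct = 0
    for held_out in subjects:
        predict = train([(lab, d) for s, lab, d in videos if s != held_out])
        for s, lab, d in videos:
            if s == held_out:
                correct += int(predict(d) == lab)
    # Every video is held out exactly once, so divide by the total count.
    return correct / len(videos)


def nearest_neighbour_trainer(train_set):
    """Toy stand-in for per-class model learning: 1-NN on a scalar feature."""
    def predict(d):
        return min(train_set, key=lambda item: abs(item[1] - d))[0]
    return predict


videos = [("s1", "walk", 0.0), ("s2", "walk", 1.0), ("s3", "run", 10.0), ("s4", "run", 11.0)]
print(leave_one_subject_out(videos, nearest_neighbour_trainer))  # 1.0
```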

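• E-step:

$$p(\omega, h \mid w, Y, \theta^{old}) \approx \frac{\pi^{\omega}_{old} \, p(Y_h \mid \theta^{\omega}_{old}) \, p(w \mid Y, h, m^{*}, \theta^{\omega}_{old})}{p(w, Y \mid \theta^{old})}$$

• M-step:

$$\theta^{new} = \arg\max_{\theta} \sum_{\omega, h} p(\omega, h \mid w, Y, \theta^{old}) \ln p(w, Y, \omega, h \mid \theta)$$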
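One EM iteration can be sketched for the mixture weights π^ω. This minimal sketch assumes the per-(ω, h) log terms (log π^ω plus the part-layer and local-feature-layer log terms) have already been computed at the old parameters for each training frame. The M-step shown is the standard closed-form update for mixture weights only. Updates for μ_L^ω, Σ_L^ω and the appearance and shape parameters are omitted. This is not the authors' code.

```python
import math
from typing import List, Sequence


def e_step(log_joint: Sequence[Sequence[float]]) -> List[List[float]]:
    """Posterior p(omega, h | w, Y, theta_old) for one frame.

    log_joint[k][h] = log pi^omega + log p(Y_h | theta^omega)
                      + log p(w | Y, h, m*, theta^omega), at the old parameters.
    """
    flat = [v for row in log_joint for v in row]
    top = max(flat)
    # Normaliser log p(w, Y | theta_old), computed with log-sum-exp.
    log_norm = top + math.log(sum(math.exp(v - top) for v in flat))
    return [[math.exp(v - log_norm) for v in row] for row in log_joint]


def m_step_mixture_weights(posteriors: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Closed-form update: pi^omega_new = mean over frames of sum_h p(omega, h | frame)."""
    n_frames = len(posteriors)
    n_components = len(posteriors[0])
    return [sum(sum(frame[k]) for frame in posteriors) / n_frames
            for k in range(n_components)]


frames = [
    # frame 1: Omega = 2 components, |H| = 2 hypotheses
    [[math.log(0.2), math.log(0.2)], [math.log(0.1), math.log(0.5)]],
    # frame 2
    [[math.log(0.6), math.log(0.2)], [math.log(0.1), math.log(0.1)]],
]
posteriors = [e_step(f) for f in frames]
pi_new = m_step_mixture_weights(posteriors)
print([round(p, 6) for p in pi_new])  # [0.6, 0.4]
```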

Recognition

• Classify actions in both a frame-based and a video-based manner.
• Video classification is based on a majority vote over frames.
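The video-level decision by majority vote over frame labels can be sketched as follows. This minimal sketch assumes each frame has already been assigned a class label. The tie-breaking rule is a choice of this sketch; the poster does not specify one.

```python
from collections import Counter
from typing import Sequence


def classify_video(frame_labels: Sequence[str]) -> str:
    """Video label = most frequent per-frame label (majority vote).

    Ties go to the label that occurs first in frame order: Counter keeps
    insertion order, and max() returns the first maximal key it meets.
    """
    counts = Counter(frame_labels)
    return max(counts, key=lambda label: counts[label])


print(classify_video(["run", "walk", "run", "jump", "run"]))  # run
```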
