
A Non-parametric Bayesian Network Prior of Human Pose


• We propose a general-purpose Bayesian network prior of human pose.

• Fully non-parametric: Estimation of both the optimal information-theoretic topology and the local conditional distributions from data.

• Compositional: Effective handling of the combinatorial explosion of articulated objects, thereby improving generalization.

• Superior performance: Better data representation than traditional global models and parametric networks on the large Human 3.6M dataset.

• Real-time: Fast and accurate computation of approximate likelihoods on datasets with up to 100k training poses.

• We compute expected log-likelihoods for our Chow-Liu/CKDE model and several baselines on the Human 3.6M dataset.

Figure 1: Samples drawn from a single Chow-Liu/CKDE model.

[Figure: Approximation quality and runtime vs. number of core clusters (4, 25, 50). Left: mean absolute error (nats) between exact and approximate log-likelihood. Right: mean runtime (ms); with 4 core clusters, runtime drops by 82% relative to exact evaluation. Legend: training data, test data, cluster centers, exact evaluations, approx. evaluations.]

• Our formulation allows us to freely combine substructures, but only if they do not share much information.

⇒ Compositionality exactly where needed and only where appropriate.

Perceiving Systems – ps.is.tue.mpg.de

2 Non-parametric Networks

[Figure: Per-class models (“Standing”, “Sitting”, “Kneeling / Lying”): training samples shown next to samples from the model. Compositional example: an inferred network combines the substructures “neutral”, “wave left”, “wave right”, and “wave both”; a pose waving both arms is scored as “wave right”: 50%, “wave left”: 50%.]

Table 1: Expected log-likelihoods.

Method                    Graph structure             Training    Testing
Gaussian                  Global                      −266.84     −271.15
KDE                       Global                      −239.61     −263.77
GPLVM*                    Global                      −327.85     −341.89
Gaussian linear network   Independent                 −352.80     −345.94
                          Kinematic chain (order 1)   −311.54     −310.98
                          Kinematic chain (order 2)   −305.54     −307.88
                          Chow-Liu tree               −283.82     −284.03
CKDE network              Independent                 −322.64     −322.25
                          Kinematic chain (order 1)   −260.04     −270.52
                          Kinematic chain (order 2)   −247.35     −263.83
                          Chow-Liu tree (ours)        −242.24     −254.98

* 25% subsampling; FITC

[Figure: Candidate graph structures: kinematic chain, pairwise mutual information, Chow-Liu tree.]

• Learning the conditional distributions:
We use a conditional kernel density estimate (CKDE) to learn the local models of the inferred tree,

p\bigl(X_j \mid X_{\mathrm{pa}(j)}\bigr)
  = \frac{p\bigl(X_j, X_{\mathrm{pa}(j)}\bigr)}{\int_{X_j} p\bigl(X_j, X_{\mathrm{pa}(j)}\bigr)\, dX_j}
  = \frac{\sum_i \mathcal{N}\bigl((X_j, X_{\mathrm{pa}(j)}) \mid (X_j^{(i)}, X_{\mathrm{pa}(j)}^{(i)}),\; BB^\top\bigr)}
         {\sum_i \mathcal{N}\bigl(X_{\mathrm{pa}(j)} \mid X_{\mathrm{pa}(j)}^{(i)},\; (BB^\top)\big\rvert_{X_{\mathrm{pa}(j)}}\bigr)},

where p(X_j, X_{pa(j)}) is an unconditional KDE with isotropic Gaussian kernel and bandwidth B proportional to the square root of the covariance.

• Important operations are efficient:

– Computation of a log-likelihood requires O(|V |) KDE evaluations.

– Ancestral sampling requires O(|V |) samples from the local models.
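As an illustration of both operations (a minimal 1-D sketch, not the authors' code; the function names, the scalar child/parent setting, and the shared bandwidth h are assumptions for clarity), the CKDE conditional can be evaluated as a ratio of a joint and a marginal KDE, and sampled by exploiting its Gaussian-mixture form:

```python
import math
import random

def gauss(u, h):
    # isotropic Gaussian kernel with bandwidth h (1-D for clarity)
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def ckde_conditional(xj, xpa, samples, h):
    """Evaluate p(xj | xpa): joint KDE over (child, parent) training pairs
    divided by the marginal KDE over the parent."""
    joint = sum(gauss(xj - sj, h) * gauss(xpa - sp, h) for sj, sp in samples)
    marginal = sum(gauss(xpa - sp, h) for _, sp in samples)
    return joint / marginal

def ckde_sample(xpa, samples, h):
    """Sample xj ~ p(xj | xpa): the conditional is a Gaussian mixture whose
    i-th weight is the parent kernel, so pick a component, then add noise."""
    weights = [gauss(xpa - sp, h) for _, sp in samples]
    i = random.choices(range(len(samples)), weights=weights)[0]
    return random.gauss(samples[i][0], h)
```

Ancestral sampling of a full pose then visits the tree in topological order and calls a sampler like `ckde_sample` once per joint, matching the O(|V|) count above.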

[The resulting conditionals are Gaussian mixture models with a non-uniform weight distribution.]

A Non-parametric Bayesian Network Prior of Human Pose
Andreas M. Lehrmann¹, Peter V. Gehler¹, Sebastian Nowozin²
MPI for Intelligent Systems¹, Microsoft Research Cambridge²

[Pipeline: 3D pose dataset → learn topology / local models → non-parametric Bayesian network prior of human pose → score test 3D skeletal data, e.g., for pose estimation.]

[Poster section headings: 1 Overview · 3 Compositionality & Generalization · 4 Live Scoring]

References
[1] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 1968.
[2] A. Gray and A. Moore. Nonparametric density estimation: Toward computational tractability. SIAM International Conference on Data Mining, 2003.
[3] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. Technical report, University of Bonn, 2012.


Learn a sparse and non-parametric Bayesian network B = (p,G(V,E)).

• Learning the graph structure:
Minimize the KL-divergence between the high-dimensional pose distribution q(X) and the tree-structured network p(X) = \prod_{j=1}^{|V|} p\bigl(X_j \mid X_{\mathrm{pa}(j)}\bigr),

G := \operatorname{argmin}_{\mathrm{pa}} \mathrm{KL}\bigl(q(X) \,\|\, p(X)\bigr) = \mathrm{MST}(G_0),

where G_0 is the complete graph with edge weights e_{jk} = \widehat{\mathrm{MI}}(X_j, X_k) and MST(G_0) denotes its maximum-weight spanning tree.
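To make the structure-learning step concrete, here is a minimal sketch (not the authors' implementation; the toy MI matrix and function name are illustrative assumptions): given pairwise mutual-information estimates, the Chow-Liu topology is the maximum-weight spanning tree, found here with Prim's algorithm:

```python
def chow_liu_tree(mi):
    """Chow-Liu topology: the maximum-weight spanning tree of the complete
    graph whose edge weights are pairwise mutual-information estimates.
    Returns parent assignments pa[j] for a tree rooted at node 0."""
    n = len(mi)
    in_tree = {0}
    pa = {0: None}
    while len(in_tree) < n:
        # Prim's step: attach the highest-MI edge leaving the current tree.
        _, i, j = max((mi[i][j], i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        pa[j] = i
        in_tree.add(j)
    return pa

# Toy MI matrix for 4 joints; neighbouring joints share the most information.
mi = [[0.0, 0.9, 0.2, 0.1],
      [0.9, 0.0, 0.8, 0.2],
      [0.2, 0.8, 0.0, 0.7],
      [0.1, 0.2, 0.7, 0.0]]
print(chow_liu_tree(mi))  # → {0: None, 1: 0, 2: 1, 3: 2}
```

On this toy matrix the recovered tree is a chain, mirroring how a kinematic chain emerges when adjacent joints carry the highest mutual information.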

• Applications in real-time environments require additional speed.

• Training: Cluster the training points into clusters \{C^{(i)}\}_i using k-means and build a kd-tree over their centres.

• Testing: Given a test pose x, use the kd-tree to compute a k-NN partitioning \{C^{(i)}\}_i = \mathcal{C}_e(x) \uplus \mathcal{C}_a(x) and approximate the likelihood as

p(x) \approx (S_e + S_a) / (N \cdot \det(B)),

with

S_e = \sum_{C \in \mathcal{C}_e} \sum_{j \in C} K\bigl(B^{-1}(x - x^{(j)})\bigr),  [exact]
S_a = \sum_{C \in \mathcal{C}_a} |C| \, K\bigl(B^{-1}(x - \bar{C})\bigr),  [approx.]

where K denotes the Gaussian kernel and \bar{C} and |C| denote the centre and size of cluster C, respectively.
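The exact/approximate split can be sketched as follows (a 1-D illustration under assumed names; the actual method uses k-means clusters, a kd-tree query, and multivariate kernels): the n_core clusters nearest to the query are evaluated exactly, while every remaining cluster contributes |C| kernels placed at its centre:

```python
import math

def gauss(u):
    # standard normal kernel (1-D for clarity)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def approx_likelihood(x, clusters, B, n_core):
    """Cluster-based KDE approximation of p(x).

    clusters: list of (centre, points) pairs, e.g. from k-means.  The n_core
    clusters nearest to x are summed exactly (S_e); every remaining cluster
    contributes len(points) kernels placed at its centre (S_a)."""
    # sorting by distance stands in for the kd-tree k-NN query
    order = sorted(clusters, key=lambda c: abs(x - c[0]))
    core, far = order[:n_core], order[n_core:]
    S_e = sum(gauss((x - xj) / B) for _, pts in core for xj in pts)  # exact
    S_a = sum(len(pts) * gauss((x - c) / B) for c, pts in far)       # approx.
    N = sum(len(pts) for _, pts in clusters)
    return (S_e + S_a) / (N * B)  # N * det(B) in the 1-D case
```

With n_core equal to the number of clusters this reduces to the exact KDE; shrinking n_core trades a small likelihood error for the large runtime savings reported above.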