Model-Driven Segmentation of Articulating Humans in Laplacian Eigenspace

Aravind Sundaresan, Student Member, IEEE, and Rama Chellappa, Fellow, IEEE

Abstract—We propose a general approach using Laplacian Eigenmaps and a graphical model of the human body to segment 3D voxel data of humans into different articulated chains. In the bottom-up stage, the voxels are transformed into a high-dimensional (6D or less) Laplacian Eigenspace (LE) of the voxel neighborhood graph. We show that the LE is effective at mapping voxels on long articulated chains to nodes on smooth 1D curves that can be easily discriminated, and we prove these properties using representative graphs. We fit 1D splines to voxels belonging to different articulated chains such as the limbs, head, and trunk, and we determine the boundary between splines by thresholding the spline fit error, which is high at junctions. A top-down probabilistic approach is then used to register the segmented chains, utilizing both their mutual connectivity and their individual properties such as length and thickness. Our approach enables us to deal with complex poses such as those where the limbs form loops. We use the segmentation results to automatically estimate the human body models. Although we use human subjects in our experiments, the method is fairly general and can be applied to voxel-based registration of any articulated object that is composed of long chains. We present results on real and synthetic data that illustrate the usefulness of this approach.

Index Terms—Pattern recognition, image processing, computer vision, segmentation, graph-theoretic methods, region growing, partitioning, object recognition.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 10, OCTOBER 2008

A. Sundaresan is with the Artificial Intelligence Center, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025. E-mail: [email protected].
R. Chellappa is with the Center for Automation Research, University of Maryland, 4411 A.V. Williams Building, College Park, MD 20742-3275. E-mail: [email protected].

Manuscript received 26 Sept. 2006; revised 21 June 2007; accepted 1 Oct. 2007; published online 6 Nov. 2007. Recommended for acceptance by C. Kambhamettu. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0685-0906.

0162-8828/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

HUMAN motion capture and analysis has applications in different fields such as kinesiology, biomechanics, surveillance, human-computer interaction, animation, and video games. There is a correspondingly large body of literature on human motion analysis and pose estimation from video data. However, the requirements in terms of the detail of pose parameters and accuracy in estimation vary from application to application, as does the form of the available input data. Surveillance applications, for instance, usually require just the location of the subject or an approximate estimate of human pose from a single video stream, whereas biomechanical applications require accurate pose estimates of different joint angles from images obtained using multiple video cameras. The most common methods for accurate capture of 3D human movement require attachment of markers, fixtures, or sensors to body segments. These methods are invasive; i.e., they encumber the subject, hinder movement, and require subject preparation time. Biomechanical and clinical applications [1], [2] require the accurate capture of normal and pathological human movement without the artifacts associated with current state-of-the-art marker-based motion capture techniques. A markerless motion capture system using multiple video streams therefore possesses several advantages over marker-based systems and is highly desirable.

There are a number of algorithms for estimating the pose of human subjects from both single and multiple cameras. Some of these methods are applied directly on images, whereas others are applied to 3D volumetric data or voxels that can be computed from image silhouettes. Estimating the 3D pose of an articulated object using images from a single camera is an inherently difficult task due to self-occlusion and the ill-posed nature of estimating 3D structure and motion from 2D images. Such methods typically address applications such as surveillance or human-computer interaction and perform only rough pose estimation. In order to accurately estimate the 3D joint angle parameters required in biomechanical and clinical applications, it is preferable to use 3D input data such as voxels.

Voxels can be computed from 3D mesh data obtained from laser scanners and images obtained from multiple calibrated cameras, as illustrated in Figs. 1a and 1b, and serve as an input-layer abstraction. A popular class of markerless motion capture algorithms [3], [4], [5] uses voxel data to estimate the human body model parameters and pose. One of the key steps in such a process is to segment voxels as belonging to different body segments. Most algorithms typically use a human body model to guide the pose estimation process, as the use of a model greatly increases the accuracy and robustness of the algorithm. It is therefore necessary to obtain the parameters of the human body model as well.

The human body can be visualized as six articulated chains connected at joints, as illustrated in Fig. 2. The segments labeled b1, b2, b3, b4, b5, and b6 correspond to the trunk, head, left arm, right arm, left leg, and right leg, respectively. We propose a novel bottom-up method to perform segmentation of the 3D voxel structure into component articulated chains by mapping the voxels to the Laplacian Eigenspace (LE) [6]. Having used a bottom-up approach to perform segmentation, we then use a top-down approach, using our knowledge of the structure of the
human body, to register each chain to the human body model (bi in Fig. 2a) and estimate the human body model parameters and pose. The block diagram illustrating the steps in our algorithm is presented in Fig. 1. The novelty of our algorithm lies in exploiting the properties of the Laplacian Eigenmap transformation, as a consequence of which voxels belonging to each nonrigid chain are mapped to a smooth 1D curve in the LE. We can then perform segmentation by fitting 1D splines to the voxels in the LE. Although the Laplacian Eigenmap (and other manifold methods) have been used before for dimensionality reduction, we present properties of the Laplacian Eigenmaps that make the mapping ideal for segmentation, as compared to other manifold methods such as Isomap. The spline fit procedure implicitly computes the position of each voxel along the 1D chain and can be used to estimate the skeleton of the body segment in 3D space. We then use a graphical model of the human body (Fig. 2a) in a probabilistic framework to identify the segmented body chains and resolve possible ambiguities in registration. Our algorithm is able to perform segmentation and pose estimation for simple poses (Fig. 2b) and complex poses (Fig. 2c), where other manifold methods fail. The segmentation and probabilistic registration algorithm was introduced by Sundaresan and Chellappa [6]. We use the output of the segmentation and registration algorithm for a set of keyframes to automatically estimate the parameters of the human body model. We note that although pose estimation may not work on all frames, it can be used to estimate the model and initialize the pose for a tracking algorithm such as the one presented by Sundaresan and Chellappa [7].

The organization of the paper is as follows: We discuss related pose estimation and tracking work in Section 2. We describe the mapping to the LE and key properties of the Laplacian Eigenvectors of representative graphs and compare it with similar techniques in Section 3. We then present the algorithm for segmentation in the LE using splines in Section 4. The probabilistic registration method is described in Section 5. The application of the segmentation and registration steps in designing a completely automatic human body model estimation algorithm is described in Section 6 with examples. In Section 7, we present the results of the human body model estimation algorithm for real video sequences, where the image silhouettes were used to obtain the voxel data; a synthetic sequence, where the motion of an animated model was used to obtain the voxel data; and 3D laser scans, for which the voxel data was obtained from a mesh model. The synthetic data set allows us to compare the pose estimation results with actual parameters, while the real data illustrates that the algorithm can be easily applied to real, imperfect data in a completely automatic manner.

2 RELATED WORK

Gavrila [8], Aggarwal and Cai [9], Moeslund and Granum [10], and, more recently, Sigal and Black [11] provide surveys of human motion tracking and analysis methods. A large number of pose estimation algorithms use a single image or a single video sequence to estimate the pose of the subject or use simplified models. Several pose tracking algorithms also assume that the initial pose is known. Although we list pose estimation algorithms that use a single camera, we concentrate on related work that estimates 3D pose by using images obtained from multiple cameras. The accuracy and robustness of these algorithms vary, as does the suitability of the algorithms for different applications. There are several methods [12], [13], [14], [15] for estimating the pose from a single view, whereas [3], [4], [16], [17], and [18] estimate the pose from multiple views. Specifically, the algorithms in [3], [4], and [17] estimate the pose from voxel representations. Carranza et al. [18] describe a system that uses multiview synchronized video footage of an actor's performance to estimate the motion parameters and to interactively rerender the actor's appearance from any viewpoint. Chu et al. [3] describe a method for pose estimation that uses Isomaps [19] to transform the voxels to a pose-invariant intrinsic space representation and obtain a skeleton representation. Cheung et al. [17] extend shape-from-silhouette methods to articulated objects. Given silhouettes of a moving articulated object, they propose an iterative algorithm to solve the simultaneous assignment of silhouette points to a body part and alignment of the body part. These methods work well with poses such as those in Fig. 2a, but they are usually unable to handle poses where there is self-contact (Fig. 2c), i.e., where one or more of the limbs touches the others. Anguelov et al. [20] describe an algorithm that automatically decomposes an object into approximately rigid parts and obtains their location and underlying articulated structure, given a set of meshes describing the object in different poses. They use an


Fig. 1. Block diagram describing the steps in the segmentation in the Laplacian Eigenspace (LE) to estimate the human body model. (a) Space carving to compute voxels. (b) Map to LE. (c) Bottom-up segmentation in LE. (d) Top-down registration. (e) Top-down model. (f) Pose estimation.

Fig. 2. (a) Graph model of human body model comprising six articulated chains. (b) Pose 1. (c) Pose 2.


unsupervised nonrigid technique to register the meshes and perform segmentation using the EM algorithm. Krahnstoever and Sharma [21] address the issue of acquiring articulated models directly from monocular video. The structure, shape, and appearance of articulated models are estimated, but as only a single camera is used, the estimated 3D human body models are not accurate, and hence, this method is limited in its application.

Algorithms that estimate the complete human body model from multiple views are presented in [4] and [16]. Mikić et al. [4] propose a model acquisition algorithm using voxels, which starts with a simple body part localization procedure based on fitting and growing templates and uses prior knowledge of the shapes and dimensions of average body parts. Kakadiaris and Metaxas [16] present a Human Body Part Identification Strategy (HBPIS) that recovers all the body parts of a moving human based on the spatiotemporal analysis of its deforming silhouette, using input from three mutually orthogonal views. However, they specify a protocol of movements that the subject is required to go through.

Our segmentation algorithm can also be viewed as a skeletonization algorithm that obtains the skeletons of the individual articulated chains. Thus, it can also be compared to [22], which uses voxel data to estimate a novel skeleton representation. We model the human body as a set of rigid body segments that are connected to each other at specific joints, forming kinematic chains originating from the trunk. Badler et al. [23] suggest several methods to represent human subjects in terms of shape and articulated structure. We find that using modified superquadrics to represent shapes [24] is reasonably accurate for our purposes, although our approach can accommodate more sophisticated mesh models if the data is accurate enough. Belkin and Niyogi [25] describe the construction of a representation for data lying in a low-dimensional manifold embedded in a high-dimensional space and use the Laplacian Eigenmaps for dimensionality reduction. Although the Laplacian Eigenmaps and other manifold methods have been applied to dimensionality reduction problems such as classification and face retrieval using Laplacianfaces [26], we actually map the voxels to a higher dimensional space in order to segment the chains. The dimension of this eigenspace depends on the number of chains that we wish to segment. There exist other methods for dimensionality reduction such as Isomaps [19], Kernel Eigenvalue analysis [27], Locally Linear Embedding (LLE) [28], Diffusion maps [29], and Multidimensional scaling [30]. We compare several manifold techniques and discuss their suitability for our segmentation algorithm in the next section.

3 MOTIVATION AND THEORY

In this section, we describe the Laplacian Eigenmap transformation and motivate the segmentation algorithm in the LE. We construct an adjacency matrix based on the connectivity between the voxels. We use the eigenvectors of the Laplacian of the adjacency matrix to transform the voxels to nodes in the LE. The details of the mapping are presented in Section 3.1. The mapping of voxels to the LE achieves two important objectives. First, the effect of articulation at joints is minimized because the Laplacian Eigenmap transformation depends on the connectivity of voxels, which is minimally affected by articulation. Second, the transformation maps voxels belonging to different nonrigid chains (such as the limbs in the human body) to nodes on separate smooth 1D curves in the LE according to their position along the articulated chain. It is this important property that allows us to use the spline-fitting algorithm to segment the different articulated chains and differentiates the Laplacian Eigenmap transformation from other manifold techniques such as Isomaps. We describe the properties of the Laplacian Eigenvectors of two simple representative graphs in Section 3.2 and motivate our segmentation algorithm based on these properties. We also compare the Laplacian Eigenmap with other manifold techniques for dimensionality reduction and distance-preserving transforms in Section 3.3, and we show why the Laplacian Eigenmap is the most suitable for our purpose.

3.1 Mapping to the Laplacian Eigenspace

The mapping to the LE is described in Table 1 and is illustrated using the example in Fig. 3. The example highlights certain features of the transformation. The 2D object in the image consists of several nonrigid chains of varying widths and lengths connected at a single joint. One of the chains (red) has self-contact, and one of the chains (green) has a sharp "bend." The different chains are color-coded for the purpose of illustration, and no distinction based on color is used in the mapping. The object is sampled on a regular grid, and the graph $G(V, E)$ that describes the connectivity between neighboring nodes in Fig. 3b is computed. Although the nodes lie on a 2D plane in this example, they could lie in any high-dimensional space, as long as we are able to compute $G(V, E)$. We assume that the graph $G$ is completely connected; otherwise, we choose the biggest connected component. The eigenvalues of $L$ are real and nonnegative, as $L$ is positive semidefinite and symmetric. Chung [31] shows that $\lambda_0 = 0$ and the corresponding eigenvector is $\mathbf{x}_0 = \mathbf{1}$. If $G$ is fully connected, then $\lambda_1 > 0$. The $i$th row of $Y$ provides the embedding $\mathbf{y}_i^\top$ for the $i$th node.
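The construction just described (neighborhood graph, Laplacian $L = D - W$, eigenvectors of the smallest nonzero eigenvalues) can be sketched in a few lines. This is an illustrative outline only, not the authors' implementation: the function name, the dense-matrix construction, the connectivity radius, and the toy 2D point set are all assumptions made for the sake of a small, runnable example.

```python
import numpy as np

def laplacian_eigenspace(points, radius, dim):
    """Map points (stand-ins for voxel centers) to a dim-dimensional LE:
    build a neighborhood graph, form L = D - W, and use the eigenvectors
    of the smallest nonzero eigenvalues as the new coordinates."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = ((d2 > 0) & (d2 <= radius ** 2)).astype(float)  # binary adjacency
    L = np.diag(W.sum(1)) - W                           # graph Laplacian
    _, vecs = np.linalg.eigh(L)                         # ascending eigenvalues
    # Skip the constant eigenvector (eigenvalue 0); row i of the result
    # is the embedding y_i of node i.
    return vecs[:, 1:dim + 1]

# Toy example: a 2D "plus" shape sampled on a grid (21 samples).
pts = np.array([(x, 0) for x in range(-5, 6)] +
               [(0, y) for y in range(-5, 0)] +
               [(0, y) for y in range(1, 6)], dtype=float)
Y = laplacian_eigenspace(pts, radius=1.1, dim=3)
print(Y.shape)                                          # (21, 3)
```

For real voxel volumes one would use a sparse adjacency matrix and a sparse eigensolver instead of the dense construction above; the logic is otherwise the same.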

Belkin and Niyogi [25] show that the Laplacian Eigenmap embedding is optimal when we wish to obtain $\mathbf{y}_i$ such that the weighted distance between neighbors, $\frac{1}{2}\sum_{i,j} \|\mathbf{y}_i - \mathbf{y}_j\|^2 W_{ij} = \mathrm{tr}(Y^\top L Y)$, is minimized. The constraint $Y^\top Y = I$ is imposed to remove an arbitrary scaling factor. In addition to the distance-minimizing property, the Laplacian Eigenvectors also possess certain properties that are described in the following section.
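The identity behind this objective, that $\mathrm{tr}(Y^\top L Y)$ equals half the weight-sum of squared neighbor distances, holds for any embedding $Y$, not only the optimal one. A minimal numerical check (the random graph and embedding are illustrative assumptions; the factor of 1/2 appears because the double sum counts each edge twice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 12, 3
W = rng.integers(0, 2, (n, n)).astype(float)   # random symmetric binary weights
W = np.triu(W, 1)
W = W + W.T                                    # zero diagonal, W = W^T
L = np.diag(W.sum(1)) - W                      # graph Laplacian
Y = rng.standard_normal((n, d))                # an arbitrary embedding, rows y_i

lhs = 0.5 * sum(W[i, j] * np.sum((Y[i] - Y[j]) ** 2)
                for i in range(n) for j in range(n))
rhs = np.trace(Y.T @ L @ Y)
print(abs(lhs - rhs) < 1e-9)                   # the identity holds for any Y
```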

3.2 Properties of Laplacian Eigenvectors

TABLE 1
Mapping Nodes to the LE

We make the following observations about the LE mapping based on the example in Fig. 3, and justify them by analyzing the properties of the Laplacian Eigenvectors of graphs:

- Nodes on different chains are mapped to nodes on different curves in the eigenspace such that each of the curves can be discriminated from the others. We note that the discriminative capability of the transformation improves with the dimension of the eigenspace.

- Nodes belonging to each chain are mapped to nodes along a smooth 1D curve, irrespective of the thickness of the chain to which they belong, as shown in Figs. 3c and 3d. The 1D structure is retained in the higher dimensions. We observe that the position of each node along the 1D curve also encodes the position of that node along the articulated chain.

The first observation is justified using extended star graphs in Section 3.2.1, and the second is justified using grid graphs in Section 3.2.2. The path graph $P_m$ is a graph on $m$ nodes with edge set $E_{P_m} = \{(i, i+1) \mid i = 0, 1, \ldots, m-2\}$. The ring graph $R_m$ has edge set $E_{R_m} = \{(i, (i+1) \bmod m) \mid i = 0, 1, \ldots, m-1\}$. The properties of these graphs are described in Appendix A, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.70823. Path graphs and rings are the basic building blocks of the extended star graphs and grid graphs.
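These edge sets can be instantiated directly. The sketch below (our own illustrative helper, with numpy assumed) builds the Laplacians of $P_m$ and $R_m$ from exactly these definitions and checks the facts used throughout this section: $\lambda_0 = 0$ with a constant eigenvector, and $\lambda_1 > 0$ for a connected graph.

```python
import numpy as np

def laplacian_from_edges(m, edges):
    """Graph Laplacian L = D - W for an m-node graph given its edge set."""
    W = np.zeros((m, m))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(1)) - W

m = 8
P = laplacian_from_edges(m, [(i, i + 1) for i in range(m - 1)])    # path P_m
R = laplacian_from_edges(m, [(i, (i + 1) % m) for i in range(m)])  # ring R_m

for L in (P, R):
    vals, vecs = np.linalg.eigh(L)
    # lambda_0 = 0 with a constant eigenvector; lambda_1 > 0 since connected
    print(np.isclose(vals[0], 0.0), vals[1] > 1e-9,
          np.allclose(vecs[:, 0], vecs[0, 0]))
```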

3.2.1 Eigenvectors of Extended Star Graphs

We define extended star graphs as graphs that are composed of $n$ chains connected at one end to a common node, as illustrated in Fig. 4a. The $i$th chain has $m_i$ nodes, and hence, there are a total of $r + 1$ nodes, where $r = \sum_{j=0}^{n-1} m_j$. Let $\mathbf{x} = (x_0\; x_1\; \cdots\; x_r)^\top$ represent an eigenvector. The node with index $\sum_{l=0}^{j-1} m_l + i$, i.e., the $i$th node ($i = 1, 2, \ldots, m_j$) in the $j$th chain ($j = 0, 1, \ldots, n-1$), is labeled $x_i^{(j)}$ for the sake of clarity in representation. The first node on each chain, $x_0^{(j)}$ ($j = 0, 1, \ldots, n-1$), is actually the same point $x_0$ (see Fig. 4a). The graph is asymmetric in general, i.e., $m_i \neq m_j$. The graph is symmetric if $m_0 = \cdots = m_{n-1} = m$. We analyze the structure of the eigenvectors corresponding to the smallest nonzero eigenvalues in the general asymmetric case and deal with the symmetric case in Appendix B, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.70823. The eigenvector $\mathbf{x}$ has to satisfy $L\mathbf{x} = \lambda\mathbf{x}$, from the rows of which we get

$n x_0 - \sum_{j=0}^{n-1} x_1^{(j)} = \lambda x_0$,   (1)

$2 x_i^{(j)} - x_{i-1}^{(j)} - x_{i+1}^{(j)} = \lambda x_i^{(j)}, \quad 1 \le i < m_j$,   (2)

$x_{m_j}^{(j)} - x_{m_j - 1}^{(j)} = \lambda x_{m_j}^{(j)}$,   (3)

where $0 \le j \le n-1$. We note that (2) and (3) are similar to the equations for the path graph $P_m$ and the ring graph $R_m$, respectively (Appendix A, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.70823). We can verify, by substitution, that the set of equations in (2) is satisfied by a solution of the form

$x_i^{(j)} = \alpha^{(j)} \sin(\theta^{(j)} + \varphi i), \quad 0 \le i \le m_j, \; 0 \le j \le n-1$.   (4)

The eigenvalue $\lambda$ is given by $2 - 2\cos\varphi$. We note that (4) also satisfies (3) if $x_{m_j}^{(j)} = x_{m_j + 1}^{(j)}$, i.e.,

$\sin(\theta^{(j)} + \varphi m_j) = \sin(\theta^{(j)} + \varphi (m_j + 1))$,   (5)

$\theta^{(j)} = \frac{\pi}{2} - \frac{\varphi (2 m_j + 1)}{2}$.   (6)

Since $x_0^{(0)}, x_0^{(1)}, \ldots, x_0^{(n-1)}$ all represent the same point, we have

$\alpha^{(0)} \sin\theta^{(0)} = \alpha^{(1)} \sin\theta^{(1)} = \cdots = \alpha^{(n-1)} \sin\theta^{(n-1)}$.   (7)

Finally, substituting (4) in (1), we have

$n\, \alpha^{(0)} \sin\theta^{(0)} - \sum_{j=0}^{n-1} \alpha^{(j)} \sin(\theta^{(j)} + \varphi) = (2 - 2\cos\varphi)\, \alpha^{(0)} \sin\theta^{(0)}$.   (8)

In the general asymmetric case, $m_i \ne m_j$. If $\sin\theta^{(j)} = 0$ for some $j$, then from (7), we have $\sin\theta^{(j)} = 0$ for all $j$. If $\sin\theta^{(j)} \ne 0$,


Fig. 3. (a) 2D object with chains of varying thickness. (b) Corresponding nodes in graph. (c) Nodes in the LE dimensions 1-3 and (d) dimensions 4-6.

Fig. 4. (a) The structure of the extended star graph, with the nodes labeled. (b) Example of an asymmetric extended star graph, with $n = 4$ and $m_j = \{11, 8, 7, 6\}$. Nodes belonging to different chains are colored differently.


then it follows from (7) that $\alpha^{(j)} = 1/\sin\theta^{(j)}$ (up to a common scale factor), and substituting in (8), we get

$n - \sum_{j=0}^{n-1} \frac{\sin(\theta^{(j)} + \varphi)}{\sin\theta^{(j)}} - (2 - 2\cos\varphi) = 0$,   (9)

$n - \sum_{j=0}^{n-1} \left( \cos\varphi + \sin\varphi \cot\theta^{(j)} \right) - 2 + 2\cos\varphi = 0$,   (10)

$\sum_{j=0}^{n-1} \left[ \left(1 - \frac{2}{n}\right)(1 - \cos\varphi) - \sin\varphi \tan(\varphi l_j) \right] = 0$,   (11)

$\sum_{j=0}^{n-1} f(\varphi, l_j) = 0$,   (12)

where $l_j = m_j + 1/2$ and $f(\varphi, l_j) = \left(1 - \frac{2}{n}\right)(1 - \cos\varphi) - \sin\varphi \tan(\varphi l_j)$. We can show that $f(\varphi, l_j)$ is a monotonically decreasing function of $\varphi \in [0, \pi]$, except at points of discontinuity that occur at $\varphi = \pi(2k+1)/(2 l_j)$ for $k = 0, 1, \ldots, m_j - 1$. A proof is provided in Appendix C, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.70823. The sum of monotonically decreasing functions is also monotonically decreasing. The eigenvalue $2 - 2\cos\varphi$ is a monotonically increasing function of $\varphi$ in $[0, \pi]$. Therefore, the smallest eigenvalues correspond to the smallest values of $\varphi$ that satisfy (12). Let $m_0 > m_1 > \cdots > m_{n-1}$. Examining the interval $[0, 2\pi/(2\max(m_j) + 1)]$, we see that if $m_0 > m_1 > \cdots > m_{n-1} > m_0/2$, then there is exactly one solution of (12) in each of the $n-1$ intervals $\left(\pi/(2 l_{j-1}), \pi/(2 l_j)\right]$ for $j = 1, 2, \ldots, n-1$. $\sum_{j=0}^{n-1} f(\varphi, l_j)$ is plotted in Fig. 5. Let the solution in the $k$th interval be $\varphi_k$, and $\lambda_k = 2 - 2\cos\varphi_k$. We have $\varphi_1 < \varphi_2 < \cdots < \varphi_{n-1}$, and thus, $\lambda_1 < \lambda_2 < \cdots < \lambda_{n-1}$. We therefore have

$\frac{\pi}{2 l_{k-1}} < \varphi_k < \frac{\pi}{2 l_k}$,   (13)

and substituting in (6), we get

$\frac{\pi}{2}\left(1 - \frac{l_j}{l_k}\right) < \theta^{(j)} < \frac{\pi}{2}\left(1 - \frac{l_j}{l_{k-1}}\right)$.   (14)

Considering the $\theta^{(j)}_k$ for the $k$th eigenvector and the $j$th chain, we see that

$\theta^{(0)} < \cdots < \theta^{(k-1)} < 0 < \theta^{(k)} < \cdots < \theta^{(n-1)}$ and   (15)

$\sin\theta^{(0)} < \cdots < \sin\theta^{(k-1)} < 0 < \sin\theta^{(k)} < \cdots < \sin\theta^{(n-1)}$.   (16)

We drop the subscript $k$ in the above equations. We see that for the first eigenvector, corresponding to the smallest eigenvalue, $\alpha^{(0)} = 1/\sin\theta^{(0)} < 0$, whereas $\alpha^{(j)} = 1/\sin\theta^{(j)} > 0$ for $j = 1, \ldots, n-1$. Thus, from (4), we see that the eigenvector corresponding to the smallest eigenvalue separates the longest chain from the rest. Similarly, the eigenvector corresponding to the second smallest eigenvalue separates the two longest chains from the rest, and so on. Thus, we are able to discriminate between $n$ chains by using the eigenvectors corresponding to the $n-1$ smallest eigenvalues. Fig. 4b illustrates an example of an asymmetric extended star graph, with $n = 4$. The nodes are plotted using the first $n-1$ eigenvectors. When there are multiple chains of the same length, i.e., $m_{j_1} = m_{j_2} = \cdots = m_{j_q} = m$, there exists an eigenvalue $2 - 2\cos(i\pi/(2m+1))$ for $i = 1, 2, \ldots$, with multiplicity $q-1$, in addition to the eigenvalues described above. The eigenvectors corresponding to the eigenvalue $2 - 2\cos(\pi/(2m+1))$ are of the form

$x_i^{(j)} = \begin{cases} \alpha^{(j)} \sin(i\pi/(2m+1)), & \text{if } j = j_1, \ldots, j_q, \\ 0, & \text{otherwise}, \end{cases}$   (17)

where the $\alpha^{(j)}$ are determined as described in the symmetric case. Therefore, the $q$ chains of length $m$ are discriminated by the $q-1$ eigenvectors corresponding to the eigenvalue $2 - 2\cos(\pi/(2m+1))$ of multiplicity $q-1$. Although we have not explicitly dealt with ring graphs, the eigenvectors of ring graphs have a structure very similar to that of path graphs (see the supplemental material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.70823), and it is straightforward to establish similar results.
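The claims of this subsection can be spot-checked numerically on the asymmetric example of Fig. 4b ($m_j = \{11, 8, 7, 6\}$, $n = 4$). The sketch below (numpy assumed; the helper `star_laplacian` is our own illustrative construction, not code from the paper) recovers $\varphi_1$ from the smallest nonzero Laplacian eigenvalue via $\lambda = 2 - 2\cos\varphi$, checks that it satisfies (12), and checks that the corresponding eigenvector separates the longest chain from the rest.

```python
import numpy as np

def star_laplacian(ms):
    """Laplacian of an extended star graph: len(ms) chains joined at node 0."""
    W = np.zeros((sum(ms) + 1, sum(ms) + 1))
    idx = 1
    for m in ms:
        prev = 0                        # each chain hangs off the common node
        for _ in range(m):
            W[prev, idx] = W[idx, prev] = 1.0
            prev, idx = idx, idx + 1
    return np.diag(W.sum(1)) - W

ms = [11, 8, 7, 6]                      # the asymmetric example of Fig. 4b
n, ls = len(ms), [m + 0.5 for m in ms]  # l_j = m_j + 1/2
vals, vecs = np.linalg.eigh(star_laplacian(ms))

# phi_1 from the smallest nonzero eigenvalue; it should satisfy (12)
phi1 = np.arccos(1 - vals[1] / 2)
f = lambda phi, l: (1 - 2 / n) * (1 - np.cos(phi)) - np.sin(phi) * np.tan(phi * l)
residual = sum(f(phi1, l) for l in ls)

# The first eigenvector separates the longest chain: the tip of chain 0
# takes the opposite sign from the tips of all shorter chains.
tips = np.cumsum(ms)                    # index of the last node of each chain
s = np.sign(vecs[tips, 1])
print(abs(residual) < 1e-6, bool(np.all(s[1:] == -s[0])))
```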

3.2.2 Eigenvectors of Grid Graphs

Let G = (V, E) and H = (W, F) be graphs. Then, G × H is the graph with node set V × W and edge set consisting of the pairs ((v1, w), (v2, w)) and ((v, w1), (v, w2)), where (v1, v2) ∈ E and (w1, w2) ∈ F.

We use the following theorem [32] in our analysis:

Theorem 1. Let G = (V, E) and H = (W, F) be graphs with Laplacian eigenvalues \(\lambda_0, \ldots, \lambda_m\) and \(\mu_0, \ldots, \mu_n\), respectively. Then, for each 0 ≤ i < m and 0 ≤ j < n, the Laplacian of G × H, L(G × H), has an eigenvector z of eigenvalue \(\lambda_i + \mu_j\) such that \(z(v, w) = x_i(v)\,y_j(w)\), where \(x_i(v)\) is the vth element of the ith eigenvector of L(G), and similarly \(y_j(w)\) of L(H).

Let G and H be path graphs of length m and n, respectively, where (k+1)n > m > kn and \(k \in \mathbb{N}\). Then, \(\lambda_i = 2 - 2\cos(i\pi/2m)\) and \(\mu_j = 2 - 2\cos(j\pi/2n)\). G × H is a grid graph with grid dimensions m × n. Clearly, the larger the value of k, the "longer" the object. We then have \(0 = \lambda_0 = \mu_0 < \lambda_1 < \cdots < \lambda_k < \mu_1 < \lambda_{k+1} < \cdots\). Thus, the k smallest nonzero eigenvalues are \(\lambda_1 + \mu_0, \ldots, \lambda_k + \mu_0\), and the corresponding eigenvectors are \(z(v, w) = x_i(v)\,y_0(w) \propto x_i(v)\). Thus, all nodes along the width of the object are mapped to the same point in a k-dimensional eigenspace, and the nodes map to a smooth 1D curve in the eigenspace. We can easily see that the same results hold for 3D grid graphs as well, where m and n are the largest and second largest dimensions. The above result is illustrated in Fig. 6, where

SUNDARESAN AND CHELLAPPA: MODEL-DRIVEN SEGMENTATION OF ARTICULATING HUMANS IN LAPLACIAN EIGENSPACE 5

Fig. 5. The plots correspond to the example in Fig. 4b, with n = 4. The first n−1 intervals and the corresponding solutions \(\varphi_1\), \(\varphi_2\), and \(\varphi_3\) are marked. We note that \(\sum_{j=0}^{n-1} f'(\varphi, l_j) < 0\). (a) \(\sum_{j=0}^{n-1} f(\varphi, l_j)\). (b) \(\sum_{j=0}^{n-1} f'(\varphi, l_j)\).


all nodes along the width of the chain are mapped to the same point. This underlines the property of the Laplacian Eigenmap transformation that chains whose length is greater than their width map to 1D curves in the eigenspace.

Combining the properties shown in Sections 3.2.1 and 3.2.2, if we have n chains (with lengths greater than their widths) to segment, we need to map the nodes to an eigenspace of n−1 dimensions. If we map to an eigenspace of higher dimension, the nodes still retain their 1D structure as long as the chains are sufficiently long, i.e., the ratio of their length to their width is greater than two. The number of eigenvectors that can be used depends on the ratio of the length of the chains to their width: the greater the ratio, the greater the number of eigenvectors that can be used with the chains preserving their 1D structure in the eigenspace.
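Theorem 1 and the ordering argument above are easy to verify numerically. The sketch below is our own illustration (the Kronecker-sum construction of L(G × H) is standard but not spelled out in the paper): for a 13 × 4 grid, the spectrum equals the pairwise sums of the path-graph eigenvalues, and the eigenvector of the smallest nonzero eigenvalue is constant across the width:

```python
import numpy as np

def path_laplacian(n):
    """Unnormalized Laplacian of a path graph on n nodes."""
    W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(W.sum(axis=1)) - W

# L(G x H) = L(G) (x) I + I (x) L(H): the Kronecker-sum construction.
m, n = 13, 4                            # grid 13 x 4, as in Fig. 6
Lg, Lh = path_laplacian(m), path_laplacian(n)
L = np.kron(Lg, np.eye(n)) + np.kron(np.eye(m), Lh)
vals, vecs = np.linalg.eigh(L)

# Theorem 1: the spectrum is all pairwise sums lambda_i + mu_j.
sums = np.sort((np.linalg.eigvalsh(Lg)[:, None] +
                np.linalg.eigvalsh(Lh)[None, :]).ravel())
print(np.allclose(vals, sums))          # True

# The smallest nonzero eigenvalues come from the long axis only, so
# the corresponding eigenvectors are constant across the width:
z = vecs[:, 1].reshape(m, n)            # node (v, w) sits at index v*n + w
print(np.allclose(z, z[:, :1]))         # True: each row is constant
```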

3.3 Comparison with Other Manifold Techniques

We observe the embedding of the data in the example in Fig. 3a obtained from different manifold techniques by using the program provided by Wittman.1 A comparison of the embeddings obtained using techniques such as Laplacian Eigenmaps, Isomap [19], Multidimensional Scaling [30], LLE [28], and Diffusion maps [29] is presented in Fig. 7. We note that, from the point of view of mapping to a 1D curve, the Laplacian Eigenmap and the Diffusion map perform very well. The Diffusion map is a variation of the Laplacian Eigenmap that uses a Gaussian kernel of width \(\sigma\) to construct a weighted graph and normalizes the Laplacian operator. Isomap tries to preserve the geodesic distance between nodes, and this does not lead to the nodes belonging to "thick" chains being mapped onto a 1D curve, as in the case of the Laplacian Eigenmap. This is also illustrated for the grid graph in Fig. 6. We specifically compare Isomap to the Laplacian Eigenmap in Fig. 8. The objective is to segment the nodes according to the chains to which they belong. We note that the Laplacian Eigenmap does a much better job of mapping the nodes from the three chains to 1D curves than Isomap. We also note that the Isomap embedding (Fig. 8c) has little structure in higher dimensions, unlike the Laplacian Eigenmap (Fig. 8e). We consider nodes in chain 2 (Fig. 8a). The position \(t_i\) of node i along the chain in this example happens to be the y-coordinate of the node and is marked in the figure. Let node i map to \(y^L_i\) in the LE and \(y^I_i\) in the Isomap. We compute 1D splines \(f^L\) and \(f^I\) to fit the nodes in the LE and Isomap cases, minimizing \(\sum_i e^L_i\) and \(\sum_i e^I_i\), respectively, where \(e^L_i = \|y^L_i - f^L(t_i)\|\) and \(e^I_i = \|y^I_i - f^I(t_i)\|\). \(e^L_i\) and \(e^I_i\) are plotted in Fig. 8f. We see that on the right side of the dashed line, \(e^L(t) \ll e^I(t)\). However, on the left-hand side of the dashed line, i.e., as the nodes approach the junction, \(e^L\) starts increasing rapidly. This suggests that a 1D spline is an excellent choice to fit nodes belonging to a chain in the LE and that the spline fit error is a very good indicator of proximity to a junction. In the case of the Isomap, on the other hand, the spline fit error is more or less constant along the length of the chain and is not negligible compared to the length of the spline. Table 2 summarizes the ratio of the spline fit error (MSE) to the spline length L for both cases. We also present the mapping of the voxel data of a real human subject in the Isomap and the LE in Fig. 9. We again note that the different chains form 1D curves in the LE and can be modeled by 1D splines, unlike in the Isomap case.

The LE transformation has the following additional advantages: The neighborhood matrix W and the Laplacian L are easily computed, as the nodes lie on a grid. The transformation is global in nature. Since the 1D nature of the nodes is retained in higher dimensions, we can map to a higher dimensional space than strictly necessary. For example, if we wish to segment n chains, all of whose lengths are at least twice their widths, we can map to an LE whose dimension is between n−1 and 2n.
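To make the transformation concrete, a minimal sketch of the mapping into the LE is given below. It is our own illustration under stated assumptions (dense matrices, 26-connected voxel neighbors, the unnormalized Laplacian, and min-max normalization); the paper does not prescribe a particular implementation:

```python
import numpy as np

def laplacian_eigenmap(voxels, dim=6):
    """Embed voxel grid coordinates (N x 3) in the Laplacian Eigenspace:
    build the 26-neighborhood graph W, form the unnormalized Laplacian
    L = D - W, and use the eigenvectors of the dim smallest nonzero
    eigenvalues, normalized so the embedding lies in [0, 1]^dim."""
    V = np.asarray(voxels, dtype=float)
    # neighbors = Chebyshev distance 1 on the grid (26-connectivity)
    cheb = np.abs(V[:, None, :] - V[None, :, :]).max(axis=2)
    W = (cheb == 1).astype(float)
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:dim + 1]              # drop the constant eigenvector
    return (Y - Y.min(axis=0)) / (Y.max(axis=0) - Y.min(axis=0))

# Toy "body": three voxel chains meeting at a junction (a Y shape).
vox = ([(x, 0, 0) for x in range(-8, 1)] +
       [(x, x, 0) for x in range(1, 7)] +
       [(x, -x, 0) for x in range(1, 7)])
Y = laplacian_eigenmap(vox, dim=2)
print(Y.shape)                          # (21, 2)
```

For real voxel volumes, the dense pairwise comparison would be replaced by sparse grid adjacency, which is what makes W and L cheap to compute in practice.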

4 SEGMENTATION USING SPLINES

We model the human body as being composed of several articulated chains connected at joints. Each articulated chain may consist of one or more rigid segments connected in a chain and forms a smooth 1D curve in the LE because its length is greater than its thickness. We segment the voxels into these different chains by exploiting the structure and mutual orientation of the 1D curves that they form in the LE. Since the transformation is based on neighborhood relations between voxels in the normal 3D space, it is not much affected by articulation at the joints. However, at junctions where three or more such chains meet, the 1D curves representing the different chains diverge in different directions (e.g., at the neck joint, where the head, two arms, and trunk meet). We can fit a 1D spline in the LE and use the spline fit error at a given node as an indicator of its proximity to a junction. The spline fit process also enables us to obtain the position of the nodes along their respective 1D curves or articulated chains. All operations described in the following are performed in the LE. We describe the segmentation algorithm using a real example. Fig. 10 illustrates the voxel representation of a subject in a pose where there is self-contact between the palm and the hip.

4.1 Initialization

We can classify the articulated chains into two types according to whether they are connected at one end (type 1) or both ends (type 2) to other chains. In the example in Fig. 11a, the two legs, the head, and one of the arms are of type 1, i.e., one end of the chain is free, whereas the left arm and the trunk are of type 2, i.e., both ends are attached to other chains. For type 1 chains, we note that the node at the free end is farthest from the other chains. However, for type 2 chains, the node that is farthest from the other chains lies in the middle of the chain. In order to

6 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 10, OCTOBER 2008

Fig. 6. Example of a grid graph with length (13) three times greater than the width (4). The first three eigenvectors map nodes along the width onto the same point, and the corresponding structure in the 3D eigenspace is perfectly 1D. (a) Graph. (b) LE. (c) Isomap.

1. Available at http://www.math.umn.edu/~wittman/mani/.


initialize the spline, we begin with the node that is farthest from all existing chains, which we call \(y_0\). To begin with, in the absence of existing splines, we select the node that is farthest from the origin, denoted by the red asterisk in Fig. 11. The starting nodes for the second, third, and fourth splines are denoted by the green, blue, and magenta asterisks.

We obtain a set of nodes that are closest to the initial node, as shown in Figs. 12a and 12b and Figs. 13a and 13b for the type 1 and type 2 cases, respectively. We can then determine whether the initial node \(y_0\) lies at the beginning of the curve or at the free end of the curve by determining the number of lines in space that can be fit to the cluster of nodes. We find the \(N_0\) closest (in euclidean distance) nodes \(y_1, \ldots, y_{N_0}\) to the "pivot" node \(y_0\) and perform PCA on \(y_i - y_0\) to find the two biggest principal components \(u^{(a)}\) and \(u^{(b)}\). The \(N_0\) closest nodes are marked blue in Figs. 12a and 12b, whereas the first two principal components of the \(N_0\) nodes are plotted in Fig. 12c. We find the principal directions (lines), which are linear functions of the two principal components. In the type 1 case (Fig. 12c), there is only one direction, and we grow a single spline. In the type 2 case (Fig. 13c), there are two principal directions (because we start in the middle), and we grow separate splines in each direction.
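The initialization step above can be sketched as follows. This is a simplified proxy of our own: the `count_growth_directions` helper and its one-sided sign test are assumptions of the sketch, whereas the paper fits lines in the span of the two principal components:

```python
import numpy as np

def count_growth_directions(Y, pivot, n0=10, tol=0.15):
    """Decide whether a pivot node is at a free end (type 1: grow one
    spline) or mid-curve (type 2: grow a spline in each direction).
    Take the n0 nearest nodes, compute the principal direction of their
    offsets from the pivot, and check whether the offsets project onto
    both sides of the pivot."""
    d = np.linalg.norm(Y - Y[pivot], axis=1)
    nbrs = np.argsort(d)[1:n0 + 1]        # skip the pivot itself
    X = Y[nbrs] - Y[pivot]
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    t = X @ Vt[0]                         # projections on the first PC
    spread = np.abs(t).max()
    two_sided = t.min() < -tol * spread and t.max() > tol * spread
    return 2 if two_sided else 1

# Nodes along a 1D curve in a toy 2D eigenspace.
t = np.linspace(0.0, 1.0, 40)
Y = np.c_[t, 0.3 * t ** 2]
print(count_growth_directions(Y, 0))      # 1: pivot at a free end
print(count_growth_directions(Y, 20))     # 2: pivot in the middle
```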

4.2 Spline Fitting

We illustrate the spline-fitting procedure in Fig. 14. Given a set of nodes and the principal axis as computed in the previous section, we project each node \(y_i\) onto the principal axis to obtain its site value \(t_i\). The cluster of nodes and the principal axes are plotted in Fig. 14. The nodes, which are 6D vectors, are plotted against their site parameters t in Fig. 14c. A 6D spline \(f_{\mathrm{EIG}}\) can be computed to minimize the error given by \(\sum_i \|f_{\mathrm{EIG}}(t_i) - y_i\|^2\). The spline used is a cubic spline with two continuous derivatives and is computed using the Matlab Spline Toolbox function spap2.
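The fitting step can be sketched as below. The `fit_curve` helper is our own, and a cubic polynomial per dimension stands in for the least-squares cubic B-spline that spap2 computes; the site values t are taken as given:

```python
import numpy as np

def fit_curve(t, Y, deg=3):
    """Least-squares fit of a smooth curve f(t) to d-dimensional nodes
    Y (N x d) with site values t, returning the curve and the per-node
    fit error (the junction indicator used in Section 4.4)."""
    coeffs = [np.polyfit(t, Y[:, j], deg) for j in range(Y.shape[1])]
    f = lambda s: np.stack([np.polyval(c, s) for c in coeffs], axis=-1)
    err = np.linalg.norm(f(t) - Y, axis=1)   # spline fit error per node
    return f, err

# Nodes on a smooth 1D curve in a toy 3D eigenspace, with mild noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
Y = np.c_[t, np.sin(t), np.cos(t)] + 0.001 * rng.standard_normal((50, 3))
f, err = fit_curve(t, Y)
print(err.max() < 0.01)     # True: nodes on a chain are well modeled
```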

4.3 Spline Propagation

We propagate the spline by adding nodes that are closest to the growing end of the spline (e.g., the blue nodes in Fig. 14). The principal axis used to compute the site value is recomputed locally by using the additional nodes. When the angle between the recomputed principal axis and the previous principal axis exceeds \(\theta_{\max}\), a new principal axis and a corresponding pivot point are computed. This allows the principal axis to adapt to the curvature of the voxels. The black vertical lines in Fig. 14c denote the boundaries of the different principal axes. \(\theta_{\max}\) is an adaptive threshold computed as \(\theta_{\max} = \theta_{\mathrm{MAX}}\, e_{\mathrm{old}}/(e_{\mathrm{old}} + 5 e_{\mathrm{new}})\), where \(\theta_{\mathrm{MAX}} = 15\) degrees. \(e_{\mathrm{old}}\) is the spline fit error if the old principal axis is used, and \(e_{\mathrm{new}}\) is the spline fit error if the new principal axis is used. When the curvature of the spline is high, \(e_{\mathrm{new}} \ll e_{\mathrm{old}}\), and \(\theta_{\max} \approx \theta_{\mathrm{MAX}} = 15\) degrees. However, when


Fig. 7. Comparison of manifold techniques: the nodes in the first six dimensions of the embedding space using different techniques. The nodes correspond to the example in Fig. 3. (a) Laplacian. (b) Isomap. (c) MDS. (d) LLE. (e) Diffusion map.

Fig. 8. Isomap versus Laplacian Eigenmap. The Isomap embedding is compared to the Laplacian embedding. The 1D structure is not retained in higher dimensions in the case of Isomap, as it tries to preserve geodesic distances. (a) Nodes on the grid. (b) Isomap eigenvectors 1-3. (c) Isomap eigenvectors 4-6. (d) Laplacian eigenvectors 1-3. (e) Laplacian eigenvectors 4-6. (f) Error.

TABLE 2. Comparison of the Laplacian Eigenmap and Isomap. L is the spline length, whereas MSE is the mean squared error.


the nodes diverge (e.g., at a junction), \(e_{\mathrm{new}} \approx e_{\mathrm{old}}\), and \(\theta_{\max} \approx \theta_{\mathrm{MAX}}/6 = 2.5\) degrees. The maximum angle between adjacent principal axes is therefore 15 degrees when the curve is strongly 1D and 2.5 degrees when it is not.

4.4 Termination

A node is considered an outlier if the spline fit error of that node exceeds a fixed threshold \(C L \sqrt{d}\), where C = 0.005, L is the length of the average spline in the LE (which is set to 1, as we have normalized the LE such that \(y_i \in [0, 1]^6\)), and d is the dimension of the LE. In other words, a node is an outlier if it does not lie close to the computed 1D spline in the LE. The number of outliers increases rapidly at a junction because the nodes diverge in widely different directions. When the number of outlier nodes is greater than 5, we stop growing the spline. We show in Figs. 15b, 15c, and 15d the successful segmentation of the voxels into different articulated chains, although there is contact between the arms and the body.
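The adaptive threshold of Section 4.3 and the outlier test above translate directly into code (the helper names below are ours):

```python
import numpy as np

def theta_max(e_old, e_new, theta_MAX=15.0):
    """Adaptive cap (degrees) on the angle between consecutive local
    principal axes: ~15 degrees while the cluster is strongly 1D
    (e_new << e_old), ~2.5 degrees when nodes diverge (e_new ~ e_old)."""
    return theta_MAX * e_old / (e_old + 5.0 * e_new)

def is_outlier(err, d=6, C=0.005, L=1.0):
    """A node is an outlier if its spline fit error exceeds C*L*sqrt(d)
    (L = 1 after normalizing the LE to [0, 1]^d). Spline growth stops
    once more than 5 nodes fail this test."""
    return err > C * L * np.sqrt(d)

print(round(theta_max(1.0, 0.01), 2))    # 14.29: the axis is free to turn
print(round(theta_max(1.0, 1.0), 2))     # 2.5: junction, the axis is locked
errs = np.array([0.001, 0.002, 0.02, 0.03])
print(is_outlier(errs).sum())            # 2 (threshold is ~0.0122)
```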

4.5 Constructing the Body Graph

The spline-fitting procedure is stopped when six splines have been discovered. Each node (\(v_i\) in the normal 3D space and \(y_i\) in the LE) belonging to a spline has a site value \(t_i\). This value denotes the position of the node along the 1D curve and can be used to compute the curve skeleton in Fig. 15e. We compute a 3D smoothing spline with the set of nodes \((t_i, v_i)\) in the normal space. The spline \(f_{\mathrm{LIMB}}\) seeks to minimize the error given by \(\sum_i \|f_{\mathrm{LIMB}}(t_i) - v_i\|^2\). We thus compute the curve skeleton for each of the splines in the normal space. Type 1 chains contain a single spline. Type 2 chains contain two splines, which are merged together to form a single spline. We now have a set of splines, and we construct a graph to describe the connections between the ends of the splines in the LE (see Fig. 15f).

5 PROBABILISTIC REGISTRATION

We use a top-down approach, using our knowledge of the human body to perform the registration of the different chains that were segmented using the bottom-up approach. The objective is to identify the segmented chains and resolve possible ambiguities such as those in Fig. 15f so that we can obtain the joint connections shown in Fig. 15g. We obtained six splines in the previous section, and we denote the ith spline as \(s_i\), with nodes at each end (\(\mathrm{node}^0_i\) and \(\mathrm{node}^1_i\)). Type 1 splines have a connected node at one end, and type 2 splines have connected nodes at both ends. For each connected node, we compute the euclidean distance in the LE to the closest node, \(d(\mathrm{node}^k_i, \mathrm{node}^l_j)\), and compute the mean of this distance over all connected nodes, \(d_{\mathrm{mean}}\). We can then assign a probability to the "joint" between connected \(\mathrm{node}^k_i\) and \(\mathrm{node}^l_j\) as

\[ \Pr\bigl[\mathrm{node}^k_i \text{ connected to } \mathrm{node}^l_j\bigr] = e^{-d(\mathrm{node}^k_i,\, \mathrm{node}^l_j)/d_{\mathrm{mean}}}. \qquad (18) \]

We note that some of these "joints" are true joints, but some of them are pseudojoints caused by contact between body parts, e.g., between the hip and the left palm in Fig. 15.

We wish to register the \(s_i\) to the known body chains \(b_i\) in Fig. 2a. We denote a possible registration as a permutation \((j_1, j_2, \ldots, j_6)\) of \((1, 2, \ldots, 6)\), which indicates that \(s_{j_i} = b_i\). The probability of the registration, as given in (19), is the product (20) of the probability of each chain match being correct, \(\Pr[b_i = s_{j_i}]\), and the probability of the connections between the appropriate nodes, \(\Pr[\mathrm{conn.}]\):

\[ \Pr[(j_1, \ldots, j_6)] = \Pr[b_1 = s_{j_1}, \ldots, b_6 = s_{j_6}] \qquad (19) \]

\[ = \Pr[\mathrm{conn.}] \left( \prod_{i=1}^{6} \Pr[b_i = s_{j_i}] \right). \qquad (20) \]


Fig. 10. Voxels in the normal space and the LE. (a) Normal. (b) Eigenvectors 1-3. (c) Eigenvectors 4-6.

Fig. 11. Voxels in the normal space and the LE. The red, green, blue, and magenta asterisks denote the starting nodes for the first, second, third, and fourth splines. (a) Normal. (b) Eigenvectors 1-3. (c) Eigenvectors 4-6.

Fig. 9. Comparison of Laplacian Eigenmaps and Isomaps for a real example of the human-subject voxel reconstruction. (a) Normal. (b) Isomap: dimensions 1-3. (c) Dimensions 4-6. (d) Laplacian Eigenmap: dimensions 1-3. (e) Dimensions 4-6.


The length and the thickness (or "girth") of each chain are obtained using the computed spline function \(f_{\mathrm{LIMB}}\). The length of the chain is the length of the spline. The "girth" of the chain is the mean of the spline reconstruction error \(\|f_{\mathrm{LIMB}}(t_i) - v_i\|^2\). Let \(l_k\) and \(g_k\) be the length and "girth" of the kth chain. We normalize them by the respective maximum values, i.e., \(l_k = l_k/\max\{l_k\}\) and \(g_k = g_k/\max\{g_k\}\), and sort them so that \((l^S_1, \ldots, l^S_6) = \mathrm{sort}(\{l_1, \ldots, l_6\})\) and \((g^S_1, \ldots, g^S_6) = \mathrm{sort}(\{g_1, \ldots, g_6\})\). Note that \(b_1\) and \(b_2\) are the trunk and head, \(b_3\) and \(b_4\) are the arms, and \(b_5\) and \(b_6\) are the legs. The probability \(\Pr_{i,k} = \Pr[b_i = s_k]\) is computed as

\[ \Pr{}_{i,k} \propto \begin{cases} e^{-5|g_k - g_{\mathrm{TRUNK}}|}, & i = 1, \\ e^{-5|l_k - l_{\mathrm{HEAD}}|}, & i = 2, \\ e^{-2|l_k - l_{\mathrm{ARM}}| - 2|g_k - g_{\mathrm{ARM}}|}, & i = 3, 4, \\ e^{-2|l_k - l_{\mathrm{LEG}}| - 2|g_k - g_{\mathrm{LEG}}|}, & i = 5, 6, \end{cases} \qquad (21) \]

where \(g_{\mathrm{TRUNK}} = g^S_6\), \(l_{\mathrm{HEAD}} = l^S_1\), \(l_{\mathrm{ARM}} = (l^S_3 + l^S_4)/2\), \(g_{\mathrm{ARM}} = (g^S_1 + g^S_2)/2\), \(l_{\mathrm{LEG}} = (l^S_5 + l^S_6)/2\), and \(g_{\mathrm{LEG}} = (g^S_3 + g^S_4)/2\). The

key idea here is that we expect \(l_{\mathrm{HEAD}} < l_{\mathrm{TRUNK}} < l_{\mathrm{ARM}} < l_{\mathrm{LEG}}\) and \(g_{\mathrm{ARM}} < g_{\mathrm{LEG}} < g_{\mathrm{HEAD}} < g_{\mathrm{TRUNK}}\). We therefore set \(g_{\mathrm{ARM}} = (g^S_1 + g^S_2)/2\), i.e., the mean of the two smallest values of "girth." The other values are set correspondingly.

In fact, for most poses (where there is no self-contact), the only chain that has a nonzero probability of connections at both nodes is the trunk, and therefore, the number of permutations is greatly reduced. For the example in Fig. 15f, the blue, red, and black chains have an equal probability of being identified as the trunk based on the connections alone. The properties of the individual chains help discriminate between the trunk and the arms. The chains are labeled according to the registration with the highest probability. If the probability of the best registration is too low, the frame is discarded as unsuitable for use in model estimation.
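The registration search can be sketched as a brute-force maximization over permutations. This is our own illustration: the connectivity factor Pr[conn.] is assumed uniform and omitted, and the chain statistics below are synthetic:

```python
import numpy as np
from itertools import permutations

def chain_probs(l, g):
    """Per-chain match probabilities of (21). l, g: lengths and girths
    of the six segmented chains, normalized by their maxima. b1 = trunk,
    b2 = head, b3, b4 = arms, b5, b6 = legs; reference values are the
    sorted statistics used in the paper."""
    lS, gS = np.sort(l), np.sort(g)
    gT, lH = gS[5], lS[0]
    lA, gA = (lS[2] + lS[3]) / 2, (gS[0] + gS[1]) / 2
    lL, gL = (lS[4] + lS[5]) / 2, (gS[2] + gS[3]) / 2
    P = np.empty((6, 6))
    for k in range(6):
        P[0, k] = np.exp(-5 * abs(g[k] - gT))
        P[1, k] = np.exp(-5 * abs(l[k] - lH))
        P[2, k] = P[3, k] = np.exp(-2 * abs(l[k] - lA) - 2 * abs(g[k] - gA))
        P[4, k] = P[5, k] = np.exp(-2 * abs(l[k] - lL) - 2 * abs(g[k] - gL))
    return P

def register(l, g):
    """Maximize the product in (20) over permutations (j1, ..., j6);
    Pr[conn.] is taken as uniform in this sketch."""
    P = chain_probs(l, g)
    return max(permutations(range(6)),
               key=lambda js: np.prod([P[i, js[i]] for i in range(6)]))

# Synthetic chains listed as trunk, head, arm, arm, leg, leg:
l = np.array([0.55, 0.30, 0.80, 0.82, 1.00, 0.98])   # normalized lengths
g = np.array([1.00, 0.60, 0.25, 0.27, 0.40, 0.42])   # normalized girths
print(register(l, g))
```

In practice, the connectivity constraint prunes most of the 720 permutations before the per-chain probabilities are ever evaluated.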

6 HUMAN BODY MODEL ESTIMATION

Our objective is to estimate the human body model and pose from the segmented and registered voxels obtained in the previous section. The two sets of parameters of interest are the pose parameters (joint angles) and the body structure (joint locations and superquadric parameters). The skeleton curve (Fig. 15f) can be computed as described in the previous section. The position of each voxel along the articulated chain is used at various stages of the algorithm and is important. There are several models that can be used to represent human bodies [23], and we select the superquadric model described in [33]. In this section, we describe the algorithm to estimate the human body model parameters and the pose. We use a hierarchical approach, beginning with a skeletal model (joint locations and limb lengths in Fig. 16a) and then proceeding to increase the model complexity and refine the parameters to obtain a volumetric model (superquadric parameters in Fig. 16c). The joint locations cannot be reliably estimated from a single frame or pose. We therefore compute the skeleton curve of the subject from a set of keyframes where registration is successful. These keyframes are spread apart temporally so that a set of distinct poses is obtained.

The stature (or height) of the subject is a key parameter that is strongly related to a number of human body model parameters such as the lengths of the long bones in the body [34]. Anthropometric studies have been performed on


Fig. 12. Fitting lines to nodes (type 1). (a) Eigenvectors 1-3. (b) Eigenvectors 4-6. (c) Line fit.

Fig. 13. Fitting lines to nodes (type 2). (a) Eigenvectors 1-3. (b) Eigenvectors 4-6. (c) Line fit.

Fig. 14. Growing a spline: adding nodes on the growing end. (a) LE dimensions 1-3. (b) LE dimensions 4-6. (c) Six-dimensional nodes versus t.

Fig. 15. Segmentation and registration in the LE. (a) Normal space. (b) Nodes segmented in the LE, eigenvectors 1-3. (c) Eigenvectors 4-6. (d) The segmented voxels represented in the original 3D space. (e) The computed skeleton. (f) The computed skeleton with the two main joints and (g) where they are connected.


certain demographic groups to study the relationship between stature and the long bones in the body [35], [36]. These studies indicate that we can estimate the lengths of the large bones of an average human subject from the stature. We can construct a skeleton model for the average subject as a function of the stature by scaling the limb lengths and the joint locations by the ratio of the stature of the subject to the stature of the average human. We introduce the skeleton model and the superquadric model in Section 6.1 and describe the distance between a skeleton curve and a skeleton model. In the first step (Section 6.2), we find the optimal stature for the subject by using the skeleton model. In the second step (Section 6.3), we optimize for the joint locations based on the skeleton model, and in the third step (Section 6.4), we estimate and optimize for the superquadric parameters by using the full superquadric model. Nonlinear optimizations are performed using the Matlab Optimization Toolbox.

6.1 Distance between Skeleton Curve and Skeleton Model

The skeleton model and the superquadric model are illustrated in Figs. 16b and 16c, respectively. In order to estimate how well a skeleton model fits a skeleton curve, we need to compute some kind of "distance" between the skeleton curve and the skeleton model. Since the skeleton curves are all registered, we can compute the distance between each skeleton curve and its corresponding skeleton model independently. The computation of this "distance" is described in the following paragraph.

Consider a set of ordered points \(x_1, x_2, \ldots, x_n\) on a skeleton curve corresponding, e.g., to the arm (see Fig. 17). The corresponding skeleton model for the arm consists of three line segments: \(L_1\), \(L_2\), and \(L_3\). We compute the distance \(e^j_i\) between \(x_i\) and the closest point on line segment \(L_j\). We need to assign each point to a line segment. Since the set of points on the skeleton curve is ordered, we impose the constraint that the assignment is performed in a monotonic manner. That is, points \(x_1, \ldots, x_{n_1}\) are assigned to \(L_1\), points \(x_{n_1+1}, \ldots, x_{n_2}\) are assigned to \(L_2\), and points \(x_{n_2+1}, \ldots, x_n\) are assigned to \(L_3\). For a given value of \(n_1\), \(n_2\) is chosen so that the distance between points \(x_{n_1}\) and \(x_{n_2}\) is approximately equal to the length of the line segment \(L_2\). For the above assignment, the distance between the skeleton curve and the skeleton model is given by the vector \((e^1_1 \cdots e^1_{n_1}\; e^2_{n_1+1} \cdots e^2_{n_2}\; e^3_{n_2+1} \cdots e^3_n)'\). \(n_1\) and \(n_2\) are chosen so as to minimize the sum of the elements of this vector.
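The monotone assignment above can be sketched as follows. This is our own illustration: the breakpoints are found by exhaustive search rather than by the paper's shortcut of deriving n2 from n1 via the length of L2:

```python
import numpy as np

def seg_dist(p, a, b):
    """Distance from point p to the line segment from a to b."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def assign_monotone(points, segments):
    """Monotonically assign ordered skeleton-curve points to three
    consecutive model line segments, minimizing the summed point-to-
    segment distances over the breakpoints (n1, n2)."""
    n = len(points)
    D = np.array([[seg_dist(p, a, b) for (a, b) in segments]
                  for p in points])
    best = None
    for n1 in range(1, n - 1):
        for n2 in range(n1 + 1, n):
            cost = D[:n1, 0].sum() + D[n1:n2, 1].sum() + D[n2:, 2].sum()
            if best is None or cost < best[0]:
                best = (cost, n1, n2)
    return best

# A bent "arm": points lying exactly on three straight pieces.
pts = np.array([[x, 0.0] for x in range(5)] +
               [[4.0, y] for y in range(1, 5)] +
               [[4.0 + x, 4.0] for x in range(1, 4)])
segs = [(np.array([0.0, 0.0]), np.array([4.0, 0.0])),
        (np.array([4.0, 0.0]), np.array([4.0, 4.0])),
        (np.array([4.0, 4.0]), np.array([7.0, 4.0]))]
cost, n1, n2 = assign_monotone(pts, segs)
print(round(cost, 6))    # 0.0: a perfect monotonic assignment exists
```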

6.2 Estimation of Skeleton Body Model Parameters from Stature

Given the skeleton model M, the pose \(\Phi\) is computed in two steps: First, the pose of the trunk is determined, and second, the pose of the remaining articulated chains is computed. The pose parameter \(\Phi\) is given by the pose of the trunk \(\Phi_T\) and the poses of the different chains with respect to the trunk. The z-axis of the trunk is aligned with the skeleton curve of the trunk. The y-axis of the trunk is in the direction of the line joining the right pelvic joint to the left pelvic joint. The x-axis points in the forward direction. This direction is estimated using the direction of the feet and is orthogonal to the computed yz-plane. Once the trunk pose has been estimated, the joint locations at the hips, shoulders, and neck are fixed. It is then possible to independently estimate the pose of each of the articulated chains. The objective is to compute the pose of the skeleton model so that the distance between the points on the skeleton curve and the skeleton model is minimized.

We can compute the skeleton model from the stature parameter alone. In the first iteration of the algorithm, we optimize the stature of the subject to minimize the skeleton fit error. The initial estimate of the stature of the subject can be obtained from the lengths of the various limb segments by using the relationship between the long bones and the stature. The skeleton fit error versus the stature for a synthetic sequence (Section 7.4) is presented in Fig. 20. The initial stature estimate was 2,168 mm, whereas the correct stature is 2,100 mm. It should be emphasized that the computation of the stature is in the context of its relationship to the body structure of the subject, assuming that the subject resembles a normal human. The initial skeleton segments are superimposed on the computed skeleton curves in Fig. 18. The optimization results in an optimal value of the stature, which we can use to


Fig. 16. Human body models.

Fig. 17. Computing the distance between the skeleton curve and the skeleton model. (a) Sample points on the skeleton curve. (b) Distance to the closest point on the skeleton model before optimization. (c) After optimization.


compute the initial pose estimate \(\Phi^0\) and the initial skeletal model estimate \(M^0_{\mathrm{skel}}\).

6.3 Optimization of Joint Locations

In the second step, \(\Phi\) is optimized keeping \(M_{\mathrm{skel}}\) fixed, and then \(M_{\mathrm{skel}}\) is optimized while keeping \(\Phi\) fixed. The optimization is performed so that the model and pose parameters lie in a bounded region centered at \(M^0_{\mathrm{skel}}\) and \(\Phi^0_T\), respectively. The difference from the previous section is that the skeleton model parameters are varied individually rather than through the stature parameter. The skeleton model superimposed on the curve skeleton after the joint locations have been optimized is presented in Fig. 19.

6.4 Estimation of Superquadric Parameters

The superquadric parameters for the trunk, head, arms, forearms, thighs, and legs are estimated from the voxels, as these body segments are large enough to be estimated at the resolution used in our experiments. At this stage, we know which body segment on the skeleton model each point on the skeleton curve is closest to. Since we also know the location of each voxel along the skeleton curve, we can associate each voxel with a body segment. Each articulated chain can therefore be segmented into its component rigid segments. Using the estimated joint angles, the orientation of the coordinate frame attached to each component segment can also be computed. For a given body segment, the pose is normalized using the body coordinate frame so that the body segment is positioned at the origin and aligned with the z-axis, as in Fig. 21. The tapered superquadric [33] described by (22) is illustrated in Fig. 21, with its main parameters marked. In particular, the s parameter denotes the taper of the superquadric, and s = 0 means that the xy cross section is uniform:

\[ \left(\frac{x}{x_0}\right)^2 + \left(\frac{y}{y_0}\right)^2 = 1 + s\,\frac{z}{z_0}\left(1 - \left(1 - \frac{2z}{z_0}\right)^d\right), \quad 0 \le z \le z_0. \qquad (22) \]

We compute the area \(A_z\) of the cross section of the voxels (in the plane parallel to the xy-plane) at regularly spaced points along the z-axis. We assume that the cross section is a disc, and we find the radius r from the area by using the relation \(A = \pi r^2\). An ellipse of equal area would be such that \(r^2 = x_0 y_0\). We compute the radius at different points along the z-axis as \(r_z = \sqrt{A_z/\pi}\), which we refer to as the radial profile in Fig. 22. The radial profile is computed in all the keyframes for each body segment. The median radial profiles for some of the body segments are presented in Fig. 22. The length, radius, and scale parameters of the body segment are computed from the median radial profile. We set \(x_0 = y_0 = r\) in (22) for all body segments except the trunk and head, for which we determine the \(x_0\) and \(y_0\) parameters of the superquadric in the following manner. We obtain the xy-histogram, i.e., \(I(x, y)\), a function whose value at \((x_i, y_i)\) is given by the number of voxels that have x and y coordinates given by \(x_i\) and \(y_i\), respectively. \(S_{a,b} = \{(x, y) : (x/a)^2 + (y/b)^2 < 1\}\) is the set of all points that lie inside the ellipse with parameters a and b. We find the values of \(x_0\) and \(y_0\) that satisfy the constraint \(x_0 y_0 = r^2\) and maximize the function \(\sum_{(x,y) \in S_{x_0, y_0}} I(x, y)\).
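The x0, y0 search reduces to a one-dimensional scan over the aspect ratio x0/y0, since the constraint x0 y0 = r² leaves a single degree of freedom. The sketch below is our own illustration; raw xy samples stand in for the histogram I(x, y):

```python
import numpy as np

def fit_ellipse_axes(xy, r, ratios=np.linspace(0.4, 2.5, 43)):
    """Choose cross-section semi-axes (x0, y0) subject to x0*y0 = r^2
    by maximizing the number of samples falling inside the ellipse
    (x/x0)^2 + (y/y0)^2 < 1 over candidate aspect ratios x0/y0."""
    best = None
    for rho in ratios:
        x0, y0 = r * np.sqrt(rho), r / np.sqrt(rho)
        inside = ((xy[:, 0] / x0) ** 2 + (xy[:, 1] / y0) ** 2 < 1).sum()
        if best is None or inside > best[0]:
            best = (inside, x0, y0)
    return best[1], best[2]

# Synthetic trunk cross section: samples filling an ellipse with
# semi-axes 2.0 and 1.0, so r = sqrt(x0 * y0) = sqrt(2).
rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, (4000, 2))
u = u[(u ** 2).sum(axis=1) < 1]          # uniform samples on the unit disc
xy = u * np.array([2.0, 1.0])            # stretch into the ellipse
x0, y0 = fit_ellipse_axes(xy, np.sqrt(2.0))
print(round(x0 / y0, 1))                 # 2.0: the true aspect ratio
```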


Fig. 18. Fit of skeleton model after the stature has been optimized.

Fig. 19. Fit of skeleton model after the joint locations have been optimized.

Fig. 20. Fit error versus stature.

Fig. 21. A tapered superquadric.


We then refine the pose by using the superquadric body segments and the voxels directly. The objective is to obtain the pose that maximizes the overlap between the superquadric model and the voxels. The pose is refined by bounded optimization of the pose parameter to minimize the "distance" between the voxels and the superquadric model. This "distance" depends on the position of each voxel with respect to the closest superquadric: it is \(e^0 = 1\) if the voxel is on the surface, \(e^{-1}\) if it is on the axis of the superquadric, and it increases exponentially as the voxel moves farther from the surface. The distance vector is \(e = [e_1, e_2, \ldots, e_N]^\top\), where \(e_i = \min(e^{(1)}_i, e^{(2)}_i, \ldots, e^{(J)}_i)\), and \(e^{(j)}_i\), the distance of the ith voxel with respect to the jth body segment, is given by

\[ e^{(j)}_i = \begin{cases} \exp\left(r^j_i - q^j_i\right), & \text{if } 0 \le z^j_i \le z^j_0, \\ \exp\left(r^j_i + p^j_i\right), & \text{otherwise,} \end{cases} \qquad (23) \]

where

\[ p^j_i = \min\left(\left|z^j_0 - z^j_i\right|, \left|z^j_i\right|\right), \qquad (24) \]

\[ r^j_i = \sqrt{\left(\frac{x^j_i}{x^j_0}\right)^2 + \left(\frac{y^j_i}{y^j_0}\right)^2}, \quad\text{and} \qquad (25) \]

\[ q^j_i = \sqrt{1 + s^j\,\frac{z^j_i}{z^j_0}\left(1 - \left(1 - \frac{2 z^j_i}{z^j_0}\right)^d\right)}. \qquad (26) \]

\((x^j_i, y^j_i, z^j_i)\) are the voxel coordinates in the coordinate system of the jth body segment, and \((x^j_0, y^j_0, z^j_0, s^j, d^j)\) are the superquadric parameters of the jth body segment. Although the distance function appears complicated, it is merely a measure of how close the voxel is to the central axis of the superquadric. The refined pose is the pose that minimizes \(\|e\|\).
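The distance measure (23)-(26) translates directly into code. The sketch below is our own; the dict-based parameter passing is an assumption, and the taper exponent d is taken to be a small integer:

```python
import numpy as np

def voxel_distance(p, sq):
    """Distance (23)-(26) between one voxel and one tapered superquadric.
    p = (x, y, z) in the body segment's coordinate frame; sq holds the
    superquadric parameters x0, y0, z0, s, d. The value is 1 on the
    surface, e^-1 on the axis, and grows exponentially away from the
    surface."""
    x, y, z = p
    r = np.sqrt((x / sq['x0']) ** 2 + (y / sq['y0']) ** 2)        # (25)
    if 0 <= z <= sq['z0']:
        zz = z / sq['z0']
        q = np.sqrt(1 + sq['s'] * zz * (1 - (1 - 2 * zz) ** sq['d']))  # (26)
        return np.exp(r - q)                      # (23), within the z range
    pz = min(abs(sq['z0'] - z), abs(z))                           # (24)
    return np.exp(r + pz)                         # (23), beyond the z range

sq = {'x0': 1.0, 'y0': 1.0, 'z0': 2.0, 's': 0.0, 'd': 2}  # untapered cylinder
print(voxel_distance((1.0, 0.0, 1.0), sq))        # 1.0: on the surface
print(voxel_distance((0.0, 0.0, 1.0), sq))        # ~0.368: on the axis
print(voxel_distance((2.0, 0.0, 1.0), sq) > 1.0)  # True: outside
```

The per-voxel value fed to the optimizer is the minimum of this distance over all J body segments.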

7 EXPERIMENTAL RESULTS

We present the results of our experiments on synthetic data obtained from animation models and on real data obtained both from 3D laser scans and from voxel reconstructions computed from synchronized multicamera video sequences. We illustrate our algorithm on poses with self-contact to demonstrate its ability to correctly segment the body in complex poses. We also present the results of the model estimation algorithm on data from these different sources and on different subjects. These different sources result in voxel data with varying degrees of accuracy.

7.1 Segmentation on Real Data

The results of the algorithm on different subjects in both simple and difficult poses are presented in Fig. 23. The voxels were computed using gray-scale images from Ncam = 12 calibrated cameras. Background subtraction was performed to obtain binary foreground silhouettes for each image. Space carving was performed using the binary silhouettes and the calibration data to obtain a voxel representation, where each voxel block is of size 30 × 30 × 30 mm. The space-carving algorithm consists of projecting nodes on a 3D grid onto each camera image and then determining, for each grid point, whether it lies inside the silhouette in at least Ncarve = 11 ≤ Ncam images. A large value of Ncarve leads to better accuracy but may create holes in the voxel reconstruction due to errors in the background subtraction. We therefore sacrifice some accuracy in order to improve robustness.
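The space-carving step described above can be sketched as follows; the function name, array layout, and the simple perspective division are our own assumptions, not the paper's implementation:

```python
import numpy as np

def space_carve(grid_points, cameras, silhouettes, n_carve):
    """Keep a 3D grid node as a voxel if it projects inside the foreground
    silhouette in at least n_carve of the cameras.

    grid_points -- (N, 3) array of 3D node positions
    cameras     -- list of 3x4 projection matrices
    silhouettes -- list of binary (H, W) foreground masks, one per camera
    """
    hom = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    votes = np.zeros(len(grid_points), dtype=int)
    for P, sil in zip(cameras, silhouettes):
        proj = hom @ P.T                              # project to image plane
        u = (proj[:, 0] / proj[:, 2]).round().astype(int)
        v = (proj[:, 1] / proj[:, 2]).round().astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]] > 0   # inside the silhouette?
        votes += hit
    return grid_points[votes >= n_carve]
```

Setting n_carve below the number of cameras, as the paper does (11 of 12), tolerates background-subtraction errors in one view at the cost of a slightly fatter reconstruction.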

We note that the algorithm succeeds in segmenting the voxels into parts based on the 1D structure and the connections between the different body segments. We have successfully performed segmentation and registration in the case of self-contact, as illustrated in Figs. 23a and 23b, which other algorithms (such as [3]) do not address. We also note that this low-level segmentation lends itself to a high-level probabilistic registration process. This process allows us to reject improbable registrations based on the estimated connections between the segments and lets us use prior knowledge of the properties of the different segments and the graph describing their connectivity.

12 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 10, OCTOBER 2008

Fig. 22. Radial profiles of different body segments. The solid line is the median radial profile. The dotted line is the superquadric radius with the scale parameters set to zero. The dashed line is the superquadric radius with the estimated scale parameter. The x-axis of the plots is the distance, in millimeters, along the z-axis of the body segment coordinate system. The y-axis of the plots is the radius value, also in millimeters. (a) Trunk. (b) Head. (c) Upper arm. (d) Forearm. (e) Thigh. (f) Leg.

Fig. 23. Segmentation and registration for different subjects and poses. (a) Subject A. (b) Subject A. (c) Subject B. (d) Subject C.


7.2 Video Data

We also provide results on the voxel data obtained from the video sequences. The manner in which the voxels were obtained is described in Section 7.1. Given that the quality of the voxel reconstruction is inferior due to space-carving and background-subtraction artifacts, we used 20 frames for the human body model estimation algorithm. The results of the human body model estimation for different subjects are presented in Fig. 27.

7.3 HumanEvaII Data Set

We show the results of the segmentation and registration algorithm on two sequences from the HumanEvaII data set in Fig. 24. We map the nodes to a 5D LE, as the accuracy of the voxel reconstruction is low, and we do not gain much by mapping to a higher dimensional space. The algorithm does not find the requisite number of body segments in the majority of the frames for two principal reasons: either the arms are too close to the body, are obscured in a majority of the cameras, and thus go undetected, or segmented limbs are rejected because their curve skeletons are too short. The voxel reconstruction algorithm also creates a "ghost limb" as an artifact of the space-carving algorithm in certain configurations of the subject with respect to the cameras. It should be noted that both of these problems can be alleviated by using more cameras. The problematic frames are rejected automatically. We report results on the Walking (frames 1-350) and Balancing (frames 800-1,222) subsets. A total of 68 frames (around 9 percent of the total) were segmented and registered.

7.4 Synthetic Data Set

We provide results on human body model estimation by using a synthetic sequence that has been generated from a known model and a known motion sequence described in the standard BVH format. A sample 3D frame and the corresponding voxel data are presented in Fig. 25. The voxel resolution used was 30 mm. The human body parameters and the pose parameters are known, and we can compare the estimated human body model and the motion parameters with the ground-truth values. We note that the known body parameters are only the joint locations and not the shape parameters. The 3D animation is a smooth, fairly realistic mesh, as shown in Fig. 25. The results of the human body model estimation are also illustrated in Fig. 25. The sequence had 120 frames, and the six different chains were correctly segmented and registered in 118 of the 120 frames (the registration failed in the remaining two frames, and they were automatically discarded). We used 10 equally spaced frames as the keyframes in our human body model estimation algorithm. The human body model used in this estimation uses two rigid segments for the trunk, whereas the human body model used in the other experiments uses one rigid segment for the trunk. The pose was computed for all 118 frames by using the estimated model. The errors in the joint angles at the important joints are compared in Table 3. The error is in degrees and is computed as $\cos^{-1}\langle \hat{n}_G, \hat{n}_E \rangle$, where $\hat{n}_G$ and $\hat{n}_E$ are the actual and estimated unit vectors describing the direction of the segment at the joint.
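The joint-angle error metric above is straightforward to compute; this short sketch (our own naming) clips the dot product to guard against floating-point round-off outside [-1, 1]:

```python
import numpy as np

def joint_angle_error(n_gt, n_est):
    """Angle, in degrees, between the ground-truth and estimated
    segment direction vectors at a joint: acos(<n_G, n_E>)."""
    n_gt = np.asarray(n_gt, dtype=float)
    n_est = np.asarray(n_est, dtype=float)
    n_gt = n_gt / np.linalg.norm(n_gt)     # normalize to unit vectors
    n_est = n_est / np.linalg.norm(n_est)
    cos_angle = np.clip(np.dot(n_gt, n_est), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))
```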

7.5 Three-Dimensional Scan Data

The synthetic sequence described in the previous section had limited motion at several joints. We also tested our human body model estimation algorithm on different subjects by using laser scan data, which provides 3D meshes. Voxels can be computed from a 3D mesh by determining whether nodes on a regular 3D grid lie inside the mesh structure. The subject in each case strikes different poses that exercise different joint angles. The subjects are of different heights and builds. The voxels were computed from the 3D mesh obtained from the laser scanner. A set of five


Fig. 24. Successfully segmented and registered frames from two subjects in the HumanEvaII data set.

Fig. 25. Sample voxel and human body model estimation from synthetic sequence.

TABLE 3. Joint Angle Error for Skeleton and Superquadric Optimization


different poses per subject was used to estimate the human body model. Each pose is quite different from the others, and the 3D scans are relatively accurate; we are thus able to estimate the human body model parameters from fewer poses. The results of the human body model estimation for different subjects are presented in Fig. 26. This experiment illustrates that the human body model estimation can be performed using a limited number of frames, provided that the poses are varied.
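One common way to carry out the inside/outside test for grid nodes is ray-casting parity: a node is inside a closed mesh if a ray from it crosses the surface an odd number of times. This sketch, using the Möller–Trumbore ray-triangle test, is our own illustration under that assumption, not the authors' implementation:

```python
import numpy as np

def ray_hits_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: does the ray orig + t*direction (t > 0)
    cross the triangle (v0, v1, v2)?"""
    e1, e2 = v1 - v0, v2 - v0
    h = np.cross(direction, e2)
    a = np.dot(e1, h)
    if abs(a) < eps:                 # ray parallel to the triangle plane
        return False
    f = 1.0 / a
    s = orig - v0
    u = f * np.dot(s, h)
    if u < 0.0 or u > 1.0:           # outside the triangle (barycentric u)
        return False
    q = np.cross(s, e1)
    v = f * np.dot(direction, q)
    if v < 0.0 or u + v > 1.0:       # outside the triangle (barycentric v)
        return False
    return f * np.dot(e2, q) > eps   # hit counts only if ahead of the origin

def point_in_mesh(point, triangles):
    """Parity test: odd number of crossings along +z means inside."""
    direction = np.array([0.0, 0.0, 1.0])
    hits = sum(ray_hits_triangle(point, direction, *tri) for tri in triangles)
    return hits % 2 == 1
```

Applying `point_in_mesh` to every node of a regular 3D grid yields the voxel representation described above; a robust implementation would also perturb rays that graze mesh edges or vertices.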

8 SUMMARY AND CONCLUSION

We proposed a novel method for performing segmentation of articulating objects in the Laplacian Eigenspace. The method is different from other techniques such as Isomap and LLE, as the objective is to map the nodes to a higher dimensional space IR^k, where k depends on the number of chains, so as to extract the 1D structure of the articulating chains. It is shown that the Laplacian Eigenmap transform is suitable for extracting the 1D structure and for segmenting the different chains at the joints. k-dimensional splines are used to model these smooth 1D curves in the eigenspace. Once segmentation has been performed, the skeleton is estimated using the registration of the nodes along the 1D curve. A useful application is the segmentation of 3D voxel data of articulating human subjects obtained from different sources. The algorithm is able to handle difficult poses where there is self-contact and also obtains a measure of how well the segmented object fits the simple prior model. This measure is used to select the best frames for estimating a complete 3D model for human subjects. The algorithm does not require that the subjects follow a predetermined motion protocol. The model estimation framework does not restrict the complexity of the model, and more complex models can be estimated using the proposed algorithm. The results of our completely automatic segmentation, registration, and human body model estimation algorithm on subjects with varying body mass indices (BMIs), using data obtained from different sources such as 3D laser scans and video cameras, are also presented.

ACKNOWLEDGMENTS

This project was funded by the US National Science

Foundation (NSF) under Grant 0325715. The authors are

grateful to Professor Tom Andriacchi, Dr. Lars Munder-

mann, Dr. Stefano Corazza, and Dr. Ajit Chaudhari of

Stanford University for their helpful discussions and for

providing the 3D laser scan data. The authors also thank

James Sherman at the University of Maryland for help with

several of the experiments in the paper.

REFERENCES

[1] L. Mundermann, D. Anguelov, S. Corazza, A. Chaudhari, and T.P. Andriacchi, "Validation of a Markerless Motion Capture System for the Calculation of Lower Extremity Kinematics," Proc. 20th Congress Int'l Soc. of Biomechanics and 29th Ann. Meeting Am. Soc. Biomechanics, 2005.

[2] S. Corazza, L. Mundermann, and T.P. Andriacchi, "Lower Limb Kinematics through Model-Free Markerless Motion Capture," Proc. 20th Congress Int'l Soc. of Biomechanics and 29th Ann. Meeting Am. Soc. Biomechanics, 2005.

[3] C.-W. Chu, O.C. Jenkins, and M.J. Mataric, "Markerless Kinematic Model and Motion Capture from Volume Sequences," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '03), vol. 2, pp. 475-482, June 2003.

[4] I. Mikić, M. Trivedi, E. Hunter, and P. Cosman, "Human Body Model Acquisition and Tracking Using Voxel Data," Int'l J. Computer Vision, vol. 53, no. 3, 2003.

[5] L. Mundermann, S. Corazza, and T. Andriacchi, "Accurately Measuring Human Movement Using Articulated ICP with Soft-Joint Constraints and a Repository of Articulated Models," Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2007.

[6] A. Sundaresan and R. Chellappa, "Segmentation and Probabilistic Registration of Articulated Body Model," Proc. 18th IEEE Int'l Conf. Pattern Recognition (ICPR '06), vol. 2, pp. 92-96, Aug. 2006.

[7] A. Sundaresan and R. Chellappa, "Multicamera Tracking of Articulated Human Motion Using Motion and Shape," Proc. Seventh Asian Conf. Computer Vision (ACCV '06), vol. 2, pp. 131-140, Jan. 2006.

[8] D.M. Gavrila, "The Visual Analysis of Human Movement: A Survey," Computer Vision and Image Understanding, vol. 73, no. 1, pp. 82-98, 1999.

[9] J. Aggarwal and Q. Cai, "Human Motion Analysis: A Review," Computer Vision and Image Understanding, vol. 73, no. 3, pp. 428-440, 1999.

[10] T. Moeslund and E. Granum, "A Survey of Computer Vision-Based Human Motion Capture," Computer Vision and Image Understanding, pp. 231-268, 2001.

Fig. 26. Human body model estimation from 3D scans obtained for different subjects. (a) Voxel from scan. (b) Subject E. (c) Subject A. (d) Subject F. (e) Subject G.

Fig. 27. Human body model estimation for different subjects from video sequences. (a) Voxel from silhouettes. (b) Subject A. (c) Subject B. (d) Subject D. (e) Subject C.

[11] L. Sigal and M. Black, "HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion," Technical Report CS-06-08, Brown Univ., 2006.

[12] K. Rohr, Human Movement Analysis Based on Explicit Motion Models. Kluwer Academic Publishers, 1997.

[13] D. Ramanan and D.A. Forsyth, "Finding and Tracking People from the Bottom Up," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '03), vol. 2, pp. 467-474, June 2003.

[14] X. Ren, A.C. Berg, and J. Malik, "Recovering Human Body Configurations Using Pairwise Constraints between Parts," Proc. 10th IEEE Int'l Conf. Computer Vision (ICCV '05), vol. 1, pp. 824-831, Oct. 2005.

[15] G. Mori and J. Malik, "Estimating Human Body Configurations Using Shape Context Matching," Proc. Seventh European Conf. Computer Vision (ECCV '02), pp. 666-680, 2002.

[16] I.A. Kakadiaris and D. Metaxas, "3D Human Body Model Acquisition from Multiple Views," Proc. Fifth Int'l Conf. Computer Vision (ICCV '95), pp. 618-623, June 1995.

[17] K. Cheung, S. Baker, and T. Kanade, "Shape-from-Silhouette of Articulated Objects and Its Use for Human Body Kinematics Estimation and Motion Capture," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '03), vol. 1, pp. 77-84, June 2003.

[18] J. Carranza, C. Theobalt, M. Magnor, and H. Seidel, "Free-Viewpoint Video of Human Actors," ACM Trans. Graphics, vol. 22, no. 2, pp. 569-577, 2003.

[19] J.B. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.

[20] D. Anguelov, D. Koller, H. Pang, and P. Srinivasan, "Recovering Articulated Object Models from 3D Range Data," Proc. 20th Conf. Uncertainty in Artificial Intelligence (UAI '04), pp. 18-26, 2004.

[21] N. Krahnstoever and R. Sharma, "Articulated Models from Video," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '04), vol. 1, pp. 894-901, June 2004.

[22] G. Brostow, I. Essa, D. Steedly, and V. Kwatra, "Novel Skeletal Representation for Articulated Creatures," Proc. Eighth European Conf. Computer Vision (ECCV '04), vol. 3, pp. 66-78, May 2004.

[23] N.I. Badler, C.B. Phillips, and B.L. Webber, Simulating Humans. Oxford Univ. Press, 1993.

[24] D. Gavrila and L. Davis, "3-D Model-Based Tracking of Humans in Action: A Multiview Approach," Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR '96), pp. 73-80, 1996.

[25] M. Belkin and P. Niyogi, "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation," Neural Computation, vol. 15, no. 6, pp. 1373-1396, 2003.

[26] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, "Face Recognition Using Laplacianfaces," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328-340, Mar. 2005.

[27] B. Scholkopf, A. Smola, and K.-R. Muller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, vol. 10, pp. 1299-1319, 1998.

[28] S.T. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.

[29] B. Nadler, S. Lafon, R. Coifman, and I. Kevrekidis, "Diffusion Maps, Spectral Clustering and the Eigenfunctions of Fokker-Planck Operators," Proc. Conf. Neural Information Processing Systems (NIPS '05), Dec. 2005.

[30] T. Cox and M. Cox, Multidimensional Scaling. Chapman and Hall, 1994.

[31] F.R.K. Chung, Spectral Graph Theory. American Mathematical Society, 1997.

[32] B. Mohar, "The Laplacian Spectrum of Graphs," Graph Theory, Combinatorics, and Applications, vol. 2, pp. 871-898, 1991.

[33] A. Sundaresan and R. Chellappa, "Acquisition of Articulated Human Body Models Using Multiple Cameras," Proc. Fourth Conf. Articulated Motion and Deformable Objects, pp. 78-89, July 2006.

[34] A. Ozaslan, M.Y. Iscan, I. Ozaslan, H. Tugcu, and S. Koc, "Estimation of Stature from Body Parts," Forensic Science Int'l, vol. 132, no. 1, pp. 40-45, 2003.

[35] B.Y. Choi, Y.M. Chae, I.H. Chung, and H.S. Kang, "Correlation between the Postmortem Stature and the Dried Limb-Bone Lengths of Korean Adult Males," Yonsei Medical J., vol. 38, no. 2, pp. 79-85, 1997.

[36] M.C.D. Mendonca, "Estimation of Height from the Length of Long Bones in a Portuguese Adult Population," Am. J. Physical Anthropology, vol. 112, pp. 39-48, 2000.

Aravind Sundaresan received the BE (Hons) degree from the Birla Institute of Technology and Science, Pilani, India, in 2000 and the MS and PhD degrees from the University of Maryland, College Park, in 2005 and 2007, respectively. He is currently a computer scientist in the Artificial Intelligence Center, SRI International, Menlo Park. His research interests are pattern recognition, image processing, and computer vision and their applications to markerless motion capture and robotic vision. He is a corecipient (with R. Chellappa) of the 2006 Best Student-Authored Paper in the Computer Vision Track from the International Association of Pattern Recognition and the 2007 Outstanding Innovator Award from the University of Maryland. He is a student member of the IEEE.

Rama Chellappa received the BE (Hons) degree from the University of Madras, India, in 1975, the ME (with distinction) degree from the Indian Institute of Science, Bangalore, in 1977, and the MSEE and PhD degrees in electrical engineering from Purdue University, West Lafayette, Indiana, in 1978 and 1981, respectively. He was with the University of Southern California (USC), Los Angeles, where he was an assistant professor from 1981 to 1986, an associate professor from 1986 to 1991, and the director of the Signal and Image Processing Institute from 1988 to 1990. Since 1991, he has been a professor of electrical engineering and an affiliate professor of computer science at the University of Maryland, College Park. He is also the director of the Center for Automation Research and a member of the Institute for Advanced Computer Studies. Recently, he has been named the Minta Martin Professor of Engineering. He is serving as a distinguished lecturer of the IEEE Signal Processing Society and has been elected to receive a Technical Achievement Award from the IEEE Computer Society. Over the last 26 years, he has published numerous book chapters, peer-reviewed journal articles, and conference proceedings in image and video processing, analysis, and recognition. He is also a coeditor/coauthor of six books on neural networks, Markov random fields, face/gait-based human identification, and activity modeling. His current research interests are face and gait analysis, 3D modeling from video, automatic target recognition from stationary and moving platforms, surveillance and monitoring, hyperspectral processing, image understanding, and commercial applications of image processing and understanding. He has served as an associate editor of several IEEE Transactions. He was a coeditor-in-chief of Graphical Models and Image Processing and the editor-in-chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence from 2001 to 2004. He served as a member of the IEEE Signal Processing Society Board of Governors from 1996 to 1999 and as its Vice President of Awards and Membership from 2002 to 2004. He is a Fellow of the IEEE and the International Association for Pattern Recognition and a golden core member of the IEEE Computer Society. He has received several awards, including a US National Science Foundation (NSF) Presidential Young Investigator Award in 1985, three IBM Faculty Development Awards, the 1990 Excellence in Teaching Award from the School of Engineering, USC, the 1992 Best Industry Related Paper Award (with Q. Zheng) and the 2006 Best Student-Authored Paper in the Computer Vision Track (with A. Sundaresan) from the International Association of Pattern Recognition, the 2000 Technical Achievement Award from the IEEE Signal Processing Society, and the 2004 Meritorious Service Award from the IEEE Computer Society. He was a Distinguished Faculty Research Fellow from 1996 to 1998 and a Distinguished Scholar-Teacher in 2003 at the University of Maryland. He is a corecipient (with A. Sundaresan) of the 2007 Outstanding Innovator Award from the Office of Technology Commercialization and the AJ Clark School of Engineering 2007 Faculty Outstanding Research Award. He has served as the general and technical program chair for several IEEE international and national conferences and workshops.

