
Automatic labelling of anatomical landmarks on 3D body scans

Christian Lovato a, Umberto Castellani a, Carlo Zancanaro b, Andrea Giachetti a

a Dipartimento di Informatica, Università di Verona, Strada Le Grazie 15, 37134 Verona, Italy

email: [email protected], Phone: +39 045 8027998, Fax: +39 045 8027068
b Dipartimento di Scienze Neurologiche e del Movimento, Università di Verona, Italy

Abstract

In this paper we describe and test a pipeline for the extraction and semantic labelling of geometrically salient points on acquired human body models. Points of interest are extracted on the preprocessed scanned geometries as maxima of the autodiffusion function at different scales and annotated by an expert, where possible, with a corresponding semantic label related to a specific anatomical location.

On the extracted points we computed several descriptors (e.g. Heat Kernel Signature, Wave Kernel Signature, derivatives of the Heat Kernel Signature) and used labels and descriptors to train supervised classifiers, in order to understand if it is possible to recognize the points on new models.

Experimental results show that this approach can be used to robustly detect and recognize at least a selection of landmarks on subjects with different body types, independently of pose, and could therefore be applied for automatic anthropometric analysis.

Keywords: Salient points, Heat diffusion, 3D body scanning, Anthropometry, Dissimilarity-based classification

1. Introduction

The use of 3D digital scanners to capture relevant information about human body shape is rapidly growing and is no longer limited to ergonomic studies or clothing design, but extends to healthcare-related applications [1].

Preprint submitted to Graphical Models August 24, 2014


To obtain quantitative parameters from the scans it is, however, necessary to identify landmarks, perform specific measurements, and extract geometrical descriptors of the shape that can be compared across different subjects and correlated, for example, with other diagnostic data. Most of the software tools used to process acquired models require the use of specific acquisition protocols, with constraints on subject pose. This may create problems in comparing data acquired in different places and in performing large-scale multicentric studies, as well as in applying advanced shape analysis tools to the captured models.

If we want to find meaningful anatomical landmarks, there is another relevant issue related to the fact that many points used in anatomical or anthropometric studies do not have a well-defined geometrical characterization, being usually located through manual palpation. The local geometry is not only poorly characterized, but may also differ considerably in subjects with different body builds. For this reason, even using state-of-the-art point matching methods, it is not possible to localize those points accurately in digital models, as revealed by a recent contest [2]. The error in locating standard anthropometric points (defined in the International Society for the Advancement of Kinanthropometry manual, www.isakonline.com) with registration-based or spectral-based methods was not negligible, since these techniques perform optimally in the case of the same shape isometrically deformed, not in the case of different body types.

On the other hand, it is not necessary to rely on traditional landmarks to perform anthropometric studies, as shown, for example, in [3], where robust measurements are derived from curve-skeleton analysis and body segmentation and used to predict composition parameters. Similarly, we could derive new robust measurements exploiting the points on the body surface that can be easily characterized by their geometric properties, e.g. geometrically salient points. These measurements, clearly different from standard anthropometrical ones but correlated with them, could be performed automatically and independently of acquisition resolution, quality, pose, etc.

In this paper we present a study aimed at understanding if, among the surface points that can be characterized for geometric saliency on 3D scans, there are meaningful locations that can be automatically recognized on human scans with few quality constraints and large differences related to sex, body type and pose. These points can be used to extract quantitative measurements on the human body that can then be tested for repeatability and robustness and correlated with other parameters measured on the same subjects.

To accomplish the task we extracted salient points at different scales exploiting the autodiffusion function (ADF), asked an expert to give them semantic labels, where possible, and characterized the points with several pose-invariant descriptors encoding local and contextual information. Supervised classification tests were then performed in order to understand which points can be successfully recognized, how the recognition depends on pose and body type, and which automatic labelling algorithms provide the best accuracy.

The paper is organized as follows: Section 2 presents an overview of related work, Section 3 the complete processing and analysis pipeline, Section 4 the experimental tests performed and Section 5 a final discussion of the results.

2. Related work

The automatic identification on digital models of the landmarks used in manual anthropometric protocols is not easy, because they are not necessarily geometrically salient. A possible solution consists of establishing a complete point-to-point correspondence between the acquired mesh and one or more labelled templates and transferring the template labels to the new model. Generic deformable human models or SCAPE models encoding pose and principal body type variations [4, 5] can be deformed in order to fit the data. A possible simplified solution for their extraction is to solve a global graph optimization problem. In [6, 7] manual landmarks placed on models from the Civilian American and European Surface Anthropometry Resource Project (CAESAR) are joined in a graph structure and statistical learning is used to predict the corresponding points' positions in new models, assuming a similar pose. Localization errors are, however, rather high, especially for poorly characterized points. In [8] a similar approach is applied in canonical form space to obtain posture invariance. All these kinds of methods were tried by the participants in the SHREC'14 contest on automatic location of landmarks used in manual anthropometry [2]. The results of the contest, however, showed that the accuracy of the best techniques on subjects with different body types is not very good for anthropometric applications, even working on good-quality meshes and similar poses.

To simplify the task, a possible solution is to search for different points and measures, as is usually done in commercial measurement systems used, for example, in the fashion industry. Point and measurement locations are mainly derived from an accurate human body partitioning exploiting specific protocols, poses, machines, etc. and are therefore not suitable for examining heterogeneous data or stored archives, while more general methods tested on human meshes [9, 10, 11] are not used to derive reliable point-based measurements.

Point location methods exploiting pose constraints have been proposed in the literature. Combinations of local descriptors and posture-related priors are used, for example, in Leong et al. [12], where depth maps and cross sections perpendicular to the body axis are used to find the landmarks, or in [13], where Spin Images and posture-related constraints are used to match points.

3. The proposed approach

In this paper we describe a novel approach to deal with the problem of recognizing human body landmarks on different subjects. The idea is to extract geometrically salient points in a multiscale framework, ask an expert whether these points can actually be described with a semantic label, and then test whether the same label can be automatically assigned by a supervised classification method using different point descriptors. If this is possible, the repeatability of landmark-related measurements or of feature characterizations at the landmarks could be tested to understand if these landmarks are useful for human body analysis.

In the following subsections we describe in detail the procedures used for point extraction, description, labelling and classification.

3.1. Points detection

Our detection of salient points is based on heat diffusion theory. This choice is motivated by the robustness against isometric deformations and the high repeatability demonstrated in the SHREC 2010 benchmark [14] by heat-kernel based detectors.

In order to detect a feature point we therefore compute the so-called autodiffusion function as [15]:

ADF(x, t) = k_t(x, x),   (1)

where k_t is the heat kernel, which can be expressed using the Laplace-Beltrami eigendecomposition, with eigenfunctions φ_k(x) and eigenvalues E_k:


k_t(x, x) = \sum_{k=1}^{\infty} e^{-E_k t} \phi_k^2(x)   (2)

The ADF describes the heat diffusion from the point x to itself. As highlighted in [16], the local maxima of the ADF are feature points. In practice we detect a feature point x if ADF_t(x) > ADF_t(x_i) for all x_i in the ring neighborhood of x.

Since the ADF is invariant under isometric deformations, the extracted points should be invariant when the body pose changes. Furthermore, the procedure is stable under perturbations of the shape, which typically occur across different subjects, and can be computed easily and efficiently.

It is worth noting that at higher scales (i.e., large values of t) the detection selects only points with very strong protrusions. In order to increase the number of detected points we need to decrease t, capturing more local surface features.

In this work we have to compare salient points extracted at different scales that should be approximately corresponding on different subjects (whose body size may differ). To make salient points comparable across subjects we normalize the scale, introducing a new variable s representing the scale space in a uniform reference frame for all the models. This variable is derived from the classical ADF time sampling used in Heat Kernel Signature matching [16]. In fact, we computed the ADF at 101 sampled time values, specific for each model and equally spaced on a logarithmic scale in the interval [t_min, t_max], where t_min = 4 log(10/λ_200) and t_max = 4 log(10/λ_2), λ_i being the eigenvalues of the Laplace-Beltrami decomposition used in the ADF (Eq. 2). The uniform scale variable s is simply defined by mapping the model-dependent time samples t_i to the sample index values, s = i.
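
As an illustration of this detection-and-normalization step, the sketch below assumes the Laplace-Beltrami eigenvalues and eigenfunctions of the mesh (sorted by increasing eigenvalue, with the zero eigenvalue first) and its one-ring vertex adjacency are already available; the helper names and the NumPy formulation are ours, not the authors' implementation, and the time-sampling formula is our reading of the text.

import numpy as np

def adf(evals, evecs, t):
    # Autodiffusion function ADF(x, t) = sum_k exp(-E_k t) * phi_k(x)^2 (Eq. 2),
    # evaluated at every vertex; evecs has one row per vertex, one column per eigenfunction.
    return (np.exp(-evals[None, :] * t) * evecs ** 2).sum(axis=1)

def scale_samples(evals, n_scales=101):
    # Model-dependent times t_i, log-spaced in [t_min, t_max]; the uniform scale
    # variable s is simply the sample index i. The text's "4 log(10/lambda_200)"
    # is read here as the standard HKS sampling 4 ln(10)/lambda_200 of [16] (an assumption).
    t_min = 4.0 * np.log(10.0) / evals[199]   # lambda_200
    t_max = 4.0 * np.log(10.0) / evals[1]     # lambda_2
    return np.exp(np.linspace(np.log(t_min), np.log(t_max), n_scales))

def salient_points(evals, evecs, one_ring, t):
    # Feature points: vertices whose ADF value is a strict maximum over the one-ring neighborhood.
    f = adf(evals, evecs, t)
    return [v for v, nbrs in enumerate(one_ring) if all(f[v] > f[u] for u in nbrs)]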

3.2. Feature-point description

In order to describe feature points, we tested different state-of-the-art descriptors, all based on the spectrum of the Laplace-Beltrami operator on the surface. This choice is motivated by the fact that they are able to encode local and global information and are invariant under non-rigid motion [17].

The first descriptor tested is the Heat Kernel Signature (HKS), introduced in [16] and applied in [18] for object retrieval. Given a surface point x, the HKS is an n-dimensional descriptor vector, defined as:

HKS(x) = [ADF(x, t_1), · · · , ADF(x, t_n)]   (3)


The sampled t_i of our analysis are the same as those applied in the previous subsection for point detection; they are similarly model-dependent and ideally able to capture, at the same index value i, the same kind of anatomical features in human bodies that differ in size. The mapping of the time variable t to the normalized scale variable s should ideally make the ADF(x, s) values similar at similar body positions across different subjects, and make the HKS, which in the scaled variable can be defined as

HKS(x) = [ADF(x, 1), · · · , ADF(x, n)],   (4)

a good descriptor for salient point recognition.

Obviously this is not exactly true, due to the variability in subjects' body types. A possible way to overcome the matching inaccuracy caused by the resulting scaling problems may be the use of non-Euclidean distances when comparing HKS vectors. The problem is similar to that of matching images that differ by a brightness scaling, which means that we can refer to the huge literature on histogram comparison to improve point matching using HKS features.
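
By way of illustration, two of the histogram-style distances later compared in Table 2 (the chi-square distance and the Jeffrey divergence) can be computed on a pair of HKS vectors as follows; this is a generic sketch, with a small epsilon added to avoid divisions by zero, an implementation detail not discussed in the paper.

import numpy as np

def chi_square(h1, h2, eps=1e-12):
    # chi-square histogram distance: 0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def jeffrey_divergence(h1, h2, eps=1e-12):
    # symmetrised Kullback-Leibler divergence with respect to the mean histogram
    m = 0.5 * (h1 + h2)
    return np.sum(h1 * np.log((h1 + eps) / (m + eps)) +
                  h2 * np.log((h2 + eps) / (m + eps)))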

From the HKS we derived two other descriptors, based on the observation that the increments when switching from one scale to the next also have high discriminative power. Therefore, as a further descriptor we introduce the first-order derivative of the HKS with respect to the scale variable:

DHKS(x) = [DADF(x, 1), · · · , DADF(x, n−1)],   (5)

where DADF(x, i) = ADF(x, i+1) − ADF(x, i), and the second-order derivative of the HKS

D2HKS(x) = [D2ADF(x, 1), · · · , D2ADF(x, n−2)],   (6)

where D2ADF(x, i) = DADF(x, i+1) − DADF(x, i).

Another popular descriptor based on the Laplace-Beltrami spectral decomposition is the Wave Kernel Signature (WKS) proposed by Aubry et al. [19]. The WKS at a point x is related to the average probabilities of measuring quantum mechanical particles with a defined set of energy levels at the location x, and can be defined, similarly to the HKS, as a function of the Laplace-Beltrami decomposition coefficients. The Schrödinger equation in the spectral domain can be written as

\phi(x, t) = \sum_{k=1}^{\infty} e^{i E_k t} f(E_k) \phi_k(x)   (7)


and the average probability of measuring a particle with energy distribution f_{e_i} is

p(x, e_i) = \sum_{k=1}^{\infty} f_{e_i}^2(E_k) \phi_k^2(x)   (8)

Given a set of energy distributions we can define the Wave Kernel Signature as:

WKS(x) = [p(x, e_1), · · · , p(x, e_N)]   (9)

In the original formulation the authors considered a family of log-normal energy distributions, motivated by a perturbation analysis of the Laplacian spectrum. In our tests we used the same choice and sampled 100 energy distributions to obtain the WKS descriptor.
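
To make the four descriptors concrete, the following sketch builds them from the same eigendecomposition used for detection, reusing the adf and scale_samples helpers from the detection sketch above. The log-normal energy sampling only roughly follows [19]: the energy range, bandwidth and normalization here are our assumptions, not the exact settings used in the experiments.

import numpy as np

def hks(evals, evecs, times):
    # HKS(x) = [ADF(x, t_1), ..., ADF(x, t_n)], one row per vertex (Eqs. 3-4).
    return np.stack([adf(evals, evecs, t) for t in times], axis=1)

def dhks(H):
    # First-order derivative of the HKS with respect to the scale index (Eq. 5).
    return np.diff(H, n=1, axis=1)

def d2hks(H):
    # Second-order derivative of the HKS (Eq. 6).
    return np.diff(H, n=2, axis=1)

def wks(evals, evecs, n_energies=100):
    # WKS(x) = [p(x, e_1), ..., p(x, e_N)] with log-normal energy windows (Eqs. 8-9).
    log_e = np.log(evals[1:])                       # skip the zero eigenvalue
    energies = np.linspace(log_e[0], log_e[-1], n_energies)
    sigma = 7.0 * (energies[1] - energies[0])       # bandwidth: an assumption
    f2 = np.exp(-(energies[None, :] - log_e[:, None]) ** 2 / (2.0 * sigma ** 2))
    desc = (evecs[:, 1:] ** 2) @ f2                 # sum_k f_e^2(E_k) * phi_k(x)^2
    return desc / f2.sum(axis=0)                    # per-energy normalization

The descriptor of a salient point is then simply the row of one of these matrices at the corresponding vertex, e.g. H = hks(evals, evecs, scale_samples(evals)).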

3.3. Expert's anatomical labelling

We asked an experienced anatomist to tell us whether the extracted salient points have a real anatomical meaning, giving them, in that case, a specific label. Since it was not feasible to check the salient points at every value of s, we annotated the answers of the expert and the corresponding labels at three specific scales, corresponding to s = 1, s = 25 and s = 100. The choice of the intermediate scale is motivated by the fact that at that value of s we find a number of salient points that is approximately half of that found at s = 1.

The expert labelled 17 classes of points that correspond to specific body feature localizations (toes, fingertips, nose, chin, shoulder, scapular, bicep, olecranical, lateral elbow, breast, crotch, trochanteric, gluteal, patellar, knee, medial knee, heel). With this choice, several points extracted at the finest scale are left unlabelled because they are not clearly localized at a well-defined anatomical landmark. Together with the anatomist, we decided to give them different labels in our classification test, related to the body area where they are located. These labels were: head, cranial, ear, antibrachial, brachial, chest, vertebral, abdominal, femural, crural, lower leg. Only a few points were, in this way, left unlabelled (in our classification framework labelled as "other"). The reason for this choice is simply to make it possible to exploit the points that are not usable for comparisons or measurements as contextual information for further analysis (e.g. processing at finer scales).

An example of points extracted at s = 1 and labelled by the expert is shown in Figure 1. The list of the labels used is represented in Figure 2.


Figure 1: Examples of well-defined and labelled locations extracted on a test model at s = 1.

3.4. Supervised classification

Our final goal is to see if detected points can be correctly labelled with an example-based approach.

We first considered Nearest Neighbor classification, testing different metrics to compute distances in the feature spaces, using methods classically applied in histogram comparison. We also tested different classification methods, both linear and nonlinear, using the implementations provided by PRTools (www.37steps.org). In particular, we were interested in testing classification in dissimilarity spaces [20], which has demonstrated very good performance in practical applications. According to this paradigm, we represent data instances not directly by their feature representation, but by their distances from a set of examples (the representation set), used as a new feature space in which the classes are usually more easily separable.
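
A minimal sketch of this dissimilarity-space idea is shown below. The original experiments were run with PRTools; here scikit-learn and SciPy are used purely for illustration, and X_train, y_train, X_test are placeholder arrays of point descriptors and labels.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def nn_baseline(X_train, y_train, X_test):
    # 1-NN labelling directly in the descriptor space (here with the Manhattan metric).
    nn = KNeighborsClassifier(n_neighbors=1, metric="manhattan").fit(X_train, y_train)
    return nn.predict(X_test)

def dissimilarity_space(X_train, y_train, X_test):
    # Each point is re-described by its distances to a representation set
    # (here the whole training set); a conventional classifier is then trained
    # in that new space, where classes are often more easily separable.
    R = X_train
    D_train = cdist(X_train, R, metric="cityblock")
    D_test = cdist(X_test, R, metric="cityblock")
    clf = SVC(kernel="linear").fit(D_train, y_train)
    return clf.predict(D_test)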

4. Experimental results

4.1. Dataset

We tested the approach on 80 surface meshes captured by a whole-body scanner (Breuckmann BodyScan) used in ordinary anthropometric procedures. In order to have a challenging testbed, and to test the effects of body type and pose on the detection and recognition of salient points, we included in the dataset 20 models acquired on normal-weight men, 20 models acquired on normal-weight women, 20 models acquired on overweight women and 20 acquired on males in largely varied poses.

Figure 2: The set of semantic labels given by the experienced anatomist includes specific points and regions. Some of the points have a well-defined expected cardinality (e.g. 1 nose, 2 shoulders, etc.).

Subjects did not wear caps, which caused several quite different curvature maxima to appear on the head. Acquired models were preprocessed in order to produce a watertight mesh. The pre-processing procedure, implemented with MeshLab scripting [21], consisted of floor and noise removal, Poisson remeshing and quadric mesh decimation to obtain target meshes of 30K faces.

Figure 3 shows examples of the different classes with superimposed salient points extracted at s = 1.

4.2. Salient points extraction

Figure 4 shows the salient points extracted on a test model at s = 100, s = 25 and s = 1. These are the scales at which we asked the expert anatomist to assign semantic labels to the points.

Figure 3: Examples of pre-processed scans of the four different classes: females (first column), obese females (second column), males (third column), males in different poses (fourth column). Note that subjects usually do not wear caps and that many spurious points, also due to big holes in the acquisition, appear on the head. This makes point recognition quite hard. Salient points extracted at s = 1 (i.e. t = t_min = 4 log(10/λ_200)) are superimposed on the mesh surfaces.

It is possible to see that the number of landmarks is quite small at the first two scales, while it is sufficiently high at the finest one. At the finest scale most of the extracted points seem to correspond well to major geometrical protrusions of the body, and some points without a strong geometrical characterization appear. It must be considered that at low values of t (and s) the effect of acquisition noise can become relevant and influence the point extraction. If we plot the total number of salient points detected over the tested scale range, we see that it starts growing quite rapidly when s is less than 20 (see Figure 5).

Table 1 shows the number of points with the different labels assigned by the expert at the three selected scales. The last column indicates the expected number of points that should ideally appear. It is possible to see that at the coarsest scale of our analysis the detected points are quite few and are recognized as belonging to just three classes (fingers, head, toes). Adding more detail, different recognized salient point classes appear, even if at the intermediate scale only in a subset of the models analyzed.

At the finest scale of our analysis (s = 1), however, it is possible to see that a number of points are found in all or almost all the models, and with the correct cardinality (heel, shoulders, scapular, nose, chin, breasts, gluteal). Toes and fingers are not detected in the right number due to the input mesh quality and resolution (toes and fingers are often actually not visible in the input meshes). However, at least one salient point per limb of each class is always detected.

Other point classes are not consistently found in all the models at s = 1. This is partially due to the still too coarse resolution and mesh simplification, and partially to the actually different shapes of the models. Using more detailed input meshes and a different t_min in the autodiffusion function sampling would probably allow other point classes to be well represented, at the risk, however, of finding more noisy and spurious points quite different across subjects.

We therefore considered the scale s = 1 a sufficiently good tradeoff for testing a possible automatic detection and recognition of selected salient points.

4.3. Salient points labelling

We validated the multi-class labelling procedure by classifying all the salient points detected in the models at s = 1. For this task, we designed a leave-one-subject-out procedure, labelling the salient points of each subject with multi-class classifiers trained on the descriptors computed on all the other subjects, and averaging the results obtained.
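
The leave-one-subject-out protocol can be sketched as follows, here with a 1-NN labelling step; descriptors_by_subject and labels_by_subject are placeholder dictionaries mapping each subject to the descriptors and expert labels of its detected salient points.

import numpy as np
from scipy.spatial.distance import cdist

def leave_one_subject_out_error(descriptors_by_subject, labels_by_subject, metric="cityblock"):
    # Label each subject's points with a 1-NN classifier trained on all other
    # subjects, then average the per-subject error rates.
    errors = []
    subjects = list(descriptors_by_subject)
    for s in subjects:
        X_test = descriptors_by_subject[s]
        y_test = np.asarray(labels_by_subject[s])
        X_train = np.vstack([descriptors_by_subject[o] for o in subjects if o != s])
        y_train = np.concatenate([labels_by_subject[o] for o in subjects if o != s])
        nearest = cdist(X_test, X_train, metric=metric).argmin(axis=1)
        errors.append(np.mean(y_train[nearest] != y_test))
    return float(np.mean(errors))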

In the first test we simply used nearest neighbor classification, using the different spectral descriptors described in Section 3 and testing different metrics to evaluate distances in the feature spaces. It is possible to see that the Wave Kernel Signature largely outperforms the other descriptors (see Table 2) and that the use of different metrics to compare signatures can slightly improve the classification accuracy with respect to the usual Euclidean distance.

point            s=1    s=25   s=100   expected n.
nose              80      2      0       80
chin              80      1      0       80
scapular         125      9      0      160
shoulder         160    130      0      160
breast           158    114      0      160
crotch            72     15      0       80
gluteal          150    118      0      160
trochanterion     56     20      0      160
knee              18      0      0      160
patellar         113     15      0      160
medial knee      101      0      0      160
heel             160     25      0      160
toes             177    174    175      800
bicep             35      0      0      160
elbow             29      1      0      160
lateral elbow     48      1      0      160
olecranical       60      3      0      160
fingertips       620    583    586      800

Table 1: The number of salient points extracted and assigned to the different classes at the three sample scales considered. Only at the finest scale are some point classes represented with the expected cardinality (indicated, when possible, in the last column).

Figure 4: Salient points extracted on a sample model at s = 100 (left), s = 25 (center), s = 1 (right).

Table 3 shows the results of the test performed only with the WKS descriptor, using different classifiers. It is possible to see that it is difficult to outperform the Nearest Neighbor labelling for this specific task: the use of other popular classification techniques, in fact, does not improve the classification accuracy.

We were, however, able to obtain a non-negligible improvement in the labelling performance by applying a dissimilarity-based approach.

Figure 5: The average number of points detected as heat kernel maxima on the test subjects is approximately constant at coarser scales and increases quite rapidly at finer scales (lower s).

Distance         HKS      DHKS     D2HKS    WKS
Euclidean        0.2907   0.2244   0.1994   0.1404
Chi square       0.2868   0.4245   0.2663   0.1318
Angular dist.    0.2533   0.2150   0.2008   0.1460
Jeffrey div.     0.2871   0.4295   0.4878   0.1302
Cityblock        0.2884   0.2231   0.1951   0.1347

Table 2: Leave-one-subject-out cross-validation errors for the labelling of salient points at the finest scale (s = 1), obtained with the Nearest Neighbor classifier, the different spectral descriptors and different distance metrics to compare them.

The only technique tested that outperforms Nearest Neighbor approaches is a particular dissimilarity-based method that integrates a class-specific distance for the representation objects. The simple use of the dissimilarity-based paradigm does not reduce the amount of classification errors, which are quite similar to (slightly higher than) those obtained with the Nearest Neighbor classifiers using the same metric to compute descriptor differences. Note that the results shown in Table 3 for dissimilarity-based methods are obtained using the whole training set as representation set. However, using the dissimilarity-based approach, it is possible to optimize the multi-class labelling in order to take class peculiarities into account. In detail, we can compute distances from the representation objects using specific metrics for each class. A possible solution is, for example, to compute from the training set (which in our tests is also the representation set) the eigenspace of each class, and to compute distances from the representation objects using the Cityblock distance in this space, possibly considering only the most relevant eigenvectors.

Table 3 shows that this approach is effective in reducing the overall classification error for the complete multi-class labelling. We used here the first 50 eigenvectors for each representation object class.
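
Our reading of this class-adapted dissimilarity is sketched below: a separate eigenspace (first 50 principal directions) is estimated for each class from the training/representation set, and the distance between a point and a representation object is the Cityblock distance computed after rotating both into the eigenspace of that object's class. Details such as mean-centering are our assumptions.

import numpy as np

def class_eigenspaces(X_train, y_train, n_components=50):
    # Per-class eigenspaces (principal directions) estimated from the training set.
    bases = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        mean = Xc.mean(axis=0)
        _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
        bases[c] = (mean, vt[:n_components])        # most relevant eigenvectors first
    return bases

def class_adapted_dissimilarity(x, r, r_class, bases):
    # Cityblock distance between point x and representation object r,
    # measured in the eigenspace of r's class.
    mean, basis = bases[r_class]
    return np.abs((x - mean) @ basis.T - (r - mean) @ basis.T).sum()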

An obvious drawback is the computational complexity of the procedure, considering also that, to obtain the best results, we used a representation space with a very high dimensionality (the salient points extracted from all the models with the exception of the tested one).

Classifier                                    error
NN (Jeffrey Divergence)                       0.1302
Parzen density                                0.2096
Quadratic Bayes normal                        0.1422
Support Vector (linear)                       0.1374
Support Vector (rbf)                          0.1773
Dissimilarity based (Euclidean)               0.1368
Dissimilarity based (Jeffrey Divergence)      0.1314
Dissimilarity based (Rotation+Cityblock)      0.1140

Table 3: Multi-class labelling error obtained with WKS descriptors and different supervised classification algorithms. Only a dissimilarity-based approach with distances adapted to the representation object classes is able to improve on the simple Nearest Neighbor classifier.

An interesting question is how these results depend on sex, body type and pose. With our heterogeneous test set we can check the classification error for the different subsets of models. Table 4 shows the number of salient points extracted at s = 1 and the average classification error for the four model classes. The results show that pose changes do not relevantly affect the classification accuracy (as expected, using spectral descriptors), even if they may decrease the number of extracted points. The model subset where the results seem less accurate is normal-weight women.

This is, however, mainly due to the difficulty in classifying spurious salient points created by hair, which have a poor local characterization due to variable shape and acquisition noise (see Figure 7).

4.4. Specific points recognition

The confusion matrix of the best multi-class labelling can be seen in Figure 6. Most of the errors are related to the regional labels that are not well characterized. It is therefore interesting to check if the specific points that are robustly extracted at s = 1 (shoulders, chin, nose, scapular, breasts, gluteal, patellar, fingers, toes) can be effectively recognized.


Model subset                 points extracted   avg classif. error
Normal weight women               56.65              0.1624
Overweight women                  56.2               0.1041
Males in standard pose            57.20              0.0909
Males in different poses          51.85              0.0974

Table 4: Differences in the average number of salient points extracted at corresponding scales and in the average classification error between the different model subsets. It is possible to see that the different poses reduce the number of extracted points but not the accuracy.

Table 5 shows the precision and recall values for each single class of this subset, where we give two values of recall (true positive rate): the first (recall 1) is the number of correctly labelled points divided by the number of detected points with that label assigned by the expert; the second (recall 2) is the number of correctly labelled points of each class divided by the total number of points with the same expected label. This last value is therefore the probability of obtaining the correct detection of all the points with the selected label, taking possible detector failures into account. Precision (positive predictive value) is the number of correct classification results divided by the sum of true and false positives for the class considered. It is possible to see that, for some selected point types (nose, chin, shoulder, heel, breast), the true positive rate is one or close to one even considering possible missed detections. These results can be considered satisfactory, also because the classification accuracy could be improved, for example, by optimizing descriptors and metrics for specific point classes, using hierarchical classification or adding more spatial constraints (e.g. forcing spatial coherence of labels with Markov Random Fields, or using other graph-based approaches). We plan to perform more experiments on class-adaptive metric learning, but this would require the availability of a larger annotated training set. We also expect that, even keeping this simple approach, the availability of a larger set of examples will be effective in eliminating false positives and false negatives.
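
For clarity, the three scores reported in Table 5 can be written explicitly as a function of simple per-class counts (a sketch with illustrative variable names):

def point_class_scores(n_correct, n_detected_with_label, n_expected, n_predicted_with_label):
    # n_correct:               detected points of this class that received the correct label
    # n_detected_with_label:   detected points the expert assigned to this class
    # n_expected:              points of this class expected over all models (e.g. 160 shoulders)
    # n_predicted_with_label:  points the classifier assigned to this class
    recall_1 = n_correct / n_detected_with_label    # ignores missed detections
    recall_2 = n_correct / n_expected               # also penalizes detector failures
    precision = n_correct / n_predicted_with_label  # positive predictive value
    return recall_1, recall_2, precision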

In any case, it is possible to say that the proposed approach is feasible and that automatic recognition of semantically corresponding points and regions in largely variable human models can be performed with good precision.


Figure 6: Confusion matrix of the global 30-class labelling of all the salient points of the 80 models tested. Most errors (false positives and false negatives, i.e. bars outside the principal diagonal) are related to the "other" label and to confusion between neighboring labels. Values are mapped to intensity on a logarithmic scale to make errors visible.

4.5. Salient points derived measurements

Figure 7: Two examples of large holes and artefacts (with corresponding anomalous salient points) related to hair in female models.

In order to check the possibility of performing useful measurements from the automatically detected and labelled salient points, we performed a test on a set of 12 subjects for which the scanning procedure was performed twice and who also underwent manual anthropometric measurements. On both acquired models of each subject we estimated three anthropometric measurements based on the extracted salient points and evaluated their differences. These differences have been compared with the lowest average error in locating the closest anthropometric landmarks obtained in the SHREC'14 contest [2]. We also computed the correlation coefficients between the new measurements and the most related standard anthropometric measurements, listed in Table 6. These preliminary results, reported in Table 6, seem to confirm that geometrically salient points may be used to derive measurements with a similar meaning to standard anthropometric ones, but probably more robust than those obtained from automatic location of standard landmarks.
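
As an example of how such measurements can be read off a single labelled scan, the sketch below computes two of the distances of Table 6 from the 3D coordinates of the labelled salient points; the label strings and the per-foot pairing heuristic are our assumptions, not the authors' exact procedure.

import numpy as np

def label_points(points, labels, name):
    # 3D coordinates of the salient points carrying a given label on one scan.
    return points[np.asarray(labels) == name]

def shoulder_distance(points, labels):
    # Distance between the two points labelled as shoulders (cf. Table 6).
    shoulders = label_points(points, labels, "shoulder")
    assert len(shoulders) == 2, "expects exactly two detected shoulder points"
    return float(np.linalg.norm(shoulders[0] - shoulders[1]))

def max_toe_heel(points, labels):
    # For each 'toes' point take the distance to the nearest 'heel' point
    # (a heuristic for pairing points of the same foot) and keep the maximum.
    toes = label_points(points, labels, "toes")
    heels = label_points(points, labels, "heel")
    return max(min(float(np.linalg.norm(t - h)) for h in heels) for t in toes)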

5. Discussion

We presented a framework for the automatic recognition of points (and regions) in human body scans. The method is based on the extraction, at specific scales, of geometrically salient points that are then labelled using spectral descriptors and supervised classification. The use of the Wave Kernel Signature as point descriptor demonstrated very good discriminative power, outperforming the Heat Kernel Signature and the derived descriptors. The Nearest Neighbor classifier (best with Jeffrey Divergence) provided good classification results. A dissimilarity-based classification approach using different metrics according to the representation object classes was, however, the most effective classification method tested, even if computationally expensive.

point        Recall 1   Recall 2   Precision
nose           0.96       0.96       0.87
chin           1.00       1.00       0.97
scapular       0.82       0.64       0.77
shoulder       0.97       0.97       0.96
breast         0.97       0.92       0.93
gluteal        0.85       0.79       0.83
patellar       0.94       0.66       0.94
heel           1.00       1.00       0.99
toes           0.98       0.22       0.98
fingertips     0.99       0.77       0.99

Table 5: Recall (true positive rate) and precision (positive predictive value) obtained in the classification of selected points. Recall 2 also considers the missed detections (it is the ratio between the number of correctly classified points and the expected number of points with the same label).

The results obtained demonstrate the feasibility of automatic recognition of anatomical points or regions based on simple spectral descriptors on human models with large variations in pose and body type. The classification results seemed to us a good outcome considering the poor quality of the meshes, which presented relevant artefacts due to holes, hair and motion. A subset of anatomical points was robustly both detected and classified on these noisy meshes. The accuracy of the classification could be improved further by using machine learning techniques to define new spectral descriptors optimized for the specific class of models, using approaches inspired, for example, by the work of Litman et al. [22]. A problem of the experimental procedure applied is that, due to the limited mesh accuracy and a non-negligible heat kernel smoothing, we had a relevant number of missed points for another subset of point labels. Using more detailed input meshes and a different time sampling for the autodiffusion function, the detection rates could easily be improved, at the cost of being more sensitive to noise.

The location of the extracted salient points does not necessarily match the human placement of corresponding landmarks, even if it is not necessarily less reliable (there is a large inter-operator error in locating anatomical landmarks [23]), and could be used differently to extract useful information on the subjects' body. A first test presented here demonstrated the potential of the automatically labelled salient points for deriving quantitative measurements characterizing the human body. The semantic correspondence between different body types in different poses obtained with the point labelling can, however, also be used as a starting point for other applications, e.g. shape registration, or dense correspondences computed with methods requiring an initialization [24, 25].

It is also worth noting that the appearance of some of the points could be quite different in subjects with different body type, fatness, musculature, etc. In this case the accurate location of the point is clearly impossible, but the geometric properties of the local neighborhood could still give relevant information on the specific body shape.

Measurement       avg. diff.   std. dev.   ALML err.   Manual meas.      corr.
Shoulder dist.      7.1 mm       6.6 mm     12.4 mm    Biacromiale        74%
Patellar-heel *     4.3 mm       6.7 mm     15.6 mm    Tib. laterale h    73%
Max toe-heel        2.8 mm       2.2 mm        -       Foot length        88%

Table 6: Average differences between repeated measurements of distances between recognized landmarks on 12 different subjects. The differences are not negligible, but considerably smaller than the error expected in the automatic location of the closest traditional landmarks used in manual anthropometry (ALML err.) according to [2]. The most similar standard anthropometric measurements and their correlation with the new values are reported in the last two columns. (*) In two cases the patellar point was not detected.

5.1. Acknowledgements

Special thanks to Francesco Piscitelli and Chiara Milanese for help with the data selection and to Bob Duin and Manuele Bicego for useful discussions.

References

[1] P. C. Treleaven, J. Wells, 3D body scanning and healthcare applications, IEEE Computer (2008) 28–34.

[2] A. Giachetti, E. Mazzi, F. Piscitelli, M. Aono, A. B. Hamza, T. Bonis, P. Claes, A. Godil, C. Li, M. Ovsjanikov, et al., SHREC'14 track: Automatic location of landmarks used in manual anthropometry, in: Eurographics Workshop on 3D Object Retrieval, 2014, pp. 93–100.

[3] A. Giachetti, C. Lovato, F. Piscitelli, C. Milanese, C. Zancanaro, Robust automatic measurement of 3D scanned models for human body fat estimation, IEEE Journal of Biomedical and Health Informatics (2014).

[4] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, J. Davis, SCAPE: shape completion and animation of people, ACM Trans. Graph. 24 (2005) 408–416.

[5] N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, H.-P. Seidel, A statistical model of human pose and body shape, Computer Graphics Forum 28 (2009) 337–346.

[6] Z. B. Azouz, C. Shu, A. Mantel, Automatic locating of anthropometric landmarks on 3D human models, in: 3D Data Processing, Visualization and Transmission, International Symposium on (2006) 750–757.

[7] S. Wuhrer, P. Xi, C. Shu, Human shape correspondence with automatically predicted landmarks, Machine Vision and Applications 23 (2012) 821–830.

[8] S. Wuhrer, C. Shu, P. Xi, Landmark-free posture invariant human shape correspondence, The Visual Computer (2011) 1–10.

[9] M. Mortara, G. Patanè, M. Spagnuolo, From geometric to semantic human body models, Computers & Graphics (2006) 185–196.

[10] C. Lovato, U. Castellani, A. Giachetti, Automatic segmentation of scanned human body using curve skeleton analysis, in: Computer Vision/Computer Graphics Collaboration Techniques, Springer, 2009, pp. 34–45.

[11] Y. Xiao, P. Siebert, N. Werghi, A discrete Reeb graph approach for the segmentation of human body scans, in: 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings. Fourth International Conference on, 2003, pp. 378–385.

[12] I.-F. Leong, J.-J. Fang, M.-J. Tsai, Automatic body feature extraction from a marker-less scanned human body, Computer-Aided Design 39 (2007) 568–582.

[13] Y. Li, Y. Zhong, Automatic detecting anthropometric landmarks based on spin image, Textile Research Journal 82 (2012) 622–632.

[14] A. M. Bronstein, M. M. Bronstein, B. Bustos, U. Castellani, M. Cristani, B. Falcidieno, L. J. Guibas, I. Sipiran, I. Kokkinos, V. Murino, M. Ovsjanikov, G. Patanè, M. Spagnuolo, J. Sun, SHREC 2010: robust feature detection and description benchmark, in: Proc. EUROGRAPHICS Workshop on 3D Object Retrieval, 2010.

[15] K. Gębal, J. A. Bærentzen, H. Aanæs, R. Larsen, Shape analysis using the auto diffusion function, in: Computer Graphics Forum, volume 28, Wiley Online Library, 2009, pp. 1405–1413.

[16] J. Sun, M. Ovsjanikov, L. Guibas, A concise and provably informative multi-scale signature based on heat diffusion, in: Proceedings of the Symposium on Geometry Processing, 2009, pp. 1383–1392.

[17] S. Salti, F. Tombari, L. D. Stefano, A performance evaluation of 3D keypoint detectors, in: 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2011 International Conference on, IEEE, 2011, pp. 236–243.

[18] A. M. Bronstein, M. M. Bronstein, Shape recognition with spectral distances, IEEE Trans. Pattern Analysis and Machine Intelligence 33 (2011) 1065–1071.

[19] M. Aubry, U. Schlickewei, D. Cremers, The wave kernel signature: A quantum mechanical approach to shape analysis, in: Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, IEEE, 2011, pp. 1626–1633.

[20] E. Pekalska, P. Paclik, R. P. W. Duin, A generalized kernel approach to dissimilarity-based classification, J. Mach. Learn. Res. 2 (2002) 175–211.

[21] P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, G. Ranzuglia, MeshLab: an open-source mesh processing tool, in: Eurographics Italian Chapter Conference, The Eurographics Association, 2008, pp. 129–136.

[22] R. Litman, A. M. Bronstein, Learning spectral descriptors for deformable shape correspondence, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014) 171–180.

[23] M. Kouchi, M. Mochimaru, Errors in landmarking and the evaluation of the accuracy of traditional and 3D anthropometry, Applied Ergonomics 42 (2011) 518–527.

[24] M. Ovsjanikov, Q. Mérigot, F. Mémoli, L. Guibas, One point isometric matching with the heat kernel, Computer Graphics Forum 29 (2010) 1555–1564.

[25] J. Pokrass, A. M. Bronstein, M. M. Bronstein, P. Sprechmann, G. Sapiro, Sparse modeling of intrinsic correspondences, in: Computer Graphics Forum, volume 32, Wiley Online Library, 2013, pp. 459–468.