Vis Comput (2013) 29:555–564
DOI 10.1007/s00371-013-0819-z

ORIGINAL ARTICLE

Content-based 3D model retrieval using a single depth image from a low-cost 3D camera

Min Soo Bae · In Kyu Park

Published online: 23 April 2013
© Springer-Verlag Berlin Heidelberg 2013

Abstract In this paper, we propose an efficient method for 3D model retrieval using a single depth image. Unlike existing algorithms that use a complete 3D model or a user sketch as the input query, a single depth image is used, which can be captured easily with an off-the-shelf low-cost 3D camera such as a Kinect. 3D models in the database are represented by multiple depth images acquired from adaptively sampled viewpoints. The proposed algorithm retrieves relevant 3D models while considering local 3D geometric characteristics using a rotation-invariant feature descriptor. The method consists of three steps: preprocessing, multiple depth image-based representation (M-DIBR) and description of 3D models, and similarity measurement and comparison. Experimental results demonstrate that the proposed algorithm is convenient to use and that its performance is comparable to recent algorithms in terms of retrieval accuracy and speed.

Keywords Content-based 3D model retrieval · 3D camera · Depth image-based representation · Adaptive view-sampling · Rotation-invariant descriptor

1 Introduction

During the last decade, we have witnessed an increasing requirement for 3D models in many multimedia and entertainment areas. Recent applications in 3D animation and

M.S. Bae · I.K. Park (✉)
School of Information and Communication Engineering, Inha University, Incheon 402-751, Korea
e-mail: [email protected]

M.S. Bae
e-mail: [email protected]

movies, computer games, and web services popularly use 3D models, which can be created with modeling tools or captured directly from the real world. Given the millions of 3D models available in the public domain, retrieving relevant 3D models and reusing them is an effective approach.

Using keyword-based search engines has been the conventional strategy for retrieving 3D models. Google 3D Warehouse [5] is a well-known tool in this category. However, keyword labeling is a tedious task, and 3D models are often given irrelevant or wrong labels, which precludes a correct search. Consequently, content-based retrieval of 3D models has attracted broad interest, and a variety of algorithms have been developed [23]. It should be noted that a convenient user interface is another important factor in a content-based retrieval system, in addition to high retrieval accuracy.

Recently, inexpensive 3D cameras such as the Microsoft Kinect [12] have become popular in 3D scene modeling and human-computer interfaces (HCI) that use perceived depth information. Such devices make it easy and affordable to capture a depth image together with its associated color image. This opens a new possibility: using a consumer-level 3D camera as the input device for a content-based 3D model retrieval system. By using a single depth image instead of a complete 3D model or 2D partial views as the input query, both user convenience and robust retrieval performance are maintained, because a depth image of a target object is now easy to capture and still carries useful 3D geometric information. To the best of our knowledge, no previous study has adequately addressed this scenario.

In this paper, an efficient method for 3D model retrieval that uses a single depth image captured by an inexpensive 3D camera is proposed. An advantage of the proposed algorithm is that the acquisition of input queries is simple


Fig. 1 An example of 3D model retrieval using a single depth image captured with a low-cost 3D camera. The top-ranked retrieval results for a depth image input query are shown. The input was captured using a Kinect camera

as compared with other algorithms that require a complete 3D model as input. In the database, 3D models are first represented by multiple depth images, and each depth image is then described by a rotation-invariant descriptor. The multiple depth images of a 3D model are captured from viewpoints that are adaptively sampled by considering the model's view-dependent geometric saliency, such as the curvature distribution and projected area of the visible surface, and the prior camera pose. In our implementation, the major computation in the matching procedure is performed on a graphics processing unit (GPU), so the similarity measurement between the query and the models is carried out in a massively parallel manner. An example of 3D model retrieval using a single depth image captured by an inexpensive 3D camera is shown in Fig. 1.

2 Related work

The objective of content-based 3D model retrieval is to search for 3D models that are similar to, or belong to the same class as, the input query. The similarity comparison between the query and the models is performed using the 3D models' inherent shape and color properties, without text keywords. An in-depth survey of existing content-based 3D model retrieval algorithms can be found in [23].

2.1 Global feature-based retrieval

Global feature-based methods transform a 3D model into a global descriptor vector based on geometric and topological features. Paquet et al. used the distribution of surface normal vectors with respect to the principal axes of the model in [17]. The extended Gaussian image (EGI) is a traditional method of mapping surface normal vectors to a sphere [8].

However, the retrieval performance is seriously affected by the model's geometric resolution and precision. Histogram-based methods represent a 3D model as a probability distribution of certain properties [4, 16], where similarity is measured by the distance between distributions. These methods are known to be robust to local surface distortion. However, since different 3D models can have similar distributions, their resolving power to distinguish between similar models is limited.

2.2 Local feature and graph-based retrieval

Graph-based methods are distinguished from other methods in that a 3D model is decomposed into parts and subsequently represented by a graph structure. In [7, 24], the skeleton structure of a 3D model was represented by a Reeb graph. The skeletal graph and phase matching proposed in [22] allow 3D models to be retrieved robustly in the presence of geometric deformation. Graph-based methods are adequate for models of non-rigid objects, especially articulated models. However, it is not easy to extend their usefulness to general 3D models with arbitrary topology.

2.3 2D and 2.5D view-based retrieval

View-based methods represent a 3D model as a set of 2D (color) or 2.5D (depth) images from different viewpoints. The set of images can further be organized into an aspect-relation graph. Heczko et al. computed the Fourier transform of the silhouette images of a 3D model and used the Fourier coefficients as shape descriptors [6]. In [1, 14, 25], uniformly sampled depth images of a 3D model were used to extract useful descriptors. Similarly, in [15], the scale-invariant feature transform (SIFT) algorithm was applied to depth images to extract local geometric features. Shih et al. simplified a 3D model and its associated descriptors by projecting the 3D model onto a voxel structure [20]. Daras and Axenopoulos proposed a universal retrieval system in which a color image, a depth image, or a complete 3D model can be used selectively as the input query [2]. Furthermore, in [3], the user's binary sketch of a 3D model is used as the query to maximize the convenience of the user interface.

View-based methods are robust to the incompleteness of 3D models, since the projected images are less affected by local shape distortion. However, if the number of sample views is insufficient or the views are improperly sampled, the retrieval performance is seriously degraded. This problem is addressed in this paper: we propose an adaptive view-sampling method for multiple depth image-based representation (M-DIBR) that considers the geometric saliency of 3D models. The method is described in detail in the following sections.


3 Overview of the proposed algorithm

The proposed algorithm consists of three steps: preprocessing of the query image, M-DIBR of the 3D models, and similarity measurement. In the preprocessing step, the noise in the input depth image is reduced by applying joint bilateral filtering [18]. Then the foreground object is segmented from the background interactively using the GrabCut method [19]. In the M-DIBR step, the depth views relevant for model retrieval are adaptively generated. To achieve this, each model's geometric saliency is investigated in order to sample the important depth views of the model in terms of the curvature distribution of the visible surface, the projected area of the visible surface, and the prior camera pose. Finally, in the similarity measurement step, rotation-invariant descriptors are generated for the input depth image and the sampled depth images of the 3D models. The overall flow of the proposed algorithm is shown in Fig. 2, organized into online and offline processing modules.

4 Preprocessing of input depth image

4.1 Depth image acquisition and noise reduction

In our approach, input queries are captured by an off-the-shelf inexpensive 3D camera, i.e., a Kinect camera. The acquired color and depth images are registered precisely to each other using the Kinect SDK. To remove the background and keep the query object in the foreground, interactive

object segmentation is applied to the color image. The corresponding segment in the depth image is used as the input query.

To reduce the noise in the depth image, joint bilateral filtering [18] was employed in this study. Since captured depth images usually have missing pixels, it is not appropriate to apply standard bilateral filtering directly to the depth image. In joint bilateral filtering, the color image is used to obtain the range and intensity weights of the bilateral filter, and the actual smoothing is applied to the depth image using those weights.
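To make this step concrete, the following is a minimal sketch of a joint bilateral filter in Python/NumPy, assuming a registered grayscale color image and a depth map whose missing pixels are stored as 0; the parameter names and values are illustrative, not those used in [18].

```python
import numpy as np

def joint_bilateral_filter(depth, color, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Smooth 'depth' using spatial and color-range weights taken from the
    registered 'color' image; pixels with depth == 0 are treated as missing."""
    h, w = depth.shape
    out = np.array(depth, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            acc, wsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or depth[ny, nx] == 0:
                        continue  # skip out-of-bounds and missing depth samples
                    ws = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    diff = float(color[y, x]) - float(color[ny, nx])
                    wr = np.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                    acc += ws * wr * depth[ny, nx]
                    wsum += ws * wr
            if wsum > 0:
                out[y, x] = acc / wsum
    return out
```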

4.2 Projection compensation

In the similarity measurement between a pair of depth images, both depth images should have the same projection parameters. Since perspective projection produces inconsistent images depending on the focal length and the distance from the object, orthographic projection is believed to be ideal for the depth image-based representation of both input queries and 3D models.

Obtaining orthographically projected depth images of 3D models is a trivial task, because the entire procedure takes place in a graphics environment. In contrast, the depth images captured by a Kinect camera need to be back-projected into 3D space using the camera's parameters and then reprojected orthographically using OpenGL. Given a perspectively projected depth image (u, v, d(u, v)), the orthographically reprojected position (u′, v′, d(u′, v′)) is simply

Fig. 2 Overview of theproposed algorithm


computed as follows:

$$\bigl(u', v', d'\bigr) = \left(\frac{d(u,v)}{f}\,(u - C_x),\ \frac{d(u,v)}{f}\,(v - C_y),\ d(u,v)\right) \qquad (1)$$

where $(C_x, C_y)$ and $f$ denote the camera's principal point and focal length, respectively.
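As a sketch of this compensation step (the paper performs the reprojection with OpenGL; the CPU-side NumPy version below is only illustrative, and `out_size` and `scale` are assumed parameters), each valid pixel is back-projected with Eq. (1) and the resulting points are resampled onto an orthographic grid:

```python
import numpy as np

def perspective_to_orthographic(depth, f, cx, cy, out_size=64, scale=32.0):
    """Back-project a perspective depth image to 3D via Eq. (1), then
    z-buffer the points onto an orthographic depth grid."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0
    d = depth[valid].astype(np.float64)
    # Eq. (1): (X, Y, Z) = ((d/f)(u - Cx), (d/f)(v - Cy), d)
    X = d / f * (u[valid] - cx)
    Y = d / f * (v[valid] - cy)
    Z = d
    # Orthographic resampling: map X and Y linearly to pixel coordinates.
    us = np.clip((X * scale + out_size / 2).astype(int), 0, out_size - 1)
    vs = np.clip((Y * scale + out_size / 2).astype(int), 0, out_size - 1)
    ortho = np.full((out_size, out_size), np.inf)
    for ui, vi, zi in zip(us, vs, Z):
        ortho[vi, ui] = min(ortho[vi, ui], zi)  # keep the nearest surface
    ortho[np.isinf(ortho)] = 0.0                # background stays 0
    return ortho
```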

5 Multiple depth image-based representation of 3D models

This section describes depth image-based representation and an adaptive view-sampling algorithm for selecting multiple depth images. In M-DIBR, a 3D model is represented by several 2.5D depth images captured from different viewpoints [11]. Since only a single depth image is used as the input query and the number of sampled depth images is limited, it is very important to sample viewpoints efficiently. The goal is to select the viewpoints adaptively so that the 3D model's geometric saliency is fully considered in its M-DIBR.

5.1 Coordinate normalization of 3D models

Each 3D model in the database has a different size, center, and orientation in its own local coordinates. Therefore, it is necessary to normalize these parameters before representing the model with multiple depth images. The model is transformed such that its bounding sphere becomes a unit sphere centered at the origin. Furthermore, the upright Z direction of the model is specified manually.
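A minimal sketch of this normalization is shown below, with the bounding-sphere center approximated by the vertex centroid (an exact minimal enclosing sphere would be slightly tighter):

```python
import numpy as np

def normalize_to_unit_sphere(vertices):
    """Translate and scale model vertices (N x 3) so that their bounding
    sphere (approximated around the centroid) becomes a unit sphere at
    the origin. The upright Z direction is still specified manually."""
    center = vertices.mean(axis=0)
    shifted = vertices - center
    radius = np.linalg.norm(shifted, axis=1).max()
    return shifted / radius
```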

5.2 Measurement of depth image saliency

Multiple depth images of a 3D model are obtained by rendering the model in 3D space and dumping the depth buffer using standard OpenGL functions. Initial viewpoints are sparsely sampled at the vertices of the bounding icosahedron, which are then subdivided iteratively into denser viewpoints using Loop's subdivision scheme [10]. However, if the subdivision is performed uniformly, the 3D model's geometric features are not considered. To avoid this, the subdivision is performed non-uniformly and adaptively by investigating the local geometric saliency of each depth view. If a particular view contains more geometric features, the viewpoints around it are subdivided more densely; otherwise, the local viewpoints remain unsubdivided.

In this study, the saliency of a depth image observed from viewpoint v is measured as a combination of three saliency factors: area A(v), surface curvature C(v), and camera pose P(v). Each factor is normalized between 0.0

Fig. 3 Saliency measurement of sampled depth images. (a) Four views of the horse model. (b) Four views of the car model

and 1.0, and the total saliency S(v) is computed by the following weighted sum:

$$S(v) = \alpha A(v) + \beta C(v) + \gamma P(v) \qquad (2)$$

The weights are empirically determined as α = 1.0, β = 1.5, and γ = 0.8. In Fig. 3, a few examples of saliency measurement are illustrated graphically.
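For reference, Eq. (2) with these weights amounts to the following one-liner (the factor values are assumed to be already normalized to [0, 1], as stated above):

```python
def total_saliency(A, C, P, alpha=1.0, beta=1.5, gamma=0.8):
    """Eq. (2): weighted sum of the area, curvature, and pose saliency of
    a view; defaults are the empirically chosen weights from the paper."""
    return alpha * A + beta * C + gamma * P
```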

5.2.1 Area saliency

In general, the larger the model's projected area, the more representative the view. The first term in the saliency measurement (2), defined as the normalized area of the foreground, takes this fact into account:

$$A(v) = \frac{n\{x \mid x \in F(v)\}}{A_{\max}} \qquad (3)$$

where F(v) is the foreground region of the view projected from v and n{·} denotes the number of pixels. The normalizing term A_max is the maximum area, computed over the initial views with uniformly sampled viewpoints.
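A sketch of Eq. (3), assuming the rendered depth buffer stores background pixels as 0 so that the foreground is simply the set of nonzero pixels:

```python
import numpy as np

def area_saliency(depth_view, a_max):
    """Eq. (3): number of foreground pixels of the view, normalized by
    A_max, the largest foreground area over the initial uniform views."""
    return np.count_nonzero(depth_view) / a_max
```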


5.2.2 Curvature saliency

Similar to the edges of a 2D image, the high-frequency components of a 3D model carry visually important surface details that distinguish it from other models. In this context, a more salient depth view has more surface details than a less salient one. The amount of local surface detail is commonly measured by the local surface curvature. The pixel-wise mean curvature H(x) is computed as the average of the minimum and maximum principal curvatures, k_min and k_max, at the foreground location x. The curvature saliency C(v) is then defined as the sum of the normalized mean curvatures, which is given by

$$C(v) = \frac{\sum_{x \in F(v)} |H(x)|}{H_{\max}} \qquad (4)$$

where $H_{\max}$ is the maximum sum of mean curvatures over the initial depth views with uniformly sampled viewpoints.
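The sketch below illustrates Eq. (4); note that it approximates the pixel-wise mean curvature by a Laplacian of the depth map, whereas the paper computes H(x) from the principal curvatures of the 3D surface, so this is a stand-in, not the authors' exact measure:

```python
import numpy as np

def curvature_saliency(depth_view, h_max):
    """Eq. (4): sum of |H(x)| over foreground pixels, normalized by H_max.
    Here H is approximated by a 4-neighbor Laplacian of the depth map
    (np.roll wraps at the borders, acceptable for a zero-padded background);
    the paper instead averages the principal curvatures k_min and k_max."""
    d = depth_view.astype(np.float64)
    lap = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
           np.roll(d, 1, 1) + np.roll(d, -1, 1) - 4.0 * d)
    fg = depth_view > 0
    return np.abs(lap[fg]).sum() / h_max
```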

5.2.3 Camera pose saliency

The last term in (2) is based on a heuristic prior on the camera pose: a 3D model is less likely to be viewed from underneath and more likely to be viewed from the top. Note that the upright direction of each 3D model is already specified manually in the normalization stage. As the angle θ_v between the camera's viewing vector to v and the model's upright direction (the Z direction) decreases, the camera pose saliency term has a higher response. It is formulated as follows:

$$P(v) = \frac{1 + \cos\theta_v}{2} \qquad (5)$$
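Eq. (5) translates directly into code; `view_dir` is assumed to be the unit vector from the model center toward the camera at v:

```python
import numpy as np

def pose_saliency(view_dir, up=(0.0, 0.0, 1.0)):
    """Eq. (5): response approaches 1 as the viewing direction aligns
    with the model's manually specified upright +Z direction."""
    cos_theta = float(np.dot(view_dir, up))
    return (1.0 + cos_theta) / 2.0
```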

5.3 Adaptive viewpoint sampling

After computing the saliency of the depth images from the viewpoints at the current (nth) iteration of the viewpoint subdivision, the saliency of a viewpoint triangle is computed by simply averaging the saliency of its three vertices:

$$S(i,j,k) = \lambda^n \left\{ \frac{S(i) + S(j) + S(k)}{3} \right\} \qquad (6)$$

In (6), the damping term λ^n decreases as the iteration continues. Without multiplying by λ^n, the saliency of certain triangles would remain high and they would be subdivided recursively, making the viewpoints locally too dense; multiplying by λ^n avoids this problem. In our implementation, λ is empirically set to 0.8.
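Eq. (6) can be sketched as follows, where n is the current subdivision iteration:

```python
def triangle_saliency(s_i, s_j, s_k, lam=0.8, n=0):
    """Eq. (6): mean saliency of a viewpoint triangle's three vertices,
    damped by lam**n so that repeatedly refined regions lose priority
    and the sampling cannot collapse onto a single salient area."""
    return (lam ** n) * (s_i + s_j + s_k) / 3.0
```

The geometric damping is the key design choice here: each triangle's priority shrinks by a constant factor per iteration regardless of its raw saliency, which keeps the refinement spread over the viewpoint sphere.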

The viewpoint triangles with the highest saliency are subdivided one by one until the iteration count reaches the limit (350 in our implementation), yielding 1,062 viewpoints. Figure 4 shows an example of the adaptively sampled viewpoints of the horse model. Note that the side views are sampled more densely than the others. In particular, the right

Fig. 4 Adaptively sampled viewpoints of the horse model. α = 1.0, β = 1.5, γ = 0.8, λ = 0.8

side of the horse is sampled slightly more, because the horse has one more accessory on its right side than on its left, which increases the curvature saliency of the views from the horse's right side.

6 Similarity measurement and comparison

6.1 Rotation-invariant descriptor generation

Since direct matching of depth images is rotation-variant, we need to generate rotation-invariant descriptors from the depth images. To achieve this, Zernike moments [9] were employed in this study. Zernike moments are based on a set of orthogonal complex polynomials defined inside a unit circle, and they produce rotation-invariant responses by projecting the radial basis polynomials R_km(r) onto the image f(x, y) as follows:

$$Z_{km} = \frac{2(k+1)}{\pi (N-1)^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} R_{km}(r_{x,y})\, e^{-jm\theta_{x,y}}\, f(x,y) \qquad (7)$$

where

$$R_{km}(r) = \sum_{s=0}^{(k-|m|)/2} \frac{(-1)^s\, (k-s)!}{s!\, \bigl((k+|m|)/2 - s\bigr)!\, \bigl((k-|m|)/2 - s\bigr)!}\, r^{k-2s} \qquad (8)$$

In (7) and (8), N is the resolution of the depth image, and k and m denote the degrees of the Zernike moment, subject to the constraints that $0 \le |m| \le k$ and $k - |m|$ is an even number. $r_{x,y}$ is the distance from (x, y) to the image center, i.e., (32, 32) for a 64 × 64 depth image. In our implementation, k is set to 13, so that 56 Zernike moments are generated for each depth image.


6.2 Similarity measurement

6.2.1 Similarity measurement using descriptors

Finally, the dissimilarity between the query and the 3D models in the database is computed as the distance between descriptors. The distance between the query and a particular 3D model is defined as the minimum difference between the query descriptor and all V view descriptors of the model:

$$D = \min_{1 \le i \le V} \sum_{k=0}^{p} \sum_{m=0}^{q} \left| Z^{\mathrm{query}}_{km} - Z^{i}_{km} \right| \qquad (9)$$

where $Z^{\mathrm{query}}_{km}$ and $Z^{i}_{km}$ represent the Zernike descriptors of the query and of the ith depth image of the model, respectively. Equation (9) is evaluated for all models, and the retrieval results are sorted in increasing order of D.
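A sketch of Eq. (9), assuming each descriptor is flattened into a 56-dimensional vector of moment magnitudes:

```python
import numpy as np

def model_distance(query_desc, model_descs):
    """Eq. (9): L1 distance between the query descriptor (shape (56,))
    and each of the V view descriptors of one model (shape (V, 56));
    the model's distance D is the minimum over its views."""
    return np.abs(model_descs - query_desc).sum(axis=1).min()
```

Evaluating this for every model and sorting the results in ascending order of D produces the final ranked list.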

6.2.2 Parallelization of similarity measurement

In the similarity measurement, (9) is in fact evaluated independently for all the models in the database and for all the depth images of a particular model. Based on this observation, in our approach these independent evaluations are implemented in a massively parallel manner on a GPU using the NVIDIA Compute Unified Device Architecture (CUDA) SDK [13].

The descriptors of all the models in the database are copied from system memory to the GPU's global memory during the initialization step. The CPU then computes the descriptor of the input query and copies it into GPU memory. In the CUDA kernel execution, the GPU launches (p + 1)(q + 1)V parallel threads, each of which computes a different $|Z^{\mathrm{query}}_{km} - Z^{i}_{km}|$ term in (9).
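The same independence can be expressed on the CPU with NumPy broadcasting; the sketch below is an analogue of the CUDA scheme (one elementwise difference per thread), not the authors' kernel code:

```python
import numpy as np

def rank_models(query_desc, all_descs):
    """Evaluate Eq. (9) for the whole database in one vectorized pass.
    all_descs: (num_models, V, 56) stacked view descriptors."""
    diffs = np.abs(all_descs - query_desc)  # one independent term per element
    per_view = diffs.sum(axis=2)            # inner sums of Eq. (9)
    per_model = per_view.min(axis=1)        # minimum over the V views
    return np.argsort(per_model)            # model indices in increasing D
```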

7 Experimental results and discussion

To demonstrate the performance of the proposed algorithm, extensive experiments were carried out using real and synthetic data. The algorithm was implemented on a Windows 7 machine with Visual Studio, equipped with an Intel Core2 Quad Q8300 (2.5 GHz) CPU and an NVIDIA GeForce GTX570 GPU. A Kinect and the Kinect SDK were used to capture the input query images. In our experiments, the Princeton Shape Benchmark (PSB) database [21] was used, which consists of 907 models classified into 131 classes. In the depth image-based representation, the resolution of a depth image is fixed to a small size, i.e., 64 × 64. However, in the saliency analysis, each depth image is rerendered at high resolution (512 × 512) so that the geometric detail can be analyzed as accurately as possible. The proposed adaptive viewpoint sampling yields 1,062 depth images for each model in the database.

Table 1 Processing time of retrieval for a single query (in milliseconds)

Processing step                   Processing time
Preprocessing of input query                 14.3
Descriptor generation                       109.8
Similarity measurement (GPU)                400.4
Similarity measurement (CPU)               1381.6
Total (GPU)                                 524.5
Total (CPU)                                1505.7

The processing time of retrieval for a single input query is shown in Table 1. Note that the descriptors of the 3D models in the database are generated offline, which does not affect the retrieval time. For 907 models, it takes only about half a second to produce the rank order. With GPU parallel processing, retrieval is roughly three times faster than on the CPU, and the speedup factor increases with the database size. Neither the CPU nor the GPU implementation has yet been optimized.

7.1 Performance evaluation on real objects

Using a Kinect camera, we conveniently captured depth images of real objects and entered them into the proposed retrieval system. In Fig. 5, the input queries and retrieval results are shown for eight different objects. Clearly, with only a single depth image as the input query, the relevant models are retrieved effectively. Even the irrelevant results in Fig. 5(d) (airplanes and a plant at lower ranks) suggest that the algorithm behaves reasonably, since the geometric structure of these results is not far from that of the input.

7.2 Performance comparison

Figure 6 compares the retrieval performance of the proposed algorithm with the results of a previous study [4]. It should be noted that [4] used a complete 3D model as the input query and performed retrieval in a larger database that includes the PSB, whereas our method used, as the input query, a single depth image synthetically captured from the corresponding 3D query model. As shown in Fig. 6, both algorithms retrieved reasonably correct models. Although the experimental conditions do not match exactly, we believe they are sufficiently alike to show that the performance of the proposed algorithm is comparable to that of a state-of-the-art algorithm. It is remarkable that, unlike [4], the proposed algorithm uses a single depth image captured conveniently with an inexpensive 3D camera.


Fig. 5 Retrieval results for input queries of real objects. The depth images were captured using a Kinect


Fig. 6 Retrieval results for synthetic input queries of models in the PSB database. (a) Results of the proposed algorithm. (b) Results of a previous study [4]


Fig. 7 Precision-recall experiments. (a) Comparison of uniformly sampled and adaptively sampled viewpoints. (b) Comparison with conventional methods [3, 16, 20]

7.3 Precision-recall analysis

To evaluate the accuracy of the proposed algorithm more rigorously, a precision-recall experiment was performed. The precision-recall plot is a commonly used criterion for evaluating most information retrieval systems. In such a plot, precision usually decreases as recall increases; an algorithm with good performance has a curve that lies higher than that of an algorithm with poorer performance, and a slowly decreasing curve is another sign of robustness.
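For reference, the plotted quantities can be computed from a ranked result list as follows (`ranked_labels` and `query_label` are illustrative names; this is a generic sketch, not the authors' evaluation code):

```python
import numpy as np

def precision_recall(ranked_labels, query_label):
    """Precision and recall after each rank position, given the class
    labels of the retrieved models in rank order."""
    relevant = np.asarray(ranked_labels) == query_label
    hits = np.cumsum(relevant)
    ranks = np.arange(1, len(relevant) + 1)
    precision = hits / ranks
    recall = hits / relevant.sum()
    return precision, recall
```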

For a comprehensive analysis, a large number of queries captured from the existing models in the database should be tested. However, since it is impractical to do this for real objects in a real environment, synthetically sampled queries were tested instead. For each 3D model in the database, we asked three subjects to select the most representative view using a simple interface. This view was then rendered in 3D space using OpenGL, and depth images were captured. In this manner, 633 input queries were created for a total of 211 models in 25 classes. The precision-recall plot was generated by performing retrieval for these 633 queries.

First, we evaluated the performance difference between two view-sampling methods: uniform sampling and the proposed adaptive viewpoint sampling. As illustrated in Fig. 7(a), adaptive viewpoint sampling shows higher precision at all recall rates. This is because adaptively sampled viewpoints increase the likelihood that a view similar to the input query is included in the M-DIBR of the models, so more precise matching is possible.

Finally, the proposed algorithm was compared with a few existing retrieval algorithms. For a fair comparison, we excluded algorithms that use a complete 3D model as the input query and instead tested view-based, global feature-based, and sketch-based retrieval algorithms [3, 16, 20]. The results are shown in Fig. 7(b) as a precision-recall plot, in which these algorithms are denoted GRLIP, ED, and D2. The proposed algorithm shows higher precision at most recall rates. In particular, its curve does not drop rapidly, which is a desirable property for real applications.

8 Conclusions and future work

In this paper, we proposed an efficient algorithm and a comprehensive system for content-based 3D model retrieval. The proposed algorithm takes as the input query a single depth image captured using a Kinect camera, an inexpensive 3D camera. The 3D models in the database are represented by multiple depth images acquired from adaptively sampled viewpoints. The multiple depth image-based representation and the proposed adaptive viewpoint sampling technique were shown to be effective and robust. The proposed algorithm retrieves relevant 3D models effectively while considering local 3D geometric characteristics using a rotation-invariant Zernike moment descriptor. Descriptor matching was implemented as a massively parallel computation on a GPU, which makes retrieval significantly faster.

Future work includes extending the depth capturing method to more general and easier input queries that do not require a 3D camera. To achieve this, alternative approaches


could employ multi-view stereo or shape-from-motion techniques. However, in this case the setup becomes more challenging than the current one, because the data completeness and accuracy of these passive methods are low and the noise level is relatively high. In addition, we plan to implement the proposed system on a mobile device and link the model search to Web services. We believe practical applications in such a scenario would attract considerable commercial interest.

Acknowledgements This research was supported by the MKE (The Ministry of Knowledge Economy), NHN Corp., under the IT/SW Creative research program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-(H0505-12-1003)). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology (2012R1A1A2009495).

References

1. Chaouch, M., Verroust, A.: A new descriptor for 2D depth image indexing and 3D model retrieval. In: Proc. of IEEE International Conference on Image Processing, pp. 373–376 (2007)

2. Daras, P., Axenopoulos, A.: A 3D shape retrieval framework supporting multimodal queries. Int. J. Comput. Vis. 89(2), 229–247 (2010)

3. Eitz, M., Richter, R., Boubekeur, T., Hildebrand, K.: Sketch-based shape retrieval. ACM Trans. Graph. 31(4), 31:1–31:10 (2012)

4. Funkhouser, T., Min, P., Kazhdan, M., Chen, J., Halderman, A., Dobkin, D., Jacobs, D.: A search engine for 3D models. ACM Trans. Graph. 22(1), 83–105 (2003)

5. Google: 3D Warehouse. http://sketchup.google.com/3dwarehouse

6. Heczko, M., Keim, D., Saupe, D., Vranic, D.: Method for similarity search on 3D databases. Datenbank Spektrum 2(2), 54–63 (2002)

7. Hilaga, M., Shinagawa, Y., Kohmura, T.: Topology matching for fully automatic similarity estimation of 3D shapes. In: Proc. of ACM SIGGRAPH, pp. 203–212 (2001)

8. Horn, B.: Extended Gaussian images. Proc. IEEE 72(12), 1671–1686 (1984)

9. Khotanzad, A.: Invariant image recognition by Zernike moments. IEEE Trans. Pattern Anal. Mach. Intell. 12(5), 489–497 (1990)

10. Loop, C.T.: Smooth subdivision surfaces based on triangles. M.S. thesis, Department of Mathematics, University of Utah (1987)

11. Maslyuk, L., Ignatenko, A., Zhirkov, A., Konushin, A., Park, I.K., Han, M., Bayakovski, Y.: Depth image-based representation and compression for static and animated 3D objects. IEEE Trans. Circuits Syst. Video Technol. 14(7), 1032–1045 (2004)

12. Microsoft: Kinect. http://www.microsoft.com/en-us/kinectforwindows

13. NVIDIA Corporation: Compute Unified Device Architecture (CUDA). http://developer.nvidia.com/object/cuda.html

14. Ohbuchi, R., Nakazawa, M., Takei, T.: Retrieving 3D shapes based on their appearance. In: Proc. of ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 39–45 (2003)

15. Ohbuchi, R., Osada, K., Furuya, T., Banno, T.: Salient local visual features for shape-based 3D model retrieval. In: Proc. of IEEE International Conference on Image Processing, pp. 93–102 (2008)

16. Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape distributions. ACM Trans. Graph. 21(4), 807–832 (2002)

17. Paquet, E., Rioux, M., Murching, A., Naveen, T., Tabatabai, A.: Description of shape information for 2-D and 3-D objects. Signal Process. Image Commun. 16(1–2), 103–122 (2000)

18. Petschnigg, G., Agrawala, M., Hoppe, H., Szeliski, R., Cohen, M., Toyama, K.: Digital photography with flash and no-flash image pairs. ACM Trans. Graph. 23(3), 664–672 (2004)

19. Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)

20. Shih, J., Hsing, C., Wang, J.: A new 3D model retrieval approach based on the elevation descriptor. Pattern Recognit. 40(1), 283–295 (2007)

21. Shilane, P.: The Princeton shape benchmark. In: Proc. of International Conference on Shape Modeling and Applications, pp. 167–178 (2004)

22. Sundar, H., Silver, D., Gagvani, N., Dickinson, S.: Skeleton based shape matching and retrieval. In: Proc. of International Conference on Shape Modeling and Applications, pp. 130–139 (2003)

23. Tangelder, J.W., Veltkamp, R.C.: A survey of content based 3D shape retrieval methods. Multimed. Tools Appl. 39(3), 441–471 (2008)

24. Tung, T., Schmitt, F.: Augmented Reeb graphs for content-based retrieval of 3D mesh models. In: Proc. of International Conference on Shape Modeling and Applications, pp. 157–166 (2004)

25. Vajramushti, N., Kakadiaris, I.A., Theoharis, T., Papaioannou, G.: Efficient 3D object retrieval using depth images. In: Proc. of ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 189–196 (2004)

Min Soo Bae received the B.S. and M.S. degrees from Inha University in 2010 and 2013, respectively, both in information and communication engineering. From January 2010 to February 2011, he was a researcher at LG Electronics. Since February 2013, he has been a researcher at LIG Nex1. His research interests are in the area of computer graphics and software development, including content-based 3D model retrieval and user interface software design.

In Kyu Park received the B.S., M.S., and Ph.D. degrees from Seoul National University (SNU) in 1995, 1997, and 2001, respectively, all in electrical engineering and computer science. From September 2001 to March 2004, he was a Member of Technical Staff at the Samsung Advanced Institute of Technology (SAIT). Since March 2004, he has been with the School of Information and Communication Engineering, Inha University, where he is an associate professor. From January 2007 to February 2008, he was an exchange scholar at Mitsubishi Electric Research Laboratories (MERL). Dr. Park's research interests include the joint area of computer graphics and vision, including 3D shape reconstruction from multiple views, image-based rendering, computational photography, and GPGPU for image processing and computer vision. He is a member of IEEE and ACM.