Vis Comput
DOI 10.1007/s00371-015-1171-2

ORIGINAL ARTICLE

A new sparse representation-based object segmentation framework

Jincao Yao1 · Huimin Yu1,2 · Roland Hu1

© Springer-Verlag Berlin Heidelberg 2015

Abstract In this paper, a novel sparse representation-based object segmentation model is proposed. The model follows from a new energy function that combines the level-set-based sparse representation and the independent component-based shape representation within a unified framework. Before the optimization of the proposed energy, a set of training shapes is first projected into the shape space spanned by the independent components. For an arbitrary input shape similar to some of the elements in the training set, the minimization of the energy automatically recovers a sparse shape combination according to the neighbors in the projected shape space to guide the variational image segmentation. We test our model on both public datasets and real applications, and the experimental results show the superior segmentation capabilities of the proposed model.

Keywords Shape segmentation · Sparse representation · Independent component · Pose alignment

1 Introduction

Object segmentation is one of the key problems in image processing and computer vision. Among the existing

✉ Huimin Yu
[email protected]

Jincao Yao
[email protected]

Roland Hu
[email protected]

1 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China

2 The State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou, China

methods [1–5], many of them tended to build an image-based energy function, and the minimization of the function would drive an evolving curve to capture the coherent shape areas. However, in many conditions, the appearance of the shape can be highly misleading: noise, occlusion or a similar background frequently contaminates the object in the image, and therefore using only an image-based energy function to segment such contaminated objects often leads to poor results [6–8]. A popular way to solve this problem is to use prior shapes to guide the segmentation, and such prior-knowledge-based methods have been shown to be an effective way to improve the performance [9,10].

A common way to involve the shape prior is to add a shape-prior constraint term directly to the energy function. For example, by incorporating a global constraint, several level-set-based models were developed that segment the shape with a pre-selected prior sample [11,12]. Besides the global-constraint methods, shape projection techniques have also been used to improve the performance, such as the shape models (SMs)-based method, which projects and represents the incoming shape by the corresponding SMs [13], and the deep-learning-based method, which projects the input shape at different levels [14]. Generally speaking, these models mainly use the best match or projection in the training set to guide the segmentation. However, in many applications, it is difficult to find a single best match in the training set that can precisely guide the segmentation; there might only be several similar neighbors available. For this problem, a growing trend in computer vision and image processing is to integrate higher-level shape representation or clustering models as the constraint that supervises the segmentation [15–18]. For a given training set, finding and representing the input shape based on its neighbors in the training set can, to some extent, be seen as a sparse representation process [19–22].


Recently, several works have begun to explore combining sparse representation with shape segmentation. For example, a sparse coding-based segmentation method that can be applied to hippocampus labeling was proposed in [19]. A vertices-and-edge-based sparse shape composition (SSC) model has been built to represent the input shape as a sparse combination of existing samples [20]. An extension of this sparse shape combination model was also developed in [21].

Although the models above have made certain progress in sparse shape representation (SSR), they still suffer from three limitations. (1) The existing methods mainly build the model according to the vertices or a certain distribution of the training set [17,21]. In many applications, it is difficult to accurately obtain this information, and the training set may also have complex shape statistics. (2) The previous SSR models require pre-alignment of vertices to achieve shape alignment [17,19,21]. In real applications, it is hard to pre-align the object before the sparse representation; therefore, the models above may get trapped in the problem of balancing different geometric transformations. (3) For the image-data-driven term, the previous models adopt a general error control term or a probability-based segmentation term to drive the energy minimization [17,22]. However, for objects surrounded by regions whose appearance is similar to that of the object, the existing data-driven terms are too general and may mislead the sparse representation term.

To overcome these drawbacks, in this paper a new framework called independent component-based local sparse shape representation (IC-LSSR) is proposed. Different from the existing SSR models, we derive an independent component-based sparse subset to deal with the training shapes. Compared with the probability-based method or low-order moment-based methods such as PCA and SMs, the independent component-based representation considers higher-order moments, and it has been proven to have a strong ability to describe the latent structure of the training data, even when the data have complex distributions [23–26]. We then embed this independent component-based sparse subset into a sparse representation energy function and simultaneously solve the problems of searching for the sparse neighbors, recovering the independent component-based shape representation and performing the variational image segmentation in a unified theoretical framework. A log-polar transformation-based SSR modeling method and a local seed constraint term are also proposed to solve the geometric transformation and general data-driven problems mentioned above.

The rest of this paper is organized as follows. Section 2 introduces the background. Section 3 presents our proposed model. Section 4 introduces the minimization method of the proposed energy function. Experimental results are presented in Sect. 5. Finally, in Sect. 6, we draw conclusions and describe possible future research directions.

2 Background

2.1 The basic sparse shape representation

Assuming that there is a set of normalized training shapes ψ = [ψ1, ψ2, . . . , ψN] ∈ R^{L×N}, ψ ∈ [0, 1], which contains N base shape elements ψi. Note that each two-dimensional shape has been changed into a column vector ψi by arranging its columns sequentially, and L is the length of the vector. For an arbitrary input shape ϕ ∈ R^L, if ϕ is similar to some of the base elements in the training set, we can, to some extent, approximately represent ϕ as a sparse linear combination of those elements:

ϕ ≈ ψ1 s1 + ψ2 s2 + · · · + ψN sN = ψs ∈ R^L    (1)

where s represents the sparse coefficient. This coefficient can be found by minimizing the following sparse formulation:

min_{s ∈ R^{N×1}} ‖s‖_1,   s.t. ‖ψs − ϕ‖_2 ≤ δ    (2)

where ‖·‖_1 represents the L1 norm, and δ is the error control parameter. An equivalent formulation is

min_{s ∈ R^{N×1}} { E_SSR(s) = ‖ψs − ϕ‖_2^2 + λ‖s‖_1 }    (3)

where λ is the regularization parameter. The above model provides a basic SSR model. However, directly solving the model in this manner may produce an unfaithful result, especially when the input shape has different geometric transformations and contaminations. Several methods have been proposed to solve these problems [17,20,21], but unfortunately these models either need a strict shape pre-alignment or require that the training shapes follow a certain underlying distribution, and these assumptions limit the ability of the sparse models. Considering that ICA has a strong ability to recover signals in signal reconstruction tasks, in this paper we propose a new energy function that formulates the variational sparse representation and the independent component-based shape representation in a unified theoretical framework to improve the performance.
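As a concrete illustration of (3), the following is a minimal sketch of an iterative soft-thresholding (ISTA) solver in Python/NumPy; the step size, iteration count and the toy data at the end are illustrative choices, not details from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Element-wise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def basic_ssr(Psi, phi, lam=0.1, n_iter=200):
    """Minimize ||Psi s - phi||_2^2 + lam * ||s||_1 by ISTA (illustrative sketch)."""
    L, N = Psi.shape
    s = np.full(N, 1.0 / N)                              # uniform initialization
    step = 1.0 / (2.0 * np.linalg.norm(Psi, 2) ** 2)     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * Psi.T @ (Psi @ s - phi)             # gradient of the data term
        s = soft_threshold(s - step * grad, step * lam)  # proximal step for the L1 term
    return s

# Toy usage: represent a shape vector as a sparse combination of training shapes.
Psi = np.random.rand(64 * 64, 20)                        # 20 vectorized training shapes
phi = Psi[:, 3] + 0.01 * np.random.randn(64 * 64)
s = basic_ssr(Psi, phi)
```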

2.2 An independent component-based shape representation

In order to obtain the independent component-based shape representation, the training set ψ is first centralized by subtracting the mean shape and changed into Z = [z1, z2, . . . , zN]^T ∈ R^{N×L}, where zi = ψi − ψ̄ and ψ̄ = (1/N) Σ_{j=1}^{N} ψj is the mean shape. The main idea of ICA is to find a series of independent source components M = [m1, m2, . . . , mC]^T, C < N, that can describe the latent structure of the input data by

Z = W̄M    (4)

where W̄ represents the mixing matrix. The aim of ICA is to solve the inverse equation M = WZ and obtain the independent source components M, where W denotes the demixing matrix. There are several methods that can recover M, for example fast-ICA and convex-ICA (C-ICA) [24,25]. Here, we use the recently proposed C-ICA to generate the independent shape components; details of C-ICA can be found in [25].

Once the independent shape components M are obtained, an arbitrary input shape ϕ, which is similar to some elements in the training set, can be projected into the independent component-based space by calculating

u = M(ϕ − ψ̄)    (5)

where u is the projection coefficient.
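As an illustration of this projection step, the sketch below centers a set of training shapes and extracts components with scikit-learn's FastICA as a stand-in for the C-ICA [25] actually used in the paper; the function names and parameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import FastICA  # stand-in for the C-ICA used in the paper

def shape_ica(Psi, n_components):
    """Center the training shapes and extract independent shape components.

    Psi : (L, N) matrix whose columns are vectorized training shapes.
    Returns the mean shape, the component matrix M (C x L) and a projector (Eq. 5).
    """
    psi_bar = Psi.mean(axis=1)                     # mean shape, length L
    Z = (Psi - psi_bar[:, None]).T                 # centered shapes, one per row (N x L)
    ica = FastICA(n_components=n_components, max_iter=500, random_state=0)
    ica.fit(Z)
    M = ica.components_                            # demixing-style components (C x L)

    def project(phi):
        """Eq. (5): u = M (phi - mean shape)."""
        return M @ (phi - psi_bar)

    return psi_bar, M, project
```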

3 The proposed sparse shape representation and segmentation model

3.1 Sparse representation with embedded independent shape component

Considering that the recovered independent components M can be seen as the latent structure of the training set, many studies directly use M to formulate recognition or classification models [25,26]. However, this kind of formulation cannot be used in the sparse representation-based method. Generally, to integrate the sparse representation and the independent shape components into a unified theoretical framework, two problems need to be solved.

1. The projected input shape may relate to most of the independent components; thus, directly using the independent components to formulate the sparse model cannot guarantee sparsity. More importantly, the extracted independent components cannot form a convex set, which means it is hard to obtain a global minimization.

2. Compared with the similar elements in the training set, the input shape may have geometric transformations that will influence the accuracy of the shape representation, and the model needs to solve the geometric invariance problem for the shape projection.

For the first problem, recall the basic definition of ICA in (4): it uses a linear combination of the extracted independent components to describe the centralized training shapes (obtained by subtracting the mean shape ψ̄). Thus, for a projected coefficient b = [b1, b2, . . . , bC]^T, ICA provides the following equation to recover a shape in the original shape space:

ϕ_b = ψ̄ + Σ_{i=1}^{C} b_i m_i    (6)

where ϕ_b is the recovered shape. Observing Eq. (6), the recovered shape can be represented by three terms: a projected vector b, a constant mean shape and the recovered source components. By further analyzing this shape recovery equation and considering different definitions of the coefficient b, several meaningful propositions can be obtained. First, we define the coefficient to be an arbitrary vector in the real number field and obtain Proposition 1.

Proposition 1 Assuming that the mean shape of a given training shape set is ψ̄, and [m1, m2, . . . , mC]^T are the corresponding recovered independent components, for an arbitrary vector b = [b1, b2, . . . , bC]^T ∈ R^C we define a shape set

ξ = { η = ψ̄ + Σ_{i=1}^{C} b_i m_i | b ∈ R^C }    (7)

and this independent component-based shape set is a convex set.

The proof of Proposition 1 is presented in "Appendix 1". As we can see from Proposition 1, changing the vector b generates a variational shape according to the extracted independent components. Since any shape element ψi in the training set can also be recovered from the independent components, for an input shape that is similar to some of the elements in the training set, we can further change the definition of the coefficient b and create a sparse representation-based shape set.

Proposition 2 Assuming that ψi is one of the elements of the training shape set ψ, it can be recovered by ψi = ψ̄ + Σ_{j=1}^{C} a_{i,j} m_j ∈ ξ, where a_{i,j} is the weight coefficient. Let a_i = [a_{i,1}, a_{i,2}, . . . , a_{i,C}]^T, A = [a1, a2, . . . , aN] ∈ R^{C×N}, let s = [s1, s2, . . . , sN]^T ∈ R^N be a sparse coefficient, and let (As)_i represent the i-th element of the sparse combination As; then over all sparse combinations we define a shape set ζ

ζ = { ϑ = ψ̄ + Σ_{i=1}^{C} (As)_i m_i | s ∈ R^N }    (8)

This sparse representation-based shape set is a subset of ξ, and it is also a convex set.


The proof of Proposition 2 is presented in "Appendix 2". Based on the two propositions above, for the projected coefficient u in (5) we propose to use the sparse term

min_{s ∈ R^{N×1}} { E_SSR(s) = ‖As − u‖_2^2 + λ‖s‖_1 }    (9)

to regularize it.

Remark 1 The input u is one of the coefficients of ξ in Proposition 1, and the As term is a sparse representation term that belongs to the coefficients of the subset ζ in Proposition 2. Since the input shape is only similar to some of the elements in the training set, and each of those elements corresponds to a projected vector in A, once we use As to regularize the coefficient, the sparse property is maintained. More importantly, the L1-norm sparse representation is a convex formulation; when we build the sparse term according to ζ, we have a convex model running on a convex set, which allows us to achieve a global sparse minimization.

Therefore, by substituting the projection equation (5) into (9), the basic independent component-embedded sparse model can be formulated as:

min_{s ∈ R^{N×1}} { E_SSR(s) = ‖As − Mϕ + Mψ̄‖_2^2 + λ‖s‖_1 }    (10)
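Continuing the hypothetical helpers from the earlier sketches, the following shows how the projected dictionary A can be assembled and how the energy in (10) can be evaluated; the names and the construction are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def build_projected_dictionary(Psi, M, psi_bar):
    """Project every training shape onto the independent components: A[:, i] = M (psi_i - mean)."""
    return M @ (Psi - psi_bar[:, None])               # shape (C, N)

def ic_ssr_objective(A, M, psi_bar, phi, s, lam):
    """Energy of Eq. (10): ||A s - M phi + M psi_bar||_2^2 + lam * ||s||_1."""
    residual = A @ s - M @ (phi - psi_bar)            # A s - M phi + M psi_bar
    return float(residual @ residual) + lam * np.abs(s).sum()
```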

3.2 Geometric invariance sparse shape representation

The model above embeds the independent shape component projection terms into the sparse representation. However, the geometric transformations (translation, rotation and scaling) have not yet been considered. For this problem, existing sparse models mainly adopt a shape pre-alignment method [17,19,21]. The drawback of this method is that, in many conditions, it is hard to automatically and accurately pre-align the shape before the SSR, especially for contaminated objects. To avoid this problem, our model is built on a log-polar transformation method borrowed from the image registration area.

In the log-polar system, a point is represented by y = [y1, y2] ∈ R², where y1 denotes the log radial distance and y2 denotes the angle. Any point x = [x1, x2] ∈ R² in Cartesian space can be represented in log-polar coordinates by calculating

y1 = log √(x1² + x2²),   y2 = tan⁻¹(x2/x1).    (11)

One property of this transformation is that any rotation and scaling in the original space is transformed into a linear shift. Figure 1 shows an example of this linear shift property. Figure 1(a) is a fish, (b) is the same shape with a certain rotation and scaling, and (c) and (d) are the log-polar shapes of (a) and (b). Here we only show y2 in the range [−π, π), while the whole log-polar shape is a periodic image. As shown in (e), the log-polar shapes in the shifted gray rectangles are exactly the same as (c) and (d). We also circle one of the vertices of the shape to make this shift clearer.

Fig. 1 a The original shape, b the shape with rotation and scaling, c transforms (a) into the log-polar shape, d transforms (b) into the log-polar shape, e shows the linear shift of the rotation and scaling on the enlarged log-polar shape. The yellow circles label one of the vertices

Based on this fact, the rotation and scaling of the shape can be easily represented by T_r(ϕ) = ϕ_L(y − r), where r = [r1, r2] ∈ R² and ϕ_L is the log-polar shape of ϕ; here T(·) and T⁻¹(·) represent the log-polar transformation and its inverse, respectively.
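A minimal sketch of the log-polar resampling itself is given below, assuming NumPy/SciPy; the grid resolution and the use of the image center as the pole are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar_shape(shape_img, n_rho=128, n_theta=128):
    """Resample a binary shape image onto a log-polar grid centered at the image center.

    Rotation/scaling about the center then appears as a circular shift along theta / log-rho.
    """
    h, w = shape_img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_rho = np.log(np.hypot(cy, cx))                           # largest log-radius on the grid
    rho = np.linspace(0.0, max_rho, n_rho)                       # log radial axis (y1)
    theta = np.linspace(-np.pi, np.pi, n_theta, endpoint=False)  # angular axis (y2)
    R, T = np.meshgrid(np.exp(rho), theta, indexing="ij")
    xs = cx + R * np.cos(T)                                      # back to Cartesian sample points
    ys = cy + R * np.sin(T)
    return map_coordinates(shape_img.astype(float), [ys, xs], order=1, mode="constant")
```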

Considering that the log-polar image has this linear shift property, many image registration methods build their models on the log-polar image [27,28]. In this paper, we extend this idea to the SSR area and propose to build our model on the log-polar shape. The benefit of using the log-polar shape to build the SSR is twofold:

1. Once the shift vector r has been found, the model can handle rotation and scaling simultaneously. Compared with other SSR models that solve the rotation matrix and scaling parameter separately or rely on a pre-alignment setting [17,21], our SSR model avoids the problem of balancing rotation against scaling.

2. There are many methods that can ensure fast computation of the globally optimal linear shift, for example the fast Fourier transform (FFT) [29]. Compared with solving the rotation matrix and scaling parameter in the original shape space, solving for a linear shift is much easier.

Since translation is naturally a linear shift in the original shape space, we can directly represent it as ϕ(x − h), where h is the translation vector. Thus the whole geometric transformation term can be written as T_{r,h}(ϕ) = T_r(ϕ(x − h)). By putting it into Eq. (10), a geometric invariance SSR is obtained:

min_{s ∈ R^{N×1}} { E_SSR(s, r, h) = ‖As − M T_{r,h}(ϕ) + Mψ̄‖_2^2 + λ‖s‖_1 }    (12)

Fig. 2 a The edge-based shape elements used in SSC [21], b the probability-based shape elements used in [17], c the log-polar shape elements used in our model, d using the sparse combination in the subspace ζ to represent the projected input shape in ξ

We compare the basic shape-element modeling method with two state-of-the-art sparse models. Figure 2(a) shows the edge-based shape elements used in SSC [21], (b) shows the shape elements used in the probability-based sparse shape representation (P-SSR) [17], (c) shows the log-polar shape elements used in our model, and (d) shows how to sparsely represent the input shape by using the projected neighbors. All the shapes are from Part B of the MPEG-7 CE-Shape-1 dataset [17].

3.3 A seed-based image-driven energy

As an image-related term, the input shape ϕ in (12) is still unknown. Generally, there are two ways to obtain this term, an edge-based method or a region-based method; in this paper, we use the classical region-based Chan–Vese (CV) model [2] as the basic image-driven term to complete our model. The energy function of the CV model is

E_CV(φ) = ∫ e1(x) H(φ) dx + ∫ e2(x) (1 − H(φ)) dx + v ∫ |∇H(φ)| dx    (13)

The first two terms of this model are the level-set-based segmentation terms, and the last one is the curve regularization term. Here we set e_i(x) = −log p_i(x) as the region descriptor, where p_i(x) represents the histogram of the object and background regions, φ is the level-set formulation, H(·) is the Heaviside function, and v is the shape regularization parameter. In this paper, we set v = 0.5. As we can see from (13), the H(φ) in E_CV(φ) provides lower-level image information, and to some extent it can be seen as an input shape and combined with our sparse representation model. Therefore, a simple way to connect the image information to our SSR term is to replace the input shape ϕ in (12) with H(φ) and add E_CV(φ) as our image-data-driven term. However, this formulation may have problems when the shape is surrounded by noise or by a large region whose region descriptor is similar to that of the input shape, because the model may confuse the shape with the similar background. For these problems, a local seed-based method has been proposed in the GraphCut area and achieves good performance [30,31]. The basic idea of this method is to involve a seed-based local constraint and compel the curve to focus on the local region near the seed. The advantage of this method is that it can better distinguish the shape from the surrounding area; the disadvantage is that the seed must be initialized inside the shape. Fortunately, with the development of object detection, feature extraction (like SIFT and SURF) and tracking techniques, this type of seed is not difficult to obtain [31–33]. Since the seed generation issue is beyond the scope of this paper, the seed is manually initialized here.
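For illustration, the following sketch computes the histogram-based region descriptors e_i(x) = −log p_i(x) from a gray-level image and the current region H(φ); the bin count and the stabilizing epsilon are illustrative choices, not values from the paper.

```python
import numpy as np

def region_descriptors(img, mask, n_bins=32, eps=1e-8):
    """Compute per-pixel descriptors e1 = -log p_obj, e2 = -log p_bkg from region histograms.

    img  : gray-level image scaled to [0, 1]
    mask : current object region, H(phi), as a boolean array
    """
    bins = np.minimum((img * n_bins).astype(int), n_bins - 1)
    p_obj = np.bincount(bins[mask], minlength=n_bins).astype(float)
    p_bkg = np.bincount(bins[~mask], minlength=n_bins).astype(float)
    p_obj /= max(p_obj.sum(), 1.0)
    p_bkg /= max(p_bkg.sum(), 1.0)
    e1 = -np.log(p_obj[bins] + eps)   # object descriptor per pixel
    e2 = -np.log(p_bkg[bins] + eps)   # background descriptor per pixel
    return e1, e2
```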

Considering that the local seed-based method is mainly formulated according to a local constraint, in this paper we propose an improved center-of-gravity-based constraint term to compel the implicit shape to focus on the local region and generate a more accurate local region descriptor. For an arbitrary level-set-based shape H(φ), the center of gravity of the shape can be calculated by ∫ x H(φ) dx / ∫ H(φ) dx, and thus the constraint can be formulated as

μ1 ≤ ∫ x H(φ) dx / ∫ H(φ) dx ≤ μ2,   μ1, μ2 ∈ R²    (14)

where μ1 and μ2 constrain the range. Obviously, the inequality above is hard to incorporate into the energy function. To solve this problem, we can observe the equivalent inequality

μ1 ∫ H(φ) dx ≤ ∫ x H(φ) dx ≤ μ2 ∫ H(φ) dx    (15)


Though the inequality above still cannot be added to the energy function directly, we can build a soft distance function to simulate its property. Without loss of generality, we adopt a Gaussian kernel-based function to achieve this simulation:

∫ (1 − e^{−(x−μ)²/σ²}) H(φ) dx < ε,   ε ∈ R⁺    (16)

where μ = (μ1 + μ2)/2 represents the center of the seed, and σ defines the range of influence. Since H(φ) is a non-negative function, once its center of gravity moves away from μ, the integral increases. When we set a proper ε as the upper bound, the inequality (16) constrains the center of gravity of H(φ) to stay close to μ, which achieves the same effect as the inequality (14). Considering that the inequality (16) has this property and our total energy function is also based on minimization, we can remove ε and formulate our IC-LSSR energy as:

E_TOTAL(φ, s, r, h) = E_CV(φ) + γ ∫ (1 − e^{−(x−μ)²/σ²}) H(φ) dx + β( ‖As − M T_{r,h}(H(φ)) + Mψ̄‖_2^2 + λ‖s‖_1 )
                   = E_CV(φ) + γ E_SEED(φ) + β E_SSR(s, h, r)    (17)

where γ controls the intensity of the constraint; in our model we set γ = 1, β = 1.5 and λ = 0.1 × N⁻¹. With this formulation, the curve evolution is compelled to focus on the descriptor and the region near the local seed.
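A minimal sketch of evaluating the seed term E_SEED on a discrete grid is shown below, assuming NumPy; the isotropic σ and the grid construction are illustrative assumptions.

```python
import numpy as np

def seed_penalty(H_phi, mu, sigma):
    """E_SEED: integral of (1 - exp(-||x - mu||^2 / sigma^2)) * H(phi) over the image grid.

    H_phi : Heaviside of the level set, array with values in [0, 1]
    mu    : (row, col) center of the local seed
    sigma : range of influence of the seed
    """
    rows, cols = np.indices(H_phi.shape)
    dist2 = (rows - mu[0]) ** 2 + (cols - mu[1]) ** 2
    weight = 1.0 - np.exp(-dist2 / sigma ** 2)   # grows as pixels move away from the seed
    return float((weight * H_phi).sum())
```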

Figure 3 provides an example showing the effect of this constraint. The input image contains a corrupted triangle surrounded by noise with a similar region descriptor. The first image shows the initialized seed, the second image is the segmentation result of the E_CV(φ) term, and the third image presents the result of the E_CV(φ) term with our E_SEED(φ) term. As can be seen, segmentation without a seed constraint only provides a general segmentation, which is severely influenced by the surrounding regions. In contrast, the segmentation with the constraint compels the curve to focus on the region near the seed, and a more specific local region descriptor is generated by our method. The effect of this local descriptor is reflected inside the triangle; as shown in Fig. 3c, our method drives the model to generate a more accurate representation of the triangle.

Fig. 3 a A corrupted triangle surrounded by noise with a similar region descriptor, and the green circle is the initialized local seed, b the segmentation result of the CV model, c the CV model with our local seed constraint

4 The optimization of the energy function

Since the rotation, scaling and translation have been changed into linear shifts in our model, we use the FFT to calculate the shift vectors. Considering that using the FFT and IFFT to calculate the convolution and find the best-match shift vector is a basic technique in the signal processing community, we omit the details here [29].
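As an illustration of this step, the sketch below estimates the shift between two (log-polar) shape images by FFT-based correlation; the phase normalization used here is one common choice and not necessarily the exact variant used by the authors.

```python
import numpy as np

def estimate_shift(reference, moving, eps=1e-8):
    """Estimate the circular shift aligning `moving` to `reference` via phase correlation."""
    F_ref = np.fft.fft2(reference)
    F_mov = np.fft.fft2(moving)
    cross_power = F_ref * np.conj(F_mov)
    cross_power /= np.abs(cross_power) + eps           # phase normalization
    corr = np.fft.ifft2(cross_power).real              # correlation surface
    shift = np.unravel_index(np.argmax(corr), corr.shape)
    # Interpret shifts larger than half the image size as negative shifts.
    return tuple(s - n if s > n // 2 else s for s, n in zip(shift, corr.shape))
```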

For the sparse coefficient s and the level-set-based shape φ, we adopt a classical alternating gradient descent minimization scheme. We first fix the sparse coefficient s and solve for the implicit shape φ. After that, we fix φ and use gradient descent with a soft-thresholding method to solve for s [34,35]. The optimization process is shown in Algorithm 1.

Algorithm 1 The optimization of the proposed IC-LSSR energy function.

Initialization:
Set s = [1/N, 1/N, . . . , 1/N]^T ∈ R^{N×1}.
Set r = (0, 0) and h = (x1, x2), which is also the center of the seed.

Optimization:
1. Obtain the gradient descent flow of φ:

   φ_g = δ(φ)[ e1 − e2 − v div(∇φ/|∇φ|) + γ(1 − e^{−(x−μ)²/σ²}) ] + βδ(φ) T⁻¹_{−r,−h}[ 2M^T( As − M T_{r,h}(H(φ)) + Mψ̄ ) ]

2. Update φ ← φ − φ_g.
3. Use the FFT and IFFT to update the transformation parameters h and r.
4. Update the sparse coefficient:

   s ← s − κ( 2βA^T( As − M T_{r,h}(H(φ)) + Mψ̄ ) )

   where κ(·) is the standard soft-thresholding method [35].
5. Jump to Step 1 until convergence.
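To make the alternating scheme concrete, the following sketch shows the form of the two inner updates in Python/NumPy; the smoothed Heaviside/Dirac, the fixed step sizes and the omission of the SSR back-projection onto φ are simplifying assumptions for illustration, not the authors' MATLAB implementation.

```python
import numpy as np

def heaviside(phi, eps=1.0):
    """Smoothed Heaviside of the level set."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def dirac(phi, eps=1.0):
    """Smoothed Dirac delta, the derivative of the smoothed Heaviside."""
    return (eps / np.pi) / (eps ** 2 + phi ** 2)

def curvature(phi):
    """div(grad(phi)/|grad(phi)|) via simple finite differences."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    nyy, _ = np.gradient(gy / norm)
    _, nxx = np.gradient(gx / norm)
    return nxx + nyy

def phi_update(phi, e1, e2, seed_weight, v=0.5, gamma=1.0, dt=0.5):
    """Gradient-descent step on phi for the CV + seed part of Eq. (17) (SSR back-projection omitted)."""
    flow = dirac(phi) * (e1 - e2 - v * curvature(phi) + gamma * seed_weight)
    return phi - dt * flow

def s_update(s, A, u, beta=1.5, lam=0.1, step=1e-3):
    """Gradient step on the SSR data term followed by soft-thresholding (cf. Step 4)."""
    grad = 2.0 * beta * A.T @ (A @ s - u)   # here u stands for M*T_{r,h}(H(phi)) - M*mean_shape
    z = s - step * grad
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
```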

5 Experiments

We test our model on both synthetic images and real applications. All the experiments were implemented in MATLAB 2013b on a computer with an Intel i7 CPU running Windows 7. We also compared our model with several recently proposed methods [13,17,21,33]. To allow a fair comparison, we initialized the curve of all the region-based methods with the same shape and location, and for our method the local seed was also initialized.

Fig. 4 a The input images, b the moment constraint method in [33], the yellow curve is the evolution result and the green circle is the initialized curve, c the SMs method [13], d the P-SSR method [17], e the method in this paper, f the recovered sparse coefficients, g the recovered log-polar shape, h the recovered original shape

Fig. 5 Curve evolution of the proposed method. The first row shows the curve evolution, the second row shows the sparse coefficients, the third and fourth rows show the recovered shape in the original and log-polar spaces, and the fifth row shows the evolution of T_{r,h}(H(φ))

Fig. 6 a The input CT images, b the moment constraint method [33], c the SMs-based method [13], d the edge-based SSC [21], e the P-SSR [17], f our method. The green circle is the initialized curve of the three region-based methods. For our method, the local seed is also initialized

First, we tested our method on some contaminated synthetic images. The object shapes and the training set were from Part B of the MPEG-7 CE-Shape-1 dataset, which includes 1400 shapes from 70 classes (part of the shapes is shown in Fig. 2). Since the shapes in the dataset have different sizes, we first normalized them, and then the shapes were aligned according to the mean shape of each class. Figure 4a shows several synthetic input shapes with different contaminations; the segmentation results of the MC-based method in [33] are shown in (b), where the yellow curve is the evolution result and the green circle indicates the initialized curve. As the figure shows, the results are only very general. Figure 4(c) and (d) show the segmentation results of the SMs-based method in [13] and the P-SSR method in [17], respectively. Since these methods use low-order SSR models, once the shape is connected to a similar background or the edges have been severely damaged, the models fail to recover the original shape. Figure 4e shows the segmentation results of the proposed model.

As can be seen, the model obtained good segmentation results even though the shapes have noise, occlusion and similar surrounding regions. Figure 4f–h shows the recovered sparse coefficients, the reconstructed shape in the log-polar system and the reconstructed shape in the original space, respectively. Figure 5 shows an example of the evolution process of our proposed model. The first row shows the curve evolution process, the second row presents the sparse coefficients, and the third and fourth rows show the recovered shape in both the original shape space and the log-polar system. In addition, the last row shows T_{r,h}(H(φ)). We can clearly see that the curve is influenced by the competition between the image-driven term and the SSR term, and an intermediate curve is generated during this procedure. Finally, the evolving curve stops on the boundary of the shape, and this boundary can be seen as the competition result of the lower-level region-based energy and the higher-level SSR.

Fig. 7 First to third columns show the original samples corresponding to the three largest coefficients, the fourth to sixth columns show the normalized samples, the seventh and eighth columns show the recovered shape in the log-polar system and the original shape space, respectively. The ninth column is the competition result of the local seed-based CV term and the SSR term, and the last column is the recovered sparse coefficients

Fig. 8 First column is the input images, the second to fifth columns are the segmentation results of our method with different seed initializations

Fig. 9 Part of the training shapes in walking people segmentation


Fig. 10 a The input images, b the MC-based method [33], c the SMs [13], d the SSC [21], e the P-SSR [17], f our method

For the synthetic images, the original sample of the input shape is directly included in the training set. Thus, the model only needs to find the nearest neighbor in the projected shape space to recover the contaminated shape. However, in many conditions, the training set may not include a sample that exactly represents the input shape, and there might only be several similar elements available. In the following experiments, segmentation in real images is examined, and the training set does not directly include the original shape of the object.

First, our model was tested on kidney CT image segmentation. All the images were from the "OpenMedImg" dataset [36]. The input images included 179 samples from CT image sets 1 and 2, and the training set included 336 kidney shapes from shape sets 1 and 2. In order to organize the training shapes so that neighbors are ordered by their correlation, the K-means approach was applied to divide the training shapes into 16 clusters, and then we aligned the shape elements in each cluster and normalized their mean shapes to the same size.
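A minimal sketch of this grouping step is shown below, using scikit-learn's KMeans as a generic stand-in for whatever K-means implementation was actually used; shapes are clustered as vectorized masks and a per-cluster mean shape is returned for the subsequent alignment.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_training_shapes(Psi, n_clusters=16, seed=0):
    """Group vectorized training shapes (columns of Psi) into clusters of correlated shapes."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(Psi.T)                          # one row per shape
    clusters = [Psi[:, labels == k] for k in range(n_clusters)]
    mean_shapes = [c.mean(axis=1) for c in clusters]        # per-cluster mean shape for alignment
    return labels, clusters, mean_shapes
```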

As is known, in CT images it is common for the kidney to appear connected with other organs that have very similar properties. For such contaminated shapes, it is difficult to accomplish the segmentation using purely image-based methods. Figure 6a shows some examples of the above-mentioned CT images. The segmentation results and comparisons are presented in Fig. 6b–f. As indicated in Fig. 6b, without involving prior knowledge, the segmentation results of the MC-based method are severely influenced by the noise and the connected background. The SMs-based method in (c) uses the general CV model without any constraint as the data-driven term; obviously, this severely misleads the model. The edge-based SSC model and the P-SSR in (d) and (e) obtained slightly better results for the parts that were not contaminated. However, those segmentations were also obviously influenced by the contaminations. Figure 6f shows the segmentation results of our model; it can be seen that our model separated the shape of interest from the connected region even when the shape was connected to a similar background or had a different geometric transformation.

To make the shape representation of our model clearer, we show the original training samples corresponding to the three largest sparse coefficients in the first three columns of Fig. 7. The fourth to sixth columns show the normalized samples. The seventh and eighth columns show the recovered shape in the log-polar system and the original shape space, respectively. The ninth column shows the supervised segmentation result, which can also be seen as a competition result of the sparse representation term and the image-driven term. The tenth column shows the recovered sparse coefficients. As shown in Figs. 6 and 7, though the object shapes in the images were not directly included in the training set, by using the proposed IC-LSSR we successfully recovered an independent component-based sparse shape combination from the training set to approximately represent the object and supervise the segmentation.

Fig. 11 First to third columns show the samples corresponding to the three largest coefficients, the fourth column is the recovered shapes in the original shape space, the fifth column is the recovered log-polar shapes, and the sixth column shows the recovered sparse coefficients

The optimization of our method is driven by three terms: the seed-based constraint, the low-level image information and the high-level SSR. Therefore, different from the purely seed-based methods [30,31,33] that need to initialize the seed inside the object, our model has better robustness. Figure 8 shows some examples of our method with different seed initializations. As shown, our IC-LSSR model can successfully recover the shape even when the seeds are initialized partly outside the object.

We also tested our model on walking people segmentation. The training set included 172 silhouettes of a walking person, and some of the training samples are shown in Fig. 9. The input images were 215 video frames of a walking person, and part of the images is presented in Fig. 10a.

As we can see, the object was severely contaminated. The second to fifth rows of Fig. 10 show the results of the MC-based method [33], the SMs [13], the SSC [21] and the P-SSR method [17], respectively. For the shapes that were not occluded or overlapped with a similar background (the last column), all the prior-knowledge-based methods obtained satisfactory results. However, for the shapes that were contaminated, the performance of these methods got worse, and the segmentations were obviously influenced by the surrounding regions and occlusions. The SMs in (c) and the P-SSR in (e) failed to recover the original shape, while the SSC in (d) recovered a shape that was unfaithful to the object. Figure 10f shows the segmentation results of our model; by using a local constraint and the independent component-based SSR, our model successfully segmented the shape of interest from the occlusion and similar background.

We also show the original training samples corresponding to the three largest sparse coefficients in the first three columns of Fig. 11. The fourth and fifth columns show the recovered shape in the log-polar system and the original shape space, respectively. The sixth column shows the recovered coefficients.


Table 1 Comparison of the average errors, computational time and iteration number

            CT image segmentation                                   Walking people segmentation
            Ave. time (s)   Ave. error (%)   Ave. iteration no.     Ave. time (s)   Ave. error (%)    Ave. iteration no.
MC          0.66 ± 0.05     59.54 ± 6.96     7.33 ± 0.71            0.57 ± 0.05     196.39 ± 31.63    6.55 ± 0.59
SSC         4.70 ± 0.91     31.19 ± 3.26     66.01 ± 13.37          3.36 ± 0.25     61.13 ± 7.73      57.76 ± 4.10
P-SSR       9.60 ± 0.97     19.96 ± 2.10     23.16 ± 2.33           7.54 ± 0.65     65.79 ± 5.65      16.73 ± 1.47
SMs         19.65 ± 1.91    45.79 ± 10.11    85.15 ± 8.36           15.67 ± 1.30    51.31 ± 11.03     61.43 ± 5.19
Our         1.17 ± 0.10     3.61 ± 0.36      15.49 ± 1.27           0.75 ± 0.06     6.83 ± 0.70       10.93 ± 0.91

Table 1 compares the average errors (the percentage of incorrectly labeled pixels per image), the iteration number and the computational time of the methods. For the average errors, we used the manually segmented results as the ground truth. All experiments were run for ten rounds with random initialization near the object. The first number in each cell of Table 1 is the overall average, and the second number is the maximal deviation; bold font marks the minimum value to facilitate comparison. As shown, the purely region-based MC consumed the least computational time and the fewest iterations. However, due to the absence of a prior-knowledge-based term, the average errors of MC were extremely high. Among the prior-knowledge-based methods, the time consumption of our model was slightly better than that of SSC and significantly better than that of P-SSR and SMs. For the energy function optimization, P-SSR and our method required relatively fewer iterations than SSC and SMs. Since our model involves both a higher-level shape representation term and a local seed constraint, the average segmentation error of our method was significantly better than that of all the other methods.
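For reference, the sketch below shows one plausible reading of this error measure, assuming the number of mislabeled pixels is normalized by the ground-truth object size (which would be consistent with values above 100 % in Table 1); the paper does not state the exact normalization, so this is an assumption.

```python
import numpy as np

def labeling_error(pred_mask, gt_mask):
    """Percentage of incorrectly labeled pixels, normalized by the ground-truth object size (assumed)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    wrong = np.count_nonzero(pred != gt)
    return 100.0 * wrong / max(np.count_nonzero(gt), 1)
```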

6 Conclusions

A novel sparse neighbor-based SSR framework, called IC-LSSR, was introduced in this paper. The model follows from a new energy function that combines the variational SSR and the independent component-based shape representation within a unified theoretical framework. Our model adopts a log-polar transformation-based method to build a geometric invariance SSR. Due to the linear shift property of the log-polar shape, the proposed model transforms the geometric invariance problem of the SSR into a pair of linear shape-shift problems. A local seed constraint was also proposed to obtain a more accurate local SSR. With the recovery of the sparsest coefficients, the online decision of the best shape representation is obtained according to the sparse coefficients and the independent components. The competition between the image-data-driven term and the sparse representation term eventually segments the shape, taking into account both the image information and the high-level SSR. We tested our model on both public datasets and real applications, and the experimental results show satisfactory segmentation performance.

In future work, we plan to consider new shape projection methods to further improve the performance. Other possible directions include incorporating a new image-data-driven model or a visual attention mechanism to generate a more accurate shape representation.

Acknowledgments This work is supported by the Natural Science Foundation of China (NSFC No. 61471321 and No. 61202400) and the National Key Basic Research Project of China (973 Program 2012CB316400).

Appendix 1

Proof of Proposition 1 Given γ ∈ [0, 1], for arbitrary variational shapes η1 = ψ̄ + Σ_{i=1}^{C} b_i¹ m_i ∈ ξ and η2 = ψ̄ + Σ_{i=1}^{C} b_i² m_i ∈ ξ, we have

η = γ η1 + (1 − γ) η2 = ψ̄ + Σ_{i=1}^{C} ( γ(b_i¹ − b_i²) + b_i² ) m_i

Let b_i = γ(b_i¹ − b_i²) + b_i²; then η = ψ̄ + Σ_{i=1}^{C} b_i m_i ∈ ξ. Thus the shape set ξ is a convex set. □

Appendix 2

Proof of Proposition 2 Since A = [a1, a2, . . . , aN] ∈ R^{C×N} is the projected training set and s = [s1, s2, . . . , sN]^T ∈ R^N is a sparse coefficient, for any input s we have As ∈ R^C. Considering that b = [b1, b2, . . . , bC]^T ∈ R^C in Proposition 1 is an arbitrary vector, all the sparse combinations As can be seen as a subset of the vectors b, and therefore the As-based set ζ can be seen as a subset of the b-based set ξ.

The proof of the convexity of ζ is similar to that of Proposition 1. Given γ ∈ [0, 1], for arbitrary variational shapes ϑ1 = ψ̄ + Σ_{i=1}^{C} (As1)_i m_i ∈ ζ and ϑ2 = ψ̄ + Σ_{i=1}^{C} (As2)_i m_i ∈ ζ, we have

ϑ = γ ϑ1 + (1 − γ) ϑ2 = ψ̄ + Σ_{i=1}^{C} ( A( γ(s1 − s2) + s2 ) )_i m_i

Let s = γ(s1 − s2) + s2; then ϑ = ψ̄ + Σ_{i=1}^{C} (As)_i m_i ∈ ζ. Thus the shape set ζ is also a convex set. □

References

1. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton–Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
2. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
3. Roe, E., Mello, C.A.B.: Restoring images of ancient color postcards. Vis. Comput. 31(5), 627–641 (2015)
4. Ye, J.T., Xu, G.L.: Geometric flow approach for region-based image segmentation. IEEE Trans. Image Process. 21(12), 4735–4745 (2012)
5. Lu, Z., Carneiro, G., Bradley, A.P.: An improved joint optimization of multiple level set functions for the segmentation of overlapping cervical cells. IEEE Trans. Image Process. 24(4), 1261–1272 (2015)
6. Chiwoo, P., Huang, J.Z., Ji, J.X., Ding, Y.: Segmentation, inference and classification of partially overlapping nanoparticles. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 1–10 (2013)
7. Liu, X., Song, M., Tao, D., Bu, J., Chen, C.: Random geometric prior forest for multiclass object segmentation. IEEE Trans. Image Process. 24(10), 3060–3070 (2015)
8. Yi, C.C., Tian, Y.L.: Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification. IEEE Trans. Image Process. 21(9), 4256–4268 (2012)
9. Cremers, D., Schmidt, F., Barthel, F.: Shape priors in variational image segmentation: convexity, Lipschitz continuity and globally optimal solutions. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–6 (2005)
10. Lan, R., Sun, H.: Automated human motion segmentation via motion regularities. Vis. Comput. 31(1), 35–53 (2015)
11. Cremers, D., Schnorr, C., Weickert, J.: Diffusion-snakes: combining statistical shape knowledge and image information in a variational framework. In: IEEE Workshop on Variational and Level Set Methods in Computer Vision, pp. 137–144 (2001)
12. Tran, T., Pham, V., Shyu, K.: Moment-based alignment for shape prior with variational B-spline level set. Mach. Vis. Appl. 24(5), 1075–1091 (2013)
13. Lecumberry, F., Pardo, A., Sapiro, G.: Simultaneous object classification and segmentation with high-order multiple shape models. IEEE Trans. Image Process. 19(3), 625–635 (2010)
14. Chen, F., Yu, H.M., Hu, R., Zeng, X.: Deep learning shape priors for object segmentation. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1870–1877 (2013)
15. Berretti, S., Del Bimbo, A., Pala, P.: Sparse matching of salient facial curves for recognition of 3-D faces with missing parts. IEEE Trans. Inf. Forensics Secur. 8(2), 374–389 (2013)
16. Boscaini, D., Castellani, U.: A sparse coding approach for local-to-global 3D shape description. Vis. Comput. 30(11), 1233–1245 (2014)
17. Chen, F., Yu, H.M., Hu, R.: Shape sparse representation for joint object classification and segmentation. IEEE Trans. Image Process. 22(3), 992–1004 (2013)
18. Zhong, M., Qin, H.: Sparse approximation of 3D shapes via spectral graph wavelets. Vis. Comput. 30(6), 751–761 (2014)
19. Abiantun, R., Prabhu, U., Savvides, M.: Sparse feature extraction for pose-tolerant face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 36(10), 2061–2073 (2014)
20. Zhang, S.T., Zhan, Y.Q., Dewan, M., Huang, J.Z.: Sparse shape composition: a new framework for shape prior modeling. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 451–458 (2011)
21. Zhang, S.T., Zhan, Y.Q., Dewan, M., Huang, J.Z., Metaxas, D.N., Zhou, X.S.: Towards robust and effective shape modeling: sparse shape composition. Med. Image Anal. 16(3), 265–277 (2012)
22. Spratling, M.W.: Image segmentation using a sparse coding model of cortical area V1. IEEE Trans. Image Process. 22(4), 1631–1643 (2013)
23. Comon, P.: Independent component analysis: a new concept. Signal Process. 36(3), 287–314 (1994)
24. Hyvarinen, A.: Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw. 10(3), 626–634 (1999)
25. Chien, J., Hsieh, H.: Convex divergence ICA for blind source separation. IEEE Trans. Audio Speech Lang. Process. 20(1), 302–313 (2012)
26. Tao, M., Zhou, F., Liu, Y., Zhang, Z.: Tensorial independent component analysis-based feature extraction for polarimetric SAR data classification. IEEE Trans. Geosci. Remote Sens. 53(5), 2481–2495 (2015)
27. Chen, S.Q., Cremers, D., Radke, R.J.: Image segmentation with one shape prior - a template-based formulation. J. Image Vis. Comput. 30(12), 1032–1042 (2012)
28. Gonzalez, R.: Fourier based registration of differentially scaled images. In: IEEE International Conference on Image Processing (ICIP), pp. 1282–1285 (2013)
29. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. J. Math. Comput. 19(90), 297–301 (1965)
30. Ahn, J., Kim, K., Byun, H.: Robust object segmentation using graph cut with object and background seed estimation. In: International Conference on Pattern Recognition (ICPR), pp. 361–364 (2006)
31. Dong, C., Yan, B., Chen, W., Zeng, L., Chen, J., Li, J.: Iterative graph cuts segmentation with local constraints. In: International Conference on Natural Computation (ICNC), pp. 1377–1381 (2013)
32. Bay, H., Tuytelaars, T., Gool, L.V.: SURF: speeded-up robust features. J. Comput. Vis. Image Underst. 110(3), 346–359 (2008)
33. Klodt, M., Cremers, D.: A convex framework for image segmentation with moment constraints. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2346–2243 (2011)
34. Donoho, D.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41(3), 613–627 (1995)
35. Bredies, K., Lorenz, D.: Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. 14(5), 813–837 (2008)

36. www.openmedimg.com


Jincao Yao received the B.S. degree in computer and information science from Hefei University of Technology and the M.S. degree in signal processing from Shanghai Normal University, in 2004 and 2007, respectively. He is currently pursuing the Ph.D. degree in signal and information processing with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include machine learning, computer vision and shape-driven techniques in image processing.

Huimin Yu received the Ph.D. degree in communication and electronic systems from the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, in 1996. He is currently a Professor with the College of Information Science and Electronic Engineering and the State Key Laboratory of CAD & CG, Zhejiang University. His current research interests include computer vision and machine learning, including 2-D and 3-D video and image processing, depth acquisition, and image recognition and understanding.

Roland Hu was born in Chongqing, China, in 1979. He received the B.Sc. degree in electrical engineering from Tsinghua University, Beijing, China, and the Ph.D. degree in audio-visual person recognition from the University of Southampton, Southampton, UK, in 2002 and 2007, respectively. He was a Post-Doctoral Researcher with the Communications and Remote Sensing Laboratory, Université Catholique de Louvain, Louvain, Belgium, from 2007 to 2009. Since 2009, he has been an Associate Professor with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His current research interests include computer vision, image processing, pattern recognition and digital watermarking.
