IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 12, DECEMBER 2013
Fenchel Duality Based Dictionary Learning for Restoration of Noisy Images

Shanshan Wang, Student Member, IEEE, Yong Xia, Member, IEEE, Qiegen Liu, Pei Dong, Student Member, IEEE, David Dagan Feng, Fellow, IEEE, and Jianhua Luo
Abstract: Dictionary learning based sparse modeling has been increasingly recognized as providing high performance in the restoration of noisy images. Although a number of dictionary learning algorithms have been developed, most of them attack this learning problem in its primal form, with little effort being devoted to exploring the advantage of solving this problem in a dual space. In this paper, a novel Fenchel duality based dictionary learning (FD-DL) algorithm is proposed for the restoration of noise-corrupted images. With attention restricted to additive white Gaussian noise, the sparse image representation is formulated as an $\ell_2$-$\ell_1$ minimization problem, whose dual formulation is constructed using a generalization of Fenchel's duality theorem and solved under the augmented Lagrangian framework. The proposed algorithm has been compared with four state-of-the-art algorithms, including the local pixel grouping-principal component analysis, method of optimal directions, K-singular value decomposition, and beta process factor analysis, on grayscale natural images. Our results demonstrate that the FD-DL algorithm can effectively improve the image quality, and its noisy image restoration ability is comparable or even superior to that of the other four widely-used algorithms.

Manuscript received December 27, 2012; revised June 1, 2013 and August 29, 2013; accepted September 1, 2013. Date of publication September 20, 2013; date of current version October 8, 2013. This work was supported in part by the Australian Research Council grant, in part by the joint project of the National Natural Science Foundation of China under Grant 30911130364 and French ANR 2009 under Grant ANR-09-BLAN-0372-01, in part by the Region Rhône-Alpes of France under Project Mira Recherche 2008, in part by the Youth Scientific Research Foundation of Jiangxi Province under Grant 20132BAB211030, in part by the National Natural Science Foundation of China under Grant 61362001, and in part by the China Scholarship Council under Grant 2011623084. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Wai-Kuen Cham. (Corresponding author: S. Wang.)

S. Wang is with the School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China, and also with the Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia (e-mail: [email protected]).

Y. Xia is with the BMIT Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia, and also with the Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China (e-mail: [email protected]).

Q. Liu is with the Institute of Biomedical and Health Engineering, SIAT, Chinese Academy of Sciences, Shenzhen 518055, China, and also with the Department of Electronic Information Engineering, Nanchang University, Nanchang 330031, China (e-mail: [email protected]).

P. Dong is with the BMIT Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia (e-mail: [email protected]).

D. D. Feng is with the BMIT Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia, and also with the Med-X Research Institute, Shanghai Jiao Tong University, Shanghai 200030, China (e-mail: [email protected]).

J. Luo is with the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2013.2282900
Index Terms: Fenchel duality, dictionary learning, non-linear conjugate gradient descent method, image restoration, dual augmented Lagrangian.
I. INTRODUCTION

DIGITAL images are frequently affected by noise during acquisition or transmission in a noisy environment. It has long been desired that an algorithm can restore the true image as closely as possible by removing the noise from the observed image. The image denoising problem is important, not only because it serves many image and video applications, but also because it, as one of the most fundamental inverse problems, provides a convenient platform over which image processing algorithms can be assessed [1]. During the past decades, image denoising has been widely studied and a large number of algorithms have been proposed to address this problem from diverse perspectives. These algorithms include various local spatial filters, non-local filters, transform domain filters and total-variational methods [2]. Recently, denoising techniques via sparse representation of an image over a trained redundant dictionary have drawn increasing research attention, due to the fact that natural images are intrinsically sparse in some domains [3]-[5]. Sparse image modeling has become recognized as providing extremely high performance for applications as diverse as noise reduction [1], deblurring [6], super-resolution [7], image reconstruction [8] and blind source separation [9]. Therefore, in this paper we focus on sparse representation based image denoising approaches and restrict our attention to additive white Gaussian noise, which is one of the most common types of acquisition noise encountered in real applications.
The research on sparse image modeling can be traced back to 1996, when Olshausen and Field [10] revealed the biological foundation for learning sparse codes of natural images. With the observation that the visual cortex tries to produce an efficient representation of an image by extracting the statistically independent structures in it, they assumed that an image $x \in \mathbb{R}^M$ can be represented by a linear superposition of basis functions

$$x = \sum_{j=1}^{N} a_j \phi_j \qquad (1)$$
where $a = [a_1, a_2, \ldots, a_N]^T \in \mathbb{R}^N$ is a coefficient vector, and each basis function $\phi_j \in \mathbb{R}^M$ is also known as an atom of the dictionary $\Phi \in \mathbb{R}^{M \times N}$. They further advocated that the desirable properties in mammalian primary visual cortex, such as being spatially localized, oriented and bandpass, can be well described when the following two objectives are imposed on the representation [10], [11]

$$E = [\text{information preservation}] + \lambda\,[\text{sparseness of } a] \qquad (2)$$

where $\lambda$ is a positive parameter that balances these two objectives. Generally, the quality of information preservation can be measured by the $\ell_p$-norm of the difference between the actual image and the reconstructed image, and the sparseness of the coefficient vector can be characterized by its $\ell_0$-norm or $\ell_1$-norm [1]. Thus, the image $x$ can be sparsely represented through solving the following optimization problem

$$\min_{\Phi, a}\; \|x - \Phi a\|_p^2 + \lambda \|a\|_q \qquad (3)$$

where $p$ is typically set as 1, 2 or $\infty$, and $q \in \{0, 1\}$. To take into account the content variation across different regions or images, the dictionary is usually learned by minimizing the total objective over a set of randomly selected image patches $X = [x_1, x_2, \ldots, x_L] \in \mathbb{R}^{M \times L}$, shown as follows

$$\min_{\Phi} \sum_{i=1}^{L} \min_{a_i}\; \|x_i - \Phi a_i\|_p^2 + \lambda \|a_i\|_q. \qquad (4)$$
To solve this problem, we can adopt analytical dictionaries, such as the discrete cosine transform (DCT), wavelets, curvelets and contourlets, and use approximation algorithms to find a suboptimal sparse representation. Popular sparse approximation algorithms include greedy algorithms such as matching pursuit (MP) [12], [13] and orthogonal matching pursuit (OMP) [14], convex relaxation algorithms such as basis pursuit (BP) [15] and the least absolute shrinkage and selection operator (LASSO), and the focal underdetermined system solver (FOCUSS) [16]. Although simplifying the sparse representation problem, analytical dictionaries may fail to describe the image effectively due to their lack of adaptability to local image structures.
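To make the greedy family concrete, the following is a minimal textbook-style sketch of OMP (for illustration only; it is not the exact implementation of [14], and the dictionary, sparsity level and random seed are arbitrary choices):

```python
import numpy as np

def omp(Phi, x, n_nonzero):
    """Orthogonal matching pursuit: greedily pick the atom most correlated
    with the residual, then re-fit all selected atoms by least squares."""
    support = []
    residual = x.copy()
    a = np.zeros(Phi.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(Phi.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        # re-fit coefficients of all selected atoms jointly
        coef, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        residual = x - Phi[:, support] @ coef
    a[support] = coef
    return a

# toy demonstration: a signal built from 2 atoms of a random dictionary
rng = np.random.default_rng(0)
Phi = rng.standard_normal((16, 32))
Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm atoms
a_true = np.zeros(32)
a_true[[3, 17]] = [1.5, -2.0]
x = Phi @ a_true
a_hat = omp(Phi, x, n_nonzero=2)
```

On such a well-conditioned toy draw OMP typically recovers the support exactly; in general, exact recovery depends on the coherence of the dictionary.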
Besides analytical dictionaries, there exist other dictionaries that are more likely to lead to better approximation quality and sparsity of the coefficient vector. Hence, many adaptive dictionary learning algorithms have been proposed in the literature [17], which can be roughly grouped into probabilistic methods and non-probabilistic methods. Olshausen and Field [11] developed a maximum likelihood (ML) dictionary learning algorithm, which aims to maximize the likelihood that the image $x$ has an effective and sparse representation over the dictionary $\Phi$, i.e.

$$\Phi^* = \arg\max_{\Phi} \log P(x \mid \Phi) = \arg\max_{\Phi} \log \int_a P(x \mid a, \Phi)\, P(a)\, da. \qquad (5)$$

To ensure the computational tractability of the integral likelihood, they assumed that the prior distribution $P(a)$ is a product of Laplacian distributions for each coefficient and the
approximation noise is zero-mean Gaussian noise [17]. Lewicki and Sejnowski [18] modified the first assumption and approximated the integral form of the likelihood with a Gaussian integral around the posterior estimate of $a$. Olshausen and Millman [19] learned the overcomplete dictionary based on modeling the sparsity of coefficients with a Gaussian mixture distribution. Plumbley [20] modified the second assumption by assuming the approximation noise to be zero and developed an exact $\ell_1$ sparse optimization algorithm. To obtain more stable solutions, Bagnell and Bradley [21] replaced the $\ell_1$ constraint with a Kullback-Leibler (KL) divergence, which leads to efficient convex inference and stable coefficient vectors [17]. Engan et al. [22] developed the method of optimal directions (MOD), which uses the OMP algorithm [14] to find a sparse vector and introduces a closed-form solution for the dictionary update step. Instead of maximizing the likelihood $P(x \mid \Phi)$, Kreutz-Delgado et al. [23] maximize the posterior probability $P(\Phi, a \mid x)$. In this maximum a posteriori (MAP) dictionary learning algorithm, an additional constraint is imposed on the dictionary and the sparse approximation is realized with the FOCUSS [16]. Zhong et al. [24] employed the MAP estimation to calculate the conditional moments of the posterior distribution, which enables the development of an expectation-maximization (EM) algorithm for learning the overcomplete dictionary and inferring the most probable sparse coefficients. Besides the integral approximation, sampling techniques have also been used in dictionary learning. Dobigeon and Tourneret [25] proposed a hierarchical Bayesian model based on the assumption that the dictionary be orthogonal and the prior distributions of coefficients be Bernoulli-Gaussian processes with hyperparameters. They employed the Markov chain Monte Carlo (MCMC) sampling strategy to estimate the unknown coefficients, dictionary and noise variance. The Bayesian technology can also be used when a beta process is employed as the prior for learning the dictionary. Zhou et al. [26] extended the beta process factor analysis (BPFA) model and proposed three related priors for sparse binary vectors: (1) the basic truncated beta-Bernoulli process, (2) a Dirichlet process (DP), and (3) a probit stick-breaking process (PSBP). Consequently, they developed the BPFA, DP-BPFA and PSBP-BPFA algorithms for a range of applications, including image denoising, interpolation and reconstruction.
Some dictionary learning algorithms are based on vector quantization (VQ) achieved by K-means clustering [17]. Schmid-Saugeon and Zakhor [27] developed a VQ method for matching pursuit based video coding. Aharon et al. [28] proposed the K-singular value decomposition (K-SVD) algorithm, which updates the columns of the dictionary sequentially, one at a time, by utilizing the singular value decomposition. This update process is a generalized K-means since each patch can be represented by multiple atoms with different weights [17]. The description of dictionaries can be simplified by assuming a dictionary to be a set of functions determined by a small number of parameters. Parametric dictionary learning algorithms [29], [30] aim to optimize those parameters instead of the dictionary itself. Although it can save a considerable amount of memory
and computation, this strategy has the intrinsic limitation of requiring a properly and empirically selected parametric dictionary that only matches the structure of a specified class of signals. Rubinstein et al. [31] developed a double sparsity algorithm, which combines the advantages of structured and implicit dictionaries. Fast and efficient dictionary learning approaches have been studied to gain additional leverage on performance and computational cost. Lee et al. [32] proposed efficient sparse coding algorithms that are based on iteratively solving the $\ell_1$-regularized and $\ell_2$-constrained least squares problem. Nevertheless, these attempts still face the difficulties caused by the non-convexity of the objective, and may result in less robust performance. A remedy is to sample more image patches so as to average and stabilize the result. This strategy, however, dramatically increases the overall computational complexity. In our previous work, we proposed a predual dictionary learning algorithm [33], which, unfortunately, only solves the dictionary learning problem with nonnegative sparse coefficients. We further developed a class of Bregman iteration/augmented Lagrangian based dictionary learning methods for restoring noisy images [34], [35]. While most of the above mentioned methods update the dictionary in batch mode, there are also K-means variant algorithms that adapt the dictionary to only one sample or a mini-batch of samples at each iteration, such as [36], [37], known as online dictionary learning.
As shown in our survey, most existing studies attack the dictionary learning task in a very direct way, whether through solving the maximum likelihood problem given in (5) or optimizing the primal formulation shown in (4). Recently, the dual augmented Lagrangian (DAL) algorithm [38], [39] introduced in the machine learning community has demonstrated its super-linear convergence property and good optimization performance. Inspired by the DAL algorithm, in this paper we approach the dictionary learning problem from the perspective of solving the dual problem, with the aim of further improving the accuracy and efficiency of sparse image representation. Our motivations are three-fold. First, the Fenchel duality based formulation converts the problem from searching for a minimum in $\mathbb{R}^N$ to searching for a maximum in $\mathbb{R}^M$. Since normally $N \gg M$ in overcomplete/redundant dictionary learning [11], [18], [24], [28], this formulation provides a theoretical basis to reduce the complexity. Second, the sparse coefficient $a$ is treated as a Lagrangian multiplier in the augmented Lagrangian (AL) framework, and thus no longer suffers from the non-differentiability and the coupling of the original primal function [38], [39]. Third, the dual formulation provides a lower bound for the primal function, which can further constrain the dictionary learning problem [39], [40]. Therefore, we propose the Fenchel duality based dictionary learning (FD-DL) algorithm for the restoration of noisy images. In this algorithm, we formulate the sparse representation of each image patch as an $\ell_2$-$\ell_1$ minimization problem and employ a generalization of Fenchel's duality theorem to attain the dual formulation of the original objective function. To enable direct update of the restored image, we add a global constraint to the objective function, which is optimized under the AL framework using a two-stage nested iterative process. We also employ the non-linear conjugate gradient descent algorithm to attack the inner minimization task more efficiently. To demonstrate the improved performance in noisy image restoration, the proposed algorithm has been compared to four state-of-the-art algorithms, including the local pixel grouping-principal component analysis (LPG-PCA) [41], MOD algorithm, K-SVD algorithm and BPFA algorithm, on a set of natural images which were corrupted by different levels of white Gaussian noise.
II. MATHEMATICAL PRELIMINARIES

Definition 1 (Fenchel-Legendre conjugate [40]): Given a function $f : y \in \mathbb{R}^N \to (-\infty, +\infty]$, the Fenchel-Legendre conjugate is

$$f^*(z) = \sup_{y} \{ y^T z - f(y) \} \qquad (6)$$

where $y$ is the primal variable, $z$ is the dual variable, and $\sup\{\cdot\}$ denotes the supremum operator.

Theorem 1 (A generalization of Fenchel's duality theorem [38], [40]): Let $f$ be a proper convex function on $\mathbb{R}^N$, $g$ be a proper convex function on $\mathbb{R}^M$, and $\Phi$ be a linear transform from $\mathbb{R}^M$ to $\mathbb{R}^N$. One has

$$\inf_{x} \{ f(\Phi x) + g(x) \} = \sup_{s} \{ -f^*(-s) - g^*(\Phi^T s) \} \qquad (7)$$

where $f^*$ and $g^*$ are the Fenchel-Legendre conjugate functions of $f$ and $g$, respectively, and $\inf\{\cdot\}$ denotes the infimum operator.
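The duality in (7), specialized to the $\ell_2$-$\ell_1$ problem treated below (data-fit term $f(z) = \|x - z\|_2^2$ and $g(a) = \lambda \|a\|_1$, whose conjugates yield the dual in (16)), can be checked numerically. The following sketch (toy dimensions and seed are arbitrary) verifies weak duality: any dual-feasible point lower-bounds the primal objective at any point:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, lam = 8, 24, 0.5
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(M)

def f_primal(a):
    # f_p(a) = ||x - Phi a||_2^2 + lam * ||a||_1, as in (15)
    return np.sum((x - Phi @ a) ** 2) + lam * np.abs(a).sum()

def f_dual(s):
    # f_d(s) = -1/4 ||s - 2x||_2^2 + ||x||_2^2, as in (16),
    # valid on the feasible set ||Phi^T s||_inf <= lam
    return -0.25 * np.sum((s - 2 * x) ** 2) + np.sum(x ** 2)

for _ in range(100):
    a = rng.standard_normal(N)                      # arbitrary primal point
    s = rng.standard_normal(M)
    s *= lam / max(np.abs(Phi.T @ s).max(), 1e-12)  # scale into feasible set
    assert f_dual(s) <= f_primal(a) + 1e-9          # weak duality holds
```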
III. IMAGE RESTORATION MODEL

An observed image $Y$ can be viewed as an unknown true image $I$ contaminated with additive white Gaussian noise $n$

$$Y = I + n. \qquad (8)$$

Let $R(i)$ denote an $M \times N$ matrix that extracts the $i$-th $\sqrt{M} \times \sqrt{M}$ image patch from the image $I$ of $N$ pixels. Each of the $L$ partially overlapping image patches can be expressed as

$$x_i = R(i) I, \quad i = 1, 2, \ldots, L. \qquad (9)$$

The sparse representation of each image patch $x_i$ over a redundant dictionary $\Phi$ can be formally given as follows

$$x_i = \Phi a_i \quad \text{s.t.} \quad \|a_i\|_0 \le C_1 \qquad (10)$$

where $C_1$ is a positive constant that controls the sparsity of the representation and each column of $\Phi$ is constrained to have unit Euclidean norm, i.e. $\|\phi_i\|_2 = 1$ for $i = 1, 2, \ldots, N$, to avoid the scaling ambiguity [1], [17]. Besides this constraint on each patch, the following prior global constraint is also used to ensure the proximity between the images $Y$ and $I$ [1]

$$\|Y - I\|_2^2 \le C_2 \sigma^2 \qquad (11)$$

where $C_2$ denotes a positive constant that constrains the restoration error, and $\sigma$ is the standard deviation of the noise. With properly selected weighting parameters $\nu$ and $\lambda$,
the image restoration can be formulated as the following optimization problem

$$\arg\min_{\Phi, A, I}\; \nu \|Y - I\|_2^2 + \sum_{i=1}^{L} \|R(i) I - \Phi a_i\|_2^2 + \lambda \sum_{i=1}^{L} \|a_i\|_1 \qquad (12)$$

where both $\nu$ and $\lambda$ are positive values. In this objective function, the first term emphasizes a global force that constrains the restored image, the second term manifests the local constraint on the representation of each patch, and the third term controls the sparseness of the coefficient vectors [1]. It should be noted that we relax the $\ell_0$-norm with the $\ell_1$-norm for the sparse regularization of each patch $x_i$ to avoid computational intractability [1], [17].
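The pieces of (12) are straightforward to assemble for a toy image. The sketch below (illustrative only; the patch size, image size and the weights $\nu$, $\lambda$ are arbitrary choices) extracts the overlapping patches $x_i = R(i)I$ and evaluates the three terms:

```python
import numpy as np

def extract_patches(img, p):
    """All overlapping p x p patches of img, one per column:
    column i is x_i = R(i) I, flattened to length M = p*p."""
    H, W = img.shape
    cols = [img[r:r + p, c:c + p].ravel()
            for r in range(H - p + 1) for c in range(W - p + 1)]
    return np.stack(cols, axis=1)                 # shape (p*p, L)

def objective(Y, I, Phi, A, nu, lam):
    """The three terms of (12): global fidelity, per-patch fit, sparsity."""
    p = int(np.sqrt(Phi.shape[0]))
    X = extract_patches(I, p)                     # columns x_i = R(i) I
    global_term = nu * np.sum((Y - I) ** 2)
    local_term = np.sum((X - Phi @ A) ** 2)
    sparse_term = lam * np.abs(A).sum()
    return global_term + local_term + sparse_term

rng = np.random.default_rng(2)
Y = rng.random((16, 16))
I = Y.copy()                                      # start from the noisy image
Phi = rng.standard_normal((16, 64))               # 4x4 patches, 64 atoms
Phi /= np.linalg.norm(Phi, axis=0)                # unit-norm atoms, as required
L = (16 - 4 + 1) ** 2
A = np.zeros((64, L))
val = objective(Y, I, Phi, A, nu=0.5, lam=0.1)
```

With $I = Y$ and $A = 0$, only the local term is non-zero, which makes the decomposition easy to sanity-check.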
IV. FD-DL ALGORITHM

To solve the above optimization problem, we adopt the divide-and-conquer strategy [42] and convert it into two sub-problems

$$\arg\min_{\Phi, A}\; \sum_{i=1}^{L} \|R(i) I - \Phi a_i\|_2^2 + \lambda \sum_{i=1}^{L} \|a_i\|_1 \qquad (13)$$

$$\arg\min_{I}\; \nu \|Y - I\|_2^2 + \sum_{i=1}^{L} \|R(i) I - \Phi a_i\|_2^2 \qquad (14)$$

which can be solved separately in an iterative process.
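Sub-problem (14) is quadratic in $I$: setting its gradient to zero gives $(\nu\,\mathrm{Id} + \sum_i R(i)^T R(i))\, I = \nu Y + \sum_i R(i)^T \Phi a_i$, where $\sum_i R(i)^T R(i)$ is diagonal and simply counts how many patches cover each pixel. A sketch of this pixelwise closed form (toy sizes; $\nu$ is the global weight of (12)):

```python
import numpy as np

def update_image(Y, patches, p, nu):
    """Closed-form minimizer of (14): each pixel is the weighted average
    of nu*Y and all patch reconstructions Phi a_i that cover it.
    `patches` holds the columns Phi @ a_i, each of length p*p."""
    H, W = Y.shape
    accum = np.zeros((H, W))      # sum_i R(i)^T (Phi a_i)
    count = np.zeros((H, W))      # diagonal of sum_i R(i)^T R(i)
    k = 0
    for r in range(H - p + 1):
        for c in range(W - p + 1):
            accum[r:r + p, c:c + p] += patches[:, k].reshape(p, p)
            count[r:r + p, c:c + p] += 1.0
            k += 1
    return (nu * Y + accum) / (nu + count)

# sanity check: if every patch reconstruction equals the corresponding
# patch of Y itself, the minimizer of (14) is Y regardless of nu
rng = np.random.default_rng(3)
Y = rng.random((12, 12))
p = 4
cols = [Y[r:r + p, c:c + p].ravel()
        for r in range(12 - p + 1) for c in range(12 - p + 1)]
patches = np.stack(cols, axis=1)
I_new = update_image(Y, patches, p, nu=0.7)
```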
A. Update Dictionary and Coefficient Matrix

To solve the sub-problem given in (13), we start with one image patch and omit the subscript $i$ for notational simplicity

$$\min_{\Phi \in \mathbb{R}^{M \times N}} \Big\{ \min_{a \in \mathbb{R}^N}\; \|x - \Phi a\|_2^2 + \lambda \|a\|_1 =: f_p(a) \Big\}. \qquad (15)$$
We consider $f_p(a)$ defined in (15) as a primal function of the variable $a$. According to Fenchel's duality theorem, we can get the dual formulation as follows

$$\min_{\Phi \in \mathbb{R}^{M \times N}} \Big\{ \max_{s \in \mathbb{R}^M}\; -\tfrac{1}{4}\|s - 2x\|_2^2 + \|x\|_2^2 - \delta_\lambda(\Phi^T s) =: f_d(s) \Big\} \qquad (16)$$

where $\delta_\lambda(v) = \sum_{j=1}^{N} \delta_\lambda(v_j)$, with $\delta_\lambda(v_j) = 0$ if $|v_j| \le \lambda$, and $\delta_\lambda(v_j) = +\infty$ otherwise. After introducing an assistant variable $v$ to approximate $\Phi^T s$, the AL framework allows the pursuit of $\Phi$ and $a$ to be solved by attacking the following optimization problem [38], [39]

$$\tilde{a} = \arg\min_{a} \max_{s, v} L(s, v; \Phi, a), \qquad \tilde{\Phi} = \arg\min_{\Phi} \max_{s, v} L(s, v; \Phi, a) \qquad (17)$$

where

$$L(s, v; \Phi, a) = -\tfrac{1}{4}\|s - 2x\|_2^2 + \|x\|_2^2 - \delta_\lambda(v) + a^T(v - \Phi^T s) - \tfrac{\eta}{2}\|v - \Phi^T s\|_2^2 \qquad (18)$$

is the AL function, $\eta$ is a positive penalty parameter, and the primal variable $a$ is introduced as a Lagrangian multiplier [38], [39]. Here, the tilde is utilized to label the variables which maximize or minimize their relative functions.
This problem can be solved through an iterative process that updates the dual variables $s$ and $v$, the sparse coefficients $a$, and the dictionary $\Phi$, respectively. At the $k$-th iteration, we assume the primal variables $a$ and $\Phi$ are fixed and implicitly eliminate $v$ by expressing it as a function of $s$, shown as follows

$$\tilde{v}^k = \arg\max_{v} L^k(s, v; \Phi^k, a^k) = \arg\min_{v}\; \delta_\lambda(v) + \frac{\eta_k}{2} \Big\| v - (\Phi^k)^T s - \frac{1}{\eta_k} a^k \Big\|_2^2 = P_\lambda\Big( (\Phi^k)^T s + \frac{1}{\eta_k} a^k \Big) \qquad (19)$$

where $P_\lambda(z) = \big( \min(|z_i|, \lambda)\, \frac{z_i}{|z_i|} \big)_{i=1}^{n}$ is the projection of an $n$-dimensional vector $z$ onto the $\ell_\infty$ ball of radius $\lambda$, and the ratio $z_i / |z_i|$ is defined to be zero if $z_i = 0$ [38].
After expressing $\tilde{v}^k$ as a function of $s$, the update of $s$ now turns to be

$$s^{k+1} = \arg\max_{s} L^k(s, \tilde{v}^{k+1}; \Phi^k, a^k) = \arg\min_{s} \Big\{ \tfrac{1}{4}\|s - 2x\|_2^2 + \frac{\eta_k}{2} \Big\| S_\lambda\Big( (\Phi^k)^T s + \frac{1}{\eta_k} a^k \Big) \Big\|_2^2 =: g(s) \Big\} \qquad (20)$$

where

$$S_\lambda(z) = z - P_\lambda(z) = \Big( \max(|z_i| - \lambda, 0)\, \frac{z_i}{|z_i|} \Big)_{i=1}^{n} = \begin{cases} z_i - \lambda, & \text{if } z_i \ge \lambda \\ 0, & \text{if } -\lambda < z_i < \lambda \\ z_i + \lambda, & \text{if } z_i \le -\lambda. \end{cases} \qquad (21)$$

Note that the maximization problem is now replaced by the minimization of a surrogate function $g(s)$, whose gradient can be calculated as follows

$$\nabla g(s) = \tfrac{1}{2}(s - 2x) + \eta_k \Phi^k S_\lambda\Big( (\Phi^k)^T s + \frac{1}{\eta_k} a^k \Big). \qquad (22)$$
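A small sketch of $P_\lambda$, $S_\lambda$ and the surrogate $g(s)$ (toy dimensions and penalty chosen arbitrarily), checking the decomposition $z = P_\lambda(z) + S_\lambda(z)$ and validating the gradient (22) against finite differences:

```python
import numpy as np

def proj(z, lam):
    # P_lam: componentwise projection onto the ell_inf ball of radius lam
    return np.clip(z, -lam, lam)

def soft(z, lam):
    # S_lam = z - P_lam(z): componentwise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def g_and_grad(s, Phi, x, a, lam, eta):
    """Surrogate g(s) of (20) and its gradient (22)."""
    u = Phi.T @ s + a / eta
    g = 0.25 * np.sum((s - 2 * x) ** 2) + 0.5 * eta * np.sum(soft(u, lam) ** 2)
    grad = 0.5 * (s - 2 * x) + eta * Phi @ soft(u, lam)
    return g, grad

rng = np.random.default_rng(4)
M, N, lam, eta = 6, 15, 0.3, 2.0
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(M)
a = rng.standard_normal(N)
s = rng.standard_normal(M)

z = rng.standard_normal(N)
assert np.allclose(proj(z, lam) + soft(z, lam), z)   # z = P(z) + S(z)

# finite-difference check of the gradient (22)
g0, grad = g_and_grad(s, Phi, x, a, lam, eta)
h = 1e-6
fd = np.array([(g_and_grad(s + h * e, Phi, x, a, lam, eta)[0] - g0) / h
               for e in np.eye(M)])
assert np.allclose(fd, grad, atol=2e-4)
```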
To avoid the computationally expensive calculation of the objective function's Hessian [39], we employ the non-linear conjugate gradient descent algorithm with backtracking line search [43] to attack this minimization problem, as summarized in Algorithm 1.
With the updated value of $s^{k+1}$, the value of $v$ can be directly updated according to (19), which we rewrite as follows

$$v^{k+1} = P_\lambda\Big( (\Phi^k)^T s^{k+1} + \frac{1}{\eta_k} a^k \Big). \qquad (23)$$

Next, the sparse coefficients $a$ can be updated as [38], [39]

$$a^{k+1} = a^k + \eta_k \big( (\Phi^k)^T s^{k+1} - v^{k+1} \big) = \eta_k \Big[ \Big( \frac{1}{\eta_k} a^k + (\Phi^k)^T s^{k+1} \Big) - P_\lambda\Big( (\Phi^k)^T s^{k+1} + \frac{1}{\eta_k} a^k \Big) \Big] = S_{\lambda \eta_k}\big( \eta_k (\Phi^k)^T s^{k+1} + a^k \big) \qquad (24)$$
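Putting the pieces together, one inner iteration of the scheme above (a backtracking nonlinear CG solve of (20), followed by the updates (23) and (24)) can be sketched as follows. This is a minimal Polak-Ribiere variant with arbitrary toy parameters, not the exact line-search constants of Algorithm 1:

```python
import numpy as np

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def g_and_grad(s, Phi, x, a, lam, eta):
    u = Phi.T @ s + a / eta
    g = 0.25 * np.sum((s - 2 * x) ** 2) + 0.5 * eta * np.sum(soft(u, lam) ** 2)
    return g, 0.5 * (s - 2 * x) + eta * Phi @ soft(u, lam)

def ncg_solve(Phi, x, a, lam, eta, iters=50, alpha=1e-4, shrink=0.5):
    """Minimize g(s) of (20) by Polak-Ribiere NCG with Armijo backtracking."""
    s = np.zeros(Phi.shape[0])
    g0, grad = g_and_grad(s, Phi, x, a, lam, eta)
    d = -grad
    for _ in range(iters):
        t = 1.0
        # backtrack until the Armijo sufficient-decrease condition holds
        while g_and_grad(s + t * d, Phi, x, a, lam, eta)[0] > g0 + alpha * t * (grad @ d):
            t *= shrink
            if t < 1e-12:
                break
        s = s + t * d
        g0, grad_new = g_and_grad(s, Phi, x, a, lam, eta)
        gg = grad @ grad
        b = 0.0 if gg < 1e-20 else max(0.0, grad_new @ (grad_new - grad) / gg)  # PR+
        d = -grad_new + b * d
        grad = grad_new
    return s

rng = np.random.default_rng(5)
M, N, lam, eta = 6, 15, 0.3, 2.0
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(M)
a = rng.standard_normal(N)

s_new = ncg_solve(Phi, x, a, lam, eta)                    # s-update, (20)
v_new = np.clip(Phi.T @ s_new + a / eta, -lam, lam)       # v-update, (23)
a_new = a + eta * (Phi.T @ s_new - v_new)                 # a-update, first form of (24)
# the last form of (24) gives the same result:
assert np.allclose(a_new, soft(eta * (Phi.T @ s_new) + a, lam * eta))
```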
Algorithm 1 Non-Linear Conjugate Gradient Descent Algorithm With Backtracking Line Search
1: Task: train the numerical approximation solution $s^{k+1}$ of the problem in (20).
2: Parameters: $J$ (stop criterion on the number of iterations).
3: $\alpha$, $\beta$ and $W$ (line search parameters).
4: Initialization: $j \leftarrow 0$; $d_0 \leftarrow -\nabla g(s^{k,0})$; $\Delta s^{k,0} \leftarrow d_0$.
5: while $j < J$ do
6:   $t \leftarrow 0.01$, $w \leftarrow 0$
7:   while $g(s^{k,j} + t \Delta s^{k,j}) > g(s^{k,j}) + \alpha t (d_j)^T \Delta s^{k,j}$ and $w \le W$ do

[...]

($\sigma \ge 20$), the FD-DL algorithm outperforms the other algorithms on most, if not all, test images. Meanwhile, when all test images and all noise levels were considered, the overall win rate of the FD-DL is higher than those of the LPG-PCA, MOD, K-SVD, and BPFA.
VI. DISCUSSION

A. Initial Dictionary

Proper dictionary initialization plays an important role in many dictionary learning algorithms. Since the DCT-based initial dictionary has been widely used in many algorithms, such as the MOD [22] and K-SVD [28], we conducted all denoising experiments again to evaluate the performance variation of those four algorithms caused by the change of the initial dictionary.

For each test case, we initialized the MOD [22], K-SVD [28] and the proposed FD-DL algorithm with the same overcomplete DCT dictionary. As for the BPFA, we respected its freedom to use its own designed singular value decomposition (SVD) based dictionary initialization. Table III lists the SSIM indices of the restored images. It shows that the proposed FD-DL algorithm performed better in cases of medium or high noise. Specifically, the FD-DL won on more images at most noise levels. Meanwhile, when all 12 noise levels were considered, the BPFA performed best on two images and the FD-DL performed best on four images. In summary, the FD-DL achieved better image restoration quality in 61.1% of the test cases, which is higher than the percentage achieved by the BPFA. These results show that the proposed FD-DL algorithm also delivered better performance than the other three algorithms when a non-random initial dictionary was used.

Next, we compared the performance of the proposed FD-DL algorithm when two dictionary initialization strategies were used. For each test case, we subtracted the SSIM index obtained with the random initial dictionary from the SSIM index generated with the DCT-based initial dictionary. The difference of the SSIM indices is depicted in Fig. 6.
TABLE I
AVERAGE SSIM OF THE DENOISED IMAGES OBTAINED BY APPLYING THE LPG-PCA ALGORITHM AND FOUR DICTIONARY LEARNING ALGORITHMS WITH RANDOM INITIAL DICTIONARY TO EACH WHITE GAUSSIAN NOISE-CORRUPTED IMAGE FIVE TIMES. FOR EACH TEST IMAGE, THE SSIM VALUES OF THE NOISE-CORRUPTED IMAGE (TOP LEFT) AND THE RESULTS OF THE LPG-PCA (TOP RIGHT), MOD (MIDDLE LEFT), BPFA (MIDDLE RIGHT), K-SVD (BOTTOM LEFT) AND PROPOSED FD-DL ALGORITHM (BOTTOM RIGHT) ARE DISPLAYED IN A 3 × 2 GRID AND THE LARGEST SSIM IS HIGHLIGHTED IN BOLD. NOTE THAT THE SSIM VALUES WERE COMPARED WITH MORE THAN THREE DECIMAL PLACES

It shows that the absolute difference between the SSIM indices of the restored images is in the range [0, 0.015].
B. Computational Cost

To assess the computational complexity of those five image restoration algorithms, we ran the denoising experiments multiple times (Intel Core 2 Duo 3.00 GHz, 4 GB RAM and 64-bit MATLAB version 7.11). The average time cost on images with different sizes and different noise levels is given in Table IV. It shows that the proposed FD-DL algorithm is more efficient than the other four algorithms when the noise level is low ($\sigma$ = 10). Unfortunately, when the noise level is medium or high ($\sigma$ = 30 or $\sigma$ = 70), the FD-DL is more time consuming than the MOD and K-SVD. Meanwhile, it holds for all five algorithms that it costs more time to restore a larger image.

Our current solution still has relatively high computational cost at high noise levels, which can be ascribed to three major causes. First, since the sparse coefficient $a$ is treated as a Lagrangian multiplier, it needs to be explicitly updated in the optimization procedure. Second, our solution uses all extracted image patches to train the dictionary, while others may not. For example, the K-SVD algorithm only
TABLE II
AVERAGE PSNR OF THE DENOISED IMAGES OBTAINED BY APPLYING THE LPG-PCA ALGORITHM AND FOUR DICTIONARY LEARNING ALGORITHMS WITH RANDOM INITIAL DICTIONARY TO EACH WHITE GAUSSIAN NOISE-CORRUPTED IMAGE FIVE TIMES. FOR EACH TEST IMAGE, THE PSNR VALUES OF THE NOISE-CORRUPTED IMAGE (TOP LEFT) AND THE RESULTS OF THE LPG-PCA (TOP RIGHT), MOD (MIDDLE LEFT), BPFA (MIDDLE RIGHT), K-SVD (BOTTOM LEFT) AND PROPOSED FD-DL ALGORITHM (BOTTOM RIGHT) ARE DISPLAYED IN A 3 × 2 GRID AND THE LARGEST PSNR IS HIGHLIGHTED IN BOLD. NOTE THAT THE PSNR VALUES WERE COMPARED WITH MORE THAN THREE DECIMAL PLACES
uses a portion of the extracted patches when the number of samples is higher than a threshold. Third, our solution attacks the inner minimization problem given in Eq. (20) using the nonlinear conjugate gradient descent method with backtracking line search, as summarized in Algorithm 1, which, although it avoids calculating the Hessian matrix, remains time-consuming.

In our future work, we will investigate sampling schemes to obtain a smaller but effective set of image patches for dictionary learning, and will further study more efficient optimization techniques to attack the inner problem in Eq. (20).
C. Noise Estimation

Similar to many existing dictionary learning approaches [1], [33]-[35], our approach also uses the standard deviation of the noise as a guidance for our implementation. In real applications, this parameter can be estimated by using many statistical or robust median noise estimators [47], [48]. We further compared the performance of the proposed denoising algorithm under the true noise levels and under estimated noise levels obtained by a patch-based noise level estimation algorithm [48]. The results shown in Fig. 7 reveal that, no matter whether our algorithm
TABLE III
AVERAGE SSIM OF THE DENOISED IMAGES OBTAINED BY APPLYING THE MOD (TOP LEFT), K-SVD (BOTTOM LEFT), BPFA (TOP RIGHT) AND PROPOSED FD-DL ALGORITHM (BOTTOM RIGHT) WITH NON-RANDOM INITIAL DICTIONARY TO EACH WHITE GAUSSIAN NOISE-CORRUPTED IMAGE (2 × 2 GRID) FIVE TIMES. FOR EACH CASE, THE SSIM VALUES WERE COMPARED WITH MORE THAN THREE DECIMAL PLACES AND THE LARGEST SSIM IS HIGHLIGHTED IN BOLD
TABLE IV
AVERAGE COMPUTATIONAL TIME COST OF APPLYING THE LPG-PCA, MOD, K-SVD, BPFA AND PROPOSED FD-DL ALGORITHM TO NATURAL IMAGES CONTAMINATED WITH DIFFERENT LEVELS OF WHITE GAUSSIAN NOISE
was initialized with the DCT or a random matrix, the denoising performances obtained by using the true noise levels and the estimated noise levels are similar. Therefore, we suggest that, when applying the proposed algorithm to real applications, available noise estimation methods, such as the one reported in Ref. [48], can be adopted to estimate the standard deviation of the noise.
Fig. 7. Performance of the proposed denoising algorithm under the guidance of the true noise standard deviation (in blue) and the estimated one (in red), with DCT-based initialization in (a) PSNR and (b) SSIM, and with random matrix based initialization in (c) PSNR and (d) SSIM.
VII. CONCLUSION

Distinguished from most existing dictionary learning work, this paper provides researchers another option for solving the original large-scale dictionary learning problem in a dual space. The proposed FD-DL algorithm takes advantage
of the Fenchel duality and achieves sparse image representation in a very effective way. When it was applied to the restoration of Gaussian noise corrupted images, the results show that the FD-DL algorithm possesses competitive or even superior ability to remove noise while preserving image details compared to four state-of-the-art algorithms, namely LPG-PCA, MOD, K-SVD and BPFA. Our future work will focus on further improving the efficiency of the proposed algorithm and applying it to image super-resolution and reconstruction.
APPENDIX
THE DEDUCTION OF THE DUAL FORMULATION

Based on the generalization of Fenchel's duality theorem (7), we know that the dual function of

$$\min_{a \in \mathbb{R}^N}\; f(\Phi a) + \lambda \|a\|_1 \qquad (29)$$

where $f(z) = \|x - z\|_2^2$, is

$$\max_{s \in \mathbb{R}^M}\; -f^*(-s) - \delta_\lambda(\Phi^T s). \qquad (30)$$

If we consider the conjugate function of $f(z) = \|x - z\|_2^2$ with $z = \Phi a$, it is straightforward to derive $f^*(-s)$ as

$$f^*(-s) = \sup_{z} \big( \langle -s, z \rangle - \|x - z\|_2^2 \big) = -\Big\langle s, x - \frac{1}{2} s \Big\rangle - \frac{1}{4}\|s\|_2^2 = \frac{1}{4}\|s - 2x\|_2^2 - \|x\|_2^2. \qquad (31)$$

Therefore, the dual formulation of $f_p(a)$ in (15) is $f_d(s)$ in (16).
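The closed form in (31) is easy to verify numerically (toy dimensions; the supremum of the concave objective is attained at $z = x - s/2$):

```python
import numpy as np

rng = np.random.default_rng(7)
M = 5
x = rng.standard_normal(M)
s = rng.standard_normal(M)

def inner(z):
    # the function inside the supremum of (31)
    return -s @ z - np.sum((x - z) ** 2)

closed_form = 0.25 * np.sum((s - 2 * x) ** 2) - np.sum(x ** 2)
z_star = x - 0.5 * s                      # stationary point of the concave objective
assert np.isclose(inner(z_star), closed_form)
# nearby points never exceed the supremum
for _ in range(200):
    z = z_star + rng.standard_normal(M)
    assert inner(z) <= closed_form + 1e-9
```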
Shanshan Wang (S'12) received her bachelor's degree in biomedical engineering from Central South University, China, in 2009. She is currently pursuing a double Ph.D. degree as a cotutelle student at Shanghai Jiao Tong University, China, in biomedical engineering, and at the University of Sydney, Australia, in computer science. Her research interests are inverse problems in medical imaging and image processing, such as MR/PET image reconstruction, image denoising, and dictionary learning.
Yong Xia (S'05-M'08) received the B.E., M.E., and Ph.D. degrees in computer science and technology from Northwestern Polytechnical University, Xi'an, China, in 2001, 2004, and 2007, respectively. He is currently a Postdoctoral Research Fellow in the Biomedical and Multimedia Information Technology Research Group, School of Information Technologies, University of Sydney, Sydney, Australia. He is also an Associate Medical Physics Specialist in the Department of PET and Nuclear Medicine, Royal Prince Alfred Hospital, Sydney. His research interests include medical imaging, image processing, computer-aided diagnosis, pattern recognition, and machine learning.
Qiegen Liu was born in December 1983. He received the B.S. degree in applied mathematics from Gannan Normal College, and the B.E. degree in computational mathematics and the Ph.D. degree in biomedical engineering from Shanghai Jiao Tong University. He is now with the Department of Electronic Information Engineering, Nanchang University, Nanchang 330031, China. His current research interests are sparse representation theory and its applications in image processing and MRI reconstruction.
Pei Dong (S'12) received the bachelor's degree in electronic engineering and the master's degree in signal and information processing from Beijing University of Technology, China, in 2005 and 2008, respectively. He is currently pursuing the Ph.D. degree in the School of Information Technologies, The University of Sydney, Australia. His current research interests include video and image processing, pattern recognition, machine learning, and computer vision.
David Dagan Feng (S'88-M'88-SM'94-F'03) received the M.E. degree in electrical engineering and computer science from Shanghai Jiao Tong University, Shanghai, China, in 1982, and the M.Sc. degree in biocybernetics and the Ph.D. degree in computer science from the University of California, Los Angeles, CA, USA, in 1985 and 1988, respectively, where he received the Crump Prize for Excellence in Medical Engineering. He is currently the Head of the School of Information Technologies and the Director of the Institute of Biomedical Engineering and Technology, University of Sydney, Sydney, Australia, a Guest Professor at a number of universities, and a Chair Professor at Hong Kong Polytechnic University, Hong Kong. He is a fellow of ACS, HKIE, IET, and the Australian Academy of Technological Sciences and Engineering.
Jianhua Luo was born in Zhejiang Province, China, on January 9, 1958. He received the M.S. degree in computer science from Hangzhou University, Hangzhou, China, in 1992, and the Ph.D. degree in biomedical engineering from Zhejiang University in 1995. He is currently a Professor at Shanghai Jiao Tong University and has also served as an invited Professor at INSA Lyon, France. He is currently the principal investigator of several research projects supported by the NSFC of China and the High Technology Research and Development Plan (863 Plan) of China. His research domain concerns magnetic resonance imaging, including image reconstruction and image processing.