IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 12, DECEMBER 2013
Fenchel Duality Based Dictionary Learning for Restoration of Noisy Images

Shanshan Wang, Student Member, IEEE, Yong Xia, Member, IEEE, Qiegen Liu, Pei Dong, Student Member, IEEE, David Dagan Feng, Fellow, IEEE, and Jianhua Luo
Abstract: Dictionary learning based sparse modeling has been increasingly recognized as providing high performance in the restoration of noisy images. Although a number of dictionary learning algorithms have been developed, most of them attack this learning problem in its primal form, with little effort being devoted to exploring the advantage of solving this problem in a dual space. In this paper, a novel Fenchel duality based dictionary learning (FD-DL) algorithm is proposed for the restoration of noise-corrupted images. With attention restricted to additive white Gaussian noise, the sparse image representation is formulated as an $\ell_2$-$\ell_1$ minimization problem, whose dual formulation is constructed using a generalization of Fenchel's duality theorem and solved under the augmented Lagrangian framework. The proposed algorithm has been compared with four state-of-the-art algorithms, including the local pixel grouping-principal component analysis, method of optimal directions, K-singular value decomposition, and beta process factor analysis, on grayscale natural images. Our results demonstrate that the FD-DL algorithm can effectively improve the image quality, and its noisy image restoration ability is comparable or even superior to that of the other four widely-used algorithms.

Manuscript received December 27, 2012; revised June 1, 2013 and August 29, 2013; accepted September 1, 2013. Date of publication September 20, 2013; date of current version October 8, 2013. This work was supported in part by the Australian Research Council grant, in part by the joint project of the National Natural Science Foundation of China under Grant 30911130364 and French ANR 2009 under Grant ANR-09-BLAN-0372-01, in part by the Region Rhône-Alpes of France under Project Mira Recherche 2008, in part by the Youth Scientific Research Foundation of Jiangxi Province under Grant 20132BAB211030, in part by the National Natural Science Foundation of China under Grant 61362001, and in part by the China Scholarship Council under Grant 2011623084. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Wai-Kuen Cham. (Corresponding author: S. Wang.)

S. Wang is with the School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China, and also with the Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia (e-mail: [email protected]).

Y. Xia is with the BMIT Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia, and also with the Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China (e-mail: [email protected]).

Q. Liu is with the Institute of Biomedical and Health Engineering, SIAT, Chinese Academy of Sciences, Shenzhen 518055, China, and also with the Department of Electronic Information Engineering, Nanchang University, Nanchang 330031, China (e-mail: [email protected]).

P. Dong is with the BMIT Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia (e-mail: [email protected]).

D. D. Feng is with the BMIT Research Group, School of Information Technologies, University of Sydney, Sydney 2006, Australia, and also with the Med-X Research Institute, Shanghai Jiao Tong University, Shanghai 200030, China (e-mail: [email protected]).

J. Luo is with the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2013.2282900
Index Terms: Fenchel duality, dictionary learning, non-linear conjugate gradient descent method, image restoration, dual augmented Lagrangian.
I. INTRODUCTION

DIGITAL images are frequently affected by noise during acquisition or transmission in a noisy environment. It has long been desired that an algorithm can restore the true image as closely as possible by removing the noise from the observed image. The image denoising problem is important, not only because it serves many image and video applications, but also because it, as one of the most fundamental inverse problems, provides a convenient platform over which image processing algorithms can be assessed [1]. During the past decades, image denoising has been widely studied and a large number of algorithms have been proposed to address this problem from diverse perspectives. These algorithms include various local spatial filters, non-local filters, transform domain filters and total-variational methods [2]. Recently, denoising techniques via sparse representation of an image over a trained redundant dictionary have drawn increasing research attention, due to the fact that natural images are intrinsically sparse in some domains [3]-[5]. Sparse image modeling has become recognized as providing extremely high performance for applications as diverse as noise reduction [1], deblurring [6], super-resolution [7], image reconstruction [8] and blind source separation [9]. Therefore, in this paper we focus on sparse representation based image denoising approaches and restrict our attention to additive white Gaussian noise, which is one of the most common types of acquisition noise encountered in real applications.
The research on sparse image modeling can be traced back to 1996, when Olshausen and Field [10] revealed the biological foundation for learning sparse codes of natural images. With the observation that the visual cortex tries to produce an efficient representation of an image by extracting the statistically independent structures in it, they assumed that an image $x \in \mathbb{R}^M$ can be represented by a linear superposition of basis functions

$$x = \sum_{j=1}^{N} a_j \phi_j \qquad (1)$$
where $a = [a_1, a_2, \ldots, a_N]^T \in \mathbb{R}^N$ is a coefficient vector, and each basis function $\phi_j \in \mathbb{R}^M$ is also known as an atom of the dictionary $\Phi \in \mathbb{R}^{M \times N}$. They further advocated that the desirable properties in mammalian primary visual cortex, such as being spatially localized, oriented and bandpass, can be well described when the following two objectives are imposed on the representation [10], [11]

$$E = [\text{information preservation}] + \lambda\,[\text{sparseness of } a] \qquad (2)$$

where $\lambda$ is a positive parameter that balances these two objectives. Generally, the quality of information preservation can be measured by the $\ell_p$-norm of the difference between the actual image and the reconstructed image, and the sparseness of the coefficient vector can be characterized by its $\ell_0$-norm or $\ell_1$-norm [1]. Thus, the image $x$ can be sparsely represented through solving the following optimization problem

$$\min_{\Phi, a}\; \|x - \Phi a\|_p^2 + \lambda \|a\|_q \qquad (3)$$

where $p$ is typically set as 1, 2 or $\infty$, and $q \in \{0, 1\}$. To take into account the content variation across different regions or images, the dictionary is usually learned by minimizing the total objective over a set of randomly selected image patches $X = [x_1, x_2, \ldots, x_L] \in \mathbb{R}^{M \times L}$, shown as follows

$$\min_{\Phi} \sum_{i=1}^{L} \min_{a_i}\; \|x_i - \Phi a_i\|_p^2 + \lambda \|a_i\|_q. \qquad (4)$$
To solve this problem, we can adopt analytical dictionaries, such as the discrete cosine transform (DCT), wavelets, curvelets and contourlets, and use approximation algorithms to find a suboptimal sparse representation. Popular sparse approximation algorithms include greedy algorithms such as matching pursuit (MP) [12], [13] and orthogonal matching pursuit (OMP) [14], convex relaxation algorithms such as basis pursuit (BP) [15] and the least absolute shrinkage and selection operator (LASSO), and the focal underdetermined system solver (FOCUSS) [16]. Although simplifying the sparse representation problem, analytical dictionaries may fail to describe the image effectively due to their lack of adaptability to local image structures.
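To make the greedy family concrete, the following is a minimal textbook-style sketch of OMP (for illustration only; it is not the exact implementation of [14], and the dictionary, sparsity level and random seed are arbitrary choices):

```python
import numpy as np

def omp(Phi, x, n_nonzero):
    """Orthogonal matching pursuit: greedily pick the atom most correlated
    with the residual, then re-fit all selected atoms by least squares."""
    support = []
    residual = x.copy()
    a = np.zeros(Phi.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(Phi.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        # re-fit coefficients of all selected atoms jointly
        coef, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        residual = x - Phi[:, support] @ coef
    a[support] = coef
    return a

# toy demonstration: a signal built from 2 atoms of a random dictionary
rng = np.random.default_rng(0)
Phi = rng.standard_normal((16, 32))
Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm atoms
a_true = np.zeros(32)
a_true[[3, 17]] = [1.5, -2.0]
x = Phi @ a_true
a_hat = omp(Phi, x, n_nonzero=2)
```

On such a well-conditioned toy draw OMP typically recovers the support exactly; in general, exact recovery depends on the coherence of the dictionary.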
Besides analytical dictionaries, there exist other dictionaries that are more likely to lead to better approximation quality and sparsity of the coefficient vector. Hence, many adaptive dictionary learning algorithms have been proposed in the literature [17], which can be roughly grouped into probabilistic methods and non-probabilistic methods. Olshausen and Field [11] developed a maximum likelihood (ML) dictionary learning algorithm, which aims to maximize the likelihood that the image $x$ has an effective and sparse representation over the dictionary $\Phi$, i.e.

$$\Phi^* = \arg\max_{\Phi} \log P(x \mid \Phi) = \arg\max_{\Phi} \log \int_a P(x \mid a, \Phi)\, P(a)\, da. \qquad (5)$$

To ensure the computational tractability of the integral likelihood, they assumed that the prior distribution $P(a)$ is a product of Laplacian distributions for each coefficient and the
approximation noise is zero-mean Gaussian noise [17]. Lewicki and Sejnowski [18] modified the first assumption and approximated the integral form of the likelihood with a Gaussian integral around the posterior estimate of $a$. Olshausen and Millman [19] learned the overcomplete dictionary based on modeling the sparsity of coefficients with a Gaussian mixture distribution. Plumbley [20] modified the second assumption by assuming the approximation noise to be zero and developed an exact $\ell_1$ sparse optimization algorithm. To obtain more stable solutions, Bagnell and Bradley [21] replaced the $\ell_1$ constraint with a Kullback-Leibler (KL) divergence, which leads to efficient convex inference and stable coefficient vectors [17]. Engan et al. [22] developed the method of optimal directions (MOD), which uses the OMP algorithm [14] to find a sparse vector and introduces a closed-form solution for the dictionary update step. Instead of maximizing the likelihood $P(x \mid \Phi)$, Kreutz-Delgado et al. [23] maximize the posterior probability $P(\Phi, a \mid x)$. In this maximum a posteriori (MAP) dictionary learning algorithm, an additional constraint is imposed on the dictionary and the sparse approximation is realized with the FOCUSS [16]. Zhong et al. [24] employed the MAP estimation to calculate the conditional moments of the posterior distribution, which enables the development of an expectation-maximization (EM) algorithm for learning the overcomplete dictionary and inferring the most probable sparse coefficients. Besides the integral approximation, sampling techniques have also been used in dictionary learning. Dobigeon and Tourneret [25] proposed a hierarchical Bayesian model based on the assumption that the dictionary be orthogonal and the prior distributions of coefficients be Bernoulli-Gaussian processes with hyperparameters. They employed the Markov chain Monte Carlo (MCMC) sampling strategy to estimate the unknown coefficients, dictionary and noise variance. The Bayesian technology can also be used when a beta process is employed as the prior for learning the dictionary. Zhou et al. [26] extended the beta process factor analysis (BPFA) model and proposed three related priors for sparse binary vectors: (1) the basic truncated beta-Bernoulli process, (2) a Dirichlet process (DP), and (3) a probit stick-breaking process (PSBP). Consequently, they developed the BPFA, DP-BPFA and PSBP-BPFA algorithms for a range of applications, including image denoising, interpolation and reconstruction.
Some dictionary learning algorithms are based on vector quantization (VQ) achieved by K-means clustering [17]. Schmid-Saugeon and Zakhor [27] developed a VQ method for matching pursuit based video coding. Aharon et al. [28] proposed the K-singular value decomposition (K-SVD) algorithm, which updates the columns of the dictionary sequentially, one at a time, by utilizing the singular value decomposition. This update process is a generalized K-means since each patch can be represented by multiple atoms with different weights [17]. The description of dictionaries can be simplified by assuming a dictionary to be a set of functions determined by a small number of parameters. Parametric dictionary learning algorithms [29], [30] aim to optimize those parameters instead of the dictionary itself. Although it can save a considerable amount of memory
and computation, this strategy has the intrinsic limitation of requiring a properly and empirically selected parametric dictionary that only matches the structure of a specified class of signals. Rubinstein et al. [31] developed a double sparsity algorithm, which combines the advantages of structured and implicit dictionaries. Fast and efficient dictionary learning approaches have been studied to gain additional leverage on performance and computational cost. Lee et al. [32] proposed efficient sparse coding algorithms that are based on iteratively solving the $\ell_1$-regularized and $\ell_2$-constrained least squares problem. Nevertheless, these attempts still face the difficulties caused by the non-convexity of the objective, and may result in less robust performance. A remedy is to sample more image patches so as to average and stabilize the result. This strategy, however, dramatically increases the overall computational complexity. In our previous work, we proposed a predual dictionary learning algorithm [33], which, unfortunately, only solves the dictionary learning problem with nonnegative sparse coefficients. We further developed a class of Bregman iteration/augmented Lagrangian based dictionary learning methods for restoring noisy images [34], [35]. While most of the above mentioned methods update the dictionary in batch mode, there are also K-means variant algorithms that adapt the dictionary to only one sample or a mini-batch of samples at each iteration, such as [36], [37], known as online dictionary learning.
As shown in our survey, most existing studies attack the dictionary learning task in a very direct way, whether through solving the maximum likelihood problem given in (5) or optimizing the primal formulation shown in (4). Recently, the dual augmented Lagrangian (DAL) algorithm [38], [39] introduced in the machine learning community has demonstrated its super-linear convergence property and good optimization performance. Inspired by the DAL algorithm, in this paper we approach the dictionary learning problem from the perspective of solving the dual problem, with the aim of further improving the accuracy and efficiency of sparse image representation. Our motivations are three-fold. First, the Fenchel duality based formulation converts the problem from searching for a minimum in $\mathbb{R}^N$ to searching for a maximum in $\mathbb{R}^M$. Since normally $N \gg M$ in overcomplete/redundant dictionary learning [11], [18], [24], [28], this formulation provides a theoretical basis to reduce the complexity. Second, the sparse coefficient $a$ is treated as a Lagrangian multiplier in the augmented Lagrangian (AL) framework, and thus no longer suffers from the non-differentiability and the coupling of the original primal function [38], [39]. Third, the dual formulation provides a lower bound for the primal function, which can further constrain the dictionary learning problem [39], [40]. Therefore, we propose the Fenchel duality based dictionary learning (FD-DL) algorithm for the restoration of noisy images. In this algorithm, we formulate the sparse representation of each image patch as an $\ell_2$-$\ell_1$ minimization problem and employ a generalization of Fenchel's duality theorem to attain the dual formulation of the original objective function. To enable direct update of the restored image, we add a global constraint to the objective function, which is optimized under the AL framework using a two-stage nested iterative process. We also employ the non-linear conjugate gradient descent algorithm to attack the inner minimization task more efficiently. To demonstrate the improved performance in noisy image restoration, the proposed algorithm has been compared to four state-of-the-art algorithms, including the local pixel grouping-principal component analysis (LPG-PCA) [41], MOD algorithm, K-SVD algorithm and BPFA algorithm, on a set of natural images which were corrupted by different levels of white Gaussian noise.
II. MATHEMATICAL PRELIMINARIES

Definition 1 (Fenchel-Legendre conjugate [40]): Given a function $f : y \in \mathbb{R}^N \to (-\infty, +\infty]$, the Fenchel-Legendre conjugate is

$$f^*(z) = \sup_{y} \{ y^T z - f(y) \} \qquad (6)$$

where $y$ is the primal variable, $z$ is the dual variable, and $\sup\{\cdot\}$ denotes the supremum operator.

Theorem 1 (A generalization of Fenchel's duality theorem [38], [40]): Let $f$ be a proper convex function on $\mathbb{R}^N$, $g$ be a proper convex function on $\mathbb{R}^M$, and $\Phi$ be a linear transform from $\mathbb{R}^M$ to $\mathbb{R}^N$. One has

$$\inf_{x} \{ f(\Phi x) + g(x) \} = \sup_{s} \{ -f^*(-s) - g^*(\Phi^T s) \} \qquad (7)$$

where $f^*$ and $g^*$ are the Fenchel-Legendre conjugate functions of $f$ and $g$, respectively, and $\inf\{\cdot\}$ denotes the infimum operator.
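The duality in (7), specialized to the $\ell_2$-$\ell_1$ problem treated below (data-fit term $f(z) = \|x - z\|_2^2$ and $g(a) = \lambda \|a\|_1$, whose conjugates yield the dual in (16)), can be checked numerically. The following sketch (toy dimensions and seed are arbitrary) verifies weak duality: any dual-feasible point lower-bounds the primal objective at any point:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, lam = 8, 24, 0.5
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(M)

def f_primal(a):
    # f_p(a) = ||x - Phi a||_2^2 + lam * ||a||_1, as in (15)
    return np.sum((x - Phi @ a) ** 2) + lam * np.abs(a).sum()

def f_dual(s):
    # f_d(s) = -1/4 ||s - 2x||_2^2 + ||x||_2^2, as in (16),
    # valid on the feasible set ||Phi^T s||_inf <= lam
    return -0.25 * np.sum((s - 2 * x) ** 2) + np.sum(x ** 2)

for _ in range(100):
    a = rng.standard_normal(N)                      # arbitrary primal point
    s = rng.standard_normal(M)
    s *= lam / max(np.abs(Phi.T @ s).max(), 1e-12)  # scale into feasible set
    assert f_dual(s) <= f_primal(a) + 1e-9          # weak duality holds
```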
III. IMAGE RESTORATION MODEL

An observed image $Y$ can be viewed as an unknown true image $I$ contaminated with additive white Gaussian noise $n$

$$Y = I + n. \qquad (8)$$

Let $R(i)$ denote an $M \times N$ matrix that extracts the $i$-th $\sqrt{M} \times \sqrt{M}$ image patch from the image $I$ of $N$ pixels. Each of the $L$ partially overlapping image patches can be expressed as

$$x_i = R(i) I, \quad i = 1, 2, \ldots, L. \qquad (9)$$

The sparse representation of each image patch $x_i$ over a redundant dictionary $\Phi$ can be formally given as follows

$$x_i = \Phi a_i \quad \text{s.t.} \quad \|a_i\|_0 \le C_1 \qquad (10)$$

where $C_1$ is a positive constant that controls the sparsity of the representation and each column of $\Phi$ is constrained to have unit Euclidean norm, i.e. $\|\phi_i\|_2 = 1$ for $i = 1, 2, \ldots, N$, to avoid the scaling ambiguity [1], [17]. Besides this constraint on each patch, the following prior global constraint is also used to ensure the proximity between the images $Y$ and $I$ [1]

$$\|Y - I\|_2^2 \le C_2 \sigma^2 \qquad (11)$$

where $C_2$ denotes a positive constant that constrains the restoration error, and $\sigma$ is the standard deviation of the noise. With properly selected weighting parameters $\nu$ and $\lambda$,
the image restoration can be formulated as the following optimization problem

$$\arg\min_{\Phi, A, I}\; \nu \|Y - I\|_2^2 + \sum_{i=1}^{L} \|R(i) I - \Phi a_i\|_2^2 + \lambda \sum_{i=1}^{L} \|a_i\|_1 \qquad (12)$$

where both $\nu$ and $\lambda$ are positive values. In this objective function, the first term emphasizes a global force that constrains the restored image, the second term manifests the local constraint on the representation of each patch, and the third term controls the sparseness of the coefficient vectors [1]. It should be noted that we relax the $\ell_0$-norm with the $\ell_1$-norm for the sparse regularization of each patch $x_i$ to avoid computational intractability [1], [17].
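The pieces of (12) are straightforward to assemble for a toy image. The sketch below (illustrative only; the patch size, image size and the weights $\nu$, $\lambda$ are arbitrary choices) extracts the overlapping patches $x_i = R(i)I$ and evaluates the three terms:

```python
import numpy as np

def extract_patches(img, p):
    """All overlapping p x p patches of img, one per column:
    column i is x_i = R(i) I, flattened to length M = p*p."""
    H, W = img.shape
    cols = [img[r:r + p, c:c + p].ravel()
            for r in range(H - p + 1) for c in range(W - p + 1)]
    return np.stack(cols, axis=1)                 # shape (p*p, L)

def objective(Y, I, Phi, A, nu, lam):
    """The three terms of (12): global fidelity, per-patch fit, sparsity."""
    p = int(np.sqrt(Phi.shape[0]))
    X = extract_patches(I, p)                     # columns x_i = R(i) I
    global_term = nu * np.sum((Y - I) ** 2)
    local_term = np.sum((X - Phi @ A) ** 2)
    sparse_term = lam * np.abs(A).sum()
    return global_term + local_term + sparse_term

rng = np.random.default_rng(2)
Y = rng.random((16, 16))
I = Y.copy()                                      # start from the noisy image
Phi = rng.standard_normal((16, 64))               # 4x4 patches, 64 atoms
Phi /= np.linalg.norm(Phi, axis=0)                # unit-norm atoms, as required
L = (16 - 4 + 1) ** 2
A = np.zeros((64, L))
val = objective(Y, I, Phi, A, nu=0.5, lam=0.1)
```

With $I = Y$ and $A = 0$, only the local term is non-zero, which makes the decomposition easy to sanity-check.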
IV. FD-DL ALGORITHM

To solve the above optimization problem, we adopt the divide-and-conquer strategy [42] and convert it into two sub-problems

$$\arg\min_{\Phi, A}\; \sum_{i=1}^{L} \|R(i) I - \Phi a_i\|_2^2 + \lambda \sum_{i=1}^{L} \|a_i\|_1 \qquad (13)$$

$$\arg\min_{I}\; \nu \|Y - I\|_2^2 + \sum_{i=1}^{L} \|R(i) I - \Phi a_i\|_2^2 \qquad (14)$$

which can be solved separately in an iterative process.
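Sub-problem (14) is quadratic in $I$: setting its gradient to zero gives $(\nu\,\mathrm{Id} + \sum_i R(i)^T R(i))\, I = \nu Y + \sum_i R(i)^T \Phi a_i$, where $\sum_i R(i)^T R(i)$ is diagonal and simply counts how many patches cover each pixel. A sketch of this pixelwise closed form (toy sizes; $\nu$ is the global weight of (12)):

```python
import numpy as np

def update_image(Y, patches, p, nu):
    """Closed-form minimizer of (14): each pixel is the weighted average
    of nu*Y and all patch reconstructions Phi a_i that cover it.
    `patches` holds the columns Phi @ a_i, each of length p*p."""
    H, W = Y.shape
    accum = np.zeros((H, W))      # sum_i R(i)^T (Phi a_i)
    count = np.zeros((H, W))      # diagonal of sum_i R(i)^T R(i)
    k = 0
    for r in range(H - p + 1):
        for c in range(W - p + 1):
            accum[r:r + p, c:c + p] += patches[:, k].reshape(p, p)
            count[r:r + p, c:c + p] += 1.0
            k += 1
    return (nu * Y + accum) / (nu + count)

# sanity check: if every patch reconstruction equals the corresponding
# patch of Y itself, the minimizer of (14) is Y regardless of nu
rng = np.random.default_rng(3)
Y = rng.random((12, 12))
p = 4
cols = [Y[r:r + p, c:c + p].ravel()
        for r in range(12 - p + 1) for c in range(12 - p + 1)]
patches = np.stack(cols, axis=1)
I_new = update_image(Y, patches, p, nu=0.7)
```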
A. Update Dictionary and Coefficient Matrix

To solve the sub-problem given in (13), we start with one image patch and omit the subscript $i$ for notational simplicity

$$\min_{\Phi \in \mathbb{R}^{M \times N}} \Big\{ \min_{a \in \mathbb{R}^N}\; \|x - \Phi a\|_2^2 + \lambda \|a\|_1 =: f_p(a) \Big\}. \qquad (15)$$
We consider $f_p(a)$ defined in (15) as a primal function of the variable $a$. According to Fenchel's duality theorem, we can get the dual formulation as follows

$$\min_{\Phi \in \mathbb{R}^{M \times N}} \Big\{ \max_{s \in \mathbb{R}^M}\; -\tfrac{1}{4}\|s - 2x\|_2^2 + \|x\|_2^2 - \delta_\lambda(\Phi^T s) =: f_d(s) \Big\} \qquad (16)$$

where $\delta_\lambda(v) = \sum_{j=1}^{N} \delta_\lambda(v_j)$, with $\delta_\lambda(v_j) = 0$ if $|v_j| \le \lambda$, and $\delta_\lambda(v_j) = +\infty$ otherwise. After introducing an assistant variable $v$ to approximate $\Phi^T s$, the AL framework allows the pursuit of $\Phi$ and $a$ to be solved by attacking the following optimization problem [38], [39]

$$\tilde{a} = \arg\min_{a} \max_{s, v} L(s, v; \Phi, a), \qquad \tilde{\Phi} = \arg\min_{\Phi} \max_{s, v} L(s, v; \Phi, a) \qquad (17)$$

where

$$L(s, v; \Phi, a) = -\tfrac{1}{4}\|s - 2x\|_2^2 + \|x\|_2^2 - \delta_\lambda(v) + a^T(v - \Phi^T s) - \tfrac{\eta}{2}\|v - \Phi^T s\|_2^2 \qquad (18)$$

is the AL function, $\eta$ is a positive penalty parameter, and the primal variable $a$ is introduced as a Lagrangian multiplier [38], [39]. Here, the tilde is utilized to label the variables which maximize or minimize their relative functions.
This problem can be solved through an iterative process that updates the dual variables $s$ and $v$, the sparse coefficients $a$, and the dictionary $\Phi$, respectively. At the $k$-th iteration, we assume the primal variables $a$ and $\Phi$ are fixed and implicitly eliminate $v$ by expressing it as a function of $s$, shown as follows

$$\tilde{v}^k = \arg\max_{v} L^k(s, v; \Phi^k, a^k) = \arg\min_{v}\; \delta_\lambda(v) + \frac{\eta_k}{2} \Big\| v - (\Phi^k)^T s - \frac{1}{\eta_k} a^k \Big\|_2^2 = P_\lambda\Big( (\Phi^k)^T s + \frac{1}{\eta_k} a^k \Big) \qquad (19)$$

where $P_\lambda(z) = \big( \min(|z_i|, \lambda)\, \frac{z_i}{|z_i|} \big)_{i=1}^{n}$ is the projection of an $n$-dimensional vector $z$ onto the $\ell_\infty$ ball of radius $\lambda$, and the ratio $z_i / |z_i|$ is defined to be zero if $z_i = 0$ [38].
After expressing $\tilde{v}^k$ as a function of $s$, the update of $s$ now turns to be

$$s^{k+1} = \arg\max_{s} L^k(s, \tilde{v}^{k+1}; \Phi^k, a^k) = \arg\min_{s} \Big\{ \tfrac{1}{4}\|s - 2x\|_2^2 + \frac{\eta_k}{2} \Big\| S_\lambda\Big( (\Phi^k)^T s + \frac{1}{\eta_k} a^k \Big) \Big\|_2^2 =: g(s) \Big\} \qquad (20)$$

where

$$S_\lambda(z) = z - P_\lambda(z) = \Big( \max(|z_i| - \lambda, 0)\, \frac{z_i}{|z_i|} \Big)_{i=1}^{n} = \begin{cases} z_i - \lambda, & \text{if } z_i \ge \lambda \\ 0, & \text{if } -\lambda < z_i < \lambda \\ z_i + \lambda, & \text{if } z_i \le -\lambda. \end{cases} \qquad (21)$$

Note that the maximization problem is now replaced by the minimization of a surrogate function $g(s)$, whose gradient can be calculated as follows

$$\nabla g(s) = \tfrac{1}{2}(s - 2x) + \eta_k \Phi^k S_\lambda\Big( (\Phi^k)^T s + \frac{1}{\eta_k} a^k \Big). \qquad (22)$$
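A small sketch of $P_\lambda$, $S_\lambda$ and the surrogate $g(s)$ (toy dimensions and penalty chosen arbitrarily), checking the decomposition $z = P_\lambda(z) + S_\lambda(z)$ and validating the gradient (22) against finite differences:

```python
import numpy as np

def proj(z, lam):
    # P_lam: componentwise projection onto the ell_inf ball of radius lam
    return np.clip(z, -lam, lam)

def soft(z, lam):
    # S_lam = z - P_lam(z): componentwise soft-thresholding
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def g_and_grad(s, Phi, x, a, lam, eta):
    """Surrogate g(s) of (20) and its gradient (22)."""
    u = Phi.T @ s + a / eta
    g = 0.25 * np.sum((s - 2 * x) ** 2) + 0.5 * eta * np.sum(soft(u, lam) ** 2)
    grad = 0.5 * (s - 2 * x) + eta * Phi @ soft(u, lam)
    return g, grad

rng = np.random.default_rng(4)
M, N, lam, eta = 6, 15, 0.3, 2.0
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(M)
a = rng.standard_normal(N)
s = rng.standard_normal(M)

z = rng.standard_normal(N)
assert np.allclose(proj(z, lam) + soft(z, lam), z)   # z = P(z) + S(z)

# finite-difference check of the gradient (22)
g0, grad = g_and_grad(s, Phi, x, a, lam, eta)
h = 1e-6
fd = np.array([(g_and_grad(s + h * e, Phi, x, a, lam, eta)[0] - g0) / h
               for e in np.eye(M)])
assert np.allclose(fd, grad, atol=2e-4)
```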
To avoid the computationally expensive calculation of the objective function's Hessian [39], we employ the non-linear conjugate gradient descent algorithm with backtracking line search [43] to attack this minimization problem, as summarized in Algorithm 1.
With the updated value of $s^{k+1}$, the value of $v$ can be directly updated according to (19), which we rewrite as follows

$$v^{k+1} = P_\lambda\Big( (\Phi^k)^T s^{k+1} + \frac{1}{\eta_k} a^k \Big). \qquad (23)$$

Next, the sparse coefficients $a$ can be updated as [38], [39]

$$a^{k+1} = a^k + \eta_k \big( (\Phi^k)^T s^{k+1} - v^{k+1} \big) = \eta_k \Big[ \Big( \frac{1}{\eta_k} a^k + (\Phi^k)^T s^{k+1} \Big) - P_\lambda\Big( (\Phi^k)^T s^{k+1} + \frac{1}{\eta_k} a^k \Big) \Big] = S_{\lambda \eta_k}\big( \eta_k (\Phi^k)^T s^{k+1} + a^k \big) \qquad (24)$$
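Putting the pieces together, one inner iteration of the scheme above (a backtracking nonlinear CG solve of (20), followed by the updates (23) and (24)) can be sketched as follows. This is a minimal Polak-Ribiere variant with arbitrary toy parameters, not the exact line-search constants of Algorithm 1:

```python
import numpy as np

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def g_and_grad(s, Phi, x, a, lam, eta):
    u = Phi.T @ s + a / eta
    g = 0.25 * np.sum((s - 2 * x) ** 2) + 0.5 * eta * np.sum(soft(u, lam) ** 2)
    return g, 0.5 * (s - 2 * x) + eta * Phi @ soft(u, lam)

def ncg_solve(Phi, x, a, lam, eta, iters=50, alpha=1e-4, shrink=0.5):
    """Minimize g(s) of (20) by Polak-Ribiere NCG with Armijo backtracking."""
    s = np.zeros(Phi.shape[0])
    g0, grad = g_and_grad(s, Phi, x, a, lam, eta)
    d = -grad
    for _ in range(iters):
        t = 1.0
        # backtrack until the Armijo sufficient-decrease condition holds
        while g_and_grad(s + t * d, Phi, x, a, lam, eta)[0] > g0 + alpha * t * (grad @ d):
            t *= shrink
            if t < 1e-12:
                break
        s = s + t * d
        g0, grad_new = g_and_grad(s, Phi, x, a, lam, eta)
        gg = grad @ grad
        b = 0.0 if gg < 1e-20 else max(0.0, grad_new @ (grad_new - grad) / gg)  # PR+
        d = -grad_new + b * d
        grad = grad_new
    return s

rng = np.random.default_rng(5)
M, N, lam, eta = 6, 15, 0.3, 2.0
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(M)
a = rng.standard_normal(N)

s_new = ncg_solve(Phi, x, a, lam, eta)                    # s-update, (20)
v_new = np.clip(Phi.T @ s_new + a / eta, -lam, lam)       # v-update, (23)
a_new = a + eta * (Phi.T @ s_new - v_new)                 # a-update, first form of (24)
# the last form of (24) gives the same result:
assert np.allclose(a_new, soft(eta * (Phi.T @ s_new) + a, lam * eta))
```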
Algorithm 1 Non-Linear Conjugate Gradient Descent Algorithm With Backtracking Line Search
1: Task: train the numerical approximation solution $s^{k+1}$ of the problem in (20).
2: Parameters: $J$ (stop criterion on the number of iterations).
3: $\alpha$, $\beta$ and $W$ (line search parameters).
4: Initialization: $j \leftarrow 0$; $d_0 \leftarrow -\nabla g(s^{k,0})$; $\Delta s^{k,0} \leftarrow d_0$.
5: while $j < J$ do
6:   $t \leftarrow 0.01$, $w \leftarrow 0$
7:   while $g(s^{k,j} + t \Delta s^{k,j}) > g(s^{k,j}) + \alpha t (d_j)^T \Delta s^{k,j}$ and $w \le W$ do

[...]

($\sigma \ge 20$), the FD-DL algorithm outperforms the other algorithms on most, if not all, test images. Meanwhile, when all test images and all noise levels were considered, the overall win rate of the FD-DL is higher than those of the LPG-PCA, MOD, K-SVD, and BPFA.
VI. DISCUSSION

A. Initial Dictionary

Proper dictionary initialization plays an important role in many dictionary learning algorithms. Since the DCT-based initial dictionary has been widely used in many algorithms, such as the MOD [22] and K-SVD [28], we conducted all denoising experiments again to evaluate the performance variation of those four algorithms caused by the change of the initial dictionary.

For each test case, we initialized the MOD [22], K-SVD [28] and the proposed FD-DL algorithm with the same overcomplete DCT dictionary. As for the BPFA, we respected its freedom to use its own designed singular value decomposition (SVD) based dictionary initialization. Table III lists the SSIM indices of the restored images. It shows that the proposed FD-DL algorithm performed better in cases of medium or high noise. Specifically, the FD-DL won on more images at most noise levels. Meanwhile, when all 12 noise levels were considered, the BPFA performed best on two images and the FD-DL performed best on four images. In summary, the FD-DL achieved better image restoration quality in 61.1% of the test cases, which is higher than the percentage achieved by the BPFA. These results show that the proposed FD-DL algorithm also delivered better performance than the other three algorithms when a non-random initial dictionary was used.

Next, we compared the performance of the proposed FD-DL algorithm when two dictionary initialization strategies were used. For each test case, we subtracted the SSIM index obtained with the random initial dictionary from the SSIM index generated with the DCT-based initial dictionary. The difference of the SSIM indices is depicted in Fig. 6.
TABLE I
AVERAGE SSIM OF THE DENOISED IMAGES OBTAINED BY APPLYING THE LPG-PCA ALGORITHM AND FOUR DICTIONARY LEARNING ALGORITHMS WITH RANDOM INITIAL DICTIONARY TO EACH WHITE GAUSSIAN NOISE-CORRUPTED IMAGE FIVE TIMES. FOR EACH TEST IMAGE, THE SSIM VALUES OF THE NOISE-CORRUPTED IMAGE (TOP LEFT) AND THE RESULTS OF THE LPG-PCA (TOP RIGHT), MOD (MIDDLE LEFT), BPFA (MIDDLE RIGHT), K-SVD (BOTTOM LEFT) AND PROPOSED FD-DL ALGORITHM (BOTTOM RIGHT) ARE DISPLAYED IN A 3 × 2 GRID AND THE LARGEST SSIM IS HIGHLIGHTED IN BOLD. NOTE THAT THE SSIM VALUES WERE COMPARED WITH MORE THAN THREE DECIMAL PLACES

It shows that the absolute difference between the SSIM indices of the restored images is in the range [0, 0.015].
B. Computational Cost

To assess the computational complexity of those five image restoration algorithms, we ran the denoising experiments multiple times (Intel Core 2 Duo 3.00 GHz, 4 GB RAM and 64-bit MATLAB version 7.11). The average time cost on images with different sizes and different noise levels is given in Table IV. It shows that the proposed FD-DL algorithm is more efficient than the other four algorithms when the noise level is low ($\sigma$ = 10). Unfortunately, when the noise level is medium or high ($\sigma$ = 30 or $\sigma$ = 70), the FD-DL is more time consuming than the MOD and K-SVD. Meanwhile, it holds for all five algorithms that it costs more time to restore a larger image.

Our current solution still has relatively high computational cost at high noise levels, which can be ascribed to three major causes. First, since the sparse coefficient $a$ is treated as a Lagrangian multiplier, it needs to be explicitly updated in the optimization procedure. Second, our solution uses all extracted image patches to train the dictionary, while others may not. For example, the K-SVD algorithm only
TABLE II
AVERAGE PSNR OF THE DENOISED IMAGES OBTAINED BY APPLYING THE LPG-PCA ALGORITHM AND FOUR DICTIONARY LEARNING ALGORITHMS WITH RANDOM INITIAL DICTIONARY TO EACH WHITE GAUSSIAN NOISE-CORRUPTED IMAGE FIVE TIMES. FOR EACH TEST IMAGE, THE PSNR VALUES OF THE NOISE-CORRUPTED IMAGE (TOP LEFT) AND THE RESULTS OF THE LPG-PCA (TOP RIGHT), MOD (MIDDLE LEFT), BPFA (MIDDLE RIGHT), K-SVD (BOTTOM LEFT) AND PROPOSED FD-DL ALGORITHM (BOTTOM RIGHT) ARE DISPLAYED IN A 3 × 2 GRID AND THE LARGEST PSNR IS HIGHLIGHTED IN BOLD. NOTE THAT THE PSNR VALUES WERE COMPARED WITH MORE THAN THREE DECIMAL PLACES
uses a portion of the extracted patches when the number of samples is higher than a threshold. Third, our solution attacks the inner minimization problem given in Eq. (20) using the nonlinear conjugate gradient descent method with backtracking line search, as summarized in Algorithm 1, which, although it avoids calculating the Hessian matrix, remains time-consuming.

In our future work, we will investigate sampling schemes to obtain a smaller but effective set of image patches for dictionary learning, and will further study more efficient optimization techniques to attack the inner problem in Eq. (20).
C. Noise Estimation

Similar to many existing dictionary learning approaches [1], [33]-[35], our approach also uses the standard deviation of the noise as a guidance for our implementation. In real applications, this parameter can be estimated by using many statistical or robust median noise estimators [47], [48]. We further compared the performance of the proposed denoising algorithm under the true noise levels and under estimated noise levels obtained by a patch-based noise level estimation algorithm [48]. The results shown in Fig. 7 reveal that, no matter whether our algorithm
TABLE III
AVERAGE SSIM OF THE DENOISED IMAGES OBTAINED BY APPLYING THE MOD (TOP LEFT), K-SVD (BOTTOM LEFT), BPFA (TOP RIGHT) AND PROPOSED FD-DL ALGORITHM (BOTTOM RIGHT) WITH NON-RANDOM INITIAL DICTIONARY TO EACH WHITE GAUSSIAN NOISE-CORRUPTED IMAGE (2 × 2 GRID) FIVE TIMES. FOR EACH CASE, THE SSIM VALUES WERE COMPARED WITH MORE THAN THREE DECIMAL PLACES AND THE LARGEST SSIM IS HIGHLIGHTED IN BOLD
TABLE IV
AVERAGE COMPUTATIONAL TIME COST OF APPLYING THE LPG-PCA, MOD, K-SVD, BPFA AND PROPOSED FD-DL ALGORITHM TO NATURAL IMAGES CONTAMINATED WITH DIFFERENT LEVELS OF WHITE GAUSSIAN NOISE
was initialized with the DCT or a random matrix, the denoising performances obtained by using the true noise levels and the estimated noise levels are similar. Therefore, we suggest that, when applying the proposed algorithm to real applications, available noise estimation methods, such as the one reported in Ref. [48], can be adopted to estimate the standard deviation of the noise.
Fig. 7. Performance of the proposed denoising algorithm under the guidance of the true noise standard deviation (in blue) and the estimated one (in red), with DCT-based initialization in (a) PSNR and (b) SSIM, and with random matrix based initialization in (c) PSNR and (d) SSIM.
VII. CONCLUSION

Distinguished from most existing dictionary learning work, this paper provides researchers another option for solving the original large-scale dictionary learning problem in a dual space. The proposed FD-DL algorithm takes advantage
of the Fenchel duality and achieves sparse image representation in a very effective way. When it was applied to the restoration of Gaussian noise corrupted images, the results show that the FD-DL algorithm possesses competitive or even superior ability to remove noise while preserving image details compared to four state-of-the-art algorithms, namely LPG-PCA, MOD, K-SVD and BPFA. Our future work will focus on further improving the efficiency of the proposed algorithm and applying it to image super-resolution and reconstruction.
APPENDIX
THE DEDUCTION OF THE DUAL FORMULATION

Based on the generalization of Fenchel's duality theorem (7), we know that the dual function of

$$\min_{a \in \mathbb{R}^N}\; f(\Phi a) + \lambda \|a\|_1 \qquad (29)$$

where $f(z) = \|x - z\|_2^2$, is

$$\max_{s \in \mathbb{R}^M}\; -f^*(-s) - \delta_\lambda(\Phi^T s). \qquad (30)$$

If we consider the conjugate function of $f(z) = \|x - z\|_2^2$ with $z = \Phi a$, it is straightforward to derive $f^*(-s)$ as

$$f^*(-s) = \sup_{z} \big( \langle -s, z \rangle - \|x - z\|_2^2 \big) = -\Big\langle s, x - \frac{1}{2} s \Big\rangle - \frac{1}{4}\|s\|_2^2 = \frac{1}{4}\|s - 2x\|_2^2 - \|x\|_2^2. \qquad (31)$$

Therefore, the dual formulation of $f_p(a)$ in (15) is $f_d(s)$ in (16).
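The closed form in (31) is easy to verify numerically (toy dimensions; the supremum of the concave objective is attained at $z = x - s/2$):

```python
import numpy as np

rng = np.random.default_rng(7)
M = 5
x = rng.standard_normal(M)
s = rng.standard_normal(M)

def inner(z):
    # the function inside the supremum of (31)
    return -s @ z - np.sum((x - z) ** 2)

closed_form = 0.25 * np.sum((s - 2 * x) ** 2) - np.sum(x ** 2)
z_star = x - 0.5 * s                      # stationary point of the concave objective
assert np.isclose(inner(z_star), closed_form)
# nearby points never exceed the supremum
for _ in range(200):
    z = z_star + rng.standard_normal(M)
    assert inner(z) <= closed_form + 1e-9
```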
Shanshan Wang (S'12) received her bachelor's degree in biomedical engineering from Central South University, China, in 2009. She is currently pursuing a double Ph.D. degree as a cotutelle student at Shanghai Jiao Tong University, China, in biomedical engineering, and at the University of Sydney, Australia, in computer science. Her research interests are inverse problems in medical imaging and image processing, such as MR/PET image reconstruction, image denoising, and dictionary learning.
Yong Xia (S'05-M'08) received the B.E., M.E., and Ph.D. degrees in computer science and technology from Northwestern Polytechnical University, Xi'an, China, in 2001, 2004, and 2007, respectively. He is currently a Postdoctoral Research Fellow in the Biomedical and Multimedia Information Technology Research Group, School of Information Technologies, University of Sydney, Sydney, Australia. He is also an Associate Medical Physics Specialist in the Department of PET and Nuclear Medicine, Royal Prince Alfred Hospital, Sydney. His research interests include medical imaging, image processing, computer-aided diagnosis, pattern recognition, and machine learning.
Qiegen Liu was born in December 1983. He received the B.S. degree in applied mathematics from Gannan Normal College, and the B.E. degree in computational mathematics and the Ph.D. degree in biomedical engineering from Shanghai Jiao Tong University. He is now with the Department of Electronic Information Engineering, Nanchang University, Nanchang 330031, China. His current research interests are sparse representation theory and its applications in image processing and MRI reconstruction.
Pei Dong (S'12) received the bachelor's degree in electronic engineering and the master's degree in signal and information processing from Beijing University of Technology, China, in 2005 and 2008, respectively. He is currently pursuing the Ph.D. degree in the School of Information Technologies, The University of Sydney, Australia. His current research interests include video and image processing, pattern recognition, machine learning, and computer vision.
David Dagan Feng (S'88-M'88-SM'94-F'03) received the M.E. degree in electrical engineering and computer science from Shanghai Jiao Tong University, Shanghai, China, in 1982, and the M.Sc. degree in biocybernetics and the Ph.D. degree in computer science from the University of California, Los Angeles, CA, USA, in 1985 and 1988, respectively, where he received the Crump Prize for Excellence in Medical Engineering. He is currently the Head of the School of Information Technologies and the Director of the Institute of Biomedical Engineering and Technology, University of Sydney, Sydney, Australia, a Guest Professor at a number of universities, and a Chair Professor at Hong Kong Polytechnic University, Hong Kong. He is a fellow of ACS, HKIE, IET, and the Australian Academy of Technological Sciences and Engineering.
Jianhua Luo was born in Zhejiang Province, China, on January 9, 1958. He received the M.S. degree in computer science from Hangzhou University, Hangzhou, China, in 1992, and the Ph.D. degree in biomedical engineering from Zhejiang University in 1995. He is currently a Professor at Shanghai Jiao Tong University and has also served as an invited Professor at INSA Lyon, France. He is currently the principal investigator of several research projects supported by the NSFC of China and the High Technology Research and Development Plan (863 Plan) of China. His research domain concerns magnetic resonance imaging, including image reconstruction and image processing.