Unifying Probabilistic and Variational Estimation

A. Ben Hamza, Hamid Krim, and Gozde B. Unal

A maximum a posteriori (MAP) estimator using a Markov or a maximum entropy random field model for a prior distribution may be viewed as a minimizer of a variational problem.

Using notions from robust statistics, a variational filter referred to as the Huber gradient descent flow is proposed. It results from optimizing a Huber functional subject to some noise constraints and takes the hybrid form of a total variation diffusion for large gradient magnitudes and of a linear diffusion for small gradient magnitudes. Using the gained insight, and as a further extension, we propose an information-theoretic gradient descent flow which results from minimizing a functional that is a hybrid between a negentropy variational integral and a total variation. Illustrative examples demonstrate a much improved performance of the approach in the presence of Gaussian and heavy tailed noise.

In this article, we present a variational approach to MAP estimation with a more qualitative and tutorial emphasis. The key idea behind this approach is to use geometric insight to help construct regularizing functionals and to avoid a subjective choice of a prior in MAP estimation. Using tools from robust statistics and information theory, we show that we can extend this strategy and develop two gradient descent flows for image denoising with a demonstrated performance.

Introduction

Linear filtering techniques abound in many image processing applications, and their popularity mainly stems from their mathematical simplicity and their efficiency in the presence of additive Gaussian noise. The mean filter, for example, is the optimal filter for Gaussian noise in the sense of the minimum mean square error. Linear filters, however, tend to blur sharp edges, destroy lines and other fine image details, fail to effectively remove heavy tailed noise, and perform poorly in the presence of signal-dependent noise. This led to a search for nonlinear filtering alternatives. The research effort on nonlinear median-based filtering, for example, has resulted in remarkable results and has highlighted some new promising research avenues [1]. On account of its simplicity, its edge preservation property, and its robustness to impulsive noise, the standard median filter remains among the favorites for image processing applications [1]. The median filter, however, often tends to remove fine details in the image, such as thin lines and corners [1]. In recent years, a variety of median-type filters, such as stack filters and weighted median filters [1], have been developed to overcome this drawback. In spite of their improved performance, these solutions would clearly benefit from the regularizing power of a prior on the underlying information of interest.





Among Bayesian image estimation methods which enjoy such regularization, the MAP estimator with a Markov or a maximum entropy random field prior [2]-[4] has proven to be a powerful approach to image restoration. The limitations in using MAP estimation are the difficulty of systematically and reliably choosing a prior distribution and its corresponding optimizing energy function, and, in some cases, the resulting computational complexity.

In recent years, variational methods and partial differential equation (PDE) based methods [5], [6] have been introduced to explicitly account for intrinsic geometry to address a variety of problems including image segmentation, mathematical morphology, and image denoising [7], [8]. The latter will be the focus of the present article. The problem of denoising has been addressed using a number of different techniques including wavelets [9], order-statistics-based filters [1], PDE-based algorithms [7], [8], and variational approaches [10]-[12]. In particular, a large number of PDE-based methods have been proposed to tackle the problem of image denoising [7] with a good preservation of edges. Much of the appeal of PDE-based methods lies in the availability of a vast arsenal of mathematical tools which at the very least act as a key guide in achieving numerical accuracy as well as stability. PDEs or gradient descent flows are generally a result of variational problems using the Euler-Lagrange principle [13]. One popular variational technique used in image denoising is the total variation based approach. It was developed in [6] to overcome the basic limitations of all smooth regularization algorithms, and a variety of numerical methods have also recently been developed for solving total variation minimization problems [6], [14].

Image Analysis: Two Perspectives

Problem Statement

In all real applications, measurements are perturbed by noise. In the course of acquiring, transmitting, or processing a digital image, for example, the noise-induced degradation may be dependent on or independent of the data. The noise is usually described by its probabilistic model; e.g., Gaussian noise is characterized by its two moments. Depending on the application, a degradation yields a resulting signal/image observation model, and the most commonly used is the additive one

u_0 = u + \eta,   (1)

where the observed image u0 includes the original signal u and the independent and identically distributed (i.i.d.) noise process η. Fig. 1 depicts an image contaminated by three types of noise: Gaussian, Laplacian, and impulsive.
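As a concrete illustration of the observation model (1), the sketch below (ours, in NumPy) synthesizes the three contaminations shown in Fig. 1; the noise scales and the crude additive salt-and-pepper model for the impulsive case are illustrative assumptions, not the authors' settings.

    import numpy as np

    rng = np.random.default_rng(0)

    def add_noise(u, kind, scale=10.0, p=0.05):
        # Additive model (1): u0 = u + eta, for three example noise types.
        if kind == "gaussian":
            eta = rng.normal(0.0, scale, u.shape)
        elif kind == "laplacian":
            # Laplace scale chosen so the variance matches the Gaussian case.
            eta = rng.laplace(0.0, scale / np.sqrt(2.0), u.shape)
        elif kind == "impulsive":
            # Sparse, large-amplitude impulses (salt and pepper).
            eta = np.zeros(u.shape)
            mask = rng.random(u.shape) < p
            eta[mask] = rng.choice([-255.0, 255.0], size=int(mask.sum()))
        return u + eta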

1. Image denoising problem: (a) original image, (b) Gaussian noise, (c) Laplacian noise, and (d) impulsive noise.

Image denoising refers to the process of recovering an image contaminated by noise (see Fig. 2). The challenge of the problem of interest lies in faithfully recovering the underlying signal/image u from u0, and furthering the estimation by making use of any prior knowledge/assumptions about the noise process η. This goal is graphically and succinctly described in Fig. 2.

2. Block diagram of the image denoising process.

Model-Based Approach

In a probabilistic setting, the image denoising problem is usually solved in a discrete domain, and in this case an image is expressed by a random matrix u = (u_{ij}) of gray levels. To account for prior probabilistic information we may have for u, a technique of choice is that of maximum a posteriori estimation. Denoting by p(u) the prior distribution for the unknown image u, the MAP estimator is given by

\hat{u} = \arg\max_u \{ \log p(u_0 | u) + \log p(u) \},   (2)

where p(u_0 | u) denotes the conditional probability of u_0 given u.

A general model for the prior distribution p(u) is that of a Markov random field (MRF), which is characterized by its Gibbs distribution given by [2]

p(u) = \frac{1}{Z} \exp\left( -\frac{\mathcal{F}(u)}{\lambda} \right),

where Z is a partition function and λ is a constant known as the temperature in the terminology of physical systems. \mathcal{F} is called the energy function and has the form \mathcal{F}(u) = \sum_{c \in C} V_c(u), where C denotes a set of cliques (i.e., sets of connected pixels) for the MRF, and V_c is a potential function defined on a clique. We may define the cliques to be adjacent pairs of horizontal and vertical pixels. Note that for large λ, the prior probability becomes flat, and for small λ, the prior probability exhibits sharp modes.

MRFs have been extensively used in computer vision, particularly for image restoration, and it has been established that Gibbs distributions and MRFs are equivalent (e.g., see [2]). In other words, if a problem is defined in terms of local potentials, then there is a simple way of formulating the problem in terms of MRFs. If the noise process η is i.i.d. Gaussian, then we have

p(u_0 | u) = K \exp\left( -\frac{|u_0 - u|^2}{2\sigma^2} \right),

where K is a normalizing positive constant, σ² is the noise variance, and |·| stands for the Euclidean norm or for the absolute value in the case of a scalar. Thus, the MAP estimator in (2) yields

\hat{u} = \arg\min_u \left\{ \mathcal{F}(u) + \frac{\lambda}{2} |u - u_0|^2 \right\}.   (3)
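For concreteness, here is a minimal sketch (ours) of the discrete objective in (3), taking the cliques to be adjacent horizontal/vertical pixel pairs as suggested above; the quadratic choice of potential V_c is an illustrative assumption.

    import numpy as np

    def gibbs_energy(u):
        # F(u) = sum over cliques of V_c(u), with cliques taken as adjacent
        # horizontal/vertical pixel pairs and V_c a quadratic potential.
        dh = u[:, 1:] - u[:, :-1]
        dv = u[1:, :] - u[:-1, :]
        return np.sum(dh ** 2) + np.sum(dv ** 2)

    def map_objective(u, u0, lam):
        # The functional minimized in (3): F(u) + (lam / 2) |u - u0|^2.
        return gibbs_energy(u) + 0.5 * lam * np.sum((u - u0) ** 2)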

Image estimation using MRF priors has proven to be a powerful approach to restoration and reconstruction of high-quality images. Its major drawback, besides its computational load, is the difficulty in systematically selecting a practical and reliable prior distribution. The Gibbs prior parameter λ is also of particular importance since it controls the balance of influence of the Gibbs prior and that of the likelihood. If λ is too small, the prior will tend to have an over-smoothing effect on the solution. Conversely, if it is too large, the MAP estimator may be unstable, and it reduces to the maximum likelihood solution as λ goes to infinity. Another difficulty in using a MAP estimator is the nonuniqueness of the solution when the energy function \mathcal{F} is not convex.

A Variational/Nonparametric Approach to MAP Estimation

Unknown prevailing statistics or underlying signal/image/noise models often make a "target" desired performance quantitatively less well defined. Specifically, it may be qualitative in nature (e.g., preserve high gradients in a geometric setting, or determine a worst case noise distribution in a statistical estimation setting with a number of interpretations) and may not necessarily be tractably assessed by an objective and optimal performance measure. The formulation of such qualitative goals is typically carried out by way of adapted functionals which, upon being optimized, achieve the stated goal, e.g., a monotonically decreasing functional of gradient modifying a diffusion [5]. This approach is the so-called variational approach. It is commonly formulated in a continuous domain which enjoys a large


arsenal of analytical tools, and hence offers greater flexibility. An image is defined as a real-valued function u: Ω → R, where Ω is a nonempty, bounded, open set in R² (usually Ω is a rectangle in R²). Throughout, x = (x_1, x_2) denotes a pixel location in Ω, and ||·|| denotes the L2-norm. While the ultimate overall objective in the aforementioned formulation may coincide with that of a probabilistic formulation, namely the recovery of an underlying desired signal u, it is herein often implicit and embedded in an energy functional to be optimized. Generally, the construction of an energy functional is based on some characteristic quantity specified by the task at hand (gradient for segmentation, Laplacian for smoothing, etc.). This energy functional is oftentimes coupled to a regularizing force/energy to rule out a great number of solutions and to also avoid any degenerate solution.

When considering the signal model (1), our goal may be succinctly stated as one of estimating the underlying image u based on an observation u_0 and/or any potential knowledge of the noise statistics to further regularize the solution. This yields the following fidelity-constrained optimization problem:

\min_u \mathcal{F}(u) \quad \text{s.t.} \quad \|u - u_0\|^2 = \sigma^2,   (4)

where \mathcal{F} is a given functional which often defines, as noted above, the particular emphasis on the features of the achievable solution. In other words, we want to find an optimal solution that yields the smallest value of the objective functional among all solutions that satisfy the constraints. Using Lagrange's theorem, the minimizer of (4) is given by

\hat{u} = \arg\min_u \left\{ \mathcal{F}(u) + \frac{\lambda}{2} \|u - u_0\|^2 \right\},   (5)

where λ is a nonnegative parameter chosen so that the constraint \|u - u_0\|^2 = \sigma^2 is satisfied. In practice, the parameter λ is often estimated or chosen a priori.

Equations (3) and (5) show a close connection between image recovery via MAP estimation and image recovery via optimization of variational integrals. One may in fact reexpress (3) in an integral form similar to that of (5).

A critical issue, however, is the choice of the variational integral \mathcal{F}, which, as discussed later, is often driven by geometric arguments. Among the better known functionals (also called variational integrals) in image denoising are the Dirichlet and the total variation integrals, defined respectively as

D(u) = \frac{1}{2} \int_\Omega |\nabla u|^2 \, dx \quad \text{and} \quad TV(u) = \int_\Omega |\nabla u| \, dx,

where ∇u denotes the gradient of the image u.

A generalization of these functionals is the variational integral given by

\mathcal{F}(u) = \int_\Omega F(|\nabla u|) \, dx,   (6)

where F: R_+ → R is a given smooth function called a variational integrand or Lagrangian [13]. Using (6), we hence define a functional

\mathcal{L}(u) = \mathcal{F}(u) + \frac{\lambda}{2} \|u - u_0\|^2 = \int_\Omega F(|\nabla u|) \, dx + \frac{\lambda}{2} \|u - u_0\|^2,   (7)

which by the formulation in (5) becomes

\hat{u} = \arg\min_{u \in X} \mathcal{L}(u),   (8)

where X is an appropriate image space of smooth functions.

Numerical Solution: Gradient Descent Flows

To solve the optimization problem (8), a variety of iterative methods, such as gradient descent [6] or the fixed point method [15], may be applied.

The first-order necessary condition to be satisfied by any minimizer of the functional \mathcal{L} given by (7) is its vanishing first variation (or vanishing gradient). Using the fundamental lemma of the calculus of variations, this vanishing first variation yields an Euler-Lagrange equation as a necessary condition to be satisfied by minimizers of \mathcal{L}. In mathematical terms, the Euler-Lagrange equation is given by




-\mathrm{div}\left( F'(|\nabla u|) \frac{\nabla u}{|\nabla u|} \right) + \lambda (u - u_0) = 0 \quad \text{in } \Omega,   (9)

where "div" stands for the divergence operator. An image u satisfying (9) is called an extremal of \mathcal{L}.

Using the Euler-Lagrange variational principle, the minimizer of (8) may be interpreted as the steady-state solution to the following nonlinear elliptic PDE, called a gradient descent flow:

u_t = \mathrm{div}\left( g(|\nabla u|) \nabla u \right) - \lambda (u - u_0) \quad \text{in } \Omega \times \mathbb{R}_+,

where g(z) = F'(z)/z, with z > 0, and homogeneous Neumann boundary conditions are assumed.
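As a minimal illustration (ours, in NumPy), the explicit finite-difference scheme below discretizes this flow; the step size, iteration count, and the forward/backward discretization of the gradient/divergence pair are our choices, with no claim that they match the authors' implementation.

    import numpy as np

    def gradient_descent_flow(u0, g, lam=0.0, dt=0.1, steps=100, eps=1e-8):
        # Explicit scheme for u_t = div(g(|grad u|) grad u) - lam (u - u0),
        # with replicated borders approximating Neumann boundary conditions.
        u = u0.astype(float).copy()
        for _ in range(steps):
            ux = np.diff(u, axis=1, append=u[:, -1:])    # forward differences
            uy = np.diff(u, axis=0, append=u[-1:, :])
            mag = np.sqrt(ux ** 2 + uy ** 2) + eps       # regularized |grad u|
            px, py = g(mag) * ux, g(mag) * uy            # flux g(|grad u|) grad u
            div = (np.diff(px, axis=1, prepend=px[:, :1])   # backward differences
                   + np.diff(py, axis=0, prepend=py[:1, :]))
            u += dt * (div - lam * (u - u0))
        return u

With g(z) identically equal to 1 this reduces to an explicit heat-equation step; the flows discussed below differ only in the choice of g.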

Illustrative Cases

The following examples illustrate the close connection between optimization problems of variational integrals and boundary value problems for partial differential equations in the no-noise-constraint case (i.e., setting λ = 0).

Heat Equation

u_t = \Delta u is the gradient descent flow for the Dirichlet variational integral D(u).

It is important to point out that the Dirichlet functional tends to smooth out sharp jumps because it controls the second derivative of image intensity, i.e., its "spatial acceleration," and it diffuses the intensity values isotropically. Fig. 4(b) shows this blurring effect on a clean image depicted in Fig. 4(a).

Perona-Malik Equation

It has been shown in [16] that the Perona-Malik (PM) diffusion u_t = \mathrm{div}(g(|\nabla u|) \nabla u) is the gradient descent flow for the variational integral

\mathcal{F}_c(u) = \int_\Omega F_c(|\nabla u|) \, dx,

with sample Lagrangians F_c^1(z) = c^2 \log(1 + z^2/c^2) or F_c^2(z) = c^2 (1 - \exp(-z^2/c^2)) (see Fig. 3), where z ∈ R_+ and c is a positive tuning constant.

3. Anisotropic Lagrangians.

A minimization of such functionals encourages the smoothing of homogeneous/small-gradient regions and the preservation of edges/high-gradient regions. Note that the ill-posedness of this formulation was addressed in a number of papers (e.g., see [16]). A result of applying the PM flow with F_c^1 to the original image in Fig. 4(a) is illustrated in Fig. 4(c). It is worth noting how the diffusion takes place throughout the homogeneous regions and not across the edges.
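The corresponding diffusivities g(z) = F'(z)/z follow directly from these Lagrangians; up to the constant factor 2 they are the classical PM diffusion functions. A sketch (ours), reusing the gradient_descent_flow helper above:

    def g_pm1(z, c=1.0):
        # From F_c^1(z) = c^2 log(1 + z^2/c^2):  g(z) = 2 / (1 + (z/c)^2).
        return 2.0 / (1.0 + (z / c) ** 2)

    def g_pm2(z, c=1.0):
        # From F_c^2(z) = c^2 (1 - exp(-z^2/c^2)):  g(z) = 2 exp(-(z/c)^2).
        return 2.0 * np.exp(-((z / c) ** 2))

    # Example: denoised_pm = gradient_descent_flow(noisy, g=g_pm1, dt=0.1, steps=100)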

Curvature Flow

u_t = \mathrm{div}(\nabla u / |\nabla u|) corresponds to the total variation integral.

While limiting spurious oscillations, TV optimization preserves sharp jumps as are often encountered in "blocky" signals/images. Fig. 4(d) illustrates the output of the TV flow.
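In the same sketch framework, the curvature/TV flow corresponds to g(z) = 1/z; the small eps added to |∇u| inside gradient_descent_flow is what keeps this well defined at zero gradient (a standard regularization, our choice):

    def g_tv(z):
        # Diffusivity of the TV/curvature flow: F(z) = z gives g(z) = 1/z.
        return 1.0 / z

    # Example: denoised_tv = gradient_descent_flow(noisy, g=g_tv, dt=0.05, steps=200)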


4. Filtering results: (a) original image, (b) heat flow, (c) Perona-Malik flow, and (d) TV flow.



Robust Variational Approach

Robustness for Unknown Statistics

In robust estimation, a case arises where even the noise statistics are not precisely known [17], [9]. In this case, a reasonable strategy would be to assume that the noise is a member of some set, or of some class of parametric families, to pick the worst case (least favorable, in some sense) member of that set, and to obtain the best signal reconstruction for it. Huber's ε-contaminated normal set P_ε is defined as [17]

P_\varepsilon = \{ (1 - \varepsilon)\Phi + \varepsilon H : H \in S \},

where Φ is the standard normal distribution, S is the set of all probability distributions symmetric with respect to the origin, and ε ∈ [0, 1] is the known fraction of "contamination." Huber found that the least favorable distribution in P_ε, which maximizes the asymptotic variance (or, equivalently, minimizes the Fisher information), is Gaussian in the center and Laplacian in the tails. The transition between the two depends on the fraction of contamination ε, i.e., larger fractions correspond to smaller switching points and vice versa.

For the set P_ε of ε-contaminated normal distributions, the least favorable distribution has a density function f(z) = ((1 - \varepsilon)/\sqrt{2\pi}) \exp(-\rho_k(z)) (e.g., see [17]), where ρ_k is the Huber M-estimator cost function (see Fig. 5), given by

\rho_k(z) = \begin{cases} z^2/2 & \text{if } |z| \le k \\ k|z| - k^2/2 & \text{otherwise.} \end{cases}

Here k is a positive constant determined by the fraction of contamination ε [17].

5. Huber function.

Motivated by the robustness of the Huber M-filter in a probabilistic setting [1] and its resilience to impulsive noise, we propose a variational filter which, when accounting for these properties, leads to the following energy functional:

R_k(u) = \int_\Omega \rho_k(|\nabla u|) \, dx.

Note that the Huber variational integral is a hybrid of the Dirichlet variational integral (ρ_k(|∇u|) ∝ |∇u|²/2 as k → ∞) and of the total variation integral (ρ_k(|∇u|) ∝ |∇u| as k → 0).

Using the Euler-Lagrange variational principle, a Huber gradient descent flow is obtained as

u_t = \mathrm{div}\left( g_k(|\nabla u|) \nabla u \right) - \lambda (u - u_0) \quad \text{in } \Omega \times \mathbb{R}_+,

where g_k is the Huber M-estimator weight function

g_k(z) = \frac{\rho_k'(z)}{z} = \begin{cases} 1 & \text{if } |z| \le k \\ k/|z| & \text{otherwise.} \end{cases}

For large k, this flow yields an isotropic diffusion (the heat equation when λ = 0), and for small k, it corresponds to the total variation gradient descent flow (the curvature flow when λ = 0).
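A sketch of ρ_k and g_k (ours), plugged into the generic solver above; the λ value and iteration count are illustrative, and only k = 1.345 is taken from the experiments reported later in the article.

    def rho_huber(z, k):
        # Huber cost: quadratic up to k, linear beyond.
        return np.where(np.abs(z) <= k, 0.5 * z ** 2, k * np.abs(z) - 0.5 * k ** 2)

    def g_huber(z, k):
        # Huber weight g_k(z) = rho_k'(z) / z: 1 up to k, k/|z| beyond.
        return np.where(np.abs(z) <= k, 1.0, k / np.abs(z))

    # Example: denoised = gradient_descent_flow(noisy, g=lambda z: g_huber(z, 1.345),
    #                                           lam=0.1, dt=0.1, steps=100)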

It is worth pointing out that in the case of no noise constraint (i.e., setting λ = 0), the Huber gradient descent flow yields a robust anisotropic diffusion [18], obtained by replacing the diffusion functions proposed in [5] with robust M-estimator weight functions [17], [1].

PM Equation: An Estimation-Theoretic Perspective

In a similar spirit as above, one may proceed to justify the PM equation from a specific statistical model. Assuming an image u = (u_{ij}) is a random matrix with i.i.d. elements, the output of the Log-Cauchy filter [21] is defined as a solution to the maximum log-likelihood estimation problem for a Cauchy distribution with dispersion c and estimation parameter θ. In other words, the output of a Log-Cauchy filter is the solution to the following robust estimation problem [21]:

\min_\theta \sum_{i,j} \log\left( c^2 + (\theta - u_{ij})^2 \right) = \min_\theta \sum_{i,j} F_c(\theta - u_{ij}),

where the cost function F_c coincides with the Lagrangian function which yields the PM equation. Hence, in the


probabilistic setting the PM flow corresponds to the Log-Cauchy filter. Fig. 6 illustrates the performance of the Log-Cauchy filter in removing impulsive noise.

6. Log-Cauchy filtering: (a) impulsive noise and (b) filtered image.
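A brute-force sliding-window sketch of such a filter (ours): the window radius and the restriction of the search for θ to the window samples themselves, a common heuristic for myriad-type filters, are our assumptions.

    def log_cauchy_filter(u, c=1.0, radius=1):
        # At each pixel, pick theta minimizing sum log(c^2 + (theta - u_ij)^2)
        # over the local window; candidates are the window samples themselves.
        m, n = u.shape
        out = np.empty((m, n))
        pad = np.pad(u.astype(float), radius, mode="edge")
        for i in range(m):
            for j in range(n):
                w = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1].ravel()
                costs = [np.sum(np.log(c ** 2 + (t - w) ** 2)) for t in w]
                out[i, j] = w[int(np.argmin(costs))]
        return out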

Information-Based Functionals

Information-Theoretic Approach

In the previous section, we proposed a least favorable distribution as a result of exercising our ignorance in describing that of an image gradient within a domain. Another effective way is to adopt a criterion which bounds such a case, namely that of entropy. The maximum entropy criterion is indeed an important principle in statistics for modeling the prior probability p(u) of a process u and has been used with success in numerous image processing applications [3]. The term is often associated with qualifying the selection of a distribution subject to some moment constraints (e.g., mean, variance, etc.); that is, the available information is described by way of moments of some known functions m_r(u) with r = 1, …, s. Coupling the finiteness of m_r(u), for example, with the maximum entropy condition of the data suggests a most random model p(u) with the corresponding moment constraints as a most adapted model (equivalently, minimizing negentropy; see [19]):

\min_p \int p(u) \log p(u) \, du \quad \text{s.t.} \quad \int p(u) \, du = 1, \quad \int m_r(u)\, p(u) \, du = \mu_r, \; r = 1, \ldots, s.   (10)

Using Lagrange's theorem, the solution of (10) is given by

p(u) = \frac{1}{Z} \exp\left( -\sum_{r=1}^{s} \lambda_r m_r(u) \right),   (11)

where the λ_r's are the Lagrange multipliers and Z is a partition function. The resulting model p(u) given by (11) may hence be used as a prior in a MAP estimation formulation.
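As a small numerical check of (10)-(11) (ours), take the single constraint m(u) = u² with μ = 1: bisection on the multiplier recovers λ ≈ 1/(2σ²) = 0.5, i.e., the Gaussian maximum entropy prior. The grid and tolerances are arbitrary choices.

    import numpy as np

    u = np.linspace(-10.0, 10.0, 4001)
    du = u[1] - u[0]
    m, mu = u ** 2, 1.0                      # second-moment constraint

    def constrained_moment(lam):
        # Moment of the maxent candidate p(u) = exp(-lam m(u)) / Z.
        w = np.exp(-lam * m)
        p = w / (w.sum() * du)               # enforce integral of p du = 1
        return (m * p).sum() * du

    lo, hi = 1e-3, 10.0                      # bracket for the multiplier
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if constrained_moment(mid) > mu else (lo, mid)
    print(0.5 * (lo + hi))                   # approx 0.5 = 1 / (2 sigma^2)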

Entropic Gradient Descent Flow

Motivated by the good performance of the maximum entropy principle in image/signal analysis applications and inspired by its rationale, we may naturally adapt it to describe the distribution of a gradient throughout an image. Specifically, the large gradients should coincide with tail events of this distribution, while the small and medium ones, representing the smooth regions, form the mass of the distribution. Towards that end, we write

\mathcal{H}(u) = \int_\Omega H(|\nabla u|) \, dx = \int_\Omega |\nabla u| \log |\nabla u| \, dx.

Calling upon the Euler-Lagrange variational principle again, the following entropic gradient descent flow results:


u_t = \mathrm{div}\left( \frac{1 + \log |\nabla u|}{|\nabla u|} \, \nabla u \right) - \lambda (u - u_0) \quad \text{in } \Omega \times \mathbb{R}_+,

with homogeneous Neumann boundary conditions. In addition, this spread of the gradient energy may be related to that sought by the total variation method, which in contrast allows for additional higher gradients.
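The diffusivity here is g(z) = H'(z)/z = (1 + log z)/z for H(z) = z log z; a sketch (ours), again reusing the generic solver. Note that g(z) < 0 for z < 1/e, so the flow can act as a backward (sharpening) diffusion on nearly flat regions.

    def g_entropic(z):
        # From H(z) = z log z: g(z) = (1 + log(z)) / z; negative below 1/e.
        return (1.0 + np.log(z)) / z

    # Example: denoised = gradient_descent_flow(noisy, g=g_entropic,
    #                                           lam=0.1, dt=0.05, steps=100)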

Improved Entropic Gradient Descent Flow

To summarize, and for comparison, we show in Fig. 7 the behavior of the variational integrands we have discussed in this article. It can readily be shown [20] that a differentiable hybrid functional between the negentropy variational integral and the total variation may be defined as

\tilde{\mathcal{H}}(u) = \begin{cases} \mathcal{H}(u) & \text{if } |\nabla u| \le e \\ 2\, TV(u) - e \,\mathrm{meas}(\Omega) & \text{otherwise,} \end{cases}

yielding an improved gradient descent flow. The quantity meas(Ω) denotes the Lebesgue measure of the image domain Ω, and e = exp(1). Fig. 8 depicts the Lagrangian corresponding to the improved entropic flow.

7. Visual comparison of some variational integrands.

8. Improved entropic Lagrangian.
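A sketch (ours) of the hybrid integrand and its diffusivity: negentropy up to z = e, then the linear continuation 2z − e, which matches both the value (e) and the slope (2) at z = e and integrates to the 2TV(u) − e·meas(Ω) branch above.

    E = float(np.e)

    def h_improved(z):
        # z log z for z <= e; linear continuation 2z - e beyond (C^1 at z = e).
        zs = np.maximum(z, 1e-12)            # guard the logarithm near zero
        return np.where(z <= E, z * np.log(zs), 2.0 * z - E)

    def g_improved(z):
        # Diffusivity: (1 + log z)/z up to e, then 2/z (TV-like) beyond.
        zs = np.maximum(z, 1e-12)
        return np.where(z <= E, (1.0 + np.log(zs)) / zs, 2.0 / zs)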

Physical Basis of Diffusion

In contrast to the macroscopic scale, which reflects the large-number-of-particles process typically modeled by a PDE and where large scale regularization is ill-posed, a microscopic scale approach may also be adopted.

The physical notion of diffusion pertains to the net transport of particles across a unit surface being proportional to the gradient of the material density normal to the unit area. A similar interpretation may be given to a diffusion of an image by modeling the motion of pixels along a Brownian (or random walk on a discrete lattice) trajectory (see Fig. 9). Invoking the microscopic scale of diffusion by way of modeling the particle trajectories helps clarify the dynamics, which are often important to propose a creative solution.

9. A particle (pixel) may diffuse over many possible paths, and an average is usually computed.

A probabilistic view of the above evolution equations may hence be achieved with a careful interpretation of an image (a two-dimensional function) as a density of particles (image pixels). Specifically, a diffusion applied to an image is tantamount to evolving a probability density function of a process for which a particle trajectory (i.e., microscopic scale) is captured by a stochastic differential equation (SDE) [22]. The corresponding


macroscopic process is modeled by a PDE, which for the linear case is the heat equation. This interpretation is not only intuitive, but also provides a powerful framework for possibly novel diffusions with unique properties. In [22], a nonlinear diffusion was developed for investigating the PM equation in this light, thereby resolving a long-standing problem of unknown stopping time for unconstrained nonlinear diffusion equations. A similar approach was used in developing polygonal flows important in preserving man-made shapes in images [24].

In Fig. 10, the top row shows two simple images with polygonal structures, namely rectangles and diamonds, corrupted by additive Gaussian noise. The middle row shows the results of applying the geometric heat flow (also known as the curve shortening flow) [23], which acts on the image level curves, to the noisy images. The geometric blurring (i.e., the rounding effects) caused by the geometric heat flow can be overcome by using information on the orientation of the salient image lines. It follows that modified geometric heat flows can be designed for specific structures, and the corresponding results are illustrated in the bottom row of Fig. 10. It is important to note that the geometric diffusion is slowed down along important structures (i.e., the rectangle and diamond shapes) [24].

10. Outputs of polygonal flows.
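A tiny simulation (ours) of the microscopic picture: averaging many Brownian walkers reproduces the Gaussian spreading of the linear heat equation. All constants are arbitrary.

    rng = np.random.default_rng(1)
    steps, walkers, dt = 200, 100_000, 0.01
    # Brownian increments sqrt(dt) * N(0, 1); all paths start at the origin.
    paths = np.cumsum(np.sqrt(dt) * rng.standard_normal((steps, walkers)), axis=0)
    # Empirical spread vs. the heat-kernel prediction sqrt(t) at t = steps * dt.
    print(paths[-1].std(), np.sqrt(steps * dt))   # both approx 1.414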

Experimental Results

This section presents simulation results where the Huber, entropic, total variation, and improved entropic gradient descent flows are applied to enhance images corrupted by Gaussian and Laplacian noise.

The performance of a filter clearly depends on the filter type, the properties of the signals/images, and the characteristics of the noise. The choice of criteria by which to measure the performance of a filter presents certain difficulties and only gives a partial picture of reality. To assess the performance of the proposed denoising methods, the mean square error (MSE) between the filtered and the original image is evaluated and used as a quantitative measure of performance of the proposed techniques. The regularization parameter (or Lagrange multiplier) λ for the proposed gradient descent flows is chosen to be proportional to the signal-to-noise ratio (SNR) in all the experiments.

To evaluate the performance of the proposed gradient descent flows in the presence of Gaussian noise, the image shown in Fig. 11(a) has been corrupted by Gaussian white noise with SNR = 4.79 dB. Fig. 11 displays the results of filtering the noisy image shown in Fig. 11(b) by the Huber (with optimal k = 1.345), entropic, total variation, and improved entropic gradient descent flows. Qualitatively, we observe that the proposed techniques are able to suppress Gaussian noise while preserving important features in the image. The resulting MSE computations are given in Table 1.

11. Filtering results for Gaussian noise: (a) original image, (b) noisy image, (c) Huber, (d) entropic, (e) total variation, and (f) improved entropic.

Table 1. MSE computations for Gaussian noise.

PDE                 SNR = 4.79 dB   SNR = 3.52 dB   SNR = 2.34 dB
Huber                    234.1499        233.7337        230.0263
Entropic                 205.0146        207.1040        205.3454
TV                       247.4875        263.0437        402.0660
Improved entropic        121.2550        137.9356        166.4490

The Laplacian noise is somewhat heavier tailed than the Gaussian noise. Moreover, the Laplace distribution is similar to Huber's least favorable distribution [17], at least in the tails. To demonstrate the application of the proposed gradient descent flows to image denoising, qualitative and quantitative comparisons are performed to show a much improved performance of these techniques. Fig. 12(b) shows a noisy image contaminated by Laplacian white noise with SNR = 3.91 dB. The MSE results obtained by applying the proposed techniques to the noisy image are shown in Table 2. Note that from Fig. 12 it is clear that the improved entropic gradient descent flow outperforms the other flows in removing Laplacian noise. Comparison of these images clearly indicates that the improved entropic gradient descent flow preserves the image structures well while removing heavy tailed noise.
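The quantitative measure reported in Tables 1 and 2 is the plain mean square error; a one-liner for completeness (ours):

    def mse(u_hat, u):
        # Mean square error between the filtered image and the original.
        return float(np.mean((u_hat.astype(float) - u.astype(float)) ** 2))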

12. Filtering results for Laplacian noise: (a) original image, (b) noisy image, (c) Huber, (d) entropic, (e) total variation, and (f) improved entropic.

Table 2. MSE computations for Laplacian noise.

PDE                 SNR = 6.33 dB   SNR = 3.91 dB   SNR = 3.05 dB
Huber                    237.7012        244.4348        248.4833
Entropic                 200.5266        211.4027        217.3592
TV                       138.4717        176.1719        213.1221
Improved entropic        104.4591        170.2140        208.8639

Acknowledgment

This work was supported in part by AFOSR grant F49620-98-1-0190 and by NSF.

A. Ben Hamza received his degrees in applied mathematics. From March 2000 to February 2001, he was a research associate with the Electrical and Computer Engineering Department at North Carolina State University, Raleigh, where he is currently pursuing the Ph.D. degree. His research interests include nonlinear probabilistic and variational filtering, information-theoretic measures, and computer vision.

Hamid Krim received his degrees in electrical engineering. As a Member of Technical Staff at AT&T Bell Labs, he has worked in the areas of telephony and digital communication systems/subsystems. Following an NSF postdoctoral fellowship at Foreign Centers of Excellence, LSS/University of Orsay, Paris, France, he became a Research Scientist at the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge. He is presently on the faculty in the Electrical and Computer Engineering Department, North Carolina State University, Raleigh. His research interests are in statistical signal and image analysis and mathematical modeling with a keen emphasis on applied problems. He is a Senior Member of the IEEE and an Associate Editor for the IEEE Transactions on Signal Processing.

Gozde B. Unal received her B.S. degree in electrical engineering from Middle East Technical University, Ankara, in 1996 and her M.S. degree in electrical engineering from Bilkent University, Ankara, Turkey, in 1998. She is pursuing her Ph.D. in the Electrical and Computer Engineering Department at North Carolina State University, Raleigh, and is a research assistant with the Vision, Information and Statistical Signal Theories and Applications group. Her research interests include curve evolution theory with connections to information theory and probability theory, application of curve evolution techniques to various image and video processing problems such as image smoothing, image and texture segmentation, and object tracking, and computer vision problems such as object recognition.

References

[1] J. Astola and P. Kuosmanen, Fundamentals of Nonlinear Digital Filtering. Boca Raton, FL: CRC, 1997.

[2] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Machine Intell., vol. 6, pp. 721-741, July 1984.

[3] H. Stark, Ed., Image Recovery: Theory and Application. New York: Academic, 1987.

[4] S.C. Zhu and D. Mumford, “Prior learning and Gibbs reaction-diffusion,” IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 1236-1244, Nov. 1997.

[5] P. Perona and J. Malik, “Scale space and edge detection using anisotropic diffusion,” IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 629-639, July 1990.

[6] L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, pp. 259-268, 1992.

[7] D. Tschumperle and R. Deriche, “Diffusion PDEs on vector-valued images,” IEEE Signal Processing Mag., vol. 19, pp. 16-25, Sept. 2002.

[8] I. Pollak, “Segmentation and restoration via nonlinear multiscale filtering,” IEEE Signal Processing Mag., vol. 19, pp. 26-36, Sept. 2002.

[9] H. Krim and I.C. Schick, “Minimax description length for signal denoising and optimized representation,” IEEE Trans. Inform. Theory, vol. 45, pp. 898-908, Apr. 1999.

[10] I. Pollak, A.S. Willsky, and H. Krim, “Image segmentation and edge enhancement with stabilized inverse diffusion equations,” IEEE Trans. Image Processing, vol. 9, pp. 256-266, Feb. 2000.

[11] J. Morel and S. Solimini, Variational Methods in Image Segmentation. Boston, MA: Birkhäuser, 1995.

[12] C. Samson, L. Blanc-Feraud, G. Aubert, and J. Zerubia, “A variational model for image classification and restoration,” IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 460-472, May 2000.

[13] M. Giaquinta and S. Hildebrandt, Calculus of Variations I: The Lagrangian Formalism. New York: Springer-Verlag, 1996.

[14] A. Chambolle and P.L. Lions, “Image recovery via total variation minimization and related problems,” Numer. Math., vol. 76, pp. 167-188, 1997.

[15] K. Ito and K. Kunisch, “Restoration of edge-flat-grey scale images,” Inverse Problems, vol. 16, no. 4, pp. 909-928, Aug. 2000.

[16] Y.L. You, W. Xu, A. Tannenbaum, and M. Kaveh, “Behavioral analysis of anisotropic diffusion in image processing,” IEEE Trans. Image Processing, vol. 5, pp. 1539-1553, Nov. 1996.

[17] P. Huber, Robust Statistics. New York: Wiley, 1981.

[18] M.J. Black, G. Sapiro, D.H. Marimont, and D. Heeger, “Robust anisotropic diffusion,” IEEE Trans. Image Processing, vol. 7, pp. 421-432, Mar. 1998.

[19] H. Krim, “On the distributions of optimized multiscale representations,” in Proc. ICASSP-97, vol. 5, pp. 3673-3676, 1997.

[20] A. Ben Hamza and H. Krim, “A variational approach to maximum a posteriori estimation for image denoising,” Lecture Notes in Comput. Sci., vol. 2134, pp. 19-34, Sept. 2001.

[21] A. Ben Hamza and H. Krim, “Image denoising: A nonlinear robust statistical approach,” IEEE Trans. Signal Processing, vol. 49, pp. 3045-3054, Dec. 2001.

[22] H. Krim and Y. Bao, “Smart nonlinear diffusion: A probabilistic approach,” submitted for publication.

[23] M.A. Grayson, “The heat equation shrinks embedded plane curves to round points,” J. Differ. Geometry, vol. 26, pp. 285-314, 1987.

[24] G.B. Unal, H. Krim, and A. Yezzi, “Stochastic differential equations and geometric flows,” IEEE Trans. Image Processing, to be published.
