Image Processing: Mathematics

G Aubert, Université de Nice Sophia Antipolis, Nice, France
P Kornprobst, INRIA, Sophia Antipolis, France

© 2006 Elsevier Ltd. All rights reserved.

Our society is often described as an information society. It could also be defined as an image society. This is not only because the image is a powerful and widely used medium of communication, but also because it is an easy, compact, and widespread way to represent the physical world. If we think about it, it is indeed striking to realize just how omnipresent images are in our lives, through numerous applications such as medical and satellite imaging, video surveillance, cinema, robotics, etc.

Many approaches have been developed to process these digital images, and it is difficult to say which one is more natural than another. Image processing has a long history. Perhaps the oldest methods come from 1D signal processing techniques. They rely on filter theory (linear or not), on spectral analysis, or on some basic concepts of probability and statistics. For an overview, we refer the interested reader to the book by Gonzalez and Woods (1992).

In this article, some recent mathematical concepts are revisited and illustrated by the image restoration problem, which is presented below. We first discuss stochastic modeling, which is widely based on Markov random field theory and deals directly with digital images. This is followed by a discussion of variational approaches, where the general idea is to define a cost function in a continuous setting. Next, we show how scale-space theory is connected with partial differential equations (PDEs). Finally, we present wavelet theory, which is inherited from signal processing and relies on decomposition techniques.

Introduction
As in the real world, a digital image is composed of a wide variety of structures. Figure 1 shows different kinds of textures, progressive or sharp contours, and fine objects. This gives an idea of how complex it is to find an approach that can cope with all these structures at the same time. It also highlights the discrete nature of images, which is handled differently depending on the chosen mathematical tools. For instance, PDE-based approaches are written in a continuous setting, referring to analog images; once the existence and the uniqueness of the solution have been proved, the equations must be discretized in order to find a numerical solution. On the contrary, stochastic approaches directly consider discrete images in the modeling of the cost functions.

The Image Restoration Problem
It is well known that images deteriorate during the formation, transmission, and recording processes. Classically, this degradation is the result of two phenomena. The first one is deterministic and is related to the image acquisition modality and to possible defects of the imaging system (e.g., blur created by an incorrect lens adjustment or by motion). The second phenomenon is random and corresponds to the noise arising in any signal transmission; it can also come from image quantization. It is important to choose a degradation model as close to reality as possible. The random noise is usually modeled by a probabilistic distribution. In many cases, a Gaussian distribution is assumed. However, some applications require more specific ones, such as the gamma distribution for radar images (speckle noise) or the Poisson distribution for tomography. Unfortunately, it is usually impossible to identify the kind of noise involved in a given real image.

A commonly used model is the following. Let u : Ω ⊂ R² → R be an original image describing a real scene, and let f be the observed image of the same scene (i.e., a degradation of u). We assume that

  f = Au + η   [1]

where η stands for additive white Gaussian noise and A is a linear operator representing the blur (usually a convolution). Given f, the problem is then to reconstruct u knowing [1]. This problem is ill-posed, and we are able to compute only an approximation of u. In this article, we will focus on the simplified model of pure denoising:

  f = u + η   [2]
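For concreteness, the degradation models [1] and [2] can be simulated in a few lines. The sketch below is purely illustrative: a 1-D step edge stands in for the scene u, a simple moving-average filter stands in for the blur operator A, and the noise level sigma is an assumed value.

```python
import random

def blur(u, radius=1):
    """Moving-average blur: a simple stand-in for the linear operator A in [1]."""
    n = len(u)
    out = []
    for i in range(n):
        window = u[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def degrade(u, sigma=0.1, seed=0):
    """Observation f = Au + eta, with eta white Gaussian noise of std sigma."""
    rng = random.Random(seed)
    return [a + rng.gauss(0.0, sigma) for a in blur(u)]

u = [0.0] * 8 + [1.0] * 8   # an ideal 1-D "scene": a step edge
f = degrade(u)              # the degraded observation of model [1]
```

Dropping the call to `blur` inside `degrade` yields the pure-denoising model [2].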
The Probabilistic Approach

The Bayesian Framework

In this section, we show how the problem of pure denoising, that is, recovering u from the equation f = u + η knowing only some statistical information on η, can be solved by using a probabilistic approach. In this context, f, u, and η are considered as random variables. The general idea for recovering u is to maximize some prior probability. Most models involve two parts: a prior model of possible restored images u and a data model expressing consistency with the observed data.

The prior model is given by a probability space (Ω_u, p), where Ω_u is the set of all values of u. The model is specified by giving the probability p(u) on all these values.

The data model is a larger probability space (Ω_{u,f}, p), where Ω_{u,f} is the set of all possible values of u and all possible values of the observed image f. This model is completed by giving the conditional probability p(f | u) of any image f given u, resulting in the joint probabilities p(f, u) = p(f | u) p(u). Implicitly, we assume that the spaces Ω_u and Ω_{u,f} are finite, although huge.

The next step is to use a Bayesian approach, introduced in image processing by Besag (1974) and Geman and Geman (1984). The probabilities p(u) and p(f | u) are supposed to be known and, given an observed image f, we seek the image u which maximizes the conditional a posteriori probability p(u | f) (MAP: maximum a posteriori). Thanks to the Bayes rule, we have

  p(u | f) = p(f | u) p(u) / p(f)   [3]

Let us explain the meaning of the different terms in [3]:

- The term p(f | u) expresses the probability, the likelihood, that an image u is realized in f. It also quantifies the lack of total precision of the model and the presence of noise.
- The term p(u) expresses our incomplete a priori information about the ideal image u (it is the probability of the model, i.e., the propensity that u be realized independently of the observation f).
- The term p(f), which is the probability of observing f, is a constant and does not play any role when maximizing the conditional probability p(u | f) with respect to u.

Let us remark that the problem max_u p(u | f) is equivalent to min_u E(u) = −log p(f | u) − log p(u). So Bayesian models lead to a minimization process.
Then the main question is how to assign these probabilities. The easiest probability to determine is p(f | u). If the images u and f consist of sets of values u = (u_{i,j}), i, j = 1, ..., N and f = (f_{i,j}), i, j = 1, ..., N, we suppose the conditional independence of (f_{i,j} | u_{i,j}) at any pixel:

  p(f | u) = Π_{i,j} p(f_{i,j} | u_{i,j})

and if the restoration model is of the form f = u + η, where η is a white Gaussian noise with variance σ², then

  p(f_{i,j} | u_{i,j}) = (1 / (σ√(2π))) exp(−(f_{i,j} − u_{i,j})² / (2σ²))

and

  p(f | u) = (1 / (σ√(2π)))^{N²} exp(−Σ_{i,j} (f_{i,j} − u_{i,j})² / (2σ²))

Therefore, at this stage, the MAP reduces to minimizing

  E(u) = K_σ ‖f − u‖² − log p(u)   [4]

where ‖·‖ stands for the Euclidean norm on R^{N²} and K_σ is a constant (here K_σ = 1/(2σ²)). It now remains to assign a probability law p(u). To do that, the most common way is to use the theory of Markov random fields (MRFs).
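The equivalence between maximizing p(u | f) and minimizing E(u) = −log p(f | u) − log p(u) can be checked on a toy scalar example. All numbers below (the candidate gray levels, the prior p(u), the observation, the noise level) are hypothetical choices for illustration only.

```python
import math

sigma = 1.0                                   # assumed noise standard deviation
f_obs = 2.0                                   # assumed scalar observation
candidates = [0.0, 1.0, 2.0, 3.0]             # hypothetical discrete gray levels
prior = {0.0: 0.4, 1.0: 0.3, 2.0: 0.2, 3.0: 0.1}   # assumed prior p(u)

def likelihood(f, u):
    """Gaussian likelihood p(f | u) with noise variance sigma^2."""
    return math.exp(-(f - u) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def energy(f, u):
    """E(u) = -log p(f | u) - log p(u); its minimizer is the MAP estimate."""
    return -math.log(likelihood(f, u)) - math.log(prior[u])

u_map = max(candidates, key=lambda u: likelihood(f_obs, u) * prior[u])
u_min = min(candidates, key=lambda u: energy(f_obs, u))
assert u_map == u_min   # MAP maximization and energy minimization agree
```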
Figure 1 Digital image example. The close-ups show examples of (a) low resolution, (b) low contrast, (c) graduated shadings, (d) sharp transitions, and (e) fine elements.

The Theory of Markov Random Fields
In this approach, an image is described as a finite set S of sites corresponding to the pixels. With each site, we associate a descriptor representing the state of the site, for example, its gray level. In order to take into account local interactions between sites, one needs to endow S with a system of neighborhoods V.

Definition 1 For each site s, we define its neighborhood V(s) as

  V(s) = {t : s ∉ V(s), and t ∈ V(s) ⟹ s ∈ V(t)}

We then associate to this neighborhood system the notion of clique: a clique is either a singleton or a set of sites which are all neighbors of one another. Depending on the neighborhood system, the family of cliques will differ and involve more or fewer sites. We will denote by C the set of all the cliques relative to a neighborhood system V (see Figure 2).

Before introducing the general framework of MRFs, let us fix some notation. For a site s, X_s will stand for a random variable taking its values in some set E (e.g., E = {0, 1, ..., 255}), x_s will be a realization of X_s, and (x_t)_{t≠s} will denote an image configuration from which site s has been removed. Finally, we will denote by X the random variable X = (X_s, X_t, ...) with values in E^{|S|}.

Definition 2 We say that X is an MRF if the local conditional probability at a site s is only a function of V(s), that is,

  p(X_s = x_s | (x_t)_{t≠s}) = p(X_s = x_s | x_t, t ∈ V(s))

Therefore, the gray level at a site depends only on the gray levels of neighboring pixels. We now give the following fundamental theorem, due to Hammersley and Clifford (Besag 1974), which states the equivalence between MRFs and Gibbs fields.

Theorem 1 Suppose that S is finite, E is a discrete set, and p(X = x) > 0 for all x ∈ E^{|S|}. Then X is an MRF relative to a system of neighborhoods V if and only if there exists a family of potential functions (V_c)_{c∈C} such that

  p(x) = (1/Z) exp(−Σ_{c∈C} V_c(x))

The function V(x) = Σ_{c∈C} V_c(x) is called the energy potential or the Gibbs measure, and Z is a normalizing constant: Z = Σ_x exp(−V(x)).

If, for example, the collection of neighborhoods is the set of 4-neighbors, then the theorem says that

  V(x) = Σ_{c={s}∈C₁} V_c(x_s) + Σ_{c={s,t}∈C₂} V_c(x_s, x_t)
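The Gibbs form of Theorem 1 can be written out exhaustively on a toy field. The sketch below uses an assumed 2×2 binary image with 4-neighbor pair cliques and an Ising-type pair potential (the interaction strength β is an arbitrary illustrative value); it builds p(x) = (1/Z) exp(−Σ_c V_c(x)) over all 16 configurations and checks that it is a probability distribution.

```python
import math
from itertools import product

BETA = 0.5   # assumed interaction strength for the pair potential V_c(x_s, x_t)

# 4-neighbor pair cliques of a 2x2 image with sites indexed
#   0 1
#   2 3
CLIQUES = [(0, 1), (2, 3), (0, 2), (1, 3)]

def gibbs_energy(x):
    """V(x) = sum of clique potentials; here V_c penalizes unequal neighbors."""
    return sum(BETA * (0.0 if x[s] == x[t] else 1.0) for s, t in CLIQUES)

configs = list(product([0, 1], repeat=4))                # all 16 binary images
Z = sum(math.exp(-gibbs_energy(x)) for x in configs)     # normalizing constant
p = {x: math.exp(-gibbs_energy(x)) / Z for x in configs}

assert abs(sum(p.values()) - 1.0) < 1e-12                # a valid distribution
# The two constant images have zero energy and are the most probable:
assert p[(0, 0, 0, 0)] == max(p.values())
```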
Application to the Denoising Problem

Given this theorem, we can now reformulate, thanks to [4], the restoration problem (with the change of notation u = x and u_s = x_s): find u minimizing the global energy

  E(u) = K_σ ‖f − u‖² + V(u)   [5]

The next step is to make the Gibbs measure precise. In restoration, the potential V(u) is often dedicated to imposing local regularity constraints, for example, by penalizing differences between neighbors. This can be modeled using cliques of order 2 in the following manner:

  V(u) = Σ_{{s,t}∈C₂} φ(u_s − u_t)

where φ is a given real function. This term penalizes differences of intensity between neighbors, which may come from an edge or from noise. This discrete cost function is very similar to the gradient penalty terms used in the continuous framework (see the next section). The resulting final energy (E(u) is sometimes written E(u | f)) is

  E(u) = K_σ Σ_{s∈S} (f_s − u_s)² + μ Σ_{{s,t}∈C₂} φ(u_s − u_t)

where the constant μ is a weighting parameter which can be estimated.

The difficulty in choosing the strength of the penalty term defined by φ is to penalize the noise while keeping the most salient features, that is, edges. Historically, the function φ was first chosen as φ(z) = z², but this choice is not good, since the resulting regularization is too strong, introducing blur in the image and a loss of the edges. A better choice is φ(z) = |z| (Rudin et al. 1992) or a regularized version of this function. Of course, other choices are possible depending on the considered application and the desired degree of smoothness.
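A minimal numerical sketch of this energy can be written under assumptions chosen for brevity: a 1-D signal, 2-site cliques between adjacent samples, the regularized potential φ(z) = √(z² + ε²) in place of |z|, and plain gradient descent. The parameters (ε, μ, K_σ, step size) are illustrative and untuned; ε is taken fairly large so that the fixed step remains stable.

```python
import math

EPS, MU, K_SIGMA, STEP = 0.1, 0.5, 1.0, 0.05   # assumed, untuned parameters

def phi(z):
    """Regularized absolute value: phi(z) = sqrt(z^2 + eps^2) ~ |z|."""
    return math.sqrt(z * z + EPS * EPS)

def energy(u, f):
    """E(u) = K_sigma * sum_s (f_s - u_s)^2 + mu * sum_cliques phi(u_s - u_t)."""
    data = K_SIGMA * sum((fs - us) ** 2 for fs, us in zip(f, u))
    reg = MU * sum(phi(u[s] - u[s + 1]) for s in range(len(u) - 1))
    return data + reg

def denoise(f, iters=500):
    """Plain gradient descent on E; the cliques are the pairs {s, s+1}."""
    u = list(f)
    n = len(u)
    for _ in range(iters):
        grad = [2.0 * K_SIGMA * (u[s] - f[s]) for s in range(n)]
        for s in range(n - 1):
            g = MU * (u[s] - u[s + 1]) / phi(u[s] - u[s + 1])  # phi'(z) = z / phi(z)
            grad[s] += g
            grad[s + 1] -= g
        u = [u[s] - STEP * grad[s] for s in range(n)]
    return u

f = [0.1, -0.05, 0.02, 1.1, 0.95, 1.04]   # a noisy step edge (illustrative values)
u = denoise(f)
assert energy(u, f) < energy(f, f)        # descent decreased the energy
```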
In this section, it has been shown how to model the restoration problem through MRFs and the Bayesian framework. Numerically, two main types of algorithms can be used to minimize the energy: deterministic algorithms and stochastic algorithms. The former are generally used when the global energy is strictly convex (e.g., algorithms based on gradient descent). The latter are rather used when E(u) is not convex. Stochastic minimization algorithms are mainly based on simulated annealing. Their main interest is that they always converge (almost surely) to a minimizer (this is not the case for deterministic algorithms, which give only local minimizers), but they are often strongly time consuming.

Figure 2 Examples of neighborhood system and cliques (C₁, C₂).

We refer the reader to Li (1995) for more details about MRFs and the Bayesian framework, and to Kirkpatrick et al. (1983) for more information on stochastic algorithms.

The Variational Approach

Minimizing a Cost Function over a Functional Space
One important issue in the previous section was the definition of p(u), which encodes some a priori knowledge about the solution. In the variational approach, this idea is also present, but the way to infer it is in fact to define the most suitable functional space that describes images and their geometrical properties. The choice of a functional space sets a norm, which in turn will constrain the solution to a certain smoothness.

We illustrate this idea in this section on the denoising problem [2], which can be seen as a decomposition problem. This means that, given the observation f, we look for u and η such that f = u + η, where η incorporates all oscillations, that is, noise, but also texture. Let us define a functional to be minimized which takes into account the data f and possibly some statistical information about η:

  min_{u,η} Φ(|u|_E) such that Ψ(|η|_G) = σ, with f = u + η   [6]

This formulation means that we look, among all decompositions f = u + η, for the one which minimizes Φ(|u|_E) under the constraint Ψ(|η|_G) = σ. The Banach spaces E and G, and the functions Φ and Ψ, will be discussed in the next subsection. Since a minimization problem under constraints can be expressed with an additional term weighted by a Lagrange multiplier, formulation [6] can be rewritten as

  min_{u,η} Φ(|u|_E) + λ Ψ(|η|_G), with f = u + η   [7]

A similar formulation is obtained by replacing η by f − u, so that [7] becomes

  min_u Φ(|u|_E) + λ Ψ(|f − u|_G)   [8]

which is the classical formulation in image restoration. From a numerical point of view, the minimization is usually carried out by solving the associated Euler equations, but this may be a difficult task. The main concern is the search for E and G and their norms (or seminorms). It is guided by the idea that an image u is composed of various geometric structures (homogeneous regions, edges), while η = f − u represents oscillations (noise and textures).
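Formulation [8] becomes very concrete for the simplest quadratic (Tikhonov-type) choices of E and G, discussed in the next subsection: the Euler equation is then linear. The sketch below is a 1-D discrete analog with an assumed weight λ and a reflecting boundary, solved by Gauss-Seidel sweeps; it is illustrative only, not a faithful discretization of any particular model.

```python
def tikhonov_denoise(f, lam=2.0, iters=200):
    """Gauss-Seidel sweeps for the Euler equation of
    min_u  sum_i (u[i+1] - u[i])^2 + lam * sum_i (f[i] - u[i])^2,
    i.e., (lam + 2) u[i] = u[i-1] + u[i+1] + lam f[i] in the interior."""
    u = list(f)
    n = len(u)
    for _ in range(iters):
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i + 1]       # reflecting boundary
            right = u[i + 1] if i < n - 1 else u[i - 1]  # reflecting boundary
            u[i] = (lam * f[i] + left + right) / (lam + 2.0)
    return u
```

A constant image is left unchanged, while an isolated spike is smoothed out, which is exactly the over-regularization discussed below for the quadratic choice.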
Examples of Functional Spaces

In this section, we revisit some possible choices of functional spaces, summarized in Table 1.

The first case (a) was inspired by the classical Tikhonov regularization. The functional space H¹(Ω) (Ω ⊂ R²) is the space of functions in L²(Ω) such that the distributional gradient Du is in L²(Ω). Unfortunately, functions in H¹(Ω) do not admit discontinuities across curves, and this is a major problem with respect to image analysis, since images are made of smooth patches separated by sharp variations.

Considering the problem reported in (a), Rudin et al. (1992) proposed to work on BV(Ω), the space of functions of bounded variation (BV) (Ambrosio et al. 2000), defined by

  BV(Ω) = {u ∈ L¹(Ω) ; ∫_Ω |Du| < ∞}

with

  ∫_Ω |Du| = sup { ∫_Ω u div φ dx ; φ = (φ₁, φ₂, ..., φ_N) ∈ C₀¹(Ω)^N, |φ|_{L^∞(Ω)} ≤ 1 }   [9]

Table 1 Examples of functional spaces and their norms (see model [8])

  Model | E and |u|_E                          | Φ(t) | G and |η|_G                                              | Ψ(t)
  (a)   | H¹(Ω), |u|_E = (∫_Ω |∇u|² dx)^{1/2}  | t²   | L²(Ω) with its usual norm                                | t²
  (b)   | BV(Ω), |u|_E = ∫_Ω |Du|              | t    | L²(Ω) with its usual norm                                | t²
  (c)   | BV(Ω), |u|_E = ∫_Ω |Du|              | t    | {v ∈ L²(Ω) ; v = div ξ, ξ ∈ L^∞(Ω)², ξ·N = 0 on ∂Ω}     | t
It is equivalent to define BV(Ω) as the space of L¹(Ω) functions whose distributional gradient Du is a bounded measure, and [9] is its total variation. The space BV(Ω) has some interesting properties:

1. lower semicontinuity of the total variation ∫_Ω |Du| with respect to the L¹(Ω) topology;
2. if u ∈ BV(Ω), we can define, for H¹-almost every x ∈ S_u (the complement of the set of Lebesgue points, i.e., the jump set of u), a normal n_u(x) and two approximate right and left limits u⁺(x) and u⁻(x); and
3. Du can be decomposed as the sum of a regular measure, a jump measure, and a Cantor measure:

  Du = ∇u dx + (u⁺ − u⁻) n_u H¹|_{S_u} + C_u

where ∇u is the approximate gradient and H¹ the one-dimensional Hausdorff measure.

This ability to describe functions with discontinuities across a hypersurface S_u makes BV(Ω) very convenient for describing images with edges. In this context, the image restoration problem is well posed, and suitable numerical tools can be proposed (Chambolle and Lions 1997).
One criticism of the model (b) in Table 1, pointed out by Meyer (2001), is that if f is a characteristic function and f is sufficiently small with respect to a suitable norm, then the model of Rudin et al. (1992) gives u = 0 and η = f, contrary to what one would expect (u = f and η = 0). In fact, the main reason for this phenomenon is that the L²-norm is not the right one for the η component, since highly oscillating functions can still have a large L²-norm (e.g., f_n(x) = cos(nx)). To better describe such oscillating functions, Meyer (2001) introduced the space of functions which can be expressed as the divergence of L^∞ vector fields. This work was developed in R^N, and the framework was adapted to bounded two-dimensional domains by Aubert and Aujol (2005) (see (c) in Table 1). An example of image decomposition is shown in Figure 3.
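Meyer's observation can be checked numerically in one dimension: f_n(x) = cos(nx) keeps an L²(0, 2π)-norm equal to √π for every n, while f_n is the derivative of the field sin(nx)/n, whose sup-norm 1/n vanishes as n grows. So a divergence-type norm shrinks on oscillations where the L²-norm does not. The sketch below uses a crude midpoint-rule quadrature and is illustrative only.

```python
import math

def l2_norm(g, a=0.0, b=2 * math.pi, m=10000):
    """Crude midpoint-rule approximation of the L2(a, b) norm of g."""
    h = (b - a) / m
    return math.sqrt(h * sum(g(a + (k + 0.5) * h) ** 2 for k in range(m)))

for n in (1, 10, 100):
    fn = lambda x, n=n: math.cos(n * x)   # f_n(x) = cos(nx)
    # f_n = d/dx (sin(nx)/n): the 1-D analog of a divergence of a field
    # whose sup-norm is 1/n, which vanishes as n grows...
    sup_field = 1.0 / n
    assert sup_field <= 1.0
    # ...while the L2 norm stays at sqrt(pi), independently of n:
    assert abs(l2_norm(fn) - math.sqrt(math.pi)) < 1e-3
```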
In this section, we have shown how the choice of the functional spaces is closely related to the definition of a variational formulation. The functionals are written in a continuous setting, and they can usually be minimized by solving the discretized Euler equations iteratively, until convergence. These PDEs and the differential operators are constrained by the energy definition, but it is also possible to work directly on the equations, forgetting the formal link with the energy. Such an approach has also been much developed in the computer vision community, and it is illustrated in the next section.

We refer the reader to Aubert and Kornprobst (2002) for a general review of variational approaches and PDEs as applied to image analysis.

Scale Spaces and PDEs
Another approach to performing nonlinear filtering is to define a family of image smoothing operators T_t depending on a scale parameter t. Given an image f(x), we can define the image u(t, x) = (T_t f)(x), which corresponds to the image f analyzed at scale t. In this section, following Alvarez, Guichard, Lions, and Morel (Alvarez et al. 1993), we show that u(t, x) is the solution of a PDE, provided T_t satisfies some suitable assumptions.

Basic Principles of a Scale Space
This section describes some natural assumptions to be fulfilled by scale spaces. We first assume that the output at scale t can be computed from the output at a scale t − h for very small h. This is natural, since a coarser-scale view of the original picture is likely to be deduced from a finer one. T_t is obtained by composition of transition filters, denoted by T_{t+h,t}. So the first axiom is

  (A1) T_{t+h} = T_{t+h,t} T_t, T_0 = Id

Another assumption is that the operators act locally, that is, (T_{t+h,t} f)(x) depends essentially upon the values of f(y) with y in a small neighborhood of x. Taking into account the fact that, as the scale increases, no new feature should be created by the scale space, we have the local comparison principle: if an image u is locally brighter than another image v, then this order must be preserved by the analysis. This is expressed by:

  (A2) For all u and v such that u(y) ≥ v(y) in a neighborhood of x, y ≠ x, and for h small enough, we have (T_{t+h,t} u)(x) ≥ (T_{t+h,t} v)(x)
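A simple discrete caricature satisfying both axioms is iterated smoothing with the nonnegative binomial mask (1/4, 1/2, 1/4): taking T_t to be t smoothing steps, the transition filter T_{t+h,t} is just h further steps, so (A1) holds by construction, and (A2) holds because the mask weights are nonnegative. (Pure-Python sketch; the 1-D signal and periodic boundary are assumptions made for brevity.)

```python
def smooth_once(u):
    """One application of the binomial mask (1/4, 1/2, 1/4), periodic boundary."""
    n = len(u)
    return [0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[(i + 1) % n] for i in range(n)]

def T(t, u):
    """Scale-space operator T_t: t smoothing steps (T_0 is the identity)."""
    for _ in range(t):
        u = smooth_once(u)
    return u

u0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
# Axiom (A1): T_{t+h} = T_{t+h,t} o T_t, with T_{t+h,t} being h extra steps.
assert T(5, u0) == T(2, T(3, u0))
```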
The third assumption states that a very smooth image must evolve in a smooth way with the scale space.

Figure 3 Example of image decomposition (see Aubert and Aujol (2005)); the panels show the original image and the component u.

Denoting the scalar product of two vectors of R^N by