Convergence Analysis of MAP Based Blur Kernel Estimation · 2017. 10. 20.

Convergence Analysis of MAP based Blur Kernel Estimation

Sunghyun Cho

DGIST

[email protected]

Seungyong Lee

POSTECH

[email protected]

Abstract

One popular approach for blind deconvolution is to for-

mulate a maximum a posteriori (MAP) problem with spar-

sity priors on the gradients of the latent image, and then

alternatingly estimate the blur kernel and the latent image.

While several successful MAP based methods have been

proposed, there has been much controversy and confusion

about their convergence, because sparsity priors have been

shown to prefer blurry images to sharp natural images. In

this paper, we revisit this problem and provide an analysis

on the convergence of MAP based approaches. We first in-

troduce a slight modification to a conventional joint energy

function for blind deconvolution. The reformulated energy

function yields the same alternating estimation process, but

more clearly reveals how blind deconvolution works. We

then show the energy function can actually favor the right

solution instead of the no-blur solution under certain con-

ditions, which explains the success of previous MAP based

approaches. The reformulated energy function and our con-

ditions for the convergence also provide a way to compare

the qualities of different blur kernels, and we demonstrate

its applicability to automatic blur kernel size selection, blur

kernel estimation using light streaks, and defocus estima-

tion.

1. Introduction

Image blur due to camera shakes is an annoying artifact

that severely degrades image quality. Image blur is often

modeled as:

b = k ∗ l + n, (1)

where b is an observed blurry image, k is a blur kernel, l is a latent sharp image, n is noise, and ∗ is the convolution operator. Blind deconvolution is the problem of estimating l and k from a given blurry image b, which is severely ill-posed because the number of unknowns in l and k exceeds the number of observations in b.

One popular approach to blind deconvolution is to formulate the problem as a maximum a posteriori (MAP) problem with sparsity priors on the gradients of the latent image,

and then alternatingly estimate k and l [2, 18, 3, 1, 22, 23].

While several successful MAP based methods with sparsity

priors have been proposed, there has been much controversy

and confusion about their convergence. Fergus et al. [4], in

their seminal work, reported that they initially tried a MAP

based approach but failed, so adopted a variational Bayesian

(VB) approach. Levin et al. [11] claimed that MAP based

approaches with sparsity priors cannot converge to the right

solution because sparsity priors favor the no-blur solution,

i.e., k = δ, where δ is a Dirac delta function, over the correct one. To resolve this convergence issue, Krishnan et

al. [8] introduced a normalized sparsity measure, which fa-

vors sharp edges over blurry ones. Xu et al. [23] claimed

that MAP based approaches with an unnaturally sparse im-

age representation can converge to the right solution, and

presented a blind deconvolution framework based on an L0

norm based image prior. However, it is not clear whether

their successful results are due to the optimization

process, the energy function, or some other factors.

This paper provides an analysis on the convergence of

MAP based approaches. Our analysis explicitly shows that

the success of MAP based approaches is due to their en-

ergy function favoring the right solution over the no-blur

one, and even a naïve MAP based approach can converge to

the right solution under certain conditions. For the conver-

gence analysis, we take the most direct approach. We di-

rectly compare the energies of different solutions to find out

which solution is favored by the energy function. We also

experimentally analyze conditions for convergence with a

large collection of images, and show that the conditions are

generally consistent among different images. Our analysis

results support the success of MAP based methods based on

extremely sparse image representations, such as [3, 23].

To this end, we first introduce a simple modification to a

typical joint energy function of l and k and derive an energy

function of k. Typical joint energy functions used in previ-

ous works involve two variables k and l, and this makes it

difficult to analyze the energy functions because all possible

combinations of k and l should be considered. Our modifi-

cation alleviates this by removing one variable from the en-


ergy function. In addition, the reformulated function more

clearly reveals how MAP based blind deconvolution works.

Despite the reformulated energy function having only one

variable, it is still not straightforward to compare the energies of different solutions. Evaluating the reformulated function requires solving a complex nonlinear optimization problem, so the true energy cannot be computed in general; only an approximate value, which is no smaller than the true energy, is available.

However, we show that it is possible to compute the true

energy of the no-blur solution with an energy function of a

particular form. Based on this, our experiments show that

the approximate energy of the right solution is still lower

than the true energy of the no-blur solution as long as cer-

tain conditions are satisfied.

The reformulated energy function and the convergence

conditions from our analysis also provide a simple and ef-

fective metric to compare the qualities of blur kernels. We

demonstrate that it can be used as a universal metric for

solving other problems in deblurring, such as automatic blur

size estimation, blur kernel estimation using light streaks,

and defocus estimation, which have previously been solved

using specifically designed metrics for the problems.

Similar attempts besides our work have been made to

unveil the secrets of the success of MAP based approaches.

Perrone and Favaro [16] claimed that the success of previ-

ous MAP based approaches is due to their delayed scaling

strategy in the iterative kernel estimation process. Krishnan

et al. [7] claimed that successful MAP based and variational

Bayesian approaches share common components, such as

sparsity promotion, L2 norm based priors on the blur kernel,

convex sub-problems, and multi-scale frameworks. How-

ever, none of these focused on the energy function, which is

the most important factor in the blind deconvolution process.

The most relevant to ours is the work of Wipf and

Zhang [21]. They showed that a VB approach with nec-

essary approximations for making its optimization tractable

results in an unconventional MAP approach, where noise

level, the latent image, and the blur kernel are coupled together. They also discussed the difference between VB and

MAP approaches and the convergence of MAP based ap-

proaches. While our work is also on the convergence of

MAP based approaches, our work has a few important dif-

ferences from [21]. First, we provide a thorough analy-

sis with a number of experimental validations while [21]

is completely based on mathematical assumptions and does

not provide any experimental results. Second, in our analy-

sis, we address MAP based blind deconvolution from a per-

spective of energy minimization, and find conditions for an

energy function to favor a sharp solution. Third, our analy-

sis is based on much simpler and more intuitive equations,

which provide a simple and practical guideline to design a

MAP based blind deconvolution, e.g., a proper and effective

range for the weights of prior terms. Fourth, our reformu-

lated energy function can be readily utilized for other types

of blur kernel estimation problems as we show in Sec. 5.

2. Related Work

We may categorize recent blind deconvolution methods

into mainly three categories. The first category is MAP

based approaches, which alternatingly estimate the latent

image and the blur kernel maximizing a joint posterior dis-

tribution. Chan and Wong [2] alternatingly estimated k and

l by minimizing a joint energy function based on total vari-

ation. Shan et al. [18] introduced a prior on image deriva-

tives based on piecewise continuous polynomials and pro-

posed an efficient optimization method. While these meth-

ods are able to estimate a small scale blur kernel, they often

converge to the no-blur solution as shown in [11]. Krish-

nan et al. [8] introduced a normalized sparsity measure that

can avoid the no-blur solution, but the measure is highly

non-linear, so the method requires a relatively long com-

putation time. More recently, Xu et al. [23] proposed an

approximated L0 norm based prior on image gradients, and

showed state-of-the-art results. Pan et al. [15] proposed a

novel prior to promote sparsity of the dark channel instead

of image gradients. However, despite a number of MAP

based approaches having been proposed, it is still unclear

how and when these methods converge to the right solution.

The second category is VB based methods, which re-

quire marginalization over all possible images. Fergus et

al. [4] reported that their initial attempt based on a MAP

based alternating estimation failed, as the estimation pro-

cess either converged to the no-blur solution or diverged,

and they presented a VB approach in order to overcome

such a convergence problem. Levin et al. [11] claimed that

MAP based approaches with sparsity priors are destined to

suffer from the convergence problem because sparsity pri-

ors favor blurry images over natural sharp ones, and pro-

posed to use a VB approach. Later, they also introduced

an efficient approximation to marginalizing over latent im-

ages [12]. Wipf and Zhang [21] showed that a VB approach

can be recast as an unconventional MAP problem with a

particular form of prior that conjoins the latent image, blur

kernel, and noise level. They also provided theoretical anal-

ysis about the convergence of MAP based approaches as

mentioned earlier. While VB approaches have proven to

be able to estimate accurate blur kernels, they often require

complex mathematical derivations, and relatively long com-

putation time even for small images.

The third category uses explicit edge detection such as

[3, 22, 19]. They used explicit edge detection in a multi-

scale iterative framework to effectively estimate a large blur

kernel. Thanks to their explicit edge detection, these meth-

ods can avoid the no-blur solution, and achieve state-of-the-

art results in a relatively short computation time. While


these methods involve edge detection, they usually predict

sparse and sharp gradient maps of the latent image in their

alternating estimation processes, and can still be considered

as variants of MAP based approaches.

3. MAP based Blind Deconvolution

Many previous blind deconvolution methods try to esti-

mate a latent image l and a blur kernel k by optimizing the

following joint energy function of l and k:

$$f(k, l) = \|k * l - b\|^2 + \lambda_l \rho_l(l) + \lambda_k \rho_k(k) \tag{2}$$

or its variant. The first term on the right hand side is a data

term, and the second and third terms are prior or regulariza-

tion terms on l and k, respectively. λl and λk are the relative

strengths for ρl and ρk, respectively. For ρl, sparsity priors

have been widely used, such as total variation [2], natural

image statistics [18], and L0-norm based priors [23]. Eqn.

(2) can be optimized by alternatingly optimizing two sub-

problems:

$$f_l(l; k) = \|k * l - b\|^2 + \lambda_l \rho_l(l), \quad \text{and} \tag{3}$$

$$f_k(k; l) = \|k * l - b\|^2 + \lambda_k \rho_k(k). \tag{4}$$

In this paper, for ease of analysis, we consider a variant

of Eqn. (2), which is based on image gradients. We define

l = {lx, ly}, where lx and ly correspond to horizontal and

vertical gradient maps of the latent image, respectively. We

further assume that lx and ly are independent of each other

as done in [3, 4, 23]. b = {bx, by} is defined in the same

manner. We then define each term in Eqn. (2) as:

$$\|k * l - b\|^2 = \|k * l_x - b_x\|^2 + \|k * l_y - b_y\|^2, \tag{5}$$

$$\rho_l(l) = \sum_i \{\phi(l_{x,i}) + \phi(l_{y,i})\}, \quad \text{and} \tag{6}$$

$$\rho_k(k) = \|k\|^2 \tag{7}$$

where i is the pixel index. We define φ(x) as:

$$\phi(x) = \begin{cases} |x|^{\alpha}, & \text{if } |x| \ge \tau \\ \tau^{\alpha-2}|x|^2, & \text{otherwise} \end{cases} \tag{8}$$

so that we can analyze the effects of different sparseness of

ρl(l) on the convergence of blind deconvolution by chang-

ing α. We use τ = 0.01 in all our experiments. While it

is more effective to use image intensities and gradients to-

gether for blind deconvolution [23], a gradient based energy

function makes it possible to compute the exact global op-

timum of Eqn. (3) for k = δ, as we will show later, and

consequently makes our analysis easier.
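The gradient-based energy of Eqns. (2) and (5) to (8) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names (`phi`, `circ_conv2`, `joint_energy`) are hypothetical, and circular (FFT-based) convolution is an assumption made here for brevity, since the text does not specify boundary handling.

```python
import numpy as np

TAU = 0.01  # threshold tau reported in the text


def phi(x, alpha, tau=TAU):
    """Eqn. (8): |x|^alpha for |x| >= tau, tau^(alpha-2) * x^2 otherwise."""
    ax = np.abs(np.asarray(x, dtype=float))
    # np.maximum keeps the power branch well defined where it is not selected
    return np.where(ax >= tau,
                    np.maximum(ax, tau) ** alpha,
                    tau ** (alpha - 2.0) * ax ** 2)


def circ_conv2(k, img):
    """Circular 2D convolution via FFT (boundary handling is an assumption)."""
    kp = np.zeros_like(img, dtype=float)
    kh, kw = k.shape
    kp[:kh, :kw] = k
    # shift so that the kernel centre sits at the array origin
    kp = np.roll(kp, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(kp) * np.fft.fft2(img)))


def joint_energy(k, lx, ly, bx, by, lam_l, lam_k, alpha):
    """Eqn. (2) with the gradient-domain terms of Eqns. (5)-(7)."""
    data = (np.sum((circ_conv2(k, lx) - bx) ** 2)
            + np.sum((circ_conv2(k, ly) - by) ** 2))
    prior_l = np.sum(phi(lx, alpha)) + np.sum(phi(ly, alpha))
    prior_k = np.sum(k ** 2)
    return data + lam_l * prior_l + lam_k * prior_k


# Toy check: with a delta kernel and l = b, the data term vanishes.
rng = np.random.default_rng(0)
gx, gy = rng.normal(0, 0.1, (8, 8)), rng.normal(0, 0.1, (8, 8))
delta = np.array([[1.0]])
e_delta = joint_energy(delta, gx, gy, gx, gy, lam_l=0.0005, lam_k=0.001, alpha=0.1)
```

The two branches of `phi` meet at |x| = τ, where both evaluate to τ^α, so the penalty is continuous.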

[Figure 1: (a) Sharp image and a blur kernel. (b) Blurred image and a delta kernel. (c) Sparsity prior values of (a) and (b), plotted for α from 0.1 to 1.9.]

Figure 1. The x and y axes of (c) represent different α and sparsity prior values, respectively. While both (a) and (b) produce exactly the same blurred image, the sharp image has higher sparsity prior values for all α.

It is known that a naïve implementation of Eqn. (2) often fails to converge to the right solution, but converges to the no-blur solution. Levin et al. [11] claimed that this is because of the nature of image blur and sparsity priors. They showed that image blur has two opposite effects. First, it makes edges blurry, making image gradients

less sparse. Second, it reduces variance of image gradients,

making them sparser. Previous methods using sparsity pri-

ors are based on the first effect, assuming that sharp latent

images are mostly piecewise constant with a few step edges.

However, natural sharp images usually have large variance

of image gradients even in smooth regions, so the second

effect is much stronger than the first one. Therefore, even

though sparsity priors prefer sharp edges to blurry ones in

the ideal case, they still prefer a blurry image to a sharp one.
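The two opposite effects can be illustrated with a small 1D experiment. The sketch below is an assumption-laden toy, not the setup used for Fig. 1: it uses a plain |x|^α prior, a circular box blur, a single ideal step edge, and Gaussian noise as a stand-in for the gradient variation of smooth natural-image regions. Because convolution commutes with differentiation, blurring is applied directly to the gradient signals.

```python
import numpy as np


def sparsity_prior(grad, alpha):
    """Sum of |g|^alpha over a gradient signal (Eqn. (8) without the tau branch)."""
    return float(np.sum(np.abs(grad) ** alpha))


def circ_box_blur(signal, length):
    """Circular box blur built from shifts, so exact zeros stay exactly zero."""
    return sum(np.roll(signal, s) for s in range(length)) / length


n, alpha, length = 512, 0.5, 5

# Effect 1: an ideal step edge has a single gradient impulse; blur spreads it out.
step_grad = np.zeros(n)
step_grad[100] = 1.0
step_sharp = sparsity_prior(step_grad, alpha)
step_blur = sparsity_prior(circ_box_blur(step_grad, length), alpha)

# Effect 2: texture-like gradients (hypothetical Gaussian model) lose variance.
rng = np.random.default_rng(0)
texture_grad = rng.normal(0.0, 0.05, n)
tex_sharp = sparsity_prior(texture_grad, alpha)
tex_blur = sparsity_prior(circ_box_blur(texture_grad, length), alpha)

print(f"step edge: sharp {step_sharp:.3f}  blurred {step_blur:.3f}")
print(f"texture:   sharp {tex_sharp:.3f}  blurred {tex_blur:.3f}")
```

For the step edge the prior increases after blurring (five gradients of 0.2 cost more than one gradient of 1), while for the texture-like signal it decreases, which is the effect that dominates in natural images according to the argument above.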

Fig. 1 describes the aforementioned second effect of im-

age blur. Fig. 1a is a pair of a sharp image and a blur kernel,

which represents a sharp solution, and Fig. 1b is a pair of

a blurred image and the delta blur kernel, which represents

the no-blur solution. The sharp solution and the no-blur solution produce exactly the same blurred image. We then compute their sparsity prior values ρl(l) for different α. As described earlier, the sharp solution has higher values for ρl(l) compared to the no-blur solution (Fig. 1c), explaining the failure of naïve implementations of MAP based approaches. While this argument seems valid, several MAP based methods such as [3, 23] still report good results, which contradicts the argument.

4. Convergence Analysis

4.1. Reformulated Energy Function

In our analysis, to find out which solution the energy

function really favors, we take the most direct approach.

We compare the energy values of different solutions. How-

ever, Eqn. (2) is not easy to analyze as all possible combi-

nations of l and k need to be considered. To alleviate this,

we first introduce a reformulated energy function derived by

embedding Eqn. (3) into Eqn. (2):

$$f(k) = \min_l f(k, l) = f(k, l_k) = \|k * l_k - b\|^2 + \lambda_l \rho_l(l_k) + \lambda_k \rho_k(k) \tag{9}$$

where

$$l_k = \operatorname*{argmin}_l f_l(l; k). \tag{10}$$


Eqn. (9) is no longer a function of k and l, but a function of k. To compute f(k) for a given k, we first compute lk in Eqn. (10), and then Eqn. (9). It should also be noted that

optimizing Eqn. (9) is equivalent to optimizing Eqn. (2) as

we will show in Sec. 4.3. Consequently, analyzing Eqn. (9)

is equivalent to analyzing Eqn. (2).

Although Eqn. (9) is now a function of only one vari-

able, it is not feasible to compute the exact energy value

of a given k due to the non-convexity of Eqn. (10). There-

fore, in our analysis, we instead compute an approximate

energy value. Specifically, for a given k, we first solve Eqn.

(10) using the iteratively reweighted least squares (IRLS)

method [9], and obtain an approximate latent image $l_k^{\mathrm{IRLS}}$. Then, we compute an approximate energy $f^{\mathrm{IRLS}}(k)$ by computing Eqn. (9) with $l_k^{\mathrm{IRLS}}$.
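A minimal 1D sketch of this approximate evaluation is given below. It is not the IRLS solver of [9]: the weights are chosen here as the majorize-minimize weights for the penalty of Eqn. (8), the signal is a single 1D gradient channel, convolution is circular, and the names `circulant` and `f_irls` are hypothetical.

```python
import numpy as np


def phi(x, alpha, tau=0.01):
    """Sparsity penalty of Eqn. (8)."""
    ax = np.abs(x)
    return np.where(ax >= tau,
                    np.maximum(ax, tau) ** alpha,
                    tau ** (alpha - 2.0) * ax ** 2)


def circulant(k, n):
    """n x n matrix K such that K @ l is the circular convolution of k and l."""
    col = np.zeros(n)
    col[:k.size] = k
    return np.stack([np.roll(col, j) for j in range(n)], axis=1)


def f_irls(k, b, lam_l, lam_k, alpha, tau=0.01, iters=30):
    """Approximate f(k) of Eqn. (9): solve Eqn. (10) by IRLS, then evaluate."""
    K = circulant(k, b.size)
    KtK, Ktb = K.T @ K, K.T @ b
    l = b.copy()  # initial guess: the blurred gradients themselves
    history = []
    for _ in range(iters):
        ax = np.maximum(np.abs(l), tau)
        # quadratic upper bound of phi at the current l (assumed weight choice)
        w = np.where(np.abs(l) >= tau,
                     0.5 * alpha * ax ** (alpha - 2.0),
                     tau ** (alpha - 2.0))
        l = np.linalg.solve(KtK + lam_l * np.diag(w), Ktb)
        energy = (np.sum((K @ l - b) ** 2)
                  + lam_l * np.sum(phi(l, alpha, tau))
                  + lam_k * np.sum(k ** 2))
        history.append(float(energy))
    return history[-1], l, history


# Toy input: three gradient spikes blurred by a box kernel of length 5.
l_true = np.zeros(64)
l_true[[10, 30, 45]] = [1.0, -0.7, 0.5]
k_true = np.ones(5) / 5
b = circulant(k_true, 64) @ l_true
energy, l_est, history = f_irls(k_true, b, lam_l=0.0005, lam_k=0.001, alpha=0.1)
```

With these weights each iteration minimizes a quadratic upper bound of the objective, so the recorded energies do not increase, but the result is still only a local solution and therefore an upper bound on the true energy.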

Exact Energy of No-Blur Solution. Unfortunately, comparing $f^{\mathrm{IRLS}}(k)$ across different k is less trustworthy, as $f^{\mathrm{IRLS}}(k)$ is only an approximate value, which is never smaller than the true energy $f^{\mathrm{opt}}(k)$ for a given k.¹ Thus, for

more accurate analysis, we also compute the exact energy

value of the no-blur solution. Although it is usually impos-

sible to compute the exact energy value of a given k because

of the non-convexity of Eqn. (10) as mentioned earlier, as

we define our energy function completely based on image

gradients, Eqn. (10) is pixel-wise independent for k = δ.

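Therefore, we can find $l_\delta^{\mathrm{opt}}$ by solving:

$$\operatorname*{argmin}_{l_{*,i}} \; |l_{*,i} - b_{*,i}|^2 + \lambda_l \phi(l_{*,i}), \quad * \in \{x, y\} \tag{11}$$

for each pixel of $l_{\delta,x}^{\mathrm{opt}}$ and $l_{\delta,y}^{\mathrm{opt}}$ independently. Eqn. (11) can easily be solved using exhaustive search.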
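A per-pixel exhaustive search for Eqn. (11) might look as follows. This is a sketch under stated assumptions: the candidate grid (uniform, spanning the range of the observed gradients) and the name `noblur_latent` are choices made here, and the result is exact only up to the grid resolution.

```python
import numpy as np


def phi(x, alpha, tau=0.01):
    """Sparsity penalty of Eqn. (8)."""
    ax = np.abs(x)
    return np.where(ax >= tau,
                    np.maximum(ax, tau) ** alpha,
                    tau ** (alpha - 2.0) * ax ** 2)


def noblur_latent(b, lam_l, alpha, tau=0.01, n_candidates=2001):
    """Solve Eqn. (11) independently for every gradient value in b."""
    b = np.asarray(b, dtype=float)
    bmax = np.max(np.abs(b))
    # the minimiser always lies between 0 and b, so [-bmax, bmax] suffices
    candidates = np.linspace(-bmax, bmax, n_candidates)
    cost = ((candidates[None, :] - b.reshape(-1, 1)) ** 2
            + lam_l * phi(candidates, alpha, tau)[None, :])
    return candidates[np.argmin(cost, axis=1)].reshape(b.shape)


b_demo = np.array([0.0, 0.005, 1.0, -1.0])
l_demo = noblur_latent(b_demo, lam_l=0.0005, alpha=0.1)
```

Small gradients are shrunk toward zero while large ones are almost unchanged, which is the sparsifying behaviour the analysis relies on. For full-size images the cost matrix would need to be processed in chunks to bound memory.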

Analysis. While Eqn. (9) is simply a different form of Eqn. (2), Eqn. (9) more clearly reveals that lk is not an arbitrary natural image, but a sparse estimate of the latent image l that is coupled with k, if λlρl(l) is strong enough. In that

case, unlike natural sharp images, l would have no large

variations in smooth regions, but have only flat regions and

a few edges. Then, the sparsity prior term ρl(l) is not af-

fected by the second effect of image blur, but mostly domi-

nated by the first effect. Consequently, Eqn. (9) can actually

favor a sharp solution over the no-blur one.

To verify this, we compare the energy values of the sharp and no-blur solutions in Fig. 1 using the reformulated energy function. We denote the blur kernels of the sharp solution and the no-blur solution by $k_{gt}$ and $k_\delta$, respectively. For $k_{gt}$, we first compute the sparse estimate $l_{gt}^{\mathrm{IRLS}}$ of the latent image by solving Eqn. (10), and then compute $f^{\mathrm{IRLS}}(k_{gt})$ using Eqn. (9). For $k_\delta$, we compute both approximate and exact latent images ($l_\delta^{\mathrm{IRLS}}$, $l_\delta^{\mathrm{opt}}$) and their corresponding energy values ($f^{\mathrm{IRLS}}(k_\delta)$, $f^{\mathrm{opt}}(k_\delta)$).

¹ Formally, for a given k, there exists $l^{\mathrm{opt}} = \operatorname{argmin}_l f_l(l; k) = \operatorname{argmin}_l f(k, l)$. By definition, $f(k, l^{\mathrm{opt}}) \le f(k, l)$ for all l. Consequently, $f^{\mathrm{IRLS}}(k) = f(k, l^{\mathrm{IRLS}}) \ge f(k, l^{\mathrm{opt}}) = f^{\mathrm{opt}}(k)$.

[Figure 2, top row: image panels (a), (b), (c).]

                                   total energy    data     sparsity
$f^{\mathrm{IRLS}}(k_{gt})$            51.89       31.39     40996.5
$f^{\mathrm{IRLS}}(k_\delta)$          97.18       55.77     82815.7
$f^{\mathrm{opt}}(k_\delta)$           74.70       18.67    112053.3

Figure 2. Top row: sparse estimates of the latent image for the ground truth kernel and the delta kernel. As our energy function is defined using image gradients, latent image estimates are gradient maps. We visualize them using Poisson image reconstruction, which restores intensities from image gradients, as done in [4]. Bottom row: energy values, data terms, and sparsity priors of the ground truth blur kernel $k_{gt}$ and the delta kernel $k_\delta$. We set α = 0.1 and λl = 0.0005.
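As a quick arithmetic check of the values reported in Fig. 2, the total energy of each row can be recomputed as data + λl × sparsity with λl = 0.0005. The sketch below assumes that the kernel prior term λkρk(k) is too small to matter at the reported precision; the source does not state this, so treat it as an assumption.

```python
# Values copied from Fig. 2: (total energy, data term, sparsity prior).
lam_l = 0.0005
rows = {
    "f_IRLS(k_gt)": (51.89, 31.39, 40996.5),
    "f_IRLS(k_delta)": (97.18, 55.77, 82815.7),
    "f_opt(k_delta)": (74.70, 18.67, 112053.3),
}

# Recompute each total from its parts, ignoring the kernel prior term.
recomputed = {name: data + lam_l * sparsity
              for name, (total, data, sparsity) in rows.items()}

for name, (total, _, _) in rows.items():
    print(f"{name}: reported {total:.2f}, recomputed {recomputed[name]:.2f}")
```

The check also makes the trade-off visible: the exact no-blur solution has the lowest data term but by far the largest weighted sparsity cost, so its total energy is still higher than that of the ground truth kernel.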

Fig. 2 shows the computed latent images and the energy values of $k_{gt}$ and $k_\delta$. As discussed above, the sparse estimates $l_{gt}^{\mathrm{IRLS}}$, $l_\delta^{\mathrm{IRLS}}$ and $l_\delta^{\mathrm{opt}}$ have only smooth regions and a few edges, with almost no variation in the smooth regions. $f^{\mathrm{IRLS}}(k_{gt})$ and $\rho_l(l_{gt}^{\mathrm{IRLS}})$ are also smaller than $f^{\mathrm{IRLS}}(k_\delta)$ and $\rho_l(l_\delta^{\mathrm{IRLS}})$, respectively. More importantly, $f^{\mathrm{IRLS}}(k_{gt})$ and $\rho_l(l_{gt}^{\mathrm{IRLS}})$ are smaller than $f^{\mathrm{opt}}(k_\delta)$ and $\rho_l(l_\delta^{\mathrm{opt}})$, respectively, even though $l_{gt}^{\mathrm{IRLS}}$ is an approximate estimate. This result means that the global optimum of Eqn. (9), which is equivalent to the global optimum of Eqn. (2), favors the sharp solution over the no-blur solution.
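The step from the approximate energy to the true energy can be written out explicitly. Using the bound $f^{\mathrm{opt}}(k) \le f^{\mathrm{IRLS}}(k)$ from footnote 1 together with the values reported in Fig. 2:

$$f^{\mathrm{opt}}(k_{gt}) \;\le\; f^{\mathrm{IRLS}}(k_{gt}) = 51.89 \;<\; 74.70 = f^{\mathrm{opt}}(k_\delta) \;\le\; f^{\mathrm{IRLS}}(k_\delta) = 97.18.$$

The unknown true energy of the ground truth kernel is therefore strictly below the exact energy of the no-blur solution for this example, even though only an upper bound on it was computed.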

4.2. Conditions for Avoiding No-Blur Solution

In this subsection, we analyze when MAP based ap-

proaches converge to the right solution. To this end, we

consider the following two conditions.

$$f(k_{gt}) / f(k_\delta) < 1, \quad \text{and} \tag{12}$$

$$\rho_l(l_{gt}) / \rho_l(l_\delta) < 1. \tag{13}$$

While the first condition is sufficient for avoiding the no-blur solution, we also consider the second one because the

prior ρl is the key to distinguish between sharp and blurry

latent images. To satisfy the second condition, the latent im-

age estimates lgt and lδ should be sparse enough as shown

in Sec. 4.1. This means that λl should be appropriately large

and α should be small. If λl is too small, then lgt will be

similar to a natural sharp image, which is not sparse but has

large variation in smooth regions, and the second effect of

blur discussed in Sec. 4.1 will kick in. On the other hand,

too large λl will make lgt and lδ entirely flat images with no

edges at all, so they will be indistinguishable. Larger α will also produce blurrier edges on both lgt and lδ, making them less distinguishable.

[Figure 3: two plots, (a) $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ and (b) $\rho_l(l_{gt}^{\mathrm{IRLS}})/\rho_l(l_\delta^{\mathrm{opt}})$, over α from 0.2 to 1 and λl from 2e−05 to 0.01024, with a value scale from 0 to 2.]

Figure 3. The x- and y-axes of each plot represent α and λl, respectively. Values larger than 2 are clipped to 2 for better visualization.

Fig. 3 shows $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ and $\rho_l(l_{gt}^{\mathrm{IRLS}})/\rho_l(l_\delta^{\mathrm{opt}})$ for different λl and α. Note that the ratios $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ and $\rho_l(l_{gt}^{\mathrm{IRLS}})/\rho_l(l_\delta^{\mathrm{opt}})$ present tighter bounds for α and λl than the true bounds because $l_{gt}^{\mathrm{IRLS}}$ is a local optimum. Despite these tighter bounds, Fig. 3 shows that the ground truth blur kernel $k_{gt}$ is favored over $k_\delta$ by the energy function f and the prior ρl when α is small and λl is large enough. When λl is too large, both lgt and lδ become completely zero, so they are no longer distinguishable.

To investigate the bounds for convergence more rigorously, we compute the ratios $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ on two publicly available datasets: Levin et al.’s [11] and Sun et al.’s [19] (Fig. 4). Levin et al.’s dataset consists of 32 real

blurred images generated from four images and eight blur

kernels. On the other hand, Sun et al.’s consists of 640 syn-

thetically blurred images generated from 80 sharp images

ranging from natural scenes to man-made environments,

and eight blur kernels. In this experiment, we compute

$f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ for fixed α = 0.1 and different λl. Fig. 4 shows that the energy function favors the ground truth kernel over the no-blur solution for most images once α and λl are properly set.² We can also observe that, while different

blur kernels and images show different energy value ratios,

they still show similar trends. This indicates that a carefully

chosen λl can cover most of the images and the blur kernels.

It is also worth noting that some images have the ratio $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ above 1 for almost the entire range of λl, which indicates that the energy function is not able to distinguish between the right solution and the no-blur one. Such images have a relatively small number of edges, and previous methods often fail on such images. Our results suggest that such failures cannot be avoided by using different parameters, but instead an improved algorithm is needed. In the remainder of this paper, we consistently use α = 0.1 and λl = 0.00064, which are shown to be the most effective for distinguishing the sharp and no-blur solutions in these experiments, i.e., the largest number of images have $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta) < 1$ under these parameters (Fig. 5).
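The parameter selection described above amounts to counting, for each λl, how many images have an energy ratio below 1. A minimal sketch follows; the ratio table and the λl grid are made-up toy values for illustration only (they are not measurements from either dataset), and `percent_favoring_sharp` is a hypothetical helper name.

```python
import numpy as np


def percent_favoring_sharp(ratios):
    """Percentage of images (rows) with ratio < 1, for each lambda_l (columns)."""
    return 100.0 * np.mean(np.asarray(ratios) < 1.0, axis=0)


# Hypothetical lambda_l grid and toy ratios f_IRLS(k_gt) / f_opt(k_delta).
lambdas = np.array([1e-4, 2e-4, 4e-4])
toy_ratios = np.array([
    [1.20, 0.90, 0.80],
    [1.10, 1.05, 0.70],
    [0.95, 0.90, 0.97],
    [1.40, 0.80, 0.90],
])

pct = percent_favoring_sharp(toy_ratios)
best_lambda = lambdas[int(np.argmax(pct))]
print(pct, best_lambda)
```

On real data, each row would hold the measured ratios for one blurred image, and the chosen λl would be the column with the highest percentage.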

² Refer to the supplementary material for the rest of the results.

[Figure 4: plots of the ratio (y-axis, 0 to 2) against λl (x-axis, 0.00001 to 0.04096, doubling at each tick). Top plot legend: images Im1 to Im4, each with kernels Ke1 to Ke8. Bottom plots: Kernel 1 and Kernel 3.]

Figure 4. $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ with respect to different λl’s. (Top: Levin et al.’s dataset [11]. Bottom: Sun et al.’s dataset [19].) $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta)$ smaller than 1 means that the ground truth blur kernel is preferred to the delta kernel by the energy function.

[Figure 5: plot with a y-axis from 0 to 100 (percentage of images).]

Figure 5. Percentages of images in Sun et al.’s dataset [19] satisfying $f^{\mathrm{IRLS}}(k_{gt})/f^{\mathrm{opt}}(k_\delta) < 1$ with different λl. λl = 0.00064 is the most effective for distinguishing the sharp and no-blur solutions.

4.3. Global Optimum and Convergence Analysis

In Sec. 4.2, we experimentally showed that a MAP based

energy function can favor a sharp solution over the no-blur

solution by comparing their energy values. In this section,

we investigate two questions: 1) does the true blur kernel

actually correspond to the global optimum of Eqn. (9), and

2) how well does naïve MAP based blind deconvolution perform compared to previous sophisticated methods?

Regarding the first question, when λl is set strong

enough, a latent image obtained by the true blur kernel

should have sharp edges and flat regions, minimizing ρl(l) in Eqn. (9). On the other hand, a different blur kernel usually causes blurry edges or ringing artifacts in its latent image, increasing ρl, and eventually its energy value. It is

hard to analytically prove this property because evaluation

of Eqn. (9) involves a complex non-linear optimization in

Eqn. (10). Instead, we provide a simple experiment with 1D

blur kernels, and also experimentally show that minimizing

Eqn. (9) converges to the right solution.

[Figure 6: panels (a), (b), (c), and (d). Plot (d) has x-axis ticks from 1 to 15 and y-axis ticks from 60 to 160.]

Figure 6. (a), (b) and (c) show sparse latent image estimates l obtained using blur kernels of lengths 1, 7, and 15, respectively. The original blurry image is blurred by the blur kernel of length 7. (d) Solid red line: energy values $f^{\mathrm{IRLS}}(k)$ of blur kernels of different lengths, and dashed blue line: $f^{\mathrm{opt}}(k_\delta)$.

Regarding the second question, previous successful

methods adopt either explicit edge detection [3, 22, 19],

edge reweighting [18], changing parameters of the energy

function [18, 21], or variational Bayesian estimation [4, 11,

12, 21]. While such techniques may help improve performance, we show that even a naïve MAP approach can perform comparably despite lacking such components.

Fig. 6 shows a simple experiment to see whether the true

blur kernel corresponds to the global optimum. We first

blur a sharp natural image using a 1D blur kernel of length

7. Then, we compute the energy values of blur kernels of

different lengths. Fig. 6(d) shows the energy values of dif-

ferent blur kernels. The plot shows that the ground truth

blur kernel is preferred by the energy function.

Finally, we implement naïve MAP based blind deconvolution, which optimizes Eqn. (9). Note that optimizing Eqn. (9) is equivalent to optimizing Eqn. (2) as:

$$\min_{k,l} f(k, l) = \min_k \{\min_l f(k, l)\} = \min_k f(k). \tag{14}$$

Moreover, Eqn. (9) yields exactly the same alternating optimization process described by Eqns. (3) and (4). Given

an estimate of k, we compute lk by optimizing Eqn. (3),

and then update k by optimizing Eqn. (9), which is equiv-

alent to optimizing Eqn. (4). We implemented single- and

multi-scale versions, and set λk = 0.001. Fig. 7 shows that

the single-scale version can converge to a solution close to

the true kernel whose energy is lower than that of the no-

blur solution. We conducted performance comparison of

the multi-scale version using Levin et al.’s dataset [11] (Fig.

8). Although our result is poorer than [19], which is based

on patch-based priors, it is still comparable to the others.

This shows that even a naïve MAP approach can perform

comparably to the other sophisticated methods. Further-

more, while converging to the true kernel does not neces-

sarily mean that the true kernel is the global optimum, it

indicates that the true kernel is preferred to other kernels

estimated through the optimization process.
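The kernel update of Eqn. (4) with the quadratic prior of Eqn. (7) has a closed-form solution in the Fourier domain when circular convolution is assumed. The sketch below illustrates that step only; it is not the authors' implementation. Circular boundaries, the top-left kernel placement, the cropping to the kernel support, and the omission of any non-negativity or normalization constraint are all assumptions made here, and `update_kernel` is a hypothetical name.

```python
import numpy as np


def circ_conv2(k_full, img):
    """Circular 2D convolution of two equally sized arrays via FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(k_full) * np.fft.fft2(img)))


def update_kernel(lx, ly, bx, by, lam_k, ksize):
    """Minimise Eqn. (4) over k given the gradient maps (lx, ly) and (bx, by)."""
    Lx, Ly = np.fft.fft2(lx), np.fft.fft2(ly)
    Bx, By = np.fft.fft2(bx), np.fft.fft2(by)
    num = np.conj(Lx) * Bx + np.conj(Ly) * By
    den = np.abs(Lx) ** 2 + np.abs(Ly) ** 2 + lam_k
    k_full = np.real(np.fft.ifft2(num / den))
    # keep only the assumed kernel support (top-left corner of the array)
    return k_full[:ksize, :ksize]


# Toy check: recover a known 5x5 kernel from noise-free synthetic gradients.
rng = np.random.default_rng(0)
lx, ly = rng.normal(size=(32, 32)), rng.normal(size=(32, 32))
k_true = rng.random((5, 5))
k_true /= k_true.sum()
k_full = np.zeros((32, 32))
k_full[:5, :5] = k_true
bx, by = circ_conv2(k_full, lx), circ_conv2(k_full, ly)
k_est = update_kernel(lx, ly, bx, by, lam_k=0.001, ksize=5)
```

In the alternating scheme this update would follow each latent image estimate from Eqn. (3); the dense random gradients used here are chosen only so that the toy problem is well conditioned.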

[Figure 7: (a) Blurred image and its ground truth blur kernel. (b) Energy values along iterations, showing the energy at each iteration and the optimal energy for no-blur. (c) Blur kernels at different iterations: 1st, 2nd, 3rd, 4th, …, 20th.]

Figure 7. Minimizing Eqn. (9) converges to a sharp solution, which is close to the ground truth blur kernel.

[Figure 8: success rate (0 to 1) versus error ratio (0 to 5) for Levin et al., Fergus et al., Cho & Lee, Sun et al. (Nat), and Ours.]

Figure 8. Performance comparison with Levin et al. [11], Fergus et al. [4], Cho & Lee [3], and Sun et al. [19] using the cumulative error ratio histogram proposed by [11] and Levin et al.’s dataset [11]. Success rates of other methods are from [19].

5. Energy Function as a Kernel Quality Metric

Besides estimation of blur caused by camera shakes, there are many problems related to image blur, such as defocus estimation [25], lens blur estimation [17], blur kernel

size detection [13], fusion of deblurring results obtained by

different blur kernels [14], etc. In those problems, it is es-

sential to have a metric for evaluating the quality of a blur

kernel. Unfortunately, because there has been no univer-

sal metric proven to work, solutions for different problems

defined their own metrics.

The energy function in Eqn. (9) is a function of a blur

kernel, which properly gives a lower energy to a better blur

kernel when λl is properly set. Therefore it provides a sim-

ple and effective metric to compare blur kernels, which can

be applied universally to different problems. While the idea

of using an energy function as a metric may sound straightforward and obvious, this simple idea was not previously possible, mainly for two reasons. First, the original joint energy function in Eqn. (2) involves two variables l and k, so it was rather unclear how to utilize the energy function for other problems. Second, it was unclear whether and when the

energy function in Eqn. (2) favors the sharp solution over

the no-blur one. Our modification to the energy function

and analysis in Sec. 4 resolve these two issues and make

the above idea possible. In this section, we present three

examples as possible applications of the energy function.


5.1. Automatic Blur Kernel Size Selection

Most blind deconvolution methods require the size of a

blur kernel as input. An input kernel size smaller than the

actual blur size results in erroneous kernel estimation. On

the other hand, an overly large kernel size increases the degrees of

freedom of kernel estimation, which may lead to an unstable

and erroneous result. However, it is not an easy task for a

user to select a proper size. There have been a few attempts

to automatically find a proper kernel size [14, 13]. Liu et

al. [14] deblurred an image with a set of different blur ker-

nels of different sizes and found a proper kernel size using

their deblurring quality metric trained from crowd-sourced

user study data. Recently, Liu et al. [13] proposed a kernel

size estimation method, which estimates a kernel size from

the autocorrelation of the edge map of a blurred image.

The energy function in Eqn. (9) provides a simpler way

to find out a proper kernel size. Similarly to [14], we first

estimate blur kernels of different sizes. Then, we compute

their energies and choose the kernel with the smallest en-

ergy. Fig. 9 shows an example.

Recall that this simple approach for comparing different
kernels has been made possible by our analysis. Our reformulated
energy function requires that a properly estimated latent image
be used for computing the energy instead of an arbitrary latent
image, e.g., a natural-looking latent image obtained from a
previous method. We also showed that parameters must be properly
set in order to make the energy function favor the right solution.
For example, an inappropriate λ_l = 0.00001 produces energy values
13.0, 14.4, and 16.5 for the kernels in Fig. 9(b), (c), and (d),
respectively, and causes the energy function to prefer the
smallest kernel, which is close to the no-blur solution.
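For reference, the dependence on the latent image and on λ_l can be written compactly. This is a sketch under the assumption that Eqn. (9) keeps the data term and the two priors of the joint energy in Eqn. (2), with the latent image replaced by the one estimated for the given kernel via Eqn. (10). The symbols ρ_l and ρ_k (image and kernel priors), the weight λ_k, and the name E are notation chosen here and may differ from the exact definitions given earlier in the paper.

```latex
\begin{align}
  l_k  &= \arg\min_{l} \; \|k * l - b\|^2 + \lambda_l\,\rho_l(l), \\
  E(k) &= \|k * l_k - b\|^2 + \lambda_l\,\rho_l(l_k) + \lambda_k\,\rho_k(k).
\end{align}
```

Under this reading, the energy of a candidate kernel k is always evaluated with its own latent image l_k rather than with a latent image produced by another method, and the value of λ_l influences which kernel attains the minimum.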

5.2. Blur Kernel Estimation from Light Streaks

Images blurred by camera shakes often have light

streaks, which are caused by blurred light bulbs, flash lights,

reflected light, etc. (see the blurred images in Fig. 10). Such light

streaks provide useful information about the shape of the

blur kernel, and a couple of methods have been proposed to

use light streaks for blur kernel estimation. Hua and Low [6]

proposed an interactive method, where the user manually

draws a small bounding box for a light streak, and then, the

system extracts a blur kernel using heuristic image processing
operations. Hu et al. [5] presented a more sophisticated

method. Their method automatically detects light streaks

from a blurred image, then uses the detected light streaks

to estimate a blur kernel. In order to detect light streaks,

their method first uses a set of heuristic rules for detecting

light streak patches. Then the best light streak patch is se-

lected based on the power-law of natural images, and used

for detecting additional light streak patches.

Figure 9. (a) & (e) Blurred images. (b)-(d) & (f)-(h) Estimated blur kernels of different sizes and their corresponding latent images, with their energies: (b) 15x15 kernel, 124.2; (c) 75x75 kernel, 121.0; (d) 115x115 kernel, 130.2; (f) 35x35 kernel, 19.4; (g) 55x55 kernel, 20.2; (h) 75x75 kernel, 18.9. All the deblurring results are obtained using [3], and their energy values are computed using Eqn. (9). The sizes of the blur kernels in (b), (f), and (g) are too small, so incorrect kernels are estimated. On the other hand, the overly large kernel size in (d) also results in incorrect kernel estimation. The energy function in Eqn. (9) can properly distinguish the correct solutions (c) and (h) from the others.

Figure 10. For each image, left: blurred image with light streaks and a magnified patch of a light streak, which reflects the shape of the blur kernel. Right: the best and worst light streak patches selected by Eqn. (9) and their corresponding energy values. (Energy values printed in the figure: 43.8, 66.7, 113.2, 181.5, 219.2, 258.1, 123.1, 165.7.)

Instead of the power law, which is known to be sensitive
to strong edges [24, 13], Eqn. (9) provides a more direct
measure to select the best light streak patch. Similarly to

[5], we first find a set of candidate light streak patches using

heuristic rules. In our experiment, we use the code of the

authors of [5] to find an initial candidate set. Then, instead

of the power law based metric, we compute their energy

values using Eqn. (9), and choose the one with the lowest

energy. Fig. 10 shows an example. For each blurred image

in Fig. 10, we show the best and worst patches according

to their energy values. While the best patches selected by

Eqn. (9) include proper light streak patches reflecting blur

kernels, the worst patches are far from the true kernels.
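The patch selection described above can be sketched as a ranking of candidate patches by energy. This is only an illustration: the helper names `patch_to_kernel` and `rank_patches` are hypothetical, the energy is passed in as a callback standing in for Eqn. (9), and turning a patch into a candidate kernel by clamping negatives and normalizing it to sum to one is an assumption made here, not a step stated in the text.

```python
def patch_to_kernel(patch):
    """Clamp negative values and normalize a 2D patch so that it sums to one.

    Treating the normalized patch as a candidate blur kernel is an assumed
    convention for this sketch.
    """
    clamped = [[max(float(v), 0.0) for v in row] for row in patch]
    total = sum(map(sum, clamped))
    if total <= 0.0:
        raise ValueError("patch has no positive intensity")
    return [[v / total for v in row] for row in clamped]


def rank_patches(patches, energy_of_kernel):
    """Return (best_index, worst_index) of the patches by ascending energy.

    `energy_of_kernel` is a caller-supplied function that evaluates the
    energy of a candidate kernel on the blurred image.
    """
    if not patches:
        raise ValueError("no candidate patches")
    order = sorted(range(len(patches)),
                   key=lambda i: energy_of_kernel(patch_to_kernel(patches[i])))
    return order[0], order[-1]
```

The candidate set itself would come from a separate detection step, such as the heuristic rules of [5] mentioned above.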


5.3. Defocus Estimation

Defocus blur is caused by shallow depth-of-field of an

imaging system, and it is often spatially varying. As the

amount of defocus blur is related to the distance from the

camera to the target object, defocus information can be

useful for depth estimation, salient region estimation, fore-

ground/background segmentation, digital refocusing, etc.

However, estimating a defocus map from a single image

is a challenging task, as the amount of defocus blur can be

different at each pixel. To overcome such difficulty, pre-

vious methods proposed several different features to detect

the amount of blur. Tai and Brown [20] proposed a mea-

sure based on a local contrast prior, which utilizes the rela-

tionship between local image contrast and image gradients.

Zhuo and Sim [25] re-blur the input defocused image with

a Gaussian blur kernel, and use the ratio between the gra-

dients of the input and the re-blurred images to estimate a

defocus map.

Eqn. (9) can also be used for estimating the amount of

defocus blur. We first assume that the shape of defocus blur

is already known, but its size is unknown and spatially vary-

ing, e.g., spatially-variant disk filters. As Eqn. (9) is based

on a sparsity prior, we can compare different blur kernels

more robustly on strong edges. Thus, we first detect edges

using the Canny edge detector, and compare energy values on

the detected edge pixels. The energy value of a blur kernel

on an edge pixel is defined as the energy value on a local

image region centered at the edge pixel. As a result, we

obtain a sparse defocus map, where defocus blur sizes are

estimated only on edge pixels. We then spatially propagate

this defocus information to other pixels using the matting

Laplacian algorithm [10], as done in [25].
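The sparse estimation step can be sketched as a per-pixel argmin over candidate disk radii. In this sketch the edge pixels are assumed to be given (for example by a Canny detector), and `local_energy` is a hypothetical callback that evaluates the energy of a kernel on the local region centered at a pixel. The disk rasterization rule (`radius + 0.5`) is also an arbitrary choice made here, and the propagation to a dense map is not shown.

```python
import math


def disk_kernel(radius):
    """Return a normalized (2r+1) x (2r+1) disk kernel as nested lists."""
    size = 2 * radius + 1
    mask = [[1.0 if math.hypot(x - radius, y - radius) <= radius + 0.5 else 0.0
             for x in range(size)]
            for y in range(size)]
    total = sum(map(sum, mask))
    return [[v / total for v in row] for row in mask]


def sparse_defocus_map(edge_pixels, radii, local_energy):
    """Assign to every edge pixel the disk radius with the lowest local energy.

    `local_energy(pixel, kernel)` is supplied by the caller and is expected
    to return the energy of `kernel` on the region centered at `pixel`.
    """
    return {pixel: min(radii, key=lambda r: local_energy(pixel, disk_kernel(r)))
            for pixel in edge_pixels}
```

The result is a dictionary from edge-pixel coordinates to the selected radius, which plays the role of the sparse defocus map before propagation.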

Fig. 11 shows a defocus estimation example. Fig. 11(b)

is a sparse defocus map estimated from Fig. 11(a). Brighter

pixels indicate larger defocus blur. Fig. 11(c) shows a full

defocus map obtained from Fig. 11(b) using the matting

Laplacian algorithm. As the upper part of the image is

more distant and more defocused, the estimated defocus

map shows brighter pixels in that part. Fig. 11(d) is an all-in-focus
result obtained using the defocus map in Fig. 11(c).

Fig. 12 shows additional examples. While Eqn. (9) is a uni-

versal metric, which is not specially designed for defocus

estimation, it produces defocus maps comparable to those of
Zhuo and Sim’s method [25].

Figure 11. Real defocus example. (a) Input image. (b) Sparse defocus map. (c) Dense defocus map. (d) All-focus image. (e) Magnified views of (a) & (d).

Figure 12. Additional real defocus examples. (a) Input images. (b) Our defocus maps. (c) Zhuo and Sim [25]. While Eqn. (9) is a universal metric, which is not designed for defocus estimation, it produces results comparable to those of Zhuo and Sim [25].

6. Conclusions

In this paper, we analyzed the convergence of MAP
based blind deconvolution, and showed that the energy
function is the key to the success of previous MAP based
approaches. To this end, we introduced a reformulated
energy function. Then, we analyzed conditions for avoiding
the no-blur solution, and showed that the energy function can
converge to the right solution. We also demonstrated that

the reformulated energy function can be used as a simple

and effective metric to compare different blur kernels with

three examples. In our experiments, we used IRLS for solving
Eqn. (10), which requires a nontrivial amount of computation.
One interesting direction for future work would be to develop an efficient

latent image estimation method for solving Eqn. (10) while

guaranteeing Eqns. (12) and (13).
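To illustrate the kind of computation involved, the following is a minimal 1D sketch of IRLS-based latent signal estimation followed by an energy evaluation. It rests on several assumptions that are not taken from the paper: a circular convolution model, a prior of the form sum |gradient|^alpha, the omission of any kernel prior term, and the hypothetical function names and default parameter values. It is not the implementation used in the experiments.

```python
def blur_matrix(kernel, n):
    """Dense circulant matrix K with (K l)[i] = sum_j kernel[j] * l[(i - j) % n]."""
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j, kj in enumerate(kernel):
            K[i][(i - j) % n] += kj
    return K


def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting.

    Assumes `a` is nonsingular; a singular system raises ZeroDivisionError.
    """
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def irls_latent(blurred, kernel, lam=0.01, alpha=0.8, iters=10, eps=1e-3):
    """Approximately minimize ||K l - b||^2 + lam * sum_i |l[i+1] - l[i]|^alpha."""
    n = len(blurred)
    K = blur_matrix(kernel, n)
    KtK = [[sum(K[r][i] * K[r][j] for r in range(n)) for j in range(n)]
           for i in range(n)]
    Ktb = [sum(K[r][i] * blurred[r] for r in range(n)) for i in range(n)]
    latent = list(blurred)
    for _ in range(iters):
        A = [row[:] for row in KtK]
        for i in range(n):
            j = (i + 1) % n
            grad = abs(latent[j] - latent[i])
            # Quadratic surrogate weight for |d|^alpha; eps avoids division by zero.
            w = lam * (alpha / 2.0) * max(grad, eps) ** (alpha - 2.0)
            A[i][i] += w
            A[j][j] += w
            A[i][j] -= w
            A[j][i] -= w
        # Each iteration solves the reweighted least-squares normal equations.
        latent = solve_linear(A, Ktb)
    return latent


def energy(blurred, kernel, latent, lam=0.01, alpha=0.8):
    """Data term plus sparsity prior on the latent signal (kernel prior omitted)."""
    n = len(latent)
    K = blur_matrix(kernel, n)
    data = sum((sum(K[i][c] * latent[c] for c in range(n)) - blurred[i]) ** 2
               for i in range(n))
    prior = sum(abs(latent[(i + 1) % n] - latent[i]) ** alpha for i in range(n))
    return data + lam * prior
```

Calling `energy(b, k, irls_latent(b, k))` for several candidate kernels `k` yields values that can be compared in the manner of Sec. 5.1. The dense solver is chosen for brevity; its cubic cost per iteration hints at why a more efficient latent image estimation method would be useful.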

Acknowledgements. This work was supported by the DGIST
Start-up Fund Program of the Ministry of Science, ICT and
Future Planning (2017040005). It was also supported by the

Ministry of Science and ICT, Korea, through IITP grant (R0126-

17-1078) and NRF grant (NRF-2014R1A2A1A11052779).


References

[1] J. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring

from a single image using sparse approximation. In CVPR,

pages 104–111, 2009. 1

[2] T. F. Chan and C.-K. Wong. Total variation blind deconvolu-

tion. TIP, 7(3):370–375, 1998. 1, 2, 3

[3] S. Cho and S. Lee. Fast motion deblurring. ACM Trans.

Graph., 28(5):145:1–145:8, Dec. 2009. 1, 2, 3, 6, 7

[4] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T.

Freeman. Removing camera shake from a single photograph.

ACM Trans. Graph., 25(3):787–794, July 2006. 1, 2, 3, 4, 6

[5] Z. Hu, S. Cho, J. Wang, and M.-H. Yang. Deblurring low-

light images with light streaks. In CVPR, pages 3382–3389,

2014. 7

[6] B.-S. Hua and K.-L. Low. Interactive motion deblurring us-

ing light streaks. In ICIP, 2011. 7

[7] D. Krishnan, J. Bruna, and R. Fergus. Blind Deconvolution

with Non-local Sparsity Reweighting. ArXiv e-prints, Nov.

2013. 2

[8] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution

using a normalized sparsity measure. In CVPR, pages 233–

240, 2011. 1, 2

[9] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image

and depth from a conventional camera with a coded aperture.

ACM Trans. Graph., 26(3), July 2007. 4

[10] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solu-

tion to natural image matting. TPAMI, 30(2):228–242, 2008.

8

[11] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Under-

standing and evaluating blind deconvolution algorithms. In

CVPR, pages 1964–1971, 2009. 1, 2, 3, 5, 6

[12] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Efficient

marginal likelihood optimization in blind deconvolution. In

CVPR, pages 2657–2664, 2011. 2, 6

[13] S. Liu, H. Wang, J. Wang, S. Cho, and C. Pan. Automatic

blur-kernel-size estimation for motion deblurring. The Visual

Computer, 31(5):733–746, 2015. 6, 7

[14] Y. Liu, J. Wang, S. Cho, A. Finkelstein, and S. Rusinkiewicz.

A no-reference metric for evaluating the quality of motion

deblurring. ACM Trans. on Graphics, 32(6):Article No. 175,

2013. 6, 7

[15] J. Pan, D. Sun, H. Pfister, and M.-H. Yang. Blind image

deblurring using dark channel prior. In CVPR, pages 1628–

1636, 2016. 2

[16] D. Perrone and P. Favaro. Total variation blind deconvolu-

tion: The devil is in the details. In CVPR, pages 2909–2916,

2014. 2

[17] C. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf.

Blind correction of optical aberrations. In ECCV, pages 187–

200, 2012. 6

[18] Q. Shan, J. Jia, and A. Agarwala. High-quality motion

deblurring from a single image. ACM Trans. Graph.,

27(3):73:1–73:10, Aug. 2008. 1, 2, 3, 6

[19] L. Sun, S. Cho, J. Wang, and J. Hays. Edge-based blur kernel

estimation using patch priors. In ICCP, 2013. 2, 5, 6

[20] Y.-W. Tai and M. S. Brown. Single image defocus map esti-

mation using local contrast prior. In ICIP, 2009. 8

[21] D. Wipf and H. Zhang. Revisiting Bayesian blind deconvolution.
Journal of Machine Learning Research, 15(1):3595–3634, 2014. 2, 6

[22] L. Xu and J. Jia. Two-phase kernel estimation for robust

motion deblurring. In ECCV, 2010. 1, 2, 6

[23] L. Xu, S. Zheng, and J. Jia. Unnatural L0 sparse represen-

tation for natural image deblurring. In CVPR, 2013. 1, 2,

3

[24] T. Yue, S. Cho, J. Wang, and Q. Dai. Hybrid image de-

blurring by fusing edge and power spectrum information. In

ECCV, pages 79–93, 2014. 7

[25] S. Zhuo and T. Sim. Defocus map estimation from a single

image. Pattern Recognition, 44(9):1852–1858, 2011. 6, 8
