IMAGE FUSION DOC

INTRODUCTION

Image fusion is the process by which two or more images are combined into

a single image retaining the important features from each of the original images.

The fusion of images is often required for images acquired from different

instrument modalities or capture techniques of the same scene or objects.

Important applications of the fusion of images include medical imaging,

microscopic imaging, remote sensing, computer vision, and robotics. Fusion

techniques include the simplest method of pixel averaging to more complicated

methods such as principal component analysis and wavelet transform fusion.

Several approaches to image fusion can be distinguished, depending on whether

the images are fused in the spatial domain or they are transformed into another

domain, and their transforms fused.

With the development of new imaging sensors arises the need of a

meaningful combination of all employed imaging sources. The actual fusion

process can take place at different levels of information representation, a generic

categorization is to consider the different levels as, sorted in ascending order of

abstraction: signal, pixel, feature and symbolic level. This focuses on the so-called

pixel level fusion process, where a composite image has to be built of several input

images. To date, the result of pixel level image fusion is considered primarily to be

presented to the human observer, especially in image sequence fusion (where the

input data consists of image sequences). A possible application is the fusion of

forward looking infrared (FLIR) and low light visible images (LLTV) obtained by an

airborne sensor platform to aid a pilot navigate in poor weather conditions or

darkness. In pixel-level image fusion, some generic requirements can be imposed

on the fusion result. The fusion process should preserve all relevant information of

the input imagery in the composite image (pattern conservation) The fusion

scheme should not introduce any artifacts or inconsistencies which would distract

the human observer or following processing stages .The fusion process should be

shift and rotational invariant, i.e. the fusion result should not depend on the

location or orientation of an object the input imagery .In case of image sequence

fusion arises the additional problem of temporal stability and consistency of the

fused image sequence. The human visual system is primarily sensitive to moving

1

http://www.metapix.de/examples.htm

light stimuli, so moving artifacts or time depended contrast changes introduced by

the fusion process are highly distracting to the human observer. So, in case of

image sequence fusion the two additional requirements apply. Temporal stability:

The fused image sequence should be temporal stable, i.e. gray level changes in

the fused sequence must only be caused by gray level changes in the input

sequences, they must not be introduced by the fusion scheme itself; Temporal

consistency: Gray level changes occurring in the input sequences must be present

in the fused sequence without any delay or contrast change.

1.1 FUSION METHODS

1.1.1 Introduction

The following summarize several approaches to the pixel level fusion of

spatially registered input images. Most of these methods have been developed for

the fusion of stationary input images (such as multispectral satellite imagery). Due

to the static nature of the input data, temporal aspects arising in the fusion process

of image sequences, e.g. stability and consistency, are not addressed.

A generic categorization of image fusion methods is the following:

linear superposition

nonlinear methods

optimization approaches

artificial neural networks

image pyramids

wavelet transform

generic multiresolution fusion scheme

2

1.1.2 Linear Superposition

The probably most straightforward way to build a fused image of several

input frames is performing the fusion as a weighted superposition of all input

frames.

The optimal weighting coefficients, with respect to information content and

redundancy removal, can be determined by a principal component analysis (PCA)

of all input intensities. By performing a PCA of the covariance matrix of input

intensities, the weightings for each input frame are obtained from the eigenvector

corresponding to the largest eigenvalue.

A similar procedure is the linear combination of all inputs in a pre-chosen

colorspace (eg. R-G-B or H-S-V), leading to a false color representation of the

fused image.

1.1.3 Nonlinear Methods

Another simple approach to image fusion is to build the fused image by the

application of a simple nonlinear operator such as max or min. If in all input images

the bright objects are of interest, a good choice is to compute the fused image by

an pixel-by-pixel application of the maximum operator.

An extension to this approach follows by the introduction of morphological

operators such as opening or closing. One application is the use of conditional

morphological operators by the definition of highly reliable 'core' features present

in both images and a set of 'potential' features present only in one source, where

the actual fusion process is performed by the application of conditional erosion and

dilation operators.

A further extension to this approach is image algebra, which is a high-level

algebraic extension of image morphology, designed to describe all image

processing operations. The basic types defined in image algebra are value sets,

coordinate sets which allow the integration of different resolutions and

tessellations, images and templates. For each basic type binary and unary

operations are defined which reach from the basic set operations to more complex

ones for the operations on images and templates. Image algebra has been used in

a generic way to combine multisensor images

3

1.1.4 Optimization Approaches

In this approach to image fusion, the fusion task is expressed as an

bayesian optimization problem. Using the multisensor image data and an a-prori

model of the fusion result, the goal is to find the fused image which maximizes the

a-posteriori probability. Due to the fact that this problem cannot be solved in

general, some simplifications are introduced: All input images are modeled as

markov random fields to define an energy function which describes the fusion goal.

Due to the equivalence of of gibbs random fields and markov random fields, this

energy function can be expressed as a sum of so-called clique potentials, where

only pixels in a predefined neighborhood affect the actual pixel.

The fusion task then consists of a maximization of the energy function.

Since this energy function will be non-convex in general, typically stochastic

optimization procedures such as simulated annealing or modifications like iterated

conditional modes will be used.

1.1.5 Artificial Neural Networks

Inspired by the fusion of different sensor signals in biological systems, many

researchers have employed artificial neural networks in the process of pixel-level

image fusion. The most popular example for the fusion of different imaging

sensors in biological systems is described by Newman and Hart line in the 80s:

Rattlesnakes (and the general family of pit vipers) possess so called pit organs

which are sensitive to thermal radiation through a dense network of nerve fibers.

The output of these pit organs is fed to the optical tectum, where it is combined

with the nerve signals obtained from the eyes. Newman and Hart line distinguished

six different types of bimodal neurons merging the two signals based on a

sophisticated combination of suppression and enhancement.

Several researchers modeled this fusion process

1.1.6 Image Pyramids

Image pyramids have been initially described for multiresolution image

analysis and as a model for the binocular fusion in human vision. A generic image

pyramid is a sequence of images where each image is constructed by low pass

4

filtering and sub sampling from its predecessor. Due to sampling, the image size is

halved in both spatial directions at each level of the decomposition process, thus

leading to an multiresolution signal representation. The difference between the

input image and the filtered image is necessary to allow an exact reconstruction

from the pyramidal representation. The image pyramid approach thus leads to a

signal representation with two pyramids: The smoothing pyramid containing the

averaged pixel values, and the difference pyramid containing the pixel differences,

i.e. the edges. So the difference pyramid can be viewed as a multiresolution edge

representation of the input image.

The actual fusion process can be described by a generic multiresolution

fusion scheme which is applicable both to image pyramids and the wavelet

approach. There are several modifications of this generic pyramid construction

method described above. Some authors propose the computation of nonlinear

pyramids, such as the ratio and contrast pyramid, where the multistage edge

representation is computed by an pixel-by-pixel division of neighboring resolutions.

A further modification is to substitute the linear filters by morphological nonlinear

filters, resulting in the morphological pyramid. Another type of image pyramid - the

gradient pyramid - results, if the input image is decomposed into its directional

edge representation using directional derivative filter

1.1.7 Wavelet Transform

A signal analysis method similar to image pyramids is the discrete wavelet

transform. The main difference is that while image pyramids lead to an over

complete set of transform coefficients, the wavelet transform results in a

nonredundant image representation. The discrete 2-dim wavelet transform is

computed by the recursive application of lowpass and high pass filters in each

direction of the input image (i.e. rows and columns) followed by sub sampling.

Details on this scheme can be found in the reference section. One major

drawback of the wavelet transform when applied to image fusion is its well known

shift dependency, i.e. a simple shift of the input signal may lead to complete

different transform coefficients. This results in inconsistent fused images when

invoked in image sequence fusion. To overcome the shift dependency of the

wavelet fusion scheme, the input images must be decomposed into a shift

invariant representation. There are several ways to achieve this: The

5

http://www.metapix.de/references_r.htm

http://www.metapix.de/method7_r.htm


straightforward way is to compute the wavelet transform for all possible circular

shifts of the input signal. In this case, not all shifts are necessary and it is possible

to develop an efficient computation scheme for the resulting wavelet

representation. Another simple approach is to drop the subsampling in the

decomposition process and instead modify the filters at each decomposition level,

resulting in a highly redundant signal representation.

The actual fusion process can be described by a generic multiresolution

fusion scheme which is applicable both to image pyramids and the wavelet

approach.

1.1.8 Generic Multiresolution Fusion Scheme

The basic idea of the generic multiresolution fusion scheme is motivated by

the fact that the human visual system is primary sensitive to local contrast

changes, i.e. edges. Motivated from this insight, and in mind that both image

pyramids and the wavelet transform result in an multiresolution edge

representation, it is straightforward to build the fused image as a fused multiscale

edge representation. The fusion process is summarized in the following: In the first

step the input images are decomposed into their multiscale edge representation,

using either any image pyramid or any wavelet transform. The actual fusion

process takes place in the difference resp. wavelet domain, where the fused

multiscale representation is built by a pixel-by-pixel selection of the coefficients

with maximum magnitude. Finally the fused image is computed by an application

of the appropriate reconstruction scheme

6



Fig. 1 Block Diagram Of Basic Image Fusion Process

7

AIM OF THE PROJECT

2.1 NEW IMAGE FUSION ALGORITHM

The paper adopts the multiresolution analysis discrete wavelet frame

transform and fuzzy region feature fusion scheme to implement the selection of

source image wavelet coefficients. Fig.1 is the framework of the proposed image

fusion algorithm. The first step is to choose an image as object image that can

reflect the object and background clearer than the other image. The second step is

to decompose the source image into multiresolution representation. There are low

frequency band at each level during the next level decomposition. The low

frequency bands of the object image are segmented into region images. The third

step is defining the attributes of the regions by some region features, such as the

mean of gray level in a region. In this case, each pixel point has its membership

value. Then using certain attribute region fusion scheme combining with the

membership value of each pixel, the multiresolution representation of the fusion

result is achieved using defuzzification process. The final step is to do inverse

discrete wavelet frame transform, and the final fusion result is obtained.

8

The fusion of images is the process of combining two or more images into a

single image retaining important features from each. Fusion is an important

technique within many disparate fields such as remote sensing, robotics and

medical applications. Wavelet based fusion techniques have been reasonably

effective in combining perceptually important image features. Shift invariance of

the wavelet transform is important in ensuring robust sub band fusion. Therefore

the novel application of the shift invariant and directionally selective Dual Tree

Complex Wavelet Transform (DT-CWT) to image fusion is now introduced. This

novel technique provides improved qualitative and quantitative results compared to

previous wavelet fusion method.

9

The goals for this Project have been the following.

One goal has been to compile an introduction to the subject of Image

Fusion. There exist a number of studies on various algorithms, but complete

treatments on a technical level are not as common. Material from papers, journals,

and conference proceedings are used that best describe the various parts.

Another goal has been to search for algorithms that can be used to

implement for the image fusion for various applications.

A third goal is to evaluate their performance of with different image quality

metrics. These properties were chosen because they have the greatest impact on

the detection of Image fusion algorithms

A final goal has been to design and implement the Wavelet based fuzzy and

Neural approaches using matlab.

2.2 SCOPE OF THE PROJECT

2.2.1 DWT versus DT-CWT

Figures 3(a) and 3(b) show a pair of multifocus test images that were fused

for a closer comparison of the DWT and DT-CWT methods. Figures 3(d) and 3(e)

show the results of a simple MS method using the DWT and DT-CWT,

respectively. These results are clearly superior to the simple pixel averaging result

shown in 3(c). They both retain a perceptually acceptable combination of the two

“in focus” areas from each input image. An edge fusion result is also shown for

comparison (figure 3(f)) [8]. Upon closer inspection however, there are residual

ringing artefacts found in the DWT fused image not found within the DT-CWT

fused image. Using more sophisticated coefficient fusion rules (such as WBV or

WA) the DWT and DT-CWT results were much more difficult to distinguish.

However, the above comparison when using a simple MS method reflects the

ability of the DT-CWT to retain edge details without ringing.

10

Figure 2.1: (a) First image of the multifocus test set. (b) Second image of the

multi focus test set. (c) Fused image using average pixel values. (d) Fused

image using DWT with an MS fuse rule. (e) Fused image using DT-CWT with

an MS fuse rule. (f) Fused image using multiscale edge fusion

(point representations).

11

2.2.2 Quantitative Comparisons

Often the perceptual quality of the resulting fused image is of prime

importance. In these circumstances comparisons of quantitative quality can often

be misleading or meaningless. However, a few authors [1, 7, 10] have attempted

to generate such measures for applications where their meaning is clearer.

Figures 3(a) and 3(b) reflect such an application: fusion of two images of differing

focus to produce an image of maximum focus. Firstly, a “ground truth” image

needs to be created that can be quantitatively compared to the fusion result

images. This is produced using a simple cut-and-paste technique, physically taking

the “in focus” areas from each image and combining them. The quantitative

measure used to compare the cut-and-paste image to each fused image was

taken from [1]

Figure 2.2: (a) First image (MR) of the medical test set. (b) Second image (CT) of the medical test set. (c) Fused image using average pixel values. (d) Fused image using DWT with an MS fuse rule. (e) Fused image using DT-CWT with an MS fuse rule. (f) Fused image using multiscale edge fusion (point representations).

12

where Igt is the cut-and-paste “ground truth” image, ___ is the fused image and is

the size of the image. Lower values of _ indicate greater similarity between the

images___ and ___ and therefore more successful fusion in terms of quantitatively

measurable similarity. Table 1 shows the results for the various methods used.

The average pixel value method gives a baseline result. The PCA method gave an

equivalent but a slightly worse result. These methods have poor results relatively

to the others. This was expected as they have no scale selectivity. Results were

obtained for the DWT methods using all the bio-orthogonal wavelets available

within the Matlab (5.0) Wavelet Toolbox. Similarly, results were obtained for the

DT-CWT methods using all the shift invariant wavelet filters described in [3].

Results were also calculated for the SIDWT using the Haar wavelet and the

bior2.2 Daubechies wavelet. The table 1 shows the best results for all filters for

each method. For all filters, the DWT results were worse than their DT-CWT

equivalents. Similarly, all the DWT results were worse than their SIDWT

equivalents. This demonstrates the importance of shift invariance in wavelet

transform fusion. The DT-CWT results were also better than the equivalent results

using the SIDWT. This indicates the improvement gained from the added

directional selectivity of the DT-CWT over the SIDWT. The WBV and WA methods

performed better than MS with equivalent transforms as expected, with WBV

performing best for both cases. All of the wavelet transform results were

decomposed to four levels. In addition, the residual low pass images were fused

using simple averaging and the window for the WA and WBV methods were all set

to 3_3.

13

Table 2.1: Quantitative results for various fusion methods.

2.3 EFFECT OF WAVELET FILTER CHOICE FOR DWT AND DT-CWT

BASED FUSION

There are many different choices of filters to affect the DWT transform. In

order not to introduce phase distortions, using filters having a linear phase

response is a sensible choice. To retain a perfect reconstruction property, this

necessitates the use of biorthogonal filters. MS fusion results were compared for

all the images in figures 3 and 4 using all the biorthogonal filters included in the

Mat lab (5.0) Wavelet Toolbox. Likewise there are also many different choices of

filters to affect the DT-CWT transform. MS fusion results were compared for all the

same three image pairs using all the specially designed filters given in [3].

Qualitatively all the DWT results gave more ringing artifacts than the equivalent

DTCWT results. Different choices of DWT filters gave ringing artifacts at different

image locations and scales. The choice of filters for the DT-CWT did not seem to

alter or move the ringing artifacts found within the fused images. The perceived

higher quality of the DT-CWT fusion results compared to the DWT fusion results

was also reflected by a quantitative comparison.

14

WAVELET TRANSFORM OVERVIEW

3.1 WAVELET TRANSFORM

Wavelets are mathematical functions defined over a finite interval and

having an average value of zero that transform data into different frequency

components, representing each component with a resolution matched to its scale.

The basic idea of the wavelet transform is to represent any arbitrary

function as a superposition of a set of such wavelets or basis functions. These

basis functions or baby wavelets are obtained from a single prototype wavelet

called the mother wavelet, by dilations or contractions (scaling) and translations

(shifts). They have advantages over traditional Fourier methods in analyzing

physical situations where the signal contains discontinuities and sharp spikes.

Many new wavelet applications such as image compression, turbulence, human

vision, radar, and earthquake prediction are developed in recent years. In wavelet

transform the basis functions are wavelets. Wavelets tend to be irregular and

symmetric. All wavelet functions, w(2kt - m), are derived from a single mother

wavelet, w(t). This wavelet is a small wave or pulse like the one shown in Fig. 3.2.

Fig. 3.1 Mother wavelet w(t)

Normally it starts at time t = 0 and ends at t = T. The shifted wavelet w(t - m)

starts at t = m and ends at t = m + T. The scaled wavelets w(2kt) start at t = 0 and

end at t = T/2k. Their graphs are w(t) compressed by the factor of 2k as shown in

Fig. 3.3. For example, when k = 1, the wavelet is shown in Fig 3.3 (a). If k = 2 and

3, they are shown in (b) and (c), respectively.

15

(a)w(2t) (b)w(4t) (c)w(8t)

Fig. 3.2 Scaled wavelets

The wavelets are called orthogonal when their inner products are zero. The

smaller the scaling factor is, the wider the wavelet is. Wide wavelets are

comparable to low-frequency sinusoids and narrow wavelets are comparable to

high-frequency sinusoids.

3.1.1 Scaling

Wavelet analysis produces a time-scale view of a signal. Scaling a wavelet

simply means stretching (or compressing) it. The scale factor is used to express

the compression of wavelets and often denoted by the letter a. The smaller the

scale factor, the more “compressed” the wavelet. The scale is inversely related to

the frequency of the signal in wavelet analysis.

3.1.2 Shifting

Shifting a wavelet simply means delaying (or hastening) its onset.

Mathematically, delaying a function f(t) by k is represented by: f(t-k) and the

schematic is shown in fig. 3.4.

(a) Wavelet function (t) (b) Shifted wavelet function (t-k)

Fig. 3.3 Shifted wavelets

16

3.1.3 Scale and Frequency

The higher scales correspond to the most “stretched” wavelets. The more

stretched the wavelet, the longer the portion of the signal with which it is being

compared, and thus the coarser the signal features being measured by the

wavelet coefficients. The relation between the scale and the frequency is shown in

Fig. 3.5.

Low scale High scale

Fig. 3.4 Scale and frequency

Thus, there is a correspondence between wavelet scales and frequency as

revealed by wavelet analysis:

•Low scale a Compressed wavelet Rapidly changing details High

frequency.

•High scale aStretched waveletSlowly changing, coarse featuresLow

frequency.

3.2 DISCRETE WAVELET TRANSFORM

Calculating wavelet coefficients at every possible scale is a fair amount of

work, and it generates an awful lot of data. If the scales and positions are chosen

based on powers of two, the so-called dyadic scales and positions, then

calculating wavelet coefficients are efficient and just as accurate. This is obtained

from discrete wavelet transform (DWT).

3.2.1 One-Stage Filtering

For many signals, the low-frequency content is the most important part. It is

the identity of the signal. The high-frequency content, on the other hand, imparts

details to the signal. In wavelet analysis, the approximations and details are

obtained after filtering. The approximations are the high-scale, low frequency 17

components of the signal. The details are the low-scale, high frequency

components. The filtering process is schematically represented as in Fig. 3.6.

Fig. 3.5 Single stage filtering

The original signal, S, passes through two complementary filters and

emerges as two signals. Unfortunately, it may result in doubling of samples and

hence to avoid this, downsampling is introduced. The process on the right, which

includes downsampling, produces DWT coefficients. The schematic diagram with

real signals inserted is as shown in Fig. 3.7.

Fig. 3.6 Decomposition and decimation

3.2.2 Multiple-Level Decomposition

The decomposition process can be iterated, with successive

approximations being decomposed in turn, so that one signal is broken down into 18

many lower resolution components. This is called the wavelet decomposition tree

and is depicted as in Fig. 3.8.

Fig. 3.7 Multilevel decomposition

3.2.3 Wavelet Reconstruction

The reconstruction of the image is achieved by the inverse discrete wavelet

transform (IDWT). The values are first upsampled and then passed to the filters.

This is represented as shown in Fig. 3.9.

Fig. 3.8 Wavelet Reconstruction

The wavelet analysis involves filtering and downsampling, whereas the

wavelet reconstruction process consists of upsampling and filtering. Upsampling is

the process of lengthening a signal component by inserting zeros between

samples as shown in Fig. 3.10.

19

Fig. 3.9 Reconstruction using upsampling

3.2.4 Reconstructing Approximations and Details

It is possible to reconstruct the original signal from the coefficients of the

approximations and details. The process yields a reconstructed approximation

which has the same length as the original signal and which is a real approximation

of it.

The reconstructed details and approximations are true constituents of the

original signal. Since details and approximations are produced by downsampling

and are only half the length of the original signal they cannot be directly combined

to reproduce the signal. It is necessary to reconstruct the approximations and

details before combining them. The reconstructed signal is schematically

represented as in Fig. 3.11.

Fig. 3.10 Reconstructed signal components

3.2.5 1-D Wavelet Transform

The generic form for a one-dimensional (1-D) wavelet transform is shown in

Fig. 3.12. Here a signal is passed through a low pass and high pass filter, h and g,

20

respectively, then down sampled by a factor of two, constituting one level of

transform.

Fig. 3.11 1D Wavelet Decomposition.

Repeating the filtering and decimation process on the lowpass branch

outputs make multiple levels or “scales” of the wavelet transform only. The process

is typically carried out for a finite number of levels K, and the resulting coefficients

are called wavelet coefficients.

The one-dimensional forward wavelet transform is defined by a pair of filters

and t that are convolved with the data at either the even or odd locations. The

filters s and t used for the forward transform are called analysis filters.

nL nH

li = ∑ sjx2i+j and hi = ∑ tjx2i+1+j

j=-nl j=-nH

Although l and h are two separate output streams, together they have the

same total number of coefficients as the original data. The output stream l, which

is commonly referred to as the low-pass data may then have the identical process

applied again repeatedly. The other output stream, h (or high-pass data), generally

remains untouched. The inverse process expands the two separate low- and high-

pass data streams by inserting zeros between every other sample, convolves the

resulting data streams with two new synthesis filters s’ and t’, and adds them

together to regenerate the original double size data stream.

nH nl

yi = t’jl’

i+j + s’j h’

i+j where l’2i = li, l’ 2i+1 = 0

j= -nH j= -nH h’2i+1 = hi, h’2i = 0

21

To meet the definition of a wavelet transform, the analysis and synthesis

filters s, t, s’ and t’ must be chosen so that the inverse transform perfectly

reconstructs the original data. Since the wavelet transform maintains the same

number of coefficients as the original data, the transform itself does not provide

any compression. However, the structure provided by the transform and the

expected values of the coefficients give a form that is much more amenable to

compression than the original data. Since the filters s, t, s’ and t’ are chosen to be

perfectly invertible, the wavelet transform itself is lossless. Later application of the

quantization step will cause some data loss and can be used to control the degree

of compression. The forward wavelet-based transform uses a 1-D subband

decomposition process; here a 1-D set of samples is converted into the low-pass

subband (Li) and high-pass subband (Hi). The low-pass subband represents a

down sampled low-resolution version of the original image. The high-pass

subband represents residual information of the original image, needed for the

perfect reconstruction of the original image from the low-pass subband

3.3 2-D TRANSFORM HEIRARCHY

The 1-D wavelet transform can be extended to a two-dimensional (2-D)

wavelet transform using separable wavelet filters. With separable filters the 2-D

transform can be computed by applying a 1-D transform to all the rows of the

input, and then repeating on all of the columns.

LL1 HL1

LH1 HH1

22

Fig. 3.12 Subband Labeling Scheme for a one level, 2-D Wavelet Transform

The original image of a one-level (K=1), 2-D wavelet transform, with

corresponding notation is shown in Fig. 3.13. The example is repeated for a three-

level (K =3) wavelet expansion in Fig. 3.14. In all of the discussion K represents

the highest level of the decomposition of the wavelet transform.

LL1 HL1

HL2

HL3

LH1 HH1

LH2 HH2

LH3 HH3

Fig. 3.13 Subband labeling Scheme for a Three Level, 2-D Wavelet Transform

The 2-D subband decomposition is just an extension of 1-D subband

decomposition. The entire process is carried out by executing 1-D subband

decomposition twice, first in one direction (horizontal), then in the orthogonal

(vertical) direction. For example, the low-pass subbands (Li) resulting from the

horizontal direction is further decomposed in the vertical direction, leading to LLi

and LHi subbands.

Similarly, the high pass subband (Hi) is further decomposed into HLi and

HHi. After one level of transform, the image can be further decomposed by

applying the 2-D subband decomposition to the existing LLi subband. This iterative

process results in multiple “transform levels”. In Fig. 3.14 the first level of transform

23

results in LH1, HL1, and HH1, in addition to LL1, which is further decomposed into

LH2, HL2, HH2, LL2 at the second level, and the information of LL2 is used for the

third level transform. The subband LLi is a low-resolution subband and high-pass

subbands LHi, HLi, HHi are horizontal, vertical, and diagonal subband respectively

since they represent the horizontal, vertical, and diagonal residual information of

the original image. An example of three-level decomposition into subbands of the

image CASTLE is illustrated in Fig. 3.15.

H2H1HH

Fig. 3.14 The process of 2-D wavelet transform applied through three

transform levels

To obtain a two-dimensional wavelet transform, the one-dimensional

transform is applied first along the rows and then along the columns to produce

four subbands: low-resolution, horizontal, vertical, and diagonal. (The vertical

subband is created by applying a horizontal high-pass, which yields vertical

edges.) At each level, the wavelet transform can be reapplied to the low-resolution

subband to further decorrelate the image. Fig. 3.16 illustrates the image

decomposition, defining level and subband conventions used in the AWIC

algorithm. The final configuration contains a small low-resolution subband. In

addition to the various transform levels, the phrase level 0 is used to refer to the

original image data. When the user requests zero levels of transform, the original

image data (level 0) is treated as a low-pass band and processing follows its

natural flow.

24

Low Resolution Subband

Fig. 3.15 Image Decomposition Using Wavelets

Wavelet transform is first performed on each source images, then a fusion

decision map is generated based on a set of fusion rules. The fused wavelet

coefficient map can be constructed from the wavelet coefficients of the source

images according to the fusion decision map. Finally the fused image is obtained

by performing the inverse wavelet transform.

From the above diagram, we can see that the fusion rules are playing a

very important role during the fusion process. Here are some frequently used

fusion rules in the previous work:

43

Level 2 Level 1

Vertical subband

HL

4 4

3 3

Level 2 Level 2

Level 1

Horizontal Subband

LH

Level 1

Diagonal Subband

HH

25

When constructing each wavelet coefficient for the fused image. We will

have to determine which source image describes this coefficient better. This

information will be kept in the fusion decision map. The fusion decision map has

the same size as the original image. Each value is the index of the source image

which may be more informative on the corresponding wavelet coefficient. Thus, we

will actually make decision on each coefficient. There are two frequently used

methods in the previous research. In order to make the decision on one of the

coefficients of the fused image, one way is to consider the corresponding

coefficients in the source images as illustrated by the red pixels. This is called

26

pixel-based fusion rule. The other way is to consider not only the corresponding

coefficients, but also their close neighbors, say a 3x3 or 5x5 windows, as

illustrated by the blue and shadowing pixels. This is called window-based fusion

rules. This method considered the fact that there usually has high correlation

among neighboring pixels.

In our research, we think objects carry the information of interest, each pixel

or a small neighboring pixels are just one part of an object. Thus, we proposed a

region-based fusion scheme. When make the decision on each coefficient, we

consider not only the corresponding coefficients and their closing neighborhood,

but also the regions the coefficients are in. We think the regions represent the

objects of interest. We will provide more details of the scheme in the following.

3.4 PROPOSED SCHEME

Neural Network and Fuzzy Logic approach can be used for sensor fusion.

Such a sensor fusion could belong to a class of sensor fusion in which case the

features could be input and decision could be output. The help of Neuro-fuzzy of

fuzzy systems can achieve sensor fusion. The system can be trained from the

input data obtained from the sensors. The basic concept is to associate the given

sensory inputs with some decision outputs. After developing the system. another

group of input data is used to evaluate the performance of the system.

Following algorithm and .M file for pixel level image fusion using Fuzzy

Logic illustrate the process of defining membership functions and rules for the

image fusion process using FIS (Fuzzy Inference System) editor of Fuzzy Logic

toolbox in Matlab.

3.5 PROPOSED ALGORITHM

STEP 1

Read first image in variable M1 and find its size (rows z l , columns: SI).

Read second image in variable M2 and find its size (rows z2, columns: s2).

Variables MI and M2 are images in matrix form where each pixel value is in

the range from 0-255. Use Gray color map.

27

Compare rows and columns of both input images. If the two images are not

of the same size, select the portion,which are of same size.

STEP 2

Apply wavelet decomposition and form spatial decomposition Trees

Convert the images in column form which has C= zl*sl entries.

STEP 3

Create fuzzy interference system of type Mamdani with following

specifications

Name: 'c7'Type: 'mamdani'AndMethod: 'min'OrMethod: 'max'DefuzzMethod: 'centroid'ImpMethod: 'min'AggMethod: 'max'

28

STEP 4

Decide number and type of membership functions for both the input

images by tuning the membership functions.

Input images in antecedent are resolved to a degree of membership

ranging 0 to 255.

Make rules for input images, which resolve the two antecedents to a

single number from 0 to 255.

STEP 5

For num=l to C in steps of one, apply fuzzification using the rules developed

above on the corresponding pixel values of the input images which gives a fuzzy

set represented by a membership function and results in output image in column

format.

Check the rules using rule viewer and surface viewer

29

STEP 6

Convert the column form to matrix form and display the fused image.

3.7 ALGORITHM USING NEURO FUZZY

30

STEP 1

Read first image in variable M1 and find its size (rows z l , columns: SI).

Read second image in variable M2 and find its size (rows z2, columns: s2).

Variables MI and M2 are images in matrix form where each pixel value is in the

range from 0-255. Use Gray color map.

Compare rows and columns of both input images. If the two images are not of the

same size, select the portion,which are of same size.

STEP 2

Apply wavelet decomposition and form spatial decomposition Trees

Convert the images in column form which has C= zl*sl entries.

STEP 3

Form a training data, which is a matrix with three columns and

entries in each column are form 0 to 255 in steps of 1.

Form a check data which is a matrix of pixels of two input images in

a column format

Decide the number and type of Membership Function.

Create fuzzy interference system of type Mamdani with following

specifications

Name: 'c7'Type: 'mamdani'AndMethod: 'min'OrMethod: 'max'

31

DefuzzMethod: 'centroid'ImpMethod: 'min'AggMethod: 'max'

STEP 4

Decide number and type of membership functions for both the input

images by tuning the membership functions.

Input images in antecedent are resolved to a degree of membership

ranging 0 to 255.

Make rules for input images, which resolve the two antecedents to a

single number from 0 to 255.

32

STEP 5

For num=l to C in steps of one, apply fuzzification using the rules developed

above on the corresponding pixel values of the input images which gives a fuzzy

set represented by a membership function and results in output image in column

format.

STEP 6

Start training using ANFIS for the generated Fuzzy Interference system using Training data

Apply Fuzification using Trained Data and Check Data

Convert the column form to matrix form and display the fused image.

QUANTITATIVE COMPARISONS33

4.1 PERFORMANCE EVALUATION OF FUSION

It has been common to evaluate the result of fusion visually. According to

visual evaluation, human judgment determines the quality of the image. Some

independent and objective observers give grade to corresponding image and the

final grade is obtained by taking the average or weighted mean of the individual

grades. Obviously this evaluation method has some drawbacks, namely it is not

accurate and depends on the observer’s experience. For an accurate and truthful

assessment of the fusion product some quantitative measures (indicator) is

required. Two different measures are used in this project to evaluate the results of

fusion process. They are Information Entropy and Root Mean Square Error.

4.2 ENTROPY

One of the quantitative measures in digital image processing is Entropy.

Claude Shannon introduced the entropy concept in quantification of information

content of messages. Although he used entropy in communication, it can be also

employed as a measure and quantify the information content of digital images. A

digital image consists of pixels arranged in rows and columns. Each pixel is

defined by its position and by its grey scale level. For an image consists of L grey

levels, the entropy is defined as:

where is the probability (here frequency) of each grey scale level. As an example a

digital image of type uint8 (unsigned integer 8) has 256 different levels from 0

(black) to 255(white . It must be noticed that in combined images the number of

levels is very large and grey level intensity of each pixel is a decimal, double

number. But the equation (10) is still valid to compute the entropy. For images with

high information content the entropy is large. The larger alternations and changes

in an image give larger entropy and the sharp and focused images have more

changes than blurred and misfocused images. Hence, the entropy is a measure to

assess the quality of different aligned images from the same scene.

34

The Root Mean Square Error between the reference image, I and the fused

image is defined as: F

where and i j denotes the spatial position of pixels, M and are the dimensions of

the images. N This measure is appropriate for a pair of images containing two

objects. First a reference, everywhere-infocus image I is taken. Then two images

are provided from this original image. In one image the first object is focused and

the second one is blurred. In the other image the first object is blurred and another

one is remained focused. The fused image would contain both well-focused

objects.

Often the perceptual quality of the resulting fused image is of prime

importance. In these circumstances, comparisons of quantitative quality can often

be misleading or meaningless. However, a few authors [1, 8, 9] have attempted to

generate such measures for applications where their meaning is clearer. Figure 2

reflects such an application: fusion of two images of differing focus to produce an

image of maximum focus. Firstly, a “ground truth” image needs to be created that

can be quantitatively compared to the fusion result images. This is produced using

a simple cut-and-paste technique, physically taking the “in focus” areas from each

image and combining them. The quantitative measure used to compare the cut-

and-paste image to each fused image wastaken from [1]

where Igt is the cut-and-paste “ground truth” image, Ifd is the fused image and N

is the size of the image. Lower values of _ indicate greater similarity between the

images Igt and Ifd and therefore more successful fusion in terms of quantitatively

measurable similarity. Table 1 shows the results for the various methods used.

The average pixel value method, the pixel based PCA and the DWT methods give

35

poor results relatively to the others as expected. The DT-CWT methods give

roughly equivalent results although the New-CWT method gave slightly worse

results. The results were however very close and should not be taken as indicative

as this is just one experiment and the transforms are producing essentially the

same subband orms. The WBV and WA methods performed better than MS with

equivalent transforms as expected in most cases. The residual low pass images

were fused using simple averaging and the window for the WA and WBV methods

were all set to 3×3. The table 1 shows the best results for all filters available for

each method.

4.3 APPLICATIONS AND TRENDS

4.3.1 Navigation Aid

To allow helicopter pilots navigate under poor visibility conditions (such as

fog or heavy rain) helicopters are equipped with several imaging sensors, which

can be viewed by the pilot in a helmet mounted display. A typical sensor suite

includes both a low-light-television (LLTV) sensor and a thermal imaging forward-

looking-infrared (FLIR) sensor. In the current configuration, the pilot can choose on

of the two sensors to watch in his display. A possible improvement is combine both

imaging sources into a single fused image which contains the relevant image

information of both imaging devices. The following images in the result 1.1

illustrate this application.

4.3.2 Merging Out-Of-Focus Images

Due to the limited depth-of-focus of optical lenses (especially such with long

focal lengths) it is often not possible to get an image which contains all relevant

objects 'in focus'. One possibility to overcome this problem is to take several

pictures with different focus points and combine them together into a single frame

which finally contains the focused regions of all input images. The following

images in the result 1.2 illustrate this approach.

4.3.3 Medical Imaging

36

With the development of new imaging methods in medical diagnostics

arises the need of meaningful (and spatial correct) combination of all available

image datasets. Examples for imaging devices include computer tomography (CT),

magnetic resonance imaging (MRI) or the newer positron emission tomography

(PET). The following images in the result 1.3 illustrate the fusion of a CT and a

MRI image.

4.3.4 Remote Sensing

Remote sensing is a typical application for image fusion: Modern spectral

scanners gather up to several hundred of spectral bands which can be either

visualized and processed individually, or which can be fused into a single image,

depending on the image analysis task. The following images illustrate in the result

1.4 the fusion of two bands of a multispectral scanner.

4.4 RESULTS

37

Fig 4.1 Fusion by Averaging

Fig 4.2 Fusion by Maximum

38

Fig 4.3 Fusion by Minimum

Fig 4.4 Fusion by PCA

39

Fig 4.5 Fusion by averaging

Fig 4.6 Fusion by Averaging

40

FUSION BY AVERAGING FUSION BY MAXIMUM

FUSION BY MINIMUM FUSION BY PCA

41

CONCLUSION

In this project, the use of Discrete Wavelet Transform (DWT), Fuzzy, Neuro

Fuzzy, the fusion of images taken by digital camera was studied. The pixel-level-

based fusion mechanism applied to sets of images. All the results obtained by

these methods are valid in case of using aligned source images from the same

scene. In order to evaluate the results and compare these methods two

quantitative assessment criteria Information Entropy and Root Mean Square Error

were employed. Experimental results indicated that there are no considerable

differences between these two methods in performance. In fact if the result of

fusion in each level of decomposition is separately evaluated visually and

quantitatively in terms of entropy, no considerable difference will be observed (Fig.

5, 6,7, 9 and 11 and Tables 2 and 4). Although some differences identified in lower

levels, DWT and LPT demonstrated similar results from level three of

decomposing. Both techniques reach the best result in terms of information

entropy with a decomposing level of three. Experimental results demonstrated in

Tables 2 and 4 also indicate that LPT algorithm reaches its best quality in terms of

entropy in lower levels than DWT. The RMSE values represented in Table 6 show

that neither LPT nor DWT has better performance in all levels, although the best

result belongs to the LPT method. However the RMSE results compared to quality

and entropy of fused images indicate that RMSE can not be used as a proper

criterion to evaluate and compare the fusion results. Finally the experiments

showed that the LPT approach is implemented faster than DWT. Actually LPT

takes less than half the time in comparison with DWT and with regard to

approximately similar performance, LPT is preferred in real-time applications.

Fuzzy and Neuro-Fuzzy algorithms have been implemented to fuse a variety of

images. The results of fusion process proposed are given in terms of Entropy and

Variance. The fusions have been implemented for medical images and remote

sensing images. It is hoped that the techniques can be extended for colored

images and for fusion of multiple sensor images.

42

5.1 DWT Fusion

The DWT fusion methods provide computationally efficient image fusion

techniques. Various fusion rules for the selection and combination of subband

coefficients increase the quality (perceptual and quantitatively measurable) of

image fusion in specific applications.

5.2 DT-CWT Fusion

The developed DT-CWT fusion techniques provide better quantitative and

qualitative results than the DWT at the expense of increased computation. The

DT-CWT method is able to retain edge information without significant ringing

artifacts. It is also good at faithfully retaining textures from the input images. All of

these features can be attributed to the increased shift invariance and orientation

selectivity of the DT-CWT when compared to the DWT. A previously developed

shift invariant wavelet transform (the SIDWT) has been used for image fusion [7].

However, the SIDWT suffers from excessive redundancy. The SIDWT also lacks

the directional selectivity of the DT-CWT. This is reflected in the superior

quantitative results of the DT-CWT (see table1) Various fusion rules for the

selection and combination of sub band coefficients quantitatively measurable) of

image fusion in specific applications. The DT-CWT has the further advantage that

the phase information is available for analysis. However, after an initial set of

experiments using the notion of phase coherence, no improvement in fusion

performance has been achieved.

43

REFERENCES

[1] Shutao Li, James T. Kwok, Ivor W. Tsang, Yaonan Wang, “ Fusing images with

different focuses using support vector machines” IEEE Transactions on Neural

Networks, 15(6):1555- 1561, Nov. 2004.

[2] P. J. Burt and R. J. Lolczynski, “ Enhanced image capture through fusion” In

Proc. the 4th Intl. Conf. on Computer Vision, pages 173-182, Berlin, Germany, May

1993.

[3] Z. Zhang and R. Blum, “A categorization of multiscale-decomposition-based

image fusion schemes with a performance study for a digital camera application”

Proceedings of the IEEE, pages 1315 -1328, August 1999.

[4] P.J Burt, EH Adelson, “The Laplacian pyramid as a compact image code”. IEEE

Transactions Communications, 31, pp.532-540, April.1983

[5] Shutao Li, James T. Kwok, Yaonan Wang, “Combination of images with diverse

focuses using the spatial frequency”, Information Fusion 2(3): 169-176, 2001

[6] Z. Zhang and R. S. Blum, “ Multisensor Image Fusion using a Region-Based

Wavelet

Transform Approach” Proc. of the DARPA IUW, pp. 1447-1451, 1997.

[7] Pajares, G., De La Cruz, JM, “ A wavelet-based image fusion tutorial”. Pattern

Recognition,37, pp. 1855-1872, 2004.

[8] MATLAB, Wavelet Toolbox User's Guide, http://www.mathworks.com

The Mathworks, Inc., August 2005.

[9] H. Wang, J. Peng and W. Wu, “Fusion algorithm for multisensor images based

on discretemultiwavelet transform”. Vision, Image and Signal Processing,

Proceedings of the IEEE Vol.149, no.5: October 2002.

44

IMAGE FUSION DOC

Documents