Page 1: Dip

LECTURE NOTES

ON

DIGITAL IMAGE PROCESSING

MR. G.Sasi M.E

ASST. PROFESSOR DEPT OF ELECTRONICS & COMMUNICATION ENGINEERING

NPRCET

Page 2: Dip

Syllabus

DIGITAL IMAGE PROCESSING

UNIT I DIGITAL IMAGE FUNDAMENTALS AND TRANSFORMS

Elements of visual perception – Image sampling and quantization – Basic relationships between pixels – Basic geometric transformations –

Introduction to Fourier transform and DFT – Properties of 2D Fourier transform – FFT – Separable image transforms – Walsh–Hadamard

– Discrete cosine transform, Haar, Slant – Karhunen–Loeve transforms.

UNIT II IMAGE ENHANCEMENT TECHNIQUES

Spatial domain methods: Basic grey level transformation – Histogram equalization – Image subtraction – Image averaging – Spatial

filtering – Smoothing, sharpening filters – Laplacian filters – Frequency domain filters – Smoothing – Sharpening filters –

Homomorphic filtering.

UNIT III IMAGE RESTORATION

Model of image degradation/restoration process – Noise models – Inverse filtering – Least mean square filtering – Constrained least

mean square filtering – Blind image restoration – Pseudo inverse – Singular value decomposition.

UNIT IV IMAGE COMPRESSION

Lossless compression: variable length coding – LZW coding – Bit plane coding– Predictive coding– DPCM. Lossy Compression:

Transform coding – Wavelet coding – Basics of image compression standards – JPEG, MPEG, basics of vector quantization.

UNIT V IMAGE SEGMENTATION AND REPRESENTATION

Edge detection – Thresholding – Region based segmentation – Boundary representation – Chain codes – Polygonal approximation –

Boundary segments – Boundary descriptors – Simple descriptors– Fourier descriptors – Regional descriptors – Simple descriptors–

Texture.

TEXT BOOK

1. Rafael C. Gonzalez, Richard E. Woods, “Digital Image Processing”, 2nd Edition,

Pearson Education, 2003.

REFERENCES

1. William K. Pratt, “Digital Image Processing”, John Wiley, 2001.

2. Milan Sonka, Vaclav Hlavac, Roger Boyle, “Image Processing, Analysis and Machine Vision”, Brooks/Cole, Thomson Learning, 1999.

Page 3: Dip

UNIT I DIGITAL IMAGE FUNDAMENTALS AND TRANSFORMS

Elements of visual perception – Image sampling and quantization – Basic relationships between pixels – Basic geometric transformations –

Introduction to Fourier transform and DFT – Properties of 2D Fourier transform – FFT – Separable image transforms – Walsh–Hadamard

– Discrete cosine transform, Haar, Slant – Karhunen–Loeve transforms.

Projections

• There are two types of projections (P) of interest to us:

1. Perspective Projection

– Objects closer to the capture device appear bigger. Most image formation situations can be considered to be under this category, including images taken by a camera and the human eye.

2. Orthographic Projection

– This is “unnatural”. Objects appear the same size regardless of their distance to the “capture device”.

• Both types of projections can be represented via mathematical formulas. Orthographic projection is easier and is sometimes used as a mathematical convenience.

Page 4: Dip

Inside the Camera - Sensitivity

• Once we have cp(x0, y0, λ), the characteristics of the capture device take over.

• V(λ) is the sensitivity function of a capture device. Each capture device has such a function, which determines how sensitive it is in capturing the range of wavelengths (λ) present in cp(x0, y0, λ).

• The result is an “image function” which determines the amount of reflected light that is captured at the camera coordinates (x0, y0):

f(x0, y0) = ∫ cp(x0, y0, λ) V(λ) dλ        (1)
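A minimal MATLAB sketch of equation (1), approximating the integral on a sampled wavelength grid; the flat scene spectrum and the Gaussian-shaped sensitivity band around λ0 used below are illustrative assumptions, not values from the notes.

% Approximate f(x0,y0) = integral of cp(x0,y0,lambda) * V(lambda) dlambda
% at one camera coordinate, using sampled (illustrative) spectra.
lambda  = 400:5:700;                                   % wavelengths in nm
cp      = ones(size(lambda));                          % assumed flat reflected spectrum at (x0,y0)
lambda0 = 550; sigma = 30;                             % assumed band centre and width
V       = exp(-(lambda - lambda0).^2 / (2*sigma^2));   % band-limited sensitivity around lambda0
f_x0y0  = trapz(lambda, cp .* V);                      % numerical integration (trapezoidal rule)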

Page 6: Dip

Let us determine the image functions for the above sensitivity functions imaging the same scene:

1. This is the most realistic of the three. Sensitivity is concentrated in a band around λ0.

f1(x0, y0) = ∫ cp(x0, y0, λ) V1(λ) dλ

2. This is an unrealistic capture device which has sensitivity only to a single wavelength λ0, as determined by the delta function. However, there are devices that get close to such “selective” behavior.

f2(x0, y0) = ∫ cp(x0, y0, λ) V2(λ) dλ = ∫ cp(x0, y0, λ) δ(λ − λ0) dλ = cp(x0, y0, λ0)

3. This is what happens if you take a picture without taking the cap off the lens of your camera.

f3(x0, y0) = ∫ cp(x0, y0, λ) V3(λ) dλ = ∫ cp(x0, y0, λ) · 0 dλ = 0

Page 7: Dip

Sensitivity and Color

• For a camera that captures color images, imagine that it has three sensors at each (x0, y0) with sensitivity functions tuned to the colors or wavelengths red, green and blue, outputting three image functions:

fR(x0, y0) = ∫ cp(x0, y0, λ) VR(λ) dλ

fG(x0, y0) = ∫ cp(x0, y0, λ) VG(λ) dλ

fB(x0, y0) = ∫ cp(x0, y0, λ) VB(λ) dλ

• These three image functions can be used by display devices (such as your monitor or your eye) to show a “color” image.

Page 8: Dip

Digital Image Formation

• The image function fC(x0, y0) is still a function of x0 ∈ [x0min, x0max] and y0 ∈ [y0min, y0max], which vary in a continuum given by the respective intervals.

• The values taken by the image function are real numbers which again vary in a continuum or interval: fC(x0, y0) ∈ [fmin, fmax].

• Digital computers cannot process parameters/functions that vary in a continuum.

• We have to discretize:

Page 9: Dip

Quantization

• fC(i, j) (i = 0, . . . , N − 1, j = 0, . . . , M − 1). We have the second step of discretization left.

• fC(i, j) ∈ [fmin, fmax], ∀(i, j).

• Discretize the values fC(i, j) to P levels as follows. Let ΔQ = (fmax − fmin)/P and

f̂C(i, j) = Q(fC(i, j))        (4)

where

Q(fC(i, j)) = (k + 1/2)ΔQ + fmin

if and only if fC(i, j) ∈ [fmin + kΔQ, fmin + (k + 1)ΔQ), i.e., if and only if

fmin + kΔQ ≤ fC(i, j) < fmin + (k + 1)ΔQ, for k = 0, . . . , P − 1.
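A minimal MATLAB sketch of this quantizer, mapping a continuous-valued image fC (assumed to be already in the workspace) to P reconstruction levels according to the rule above.

% Uniform quantization of a continuous-valued image fC to P levels.
P      = 256;                                   % number of levels
fmin   = min(fC(:));  fmax = max(fC(:));
dQ     = (fmax - fmin) / P;                     % step size Delta_Q
k      = floor((fC - fmin) / dQ);               % bin index of every pixel
k      = min(k, P - 1);                         % keep fC == fmax inside the top bin
fC_hat = (k + 0.5) * dQ + fmin;                 % reconstruction level (k + 1/2)*Delta_Q + fmin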

Page 10: Dip

Quantization to P levels

• Typically P = 2^8 = 256 and we have log2(P) = log2(2^8) = 8 bit quantization.

• We have thus achieved the second step of discretization.

• From now on, omit references to fmin, fmax and, unless otherwise stated, assume that the original digital images are quantized to 8 bits or 256 levels.

• To denote this, refer to f̂(i, j) as taking integer values k where 0 ≤ k ≤ 255, i.e., let us say that

f̂(i, j) ∈ {0, . . . , 255}        (5)

Page 11: Dip

(R,G,B) Parameterization of Full Color Images

• f̂R(i, j), f̂G(i, j), f̂B(i, j) ⇒ full color image.

• f̂R(i, j), f̂G(i, j) and f̂B(i, j) are called the (R, G, B) parameterization of the “color space” of the full color image.

• There are other parameterizations, each with its own advantages and disadvantages.

Page 12: Dip

Grayscale Images

“Grayscale” image f̂gray(i, j)

• A grayscale or luminance image can be considered to be one of the components of a different parameterization.

• Advantage: It captures most of the “image information”.

• Our emphasis in this class will be on general processing. Hence we will mainly work with grayscale images in order to avoid the various nuances involved with different parameterizations.

Page 13: Dip

Images as Matrices

• Recalling the image formation operations we have discussed, note that the image f̂gray(i, j) is an N × M matrix with integer entries in the range 0, . . . , 255.

• From now on suppress the (ˆ) and denote an image as a matrix “A” (or B, . . ., etc.) with elements A(i, j) ∈ {0, . . . , 255} for i = 0, . . . , N − 1, j = 0, . . . , M − 1.

• So we will be processing matrices!

• Some processing we will do will take an image A with A(i, j) ∈ {0, . . . , 255} into a new matrix B which may not have integer entries! In these cases we must suitably scale and round the elements of B in order to display it as an image.

Computer Vision & Digital Image Processing

Fourier Transform Properties, the Laplacian, Convolution and Correlation

Page 15: Dip

Periodicity of the Fourier transform

• The discrete Fourier transform (and its inverse) are periodic with period N:

F(u,v) = F(u+N,v) = F(u,v+N) = F(u+N,v+N)

• Although F(u,v) repeats itself for infinitely many values of u and v, only N values of each variable are required to obtain f(x,y) from F(u,v)

– i.e., only one period of the transform is necessary to specify F(u,v) in the frequency domain

– Similar comments may be made for f(x,y) in the spatial domain

Page 16: Dip

Conjugate symmetry of the Fourier

transform

• If f(x,y) is real (true for all of our cases), the Fourier

transform exhibits conjugate symmetry

F(u,v)=F*(-u,-v)

or, the more interesting

|F(u,v)| = |F(-u,-v)|

where F*(u,v) is the complex conjugate of F(u,v)

Implications of periodicity & symmetry

• Consider a 1-D case:

– F(u) = F(u+N) indicates F(u) has a period of length N

– |F(u)| = |F(-u)| shows the magnitude is centered about the

origin

• Because the Fourier transform is formulated for

values in the range from [0,N-1], the result is two

back-to-back half periods in this range

• To display one full period in the range, move (shift)

the origin of the transform to the point u=N/2
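A minimal MATLAB sketch of this shift, using fftshift to move the origin of the transform to the centre of the array so that one full period of the spectrum is displayed; the image f is assumed to be already in the workspace.

% Display one full period of the spectrum by shifting the origin to (N/2, M/2).
F  = fft2(double(f));              % DFT with the origin at index (1,1)
Fc = fftshift(F);                  % centred spectrum
S  = log(1 + abs(Fc));             % log scaling so small magnitudes remain visible
imagesc(S); colormap(gray); axis image;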


Page 17: Dip

Periodicity properties

[Figure: Fourier spectrum with back-to-back half periods in the range [0, N−1], and the shifted spectrum with a full period in the same range.]

Periodicity properties: 2-D Example

Page 18: Dip

Distributivity & Scaling

• The Fourier transform (and its inverse) are

distributive over addition but not over multiplication

• So,

ℑ{ f1 ( x, y) + f 2 ( x, y)} = ℑ{ f1 ( x, y)} + ℑ{ f2 ( x, y)}

ℑ{ f1 ( x, y) × f 2 ( x, y)} ≠ ℑ{ f1 ( x, y)}× ℑ{ f 2 ( x, y)}

• For two scalars a and b,

af(x, y) ⇔ aF(u, v)

f(ax, by) ⇔ (1/ab) F(u/a, v/b)

Average Value

• A widely used expression for the average value of a 2-D discrete function is:

f̄(x, y) = (1/N²) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y)

• From the definition of F(u,v), for u = v = 0,

F(0,0) = (1/N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y)

• Therefore, f̄(x, y) = (1/N) F(0,0)
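A minimal MATLAB sketch checking the average-value property numerically; note that MATLAB's fft2 omits the 1/N factor used in the notes, so the spatial mean equals F(0,0)/N² here. The square N×N image f is assumed to be in the workspace.

% Check that the mean of f equals its DC Fourier coefficient (scaled).
f = double(f);
N = size(f, 1);
F = fft2(f);
avg_spatial = sum(f(:)) / N^2;        % (1/N^2) * sum over x,y of f(x,y)
avg_from_F  = real(F(1,1)) / N^2;     % F(0,0) is stored at index (1,1); fft2 has no 1/N factor
% avg_spatial and avg_from_F agree to numerical precision.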

Page 19: Dip

The Laplacian

• The Laplacian of a two-variable function f(x,y) is given as:

∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²

• From the definition of the 2-D Fourier transform,

ℑ{∇²f(x, y)} ⇔ −(2π)² (u² + v²) F(u, v)

• The Laplacian operator is useful for outlining edges in an image

The Laplacian: Matlab example

% Given F(u,v), use the Laplacian in the frequency domain
% to construct an edge-outlined representation of f(x,y).
f = double(imread('lena128.bmp'));      % read the 128x128 image (bmpread in the original notes)
F = fft2(f);
Fedge = zeros(128);
for u = 1:128
    for v = 1:128
        % map DFT indices to centred frequencies so that (0,0) is the DC term
        uc = u - 1 - 128*(u > 64);
        vc = v - 1 - 128*(v > 64);
        Fedge(u,v) = -(2*pi)^2 * (uc^2 + vc^2) * F(u,v);
    end
end
fedge = real(ifft2(Fedge));
imagesc(fedge); colormap(gray(256)); axis image;

Page 20: Dip

Convolution & Correlation

• The convolution of two functions f(x) and g(x) is denoted f(x)*g(x) and is given by:

f(x) * g(x) = ∫_{−∞}^{+∞} f(α) g(x − α) dα

• where α is a dummy variable of integration.

• Example: Consider the following functions f(α) and g(α): f(α) = 1 for 0 ≤ α ≤ 1, and g(α) = 1/2 for 0 ≤ α ≤ 1. [Figure: the two rectangular pulses f(α) and g(α).]

1-D convolution example

• Compute g(−α) by folding g(α) about the origin.

• Compute g(x−α) by displacing g(−α) by the value x.

[Figure: the folded pulse g(−α) and the shifted pulse g(x−α), each of height 1/2.]

Page 21: Dip

1-D convolution example (continued)

• Then, for any value x, we multiply g(x−α) and f(α) and integrate from −∞ to +∞.

• For 0 ≤ x ≤ 1 and for 1 ≤ x ≤ 2 the product f(α)g(x−α) overlaps different portions of f. [Figure: the product f(α)g(x−α) in the two cases.]

1-D convolution example (continued)

• Thus we have

f(x) * g(x) = x/2        for 0 ≤ x ≤ 1
f(x) * g(x) = 1 − x/2    for 1 ≤ x ≤ 2
f(x) * g(x) = 0          elsewhere.

• Graphically, f(x)*g(x) is a triangle that rises to a peak of 1/2 at x = 1 and falls back to 0 at x = 2.
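A minimal MATLAB sketch verifying this result numerically with a discrete approximation of the convolution integral; the sampling step below is an illustrative choice.

% Numerical check of the rectangular-pulse convolution worked out above.
dx = 0.001;
x  = 0:dx:3;
f  = double(x <= 1);                  % f(x) = 1 on [0,1]
g  = 0.5 * double(x <= 1);            % g(x) = 1/2 on [0,1]
c  = conv(f, g) * dx;                 % discrete approximation of the integral
xc = (0:length(c)-1) * dx;            % support of the result
plot(xc, c);                          % triangle: x/2 on [0,1], 1 - x/2 on [1,2], 0 elsewhere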

Page 22: Dip

Convolution and impulse functions

• Of particular interest will be the convolution of a function f(x) with an impulse function δ(x − x0):

∫_{−∞}^{+∞} f(x) δ(x − x0) dx = f(x0)

• The function δ(x − x0) may be viewed as having an area of unity in an infinitesimal neighborhood around x0 and 0 elsewhere. That is,

∫_{−∞}^{+∞} δ(x − x0) dx = ∫_{x0−}^{x0+} δ(x − x0) dx = 1

Convolution and impulse functions

(continued)

• We usually say that δ(x − x0) is located at x = x0 and the strength of the impulse is given by the value of f(x) at x = x0.

• If f(x) = A then Aδ(x − x0) is an impulse of strength A at x = x0. Graphically, it is drawn as an arrow of height A located at x0. [Figure: f(x) = A and the impulse Aδ(x − x0).]

Page 23: Dip

Convolution with an impulse function

• Given f(x), a function f(α) of amplitude A, and g(x) = δ(x+T) + δ(x) + δ(x−T),

• f(x)*g(x) is a copy of f centered at each impulse location x = −T, x = 0 and x = T, each of amplitude A. [Figure: the three replicated copies of f.]

Page 24: Dip

Convolution and the Fourier transform

• f(x)*g(x) and F(u)G(u) form a Fourier transform pair

• If f(x) has transform F(u) and g(x) has transform

G(u) then f(x)*g(x) has transform F(u)G(u)

f ( x) * g ( x) ⇔ F (u)G(u)

f ( x) g ( x) ⇔ F (u) * G(u)

• These two results are commonly referred to as the

convolution theorem

Frequency domain filtering

• Enhancement in the frequency domain is straightforward

– Compute the Fourier transform

– Multiply the result by a filter transform function

– Take the inverse transform to produce the enhanced image

• In practice, small spatial masks are used considerably more

than the Fourier transform because of their simplicity of

implementation and speed of operation

• However, some problems are not easily addressable by

spatial techniques

– Such as homomorphic filtering and some image restoration

techniques

Page 25: Dip

Lowpass frequency domain filtering

• Given the following relationship

G(u, v) = H (u, v)F (u, v)

• where F(u,v) is the Fourier transform of an image to be smoothed

• The problem is to select an H(u,v) that yields an

appropriate G(u,v)

• We will consider zero-phase-shift filters that do not

alter the phase of the transform (i.e. they affect the

real and imaginary parts of F(u,v) in exactly the

same manner)

Ideal lowpass filter (ILPF)

• A transfer function for a 2-D ideal lowpass filter (ILPF) is given as

H(u, v) = 1   if D(u, v) ≤ D0
H(u, v) = 0   if D(u, v) > D0

• where D0 is a stated nonnegative quantity (the cutoff frequency) and D(u,v) is the distance from the point (u,v) to the center of the frequency plane:

D(u, v) = √(u² + v²)

[Figure: perspective plot of the ILPF transfer function H(u,v) over the (u,v) plane.]
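A minimal MATLAB sketch of ideal lowpass filtering, following the three frequency-domain steps listed earlier; the image f and the cutoff D0 below are assumptions for illustration, not values from the notes.

% Ideal lowpass filtering in the frequency domain.
[N, M] = size(f);
D0 = 30;                                                    % assumed cutoff frequency
[u, v] = meshgrid(-floor(M/2):ceil(M/2)-1, -floor(N/2):ceil(N/2)-1);
D  = sqrt(u.^2 + v.^2);               % distance from the centre of the frequency plane
H  = double(D <= D0);                 % ILPF transfer function
F  = fftshift(fft2(double(f)));       % centred transform
G  = H .* F;                          % G(u,v) = H(u,v) F(u,v)
g  = real(ifft2(ifftshift(G)));       % smoothed image back in the spatial domain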

Page 26: Dip

Ideal lowpass filter (ILPF) (continued)

• The point D0 traces a circle from the frequency origin giving a locus of

cutoff frequencies (all are at distance D0 from the origin)

• One way to establish a set of “standard” loci is to compute circles that

encompass various amounts of the total signal power PT

• PT is given by

PT = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} P(u, v)

• where P(u, v), the power spectrum, is given as

P(u, v) = |F(u, v)|² = R²(u, v) + I²(u, v)

• For the centered transform, a circle of radius r encompasses β percent of the power, where

β = 100 [ Σ_u Σ_v P(u, v) / PT ]

(the summation is over all points (u, v) encompassed by the circle)

Page 28: Dip

UNIT 1

2 marks

1. Define Image?

2. What is Dynamic Range?

3. Define Brightness?

4. Define Tapered Quantization?

5. What do you mean by Gray level?

6. What do you mean by Color model?

7. List the hardware oriented color models?

8. What is Hue and Saturation?

9. Explain the separability property of the 2D Fourier transform.

10. What are the properties of the Haar and Slant transforms?

11. Define Resolutions?

12. What is meant by pixel?

13. Define Digital image?

14. What are the steps involved in DIP?

16. Specify the elements of DIP system?

18. What are the types of light receptors?

19. Differentiate photopic and scotopic vision?

26. Define sampling and quantization

27. Find the number of bits required to store a 256 X 256 image with 32 gray levels?

28. Write the expression to find the number of bits to store a digital image?

30. What do you mean by Zooming and shrinking of digital images?

32. Write short notes on neighbors of a pixel.

33. Explain the types of connectivity.

34. What is meant by path?

36. What is geometric transformation?

40. What is Image Transform?

Page 29: Dip

16 MARKS

UNIT I

1. Explain the steps involved in digital image processing.

(or)

Explain the various functional blocks of digital image processing # Image acquisition

# Preprocessing

# Segmentation

# Representation and Description

# Recognition and Interpretation

2. Describe the elements of visual perception. # Cornea and Sclera # Choroid – Iris diaphragm and Ciliary body

# Retina- Cones and Rods

3. Describe image formation in the eye with brightness adaptation and

discrimination # Brightness adaptation # Subjective brightness

# Weber ratio

#Mach band effect

#simultaneous contrast

4. Write short notes on sampling and quantization. # Sampling # Quantization

# Representing Digital Images

5. Describe the functions of elements of digital image processing system

with a diagram. # Acquisition # Storage

# Processing

# Communication

# Display

6. Explain the basic relationships between pixels? # Neighbors of a pixel # Connectivity, Adjacency, Path

# Distance Measure

# Arithmetic and Logic Operations

Page 30: Dip

7. Explain the properties of 2D Fourier Transform. # Separability # Translation

# Periodicity and Conjugate Symmetry

# Rotation

# Distributivity and Scaling

# Average Value

# Laplacian

# Convolution and correlation

# Sampling

8. (i) Explain the convolution property in 2D Fourier transform. * 1D Continuous * 1D Discrete

* 1D convolution theorem

* 2D continuous

* 2D Discrete

* 2D convolution theorem

(ii) Find F (u) and |F (u)|

9. Explain Fast Fourier Transform (FFT) in detail. # FFT Algorithm # FFT Implementation

10. Explain in detail the different separable transforms # Forward 1D DFT & 2D DFT # Inverse 1D DFT & 2D DFT

# Properties

11. Explain Hadamard transformation in detail. # 1D DHT # 1D Inverse DHT

# 2D DHT

# 2D Inverse DHT

12. Discuss the properties and applications of

1)Hadamard transform 2)Hotelling transform

Page 31: Dip


# Properties of hadamard:

Real and orthogonal

fast transform

faster than sine transform

Good energy compaction for image

# Appl:

Image data compression,

filtering and design of codes

# Properties of hotelling:

Real and orthogonal

Not a fast transform

Best energy compaction for image

# Appl:

Useful in performance evaluation & for finding performance bounds

13. Explain Haar transform in detail.

# Define k = 2^p + q − 1

# Find h_k(z)

14. Explain K-L transform in detail.

Consider a set of n multi-dimensional discrete signals represented as column vectors x1, x2, …, xn, each having M elements:

X = [X1, X2, . . . , Xn]ᵀ

The mean vector is defined as Mx = E{x}, where E{x} is the expected value of x.

For M vector samples the mean vector is Mx = (1/M) Σ_{k=1}^{M} Xk

The covariance matrix is Cx = E{(X − Mx)(X − Mx)ᵀ}

For M samples, Cx = (1/M) Σ_{k=1}^{M} (Xk − Mx)(Xk − Mx)ᵀ

K-L Transform: Y = A(X − Mx)
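A minimal MATLAB sketch of the K-L (Hotelling) transform built from these definitions; the sample vectors x_k are assumed to be stored as the columns of a matrix X, and A is taken as the matrix of eigenvectors of Cx ordered by decreasing eigenvalue.

% Sketch of the K-L (Hotelling) transform for samples stored as columns of X.
M  = size(X, 2);                          % number of sample vectors
Mx = mean(X, 2);                          % mean vector (1/M) * sum of the x_k
Xc = X - repmat(Mx, 1, M);                % subtract the mean from every sample
Cx = (Xc * Xc') / M;                      % covariance (1/M) * sum (x_k - Mx)(x_k - Mx)'
[V, D]   = eig(Cx);                       % eigenvectors/eigenvalues of Cx
[~, idx] = sort(diag(D), 'descend');      % order by decreasing eigenvalue
A = V(:, idx)';                           % rows of A = ordered eigenvectors of Cx
Y = A * Xc;                               % K-L transform  Y = A (X - Mx)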

Page 32: Dip

UNIT II IMAGE ENHANCEMENT TECHNIQUES

Spatial domain methods: Basic grey level transformation – Histogram equalization – Image subtraction – Image averaging – Spatial

filtering – Smoothing, sharpening filters – Laplacian filters – Frequency domain filters – Smoothing – Sharpening filters –

Homomorphic filtering.

Dynamic Range, Visibility and Contrast Enhancement

• Contrast enhancing point functions we have discussed earlier expand

the dynamic range occupied by certain “interesting” pixel values in the

input image.

• These pixel values in the input image may be difficult to distinguish

Page 33: Dip

and the goal of contrast enhancement is to make them “more visible”

in the output image.

• Don’t forget we have a limited dynamic range (0 − 255) at our disposal.

Point Functions and Histograms

• In general a point operation/function B(i, j) = g(A(i, j)) results in a new

histogram hB (l) for the output image that is different from hA(l).

• The relationship between hB (l) and hA(l) may not be straightforward as

we have already discussed in Lecture 2.

• You must learn how to calculate hB (l) given hA(l) and the point function

g(l):

– Usually via writing a matlab script that computes hB (l)

from hA(l) and g(l).

– By sketching hB (l) given the sketches for hA(l)

and g(l).


Page 35: Dip

“Unexpected” Effect of some Point Functions

− B has roughly 10 times fewer distinct pixel values.

− Note also the vertical axis scaling in hB(l).

Page 36: Dip

Stretched/Compressed Pixel Value Ranges

• B(i, j) = g(A(i, j)). Suppose g(l) represents an overall point function which includes contrast stretching/compression, emphasis/de-emphasis, rounding, normalizing, etc.

• Given an image matrix A, B(i, j) = g(A(i, j)) is also an image matrix.

• g(l) may not be “continuous” or connected, and it also may not be composed of connected line segments.

Page 37: Dip

Image Segmentation

• If one views an image as depicting a scene composed of different

objects, regions, etc. then segmentation is the decomposition of an

image into these objects and regions by associating or “labelling” each

pixel with the object that it corresponds to.

• Most humans can easily segment an image.

• Computer automated segmentation is a difficult problem, requiring

sophisticated algorithms that work in tandem.

• “High level” segmentation, such as segmenting humans, cars etc.,

from an image is a very difficult problem. It is still considered unsolved

and is actively researched.

• Based on point processing, histogram based image segmentation is a

very simple algorithm that is sometimes utilized as an initial guess at

the “true” segmentation of an image.


Page 38: Dip

Histogram Based Image Segmentation

• For a given image, decompose the range of pixel values (0, . . . , 255) into

“discrete” intervals Rt = [at, bt], t = 1, . . . , T , where T is the total number

of segments.

• Each Rt is typically obtained as a range of pixel values that correspond

to a hill of hA(l).

• “Label” the pixels with pixel values within each Rt via a point function.

• Main Assumption: Each object is assumed to be composed of

pixels with similar pixel values.


Page 39: Dip

Limitations

• Histogram based segmentation operates on each image pixel independently. As mentioned earlier, the main assumption is that objects must be composed of pixels with similar pixel values.

• This independent processing ignores a second important property:

Pixels within an object should be spatially connected. For example,

B3, B4, B5 group spatially disconnected objects/regions into the same

segment.

• In practice, one would use histogram based segmentation in tandem

with other algorithms that make sure that computed objects/regions

are spatially connected.


Page 40: Dip


Histogram Equalization

• For a given image A, we will now design a special point function ge (l)

which is called the histogram equalizing point function for A.

• If B(i, j) = ge (A(i, j)), then our aim is to make hB (l) as uniform/flat as

possible irrespective of hA(l)

• Histogram equalization will help us:

– Stretch/Compress an image such that: ∗ Pixel values that occur frequently in A occupy a bigger dynamic range in B,

i.e., get stretched and become more visible.

∗ Pixel values that occur infrequently in A occupy a smaller dynamic range in B,

i.e., get compressed and become less visible.

– Compare images by “mapping” their histograms into a standard

histogram and sometimes “undo” the effects of some unknown

processing.

• The techniques we are going to use to get ge (l) are also applicable

in

histogram modification/specification.


Page 41: Dip

Histogram Equalizing Point Function

• Let g1(l) = Σ_{k=0}^{l} pA(k). Note that g1(l) ∈ [0, 1].

• Image A ⇒ “equalize image” ⇒ B(i, j) = ge (A(i, j)).

• In general pB(l) will not be a uniform probability mass function, but hopefully it will be close.

• In MATLAB, see >> help filter for a fast way to construct ge(A(i, j)).

• Assuming gAe is an array that contains the computed ge(l), you can use >> B = gAe(A + 1); to obtain the equalized image.

Page 42: Dip

Stretching and Compression

• ge(l) stretches the range of pixel values that occur frequently in A.

• ge(l) compresses the range of pixel values that occur infrequently in A.

Page 43: Dip

Example


Page 44: Dip

Comparison/“Undoing”

Instead of comparing A and C, compare their equalized versions.


Page 45: Dip

Comparison/“Undoing” - contd.


Page 46: Dip

Histogram Equalization

• gl(l) = Σ_{k=0}^{l} pA(k), so that gl(l) − gl(l − 1) = pA(l) = hA(l)/(N M) for l = 1, . . . , 255.

• ge (l) = round(255gl(l)) is the histogram equalizing point function for the

image A.

• B(i, j) = ge (A(i, j)) is the histogram equalized version of A.

• In general, histogram equalization stretches/compresses an image such

that:

– Pixel values that occur frequently in A occupy a bigger dynamic range in B, i.e., get

stretched and become more visible.

– Pixel values that occur infrequently in A occupy a smaller dynamic range in B, i.e., get

compressed and become less visible.

• Histogram equalization is not ideal, i.e., in general B will have a

“flatter” histogram than A, but pB (l) is not guaranteed to be uniform

(flat).
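A minimal MATLAB sketch of this construction for an 8-bit image A assumed to be in the workspace; it builds ge(l) from the normalized cumulative histogram and applies it as a lookup table, as suggested in the notes.

% Histogram-equalizing point function g_e(l) and equalized image B.
A  = double(A);
hA = histc(A(:), 0:255);              % histogram h_A(l), l = 0..255
pA = hA / numel(A);                   % sample probability mass function p_A(l)
gl = cumsum(pA);                      % g_l(l) = sum_{k=0}^{l} p_A(k), in [0,1]
ge = round(255 * gl);                 % histogram-equalizing point function
B  = ge(A + 1);                       % apply as a lookup table (MATLAB indexing starts at 1)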

Page 47: Dip


Random Images - Images of White Noise

• A single outcome of a continuous amplitude uniform random variable χ ∈ [0, 1] in matlab: >> x = rand(1, 1);

• An N × M matrix of outcomes of a continuous amplitude uniform

random variable χ ∈ [0, 1] in matlab: >> X = rand(N, M );

• An N × M image matrix of outcomes of a discrete amplitude uniform

random variable Θ ∈ {0, 1, . . . 255} in matlab: >> A = round(255 ∗ X );

• A single outcome of a continuous amplitude gaussian random variable χ (µ = 0, σ2 = 1) in matlab: >> x = randn(1, 1);

• An N × M matrix of outcomes of a continuous amplitude gaussian

random variable χ (µ = 0, σ2 = 1) in matlab: >> X = randn(N, M );

• An N × M image matrix of outcomes of a discrete amplitude “gaussian” random variable Θ ∈ {0, 1, . . . , 255}: A(i, j) = gs(X(i, j)).

Page 48: Dip

Example

Page 49: Dip

Warning

• Remember, two totally different images may have very similar histograms.

Page 50: Dip

Histogram Matching - Specification

• Given images A and B, using point processing we would like to generate an image C from A such that hC(l) ∼ hB(l), (l = 0, . . . , 255).

• More generally, given an image A and a histogram hB(l) (or sample probability mass function pB(l)), we would like to generate an image C such that hC(l) ∼ hB(l), (l = 0, . . . , 255).

• Histogram matching/specification enables us to “match” the grayscale distribution in one image to the grayscale distribution in another image.

Page 51: Dip

UNIT II

2 Marks

1. Specify the objective of image enhancement technique.

2. Explain the 2 categories of image enhancement.

3. What is contrast stretching?

4. What is grey level slicing?

5. Define image subtraction.

6. What is meant by masking?

7. Define Histogram.

8. What is meant by Histogram Equalization?

9. Differentiate linear spatial filter and non-linear spatial filter.

10. What is meant by laplacian filter?

11. Write the application of sharpening filters?

16 Marks

1. Explain the types of gray level transformation used for image enhancement. # Linear (Negative and Identity) # Logarithmic( Log and Inverse Log)

# Power_law (nth root and nth power)

# Piecewise_linear (Contrast Stretching, Gray level Slicing, Bit plane Slicing)

2. What is histogram? Explain histogram equalization. # P(rk) = nk/n # Ps(s) = 1 means the histogram is uniform.

3. Discuss the image smoothing filter with its model in the spatial domain. # LPF-blurring # Median filter – noise reduction & for sharpening image

4. What are image sharpening filters. Explain the various types of it. # used for highlighting fine details # HPF-output gets sharpen and background becomes darker

# High boost- output gets sharpen but background remains unchanged

# Derivative- First and Second order derivatives

Appl: # Medical image

# electronic printing

# industrial inspection

5. Explain spatial filtering in image enhancement. # Basics # Smoothing filters

# Sharpening filters

6. Explain image enhancement in the frequency domain. # Smoothing filters # Sharpening filters

# Homomorphic filtering

7. Explain Homomorphic filtering in detail. # f(x, y) = i(x, y) . r(x, y) # Calculate the enhanced image g(x,y)

Page 52: Dip

UNIT III IMAGE RESTORATION

Model of image degradation/restoration process – Noise models – Inverse

filtering – Least mean square filtering – Constrained least mean square filtering –

Blind image restoration – Pseudo inverse – Singular value decomposition.

Image restoration

• Restoration is an objective process that attempts to

recover an image that has been degraded

– A priori knowledge of the degradation phenomenon

– Restoration techniques generally oriented toward

modeling the degradation

– Application of the inverse process to “recover” the original

image

– Involves formulating some criterion (criteria) of

“goodness” that is used to measure the desired result

Page 53: Dip

Image restoration (continued)

• Removal of blur by a deblurring function is an

example restoration technique

• We will consider the problem only from where a

degraded digital image is given

– Degradation source will not be considered here

• Restoration techniques may be formulated in the

– Frequency domain

– Spatial domain

Image degradation/restoration process

• Given g(x,y), some knowledge about H, and some

knowledge about the noise term, the objective is to produce

an estimate of the original image.

– The more that is known about H and the noise term the closer the

estimate can be

• Various types of restoration filters are used to accomplish

this.

[Block diagram: f(x, y) → Degradation function H → (+) ← noise η(x, y) → g(x, y) → Restoration filter(s) → f̂(x, y)]

Page 54: Dip

Image degradation/restoration process

• If H is a linear, position invariant process, then the degraded image can be described as the convolution of h and f with an added noise term:

g ( x, y) = h( x, y) ∗ f ( x, y) + η ( x, y)

• h(x,y) is the spatial domain representation of the degradation function.

• In the frequency domain, the representation is:

G(u, v) = H (u, v)F (u, v) + N (u, v)

• Each term in this expression is the Fourier transform of the corresponding term in the equation above.
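A minimal MATLAB sketch simulating this degradation model, g = h*f + η; the 5×5 uniform blur, the noise level and the image f used here are illustrative assumptions, not values from the notes.

% Simulate the linear, position-invariant degradation model g = h*f + eta
% (f is a grayscale image assumed to be in the workspace as a double matrix).
h   = ones(5) / 25;                   % assumed 5x5 uniform blur PSF
eta = 10 * randn(size(f));            % additive Gaussian noise, sigma = 10 (assumed)
g   = conv2(f, h, 'same') + eta;      % degraded image: convolution plus noise
% In the frequency domain this corresponds to G(u,v) = H(u,v) F(u,v) + N(u,v).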

Noise models

• Common sources of noise

– Acquisition

• Environmental conditions (heat, light), imaging sensor quality

– Transmission

• Noise in transmission channel

• Spatial and frequency properties of noise

– Frequency properties of noise refer to the frequency content of noise

in the Fourier sense

– For example, if the Fourier spectrum of the noise is constant, the

noise is usually called white noise

• A carry over from the fact that white light contains nearly all frequencies

in the visible spectrum in basically equal proportions

– Excepting spatially periodic noise, we will assume that noise is

independent of spatial coordinates and uncorrelated to the image

Page 55: Dip

Noise probability density functions

• With respect to the spatial noise term, we will be concerned

with the statistical behavior of the intensity values.

• May be treated as random variables characterized by a

probability density function (PDF)

• Common PDFs used will describe:

– Gaussian noise

– Rayleigh noise

– Erlang (Gamma) noise

– Exponential noise

– Uniform noise

– Impulse (salt-and-pepper) noise

Gaussian noise

• Gaussian (normal) noise models are simple to consider

• The PDF of a Gaussian random variable, z, is given as:

p(z) = (1 / (√(2π) σ)) e^{−(z − z̄)² / (2σ²)}

where
z represents intensity
z̄ represents the mean (average) value of z
σ is the standard deviation
σ² is the variance of z

• In this case, approximately 70% of the values of z will be within one standard deviation of the mean

• Approximately 95% of the values of z will be within two standard deviations

Page 56: Dip

Rayleigh noise

• The PDF of Rayleigh noise is given as:

p(z) = (2/b)(z − a) e^{−(z − a)²/b}   for z ≥ a
p(z) = 0                               for z < a

where
z represents intensity
z̄ = a + √(πb/4)
σ² = b(4 − π)/4

• Note the displacement, by a, from the origin

• The basic shape of this PDF is skewed to the right

– Can be useful in approximating skewed histograms

Erlang (Gamma) noise

• The PDF of Erlang noise is given as:

p(z) = (a^b z^(b−1) / (b − 1)!) e^{−az}   for z ≥ 0
p(z) = 0                                   for z < 0

where
z represents intensity
z̄ = b/a
σ² = b/a²

• a > 0, b is a positive integer

Page 57: Dip

Exponential noise

• The PDF of exponential noise is given as:

p(z) = a e^{−az}   for z ≥ 0
p(z) = 0            for z < 0

where
z represents intensity
z̄ = 1/a
σ² = 1/a²

• a > 0

• This PDF is a special case of the Erlang PDF with b = 1

Uniform noise

• The PDF of uniform noise is given as:

p(z) = 1/(b − a)   if a ≤ z ≤ b
p(z) = 0            otherwise

where
z represents intensity
z̄ = (a + b)/2
σ² = (b − a)²/12

Page 58: Dip

Impulse (salt-and-pepper) noise

• The PDF of (bipolar) impulse noise is given as:

p(z) = Pa   for z = a
p(z) = Pb   for z = b
p(z) = 0    otherwise

• If b > a then any pixel with intensity b will appear as a light dot in the image

• Pixels with intensity a will appear as a dark dot

Example noisy images

Page 59: Dip

Example noisy images (continued)

Page 61: Dip

Sample periodic images and their spectra


Estimation of noise parameters

• Noise parameters can often be estimated by

observing the Fourier spectrum of the image

– Periodic noise tends to produce frequency spikes

• Parameters of noise PDFs may be known (partially)

from sensor specification

– Can still estimate them for a particular imaging setup

– One method

• Capture a set of “flat” images from a known setup (i.e. a uniform

gray surface under uniform illumination)

• Study characteristics of resulting image(s) to develop an indicator

of system noise

Page 62: Dip

Estimation of noise parameters (continued)

• If only a set of images already generated by a sensor are available, estimate the PDF of the noise from small strips of reasonably constant background intensity

• Consider a subimage S and let pS(zi), i = 0, 1, 2, …, L−1, denote the probability estimates of the intensities of the pixels in S

• L is the number of possible intensities in the image

• The mean and the variance of the pixels in S are given by:

z̄ = Σ_{i=0}^{L−1} zi pS(zi)   and   σ² = Σ_{i=0}^{L−1} (zi − z̄)² pS(zi)

Estimation of noise parameters (continued)

• The shape of the noise histogram identifies the

closest PDF match

– If the shape is Gaussian, then the mean and variance are

all that is needed to construct a model for the noise (i.e.

the mean and the variance completely define the

Gaussian PDF)

– If the shape is Rayleigh, then the Rayleigh shape

parameters (a and b) can be calculated using the mean

and variance

– If the noise is impulse, then a constant (with the exception of the noise) area of the image is needed to calculate Pa

and Pb probabilities for the impulse PDF

Page 63: Dip

Histograms from noisy strips of an area of an

image

Restoration in the presence of noise only –spatial

filtering

• When only additive random noise is present, spatial

filtering is commonly used to restore images

• Common types

– Mean filters

– Order-Statistic filters

– Adaptive filters

Page 64: Dip

Mean filters (arithmetic)

• Arithmetic mean filter

– Computes the average value of a corrupted image g(x,y)

in the area defined by a window (neighborhood)

f̂(x, y) = (1/mn) Σ_{(s,t)∈Sxy} g(s, t)

– The operation is generally implemented using a spatial

filter of size m*n in which all coefficients have value 1/mn

– A mean filter smoothes local variations in an image

– Noise is reduced as a result of blurring

Mean filters (geometric)

• Geometric mean filter

– A restored pixel is given by the product of the pixels in an

area defined by a window (neighborhood), raised to the

power 1/mn

f̂(x, y) = [ Π_{(s,t)∈Sxy} g(s, t) ]^(1/mn)

– Achieves smoothing comparable to the arithmetic mean filter, but tends to lose less detail in the process

Page 65: Dip

Arithmetic and geometric mean filter examples

Mean filters (harmonic)

• Harmonic mean filter

– A restored pixel is given by the expression

f̂(x, y) = mn / [ Σ_{(s,t)∈Sxy} 1/g(s, t) ]

– Works well for salt noise (fails for pepper noise)

– Works well for Gaussian noise also

Page 66: Dip

Mean filters (contraharmonic)

• Contraharmonic mean filter

– A restored pixel is given by the expression

f̂(x, y) = [ Σ_{(s,t)∈Sxy} g(s, t)^(Q+1) ] / [ Σ_{(s,t)∈Sxy} g(s, t)^Q ]

– Q is the order of the filter

– Works well for salt and pepper noise (cannot do both simultaneously)

– +Q eliminates pepper noise, -Q eliminates salt noise

– Q=0 → arithmetic mean filter

– Q=-1 → harmonic mean filter
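A minimal MATLAB sketch of the contraharmonic mean filter over an m×n window; the window size and Q below are illustrative assumptions, and g is assumed to be a strictly positive double-valued noisy image (add a small offset if it contains zeros).

% Contraharmonic mean filter of order Q over an m x n neighborhood.
m = 3; n = 3; Q = 1.5;                          % assumed window size and order
num  = conv2(g.^(Q+1), ones(m, n), 'same');     % sum of g(s,t)^(Q+1) over the window
den  = conv2(g.^Q,     ones(m, n), 'same');     % sum of g(s,t)^Q over the window
fhat = num ./ den;                              % restored image; Q>0 removes pepper, Q<0 removes salt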

Contraharmonic mean filter examples

Page 67: Dip

Contraharmonic mean filter examples

Page 68: Dip


UNIT III

2 Marks

1. What is meant by Image Restoration?

2. What are the two properties in Linear Operator?

3. Explain additivity property in Linear Operator?

4. How a degradation process is modeled?

5. Explain homogeneity property in Linear Operator?

8. Define circulant matrix?

10. What are the two methods of algebraic approach?

11. Define Gray-level interpolation?

12. What is pseudo inverse filter?

13.What is meant by least mean square filter?

14. Write the properties of Singular value Decomposition(SVD)?

16 Marks

1. Explain the algebra approach in image restoration. # Unconstrained # Constrained

2. What is the use of wiener filter in image restoration. Explain.

# Calculate f^

# Calculate F^(u, v)

3. What is meant by Inverse filtering? Explain.

# Recovering i/p from its o/p

# Calculate f^(x, y)

4. Explain singular value decomposition and specify its properties.

# U = Σ_{m=1}^{r} √λm ψm φmᵀ

This equation is called the singular value decomposition of an image. # Properties

• The SVD transform varies drastically from image to image.

• The SVD transform gives the best energy packing efficiency for any given image.

• The SVD transform is useful in the design of filters, in finding the least squares and minimum norm solutions of linear equations, and in finding the rank of large matrices.

5. Explain image degradation model /restoration process in detail.

# Image degradation model /restoration process diagram

# Degradation model for Continuous function

# Degradation model for Discrete function – 1-D and 2-D

6. What are the two approaches for blind image restoration? Explain in detail. > Direct measurement > Indirect estimation

Page 69: Dip

UNIT IV IMAGE COMPRESSION

Lossless compression: variable length coding – LZW coding – Bit plane

coding– Predictive coding– DPCM. Lossy Compression: Transform coding –

Wavelet coding – Basics of image compression standards – JPEG, MPEG, basics of

vector quantization.

Objectives

At the end of this lesson, the students should be able to:

1. Explain the need for standardization in image transmission and reception.

2. Name the coding standards for fax and bi-level images and state their

characteristics.

3. Present the block diagrams of JPEG encoder and decoder.

4. Describe the baseline JPEG approach.

5. Describe the progressive JPEG approach through spectral selection.

6. Describe the progressive JPEG approach through successive

approximation.

7. Describe the hierarchical JPEG approach.

8. Describe the lossless JPEG approach.

9. Convert YUV images from RGB.

10. Illustrate the interleaved and non-interleaved ordering for color images.

Introduction

With the rapid developments of imaging technology, image compression and coding

tools and techniques, it is necessary to evolve coding standards so that there is

compatibility and interoperability between the image communication and storage

products manufactured by different vendors. Without the availability of standards,

encoders and decoders can not communicate with each other; the service providers

will have to support a variety of formats to meet the needs of the customers and the

customers will have to install a number of decoders to handle a large number of data

formats. Towards the objective of setting up coding standards, the

international standardization agencies, such as

International Standards Organization (ISO), International Telecommunications Union

(ITU), International Electro-technical Commission (IEC) etc. have formed expert

groups and solicited proposals from industries, universities and research laboratories.

This has resulted in establishing standards for bi-level (facsimile) images and

continuous tone (gray scale) images. In this lesson, we are going to discuss the

Page 70: Dip

highlighting features of these standards. These standards use the coding and

compression techniques – both lossless and lossy which we have already studied in the

previous lessons.

The first part of this lesson is devoted to the standards for bi-level image coding.

Modified Huffman (MH) and Modified Relative Element Address Designate

(MREAD) standards are used for text-based documents, but more recent

Page 71: Dip

standards like JBIG1 and JBIG2, proposed by the Joint bi-level experts’ group (JBIG)

can efficiently encode handwritten characters and binary halftone images. The latter part

of this lesson is devoted to the standards for continuous tone images. We are going to

discuss in details about the Joint Photographic Experts Group (JPEG) standard and its

different modes, such as baseline (sequential), progressive, hierarchical and lossless.

Coding Standards for Fax and Bi-level Images

Consider an A4-sized (8.5 in x 11 in) scanned page having 200 dots/in. An

uncompressed image would require transmission of 3,740,000 bits for this

scanned page. It is however seen that most of the information on the scanned page is

highly correlated along the scan lines, which proceed in the direction of left to right in

top to bottom order and also in between the scan lines. The coding standards have

exploited this redundancy to compress bi-level images. The coding standards

proposed for bi-level images are:

(a) Modified Huffman (MH): This algorithm performs one-dimensional run

length coding of scan lines, along with special end-of-line (EOL), end-of- page

(EOP) and synchronization codes. The MH algorithm on an average achieves a

compression ratio of 20:1 on simple text documents.

(b) Modified Relative Element Address Designate (MREAD): This

algorithm uses a two-dimensional run length coding to take advantage of

vertical spatial redundancy, along with horizontal spatial redundancy. It uses

the previous scan line as a reference when coding the current line. The position

of each black-to-white or white-to-black transition is coded relative to a

reference element in the current scan line. The compression ratio is improved to

25:1 for this algorithm.

(c) JBIG1: The earlier two algorithms just mentioned work well for printed texts

but are inadequate for handwritten texts or binary halftone images (continuous

images converted to dot patterns). The JBIG1 standard, proposed by the

Joint Bi-level Experts Group uses a larger region of support for coding the

pixels. Binary pixel values are directly fed into an arithmetic coder, which

utilizes a sequential template of nine adjacent and previously coded pixels plus

one adaptive pixel to form a 10-bit context. Other than the sequential mode

just described, JBIG1 also supports progressive mode in which a reduced

resolution starting layer image is followed by the transmission of progressively

higher resolution layers. The compression ratio of the JBIG1 standard is

slightly better than that of MREAD for text images, but shows an

improvement of 8-to-1 for binary halftone images.

Page 72: Dip

(d) JBIG2: This is a more recent standard proposed by the Joint bi-level Experts

Group. It uses a soft pattern matching approach to provide a solution to the

problem of substitution errors in which an imperfectly scanned symbol is

wrongly matched to a different symbol, as frequently observed in Optical

Character Recognition (OCR). JBIG2 codes the bit- map of each mark, rather

than its matched class index. In case a good match cannot be found for the

current mark, it becomes a token for a new class. This new token is then coded

using JBIG1 with a fixed template of previous pixels around the current mark.

The JBIG2 standard is seen to be 20% more efficient than the JBIG1 standard

for lossless compression.

Continuous tone still image coding standards

A different set of standards had to be created for compressing and coding

continuous tone monochrome and color images of any size and sampling rate. Of these,

the Joint Photographic Expert Group (JPEG)’s first standard, known as JPEG is the

most widely used one. Only in recent times, the new standard JPEG-2000 has its

implementations in still image coding systems. JPEG is a very simple and easy to

use standard that is based on the Discrete Cosine Transform (DCT).

JPEG Encoder

Figure shows the block diagram of a JPEG encoder, which has the following

components:

(a) Forward Discrete Cosine Transform (FDCT): The still images are first

partitioned into non-overlapping blocks of size 8x8 and the image samples are shifted from unsigned integers with range [0, 2^p − 1] to signed integers with range [−2^(p−1), 2^(p−1) − 1], where p is the number of bits (here, p = 8). The

theory of the DCT has been already discussed in lesson-8 and will not be

repeated here. It should however be mentioned that to preserve freedom for

innovation and customization within implementations, JPEG neither specifies

any unique FDCT algorithm, nor any unique IDCT algorithms.

Page 73: Dip

The implementations may therefore differ in precision and JPEG has

specified an accuracy test as a part of the compliance test.

(b) Quantization: Each of the 64 coefficients from the FDCT outputs of a block

is uniformly quantized according to a quantization table. Since the aim is to

compress the images without visible artifacts, each step-size should be chosen

as the perceptual threshold or for “just noticeable distortion”. Psycho-visual

experiments have led to a set of quantization tables and these appear in ISO-

JPEG standard as a matter of information, but not a requirement.

The quantized coefficients are zig-zag scanned, as described in lesson-8. The DC

coefficient is encoded as a difference from the DC coefficient of the previous

block and the 63 AC coefficients are encoded into (run, level) pair.

(c) Entropy Coder: This is the final processing step of the JPEG encoder.

The JPEG standard specifies two entropy coding methods – Huffman and

arithmetic coding. The baseline sequential JPEG uses Huffman only, but codecs

with both methods are specified for the other modes of operation. Huffman

coding requires that one or more sets of coding tables are specified by the

application. The same table used for compression is needed to decompress

it. The baseline JPEG uses only two sets of Huffman tables – one for DC and

the other for AC.

JPEG Decoder

Figure shows the block diagram of the JPEG decoder. It performs the inverse operation

of the JPEG encoder.

Modes of Operation in JPEG

The JPEG standard supports the following four modes of operation:

• Baseline or sequential encoding

Page 74: Dip

• Progressive encoding (includes spectral selection and successive

approximation approaches).

• Hierarchical encoding

• Lossless encoding

Baseline Encoding: Baseline sequential coding is for images with 8-bit samples and

uses Huffman coding only. In baseline encoding, each block is encoded in a single

left-to-right and top-to-bottom scan. It encodes and decodes complete 8x8 blocks with

full precision one at a time and supports interleaving of color components. The FDCT,

quantization, DC difference and zig-zag ordering proceeds. In order to claim JPEG

compatibility of a product it must include the support for at least the baseline encoding

system.

Progressive Encoding: Unlike baseline encoding, each block in progressive

encoding is encoded in multiple scans, rather than a single one. Each scan follows the

zig zag ordering, quantization and entropy coding, as done in baseline encoding, but

takes much less time to encode and decode, as compared to the single scan of

baseline encoding, since each scan contains only a part of the complete information.

With the first scan, a crude form of image can be reconstructed at the decoder and with

successive scans, the quality of the image is refined. You must have experienced this

while downloading web pages containing images. It is very convenient for browsing

applications, where crude reconstruction quality at the early scans may be sufficient for

quick browsing of a page.

There are two forms of progressive encoding: (a) spectral selection approach and (b)

successive approximation approach. Each of these approaches is described below.

Progressive scanning through spectral selection: In this approach, the first scan

sends some specified low frequency DCT coefficients within each block. The

corresponding reconstructed image obtained at the decoder from the first scan therefore

appears blurred as the details in the forms of high frequency components are missing.

In subsequent scans, bands of coefficients, which are higher in frequency than the

previous scan, are encoded and therefore the reconstructed image gets richer with

details. This procedure is called spectral selection, because each band typically

contains coefficients which occupy a lower or higher part of the frequency spectrum

for that 8x8 block.

The spectral selection approach: here all the 64 DCT coefficients in a block are of 8-bit resolution and successive blocks are stacked

Page 75: Dip

one after the other in the scanning order. The spectral selection approach performs

the slicing of coefficients horizontally and picks up a band of coefficients,

starting with low frequency and encodes them to full resolution.

Progressive scanning through successive approximation: This is also a multiple

scan approach. Here, each scan encodes all the coefficients within a block, but not to

their full quantized accuracy. In the first scan, only the N most significant bits of each

coefficient are encoded (N is specifiable) and in successive scans, the next lower

significant bits of the coefficients are added and so on until all the bits are sent. The

resulting reconstruction quality is good even from the early scans, as the high

frequency coefficients are present from the initial scans.

The successive approximation approach: the organization of the DCT coefficients and the stacking of the blocks are the same as before. The successive approximation approach performs the slicing operation vertically and picks up a group of bits, starting with the most significant ones and progressively considering the less significant ones.

Hierarchical encoding: The hierarchical encoding is also known as the pyramidal

encoding in which the image to be encoded is organized in a pyramidal structure

of multiple resolutions, with the original, that is, the finest resolution image on the

lowermost layer and reduced resolution images on the successive upper layers. Each

layer decreases its resolution with respect to its adjacent lower layer by a factor of two

in either the horizontal or the vertical direction or both. Hierarchical encoding may be

regarded as a special case of progressive encoding with increasing spatial resolution

between the progressive stages.

The steps involved in hierarchical encoding may be summarized below:

• Obtain the reduced resolution images starting with the original and for each,

reduce the resolution by a factor of two, as described above.

• Encode the reduced resolution image from the topmost layer of the

pyramid .

• Decode the above reduced resolution image. Interpolate and up-sample it by a

factor of two horizontally and/or vertically, using the identical interpolation

filter which the decoder must use. Use this interpolated and up-sampled image

as a predicted image for encoding the next lower layer (finer resolution) of the

pyramid.

Page 76: Dip

• Encode the difference between the image in the next lower layer and the

predicted image using baseline, progressive or lossless encoding.

• Repeat the steps of encoding and decoding until the lowermost layer

(finest resolution) of the pyramid is encoded.

Hierarchical encoding (Pyramid structure)

Figure illustrates the hierarchical encoding process. In hierarchical encoding, the image quality at low bit rates surpasses that of the other JPEG encoding methods, but at the cost of an increased number of bits at the full resolution. Hierarchical encoding is used for

applications in which a high-resolution image should be accessed by a low resolution

display device. For example, the image may be printed by a high-resolution printer,

while it is being displayed on a low resolution monitor.

Lossless encoding: The lossless mode of encoding in JPEG follows a simple

predictive coding mechanism, rather than having FDCT + Entropy coder for encoding

and Entropy decoder + IDCT for decoding. Theoretically, it should have been possible

to achieve lossless encoding by eliminating the quantization block, but because of finite

precision representation of the cosine kernels, IDCT can not exactly recover what the

image was before IDCT. This led to a modified and simpler mechanism of predictive

coding.

Page 77: Dip

In lossless encoding, the 8x8 block structure is not used and each pixel is predicted

based on three adjacent pixels, as illustrated in figure using one of the eight possible

predictor modes listed here.

Predictive coding for lossless JPEG

An entropy encoder is then used to encode the difference between each pixel and its prediction from the lossless predictor. Lossless codecs typically produce around 2:1 compression for color images

with moderately complex scenes. Lossless JPEG encoding finds applications in

transmission and storage of medical images.

Selection Value    Prediction
0                  None
1                  A
2                  B
3                  C
4                  A + B − C
5                  A + (B − C)/2
6                  B + (A − C)/2
7                  (A + B)/2

Color image formats and interleaving

The most commonly used color image representation format is RGB, the

encoding of which may be regarded as three independent gray scale image

Page 78: Dip

encoding. However, from efficient encoding considerations, RGB is not the best format.

Color spaces such as YUV, CIELUV, CIELAB and others represent the chromatic

(color) information in two components and the luminance (intensity) information in

one component. These formats are more efficient from image compression

considerations, since our eyes are relatively insensitive to the high frequency

information from the chrominance channels and thus the chrominance components can

be represented at a reduced resolution as compared to the luminance components for

which full resolution representation is necessary.

It is possible to convert an RGB image into YUV, using the following relations:

Y = 0.3R + 0.6G + 0.1B

U = (B − Y)/2 + 0.5

V = (R − Y)/1.6 + 0.5
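A minimal MATLAB sketch of these relations, assuming R, G and B are matrices normalized to the range [0, 1]; the 2:1 sub-sampling of U and V mirrors the 4x4 example discussed next.

% RGB to YUV conversion using the relations above (R, G, B in [0,1]).
Y = 0.3*R + 0.6*G + 0.1*B;
U = (B - Y)/2   + 0.5;
V = (R - Y)/1.6 + 0.5;
% Sub-sample the chrominance components by a factor of two in each direction.
Usub = U(1:2:end, 1:2:end);
Vsub = V(1:2:end, 1:2:end);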

YUV representation of an example 4x4 image

Figure illustrates the YUV representation by considering an example of a 4x4 image.

The Y components are shown as Y1, Y2, ……, Y16. The U and the V components are

sub-sampled by a factor of two in both horizontal and vertical directions and are

therefore of 2x2 size. The three components may be transmitted in either a non-

interleaved manner or an interleaved manner.

The non-interleaved ordering can be shown as

Scan-1: Y1,Y2,Y3,……,Y15,Y16.

Scan-2: U1,U2,U3,U4. Scan-3: V1,V2,V3,V4.

Page 79: Dip

The interleaved ordering encodes in a single scan and proceeds like

Y1, Y2, Y3, Y4, U1, V1, Y5, Y6, Y7, Y8, U2, V2, ………

Interleaving requires minimum of buffering to decode the image at the decoder.

JPEG Performance

Considering color images having 8-bits/sample luminance components and 8- bits/sample for each of

the two chrominance components U and V, each pixel requires 16-bits for representation, if both U and

V are sub-sampled by a factor of two in either of the directions. Using JPEG compression on a wide

variety of such color images, the following image qualities were measured subjectively:

Bits/pixel Quality Compression Ratio

≥ 2 Indistinguishable 8:1

1.5 Excellent 10.7:1

0.75 Very good 21.4:1

0.5 Good 32:1

0.25 Fair 64:1

A more advanced still-image compression standard, JPEG-2000, has evolved in recent times. This will be our topic in the next lesson.


UNIT IV

2 Marks

1. What is image compression?

2. What is Data Compression?

3. What are two main types of Data compression?

4. What is the need for Compression?

5. What are different Compression Methods?

7. Define interpixel redundancy?

8. What is run length coding?

9. Define compression ratio.

10. What are the operations performed by error free compression?

11. What is Variable Length Coding?

12. Define Huffman coding

16 Marks

1. What is data redundancy? Explain the three basic data redundancies.
• Definition of data redundancy
• The three basic data redundancies:

> Coding redundancy

> Interpixel redundancy

> Psycho visual redundancy

2. What is image compression? Explain any four variable length coding

compression schemes.

• Definition of image compression
• Variable Length Coding
* Huffman coding
* B2 code
* Huffman shift
* Truncated Huffman
* Binary shift
* Arithmetic coding

3. Explain about Image compression model?

• The source Encoder and Decoder

• The channel Encoder and Decoder

4. Explain about error-free compression.
a. Variable Length coding

i. Huffman coding

ii. Arithmetic coding

b. LZW coding

c. Bit Plane coding

d. Lossless Predictive coding

5. Explain about Lossy compression?

• Lossy predictive coding

• Transform coding
• Wavelet coding


6. Explain the schematics of image compression standard JPEG.

• Lossy baseline coding system

• Extended coding system
• Lossless independent coding system

7. Explain how compression is achieved in transform coding and explain about DCT

• Block diagram of encoder

• decoder

• Bit allocation

• 1D transform coding

• 2D transform coding, application

• 1D,2D DCT

8. Explain arithmetic coding.
• Non-block code
• One example

9. Explain about image compression standards.
• Binary image compression standards
• Continuous-tone still image compression standards

• Video compression standards

10. Discuss about MPEG standard and compare with JPEG

• Motion Picture Experts Group

1. MPEG-1

2. MPEG-2

3. MPEG-4

• Block diagram

• I-frame

• P-frame

• B-frame


UNIT V IMAGE SEGMENTATION AND REPRESENTATION

Edge detection – Thresholding – Region based segmentation – Boundary representation –

Chain codes– Polygonal approximation – Boundary segments – Boundary descriptors – Simple

descriptors– Fourier descriptors – Regional descriptors – Simple descriptors– Texture.

What is thresholding segmentation?

Thresholding segmentation is a method that separates an image into two meaningful regions, foreground and background, through a selected threshold value T. If the image is a grey image, T is an integer in the range [0..K], where K is the maximum intensity value. For example, if the image is an 8-bit grey image, K takes the value 255 and T is in the range [0..255].

Whenever the value of T is decided, the segmentation procedure is given by the following equation:

GB(x,y) = 1, if G(x,y) ≥ T
GB(x,y) = 0, if G(x,y) < T        (1)

In equation (1), G(x,y) indicates the intensity value of pixel (x,y) in the grey image G. GB is the segmentation result. Actually it

forms a binary image, in which each value of GB(x,y) gives the category (foreground or background) that the corresponding pixel

belongs to. If GB(x,y) = 1, then pixel (x,y) in the image G is classified as a foreground pixel, otherwise it is classified as a

background pixel.

Equation (1) is formulated under the assumption that foreground pixels in the image G have relatively high intensity values and

background pixels take low intensity values. Of course you can reverse the equation when you need to set the low intensity region

as the foreground.
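A minimal sketch of equation (1) (illustration only), assuming the grey image is held in a NumPy array G and using the high-intensity-foreground convention:

import numpy as np

def threshold_segment(G, T):
    # Equation (1): foreground (1) where G(x,y) >= T, background (0) otherwise.
    return (G >= T).astype(np.uint8)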

How to select the value of T

The major problem of thresholding segmentation is how to select the optimal value of the threshold T. Usually, in order to get the optimal value of T, we need to statistically analyze the so-called "histogram" (or "intensity histogram") of the input grey image G. Before talking about the algorithm, we first list all the notations and statistical definitions related to the histogram as follows.

Basic notations

G(x,y): The input gray image that we want to segment.

GB(x,y): The segmentation result of G. It is a binary image; the value of GB(x,y) is either 0 or 1, indicating that the corresponding pixel (x,y) in G belongs to the background or the foreground respectively.


K: The maximum possible intensity value defined by G. If G is an 8-bit gray image, then K takes the value 255.

T: The thresholding value. It is an integer within the range [0..K].

N: The total number of pixels in G. If G has width w and height h, then of course N = w × h.

Definition of histogram and normalized histogram

HG is the intensity histogram of image G; it maps each intensity value to an integer. The value of HG(i) indicates the number of pixels in G that take the intensity value i, where i ∈ [0..K] and K is the maximum intensity value mentioned above (for example K = 255 for 8-bit grey images). Obviously HG(i) is an integer within the range [0..N], where N is the total number of pixels in G as mentioned above. Based on the definition of the histogram HG, we can define the so-called normalized histogram PG as follows:

PG(i) = HG(i) / N (2)

In equation (2), the value of each PG(i) indicates the percentage of pixels in G that take the intensity value i. Clearly PG(i) is a real value within the range [0..1]. The main reason for introducing the normalized histogram is that we sometimes need to compare the histograms of two images that contain different numbers of pixels, and the normalized histogram makes this kind of comparison meaningful.
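A short sketch (illustration only) of HG and PG for an 8-bit grey image held in a NumPy array G; np.bincount is one convenient way to obtain the counts:

import numpy as np

def histograms(G, K=255):
    # HG(i): number of pixels with intensity i; PG(i) = HG(i) / N (equation (2)).
    HG = np.bincount(G.ravel(), minlength=K + 1)
    PG = HG / G.size
    return HG, PG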

Statistical analysis of the normalized histogram

Assume the current thresholding value is T, which separates the input image G into two regions, foreground and background, according to equation (1). The frequency of the background, ωB(T), and the frequency of the foreground, ωF(T), are defined as follows:

ωB(T) = Σ(i = 0 to T) PG(i),   ωF(T) = Σ(i = T+1 to K) PG(i)        (3)

Of course the frequency of the entire image G is ωB(T) + ωF(T) = 1, no matter what value T takes. The mean intensity values of the background and the foreground, μB(T) and μF(T), are calculated as:

μB(T) = [Σ(i = 0 to T) i · PG(i)] / ωB(T),   μF(T) = [Σ(i = T+1 to K) i · PG(i)] / ωF(T)        (4)

The mean intensity value of the entire image can be calculated as μ = ωB(T) · μB(T) + ωF(T) · μF(T). Clearly, no matter what value T takes, μ stays the same. The intensity variances of the background and the foreground, σ²B(T) and σ²F(T), are defined as:

σ²B(T) = [Σ(i = 0 to T) (i − μB(T))² · PG(i)] / ωB(T),   σ²F(T) = [Σ(i = T+1 to K) (i − μF(T))² · PG(i)] / ωF(T)        (5)

Having the definitions of the variances of the background and the foreground, it is time to define the so-called "within-class" variance, σ²within:

σ²within(T) = ωB(T) · σ²B(T) + ωF(T) · σ²F(T)        (6)

We can also define the so-called "between-class" variance, σ²between:

σ²between(T) = σ² − σ²within(T) = ωB(T) · ωF(T) · [μB(T) − μF(T)]²        (7)

In the above equation, σ² indicates the intensity variance of the entire image G, and it is calculated as:

σ² = Σ(i = 0 to K) (i − μ)² · PG(i)        (8)

Obviously, given the image G, σ² is a constant value, independent of the selection of the value of T.

Otsu's algorithm

Otsu's algorithm is simple. We let T try all the intensity values from 0 to K and choose the one that gives the minimum "within-class" variance σ²within as the optimal thresholding value. Formally speaking:

Optimal value of T = TOpt, where σ²within(TOpt) = min{ σ²within(T) : 0 ≤ T ≤ K }        (9)

As we said before, σ² = σ²within(T) + σ²between(T), and σ² is independent of the selection of T; therefore, minimizing σ²within means maximizing σ²between. So the optimal value of T can also be taken as:

Optimal value of T = TOpt, where σ²between(TOpt) = max{ σ²between(T) : 0 ≤ T ≤ K }        (10)

In fact, equation (10) is the usual way to find the optimal thresholding value, because for each T the calculation of σ²between only needs ωB, ωF, μB and μF according to equation (7). And these values can be updated iteratively:

Initially, T = 0:
Calculate the mean intensity of the entire image, μ;


ωB(0) = PG(0); ωF(0) = 1 − ωB(0) = 1 − PG(0);
μB(0) = 0; μF(0) = μ / ωF(0) (be careful: if ωF(0) = 0, then μF(0) = 0).

Iteratively, T = T+1:
ωB(T+1) = ωB(T) + PG(T+1); ωF(T+1) = 1 − ωB(T+1);
IF ωB(T+1) = 0, THEN μB(T+1) = 0;
ELSE μB(T+1) = [ωB(T) · μB(T) + (T+1) · PG(T+1)] / ωB(T+1);
IF ωF(T+1) = 0, THEN μF(T+1) = 0;
ELSE μF(T+1) = [μ − ωB(T+1) · μB(T+1)] / ωF(T+1);
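As an illustration (not part of the original notes), a compact Python sketch of Otsu's method that mirrors the iterative update above: it accumulates ωB and ωB·μB, then maximizes the between-class variance of equation (7). The 8-bit grey image is assumed to be held in a NumPy array G.

import numpy as np

def otsu_threshold(G, K=255):
    # Normalized histogram PG(i), i = 0..K (equation (2)).
    PG = np.bincount(G.ravel(), minlength=K + 1) / G.size
    mu = np.sum(np.arange(K + 1) * PG)        # mean intensity of the entire image
    best_T, best_between = 0, -1.0
    omega_B, weighted_sum_B = 0.0, 0.0        # weighted_sum_B = omega_B * mu_B
    for T in range(K + 1):
        omega_B += PG[T]
        weighted_sum_B += T * PG[T]
        omega_F = 1.0 - omega_B
        if omega_B == 0.0 or omega_F == 0.0:
            continue
        mu_B = weighted_sum_B / omega_B
        mu_F = (mu - weighted_sum_B) / omega_F
        between = omega_B * omega_F * (mu_B - mu_F) ** 2   # equation (7)
        if between > best_between:
            best_between, best_T = between, T
    return best_T

The returned TOpt can then be used directly in equation (1) to produce the binary image GB.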

Part 2: Connected Object Recognition

After obtaining the binary image GB, it is time to count the number of connected objects in the foreground region (or in the background region if you want; assume here we are interested in the foreground — for the background region you just need to reverse equation (1)). The method used to count the connected objects in the foreground region is based on Warshall's algorithm. The entire connected-object counting algorithm is described as follows:

Step 1: Create a label image GL with the same size (width and height) as GB, and initially set each GL(x,y) = –1. Create a variable,

current_label, for recording the current available label and initially current_label = 0.

Step 2: Scan the binary image GB sequentially and update the label image GL as follows:

FOR each y FROM 0 TO height – 1
FOR each x FROM 0 TO width – 1
IF pixel (x,y) is a foreground pixel, which means GB(x,y) = 1,
THEN
Check the 8 neighborhood pixels around the pixel (x,y);
IF some neighborhood pixel (x’,y’) has a nonnegative label in the label image GL, which means GL(x’,y’) >= 0,
THEN
SET GL(x,y) = GL(x’,y’);   (any one such labelled neighbour will do)
ELSE   (no neighbour is labelled yet)
SET GL(x,y) = current_label;
SET current_label = current_label + 1;
END IF
END IF
END FOR
END FOR

Step 3: Build the 2D transit matrix M_T. Initially create an empty matrix:

M_T[0..current_label–1][0..current_label–1].

Clearly it is a 2D array of size current_label × current_label, and each element of the matrix is first set to 0. Then assign 1 to each element located on the diagonal, i.e. set M_T[i][i] = 1, where i is from 0 to current_label–1.

After the above initialization, we need to update the matrix M_T according to the label image GL and let M_T become the transit

matrix of GL. The update procedure is given as follows:

FOR each y FROM 0 TO height – 1

FOR each x FROM 0 TO width –1

IF pixel (x,y) is a foreground pixel, which means GB(x,y) = 1,

THEN

Check the 8 neighborhood pixels around pixel (x,y).

FOR each neighborhood pixel (x’,y’)

IF its label is nonnegative, which means GL(x’,y’)>=0

THEN

SET M_T[GL(x,y)][GL(x’,y’)] = 1;

SET M_T[GL(x’,y’)][GL(x,y)] = 1;

END IF

END FOR

END IF

END FOR

END FOR


After the above procedure, you get the updated matrix M_T, which represents the transit relationship in GL.

Step 4: Calculate the transit closure matrix, M_TC, of M_T. This calculation of the transit closure matrix is based on Warshall's algorithm, which is an iterative procedure described as follows:

(1) Initially create two temporary matrices M_0 and M_1, which have the same size as the matrix M_T. Copy each value in M_T into the corresponding position in M_0 and M_1, which is:

FOR i FROM 0 TO current_label–1

FOR j FROM 0 TO current_label–1

SET M_0[i][j] = M_1[i][j] = M_T[i][j];

(2) Update the matrix M_1 using the transitivity law, which is:

FOR i FROM 0 TO current_label–1

FOR j FROM 0 TO current_label–1

FOR k FROM 0 TO current_label–1

IF M_1[i][j] = 1 AND M_1[j][k] = 1

THEN

SET M_1[i][k] = 1;

SET M_1[k][i] = 1;

END IF

END FOR

END FOR

END FOR

(3) Compare M_1 and M_0 to see if they are exactly the same, which means each element from M_1 has the same value as

the corresponding element from M_0. If so, set M_TC = M_1, which means we got the transit closure matrix. If not,

copy M_1 to M_0 and go back to (2).

Step 5: Count the number of connected objects (or the number of equivalence classes) in the transit closure matrix M_TC. It is not hard to see that the number of connected objects equals the number of distinct rows (or columns) in the matrix M_TC. Assume M_TC[i] and M_TC[j] are two rows of the matrix M_TC, where i ≠ j. We say these two rows are distinct if and only if there exists at least one k ∈ [0..current_label–1] such that M_TC[i][k] ≠ M_TC[j][k].

Step 6: Output the number of connected objects (or the number of distinct rows of M_TC) and return.
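A compact Python sketch of Steps 1–6, following the scheme above (label propagation from neighbours, the transit matrix, repeated transitivity passes until the closure is stable, and counting distinct rows). The function name and the use of NumPy are choices made here for illustration only.

import numpy as np

def count_connected_objects(GB):
    h, w = GB.shape
    GL = -np.ones((h, w), dtype=int)          # Step 1: label image, all -1
    current_label = 0
    neigh = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

    # Step 2: copy a labelled neighbour's label, or create a new one.
    for y in range(h):
        for x in range(w):
            if GB[y, x] != 1:
                continue
            labels = [GL[y + dy, x + dx] for dy, dx in neigh
                      if 0 <= y + dy < h and 0 <= x + dx < w and GL[y + dy, x + dx] >= 0]
            if labels:
                GL[y, x] = labels[0]
            else:
                GL[y, x] = current_label
                current_label += 1

    if current_label == 0:
        return 0

    # Step 3: transit matrix M_T (diagonal 1, plus 1 for touching labels).
    M_T = np.eye(current_label, dtype=int)
    for y in range(h):
        for x in range(w):
            if GB[y, x] != 1:
                continue
            for dy, dx in neigh:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and GL[yy, xx] >= 0:
                    M_T[GL[y, x], GL[yy, xx]] = 1
                    M_T[GL[yy, xx], GL[y, x]] = 1

    # Step 4: repeat the transitivity pass until the matrix stops changing.
    M_1 = M_T.copy()
    while True:
        M_0 = M_1.copy()
        for i in range(current_label):
            for j in range(current_label):
                for k in range(current_label):
                    if M_1[i, j] == 1 and M_1[j, k] == 1:
                        M_1[i, k] = 1
                        M_1[k, i] = 1
        if np.array_equal(M_1, M_0):
            break
    M_TC = M_1

    # Steps 5-6: the number of connected objects is the number of distinct rows.
    return len({tuple(row) for row in M_TC})

For the worked example that follows, where the transit closure matrix ends up with two distinct rows, this function would return 2.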

One example of transit closure calculation (step 4)

Assume that after Step 3 we got the transit matrix M_T. In Step 4 (1), we initially set up two temporary matrices M_0 and M_1 and copy M_T to these two matrices, as shown in Figure 1. Then we apply the operations on the matrix M_1 described in Step 4 (2) and get an updated M_1, as shown in Figure 2:

Figure 1: Copy M_T to M_0 and M_1.
Figure 2: Updated M_1 by applying Step 4 (2).

Then, according to Step 4 (3), we compare matrix M_1 (Figure 2) with matrix M_0 (Figure 1). We find that M_1 is different from M_0. Therefore we copy M_1 to M_0, as shown in Figure 3. Then we go back to (2), do the same operations on M_1 again, and get M_1 updated again, as shown in Figure 4.


Figure 3: Copy M_1 to M_0. Figure 4: M_1 is updated again.

Then we compare M_1 (Figure 4) and M_0 (Figure 3). Again we find that they are different. So we need to copy M_1 to M_0 (as

shown in Figure 5) and go back to (2) to get M_1 updated again (as shown in Figure 6).

Figure 5: Copy M_1 to M_0. Figure 6: M_1 is updated again.

This time we find that M_1 did not change, which means M_0 (Figure 5) and M_1 (Figure 6) are identical. So we say the current

M_1 is the transit closure matrix that we want, and set M_TC = M_1. Furthermore we discover there are two distinct rows in the

transit closure matrix: (1111100) and (0000011). It means that there are two connected objects.
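To make Step 5 concrete for this example: assuming the seven labels split into the two classes implied by the rows quoted above (labels 0–4 in one object, labels 5–6 in the other; the full matrix below is an assumption reconstructed from those two rows), counting the distinct rows gives the object count.

# Transit closure matrix assumed from the two distinct rows (1111100) and (0000011).
M_TC = [
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
]
print(len({tuple(row) for row in M_TC}))   # prints 2, i.e. two connected objects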

Detection of Discontinuities

Edge Linking and Boundary Detection

Thresholding

Region-Based Segmentation

Segmentation by Morphological Watersheds

The Use of Motion in Segmentation


UNIT V

2 Marks

1. What is segmentation?
2. Write the applications of segmentation.

3. What are the three types of discontinuity in digital image?

4. How the derivatives are obtained in edge detection during formulation?

5. Write about linking edge points.

6. What are the two properties used for establishing similarity of edge pixels?

7. Define Gradient Operator?

8. Define region growing.
9. Define compactness.

16 Marks

1. What is image segmentation? Explain in detail.
• Definition – image segmentation
• Discontinuity – Point, Line, Edge
• Similarity – Thresholding, Region Growing, Splitting and Merging

2. Explain Edge Detection in detail.
* Basic formulation
* Gradient Operators

* Laplacian Operators

3. Define Thresholding and explain the various methods of thresholding in detail?

• Foundation

• The role of illumination

• Basic global thresholding

• Basic adaptive thresholding

• Optimal global & adaptive thresholding.

4. Discuss about region-based image segmentation techniques. Compare with threshold-based techniques.
* Region Growing
* Region splitting and merging

* Comparison

5. Define and explain the various representation approaches?

• chain codes

• Polygon approximations

• Signature

• Boundary segments
• Skeletons.

6. Explain Boundary descriptors.

• Simple descriptors.

• Fourier descriptors.


7. Explain regional descriptors

• Simple descriptors

• Texture
i. Statistical approach

ii. Structural approach

iii. Spectral approach

8. Explain the two techniques of region representation.
• Chain codes
• Polygonal approximation

9. Explain the segmentation techniques that are based on finding the regions

directly.
• Edge detection, line detection
• Region growing

• Region splitting

• region merging

10. How is a line detected? Explain through the operators.
• Types of line masks

1. horizontal

2. vertical

3. +45˚,-45˚
