
August 2010

Master of Computer Application (MCA) – Semester 6

MC0086 – Digital Image Processing – 4 Credits

(Book ID: B1007)

Assignment Set – 1 (60 Marks)

1. Explain the following:

    A) Fundamental Steps in Digital Image Processing

Processing of a digital image involves the following steps carried out in sequence: image acquisition, image enhancement, image restoration, color image processing, wavelets and multiresolution processing, compression, morphological processing, segmentation, representation and description, and finally object recognition. Image acquisition is the first process. It requires an imaging sensor and the capability to digitize the signal produced by the sensor. The sensor could be a monochrome or a color TV camera that produces an entire image of the problem domain every 1/30 of a second. The imaging sensor could also be a line-scan camera that produces a single image line at a time. If the output of the camera or other imaging sensor is not already in digital form, an analog-to-digital converter digitizes it. Note that acquisition could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage involves preprocessing, such as scaling.

Image enhancement is one of the simplest and most appealing areas of digital image processing. Basically, the idea behind enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image. A familiar example of enhancement is when we increase the contrast of an image because “it looks better”. It is important to keep in mind that enhancement is a very subjective area of image processing. Image restoration is an area that also deals with improving the appearance of an image. However, unlike enhancement, which is subjective, image restoration is objective, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation. Color image processing is an area that has been gaining in importance because of the significant increase in the use of digital images on the Internet. Color is used as the basis for extracting features of interest in an image. Wavelets are the foundation for representing images in various degrees of resolution. In particular, this is used for image data compression and for pyramidal representation, in which images are subdivided successively into smaller regions.
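As a small illustration of the contrast enhancement idea mentioned above, the following sketch stretches the gray levels of an image to the full 8-bit range. It is a minimal Python/NumPy example under the assumption that the image is available as a grayscale array; it is not a method prescribed by the text.

    import numpy as np

    def stretch_contrast(img: np.ndarray) -> np.ndarray:
        """Linearly stretch gray levels to span the full 0..255 range."""
        img = img.astype(np.float64)
        lo, hi = img.min(), img.max()
        if hi == lo:                      # flat image: nothing to stretch
            return np.zeros_like(img, dtype=np.uint8)
        out = (img - lo) * 255.0 / (hi - lo)
        return out.round().astype(np.uint8)

    # Example: a dull, low-contrast 3x3 image becomes full-range after stretching.
    dull = np.array([[100, 110, 120],
                     [105, 115, 125],
                     [110, 120, 130]], dtype=np.uint8)
    print(stretch_contrast(dull))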

Compression deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it. Although storage technology has improved significantly over the past decade, the same cannot be said for transmission capacity. This is true particularly in uses of the Internet, which are characterized by significant pictorial content. Image compression is familiar (perhaps inadvertently) to most users of computers in the form of image file extensions. Morphological processing deals with tools for extracting image components that are useful in the representation and description of shape. Segmentation procedures partition an image into its constituent parts or objects.


In general, autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged segmentation procedure brings the process a long way toward successful solution of imaging problems that require objects to be identified individually. On the other hand, weak or erratic segmentation algorithms almost always guarantee eventual failure. In terms of character recognition, the key role of segmentation is to extract individual characters and words from the background. The next stage is representation and description. Here, the first decision that must be made is whether the data should be represented as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics, such as corners and inflections. Regional representation is appropriate when the focus is on internal properties, such as texture or skeletal shape. Choosing a representation is only part of the solution for transforming raw data into a form suitable for subsequent computer processing. Description, also called feature selection, deals with extracting attributes that result in some quantitative information of interest or are basic for differentiating one class of objects from another. Recognition is the process that assigns a label (e.g., “vehicle”) to an object based on its descriptors.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database. This knowledge may be as simple as detailing regions of an image where the information of interest is known to be located, thus limiting the search that has to be conducted in seeking that information. The knowledge base can also be quite complex, such as an interrelated list of all major possible defects in a materials inspection problem or an image database containing high-resolution satellite images of a region in connection with change-detection applications. In addition to guiding the operation of each processing module, the knowledge base also controls the interaction between the modules.

    B) Components of an Image Processing System

Ans:

With reference to sensing, two elements are required to acquire digital images. The first is a physical device that is sensitive to the energy radiated by the object we wish to image. The second, called a digitizer, is a device for converting the output of the physical sensing device into digital form. For instance, in a digital video camera, the sensors produce an electrical output proportional to light intensity. The digitizer converts these outputs to digital data. Specialized image processing hardware usually consists of the digitizer just mentioned plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU), which performs arithmetic and logical operations in parallel on entire images. This type of hardware is sometimes called a front-end subsystem, and its most distinguishing characteristic is speed. In other words, this unit performs functions that require fast data throughputs (e.g., digitizing and averaging video images at 30 frames/s) that the typical main computer cannot handle.

The computer in an image processing system is a general-purpose computer and can range from a PC to a supercomputer. Software for image processing consists of specialized modules that perform specific tasks. A well-designed package also includes the capability for the user to write code that, as a minimum, utilizes the specialized modules. More sophisticated software packages allow the integration of those modules and general-purpose software commands from at least one computer language.

Mass storage capability is a must in image processing applications. An image of size 1024*1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed. When dealing with thousands, or even millions, of images, providing adequate storage in an image processing system can be a challenge. Digital storage for image processing applications falls into three principal categories: (1) short-term storage for use during processing, (2) on-line storage for relatively fast recall, and (3) archival storage, characterized by infrequent access. Storage is measured in bytes (eight bits), Kbytes (one thousand bytes), Mbytes (one million bytes), Gbytes (meaning giga, or one billion, bytes), and Tbytes (meaning tera, or one trillion, bytes).

Image displays in use today are mainly color (preferably flat screen) TV monitors. Monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system. Seldom are there requirements for image display applications that cannot be met by display cards available commercially as part of the computer system. In some cases, it is necessary to have stereo displays, and these are implemented in the form of headgear containing two small displays embedded in goggles worn by the user.

Hardcopy devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital units, such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written material. For presentations, images are displayed on film transparencies or in a digital medium if image projection equipment is used. The latter approach is gaining acceptance as the standard for image presentations.

Networking is almost a default function in any computer system in use today. Because of the large amount of data inherent in image processing applications, the key consideration in image transmission is bandwidth. In dedicated networks, this typically is not a problem, but communications with remote sites via the Internet are not always as efficient. Fortunately, this situation is improving quickly as a result of optical fiber and other broadband technologies.

2. Explain the following:

    A) Light and the Electromagnetic Spectrum

Ans:

In 1666, Sir Isaac Newton discovered that when a beam of sunlight is passed through a glass prism, the emerging beam of light is not white but consists instead of a continuous spectrum of colors ranging from violet at one end to red at the other. The range of colors we perceive in visible light represents a very small portion of the electromagnetic spectrum. On one end of the spectrum are radio waves with wavelengths billions of times longer than those of visible light. At the other end of the spectrum are gamma rays with wavelengths millions of times smaller than those of visible light.

The electromagnetic spectrum can be expressed in terms of wavelength, frequency, or energy. Wavelength (λ) and frequency (ν) are related by the expression

λ = c / ν (2.3-1)

where c is the speed of light (2.998*10^8 m/s). The energy of the various components of the electromagnetic spectrum is given by the expression


E = hν (2.3-2)

where h is Planck’s constant. The units of wavelength are meters, with the terms microns (denoted μm and equal to 10^–6 m) and nanometers (10^–9 m) being used frequently. Frequency is measured in Hertz (Hz), with one Hertz being equal to one cycle of a sinusoidal wave per second.
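As a quick numerical illustration of the two relations above, the following sketch computes the frequency and photon energy for an assumed wavelength of 0.55 μm (green light); the constants are the usual published values.

    # Relating wavelength, frequency and photon energy (lambda = c/nu, E = h*nu).
    # The 0.55 um wavelength is just an illustrative choice (green light).
    c = 2.998e8        # speed of light, m/s
    h = 6.626e-34      # Planck's constant, J*s

    wavelength = 0.55e-6              # 0.55 um, in metres
    frequency = c / wavelength        # nu = c / lambda
    energy = h * frequency            # E = h * nu

    print(f"frequency = {frequency:.3e} Hz")   # ~5.45e14 Hz
    print(f"energy    = {energy:.3e} J")       # ~3.61e-19 J per photon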

Light is a particular type of electromagnetic radiation that can be seen and sensed by the human eye. The visible band of the electromagnetic spectrum spans the range from approximately 0.43 μm (violet) to about 0.79 μm (red). For convenience, the color spectrum is divided into six broad regions: violet, blue, green, yellow, orange, and red. No color (or other component of the electromagnetic spectrum) ends abruptly, but rather each range blends smoothly into the next. The colors that humans perceive in an object are determined by the nature of the light reflected from the object. A body that reflects light and is relatively balanced in all visible wavelengths appears white to the observer. However, a body that favors reflectance in a limited range of the visible spectrum exhibits some shades of color. For example, green objects reflect light with wavelengths primarily in the 500 to 570 nm range while absorbing most of the energy at other wavelengths. Light that is void of color is called achromatic or monochromatic light. The only attribute of such light is its intensity, or amount. The term gray level is generally used to describe monochromatic intensity because it ranges from black, to grays, and finally to white. Chromatic light spans the electromagnetic energy spectrum from approximately 0.43 to 0.79 μm, as noted previously. Three basic quantities are used to describe the quality of a chromatic light source: radiance, luminance, and brightness. Radiance is the total amount of energy that flows from the light source, and it is usually measured in watts (W). Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source. For example, light emitted from a source operating in the far infrared region of the spectrum could have significant energy (radiance), but an observer would hardly perceive it; its luminance would be almost zero.

At the short-wavelength end of the electromagnetic spectrum, we have gamma rays and hard X-rays. Gamma radiation is important for medical and astronomical imaging, and for imaging radiation in nuclear environments. Hard (high-energy) X-rays are used in industrial applications. Moving still higher in wavelength, we encounter the infrared band, which radiates heat, a fact that makes it useful in imaging applications that rely on “heat signatures.” The part of the infrared band close to the visible spectrum is called the near-infrared region. The opposite end of this band is called the far-infrared region. This latter region blends with the microwave band. This band is well known as the source of energy in microwave ovens, but it has many other uses, including communication and radar. Finally, the radio wave band encompasses television as well as AM and FM radio. In the higher energies, radio signals emanating from certain stellar bodies are useful in astronomical observations.

    B) Image Sensing and Acquisition

Ans:

The types of images are generated by the combination of an “illumination” source and the reflection or absorption of energy from that source by the elements of the “scene” being imaged. For example, the illumination may originate from a source of electromagnetic energy such as radar, infrared, or X-ray energy. But, as noted earlier, it could originate from less traditional sources, such as ultrasound or even a computer-generated illumination pattern. Similarly, the scene elements could be familiar objects, but they can just as easily be molecules, buried rock formations, or a human brain. We could even image a source, such as acquiring images of the sun. Depending on the nature of the source, illumination energy is reflected from, or transmitted through, objects. An example in the first category is light reflected from a planar surface. An example in the second category is when X-rays pass through a patient’s body for the purpose of generating a diagnostic X-ray film. In some applications, the reflected or transmitted energy is focused onto a photo converter (e.g., a phosphor screen), which converts the energy into visible light.

The principal sensor arrangements used to transform illumination energy into digital images are the single sensor, the sensor strip and the sensor array. In each arrangement, incoming energy is transformed into a voltage by the combination of input electrical power and sensor material that is responsive to the particular type of energy being detected. The output voltage waveform is the response of the sensor(s), and a digital quantity is obtained from each sensor by digitizing its response. In this section, we look at the principal modalities for image sensing and generation.

2.4.1 Image Acquisition using a Single Sensor

The most common sensor of this type is the photodiode, which is constructed of silicon materials and whose output voltage waveform is proportional to light. The use of a filter in front of a sensor improves selectivity. For example, a green (pass) filter in front of a light sensor favors light in the green band of the color spectrum; as a consequence, the sensor output will be stronger for green light than for other components in the visible spectrum. In order to generate a 2-D image using a single sensor, there have to be relative displacements in both the x- and y-directions between the sensor and the area to be imaged. One arrangement used in high-precision scanning mounts a film negative onto a drum whose mechanical rotation provides displacement in one dimension; the single sensor is mounted on a lead screw that provides motion in the perpendicular direction. Since mechanical motion can be controlled with high precision, this method is an inexpensive (but slow) way to obtain high-resolution images. Other similar mechanical arrangements use a flat bed, with the sensor moving in two linear directions. These types of mechanical digitizers are sometimes referred to as microdensitometers. Another example of imaging with a single sensor places a laser source coincident with the sensor; moving mirrors are used to control the outgoing beam in a scanning pattern and to direct the reflected laser signal onto the sensor.

2.4.2 Image Acquisition using Sensor Strips

A geometry that is used much more frequently than single sensors consists of an in-line arrangement of sensors in the form of a sensor strip. The strip provides imaging elements in one direction. Motion perpendicular to the strip provides imaging in the other direction. This is the type of arrangement used in most flat bed scanners. Sensing devices with 4000 or more in-line sensors are possible. In-line sensors are used routinely in airborne imaging applications, in which the imaging system is mounted on an aircraft that flies at a constant altitude and speed over the geographical area to be imaged. One-dimensional imaging sensor strips that respond to various bands of the electromagnetic spectrum are mounted perpendicular to the direction of flight. The imaging strip gives one line of an image at a time, and the motion of the strip completes the other dimension of a two-dimensional image. Lenses or other focusing schemes are used to project the area to be scanned onto the sensors. Sensor strips mounted in a ring configuration are used in medical and industrial imaging to obtain cross-sectional (“slice”) images of 3-D objects. A rotating X-ray source provides illumination, and the portion of the sensors opposite the source collects the X-ray energy that passes through the object (the sensors obviously have to be sensitive to X-ray energy). This is the basis for medical and industrial computerized axial tomography (CAT) imaging. It is important to note that the output of the sensors must be processed by reconstruction algorithms whose objective is to transform the sensed data into meaningful cross-sectional images.

2.4.3 Image Acquisition using Sensor Arrays

Individual sensors can be arranged in the form of a 2-D array. Numerous electromagnetic and some ultrasonic sensing devices are frequently arranged in an array format. This is also the predominant arrangement found in digital cameras. A typical sensor for these cameras is a CCD array, which can be manufactured with a broad range of sensing properties and can be packaged in rugged arrays of 4000*4000 elements or more. CCD sensors are used widely in digital cameras and other light sensing instruments. The response of each sensor is proportional to the integral of the light energy projected onto the surface of the sensor, a property that is used in astronomical and other applications requiring low noise images.

The first function performed by the imaging system is to collect the incoming energy and focus it onto an image plane. If the illumination is light, the front end of the imaging system is a lens, which projects the viewed scene onto the lens focal plane. The sensor array, which is coincident with the focal plane, produces outputs proportional to the integral of the light received at each sensor. Digital and analog circuitry sweeps these outputs and converts them to a video signal, which is then digitized by another section of the imaging system. The output is a digital image.

3. Explain the following with respect to Basic concepts in Sampling and Quantization:

     A) Representation of Digital Images 

Ans:

The result of sampling and quantization is a matrix of real numbers. Two principal ways are used to represent digital images. Assume that an image f(x, y) is sampled so that the resulting digital image has M rows and N columns. The values of the coordinates (x, y) now become discrete quantities. For notational clarity and convenience, we shall use integer values for these discrete coordinates. Thus, the values of the coordinates at the origin are (x, y) = (0, 0). The next coordinate values along the first row of the image are represented as (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row.

The notation used above allows us to write the complete M*N digital image in the following compact matrix form:

f(x, y) = [ f(0, 0)     f(0, 1)     ...  f(0, N-1)
            f(1, 0)     f(1, 1)     ...  f(1, N-1)
            ...
            f(M-1, 0)   f(M-1, 1)   ...  f(M-1, N-1) ]

The right side of this equation is by definition a digital image. Each element of this matrix array is called an image element, picture element, pixel, or pel. In some discussions, it is advantageous to use a more traditional matrix notation to denote a digital image and its elements:

A = [ a0,0      a0,1      ...  a0,N-1
      a1,0      a1,1      ...  a1,N-1
      ...
      aM-1,0    aM-1,1    ...  aM-1,N-1 ]

Clearly, aij = f(x=i, y=j) = f(i, j), so the two matrices are identical. Expressing sampling and quantization in more formal mathematical terms can be useful at times. Let Z and R denote the set of integers and the set of real numbers, respectively. The sampling process may be viewed as partitioning the xy plane into a grid, with the coordinates of the center of each grid element being a pair of elements from the Cartesian product Z2, which is the set of all ordered pairs of elements (zi, zj), with zi and zj being integers from Z. Hence, f(x, y) is a digital image if (x, y) are integers from Z2 and f is a function that assigns a gray-level value (that is, a real number from the set of real numbers, R) to each distinct pair of coordinates (x, y). This functional assignment obviously is the quantization process described earlier. If the gray levels also are integers (as usually is the case), Z replaces R, and a digital image then becomes a 2-D function whose coordinates and amplitude values are integers.

This digitization process requires decisions about values for M, N, and for the number, L, of discrete gray levels allowed for each pixel. There are no requirements on M and N, other than that they have to be positive integers. However, due to processing, storage, and sampling hardware considerations, the number of gray levels typically is an integer power of 2:

L = 2^k (Equation 1)

We assume that the discrete levels are equally spaced and that they are integers in the interval [0, L-1].

Sometimes the range of values spanned by the gray scale is called the dynamic range of an image, and we refer to images whose gray levels span a significant portion of the gray scale as having a high dynamic range. When an appreciable number of pixels exhibit this property, the image will have high contrast. Conversely, an image with low dynamic range tends to have a dull, washed-out gray look. The number, b, of bits required to store a digitized image is

b = M*N*k (Equation 2)

When M = N, this equation becomes

b = N^2*k (Equation 3)
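A small sketch of Equation 2 applied to the 1024*1024, 8-bit example quoted above; the function name is just for illustration.

    # Storage required for a digital image: b = M * N * k bits (Equation 2).
    def image_storage_bits(M: int, N: int, k: int) -> int:
        return M * N * k

    # The 1024x1024, 8-bit example mentioned earlier: 8,388,608 bits = 1 MB (uncompressed).
    bits = image_storage_bits(1024, 1024, 8)
    print(bits, "bits =", bits // 8, "bytes")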

B) Spatial and Gray-Level Resolution

Ans:

Sampling is the principal factor determining the spatial resolution of an image. Basically, spatial resolution is the smallest discernible detail in an image. Suppose that we construct a chart with vertical lines of width W and with the space between the lines also having width W. A line pair consists of one such line and its adjacent space. Thus, the width of a line pair is 2W, and there are 1/2W line pairs per unit distance. A widely used definition of resolution is simply the smallest number of discernible line pairs per unit distance; for example, 100 line pairs per millimeter. Gray-level resolution similarly refers to the smallest discernible change in gray level, but measuring discernible changes in gray level is a highly subjective process. We have considerable discretion regarding the number of samples used to generate a digital image, but this is not true for the number of gray levels. Due to hardware considerations, the number of gray levels is usually an integer power of 2, as mentioned in the previous section. The most common number is 8 bits, with 16 bits being used in some applications where enhancement of specific gray-level ranges is necessary. Sometimes we find systems that can digitize the gray levels of an image with 10 or 12 bits of accuracy, but these are exceptions rather than the rule.

When an actual measure of physical resolution relating pixels to the level of detail they resolve in the original scene is not necessary, it is not uncommon to refer to an L-level digital image of size M*N as having a spatial resolution of M*N pixels and a gray-level resolution of L levels. We will use this terminology from time to time, making a reference to actual resolvable detail only when necessary for clarity. Consider, for example, an image of size 1024*1024 pixels whose gray levels are represented by 8 bits. This image can be subsampled by deleting the appropriate number of rows and columns from the original image. For example, a 512*512 image is obtained by deleting every other row and column from the 1024*1024 image, a 256*256 image is generated by deleting every other row and column of the 512*512 image, and so on, with the number of allowed gray levels kept at 256. Such images show the dimensional proportions between various sampling densities, but their size differences make it difficult to see the effects resulting from a reduction in the number of samples. The simplest way to compare these effects is to bring all the subsampled images up to size 1024*1024 by row and column pixel replication.
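A minimal Python/NumPy sketch of the subsampling and pixel replication just described, assuming a square grayscale array (a tiny 8*8 array stands in for the 1024*1024 image):

    import numpy as np

    def subsample(img: np.ndarray, factor: int = 2) -> np.ndarray:
        """Keep every `factor`-th row and column (deleting the others)."""
        return img[::factor, ::factor]

    def replicate(img: np.ndarray, factor: int = 2) -> np.ndarray:
        """Bring a subsampled image back up in size by row/column pixel replication."""
        return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

    img = np.arange(64, dtype=np.uint8).reshape(8, 8)   # stand-in for a 1024x1024 image
    small = subsample(img)          # 4x4
    big = replicate(small)          # back to 8x8, blocky
    print(small.shape, big.shape)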

C) Aliasing and Moiré Patterns

Ans:

Functions whose area under the curve is finite can be represented in terms of sines and cosines of various frequencies. The sine/cosine component with the highest frequency determines the highest “frequency content” of the function. Suppose that this highest frequency is finite and that the function is of unlimited duration (such functions are called band-limited functions). Then the Shannon sampling theorem tells us that, if the function is sampled at a rate equal to or greater than twice its highest frequency, it is possible to recover completely the original function from its samples. If the function is undersampled, then a phenomenon called aliasing corrupts the sampled image. The corruption is in the form of additional frequency components being introduced into the sampled function. These are called aliased frequencies. Note that the sampling rate in images is the number of samples taken (in both spatial directions) per unit distance. As it turns out, except for a special case discussed in the following paragraph, it is impossible to satisfy the sampling theorem in practice. We can only work with sampled data that are finite in duration. We can model the process of converting a function of unlimited duration into a function of finite duration simply by multiplying the unlimited function by a “gating function” that is valued 1 for some interval and 0 elsewhere. Unfortunately, this function itself has frequency components that extend to infinity. Thus, the very act of limiting the duration of a band-limited function causes it to cease being band limited, which causes it to violate the key condition of the sampling theorem. The principal approach for reducing the aliasing effects on an image is to reduce its high-frequency components by blurring the image prior to sampling. However, aliasing is always present in a sampled image. The effect of aliased frequencies can be seen under the right conditions in the form of so-called Moiré patterns.
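A hedged sketch of the anti-aliasing idea above (blur before sampling), using a simple 3*3 box average as the blur; the choice of filter and the use of SciPy's uniform_filter are assumptions for illustration, not something specified in the text.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def downsample_with_antialias(img: np.ndarray, factor: int = 2) -> np.ndarray:
        """Blur (3x3 box average) to attenuate high frequencies, then subsample."""
        blurred = uniform_filter(img.astype(np.float64), size=3)
        return blurred[::factor, ::factor]

    # A high-frequency test pattern (alternating columns) aliases badly if subsampled
    # directly; blurring first suppresses the offending frequencies.
    stripes = np.tile([0.0, 255.0], (8, 4))       # 8x8 image of 1-pixel-wide stripes
    print(downsample_with_antialias(stripes))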

D) Zooming and Shrinking Digital Images.

Ans:

We conclude the treatment of sampling and quantization with a brief discussion on how to zoom and shrink a digital image. This topic is related to image sampling and quantization because zooming may be viewed as oversampling, while shrinking may be viewed as undersampling. The key difference between these two operations and sampling and quantizing an original continuous image is that zooming and shrinking are applied to a digital image. Zooming requires two steps: the creation of new pixel locations, and the assignment of gray levels to those new locations. Let us start with a simple example. Suppose that we have an image of size 500*500 pixels and we want to enlarge it 1.5 times to 750*750 pixels. Conceptually, one of the easiest ways to visualize zooming is laying an imaginary 750*750 grid over the original image. Obviously, the spacing in the grid would be less than one pixel because we are fitting it over a smaller image. In order to perform gray-level assignment for any point in the overlay, we look for the closest pixel in the original image and assign its gray level to the new pixel in the grid. When we are done with all points in the overlay grid, we simply expand it to the original specified size to obtain the zoomed image. This method of gray-level assignment is called nearest neighbor interpolation.

Pixel replication is applicable when we want to increase the size of an image an integer number of times. For instance, to double the size of an image, we can duplicate each column. This doubles the image size in the horizontal direction. Then, we duplicate each row of the enlarged image to double the size in the vertical direction. The same procedure is used to enlarge the image by any integer number of times (triple, quadruple, and so on). Duplication is just done the required number of times to achieve the desired size. The gray-level assignment of each pixel is predetermined by the fact that new locations are exact duplicates of old locations. A slightly more sophisticated way of accomplishing gray-level assignments is bilinear interpolation using the four nearest neighbors of a point. Let (x’, y’) denote the coordinates of a point in the zoomed image, and let v(x’, y’) denote the gray level assigned to it. For bilinear interpolation, the assigned gray level is given by

v(x’, y’) = a*x’ + b*y’ + c*x’*y’ + d

where the four coefficients are determined from the four equations in four unknowns that can be written using the four nearest neighbors of point (x’, y’). Image shrinking is done in a similar manner as just described for zooming. The equivalent process of pixel replication is row-column deletion. For example, to shrink an image by one-half, we delete every other row and column. We can use the zooming grid analogy to visualize the concept of shrinking by a noninteger factor, except that we now expand the grid to fit over the original image, do gray-level nearest neighbor or bilinear interpolation, and then shrink the grid back to its original specified size. To reduce possible aliasing effects, it is a good idea to blur an image slightly before shrinking it.
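A minimal sketch of nearest neighbor zooming and shrinking as described above, assuming the image is a NumPy array; bilinear interpolation would replace the rounding step with the four-coefficient fit given earlier.

    import numpy as np

    def zoom_nearest(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
        """Zoom (or shrink) by laying a new_h x new_w grid over the image and
        assigning each grid point the gray level of the closest original pixel."""
        h, w = img.shape
        rows = np.clip(np.round(np.arange(new_h) * h / new_h).astype(int), 0, h - 1)
        cols = np.clip(np.round(np.arange(new_w) * w / new_w).astype(int), 0, w - 1)
        return img[rows[:, None], cols[None, :]]

    img = np.arange(16, dtype=np.uint8).reshape(4, 4)
    print(zoom_nearest(img, 6, 6))   # 4x4 enlarged 1.5x to 6x6
    print(zoom_nearest(img, 2, 2))   # shrinking works the same way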

4. Explain the following with respect to Image Enhancement:

     A) Edge Crispening 

Ans:


Psychophysical experiments indicate that a photograph or visual signal with accentuated or crispened edges is often more subjectively pleasing than an exact photometric reproduction. We will discuss the linear and statistical differencing techniques for edge crispening.

4.5.1 Linear Edge Crispening

Edge crispening can be performed by discrete convolution, as defined by Eq. 4.8, in which the impulse response array H is of high-pass form. Several common 3*3 high-pass masks are used for this purpose.

These masks possess the property that the sum of their elements is unity, to avoid amplitude bias in the processed image.
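A sketch of linear edge crispening by convolution with one commonly used unity-sum 3*3 high-pass mask; the particular mask shown is an illustrative assumption, not necessarily one of the masks referred to above.

    import numpy as np
    from scipy.ndimage import convolve

    # One commonly used unity-sum high-pass mask (an illustrative choice).
    H = np.array([[ 0, -1,  0],
                  [-1,  5, -1],
                  [ 0, -1,  0]], dtype=np.float64)
    assert H.sum() == 1.0          # unity sum avoids amplitude bias

    def crispen(img: np.ndarray) -> np.ndarray:
        out = convolve(img.astype(np.float64), H, mode='nearest')
        return np.clip(out, 0, 255).astype(np.uint8)

    img = np.full((5, 5), 100, dtype=np.uint8)
    img[:, 3:] = 150               # a vertical edge
    print(crispen(img))            # the edge is accentuated (over/undershoot)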

4.5.2 Statistical Differencing

Statistical differencing involves the generation of an image by dividing each pixel value by its estimated standard deviation D(j, k) according to the basic relation

G(j, k) = F(j, k) / D(j, k)

where the estimated standard deviation D(j, k) is computed at each pixel over some W * W neighborhood, with W = 2w + 1, and M(j, k) is the estimated mean value of the original image at point (j, k), computed over the same neighborhood. The enhanced image G(j, k) is increased in amplitude with respect to the original at pixels that deviate significantly from their neighbors, and is decreased in relative amplitude elsewhere.
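A sketch of statistical differencing under the definitions above, with the local mean and standard deviation estimated by box filtering; the small epsilon added to D(j, k) is an assumption to avoid division by zero.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def statistical_differencing(F: np.ndarray, w: int = 3, eps: float = 1e-6) -> np.ndarray:
        """G(j,k) = F(j,k) / D(j,k), where D is the standard deviation estimated
        over a W x W neighborhood, W = 2w + 1."""
        F = F.astype(np.float64)
        W = 2 * w + 1
        M = uniform_filter(F, size=W)                 # local mean M(j,k)
        M2 = uniform_filter(F * F, size=W)            # local mean of F^2
        D = np.sqrt(np.maximum(M2 - M * M, 0.0))      # local standard deviation
        return F / (D + eps)                          # eps avoids division by zero

    F = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(np.float64)
    print(statistical_differencing(F).round(2))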

B) Color Image Enhancement:

Ans:

The image enhancement techniques discussed previously have all been applied to monochrome images. We will now consider the enhancement of natural color images and introduce the pseudocolor and false color image enhancement methods. Pseudocolor produces a color image from a monochrome image, while false color produces an enhanced color image from an original natural color image or from multispectral image bands.

4.6.1 Natural Color Image Enhancement

The monochrome image enhancement methods described previously can be applied to natural color images by processing each color component individually. It is accomplished by intracomponent and intercomponent processing algorithms.

Intracomponent Processing: Typically, color images are processed in the RGB color space. This approach works quite well for noise cleaning algorithms in which the noise is independent between the R, G and B components. Edge crispening can also be performed on an intracomponent basis, but more efficient results are often obtained by processing in other color spaces. Contrast manipulation and histogram modification intracomponent algorithms often result in severe shifts of the hue and saturation of color images. Hue preservation can be achieved by using a single point transformation for each of the three RGB components. For example, form a sum image, and then compute a histogram equalization function, which is used for each RGB component.

Intercomponent Processing: The intracomponent processing algorithms previously discussed provide no means of modifying the hue and saturation of a processed image in a controlled manner. One means of doing so is to transform a source RGB image into a three-component image, in which the three components form separate measures of the brightness, hue and saturation (BHS) of a color image. Ideally, the three components should be perceptually independent of one another.

4.6.2 Pseudocolor

Pseudocolor is a color mapping of a monochrome image array which is intended to enhance the detectability of detail within the image. The pseudocolor mapping of an array is defined as

R(j, k) = OR{F(j, k)}

G(j, k) = OG{F(j, k)}

B(j, k) = OB{F(j, k)}

where R(j, k), G(j, k), B(j, k) are display color components and OR{F(j, k)}, OG{F(j, k)}, OB{F(j, k)} are linear or nonlinear functional operators. This mapping defines a path in three-dimensional color space parametrically in terms of the array F(j, k). Mapping A represents the achromatic path through all shades of gray; it is the normal representation of a monochrome image. Mapping B is a spiral path through color space. Another class of pseudocolor mappings includes those mappings that exclude all shades of gray. Mapping C, which follows the edges of the RGB color cube, is such an example.

4.6.3 False Color

False color is a point-by-point mapping of an original color image, described by its three primary colors (or of a set of multispectral image planes of a scene), to a color space defined by display tristimulus values that are linear or nonlinear functions of the original image pixel values. A common intent is to provide a displayed image with objects possessing different or false colors from what might be expected. For example, blue sky in a normal scene might be converted to appear red, and green grass transformed to blue. One possible reason for such a color mapping is to place normal objects in a strange color world so that a human observer will pay more attention to the objects than if they were colored normally.

Another reason for false color mappings is the attempt to color a normal scene to match the color sensitivity of a human viewer. For example, it is known that the luminance response of cones in the retina peaks in the green region of the visible spectrum. Thus, if a normally red object is false colored to appear green, it may become more easily detectable. Another psychophysical property of color vision that can be exploited is the contrast sensitivity of the eye to changes in blue light. In some situations it may be worthwhile to map the normal colors of objects with fine detail into shades of blue.

In a false color mapping, the red, green and blue display color components are related to natural or multispectral images Fi by

RD = OR{F1, F2, ...}

GD = OG{F1, F2, ...}

BD = OB{F1, F2, ...}

where OR{ · }, OG{ · }, OB{ · } are general functional operators. As a simple example, the set of red, green and blue sensor tristimulus values (RS = F1, GS = F2, BS = F3) may simply be interchanged among the display components RD, GD and BD.
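A minimal sketch of the two kinds of mappings in this section: a pseudocolor mapping of a monochrome array through three illustrative (assumed) functional operators, and a false color mapping that simply interchanges the bands of an RGB image; neither choice of operators comes from the text.

    import numpy as np

    def pseudocolor(F):
        """R, G, B planes produced from a monochrome image F(j,k) by three
        functional operators OR, OG, OB (illustrative sinusoidal choices)."""
        x = F.astype(np.float64) / 255.0
        R = 255 * np.abs(np.sin(np.pi * x))                  # OR{F}
        G = 255 * np.abs(np.sin(np.pi * x + np.pi / 3))      # OG{F}
        B = 255 * np.abs(np.sin(np.pi * x + 2 * np.pi / 3))  # OB{F}
        return np.stack([R, G, B], axis=-1).astype(np.uint8)

    def false_color_interchange(rgb):
        """A simple false color mapping: RD = GS, GD = BS, BD = RS."""
        return rgb[..., [1, 2, 0]]

    F = np.linspace(0, 255, 16, dtype=np.uint8).reshape(4, 4)
    rgb = pseudocolor(F)
    print(rgb.shape)                             # (4, 4, 3)
    print(false_color_interchange(rgb).shape)    # bands permuted, same shape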

C) Multispectral Image Enhancement

Ans:

Multispectral image enhancement techniques are applied to the image bands of a scene in order to accentuate salient features to assist in subsequent human interpretation or machine analysis. These procedures include individual image band enhancement techniques, such as contrast stretching, noise cleaning and edge crispening, as discussed earlier. Other methods involve the joint processing of multispectral image bands. Multispectral image bands can be subtracted in pairs according to the relation

Dm,n(j, k) = Fm(j, k) – Fn(j, k)

in order to accentuate reflectivity variations between the multispectral bands. An associated advantage is the removal of any unknown but common bias components that may exist. Another simple but highly effective means of multispectral image enhancement is the formation of ratios of the image bands. The ratio image between the mth and nth multispectral bands is defined as

Rm,n(j, k) = Fm(j, k) / Fn(j, k)

It is assumed that the image bands are adjusted to have nonzero pixel values. In many multispectral imaging systems, the image band Fn(j, k) can be modeled by the product of an object reflectivity function Rn(j, k) and an illumination function I(j, k) that is identical for all multispectral bands. Ratioing of such imagery provides an automatic compensation of the illumination factor. The ratio Fm(j, k) / [Fn(j, k) ± Δ(j, k)], for which Δ(j, k) represents a quantization level uncertainty, can vary considerably if Fn(j, k) is small. This variation can be reduced significantly by forming the logarithm of the ratios, defined by

Lm,n(j, k) = log Rm,n(j, k) = log Fm(j, k) – log Fn(j, k)
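A small sketch of the band ratio and log-ratio operations above, showing that a common illumination factor cancels; the synthetic bands are assumptions for illustration only.

    import numpy as np

    def band_log_ratio(Fm: np.ndarray, Fn: np.ndarray) -> np.ndarray:
        """L(j,k) = log Fm(j,k) - log Fn(j,k); assumes the bands have been
        adjusted to nonzero pixel values, as stated above."""
        return np.log(Fm.astype(np.float64)) - np.log(Fn.astype(np.float64))

    # If both bands share the same illumination factor I(j,k), it cancels in the ratio.
    rng = np.random.default_rng(1)
    illum = rng.uniform(0.5, 1.5, size=(4, 4))
    Fm = 100 * illum          # reflectivity 100 under common illumination
    Fn = 50 * illum           # reflectivity 50 under the same illumination
    print(np.exp(band_log_ratio(Fm, Fn)))   # ratio is 2 everywhere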

5. Describe the following with respect to Image restoration:

    A) General Image Restoration Models 

Ans:

In order to effectively design a digital image restoration system, it is necessary to quantitatively characterize the image degradation effects of the physical imaging system, the image digitizer and the image display. Basically, the procedure is to model the image degradation effects and then perform operations to undo the model to obtain a restored image. It should be emphasized that accurate image modeling is often the key to effective image restoration. There are two basic approaches to the modeling of image degradation effects: a priori modeling and a posteriori modeling.

In the former case, measurements are made on the physical imaging system, digitizer and display to determine their response to an arbitrary image field. In some instances, it will be possible to model the system response deterministically, while in other situations it will only be possible to determine the system response in a stochastic sense. The a posteriori modeling approach is to develop the model for the image degradations based on measurements of a particular image to be restored.

Basically, these two approaches differ only in the manner in which information is gathered to describe the character of the image degradation. Consider a general model of a digital imaging system and restoration process. In the model, a continuous image light distribution C(x, y, t, λ), dependent on spatial coordinates (x, y), time (t) and spectral wavelength (λ), is assumed to exist as the driving force of a physical imaging system subject to point and spatial degradation effects and corrupted by deterministic and stochastic disturbances. Potential degradations include diffraction in the optical system, sensor nonlinearities, optical system aberrations, film nonlinearities, atmospheric turbulence effects, image motion blur and geometric distortion. Noise disturbances may be caused by electronic imaging sensors or film granularity. In this model, the physical imaging system produces a set of output image fields FO(i)(x, y, tj) at time instant tj described by the general relation

FO(i)(x, y, tj) = OP{C(x, y, t, λ)}

where OP{ · } represents a general operator that is dependent on the space coordinates (x, y), the time history (t), the wavelength (λ) and the amplitude of the light distribution (C). For a monochrome imaging system, there will only be a single output field, while for a natural color imaging system, FO(i)(x, y, tj) may denote the red, green and blue tristimulus bands for i = 1, 2, 3, respectively. Multispectral imagery will also involve several output bands of data.

B) Optical system Models

Ans:

One of the major advances in the field of optics during the past 50 years has been the application of system concepts to optical imaging. Imaging devices consisting of lenses, mirrors, prisms and so on, can be considered to provide a deterministic transformation of an input spatial light distribution to some output spatial light distribution. Also, the system concept can be extended to encompass the spatial propagation of light through free space or some dielectric medium. In the study of geometric optics, it is assumed that light rays always travel in a straight-line path in a homogeneous medium. By this assumption, a bundle of rays passing through a clear aperture onto a screen produces a geometric light projection of the aperture. However, if the light distribution at the region between the light and dark areas on the screen is examined in detail, it is found that the boundary is not sharp. This effect is more pronounced as the aperture size is decreased. For a pinhole aperture, the entire screen appears diffusely illuminated. From a simplistic viewpoint, the aperture causes a bending of rays called diffraction. Diffraction of light can be quantitatively characterized by considering light as electromagnetic radiation that satisfies Maxwell's equations. The formulation of a complete theory of optical imaging from the basic electromagnetic principles of diffraction theory is a complex and lengthy task.


C) Photographic Process Models

Ans:

There are many different types of materials and chemical processes that have been utilized for photographic image recording. No attempt is made here either to survey the field of photography or to deeply investigate the physics of photography. Rather, the attempt here is to develop mathematical models of the photographic process in order to characterize quantitatively the photographic components of an imaging system.

5.4.1 Monochromatic Photography

The most common material for photographic image recording is silver halide emulsion. In this material, silver halide grains are suspended in a transparent layer of gelatin that is deposited on a glass, acetate or paper backing. If the backing is transparent, a transparency can be produced, and if the backing is a white paper, a reflection print can be obtained. When light strikes a grain, an electrochemical conversion process occurs, and part of the grain is converted to metallic silver. A development center is then said to exist in the grain. In the development process, a chemical developing agent causes grains with partial silver content to be converted entirely to metallic silver. Next, the film is fixed by chemically removing unexposed grains. The photographic process described above is called a nonreversal process. It produces a negative image in the sense that the silver density is inversely proportional to the exposing light. A positive reflection print of an image can be obtained in a two-stage process with nonreversal materials. First, a negative transparency is produced, and then the negative transparency is illuminated to expose negative reflection print paper. The resulting silver density on the developed paper is then proportional to the light intensity that exposed the negative transparency. A positive transparency of an image can be obtained with a reversal type of film.

5.4.2 Color Photography

Modern color photography systems utilize an integral tripack film to produce positive or negative transparencies. In a cross section of this film, the first layer is a silver halide emulsion sensitive to blue light. A yellow filter following the blue emulsion prevents blue light from passing through to the green and red silver emulsions that follow in consecutive layers and are naturally sensitive to blue light. A transparent base supports the emulsion layers. Upon development, the blue emulsion layer is converted into a yellow dye transparency whose dye concentration is proportional to the blue exposure for a negative transparency and inversely proportional for a positive transparency. Similarly, the green and red emulsion layers become magenta and cyan dye layers, respectively.

Color prints can be obtained by a variety of processes. The most common technique is to produce a positive print from a color negative transparency onto nonreversal color paper. In the establishment of a mathematical model of the color photographic process, each emulsion layer can be considered to react to light as does an emulsion layer of a monochrome photographic material. To a first approximation, this assumption is correct. However, there are often significant interactions between the emulsion and dye layers, and each emulsion layer possesses its own characteristic spectral sensitivity.

6.  Describe the following in the context of Morphological Image processing:


      A) Basic operations 

Ans:

The foundation of morphological processing is the mathematically rigorous field of set theory. We will discuss some fundamental concepts of image set algebra which are the basis for defining the generalized dilation and erosion operators. Consider a binary-valued source image function F(j, k). A pixel at coordinate (j, k) is a member of F(j, k), as indicated by the symbol ∈, if and only if it is a logical 1. A binary-valued image B(j, k) is a subset of a binary-valued image A(j, k), as indicated by B(j, k) ⊆ A(j, k), if for every spatial occurrence of a logical 1 of B(j, k), A(j, k) is a logical 1.

A reflected image F~(j, k) is an image that has been flipped from left to right and from top to bottom; the complement of a binary image is its logical negation, in which 1s and 0s are interchanged. Translation of an image, as indicated by the function

G(j, k) = Tr,c{F(j, k)}

consists of spatially offsetting F(j, k) with respect to itself by r rows and c columns, where –R ≤ r ≤ R and –C ≤ c ≤ C.

6.2.1 Dilation

With dilation, an object grows uniformly in spatial extent. Generalized dilation is expressed symbolically as

G(j, k) = F(j, k) ⊕ H(j, k)

where F(j, k), for 1 ≤ j, k ≤ N, is a binary-valued image and H(j, k), for 1 ≤ j, k ≤ L, where L is an odd integer, is a binary-valued array called a structuring element. For notational simplicity, F(j, k) and H(j, k) are assumed to be square arrays. Generalized dilation can be defined mathematically and implemented in several ways; the Minkowski addition definition expresses the dilated image as the union of the translates of F(j, k) by the logical 1 positions of H(j, k).

6.2.2 Erosion

With erosion an object shrinks uniformly. Generalized erosion is expressed symbolically as

G(j, k) = F(j, k) ⊖ H(j, k)

where H(j, k) is an odd-size L * L structuring element. Generalized erosion produces a logical 1 at a pixel only if the structuring element, when centered at that pixel, fits entirely within the object.
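A minimal sketch of binary dilation and erosion with a 3*3 structuring element, using SciPy's binary morphology routines as a stand-in for the generalized definitions above.

    import numpy as np
    from scipy.ndimage import binary_dilation, binary_erosion

    F = np.zeros((7, 7), dtype=bool)
    F[2:5, 2:5] = True                      # a 3x3 square object
    H = np.ones((3, 3), dtype=bool)         # 3x3 structuring element

    dilated = binary_dilation(F, structure=H)   # object grows to 5x5
    eroded = binary_erosion(F, structure=H)     # object shrinks to a single pixel

    print(dilated.astype(int))
    print(eroded.astype(int))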

6.2.3 Properties of Dilation and Erosion

i. Dilation is commutative: A ⊕ B = B ⊕ A, but, in general, erosion is not commutative: A ⊖ B ≠ B ⊖ A.

ii. Dilation and erosion are opposite in effect; dilation of the background of an object behaves like erosion of the object. This statement can be quantified by the duality relationship.

6.2.4 Close and Open

Dilation and erosion are often applied to an image in concatenation. Dilation followed by erosion is called a close operation. It is expressed symbolically as

G(j, k) = F(j, k) • H(j, k)

where H(j, k) is an L * L structuring element. The close operation is defined as

G(j, k) = [F(j, k) ⊕ H(j, k)] ⊖ H~(j, k)

Closing of an image with a compact structuring element without holes (zeros), such as a square or circle, smooths contours of objects, eliminates small holes in objects and fuses short gaps between objects.

B) Morphological algorithm operations on gray scale images

Ans:

Morphological concepts can be extended to gray scale images, but the extension often leads to theoretical issues and to implementation complexities. When applied to a binary image, dilation and erosion operations cause an image to increase or decrease in spatial extent, respectively. To generalize these concepts to a gray scale image, it is assumed that the image contains visually distinct gray scale objects set against a gray background. Also, it is assumed that the objects and background are both relatively spatially smooth.

6.5.1 Gray Scale Image Dilation and Erosion

Dilation or erosion of an image could, in principle, be accomplished by hit-or-miss transformations in which the quantized gray scale patterns are examined in a 3 * 3 window and an output pixel is generated for each pattern. This approach is, however, not computationally feasible. For example, if a look-up table implementation were to be used, the table would require 2^72 entries for 256-level quantization of each pixel. The common alternative is to use gray scale extremum operations over a 3 * 3 pixel neighborhood.

Consider a gray scale image F(j, k) quantized to an arbitrary number of gray levels. According to the extremum method of gray scale image dilation, the dilation operation is defined as

G(j, k) = MAX{S1, S2, ..., S9}

where MAX{S1, ..., S9} generates the largest-amplitude pixel of the nine pixels S1, ..., S9 in the 3 * 3 neighborhood of (j, k). By the extremum method, gray scale image erosion is defined as

G(j, k) = MIN{S1, S2, ..., S9}

where MIN{S1, ..., S9} generates the smallest-amplitude pixel of the nine pixels in the 3 * 3 pixel neighborhood.
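A sketch of the extremum method above using SciPy's 3*3 maximum and minimum filters, which take the largest and smallest of the nine neighborhood pixels.

    import numpy as np
    from scipy.ndimage import maximum_filter, minimum_filter

    F = np.random.default_rng(2).integers(0, 256, size=(6, 6))

    # Extremum method over a 3x3 neighborhood: dilation takes the largest of the
    # nine pixels, erosion the smallest.
    dilated = maximum_filter(F, size=3)
    eroded = minimum_filter(F, size=3)
    print(dilated)
    print(eroded)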


August 2010

Master of Computer Application (MCA) – Semester 6

MC0086 – Digital Image Processing – 4 Credits

(Book ID: B1007)

Assignment Set – 2 (60 Marks)

Answer all questions. Each question carries TEN marks.

1. Describe the following texture features of Image Extraction:

    A) Fourier Spectra Methods       

Ans:

Several studies have considered textural analysis based on the Fourier spectrum of an image region, as discussed in Section 7.3. Because the degree of texture coarseness is proportional to its spatial period, a region of coarse texture should have its Fourier spectral energy concentrated at low spatial frequencies. Conversely, regions of fine texture should exhibit a concentration of spectral energy at high spatial frequencies. Although this correspondence exists to some degree, difficulties often arise because of spatial changes in the period and phase of texture pattern repetitions. Experiments have shown that there is considerable spectral overlap of regions of distinctly different natural texture, such as urban, rural and woodland regions extracted from aerial photographs. On the other hand, Fourier spectral analysis has proved successful in the detection and classification of coal miners' black lung disease, which appears as diffuse textural deviations from the norm.

 B) Edge Detection Methods:

Ans:

Rosenfeld and Troy have proposed a measure of the number of edges in a neighborhood as a textural measure. As a first step in their process, an edge map array E(j, k) is produced by some edge detector such that E(j, k) = 1 for a detected edge and E(j, k) = 0 otherwise. Usually, the detection threshold is set lower than the normal setting for the isolation of boundary points. The texture measure is then computed over a W * W observation window, where W = 2w + 1 is the dimension of the window. A variation of this approach is to substitute the edge gradient G(j, k) for the edge map array in the measure.
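A minimal sketch of such an edge-density texture measure, assuming the edge map E(j, k) has already been produced by some detector; normalizing the count by the window area is an added convention, not taken from the text.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def edge_density(E: np.ndarray, w: int = 3) -> np.ndarray:
        """Fraction of detected edge pixels in a W x W window, W = 2w + 1.
        E(j,k) is a binary edge map (1 = edge, 0 = no edge)."""
        W = 2 * w + 1
        return uniform_filter(E.astype(np.float64), size=W)

    E = np.zeros((10, 10))
    E[:, 5] = 1                     # a single vertical edge
    print(edge_density(E).round(2)) # density is high near column 5, low elsewhere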


    C) Autocorrelation Methods

Ans:

The autocorrelation function has been suggested as the basis of a texture measure. Although it has been demonstrated in the preceding section that it is possible to generate visually different stochastic fields with the same autocorrelation function, this does not necessarily rule out the utility of an autocorrelation feature set for natural images. The autocorrelation function is computed over a W * W window with –T ≤ m, n ≤ T pixel lags. Presumably, a region of coarse texture will exhibit a higher correlation for a fixed shift than will a region of fine texture. Thus, texture coarseness should be proportional to the spread of the autocorrelation function. Faugeras and Pratt have proposed a set of spread measures of the autocorrelation function.

2. Describe the following features of Edge detection:

    A) Edge, Line and Spot models  

Ans:

It is a sketch of a continuous domain, one-dimensional ramp edge modeled as a ramp increase in

image amplitude from a low to a high level, or vice versa. The edge is characterized by its height, slope

angle and horizontal coordinate of the slope midpoint. An edge exists if the edge height is greater than

a specified value. An ideal edge detector should produce an edge indication localized to a single pixel

located at the midpoint of the slope. If the slope angle of is 90°, the resultant edge is called a step

edge. In a digital imaging system, step edges usually exist only for artificially generated images such as

test patterns and bilevel graphics data. Digital images, resulting from digitization of optical images of

real scenes, generally do not possess step edges because the antialiasing low-pass filtering prior to

digitization reduces the edge slope in the digital image caused by any sudden luminance change in the

scene. A line has a one-dimensional profile of finite width w. In the limit, as the line width w approaches zero, the resultant amplitude discontinuity is called a roof edge. The vertical ramp edge model contains a

single transition pixel whose amplitude is at the midvalue of its neighbors. This edge model can be

obtained by performing a 2 * 2 pixel moving window average on the vertical step edge model. The

figure also contains two versions of a diagonal ramp edge. The single pixel transition model contains a

single midvalue transition pixel between the regions of high and low amplitude; the smoothed transition

model is generated by a 2 * 2 pixel moving window average of the diagonal step edge model. Corresponding models exist for a discrete step corner edge and a ramp corner edge. The edge location for discrete step edges is usually

marked at the higher-amplitude side of an edge transition.


B) First-Order Derivative Edge Detection 

Ans:

There are two fundamental methods for generating first-order derivative edge gradients. One method involves generation of gradients in two orthogonal directions in an image; the second utilizes a set of directional derivatives. We will be discussing the first method.

8.3.1 Orthogonal Gradient Generation

An edge in a continuous-domain image F(x, y) can be detected by forming the continuous one-dimensional gradient G(x, y) along a line normal to the edge slope, which is at an angle Θ with respect to the horizontal axis. If the gradient is sufficiently large (i.e., above some threshold value), an edge is deemed present. The gradient along the line normal to the edge slope can be computed in terms of the derivatives along orthogonal axes according to the following:

For computational efficiency, the gradient amplitude is sometimes approximated by the magnitude combination

The orientation of the spatial gradient with respect to the row axis is

The remaining issue for discrete domain orthogonal gradient generation is to choose a good discrete

approximation to the continuous differentials of Eq. 8.3a.

The simplest method of discrete gradient generation is to form the running difference of pixels along

rows and columns of the image. The row gradient is defined as

and the column gradient is

Diagonal edge gradients can be obtained by forming running differences of diagonal pairs of pixels.

This is the basis of the Roberts cross-difference operator, which is defined in magnitude form as

and in square-root form as


Prewitt has introduced a 3 x 3 pixel edge gradient operator described by a pixel numbering convention for the window. The Prewitt operator square-root edge gradient is defined as

With

where K = 1. In this formulation, the row and column gradients are normalized to provide unit-gain positive and negative weighted averages about a separated edge position.

The Sobel operator edge detector differs from the Prewitt edge detector in that the values of the north, south, east and west pixels are doubled (i.e., K = 2). The motivation for this weighting is to give equal importance to each pixel in terms of its contribution to the spatial gradient.
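The gradient arrays referenced above were lost as figures; as a hedged sketch, the commonly quoted 3 x 3 row/column kernels (K = 1 for Prewitt, K = 2 for Sobel; the sign and orientation conventions here are assumptions, and no unit-gain normalization is applied) can be used like this:

```python
import numpy as np
from scipy.ndimage import convolve

def first_order_gradient(image, K=1):
    """Row/column gradients with a Prewitt-style (K=1) or Sobel-style (K=2)
    kernel, combined into a square-root magnitude and an orientation angle."""
    row_kernel = np.array([[ 1,  K,  1],
                           [ 0,  0,  0],
                           [-1, -K, -1]], dtype=float)
    col_kernel = row_kernel.T

    g_row = convolve(image.astype(float), row_kernel)
    g_col = convolve(image.astype(float), col_kernel)

    magnitude = np.hypot(g_row, g_col)      # square-root gradient form
    orientation = np.arctan2(g_row, g_col)  # one common angle convention
    return magnitude, orientation
```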

C) Second-Order Derivative Edge Detection

Ans:

Second-order derivative edge detection techniques employ some form of spatial second-order

differentiation to accentuate edges. An edge is marked if a significant spatial change occurs in the

second derivative. We will consider Laplacian second-order derivative method.

The edge Laplacian of an image function F(x,y) in the continuous domain is defined as G(x,y) = −∇²{F(x,y)}, where the Laplacian operator is ∇² = ∂²/∂x² + ∂²/∂y².


The Laplacian G(x,y) is zero if F(x,y) is constant or changing linearly in amplitude. If the rate of change

of F(x,y) is greater than linear, G(x,y) exhibits a sign change at the point of inflection of F(x,y). The zero

crossing of G(x,y) indicates the presence of an edge. The negative sign in the definition of Eq. 8.4a is

present so that the zero crossing of G(x,y) has a positive slope for an edge whose amplitude increases

from left to right or bottom to top in an image.

Torre and Poggio have investigated the mathematical properties of the Laplacian of an image function.

They have found that if F(x,y) meets certain smoothness constraints, the zero crossings of G(x,y) are

closed curves. In the discrete domain, the simplest approximation to the continuous Laplacian is to

compute the difference of slopes along each axis:

This four-neighbor Laplacian can be generated by the convolution operation

Where

The four-neighbor Laplacian is often normalized to provide unit-gain averages of the positive weighted

and negative weighted pixels in the 3 * 3 pixel neighborhood. The gain-normalized four-neighbor

Laplacian impulse response is defined by

Prewitt has suggested an eight-neighbor Laplacian defined by the gain normalized impulse response array
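The impulse-response arrays themselves were lost as images; the sketch below uses the commonly quoted four- and eight-neighbor Laplacian kernels (the exact gain normalization is an assumption and does not affect zero-crossing locations) and marks candidate edges at zero crossings:

```python
import numpy as np
from scipy.ndimage import convolve

# Commonly used discrete Laplacian impulse responses (the gain-normalized
# scale factors shown here are assumptions; they only rescale the response).
FOUR_NEIGHBOR = 0.25 * np.array([[ 0, -1,  0],
                                 [-1,  4, -1],
                                 [ 0, -1,  0]], dtype=float)

EIGHT_NEIGHBOR = 0.125 * np.array([[-1, -1, -1],
                                   [-1,  8, -1],
                                   [-1, -1, -1]], dtype=float)

def laplacian_zero_crossings(image, kernel=FOUR_NEIGHBOR, threshold=0.0):
    """Mark pixels where the Laplacian changes sign between horizontal or
    vertical neighbors; these zero crossings indicate candidate edges."""
    g = convolve(image.astype(float), kernel)
    sign_change_x = np.zeros_like(g, dtype=bool)
    sign_change_y = np.zeros_like(g, dtype=bool)
    sign_change_x[:, :-1] = np.sign(g[:, :-1]) != np.sign(g[:, 1:])
    sign_change_y[:-1, :] = np.sign(g[:-1, :]) != np.sign(g[1:, :])
    strong = np.abs(g) > threshold
    return (sign_change_x | sign_change_y) & strong
```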


3. Describe the following with respect to Image Segmentation:

    A) Detection of Discontinuities 

Ans:

There are three basic types of discontinuities in a digital image: points, lines and edges. In practice, the most common way to look for discontinuities is to run a mask through the image. For a 3 x 3 mask, this procedure involves computing the sum of products of the coefficients with the gray levels contained in the region encompassed by the mask. That is, the response of the mask at any point in the image is

R = w1 z1 + w2 z2 + ... + w9 z9 = Σ wi zi

where zi is the gray level of the pixel associated with mask coefficient wi. As usual, the response of the mask is defined with respect to its center location. When the mask is centered on a boundary pixel, the response is computed by using the appropriate partial neighborhood.

9.2.1 Point Detection

The detection of isolated points in an image is straightforward. We say that a point has been detected at the location on which the mask is centered if |R| > T, where T is a nonnegative threshold and R is the response of the mask at that point.

Basically all that this formulation does is measure the weighted differences between the center point

and its neighbors. The idea is that the gray level of an isolated point will be quite different from the gray

level of its neighbors.
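A hedged sketch of this point-detection rule, using the commonly quoted 3 x 3 point mask (the specific mask coefficients are an assumption, since the mask figure is not shown in the text):

```python
import numpy as np
from scipy.ndimage import convolve

# A common point-detection mask: strong response when the center pixel
# differs from the average of its eight neighbors.
POINT_MASK = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=float)

def detect_points(image, T):
    """Flag pixels whose mask response magnitude exceeds the threshold T."""
    R = convolve(image.astype(float), POINT_MASK)
    return np.abs(R) > T
```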

9.2.2 Line Detection


Line detection is an important step in image processing and analysis. Lines and edges are features in

any scene, from simple indoor scenes to noisy terrain images taken by satellite. Most of the earlier

methods for detecting lines were based on pattern matching. The patterns directly followed from the

definition of a line. These pattern templates are designed with suitable coefficients and are applied at

each point in an image. A typical set consists of four 3 x 3 templates, one for each principal direction. If the first mask were moved around an

image, it would respond more strongly to lines oriented horizontally. With constant background, the

maximum response would result when the line passed through the middle row of the mask. This is

easily verified by sketching a simple array of 1’s with a line of a different gray level running horizontally

through the array. A similar experiment would reveal that the second mask responds best to lines

oriented at +45°; the third mask to vertical lines; and the fourth mask to lines in the −45° direction. These directions can also be established by noting that the preferred direction of each mask is weighted with a larger coefficient (i.e., 2) than the other possible directions.

Let R1, R2, R3 and R4 denote the responses of the four masks, from left to right, where the R's are the mask responses defined above. Suppose that all four masks are run through an image. If, at a certain point in the image, |Ri| > |Rj| for all j ≠ i, that point is said to be more likely associated with a line in the direction of mask i. For example, if at a point in the image |R1| > |Rj| for j = 2, 3, 4, that particular point is said to be more likely associated with a horizontal line.
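The four line templates themselves were lost as a figure; the sketch below uses the commonly quoted masks, consistent with the "larger coefficient 2" remark above (treat the exact masks as assumptions):

```python
import numpy as np
from scipy.ndimage import convolve

# Commonly used line-detection templates (horizontal, +45 deg, vertical, -45 deg)
LINE_MASKS = {
    "horizontal": np.array([[-1, -1, -1],
                            [ 2,  2,  2],
                            [-1, -1, -1]], dtype=float),
    "+45":        np.array([[-1, -1,  2],
                            [-1,  2, -1],
                            [ 2, -1, -1]], dtype=float),
    "vertical":   np.array([[-1,  2, -1],
                            [-1,  2, -1],
                            [-1,  2, -1]], dtype=float),
    "-45":        np.array([[ 2, -1, -1],
                            [-1,  2, -1],
                            [-1, -1,  2]], dtype=float),
}

def dominant_line_direction(image):
    """Return, per pixel, the index of the mask with the largest |response|."""
    responses = np.stack([np.abs(convolve(image.astype(float), m))
                          for m in LINE_MASKS.values()])
    return np.argmax(responses, axis=0)   # 0..3 in the order listed above
```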

9.2.3 Edge Detection

Edge detection is the most common approach for detecting meaningful discontinuities in gray level. We discuss approaches for implementing first-order derivative (gradient operator) and second-order derivative (Laplacian operator) edge detection.

Basic Formulation

An edge is a set of connected pixels that lie on the boundary between two regions. An edge is a “local”

concept whereas a region boundary, owing to the way it is defined, is a more global idea.

We start by modeling an edge intuitively. This will lead us to formalism in which “meaningful” transitions

in gray levels can be measured.

In practice, optics, sampling, and other acquisition imperfections yield edges that are blurred, with the

degree of blurring being determined by factors such as the quality of the image acquisition system, the

sampling rate, and illumination conditions under which the image is acquired.

The slope of the ramp is inversely proportional to the degree of blurring in the edge. In this model, we

no longer have a thin (one pixel thick) path. Instead, an edge point now is any point contained in the

ramp, and an edge would then be a set of such points that are connected. The thickness is determined

by the length of the ramp. The length is determined by the slope, which is in turn determined by the

degree of blurring. Blurred edges tend to be thick, and sharp edges tend to be thin.


The first derivative is positive at the points of transition into and out of the ramp as we move from left to

right along the profile; it is constant for points in the ramp; and is zero in areas of constant gray level.

The second derivative is positive at the transition associated with the dark side of the edge, negative at the transition associated with the light side, and zero along the ramp and in areas of constant gray level. The second derivative has two additional properties around an edge: (1) it produces two values for every edge in an image (an undesirable feature); and (2) an imaginary straight line joining its extreme positive and negative values would cross zero near the midpoint of the edge (the zero-crossing property).

  

B) Edge Linking and Boundary Detection

Ans:

Edge linking and boundary detection are fundamental steps in image understanding. The edge linking process takes the unordered set of edge pixels produced by an edge detector as input and forms an ordered list of edges. Edge linking uses local edge information; thus edge detection algorithms are typically followed by a linking procedure that assembles edge pixels into meaningful edges.

9.3.1 Local Processing


One of the simplest approaches to linking edge points is to analyze the characteristics of pixels in a

small neighborhood (say, 3 x 3 or 5 x 5) about every point (x, y) in an image that has undergone edge-

detection. All points that are similar are linked, forming a boundary of pixels that share some common

properties.

The two principal properties used for establishing similarity of edge pixels in this kind of analysis are (1)

the strength of the response of the gradient operator used to produce the edge pixel; and (2) the

direction of the gradient vector. The first property is given by the value of ∇f, the gradient. Thus an edge pixel with coordinates (x0, y0) in a predefined neighborhood of (x, y) is similar in magnitude to the pixel at (x, y) if

|∇f(x, y) − ∇f(x0, y0)| ≤ E

where E is a nonnegative threshold.

The direction (angle) of the gradient vector is given by the angle α(x, y). An edge pixel at (x0, y0) in the predefined neighborhood of (x, y) has an angle similar to the pixel at (x, y) if

|α(x, y) − α(x0, y0)| < A

where A is a nonnegative angle threshold. As noted in 9.2.3, the direction of the edge at (x, y) is

perpendicular to the direction of the gradient vector at that point.

A point in the predefined neighborhood of (x, y) is linked to the pixel at (x, y) if both magnitude and

direction criteria are satisfied. This process is repeated at every location in the image. A record must be

kept of linked points as the center of the neighborhood is moved from pixel to pixel. A simple bookkeeping procedure is to assign a different gray level to each set of linked edge pixels.
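A hedged sketch of this local linking rule; the neighborhood size, threshold values and the final connected-component labelling are illustrative choices, not prescribed by the text:

```python
import numpy as np
from scipy.ndimage import label

def link_edges(magnitude, angle, edge_mask, E=25.0, A=np.deg2rad(15)):
    """Link neighboring edge pixels whose gradient magnitude and direction
    are similar, then label each linked set with a distinct integer."""
    rows, cols = magnitude.shape
    linked = np.zeros((rows, cols), dtype=bool)

    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            if not edge_mask[x, y]:
                continue
            # Compare against the 3 x 3 neighborhood of (x, y)
            mag_nb = magnitude[x - 1:x + 2, y - 1:y + 2]
            ang_nb = angle[x - 1:x + 2, y - 1:y + 2]
            similar = ((np.abs(mag_nb - magnitude[x, y]) <= E) &
                       (np.abs(ang_nb - angle[x, y]) < A) &
                       edge_mask[x - 1:x + 2, y - 1:y + 2])
            if similar.sum() > 1:        # at least one similar neighbor besides itself
                linked[x, y] = True

    # Assign a different label (standing in for a gray level) to each
    # connected set of linked pixels.
    labels, _ = label(linked)
    return labels
```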

9.3.2 Global Processing via the Hough transform


In this section, points are linked by determining first if they lie on a curve of specified shape. Unlike the

local analysis method, we now consider global relationships between pixels.

Suppose that, for n points in an image, we want to find subsets of these points that lie on straight lines.

One possible solution is to first find all lines determined by every pair of points and then find all subsets

of points that are close to particular lines. The problem with this procedure is that it involves finding n(n − 1)/2 lines and then performing n·(n(n − 1))/2 comparisons of every point to all lines. This approach is computationally prohibitive in all but the most trivial applications.
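The Hough transform itself is not worked out in the excerpt above; as a hedged sketch, the standard (ρ, θ) accumulator formulation looks like this (bin counts and resolutions are illustrative):

```python
import numpy as np

def hough_lines(edge_mask, n_theta=180, n_rho=200):
    """Accumulate votes in (rho, theta) space: each edge pixel votes for all
    lines x*cos(theta) + y*sin(theta) = rho that pass through it."""
    rows, cols = edge_mask.shape
    diag = np.hypot(rows, cols)
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta)
    rhos = np.linspace(-diag, diag, n_rho)

    accumulator = np.zeros((n_rho, n_theta), dtype=int)
    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        rho_vals = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.clip(np.searchsorted(rhos, rho_vals), 0, n_rho - 1)
        np.add.at(accumulator, (rho_idx, np.arange(n_theta)), 1)
    return accumulator, rhos, thetas
```

Peaks in the accumulator then correspond to lines supported by many collinear edge points.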

9.3.3 Global Processing via Graph-Theoretic Techniques

In this section, a global approach based on representing edge segments in the form of a graph and

searching the graph for low-cost paths that correspond to significant edges is discussed. This

representation provides a rugged approach that performs well in the presence of noise. As might be

expected, the procedure is considerably more complicated and requires more processing time.

A graph G = (N, A) is a finite, nonempty set of nodes N, together with a set A of unordered pairs of distinct elements of N. Each pair (ni, nj) of A is called an arc. A graph in which the arcs are directed is called a directed graph. If an arc is directed from node ni to node nj, then nj is said to be a successor of its parent node ni. The process of identifying the successors of a node is called expansion of the node. In each graph we define levels, such that level 0 consists of a single node, called the start node, and the nodes in the last level are called goal nodes. A cost c(ni, nj) can be associated with every arc (ni, nj). A sequence of nodes n1, n2, ..., nK, with each node nk being a successor of node nk−1, is called a path from n1 to nK, and the cost of the entire path is

c = c(n1, n2) + c(n2, n3) + ... + c(nK−1, nK)
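The text does not fix a particular search strategy; as an illustrative sketch only, a Dijkstra-style search for a low-cost path through such a graph could look like this (the callback interface is an assumption):

```python
import heapq
from itertools import count

def min_cost_path(successors, cost, start, goals):
    """Search for the lowest-cost path from `start` to any node in `goals`.
    `successors(n)` yields the successors of node n and `cost(p, q)` is the
    nonnegative cost of the arc from p to q."""
    tie = count()                       # tie-breaker so the heap never compares nodes
    frontier = [(0.0, next(tie), start, [start])]
    visited = set()
    while frontier:
        path_cost, _, node, path = heapq.heappop(frontier)
        if node in goals:
            return path, path_cost
        if node in visited:
            continue
        visited.add(node)
        for nxt in successors(node):
            if nxt not in visited:
                heapq.heappush(frontier,
                               (path_cost + cost(node, nxt), next(tie), nxt, path + [nxt]))
    return None, float("inf")
```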


4. Describe the following with respect to Region Based segmentation:

    A) Basic Formulation   

Ans:

Let R represent the entire image region. We may view segmentation as a process that partitions R into n sub-regions, R1, R2, ..., Rn, such that:

(a) R1 ∪ R2 ∪ ... ∪ Rn = R;
(b) Ri is a connected region, for i = 1, 2, ..., n;
(c) Ri ∩ Rj = ∅ for all i and j, i ≠ j;
(d) P(Ri) = TRUE for i = 1, 2, ..., n;
(e) P(Ri ∪ Rj) = FALSE for adjacent regions Ri and Rj.

Here, P(Ri) is a logical predicate defined over the points in set Ri, and ∅ is the null set.

Condition (a) indicates that the segmentation must be complete; that is, every pixel must be in a region. Condition (b) requires that points in a region must be connected. Condition (c) indicates that the regions must be disjoint. Condition (d) deals with the properties that must be satisfied by the pixels in a segmented region – for example, P(Ri) = TRUE if all pixels in Ri have the same gray level. Finally, condition (e) indicates that adjacent regions Ri and Rj are different in the sense of predicate P.

B) Region Growing

Ans:

Region growing is one of the conceptually simplest approaches to image segmentation; neighboring pixels of similar amplitude are grouped together to form a segmented region. Region-growing approaches exploit the fact that pixels which are close together have similar gray values. Start with a single pixel (seed) and add new pixels slowly:

(1) Choose the seed pixel.
(2) Check the neighboring pixels and add them to the region if they are similar to the seed.
(3) Repeat step 2 for each of the newly added pixels; stop if no more pixels can be added.

How do we choose the seed(s) in practice? It depends on the nature of the problem. If targets need to be detected using infrared images, for example, choose the brightest pixel(s). Without a priori knowledge, compute the histogram and choose the gray-level values corresponding to the strongest peaks.

How do we choose the similarity criteria (predicate)? The homogeneity predicate can be based on any characteristic of the regions in the image, such as average intensity, variance, color or texture.
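A hedged sketch of this procedure, assuming a grayscale image, a single seed, an intensity-difference predicate against the seed value, and 4-connectivity (all illustrative choices):

```python
import numpy as np
from collections import deque

def region_grow(image, seed, threshold=10.0):
    """Grow a region from `seed` (row, col): a neighbor joins the region if its
    gray level differs from the seed's by at most `threshold`."""
    image = image.astype(float)
    region = np.zeros(image.shape, dtype=bool)
    seed_value = image[seed]

    queue = deque([seed])
    region[seed] = True
    while queue:
        r, c = queue.popleft()
        # 4-connected neighbors
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                    and not region[nr, nc]
                    and abs(image[nr, nc] - seed_value) <= threshold):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region
```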


C) Region Splitting and Merging     

Ans:

Sub-divide an image into a set of disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions stated in section 10.3.1.

Let R represent the entire image and select predicate P. One approach for segmenting R is to subdivide it successively into smaller and smaller quadrant regions so that, for any region Ri, P(Ri) = TRUE. We start with the entire region. If P(R) = FALSE, then the image is divided into quadrants. If P is FALSE for any quadrant, we subdivide that quadrant into sub-quadrants, and so on. This particular splitting technique has a convenient representation in the form of a so-called quadtree (that is, a tree in which each node has exactly four descendants). The root of the tree corresponds to the entire image and each node corresponds to a subdivision. Only the quadrants for which the predicate is FALSE are subdivided further.

If only splitting were used, the final partition would likely contain adjacent regions with identical properties. This drawback may be remedied by allowing merging as well as splitting. Satisfying the constraints of section 10.3.1 requires merging only adjacent regions whose combined pixels satisfy the predicate P. That is, two adjacent regions Rj and Rk are merged only if P(Rj ∪ Rk) = TRUE.
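A minimal sketch of the splitting half of this scheme, assuming a square power-of-two image and a max-min gray-level predicate (both assumptions); a merging pass would then combine adjacent leaves whose union still satisfies P:

```python
import numpy as np

def split(image, r0, c0, size, predicate, min_size=8):
    """Recursively split the square region [r0:r0+size, c0:c0+size] into
    quadrants until the predicate holds (or the minimum size is reached).
    Returns a list of (r0, c0, size) leaf regions of the quadtree."""
    block = image[r0:r0 + size, c0:c0 + size]
    if predicate(block) or size <= min_size:
        return [(r0, c0, size)]
    half = size // 2
    leaves = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves += split(image, r0 + dr, c0 + dc, half, predicate, min_size)
    return leaves

# Example predicate: a region is homogeneous if its gray-level spread is small
uniform = lambda block: block.max() - block.min() <= 20
```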


5. Describe the following with respect to Shape Analysis:

    A) Shape Orientation Descriptors

Ans:

The spatial orientation of an object with respect to a horizontal reference axis is the basis of a set of orientation descriptors developed at the Stanford Research Institute. These descriptors are defined below:

1. Image-oriented bounding box: the smallest rectangle oriented along the rows of the image that encompasses the object
2. Image-oriented box height: dimension of box height for image-oriented box
3. Image-oriented box width: dimension of box width for image-oriented box
4. Image-oriented box area: area of image-oriented bounding box
5. Image-oriented box ratio: ratio of box area to enclosed area of an object for an image-oriented box
6. Object-oriented bounding box: the smallest rectangle oriented along the major axis of the object that encompasses the object
7. Object-oriented box height: dimension of box height for object-oriented box
8. Object-oriented box width: dimension of box width for object-oriented box
9. Object-oriented box area: area of object-oriented bounding box
10. Object-oriented box ratio: ratio of box area to enclosed area of an object for an object-oriented box
11. Minimum radius: the minimum distance between the centroid and a perimeter pixel
12. Maximum radius: the maximum distance between the centroid and a perimeter pixel
13. Minimum radius angle: the angle of the minimum radius vector with respect to the horizontal axis
14. Maximum radius angle: the angle of the maximum radius vector with respect to the horizontal axis
15. Radius ratio: ratio of minimum radius angle to maximum radius angle
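As an illustrative sketch of the image-oriented descriptors only (the enclosed-area convention of counting object pixels is an assumption):

```python
import numpy as np

def image_oriented_box(mask):
    """Image-oriented bounding box of a binary object mask and the box ratio
    (box area divided by the enclosed object area)."""
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    box_area = height * width
    return height, width, box_area, box_area / mask.sum()
```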

B) Fourier Descriptors:

Ans:

The perimeter of an arbitrary closed curve can be represented by its instantaneous curvature at each

perimeter point. Consider the continuous closed curve drawn on the complex plane in which a point on

the perimeter is measured by its polar position z(s) as a function of arc length s. The complex function

z(s) may be expressed in terms of its real part x(s) and imaginary part y(s) as

z(s) = x(s) + iy(s)

The tangent angle is given by

The coordinate points [x(s), y(s)] can be obtained from the curvature function by the reconstruction

formulas

where x(0) and y(0) are the starting point coordinates.

Because the curvature function is periodic over the perimeter length P, it can be expanded in a Fourier

series as


where the coefficients cn are obtained from

This result is the basis of an analysis technique developed by Cosgriff and Brill in which the Fourier

expansion of a shape is truncated to a few terms to produce a set of Fourier descriptors. These Fourier

descriptors are then utilized as a symbolic representation of shape for subsequent recognition.

If an object has sharp discontinuities (e.g., a rectangle), the curvature function is undefined at these

points. This analytic difficulty can be overcome by the utilization of a cumulative shape function

This function is also periodic over P and can therefore be expanded in a Fourier series for a shape

description.
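The Cosgriff–Brill formulation above works with the curvature (or cumulative angular) function; the closely related and commonly used complex-coordinate variant below is offered only as a sketch of the truncation idea, not as the text's own equations:

```python
import numpy as np

def fourier_descriptors(boundary, n_keep=16):
    """Truncated Fourier descriptors of a closed boundary given as an ordered
    array of (x, y) perimeter points. Returns the kept coefficients and a
    boundary reconstructed from only those terms."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # z(s) = x(s) + i y(s)
    coeffs = np.fft.fft(z)

    # Keep only the n_keep lowest-frequency terms (half from each end of the
    # spectrum, since negative frequencies sit at the tail of the FFT output).
    truncated = np.zeros_like(coeffs)
    half = n_keep // 2
    truncated[:half] = coeffs[:half]
    truncated[-half:] = coeffs[-half:]

    reconstruction = np.fft.ifft(truncated)
    return truncated, np.column_stack([reconstruction.real, reconstruction.imag])
```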

C) Thinning and Skeletonizing

Ans:

We have previously discussed the usage of morphological conditional erosion as a means of thinning or skeletonizing, respectively, a binary object to obtain a stick figure representation of the object. There are other, non-morphological methods of thinning and skeletonizing. Some of these methods create thinner, minimally connected stick figures; others are more computationally efficient.

Thinning and skeletonizing algorithms can be classified as sequential or parallel. In a sequential

algorithm, pixels are examined for deletion (erasure) in a fixed sequence over several iterations of an

algorithm. The erasure of a pixel in the nth iteration depends on all previous operations performed in

the (n-1)th iteration plus all pixels already processed in the incomplete nth iteration. In a parallel

algorithm, erasure of a pixel in the nth iteration only depends upon the result of the (n-1)th iteration.

Sequential operators are, of course, designed for sequential computers or pipeline processors, while

parallel algorithms take advantage of parallel processing architectures. Sequential algorithms can be

classified as raster scan or contour following. The morphological conditional erosion operators are

examples of raster scan operators. With these operators, pixels are examined in a 3 * 3 window, and

are marked for erasure or not for erasure. In a second pass, the conditionally marked pixels are

sequentially examined in a 3 * 3 window. Conditionally marked pixels are erased if erasure does not

result in the breakage of a connected object into two or more objects.

In the contour following algorithms, an image is first raster scanned to identify each binary object to be

processed. Then each object is traversed about its periphery by a contour following algorithm, and the

outer ring of pixels is conditionally marked for erasure. This is followed by a connectivity test to

eliminate erasures that would break the connectivity of an object. Rosenfeld and Arcelli and di Baja have developed some of the first connectivity tests for contour-following thinning and skeletonizing.
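As a usage-level sketch only (not the operators described above), an off-the-shelf parallel thinning routine can produce the stick-figure representation:

```python
import numpy as np
from skimage.morphology import skeletonize

# Toy binary object: a filled rectangle
binary_object = np.zeros((64, 64), dtype=bool)
binary_object[20:44, 10:54] = True

# skeletonize() erodes the object down to a minimally connected stick figure
stick_figure = skeletonize(binary_object)
```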


6. Describe the following:

     A) Image Pyramids 

Ans:

Analyzing, manipulating and generating data at various scales should be a familiar concept to anybody involved in Computer Graphics. We will start with “image” pyramids. In pyramids such as a MIP map used for filtering, successive averages are built from the initial values.

It is clear that this construction can be seen as the result of applying box filters, scaled and translated, over the signal. For n initial values we have log2(n) terms in the result. Moreover, because of the order we have chosen for the operations, we only had to compute n − 1 additions (and shifts, if means are stored instead of sums). This is not a good scheme for reconstruction, since all we need is the last row of values to reconstruct the signal (of course they are sufficient since they are the initial values, but they are also necessary since only a sum of adjacent values is available from the levels above). We can observe, though, that there is some redundancy in the data. Calling si,j the jth element of level i (0 being the top of the pyramid, k = log2(n) being the bottom level) we have:

We can instead store s0,0 as before, but at the level below we store:

It is clear that by adding s0,0 and s′1,0 we retrieve s1,0, and by subtracting s′1,0 from s0,0 we retrieve s1,1. We therefore have the same information with one less element. The same modification applied recursively through the pyramid results in n − 1 values being stored in k levels. Since we need the top value as well (s0,0), and the sums as intermediary results, the computational scheme is essentially unchanged. The price we have to pay is that now, to effect a reconstruction, we have to start at the top of the pyramid and stop at the level desired.

If we look at the operations as applying a filter to the signal, we can see easily that the successive filters in the difference pyramid are (1/2, 1/2) and (1/2, −1/2), and their scales and translates. We will see that these are characteristic of the Haar transform. Notice also that this scheme computes the pyramid in O(n) operations.
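A minimal sketch of the mean/difference pyramid described above, assuming a signal length that is a power of two (variable names are illustrative):

```python
import numpy as np

def mean_difference_pyramid(signal):
    """Keep the per-level differences plus the single top-level mean, so the
    original signal of length n is represented by n values in total."""
    s = np.asarray(signal, dtype=float)
    differences = []                       # one array of differences per level
    while len(s) > 1:
        means = (s[0::2] + s[1::2]) / 2    # filter (1/2, 1/2)
        diffs = (s[0::2] - s[1::2]) / 2    # filter (1/2, -1/2)
        differences.append(diffs)
        s = means
    return s[0], differences[::-1]         # top of the pyramid first

def reconstruct(top, differences):
    """Invert the pyramid: start at the top and descend level by level."""
    s = np.array([top])
    for diffs in differences:
        left = s + diffs                   # s_{i+1,2j}   = mean + difference
        right = s - diffs                  # s_{i+1,2j+1} = mean - difference
        s = np.empty(2 * len(s))
        s[0::2], s[1::2] = left, right
    return s
```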

B) Series Expansion

Ans:

The standard Fourier transform is especially useful for stationary signals, that is, for signals whose properties do not change much (stationarity can be defined more precisely for stochastic processes, but a vague concept is sufficient here) with time (or through space for images). For signals such as images with sharp edges and other discontinuities, however, one problem with the Fourier transform and Fourier synthesis is that in order to accommodate a discontinuity, high-frequency terms appear and they are not localized, but are added everywhere. In the following examples we will use, for simplicity and clarity, piece-wise constant signals and piece-wise constant basis functions to show the characteristics of several transforms and encoding schemes. Two sample 1-D signals will be used, one with a single step, the other with a (small) range of scales in constant spans.


The Walsh functions Wi(t) have various orderings for their index i, so always make sure you know which ordering is used when dealing with Wi(t). The most common, used here, is where i is equal to the number of zero crossings of the function (the so-called sequency order). There are various definitions for them. A simple recursive one is:

with W0(t) = 1.

where j = 0, 1, 2, ... and q = 0 or 1. The Walsh transform is a series of coefficients given by:

and the function can be reconstructed as:

Note that since the original signals have discontinuities only at integral values, the signals are exactly represented by the first 32 Walsh bases at most. But we should also note that in this example, as would also be the case for a Fourier transform, the presence of a single discontinuity at 21 for signal 1 introduces the highest “frequency” basis, and it has to be added globally for all t. In general the coefficients for each basis function decrease rapidly as the order increases, and that usually allows for a simplification (or compression) of the representation of the original signal by dropping the basis functions whose coefficients are small (obviously with loss of information if the coefficients are not 0).
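As a hedged sketch of the transform-and-reconstruct idea only: the fragment below uses scipy's Hadamard matrix in natural (Hadamard) order rather than the sequency order discussed above, and the 1/n normalization is one common convention, not the text's.

```python
import numpy as np
from scipy.linalg import hadamard

def walsh_hadamard_coefficients(f):
    """Coefficients of a length-n signal (n a power of two) against the +/-1
    Walsh-Hadamard basis, plus the exact reconstruction from them."""
    n = len(f)
    H = hadamard(n)
    coeffs = H @ np.asarray(f, dtype=float) / n
    reconstruction = H.T @ coeffs          # exact, since H H^T = n I
    return coeffs, reconstruction
```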

C) Scaling functions - Continuous Wavelet Transform

Ans:

We can choose any set of windows to achieve the constant relative bandwidth, but a simple version is obtained if all the windows are scaled versions of each other. To simplify notation, let us define h(t) as:

and scaled versions of h(t):

where a is the scale factor (that is f = f0 / a ), and the constant 1 / | a | is for energy normalization. The WFT now becomes:

This is known as a wavelet transform, and h(t) is the basic wavelet. It is clear from the above formula that the basic wavelet is scaled, translated and convolved with the signal to compute the transform. The translation corresponds to moving the window over the time signal, and the scaling, which is often called dilation in the context of wavelets, corresponds to the filter frequency bandwidth scaling.


We have used the particular form of h(t) related to the window w(t), but the transform can be defined with any function h(t) satisfying the requirements for a band-pass function; that is, it is sufficiently regular, its square integral is finite, and its integral ∫ h(t) dt = 0. We can rewrite the basic wavelet as:

The transform is then written as:

We can reconstruct the signal as:

where c is a constant depending on h(t). The reconstruction looks like a sum of coefficients of orthogonal bases, but the ha,τ(t) are in fact highly redundant, since they are defined for every point in the (a, τ) space.

Since there is a lot of redundancy in the continuous application of the basic wavelet, a natural question is whether we can discretize a and τ in such a way that we obtain a true orthonormal basis. Following Daubechies, one can notice that if we consider two scales a0 < a1, the coefficients at scale a1 can be sampled at a lower rate than for a0, since they correspond to a lower frequency. In fact the sampling rate can be proportional to a0 / a1. Generally, if:

(i and j integers, T a period ) the wavelets are:

and the discretized wavelet coefficients are:

We hope that with a suitable choice of h(t), a0 and T we can then reconstruct f(t) as:

It is clear that for a0 close to 1 and T small, we are close to the continuous case, and the conditions on h(t) will be mild, but as a0 increases only very special h(t) will work.
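A minimal sketch of the dyadic case (a0 = 2, T = 1) with the Haar function as the basic wavelet; this is an illustrative choice, not the general construction discussed above:

```python
import numpy as np

def haar_wavelet_coefficients(f):
    """Dyadic wavelet coefficients of a length-n signal (n a power of two)
    using the Haar wavelet: +1 on the first half of its support, -1 on the
    second half, normalized to unit energy."""
    s = np.asarray(f, dtype=float)
    coefficients = []                      # one array of detail coefficients per scale
    while len(s) > 1:
        detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # wavelet (detail) coefficients
        coefficients.append(detail)
        s = (s[0::2] + s[1::2]) / np.sqrt(2)        # scaling (smooth) part, next scale
    return coefficients, s[0]              # details per scale + final scaling coefficient
```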