Modified DA based DWT-IDWT on FPGA for Image Compression
Dept of E&C, Sir MVIT, Bengaluru Page 1
CHAPTER 1
INTRODUCTION
1.1 IMAGE
An image (from Latin imago) is an artifact, for example a two-dimensional
picture, that has a similar appearance to some subject, usually a physical object or a
person.
Images may be two-dimensional, such as a photograph or screen display, or
three-dimensional, such as a statue. They may be captured by optical devices, such as
cameras, mirrors, lenses, telescopes and microscopes, and by natural objects and
phenomena, such as the human eye or water surfaces.
The word image is also used in the broader sense of any two-dimensional figure
such as a map, a graph, a pie chart, or an abstract painting. In this wider sense, images can
also be rendered manually (by drawing, painting or carving), rendered automatically
by printing or computer graphics technology, or developed by a combination of methods,
especially in a pseudo-photograph.
A volatile image is one that exists only for a short period of time. This may be a
reflection of an object by a mirror, a projection of a camera obscura, or a scene displayed
on a cathode ray tube. A fixed image, also called a hard copy, is one that has been
recorded on a material object, such as paper or textile, by photography or digital
processes[1].
1.1.1 STILL IMAGE
A still image is a single static image, as distinguished from a moving image (see
below). This phrase is used in photography, visual media and the computer industry to
emphasize that one is not talking about movies, or in very precise or pedantic technical
writing such as a standard. A film still is a photograph taken on the set of a movie or
television program during production, used for promotional purposes.
Figure 1.1: Still image.
1.1.2 MOVING IMAGE
A moving image is typically a movie (film), or video, including digital video. It
could also be an animated display such as a zoetrope.
1.1.3 IMAGE FILE SIZE
Image file size, expressed as the number of bytes, increases with the number of
pixels composing an image and with the colour depth of the pixels.
The greater the number of rows and columns, the greater the image resolution, and
the larger the file. Also, each pixel of an image increases in size when its colour depth
increases: an 8-bit pixel (1 byte) can represent 256 colors, while a 24-bit pixel (3 bytes)
can represent about 16.7 million colors, the latter known as true color[1].
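As a rough illustration of this relationship, the sketch below computes the uncompressed size of a raster image from its dimensions and colour depth. It assumes a bare pixel array with no file header, metadata or row padding; the function names are hypothetical and chosen only for this example.

```python
def image_file_size_bytes(rows, cols, bits_per_pixel):
    """Uncompressed size of a rows x cols pixel array (no header or padding assumed)."""
    total_bits = rows * cols * bits_per_pixel
    return (total_bits + 7) // 8          # round up to whole bytes

def colours(bits_per_pixel):
    """Number of distinct values one pixel of this depth can represent."""
    return 2 ** bits_per_pixel

print(image_file_size_bytes(480, 720, 24), colours(24))
print(image_file_size_bytes(480, 720, 8), colours(8))
```

For example, a 720×480 image at 24 bits per pixel occupies 1,036,800 bytes before compression, three times the size of the same image at 8 bits per pixel.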
1.2 IMAGE COMPRESSION
Image compression, the art and science of reducing the amount of data required to
represent an image, is one of the most useful and commercially successful technologies
in the field of digital image processing. The number of images that are compressed and
decompressed daily is staggering, and the compressions and decompressions are virtually
invisible to the user. Anyone who owns a digital camera, surfs the web, or watches the
latest Hollywood movies on digital video disks (DVDs) benefits from the algorithms and
standards discussed in this section[2].
Compression is basically of two types:
Lossy Compression
Lossless Compression
Lossy compression of data concedes a certain loss of accuracy in exchange for
greatly increased compression. An image reconstructed following lossy compression
contains degradation relative to the original, because the compression scheme
completely discards some of the original information, typically the information the
viewer is least likely to notice. Under normal viewing conditions, often no visible loss
is perceived. It proves effective when applied to graphics images and digitized voice.
Lossless compression consists of those techniques guaranteed to generate an
exact duplicate of the input data stream after a compress/expand cycle. Here the
reconstructed image after compression is numerically identical to the original image.
Lossless compression can only achieve a modest amount of compression. This is the type
of compression used when storing database records, spreadsheets or word processing
files[2].
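The difference between the two types can be seen on a toy row of pixel values. The sketch below uses run-length encoding as an example of a lossless scheme and uniform quantization as an example of a lossy step; both are illustrative choices rather than the specific algorithms of any standard discussed here, and the function names are hypothetical.

```python
def rle_encode(values):
    """Lossless example: run-length encode a sequence into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Inverse of rle_encode: expands every run back to the original values."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

def quantize(values, step):
    """Lossy example: snap each value to the nearest multiple of step.
    Python's round() sends exact halves to the even neighbour."""
    return [step * round(v / step) for v in values]

row = [52, 52, 52, 55, 55, 61, 61, 61]
runs = rle_encode(row)          # [(52, 3), (55, 2), (61, 3)]
restored = rle_decode(runs)     # identical to row
coarse = quantize(row, 10)      # [50, 50, 50, 60, 60, 60, 60, 60]: the originals are gone
```

Decoding the run-length pairs always returns the exact input, whereas no decoder can recover 52, 55 and 61 from the quantized row; in exchange, the quantized row has fewer distinct values and compresses better.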
1.2.1 NEED FOR COMPRESSION
To better understand the need for compact image representations, consider the
amount of data required to represent a two-hour standard definition (SD) television movie
using 720 × 480 × 24-bit pixel arrays. A digital movie (or video) is a sequence of video
frames in which each frame is a full-color still image. Because video players must display
the frames sequentially at rates near 30 fps (frames per second), SD digital video data must
be accessed at
(30 frames/s) × (720 × 480 pixels/frame) × (3 bytes/pixel) = 31,104,000 bytes/s
and a two-hour movie consists of
(31,104,000 bytes/s) × (3600 s/hour) × (2 hours) ≈ 2.24 × 10^11 bytes
or 224 GB (gigabytes) of data. Twenty-seven 8.5 GB dual-layer DVDs (assuming
conventional 12 cm disks) are needed to store it. To put a two-hour movie on a single
DVD, each frame must be compressed, on average, by a factor of 26.3. The compression
must be even higher for high definition (HD) television, where image resolutions reach
1920 × 1080 × 24 bits/image[1].
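The arithmetic above can be checked with a few lines of Python. The helper name is hypothetical; the sketch assumes decimal gigabytes (1 GB = 10^9 bytes), 3 bytes per pixel and, for the HD line, the same 30 fps frame rate as the SD case.

```python
import math

def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Return (bytes per second, total bytes) for uncompressed video."""
    per_second = fps * width * height * bytes_per_pixel
    return per_second, per_second * seconds

two_hours = 2 * 3600
sd_rate, sd_total = raw_video_bytes(720, 480, 3, 30, two_hours)
dvd_capacity = 8.5e9                            # dual-layer DVD, decimal bytes (assumed)
discs_needed = math.ceil(sd_total / dvd_capacity)
compression_factor = sd_total / dvd_capacity

# HD frames at 1920 x 1080, assuming the same 30 fps and 3 bytes/pixel
hd_rate, hd_total = raw_video_bytes(1920, 1080, 3, 30, two_hours)

print(sd_rate, sd_total, discs_needed, round(compression_factor, 1))
print(hd_rate, hd_total)
```

The SD line reproduces 31,104,000 bytes/s, about 2.24 × 10^11 bytes, 27 discs and a factor of 26.3; the HD total of about 1.34 × 10^12 bytes shows why HD needs even more compression.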
1.3 OVERVIEW
Figure 1.2: Block Diagram
1.3.1 Experimental Setup
Figure 1.3: Experimental Setup
1.3.2 RESOURCES USED:
Xilinx ISE
MATLAB
Virtex-II Pro FPGA Development Kit
Desktop PC
Interfacing Model
1.3.3 OBJECTIVE
1) To carry out a literature survey on
a) Image and image compression
b) Need for compression
c) JPEG standard
d) DWT-IDWT
e) Distributed Arithmetic (DA)
f) Real-time setup for image compression
2) To develop a system-level block diagram for image compression and the DWT-IDWT
processor
3) To develop a software reference model for image compression and analyze the results
for multiple test images
4) To design and implement a DA DWT-IDWT processor and analyze its performance
with respect to area, time and power on FPGA
5) To design a modified DA DWT-IDWT processor and analyze its performance
6) To implement the proposed architecture on FPGA and verify the results in a real-time
experimental setup
1.4 APPLICATIONS:
Although the Fourier transform has long been the mainstay of transform-based digital
signal processing, a more recent transformation, called the wavelet transform, is making
strides in DSP applications owing to some unique advantages.
Wavelets have their energy concentrated in time. Sinusoids (the basis of the Fourier
transform) are useful in analyzing periodic and time-invariant phenomena, while wavelets
are well suited to the analysis of transient, time-varying signals. Since most real-life
signals are time-varying in nature, the wavelet transform suits many applications very
well[4].
1.4.1 Wavelets in Audio
DWT can be used to analyze the temporal and spectral properties of non-stationary
signals such as audio. Unlike the Fourier transform, whose basis functions are sinusoids,
wavelet transforms are based on small waves, called wavelets, of varying frequency and
limited duration. For music, this reveals not only what notes (or frequencies) to play but
also when to play them. Conventional Fourier transforms, on the other hand, provide only
the notes or frequency information; temporal information is lost in the transformation
process.
Some audio applications where DWT could offer considerable improvement are the
extraction of beat attributes from music signals and the automatic classification of
non-speech audio signals using statistical pattern recognition. Shrinking transform
coefficients towards zero in the wavelet domain is one such wavelet technique; it removes
noise from a wide variety of signal types while preserving non-smooth features.
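A minimal sketch of wavelet shrinkage, assuming a single-level orthonormal Haar transform and soft thresholding with a user-chosen threshold t (both are illustrative choices; practical denoisers use more levels and data-driven thresholds). Detail coefficients smaller than t, assumed here to carry mostly noise, are set to zero, while larger ones are only reduced by t, so sharp steps survive. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def soft_threshold(c, t):
    """Shrink c towards zero by t; coefficients with |c| <= t become exactly zero."""
    if c > t:
        return c - t
    if c < -t:
        return c + t
    return 0.0

def haar_denoise(x, t):
    """One-level orthonormal Haar DWT, soft-threshold the detail band, inverse DWT."""
    if len(x) % 2:
        raise ValueError("this sketch needs an even number of samples")
    half = len(x) // 2
    a = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]   # approximation
    d = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]   # detail
    d = [soft_threshold(c, t) for c in d]                          # shrinkage
    y = []
    for ak, dk in zip(a, d):                                       # inverse Haar
        y.extend([(ak + dk) / SQRT2, (ak - dk) / SQRT2])
    return y

noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]   # a step with small wiggles
print(haar_denoise(noisy, 0.5))
```

With t = 0.5 the small wiggles collapse to pair averages (about 1.1, 1.0 and 5.05), while the jump from about 1 to about 5 is kept; with t = 0 the input is returned unchanged up to rounding.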
1.4.2 Wavelets in Video
Wavelet basis functions are obtained from a single mother wavelet by translation and
scaling. Also, the multi-resolution concept, satisfied by almost all useful wavelet
functions, makes wavelets very useful in analyzing “real world” signals.
Multi-resolution theory is concerned with the representation and analysis of
signals at more than one resolution. A multi-resolution representation of video offers
scalability, i.e. the possibility of transmitting the same sequence at different resolutions
for applications such as high-definition television, videophone and videoconferencing.
At each finer DWT level the wavelets are half as wide and are translated in steps half as
large, which gives a better approximation. This is conceptually similar to improving
frequency resolution by doubling the number of harmonics in a Fourier series expansion.
While DCT-based image coders like JPEG perform very well at moderate bit
rates, at low bit rates the image quality degrades rapidly because of the blocking artifacts
introduced by the block-based DCT. JPEG-2000 is an emerging standard in image
processing that uses the DWT to achieve superior image quality at very low bit rates
because of the overlapping basis functions and better energy compaction of the wavelet
transform.
1.4.3 Wavelets in Wireless applications
The analysis, design and measurement of antennas have been extremely important
in the development and success of wireless communication and its applications.
Unfortunately, mathematical simulations of antennas are extremely complex and require
extensive computation and large amounts of memory. Using wavelets in conjunction with
other techniques in the numerical methods that solve for the current distribution on the
antenna offers many advantages: it reduces computation, helps reduce errors and brings
the results closer to the true values.
With recent developments in wireless communication technologies, video
streaming and image compression techniques have become very important for transmitting
multimedia content over wireless channels. As wireless channels are very noisy and have
narrow bandwidth, higher compression is required for both image and video signals. The
wavelet transform could therefore be a good choice for image compression in wireless
applications because it provides better image quality at high compression ratios.
1.4.4 Wavelets in Neural Networks
Neural Networks (NN) have emerged as a powerful tool for data mining
applications due to their ability to learn patterns and relationships in complex,
multi-dimensional data sets. The effectiveness of any NN-based solution depends largely
on factors such as the scalability of the network, its generalization capability and the
dimensionality of the parameter space, and these factors often restrict the effectiveness
of the NN. As such, any method that is able to increase the quality or accessibility of
the input data will be invaluable. It is here that wavelets are likely to be extremely
useful. NNs are useful in conjunction with wavelets, with the latter serving as a
preprocessing tool that transforms hidden patterns into a more recognizable form suitable
for use as a training set.
CHAPTER 2
IMAGE COMPRESSION STANDARD
2.1 NEED FOR A COMPRESSION STANDARD
With the rapid developments of imaging technology, image compression and
coding tools and techniques, it is necessary to evolve coding standards so that there is
compatibility and interoperability between the image communication and storage
products manufactured by different vendors. Without standards, encoders and decoders
from different vendors cannot communicate with each other. The most commonly used
standards are JPEG and JPEG2000[3].
2.2 JPEG
The aim of JPEG compression is to take full-color (and gray-scale) "real-world"
scenes and reduce the file size of images for storage and transmission.
While capacity and bandwidth have improved dramatically over the last decade,
the increased size of images keeps JPEG relevant for digital camera users and
websites.
The standard does not define exactly how to implement this process, but it is
specified broadly enough that images produced by any program can be viewed. The most
common implementation in use is the one produced by the Independent JPEG Group
(IJG)[3].
2.2.1 Need for JPEG
JPEG is used to make image files smaller and to store 24-bit-per-pixel color data
instead of 8-bit-per-pixel data. The advantage of JPEG is that it stores full color
information: 24 bits/pixel.
2.2.2 JPEG STANDARD
In computing, JPEG (named after the Joint Photographic Experts Group who
created the standard) is a commonly used method of lossy compression for photographic
images.
The degree of compression can be adjusted, allowing a selectable trade-off
between storage size and image quality.
JPEG (.jpeg, .jfif, .jpg and .jpe) is a standard image compression format developed
by and named after the Joint Photographic Experts Group.
It is one of the two most common formats for storing and sending images on the
Web. JPEG images are full-color images, meaning they are capable of storing 24 bits
per pixel, i.e. about 16.7 million colors[3].
JPEG is best for compressing full-color or gray-scale images, including
photographs and graphic images.
JPEG compression is designed around the characteristics of the human eye. Because
the eye does not pick up subtle color distinctions and high-frequency brightness
variations, data can be removed without noticeably changing the image. However, as more
data is removed, the quality of the image decreases. This is why JPEG compression is
considered “lossy”.
Figure 2.1: Image describing the JPEG standard (edges in a typical JPEG image, split
into red, green and blue channels).
As with all image compression formats, JPEG has both its advantages and disadvantages:
2.2.3 ADVANTAGES OF JPEG
Large compression ratios, hence shorter file transfer times
Full-color information
Great for photographs, graphic artwork, banner ads, etc.
2.2.4 DISADVANTAGES OF JPEG
Loss of image quality
Sharp edges tend to come out blurry
Longer page load time than the GIF format
JPEG uses a lossy compression algorithm, so some detail is lost when converting
images from other formats, such as BMP, to JPEG.
If you have an illustrated image or a vector image, don't use JPEG, because the
edges of lines may get blurred.
2.2.5 EMERGENCE OF JPEG 2000
JPEG 2000 addresses most of the following problems of JPEG:
The biggest problem is that JPEGs are lossy: when an image is converted to
JPEG, some of the information in the image is lost.
Professional photographers tend to avoid working repeatedly with JPEG images, as
repeatedly loading and saving the image causes it to lose quality.
JPEGs don't support layers: most photo manipulation software uses layers, so to
save images as JPEGs the image has to be "flattened".
JPEGs in common use support only 8-bit images. Modern digital cameras can operate in
12-, 14- or 16-bit mode, but if the images are saved as JPEGs, the extra information is
discarded.
2.3 JPEG 2000
The JPEG-2000 image compression system has a rate-distortion advantage over
the original JPEG. JPEG-2000 is an emerging standard for still image
compression.
As digital imagery becomes more commonplace and of higher quality, there is a
need to manipulate more and more data.
Thus, image compression must not only reduce the necessary storage and
bandwidth requirements, but also allow extraction for editing, processing, and
targeting particular devices and applications.
More importantly, it also allows extraction of different resolutions, pixel
fidelities, regions of interest, components, and more, all from a single
compressed bit stream.
This allows an application to manipulate or transmit only the essential information
for any target device from any JPEG 2000 compressed source image.
2.3.1 FEATURES OF JPEG-2000
State-of-the-art low bit-rate compression performance
Progressive transmission by quality, resolution, component, or spatial locality
Lossy and lossless compression (with lossless decompression available
naturally through all types of progression)
Random (spatial) access to the bit stream
Pan and zoom (with decompression of only a subset of the compressed data)
Compressed domain processing (e.g., rotation and cropping)
Region of interest coding by progression
Limited memory implementations
The aims of JPEG 2000 are not only improved compression performance over JPEG
but also adding (or improving) features such as scalability and editability. Very low and
very high compression rates are supported in JPEG 2000.
In fact, the graceful ability of the design to handle a very large range of effective bit
rates is one of the strengths of JPEG 2000. While there is a modest increase in
compression performance of JPEG 2000 compared to JPEG, the main advantage offered
by JPEG 2000 is the significant flexibility of the code stream[3].
Figure 2.2: Compression (encoding and decoding)
Conventional methods of lossless compression such as Zip reversibly reduce file
sizes while preserving information by compacting regularities in the data. JPEG
compression goes one step further by exploiting regularities in the visual perception of
an image and using lossy compression to reduce the file size of the image.
This process involves a small but irreversible loss of quality, as illustrated below.
Figure 2.3: Edges in a typical image - zoomed in to see the pixels.
After compression, most of the edges are still present, with some artifacts.
The main steps of JPEG encoding are as follows (some require heavy mathematics):
The standard color space uses 256 levels each of red, green and blue (16.7 million
RGB colors)
Color space conversion from RGB to YCbCr
e.g. Y (luminance) = 0.299 × R + 0.587 × G + 0.114 × B
Spatial separation into 8×8 pixel blocks
Sub-sampling (if required) of the chroma components Cb and Cr in 16×16 pixel
blocks
Discrete Cosine Transform (DCT) of each 8×8 block into its spatial frequencies
Quantization of the spatial frequency matrix
Lossless compression of the resulting matrix
For illustrative purposes large images are not needed, since the entire JPEG
compression takes place inside 8×8 (or 16×16) pixel blocks. Note that a
JPEG cannot be compressed significantly further using Zip or any other lossless
compression process, since lossless compression is already applied as the last
step of JPEG encoding.
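Three of the steps above can be sketched in Python: the colour conversion, the 8×8 DCT and quantization. This is an illustration, not a complete or conformant JPEG encoder. The Y weights are those given in the list; the Cb and Cr formulas are the full-range JFIF ones and are an assumption here. Samples are level-shifted by 128 before the DCT, and the flat quantization table of 16s is hypothetical (practical tables quantize high frequencies more coarsely). The function names are hypothetical.

```python
import math

def rgb_to_ycbcr(r, g, b):
    """Full-range conversion (JFIF-style Cb/Cr assumed); Y uses the weights in the text."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block of level-shifted samples (JPEG-style scaling)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize_block(coeffs, qtable):
    """Divide each coefficient by its quantizer step and round: the lossy step."""
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(8)] for u in range(8)]

flat = [[100] * 8 for _ in range(8)]                 # a uniform grey 8x8 block
shifted = [[v - 128 for v in row] for row in flat]   # level shift to -128..127
coeffs = dct_8x8(shifted)
q = quantize_block(coeffs, [[16] * 8 for _ in range(8)])   # hypothetical flat table
```

For a uniform block all the energy lands in the DC coefficient (8 × (100 − 128) = −224, quantized to −14) and every AC coefficient quantizes to zero; this concentration of energy into a few coefficients is what the final lossless step exploits.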
Note the predominance of green and blue pixels, with few red pixels.
The green channel is closest to what the eye sees, with blue having the next
most artifacts.
Decoding an image from a JPEG is the reverse of this process, and does not
need elaboration here.
2.4 Implications
JPEG-2000 is unlikely to replace JPEG in low complexity applications at bit rates
in the range where JPEG performs well. However, for applications requiring
either higher quality or lower bitrates, or any of the features provided, JPEG-2000
should be a welcome standard.
JPEG-2000 provides better rate-distortion performance, for any given rate, than
the original JPEG standard. However, the largest improvements are observed at
very high and very low bitrates.
The improvements in the “near visually lossless” realm are more modest
(approximately 20%). Thus, widespread adoption of the new standard will likely
be based on the JPEG-2000 feature set.
While JPEG provided different methods of generating progressive bit streams,
with JPEG-2000 the progression is simply a matter of the order the compressed
bytes are stored in a file.
CHAPTER 3
DISCRETE WAVELET TRANSFORM
3.1 INTRODUCTION
The transform of a signal is just another form of representing the signal. It does
not change the information content present in the signal. The Wavelet Transform provides
a time-frequency representation of the signal. It was developed to overcome the
shortcomings of the Short Time Fourier Transform (STFT), which can also be used to analyze
non-stationary signals. While STFT gives a constant resolution at all frequencies, the
Wavelet Transform uses multi-resolution technique by which different frequencies are
analyzed with different resolutions. A wave is an oscillating function of time or space and
is periodic. In contrast, wavelets are localized waves. They have their energy
concentrated in time or space and are suited to analysis of transient signals. While Fourier
Transform and STFT use waves to analyze signals, the Wavelet Transform uses wavelets
of finite energy.
Figure 3.1: Demonstration of (a) a wave and (b) a wavelet
The wavelet analysis is done similar to the STFT analysis. The signal to be analyzed
is multiplied with a wavelet function just as it is multiplied with a window function in
STFT, and then the transform is computed for each segment generated[4].
However, unlike the STFT, in the Wavelet Transform the width of the wavelet function
changes with each spectral component. At high frequencies the Wavelet Transform gives
good time resolution and poor frequency resolution, while at low frequencies it gives
good frequency resolution and poor time resolution.
3.2 CONTINUOUS WAVELET TRANSFORM AND WAVELET SERIES
The Continuous Wavelet Transform (CWT) is given by equation (3.1), where
x(t) is the signal to be analyzed and ψ(t) is the mother wavelet or basis function. All the
wavelet functions used in the transformation are derived from the mother wavelet through
translation (shifting) and scaling (dilation or compression).
$$X_{WT}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \qquad (3.1)$$
The mother wavelet used to generate all the basis functions is designed based on
some desired characteristics associated with that function. The translation parameter τ
relates to the location of the wavelet function as it is shifted through the signal. Thus, it
corresponds to the time information in the Wavelet Transform. The scale parameter s is
defined as |1/frequency| and corresponds to frequency information. Scaling either dilates
(expands) or compresses a signal. Large scales (low frequencies) dilate the signal and
provide global information about the signal, while small scales (high frequencies)
compress the signal and provide detailed information hidden in the signal. Notice that the
Wavelet Transform merely performs a convolution of the signal with the basis
function. This analysis is very useful because in most practical applications, high
frequencies (low scales) do not last for a long duration but instead appear as short bursts,
while low frequencies (high scales) usually last for the entire duration of the signal.
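Equation (3.1) can be approximated numerically by a Riemann sum. The sketch below is illustrative only: it assumes the (unnormalized) Mexican-hat wavelet ψ(t) = (1 − t²)e^(−t²/2), which is real so the conjugate has no effect, and a hypothetical test signal consisting of a single copy of that wavelet centred at τ = 2 with scale s = 1. Scanning a small grid of translations and scales, the largest coefficient should appear where the analysing wavelet matches the signal in both position (time) and scale (frequency). The function names are hypothetical.

```python
import math

def mexican_hat(t):
    """Mexican-hat mother wavelet (unnormalized): (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt_coefficient(x, t_grid, dt, tau, s, psi=mexican_hat):
    """Riemann-sum approximation of eq. (3.1) at one (tau, s) point.
    psi is real here, so the complex conjugate in (3.1) changes nothing."""
    total = sum(xi * psi((ti - tau) / s) for xi, ti in zip(x, t_grid))
    return total * dt / math.sqrt(abs(s))

dt = 0.01
t_grid = [-10.0 + k * dt for k in range(2001)]     # t from -10 to 10
x = [mexican_hat(t - 2.0) for t in t_grid]          # hypothetical test signal

coeffs = {
    (tau, s): cwt_coefficient(x, t_grid, dt, tau, s)
    for tau in (-2.0, 0.0, 2.0, 4.0)                # translations (time)
    for s in (0.5, 1.0, 2.0)                        # scales (about 1/frequency)
}
best = max(coeffs, key=lambda k: abs(coeffs[k]))
print(best, round(coeffs[best], 3))
```

The 1/√|s| factor keeps the energy of every scaled wavelet equal, so the coefficients at different scales are comparable; at the matching point the exact value of the integral is 0.75√π ≈ 1.329.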
The Wavelet Series is obtained by discretizing the CWT. This aids in computing the
CWT using computers and is obtained by sampling the time-scale plane. The sampling
rate can be changed in accordance with the scale without violating the Nyquist
criterion. The Nyquist criterion states that the minimum sampling rate that allows
reconstruction of the original signal is 2ω rad/s, where ω is the highest frequency in the
signal. Therefore, as the scale goes higher (lower frequencies), the sampling rate can be
decreased, thus reducing the number of computations[4].
3.3 DWT
The Wavelet Series is just a sampled version of CWT and its computation may
consume significant amount of time and resources, depending on the resolution required.
The Discrete Wavelet Transform (DWT), which is based on sub-band coding is found to
yield a fast computation of Wavelet Transform. It is easy to implement and reduces the
computation time and resources required.
The foundations of DWT go back to 1976, when techniques to decompose discrete
time signals were devised. Similar work was done in speech signal coding, which was
named sub-band coding. In 1983, a technique similar to sub-band coding was
developed which was named pyramidal coding. Later many improvements were made to
these coding schemes which resulted in efficient multi-resolution analysis schemes.
In CWT, the signals are analyzed using a set of basis functions which relate to
each other by simple scaling and translation. In the case of DWT, a time-scale
representation of the digital signal is obtained using digital filtering techniques. The
signal to be analyzed is passed through filters with different cutoff frequencies at different
scales.
3.4 Filter Banks
3.4.1 Multi-Resolution Analysis using Filter Banks
Filters are one of the most widely used signal processing functions. Wavelets can
be realized by iteration of filters with rescaling. The resolution of the signal, which is a
measure of the amount of detail information in the signal, is determined by the filtering
operations, and the scale is determined by upsampling and downsampling (subsampling)
operations.
The DWT is computed by successive lowpass and highpass filtering of the
discrete time-domain signal as shown in Figure 3.2. This is called the Mallat algorithm or
Mallat-tree decomposition. Its significance is in the manner it connects continuous-time
multiresolution to discrete-time filters. In the figure, the signal is denoted by the
sequence x[n], where n is an integer. The low pass filter is denoted by G0 while the high
pass filter is denoted by H0. At each level, the high pass filter produces detail information,
d[n], while the low pass filter associated with the scaling function produces coarse
approximations, a[n].
Figure 3.2: Three level decomposition tree
At each decomposition level, the half band filters produce signals spanning only half the
frequency band. This doubles the frequency resolution, as the uncertainty in frequency is
reduced by half. In accordance with Nyquist's rule, if the original signal has a highest
frequency of ω, which requires a sampling frequency of 2ω rad/s, then it now has a
highest frequency of ω/2 rad/s. It can now be sampled at a frequency of ω rad/s, thus
discarding half the samples with no loss of information. This decimation by 2 halves the
time resolution, as the entire signal is now represented by only half the number of
samples. Thus, while the half band low pass filtering removes half of the frequencies and
thus halves the resolution, the decimation by 2 doubles the scale.
With this approach, the time resolution becomes arbitrarily good at high
frequencies, while the frequency resolution becomes arbitrarily good at low frequencies.
The time-frequency plane is thus resolved as shown in figure 1.1(d) of Chapter 1. The
filtering and decimation process is continued until the desired level is reached. The
maximum number of levels depends on the length of the signal. The DWT of the original
signal is then obtained by concatenating all the coefficients, a[n] and d[n], starting from
the last level of decomposition.
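The three-level Mallat tree can be sketched in Python. This is a minimal illustration assuming the orthonormal Haar filter pair (G0 taps 1/√2, 1/√2; H0 taps 1/√2, −1/√2), not necessarily the filters used in the processor of this project; for 2-tap filters, filtering followed by decimation by 2 reduces to pairwise sums and differences. The function names are hypothetical. The output is ordered as described in the text: the last-level approximation first, then the detail bands from the last level back to the first.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_analysis(x):
    """One level of the tree: low pass G0 and high pass H0 (2-tap Haar),
    each followed by decimation by 2."""
    half = len(x) // 2
    a = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]   # coarse a[n]
    d = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]   # detail d[n]
    return a, d

def mallat_decompose(x, levels):
    """Return [a_J, d_J, d_(J-1), ..., d_1] for J = levels."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    a = list(x)
    for _ in range(levels):
        a, d = haar_analysis(a)      # only the approximation is split further
        details.append(d)
    out = list(a)
    for d in reversed(details):      # concatenate starting from the last level
        out.extend(d)
    return out

x = [4, 6, 10, 12, 8, 6, 5, 5]
coeffs = mallat_decompose(x, 3)
print(coeffs)
```

Each level halves the length of the approximation, which is why the maximum number of levels depends on the signal length (three levels for eight samples here). Because the Haar pair is orthonormal, the sum of squared coefficients equals that of the input (446 for this x).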
Figure 3.3: Three level reconstruction tree
Figure 3.3 shows the reconstruction of the original signal from the wavelet coefficients.
Basically, the reconstruction is the reverse process of decomposition. The approximation and
detail coefficients at every level are up-sampled by two, passed through the low pass and high
pass synthesis filters and then added. This process is continued through the same number of
levels as in the decomposition process to obtain the original signal. The Mallat algorithm
works equally well if the analysis filters, G0 and H0, are exchanged with the synthesis
filters, G1 and H1.
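A matching sketch of the reconstruction tree, under the same assumption of orthonormal Haar filters and the same coefficient ordering (last-level approximation, then details from the last level back to the first). For 2-tap filters, upsampling by 2, synthesis filtering and adding reduce to the two formulas in `haar_synthesis`; both function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_synthesis(a, d):
    """Upsample a and d by 2, apply the Haar synthesis filters G1 and H1, and add.
    For 2-tap filters this yields two output samples per coefficient pair."""
    x = []
    for ak, dk in zip(a, d):
        x.append((ak + dk) / SQRT2)
        x.append((ak - dk) / SQRT2)
    return x

def mallat_reconstruct(coeffs, levels):
    """Invert [a_J, d_J, ..., d_1] through the same number of levels."""
    n = len(coeffs)
    if n % (2 ** levels):
        raise ValueError("coefficient count must be divisible by 2**levels")
    size = n // (2 ** levels)           # length of the coarsest approximation a_J
    a = list(coeffs[:size])
    pos = size
    for _ in range(levels):
        d = coeffs[pos:pos + len(a)]    # detail band of the same length as a
        pos += len(a)
        a = haar_synthesis(a, d)
    return a

s = SQRT2
# Three-level Haar coefficients of the signal [4, 6, 10, 12, 8, 6, 5, 5]
coeffs = [28 / s, 4 / s, -6.0, 2.0, -2 / s, -2 / s, 2 / s, 0.0]
print(mallat_reconstruct(coeffs, 3))
```

The printed samples match the original signal up to floating-point rounding, which is the perfect reconstruction property discussed next.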
3.4.2 Conditions for Perfect Reconstruction
In most Wavelet Transform applications, it is required that the original signal be
synthesized from the wavelet coefficients. To achieve perfect reconstruction, the analysis
and synthesis filters have to satisfy certain conditions. Let G0(z) and G1(z) be the low pass
analysis and synthesis filters, respectively, and H0(z) and H1(z) the high pass analysis and
synthesis filters, respectively. Then the filters have to satisfy the following two conditions,
given in equations (3.2) and (3.3):
$$G_0(-z)\,G_1(z) + H_0(-z)\,H_1(z) = 0 \qquad (3.2)$$
$$G_0(z)\,G_1(z) + H_0(z)\,H_1(z) = 2z^{-d} \qquad (3.3)$$
The first condition ensures that the reconstruction is aliasing-free, and the second
ensures that there is no amplitude distortion: the output equals the input, delayed by d
samples. It can be observed that the perfect reconstruction condition does not change if
we switch the analysis and synthesis filters.
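The two conditions can be checked numerically by treating each filter as a polynomial in z^-1. The sketch below uses hypothetical helpers, not part of the original work; it assumes the orthonormal Haar analysis pair with the common alias-cancelling synthesis choice G1(z) = H0(−z), H1(z) = −G0(−z), and reports whether (3.2) holds and whether (3.3) reduces to 2z^-d for a single delay d.

```python
import math

def polymul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists (index = power)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def polyadd(p, q):
    """Add two coefficient lists, zero-padding the shorter one."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def modulate(p):
    """Coefficients of P(-z): the coefficient of z^-n is multiplied by (-1)^n."""
    return [c * (-1) ** n for n, c in enumerate(p)]

def check_pr(g0, h0, g1, h1, tol=1e-12):
    """Return (alias_free, pure_delay, d) for conditions (3.2) and (3.3)."""
    alias = polyadd(polymul(modulate(g0), g1), polymul(modulate(h0), h1))
    dist = polyadd(polymul(g0, g1), polymul(h0, h1))
    alias_free = all(abs(c) < tol for c in alias)
    nonzero = [n for n, c in enumerate(dist) if abs(c) > tol]
    pure_delay = len(nonzero) == 1 and abs(dist[nonzero[0]] - 2.0) < tol
    return alias_free, pure_delay, (nonzero[0] if pure_delay else None)

r = 1 / math.sqrt(2)
g0, h0 = [r, r], [r, -r]          # Haar analysis pair
g1, h1 = [r, r], [-r, r]          # G1(z) = H0(-z), H1(z) = -G0(-z)
print(check_pr(g0, h0, g1, h1))   # expected: (True, True, 1), i.e. 2z^-1
```

Swapping in an arbitrary synthesis high pass (for example reusing H0 as H1) breaks the alias-cancellation condition, which the helper reports as `alias_free == False`.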
There are a number of filters which satisfy these conditions. But not all of them give
accurate Wavelet Transforms, especially when the filter coefficients are quantized. The
accuracy of the Wavelet Transform can be determined after reconstruction by calculating
the Signal to Noise Ratio (SNR) of the signal. Some applications like pattern recognition
do not need reconstruction, and in such applications, the above conditions need not apply.
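A minimal sketch of the SNR measure, assuming the usual definition SNR = 10 log10(Σx² / Σ(x − x̂)²) in decibels (the text does not give a formula). The hypothetical `haar_roundtrip` helper illustrates the effect of quantized filter coefficients mentioned above: with exact Haar taps of 1/√2 the round trip is essentially perfect, whereas rounding the taps to 0.6875 (1/√2 truncated to four fractional bits, an illustrative choice) scales every output sample by 2 × 0.6875² = 0.9453125 and limits the SNR to about 25 dB.

```python
import math

def snr_db(original, reconstructed):
    """Signal-to-noise ratio of a reconstruction, in dB."""
    signal = sum(v * v for v in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    if noise == 0:
        return math.inf              # perfect reconstruction
    return 10 * math.log10(signal / noise)

def haar_roundtrip(x, q):
    """One-level analysis and synthesis with 2-tap Haar filters whose taps are +/-q."""
    out = []
    for k in range(0, len(x) - 1, 2):
        a = q * (x[k] + x[k + 1])    # low pass branch
        d = q * (x[k] - x[k + 1])    # high pass branch
        out += [q * (a + d), q * (a - d)]
    return out

x = [4, 6, 10, 12, 8, 6, 5, 5]
print(snr_db(x, haar_roundtrip(x, 1 / math.sqrt(2))))   # very large or inf
print(snr_db(x, haar_roundtrip(x, 0.6875)))             # about 25 dB
```

This is one reason fixed-point hardware implementations need enough coefficient bits, or a filter structure that is insensitive to coefficient rounding.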
3.4.3 Classification of Wavelets
We can classify wavelets into two classes: (a) orthogonal and (b) biorthogonal. Based on
the application, either of them can be used.
(a) Features of orthogonal wavelet filter banks
The coefficients of orthogonal filters are real numbers. The filters are of the same
length and are not symmetric. The low pass filter, G0, and the high pass filter, H0, are
related to each other by
$$H_0(z) = z^{-N}\, G_0(-z^{-1}) \qquad (3.4)$$
where N is the filter order (filter length minus one).
The two filters are alternating flips of each other. The alternating flip automatically
gives double-shift orthogonality between the lowpass and highpass filters, i.e., the scalar
product of the filters for a shift by two is zero: $\sum_k G_0[k]\, H_0[k-2l] = 0$ for all
$k, l \in \mathbb{Z}$. Filters that satisfy equation (3.4) are known as Conjugate Mirror
Filters (CMF). Perfect reconstruction is possible with the alternating flip.
Also, for perfect reconstruction, the synthesis filters are identical to the analysis
filters except for a time reversal. Orthogonal filters offer a high number of vanishing
moments. This property is useful in many signal and image processing applications. They
have regular structure which leads to easy implementation and scalable architecture.
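The alternating flip and the double-shift orthogonality can be verified numerically. The sketch assumes the standard Daubechies-4 low pass coefficients (one of the families in Figure 3.4) as G0, so N = 3 (filter length minus one); the helper names are hypothetical. In coefficient form, H0(z) = z^-N G0(−z^-1) becomes h[m] = (−1)^(N−m) g[N−m].

```python
import math

def alternating_flip(g):
    """h[m] = (-1)**(N - m) * g[N - m], with N = len(g) - 1."""
    n = len(g) - 1
    return [(-1) ** (n - m) * g[n - m] for m in range(len(g))]

def shifted_inner(p, q, l):
    """Scalar product sum_k p[k] * q[k - 2l] over indices where both exist."""
    return sum(p[k] * q[k - 2 * l] for k in range(len(p)) if 0 <= k - 2 * l < len(q))

r3, norm = math.sqrt(3), 4 * math.sqrt(2)
g0 = [(1 + r3) / norm, (3 + r3) / norm, (3 - r3) / norm, (1 - r3) / norm]  # Daubechies-4
h0 = alternating_flip(g0)                     # [-g0[3], g0[2], -g0[1], g0[0]]

for l in (-1, 0, 1):
    print(l, shifted_inner(g0, h0, l))        # double-shift orthogonality: all ~0
```

The same helper also confirms that G0 is orthogonal to its own even shifts (unit energy at l = 0, zero at l = ±1), the property that lets the synthesis filters be time-reversed copies of the analysis filters.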
(b) Features of biorthogonal wavelet filter banks
In the case of the biorthogonal wavelet filters, the low pass and the high pass
filters do not have the same length. The low pass filter is always symmetric, while the
high pass filter could be either symmetric or anti-symmetric. The coefficients of the filters
are either real numbers or integers.
For perfect reconstruction, biorthogonal filter bank has all odd length or all even
length filters. The two analysis filters can be symmetric with odd length or one symmetric
and the other antisymmetric with even length. Also, the two sets of analysis and synthesis
filters must be dual. The linear phase biorthogonal filters are the most popular filters for
data compression applications.
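To make the duality requirement concrete, the sketch below checks it for the LeGall 5/3 low pass pair (the reversible filters of JPEG2000, chosen here only as a small illustrative example; they are not the 9/7 filters used later in this report). Both filters are symmetric with odd length, and the analysis low pass filter h and synthesis low pass filter h̃ satisfy one of the biorthogonality conditions, Σ_n h(n) h̃(n − 2k) = δ(k). Exact fractions avoid rounding; the function names are hypothetical.

```python
from fractions import Fraction as F

# LeGall 5/3 low pass pair, stored as {tap index: value}, centred on index 0
analysis_lp = {-2: F(-1, 8), -1: F(2, 8), 0: F(6, 8), 1: F(2, 8), 2: F(-1, 8)}
synthesis_lp = {-1: F(1, 2), 0: F(1, 1), 1: F(1, 2)}

def is_symmetric(taps):
    """True if taps[n] == taps[-n] for every index n."""
    return all(taps.get(-n) == v for n, v in taps.items())

def cross_corr_even(a, b, k):
    """Sum over n of a[n] * b[n - 2k]; missing taps count as zero."""
    return sum(v * b.get(n - 2 * k, 0) for n, v in a.items())

print(is_symmetric(analysis_lp), is_symmetric(synthesis_lp))
for k in range(-2, 3):
    # Duality: 1 for k = 0, and 0 for every other even shift
    print(k, cross_corr_even(analysis_lp, synthesis_lp, k))
```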
3.5 Wavelet Families
There are a number of basis functions that can be used as the mother wavelet for
Wavelet Transformation. Since the mother wavelet produces all wavelet functions used in
the transformation through translation and scaling, it determines the characteristics of the
resulting Wavelet Transform. Therefore, the details of the particular application should be
taken into account and the appropriate mother wavelet should be chosen in order to use
the Wavelet Transform effectively[4].
Figure 3.4: Wavelet families (a) Haar (b) Daubechies4 (c) Coiflet1 (d) Symlet2 (e) Meyer (f)
Morlet (g) Mexican Hat.
CHAPTER 4
Overview of DWT Algorithm and DA for DWT
4.1 DWT of an image
A low pass filter and a high pass filter are chosen such that they exactly halve the
frequency range between themselves. This filter pair is called the analysis filter pair. First,
the low pass filter is applied to each row of data, giving the low frequency components of
the row. Since the low pass filter is a half-band filter, its output contains frequencies only
in the first half of the original frequency range, so it can be subsampled by two; the output
then contains only half the original number of samples. Next, the high pass filter is applied
to the same row of data, and the high pass components are similarly separated and placed
beside the low pass components.
This procedure is done for all rows. Next, the filtering is done for each column of
the intermediate data. The resulting two-dimensional array of coefficients contains four
bands of data, labeled LL (low-low), HL (high-low), LH (low-high) and HH (high-high).
The LL band can be decomposed once again in the same manner, producing even more
sub-bands. This can be done up to any level, resulting in a pyramidal decomposition as
shown in Figure 4.1.
The LL band at the highest level can be classified as most important and the other
detail bands can be classified as of lesser importance, with the degree of importance
decreasing from the top of the pyramid to the bands at the bottom[5].
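A minimal sketch of one decomposition level follows. It assumes the Haar pair (pairwise averages as the low pass output, half-differences as the high pass output) in place of the 9/7 filters, an image given as a list of rows with even dimensions, and a sub-band naming convention in which the first letter refers to the row (horizontal) filter and the second to the column (vertical) filter. The function names are hypothetical.

```python
def haar_1d(seq):
    """One Haar analysis step: pairwise averages (low pass) and half-differences (high pass)."""
    half = len(seq) // 2
    lows = [(seq[2 * i] + seq[2 * i + 1]) / 2 for i in range(half)]
    highs = [(seq[2 * i] - seq[2 * i + 1]) / 2 for i in range(half)]
    return lows, highs

def dwt2_one_level(image):
    """Row pass, then column pass; returns the LL, HL, LH and HH sub-bands."""
    # Row pass: low pass outputs in the left half, high pass outputs in the right half
    rows = []
    for row in image:
        lo, hi = haar_1d(row)
        rows.append(lo + hi)
    half_w = len(rows[0]) // 2
    # Column pass on the intermediate array
    col_lo, col_hi = [], []
    for c in range(len(rows[0])):
        lo, hi = haar_1d([r[c] for r in rows])
        col_lo.append(lo)
        col_hi.append(hi)
    # Transpose back so that top[r][c] and bottom[r][c] are indexed row first
    top = [list(t) for t in zip(*col_lo)]
    bottom = [list(t) for t in zip(*col_hi)]
    return {
        "LL": [r[:half_w] for r in top],
        "HL": [r[half_w:] for r in top],
        "LH": [r[:half_w] for r in bottom],
        "HH": [r[half_w:] for r in bottom],
    }

# Vertical stripes: all energy goes to LL and HL
bands = dwt2_one_level([[1, 0, 1, 0] for _ in range(4)])
print(bands)
```

Applying `dwt2_one_level` again to `bands["LL"]` gives the next level of the pyramid.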
Figure 4.1: Image encoding.
4.2 INVERSE DWT OF AN IMAGE.
Just as a forward transform is used to separate the image data into various classes of
importance, a reverse transform is used to reassemble the various classes of data into a
reconstructed image. A pair of high pass and low pass filters is used here as well; this filter
pair is called the synthesis filter pair. The filtering procedure is just the opposite: we start
from the topmost level, apply the filters column-wise first and then row-wise, and proceed
to the next level until we reach the first level.
In this section, the theoretical background and algorithm development are discussed.
The first recorded mention of what is now called a "wavelet" seems to be in 1909, in a
thesis by Alfred Haar. An image is represented as a two-dimensional (2-D) array of
coefficients, each coefficient representing the brightness level at that point[5].
At first glance, it is not possible to separate the coefficients into more important and
less important ones. Thinking more intuitively, however, it is possible: most natural images
have smooth color variations, with the fine details represented as sharp edges between
the smooth variations.
Technically, the smooth variations in color can be termed low frequency
components and the sharp variations high frequency components. The low frequency
components (smooth variations) constitute the base of an image, and the high frequency
components (the edges, which give the detail) add upon them to refine the image, giving a
detailed image. Hence the averages (smooth variations) are more important than the details.
In wavelet analysis, a signal can be separated into approximations (averages)
and details (detail coefficients). The averages are the high-scale, low frequency components
of the signal; the details are the low-scale, high frequency components. If we filter a real
digital signal with both the low pass and the high pass filter, we wind up with twice as much
data as we started with. That is why the filter outputs have to be downsampled by two after filtering.
The inverse process shows how those components can be assembled back into the
original signal without loss of information. This process is called reconstruction or
synthesis, and the mathematical manipulation that effects synthesis is called the inverse
discrete wavelet transform: the original signal is reconstructed from the wavelet
coefficients. Whereas wavelet analysis involves filtering and downsampling, the wavelet
reconstruction process consists of upsampling and filtering. The DWT algorithm consists
of the Forward DWT (FDWT) and the Inverse DWT (IDWT), which are shown in Figures 4.2
and 4.3, respectively.
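The sketch below walks through one analysis/synthesis stage in 1-D: filter and downsample by two, then upsample by two, filter with the synthesis pair and add the two branches. The orthonormal Haar filters, keeping the odd-indexed filter outputs when downsampling (so that no delay appears), the even input length, and all names are assumptions for illustration; under them the input is recovered exactly, up to floating-point rounding.

```python
import math

S = 1 / math.sqrt(2)
LO_A, HI_A = [S, S], [S, -S]     # assumed Haar analysis low pass / high pass filters
LO_S, HI_S = [S, S], [-S, S]     # synthesis filters: time-reversed analysis filters

def conv(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def analysis(x):
    """Filter, then downsample by 2 (keep the odd-indexed outputs); len(x) must be even."""
    return conv(x, LO_A)[1::2], conv(x, HI_A)[1::2]

def upsample(c, length):
    """Insert zeros: c[i] goes to position 2i of a sequence of the given even length."""
    u = [0.0] * length
    u[0::2] = c
    return u

def synthesis(approx, detail, length):
    """Upsample by 2, filter each branch with its synthesis filter, and add the branches."""
    lo = conv(upsample(approx, length), LO_S)[:length]
    hi = conv(upsample(detail, length), HI_S)[:length]
    return [a + b for a, b in zip(lo, hi)]

x = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0]
approx, detail = analysis(x)                # N/2 approximation and N/2 detail samples
print(synthesis(approx, detail, len(x)))    # equals x up to rounding
```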
Figure 4.2: Two-dimensional decomposition.
Figure 4.3: Two-dimensional IDWT
The FDWT can be performed on a signal using different types of filters, such as
db7, db4 or Haar. The forward transform can be done in two ways: by the matrix
multiplication method or by linear equations. In the FDWT, each step calculates a set of
wavelet averages (approximation or smooth values) and a set of details. If a data set
s_0, s_1, ..., s_{N-1} contains N elements, there will be N/2 averages and N/2 detail values.
The averages are stored in the upper half and the details in the lower half of the N-element array.
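A minimal sketch of this in-place layout, assuming Haar averages (s[2i] + s[2i+1])/2 and half-differences (s[2i] − s[2i+1])/2 as the two outputs (a simple choice for illustration, not the 9/7 filters used later), and repeating the step on the averages as in the pyramid decomposition. The input length is assumed to be a power of two; the names are hypothetical.

```python
def haar_step(data, n):
    """Transform data[:n] in place: n/2 averages first, then n/2 details."""
    half = n // 2
    avgs = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
    dets = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
    data[:n] = avgs + dets
    return data

def haar_forward(samples):
    """Repeat the step on the averages half until a single average remains."""
    data = list(samples)
    n = len(data)
    while n > 1:
        haar_step(data, n)
        n //= 2
    return data

print(haar_forward([9, 7, 3, 5]))   # → [6.0, 2.0, 1.0, -1.0]
```

After the first step the array is [8.0, 4.0, 1.0, -1.0]: two averages followed by two details; the second step replaces the averages by their own average and detail.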
4.3 DISTRIBUTED ARITHMETIC FOR DWT
With the rapid progress of VLSI design technologies, many audio and image signal
processors have been developed recently. The two-dimensional discrete wavelet transform
(2-D DWT) plays a major role in image/video compression standards such as JPEG2000 and
MPEG-4. Wavelets decompose the signal at one level into approximation and detail signals
at the next level, so that subsequent levels can add more detail to the information content.
Research on the DWT currently attracts a great deal of attention. In addition to audio and
image compression, the DWT has important applications in many areas, such as computer
graphics, numerical analysis and radar target discrimination. The 2-D DWT architecture is
mainly composed of multirate filters. Because practical applications such as digital cameras
involve extensive computation and require real-time manipulation of digital images,
efficient, low-cost hardware is indispensable. For this reason, fast algorithms and dedicated
circuits for the DWT have been developed.
Among the methods for the two-dimensional DWT, the indirect method based on
row-column decomposition is the best suited to hardware implementation. Distributed
arithmetic (DA) was proposed about two decades ago and has since been used widely in VLSI
implementations of DSP architectures. Most of these applications are computation
intensive, with multiplication and/or addition as the predominant operation. The main
advantage of the distributed arithmetic approach is that it speeds up multiplication by
pre-computing all possible intermediate values (sums of the filter coefficients) and storing
them in a ROM. The input data bits can then be used directly to address the memory, and
the values read out are accumulated to form the result.
In this section, we only consider the separable 2-D DWT. We propose an
efficient 2-D DWT architecture based on distributed arithmetic. The proposed architecture
uses only RAM instead of ROM, because the ROM size grows exponentially as the number
of inputs increases, and grows further with the internal precision. Distributed arithmetic
and row-column decomposition reduce the amount of hardware and improve the speed.
The basic architecture deals with the separable 2-D DWT, whose mathematical
formulas are defined as follows. In the decomposition, the wavelet coefficients of any
stage can be calculated from the DWT of the previous stage. The following expressions show
how the k-th scaling and wavelet coefficients, X_h(k, j+1) and X_g(k, j+1), are obtained at
stage j+1; both are computed from the scaling coefficients of stage j:
$$X_h(k, j+1) = \sum_{m} X_h(m, j)\, h(m - 2k) \qquad (4.1)$$
$$X_g(k, j+1) = \sum_{m} X_h(m, j)\, g(m - 2k) \qquad (4.2)$$
Figure 4.4 shows a classical one-level implementation of the analysis and synthesis of the
DWT system using a filter bank structure. The input signal x(n) is filtered in the analysis
process by the low pass filter h and the high pass filter g. The symbols ↓2 and ↑2 denote
downsampling and upsampling by a factor of two; downsampling decimates the filter
outputs. The synthesis process is the dual of the analysis process[5].
Figure 4.4: One-level implementation using a filter bank
To derive the distributed architecture for the DWT, consider the following sum of products:
$$y = \sum_{k=1}^{L} A_k X_k \qquad (4.3)$$
where A_k are the fixed coefficients of the filter bank and X_k are the input samples. The
decomposed form of (4.3) in DA is written as equation (2):
Note that in equation (2), A is the distributed arithmetic matrix of the fixed coefficient bits
$A_{k,i}$, where k = 1, 2, ..., L and i = 0, 1, ..., N-1, with $A_{k,N-1}$ the MSB and $A_{k,0}$
the LSB. It should be noted again that, in the distributed architecture for the DWT, the bits
of the coefficients are distributed, unlike conventional DA, where the bits of the input data
words are distributed. Furthermore, the matrix contains only 0s and 1s, which means that Y
can be computed just by shifting and adding the input vectors.
Matrix A is very important to the DA architecture of the DWT, since its structure can lead
to savings in the hardware needed for the computations. Because it consists only of 0s and
1s, we refer to matrix A as the Adder Butterflies. Overall, by using the DA architecture of the
DWT, the inner product of vectors (4.3) can generally be implemented with basic adder cells.
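The idea of distributing the coefficient bits can be checked numerically. The sketch below assumes unsigned integer coefficients and integer samples (signed or fractional coefficients, such as the 9/7 filters, need extra handling): each coefficient A_k is split into bits A[k][i], the 0/1 entries select which inputs are added for bit position i, and the partial sums are combined with shifts only, as in y = Σ_i 2^i Σ_k A[k][i] X_k. The names are hypothetical.

```python
def coefficient_bit_matrix(coeffs, nbits):
    """A[k][i] = bit i of coefficient k (unsigned integers assumed)."""
    return [[(a >> i) & 1 for i in range(nbits)] for a in coeffs]

def da_coefficient_bits(coeffs, samples, nbits):
    """y = sum_i 2^i * (sum_k A[k][i] * X[k]), using only additions and shifts."""
    A = coefficient_bit_matrix(coeffs, nbits)
    y = 0
    for i in range(nbits):
        # Adder tree for bit position i: add the inputs whose coefficient bit is 1
        partial = sum(x for k, x in enumerate(samples) if A[k][i])
        y += partial << i          # weight the partial sum by 2^i with a shift
    return y

print(da_coefficient_bits([2, 3, 4, 5], [9, 7, 5, 8], 3))   # → 99
```

Three bit positions are enough here because the largest coefficient, 5, fits in 3 bits; the result equals the direct sum 2·9 + 3·7 + 4·5 + 5·8.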
Consider the four high pass filter coefficients
[2 3 4 2]
and the image samples
[X0, X1, X2, ..., X7].
When the first image sample X0 enters the filter, the sum-of-products (SOP) output is
Y0 = 2X0 + 3X-1 + 4X-2 + 2X-3
which reduces to Y0 = 2X0 when the filter starts with zero samples. When X1 enters,
Y1 = 3X0 + 2X1
Similarly,
Y2 = 4X0 + 3X1 + 2X2
Y3 = 2X0 + 4X1 + 3X2 + 2X3
Y4 = 2X1 + 4X2 + 3X3 + 2X4
Y5 = 2X2 + 4X3 + 3X4 + 2X5
Y6 = …
Y7 = …
Now let us take simple input samples, for easy computation, to configure and realize the
distributed arithmetic architecture:
H = [2 3 4 5], the filter coefficients,
X = [1 2 3 4 5 6 7 8], the input samples.
The computation slides the coefficient vector [2 3 4 5] along the time-reversed input
sequence 8 7 6 5 4 3 2 1, one position per output sample:
Y0 = 2*1
Y1 = 3*1 + 2*2
Y2 = 4*1 + 3*2 + 2*3
Y3 = 5*1 + 4*2 + 3*3 + 2*4
Y4 = 5*2 + 4*3 + 3*4 + 2*5
Y5 = …
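The outputs of this example can be reproduced with a direct FIR convolution y[n] = Σ_k h[k] x[n − k], assuming zero initial state (no samples before the first one); the function name is hypothetical.

```python
def fir_outputs(h, x):
    """y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

H = [2, 3, 4, 5]
X = [1, 2, 3, 4, 5, 6, 7, 8]
print(fir_outputs(H, X))   # → [2, 7, 16, 30, 44, 58, 72, 86]
```

Each output is one sum of products; the DA architecture below computes the same sums without multipliers.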
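Now Y3 can be rewritten with each input sample in 3-bit binary form:
$$Y_3 = 5\,[0\,0\,1] + 4\,[0\,1\,0] + 3\,[0\,1\,1] + 2\,[1\,0\,0]$$
$$= 5\,(0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0) + 4\,(0 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0) + 3\,(0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0) + 2\,(1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0)$$
Grouping the terms by bit position gives
$$Y_3 = (0 \cdot 5 + 0 \cdot 4 + 0 \cdot 3 + 1 \cdot 2)\, 2^2 + (0 \cdot 5 + 1 \cdot 4 + 1 \cdot 3 + 0 \cdot 2)\, 2^1 + (1 \cdot 5 + 0 \cdot 4 + 1 \cdot 3 + 0 \cdot 2)\, 2^0 = 8 + 14 + 8 = 30$$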
Similarly, the input samples can be extended to a fourth bit. In contrast with the earlier
example, where 3 bits were used for each sample, each input sample is now represented
by 4 bits.
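Let us consider another example to demonstrate the structure of the above equation for
efficient realization, i.e.,
H = [2 3 4 5]
X = [9 7 5 8]
so that Y = 2·9 + 3·7 + 4·5 + 5·8. With each sample written in 4-bit binary (9 = 1001,
7 = 0111, 5 = 0101, 8 = 1000), the output grouped by bit position is
$$Y = (1 \cdot 5 + 0 \cdot 4 + 0 \cdot 3 + 1 \cdot 2)\, 2^3 + (0 \cdot 5 + 1 \cdot 4 + 1 \cdot 3 + 0 \cdot 2)\, 2^2 + (0 \cdot 5 + 0 \cdot 4 + 1 \cdot 3 + 0 \cdot 2)\, 2^1 + (0 \cdot 5 + 1 \cdot 4 + 1 \cdot 3 + 1 \cdot 2)\, 2^0 = 56 + 28 + 6 + 9 = 99$$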
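The regrouping by bit position used in both examples can be checked with the short sketch below: for unsigned samples of a given width it lists the partial sum multiplying each power of two (most significant bit first) and then recombines the partial sums with shifts. The function names are hypothetical.

```python
def bit_plane_sums(coeffs, samples, nbits):
    """Partial sums S_b = sum_k coeffs[k] * bit_b(samples[k]), listed MSB first."""
    return [sum(c for c, x in zip(coeffs, samples) if (x >> b) & 1)
            for b in reversed(range(nbits))]

def recombine(sums):
    """Weight the MSB-first partial sums by powers of two and add them."""
    nbits = len(sums)
    return sum(s << (nbits - 1 - i) for i, s in enumerate(sums))

# Y3 of the first example (3-bit samples) and Y of the second example (4-bit samples)
print(bit_plane_sums([5, 4, 3, 2], [1, 2, 3, 4], 3))   # → [2, 7, 8]
print(bit_plane_sums([2, 3, 4, 5], [9, 7, 5, 8], 4))   # → [7, 7, 3, 9]
```

Each partial sum depends only on which samples have a 1 in that bit position, which is why all of them can be precomputed and stored.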
We can now see that a total of 2^4 = 16 values (all possible sums of the four coefficients)
can be stored in the ROM. Having developed this simplified representation of the
sum-of-products (SOP) equation Y, we move on to design a rough (prototype) architecture for DA.
The prototype consists of SISO (serial-in, serial-out) shift registers and a ROM:
The number of SISO registers depends on the filter employed for the particular
application.
One bit of data is fed serially into a SISO register on each clock pulse, and a
shift (left or right) is performed; at the end of the operation, one bit is shifted
serially out of the register.
The ROM contains the precomputed coefficient sums. The LSBs (least significant
bits) of all the input samples together form the ROM address, and the ROM outputs
the sum of the coefficients whose input bits are 1.
Figure 4.5: Mapping of the serial outputs onto the ROM coefficients
The above prototype has the following characteristics:
It takes 3 clock cycles to load a single SISO.
At the 4th clock, 1 bit of SISO0 is right-shifted into SISO1.
Therefore, a total of 3*4 = 12 clock cycles is needed to load the shifters.
The next 3 clocks are needed to map the LSBs of the shifters onto the ROM and
generate one output, i.e., to compute the first output by parallel mapping of the serial
outputs.
So a total of 21 cycles is required to generate the first 3 outputs.
Another input sample enters SISO0 during the next 3 clocks, and the contents of SISO3
are replaced by those of SISO2.
The distributed arithmetic architecture is completed by the following section:
The output of the ROM is given to the adder.
The adder sums the ROM output with the accumulator contents; the accumulator
is initialized to zero.
The output of the adder is right-shifted and stored in the accumulator.
The prototype together with the adder, accumulator and shifter forms the complete distributed
arithmetic architecture, shown diagrammatically in Figure 4.6.
Figure 4.6: General Distributed Arithmetic Architecture
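A behavioural sketch of this general architecture, assuming unsigned integer samples processed LSB first: the ROM holds all 2^L sums of the L coefficients, the serial outputs of the SISO registers (bit b of every sample) form the ROM address on each clock, and the adder, accumulator and right shifter combine the ROM words. Adding each ROM word at weight 2^B before the right shift, so that no bits are lost, is a modelling choice for this sketch; the names are hypothetical.

```python
def build_rom(coeffs):
    """ROM with 2^L words: the word at address a is the sum of coeffs[k] for each set bit k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]

def da_filter_output(coeffs, samples, nbits):
    """Bit-serial DA: one ROM look-up per clock, least significant bit plane first.

    The ROM word is added at weight 2^nbits and the sum is shifted right by one,
    so after nbits clocks the accumulator holds sum_k coeffs[k] * samples[k].
    """
    rom = build_rom(coeffs)
    acc = 0                                        # accumulator initialised to zero
    for b in range(nbits):                         # one clock per bit plane
        # Serial outputs of the SISO registers (bit b of every sample) form the address
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(samples))
        acc = (acc + (rom[addr] << nbits)) >> 1    # adder, then right shift into accumulator
    return acc

print(da_filter_output([2, 3, 4, 5], [9, 7, 5, 8], 4))   # → 99
```

With four 4-bit samples the output is ready after four clocks and equals 2·9 + 3·7 + 4·5 + 5·8. A hardware accumulator of fixed width would instead truncate the low-order bits shifted out on each clock.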
CHAPTER 5
IMAGE COMPRESSION
5.1 PROBLEM STATEMENT
Distributed arithmetic architecture can be used for the 9/7-tap filters of the two-dimensional
discrete wavelet transform. The 9-tap low-pass filter with the DA architecture has the
following salient features:
It has 9 SISOs, each of 8 bits.
The first 8*9 = 72 cycles are used to load all SISOs.
8 cycles are used to generate the first output.
The next 8 cycles load the first SISO.
The next 8 cycles compute.
In total, 8 + 8 + 8 = 24 cycles are required to compute the first 3 outputs.
The first output is fed to the adder, where it is summed with the accumulator
contents (initially zero).
The output is right-shifted and fed to the accumulator.
The cycle then continues.
The 7-tap high-pass filter with the DA architecture has the following salient features:
It has 7 SISOs, each of 8 bits.
The first 8*7 = 56 cycles are used to load all SISOs.
5 cycles are used to generate the first output.
The next 5 cycles load the first SISO.
The next 5 cycles compute.
In total, 5 + 5 + 5 = 15 cycles are required to compute the first 3 outputs.
The first output is fed to the adder, where it is summed with the accumulator
contents (initially zero).
The output is right-shifted and fed to the accumulator.
The cycle then continues.
Figure 5.1: 9-tap low-pass filter with DA architecture (blocks: nine 8-bit SISOs, ROM memory map, distributed arithmetic, adder, accumulator, shifter)
Figure 5.2: 7-tap high-pass filter with DA architecture (blocks: seven 8-bit SISOs, ROM memory map, distributed arithmetic, adder, accumulator, shifter)
5.2 PROPOSED ARCHITECTURE
The architecture is based on the popular Daubechies 9/7 filter bank (floating point) used in
JPEG2000 and MPEG-4. The floating-point 9/7 forward transform uses two analysis filters,
h (high-pass) and g (low-pass). Without loss of generality, we assume accuracy up to 5
decimal places for the coefficients given below. The finite precision of the hardware
limits the accurate representation of floating-point numbers; hence, for implementation,
we represent the coefficients with an accuracy of 13 bits. This assumption is reasonable,
as a 13-bit representation gives high enough accuracy for the fixed-point implementation.
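A sketch of the 13-bit coefficient representation, using the 9/7 coefficient values listed below. How negative coefficients are handled is not stated here, so the sketch assumes sign-magnitude form: each magnitude is rounded to 12 fractional bits, giving 13 bit positions with weights 2^−12 … 2^0 (row i of the bit matrix has weight 2^(i − 12)), and the sign is kept separately. The function names are hypothetical.

```python
# 9/7 analysis coefficients (unique taps only), as listed in this section
H = [-0.045636, 0.028772, 0.295636, -0.557543]            # high-pass h
G = [0.026749, -0.016864, -0.078223, 0.266864, 0.602949]  # low-pass g

FRAC_BITS = 12   # weights 2^-12 ... 2^0 -> 13 bit positions (assumed sign-magnitude)

def to_bit_matrix(coeffs, frac_bits=FRAC_BITS):
    """Return (signs, rows): rows[i][k] is bit i (weight 2^(i - frac_bits)) of |coeffs[k]|."""
    mags = [round(abs(c) * (1 << frac_bits)) for c in coeffs]
    signs = [-1 if c < 0 else 1 for c in coeffs]
    rows = [[(m >> i) & 1 for m in mags] for i in range(frac_bits + 1)]
    return signs, rows

def from_bit_matrix(signs, rows, frac_bits=FRAC_BITS):
    """Rebuild each coefficient as sign * sum_i 2^(i - frac_bits) * rows[i][k]."""
    return [signs[k] * sum(rows[i][k] * 2.0 ** (i - frac_bits) for i in range(len(rows)))
            for k in range(len(signs))]

for name, coeffs in (("h", H), ("g", G)):
    signs, rows = to_bit_matrix(coeffs)
    rebuilt = from_bit_matrix(signs, rows)
    worst = max(abs(a - b) for a, b in zip(coeffs, rebuilt))
    print(name, "max quantisation error:", worst)
```

Rounding keeps the error of every coefficient within half of the smallest weight, i.e. at most 2^−13.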
The 9/7-tap high-pass and low-pass FIR filters are as follows:
$$Y(2i+1) = -0.045636\,[X(2i-2) + X(2i+4)] + 0.028772\,[X(2i-1) + X(2i+3)] + 0.295636\,[X(2i) + X(2i+2)] - 0.557543\,X(2i+1)$$
$$Y(2i) = 0.026749\,[X(2i-4) + X(2i+4)] - 0.016864\,[X(2i-3) + X(2i+3)] - 0.078223\,[X(2i-2) + X(2i+2)] + 0.266864\,[X(2i-1) + X(2i+1)] + 0.602949\,X(2i)$$
So the coefficient vectors are:
h = [-0.045636, 0.028772, 0.295636, -0.557543]
g = [0.026749, -0.016864, -0.078223, 0.266864, 0.602949]
The coefficients of the 9/7-tap high-pass and low-pass FIR filters can then be distributed
into 13 bits (the coefficient word length), so h and g can also be written as[5]:
$$h = [2^{-12}\;\; 2^{-11}\;\; \cdots\;\; 2^{-1}\;\; 2^{0}]\, A_h \qquad (5.1)$$
$$g = [2^{-12}\;\; 2^{-11}\;\; \cdots\;\; 2^{-1}\;\; 2^{0}]\, A_g \qquad (5.2)$$
A_h and A_g are represented as follows:
CHAPTER 6
SOFTWARE REFERENCE MODEL
6.1 MATLAB
6.1.1 OVERVIEW OF MATLAB
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses
include:
Math and computation
Algorithm development
Data acquisition
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building
MATLAB is an interactive system whose basic data element is an array that does
not require dimensioning. This allows you to solve many technical computing problems,
especially those with matrix and vector formulations, in a fraction of the time it would
take to write a program in a scalar, noninteractive language such as C or FORTRAN.
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK and
EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS
libraries, embedding the state of the art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and
advanced courses in mathematics, engineering, and science. In industry, MATLAB is the
tool of choice for high-productivity research, development, and analysis[6].
6.2 MATLAB SYSTEM
The MATLAB system consists of these main parts:
6.2.1 DESKTOP TOOLS AND DEVELOPMENT ENVIRONMENT
This is the set of tools and facilities that help you use MATLAB functions and
files. Many of these tools are graphical user interfaces. It includes the MATLAB desktop
and Command Window, a command history, an editor and debugger, a code analyzer and
other reports, and browsers for viewing help, the workspace, files, and the search path.
6.2.2 MATLAB MATHEMATICAL FUNCTION LIBRARY
This is a vast collection of computational algorithms ranging from elementary
functions, like sum, sine, cosine, and complex arithmetic, to more sophisticated functions
like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
6.2.3 MATLAB LANGUAGE
This is a high-level matrix/array language with control flow statements, functions,
data structures, input/output, and object-oriented programming features. It allows both
‘programming in the small’ to rapidly create quick and dirty throw-away programs, and
‘programming in the large’ to create large and complex application programs.
6.2.4 GRAPHICS
MATLAB has extensive facilities for displaying vectors and matrices as graphs,
as well as annotating and printing these graphs. It includes high-level functions for two-
dimensional and three-dimensional data visualization, image processing, animation, and
presentation graphics. It also includes low-level functions that allow you to fully
customize the appearance of graphics as well as to build complete graphical user
interfaces on your MATLAB applications.
6.2.5 MATLAB EXTERNAL INTERFACES
This is a library that allows you to write C and FORTRAN programs that interact
with MATLAB. It includes facilities for calling routines from MATLAB (dynamic
linking), calling MATLAB as a computational engine, and for reading and writing MAT-
files.
6.3 IMAGE PROCESSING TOOLBOX
6.3.1 INTRODUCTION
Image Processing Toolbox is a collection of functions that extend the capability of
the MATLAB numeric computing environment. The toolbox supports a wide range of
image processing operations, including
Spatial image transformations
Morphological operations
Neighborhood and block operations
Linear filtering and filter design
Transforms
Image analysis and enhancement
Image registration
Region of interest operations
Many of the toolbox functions are MATLAB M-files, a series of MATLAB
statements that implement specialized image processing algorithms. We can view the
MATLAB code for these functions using the statement
type function_name
We can extend the capabilities of Image Processing Toolbox by writing our own
M-files, or by using the toolbox in combination with other toolboxes, such as Signal
Processing Toolbox and Wavelet Toolbox.
6.3.2 READ AND DISPLAY AN IMAGE
First, clear the MATLAB workspace of any variables and close open figure windows:
clear, close all
To read an image, use the imread command. The example reads one of the sample
images included with Image Processing Toolbox, pout.tif, and stores it in an array named
I:
I = imread('pout.tif');
Now display the image. The toolbox includes two image display functions:
imshow and imtool. imshow is the toolbox's fundamental image display function. imtool
starts the Image Tool, which presents an integrated environment for displaying images and
performing some common image processing tasks. The Image Tool provides all the
image display capabilities of imshow but also provides access to several other tools for
navigating and exploring images, such as scroll bars, the Pixel Region tool, Image
Information tool, and the Contrast Adjustment tool.
6.3.3 IMAGE APPEARANCE IN THE WORKSPACE
To see how the imread function stores the image data in the workspace, check the
Workspace browser in the MATLAB desktop. The Workspace browser displays
information about all the variables you create during a MATLAB session. The imread
function returned the image data in the variable I, which is a 291-by-240 element array of
uint8 data. MATLAB can store images as uint8, uint16, or double arrays.
6.3.4 IMPROVING IMAGE CONTRAST
pout.tif is a somewhat low contrast image. To see the distribution of intensities in
pout.tif, we can create a histogram by calling the imhist function.
figure, imhist(I)
The intensity range is rather narrow. It does not cover the potential range of
[0, 255], and is missing the high and low values that would result in good contrast. The
toolbox provides several ways to improve the contrast in an image.
One way is to call the histeq function to spread the intensity values over the full
range of the image, a process called histogram equalization:
I2 = histeq(I);
Display the new equalized image, I2, in a new figure window:
figure, imshow(I2)
6.4 PSNR AND MSE FOR IMAGES
6.4.1 PSNR
The peak signal-to-noise ratio (PSNR) between two images is computed in decibels[6].
This ratio is often used as a quality measurement between the original and a compressed
image. The higher the PSNR, the better the quality of the compressed image[1].
6.4.2 MSE
In statistics, the mean square error or MSE of an estimator is one of many ways to
quantify the difference between an estimator and the true value of the quantity being
estimated.
MSE is a risk function, corresponding to the expected value of the squared error
loss or quadratic loss. MSE measures the average of the square of the "error." The error is
the amount by which the estimator differs from the quantity to be estimated.
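For images, the usual definitions are MSE = (1/MN) Σ (I − K)² over all M×N pixels of the original image I and the reconstructed image K, and PSNR = 10 log10(MAX² / MSE) dB. A minimal reference sketch follows, written in Python rather than MATLAB and not taken from the report's own model; the peak value 255 assumes 8-bit images, and the function names are hypothetical.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    if len(original) != len(reconstructed) or any(
            len(a) != len(b) for a, b in zip(original, reconstructed)):
        raise ValueError("images must have the same size")
    total = sum((a - b) ** 2
                for row_a, row_b in zip(original, reconstructed)
                for a, b in zip(row_a, row_b))
    count = sum(len(row) for row in original)
    return total / count

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

print(psnr([[10, 20], [30, 40]], [[12, 18], [30, 44]]))
```

A lower MSE gives a higher PSNR, which is why a higher PSNR indicates a better reconstruction.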
CHAPTER 7:
FPGA IMPLEMENTATION
7.1 FPGA basic design Flow Overview:
The ISE design flow comprises the following steps: design entry, design
synthesis, design implementation, and Xilinx device programming. Design verification,
which includes both functional verification and timing verification, takes place at
different points during the design flow. This section describes what to do during each
step.
Figure 7.1: FPGA Basic Design Flow
7.2 Design Summary:
Design entry is the first step in the ISE design flow. During design entry, you
create your source files based on your design objectives. You can create your top-level
design file using a Hardware Description Language (HDL), such as VHDL, Verilog, or
ABEL, or using a schematic. You specify your top-level module type when you create
your project as described in Creating a Project[9].
You can use multiple formats for the lower-level source files in your design. Different
source types are available, depending on your project properties (top-level module type,
device type, synthesis tool, and language). You can create these source files in Project
Navigator, as described in Creating a Source File. Some source types launch additional
tools to help you create the file, as described in Source File Types.
Table 7.1: Design Summary
image_inte Project Status
Project File: image_inte.ise | Current State: Programming File Generated
Module Name: video | Errors: No Errors
Target Device: xc2vp30-7ff896 | Warnings: 703 Warnings (676 new, 0 filtered)
Product Version: ISE 10.1 - WebPACK | Routing Results: All Signals Completely Routed
Design Goal: Balanced | Timing Constraints: All Constraints Met
Design Strategy: Xilinx Default (unlocked) | Final Timing Score: 0 (Timing Report)
image_inte Partition Summary
No partition information was found.
Device Utilization Summary
Logic Utilization | Used | Available | Utilization
Number of Slice Flip Flops | 113 | 27,392 | 1%
Number of 4 input LUTs | 333 | 27,392 | 1%
Logic Distribution
Number of occupied Slices | 203 | 13,696 | 1%
Number of Slices containing only related logic | 203 | 203 | 100%
Number of Slices containing unrelated logic | 0 | 203 | 0%
Total Number of 4 input LUTs | 378 | 27,392 | 1%
Number used as logic | 333 | |
Number used as a route-thru | 45 | |
Number of bonded IOBs | 31 | 556 | 5%
Number of RAMB16s | 15 | 136 | 11%
Number of BUFGMUXs | 2 | 16 | 12%
Number of DCMs | 1 | 8 | 12%
Performance Summary
Final Timing Score: 0 | Pinout Data: Pinout Report
Routing Results: All Signals Completely Routed | Clock Data: Clock Report
Timing Constraints: All Constraints Met
Detailed Reports
Report Name | Status | Generated | Errors | Warnings | Infos
Synthesis Report | Current | Wed 9. Jun 00:02:30 2010 | 0 | 676 Warnings (676 new, 0 filtered) | 25 Infos (24 new, 0 filtered)
Translation Report | Current | Wed 9. Jun 00:03:10 2010 | 0 | 24 Warnings (0 new, 0 filtered) | 0
Map Report | Current | Wed 9. Jun 00:03:52 2010 | 0 | 2 Warnings (0 new, 0 filtered) | 3 Infos (0 new, 0 filtered)
Place and Route Report | Current | Wed 9. Jun 00:05:20 2010 | 0 | 1 Warning (0 new, 0 filtered) | 2 Infos (0 new, 0 filtered)
Static Timing Report | Current | Wed 9. Jun 00:05:50 2010 | 0 | 0 | 3 Infos (0 new, 0 filtered)
Bitgen Report | Current | Wed 9. Jun 00:06:38 2010 | 0 | 0 | 2 Infos (0 new, 0 filtered)
Table 7.1 (contd.): Design Summary
7.3 Timing Constraints:
The ISE software allows you to enter timing constraints that describe the timing
performance requirements of the design. Providing a concise set of constraints achieves
the following:
Allows the software to create a design that meets your requirements.
Allows you to compare the constraints to the performance of the resulting
design, using the timing reports output by the ISE software. By analyzing the
timing reports, you can identify the paths in the design that may require
coding modifications, placement directives, or additional constraints to
achieve timing closure.
Increases the performance of the ISE software by reducing the memory and
runtime requirements[9].
Timing Constraints
Met Constraint Check
Worst
Case
Slack
Best Case
Achievable
Timing
Errors
Timing
Score
Yes Autotimespec constraint for clock
net dwt1/dw_2d/d1/s1
SETUP
HOLD
N/A
0.701ns 3.018ns N/A 0 0 0
Yes Autotimespec constraint for clock
net vga_out_pixel_clock_OBUF
SETUP
HOLD
N/A
0.562ns 23.268ns N/A 0 0 0
Yes Autotimespec constraint for clock
net dwt1/dw_2d/clkd3
SETUP
HOLD
N/A
0.635ns 1.863ns N/A 0 0 0
Yes Autotimespec constraint for clock
net dwt1/dw_2d/d2/s1
SETUP
HOLD
N/A
0.701ns 2.949ns N/A 0 0 0
Yes Autotimespec constraint for clock
net dwt1/dw_2d/d2/s
SETUP
HOLD
N/A
0.721ns 3.035ns N/A 0 0 0
Yes Autotimespec constraint for clock
net dwt1/dw_2d/d3/s1
SETUP
HOLD
N/A
0.712ns 3.138ns N/A 0 0 0
Yes Autotimespec constraint for clock
net dwt1/dw_2d/d3/s
SETUP
HOLD
N/A
0.713ns 3.297ns N/A 0 0 0
Yes Autotimespec constraint for clock
net dwt1/dw_2d/d1/s
SETUP
HOLD
N/A
0.855ns 3.445ns N/A 0 0 0
Table 7.2: Timing Constraints
7.4 Clock Report
This report contains information on the resource utilization of each clock region
and lists any clock conflicts between global clock buffers in a clock region.
Clock Report
Clock Net | Resource | Locked | Fanout | Net Skew (ns) | Max Delay (ns)
vga_out_pixel_clock_OBUF | BUFGMUX0P | No | 443 | 0.233 | 1.212
dwt1/dw_2d/clkd3 | BUFGMUX4P | No | 50 | 0.024 | 1.122
dwt1/dw_2d/d2/s | BUFGMUX6P | No | 36 | 0.020 | 1.006
dwt1/dw_2d/d3/s | BUFGMUX5P | No | 36 | 0.014 | 1.121
dwt1/dw_2d/d1/s | BUFGMUX3P | No | 36 | 0.048 | 1.039
dwt1/dw_2d/d1/s1 | Local | | 63 | 0.038 | 2.192
dwt1/dw_2d/d2/s1 | Local | | 62 | 0.145 | 2.480
dwt1/dw_2d/d3/s1 | Local | | 62 | 0.046 | 2.239
Table 7.3: Clock Report
7.5 Synthesis Report:
After design entry and optional simulation, you run synthesis. In the Sources tab, select
Synthesis/Implementation from the Design View drop-down list, and select the top
module. In the Processes tab, double-click Synthesize.
The ISE software includes Xilinx Synthesis Technology (XST), which synthesizes
VHDL, Verilog, or mixed language designs to create Xilinx-specific netlist files known
as NGC files. Unlike output from other vendors, which consists of an EDIF file with an
associated NCF file, NGC files contain both logical design data and constraints. XST
places the NGC file in your project directory and the file is accepted as input to the
Translate (NGDBuild) step of the Implement Design process. To specify XST as your
synthesis tool, you must set the Synthesis Tool Project Property to XST, as described in
Changing Project, Source, and Snapshot Properties[9].
Table 7.4: Synthesis Report
---- Source Parameters
Input File Name : "video.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "video"
Output Format : NGC
Target Device : xc2vp30-7-ff896
---- Source Options
Top Module Name : video
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
Safe Implementation : No
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
ROM Style : Auto
Mux Extraction : YES
Resource Sharing : YES
Asynchronous To Synchronous : NO
Multiplier Style : auto
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 500
Add Generic Clock Buffer(BUFG) : 16 :16
Register Duplication : YES
Slice Packing : YES
Optimize Instantiated Primitives : NO
Convert Tristates To Logic : Yes
Use Clock Enable : Yes
Use Synchronous Set : Yes
Use Synchronous Reset : Yes
Pack IO Registers into IOBs : auto
Equivalent register Removal : YES
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Library Search Order : video.lso
Keep Hierarchy : NO
Netlist Hierarchy : as_optimized
RTL Output : Yes
Global Optimization : AllClockNets
Read Cores : YES
Write Timing Constraints : NO
Cross Clock Analysis : NO
Hierarchy Separator : /
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
BRAM Utilization Ratio : 100
Verilog 2001 : YES
Auto BRAM Packing : NO
Slice Utilization Ratio Delta : 5
Table 7.4 (Contd): Synthesis Report
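The main source, target, and optimization settings in Table 7.4 correspond to XST command-line options. The Python sketch below assembles an XST `run` line from those values, of the kind that can be placed in an XST script file and executed with `xst -ifn <script>`. The option names used here (`-ifn`, `-ifmt`, `-ofn`, `-ofmt`, `-p`, `-top`, `-opt_mode`, `-opt_level`) follow the XST command-line syntax as we understand it and should be checked against the XST User Guide for the ISE version in use; only a subset of the table is mapped.

```python
# Subset of the settings reported in Table 7.4, keyed by the assumed XST option name.
SETTINGS = {
    "-ifn": "video.prj",          # Input File Name
    "-ifmt": "mixed",             # Input Format
    "-ofn": "video",              # Output File Name
    "-ofmt": "NGC",               # Output Format
    "-p": "xc2vp30-7-ff896",      # Target Device
    "-top": "video",              # Top Module Name
    "-opt_mode": "Speed",         # Optimization Goal
    "-opt_level": "1",            # Optimization Effort
}


def xst_run_line(settings):
    """Build a single XST 'run' command; dict insertion order is preserved (Python 3.7+)."""
    return "run " + " ".join(f"{opt} {val}" for opt, val in settings.items())


if __name__ == "__main__":
    print(xst_run_line(SETTINGS))
```

Keeping the settings in one table like this makes it easy to see which report entries were left at their defaults and which were chosen deliberately (for example, the Speed optimization goal).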
7.6 RTL Schematic:
The synthesized design can be viewed as a schematic in the register transfer level
(RTL) viewer. This view displays gates and elements independently of the targeted Xilinx
device.
Figure 7.2 : RTL Schematic
The schematic shows a representation of the pre-optimized design in terms of generic
symbols, such as adders, multipliers, counters, AND gates, and OR gates, which are
independent of the targeted Xilinx device. Viewing this schematic may help you discover
design issues early in the design process. [9]
Figure 7.3: Pictorial view of RTL schematic
Figure 7.4: Technology Schematic Overview
The synthesized design can be viewed as a schematic in a technology schematic viewer.
This view displays gates and elements as they will appear on the Xilinx device.
Figure 7.5: Technology Schematic
7.7 Implement Design:
Translate:
The Translate process merges all of the input netlists and design constraints and outputs
a Xilinx native generic database (NGD) file, which describes the logical design reduced
to Xilinx primitives. See the following table for details. [9]
Table 7.5: Translate Process
Translate Process
Command line tool | NGDBuild
Tcl command | process run "Translate"
Input files | EDIF, SEDIF, EDN, EDF, NGC, UCF, NCF, URF, NMC, BMM
Output files | BLD (report), NGD
Process properties | Translate Properties
Tools available after running process | Constraints Editor, Floorplan Editor, Floorplanner, PACE
Note: Each of these tools modifies the UCF file. When you rerun Translate with the updated UCF, the NGD file is updated.
NGDBUILD Design Results Summary:
Number of errors : 0
Number of warnings : 25
Total memory usage is 102260 kilobytes
7.7.1 Floor plan design after Translate
The general steps in the basic flow are as follows:
The design is created, synthesized, and transformed into an NGD file. The NGD file includes
location constraints that originated in your design source, a UCF, or an NCF. The file
may also include references to, or instances of, IP macros. Floorplan Editor reads the NGD
file and the design hierarchy, pulls in data for any IP macros, and creates a
representation of your design. While reading the NGD file, Floorplan Editor interprets
any I/O standards applied to buffers connected to I/Os and displays them in the Design
Objects tab window. Floorplan Editor then modifies one or more UCFs [9].
Note: Floorplan Editor does not create the UCF. If you don't already have one, you must
first create at least one UCF using the Project Navigator New Source or Add Source
functions. The UCFs are then input to NGDBuild, and the remainder of the Xilinx
implementation flow is completed. When the initial constraints come from your design
source or an NCF, they cannot be removed when a UCF is used as Floorplan Editor
output. They can only be overridden by constraints applied in Floorplan Editor and
then saved in a UCF.
7.8 Map Report:
The Map process maps the logic defined by an NGD file into FPGA elements, such as
CLBs and IOBs. The output design is a native circuit description (NCD) file that
physically represents the design mapped to the components in the Xilinx FPGA. See the
following table for details. [9]
Map Process
Command line tools | MAP
Tcl command | process run "Map"
Input files | NGD, NMC, NCD, NGM
Note: The NCD and NGM files are for guiding.
Output files | NCD, PCF, NGM, MRP (report), GRF, MAP, PSR
Process Properties | Map Properties
Tools available after running process | Floorplanner, FPGA Editor, Timing Analyzer
Table 7.6: Map Process
Table 7.7: Map Report (below)
Target Device : xc2vp30
Target Package : ff896
Target Speed : -7
Design Summary
Number of errors : 0
Number of warnings : 2
Logic Utilization:
Number of Slice Flip Flops : 113 out of 27,392 1%
Number of 4 input LUTs : 339 out of 27,392 1%
Logic Distribution:
Number of occupied Slices : 200 out of 13,696 1%
Number of Slices containing only related logic : 200 out of 200 100%
Number of Slices containing unrelated logic : 0 out of 200 0%
Total Number of 4 input LUTs : 371 out of 27,392 1%
Number used as logic : 339
Number used as a route-thru : 32
Number of bonded IOBs : 31 out of 556 5%
Number of RAMB16s : 15 out of 136 11%
Number of BUFGMUXs : 2 out of 16 12%
Number of DCMs : 1 out of 8 12%
Peak Memory Usage : 231 MB
Total REAL time to MAP completion : 11 secs
Total CPU time to MAP completion : 8 secs
Table 7.7 (Contd): Map Report
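Every utilization figure in Table 7.7 is rounded to a whole percent, which hides how small the design really is; for example, the 113 slice flip-flops occupy only about 0.41% of the device, yet the report shows 1%. The Python sketch below recomputes the exact percentages from the raw counts. The `report_style_percent` rounding rule (truncate, but never show 0% for a used resource) is an assumption reverse-engineered from the numbers in the table, not documented ISE behaviour.

```python
import math

# (resource, used, available, percent shown in the report) from Table 7.7.
MAP_UTILIZATION = [
    ("Slice Flip Flops", 113, 27392, 1),
    ("4 input LUTs (logic)", 339, 27392, 1),
    ("Occupied Slices", 200, 13696, 1),
    ("4 input LUTs (total)", 371, 27392, 1),
    ("Bonded IOBs", 31, 556, 5),
    ("RAMB16s", 15, 136, 11),
    ("BUFGMUXs", 2, 16, 12),
    ("DCMs", 1, 8, 12),
]


def exact_percent(used, available):
    """Exact utilization as a percentage."""
    return 100.0 * used / available


def report_style_percent(used, available):
    """Assumed report rounding: truncate toward zero, but show at least 1% if anything is used."""
    if used == 0:
        return 0
    return max(math.floor(exact_percent(used, available)), 1)


if __name__ == "__main__":
    for name, used, avail, shown in MAP_UTILIZATION:
        pct = exact_percent(used, avail)
        print(f"{name:22s} {used:4d}/{avail:<6d} = {pct:6.2f}%  (report: {shown}%)")
```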
7.9 Place and Route:
The Place and Route process takes a mapped NCD file, places and routes the design,
and produces an NCD file that is used as input for bitstream generation.
Place and Route Process
Command line tools | PAR
Tcl command | process run "Place & Route"
Input files | NCD, PCF
Note: In addition to the NCD file from MAP, PAR also accepts an NCD file for guiding.
Output files | NCD, PAR (report), PAD, CSV, TXT, GRF, DLY
Process Properties | Place & Route Properties
Tools available after running process | Floorplanner, FPGA Editor, Timing Analyzer, TRACE, XPower Analyzer
Table 7.8: Place and Route Process
Device Utilization Summary:
Number of BUFGMUXs 2 out of 16 12%
Number of DCMs 1 out of 8 12%
Number of External IOBs 31 out of 556 5%
Number of LOCed IOBs 31 out of 31 100%
Number of RAMB16s 15 out of 136 11%
Number of SLICEs 200 out of 13696 1%
Overall effort level (-ol) Standard
Placer effort level (-pl) High
Placer cost table entry (-t) 1
Router effort level (-rl) Standard
REAL time consumed by placer 24 secs
CPU time consumed by placer 21 secs
Table 7.9: Place and Route
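The Translate, Map, and Place and Route steps described above can also be run from the command line instead of Project Navigator. The Python sketch below only builds the argument lists for NGDBuild, MAP, PAR, and BitGen, using the part and top-level name from the synthesis report and the standard PAR effort level from Table 7.9. The UCF name `video.ucf` and the intermediate file names are hypothetical, and the option set is a minimal assumption that should be checked against the Xilinx Command Line Tools User Guide for the ISE version in use.

```python
import subprocess

PART = "xc2vp30-7-ff896"   # Target device from Table 7.4
TOP = "video"              # Top module / NGC file name from Table 7.4
UCF = "video.ucf"          # Hypothetical user constraints file name


def implementation_flow(top=TOP, part=PART, ucf=UCF):
    """Return the command lines for Translate, Map, Place & Route and BitGen."""
    return [
        # Translate: merge netlist and constraints into an NGD file
        ["ngdbuild", "-p", part, "-uc", ucf, f"{top}.ngc", f"{top}.ngd"],
        # Map: pack logic into slices/IOBs, producing a mapped NCD and a PCF
        ["map", "-p", part, "-o", f"{top}_map.ncd", f"{top}.ngd", f"{top}.pcf"],
        # Place & Route at the standard overall effort level (-ol), as in Table 7.9
        ["par", "-w", "-ol", "std", f"{top}_map.ncd", f"{top}.ncd", f"{top}.pcf"],
        # Bitstream generation for device configuration
        ["bitgen", "-w", f"{top}.ncd", f"{top}.bit", f"{top}.pcf"],
    ]


def run_flow(dry_run=True):
    """Print each step; when dry_run is False, execute it and stop at the first failure."""
    for cmd in implementation_flow():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_flow(dry_run=True)
```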
Figure 7.6: View of the routed design after place and route [9]
Data in XPower Analyzer
Table 7.10: XPower Analyzer [9]
7.10 Configure target device:
Target Device Properties
The following properties are available for the Configure Target Device process for a
CPLD or FPGA device.
iMPACT Project File
The iMPACT Project File (IPF) contains information from a previous session of
iMPACT. If you specify an IPF file in this property and run the Configure Target
Device process, the target device is configured according to the settings in the
specified IPF file. If Default is specified, the target device is configured
according to the settings in the default IPF file, <ISE_image_inte>.ipf.
Port to be used (Advanced)
Specifies the port to use for configuration; the USB port was used in this work.
Auto-default causes the software to search every port for a connection, automatically
detect an available cable, and connect to it.
Run Generate Target PROM/ACE File
If selected, the Configure Target Device process automatically runs the Generate
Target PROM/ACE File process to generate a PROM or ACE file before configuring the
target device. The file is generated using the information from the .ipf file specified
in the iMPACT Project File property. When Automatically Generate Target PROM/ACE File
is set to True (checkbox is checked), the PROM or ACE file is generated in the
background before the target device is configured. This is useful for quick PROM or
System ACE file regeneration when a bitstream has changed [9].
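The same configuration step can be scripted with iMPACT in batch mode instead of through the Configure Target Device process. The Python sketch below writes a minimal boundary-scan command file. The command names (`setMode -bs`, `setCable -port auto`, `Identify`, `assignFile`, `Program`, `quit`) are given as we recall them from iMPACT batch usage and should be verified for the installed version; the bitstream name, the output file name `program.cmd`, and the JTAG chain position are hypothetical, since the FPGA's position depends on the devices in the board's chain.

```python
def impact_batch_script(bitfile="video.bit", position=1):
    """Return iMPACT batch commands that program one device over boundary scan.

    `bitfile` and `position` are placeholders: use the real bitstream and the
    FPGA's position in the JTAG chain as reported by Identify.
    """
    lines = [
        "setMode -bs",              # boundary-scan (JTAG) configuration mode
        "setCable -port auto",      # auto-detect the cable, e.g. the on-board USB port
        "Identify",                 # scan the JTAG chain
        f"assignFile -p {position} -file {bitfile}",
        f"Program -p {position}",
        "quit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("program.cmd", "w") as f:   # hypothetical command file name
        f.write(impact_batch_script())
    # The file would then be run from a shell with: impact -batch program.cmd
```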
Figure 7.7: Output Simulation Window
Figure 7.8: Snapshot 1 of Image Compression Chip (Internal View 1)
Figure 7.9: Image Compression Chip (Internal View 2)
Figure 7.10: Image Compression Chip (Internal View 3)
RESULT:
(a) Original image (b) Reconstructed image
The original and reconstructed images were compared in terms of PSNR (dB) and MSE.
The comparison showed that the reconstructed image is similar to the original image,
which validates the result.
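For reference, the two metrics used in this comparison are MSE = (1/(MN)) Σ (I(i,j) - K(i,j))² over an M x N image I and its reconstruction K, and PSNR = 10 log10(MAX² / MSE) dB, where MAX = 255 for 8-bit grayscale pixels. The pure-Python sketch below computes both; the 8-bit peak value is an assumption about the image format, and the sample pixel values in the usage block are made up for illustration.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    if len(original) != len(reconstructed) or any(
        len(r1) != len(r2) for r1, r2 in zip(original, reconstructed)
    ):
        raise ValueError("images must have the same dimensions")
    total = sum((a - b) ** 2
                for r1, r2 in zip(original, reconstructed)
                for a, b in zip(r1, r2))
    count = sum(len(row) for row in original)
    return total / count


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, reconstructed)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    img = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]   # made-up sample pixels
    rec = [[52, 56, 61], [59, 78, 61], [76, 62, 60]]
    print(f"MSE = {mse(img, rec):.3f}, PSNR = {psnr(img, rec):.2f} dB")
```

A lower MSE gives a higher PSNR, so a lossless reconstruction yields an infinite PSNR, while any quantization of the wavelet coefficients shows up as a finite value.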
CHAPTER 8
Conclusion and Scope for Future Work
8.1 Conclusion
An image compression algorithm was simulated in MATLAB to understand the
process of image compression. Modifying the padding style reduced the error, because it
reproduces the image more faithfully at its edges. It also keeps the size of the transform
coefficient matrix equal to the image size. Verilog HDL was chosen for the VLSI
implementation of the image compression encoder.
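The point about the coefficient matrix matching the image size can be illustrated with a one-level 2-D DWT followed by the IDWT. This section does not restate which wavelet filter the design uses, so the Python sketch below assumes the simple Haar (average/difference) wavelet on an image with even dimensions, where no boundary padding is needed. It illustrates the size-preserving, perfectly reconstructing property only; it is not the report's modified padding scheme, and the function names are illustrative.

```python
def haar_1d(x):
    """One Haar analysis step: first half averages (low band), second half differences (high band)."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def ihaar_1d(c):
    """Inverse of haar_1d: rebuild each sample pair from its average and difference."""
    half = len(c) // 2
    out = []
    for a, d in zip(c[:half], c[half:]):
        out.extend([a + d, a - d])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2(img):
    """One-level 2-D DWT: rows first, then columns. Output has the same shape as img."""
    rows = [haar_1d(r) for r in img]
    return transpose([haar_1d(c) for c in transpose(rows)])


def idwt2(coeffs):
    """Inverse of dwt2: undo the column transform, then the row transform."""
    rows = transpose([ihaar_1d(c) for c in transpose(coeffs)])
    return [ihaar_1d(r) for r in rows]


if __name__ == "__main__":
    image = [[10, 12, 14, 16], [11, 13, 15, 17], [20, 22, 24, 26], [21, 23, 25, 27]]
    coeffs = dwt2(image)
    print("coefficient matrix:", len(coeffs), "x", len(coeffs[0]))
    print("perfect reconstruction:", idwt2(coeffs) == image)
```

With longer filters, the samples beyond the image border must be supplied by some padding rule, which is where the choice of padding style affects the reconstruction error at the edges.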
The proposed theoretical benefits of DA are that it realizes the full potential of the FPGA
architecture for hardware implementation and achieves a high degree of parallelism. The
relative area and speed efficiencies of DA turn out to be good when it is implemented on an
FPGA. The DA approach can achieve clock rates close to the maximum possible with a given
FPGA technology using only basic 4-input LUT blocks and the fast ripple-carry chains,
whereas the multi-stage modulo adders required in an RNS implementation are slow even for
small word lengths, so the accumulator stage becomes the performance bottleneck.
It has also been observed that large adders implemented in FPGAs with fast carry chains
are quite fast, and that the adder delay grows less than linearly with increasing word
length. In light of the implementation results, DA-based architectures have an area, speed,
and simplicity advantage over the other approaches considered, such as RNS-based designs.
In this context, DA implementations can be considered superior when targeting FPGAs.
8.2 Scope for Future work
The newly developed concept of 'sparsity' in signal processing can be used in the
context of image compression. The first step of such a scheme is to apply a sparsifying
transform to the image; the resulting sparse set of coefficients is then encoded via Sparse PCA.
The wavelet transform has been used extensively for image compression, but it is not the
ideal choice: the partial reconstruction error from wavelet coefficients is an order of
magnitude higher than the ideal error rate for many critical applications. Image
compression can instead be carried out in the curvelet domain, which is a better choice than
wavelets, at least theoretically, since the reconstruction error rate with curvelet
coefficients is of the same asymptotic order as the ideal error rate.
APPENDIX-A
FPGA ARCHITECTURE
A Field Programmable Gate Array (FPGA) is a semiconductor device containing
programmable logic components and programmable interconnects. The programmable
logic components can be programmed to duplicate the functionality of basic logic gates
such as AND, OR, XOR, and NOT, or of more complex combinational functions such as
decoders or simple mathematical functions. In most FPGAs these programmable logic
components also include memory elements, which may be simple flip-flops or more
complete blocks of memory. FPGAs are generally slower than their Application Specific
Integrated Circuit (ASIC) counterparts, cannot handle designs that are as complex, and
draw more power [7].
Simple programmable logic devices are capable of implementing a sequential
network but not a complete digital system. Programmable gate arrays (PGAs) and
complex programmable logic devices (CPLDs) are more flexible and more versatile, and
can be used to implement a complete digital system on a chip. Some of the largest
devices can implement a small microprocessor.
A typical PGA is an IC that contains an array of identical logic cells with
programmable interconnections. Both the function realized by each logic cell and the
connections between the cells can be programmed. Such PGAs are called FPGAs because
they are field programmable [7].
A.1 APPLICATION OF FPGA
Figure A.1: Multiply accumulate operation: (a) conventional implementation;
(b) distributed arithmetic implementation [7]
A.2 Virtex-II Pro
One of the most advanced FPGA families in the industry is the Virtex FPGA series produced
by Xilinx. The Virtex user-programmable gate array comprises two major configurable
elements: configurable logic blocks (CLBs) and input/output blocks (IOBs). Each CLB is
composed of two slices, as shown in Figure A.2. A slice contains 4-input, 1-output LUTs
and two registers. Interconnections between these elements are configured by
multiplexers controlled by SRAM cells programmed by a user's bitstream. The LUTs
allow any function of five inputs, any two functions of four inputs, or some functions of
up to nine inputs to be created within a CLB slice. This structure allows a very powerful
method of implementing arbitrary, complex digital logic.
Figure A.2: Simplified Architecture of Virtex configurable logic block.
In this work the Virtex FPGA design is described in Verilog HDL, a popular hardware
description language. The language can describe the behavioral nature of a
design, the data flow of a design, a design's structural composition, delays, and a
waveform generation mechanism. Models written in this language can be verified using a
Verilog simulator. As the programming and development environment, Xilinx ISE
Foundation Series tools have been used to produce a physical implementation for the
Virtex FPGA. Field programmable gate arrays (FPGAs) provide a new implementation
platform for the discrete wavelet transform.
FPGAs maintain the advantages of the custom functionality of VLSI ASIC
devices, while avoiding the high development costs and the inability to make design
modifications after production. Furthermore, FPGAs inherit design flexibility and
adaptability of software implementations.
We make maximal use of the lookup table (LUT) architecture of Virtex
FPGAs by reformulating the wavelet transform computation in accordance with the
distributed arithmetic algorithm. Distributed arithmetic makes extensive use of look-up
tables, which makes it ideal for implementing the discrete wavelet transform functions
onto the LUT-based architecture of Virtex FPGAs. Moreover, distributed arithmetic is
suitable for low power portable applications because it allows replacement of costly
multipliers with shifts and look-up tables. Indeed, one of the unique features of our
discrete wavelet transform implementation is exploiting the natural match between the
Virtex architecture and distributed arithmetic.
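The core idea of distributed arithmetic can be shown in a few lines. For an inner product y = Σ A_k x_k with fixed coefficients A_k and B-bit two's-complement inputs x_k, all 2^K possible sums of the coefficients are precomputed into a look-up table. At each bit position, the bits of all K inputs form the table address, and the looked-up partial sums are shifted and accumulated, with the sign-bit term subtracted. The Python sketch below models this bit-serial scheme; the coefficients are hypothetical integers, not the wavelet filter taps of this design, and the function names are illustrative.

```python
def build_da_lut(coeffs):
    """Precompute 2**K partial sums: entry addr is the sum of coeffs[i] for every bit i set in addr."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, xs, bits):
    """Bit-serial DA: sum(coeffs[k] * xs[k]) for two's-complement inputs of width `bits`."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= x <= hi for x in xs):
        raise ValueError("inputs do not fit in the given two's-complement width")
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # The LUT address is formed from bit b of every input sample.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b          # weight the partial sum by 2**b
        # The sign bit (MSB) of a two's-complement number has weight -2**(bits-1).
        acc += -term if b == bits - 1 else term
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]             # hypothetical fixed coefficients
    xs = [5, -3, 7, -8]                # 4-bit two's-complement samples
    print(da_inner_product(coeffs, xs, 4), sum(c * x for c, x in zip(coeffs, xs)))
```

With K = 4 inputs, each output bit of the table fits in one 4-input LUT configured as a 16x1 ROM, which is the natural match between distributed arithmetic and the Virtex architecture described above; the multiplier is replaced by table look-ups, shifts, and additions.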
Three more unique features are worth mentioning at this point.
The first is the flexibility of the implementation, made possible by the
re-programmability of FPGAs, which allows easy modification of the wavelet type.
The second is that, unlike most reported implementations, which concentrate on
architecture development, this implementation goes down to the actual implementation level.
The third is that implementations of both the forward and the inverse transforms are
described.
A.3 INTERNAL CONFIGURATION
The basic Virtex logic element in a CLB is the slice. Two slices are present in
each CLB as shown in Figure 2.6. Each slice contains 4-input, 1-output LUTs and two
registers. Interconnections between these elements are configured by multiplexers
controlled by SRAM cells programmed by a user's bitstream. The LUTs allow any
function of five inputs, any two
functions of four inputs, or some functions of up to nine inputs to be created within a
CLB slice. The outputs of these functions may be registered, or the registers may be used
independently of the LUTs. This structure allows a very powerful method of
implementing arbitrary, complex digital logic.
Figure A.3: Simplified Virtex configurable slice
A.3.1 LOOK-UP TABLE IMPLEMENTATION
Virtex slices have the ability to implement distributed memory instead of logic.
Each 4-input LUT in a slice may be used to implement a 16x1 ROM or RAM, or the two
LUTs may be combined together to create a 32x1 ROM or RAM or a 16x1 dual-port
RAM. This allows each slice to trade logic resources for memory in order to maximize
the resources available for a particular application.
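Combining two 16x1 LUT ROMs into a 32x1 ROM amounts to using the fifth address bit to select between the outputs of the two LUTs; in Virtex slices this selection is made by the slice's F5 multiplexer. The behavioural Python sketch below models that structure; the function names are illustrative and the ROM contents (5-input parity) are arbitrary example data.

```python
def make_rom16(contents):
    """Model a 4-input LUT used as a 16x1 ROM: 16 one-bit entries indexed by a 4-bit address."""
    if len(contents) != 16 or any(bit not in (0, 1) for bit in contents):
        raise ValueError("a 16x1 ROM needs exactly 16 one-bit entries")
    return lambda addr: contents[addr & 0xF]


def make_rom32(contents):
    """Build a 32x1 ROM from two 16x1 ROMs plus a 2:1 multiplexer on address bit 4."""
    if len(contents) != 32:
        raise ValueError("a 32x1 ROM needs exactly 32 entries")
    low_lut = make_rom16(contents[:16])    # addresses 0-15
    high_lut = make_rom16(contents[16:])   # addresses 16-31
    return lambda addr: high_lut(addr) if (addr >> 4) & 1 else low_lut(addr)


if __name__ == "__main__":
    parity = [bin(a).count("1") & 1 for a in range(32)]   # example data: 5-input parity
    rom = make_rom32(parity)
    print([rom(a) for a in range(32)] == parity)
```

Because any 32-entry table can be loaded this way, the same structure is what lets a slice realize any function of five inputs.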
APPENDIX-B
VIRTEX-II PRO ARCHITECTURE
B.1 Introduction
The XUP Virtex-II Pro Development System provides an advanced hardware platform
that consists of a high performance Virtex-II Pro Platform FPGA surrounded by a
comprehensive collection of peripheral components that can be used to create a complex
system and to demonstrate the capability of the Virtex-II Pro Platform FPGA[8].
Features
Figure-I shows the Virtex-II Trainer, which includes the following components and
features:
Virtex-II Pro FPGA with PowerPC 405 cores
Up to 2 GB of Double Data Rate (DDR) SDRAM
System ACE controller and Type II Compact Flash connector for FPGA configuration and data storage
Embedded Platform Cable USB configuration port
High-speed SelectMAP FPGA configuration from Platform Flash In-System Programmable Configuration PROM
Support for "Golden" and "User" FPGA configuration bitstreams
On-board 10/100 Ethernet PHY device
Silicon Serial Number for unique board identification
RS-232 DB9 serial port
Two PS/2 serial ports
Four LEDs connected to Virtex-II Pro I/O pins
Four switches connected to Virtex-II Pro I/O pins
Five push buttons connected to Virtex-II Pro I/O pins
Six expansion connectors joined to 80 Virtex-II Pro I/O pins with over-voltage protection
High-speed expansion connector joined to 40 Virtex-II Pro I/O pins that can be used differentially or single ended
AC-97 audio CODEC with audio amplifier and speaker/headphone output and line level output
Microphone and line level audio input
On-board XSGA output, up to 1600 x 1200 at 70 Hz refresh
Three Serial ATA ports, two Host ports and one Target port
Off-board expansion MGT link, with user-supplied clock
100 MHz system clock, 75 MHz SATA clock
Provision for user-supplied clock
On-board power supplies
Power-on reset circuitry
PowerPC 405 reset circuitry
Block Diagram
Figure B.1: XUP Virtex-II Pro Development System Block Diagram[8]
Figure B.2: XUP Virtex-II Pro Development System Board Photo[8]
B.2 Virtex-II Pro FPGA:
U1 is a Virtex-II Pro FPGA device packaged in a flip-chip-fine-pitch FF896 BGA
package. Two different capacity FPGAs can be used on the XUP Virtex-II Pro
Development System with no change in functionality. Table B.1 lists the Virtex-II Pro
device features.
Features | XC2VP20 | XC2VP30
Slices | 9,280 | 13,696
Array Size | 56 x 46 | 80 x 46
Distributed RAM | 290 Kb | 428 Kb
Multiplier Blocks | 88 | 136
Block RAMs | 1,584 Kb | 2,448 Kb
DCMs | 8 | 8
PowerPC RISC Cores | 2 | 2
Multi-Gigabit Transceivers | 8 | 8
Table B.1: XC2VP20 and XC2VP30 Device Features
Power Supplies and FPGA Configuration
The XUP Virtex-II Pro Development System is powered from a 5V regulated
power supply. On-board switching power supplies generate 3.3V, 2.5V, and 1.5V for the
FPGA and peripheral components, and linear regulators power the MGTs.
The board has provision for current measurement for all of the FPGA digital power
supplies, as well as for the application of external power if the capacity of the on-board
switching power supplies is exceeded.
The XUP Virtex-II Pro Development System provides several methods for the
configuration of the Virtex-II Pro FPGA. The configuration data can originate from the
internal Platform Flash PROM (two potential configurations), the internal CompactFlash
storage media (eight potential configurations), or external configurations delivered from
the embedded Platform Cable USB or parallel port interface.
I1 | I2 | I0 | O
0 | 0 | 0 | 0
0 | 0 | 1 | 0
0 | 1 | 0 | 1
0 | 1 | 1 | 0
1 | 0 | 0 | 0
1 | 0 | 1 | 1
1 | 1 | 0 | 1
1 | 1 | 1 | 1
Table B.2: Truth table of LUT3
Figure B.3: Internal structure of a basic LUT3[9]
Figure B.4: Karnaugh Map for LUT3[9]
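Reading Table B.2 column by column shows that this LUT3 implements a 2:1 multiplexer: when I0 = 0 the output follows I2, and when I0 = 1 it follows I1, i.e. O = I0·I1 + I0'·I2. The Python sketch below checks this against the table and packs the truth table into an 8-bit INIT value, assuming the Xilinx LUT3 primitive convention that INIT bit number (4·I2 + 2·I1 + I0) holds the output for that input combination. The rows are entered in the column order I1, I2, I0 used by the table; the function names are illustrative.

```python
# Rows of Table B.2 in the table's column order: (I1, I2, I0, O).
TRUTH_TABLE = [
    (0, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1), (0, 1, 1, 0),
    (1, 0, 0, 0), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
]


def lut3_init(rows):
    """Pack a LUT3 truth table into an 8-bit INIT value (bit index = 4*I2 + 2*I1 + I0)."""
    init = 0
    for i1, i2, i0, out in rows:
        if out:
            init |= 1 << (4 * i2 + 2 * i1 + i0)
    return init


def mux(i0, i1, i2):
    """2:1 multiplexer with I0 as the select line: I0 = 1 selects I1, I0 = 0 selects I2."""
    return i1 if i0 else i2


if __name__ == "__main__":
    print("matches a 2:1 mux:", all(mux(i0, i1, i2) == out for i1, i2, i0, out in TRUTH_TABLE))
    print(f"INIT = 8'h{lut3_init(TRUTH_TABLE):02X}")
```

Under that convention the table corresponds to INIT = 8'hD8; in practice the same function can simply be written behaviourally in Verilog and left for synthesis to pack into a LUT.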
Figure B.5: I/O Connections to Peripheral Devices[8]
Multi-Gigabit Transceivers
Four of the eight Multi-Gigabit Transceivers (MGTs) present in the
Virtex-II Pro FPGA are brought out to connectors and are available to the user. Three
of the bidirectional MGT channels are terminated at Serial Advanced Technology
Attachment (SATA) connectors, and the fourth channel terminates at user-supplied
Sub-Miniature A (SMA) connectors. The MGT transceivers are equipped with a 75 MHz
clock source that is independent of the system clock to support standard SATA
communication. An additional MGT clock source is available through a differential
user-supplied SMA connector pair. Two of the ports with SATA connectors are configured
as Host ports and the third SATA port is configured as a Target port to allow for simple
board-to-board networking. [8]
Figure B.6: SMA-based MGT Connections
Signal MGT Location PAD Name I/O Pin Notes
SATA_PORT0_TXN MGT_X0Y1 TXNPAD4 A27 HOST
SATA_PORT0_TXP MGT_X0Y1 TXPPAD4 A26 —
SATA_PORT0_RXN MGT_X0Y1 RXNPAD4 A24 —
SATA_PORT0_RXP MGT_X0Y1 RXPPAD4 A25 —
SATA_PORT0_IDLE — — B15 —
SATA_PORT1_TXN MGT_X1Y1 TXNPAD6 A20 TARGET
SATA_PORT1_TXP MGT_X1Y1 TXPPAD4 A19 —
SATA_PORT1_RXN MGT_X1Y1 RXNPAD6 A17 —
SATA_PORT1_RXP MGT_X1Y1 RXPPAD6 A18 —
SATA_PORT1_IDLE — — AK3 —
SATA_PORT2_TXN MGT_X2Y1 TXNPAD7 A14 HOST
SATA_PORT2_TXP MGT_X2Y1 TXPPAD7 A13 —
SATA_PORT2_RXN MGT_X2Y1 RXNPAD7 A11 —
SATA_PORT2_RXP MGT_X2Y1 RXPPAD7 A12 —
SATA_PORT2_IDLE — — C15 —
MGT_TXN MGT_X3Y1 TXNPAD9 A7 USER
MGT_TXP MGT_X3Y1 TXPPAD9 A6 —
MGT_RXN MGT_X3Y1 RXNPAD9 A4 —
MGT_RXP MGT_X3Y1 RXPPAD9 A5 —
MGT_CLK_N — — G16 BREFCLK
MGT_CLK_P — — F16 —
EXTERNAL_CLOCK_N — — F15 BREFCLK2
EXTERNAL_CLOCK_P — — G15 —
Table B.3: SATA and MGT Signals
System RAM
The XUP Virtex-II Pro Development System has provision for the installation of
user-supplied JEDEC-standard 184-pin dual in-line Double Data Rate Synchronous
Dynamic RAM memory module. The board supports buffered and unbuffered memory
modules with a capacity of 2 GB or less in either 64-bit or 72-bit organizations. The 72-
bit organization should be used if ECC error detection and correction is required.
System ACE Compact Flash Controller
The System Advanced Configuration Environment (System ACE) Controller
manages FPGA configuration data. The controller provides an intelligent interface
between an FPGA target chain and various supported configuration sources. The
controller has several ports: the Compact Flash port, the Configuration JTAG port, the
Microprocessor (MPU) port and the Test JTAG port. The XUP Virtex-II Pro
Development System supports a single System ACE Controller. The Configuration JTAG
ports connect to the FPGA and front expansion connectors. The Test JTAG port connects
to the JTAG port header and USB2 interface CPLD, and the MPU ports connect directly
to the FPGA. [8]
Serial Ports
The XUP Virtex-II Pro Development System provides three serial ports: a single
RS-232 port and two PS/2 ports. The RS-232 port is configured as a DCE with hardware
handshake using a standard DB-9 serial connector. This connector is typically used for
communications with a host computer using a standard 9-pin serial cable connected to a
COM port. The two PS/2 ports could be used to attach a keyboard and mouse to the XUP
Virtex-II Pro Development System. All of the serial ports are equipped with level-shifting
circuits, because the Virtex-II Pro FPGAs cannot interface directly to the voltage levels
required by RS-232 or PS/2.
User LEDs, Switches, and Push Buttons
A total of four LEDs are provided for user-defined purposes. When the FPGA
drives a logic 0, the corresponding LED turns on. A single four-position DIP switch and
five push buttons are provided for user input. If the DIP switch is up, closed, or on, or the
push button is pressed, a logic 0 is seen by the FPGA, otherwise a logic 1 is indicated. [8]
Table B.4: System Configuration Status LEDs
Expansion Connectors
A total of 80 Virtex-II Pro I/O pins are brought out to four user-supplied 60-pin
headers and two 40-pin right angle connectors for user-defined use. The 60-pin headers
are designed to accept ribbon-cable connectors, with every second signal a ground for
signal integrity. Some of these signals are shared with the front-mounted right-angle
connectors. The front-mounted connectors support Digilent expansion modules. In
addition, a high-speed connector is provided to support Digilent high-speed expansion
modules. This connector provides 40 single-ended or differential I/O signals in addition
to three clocks. [8]
XSGA Output
The XUP Virtex-II Pro Development System includes a video DAC and 15-pin
high-density D-sub connector to support XSGA output. The video DAC can operate with a
pixel clock of up to 180 MHz. This allows for a VESA-compatible output of 1280 x 1024
at 75 Hz refresh and a maximum resolution of 1600 x 1200 at 70 Hz refresh[8].
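Whether a display mode fits within the DAC's 180 MHz limit follows from its pixel clock, which is the horizontal total times the vertical total (both including blanking) times the refresh rate. The Python sketch below applies this to two modes; the horizontal and vertical totals are the standard VESA/VGA timing values as we understand them and are assumptions here, not figures taken from the board documentation.

```python
# (mode, h_total, v_total, refresh_hz); totals include the blanking intervals.
# The totals are assumed standard VESA/VGA timings for these modes.
MODES = [
    ("640x480@60", 800, 525, 60),
    ("1280x1024@75", 1688, 1066, 75),
]

DAC_MAX_HZ = 180e6  # maximum DAC pixel clock stated for the board


def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock needed to scan h_total x v_total pixel periods refresh_hz times per second."""
    return h_total * v_total * refresh_hz


if __name__ == "__main__":
    for name, h, v, r in MODES:
        f = pixel_clock_hz(h, v, r)
        print(f"{name}: {f / 1e6:.2f} MHz, within DAC limit: {f <= DAC_MAX_HZ}")
```

In practice the 640 x 480 mode uses the nominal 25.175 MHz pixel clock, which gives a refresh rate slightly below 60 Hz.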
DCM and XSGA Controller Settings for Various XSGA Formats
Table B.5: DCM and XSGA Controller settings for various XSGA Formats
USB 2 Programming Interface
The XUP Virtex-II Pro Development System includes an embedded USB 2.0
microcontroller capable of communications with either high-speed (480 Mb/s) or
full-speed (12 Mb/s) USB hosts. This interface is used for programming or configuring the
Virtex-II Pro FPGA in Boundary-Scan (IEEE 1149.1/IEEE 1532) mode. Target clock
speeds are selectable from 750 kHz to 24 MHz. The USB 2.0 microcontroller attaches to
a desktop or laptop PC with an off-the-shelf high-speed A-B USB cable[8].
Table B.6: XSGA Output Connections
Using the CPU Debug Port and CPU Reset
The CPU Debug port (J36) is a right angle header that provides connections to the
debugging resources of the PowerPC 405 CPU core[8].
The PowerPC 405 CPU cores include dedicated debug resources that support a variety of
debug modes for debugging during hardware and software development. These debug
resources include:
Internal debug mode for use by ROM monitors and software debuggers
External debug mode for use by JTAG debuggers
Debug wait mode, which allows the servicing of interrupts while the processor
appears to be stopped
Real-time trace mode, which supports event triggering for real-time tracing
Debug modes and events are controlled using debug registers in the processor. The debug
registers are accessed either through software running on the processor or through the
JTAG port. The debug modes, events, controls, and interfaces provide a powerful
combination of debug resources for hardware and software development tools.
The JTAG port interface supports the attachment of external debug tools, such as
the ChipScope Integrated Logic Analyzer, which provides logic analyzer capabilities for
signals inside an FPGA without the need for expensive external instrumentation. Using the
JTAG test access port, a debug tool can single-step the processor and examine the internal
processor state to facilitate software debugging. This capability complies with standard
JTAG hardware for boundary scan system testing.
External debug mode can be used to alter normal program execution. It provides the
ability to debug system hardware as well as software. The mode supports multiple
functions: starting and stopping the processor, single-stepping instruction execution,
setting breakpoints, as well as monitoring processor status. Access to processor resources
is provided through the CPU Debug Port.
The PPC405 JTAG Debug Port supports the four required JTAG signals:
CPU_TCK, CPU_TMS, CPU_TDO, and CPU_TDI. It also implements the optional
CPU_TRST signal. The frequency of the JTAG clock signal, CPU_TCK, can range from
0 MHz up to one-half of the processor clock frequency. The JTAG debug port logic is
reset at the same time the system is reset, using the CPU_TRST signal. When
CPU_TRST is asserted, the JTAG TAP controller returns to the test-logic reset state.
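The TAP controller mentioned above is the standard sixteen-state state machine defined by IEEE 1149.1: it advances on each rising edge of CPU_TCK, and its next state depends only on CPU_TMS. The Python sketch below is a generic behavioural model under that assumption, not a description of the PPC405 hardware itself. The class and function names (`TapController`, `tck_within_limit`) are hypothetical, and the 300 MHz processor clock in the usage lines is only an example value. The model shows two properties used in the text: asserting CPU_TRST forces the Test-Logic-Reset state, and the TCK frequency must not exceed one-half of the processor clock. It also shows a standard JTAG property: five TCK cycles with TMS held high reach Test-Logic-Reset from any state.

```python
# Behavioural model of an IEEE 1149.1 TAP controller (illustrative only).
# Each entry maps a state to (next state if TMS = 0, next state if TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


class TapController:
    """Hypothetical model of the TAP logic driven by CPU_TCK, CPU_TMS and CPU_TRST."""

    def __init__(self) -> None:
        self.state = "Test-Logic-Reset"

    def trst(self) -> None:
        # Asserting CPU_TRST returns the controller to Test-Logic-Reset
        # immediately, without waiting for a TCK edge.
        self.state = "Test-Logic-Reset"

    def clock(self, tms: int) -> str:
        # One rising edge of CPU_TCK: the next state is chosen by CPU_TMS.
        self.state = TAP_NEXT[self.state][1 if tms else 0]
        return self.state


def tck_within_limit(tck_hz: float, cpu_hz: float) -> bool:
    # CPU_TCK may range from 0 Hz up to one-half of the processor clock.
    return 0 <= tck_hz <= cpu_hz / 2


if __name__ == "__main__":
    tap = TapController()
    # TMS = 0, 1, 0, 0 walks Reset -> Idle -> Select-DR -> Capture-DR -> Shift-DR.
    for bit in (0, 1, 0, 0):
        tap.clock(bit)
    print(tap.state)
    # Five TCK cycles with TMS = 1 reach Test-Logic-Reset from any state.
    for _ in range(5):
        tap.clock(1)
    print(tap.state)
    print(tck_within_limit(10e6, 300e6))  # example 300 MHz processor clock
```

A debugger normally uses the TMS-high sequence after connecting, because it resets the TAP without depending on CPU_TRST being wired.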
Figure B.7: CPU Debug Connector Pinouts
Figure B.7 shows the pinout of the header used to debug software running on the
CPU. Debugging is carried out with tools such as the Xilinx Parallel Cable IV or with
third-party tools. The JTAG debug resources are not hardwired to specific pins; they are
available for connection in the FPGA fabric, so these signals can be routed to
whichever FPGA pins the user prefers. The signal-pin connections used on the
XUP Virtex-II Pro Development System are identified in Table B.7, along with the
recommended I/O characteristics. Level-shifting circuitry is provided for all signals to
convert from the 3.3 V levels at the connector to the 2.5 V levels at the FPGA.[8]
Table B.7: CPU Debug Port Connections and CPU Reset
The RESET_RELOAD pushbutton (SW1) provides two different functions
depending on how long the switch is depressed. If the switch is activated for more than 2
seconds, the XUP Virtex-II Pro Development System undergoes a complete reset and
reloads the selected configuration. If, however, the switch is activated for less than 2
seconds, a processor reset pulse of 100 microseconds is applied to the
PROCESSOR_RESET_Z signal.[8]
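The two functions of the RESET_RELOAD pushbutton reduce to a decision on how long SW1 is held. The Python sketch below is illustrative only: `reset_reload_action` and `reset_pulse_cycles` are hypothetical names. The source does not say what happens at exactly 2 seconds, so the sketch treats that case as a short press, which is an assumption. The cycle-count helper simply converts the 100 microsecond pulse into clock cycles for an assumed clock frequency, which is useful when checking how long logic on PROCESSOR_RESET_Z sees reset asserted.

```python
from enum import Enum

LONG_PRESS_S = 2.0      # press-duration threshold stated for SW1
RESET_PULSE_S = 100e-6  # PROCESSOR_RESET_Z pulse width for a short press


class ResetAction(Enum):
    FULL_RESET_AND_RELOAD = "complete system reset, reload selected configuration"
    PROCESSOR_RESET_PULSE = "100 us pulse on PROCESSOR_RESET_Z"


def reset_reload_action(press_s: float) -> ResetAction:
    # Held for more than 2 s: complete reset and FPGA reconfiguration.
    # Assumption: a press of exactly 2 s is treated as a short press,
    # because the source leaves that boundary unspecified.
    if press_s > LONG_PRESS_S:
        return ResetAction.FULL_RESET_AND_RELOAD
    return ResetAction.PROCESSOR_RESET_PULSE


def reset_pulse_cycles(clock_hz: float) -> int:
    # Number of clock cycles spanned by the 100 us processor reset pulse.
    return round(RESET_PULSE_S * clock_hz)


if __name__ == "__main__":
    print(reset_reload_action(3.0).name)
    print(reset_reload_action(0.5).name)
    print(reset_pulse_cycles(100e6))  # assumed 100 MHz clock
```

At an assumed 100 MHz clock, for example, the pulse spans 10,000 cycles, which is ample for any synchronous reset logic in the design.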
Configuring the FPGA:
At power-up, or when the RESET_RELOAD pushbutton (SW1) is pressed for
longer than 2 seconds, the FPGA begins to configure. The two supported configuration
methods, JTAG and master SelectMAP, are selected by the
CONFIG SOURCE switch, the most significant switch (left side) of SW9.
If the CONFIG SOURCE switch is closed, on, or up, a high-speed SelectMAP
byte-wide configuration from the on-board Platform Flash configuration PROM
(U3) is selected as the configuration source. This is identified to the user through
the illumination of the PROM CONFIG LED (D19).
The Platform Flash configuration PROM supports two different FPGA
configurations (versions) selected by the position of the PROM VERSION switch,
the least significant switch (right side) of SW9.
If the PROM VERSION switch is closed, on, or up, the GOLDEN configuration
from the onboard Platform Flash configuration PROM is selected as the
configuration data. This is identified to the user through the illumination of the
GOLDEN CONFIG LED (D14). This configuration can be a board test utility
provided by Xilinx, or another safe default configuration. It is important to note
that the PROM VERSION switch is only sampled at board power-up and after a
complete system reset. This means that if this switch is changed after board
power-up, the RESET_RELOAD pushbutton (SW1) must be pressed for more than
2 seconds for the new state of the switch to be recognized.
If the PROM VERSION switch is open, off, or down, a User configuration from
the on-board Platform Flash configuration PROM is selected as the configuration
data. This configuration must be programmed into the Platform Flash PROM through
the JTAG port, using the Platform Cable USB interface or the USB interface.
The Platform Flash is normally disabled after the FPGA is finished configuring
and has asserted the DONE signal. If additional data is made available to the
FPGA after the completion of configuration, jumper JP9 must be moved from the
NORMAL to the EXTENDED position to permanently enable the PROM and
allow the FPGA to clock out the additional data using the FPGA_PROM_CLOCK
signal.
If the CONFIG SOURCE switch is open, off, or down, a lower-speed JTAG-based
configuration from Compact Flash or an external JTAG source is selected as the
configuration source. This is identified to the user through the illumination of the
JTAG CONFIG LED (D20).
The JTAG-based configuration can originate from several sources: the Compact
Flash card, a PC4 cable connection through J27, or a USB-to-PC connection
through J8, the embedded Platform Cable USB interface.
If a JTAG-based configuration is selected, the default source is from the Compact
Flash port (J7). The System ACE controller checks the associated Compact Flash
socket and storage device for the existence of configuration data. If configuration
data exists on the storage device, the storage device becomes the source for the
configuration data. The file structure on the Compact Flash storage device
supports up to eight different configuration data files, selected by the triple CF
CONFIG SELECT DIP switch (SW8).
During JTAG configuration, the SYSTEMACE STATUS LED (D12) flashes until
the configuration process is complete; the FPGA then asserts the FPGA_DONE
signal and the DONE LED (D4) illuminates. At any time, the RESET_RELOAD
pushbutton (SW1) can be pressed for more than 2 seconds to load whichever of the
eight configuration data files is selected by SW8.
If a JTAG-based configuration is selected and a valid configuration file is not
found on the Compact Flash card by the System ACE controller (U2), the
SYSTEMACE ERROR LED (D11) flashes, and the System ACE controller
connects to an external JTAG port for FPGA configuration.
The default external source for FPGA configuration is the high-speed embedded
Platform Cable USB configuration port (J8), which is enabled when the System ACE
controller does not find configuration data on the storage device.
If a USB-equipped host PC is not available as a configuration source, then a
Parallel Cable 4 (PC4) interface can be used instead by connecting a PC4 cable to
J27.
When the Platform Flash configuration PROM is enabled, the FPGA Start-Up Clock should be set to
CCLK in the Startup Options section of the Process Options for the generation of
the programming file; otherwise, JTAG Clock should be selected.[8]
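The configuration-source rules described above can be collected into one decision procedure. The Python sketch below uses only the switch, connector and LED designators given in the text. Everything else is a hypothetical label: `BoardSettings`, `ConfigPlan` and `plan_configuration` are illustrative names, SW8 is represented directly as an integer from 0 to 7 (the mapping from its three DIP positions to a file index is assumed to be binary), and the Start-Up Clock choice assumes that "PROM enabled" corresponds to the CONFIG SOURCE switch selecting the Platform Flash path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoardSettings:
    # Hypothetical snapshot of the switch positions described in the text.
    config_source_up: bool            # SW9 left switch (CONFIG SOURCE)
    prom_version_up: bool             # SW9 right switch (PROM VERSION)
    cf_select: int = 0                # SW8, three bits -> file index 0..7
    cf_has_valid_config: bool = False
    usb_host_available: bool = True


@dataclass
class ConfigPlan:
    source: str
    lit_leds: List[str]
    startup_clock: str                # bitstream Start-Up Clock setting
    cf_file_index: Optional[int] = None


def plan_configuration(s: BoardSettings) -> ConfigPlan:
    if s.config_source_up:
        # High-speed byte-wide SelectMAP from the Platform Flash PROM (U3).
        version = "GOLDEN" if s.prom_version_up else "User"
        leds = ["PROM CONFIG (D19)"]
        if s.prom_version_up:
            leds.append("GOLDEN CONFIG (D14)")
        return ConfigPlan(f"Platform Flash PROM, {version} configuration", leds, "CCLK")

    # Lower-speed JTAG-based configuration through the System ACE controller.
    if not 0 <= s.cf_select <= 7:
        raise ValueError("SW8 is a 3-bit switch: file index must be 0..7")
    leds = ["JTAG CONFIG (D20)"]
    if s.cf_has_valid_config:
        return ConfigPlan("Compact Flash (J7)", leds, "JTAG Clock", s.cf_select)
    # No valid file on the card: the error LED flashes and System ACE
    # falls back to an external JTAG port.
    leds.append("SYSTEMACE ERROR (D11), flashing")
    if s.usb_host_available:
        return ConfigPlan("Platform Cable USB (J8)", leds, "JTAG Clock")
    return ConfigPlan("Parallel Cable IV (J27)", leds, "JTAG Clock")


if __name__ == "__main__":
    plan = plan_configuration(BoardSettings(
        config_source_up=False, prom_version_up=False,
        cf_select=3, cf_has_valid_config=True))
    print(plan.source, plan.cf_file_index, plan.startup_clock)
```

Writing the rules as a single function makes the priority order explicit: the CONFIG SOURCE switch is checked first, then the Compact Flash contents, and only then the external cables.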
Figure B.8: Configuration data path
Table B.8: System Configuration Status LEDs
Four status LEDs show the configuration state of the XUP Virtex-II Pro Development
System at all times. From the status LEDs listed in Table B.8, the user can identify the
configuration source and configuration version and tell when configuration has completed.
References
[1] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 3rd edition, Pearson Prentice Hall, 2009.
[2] Sonja Grgic, Mislav Grgic, and Branka Zovko-Cihlar, "Performance Analysis of Image Compression Using Wavelets," IEEE Transactions on Industrial Electronics, vol. 48, no. 3, June 2001.
[3] JPEG official website, www.jpeg.org/jpeg2000.html
[4] Sonja Grgic, Mislav Grgic, and Branka Zovko-Cihlar, "Performance Analysis of Image Compression Using Wavelets," IEEE Transactions on Industrial Electronics, vol. 48, no. 3, June 2001.
[5] Xixin Cao and Qingqing Xie, "An Efficient VLSI Implementation of Distributed Architecture for DWT," School of Software and Microelectronics, Peking University, Beijing, China.
[6] MATLAB support for image compression, http://www.mathworks.nl/matlabcentral/fileexchange/4772
[7] Xilinx support tutorials, http://www.support.xilinx.com/support/techsup/tutorials
[8] Virtex-II Pro datasheet, http://www.xilinx.com/support/documentation/virtex-ii_pro_data_sheets.htm
[9] Xilinx XST software toolbar help.