CHAPTER 1
1.1 Introduction
Noise is a random fluctuation in an electrical signal, a characteristic of all electronic
circuits. Noise generated by electronic devices varies greatly, as it can be produced by
several different effects. In communication systems, noise is an error or undesired
random disturbance of a useful information signal. Denoising is the extraction of a signal
from a mixture of signal and noise. This is the first step in many applications.
In this project, the DWT is used for de-noising a one-dimensional signal. Linear methods
of de-noising (like filtering) have the drawback of either removing sharp features
(sudden changes) or not completely removing noise. The DWT is a non-linear method
that separates the signal from noise by comparing their amplitudes rather than their
spectra.
1.2 Aim of the project
The aim of the project is to de-noise a real-time signal and to design a suitable
architecture for high-speed implementation.
1.3 Methodology
The test signal is initially analyzed in MATLAB using a suitable mother wavelet. It is
then decomposed into the required number of levels and denoised using a suitable
threshold rule.
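As a rough illustration of these steps (the report's actual mother wavelet, level count, and threshold rule are chosen during the MATLAB analysis and are not reproduced here), a single-level Haar decomposition with hard thresholding can be sketched in Python:

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar decomposition: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    cA = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass filter + downsample
    cD = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass filter + downsample
    return cA, cD

def haar_idwt(cA, cD):
    """Invert haar_dwt exactly."""
    x = np.empty(2 * len(cA))
    x[0::2] = (cA + cD) / np.sqrt(2.0)
    x[1::2] = (cA - cD) / np.sqrt(2.0)
    return x

def denoise(x, threshold):
    """Decompose, hard-threshold the detail coefficients, reconstruct."""
    cA, cD = haar_dwt(x)
    cD = np.where(np.abs(cD) > threshold, cD, 0.0)
    return haar_idwt(cA, cD)
```

With threshold = 0 the signal is reconstructed exactly; a positive threshold suppresses small detail coefficients, which is where most of the noise energy concentrates.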
De-noising using the DWT is realized with the concept of Parallel Distributed Arithmetic.
The filtering part of the reconstruction process is important, because it is the choice of
filters that determines whether perfect reconstruction of the original signal can be achieved.
The down sampling of the signal components performed during the decomposition phase
introduces a distortion called aliasing. It turns out that by carefully choosing filters for the
decomposition and reconstruction phases that are closely related (but not identical),
the effects of aliasing can be cancelled out.
The low- and high-pass decomposition filters (L and H), together with their associated
reconstruction filters (L' and H'), form a system of what is called quadrature mirror
filters:
Fig 2.23 (a) Decomposition and (b) Reconstruction
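The Haar pair gives the smallest concrete example of such a filter system (the report does not specify its filters at this point; Haar is assumed here purely for illustration). A Python sketch, with the reconstruction filters taken as the time-reversed decomposition filters:

```python
import numpy as np

L = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass decomposition filter
H = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass decomposition filter
Lr, Hr = L[::-1], H[::-1]                 # reconstruction filters L', H'

def analyze(x):
    """Filter then downsample by 2 (this is where aliasing is introduced)."""
    return np.convolve(x, L)[1::2], np.convolve(x, H)[1::2]

def synthesize(a, d):
    """Upsample by 2 (insert zeros), filter, and sum the two branches."""
    ua = np.zeros(2 * len(a)); ua[0::2] = a
    ud = np.zeros(2 * len(d)); ud[0::2] = d
    y = np.convolve(ua, Lr) + np.convolve(ud, Hr)
    return y[:2 * len(a)]                 # the aliasing terms cancel exactly
```

Because the two branches are fed through this closely related (but not identical) filter pair, their aliasing components are equal and opposite, and the summed output recovers the input.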
2.1.14.1 Reconstructing Approximations and Details
It is possible to reconstruct our original signal from the coefficients of the approximations
and details.
Fig 2.24 Reconstructing approximations and details
It is also possible to reconstruct the approximations and details themselves from their
coefficient vectors. As an example, consider how the first-level approximation A1 can be
reconstructed from the coefficient vector cA1.
Coefficient vector cA1 is passed through the same process used to reconstruct the
original signal. However, instead of combining it with the level-one detail cD1, we feed
in a vector of zeros in place of the detail coefficients vector:
Fig 2.25 Reconstructing the signal from approximations
The process yields a reconstructed approximation A1, which has the same length as the
original signal S and which is a real approximation of it.
Similarly, the first-level detail D1 can be reconstructed, using the analogous process:
Fig 2.25 Reconstructing the signal from details
The reconstructed details and approximations are true constituents of the original signal.
In fact, when we combine them we find that A1 + D1 = S.
Note, however, that the coefficient vectors cA1 and cD1 cannot directly be combined to
reproduce the signal, because they were produced by down sampling and are only half
the length of the original signal. It is necessary to reconstruct the approximations and
details before combining them.
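This additive relationship is easy to check numerically. The sketch below (Haar filters assumed, since the report does not fix a wavelet at this point) reconstructs A1 from (cA1, zeros) and D1 from (zeros, cD1) and sums them:

```python
import numpy as np

def haar_analyze(x):
    """Single-level Haar decomposition into half-length cA and cD."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_synthesize(cA, cD):
    """Full-length reconstruction from a pair of coefficient vectors."""
    y = np.empty(2 * len(cA))
    y[0::2] = (cA + cD) / np.sqrt(2.0)
    y[1::2] = (cA - cD) / np.sqrt(2.0)
    return y

S = np.array([2.0, 4.0, -1.0, 3.0, 0.0, 5.0])
cA1, cD1 = haar_analyze(S)                     # half-length coefficient vectors
A1 = haar_synthesize(cA1, np.zeros_like(cD1))  # zeros fed in place of cD1
D1 = haar_synthesize(np.zeros_like(cA1), cD1)  # zeros fed in place of cA1
```

A1 and D1 each have the full length of S, and A1 + D1 reproduces S exactly, whereas the half-length vectors cA1 and cD1 cannot be combined directly.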
Extending this technique to the components of a multilevel analysis, we find that similar
relationships hold for all the reconstructed signal constituents. That is, there are several
ways to reassemble the original signal:
Fig 2.26 Reconstructed signal components
2.2 Distributed Arithmetic (DA)
2.2.1 Distributed Arithmetic at a Glance
The arithmetic sum of products that defines the response of linear, time-invariant
networks can be expressed as:

y(n) = A_1·x_1(n) + A_2·x_2(n) + … + A_K·x_K(n)    Equ 2.3

where
y(n) is the response of the network at time n,
x_k(n) is the kth input at time n, and
A_k is the weighting factor of the kth input variable, which is constant for all n,
and so it remains time-invariant.
In filtering applications the constants, Ak , are the filter coefficients and the variables,
xk , are the prior samples of a single data source (for example, an analog to digital
converter). In frequency transforming - whether the discrete Fourier or the fast Fourier
transform - the constants are the sine/cosine basis functions and the variables are a block
of samples from a single data source. Examples of multiple data sources may be found in
image processing.
The multiply-intensive nature of equ. 2.3 can be appreciated by observing that a
single output response requires the accumulation of K product terms. In DA the task of
summing product terms is replaced by table look-up procedures that are easily
implemented in the Xilinx configurable logic block (CLB) look-up table architecture.
We start by defining the number format of the variable to be 2’s complement, fractional -
a standard practice for fixed-point microprocessors in order to bound number growth
under multiplication. The constant factors, Ak, need not be so restricted, nor are they
required to match the data word length, as is the case for the microprocessor. The
constants may have a mixed integer and fractional format; they need not be defined at
this time. The variable, x_k, may be written in the fractional format as shown in equ. 2.4:

x_k = -x_k0 + x_k1·2^-1 + x_k2·2^-2 + … + x_k(B-1)·2^-(B-1)    Equ 2.4

where x_kb is a binary variable and can assume only values of 0 and 1. A sign bit of value
-1 is indicated by x_k0. The time index, n, has been dropped since it is not needed to
continue the derivation. The final result is obtained by first substituting equ. 2.4 into
equ. 2.3:

y = Σ (k = 1 to K) A_k·[-x_k0 + x_k1·2^-1 + … + x_k(B-1)·2^-(B-1)]    Equ 2.5

and then explicitly expressing all the product terms under the summation symbols:

y = -[A_1·x_10 + A_2·x_20 + … + A_K·x_K0]
  + [A_1·x_11 + A_2·x_21 + … + A_K·x_K1]·2^-1
  + [A_1·x_12 + A_2·x_22 + … + A_K·x_K2]·2^-2
  + …
  + [A_1·x_1(B-1) + A_2·x_2(B-1) + … + A_K·x_K(B-1)]·2^-(B-1)    Equ 2.6
Each term within the brackets denotes a binary AND operation involving a bit of the
input variable and all the bits of the constant. The plus signs denote arithmetic sum
operations. The exponential factors denote the scaled contributions of the bracketed pairs
to the total sum. We can construct a look-up table that is addressed by the same-weighted bit
of all the input variables and that holds the sum of the terms within each pair of brackets.
Such a table is shown in fig. 2.27 and will henceforth be referred to as a Distributed
Arithmetic look-up table, or DALUT. The same DALUT can be time-shared in a serially
organized computation or can be replicated B times for a parallel computation scheme.
Fig 2.27 The Distributed Arithmetic Look-up Table (DALUT)
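A DALUT of this kind is straightforward to construct in software. The sketch below (hypothetical K = 3 coefficients, chosen only for illustration) stores, at address m, the sum of those constants A_k whose corresponding address bit is set:

```python
# Hypothetical coefficients; in a filter these would be the tap values A_k.
A = [0.5, -0.25, 0.125]
K = len(A)

# DALUT[m] = sum of A[k] over the set bits of the K-bit address m: the
# bracketed terms of equ. 2.6, precomputed for every possible bit pattern
# of the K input variables.
DALUT = [sum(A[k] for k in range(K) if (m >> k) & 1) for m in range(2 ** K)]
```

Address 0b101, for example, selects A[0] + A[2]; one table lookup replaces K multiplies and adds for that bit position.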
The arithmetic operations have now been reduced to addition, subtraction, and binary
scaling. With scaling by negative powers of 2, the actual implementation entails the
shifting of binary-coded data words toward the least significant bit and the use of sign
extension bits to maintain the sign at its normal bit position. The hardware
implementation of a binary full adder (as is done in the CLBs) entails two operands, the
addend and the augend, to produce sum and carry output bits. The multiple bit-parallel
additions of the DALUT outputs expressed in equ.2.6 can only be performed with a
single parallel adder if this adder is time-shared. Alternatively, if simultaneous addition
of all DALUT outputs is required, an array of parallel adders is required. These opposite
goals represent the classic speed-cost tradeoff.
2.2.2 The Speed Tradeoff
Any new device that can be software-configured to perform DSP functions must contend
with the well-entrenched standard DSP chips, i.e. the programmable fixed-point
microprocessors that feature concurrently operating hardware multipliers and address
generators, and on-chip memories. The first challenge is speed. If the FPGA doesn't offer
higher speed, why bother? For a single filter channel the bother is worth it, particularly as
the filter order increases. And the FPGA advantage grows for multiple filter channels.
Alas, a simple advantage may not be persuasive in all cases; an overwhelming speed
advantage may be needed for FPGA acceptance. Reaching data sample rates of
50 megasamples/sec requires a high cost in gate resources. The first two examples will
show the end points of the serial/parallel tradeoff continuum.
2.2.3 The Ultimate in Speed
Conceivably, with a fully parallel design the sample speed could match the system clock
rate. This is the case where all the add operations of the bracketed values (the DALUT
outputs) of equ. 2.6 are performed in parallel. Implementation guidance can be gained
by rephrasing equ. 2.6; to facilitate this process, abbreviate the contents within each
bracket pair as [sum b], indexed by the data bit position b. Thus, for B=16, equ. 2.6
becomes:
y = -[sum0] + [sum1]·2^-1 + [sum2]·2^-2 + … + [sum15]·2^-15    Equ 2.7
The decomposition of equ. 2.7 into an array of two-input adders is given below:

y = {(-[sum0] + [sum1]·2^-1) + ([sum2] + [sum3]·2^-1)·2^-2}
  + {([sum4] + [sum5]·2^-1) + ([sum6] + [sum7]·2^-1)·2^-2}·2^-4
  + [{([sum8] + [sum9]·2^-1) + ([sum10] + [sum11]·2^-1)·2^-2}
  + {([sum12] + [sum13]·2^-1) + ([sum14] + [sum15]·2^-1)·2^-2}·2^-4]·2^-8    Equ 2.8
Equations 2.7 and 2.8 are computationally equivalent, but equ. 2.8 can be mapped in a
straightforward way into a binary tree-like array of summing nodes, with scaling effected
by signal routing as shown in fig. 2.28. Each of the 15 nodes represents a parallel adder,
and while the computation can yield responses that include both the double precision
(B+C bits) of the implicit multiplication and the attendant processing gain, these adders
can be truncated to produce single-precision (B bits) responses.
Fig. 2.28 Example of Fully Parallel DA Model (K=16, B=16)
All B bits of all K data sources must be present to address the B DALUTs. A B×K array
of flip-flops is required. Each of the B identical DALUTs contains 2^K words with C bits
per word, where C is the "cumulative" coefficient accuracy. The data flow from the
flip-flop array can be all combinatorial; the critical delay path for B=16 is not inordinately
long: signal routing through 5 CLB stages and a carry chain embracing 2C adder stages.
A system clock in the 10 MHz range may work. Certainly, with internode pipelining, a
system clock of 50 MHz appears feasible. The latency would, in many cases, be
acceptable; however, it would be problematic in feedback networks (e.g., IIR filters).
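The fully parallel scheme can be modeled in a few lines of Python (hypothetical K = 4 coefficients and B = 8 bits rather than the 16 of the figure, to keep the sketch small): every bit position addresses its own copy of the DALUT, and the outputs are combined in a binary adder tree with routing-only scaling, the sign-bit DALUT output entering negatively.

```python
K, B = 4, 8
A = [0.25, -0.5, 0.75, 0.125]        # hypothetical coefficients A_k

# DALUT[m] = sum of A[k] over the set bits of address m (equ. 2.6 brackets).
DALUT = [sum(A[k] for k in range(K) if (m >> k) & 1) for m in range(2 ** K)]

def to_bits(x):
    """B-bit two's-complement fractional code of x in [-1, 1), MSB (sign) first."""
    code = round(x * 2 ** (B - 1)) % (2 ** B)
    return [(code >> (B - 1 - b)) & 1 for b in range(B)]

def parallel_da(xs):
    bits = [to_bits(x) for x in xs]                    # the B x K flip-flop array
    # One DALUT lookup per bit position, all addressed simultaneously.
    outs = [DALUT[sum(bits[k][b] << k for k in range(K))] for b in range(B)]
    level = [-outs[0]] + outs[1:]                      # sign-bit term is negated
    shift = 1
    while len(level) > 1:                              # binary adder tree, B-1 nodes
        level = [level[2 * i] + 2.0 ** -shift * level[2 * i + 1]
                 for i in range(len(level) // 2)]
        shift *= 2
    return level[0]
```

With exactly representable inputs the tree output matches the direct inner product Σ A_k·x_k.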
2.2.4 The Ultimate in Gate Efficiency
The ultimate in gate efficiency would be a single DALUT, a single parallel adder, and, of
course, fewer flip-flops for the input data source. Again with our B=16 example, a
rephrasing of equ. 2.6 yields the desired result:

y = 2^-1·(2^-1·(2^-1·( … (2^-1·[sum15] + [sum14]) … ) + [sum2]) + [sum1]) - [sum0]    Equ 2.9

Starting from the least significant end, i.e. addressing the DALUT with the least
significant bit of all K input variables, the DALUT contents, [sum15], are stored, scaled
by 2^-1, and then added to the DALUT contents, [sum14], when the address changes to the
next-to-least-significant bits. The process repeats until the most significant bit
addresses the DALUT, yielding [sum0]; since this is the sign bit, a subtraction occurs.
Now a vision of the hardware emerges. A serial shift register, B bits long, for each of the
K variables addresses the DALUT least significant bit first. At each shift the output is
applied to a parallel adder whose output is stored in an accumulator register. The
accumulator output, scaled by 2^-1, is the second input to the adder. Henceforth, the
adder, register, and scaler shall be referred to as a scaling accumulator. The functional
blocks are shown in fig. 2.29. All can be readily mapped into the Xilinx 4000 CLBs.
There is a performance price to be paid for this gate efficiency: the computation takes at
least B clocks.
Fig 2.29 Serially Organized DA processor
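The serially organized processor can be modeled bit by bit (same hypothetical K = 4, B = 8 setup as before; the report's design uses B = 16): one DALUT address is formed per clock, LSB first, and the scaling accumulator subtracts on the final sign-bit clock.

```python
K, B = 4, 8
A = [0.25, -0.5, 0.75, 0.125]        # hypothetical coefficients A_k

DALUT = [sum(A[k] for k in range(K) if (m >> k) & 1) for m in range(2 ** K)]

def to_bits(x):
    """B-bit two's-complement fractional code of x in [-1, 1), MSB (sign) first."""
    code = round(x * 2 ** (B - 1)) % (2 ** B)
    return [(code >> (B - 1 - b)) & 1 for b in range(B)]

def serial_da(xs):
    bits = [to_bits(x) for x in xs]   # one B-bit shift register per input
    acc = 0.0
    for b in range(B - 1, -1, -1):    # B clocks, least significant bit first
        out = DALUT[sum(bits[k][b] << k for k in range(K))]
        if b == 0:                    # sign-bit clock: subtract the DALUT output
            acc = -out + 0.5 * acc
        else:                         # otherwise add; accumulator scaled by 2^-1
            acc = out + 0.5 * acc
    return acc
```

The computation takes B clocks, trading the parallel version's adder tree for one adder, one register, and one hard-wired shift.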
2.2.5 Between the Extremes
While there are a number of speed-gate count tradeoffs that range from one bit per clock
(the ultimate in gate efficiency) to B bits per clock (the ultimate in speed), the question of
their effectiveness under architectural constraints remains. We can start this study with
the case of 2-bit-at-a-time processing; the computation lasts B/2 clocks and the DALUT
now grows to cover two contiguous bits, i.e. [sum b + [sum(b+1)]·2^-1]. Again consider
the case of B = 16 and rephrase equ. 2.7:
y = -[sum0] + [sum1 + [sum2]·2^-1]·2^-1 + [sum3 + [sum4]·2^-1]·2^-3 + …
  + [sum13 + [sum14]·2^-1]·2^-13 + [sum15]·2^-15    Equ 2.10
The terms within the rectangular brackets are stored in a common DALUT, which can
also serve [sum0] and [sum15]. Note that the computation takes B/2 + 1, or 9, clock
periods. The functional blocks of the data path are shown in fig. 2.30(a). The odd-valued
scale factors outside the rectangular brackets do introduce some complexity to the circuit,
but it can be managed.
Fig. 2.30(a) Two-bit-at-a-time Distributed Arithmetic Data Path (B=16, K=16)
The scaling simplifies with magnitude-only input data, and the two-bit processing would
then last for 8 clock periods. Thus:

y = [sum0 + [sum1]·2^-1] + [sum2 + [sum3]·2^-1]·2^-2 + …
  + [sum14 + [sum15]·2^-1]·2^-14    Equ 2.11
There is another way of rephrasing or partitioning equ. 2.7 that maintains the B/2 clock
computation time:
y = [-[sum0] + [sum1]·2^-1 + … + [sum7]·2^-7]
  + [[sum8] + [sum9]·2^-1 + … + [sum15]·2^-7]·2^-8    Equ 2.12
Here two identical DALUTs, two scaling accumulators, and a post-accumulator adder
(fig.2.30(b)) are required. While the adder in the scaling accumulator may be single
precision, the second adder stage may be double precision to meet performance
requirements.
Fig 2.30(b) Two-bit-at-a-time Distributed Arithmetic Data Path (B=16, K=16)
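A software sketch of this partitioned form (hypothetical K = 4, B = 8, as in the earlier sketches): each scaling accumulator runs over its own half of the bits, and the post-accumulator adder combines the halves with a 2^-B/2 scaling.

```python
K, B = 4, 8
A = [0.25, -0.5, 0.75, 0.125]        # hypothetical coefficients A_k

DALUT = [sum(A[k] for k in range(K) if (m >> k) & 1) for m in range(2 ** K)]

def to_bits(x):
    """B-bit two's-complement fractional code of x in [-1, 1), MSB (sign) first."""
    code = round(x * 2 ** (B - 1)) % (2 ** B)
    return [(code >> (B - 1 - b)) & 1 for b in range(B)]

def split_da(xs):
    bits = [to_bits(x) for x in xs]
    def scaling_acc(lo, hi, signed):
        """One scaling accumulator over bit positions lo..hi, LSB first."""
        acc = 0.0
        for b in range(hi, lo - 1, -1):
            out = DALUT[sum(bits[k][b] << k for k in range(K))]
            acc = (-out if signed and b == lo else out) + 0.5 * acc
        return acc
    ms = scaling_acc(0, B // 2 - 1, True)    # bits 0..B/2-1 (includes the sign bit)
    ls = scaling_acc(B // 2, B - 1, False)   # bits B/2..B-1
    return ms + 2.0 ** -(B // 2) * ls        # post-accumulator adder
```

Both accumulators run concurrently in hardware, so only B/2 clocks are needed for the same result as the bit-serial version.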
There are other two-bit-at-a-time possibilities. Each possibility implies a different circuit
arrangement. Consider a third rephrasing of equ. 2.7.
y = (-[sum0] + [sum1]·2^-1) + ([sum2] + [sum3]·2^-1)·2^-2 + …
  + ([sum14] + [sum15]·2^-1)·2^-14    Equ 2.13
Here the inner brackets denote a DALUT output, while the larger, outer brackets denote
the scaling of the scaling accumulator. Two parallel odd-even bit data paths are indicated
(fig. 2.30(c)) with two identical DALUTs. The DALUT addressed by the even bits has its
output scaled by 2^-1 and then applied to the parallel adder. The adder sum is then
applied to the scaling accumulator, which yields the desired response, y(n). Here a single-
precision pre-accumulator adder replaces the double-precision post-accumulator adder.
Fig.2.30(c) Two-bit-at-a-time Distributed Arithmetic Data Path (B=16, K=16)
Each of these approaches implies a different gate implementation. Certainly one of the
most important concerns is DALUT size, which is constrained by the look-up table
capacity of the CLB. The first approach, defined by equ. 2.10, describes a DALUT of
2^2K words that feeds a single scaling accumulator, while the second, defined by
equ. 2.12, describes 2 DALUTs, each of 2^K words, that feed separate scaling
accumulators. An additional parallel adder is required to sum (with the 2^-B/2 scaling
indicated) the two output halves. The difference in memory sizes between 2^2K and
2×2^K is very significant, particularly when we confront reality, namely the CLB
memory sizes of 32x1 or 2x(16x1) bits.
2.2.6 Parallel Realization
In its most obvious and direct form, distributed arithmetic computations are bit-serial in
nature, i.e., each bit of the input samples must be indexed in turn before a new output
sample becomes available. When the input samples are represented with B bits of
precision, B clock cycles are required to complete an inner-product calculation. A parallel
realization of distributed arithmetic corresponds to allowing multiple bits to be processed
in one clock cycle by duplicating the LUT and adder tree. In a 2-bit-at-a-time parallel
implementation, the odd bits are fed to one LUT and adder tree, while the even bits are
simultaneously fed to an identical tree. The odd-bit partials are left-shifted to properly
weight the result and added to the even partials before accumulating the aggregate. In the
extreme case, all input bits can be computed in parallel and then combined in a shifting
adder tree.
Fig 2.31 Mallat's quadrature mirror filter tree used to compute the coefficients of the (a)
forward and (b) inverse wavelet transforms.
CHAPTER 3
3.1 Introduction
This chapter describes the detailed procedure adopted to denoise the signal. It also
explains in detail the MATLAB functions involved in the process.
3.2 Testing in MATLAB
Steps involved in denoising the signal using MATLAB are:
1. Load a signal.
2. Perform a single-level wavelet decomposition of the signal.
3. Construct approximations and details from the coefficients.
4. Display the approximation and detail.
5. Perform a multilevel wavelet decomposition of the signal.
6. Extract approximation and detail coefficients.
7. Apply thresholding to the detail coefficients.
8. Reconstruct the level-3 approximation.
9. Display the results of the multilevel decomposition.
10. Reconstruct the original signal from the level-3 decomposition.
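The same pipeline can be sketched outside MATLAB (Haar wavelet and a hard-threshold rule assumed purely for illustration; the report's wavelet, level count, and threshold rule are selected during the MATLAB analysis):

```python
import numpy as np

s2 = np.sqrt(2.0)

def dwt1(x):
    """Single-level Haar decomposition."""
    return (x[0::2] + x[1::2]) / s2, (x[0::2] - x[1::2]) / s2

def idwt1(cA, cD):
    """Single-level Haar reconstruction."""
    y = np.empty(2 * len(cA))
    y[0::2] = (cA + cD) / s2
    y[1::2] = (cA - cD) / s2
    return y

def wavedec(x, n):
    """Multilevel decomposition: returns cA_n and [cD1, ..., cDn]."""
    cA, details = np.asarray(x, dtype=float), []
    for _ in range(n):
        cA, cD = dwt1(cA)
        details.append(cD)
    return cA, details

def waverec(cA, details, threshold=0.0):
    """Hard-threshold each detail level, then reconstruct level by level."""
    for cD in reversed(details):
        cD = np.where(np.abs(cD) > threshold, cD, 0.0)
        cA = idwt1(cA, cD)
    return cA
```

With threshold = 0, waverec(*wavedec(x, 3)) reproduces x exactly; in this simplified sketch the signal length must be divisible by 2^n.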
3.3 Functions involved in denoising a signal
3.3.1 Analysis-Decomposition Functions
3.3.1.1 dwt
Purpose
Single-level discrete 1-D wavelet transform
Syntax
[cA,cD] = dwt(X,'wname')
[cA,cD] = dwt(X,'wname','mode',MODE)
[cA,cD] = dwt(X,Lo_D,Hi_D)
[cA,cD] = dwt(X,Lo_D,Hi_D,'mode',MODE)
Description
The dwt command performs a single-level one-dimensional wavelet decomposition with
respect to either a particular wavelet or particular wavelet decomposition filters (Lo_D
and Hi_D) that we specify.
[cA,cD] = dwt(X,'wname') computes the approximation coefficients vector cA and detail
coefficients vector cD, obtained by a wavelet decomposition of the vector X. The string
'wname' contains the wavelet name.
[cA,cD] = dwt(X,Lo_D,Hi_D) computes the wavelet decomposition as above, given
these filters as input:
Lo_D is the decomposition low-pass filter.
Hi_D is the decomposition high-pass filter.
Lo_D and Hi_D must be the same length.
Let lx = the length of X and lf = the length of the filters Lo_D and Hi_D; then
length(cA) = length(cD) = la, where la = ceil(lx/2) if the DWT extension mode is set to
periodization. For the other extension modes, la = floor((lx+lf-1)/2).
[cA,cD] = dwt(...,'mode',MODE) computes the wavelet decomposition with the
extension mode MODE that you specify. MODE is a string containing the desired
extension mode.
Example:
[cA,cD] = dwt(x,'db1','mode','sym');
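The length relation for the non-periodized modes can be checked with a direct convolution model (Haar filters, i.e. 'db1'; full convolution stands in for the extension handling, so this is a rough model rather than MATLAB's exact boundary treatment):

```python
import numpy as np

Lo_D = np.array([1.0, 1.0]) / np.sqrt(2.0)   # decomposition low-pass filter
Hi_D = np.array([1.0, -1.0]) / np.sqrt(2.0)  # decomposition high-pass filter

def dwt_conv(X):
    """Filter with Lo_D/Hi_D, then keep every second sample."""
    cA = np.convolve(X, Lo_D)[1::2]
    cD = np.convolve(X, Hi_D)[1::2]
    return cA, cD
```

For lx = 7 and lf = 2 this yields length(cA) = length(cD) = floor((7+2-1)/2) = 4, matching the formula above.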
3.3.1.2 wavedec
Purpose
Multilevel 1-D wavelet decomposition
Syntax
[C,L] = wavedec(X,N,'wname')
[C,L] = wavedec(X,N,Lo_D,Hi_D)
Description
wavedec performs a multilevel one-dimensional wavelet analysis using either a specific
wavelet ('wname') or specific wavelet decomposition filters. [C,L] =
wavedec(X,N,'wname') returns the wavelet decomposition of the signal X at level N,
using 'wname'. N must be a strictly positive integer. The output decomposition structure
contains the wavelet decomposition vector C and the bookkeeping vector L. The structure
is organized as in this level-3 decomposition example.
Fig 3.1 Decomposition structure
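The layout of this level-3 structure can be illustrated with a small sketch (Haar wavelet assumed): C concatenates the coefficient segments coarsest first, and the bookkeeping vector records the segment lengths followed by the original signal length.

```python
import numpy as np

s2 = np.sqrt(2.0)

def dwt1(x):
    """Single-level Haar decomposition."""
    return (x[0::2] + x[1::2]) / s2, (x[0::2] - x[1::2]) / s2

X = np.arange(16.0)
cA, details = X, []
for _ in range(3):                     # three decomposition levels
    cA, cD = dwt1(cA)
    details.append(cD)                 # collected as [cD1, cD2, cD3]

C = np.concatenate([cA] + details[::-1])                     # [cA3, cD3, cD2, cD1]
Lbk = [len(cA)] + [len(d) for d in details[::-1]] + [len(X)]  # bookkeeping vector
```

Here C holds the 16 coefficients in the order cA3, cD3, cD2, cD1, and the bookkeeping vector Lbk is [2, 2, 4, 8, 16].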
[C,L] = wavedec(X,N,Lo_D,Hi_D) returns the decomposition structure as above, given
these filters as input.
component interpolator_hpf_rec is
  port (clock : in STD_LOGIC;
        datain : in std_logic_vector(15 downto 0);
        samplecnt, levelno : in integer;
        countout : out integer;
        doutput : out std_logic_vector(15 downto 0));
end component;

signal din, fir1 : std_logic_vector(15 downto 0);
signal ca_cnt, cd_cnt : integer;
signal ca_clock : std_logic;
signal ca_dec, cd_dec, ca_rec, cd_rec, cd1_dec : std_logic_vector(15 downto 0);

begin

c0: fileread port map(clockin, din);
c1: decimator_lpf_dec port map(clockin, din, samplecount, 1, ca_cnt, ca_clock, fir1, ca_dec);
c2: decimator_hpf_dec port map(clockin, din, samplecount, 1, cd_dec);
c3: interpolator_lpf_rec port map(clockin, ca_dec, ca_cnt, 2, ca_rec);
c4: interpolator_hpf_rec port map(clockin, cd_dec, ca_cnt, 2, cd_cnt, cd_rec);

dout <= ca_rec + cd_rec;

end dwt_single_level;
4.3 Results
Fig 4.6 Samples taken from MATLAB
4.4 Comparison of the results
Fig 4.9 MATLAB values for denoised signal
4.5 Conclusion
Real-world signals are often corrupted by noise, which may severely limit their
usefulness. For this reason, signal denoising is a topic that continually draws great
interest. Wavelets are an alternative tool for signal decomposition using orthogonal
functions. Unlike basic Fourier analysis, wavelets do not completely lose time
information, a feature that makes the technique suitable for applications where the
temporal location of the signal's frequency content is important. One of the fields where
wavelets have been successfully applied is data analysis. In particular, it has been
demonstrated that wavelets produce excellent results in signal denoising. This work
presents a procedure to denoise a signal using the discrete wavelet transform. A real-time
electrical signal contaminated with noise is used as a test bed for the method. The
simulation result of the suggested design is presented. Future work includes using
multiwavelets to denoise a signal.