PROJECT REPORT
ON
SPEECH COMPRESSION USING WAVELETS
DEVELOPED BY
GIRI SHIVRAMAN S. GH99316
NARKHEDE NILESH P. GH99340
PALTERU SWARNALATHA. GH99344
RAMALAKSHMI SUNDARESAN. GH99347
RAJADHYAX AMOL A. GH99364
UNDER THE GUIDANCE OF
Dr. S. C. GADRE
DEPARTMENT OF ELECTRICAL ENGINEERING
VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
UNIVERSITY OF MUMBAI
2002-03
Certificate of Approval
This is to certify that the project titled “Speech Compression using Wavelets” is a bona fide record of the project work done by
Name University No.
Palteru Swarnalatha 10178
Ramalakshmi Sundaresan 10180
Giri Shivraman S. 10188
Narkhede Nilesh P. 10203
Rajadhyax Amol A. 10205
during the year 2002-03 under the guidance of Dr. S. C. Gadre, towards the
partial fulfillment of Bachelor of Engineering Degree Course in Electrical
Engineering of the University of Mumbai.
_______________________
_______________________
Dr. B. K. Lande
Head of Department
Electrical & Electronics
Engg. Dept.
VJTI
Dr. S. C. Gadre
Project Guide
Electrical & Electronics
Engg. Dept.
VJTI
Certificate of Approval by Examiners
This is to certify that the project titled “Speech Compression
using Wavelets” is a bona fide record of the project work done by
Name University No.
Palteru Swarnalatha 10178
Ramalakshmi Sundaresan 10180
Giri Shivraman S. 10188
Narkhede Nilesh P. 10203
Rajadhyax Amol A. 10205
during the year 2002-03 and is approved for the degree of Bachelor of
Engineering Degree Course in Electrical Engineering of the University of
Mumbai.
Examiners
_______________________
_______________________
(External)
Date:
(Internal)
Date:
PREFACE
Multimedia files in general need plenty of disk space for storage apart from being
unwieldy for communication purposes and sound files are no exception. Hence
compression of these files has become a necessity and is a ripe subject for research.
The field of signal processing has enjoyed several analysis tools in the past. One of the
more recent (and, needless to say, more exciting) developments in this field has been the
emergence of a new transform, THE WAVELET TRANSFORM. In fact, its use is not
restricted to signal processing alone, but ranges over such diverse fields as image
processing, communications, mathematics, computer science to name a few. Wavelet
transforms, in their different guises, have come to be accepted as a set of tools useful for
various applications. Wavelet transforms are good to have at one's fingertips, along with
many other, mostly more traditional, tools.
This thesis considers the application of wavelet transforms to the compression of human
speech signals. It is part of the project on the said topic undertaken by a group of
undergraduate students. The study is by no means exhaustive; an exhaustive treatment is
beyond the purview of undergraduate study.
It can also serve as a guide to understand the working of the software implementation of
the project, which is provided in the accompanying CD. Note that you need MATLAB
Version 6 or higher to run the code. Other than this, the software is self-contained in
terms of help files and source code, which facilitates alteration of the program to suit
your needs.
Chapter 1 is intended to serve as an introduction. Its main purpose is to define the scope
of this project.
Chapter 2 presents the drawbacks inherent in the Fourier Methods. It is assumed that the
reader is conversant with the Fourier methods. A specific example is taken wherein the
shortcomings of Fourier methods become readily evident.
Chapter 3 presents a detailed introduction to wavelet analysis. It compares the
wavelet transform, as a signal processing tool, with the Fourier methods in terms of their
similarities and differences. The CONTINUOUS WAVELET TRANSFORM (CWT) is
discussed next. The chapter concludes by demonstrating how the wavelet transform
overcomes the drawbacks of Fourier methods.
Chapter 4 discusses the ‘DISCRETE WAVELET TRANSFORM’ (DWT), which is a
more practical approach than CWT. It also explains an implementation of DWT using
filtering schemes.
In chapter 5, the use of Wavelet transform in speech compression is presented. The
motivation for using wavelets for speech compression is developed, so is the algorithm
used for the same.
The pertinent commands of MATLAB WAVELET TOOLBOX are explained in brief
in chapter 6. Specifically, the commands used for achieving compression are discussed
along with their syntax. The software implementation is based on these commands.
However, a detailed discussion of software is relegated to the Appendix lest you get
inundated with extraneous details of programming.
The conclusion is the subject of chapter 7. Statistical analysis of signals is performed and
the results recorded. Based on these, inferences are drawn.
As has been pointed out, this discussion is not comprehensive. To compensate for this,
plenty of references have been provided at the end. The accompanying CD contains
numerous documents that we found on the internet. These should serve as a useful guide for
anyone interested in pursuing the subject beyond the point where we stopped.
The Appendices are included in the CD.
ACKNOWLEDGEMENTS
A project of this magnitude is unlikely to be completed without the active help of a
hidden army of others who make it possible. At the risk of forgetting to mention some
names, we would like to thank the following people.
The first and the foremost person who deserves our boundless gratitude is our project
guide, Professor Dr. (Ms.) S. C. Gadre. The sheer size of this project was daunting in
the beginning and would have deterred us from proceeding had it not been for her
support and guidance. We take this opportunity to tender our sincerest thanks to her.
We are also indebted to Mr. Ajay Shah and Mr. Laxman Udgiri who provided us with
valuable leads into the subject of wavelet transforms.
Throughout the progress of this project we have enjoyed a close collaboration with a
number of people who we feel fortunate to count as seniors and friends, whose views
have greatly influenced us. Our deep gratitude to all of them.
Last but not the least, we wish to thank the various internet-communities/e-groups on
signal processing/wavelets/MATLAB/Compression. It was a rich learning experience
to have been a part of this wonderful community. Our innumerable thanks to every
member of this community.
Table of Contents.
1 INTRODUCTION TO AND SCOPE OF THE PROJECT
1.1 Speech Signals
1.2 Compression – An Overview
1.4 Aim, Scope And Limitations of This Thesis
2 WEAKNESSES OF FOURIER ANALYSIS
2.1 Review of Fourier Methods
2.2 Shortcomings of FT
3 INTRODUCTION TO WAVELETS AND THE CONTINUOUS WAVELET TRANSFORM (CWT)
7 RESULTS
8 FURTHER STUDY
1 INTRODUCTION TO AND SCOPE OF THE PROJECT
1.1 Speech Signals
The human speech in its pristine form is an acoustic signal. For the purpose of
communication and storage, it is necessary to convert it into an electrical signal. This is
accomplished with the help of certain instruments called ‘transducers’.
This electrical representation of speech has certain properties.
1. It is a one-dimensional signal, with time as its independent variable.
2. It is random in nature.
3. It is non-stationary, i.e. the frequency spectrum is not constant in time.
4. Although human beings have an audible frequency range of 20 Hz to 20 kHz,
human speech has significant frequency components only up to 4 kHz, a
property that is exploited in the compression of speech.
Digital representation of speech
With the advent of digital computers, it was proposed to exploit their power
for the processing of speech signals. This required a digital representation of
speech. To achieve this, the analog signal is sampled at some frequency and then
quantized at discrete levels. Thus, the parameters of digital speech are
1. Sampling rate
2. Bits per sample
3. Number of channels.
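As a rough illustration of how these three parameters determine the size of a digital recording, the following sketch computes the raw bit rate and applies a uniform quantizer to a single sample. It is written in Python rather than MATLAB, and the function names and the example values (8000 Hz, 8 bits, one channel, 5 seconds) are assumptions chosen for illustration, not figures taken from this project.

```python
def bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) bit rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels


def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits uniform levels.

    Returns the integer index of the level (0 .. 2**bits - 1).
    """
    levels = 2 ** bits
    index = int((sample + 1.0) / 2.0 * levels)
    return min(index, levels - 1)  # keep sample == 1.0 inside the range


# Assumed example: mono speech sampled at 8000 Hz with 8 bits per sample.
rate = bit_rate(8000, 8, 1)          # bits per second
size_bytes = rate * 5 // 8           # storage for a 5 second recording
print(rate, size_bytes)
print(quantize(-1.0, 8), quantize(0.0, 8), quantize(1.0, 8))
```

Doubling any one of the three parameters doubles the storage requirement, which is why all three have to be stated when describing a sound file.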
Sound files can be stored and played on digital computers. Various formats have been
proposed by different manufacturers, for example ‘.wav’ and ‘.au’, to name a few.
In this thesis, the ‘.wav’ format is used extensively owing to the convenience of recording it
with the ‘Sound Recorder’ software shipped with the WINDOWS OS.
1.2 Compression – An Overview
In recent years, large-scale information transfer by remote computing and the
development of massive storage and retrieval systems have witnessed tremendous
growth. To cope with the growth in the size of databases, additional storage devices
need to be installed, and modems and multiplexers have to be continuously upgraded
to permit large amounts of data transfer between computers and remote
terminals. This leads to an increase in cost as well as equipment. One solution to these
problems is “COMPRESSION”, where the database and the transmission sequence can be
encoded efficiently.
WHY COMPRESSION?
Compression is a process of converting an input data stream into another data stream that
has a smaller size. Compression is possible only because data is normally represented in
the computer in a format that is longer than necessary i.e. the input data has some amount
of redundancy associated with it. The main objective of compression systems is to
eliminate this redundancy.
When compression is used to reduce storage requirements, overall program execution
time may be reduced. This is because reduction in storage will result in the reduction of
disc access attempts.
With respect to transmission of data, the data rate is reduced at the source by the
compressor (coder); it is then passed through the communication channel and returned to
the original rate by the expander (decoder) at the receiving end. Compression
algorithms help to reduce the bandwidth requirements and also provide a level of security
for the data being transmitted. A tandem pair of coder and decoder is usually referred to
as a codec.
APPLICATIONS OF COMPRESSION
1. The use of compression in recording applications is extremely powerful. The playing
time of the medium is extended in proportion to the compression factor.
2. In the case of tapes, the access time is improved because the length of the tape needed
for a given recording is reduced and so it can be rewound more quickly.
3. In digital audio broadcasting and in digital television transmission, compression is
used to reduce the bandwidth needed.
4. The time required for a web page to be displayed and the downloading time in case of
files is greatly reduced due to compression.
COMPRESSION TERMINOLOGY
• Compression ratio: The compression ratio is defined as
Compression ratio = size of the output stream / size of the input stream
A value of 0.6 means that the data occupies 60% of its original size after compression.
Values greater than 1 mean an output stream bigger than the input stream. The
compression ratio can also be called bpb (bit per bit), since it equals the number of bits in the
compressed stream needed, on average, to represent one bit in the input stream.
• Compression factor: It is the inverse of the compression ratio. Values greater than 1
indicate compression and values less than 1 indicate expansion.
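The two definitions above can be written down directly. The sketch below is in Python; the function names and the stream sizes are illustrative assumptions only.

```python
def compression_ratio(input_size, output_size):
    """Size of the output stream divided by the size of the input stream."""
    return output_size / input_size


def compression_factor(input_size, output_size):
    """Inverse of the compression ratio."""
    return input_size / output_size


# Assumed example: a 100,000 byte input compressed to 60,000 bytes.
ratio = compression_ratio(100_000, 60_000)
factor = compression_factor(100_000, 60_000)
print(ratio, factor)
```

For this example the ratio is below 1 and the factor is above 1, both of which indicate that the data was compressed rather than expanded.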
1.3 Coding Techniques
There are various methods of coding the speech signal:
CODING TECHNIQUES
    WAVEFORM CODING
        Time domain coding: PCM, DPCM, ADPCM
        Frequency domain coding: Sub-band coding, Transform coding
    SOURCE CODING
        LPC vocoders, MBE coder
        Hybrid coding: MPE codecs, RPE codecs, CELP codecs
1.4 Aim, Scope And Limitations of This Thesis
The primary objective of this thesis is to present the wavelet based method for the
compression of speech. The algorithm presented here was implemented in MATLAB.
The said software is provided in the accompanying CD. Readers may find it useful to
verify the result by running the program.
Since this thesis is an application of wavelets, it was natural to study the basics of
wavelets in detail. The same procedure was adopted in writing this thesis, as it was felt
that without a minimal background in wavelets, it would be fruitless, and also inconvenient,
to explain the algorithm.
However, wavelet theory is itself an engrossing field, and a comprehensive study was
beyond the scope of our undergraduate level. Hence, an attempt is made only to explain the
very basics which are indispensable from the compression point of view. This approach
led to the elimination of many of the mammoth-sized equations and vector analysis
inherent in the study of wavelets.
At this stage, it is worthwhile mentioning two quotes by famous scientists
‘So far as the laws of mathematics refer to reality, they are not certain. And so far as
they are certain, they do not refer to reality.’ --Albert Einstein
‘As complexity rises, precise statements lose meaning and meaningful statements
lose precision.’ --Lotfi Zadeh 1
The inclusion of the above quotes is to highlight the fact that simplicity and clarity are
often the casualties of precision and accuracy, and vice-versa.
In this thesis, we have compromised on the mathematical precision and accuracy to make
matters simple and clear. An amateur in the field of wavelets might find this work useful
as it is relieved of most of the intimidating vector analysis and equations, which have
been supplanted by simple diagrams. However, for our own understanding, we did find
it necessary, interesting and exciting to go through some literature which deals with the
intricate details of wavelet analysis, and sufficient references have been provided
wherever necessary, for the sake of a fairly advanced reader. Some of the literature that
we perused has been included in the CD.
1 Lotfi Zadeh is considered to be the father of Fuzzy Logic
The analysis that we undertook for wavelets includes only the orthogonal wavelets. This
decision was based on the extensive literature we read on the topic, wherein the
suitability of these wavelets for speech signals was stated.
Another topic that has been deliberately excluded in this work is the concept of MRA,
which bridges the gap between the wavelets and the filter banks and is indispensable for a
good understanding of Mallat’s Fast Wavelet Transform Algorithm. Instead, we have
assumed certain results and provided references for further reading.
Secondly, the sound files that we tested were of limited duration, around 5 seconds.
Although the programs will run for larger files (of course, the computation time will be
longer in this case), a better approach towards such large files is to use frames of finite
length. This procedure is more commonly used in real-time compression of sound files, and is not
presented here.
Encoding is performed using only Run Length Encoding. The effect of other
encoding schemes on the compression factor has not been studied.
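For readers unfamiliar with Run Length Encoding, the following is a minimal generic sketch of the idea in Python. It is not the MATLAB implementation supplied on the CD; the function names and the sample data are assumptions made for illustration. The scheme replaces each run of repeated values by the value and the length of the run, so it pays off when the data contains long runs of a repeated value.

```python
def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for v, count in runs:
        out.extend([v] * count)
    return out


# Made-up data with runs of zeros.
data = [7, 0, 0, 0, 0, -3, 0, 0, 5]
encoded = rle_encode(data)
print(encoded)
print(rle_decode(encoded) == data)
```

Decoding reproduces the input exactly, so the encoding step itself is lossless.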
This thesis considers only wavelet analysis, wherein only the approximation coefficients are
split. There exists another analysis, called wavelet packet analysis, which splits the detail
coefficients as well. This is not explored in this thesis.
2 WEAKNESSES OF FOURIER ANALYSIS
Introduction
This chapter develops the need and motivation for studying the wavelet transform.
Historically, the Fourier Transform has been the most widely used tool for signal processing.
As signal processing began spreading its tentacles and encompassing newer signals, the
Fourier Transform was found to be unable to satisfy the growing need for processing a
bulk of signals. Hence, this chapter begins with a review of Fourier methods. Detailed
explanation is avoided to rid the discussion of insignificant details. A simple case is
presented, where the shortcomings of Fourier methods are expounded. The next chapter
concerns wavelet transforms, and shows how the drawbacks of the FT are eliminated.
2.1 Review of Fourier Methods
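For a continuous-time signal x(t), the Fourier Transform (FT) equations are
$$X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt \tag{2.1}$$
$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\,e^{j\omega t}\,d\omega \tag{2.2}$$
Equation (2.1) is the analysis equation and equation (2.2) is the synthesis equation.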
The synthesis equation suggests that the FT expresses the signal as a linear
combination of complex exponential signals. For a real signal, it can be shown that the FT
synthesis equation expresses the signal as a linear combination of sine and cosine
terms. A diagrammatic representation of this is as follows:
fig 2.1: A signal shown as a linear combination of sinusoids (FT method)
The analysis equation represents the given signal in a different form: as a function of
frequency. The original signal is a function of time, whereas after the transformation
the same signal is represented as a function of frequency. It gives the frequency
components in the signal.
fig 2.2:Transforming a signal from time-domain to frequency-domain, the
FOURIER METHOD
Thus the FT is a very useful tool as it gives the frequency content of the input signal. It
however suffers from a serious drawback. It is explained through an example in the
sequel.
2.2 Shortcomings of FT
EXAMPLE 2.1: Consider the following 2 signals
x1(t) = sin(2*π*100*t) 0 <= t < 0.1 sec
= sin(2*π*500*t) 0.1 <= t < 0.2 sec
x2(t) = sin(2*π*500*t) 0 <= t < 0.1 sec
= sin(2*π*100*t) 0.1 <= t < 0.2 sec
A plot of these signals is shown below.
(Note: A time interval of 0 to 0.2 seconds was divided into 10,000 points. The sine of
each point was computed and plotted. Since the signal is of 10,000 points, 16,384 point
FFT was computed which represents the frequency domain of the signal. This was done
in MATLAB)
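The experiment described in the note can be reproduced in miniature. The sketch below is an illustration in Python rather than the MATLAB code used for the figures. To keep it short and free of external libraries, it assumes a sampling rate of 2000 Hz (400 samples over 0.2 seconds) instead of the 10,000 points used above, and it evaluates individual DFT bins directly from the definition instead of computing a 16,384 point FFT. The names `tone`, `dft_magnitude` and `bin_of` are hypothetical helpers.

```python
import cmath
import math

FS = 2000   # assumed sampling rate in Hz
N = 400     # number of samples covering 0 <= t < 0.2 s


def tone(freq, n):
    """Sample number n of a sine wave of the given frequency."""
    return math.sin(2 * math.pi * freq * n / FS)


# x1: 100 Hz followed by 500 Hz.  x2: 500 Hz followed by 100 Hz.
x1 = [tone(100, n) if n < N // 2 else tone(500, n) for n in range(N)]
x2 = [tone(500, n) if n < N // 2 else tone(100, n) for n in range(N)]


def dft_magnitude(x, k):
    """Magnitude of the k-th DFT bin, computed from the definition."""
    total = sum(x[n] * cmath.exp(-2j * math.pi * k * n / len(x))
                for n in range(len(x)))
    return abs(total)


def bin_of(freq):
    """Index of the DFT bin that corresponds to freq (in Hz)."""
    return round(freq * N / FS)


for f in (100, 300, 500):
    m1 = dft_magnitude(x1, bin_of(f))
    m2 = dft_magnitude(x2, bin_of(f))
    print(f, round(m1, 3), round(m2, 3))
```

The two signals differ sample by sample, yet the magnitudes printed for each frequency agree, which is the behaviour the two FFT plots illustrate.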
fig 2.3: signal x1(t) and its FFT
fig 2.4 : Signal x2(t) and its FFT
The above example demonstrates the drawback inherent in the Fourier analysis of
signals. It shows that the FT spectrum is unable to distinguish between the two different
signals. The two signals have the same frequency components, but at different times.
Thus, the FT is incapable of giving time information of signals.
In general, the FT is not suitable for the analysis of a class of signals called
NON-STATIONARY SIGNALS.
This led to the search for new tools for the analysis of signals. One such tool that was
proposed was the SHORT TIME FOURIER TRANSFORM (STFT). The STFT too
suffered from a drawback1 and was supplanted by the WAVELET TRANSFORM.
In the sequel, the CONTINUOUS WAVELET TRANSFORM is introduced, and the same
problem is solved with the help of this transform.
1 see the tutorials on ‘WAVELET TRANSFORMS’ by ROBI POLIKAR for a detailed discussion on this.
3 INTRODUCTION TO WAVELETS AND THE
CONTINUOUS WAVELET TRANSFORM (CWT)
INTRODUCTION:
This chapter provides a motivation towards the study of wavelets as a tool for signal
processing. The drawbacks inherent in the Fourier methods are overcome with wavelets.
This fact is demonstrated here.
It must be reiterated that the discussion in this chapter is by no means comprehensive and
exhaustive. The concepts of time-frequency resolution have been avoided for the sake of
simplicity. Instead, the development endeavors to compare the Wavelet methods with the
Fourier methods as the reader is expected to be well conversant with the latter.
3.1 Continuous-time Wavelets
Consider a real or complex-valued continuous-time function ψ(t) with the following
properties 1
1. The function integrates to zero:
$$\int_{-\infty}^{\infty} \psi(t)\,dt = 0 \tag{3.1}$$
2. It is square integrable or, equivalently, has finite energy:
$$\int_{-\infty}^{\infty} |\psi(t)|^{2}\,dt < \infty \tag{3.2}$$
1 A third condition, called the admissibility condition, also exists. For a detailed study of this topic, the reader is referred to the book by Rao (see references, section I, # 10).
A function is called a mother wavelet if it satisfies these two properties. There is an
infinity of functions that satisfy these properties and thus qualify as mother wavelets.
The simplest of them is the ‘Haar wavelet’. Some other wavelets are the Mexican hat and
Morlet wavelets. Apart from these, there are various families of wavelets, such as the
Daubechies family, the symlet family and the coiflet family. In this thesis, the main emphasis
is on the Daubechies family, which has db1 to db10 wavelets. They are shown in the
following figure1 .
1 db1 is the same as the Haar wavelet.
fig 3.1 : Some wavelet functions.
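The two defining properties can be checked numerically for the Haar wavelet. The sketch below, in Python, assumes the usual definition of the Haar wavelet (+1 on the first half of the unit interval, -1 on the second half, zero elsewhere) and approximates the integrals in (3.1) and (3.2) by a midpoint sum. The helper `integrate` and the step size are assumptions made for this illustration.

```python
def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def integrate(func, start, stop, step=0.001):
    """Midpoint-rule approximation of the integral of func over [start, stop)."""
    count = round((stop - start) / step)
    return sum(func(start + (i + 0.5) * step) for i in range(count)) * step


# Property (3.1): the wavelet integrates to zero.
mean_value = integrate(haar, -1.0, 2.0)

# Property (3.2): the energy is finite (for Haar it equals 1).
energy = integrate(lambda t: abs(haar(t)) ** 2, -1.0, 2.0)

print(mean_value, energy)
```

The integration range [-1, 2) is wider than the support of the wavelet, so nothing is lost by not integrating over the whole real line.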
3.2 The Continuous Wavelet Transform (CWT)
Consider the following figure which juxtaposes a sinusoid and a wavelet
fig 3.2 : comparing sine wave and a wavelet
As has already been pointed out, a wavelet is a waveform of effectively limited duration
that has an average value of zero.
Compare wavelets with sine waves, which are the basis of Fourier analysis.
Sinusoids do not have limited duration: they extend from minus to plus infinity. And
where sinusoids are smooth and predictable, wavelets tend to be irregular and
asymmetric.
Fourier analysis consists of breaking up a signal into sine waves of various frequencies
(fig 2.1). Similarly, wavelet analysis is the breaking up of a signal into shifted and scaled
versions of the original (or mother) wavelet. Compare the following figure with fig 2.1.
fig 3.3 :figure demonstrating the decomposition of a signal into wavelets
The above diagram suggests the existence of a synthesis equation to represent the original
signal as a linear combination of wavelets, which are the basis functions for wavelet
analysis (recollect that in Fourier analysis, the basis functions are sines and cosines). This
is indeed the case. The wavelets in the synthesis equation are multiplied by scalars. To
obtain these scalars, we need an analysis equation, just as in the Fourier case.
We thus have two equations, the analysis and the synthesis equation. They are stated as
follows:
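1. Analysis equation or CWT equation:1
$$C(a,b) = \int_{-\infty}^{\infty} f(t)\,\frac{1}{\sqrt{|a|}}\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt \tag{3.3}$$
2. Synthesis equation or ICWT:
$$f(t) = \frac{1}{K}\int_{a=-\infty}^{\infty}\int_{b=-\infty}^{\infty}\frac{1}{|a|^{2}}\,C(a,b)\,\frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right)da\,db \tag{3.4}$$
1 The ‘*’ indicates complex conjugate.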
The basis functions in both Fourier and wavelet analysis are localized in frequency
making mathematical tools such as power spectra (power in a frequency interval) useful
at picking out frequencies and calculating power distributions.
The most important difference between these two kinds of transforms is that individual
wavelet functions are localized in space. In contrast Fourier sine and cosine functions
are non-local and are active for all time t.
This localization feature, along with the wavelets' localization in frequency, makes many
functions and operators “sparse” when transformed into the wavelet
domain. This sparseness, in turn, results in a number of useful applications such as data
compression, detecting features in images and de-noising signals.
Returning to the equations
The quantities ‘a’ and ‘b’ appearing in the above equations represent, respectively, the
scale and shift of the mother wavelet.
The wavelet transform of a signal f(t) is the family C(a,b), given by the analysis equation.
It depends on two indices a and b. From an intuitive point of view, the wavelet
decomposition consists of calculating a "resemblance index" between the signal and the
wavelet located at position b and of scale a. If the index is large, the resemblance is
strong, otherwise it is slight. The indexes C(a,b) are called coefficients. The dependence
of these coefficients on both ‘a’ and ‘b’ is responsible for the wavelet transform
preserving time and frequency information. These quantities are explained in the
following sections.
K is a constant that appears in the synthesis equation (3.4); it depends on the wavelet.
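The idea of C(a,b) as a "resemblance index" can be made concrete with a small numerical sketch. The Python code below approximates the analysis equation (3.3) by a midpoint sum, using the Haar wavelet (which is real, so the complex conjugate has no effect). The test signal is taken to be the Haar wavelet itself; this choice, the integration range and the step size are all assumptions made purely for illustration.

```python
import math


def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def cwt_coefficient(signal, a, b, start=-4.0, stop=8.0, step=0.001):
    """Midpoint-rule approximation of C(a, b) in equation (3.3).

    signal is a function of time; a (non-zero) is the scale, b the shift.
    """
    count = round((stop - start) / step)
    total = 0.0
    for i in range(count):
        t = start + (i + 0.5) * step
        total += signal(t) * haar((t - b) / a)
    return total * step / math.sqrt(abs(a))


signal = haar  # assumed test signal: one Haar-shaped pulse at t = 0

same_place = cwt_coefficient(signal, a=1, b=0)     # wavelet sits on the pulse
half_shift = cwt_coefficient(signal, a=1, b=0.5)   # wavelet partly overlaps
far_away = cwt_coefficient(signal, a=1, b=3)       # wavelet misses the pulse
print(same_place, half_shift, far_away)
```

The coefficient is largest when the wavelet is aligned with the pulse and has the same scale, and it vanishes when the wavelet is shifted to a region where the signal is zero.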
3.3 The Scale ‘a’
Simply put, scaling a wavelet means stretching (or compressing) it. To go beyond
colloquial descriptions such as "stretching," we introduce the scale factor, often denoted
by the letter ‘a’. If we're talking about sinusoids, for example, the effect of the scale
factor is very easy to see:
fig 3.4 : Effect of scaling on sine waves
The scale factor works exactly the same with wavelets. The smaller the scale factor, the
more "compressed" the wavelet and vice versa.
(see fig 3.5)
It is clear from the diagrams that, for a sinusoid sin(ωt), the scale factor is related
(inversely) to the radian frequency ω . Similarly, with wavelet analysis, the scale is
related to the frequency of the signal.
fig 3.5: Effect of scaling on wavelets
Thus the higher scales correspond to the most "stretched" wavelets. The more stretched
the wavelet, the longer the portion of the signal with which it is being compared, and thus
the coarser the signal features being measured by the wavelet coefficients.
fig 3.6 : Figure demonstrating the effect of stretching the wavelet on the length of
the signal being compared
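The effect of the scale factor can also be seen numerically. In the Python sketch below, a wavelet ψ(t) is scaled to ψ(t/a), and the length of the interval over which the scaled wavelet is non-zero is measured on a fine grid. The Haar wavelet, the helper names and the grid parameters are assumptions chosen for illustration. By the same substitution, sin(t/a) is a sinusoid of radian frequency ω = 1/a, which is the inverse relation noted above.

```python
def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def scaled(wavelet, a):
    """Return the wavelet stretched (a > 1) or compressed (0 < a < 1)."""
    return lambda t: wavelet(t / a)


def support_length(func, start=-1.0, stop=9.0, step=0.001):
    """Length of the region where func is non-zero, estimated on a midpoint grid."""
    count = round((stop - start) / step)
    nonzero = sum(1 for i in range(count)
                  if func(start + (i + 0.5) * step) != 0.0)
    return nonzero * step


for a in (0.5, 1, 2, 4):
    print(a, support_length(scaled(haar, a)))
```

The measured length grows in proportion to the scale, so a wavelet at a higher scale is compared with a longer portion of the signal.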
Thus, there is a correspondence between wavelet scales and frequency as revealed