Adaptive compressed sensing of speech signal
CHAPTER-1
INTRODUCTION
Compressive sensing is an emerging and revolutionary technology that
strongly relies on the sparsity of the signal. In compressive sensing the signal is
compressively sampled by taking a small number of random projections of the signal,
which contain most of the salient information. Compressive sensing has been
previously applied in areas like image processing, radar systems and sonar systems.
It is now being applied in speech processing as an advanced technique for acquiring
data.
The key objective in compressed sensing (also referred to as sparse signal
recovery or compressive sampling) is to reconstruct a signal accurately and efficiently
from a set of few non-adaptive linear measurements. Of course, linear algebra easily
shows that in general it is not possible to reconstruct an arbitrary signal from an
incomplete set of linear measurements. Thus one must restrict the domain in which
the signals belong. To this end, we consider sparse signals, those with few non-zero
coordinates. It is now known that many signals such as real-world images or audio
signals are sparse.
Since sparse signals lie in a lower dimensional space, one would think that
they may be represented by few linear measurements. This is indeed correct, but the
difficulty is determining in which lower dimensional subspace such a signal lies. That
is, we may know that the signal has few non-zero coordinates, but we do not know
which coordinates those are. It is thus clear that we may not reconstruct such signals
using a simple linear operator, and that the recovery requires more sophisticated
techniques.
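As a small illustration of these ideas, the following Python sketch takes a few random linear measurements of a sparse vector and recovers it greedily. This is an illustrative example only, not part of the proposed system: the dimensions, the sparsity level, and the choice of orthogonal matching pursuit as the recovery routine are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse signal: n = 64 coordinates, only k = 4 of them non-zero.
n, k, m = 64, 4, 40
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.normal(size=k)

# Non-adaptive sensing: m < n random linear measurements y = Phi x.
Phi = rng.normal(size=(m, n)) / np.sqrt(m)
y = Phi @ x

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedily build the support, then least-squares."""
    residual, sel = y.copy(), []
    for _ in range(k):
        # The column most correlated with the current residual joins the support.
        sel.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, sel], y, rcond=None)
        residual = y - Phi[:, sel] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[sel] = coef
    return x_hat

x_hat = omp(Phi, y, k)
print("recovery error:", np.linalg.norm(x_hat - x))
```

Because the support of x is unknown in advance, a simple linear (pseudo-inverse) estimate fails; the greedy search instead locates the few active coordinates and then solves a small least-squares problem on them.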
The compressed sensing field has provided many recovery algorithms, most
with provable as well as empirical results. There are several important traits that an
optimal recovery algorithm must possess. The algorithm needs to be fast, so that it can
efficiently recover signals in practice. Of course, minimal storage requirements as
well would be ideal.
MRITS, DEPARTMENT OF ECE 1
The algorithm should provide uniform guarantees, meaning that given a
specific method of acquiring linear measurements, the algorithm recovers all sparse
signals (possibly with high probability). Ideally, the algorithm would require as few
linear measurements as possible. However, recovery using only this property would
require searching through the exponentially large set of all possible lower dimensional
subspaces, and so in practice is not numerically feasible. Thus in the more realistic
setting, we may need slightly more measurements. Finally, we wish our ideal
recovery algorithm to be stable.
This means that if the signal or its measurements are perturbed slightly, then
the recovery should still be approximately accurate. This is essential, since in practice
we often encounter not only noisy signals or measurements, but also signals that are
not exactly sparse, but close to being sparse. The conventional scheme in signal
processing, acquiring the entire signal and then compressing it, was questioned by
Donoho. Indeed, this technique uses tremendous resources to acquire often very large
signals, just to throw away information during compression.
The natural question then is whether we can combine these two processes, and
directly sense the signal or its essential parts using few linear measurements. Recent
work in compressed sensing has answered this question in the affirmative, and the field
continues to rapidly produce encouraging results.
1.1 Objective
Compressed sensing (CS) is an emerging signal acquisition theory that directly
collects signals in a compressed form if they are sparse in some basis. It
originates from the idea that it is not necessary to invest a lot of power into observing
the entries of a sparse signal in all coordinates when most of them are zero anyway.
Rather it should be possible to collect only a small number of measurements that still
allow for reconstruction. This is potentially useful in applications where one cannot
afford to collect or transmit a lot of measurements but has rich resources at the
decoder.
Observing that different kinds of speech frames have different intra-frame
correlations, a frame-based adaptive compressed sensing framework for speech
signals has been proposed. The objective of this project is to further improve the
performance of the existing compressed sensing process that uses non adaptive
projection matrix, by using the adaptive projection matrix based on frame analysis.
Average-frame signal-to-noise ratio (AFSNR) is calculated to compare the
performance of the frame-based adaptive CS with that of non-adaptive CS.
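The exact AFSNR formula is not reproduced in this chapter, so the sketch below assumes the straightforward definition: the mean, over all frames, of the per-frame reconstruction SNR in dB. The frame length and the toy test signals are illustrative.

```python
import numpy as np

def frame_snr_db(orig, recon):
    """SNR of a single reconstructed frame, in dB."""
    noise = orig - recon
    return 10.0 * np.log10(np.sum(orig ** 2) / np.sum(noise ** 2))

def afsnr_db(signal, reconstructed, frame_len=160):
    """Average-frame SNR: mean of the per-frame SNRs (assumed definition)."""
    n_frames = len(signal) // frame_len
    snrs = [frame_snr_db(signal[i * frame_len:(i + 1) * frame_len],
                         reconstructed[i * frame_len:(i + 1) * frame_len])
            for i in range(n_frames)]
    return float(np.mean(snrs))

# Toy check: a 200 Hz tone "reconstructed" with a small additive error.
t = np.arange(480) / 8000.0
x = np.sin(2 * np.pi * 200 * t)
x_hat = x + 0.01 * np.cos(2 * np.pi * 1000 * t)
print(afsnr_db(x, x_hat))  # each 160-sample frame has an SNR of 40 dB here
```

Averaging per-frame SNRs (rather than one global SNR) prevents a few high-energy voiced frames from masking poor reconstruction of low-energy frames.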
1.2 Existing System
Compressed sensing is a technique used to overcome the constraints of the
conventional sampling theorem: it allows us to sample the signal below the
Nyquist rate. In a typical
communication system, the signal is sampled at least at twice the highest frequency
contained in the signal. However, this limits efficient ways to compress the signal, as
it places a huge burden on sampling the entire signal while only a small number of the
transform coefficients are needed to represent the signal.
On the other hand, compressive sampling provides a new way to reconstruct
the original signal from a minimal number of observations. CS is a sampling
paradigm that allows us to go beyond the Shannon limit by exploiting the sparsity
structure of the signal. It allows us to capture and represent the compressible signals
at a rate significantly below the Nyquist rate. The signal is then reconstructed from
these projections by using different optimization techniques.
During compressive sampling only the important information about a signal is
acquired, rather than acquiring the important information plus the information of a
signal which will eventually be discarded at the receiver. But the existing compressed
sensing uses a non-adaptive projection matrix and takes the same number of projections
for all frames. This leads to degradation in the system's efficiency.
1.3 Proposed System
The efficiency of conventional non-adaptive compressed sensing can be
increased by using an adaptive projection matrix. The adaptive projection matrix
uses different numbers of projections for different frames based on their intra-frame correlations,
thus improving the efficiency of the system.
Most work in CS research focuses on random projection matrices constructed
by considering only the signal's sparsity rather than its other properties. In
other words, the construction of the projection matrix is non-adaptive.
Observing that different kinds of speech frames have different intra-frame
correlations, a frame-based adaptive compressed sensing framework, which applies
an adaptive projection matrix to speech signals, has been proposed. To do so,
neighbouring frames are compared to estimate their intra-frame correlation, every
frame is classified into one of several categories, and the number of projections for
each frame is adjusted accordingly. The experimental results show that the adaptive
projection matrix can significantly improve the speech reconstruction quality.
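A minimal sketch of the frame classification idea is given below. It assumes lag-1 autocorrelation as the intra-frame correlation measure, and the thresholds and projection counts are purely illustrative, not the values used in the actual system.

```python
import numpy as np

def lag1_correlation(frame):
    """Normalized lag-1 autocorrelation: high for voiced, low for noise-like frames."""
    a, b = frame[:-1], frame[1:]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def projections_for_frame(frame, m_low=40, m_mid=80, m_high=120):
    """Map a frame's correlation to a projection count (thresholds are illustrative)."""
    r = lag1_correlation(frame)
    if r > 0.9:        # strongly correlated, e.g. voiced: most compressible
        return m_low
    elif r > 0.5:      # moderately correlated
        return m_mid
    return m_high      # noise-like, e.g. unvoiced: needs more measurements

n = 160
t = np.arange(n)
voiced_like = np.sin(2 * np.pi * t / 40)              # smooth, quasi-periodic frame
noise_like = np.random.default_rng(1).normal(size=n)  # turbulent, uncorrelated frame
print(projections_for_frame(voiced_like), projections_for_frame(noise_like))
```

Strongly correlated (voiced-like) frames are highly compressible and are assigned fewer projections, while noise-like (unvoiced-like) frames receive more.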
CHAPTER-2
LITERATURE SURVEY
According to information theory, the bit rate at which distortionless
transmission of any source signal is possible is determined by the entropy of the
speech source message. In practical terms, however, the source rate corresponding
to the entropy is only asymptotically achievable, as the encoding memory length
or delay tends to infinity. Any further compression is associated with
information loss or coding distortion. Many practical source compression techniques
employ lossy coding, which typically guarantees further bit rate economy at the cost
of nearly imperceptible speech, audio, video, and other source representation
degradation.
Note that the optimum Shannonian source encoder generates a perfectly
uncorrelated source-coded stream, in which all the source redundancy has been
removed. Therefore, the encoded source symbols, which in most practical cases are
constituted by binary bits, are independent, and each one has the same significance.
Having the same significance implies that the corruption of any of the source-encoded
symbols results in identical source signal distortion over imperfect channels. Under
these conditions, according to Shannon's fundamental work the best protection against
transmission errors is achieved if source and channel coding are treated as separate
entities. When using a block code of length N channel-coded symbols in order to
encode K source symbols with a coding rate of R = K/N, the symbol error rate can be
rendered arbitrarily low as N tends to infinity.
2.1 Speech Production
Speech is a natural form of communication for human beings; we use speech
every day almost unconsciously, but an understanding of the mechanisms on which it
is based will help to clarify how the brain processes information. Figure 2.1 shows the
process of human speech production.
Figure 2.1: Human Speech Production System
Human speech production comprises lungs, vocal cords, and the vocal tract.
The vocal cords are expressed as a simple vibration model, and the pitch of the speech
changes according to adjustments in the tension of the vocal cords. Speech is
generated by emitting sound pressure waves, radiated primarily from the lips,
although significant energy emanates through sounds from the nostrils, throat, and the
like.
The air compressed by the lungs excites the vocal cords in two typical modes.
When generating voiced sounds, the vocal cords vibrate and generate a high-energy
quasi-periodic speech waveform, while in the case of lower energy unvoiced sounds,
the vocal cords do not participate in the voice production and the source behaves
similarly to a noise generator.
In a somewhat simplistic approach, the excitation signal denoted by E (z) is
then filtered through the vocal apparatus, which behaves like a spectral shaping filter
with a transfer function of H (z) that is constituted by the spectral shaping action of
the glottis, which is defined as the opening between the vocal folds. Further spectral
shaping is carried out by the vocal tract, lip radiation characteristics, and so on.
The human speech in its pristine form is an acoustic signal. For the purpose of
communication and storage, it is necessary to convert it into an electrical signal. This
is accomplished with the help of certain instruments called ‘transducers’.
This electrical representation of speech has certain properties.
1. It is a one-dimensional signal, with time as its independent variable.
2. It is random in nature.
3. It is non-stationary, i.e. the frequency spectrum is not constant in time.
A microphone is a transducer that converts the acoustic speech signal into an
electrical signal. The microphone receives the acoustic voice signal and produces as
output an electrical signal whose amplitude is proportional to the intensity of the
input acoustic voice signal.
Figure 2.2: Block diagram of a microphone
The electrical signal produced by the microphone is an analog signal whose
amplitude varies continuously with time; it is continuous in both time and
amplitude, as shown in the figure below.
Figure 2.3: Analog speech signal
2.2 Digitization of Speech
Speech is a very basic way for humans to convey information to one another.
With a bandwidth of only 4 kHz, speech can convey information with the emotion of
a human voice. People want to be able to hear someone’s voice from anywhere in the
world as if the person were in the same room. As a result, a greater emphasis is being
placed on the design of new and efficient speech coders for voice communication and
transmission. Today applications of speech coding and compression have become
very numerous.
Though the electrical signal coming from the microphone can be processed in
its original analog form, it is more efficient to process it in digital form. With
the advent of digital computing machines, it was proposed to
exploit the powers of the same for processing of speech signals. This required a
digital representation of speech. To achieve this, the analog signal is sampled at some
frequency and then quantized at discrete levels. There are many advantages in
processing the signal in digital than in analog form. The analog speech signal is
converted into digital form by using Analog to Digital Converter (ADC).
Figure 2.4: Block diagram of Analog to digital converter
As shown in the figure above the analog to digital converter consists of three
basic functions.
1. Sampling
2. Quantization
3. Encoding
2.2.1 Sampling
The sampling process is used to convert a continuous-time signal to a discrete-time
signal. This is achieved by multiplying the input continuous signal with an impulse
train of unity magnitude. The impulse-train frequency, also known as the sampling
frequency, should be at least twice the highest frequency component present in the
input analog signal (Fs >= 2Fm); this rate is called the Nyquist rate (the Nyquist
condition for sampling).
Figure 2.5: Block diagram of Sampler
The output of the sampler is a discrete-time signal in which samples are present
only at discrete intervals of time.
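The Nyquist condition can be illustrated numerically: a 5 kHz tone sampled at 8 kHz violates Fs >= 2Fm, and its samples are indistinguishable from those of a 3 kHz alias. This is a toy example with arbitrary tone frequencies, not tied to any particular system.

```python
import math

fs = 8000.0            # sampling rate, Hz
n = range(16)

# A 5 kHz tone sampled at 8 kHz violates fs >= 2*fm ...
above = [math.sin(2 * math.pi * 5000 * k / fs) for k in n]
# ... and is indistinguishable from its alias at 5000 - 8000 = -3000 Hz.
alias = [math.sin(2 * math.pi * (5000 - fs) * k / fs) for k in n]

print(all(abs(a - b) < 1e-9 for a, b in zip(above, alias)))
```

Because the two tones produce identical samples, no reconstruction method can tell them apart once the Nyquist condition is violated; this is why band-limiting before sampling is essential.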
2.2.2 Quantization
The output obtained from the sampling process is discrete only in time; its
amplitude still assumes continuous values. In order to make it discrete in amplitude,
quantization is used.
Figure 2.6: Block diagram of Quantizer
The quantizer assigns discrete values to the input samples by either a rounding
process or a truncation process. In rounding, the values are assigned to the
nearest integer multiple of the step size, whereas in truncation the values are
assigned by truncating everything above the integral multiple of the step size.
Compared with rounding, truncation produces more error. Thus the
output of the quantizer is a signal which is discrete in both time and amplitude.
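The two assignment rules can be sketched as follows; the step size and sample values are arbitrary illustrative choices.

```python
def quantize_round(x, step):
    """Assign the nearest integer multiple of the step size (rounding)."""
    return step * round(x / step)

def quantize_truncate(x, step):
    """Drop everything above the integer multiple of the step size (truncation)."""
    return step * int(x / step)   # int() truncates toward zero

step = 0.25
for s in [0.37, 0.93, -0.61]:
    print(s, "-> round:", quantize_round(s, step),
          " truncate:", quantize_truncate(s, step))
```

Rounding keeps the maximum error at half a step, whereas truncation can err by almost a full step, which is why truncation produces more error.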
2.2.3 Encoding
The encoder is used to assign binary values to the sample values coming from
the quantizer.
Figure 2.7: Block diagram of Encoder
Thus the output from the encoder is a pure digital signal represented by zeros and
ones.
2.2.4 Anti-Aliasing Filter
Apart from the three building blocks discussed above (sampler, quantizer, and
encoder), the Analog to Digital Converter also contains a low-pass filter called an
anti-aliasing filter. Anti-aliasing low-pass filtering (LPF) is necessary in order to
band-limit the signal to a bandwidth of B before sampling. In the case of speech
signals, about 1% of the energy resides above 4 kHz and only a negligible proportion
above 7 kHz. Hence, commentary-quality speech links, which are also often referred
to as wideband speech systems, typically band-limit the speech signal to 7-8 kHz.
Conventional telephone systems usually employ a bandwidth limitation of 0.3-3.4
kHz, which results only in a minor speech degradation, hardly perceptible by the
listener.
2.3 Speech Signal Analysis
In contrast to deterministic signals, random signals, such as speech, music,
video, and other information signals, cannot be described by the help of analytical
formulas. They are typically characterized by the help of statistical functions. The
power spectral density (PSD), autocorrelation function (ACF), cumulative distribution
function (CDF), and probability density function (PDF) are some of the most
frequently invoked. A typical speech sentence signal consists of two main parts:
one carries the speech information, and the other consists of silent or noise
sections between the utterances, without any verbal information. The verbal
(informative) part of speech can be further divided into two categories, namely
Voiced Speech and Unvoiced Speech.
2.3.1 Voiced and Unvoiced Speech
In speech there are two major types of excitation, voiced and unvoiced.
Voiced speech consists mainly of vowel sounds. It is produced by forcing air through
the glottis; proper adjustment of the tension of the vocal cords results in opening and
closing of the cords, and a production of almost periodic pulses of air. These pulses
excite the vocal tract. Psychoacoustics experiments show that this part holds most of
the information of the speech and thus holds the keys for characterizing a speaker.
Unvoiced speech sections are generated by forcing air through a constriction formed
at a point in the vocal tract (usually toward the mouth end), thus producing
turbulence. Being able to distinguish between voiced speech, unvoiced speech, and
silence is very important for speech signal analysis.
Voiced speech tends to be periodic in nature. Examples of voiced sounds are
English vowels, such as the /a/ in "bay" and the /e/ in "see". Since unvoiced speech is
due to turbulence, the speech is aperiodic and has a noise-like structure. Some examples
of unvoiced English sounds are the /s/ in "so" and the /h/ in "he". In general at least 90%
of the speech energy is always retained in the first N/2 transform coefficients, if the
speech is a voiced frame.
However, for an unvoiced frame the energy is spread across several frequency
bands and typically the first N/2 coefficients hold less than 40% of the total energy.
Because of this, wavelets are inefficient at coding unvoiced speech. Unvoiced speech
frames are infrequent. By detecting unvoiced speech frames and directly encoding
them (perhaps using entropy coding), no unvoiced data is lost and the quality of the
compressed speech will remain transparent.
Typical voiced and unvoiced speech waveform segments are shown in Figures
2.8 and 2.9 respectively, along with their corresponding power densities. Clearly, the
unvoiced segment appears to have a significantly lower magnitude, which is also
reflected by its PSD.
Figure 2.8: Voiced speech segment and its PSD
Figure 2.9: Unvoiced speech segment and its PSD
The voiced segment shown in Figure 2.10 is quasi-periodic in the time
domain, and it has an approximately 80-sample periodicity, identified by the positions
of the largest time-domain signal peaks, which corresponds to 10 ms. This interval is
referred to as the pitch period, and it is also often expressed in terms of the pitch
frequency p, which in this example is 1/(10 ms) = 100 Hz. In the case of male speakers,
the typical pitch frequency range is between 40 and 120 Hz, whereas for females it
can be as high as 300-400 Hz.
Observe furthermore that within each pitch period there is a gradually
decaying oscillation, which is associated with the excitation and gradually decaying
vibration of the vocal cords.
A perfectly periodic time-domain signal would have a line spectrum, but since
the voiced speech signal is quasi-periodic with a pitch frequency of p (rather than
being perfectly periodic), its spectrum exhibits somewhat widened but distinctive
spectral needles at frequencies of np. As a second phenomenon, it can also be
observed that three, sometimes four, spectral envelope peaks are present. In the
voiced spectrum of Figure 2.8 these formant frequencies are
observable around 500 Hz, 1500 Hz, and 2700 Hz, and they are the manifestation of
the resonances of the vocal tract at these frequencies. In contrast, the unvoiced
segment of Figure 2.9 does not have a formant structure, rather it has a more
dominant high-pass nature, exhibiting a peak around 2500 Hz. Observe, furthermore,
that its energy is much lower than that of the voiced segment.
It is equally instructive to study the ACF of voiced and unvoiced
segments, which are portrayed on an expanded scale in Figure 2.10 and Figure 2.11
respectively.
Figure 2.10: Voiced speech segment and its ACF
The voiced ACF shows a set of periodic peaks at displacements of about 20
samples, corresponding to 2.5 ms, which coincides with the positive quasi-periodic
time-domain segments. Following four monotonically decaying peaks, there is a more
dominant one around a displacement of 80 samples, which indicates the pitch
periodicity.
Figure 2.11: Unvoiced speech segment and its ACF
The periodic nature of the ACF can therefore be exploited, for example, to
detect and measure the pitch periodicity in a range of applications, such as speech
codecs and voice activity detectors. Observe, however, that the first peak at a
displacement of 20 samples is about as high as the one near 80. Hence, a reliable pitch
detector has to attempt to identify and rank all these peaks in order of prominence,
exploiting also a priori knowledge of the expected range of pitch frequencies.
By contrast, the unvoiced segment has a much more rapidly decaying ACF, indicating
no inherent correlation between adjacent samples and no long-term periodicity. The
voiced speech signal and unvoiced speech signal have some distinct characteristic
features that enable us to distinguish between voiced speech and unvoiced speech.
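A minimal sketch of ACF-based pitch estimation is given below. The pure-tone input and the lag search range are illustrative; a practical detector must rank several candidate peaks, as noted above.

```python
import math

def autocorrelation(x, lag):
    """Unnormalized ACF value of the frame at the given lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def estimate_pitch_period(frame, min_lag=20, max_lag=160):
    """Pick the lag with the strongest ACF peak inside the expected pitch range."""
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(frame, k))

# Synthetic "voiced" frame: 100 Hz pitch at an 8 kHz sampling rate -> 80 samples.
period = 80
frame = [math.sin(2 * math.pi * k / period) for k in range(4 * period)]

fs = 8000
p = estimate_pitch_period(frame)
print("pitch period:", p, "samples ->", fs / p, "Hz")
```

Restricting the search to the expected lag range keeps the estimator from locking onto the strong short-lag correlations of the waveform itself.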
2.3.2 Zero Crossing Rate
The rate at which the speech signal crosses zero can provide information about
the source of its creation. It is well known that unvoiced speech has a much higher
ZCR than voiced speech. This is because most of the energy in unvoiced speech is
found at higher frequencies than in voiced speech, implying a higher ZCR for the
former.
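The ZCR can be computed directly from adjacent sample pairs. The sketch below uses synthetic voiced-like and noise-like frames (illustrative stand-ins for real speech) to show the contrast.

```python
import math, random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

n = 160
voiced_like = [math.sin(2 * math.pi * 2 * k / n) for k in range(n)]   # two slow cycles
rng = random.Random(7)
unvoiced_like = [rng.gauss(0.0, 1.0) for _ in range(n)]               # noise-like

print(zero_crossing_rate(voiced_like))    # low for the slow, voiced-like tone
print(zero_crossing_rate(unvoiced_like))  # much higher for the noise-like frame
```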
2.3.3 Cross-Correlation
Cross-correlation is calculated between two consecutive pitch cycles. The
cross-correlation values between pitch cycles are higher (close to 1) in voiced speech
than in unvoiced speech.
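A sketch of the measure follows; the cycles are synthetic, and in practice the pitch cycles must first be segmented from the waveform, which is not shown.

```python
import math, random

def normalized_cross_correlation(a, b):
    """Cross-correlation of two equal-length cycles, normalized to [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

period = 80  # samples per pitch cycle (10 ms at 8 kHz)
voiced = [math.sin(2 * math.pi * k / period) for k in range(2 * period)]
c1, c2 = voiced[:period], voiced[period:]

rng = random.Random(3)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(2 * period)]
u1, u2 = unvoiced[:period], unvoiced[period:]

print(normalized_cross_correlation(c1, c2))  # close to 1 for voiced cycles
print(normalized_cross_correlation(u1, u2))  # near 0 for noise-like segments
```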
CHAPTER-3
SPEECH COMPRESSION
In recent years, large scale information transfer by remote computing and
the development of massive storage and retrieval systems have witnessed
tremendous growth. To cope with the growth in the size of databases, additional
storage devices need to be installed, and the modems and multiplexers have to be
continuously upgraded in order to permit large amounts of data transfer between
computers and remote terminals. This leads to an increase in cost as well as
equipment. One solution to these problems is "COMPRESSION", where the database
and the transmission sequence can be encoded efficiently.
3.1 Fourier Analysis
Historically, the Fourier Transform has been the most widely used tool for signal
processing. As signal processing began spreading its tentacles and encompassing
newer signals, the Fourier Transform was found to be unable to satisfy the growing
need for processing a bulk of signals, and it has therefore been increasingly replaced
by the wavelet transform.
A major drawback of Fourier analysis is that in transforming to the frequency
domain, the time domain information is lost. The most important difference between
Fourier Transform and wavelet transform is that individual wavelet functions are
localized in space. In contrast Fourier sine and cosine functions are non-local and are
active for all time t.
3.2 Continuous Wavelet Transform (CWT)
The drawbacks inherent in the Fourier methods are overcome with wavelets.
Consider a real or complex-valued continuous-time function ψ(t) with the following
properties:
1. The function integrates to zero
∫−∞^∞ ψ(t) dt = 0
2. It is square integrable or, equivalently has finite energy
∫−∞^∞ |ψ(t)|^2 dt < ∞
A function is called a mother wavelet if it satisfies these two properties. There are
infinitely many functions that satisfy these properties and thus qualify to be mother
wavelets. The simplest of them is the 'Haar wavelet'. Some other wavelets are the
Mexican hat and the Morlet. Apart from these, there are various families of wavelets,
such as the Daubechies, symlet, and coiflet families.
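Both defining properties can be checked numerically for the Haar wavelet; the midpoint-rule integration and grid resolution below are arbitrary illustrative choices.

```python
def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

# Midpoint-rule integration over the support [0, 1).
N = 100000
dt = 1.0 / N
samples = [haar((i + 0.5) * dt) for i in range(N)]

integral = sum(samples) * dt                 # property 1: integrates to zero
energy = sum(s * s for s in samples) * dt    # property 2: finite energy (here 1)

print(abs(integral) < 1e-9, abs(energy - 1.0) < 1e-9)
```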
Consider the following figure, which juxtaposes a sinusoid and a wavelet.
Figure 3.1: Comparing a sine wave and a wavelet
A wavelet is a waveform of effectively limited duration that has an average value
of zero. Compare wavelets with sine waves, which are the basis of Fourier analysis:
sinusoids do not have limited duration; they extend from minus to plus infinity. And
while sinusoids are smooth and predictable, wavelets tend to be irregular and
asymmetric. Fourier analysis consists of breaking up a signal into sine waves of
various frequencies. Similarly, wavelet analysis is the breaking up of a signal into
shifted and scaled versions of the original (or mother) wavelet.
Figure 3.2: Demonstrating the decomposition of a signal into wavelets
The above diagram suggests the existence of a synthesis equation to represent
the original signal as a linear combination of wavelets, which are the basis functions
for wavelet analysis (recall that in Fourier analysis, the basis functions are sines and
cosines). This is indeed the case. The wavelets in the synthesis equation are multiplied
by scalar coefficients.
3.3 Discrete Wavelet Transform (DWT)
Calculating wavelet coefficients at every possible scale (for the continuous WT) is
a fair amount of work, and it generates a lot of data. If scales and positions are chosen
based on powers of two, then the analysis will be much more efficient and just as
accurate. Such an analysis is obtained from the discrete wavelet transform (DWT).
3.3.1 Vanishing Moments
The number of vanishing moments of a wavelet indicates the smoothness of
the wavelet function as well as the flatness of the frequency response of the wavelet
filters (filters used to compute the DWT). Typically a wavelet with p vanishing
moments satisfies the following equation:

∫−∞^∞ t^m ψ(t) dt = 0   for m = 0, ..., p−1
Wavelets with a high number of vanishing moments lead to a more
compact signal representation and are hence useful in coding applications. However,
in general, the length of the filters increases with the number of vanishing moments
and the complexity of computing the DWT coefficients increases with the size of the
wavelet filters.
3.3.2 Fast Wavelet Transform
The Discrete Wavelet Transform (DWT) coefficients can be computed by
using Mallat's Fast Wavelet Transform algorithm. This algorithm is sometimes
referred to as the two-channel subband coder and involves filtering the input signal
based on the wavelet function used. To explain the implementation of the Fast
Wavelet Transform algorithm consider the following equations:
1. Φ(t) = Σ_k c(k) Φ(2t − k)

2. Ψ(t) = Σ_k (−1)^k c(1 − k) Φ(2t − k)
The first equation is known as the twin-scale relation (or the dilation equation)
and defines the scaling function. The next equation expresses the wavelet in terms of
the scaling function. These equations represent the impulse response coefficients for a
low pass filter of length 2N, with a sum of 1 and a norm of 1/√2. The high
pass filter is obtained from the low pass filter using the relationship
g(k) = (−1)^k c(1 − k)

where k varies over the range 1 − (2N − 1) to 1.
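For the Haar case in the normalization stated above (low-pass coefficients summing to 1 with norm 1/√2), the high-pass filter follows directly from this relationship; the sketch below simply evaluates it.

```python
import math

# Haar low-pass (scaling) coefficients in the text's normalization.
c = {0: 0.5, 1: 0.5}

# High-pass from low-pass: g(k) = (-1)^k * c(1 - k)
g = {k: ((-1) ** k) * c[1 - k] for k in c}

lp_sum = sum(c.values())
lp_norm = math.sqrt(sum(v * v for v in c.values()))
hp_sum = sum(g.values())

print(lp_sum, round(lp_norm, 6), hp_sum)   # 1.0 0.707107 0.0
```

The high-pass coefficients sum to zero, so the filter rejects the DC (smooth) component that the low-pass branch keeps.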
3.4 Wavelets and Speech Compression
The idea behind signal compression using wavelets is primarily linked
to the relative sparseness of the wavelet-domain representation of the signal.
Wavelets concentrate speech information (energy and perception) into a few
neighbouring coefficients. Therefore as a result of taking the wavelet transform of a
signal, many coefficients will either be zero or have negligible magnitudes.
Another factor that comes into the picture is taken from psychoacoustic studies.
Since our ears are more sensitive to low frequencies than to high frequencies, and our
hearing threshold is very high in the high-frequency regions, a compression method
is used in which the detail coefficients (corresponding to high-frequency components)
of the wavelet transform are thresholded such that the error due to thresholding is
inaudible to our ears. Since some of the high-frequency components are discarded,
a smoothened output signal is expected, as shown in the following figure.
Figure 3.3: Original signal and compressed signal using DWT
In summary, the notion behind compression is based on the concept that the
regular signal component can be accurately approximated using a small number of
approximation coefficients (at a suitably chosen level) and some of the detail
coefficients. Data compression is then achieved by treating small-valued coefficients
as insignificant and discarding them. The process of compressing a speech signal
using wavelets involves a number of different stages, each of which is discussed
below.
3.4.1 Choice of Wavelet
The choice of the mother-wavelet function used in designing high quality
speech coders is of prime importance. Choosing a wavelet that has compact support in
both time and frequency in addition to a significant number of vanishing moments is
essential for an optimum wavelet speech compressor.
This is followed very closely by the Daubechies D20, D12, D10 or D8
wavelets, all concentrating more than 96% of the signal energy in the Level 1
approximation coefficients. Wavelets with more vanishing moments provide better
reconstruction quality, as they introduce less distortion into the processed speech and
concentrate more signal energy in a few neighbouring coefficients.
3.4.2 Wavelet Decomposition
Wavelets work by decomposing a signal into different resolutions or
frequency bands, and this task is carried out by choosing the wavelet function and
computing the Discrete Wavelet Transform (DWT). Signal compression is based on
the concept that selecting a small number of approximation coefficients (at a suitably
chosen level) and some of the detail coefficients can accurately represent regular
signal components. Choosing a decomposition level for the DWT usually depends on
the type of signal being analyzed or some other suitable criterion such as entropy. For
the processing of speech signals decomposition up to scale 5 is adequate, with no
further advantage gained in processing beyond scale 5.
3.4.3 Truncation of Coefficients
After calculating the wavelet transform of the speech signal, compression
involves truncating wavelet coefficients below a threshold. Most of the speech energy
is concentrated in a few high-valued coefficients, so the small-valued coefficients can
be truncated (zeroed) and the remaining ones used to reconstruct the signal. This
compression scheme provided a segmental signal-to-noise ratio (SEGSNR) of 20 dB
with only 10% of the coefficients. Two different approaches are available for
calculating thresholds. The first, known as Global Thresholding, involves taking the
wavelet expansion of the signal and keeping the largest absolute-value coefficients.
In this case, a global threshold, a compression performance target, or a
relative square-norm recovery performance can be set manually, so only a single
parameter needs to be selected. The second approach, known as By-Level
Thresholding, consists of applying level-dependent (e.g. visually determined)
thresholds to each decomposition level of the wavelet transform.
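The two thresholding strategies can be sketched in numpy. This is a minimal illustration, not the report's actual implementation: the function names and the representation of coefficients as a list of per-level arrays are assumptions.

```python
import numpy as np

def global_threshold(coeffs, keep_fraction=0.10):
    """Zero all wavelet coefficients below a single global threshold,
    keeping roughly the given fraction of largest-magnitude values."""
    flat = np.concatenate([c.ravel() for c in coeffs])
    k = max(1, int(keep_fraction * flat.size))
    # Threshold = magnitude of the k-th largest coefficient overall
    thr = np.sort(np.abs(flat))[-k]
    return [np.where(np.abs(c) >= thr, c, 0.0) for c in coeffs]

def by_level_threshold(coeffs, thresholds):
    """Apply a separate (e.g. visually chosen) threshold per level."""
    return [np.where(np.abs(c) >= t, c, 0.0)
            for c, t in zip(coeffs, thresholds)]
```

With global thresholding only `keep_fraction` must be chosen, matching the "single parameter" remark above; by-level thresholding instead takes one threshold per decomposition level.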
3.4.4 Encoding Coefficients
Signal compression is achieved by first truncating small-valued coefficients
and then efficiently encoding the rest. One approach is to encode each run of
consecutive zero-valued coefficients with two bytes: one byte to indicate a sequence
of zeros in the wavelet transform vector, and a second byte giving the number of
consecutive zeros.
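The two-byte zero-run encoding can be sketched as follows. This is a simplified sketch: the marker value and the pass-through of non-zero coefficients are assumptions (a real codec would also have to distinguish the marker byte from a genuine zero-valued symbol).

```python
ZERO_MARKER = 0x00  # hypothetical marker byte signalling a run of zeros

def encode_zero_runs(coeffs):
    """Encode each run of consecutive zero coefficients as two bytes:
    a marker byte followed by the run length. Non-zero coefficients
    pass through unchanged."""
    out, i = [], 0
    while i < len(coeffs):
        if coeffs[i] == 0:
            run = 0
            while i < len(coeffs) and coeffs[i] == 0 and run < 255:
                run += 1
                i += 1
            out.extend([ZERO_MARKER, run])  # two bytes per zero run
        else:
            out.append(coeffs[i])
            i += 1
    return out
```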
For further data compaction, a suitable bit-encoding format can be used to
quantize and transmit the data at low bit rates. A low bit rate representation can be
achieved by using an entropy coder such as Huffman or arithmetic coding.
CHAPTER-4
COMPRESSIVE SENSING
The theory of compressive sensing was developed by Candès et al. and
Donoho. Compressed sensing (CS) is also known as compressive sampling. In a
typical communication system, the signal is sampled at least at twice the highest
frequency contained in the signal. However, this is an inefficient way to compress the
signal, as it places a huge burden on sampling the entire signal while only a small
number of the transform coefficients are needed to represent it. Compressive
sampling, on the other hand, provides a new way to reconstruct the original signal
from a minimal number of observations. CS is a sampling paradigm that goes beyond
the Shannon limit by exploiting the sparsity structure of the signal: it captures and
represents compressible signals at a rate significantly below the Nyquist rate.
The signal is then reconstructed from these projections by using
different optimization techniques. During compressive sampling, only the important
information about a signal is acquired, rather than acquiring the important information
plus information that would eventually be discarded at the receiver. The key elements
that need to be addressed before using compressive sensing are the following:
1. how to find the transform domain in which the signal has a sparse
representation,
2. how to effectively sample the sparse signal in the time domain,
3. how to recover the original signal from the samples by using
optimization techniques.
In summary, the large amount of data needed to sample at the Nyquist rate,
especially for speech, image and video signals, motivates the study of compressive
sensing as a feasible solution for future mobile communication systems. Sparse
signals are defined as signals that can be represented by a limited number of data
points in the transform domain. Many real-world signals fall into this category under
an appropriate transform: for instance, a sine wave is clearly not sparse in the time
domain, but its Fourier transform is extremely sparse. The cost of acquiring large
amounts of data, added to the overhead of compression, can be reduced by using
compressive sensing, with potential savings in terms of energy, memory and
processing.
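The sine-wave example can be checked numerically: a pure tone is dense in the time domain but occupies only two Fourier bins. A minimal numpy sketch (the tone frequency and tolerance are arbitrary choices):

```python
import numpy as np

n = 300
t = np.arange(n)
x = np.sin(2 * np.pi * 10 * t / n)   # pure tone: dense in time

X = np.fft.fft(x)
# Count coefficients carrying essentially all the signal energy
significant = int(np.sum(np.abs(X) > 1e-6 * np.abs(X).max()))
print(significant)   # 2: the tone occupies just two Fourier bins
```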
4.1 Signal Sparsity
A signal is sparse when only a small fraction of its transform coefficients
are significant. Sparsity allows the signal to be reconstructed from a smaller number
of projections (samples). The procedure used to ensure the sparsity of the signal is
called transform coding, which is performed by the following four steps:
1. The full N points of a signal x are obtained at the Nyquist rate,
2. The complete set of transform coefficients (e.g. the DFT) is obtained,
3. The K largest coefficients are located and the smallest coefficients discarded,
4. The signal is multiplied by the measurement matrix to obtain the observation
vector of length M.
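The four steps above can be sketched in numpy. This is an illustrative toy (the test signal, K, M and the Gaussian measurement matrix are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 300, 8, 60

# 1. Full N-point signal sampled at the Nyquist rate (a toy multitone)
t = np.arange(N)
x = sum(np.sin(2 * np.pi * f * t / N) for f in (5, 12, 40, 71))

# 2. Complete set of transform coefficients (DFT)
X = np.fft.fft(x)

# 3. Keep the K largest coefficients, zero the rest
idx = np.argsort(np.abs(X))[-K:]
X_sparse = np.zeros_like(X)
X_sparse[idx] = X[idx]
x_sparse = np.real(np.fft.ifft(X_sparse))

# 4. Multiply by an M x N measurement matrix -> observation vector
Phi = rng.standard_normal((M, N))
y = Phi @ x_sparse
print(y.shape)   # (60,)
```

Here the four tones occupy exactly eight DFT bins, so K = 8 loses essentially nothing before the measurement step.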
Figure 4.1 shows an example of how compressive sensing can be used to
compress a signal below the Nyquist rate. In this example, the original sampled signal
is composed of 300 samples, and the intent is to reconstruct the signal using only 30
samples. Figure 4.1(a) shows the time-domain representation of the sampled signal.
From this figure, it is evident that by selecting only 30 samples (red dots) from the
300 it would be impossible to reconstruct the original signal perfectly. On the other
hand, by applying compressive sensing to the frequency representation of the signal it
is possible to reconstruct it perfectly from a significantly smaller number of samples.
In order to achieve this goal it is necessary to use optimization
techniques. However, not every optimization technique can be used for this purpose.
For example, Figure 4.1(c) shows the spectrum reconstructed using l2 minimization;
clearly, there are significant differences between the signal in Figure 4.1(b) and the
signal in Figure 4.1(c).
Figure 4.1: (a) Time-domain representation of the signal composed of 300 samples,
(b) Fourier spectrum of the signal to be encoded,
(c) Reconstruction of the Fourier spectrum via l2 minimization,
(d) Reconstruction of the Fourier spectrum via l1 minimization
In contrast, reconstruction using l1 minimization yields a perfect
result. This can be clearly seen by comparing Figure 4.1(b) and Figure 4.1(d). In
summary, optimization techniques based on l1 minimization are preferred when
compressive sensing is used.
4.2 Measurement Matrix
In compressed sensing, special emphasis is given to representing the signal in
an incoherent basis. The linear measurement process computes M < N inner products
between x and a collection of measurement vectors {φ_j}, j = 1, ..., M, via
y_j = <x, φ_j>,
where Φ is an M×N measurement matrix whose j-th row is the measurement vector
φ_j. It has been shown that some measurement matrices can be used in any scenario,
in the sense that they are incoherent with any fixed basis Ψ such as Gabor, spike,
sinusoidal and wavelet bases. The compressive sensing measurement process with a
K-sparse coefficient vector x is depicted in Figure 4.2.
Figure 4.2: Compressive sensing measurement process
The measurement matrix plays a vital role in the process of recovering the
original signal. There are two types of measurement matrices that can be used in
compressive sensing: random measurement matrices and predefined measurement
matrices. The fundamental result is that if a signal x composed of N samples is
K-sparse, then it can be reconstructed from a number of measurements satisfying
M ≥ O(K log(N/K)).
Furthermore, x can be perfectly reconstructed using different optimization
techniques. If Φ is a structurally random matrix, its rows are not stochastically
independent because they are generated from the same random seed vector. The
random matrix is transposed and then orthogonalized, which has the effect of creating
a matrix whose rows form an orthonormal basis. A predefined measurement matrix,
in contrast, is created using functions such as Dirac or sine functions.
In this case, the signal is multiplied by several Dirac functions centered at
different locations to obtain the observation vector. The speech signal can then be
reconstructed using l1 minimization from the observation vector and the predefined
measurement matrix. Linear programming is another procedure that plays a vital role
in reconstructing the original signal. It is a mathematical approach designed to obtain
the best outcome in a given mathematical model, and is a special case of
mathematical programming. A linear program can be expressed in the following
canonical form:
maximize e^T x subject to Ax ≤ b,
where x represents the vector of variables to be determined, e and b are vectors of
coefficients and A is a matrix of coefficients. The expression to be maximized or
minimized is called the objective function, and Ax ≤ b defines the constraints over
which the objective function is optimized. Ultimately, the reconstruction of the
speech signal depends upon the observation vector and the measurement matrix.
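As a concrete sketch of how l1-based recovery reduces to a linear program (assuming numpy and scipy are available; the function name and toy data are illustrative, not the project's code): writing x = u - v with u, v ≥ 0 turns min ||x||_1 subject to Φx = y into minimizing the sum of u + v under linear equality constraints.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit_lp(Phi, y):
    """Solve min ||x||_1 s.t. Phi @ x = y by recasting as a linear
    program: x = u - v with u, v >= 0, minimizing sum(u + v)."""
    M, N = Phi.shape
    c = np.ones(2 * N)                 # objective: sum of u and v
    A_eq = np.hstack([Phi, -Phi])      # Phi @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    u, v = res.x[:N], res.x[N:]
    return u - v

# Toy check: measurements of a 1-sparse vector
rng = np.random.default_rng(1)
Phi = rng.standard_normal((3, 6))
x_true = np.zeros(6)
x_true[2] = 1.5
x_hat = basis_pursuit_lp(Phi, Phi @ x_true)
```

By LP optimality, `x_hat` is feasible (it reproduces the measurements) and its l1 norm is no larger than that of `x_true`; exact recovery additionally requires enough measurements relative to the sparsity.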
4.3 Signal Reconstruction in Compressive Sensing
Recent developments in signal theory have shown that a sparse signal is a
useful model in areas such as communications, radar and image processing. Therefore
the assumption that every signal can be represented in a sparse form has helped in the
compression of the signal of interest. The perfect reconstruction of a signal x depends
on the measurement matrix Φ and the measurement vector y.
Compressive sensing theory states that when the matrix ΦΨ satisfies the
Restricted Isometry Property (RIP), i.e. its submatrices are nearly orthonormal, it is
possible to recover the K largest coefficients from a set of M = O(K log(N/K))
measurements y. As a result, the sparse signal can be reconstructed by different
optimization techniques such as l1-norm minimization and convex optimization. The
first minimization technique used to reconstruct the signal is l1 minimization:
(P1) min ||x||_1 subject to Φx = y.
This is also known as basis pursuit (P1). The goal of this technique is to find the
feasible vector with the smallest l1 norm:
||x||_1 = Σ_{i=1}^{n} |x_i|.
The l1 norm is also known as the taxicab or Manhattan norm: the name refers
to the distance a taxi has to drive in a rectangular street grid to get from the origin to
the point x. The distance induced by this norm is called the Manhattan or l1 distance.
The other optimization approach, convex optimization (e.g. via the cvx toolbox), can
solve many small and medium-scale problems; using cvx, the l1 objective is
minimized in order to reconstruct the original signal.
4.4 Optimization Techniques
Signal reconstruction plays an important role in compressive sensing theory
where the signal is reconstructed or recovered from a minimum number of
measurements. By using optimization techniques it is possible to recover the signal
without losing the information at the receiver.
4.4.1 l1 Minimization
A recent series of papers has developed a theory of signal recovery from
highly incomplete information. The results state that a sparse vector x ∈ R^N can be
recovered from a small number of linear measurements b = Ax ∈ R^K, K << N, by
solving a convex program. l1 minimization is used to solve underdetermined linear
equations, or to find a sparsely corrupted solution to an overdetermined system.
l1 minimization has been proposed as a convex alternative to the
combinatorial l0 norm, which simply counts the number of nonzero entries in a
vector, for synthesizing the signal as a sparse superposition of waveforms. The
program (P1): min ||x||_1 subject to Ax = b is also known as basis pursuit. The goal
of this program is to find the feasible vector with the smallest l1 norm, i.e. to search
for a vector x that explains the observations b. If the signal x is sufficiently sparse,
then (P1) recovers x from A and b. When x, A, and b have real-valued entries, (P1)
can be recast as a linear program.
4.4.2 Matching Pursuit
Orthogonal matching pursuit (OMP) is a canonical greedy algorithm for
sparse approximation. Let Φ be a matrix of size M×N (where typically M < N) and
let y denote a vector in R^M; the goal of OMP is to recover a coefficient vector
x ∈ R^N with roughly K < M non-zero terms so that Φx equals y exactly or
approximately. OMP is frequently used to find a sparse representation of a signal
y ∈ R^M in settings where Φ represents a dictionary for the signal space. It is also
commonly used in compressive sensing, where y = Φx represents compressive
measurements of a sparse signal x ∈ R^N to be recovered. One of the attractive
features of OMP is its simplicity, and it is empirically competitive in terms of
approximation performance.
4.4.3 Orthogonal Matching Pursuit (OMP)
In this project, the signal is reconstructed frame by frame using the OMP
method. OMP uses sub-Gaussian measurement matrices to reconstruct sparse signals.
If Φ is such a measurement matrix, then Φ*Φ is, in a loose sense, close to the identity.
Therefore the largest coordinate of the proxy vector Φ*y = Φ*Φx is expected to
correspond to a non-zero entry of x, so one coordinate of the support of the signal x
is estimated. Subtracting that contribution from the observation vector y and
repeating eventually yields the entire support of x. OMP is quite fast, both in theory
and in practice, but its guarantees are not as strong as those of basis pursuit. The
algorithm's simplicity enables a fast runtime: it iterates s times, and each iteration
performs a selection through d elements, multiplies by Φ*, and solves a least squares
problem.
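The OMP iteration can be sketched in numpy. This is a minimal illustration under stated assumptions (dense Φ, exact sparsity s known), not the project's actual implementation:

```python
import numpy as np

def omp(Phi, y, s):
    """Orthogonal Matching Pursuit: greedily recover an s-sparse x
    from measurements y = Phi @ x."""
    M, N = Phi.shape
    residual = y.copy()
    support = []
    for _ in range(s):
        # Pick the column most correlated with the residual (Phi* r)
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit of y on the currently selected columns
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        # Subtract the explained part from the observations
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat
```

Each pass performs exactly the three steps named above: a selection over the columns, a multiplication by Φ*, and a least squares solve.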
CHAPTER-5
ADAPTIVE COMPRESSIVE SENSING
In the conventional compressed sensing process, the projection matrix used to
generate the compressed signal is generated randomly and is considered fixed during
the entire conversion process; that is, the projection matrix is non-adaptive. Though
this process performs better than conventional sampling, even better results can be
obtained by using an adaptive projection matrix.
5.1 Adaptive Projection Matrix
Most work in CS research focuses on random projection matrices, which are
constructed by considering only the signal's sparsity rather than its other properties.
In other words, the construction of the projection matrix is non-adaptive. Observing
that different kinds of speech frames have different intra-frame correlations, a
frame-based adaptive compressed sensing framework for speech signals has been
proposed, which applies an adaptive projection matrix. To do so, neighbouring frames
are compared to estimate their intra-frame correlation, every frame is classified into a
category, and the number of projections is adjusted accordingly.
Experimental results show that the adaptive projection matrix can
significantly improve speech reconstruction quality. The intra-frame correlation of
speech signals is exploited to achieve efficient sampling. Because different kinds of
speech signals may have different intra-frame correlations, a frame-based adaptive CS
framework that uses different sampling strategies for different kinds of speech frames
has been proposed.
5.2 Frame Analysis
Each speech sequence is divided into non-overlapping frames of size 1×n,
and all frames in a sequence are processed independently. The projection matrix is
initialised as a Gaussian random matrix Φ, which has been proven to be incoherent
with most sparse bases with high probability.
Figure 5.1: The frame-based adaptive CS framework for speech
As shown in Figure 5.1, for each frame in a speech sequence a small number
of projections is collected and compared with the projections collected for the
previous frame. Based on the comparison, the correlation between the two frames is
estimated and classified into one of several categories. The sampling strategy is then
adjusted according to the correlation type, and a different number of samples is
collected for the current frame.
For the current t-th frame of the original speech signal, represented as x(t),
its previous frame is represented as x(t-1). The difference between x(t) and x(t-1)
reflects the correlation between the two neighbouring frames and can be used to
classify it. Since x(t) - x(t-1) is not available at the sampling stage, the collected
measurements are used to estimate the correlation instead. The same projection
matrix Φ is applied to all frames in the partial sampling stage, so
y(t) - y(t-1) = Φx(t) - Φx(t-1), where y(t) and y(t-1) are the projection vectors of
x(t) and x(t-1) respectively. As each sample of y(t) - y(t-1) is a linear combination
of x(t) - x(t-1), the difference between the two projection vectors also reflects the
intensity changes between the two frames.
Therefore, the amount of intensity change between the two frames can be
estimated using only a small number of projections. Let Φ_M0 be the matrix
containing the first M0 rows of the Gaussian random matrix Φ. For the current frame
t, Φ_M0 is first used to collect M0 measurements y_M0(t) = Φ_M0 x(t) in the partial
sampling stage. These are compared with the first M0 measurements of y(t-1) by
calculating the difference y_d(t) = y_M0(t) - y_M0(t-1). In the frame analysis
module, given y_d(t), its l2 norm normalized by M0 is calculated and compared with
two thresholds T1 and T2 (T1 < T2).
If ||y_d(t)||_2 / M0 <= T1, the current frame is almost the same as its
previous frame; the two neighbouring frames are likely both surd, and the intra-frame
correlation is labelled surd vs. surd. If T1 < ||y_d(t)||_2 / M0 <= T2, the two
neighbouring frames undergo only small changes; in this situation they are likely
both sonant, and the correlation is labelled sonant vs. sonant. If
||y_d(t)||_2 / M0 > T2, the two frames are significantly different from each other,
most likely due to a change of frame type, and the correlation is labelled surd vs.
sonant.
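The three-way classification can be sketched directly from the rule above. A minimal numpy sketch; the label constants and function name are hypothetical:

```python
import numpy as np

SURD_SURD, SONANT_SONANT, SURD_SONANT = 0, 1, 2  # correlation labels

def classify_frame(y_t_M0, y_prev_M0, T1, T2):
    """Classify intra-frame correlation from the first M0 projections
    of the current and previous frames (T1 < T2)."""
    M0 = len(y_t_M0)
    # l2 norm of the projection difference, normalized by M0
    d = np.linalg.norm(y_t_M0 - y_prev_M0) / M0
    if d <= T1:
        return SURD_SURD        # frames almost identical
    elif d <= T2:
        return SONANT_SONANT    # small changes between frames
    else:
        return SURD_SONANT      # frame type most likely changed
```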
5.3 Partial Sampling
For each frame in a speech sequence, a small number of projections is first
collected and compared with the projections collected for the previous frame. Based
on the comparison results, the correlation between these two frames is estimated and
classified into different categories. The sampling strategy is then adjusted according
to the correlation type, and a different number of samples is collected for the current
frame.
5.4 Adaptive Sampling
Depending on their classified intra-frame correlation types, different numbers
of projections are used for the speech frames. A frame is considered surd if its
intra-frame correlation type is surd vs. surd. A surd frame contains the least new
information in the speech, so the M0 measurements collected in the partial sampling
stage are sufficient and no additional sampling is needed. When its intra-frame
correlation is sonant vs. sonant, the frame is considered sonant and contains some
new information, which requires more measurements to be collected.
For such frames, M1 (M1 > M0) measurements are collected: the (M0+1)-th
to M1-th rows of the Gaussian random matrix Φ are used and combined with the M0
partial measurements to form the final projection vector y(t). Frames that experience
large changes contain the most new information; for these, a total of M2
(M2 > M1 > M0) measurements is collected, and the total projection matrix consists
of the first M2 rows of Φ.
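The adaptive sampling step can be sketched as follows. The budgets M0, M1, M2 and the function name are hypothetical; the sketch reuses the M0 partial measurements and only collects the extra rows of Φ that the frame's label requires:

```python
import numpy as np

M0, M1, M2 = 32, 64, 128   # hypothetical budgets, M0 < M1 < M2

def measurements_for(label, Phi, x, y_partial):
    """Extend the M0 partial measurements according to the frame's
    correlation label (0: surd/surd, 1: sonant/sonant, 2: surd/sonant)."""
    if label == 0:                 # surd vs. surd: M0 is enough
        return y_partial
    m = M1 if label == 1 else M2   # more rows for more new information
    extra = Phi[M0:m] @ x          # rows M0+1 .. m of Phi
    return np.concatenate([y_partial, extra])
```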
5.5 Reconstruction
The original signal is reconstructed from a significantly smaller number of
samples by using optimization techniques such as l1-norm minimization or convex
optimization. In this project we have used orthogonal matching pursuit (OMP), which
gives better results than the other optimization techniques considered. The signal is
reconstructed frame by frame using the OMP method. OMP uses sub-Gaussian
measurement matrices to reconstruct sparse signals: if Φ is such a measurement
matrix, then Φ*Φ is, in a loose sense, close to the identity.
Therefore one would expect the largest coordinate of the proxy vector
Φ*y = Φ*Φx to correspond to a non-zero entry of x, so one coordinate of the support
of the signal x is estimated. Subtracting that contribution from the observation vector
y and repeating eventually yields the entire support of x. OMP is quite fast, both in
theory and in practice, but its guarantees are not as strong as those of basis pursuit;
the algorithm's simplicity enables a fast runtime.
The algorithm iterates s times, and each iteration performs a selection
through d elements, multiplies by Φ*, and solves a least squares problem.
Reconstruction algorithms compute the support of the sparse signal x iteratively.
Once the support of the signal is computed correctly, the pseudo-inverse of the
measurement matrix restricted to the corresponding columns can be used to
reconstruct the actual signal x. The clear advantage of this approach is speed, but it
also presents new challenges.
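The final pseudo-inverse step can be sketched in numpy (a minimal sketch; `reconstruct_from_support` is a hypothetical helper name):

```python
import numpy as np

def reconstruct_from_support(Phi, y, support):
    """Once the support is known, recover x via the pseudo-inverse of
    the measurement matrix restricted to those columns."""
    N = Phi.shape[1]
    cols = list(support)
    x_hat = np.zeros(N)
    # Least-squares solution on the estimated support, zeros elsewhere
    x_hat[cols] = np.linalg.pinv(Phi[:, cols]) @ y
    return x_hat
```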
5.5.1 Orthogonal Matching Pursuit
Orthogonal Matching Pursuit (OMP) was put forth by Mallat and his
collaborators and analyzed by Gilbert and Tropp. OMP uses sub-Gaussian
measurement matrices to reconstruct sparse signals. If Φ is such a measurement
matrix, then Φ*Φ is, in a loose sense, close to the identity. Therefore one would
expect the largest coordinate of the proxy vector Φ*y = Φ*Φx to correspond to a
non-zero entry of x, so one coordinate of the support of the signal x is estimated.
Subtracting off that contribution from the observation vector y and repeating
eventually yields the entire support of the signal x.
OMP is quite fast, both in theory and in practice, but its guarantees are not as
strong as those of Basis Pursuit. The algorithm's simplicity enables a fast runtime.
The algorithm iterates s times, and each iteration performs a selection through d
elements, multiplies by Φ*, and solves a least squares problem. The selection can
easily be done in O(d) time, and the multiplication by Φ* in the general case takes
O(md). When Φ is an unstructured matrix, the cost of solving the least squares
problem is O(s²d). However, maintaining a QR factorization of Φ|I and using the
modified Gram-Schmidt algorithm reduces this time to O(|I|d) per iteration.
Using this method, the overall cost of OMP becomes O(smd). In the case
where the measurement matrix Φ is structured with a fast multiply, this can clearly
be improved.
5.5.2. Stagewise Orthogonal Matching Pursuit
An alternative greedy approach, Stagewise Orthogonal Matching Pursuit
(StOMP), developed and analyzed by Donoho and his collaborators, uses ideas
inspired by wireless communications. As in OMP, StOMP utilizes the proxy vector
y = Φ*u, where u = Φx is the measurement vector. However, instead of simply
selecting the largest component of the vector y, it selects all coordinates whose
values are above a specified threshold. It then solves a least squares problem to
update the residual. The algorithm iterates through only a fixed number of stages and
then terminates, whereas OMP requires s iterations, where s is the sparsity level. The
thresholding strategy is designed so that many terms enter at each stage and the
algorithm halts after a fixed number of iterations.
The formal noise level σ_k is proportional to the Euclidean norm of the
residual at that iteration. This method appears to provide slightly weaker guarantees;
it appears, however, that StOMP outperforms OMP and Basis Pursuit in some cases.
Although the structure of StOMP is similar to that of OMP, because StOMP selects
many coordinates at each stage the runtime is much improved. Indeed, using iterative
methods to solve the least squares problem yields a runtime bound of CNsd + O(d),
where N is the fixed number of iterations run by StOMP and C is a constant that
depends only on the accuracy level of the least squares problem.
5.5.3 Regularized Orthogonal Matching Pursuit
As is now evident, the two approaches to compressed sensing each presented
disjoint advantages and challenges. While the optimization method provides
robustness and uniform guarantees, it lacks the speed of the greedy approach. The
greedy methods on the other hand had not been able to provide the strong guarantees
of Basis Pursuit. This changed when we developed a new greedy algorithm,
Regularized Orthogonal Matching Pursuit, that provided the strong guarantees of the
optimization method.
This work bridged the gap between the two approaches, and provided the first
algorithm possessing the advantages of both approaches. Regularized Orthogonal
Matching Pursuit (ROMP) is a greedy algorithm, but it correctly recovers any sparse
signal using any measurement matrix that satisfies the Restricted Isometry Condition.
Again, as in the case of OMP, we use the proxy vector Φ*Φx as a good local
approximation to the s-sparse signal x. Since the Restricted Isometry Condition
guarantees that every s columns of Φ are close to an orthonormal system, we choose
at each iteration not just one coordinate, as in OMP, but up to s coordinates, using
this proxy vector. It is then acceptable to choose some incorrect coordinates, so long
as their number is limited.
To ensure that we do not select too many incorrect coordinates at each
iteration, we include a regularization step which guarantees that each selected
coordinate carries an even share of the information about the signal. We remark here
that knowledge of the sparsity level s is required in ROMP, as in OMP. In the case
where the signal is not exactly sparse and the signal and measurements are corrupted
with noise, the algorithm as described above will never halt. Thus in the noisy case
we simply change the halting criterion, allowing the algorithm to iterate at most s
times, or until |I| ≥ s. With this modification, ROMP approximately reconstructs
arbitrary signals.
5.6 Compressed Sensing Using DCT
The algorithm for compressed sensing using the discrete cosine transform
(DCT) to make the signal sparse is given below.