frame based adaptive compressed sensing of speech signal

Adaptive compressed sensing of speech signal

CHAPTER-1

INTRODUCTION

Compressive sensing is an emerging and revolutionary technology that

strongly relies on the sparsity of the signal. In compressive sensing the signal is

compressively sampled by taking a small number of random projections of the signal,

which contain most of the salient information. Compressive sensing has been

previously applied in areas like image processing, radar systems and sonar systems.

And now it is being used in speech processing as advanced technique to acquiring the

data.

The key objective in compressed sensing (also referred to as sparse signal

recovery or compressive sampling) is to reconstruct a signal accurately and e cientlyffi

from a set of few non-adaptive linear measurements. Of course, linear algebra easily

shows that in general it is not possible to reconstruct an arbitrary signal from an

incomplete set of linear measurements. Thus one must restrict the domain in which

the signals belong. To this end, we consider sparse signals, those with few non-zero

coordinates. It is now known that many signals such as real-world images or audio

signals are sparse .

Since sparse signals lie in a lower dimensional space, one would think that

they may be represented by few linear measurements. This is indeed correct, but the

di culty is determining in which lower dimensional subspace such a signal lies. Thatffi

is, we may know that the signal has few non-zero coordinates, but we do not know

which coordinates those are. It is thus clear that we may not reconstruct such signals

using a simple linear operator, and that the recovery requires more sophisticated

techniques.

The compressed sensing field has provided many recovery algorithms, most

with provable as well as empirical results. There are several important traits that an

optimal recovery algorithm must possess. The algorithm needs to be fast, so that it can

e ciently recover signals in practice. Of course, minimal storage requirements asffi

well would be ideal.

MRITS, DEPARTMENT OF ECE 1


The algorithm should provide uniform guarantees, meaning that given a

specific method of acquiring linear measurements, the algorithm recovers all sparse

signals (possibly with high probability).Ideally, the algorithm would require as few

linear measurements as possible. However, recovery using only this property would

require searching through the exponentially large set of all possible lower dimensional

subspaces, and so in practice is not numerically feasible. Thus in the more realistic

setting, we may need slightly more measurements. Finally, we wish our ideal

recovery algorithm to be stable.

This means that if the signal or its measurements are perturbed slightly, then

the recovery should still be approximately accurate. This is essential, since in practice

we often encounter not only noisy signals or measurements, but also signals that are

not exactly sparse, but close to being sparse.The conventional scheme in signal

processing, acquiring the entire signal and then compressing it, was questioned by

Donoho. Indeed, this technique uses tremendous resources to acquire often very large

signals, just to throw away information during compression.

The natural question then is whether we can combine these two processes, and

directly sense the signal or its essential parts using few linear measurements. Recent

work in compressed sensing has answered this question in positive, and the field

continues to rapidly produce encouraging results.

1.1 Objective

Compressed sensing (CS) is an emerging signal acquisition theory that directly

collects signals in a compressed form if they are sparse on some certain basis. It

originates from the idea that it is not necessary to invest a lot of power into observing

the entries of a sparse signal in all coordinates when most of them are zero anyway.

Rather it should be possible to collect only a small number of measurements that still

allow for reconstruction. This is potentially useful in applications where one cannot

afford to collect or transmit a lot of measurements but has rich resources at the

decoder.



Observing that different kind speech frames have different intra-frame

correlations, a frame-based adaptive compressed sensing framework for speech

signals has been proposed. The objective of this project is to further improve the

performance of the existing compressed sensing process that uses non adaptive

projection matrix, by using the adaptive projection matrix based on frame analysis.

Average-frame signal-to-noise ratio (AFSNR) is calculated to evaluate the

performance of the frame-based adaptive CS with the non-adaptive CS.

1.2 Existing System

Compressed sensing is the technique used to overcome the constraints of

conventional sampling theorem. The compressed sensing allows us to go beyond the

nyquist rate and sample the signal below nyquist frequency. In a typical

communication system, the signal is sampled at least at twice the highest frequency

contained in the signal. However, this limits efficient ways to compress the signal, as

it places a huge burden on sampling the entire signal while only a small number of the

transform coefficients are needed to represent the signal.

On the other hand, compressive sampling provides a new way to reconstruct

the original signal from a minimal number of observations. CS is a sampling

paradigm that allows us to go beyond the Shannon limit by exploiting the sparsity

structure of the signal. It allows us to capture and represent the compressible signals

at a rate significantly below the Nyquist rate. The signal is then reconstructed from

these projections by using different optimization techniques.

During compressive sampling only the important information about a signal is

acquired, rather than acquiring the important information plus the information of a

signal which will be eventually discarded at the receiver.But the existing compressed

sensing uses non adaptive projection matrix and takes the same number of projections

for all the frames. This leads to degradation in system’s efficiency.

1.3 Proposed System

The efficiency of the conventional non adaptive compressed sensing can be

increased by using the Adaptive projection matrix. The adaptive projection matrix



uses different projections for different frames based on their inter frame correlations

thus improving the efficiency of the system.

Most work in CS research focus on random projection matrix which is

constructed by considering only the signals’ sparsity rather than other properties. In

other word, the construction of projection matrix is non-adaptive.

Observing that different kind speech frames have different intra-frame

correlations, a frame-based adaptive compressed sensing framework, which applies

adaptive projection matrix for speech signals, has been proposed.To do so,

neighbouring frames are compared to estimate their intra-frame correlation, every

frame is classified into different categories, and the number of projections for each

frame is adjusted accordingly.The experimental results show that the adaptive

projection matrix can significantly improve the speech reconstruction quality.



CHAPTER-2

LITERATURE SURVEY

According to information theory, the bit rate at which the condition of

distortion less transmission of any source signal is possible is determined by the

entropy of the speech source message. However, that in practical terms the source rate

corresponding to the entropy is only asymptotically achievable, for the encoding

memory length or delay tends to infinity. Any further compression is associated with

information loss or coding distortion. Many practical source compression techniques

employ lossy coding, which typically guarantees further bit rate economy at the cost

of nearly imperceptible speech, audio, video, and other source representation

degradation.

Note that the optimum Shannonian source encoder generates a perfectly

uncorrelated source-coded stream, in which all the source redundancy has been

removed. Therefore, the encoded source symbols which in most practical cases are

constituted by binary bits are independent, and each one has the same significance.

Having the same significance implies that the corruption of any of the source-encoded

symbols results in identical source signal distortion over imperfect channels. Under

these conditions, according to Shannon's fundamental work the best protection against

transmission errors is achieved if source and channel coding are treated as separate

entities. When using a block code of length N channel-coded symbols in order to

encode K source symbols with a coding rate of R = K/N, the symbol error rate can be

rendered arbitrarily low if N tends to infinity .

2.1 Speech Production

Speech is a natural form of communication for human beings, we use speech

every day almost unconsciously, but an understanding of the mechanisms on which it

is based will help to clarify how the brain processes information. Figure 2.1 shows the

process of human speech production.



Figure 2.1: Human Speech Production System

Human speech production comprises lungs, vocal cords, and the vocal tract.

The vocal cords are expressed as a simple vibration model, and the pitch of the speech

changes according to adjustments in the tension of the vocal cords. Speech is

generated by emitting sound pressure waves, radiated primarily from the lips,

although significant energy emanates through sounds from the nostrils, throat, and the

like.

The air compressed by the lungs excites the vocal cords in two typical modes.

When generating voiced sounds, the vocal cords vibrate and generate a high-energy

quasi-periodic speech waveform, while in the case of lower energy unvoiced sounds,

the vocal cords do not participate in the voice production and the source behaves

similarly to a noise generator.

In a somewhat simplistic approach, the excitation signal denoted by E (z) is

then filtered through the vocal apparatus, which behaves like a spectral shaping filter

with a transfer function of H (z) that is constituted by the spectral shaping action of

the glottis, which is defined as the opening between the vocal folds. Further spectral

shaping is carried out by the vocal tract, lip radiation characteristics, and so on.



The human speech in its pristine form is an acoustic signal. For the purpose of

communication and storage, it is necessary to convert it into an electrical signal. This

is accomplished with the help of certain instruments called ‘transducers’.

This electrical representation of speech has certain properties.

1. It is a one-dimensional signal, with time as its independent variable.

2. It is random in nature.

3. It is non-stationary, i.e. the frequency spectrum is not constant in time.

Microphone is a transeducer that converts the acoustic speech signal into

elictrical signal .The micro phone receives the acoustic voice signal and produces the

electrical signal as output whose amplitude is proportional to the input acoustic voice

signal intensity.

Figure 2.2: block diagram of micro phone

The electrical signal which is produced by the micro phone is a analog signal

whose amplitude various continuously with time and it is continuous both in time and

amplitude as shown in figure below,

Figure 2.3: Analog speech signal



2.2 Digitization of Speech

Speech is a very basic way for humans to convey information to one another.

With a bandwidth of only 4 kHz, speech can convey information with the emotion of

a human voice. People want to be able to hear someone’s voice from anywhere in the

world as if the person was in the same room .As a result a greater emphasis is being

placed on the design of new and efficient speech coders for voice communication and

transmission. Today applications of speech coding and compression have become

very numerous.

Though the electrical signal which is coming from microphone can be

processed in its original analog form it is not as much efficient as it is processed in

digital form. With the advent of digital computing machines, it was propounded to

exploit the powers of the same for processing of speech signals. This required a

digital representation of speech. To achieve this, the analog signal is sampled at some

frequency and then quantized at discrete levels. There are many advantages in

processing the signal in digital than in analog form. The analog speech signal is

converted into digital form by using Analog to Digital Converter (ADC).

Figure 2.4: Block diagram of Analog to digital converter

As shown in the figure above the analog to digital converter consists of three

basic functions.

1. Sampling

2. Quantization

3. Encoding



2.2.1 Sampling

Sampling process is used to convert the continuous time signal to discrete time

signal. This is achieved by multiplying the input continuous signal with an impulse

train of unity magnitude. Here the impulse train frequency which is also known as

sampling frequency should be greater than twice the highest frequency component

present in the input analog signal (Fs > = 2Fm), and this rate is called as nyquist

rate( nyquist condition for sampling),

Figure 2.5: Block diagram of Sampler

The output of sampler is a discrete time signal in which the samples present at

only discrete intervals of time.

2.2.2 Quantization

The output obtained from the sampling process is discrete in only time but its

amplitude values are still assume continuous values. In order to make it discrete ,

quantization is used.

Figure 2.6: Block diagram of Quantizer

The quantizer assigns discrete values to the input samples by either Rounding

off process or Truncation process. In rounding off process the values are assign to the

nearest integer multiple of step size. Where as in truncation process the values are

assigned by truncating the values above the integral multiple of step size. When



compared Rounding off process the truncation process produces more error.Thus the

output of quantizer is a signal which is discrete both in time and amplitude.

2.2.3 Encoding

The encoder is used to assign binary values to the sample values coming from

the quantizer.

Figure 2.7: Block diagram of Encoder

Thus the out from the encoder is pure digital signal represented by zeros and

ones.

2.2.4 Anti-Aliasing Filter

Apart from the three building blocks sampler, quantizer and encoder discussed

above the Analog to Digital Converter also contains a low pass filter called Anti-

Aliasing filter. Anti-aliasing low-pass filtering (LPF) is necessary in order to band

limit the signal to a bandwidth of B before sampling. In case of speech signals, about

1% of the energy resides above 4 kHz and only a negligible proportion above 7 kHz.

Hence, commendatory quality speech links, which are also often referred to as

wideband speech systems, typically band limit the speech signal to 7-8 kHz.

Conventional telephone systems usually employ a bandwidth limitation of 0.3-3.4

kHz, which results only in a minor speech degradation, hardly perceptible by the

listener.

2.3 Speech Signal Analysis

In contrast to deterministic signals, random signals, such as speech, music,

video, and other information signals, cannot be described by the help of analytical

formulas. They are typically characterized by the help of statistical functions. The

power spectral density (PSD), autocorrelation function (ACF), cumulative distribution



function (CDF), and probability density function (PDF) are some of the most frequent

ones invoked.A typical speech sentence signal consists of two main parts.

one carries the speech information, and the other includes silent or noise

sections that are between the utterances, without any verbal information. The verbal

(informative) part of speech can be further divided into two categories namely Speech

and Unvoiced Speech.

2.3.1 Voiced and Non Voiced Speech

In speech there are two major types of excitation, voiced and unvoiced.

Voiced speech consists mainly of vowel sounds. It is produced by forcing air through

the glottis, proper adjustment of the tension of the vocal cords results in opening and

closing of the cords, and a production of almost periodic pulses of air. These pulses

excite the vocal tract. Psychoacoustics experiments show that this part holds most of

the information of the speech and thus holds the keys for characterizing a speaker.

Unvoiced speech sections are generated by forcing air through a constriction formed

at a point in the vocal tract (usually toward the mouth end), thus producing

turbulence. Being able to distinguish between the three is very important for speech

signal analysis.

Voiced speech tends to be periodic in nature. Examples of voiced sounds are

English vowels, such as the /a/ in .bay and the /e/ in .see.Since unvoiced speech is due

to turbulence, the speech is aperiodic and has a noise-like structure. Some examples

of unvoiced English sounds are the /s/ in .so and the /h/ in .he.. In general at least 90%

of the speech energy is always retained in the first N/2 transform coefficients, if the

speech is a voiced frame.

However, for an unvoiced frame the energy is spread across several frequency

bands and typically the first N/2 coefficients holds less than 40% of the total energy.

Due to this wavelets are inefficient at coding unvoiced speech. Unvoiced speech

frames are infrequent. By detecting unvoiced speech frames and directly encoding

them (perhaps using entropy coding), no unvoiced data is lost and the quality of the

compressed speech will remain transparent.

Typical voiced and unvoiced speech waveform segments are shown in Figures

2.8 and 2.9 respectively, along with their corresponding power densities. Clearly, the



unvoiced segment appears to have a significantly lower magnitude, which is also

reflected by its PSD.

Figure 2.8: Voice speech segment and its PSD

Figure 2.9: Unvoiced speech segment and its PSD

The voiced segment shown in Figure 2.10 is quasi-periodic in the time

domain, and it has an approximately 80-sample periodicity, identified by the positions

of the largest time-domain signal peaks, which corresponds to10 ms. This interval is

referred to as the pitch period and it is also often expressed in terms of the pitch

frequency p, which in this example 1/ (10 ms) = 100 Hz. In the case of male speakers,



the typical pitch frequency range is between 40 and 120 Hz , whereas for females it

can be as high as 300-400 Hz.

Observe furthermore that within each pitch period there is a gradually

decaying oscillation, which is associated with the excitation and gradually decaying

vibration of the vocal cords.

A perfectly periodic time-domain signal would have a line

spectrum, but since the voiced speech signal is quasi-periodic with a frequency of p

(rather than being perfectly periodic) its spectrum exhibits somewhat widened but

distinctive spectral needles at frequencies of np, rather than being perfectly periodic.

As a second phenomenon, it can also be observed that three, sometimes four, spectral

envelope peaks. In voiced spectrum of Figure 2.8 these formant frequencies are

observable around 500 Hz, 1500 Hz, and 2700 Hz, and they are the manifestation of

the resonances of the vocal tract at these frequencies. In contrast, the unvoiced

segment of Figure 2.9 does not have a formant structure, rather it has a more

dominant high-pass nature, exhibiting a peak around 2500 Hz. Observe, furthermore,

that its energy is much lower than that of the voiced segment .

It is equally instructive to study the ACF of voiced and unvoiced

segments, which are portrayed on an expanded scale in Figure 2.10 and Figure 2.11

respectively.

Figure 2.10: voice speech segment and its ACF



The voiced ACF shows a set of periodic peaks at displacements of about 20

samples, corresponding to 2.5 ms, which coincides with the positive quasi-periodic

time-domain segments. Following four monotonously decaying peaks, there is a more

dominant one around a displacement of 80 samples, which indicates the pitch

periodicity.

Figure 2.11: unvoiced speech segment and its ACF

The periodic nature of the ACF can therefore, for example, be exploited to

detect and measure the pitch periodicity in a range of applications, such as speech

codec’s and voice activity detectors. Observe, however, that the first peak at a

displacement of 20 samples is about as high as the one near 80.Hence, a reliable pitch

detector has to attempt to identify and rank all these peaks in order of prominence,

exploiting also the a priori knowledge as to the expected range of pitch frequencies.

By contrast, the unvoiced segment has a much more rapidly decaying ACF, indicating

no inherent correlation between adjacent samples and no long-term periodicity. The

voiced speech signal and unvoiced speech signal have some distinct characteristic

features that enable us to distinguish between voiced speech and unvoiced speech.

2.3.2 Zero Crossing Rate

The rate at which the speech signal crosses zero can provide information about

the source of its creation. It is well known that unvoiced speech has a much higher

ZCR than voiced speech this is because most of the energy in unvoiced speech is



found in higher frequencies than in voiced speech, implying a higher ZCR for the

former.

2.3.3 Cross-Correlation

Cross-correlation is calculated between two consecutive pitch cycles. The

cross-correlation values between pitch cycles are higher (close to 1) in voiced speech

than in unvoiced speech.



CHAPTER-3

SPEECH COMPRESSION

In the recent years, large scale information transfer by remote computing and

the development of massive storage and retrieval systems have witnessed a

tremendous growth. To cope up with the growth in the size of databases, additional

storage devices need to be installed and the modems and multiplexers have to be

continuously upgraded in order to permit large amounts of data transfer between

computers and remote terminals. This leads to an increase in the cost as well as

equipment. One solution to these problems is “COMPRESSION” where the database

and the transmission sequence can be encoded efficiently.

3.1 Fourier Analysis

Historically, Fourier Transform has been the most widely used tool for signal

processing. As signal processing began spreading its tentacles and encompassing

newer signals, Fourier Transform was found to be unable to satisfy the growing need

for processing a bulk of signals.And therefore is had been replaced by wavelet

transform.

A major drawback of Fourier analysis is that in transforming to the frequency

domain, the time domain information is lost. The most important difference between

Fourier Transform and wavelet transform is that individual wavelet functions are

localized in space. In contrast Fourier sine and cosine functions are non-local and are

active for all time t.

3.2 Continuous Wavelet Transform (CWT)

The drawbacks inherent in the Fourier methods are overcome with wavelets.

Consider a real or complex-valued continuous-time function ψ(t) with the following

properties

1. The function integrates to zero



∫−∞

∞

ψ ( t ) . d ( t )=0

2. It is square integrable or, equivalently has finite energy

∫−∞

∞

∣ψ ( t )∣2 .d (t )<∞

A function is called mother wavelet if it satisfies these two properties. There is

infinity of functions that satisfy these properties and thus qualify to be mother

wavelet. The simplest of them is the ‘Haar wavelet’. Some other wavelets are

Mexican hat, Morlet. Apart from this, there are various families of wavelets. Some of

the families are daubechies family, symlet family, coiflet family etc.

Consider the following figure which juxtaposes a sinusoid and a wavelet

Figure 3.1: comparing sine wave and a wavelet

wavelet is a waveform of effectively limited duration that has an average value

of zero. Compare wavelets with sine waves, which are the basis of Fourier analysis,

Sinusoids do not have limited duration,they extend from minus to plus infinity. And

while sinusoids are smooth and predictable, wavelets tend to be irregular and

asymmetric.Fourier analysis consists of breaking up a signal into sine waves of

various Frequencies .Similarly, wavelet analysis is the breaking up of a signal into

shifted and scaled versions of the original (or mother) wavelet.



Figure 3.2: demonstrating the decomposition of a signal into

wavelets

The above diagram suggests the existence of a synthesis equation to represent

the original signal as a linear combination of wavelets which are the basis function for

wavelet analysis (recollect that in Fourier analysis, the basis functions are sines and

cosines). This is indeed the case. The wavelets in the synthesis equation are multiplied

by scalars

3.3 Discrete Wavelet Transform (DWT)

Calculating wavelet coefficients at every possible scale (for continuous WT) is

a fair amount of work, and it generates lot of data. If scales and positions are choosed

based on powers of two then analysis will be much more efficient and just as accurate.

Such analysis can be obtained from the discrete wavelet transform (DWT).

3.3.1 Vanishing Moments

The number of vanishing moments of a wavelet indicates the smoothness of

the wavelet function as well as the flatness of the frequency response of the wavelet

filters (filters used to compute the DWT).Typically a wavelet with p vanishing

moments satisfies the following equation,

∫−∞

∞

tmψ (t )dt=0 For m=0....p-1



Wavelets with a high number of vanishing moments lead to a more

compact signal representation and are hence useful in coding applications. However,

in general, the length of the filters increases with the number of vanishing moments

and the complexity of computing the DWT coefficients increases with the size of the

wavelet filters.

3.3.2 Fast Wavelet Transform

The Discrete Wavelet Transform (DWT) coefficients can be computed by

using Mallet’s Fast Wavelet Transform algorithm. This algorithm is sometimes

referred to as the two-channel sub band coder and involves filtering the input signal

based on the wavelet function used. To explain the implementation of the Fast

Wavelet Transform algorithm consider the following equations:

1. Φ (t) =∑k

c (k ) Φ (2t-k)

2. Ψ (t) =∑k

(−1 )kc (1−k )Φ (2 t−k )

The first equation is known as the twin-scale relation (or the dilation equation)

and defines the scaling function. The next equation expresses the wavelet in terms of

the scaling function. These equations represent the impulse response coefficients for a

low pass filter of length 2N, with a sum of 1 and a norm of 1/squrerootof(2). The high

pass filter is obtained from the low pass filter using the relationship

gk=(−1 )k c (1−k )

Where k varies over the range (1-(2N-1)) to 1.

3.4 Wavelets and Speech Compression

The idea behind signal compression using wavelets is primarily linked

to the relative scarceness of the wavelet domain representation for the signal.



Wavelets concentrate speech information (energy and perception) into a few

neighbouring coefficients. Therefore as a result of taking the wavelet transform of a

signal, many coefficients will either be zero or have negligible magnitudes.

Another factor that comes into picture is taken from psycho acoustic studies.

Since our ears are more sensitive to low frequencies than high frequencies and our

hearing threshold is very high in the high frequency regions, a method for

compression is used by means of which the detail coefficients (corresponding to high

frequency components) of wavelet transforms are thresholded such that the error due

to thresholding is inaudible to our ears. Since some of the high frequency components

are discarded, a smoothened output signal is expected , as is shown in the following

figure

Figure 3.3 original signal and compressed signal using DWT

In summary, the notion behind compression is based on the concept that the

regular Signal component can be accurately approximated using a small Number of

approximation coefficients (at a suitably chosen level) and some of the detail

Coefficients.Data compression is then achieved by treating small valued coefficients

as insignificant data and thus discarding them. The process of compressing a speech

signal using Wavelets involves a number of different stages, each of which is

discussed below.

3.4.1 Choice of Wavelet



The choice of the mother-wavelet function used in designing high quality

speech coders is of prime importance. Choosing a wavelet that has compact support in

both time and frequency in addition to a significant number of vanishing moments is

essential for an optimum wavelet speech compressor.

This is followed very closely by the Daubechies D20, D12, D10 or D8

wavelets, all concentrating more than 96% of the signal energy in the Level 1

approximation coefficients. Wavelets with more vanishing moments provide better

reconstruction quality, as they introduce less distortion into the processed speech and

concentrate more signal energy in a few neighbouring coefficients.

3.4.2 Wavelet Decomposition

Wavelets work by decomposing a signal into different resolutions or

frequency bands, and this task is carried out by choosing the wavelet function and

computing the Discrete Wavelet Transform (DWT). Signal compression is based on

the concept that selecting a small number of approximation coefficients (at a suitably

chosen level) and some of the detail coefficients can accurately represent regular

signal components. Choosing a decomposition level for the DWT usually depends on

the type of signal being analyzed or some other suitable criterion such as entropy. For

the processing of speech signals decomposition up to scale 5 is adequate, with no

further advantage gained in processing beyond scale 5.

3.4.3 Truncation of Coefficients

After calculating the wavelet transform of the speech signal, compression

involves truncating wavelet coefficients below a threshold this means that most of the

speech energy is in the high-valued coefficients, which are few. Thus the small valued

coefficients can be truncated or zeroed and then be used to reconstruct the signal. This

compression scheme provided a segmental signal-to-noise ratio (SEGSNR) of 20 dB,

with only 10% of the coefficients.Two different approaches are available for

calculating thresholds. The first, known as Global Thresholding involves taking the

wavelet expansion of the signal and keeping the largest absolute value coefficients.

In this case you can manually set a global threshold, a compression

performance or a relative square norm recovery performance. Thus only a single

parameter needs to be selected. The second approach known as By Level



Thresholding consists of applying visually determined level dependent thresholds to

each decomposition level in the wavelet transform.



3.4.4 Encoding Coefficients

Signal compression is achieved by first truncating small-valued coefficients

and then efficiently encoding them. One way of approach to compression is to encode

consecutive zero valued coefficients, with two bytes. One byte to indicate a sequence

of zeros in the wavelet transforms vector and the second byte representing the number

of consecutive zeros.

For further data compaction a suitable bit encoding format, can be used to

quantize and transmit the data at low bit rates. A low bit rate representation can be

achieved by using an entropy coder like Huffman coding or arithmetic coding.



CHAPTER-4

COMPRESSIVE SENSING

The theory of compressive sensing was developed by Candes et al and

Donoho. Compressed sensing (CS) also named compressive sampling .In a typical

communication system, the signal is sampled at least at twice the highest frequency

contained in the signal. However, this limits efficient ways to compress the signal, as

it places a huge burden on sampling the entire signal while only a small number of the

transform coefficients are needed to represent the signal. On the other hand,

compressive sampling provides a new way to reconstruct the original signal from a

minimal number of observations. CS is a sampling paradigm that allows to go beyond

the Shannon limit by exploiting the sparsity structure of the signal. It allows to

capture and represent the compressible signals at a rate significantly below the

Nyquist rate.

The signal is then reconstructed from these projections by using

different optimization techniques.During compressive sampling only the important

information about a signal is acquired, rather than acquiring the important information

plus the information of a signal which will be eventually discarded at the receiver.

The key elements that need to be addressed before using compressive sensing are the

following

how to find the transform domain in which the signal has a sparse

representation,

how to effectively sample the sparsely signal in the time domain,

how to recover the original signal from the samples by using

optimization techniques.

In summary, the large amount of data needed to sample at the Nyquist rate,

especially for speech, image and video signals, motivates the study of compressive

sensing as a feasible solution for future mobile communication systems. Sparse

signals are defined as signals that can be represented by a limited number of data



points in the transform domain. Many real-world signals can be classified into this

category using an appropriate transform domain. For instance, if signal is a sine, it is

clearly not sparse, but its Fourier transform is extremely sparse. The consequences of

acquiring large amounts of data, added to the overhead of compression can be

improved by using compressive sensing. As a result, there are potential savings in

terms of energy, memory and processing.

4.1 Signal Sparsity

Sparsity of the signal represents the presence of signal transform coefficients

less densely.Sparsity allows to reconstruct the signal with less number of projections

(samples).The procedure used to ensure the sparsity of the signal is called transform

coding, which is performed by the following four steps

1. Full N-points of a signal x is obtained using the Nyquist rate ,

2. Complete set of transform coefficients (DFT) is obtained,

3. Located the K largest coefficients and threw away the smallest coefficients

4. Multipled the signal by the measurement matrix to obtain the observation

vector of length M.

Figure 4.1 shows an example of how compressive sensing can be used to

compress a signal below Nyquist rate. In this example, the original sampled signal is

composed of 300 samples. The intent is to reconstruct the signal using only 30

samples. Figure 4.1 (a) shows the time domain representation of the sampled signal.

From this figure, it is evident that by selecting only 30 samples (red dots) from the

300 samples it will be impossible to reconstruct the original signal perfectly. On the

other hand, by applying compressive sensing to the frequency representation of the

signal it is possible to perfectly reconstruct it from a significant small number of

samples.

In order to achieve this goal it is necessary to implement optimization

techniques. However, not every optimization technique can be used for this purpose.

For example, Figure 4.1(c) represents the reconstructed spectrum using the l2

minimization.Clearly; there are significant differences between the signal in Figure

4.1(b) and the signal from Figure 4.1(c).



Figure 4.1: (a) Time domain representation of the signal composed of 300

samples,

(b) Fourier spectrum of the signal to be encoded

(c) Reconstruction of the Fourier spectrum via l2 minimization

(d) Reconstruction of the Fourier spectrum via l1 minimization

In contrast, the reconstruction using the l1 minimization gives as a

result a perfect reconstruction. This can be clearly seen by comparing Figure 4.1(b)

and Figure 4.1(d). In summary, optimization techniques based on l1 minimization are

desired when compressive sensing is used.



4.2 Measurement Matrix

In compresed sensing special emphasis is given to represent the signal with an

incoherent basis.The linear measurement process that computes M < N inner products

between x and the collection of vectors

{∅ j}j=1M via y j =< x, ∅ j>

Where ∅ is an M×N measurement matrix with each row been a measurement

vector. It has been seen that some of the measurement matrices can be used in any

scenario, in the sense that they are incoherent with any fixed basis Ψ such as Gabor,

spikes, sinusoidal and wavelets. The compressive sensing measurement process with

K-sparse coefficient vector x is depicted in Figure 4.2

Figure 4.2: Compressive sensing measurement process

The measurement matrix plays a vital role in the process of recovering the

original signal.There are two types of measurements matrices that can be used in

compressive sensing.The Random measurement matrix and the predefined

measurement matrix. The fundamental revelation is that, if a signal x composed of N

samples is sparse then the actual signal can be reconstructed using below formula

M ≥ O (Log (N/K))



Furthermore, x can be perfectly reconstructed using different optimization

techniques. If Φ is a structurally random matrix, its rows are not stochastically

independent because they are randomized from the same random seed vector. The

random matrix is transposed and then orthogonalized. This will have the effect of

creating a matrix that represents an orthonormal basis. In a predefined measurement

matrix, the matrix is created by using function like the Dirac functions and Sine

functions.

In this case, the signal is multiplied by several Dirac functions cantered at

different locations to obtain the observation vector. Then the speech signal can be

reconstructed using the l1 normalization method by using the observation vector and

the predefined measurement matrix. Linear programming is another procedure that

plays a vital role in reconstructing the original signal. It is a mathematical approach

designed to get the best outcome in a given mathematical model, which is a special

case of mathematical programming. Linear programming can be expressed in the

following canonical form

Maximize eT x

Subject to Ax ≤ b

Where x represents the variable that is to be determined, e and b are vectors of

coefficients and A is a matrix of coefficients. The above expression which has to be

maximized or minimized is called the objective function and the equation defines the

constraints over which the objective function has to be optimized. At the end the

reconstruction of the speech signal depends upon the observation vector and the

measurement matrix.

4.3 Signal Reconstruction in Compressive Sensing

Recent developments in signal theory have shown that a sparse signal is a

useful model in areas such as communications, radar and image processing. Therefore

the assumption that every signal can be represented in a sparse form has helped in the

compression of the signal of interest. The perfect reconstruction of a signal x depends

on the measurement matrix Φ and the measurement vector y.



The compressive sensing theory tells that when the matrix ΦΨ has the

Restricted Isometric Property (RIP) which are nearly orthonormal then it is possible to

recover the K largest significant coefficients from a similar size set of M=O(K

log(N/K)) measurements of y. As a result, the sparse signal can be reconstructed by

different optimization techniques such as l1 norm and convex optimization. The first

minimization technique which has been used to reconstruct the signal is the l1

minimization

(P1) min || x ||l1 Subject to Φx = y

This is also known as basis pursuit (P1). The goal of this technique is to find

the vectors with the smallest l1-norm

║X║l=∑i=1

n

∣ X i ∣

It is also known as Taxicab norm Manhattan norm. The name relates to the

distance a taxi has to drive in rectangular street grid to get from the origin to the point

x. The distance obtained from this norm is called the Manhattan distance or l1

distance. The other optimization technique known as convex optimization (cvx) will

solve many medium and small scale problems. By using cvx the signal is minimized

in order to reconstruct the original signal.

4.4 Optimization Techniques

Signal reconstruction plays an important role in compressive sensing theory

where the signal is reconstructed or recovered from a minimum number of

measurements. By using optimization techniques it is possible to recover the signal

without losing the information at the receiver.

4.4.1 l1 Minimization

A recent series of papers have developed a theory of signal recovery from

highly incomplete information. The results states that a sparse vector x ∈RN can be

recovered from a small number of linear measurement b=Ax∈RK, K<< N by solving a



convex program. l1 minimization is used to solve the under determined linear

equations or sparsely corrupted solution to an over determined equations.

Recently, l1 minimization has been proposed as a convex alternative to the

combinatorial norm l1, which simply counts the number of nonzero entries in a

vector, for synthesizing the signal as a sparse superposition of waveforms.The

program P1min || x ||1 subject to Ax=b, is also known as basis pursuit. The goal of this

program is to find the smallest l1 norm. This algorithm search is for a vector x, that

will explain the observation b. If the signal x is sufficiently sparse then (P1) will find

the norm of x by using the values of A and b. When x, A, and b have real valued

entries, (P1) can be recast as a linear program.

4.4.2 Matching Pursuit

Orthogonal matching pursuit (OMP) is a canonical greedy algorithm for a

sparse approximation. Let Φ represent a matrix of size M×N (where typically M < N)

and y denotes a vector in RM , the goal of OMP is to recover a coefficient vector x ∈

RN with roughly K < M non-zero terms so that Φx equals y exactly or approximately.

OMP is frequently used to find a sparse representations of the signal y ∈ RM in

settings where represents a dictionary for the signal space. It is also commonly used in

compressive sensing, where y=Φx represents compressive measurements of a sparse

signal x ∈RN to be recovered. One of the attractive features of OMP is its simplicity.

OMP is empirically competitive in terms of approximation performance.

4.4.3 Orthogonal Matching Pursuit (OMP)

In this project, signal is reconstructed frame by frame using OMP method.

OMP uses sub Gaussian measurement matrices to reconstruct sparse signals. If Φ is

such a measurement matrix, then Φ Φ is in a loose sense close to the identity.∗

Therefore the largest coordinate of the observation vector y = Φ Φx is expected to∗

correspond to a non-zero entry of x. Thus one coordinate for the support of the signal

x is estimated. Subtracting of that contribution from the observation vector y and

repeating eventually yields the entire support of the signal x. OMP is quite fast, both

in theory and in practice, but its guarantees are not as strong as those of Basis Pursuit.

The algorithm’s simplicity enables a fast runtime. The algorithm iterates s times, and



each iteration does a selection through d elements, multiplies by Φ Φx and solves a∗

least squares problem.



CHAPTER-5

ADAPTIVE COMPRESSIVE SENSING

In conventional compressed sensing process, the projection matrix which is

used to generate the required compressed signal is generated randomly and considered

to be fixed during the entire conversion process .That means the projection matrix is

non-adaptive. Though this process results in better performance when compared to

conventional sampling process, even better results can be obtained by using adaptive

projection matrix.

5.1 Adaptive Projection Matrix

Most work in CS research focus on random projection matrix which is

constructed by considering only the signals’ sparsity rather than other properties. In

other word, the construction of projection matrix is non-adaptive. Observing that

different kind speech frames have different intra-frame correlations, a frame-based

adaptive compressed sensing framework for speech signals has been proposed , which

applies adaptive projection matrix. To do so, the neighbouring frames are compared to

estimate their intra-frame correlation, every frame is classified into different

categories, and the number of projections are adjusted accordingly.

The experimental results show that the adaptive projection matrix

can significantly improve the speech reconstruction quality. Intra-frame correlation of

speech signals is explored to achieve efficient sampling. Because different kind

speech signals may have different intra-frame correlations a frame-based adaptive CS

framework that uses different sampling strategies in different kind speech frames, has

been proposed.

5.2 Frame Analysis

Each speech sequence is divided into non-overlapping frames of size 1x n and

all frames in a speech sequence are processed independently. The projection matrix is

initialised by Gaussian random matrix Φ which has been proven to be incoherent with

most sparse basises at high probability.



Figure 5.1: The frame-based adaptive CS framework for speech

As shown in Figure 5.1, for each frame in a speech sequence, a small number

of projections is collected, and compared these projections with the projections

collected for the previous frame. Based on the comparison results, the correlation

between these two frames is estimated , and the correlation is classified into different

categories. Then the sampling strategy is adjusted according to the correlation type

and different number of samples for the current frame are collected.

For the current t th frame original speech signal, we represent it as X(t)

its previous frame t-1 is represented using x(t-1) The difference between X(t) and x(t-

1) reflects the correlation between the two neighbouring frames, and can be used to

classify the correlation. We use the collected measurements to estimate the correlation

instead. The same projection matrix Φ is applied to all frames in the partial sampling

stage, and we have y(t)-y(t-1)= Φx(t)- Φx(t-1), where y(t) and y(t-1) are the the

projection vectors of x(t) and x(t-1) respectively. As each sample in y(t)-y(t-1) is a

linear combination of x9t)-x(t-1) the difference between the two projection vectors

also reflects the intensity changes in the two frame.

Therefore, we can estimate the amount of intensity changes in the two frames

using only a small number of projections. Let ΦM0 be a matrix containing the first

M0 rows of the Gaussian random matrix Φ. For the current frame t, we first use ΦM0

to collect M0 measurements y (t) ^ M0= ΦM0 .x(t) in the partial sampling stage.

Then, we compare it with the first M0 measurements in y (t-1) and calculate the

difference y (t) ^d=y (t) ^M0-y (t-1) ^M0. In the frame analysis module, given y (t)

^d, we calculate its l ^2 norm normalized by M0 and compare with two thresholds T1

and T2 (T1<T2).



If y (t) ^d/M0<=T1, the current frame is almost the same as its previous

frame.We consider the two neighbouring frames may be both surd and label the intra-

frame correlation as surd vs. surd. If T1< y (t) ^d/M0<=T2, it indicates that these two

neighbouring frames undergo small changes. In this situation, the two neighbouring

frames may be both sonant at high probability and the intra-correlation is labelled as

sonant vs. sonant. If y(t)^d/M0>T0, the two frames are significantly different from

each other, which is most likely due to the change of the frame type, and we label the

correlation as surd vs. sonant.

5.3 Partial Sampling

For each frame in a speech sequence, we first collect a small number of

projections, and compare it with the projections collected for the previous frame.

Based on the comparison results, we estimate the correlation between these two

frames, and classify the correlation into different categories. We then adjust the

sampling strategy according to the correlation type and collect different number of

samples for the current frame. The next sections discuss details of each step in the

above framework.

For the current t th frame original speech signal, we represent it as x t . Its

previous frame t-1 is represented using xt-1. The difference between xt and xt -1

reflects the correlation between the two neighboring frames, and can be used to

classify the correlation. Since xt- xt-1 is not available at the sampling stage. We use the

collected measurements to estimate the correlation instead. The same projection

matrix Φ is applied to all frames in the partial sampling stage, and we have y t-yt-1=φxt-

φxt-1 , where yt and yt-1 are the projection vectors of xt and xt-1 respectively. As each

sample in is a linear combination of 1 tt the difference between the two projection

vectors also reflects the intensity changes in the two frame. Therefore, we can

estimate the amount of intensity changes in the two frames using only a small number

of projections.

Let ΦMo be a matrix containing the first Mo rows of the Gaussian random

matrix Φ . For the current frame t, we first use ΦMo to collect Mo measurements ytMo

=ΦMoxt in the partial sampling stage. Then, we compare it with the first Mo

measurements in yt-1 and calculate the difference ytd=yt

Mo-yt-1Mo. In the frame analysis



module, given ydt, we calculate its l2 norm normalized by Mo and compare with two

thresholds T1 and T2 .

If ||yd||2/Mo<= T1 the current frame is almost the same as its previous frame.

We consider the two neighboring frames may be both surd and label the intra-frame

correlation as surd vs. surd. If T1 < ||yd||2/Mo<= T2 it indicates that these two

neighboring frames undergo small changes. In this situation, the two neighboring

frames may be both sonant at high probability and the intra-correlation is labeled as

sonant vs. sonant. If , ||yd||2/Mo >T1 the two frames are significantly different from

each other, which is most likely due to the change of the frame type, and we label the

correlation as surd vs. sonant. The correlation between the two neighbor frames, and

can be used classify the correlation.

5.4 Adaptive Sampling

Depending on their classified intra-frame correlation types, different number

of projections is used for the speech frames. We consider the frame as surd frame if

its intra-frame correlation type is surd vs. surd. A surd frame contains the least new

information in the speech. Thus, the M0 measurements collected in the partial

sampling stage are sufficient and we do not need additional sampling. When its intra-

frame correlation is sonant vs. sonant, the frame is considered as sonant and contains

some new information, which requires more measurements to be collected.

For such frames, we collect M1 (M1 >M0) measurements. We use the

(M0+1)th to the M1th rows of the Gaussian random matrix Φ and combine with M0

to generate the final projection vector yt . The frames that experience large changes

must contain the most new information. Therefore, we collect a total of M2

(M2>M1>M0) measurements during the sampling process. The total projection

matrix is the first M2 rows of the Gaussian random matrix Φ.



5.5 Reconstruction

The original signal is reconstructed from a significant small numbers of

samples by using different optimization techniques such as l1-norm or convex

optimization. In this project we have used orthogonal matching pursuit (OMP) which

gives better results when compared to other optimization techniques. Reconstructing

frame by frame of signal using OMP method. OMP uses sub Gaussian measurement

matrices to reconstruct sparse signals. If Φ is such a measurement matrix, then Φ Φx∗

is in a loose sense close to the identity.

Therefore one would expect the largest coordinate of the observation vector y

= Φ Φx to correspond to a non-zero entry of x. Thus one coordinate for the support∗

of the signal x is estimated. Subtracting of that contribution from the observation

vector y and repeating eventually yields the entire support of the signal x. OMP is

quite fast, both in theory and in practice, but its guarantees are not as strong as those

of Basis Pursuit. The algorithm’s simplicity enables a fast runtime.

The algorithm iterates s times, and each iteration does a selection through d

elements, multiplies by Φ Φx and solves a least squares problem. Reconstruction∗

algorithms compute the support of the sparse signal x iteratively. Once the support of

the signal is compute correctly, the pseudo-inverse of the measurement matrix

restricted to the corresponding columns can be used to reconstruct the actual signal x.

The clear advantage to this approach is speed, but the approach also presents new

challenges.

5.5.1 Orthogonal Matching Pursuit

Orthogonal Matching Pursuit (OMP), put forth by Mallrat and his

collaborators and analyzed by Gilbert and Troop .OMP uses sub Gaussian

measurement matrices to reconstruct sparse signals. If Φ is such a measurement

matrix, then Φ Φ is in a loose sense close to the identity. Therefore one would expect∗

the largest coordinate of the observation vector y = Φ Φx to correspond to a non-zero∗

entry of x. Thus one coordinate for the support of the signal x is estimated.

Subtracting o that contribution from the observation vector y and repeatingff

eventually yields the entire support of the signal x.



OMP is quite fast, both in theory and in practice, but its guarantees are not as

strong as those of Basis Pursuit.The algorithm’s simplicity enables a fast runtime. The

algorithm iterates s times,and each iteration does a selection through d elements,

multiplies by Φ Φ, and solves a least squares problem. The selection can easily be∗

done in O(d) time, and the multiplication of Φ Φ in the general case takes O(md).∗

When Φ is an unstructured matrix, the cost of solving the least squares problem is

O(s2d). However, maintaining a QR-Factorization of Φ|I and using the modified

Gram-Schmidt algorithm reduces this time to O(|I|d) at each iteration.

Using this method, the overall cost of OMP becomes O(smd). In the case

where the measurement matrix Φ is structured with a ast-multiply, this can clearly be

improved.

5.5.2. Stagewise Orthogonal Matching Pursuit

An alternative greedy approach, Stagewise Orthogonal Matching Pursuit

(StOMP) developed and analyzed by Donoho and his collaborators, uses ideas

inspired by wireless communications. As in OMP, StOMP utilizes the observation

vector y = Φ u.Where u = Φx is the measurement vector. However, instead of simply∗

selecting the largest component of the vector y, it selects all of the coordinates whose

values are above a specified threshold. It then solves a least-squares problem to update

the residual. The algorithm iterates through only a fixed number of stages and then

terminates, whereas OMP requires s iterations where s is the sparsity level.The

thresholding strategy is designed so that many terms enter at each stage, and so that

algorithm halts after a fixed number of iterations.

The formal noise level σk is proportional the Euclidean norm of the

residual at that iteration. This method appears to provide slightly weaker results. It

appears however,that StOMP outperforms OMP and Basis Pursuit in some cases.

Although the structure of StOMP is similar to that of OMP, because StOMP selects

many coordinates at each state, the runtime is quite improved. Indeed, using iterative

methods to solve the least-squares problem yields a runtime bound of CNsd + O(d),

where N is the fixed number of iterations run by StOMP, and C is a constant that

depends only on the accuracy level of the least-squares problem.



5.5.3 Regularized Orthogonal Matching Pursuit

As is now evident, the two approaches to compressed sensing each presented

disjoint advantages and challenges. While the optimization method provides

robustness and uniform guarantees, it lacks the speed of the greedy approach. The

greedy methods on the other hand had not been able to provide the strong guarantees

of Basis Pursuit. This changed when we developed a new greedy algorithm,

Regularized Orthogonal Matching Pursuit , that provided the strong guarantees of the

optimization method.

This work bridged the gap between the two approaches, and provided the first

algorithm possessing the advantages of both approaches. Regularized Orthogonal

Matching Pursuit (ROMP) is a greedy algorithm, but will correctly recover any sparse

signal using any measurement matrix that satisfies the Restricted Isometry Condition.

Again as in the case of OMP, we will use the observation vector Φ Φx as a good∗

local approximation to the s-sparse signal x. Since the Restricted Isometry Condition

guarantees that every s columns of Φ are close to an orthonormal system, we will

choose at each iteration not just one coordinate as in OMP, but up to s coordinates

using the observation vector. It will then be okay to choose some incorrect

coordinates, so long as the number of those is limited.

To ensure that we do not select too many incorrect coordinates at

each iteration, we include a regularization step which will guarantee that each

coordinate selected contains an even share of the information about the signal. We

remark here that knowledge about the sparsity level s is required in ROMP, as in

OMP. Clearly in the case where the signal is not exactly sparse and the signal and

measurements are corrupted with noise, the algorithm as described above will never

halt. Thus in the noisy case, we simply change the halting criteria by allowing the

algorithm iterate at most s times, or until |I| ≥ s. We show below that with this

modification ROMP approximately reconstructs arbitrary signals.



5.6 Compressed Sensing Using DCT

The Algorithm for the compressed sensing by applying discrete cosine

transform to make the signal sparse is as given below

1. Expand the signal in cosine basis

ψ j(t)=n1 /2cos (2Πjt/n), t=0,1,...............,n-1

and obtain coefficient vector x.

2. Choose the parameter M to obtain a new observation vector y by

Correlating the signal

3. Create a new sensing matrix x and solve the L1 norm equation for the

same.

4. Get the estimated coefficient vector s* and the recovered signal y.

5. Compare y with x to check whether the reconstruction is exact.

6. if not, go back to step 2 and increase the number of M.

Conventionally, after acquisition of a scene, Discrete Cosine Transform

(DCT) is performed on the image using values assigned to each pixel. After DCT,

many coefficients will be zero or will carry negligible energy; these coefficients

are discarded before quantization or/and entropy coding. Hence, though each

frame of the image is acquired fully, much of the acquired information is

discarded after DCT causing unnecessary burden on the acquisition process. This

makes compressive sampling a good candidate for digital image and video

applications, where the Nyquist rate is so high that compressing the sheer

volume of samples is a problem for transmission or storage.



Applying compressive sampling to the whole image is ineffective. In order

to exploit any sparsity within an image, we split the image into small non-

overlapping blocks of equal size. Compressive sampling is then performed only on

blocks determined to be sparse, i.e., we exploit local sparsity within an image. Note

that, in real-time acquisition, it is not possible to test for sparsity of a block

before sampling.

Reference frames are sampled fully, and DCT is applied on each of the B

blocks. We select Bs sparse blocks in the following manner. Let C be a small

positive constant, and T an integer threshold that is representative of the

average number of non-significant DCT coefficients over all blocks. If the

number of DCT coefficients in the block whose absolute value is less than C is

larger than T, the block is selected as a reference for compressive sampling.

Consider a real-valued, finite-length, one-dimensional, discrete-time signal x,

which we view as an N × 1 column vector in with elements x[n], n = 1, 2, . . . , N. We

treat an image or higher dimensional data by vectorizing it into a long one-

dimensional vector. Any signal in can be represented in terms of a basis of N

×1 vectors { Φi} =1. For simplicity, assume that the basis is orthonormal. Forming

the N × N basis matrix Ψ = [ Φ1 | Φ2 | . . . | Φn ] by stacking the vectors { Φi } as

columns, we can express any signal x as

x=∑i=1

N

siψ i

where s is the N × 1 column vector of weighting coefficients s1= <x, Φi

> ΦiX= and where denotes the (Hermitian) transpose operation. Clearly, x and s

are equivalent representations of the same signal, with x in the time domain and s in

the Ψ domain. We will focus on signals that have a sparse representation, where x is a

linear combination of just K basis vectors, with K << N. That is, only K of the in

above equation are nonzero and (N − K) are zero.

Sparsity is motivated by the fact that many natural and manmade signals

are compressible in the sense that there exists a basis Ψ where the representation in

above equation has just a few large coefficients and many small coefficients.



Compressible signals are well approximated by K-sparse representations; this is

the basis of transform coding.

For example, natural images tend to be compressible in the discrete cosine

transform (DCT) and wavelet bases on which the JPEG and JPEG-2000 compression

standards are based. Audio signals and many communication signals are compressible

in a localized Fourier basis.

The main idea of compressive sampling is to remove this “sampling

redundancy” by needing only M samples of the signal, where K < M ≪N. Let y

be an M-length measurement vector given by y= Φs , where Φ is an M×N

measurement matrix. The above expression can be written in terms of s as

Y=Φψs

The signal x can be recovered lossless from M ≈ K or slightly more

measurements if the measurement matrix Φ is properly designed, so that ΦΨ

satisfies the so-called restricted isometric property . This will always be true if Φ and

Ψ are incoherent, that is, the vectors of Φ cannot sparsely represent basic vectors and

vice versa.



CHAPTER-6

APPLICATIONS OF COMPRESSED SENSING

Compressed sensing finds many applications almost in every signal processing

system and communication system due to its efficient signal recovery capability.

Sparse signals arise in practice in very natural ways, so compressed sensing lends it

well to many settings. Some of the applications are given below

6.1 Imaging

Many images are sparse with respect to some basis. Because of this, many

applications in imaging are able to take advantage of the tools provided by

Compressed Sensing. The typical digital camera today records every pixel in an

image before compressing that data and storing the compressed image. Due to the use

of silicon, everyday digital cameras today can operate in the megapixel range. A

natural question asks why we need to acquire this abundance of data, just to throw

most of it away immediately. This notion sparked the emerging theory of

Compressive Imaging. In this new framework, the idea is to directly acquire random

linear measurements of an image without the burdensome step of capturing every

pixel initially. Several issues from this of course arise. The first problem is how to

reconstruct the image from its random linear measurements.

The solution to this problem is provided by Compressed Sensing. The next

issue lies in actually sampling the random linear measurements without first acquiring

the entire image. Compressive sampling camera consists of a digital micro mirror

device (DMD), two lenses, a single photon detector and an analog-to-digital (A/D)

converter. The first lens focuses the light onto the DMD. Each mirror on the DMD

corresponds to a pixel in the image, and can be tilted toward or away from the second

lens.

Since this camera utilizes only one photon detector, its design is a stark

contrast to the usual large photon detector array in most cameras. The single-pixel

compressive sampling camera also operates at a much broader range of the light

spectrum than traditional cameras that use silicon.



For example, because silicon cannot capture a wide range of the spectrum, a

digital camera to capture infrared images is much more complex and costly.

Compressed Sensing is also used in medical imaging, in particular with magnetic

resonance (MR) images which sample Fourier coe cients of an image. MR imagesffi

are implicitly sparse and can thus capitalize on the theories of Compressed Sensing.

Some MR images such as angiograms are sparse in their actual pixel representation,

whereas more complicated MR images are sparse with respect to some other basis,

such as the wavelet Fourier basis.

MR imaging in general is very time costly, as the speed of data collection is

limited by physical and physiological constraints. Thus it is extremely beneficial to

reduce the number of measurements collected without sacrificing quality of the MR

image. Compressed Sensing again provides exactly this, and many Compressed

Sensing algorithms have been specifically designed with MR images in mind.

The common approach in digital image system is to capture as many pixels as

possible and later compress the captured image by digital means. Compression is

desired to increase the storage capacity and enhance the communication process.

Compression techniques exploit the visual redundancy typical to human intelligible

images. After capturing the optical image and applying data compression, the image is

represented by a smaller number of pixels than the original image. The decompressed

image should satisfy some desired visual quality.

6.2 Analog to Information Conversion

Analog to-digital converters (ADC) have been used in sensing and

communications due to the advancement in digital signal processing. The process of

ADC is based on the Nyquist sampling theorem which uniformly samples the signal

with a rate of at least twice its bandwidth in order to reconstruct the signal perfectly.

Emerging applications like radar detection and ultra-wideband communication are

pushing the limit of ADC. The recent developments in the field of compressive

sensing (CS), has helped in the design of Analog to-Information converters (AIC) that

are able to acquire samples at a lower sampling rate.



6.3 Compressive Radar

One additional application is Compressive Radar Imaging. A standard radar

system transmits some sort of pulse (for example a linear chirp), and then uses a

matched filter to correlate the signal received with that pulse. The receiver uses a

pulse compression system along with a high-rate analog to digital (A/D) converter.

This conventional approach is not only complicated and expensive, but the resolution

of targets in this classical framework is limited by the radar uncertainty principle.

Compressive Radar Imaging tackles these problems by discretizing the time-

frequency plane into a grid and considering each possible target scene as a matrix. If

the number of targets is small enough, then the grid will be sparsely populated, and

we can employ Compressed Sensing techniques to recover the target scene.

The new theory of compressive sensing can be used in radar imaging systems

which are designed to determine the range, altitude, direction and speed of moving

and fixed objects. The received radar signal can be reconstructed with fewer

measurements by solving an inverse problem through a linear program or a Greedy

Algorithm. With the implementation of compressive sensing in radar systems, the

need for a pulse compression matched filter at the receiver side and the analog to-

digital conversion operating at high Nyquist rates can be eliminated. As a result, the

complexity and the cost of the receiver hardware is going to be greatly reduced.



6.4 Compressive Sensing in Mobile Communication System

Wireless Sensor Networks (WSN) is a very special topic in wireless

communications, because in WSNs the main constraint is power, unlike almost all

other wireless networks. The power constraint poses a serious restriction on the

design of the sensor nodes and the network layers (especially the physical and MAC

layers). A typical sensor node consists of the wireless transceiver, processor, sensor

and the power. The sensor node has a number of modes of operation associated with

the activity of its components which may be broadly divided as sleep and active

modes. In sleep mode the sensor typically consumes much less power than in active

mode.

In order to minimize the power consumption, the ratio of the active vs. sleep

time must be as minimal as possible. The lifetime of a node in a sensor network is of

utmost importance. Because, once the sensors are deployed it will be very difficult to

identify the ones whose power supply is dead and to maintain them (consider a WSN

for monitoring the traffic on a freeway), the sensor nodes cannot be recharged with

the same ease that a cell-phone (or other handheld devices) offers. There are also the

additional problems associated with ad-hoc networks, such as synchronization, hidden

nodes, exposed nodes, etc. that need to be solved in order to maximize the sleep time.

Also while in active mode, the node consumes considerable energy to transmit bits.

This is where CS comes in, it helps to reduce the amount of data that must

circulate through the network. An important point here is that, reducing the amount of

data will significantly reduce transmit power; because, number of collisions, bit errors

and retransmissions will all decrease as well. Compressed sensing (CS) theory

specifies a new signal acquisition approach, potentially allowing the acquisition of

signals at a much lower data rate than the Nyquist sampling rate. In CS, the signal is

not directly acquired but reconstructed from a few measurements. One of the key

problems in CS is how to recover the original signal from measurements in the

presence of noise. This dissertation addresses signal reconstruction problems in CS.



First, a feedback structure and signal recovery algorithm, orthogonal pruning

pursuit (OPP), is proposed to exploit the prior knowledge to reconstruct the signal in

the noise-free situation. To handle the noise, a noise-aware signal reconstruction

algorithm based on Bayesian Compressed Sensing (BCS) is developed. Moreover, a

novel Turbo Bayesian Compressed Sensing (TBCS) algorithm is developed for joint

signal reconstruction by exploiting both spatial and temporal redundancy. Then, the

TBCS algorithm is applied to a UWB positioning system for achieving mm-accuracy

with low sampling rate ADCs. Finally, hardware implementation of BCS signal

reconstruction on FPGAs and GPUs is investigated. Implementation on GPUs and

FPGAs of parallel Cholesky decomposition, which is a key component of BCS, is

explored. Simulation results on software and hardware have demonstrated that OPP

and TBCS outperform previous approaches, with UWB positioning accuracy is

improved. The accelerated computation helps enable real-time application of this

work.

By using compressive sensing techniques, the speech signal is pre coded at the

transmitter side which is being sent to the receiver through a wireless channel. As a

result, a small number of samples are being transmitted, and this will increase the

transmission data rates when compared to the current communication systems. In the

proposed communication system, first the speech signal is modelled in such a way

that the input signal is sparse enough before applying compressive sensing. The

sparse signal is multiplied by the predefined measurement matrix.

Figure 6.1: Compressive sensing in a mobile communication

system



The output of the compressive sensing module is then transmitted to the

receiver. At the receiver, the signal is perfectly reconstructed from a significant small

number of measurements by using different optimization techniques such as l1-norm

or convex optimization.



CHAPTER -7

SIMULATION RESULTS

To compare the performance of this proposed adaptive compressed sensing

and the conventional non-adaptive CS, some experiments are conducted. As a part of

that, an arbitrary speech signal has been chosen which is 16 kHz sampled and 16 bits

quantized for each sample. Adaptive CS and CS sampling and reconstruction are

performed frame by frame, with a frame length of N=320 samples. Threshold values

T1 and T2 are chosen as 0.08 and 0.4 respectively which is tested through a great

number of experiments. Average Frame Signal to Noise Ratio (AFSNR) is calculated

and used to evaluate the reconstruction quality of speech signal. Average Frame

Signal to Noise Ratio (AFSNR) is calculated using the formula shown below

AFSNR=1/K∑k=1

K

10 log10¿¿

Where K is the total frame number of a speech sequence X k and x̂krepresent

the k thframe speech and the k th frame reconstructed speech. Under different

compressed ratio(r=0.2, r=0.4 and r=0.6), which is defined as r=M/N, the different

test results are obtained based on the proposed frame-based adaptive CS using OMP

reconstruction algorithm.



7.1 Compression Ratio r=0.2

Here the compression ratio r is equal to 0.2 which indicates that the number of

projections M=64 for the frame of samples N = 320.The below figures 7.1 shows the

time domain waveform of the original speech signal and adaptive CS reconstructed

speech with compressed ratio of 0.2.

0 1 2 3 4 5 6 7

x 104

-1

-0.5

0

0.5

1reconstruction signal

r=0.2

0 1 2 3 4 5 6 7

x 104

-1

-0.5

0

0.5

1original signal

Figure 7.1: Original and reconstructed signals for N=320, M=64, r=0.2

The average frame signal to noise ratio for compression ratio r=0.2 with

adaptive projection matrix is 12.030 which is 4.970 with non adaptive projection

matrix. Thus the AFSNR can be increased by adaptive projection matrix.

.



7.2 Compression Ratio r=0.4




speech with compressed ratio of 0.4

0 1 2 3 4 5 6 7

x 104

-1

-0.5

0

0.5


r=0.4

0 1 2 3 4 5 6 7

x 104

-1

-0.5

0

0.5

1original signal

Figure 7.2: Original and reconstructed signals for N=320, M=128, r=0.4



matrix. Thus the AFSNR can be increased by adaptive projection matrix. From the

above figure, it is observed that, when the compressed ratio r=0.4, the quality of the

reconstructed signal has been increased than the reconstructed signal obtained from

compressed ratio r=0.2.



7.3 Compression ratio r=0.6




speech with compressed ratio of 0.6

0 1 2 3 4 5 6 7

x 104

-1

-0.5

0

0.5


r=0.6

0 1 2 3 4 5 6 7

x 104

-1

-0.5

0

0.5

1original signal

Figure7.3: Original and reconstructed signals for N=320, M=192, r=0.6



matrix. Thus the AFSNR can be increased by adaptive projection matrix. From the

above figure, it is observed that, when the compressed ratio r=0.6, the quality of the

reconstructed signal has been increased than the reconstructed signal obtained from

compressed ratio r=0.2 and r=0.4.



7.4 Comparison with the Non-Adaptive Process

The results obtained by using adaptive projection matrix to implement

the compressed sensing are compared with that of conventional non adaptive

compressed sensing, for the three compressed ratios r=0.2, r=0.4, r=0.6.

Below figure shows the reconstruction quality

of speech signal based on our proposed frame-based adaptive CS and the conventional

non-adaptive CS. From the fig 7.4 it can be observed that reconstruction quality is

improved when adaptive projection matrix is used. For example, when the

compressed ratio is 0.2, the AFSNR increases more than 7dB.

Figure 7.4: Reconstructed speech quality using the proposed adaptive CS and non-

Adaptive Compressed sensing.



Table 7.1 Gives the Average Frame Signal to Noise Ratio (AFSNR) of the

reconstructed speech based on our proposed frame-based adaptive CS and the

conventional non-adaptive CS. Under each different compressed ratio, we can see that

reconstructed speeches using the proposed adaptive frame have higher AFSNR than

that using non-adaptive frame.

Table 7.1: MOS of the reconstructed speech using non-adaptive CS and the adaptive

CS

Thus it is clearly observed that the proposed adaptive compressed sensing is

more efficient than the conventional non adaptive compressed sensing.



CHAPTER-8

CONCLUSION AND FUTURE SCOPE

The adaptive projection matrix has been applied to the conventional

compressed sensing and improved the average frame signal to noise ratio. It is also

proved that the quality of the reconstructed signal increases as the compressed ratio

increases. Thus the conventional non adaptive compressed sensing can be replaced by

the adaptive compressed sensing to improve the efficiency of the system. During the

design process, this module went through different tests and analysis in order to find

the most adequate optimization technique to reconstruct the speech signal with few

random measurements without losing the information. For simulation purposes, code

was created in order to compress the speech signal below the Nyquist rate by taking

only a few measurements of the signal.

The result shows that by keeping the length of the signal (L) and threshold

window (Th) constant the desired compression of the signal can be achieved by

making the signal sparse (K) to a certain amount which in turn increases the data

rates. The speech signal was reconstructed without losing important information in

order to achieve an increase in the data rates. After multiple simulations, it was found

that the system worked as expected and the speech signal was reconstructed

efficiently with a minimum error. Performance of compressive sensing is better when

compared to wavelet compression as there is a minimum error with same compression

rate using different parameters.

In this research the design of a new signal acquisition system using adaptive

compressive sensing has been implemented. The proposed system should fulfil the

accurate reconstruction of the speech signal. Different transformations need to be

tested in order to find the most efficient one for this application Design and

measurement matrix that will be optimum for speech signals.





REFERENCES

[1] Donoho D L. Compressed sensing [J]. IEEE Transaction on information

theory,

2006,52(4):1289-1306

[2] Donoho D,Tsaig Y. Extensions of compressed sensing [J]. Signal

Processing,

2006,86(3):533-548.

[3] Scott S. chen,David L. Donoho,Michael A.Saunders. Atomic

decomposition by

Basis pursuit [J]. 2001 Society for Industrial and Applied Mathematics,

SIAM

Review,43(1):129-159.

[4] J. Tropp, A. Gilbert. Signal recovery from random measurements via

orthogonal

Matching pursuit. IEEE Transaction on Information Theory [J], 2007,

53(12),4655- 4666.

[5] Zh. M Wang, G. R. Arcet, J. L. Paredest. Colored random projections for

compressed sensing. ICASSP,2007: 873-876

[6] Zhaorui Liu, Vicky Zhao,A. Y. Elezzabi. Block-based adaptive

compressed

sensing for video. Proceedings of 2010 IEEE 17 th international

conference on

image

proceeding,2010,1649-1652.

[7] Jarvis Haupt,Robert Nowak,Rui Castro.Adaptive sensing for sparse

signalrecovery.ICASSP,2009,702-707.



[8] R.Gribonval, M. Nielsen. Sparse representations in unions of bases. IEEE

Trascation on Information Theory, 2004,49(12),3320-3325.

[9] J. A. Tropp. Greed id good: Algorithmic results for sparse approximation.

IEEE

Transaction on Information Theory, 2004, 50(10), 2231-2242.



Appendix A: MATLAB code for sampling the speech signal in mobile system

using

Compressed sensing

clc;

clear all;

%Fs Hz (samples per second) is the rate at the speech signal is sampled

Fs=2000;

x=wavread('test.wav');

figure(1)

stem(x)

title('Recorded input speech signal');

xlabel('Length of the input speech signal');

ylabel('Amplitude of the input speech signal');

%Discrete cosine transform of the recorded signal

a0=dct(x)

figure(2)

stem(a0)

axis([0 2000 -1 1]);

title('Discrete cosine transform of the recorded signal');

xlabel('Length of the DCT spectrum');

ylabel('Amplitude of the DCT spectrum');

% Thresholding the spectrum to make it sparse

for i=1:1:2000;

if a0(i,1)<=0.04 && a0(i,1)>=-0.06



a0(i,1)=0;

else

a0(i,1)=a0(i,1);

end

end

a0;

figure(3)

stem(a0)

axis([0 2000 -1 1]);

title('The Threshold spectrum');

xlabel('The length of the threshold spectrum');

ylabel('Amplitude of the threshold spectrum');

% Sparsity of the spectrum(K)and Length of the signal (N)

K=800

N=2000

% Random measurement matrix

disp('Creating measurment matrix...');

A = randn(K,N);

A = orth(A')';

figure(4)

imagesc(A)

colorbar;

colormap('lines');

title('Random Measurement matrix');

disp('Done.');



% observations vector

y = A*a0;

figure(5)

plot(y)

title('Observation Vector');

%initial guess = min energy

x0 = A'*y;

%solve the LP

tic

xp = l1eq_pd(x0, A, [], y, 1e-2);

toc

figure(6)

plot(xp)

axis([0 2000 -0.6 0.6]);

title(' Reconstructed Spectrum using l1-minimization');

% Inverse dicrete cosine transform of reconstructed signal (IDCT)

Xrec=idct(xp)

wavplay(Xrec,Fs)

figure(7)

stem(Xrec)

title('Reconstructed signal at the receiver');

xlabel('Length of the reconstructed signal using IDCT');

ylabel('Amplitude of the reconstructed signal using IDCT');

% Calculating Absolute error between the reconstructed and actual signal

err=(max(abs(Xrec-x))) stem(err);



title(' Absolute Error of Reconstructed spectrum and Threshold spectrum ');

xlabel('Length of the Maximum Absolute Error');

ylabel('Maximum Absolute error')



Appendix B:MATLAB code for compressing the test signal using wavelet

compression

% Load original one-dimensional signal.

Fs=2000

s=wavread('test.wav')';

figure(1)

stem(s)

title('Input speech signal');

xlabel('Length of the input speech signal');

ylabel('Amplitude of the input speech signal');

l_s = length(s);

% Wavelet transform of input signal

[cA1,cD1] = dwt(s,'db1');

%To extract the Level-1 Approximation and Detail coefficient

A1 = idwt(cA1,[],'db1',l_s);

D1 = idwt([],cD1,'db1',l_s);

figure(2)

subplot(1,2,1); plot(A1); title('Approximation A1')

subplot(1,2,2); plot(D1); title('Detail D1')

%Inverse Wavelet transform of Approximation and detail coefficient

A0 = idwt(A1,D1,'db1',l_s);

wavplay(A0,Fs)

figure(3)

stem(A0)



title('Recontructed speech signal');

xlabel('Length of the reconstructed speech signal');

ylabel('Amplitude of the reconstructed speech signal');

err = max(abs(s-A0))