R.V. COLLEGE OF ENGINEERING, Bangalore-560059
(Autonomous Institution Affiliated to VTU, Belgaum)
“SOUND SOURCE LOCALIZATION USING
LabVIEW”
PROJECT REPORT 2011-12
Submitted by
1. JAGRITI R 1RV08IT058
2. SHREE VARDHAN SARAF 1RV08IT061
Under the Guidance of
Mr. HARSHA HERLE, Assistant Professor
Department of Instrumentation Technology, RVCE
In partial fulfillment for the award of degree of
Bachelor of Engineering
in
INSTRUMENTATION TECHNOLOGY
R.V. COLLEGE OF ENGINEERING, BANGALORE - 560059 (Autonomous Institution Affiliated to VTU, Belgaum)
DEPARTMENT OF INSTRUMENTATION TECHNOLOGY
CERTIFICATE
Certified that the project work titled 'Sound Source Localization Using LabVIEW' is
carried out by Jagriti R (1RV08IT058) and Shree Vardhan Saraf (1RV08IT061), who are
bona fide students of R.V. College of Engineering, Bangalore, in partial fulfillment for
the award of the degree of Bachelor of Engineering in Instrumentation Technology of
the Visvesvaraya Technological University, Belgaum, during the year 2011-2012. It is
certified that all corrections and suggestions indicated for the internal assessment have
been incorporated in the report deposited in the departmental library. The project
report has been approved as it satisfies the academic requirements in respect of
project work prescribed by the institution for the said degree.
Signature of Guide: Signature of Head of Department: Signature of Principal
External Viva
Name of Examiners Signature with date
1
2
R.V. COLLEGE OF ENGINEERING, BANGALORE - 560059 (Autonomous Institution Affiliated to VTU, Belgaum)
DEPARTMENT OF INSTRUMENTATION TECHNOLOGY
DECLARATION
We, Jagriti R (1RV08IT058) and Shree Vardhan Saraf (1RV08IT061), students of
eighth semester B.E., Instrumentation Technology, hereby declare that the project
titled "Sound source localization using LabVIEW" has been carried out by us and
submitted in partial fulfillment for the award of the degree of Bachelor of
Engineering in Instrumentation Technology. We further declare that this work has
not been submitted by any other student for the award of a degree in any other branch.
Place: Bangalore Names Signature
Date: 1. Jagriti R
2. Shree Vardhan
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any endeavour would
be incomplete without the mention of the people who made it possible and whose
constant support, encouragement and guidance have been a source of inspiration
throughout the course of this project.
We thank our internal guide Mr. Harsha Herle, Assistant Professor, Instrumentation
Technology, R.V College of Engineering for his endearing help and guidance.
We express our heart-felt gratitude to Mr. Rohit Pannikar, Manager, Applications
engineering division and Mr. Rajshekhar, Staff Applications engineer, National
Instruments India for providing a very congenial work environment and for their
expert supervision that enabled us to complete this project successfully in the given
duration.
We would like to thank Dr. Prasanna Kumar S.C., Professor and Head of
Department of Instrumentation Technology, R.V College of Engineering, Bangalore
for his encouragement and support.
We would like to thank Prof. B.S. Satyanarayana, Principal, R.V College of
Engineering for his constant support.
Finally, we thank one and all involved directly or indirectly in successful completion
of the project.
ABSTRACT
The problem of locating a sound source in space has received growing interest. The
human auditory system uses several cues for sound source localization, including
time- and level-differences between ears, spectral information, timing analysis,
correlation analysis, and pattern matching. Similarly, a biologically inspired sound
localization system can be built by making use of an array of microphones, which are
hooked up to a computer.
Methods for determining the direction of incidence are available, based on sound
intensity, the phase of cross-spectral functions, cross-correlation functions, and
frequency-selection algorithms. Sound source localization finds applications in
military systems, camera pointing in video-conferencing environments, beamformer
steering for robust speech recognition systems, etc.
There is no universal solution for accurate sound source localization. Depending on
the object under study and the noise problem, the most appropriate technique has to
be selected. In this project we attempt to localize a single sound source by using four
microphones. The most practical acoustic source localization scheme is based on
time delay of arrival (TDOA) estimation. We implement generalized cross correlation
to find the time delay of arrival between microphone pairs. TDOA estimation with
microphone arrays exploits the phase information present in signals from spatially
separated microphones: the phase difference between the Fourier-transformed signals
is used to estimate the TDOA. The scheme is implemented using a four-element,
tetrahedron-shaped microphone array.
Once TDOA estimation is performed, the position of the source can be found through
geometrical calculations, deriving the source location by solving a set of
non-linear least-squares equations. The experimental results showed that the
direction of the source was estimated with high accuracy, while the range of the
source was estimated with moderate accuracy.
CONTENTS

Abstract i
List of Figures ii
List of Tables iii
List of Symbols, Acronyms and Nomenclature i

1. Chapter 1: Introduction 1
   1.1 Sound localization in Biology 2
   1.2 Sound localization: a signal processing view 3
   1.3 Problem statement 4
   1.4 Objective 4
   1.5 Overview of the Project 5
   1.6 Organization of Report 5
   1.7 Block Diagram and Description 6
2. Chapter 2:
   2.2.1 Types of microphone 11
   2.3 Microphone array 13
   2.4 Various Coherence Measures 14
3. Chapter 3: Design and Methodology 15
   3.1 Scenario 16
   3.2 Direction of Arrival Estimation 16
      3.2.1 The Geometry of the Problem 16
      3.2.2 Microphone array structure 17
      3.2.3 Time Delay of Arrival (TDOA) 18
      3.2.4 Algorithm to find Time Delay of Arrival 19
   3.3 Distance Estimation 21
      3.3.1 Source Localization in 2-Dimensional Space 21
      3.3.2 Hyperbolic position location 21
         3.3.2.1 General Model 22
         3.3.2.2 Position Estimation 23
   3.4 Hardware Design 24
      3.4.1 cDAQ-9172 24
      3.4.2 Analog input module - NI 9234 24
      3.4.3 Digital output module - NI 9472 24
   3.5 Assumptions and Limitations 25
4. Chapter 4: Implementation Overview 26
   4.1 Hardware and interfacing 27
   4.2 Overview of LabVIEW 27
      4.2.1 Front panel 28
      4.2.2 Block diagram 28
   4.3 Programming using LabVIEW 11 29
      4.3.1 Microphone signal interface using NI DAQ Assistant 29
      4.3.2 Threshold detection of each signal 32
      4.3.3 Finding time delay of arrival 34
      4.3.4 Direction and distance estimation 36
      4.3.5 Servo control 37
   4.4 System Hardware 37
      4.4.1 Microphone 38
      4.4.2 The microphone array 38
      4.4.3 Data Acquisition 39
         4.4.3.1 Modules 40
   4.5 Flow Chart 42
5. Chapter 5: Results and Discussion 43
   5.1 Experimental Setup 44
   5.2 Experiment 1: Time delay of arrival 45
   5.3 Experiment 2: Direction of arrival 46
   5.4 Experiment 3: Distance estimation 47
6. Chapter 6: Conclusion and Future Work 49
   6.1 Conclusion 50
   6.2 Future work 50
7. Chapter 7: Appendix 52
   7.1 Bibliography 53
   7.2 Snapshots of working model 55
   7.3 Datasheets 57
Given below (Fig. 4.9) is a snapshot of the cross correlation performed. The signal
received at each microphone is cross-correlated with the signal at the reference
microphone (microphone 1, at coordinate (0, 0, 0)).
Fig. 4.9 Generalized cross correlation
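The generalized cross correlation step can also be sketched outside LabVIEW. The following Python/NumPy fragment is a minimal GCC with phase-transform (PHAT) weighting; the 25.6 kHz rate follows the report, while the synthetic noise burst, function name and 25-sample delay are purely illustrative assumptions:

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Delay of `sig` relative to `ref`, in seconds, via generalized
    cross correlation with the phase transform (PHAT) weighting."""
    n = len(sig) + len(ref)                     # zero-pad against circular wrap
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                      # keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))  # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs            # pick the peak

fs = 25600                                      # 25.6 kHz, as in the report
rng = np.random.default_rng(0)
ref = rng.standard_normal(fs // 10)             # 0.1 s broadband burst (mic 1)
shift = 25                                      # hypothetical 25-sample delay
sig = np.concatenate((np.zeros(shift), ref[:-shift]))   # delayed copy (mic 2)
print(round(gcc_phat(sig, ref, fs) * fs))       # recovers the 25-sample delay
```

The PHAT weighting whitens the spectrum so that the correlation peak stays sharp even for narrowband or reverberant signals, which is why it is a common choice for TDOA estimation.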
Department of Instrumentation Technology Page 36
R V College of Engineering
4.3.4 Direction and distance estimation
Using the estimated time delay of arrival, specific algorithms are implemented to
estimate the position of sound source. Fig shows the direction estimation in 2D and
3D.
Fig. 4.10 Direction estimation
In Figure 4.10, label 1 indicates the coordinates of the microphones. At label 2,
these coordinates, along with the path-length differences, are solved as a linear
equation for the unknown direction vector indicating the direction of the sound
source. At label 3, all three components of the direction vector are extracted to
indicate the direction in 3-D, and the same is plotted on a 3-D graph. At label 4,
only two components of the direction vector are extracted to indicate the direction
in 2-D; the value obtained in radians is converted to degrees.
1. Coordinates of all five microphones
2. Solving the linear equation
3. Extracting all three elements of the direction vector and indicating the same on a 3-D graph
4. Extracting the first two elements of the direction vector and finding the direction in 2-D
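The linear solve for the direction vector can be sketched as follows. The microphone coordinates are those of Fig. 4.16; the far-field model (path difference equals the projection of the baseline on the direction vector), the function name and the speed-of-sound value are our assumptions, not the report's exact LabVIEW VI:

```python
import numpy as np

C = 343.0e2  # assumed speed of sound in cm/s; coordinates taken as cm

# Microphone coordinates from Fig. 4.16; mic 1 at the origin is the reference.
MICS = np.array([
    [0.0,    0.0,  0.0],   # mic 1 (reference)
    [50.0,  -35.0, 0.0],
    [100.0, -35.0, 0.0],
    [-50.0, -35.0, 0.0],
    [0.0,   -30.0, 35.0],
])

def direction_from_tdoas(tdoas):
    """Far-field direction of arrival from the TDOAs (s) of mics 2..5
    relative to mic 1: solve (m_i - m_1) . u = -c * tau_i for unit u."""
    A = MICS[1:] - MICS[0]
    b = -C * np.asarray(tdoas)
    u, *_ = np.linalg.lstsq(A, b, rcond=None)    # least-squares direction
    u /= np.linalg.norm(u)                       # normalize to a unit vector
    azimuth = np.degrees(np.arctan2(u[1], u[0])) # 2-D angle, radians to degrees
    return u, azimuth

# Synthetic check: generate TDOAs for a known direction and recover it.
u_true = np.array([1.0, 2.0, 0.5])
u_true /= np.linalg.norm(u_true)
tdoas = -(MICS[1:] - MICS[0]) @ u_true / C
u_est, az = direction_from_tdoas(tdoas)
print(np.allclose(u_est, u_true))                # True
```

Extracting all three components of `u_est` gives the 3-D direction; keeping only the first two and taking `arctan2` gives the 2-D direction in degrees, mirroring labels 3 and 4 above.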
As shown in Fig. 4.11, using the algorithm described in the design chapter, the
distance to the sound source can also be estimated using hyperbolic position
estimation, which again employs the time delay of arrival estimates to formulate
the equations.
Fig. 4.11 Distance estimation
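As a sketch of the hyperbolic scheme, the 2-D case can be written as below. This uses an iterative Gauss-Newton least-squares solver rather than the closed-form Chan-Ho estimator the report uses, and the array geometry and source position are hypothetical:

```python
import numpy as np

def locate_2d(mics, range_diffs, x0, iters=30):
    """Source position from range differences d_i = |x - m_i| - |x - m_1|
    via Gauss-Newton iteration on the hyperbolic equations.
    `mics` is (n, 2); `range_diffs` holds d_i for mics 2..n (same units)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = np.linalg.norm(x - mics, axis=1)          # distance to each mic
        res = (r[1:] - r[0]) - range_diffs            # hyperbola residuals
        J = (x - mics[1:]) / r[1:, None] - (x - mics[0]) / r[0]  # Jacobian
        step, *_ = np.linalg.lstsq(J, res, rcond=None)
        x = x - step
    return x

# Synthetic check with a planar 3-microphone array (units arbitrary, e.g. cm):
mics = np.array([[0.0, 0.0], [50.0, 0.0], [100.0, 0.0]])
src = np.array([30.0, 80.0])                          # hypothetical source
r = np.linalg.norm(src - mics, axis=1)
print(locate_2d(mics, r[1:] - r[0], x0=(20.0, 60.0))) # converges near (30, 80)
```

With a collinear array the hyperbolae intersect symmetrically on both sides of the microphone line, so the initial guess `x0` selects which half-plane the solution falls in.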
4.3.5 Servo control
Once the direction is found, a servo motor is used to indicate it. The duty cycle
of the servo motor and the direction values are interpolated to specify the rotation
of the servo motor for every degree (Fig. 4.12).
Fig. 4.12 Servo control
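The interpolation can be sketched as below; the 5-10% duty range (a 1-2 ms pulse in a 20 ms frame) is a typical hobby-servo calibration assumed here for illustration, not a value taken from the report:

```python
def angle_to_duty(angle_deg, duty_min=5.0, duty_max=10.0):
    """Map a direction in [0, 180] degrees to a PWM duty cycle in percent
    by linear interpolation, clamping out-of-range angles."""
    angle_deg = max(0.0, min(180.0, angle_deg))
    return duty_min + (duty_max - duty_min) * angle_deg / 180.0

print(angle_to_duty(0), angle_to_duty(90), angle_to_duty(180))  # 5.0 7.5 10.0
```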
4.4 System Hardware
Figure 4.13 shows the major components in the physical set up of our system. The
microphones are mounted on the array structure to collect the sound signals. These
signals are sent to the PC via the CompactDAQ. In the PC, a LabVIEW program
performs the processing and computation to obtain the direction and distance to
the sound source. We will now describe each of the components in greater detail.
Fig. 4.13 Set up
4.4.1 Microphone
The Panasonic RPVK21 (Fig. 4.14) is a dynamic, uni-directional microphone. It
features an 80 Hz to 12 kHz frequency response and 55 dB/mW sensitivity, which
ensures clear sound pickup. It comes with a built-in on/off switch that is easy
to operate and a 3-metre O.F.C. output cable.
Fig 4.14 Dynamic microphone
4.4.2 The microphone array
A stand for the microphones (Fig. 4.15) was constructed as per specifications,
allowing the height of the entire array to be adjusted from 1.5 to 2 metres. For
the purposes of this project, a baseline of 1.5 metres was used. The servo was
mounted below the microphones on the central axis.
The purpose of the servo motor was to indicate the direction of the sound source
on one half of the 2-D plane, i.e., 0-180 degrees.
Fig. 4.15 Microphone array
The coordinates of each microphone were fixed as shown in Fig. 4.16.
Fig. 4.16 Microphone coordinates
4.4.3 Data Acquisition
The cDAQ-9172 is an eight-slot NI CompactDAQ chassis that can hold up to eight C
Series I/O modules. It is connected to a Windows host computer over USB. NI
CompactDAQ serves as a flexible, expandable platform to meet the needs of any
electrical or sensor measurement system.
Microphone coordinates shown in Fig. 4.16: (0, 0, 0), (50, -35, 0), (100, -35, 0), (-50, -35, 0) and (0, -30, 35).
By placing instrumentation close to the test subject, electrical noise can be minimized
from the surroundings. This is because digital signals, used by USB are significantly
less susceptible to electromagnetic interference. Since the NI CompactDAQ is a small
rugged package, it can be easily placed close to the unit under test.
4.4.3.1 Modules
Analog input module - In our project, of the eight slots, we have utilized three
(slot 1, slot 2 and slot 5), as shown in the figure. Slots 1 and 2 are occupied by
two NI 9234 modules. The NI 9234 is an analog input module capable of simultaneous
acquisition. The five microphones were connected to five channels using BNC
connections, and the required signal conditioning is done within the modules
themselves. The maximum allowable sampling rate is 51.2 kHz per channel. We set
our sampling rate to half of that, i.e., 25.6 kHz per channel, since during testing
the maximum frequency component did not exceed 1000 Hz. A rate of 25.6 kHz was
therefore more than sufficient; oversampling beyond this resulted in excess data
and hence slower processing. After the signals are received by the analog input
modules, they are sent to LabVIEW for further processing, where the algorithms
already discussed are implemented. Once the direction and distance have been found,
they are displayed on the front panel as shown in Fig. 4.17.
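The sampling-rate choice can be sanity-checked in a few lines; the 51.2 kS/s module maximum and the 1 kHz signal band are from the report, the variable names are ours:

```python
module_max = 51200          # NI 9234 maximum sample rate, S/s per channel
fs = module_max // 2        # the rate used in the project: 25.6 kS/s
f_max = 1000                # highest dominant frequency observed in testing, Hz

assert fs >= 2 * f_max      # Nyquist criterion comfortably satisfied
print(fs / f_max)           # 25.6x the highest signal component
```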
Fig. 4.17 Front panel
Fig. 4.18 Hardware set up
Digital output module - Once the direction of the sound source is found, it is
indicated visually using a pointer mounted on a servo motor (Fig. 4.19).
As shown in the figure, the digital output module NI 9472 is placed in slot 5 of
the cDAQ chassis (as slots 5 and 6 are the counter slots). The direction of the
sound source is given to the digital module, which in turn drives the servo motor
with a duty-cycle input.
Fig 4.19 Indicator
Microphones 1 to 5 are connected to channels 0 to 3 of the first NI 9234 module and
channel 0 of the second, using standard BNC connectors. The digital output module in
slot 5 is connected to counter 0 of the cDAQ; the PWM output from its channel 3
drives the servo motor, and channels 8 and 9 supply a Vsup of +5 V.
4.5 Flow Chart
Signals from the five-microphone array -> generalized cross correlation -> pick a peak -> estimate TDOA -> calculate path differences -> position estimation (using the estimated path differences, the microphone locations and the dimensions of the coordinate system) -> estimated source location.
Chapter 5
RESULTS AND DISCUSSION
Experiments were performed, using the algorithms described in the previous chapter,
to gain insight into the operation of the system. The localization error for each
scenario was measured as the difference between the true angle, calculated from the
center of the array to the primary source, and the angle estimated from the time
delays. For this, it was assumed that the source was far away compared to the size
of the array, and that the source could therefore be taken to lie on a straight
line from the array. Under this assumption, errors were calculated for both the
azimuthal and altitudinal angles of incidence and for each time-delay estimation
routine implemented. By definition, the altitudinal angle may vary from +90 to -90
degrees, and the azimuthal angle from 0 to 180 degrees.
5.1 Experimental set up
The source localization routine was tested with sound recording experiments in a
laboratory, where we set up a fixed coordinate system. Four microphones were placed
at the tips of an imaginary tetrahedron whose sides are about 40 cm long. A fifth
microphone was placed as an extended arm of one of the microphones (Fig. 5.1). The
microphones were hooked up to a computer running a LabVIEW program, which saved the
signals from the five microphones. Several sound recording experiments were done by
placing a sound source at various locations in the laboratory.
Fig. 5.1 Array structure
We take both correlated noise and reverberation into account when generating our
test data. By setting a threshold, we eliminate the inherent noise and pick up the
most dominant sound in the room. The setup corresponds to a 6 m x 7 m x 2.5 m room,
with the five microphones placed at a distance from each other, 1 m from the floor
and 1 m from the 6 m wall (in relation to which they are centered). The sound
source is generated from different positions.

The sampling frequency is 25.6 kHz, and signals are acquired in blocks of about
10 k samples, i.e., every 0.4 seconds. The sound source is generated using an air
gun whose frequency range lies within 500-1000 Hz, so a 25.6 kHz sampling rate was
sufficient.
A number of complications limit the potential accuracy of the system. Some of these
are due to physical phenomena that can never be corrected, and others are due to
errors inherent in the processing, arising from the design of the system. As
mentioned in the introduction, complications in locating the sound source exist
outside of perfect conditions.
5.2 Experiment 1: Time delay of arrival
By picking the sharp peak produced by the cross correlation of microphone pairs,
the time delay of arrival can be estimated. Fig. 5.2 shows the time delay of
arrival between microphones 1, 2 and 3: 't1' is the extra time taken by the sound
signal to reach microphone 2, and similarly 't3' is the extra time taken to reach
microphone 3. Since the microphones are placed in a collinear fashion, multiplying
this time delay by the speed of sound yields the distance between the microphones.
We obtained this using generalized cross correlation, and it was found to be
accurate to within about +/-3 cm.
Fig. 5.2 Time delay of Arrival between microphone 1, 2, 3
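The conversion from time delay to distance used here is a single multiplication by the speed of sound. The sketch below also shows the distance resolution implied by one sample period at 25.6 kHz; the 343 m/s value is an assumed room-temperature speed of sound, not a figure from the report:

```python
C = 343.0        # assumed speed of sound, m/s
FS = 25600.0     # sampling rate from the report, Hz

def delay_to_distance_cm(delay_s):
    """Path difference (cm) corresponding to a time delay of arrival."""
    return delay_s * C * 100.0

print(delay_to_distance_cm(0.5 / C))   # 50.0 cm for a half-metre path difference
print(round(C * 100.0 / FS, 2))        # ~1.34 cm resolution per sample
```

A per-sample resolution on the order of 1.3 cm is consistent with the +/-3 cm accuracy reported above.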
5.3 Experiment 2: Direction of arrival
(Fig. 5.2 annotations: sound source; Mic 3, Mic 2, Mic 1; arrival times t, t + Δt1, t + Δt2.)
Once the time delay is estimated, it is used in a suitable algorithm, as explained
in the previous chapters, to find the direction of the sound source. For direction
in 2-D, consider the plane formed by microphones 1, 2 and 3 in Fig. 5.1. Table 5.1
shows the estimated direction of the source and the error when measured in 2-D. To
normalize the error on both sides, instead of considering the direction of the
sound source from 0 to 180 degrees, ranges of 0 to +90 and 0 to -90 degrees are
considered on either side. The same data is plotted as a graph in Figure 5.3.
Actual direction (deg)    Estimated direction (deg)    % Error
        10                           12                    20
        20                           23                    15
        30                           35                    10
        50                           46                     6
        80                           79                     3
        90                           90                     0
       -80                          -75                     4
       -50                          -45                     8
       -30                          -37                    11
       -20                          -22                    14
       -10                          -16                    19

Table 5.1
Figure 5.3 Error percentage versus actual direction (deg)
It can be observed that the direction finding is most accurate in the range of
80-100 degrees. Towards the extremes the accuracy falls, as the microphones are
unidirectional in nature; the signals are not picked up at their best when they
arrive from the side. For best results, the sound source should be located directly
in front of the microphone array. With omnidirectional microphones this constraint
could be removed, but keeping cost and availability in consideration, we decided
on unidirectional microphones.
5.4 Experiment 3: Distance estimation
For distance finding in 2-D, the microphone array consists of three microphones.
We conducted preliminary experiments with this three-element array. The experiments
involved acquiring signals from a sound source triggered by a suitable mechanism.
The source is located in a plane and its location is estimated using the planar
three-microphone array.

The source was positioned at various places in 2-D space. Table 5.2 presents the
true and estimated locations of the sound source. As mentioned in Chapter 3, Chan
and Ho's linear array optimization method is used to solve the nonlinear equations.
True distance (cm)    Indicated distance (cm)    % Error
       10                       13                 30
       50                       55                 10
       75                       79                 5.3
      100                      104                 4
      120                      125                 4.1
      150                      157                 4.6
      180                      189                 5
      210                      221                 5.23
      250                      275                 10

Table 5.2
The distance to the sound source was found in 2-D; Table 5.2 tabulates the readings
obtained. Varying levels of accuracy can be seen: the percentage error is larger
when the sound source is placed too close to the microphone array or beyond 2
metres, so a safe working range of 0.25 to 2 metres can be set.

The reason for this discrepancy is that when the sound source is too close to the
microphone array, the wavefront is noticeably spherical, while our project works
on the assumption that the sound signal travels in a straight manner, i.e., the
spherical nature of the sound signal is not taken into account. Secondly, if the
sound source is placed far away, the sound signal reaches the microphones in an
almost parallel manner, so the small time delay of arrival cannot be resolved.
Chapter 6
CONCLUSIONS AND FUTURE WORK
6.1 Conclusion
In this report we present an implementation of a sound-based localization technique
and introduce the platform we used in our lab. The report summarizes the basics of
sound-based localization as discussed in the literature, and the process of time
delay of arrival estimation is explained. The report then describes the design,
including the algorithms, the hardware, and the assumptions and limitations. The
implementation of the concept is explained in detail, and finally a comprehensive
set of experimental results is offered.

We find that in our current hardware deployment there are still many inevitable
errors in the time delay calculation. We proposed an algorithm using a
peak-weighted goal function that detects the sound source location in real time.
6.2 Future work
There are multiple factors which contribute towards errors in the sound-based
localization implementation. Future work will address reducing the impact of these
factors. These can be identified as follows:
(i) Different materials exhibit different reflection and absorption coefficients.
It has been observed that the material of the floor between the microphone pair
and the sound source affects the phase as well as the amplitude of the received
signal.
(ii) As the distance between the microphone pair and the sound source decreases,
the DOA estimates become coarser.
(iii) The position of sources of ambient noise in the room is important; it affects
the nature of the percentage-abnormality plot, causing it to become non-symmetric.
(iv) The position of reflective surfaces around the experimental setup contributes
to the fluctuations.
(v) Physical parameters such as speaker width and microphone sensitivity contribute
to measurement errors.
(vi) The frequency response of the microphone elements also affects the fidelity of
the captured signal.
(vii) The accuracy of the experimental setup and errors due to the elevation of the
microphone and sound source may also cause errors.
The hyperbolic position location techniques presented in this report provide a
general overview of the capabilities of the system. Further research is needed to
evaluate the dominant linear array algorithm for the hyperbolic position location
system. If improved TDOAs could be measured, the source position could be estimated
very accurately. Improving the performance of the TDOA measurement algorithm
reduces the TDOA errors; the algorithm discussed here for TDOA measurement is in
its simplest form.
Experiments were performed assuming that the source is stationary until all the
microphones have finished sampling the signals. Sophisticated multi-channel
sampling devices could be used to relax this stationarity condition. While the
accuracy of the TDOA estimate appears to be a major limiting factor in the
performance of the hyperbolic position location system, the performance of the
position location algorithms is equally important. Position location algorithms
that are robust against TDOA noise and provide an unambiguous solution to the set
of nonlinear range-difference equations are desirable. For real-time
implementations of source localization, closed-form solutions or iterative
techniques with fast convergence could be used. A trade-off between computational
complexity and accuracy exists for all position location algorithms; it can be
analysed through performance comparison of the closed-form and iterative
algorithms.

To find the time delay, only the most dominant peak of the correlation is
considered. Exploring the possibility of taking advantage of the second peak with
particle filtering should be pursued in order to obtain more reported sound source
location data.
Chapter 7
APPENDIX
BIBLIOGRAPHY
[1] Byoungho Kwon, Gyeongho Kim and Youngjin Park, "Sound Source Localization Methods with Considering of Microphone Placement in Robot Platform", Proc. 16th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2007), 26-29 Aug. 2007.
[2] Jean-Marc Valin, François Michaud, Jean Rouat and Dominic Létourneau, "Robust Sound Source Localization Using a Microphone Array on a Mobile Robot".
[3] Y. T. Chan and K. C. Ho, "A Simple and Efficient Estimator for Hyperbolic Location", IEEE Transactions on Signal Processing, Vol. 42, No. 8, Aug. 1994.
[4] Ralph Bucher and D. Misra, "A Synthesizable VHDL Model of the Exact Solution for Three-Dimensional Hyperbolic Positioning System", Volume 15 (2002), Issue 2, pp. 507-520.
[5] Don H. Johnson, Array Signal Processing: Concepts and Techniques.
[6] Lorraine Green Mazerolle and James Frank, "A Field Evaluation of the ShotSpotter Gunshot Location System".
[7] H. Wang and P. Chu, "Voice Source Localization for Automatic Camera Pointing System in Videoconferencing", Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, New Paltz, NY, USA, 1997.
[8] Biniyam Tesfaye Taddese, "Sound Source Separation and Localization", Honors Thesis in Computer Science, Macalester College, May 2006.
[9] Alessio Brutti, Maurizio Omologo and Piergiorgio Svaizer, "Comparison Between Different Sound Source Localization Techniques Based on a Real Data Collection", Proc. IEEE HSCMA 2008.
[10] M. Brandstein and H. Silverman, "A Practical Methodology for Speech Source Localization with Microphone Arrays", Technical Report, Brown University, November 1996.
[11] J. O. Pickles, "An Introduction to the Physiology of Hearing", Academic Press, London, 1982.