BLIND SOURCE SEPARATION OF SPEECH SIGNALS USING FILTER BANKS
by ANDREW J. PATTERSON, B.S.E.E., B.S.
A THESIS IN ELECTRICAL ENGINEERING
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
Approved August, 2002
ACKNOWLEDGMENTS
I would like to thank my advisor, Dr. Tanja Karp, for her patience and guidance
throughout this research. Her willingness to answer my "five minute questions" saved
me many hours of tedious labor. In addition, I would like to thank Dr. Sunandra
Mitra for providing the financial support that allowed me to concentrate on my thesis
and research and not where my next meal would be coming from. I am also thankful
for the guys, Cary, Cody, Jono, Mike, and Ryan for helping me remember that there is
more to life than work and school, most notably, cards, softball, and Josie's burritos!
I am very thankful for my wonderful parents whose unwavering love and support
is appreciated more than they could ever know. Thank you for teaching me diligence
and the value of a job well done.
Most of all I thank my loving wife for her support and understanding during
my late hours completing this thesis. Her love and encouragement kept me sane
throughout this paper.
CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
I. GENERAL INTRODUCTION
1.1 Psychoacoustics
II. FILTER BANKS
2.1 Introduction
2.2 Delay
2.3 Examples
2.4 Gain Effects on Signal Reconstruction
III. BLIND SOURCE SEPARATION
3.1 Introduction
3.2 Literature Review
3.3 Higher-Order Statistics Overview
3.4 Higher-Order Statistics Approaches
3.4.1 Second-Order Information
3.4.2 Higher-Order Approximations
3.5 Whitening
3.6 JADE
3.7 Example
IV. IMPLEMENTATION
4.1 Introduction and Results
4.2 Testing Setup
4.3 Results
4.3.1 Results With No Psychoacoustic Model
4.3.2 Subband Power Correction
V. SUMMARY AND CONCLUSION
5.1 Future Work
BIBLIOGRAPHY
ABSTRACT
The aim of this thesis is to improve the perceived quality of digital hearing aids
through the introduction of source segregation algorithms that attenuate undesired
speech and noise into the psychoacoustic model of the human ear. It proposes the
use of low-delay modulated filter banks to mimic the critical bands of the human ear
to accomplish blind source separation. This thesis will cover theory and results for
modulated filter banks and blind source separation.
LIST OF TABLES
4.1 Magnitude Square Error of Separated Signals for Figures 4.2 and 4.5
4.2 Magnitude Square Error of Separated Signals for Figures 4.4 and 4.5
LIST OF FIGURES
1.1 Threshold in Quiet
1.2 Critical Bandwidth versus Frequency
2.1 Block Diagram of a K-Decimated Filter Bank
2.2 Prototype Design with Linear Phase
2.3 Prototype Design with Raised Cosine
2.4 Raised-Cosine: M=4, N=16, D=7
2.5 Raised-Cosine: M=4, N=24, D=7
2.6 Raised-Cosine: M=4, N=24, D=15
2.7 Linear Phase: M=4, N=24, D=15
2.8 Analysis Filters from Prototype in Figure 2.4
2.9 Synthesis Filters from Prototype in Figure 2.4
3.1 Joint Distribution of Unmixed Signals s1, s2
3.2 Joint Distribution of Observed Mixed Signals x1, x2
3.3 Joint Distribution after Whitening the Mixtures
3.4 Example of BSS using JADE
4.1 Flow Diagram of the BSS Filter Bank Method
4.2 Separated Signal 1 using a 4-Channel Filter Bank with K = 1
4.3 Separated Signal 2 using a 4-Channel Filter Bank with K = 1
4.4 Separated Signal 1 using a 4-Channel Filter Bank with K = 2
4.5 Separated Signal 2 using a 4-Channel Filter Bank with K = 2
4.6 Separated Signal 1 using a 64-Channel Filter Bank with K = 32
4.7 Separated Signal 2 using a 64-Channel Filter Bank with K = 32
4.8 Separated Signal 1 using a 64-Channel Filter Bank with K = 32 and Subband Power Correction
4.9 Separated Signal 2 using a 64-Channel Filter Bank with K = 32 and Subband Power Correction
4.10 Analysis Filters for a Psychoacoustic Model of the Critical Band
4.11 Synthesis Filters for a Psychoacoustic Model of the Critical Band
4.12 Separated Signal 1 using a Critical Band Approximation Filter Bank with K = 4 and Subband Power Correction
4.13 Separated Signal 2 using a Critical Band Approximation Filter Bank with K = 4 and Subband Power Correction
CHAPTER I
GENERAL INTRODUCTION
Hearing impairment affects approximately 28 million people in the United States
and approximately ten percent of the total world population [1]. Hearing loss severity
can range from mild to so severe that communication is impossible without the use of
a hearing instrument. Regardless of the severity of the impairment, most hearing-impaired
persons have trouble understanding a single speaker in the presence of background
noise or a mixture of speech segments, the so-called cocktail party effect. This is due
to damage in the cochlea of the ear, which results in a broader frequency selectivity
and a reduced dynamic range of the ear. This damage causes a rapid increase in
the sensation of loudness and a less effective rejection of background noise. The aim
of this thesis is to improve the perceived quality of digital hearing aids through the
introduction of source segregation algorithms that attenuate undesired speech and
noise into the psychoacoustic model of the human ear.
Since concurrent speech segments from different speakers have similar spectral
properties, it is impossible to remove the undesired parts from the desired one through
spectral filtering. Thus more elaborate methods need to be used. Blind source separation
[5] is emerging as a technique that allows one to nearly perfectly segregate
different sources without any need for a priori knowledge of the statistical distributions
of the sources or the nature of the process by which the source signals were
combined. Instead, it is generally assumed that the sources are statistically independent
and that the mixing model is linear. Based on these assumptions, the separation
algorithm attempts to invert the mixing process.
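The linear mixing assumption can be made concrete with a small numerical sketch. The following Python fragment is illustrative only: the two synthetic sources, the mixing matrix `A`, and all variable names are assumptions introduced here, not data or code from this thesis. Because `A` is known in the sketch, the ideal unmixing matrix is simply its inverse; a blind algorithm has to estimate such a matrix from the mixtures alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical, statistically independent sources (one per row).
n = 1000
s = np.vstack([
    rng.laplace(size=n),            # heavy-tailed, loosely speech-like source
    rng.uniform(-1.0, 1.0, n),      # second independent source
])

# Hypothetical 2x2 mixing matrix: each microphone records a weighted sum
# of both sources (instantaneous linear mixing, x = A s).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s                           # observed microphone signals

# A separation algorithm would estimate W from x alone. Here A is known,
# so the ideal unmixing matrix W = A^{-1} is used purely for illustration.
W = np.linalg.inv(A)
s_hat = W @ x

# Largest deviation between the recovered and the original sources.
max_error = np.max(np.abs(s_hat - s))
```

When the mixing matrix is not known, the sources can in general be recovered only up to their ordering and scaling, since any permutation or rescaling of independent sources is again a set of independent sources.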
The drawbacks of current source separation algorithms are the slow speed of convergence,
the high computational complexity, and the number of microphones needed
(at least as many microphones as speakers). In the case of modern digital hearing aids
(e.g., Phonak Claro), two microphones are available to perform beam-forming. Using
the above approach, it is thus possible to segregate two sources. This is sufficient for
our application if all undesired/background speakers can be combined into one noise
source. However, instead of applying source separation directly to the broadband
signals captured from the microphones, this thesis proposes to incorporate the source
separation with a filter bank that mimics the psychoacoustic model of the human ear.
Experiments have revealed that the peripheral auditory system behaves as a bank of
filters with increasing bandwidth towards high frequencies [18]. Its time-frequency
behavior has been successfully modelled through a wavelet filter bank [6, 16]. The
advantages of source separation in the subband domain compared to algorithms based
on broadband speech signals are as follows:
• The frequency decomposition allows for treating frequency bands with different
hearing loss and different time-frequency sensitivity of the ear differently.
• Using a filter bank, the original computationally complex problem of separating
broadband speech signals can be split into a set of smaller parallel problems with
faster speed of convergence.
• Inherent speech statistical properties, such as different fundamental frequencies
of different speakers, can be easily described in the frequency domain and
applied to the subband approach.
• An amplification of each subband depending on the hearing loss, which is derived
from an audiogram, can easily be introduced. This is already done in
commercial digital hearing aids.
This chapter presents a brief overview of psychoacoustic principles needed to fulfill
the project. The most important idea will be the concept of the critical band. Chapter
II will cover theory and implementation of low delay filter banks that will be used to
mimic the critical bands of the ear. Chapter III will introduce algorithms used for
blind source separation. Chapter IV will explain the method and implementation of
blind source separation using filter banks. Chapter IV will also present the results of
filter banks used in blind source separation and the psychoacoustic model. Finally,
Chapter V will give a conclusion of the work along with future work to be done in
this area.
1.1 Psychoacoustics
Psychoacoustics is the study of the psychological responses to acoustical stimuli.
In other words, psychoacoustics is the study of how the human brain perceives sound.
It takes into account what makes a sound loud, what determines pitch, etc. Human
hearing ranges from about 20 Hz to 20 kHz [18]. Although hearing ranges up to 20 kHz,
most of the energy of speech lies in the lower frequency band. In fact, intelligible
speech can be transmitted with a bandwidth of less than 5 kHz.
Audibility of sounds is an important feature when performing speech processing
for hearing loss. The sound pressure needed for a tone to be just audible is not constant
across all frequencies. This threshold, called the threshold in quiet or absolute
threshold, is a function of frequency and is shown in Figure 1.1. This figure shows
why hearing loss cannot be overcome by simply amplifying the signal. Since each
frequency has a different absolute threshold, amplifying the entire signal may make
some frequencies audible, but could also make other frequencies too loud and become
annoying or even painful to the patient. Hearing loss also affects different people
at different frequencies. Furthermore, masking begins to occur. Masking refers to
the amount, or process, by which the absolute threshold is raised by the presence
of another sound (masker). Masking is what gives rise to the cocktail party effect.
Background noise and speech effectively mask out the desired speaker.

Figure 1.1: Threshold in Quiet (level in dB versus frequency in kHz; the threshold of pain is also marked). This figure is taken from [18, p. 15].

Masking also
brings up another important topic in psychoacoustics: the critical bandwidth (critical
band). The critical bandwidth concept, introduced by Fletcher in 1940, states that
only a narrow band of frequencies surrounding a desired tone contributes to the masking
of that tone [13]. In order for the tone to be masked, the power of the masker
inside the critical band must equal the power of the tone. Frequencies outside the
critical band do not contribute to masking. Figure 1.2 shows the critical bandwidth
versus frequency. In general, the critical bandwidth is about 100 Hz for f < 500 Hz
and is about 0.2f for f > 500 Hz. This works well since most of the information in
speech is located at lower frequencies. The ear provides better resolution at these
lower frequencies since the critical bandwidth is smaller for lower frequencies.
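The rule of thumb above is easy to state as a function. The following Python sketch simply encodes the two approximate regimes given in the text; the function name and the sample frequencies are chosen here for illustration, and the result is only as accurate as the approximation itself.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz at frequency f_hz.

    Encodes the rule of thumb from the text: about 100 Hz below 500 Hz
    and about 0.2 * f above 500 Hz. The two regimes meet at 500 Hz.
    """
    if f_hz < 0.0:
        raise ValueError("frequency must be non-negative")
    return 100.0 if f_hz < 500.0 else 0.2 * f_hz


# Bandwidths at a few illustrative frequencies (Hz).
for f in (100.0, 250.0, 1000.0, 4000.0):
    print(f"f = {f:6.0f} Hz  ->  critical bandwidth ~ {critical_bandwidth_hz(f):.0f} Hz")
```

The loop shows the constant resolution at low frequencies and the bandwidth growing in proportion to frequency above 500 Hz.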
The critical band concept can be summarized as follows. An incoming
tone of frequency f_c is incident upon the ear. The ear then acts as a bandpass filter
with center frequency f_c and bandwidth equal to the critical bandwidth at f_c. Using
this concept it is possible to design a filter bank that imitates the critical bandwidths
of the ear.
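One way to picture such a filter bank is to tile the frequency axis with adjacent bands whose widths follow the critical bandwidth at each band's center. The Python sketch below does this with the approximation of about 100 Hz below 500 Hz and about 0.2f above it. The function name, the 8 kHz upper limit, and the tiling rule are assumptions made for illustration; this is not the filter bank design used later in the thesis.

```python
def critical_band_edges(f_low_hz: float = 0.0, f_high_hz: float = 8000.0) -> list:
    """Return edges of adjacent bands that approximate critical bands.

    Each band [lo, lo + w] is sized so that w matches the approximate
    critical bandwidth at the band center lo + w / 2:
      - center below 500 Hz:  w = 100 Hz          (holds while lo < 450 Hz)
      - center above 500 Hz:  w = 0.2 * (lo + w / 2), i.e. w = lo / 4.5
    The last band is clipped at f_high_hz.
    """
    edges = [f_low_hz]
    while edges[-1] < f_high_hz:
        lo = edges[-1]
        width = 100.0 if lo < 450.0 else lo / 4.5
        edges.append(min(lo + width, f_high_hz))
    return edges


edges = critical_band_edges()          # hypothetical 0 to 8 kHz range
bands = list(zip(edges[:-1], edges[1:]))
for lo, hi in bands:
    print(f"{lo:7.1f} Hz to {hi:7.1f} Hz  (width {hi - lo:6.1f} Hz)")
```

The resulting bands are uniform at low frequencies and widen geometrically at high frequencies, which is the nonuniform resolution a critical-band filter bank is meant to imitate.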
Figure 1.2: Critical Bandwidth vs. Frequency (the critical bandwidth as a function of frequency at the center of the band). This figure is taken from [7, p. 235].
CHAPTER II
FILTER BANKS
2.1 Introduction
Figure 2.1: Block Diagram of a K-Decimated Filter Bank (M-channel filter bank decimated by K, shown with typical ideal responses)
The first step in the method proposed in this thesis is to incorporate filter
banks into the blind source separation scheme, with the goal of reducing processing
time. This section introduces the filter bank theory necessary to understand the
overall method.
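As a point of reference for the theory, the basic analysis/synthesis idea can be sketched with the smallest possible case: a two-channel filter bank decimated by two. The Python code below uses Haar filters, which are swapped in here only because they are short and reconstruct the input exactly; they are not the low-delay cosine-modulated filters studied in this chapter, and the function names are hypothetical.

```python
import numpy as np


def haar_analysis(x):
    """Split x into low- and high-frequency subbands, each decimated by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2 != 0:
        raise ValueError("this sketch expects an even number of samples")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)    # scaled pairwise sums
    high = (even - odd) / np.sqrt(2.0)   # scaled pairwise differences
    return low, high


def haar_synthesis(low, high):
    """Recombine the two subbands into a full-rate signal."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    y = np.empty(2 * len(low))
    y[0::2] = (low + high) / np.sqrt(2.0)
    y[1::2] = (low - high) / np.sqrt(2.0)
    return y


x = np.sin(2.0 * np.pi * 0.05 * np.arange(32))   # arbitrary test signal
low, high = haar_analysis(x)
y = haar_synthesis(low, high)
reconstruction_error = np.max(np.abs(y - x))
```

Any per-subband processing, such as separation or amplification, would be applied to `low` and `high` between the analysis and synthesis steps.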
Filter banks are used in many applications such as subband coding for speech and
audio as well as for image coding. A filter bank simply divides a signal into subbands,
processes each subband, and reconstructs the signal. The analysis bank, H_k(z), separates
the signal, and the synthesis bank, F_k(z), recombines it. The simplest filter
bank divides the signal into low-frequency and high-frequency channels (M = 2). This
can of course be extended to any integer number of channels; such banks are called
M-channel filter banks. A typical M-channel K-decimated filter bank is shown in Figure 2.1. One
of the most popular filter banks is the cosine-modulated filter bank. For low-delay,
cosine-modulated filter banks, the analysis and synthesis filters are cosine-modulated
BIBLIOGRAPHY
[2] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, A blind source separation technique using second-order statistics, IEEE Transactions on Signal Processing 45 (1997), no. 2, 434-444.
[3] J.-F. Cardoso, Blind beamforming for non Gaussian signals, IEEE Trans. on Signal Processing 140 (1993), no. 6, 362-370.
[4] J.-F. Cardoso, Infomax and maximum likelihood for blind source separation, IEEE Signal Processing Letters 4 (1997), no. 4, 112-114.
[5] J.-F. Cardoso, Blind signal separation: Statistical principles, Proceedings of the IEEE 83 (1998), no. 10, 2009-2025.
[6] T.A. de Perez, Min Li, H. McAllister, and N.D. Black, Noise reduction and loudness compression in a wavelet modelling of the auditory system, Proceedings of the 2000 Third IEEE International Caracas Conference on Devices, Circuits and Systems (2000), S11/1-S11/6.
[7] John Durrant, Bases of hearing science, 2nd ed., Williams & Wilkins, 1984.
[8] P. N. Heller, T. Karp, and T. Q. Nguyen, A general formulation of modulated filter banks, IEEE Trans. on Signal Processing 47 (1999), no. 4, 985-102.
[9] Aapo Hyvärinen and Erkki Oja, Independent component analysis: A tutorial, World Wide Web, 1999, http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/IJCNN99.tutorial3.html.
[10] Alp Kayabasi, Blind source separation, Tech. report, University of Maryland, Baltimore, 1997.
[11] T. Lee and R. Orglmeister, Blind source separation of real-world signals, Proc. ICNN (Houston), 1997, pp. 2129-2135.
[12] Jerry M. Mendel, Tutorial on higher-order statistics (spectra) in signal processing and system theory: Theoretical results and some applications, Proceedings of the IEEE 79 (1991), no. 3, 278-305.
[13] Brian C.J. Moore, An introduction to the psychology of hearing, 2nd ed., Academic Press, San Diego, 1982.
[14] T. Q. Nguyen, Near-perfect-reconstruction pseudo-qmf banks, IEEE Transactions on Signal Processing 42 (1994), no. 1, 65-76.