SPOKEN LANGUAGE SYSTEMS
MIT Laboratory for Computer Science
Gammachirp Auditory Filter
Alex Park
May 7th, 2003
Project Overview
• Goal:
– Investigate the use of (non-linear) auditory filters for speech analysis
• Background:
– Sound analysis in the auditory periphery is similar to a wavelet transform
• Comparison:
– Traditional short-time Fourier analysis
– Gammatone wavelet-based analysis (auditory filter)
• Extension:
– The Gammachirp filter has level-dependent parameters that can model non-linear characteristics of the auditory periphery
• Implementation:
– Specifics of the Gammachirp implementation
– How to incorporate level dependency
Auditory Physiology
• Sound pressure variation in the air is transduced through the outer and middle ears to the base of the cochlea
• The basilar membrane, which runs the length of the cochlea, maps frequency to the place of maximal displacement
[Figure: outer ear, middle ear, and cochlea (basilar membrane, from high freq 20 kHz to low freq 200 Hz), then auditory nerve and cortex]
Motivation – Why better auditory models?
• Automatic Speech Recognition (ASR)
– ASR systems perform adequately in ‘clean’ conditions
– Robustness is a major problem; at low SNR, performance degrades far more than human performance does
• Hearing research
– Build better hearing aids and cochlear implants
– Hearing-impaired listeners with cochlear damage have trouble understanding speech in noisy environments
– Current hearing aids apply linear amplification, so they amplify the noise as well as the signal
• Is the lack of a compressive non-linearity in the front end a common link?
Non-stationary Nature of Speech
• Why is speech a good candidate for local frequency analysis? Its spectral character changes within a single word, from a transient to a tone to noise
[Figure: waveform of the word “tapestry”, with segments labeled /t/ (transient), /ae/ (tone), and /s/ (noise)]
Time-Frequency Representation
• The most common way to represent changing spectral content is the Short-Time Fourier Transform (STFT)
[Figure: a short segment of the waveform (amplitude vs. time) is passed through an FFT to give its power spectrum over 0–8000 Hz]
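The window-then-FFT step in the figure can be sketched in a few lines. This is a minimal illustration that assumes a Hann window, a 16 kHz sampling rate, 25 ms frames with a 10 ms hop, and a direct DFT in place of an FFT. None of these choices are specified in the slides, and the function names are our own.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window (assumed; the slides do not name a window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_len) for i in range(n_len)]


def frame_power_spectrum(frame, fs):
    """Return (frequencies_hz, power) for bins 0..N/2 of one Hann-windowed frame."""
    n_len = len(frame)
    x = [s * w for s, w in zip(frame, hann(n_len))]
    freqs, power = [], []
    for k in range(n_len // 2 + 1):
        # Direct DFT of bin k; an FFT gives the same values, just faster
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n_len) for i in range(n_len))
        freqs.append(k * fs / n_len)
        power.append(abs(acc) ** 2)
    return freqs, power


def stft_power(signal, fs, frame_len=400, hop=160):
    """Slide the window along the signal: one power spectrum per frame (a spectrogram)."""
    return [
        frame_power_spectrum(signal[start:start + frame_len], fs)[1]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

At fs = 16 kHz, frame_len = 400 (25 ms) gives 201 bins spaced 40 Hz apart, up to 8 kHz. Stacking the per-frame spectra over time produces a spectrogram like the one on the next slide.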
Spectrogram from STFT
[Figure: STFT spectrogram of “tapestry”]
STFT Characteristics
• We can think of the STFT as filtering with the following basis:
[Figure: STFT basis functions]
• In the frequency domain, this is a filterbank of linearly spaced, constant-bandwidth filters
[Figure: STFT filter magnitude responses, gain (dB, −40 to 0) vs. frequency (0–2000 Hz)]
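The filterbank view follows because STFT channel k responds to frequency f with the window's spectrum shifted to f_k = k·fs/N. Every channel therefore has the same shape and bandwidth, and the centres are linearly spaced. The sketch below evaluates that gain directly. It assumes a Hann window, N = 400, and fs = 16 kHz, none of which are taken from the slides.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_len) for i in range(n_len)]


def window_dtft_mag(window, f, fs):
    """|W(f)| = |sum_n w[n] exp(-j 2 pi f n / fs)|."""
    return abs(sum(w * cmath.exp(-2j * math.pi * f * n / fs) for n, w in enumerate(window)))


def stft_channel_gain_db(k, f, n_len, fs):
    """Gain (dB, 0 dB at the channel centre) of STFT channel k at frequency f.

    Channel k is a bandpass filter centred on f_k = k * fs / n_len whose shape is
    the window spectrum shifted to f_k, so all channels share one shape.
    """
    w = hann(n_len)
    f_k = k * fs / n_len
    peak = window_dtft_mag(w, 0.0, fs)
    return 20 * math.log10(window_dtft_mag(w, f - f_k, fs) / peak + 1e-12)
```

For example, with N = 400 at 16 kHz, channel 5 (200 Hz) evaluated at 230 Hz returns exactly the same gain as channel 40 (1600 Hz) evaluated at 1630 Hz. The bandwidth does not grow with centre frequency, which is the property the auditory filterbank changes.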
Auditory Filterbanks
• Unlike the STFT, physiological data indicate that auditory filters:
– are spaced more closely at low frequencies than at high frequencies
– have narrower bandwidths at lower frequencies (constant-Q)
• The Gammatone filterbank, proposed by Patterson, models these characteristics using a wavelet transform
• The mother wavelet, or kernel function, is
$$g_T(t) = \underbrace{a\,t^{n-1}\exp\left(-2\pi b\,\mathrm{ERB}(f_c)\,t\right)}_{\text{Gamma envelope}}\;\underbrace{\exp\left(j2\pi f_c t\right)}_{\text{Tone carrier}}$$
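The kernel above can be sampled directly. The sketch below makes several assumptions that the slides do not state: order n = 4 and b = 1.019 (values commonly used with the Patterson–Holdsworth filterbank, e.g. Slaney 1993), the Glasberg & Moore (1990) formula ERB(f) = 24.7(4.37f/1000 + 1) Hz, and the function names.

```python
import cmath
import math


def erb_hz(f_hz):
    """Glasberg & Moore (1990) equivalent rectangular bandwidth (assumed form)."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)


def gammatone_kernel(fc, fs, duration=0.05, n=4, b=1.019, a=1.0):
    """Sample g_T(t) = a t^(n-1) exp(-2 pi b ERB(fc) t) exp(j 2 pi fc t) for t >= 0."""
    decay = 2 * math.pi * b * erb_hz(fc)
    samples = []
    for i in range(int(round(duration * fs))):
        t = i / fs
        envelope = a * t ** (n - 1) * math.exp(-decay * t)   # gamma envelope
        carrier = cmath.exp(2j * math.pi * fc * t)             # tone carrier
        samples.append(envelope * carrier)
    return samples
```

The envelope rises from zero and peaks at t = (n − 1)/(2πb·ERB(f_c)), which is about 3.5 ms for f_c = 1 kHz. It then decays exponentially, so higher-frequency kernels (larger ERB) are shorter, as a wavelet family should be.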
Gammatone Characteristics
• Unlike the STFT, the Gammatone filterbank uses the following basis
• The corresponding frequency responses are:
[Figure: Gammatone filter magnitude responses, gain (dB, −80 to 0) vs. frequency (0–1500 Hz)]
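The closer spacing at low frequencies comes from placing centre frequencies at equal steps on an ERB-rate scale, similar in spirit to the `ERBSpace` function in Slaney's Auditory Toolbox. The sketch below assumes the Glasberg & Moore (1990) ERB and ERB-rate formulas, which the slides do not state. It shows that the gaps between channels widen with frequency and that Q = f_c/ERB(f_c) rises toward a constant.

```python
import math


def erb_rate(f_hz):
    """Number of ERBs below f (Glasberg & Moore 1990; assumed scale)."""
    return 21.4 * math.log10(1.0 + 4.37e-3 * f_hz)


def inverse_erb_rate(e):
    """Frequency in Hz at ERB-rate e (inverse of erb_rate)."""
    return (10 ** (e / 21.4) - 1.0) / 4.37e-3


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz, consistent with erb_rate."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)


def erb_space(f_low, f_high, num_channels):
    """Centre frequencies equally spaced on the ERB-rate scale, low to high."""
    e_low, e_high = erb_rate(f_low), erb_rate(f_high)
    step = (e_high - e_low) / (num_channels - 1)
    return [inverse_erb_rate(e_low + i * step) for i in range(num_channels)]
```

For example, erb_space(100, 8000, 32) yields gaps that widen from a few tens of Hz at the bottom to several hundred Hz at the top. The Q values stay below the asymptote 1/(24.7 × 0.00437) ≈ 9.26, which is why the spacing is only approximately logarithmic.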
What are we missing?
• The Gammatone filterbank has constant-Q bandwidths and logarithmic spacing of center frequencies
• Also, the gamma envelope is causal and decays exponentially, so each kernel is well localized in time
• However, the filters are 1) symmetric and 2) linear
• Psychophysical experiments indicate that auditory filter shapes are:
1) Asymmetric
* Sharper drop-off on the high-frequency side
2) Non-linear
* Filter shape and gain change with input level
* Reflects the compressive non-linearity of the cochlea
* Important for hearing in noise and for dynamic range
Gammachirp Characteristics
• The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel
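$$g_C(t) = \underbrace{a\,t^{n-1}\exp\left(-2\pi b\,\mathrm{ERB}(f_c)\,t\right)}_{\text{Gamma envelope}}\;\underbrace{\exp\left(j2\pi f_c t\right)}_{\text{Tone carrier}}\;\underbrace{\exp\left(jc\ln t\right)}_{\text{Chirp term}}$$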
• The frequency response is asymmetric and can fit the passive filter shape
• Level-dependent parameters can fit the changes in filter shape caused by stimulus level
[Figure: Gammachirp impulse response, and magnitude response, gain (dB, −80 to 0) vs. frequency (0–1500 Hz)]
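The chirp term only adds c·ln t to the phase. The Gammachirp therefore keeps the Gammatone's gamma envelope, but its instantaneous frequency becomes f_c + c/(2πt), which glides toward f_c as t grows. The sketch below assumes n = 4, b = 1.019, the Glasberg & Moore ERB formula, and an illustrative c = −2. None of these values come from the slides, and the function names are our own.

```python
import cmath
import math


def erb_hz(f_hz):
    """Glasberg & Moore (1990) ERB in Hz (assumed form)."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)


def gammachirp_kernel(fc, fs, c=-2.0, duration=0.05, n=4, b=1.019, a=1.0):
    """Sample a t^(n-1) exp(-2 pi b ERB(fc) t) exp(j 2 pi fc t + j c ln t)."""
    decay = 2 * math.pi * b * erb_hz(fc)
    out = [0j]  # t = 0: the envelope is zero and ln t is undefined, so use 0
    for i in range(1, int(round(duration * fs))):
        t = i / fs
        envelope = a * t ** (n - 1) * math.exp(-decay * t)   # gamma envelope
        phase = 2 * math.pi * fc * t + c * math.log(t)        # tone carrier + chirp term
        out.append(envelope * cmath.exp(1j * phase))
    return out


def instantaneous_frequency(fc, t, c=-2.0):
    """Phase derivative over 2 pi: fc + c / (2 pi t)."""
    return fc + c / (2 * math.pi * t)
```

With c < 0 the kernel starts below f_c and sweeps upward. Setting c = 0 recovers the Gammatone exactly, and the magnitude of the sampled kernel does not depend on c.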
Implementation
• In the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric compensation filter
• To fit psychophysical data, a fixed Gammachirp is cascaded with level-dependent asymmetric IIR filters
[Figure: Gammatone cascaded with an asymmetric compensation filter gives the Gammachirp; filter gain (dB) vs. frequency (0–2500 Hz)]
[Figure: passive Gammachirp with level-dependent asymmetries, giving level-dependent chirps; filter gain (dB) vs. frequency (200–2000 Hz)]
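For the kernel g_C(t), the magnitude response factors exactly as a Gammatone magnitude times an asymmetric term exp(c·θ(f)), where θ(f) = arctan((f − f_c)/(b·ERB(f_c))), up to a constant. This follows from the Fourier transform of t^(n−1+jc)·e^(−(2πb·ERB − j2πf_c)t). The sketch below evaluates that analytic cascade in dB. It does not reproduce the level-dependent IIR approximation of the compensation filter. The parameter values (n = 4, b = 1.019, c = −2), the ERB formula, and the function names are assumptions.

```python
import math


def erb_hz(f_hz):
    """Glasberg & Moore (1990) ERB in Hz (assumed form)."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)


def gammatone_gain_db(f, fc, n=4, b=1.019):
    """|G_T(f)| ∝ [(b ERB)^2 + (f - fc)^2]^(-n/2), normalised to 0 dB at fc."""
    x = (f - fc) / (b * erb_hz(fc))
    return -10.0 * n * math.log10(1.0 + x * x)


def asymmetric_gain_db(f, fc, c, b=1.019):
    """Analytic asymmetric term exp(c * theta), theta = arctan((f - fc) / (b ERB(fc)))."""
    theta = math.atan((f - fc) / (b * erb_hz(fc)))
    return 20.0 * math.log10(math.e) * c * theta


def gammachirp_gain_db(f, fc, c, n=4, b=1.019):
    """Cascade: magnitudes multiply, so dB gains add."""
    return gammatone_gain_db(f, fc, n, b) + asymmetric_gain_db(f, fc, c, b)


def gammachirp_peak_hz(fc, c, n=4, b=1.019):
    """Setting the derivative of the cascade to zero gives (f - fc) / (b ERB) = c / n."""
    return fc + c * b * erb_hz(fc) / n
```

For c < 0 the peak moves below f_c, and the high-frequency skirt falls off faster than the low-frequency one. This matches the asymmetry listed under "What are we missing?". With c = 0 the cascade reduces to the symmetric Gammatone.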
Comparison: Tone vs. Passive Chirp Outputs
[Figure: passive Gammachirp spectrogram and Gammatone spectrogram, time axis 0.1–0.8 s]
• The Gammatone output appears to have better frequency resolution
• The passive Gammachirp output appears to have better time resolution
Comparison: Tone vs. Active Chirp Outputs
[Figure: Active Gammachirp and Gammatone spectrograms]
Incorporating level dependency
• As illustrated in the previous slides, with a fixed stimulus level the passive Gammachirp offers little advantage on clean speech
• We can incorporate level-dependent parameter control via feedback
[Flowchart: level-dependent processing loop]
1) Compute passive GC spectrogram
2) Segment into frames
3) For each time frame, get the stimulus level per channel and filter with the level-specific filter (S1, S2, …, SN-1, SN)
4) Reconstruct frames
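The per-frame loop in the flowchart can be sketched for a single channel of the passive GC output. This is a hypothetical illustration. The RMS level measure, the level bin edges, the short FIR kernels standing in for the level-specific filters S1, ..., SN (the slides use level-dependent asymmetric IIR filters), and the Hann-windowed overlap-add reconstruction are all our assumptions.

```python
import math


def frame_level_db(frame, ref=1.0):
    """RMS level of one frame in dB re `ref` (assumed stand-in for "stimulus level")."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms / ref + 1e-12)


def pick_filter(level_db, edges_db, filters):
    """Return filters[i] for the first edges_db[i] above level_db, else the last filter."""
    for edge, fir in zip(edges_db, filters):
        if level_db < edge:
            return fir
    return filters[-1]


def fir_filter(x, h):
    """Direct-form FIR convolution, output truncated to len(x)."""
    return [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0) for i in range(len(x))]


def level_dependent_filter(channel, filters, edges_db, frame_len=256, hop=128):
    """Filter one channel frame by frame with a level-selected filter, then overlap-add."""
    # Periodic Hann at 50% overlap sums to 1, so identity filters reconstruct
    # every sample covered by two frames
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    out = [0.0] * len(channel)
    for start in range(0, len(channel) - frame_len + 1, hop):
        frame = channel[start:start + frame_len]
        h = pick_filter(frame_level_db(frame), edges_db, filters)   # choose S_i
        y = fir_filter([s * w for s, w in zip(frame, window)], h)
        for i, v in enumerate(y):
            out[start + i] += v
    return out
```

Overlapping windowed frames make the switch between filters a gradual crossfade rather than an abrupt jump at frame boundaries. A real implementation would run this loop for every channel and would map each level to Gammachirp parameters rather than to fixed kernels.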
Sample outputs
[Figure: outputs for clean speech and at 40 dB, 30 dB, and 20 dB SNR]
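The slides do not say how the noisy conditions were generated. One common approach, shown here purely as an assumed illustration, is to scale white Gaussian noise so that the signal-to-noise power ratio matches the target. The function name and the noise type are our own choices.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return signal + white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Pick the gain so that 10*log10(p_signal / (gain^2 * p_noise)) == snr_db
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(signal, noise)]
```

The SNR here is measured over the whole utterance. A per-frame or speech-active-only definition would give different noise levels for the same nominal value.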
References
• Bleeck, S., Patterson, R.D., and Ives, T. (2003). Auditory Image Model for Matlab. Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data,” J. Acoust. Soc. Am. 109, 2008-2022.
• Pickles, J.O. (1988). An Introduction to the Physiology of Hearing (Academic, London).
• Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.
• Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/
Sidenote
[Figure: outputs for clean speech and at 40 dB and 30 dB SNR]