Speech Processing Under Adverse Listening Conditions
Second International Conference on Intelligent Interactive
Technologies and Multimedia (IITM 2013), 09-11 March 2013,
Allahabad, India
09 March 2013
Speech Processing for Persons with Moderate
Sensorineural Hearing Impairment
Prem C. Pandey
EE Dept., IIT Bombay
pcpandey @ ee.iitb.ac.in
www.ee.iitb.ac.in/~pcpandey,
www.ee.iitb.ac.in/~spilab
Outline
A. Speech & Hearing
B. Noise Suppression
S. K. Waddi, P. C. Pandey, N. Tiwari: "Speech Enhancement Using
Spectral Subtraction and Cascaded Median Based Noise Estimation for
Hearing Impaired Listeners" (Proc. NCC 2013, Delhi, 15-17 Feb. 2013,
Paper 3.2_2_1569696063)
C. Reducing the Effect of Increased Spectral Masking
N. Tiwari, P. C. Pandey, P. N. Kulkarni: "Real-time Implementation
of Multi-band Frequency Compression for Listeners with Moderate
Sensorineural Impairment" (Proc. Interspeech 2012, Portland,
Oregon, 9-13 Sept. 2012, Paper 689)
Production Mechanism
Excitation source & filter model
Excitation: voiced/unvoiced glottal, frication
Filtering: vocal tract filter
Segments
Words, Syllables, Phonemes, Sub-phonemic segments
Phonemes: basic speech units
Vowels: Pure vowels, Diphthongs
Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
/aba/
/apa/
/aga/
/ada/
Features
Modes of excitation
  Glottal
    Unvoiced (aspiration, constriction at the glottis)
    Voiced (vibration of vocal cords)
  Frication
    Unvoiced (constriction in vocal tract)
    Voiced (constriction in vocal tract & glottal vibration)
Movement of articulators
  Continuant (steady-state vocal tract configuration): vowels, nasal stops, fricatives
  Non-continuant (changing vocal tract): diphthongs, semivowels, oral stops (plosives)
Place of articulation (place of maximum constriction in vocal tract)
  Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Glottal
Changes in voicing frequency (F0)
Supra-segmental features
  Intonation
  Rhythm
Mechanism
Peripheral auditory system
  External ear (sound collection): Pinna, Auditory canal
  Middle ear (impedance matching): Ear drum, Middle ear bones
  Inner ear (analysis and transduction): cochlea
  Auditory nerve (transmission of neural impulses)
Central auditory system
  Information processing & interpretation
[Figures: Tonotopic map of cochlea; Auditory system]
Hearing losses
  Conductive loss
  Sensorineural loss
  Central loss
  Functional loss
Sensorineural hearing loss
  Elevated hearing thresholds: reduced intelligibility as speech components are inaudible
  Reduced dynamic range & loudness recruitment (abnormal loudness growth): distortion of loudness relationship among speech components
  Increased temporal masking: poor detection of acoustic events
  Increased spectral masking (due to widening of auditory filters): reduced frequency selectivity, reduced ability to sense spectral shapes of speech sounds
>> Poor intelligibility and degraded perception of speech
Hearing aids
Available
  Frequency selective amplification: improves audibility but may not improve intelligibility in presence of noise
  Automatic volume control
  Multichannel dynamic range compression (settable attack time, release time, and compression ratios): compresses the natural dynamic range into the reduced dynamic range
Under Investigation
  Improvement of consonant-to-vowel ratio (CVR): for reducing the effects of increased temporal masking
  Techniques for reducing the effects of increased spectral masking: Binaural dichotic presentation, Spectral contrast enhancement, Multi-band frequency compression
  Noise suppression
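The dynamic range compression listed above can be illustrated with a minimal sketch of a single compression band. The slide only names the settable parameters (attack time, release time, compression ratio); the threshold, the one-pole level tracker, the function names, and all numeric values below are assumptions chosen for illustration, not settings taken from the talk.

```python
import math


def compressed_level_db(input_db, threshold_db=50.0, ratio=3.0):
    """Static input/output curve of one compression band (levels in dB).

    threshold_db and ratio are hypothetical example values.
    """
    if input_db <= threshold_db:
        return input_db  # linear region below the compression threshold
    # Above the threshold, `ratio` dB of input growth gives 1 dB at the output.
    return threshold_db + (input_db - threshold_db) / ratio


def smoothing_coeff(time_s, fs_hz):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_s * fs_hz))


def track_level(samples, fs_hz, attack_s=0.005, release_s=0.050):
    """Envelope follower with separate attack and release time constants."""
    a_attack = smoothing_coeff(attack_s, fs_hz)
    a_release = smoothing_coeff(release_s, fs_hz)
    level = 0.0
    envelope = []
    for x in samples:
        mag = abs(x)
        # Rising input uses the (faster) attack constant, falling input the release one.
        a = a_attack if mag > level else a_release
        level = a * level + (1.0 - a) * mag
        envelope.append(level)
    return envelope
```

In a multichannel aid, one such tracker and curve would run per frequency band, with the gain in each band set from the difference between the compressed and the tracked level.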
Signal processing in hearing aids
Hearing Aids
  Pre-amp, AVC, Selectable Freq. Response Amp.
Programmable Digital Hearing Aids
  Pre-amp, AVC, Multi-band Amplitude Compression & Freq. Response Amp.
Major Problems
  Noisy environment & reverberation
  Distortions due to multiband amplitude compression
  Poor speech perception due to increased spectral & temporal masking
  Visit to audiologist for change of settings
Proposed Hearing Aids (with user selectable settings)
  Pre-amp, AVC, Noise Suppression, Processing for Reducing the Effects of Increased Spectral Masking, Processing for Reducing the Effects of Increased Temporal Masking, Multi-band Amplitude Compression & Freq. Response Amp.
Research Objectives
Developing techniques for improving speech perception by listeners with moderate-to-severe sensorineural loss
  Reduction of effects of increased spectral masking
    Binaural aids: Binaural dichotic presentation using comb filters for spectral splitting
    Monaural aids: Multi-band frequency compression
  Reduction of spectral masking
  Enhancement of transient parts (weak & short but perceptually important)
  Noise Suppression
Implementation of the techniques using a low-power DSP chip for real-time operation and with acceptable signal delay (< 60 ms)
P. C. Pandey (EE Dept, IIT Bombay):
"Speech Processing for Persons with Moderate Sensorineural Hearing
Impairment", Plenary talk, Second International Conference on
Intelligent Interactive Technologies and Multimedia (IITM 2013),
09-11 March 2013, Allahabad, India
Abstract
Our objective is to develop techniques for improving
speech perception by listeners with moderate-to-severe
sensorineural loss and to implement these techniques using a
low-power DSP chip for real-time operation and with acceptable
signal delay (< 60 ms). Here we present two techniques to reduce
the adverse effects of increased spectral masking associated with
sensorineural loss. The first technique reduces the effects of
noise in the listening environment and the second one reduces the
effects of increased intra-speech spectral masking.

A spectral
subtraction technique is presented for real-time speech enhancement
in the aids used by hearing impaired listeners. For reducing
computational complexity and memory requirement, it uses a
cascaded-median based estimation of the noise spectrum without
voice activity detection. The technique is implemented and tested
for satisfactory real-time operation, with sampling frequency of 12
kHz, processing using window length of 30 ms with 50% overlap, and
noise estimation by 3-frame 4-stage cascaded-median, on a 16-bit
fixed-point DSP processor with on-chip FFT hardware. Enhancement of
speech with different types of additive stationary and
non-stationary noise resulted in SNR advantage of 4-13 dB.

Widening
of auditory filters in persons with sensorineural hearing
impairment leads to increased spectral masking and degraded speech
perception. Multi-band frequency compression of the complex
spectral samples using pitch-synchronous processing has been
reported to increase speech perception by persons with moderate
sensorineural loss. It is shown that implementation of multi-band
frequency compression using fixed-frame processing along with
least-squares error based signal estimation reduces the processing
delay and the speech output is perceptually similar to that from
pitch-synchronous processing. The processing is implemented on a
16-bit fixed-point DSP processor and real-time operation is
achieved using about one-tenth of its computing capacity.
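
The noise-suppression technique in the abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that a "3-frame 4-stage cascaded median" means each stage takes the median of three values and passes it to the next stage, so that four stages approximate a median over 3^4 = 81 frames while storing only a few values per frequency bin. The class and function names, the over-subtraction factor `alpha`, and the spectral floor `beta` are hypothetical; the abstract does not state them. The sketch works on magnitude spectra that are assumed to be already computed (FFT and resynthesis are omitted).

```python
from statistics import median

FS_HZ = 12000          # sampling frequency from the abstract
WINDOW_MS = 30         # window length from the abstract
WINDOW_LEN = FS_HZ * WINDOW_MS // 1000   # samples per window
HOP = WINDOW_LEN // 2                    # 50% overlap


class CascadedMedian:
    """Approximate a long-window median with cascaded short medians (one bin)."""

    def __init__(self, frames_per_stage=3, stages=4):
        self.p = frames_per_stage
        self.buffers = [[] for _ in range(stages)]
        self.estimate = None  # stays None until the last stage has produced a value

    def update(self, value):
        for index, buf in enumerate(self.buffers):
            buf.append(value)
            if len(buf) < self.p:
                break  # this stage is not full yet, nothing propagates further
            value = median(buf)  # stage output feeds the next stage
            buf.clear()
            if index == len(self.buffers) - 1:
                self.estimate = value
        return self.estimate


def spectral_subtract(magnitudes, noise, alpha=2.0, beta=0.01):
    """Subtract the noise estimate bin by bin, keeping a small spectral floor."""
    return [max(m - alpha * n, beta * n) for m, n in zip(magnitudes, noise)]


def enhance(frames, frames_per_stage=3, stages=4):
    """Process a sequence of magnitude spectra; no voice activity detection is used."""
    trackers = [CascadedMedian(frames_per_stage, stages) for _ in frames[0]]
    output = []
    for frame in frames:
        noise = [t.update(m) for t, m in zip(trackers, frame)]
        if noise[0] is None:
            output.append(list(frame))  # pass through until an estimate exists
        else:
            output.append(spectral_subtract(frame, noise))
    return output
```

Because every frame feeds the median, speech and noise frames are treated alike, which is what allows the estimate to be formed without a voice activity detector.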