Cochlear Implants and Auditory Brainstem Implants: Understanding Auditory Processing through Prosthetic Stimulation of Hearing
Robert V. Shannon, Ph.D.
House Ear Institute, Los Angeles, CA
“Hair Cells” inside the Cochlea
Hair cells convert mechanical sound energy into nerve impulses that go to the brain
Clarion Electrode + Positioner
Contour electrode in cochlear model
Cochlear Implant Improvement over Time in Adults
[Figure: percent correct (0-100%) for sentences and words across successive processor generations: 3M/House, F0F2, F0F1F2, MPEAK, SPEAK/Clarion, Contour/EPS]
ABI: Auditory Brainstem Implant
[Figure: implant placement at the cochlear nucleus (CN)]
Nucleus 24 ABI
CI24M receiver-stimulator
Monopolar reference electrodes (ball & plate)
Micro-coiled electrode wires
Electrode array (21 platinum disks, 0.7 mm diameter)
T-shaped Dacron mesh
Removable magnet
ABI 24M Electrode Array
Comparison of Verona non-NF2 and HEI NF2 ABI performance
[Figure: distribution of % correct words in sentences (bins 0-20, 21-40, 41-60, 61-80, 81-100) as percent of subjects (0-100%); HEI NF2 N=149, Verona non-NF2 N=29]
Summary - ABI
• ABI provides limited sound sensations to people deafened by bilateral vestibular schwannomas (BVS)
• Non-tumor patients can achieve excellent open-set speech recognition
• Thus, limitations in tumor patient performance are not due to device or neural processing limitations but to damage to the CN by tumors – possibly damage to a modulation-specific pathway
PABI: Penetrating Electrode ABI
Electrodes: penetrating, surface, ground
Antenna Coil
Receiver/Stimulator
So What’s Next? Inferior Colliculus (IC)? Cortex?
Speech Spectrum
[Figure: spectrogram of a sentence]
Auditory Pattern Recognition by the Brain
• What features of the pattern of neural output from the cochlea are most critical? Amplitude? Temporal? Cochlear place?
• The most important features depend on the task:
– Speech: tonotopic patterns changing slowly (<20 Hz) over time
– Localization: timing across ears
– Pitch and music: temporal or place?
Noise-Band Processor (4 bands)
[Diagram: the input (0-6000 Hz) is split into 4 bands by bandpass filters with corner frequencies 300, 713, 1509, 3043, and 6000 Hz; each band's envelope is extracted (half-wave rectifier + lowpass filter) and amplitude-manipulated, then modulates a noise carrier restricted to the same band; the four noise bands are summed at the output]
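The per-band processing chain can be sketched in Python. This is a minimal illustration, not the actual processor: the bandpass stage is omitted (assume the input is already one filter's output), and a simple one-pole lowpass stands in for the envelope-smoothing filter, whose exact type is not specified here.

```python
import math
import random

def halfwave_rectify(x):
    # Half-wave rectification: keep only the positive half-cycles
    return [max(s, 0.0) for s in x]

def onepole_lowpass(x, cutoff_hz, fs):
    # First-order IIR lowpass, a stand-in for the envelope filter
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for s in x:
        state = (1.0 - a) * s + a * state
        y.append(state)
    return y

def noise_band_channel(band_signal, env_cutoff_hz, fs, seed=0):
    # One vocoder channel: rectify, smooth to get the envelope,
    # then modulate a white-noise carrier with that envelope
    rng = random.Random(seed)
    env = onepole_lowpass(halfwave_rectify(band_signal), env_cutoff_hz, fs)
    return [e * rng.uniform(-1.0, 1.0) for e in env]

# Example: one band carrying a 1 kHz tone, envelope smoothed at 160 Hz
fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs // 10)]
out = noise_band_channel(tone, 160.0, fs)
```

A real implementation would also bandpass-filter the noise carrier back into the analysis band before summing channels.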
Loudness Coding (Zeng & Shannon, NeuroReport 1999)
Acoustic: cochlear compression followed by central expansion; loudness grows as a power of amplitude, L = kA^p
Electric: direct electric activation followed by central expansion; loudness grows as a power of current, L = kI_e^β
[Figure: output amplitude (% of electric dynamic range) vs. input amplitude (0-1000 units, log axis from Amin to Amax) for power-function maps O = I^p with exponents p = 0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0]
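The power-function amplitude map O = I^p can be sketched as follows. The clipping bounds and 0..1 normalization are assumptions for illustration, not the exact clinical mapping.

```python
def power_map(amplitude, p, a_min, a_max):
    # Map an acoustic amplitude onto a normalized (0..1) electric dynamic
    # range via a power function with exponent p. Exponents near 0.2-0.3
    # roughly mimic the compressive acoustic loudness function;
    # p = 1.0 gives a linear map.
    a = min(max(amplitude, a_min), a_max)   # clip to the input range (assumed)
    norm = (a - a_min) / (a_max - a_min)    # normalize to 0..1
    return norm ** p

# A compressive map places the same input higher in the electric range
low = power_map(250.0, 0.3, 0.0, 1000.0)   # compressive exponent
lin = power_map(250.0, 1.0, 0.0, 1000.0)   # linear reference
```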
Amplitude Mapping Effects - Time Waveform
[Figure: time waveforms of the same utterance mapped with power function exponents P = 0.1, 0.3, 0.5, 1.0, 1.5, 2.0, 3.0]
[Figure: percent correct (0-100%) vowel and consonant recognition vs. power function exponent P; A: cochlear implant listeners (P ≈ 0.05-1), B: normal-hearing listeners (P ≈ 0.3-3)]
Fu & Shannon, JASA 1998, 4 channels
Amplitude Mapping
• Speech recognition is only mildly affected by large distortions (peak or center clipping, quantization, nonlinearities) in amplitude mapping, even when spectral cues are severely limited
• Most commercial implants use a combination of compression and AGC for amplitude mapping
Temporal Cues in Speech
• Rosen/Plomp classification:
– Envelope (0-50 Hz)
– Periodicity (50-500 Hz)
– Fine structure (>500 Hz)
• Temporal psychophysics (NH & CI):
– Nonspectral pitch changes and modulation detection up to 300-500 Hz
– So all listeners should be able to utilize the first two categories
[Figure: percent correct (0-100%) for A: vowels and B: consonants vs. envelope lowpass cutoff frequency (1-1000 Hz); individual CI subjects N3, N4, N7, N9, N17, N19, plus CI and NH means]
Fu and Shannon (2000), JASA, 107(1), 589-597
Summary: Envelope Smoothing
• No decrease in speech recognition for envelope smoothing down to 20 Hz
• Even when spectral cues are limited
• Even in cochlear implants
High Stimulation Pulse Rates
• High rates should better represent temporal features in speech
• High rates will put the nerve into a more stochastic (normal) firing mode (Wilson, Rubinstein)
• High rates allow stochastic resonance (Morse, Zeng, Rubinstein, Chatterjee)
Temporal Cues in Speech
• Examples of speech corrupted by:
– Cross-spectral asynchrony
– Time reversal
• Overall maximum delay varied between 0 and 240 ms, in 40 ms steps
• Delays alternated to ensure a maximum delay between adjacent channels
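One way to realize the alternation described above is in the sketch below. It assumes a specific reading of "delays alternated": even-numbered channels undelayed, odd-numbered channels at the maximum delay, so every adjacent pair differs by the full maximum. The actual assignment used in the experiment may have been more elaborate.

```python
def delay_pattern(n_channels, max_delay_ms):
    # Alternate 0 and max_delay_ms across channels so that every pair of
    # adjacent channels differs by the full maximum delay.
    # The even/odd assignment here is an assumption about the exact pattern.
    return [0 if i % 2 == 0 else max_delay_ms for i in range(n_channels)]

pattern = delay_pattern(4, 240)  # → [0, 240, 0, 240]
```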
[Figure: stimulus delay patterns, frequency channels vs. time (ms)]
Original speech
Band-pass filtered, then delayed
4-channel, full-spectrum speech with maximum delay (240 ms)
[Band edges: 750, 1500, 3000, 6000 Hz]
[Figure: percent correct (0-100%) vs. maximum channel delay (0-240 ms; mean delay across channels 0-135 ms) for A: full-spectrum processor and B: noise-band processor, each with 4-channel and 16-channel conditions]
Time Reversed Speech
• Speech is cut into time segments (20, 50, 100, or 200 ms each)
• Each time segment is reversed in time
• Intelligibility is preserved up to 50-100 ms segments (Saberi and Perrott, Nature, 1999)
• CI listeners and 4-band noise processors can only tolerate 20-50 ms reversed segments; more channels (electrodes) allow longer time-reversal segments (Fu et al., 2002)
Fu, Neuroreport, 2002
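The local time-reversal manipulation is easy to state in code: cut the waveform into fixed-length segments and reverse each segment in place. This is a minimal sketch; segment length is in samples, and no windowing or crossfading is applied at segment boundaries.

```python
def locally_reverse(signal, segment_len):
    # Reverse each fixed-length segment of the signal in time; the overall
    # segment order is preserved, so only local temporal structure is
    # disrupted. A ragged final segment is reversed as-is.
    out = []
    for i in range(0, len(signal), segment_len):
        out.extend(reversed(signal[i:i + segment_len]))
    return out

# 20 ms segments at a 16 kHz sampling rate would be segment_len = 320
demo = locally_reverse([1, 2, 3, 4, 5, 6], 3)  # → [3, 2, 1, 6, 5, 4]
```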
Summary – Temporal Information
• Only envelope cues appear to be used (<20 Hz), i.e., the temporal analysis window for speech is 20-50 ms; modulation detection is correlated with speech recognition
• Periodicity cues (50-500 Hz) appear to be signaled by relative spectral features and can be used if forced
• Fine structure (>500 Hz) is cued by spectral place cues
• Speech recognition is highly resistant to temporal distortions: cross-spectral asynchrony, temporal reversal, time compression or expansion, smoothing
So How Many Channels do You Need?
• 4 is enough for simple speech in quiet
• More channels are needed for more difficult materials, in noise, or with less experience
• Even more channels are needed for even simple familiar melody recognition
• Are lots of channels even enough for complex musical pitch and sound quality?
Number of Channels DEMO
1 2 4 8 16 32 Orig
Effects of Distortion on Spectral Channels
• Warping the distribution of information can reduce recognition of the pattern even when there are many distinct channels
• We can infer how complex patterns are stored and retrieved in the brain from which types of distortion cause the most trouble
Tonotopic Shift
• CI electrodes are inserted into the cochlea through the round window and end up in tonotopic locations that cover the pitch range of 500-5000 Hz
• Frequencies of speech may not go to an electrode in the right tonotopic place for that sound, resulting in a tonotopic shift
• In CIs and HAs, amplification in a “dead region” will only spread excitation to a healthy region, resulting in a tonotopic distortion (Turner et al., JSHR, 1999; Shannon et al., JARO, 2001)
Electrode Configurations
[Figure: 22-electrode array (electrodes 22 down to 1) along the cochlea from 0 to 35 mm; frequency allocation ticks at 5 mm intervals run 184, 513, 1168, 2476, 5085, 10290, 20677 Hz from apex to base, following the Greenwood map f(x) = 165.4 × (10^(0.06x) − 0.88)]
Full insertion: electrodes span 513 Hz to 5100 Hz
10 rings out: electrodes span 1843 Hz to 15650 Hz
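The frequency allocation above follows the Greenwood frequency-place map, which in Python is:

```python
def greenwood(x_mm):
    # Greenwood map for the human cochlea: characteristic frequency (Hz)
    # at distance x_mm (millimeters) from the apex:
    #   f(x) = 165.4 * (10**(0.06*x) - 0.88)
    return 165.4 * (10.0 ** (0.06 * x_mm) - 0.88)

apex = greenwood(0.0)    # ~20 Hz at the apex
base = greenwood(35.0)   # ~20.7 kHz at the basal end of a 35 mm cochlea
```

Evaluating it at 5 mm intervals reproduces the tick frequencies on the slide (184 Hz at 5 mm up to 20677 Hz at 35 mm).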
Tonotopic Shift
[Figure: normalized percent correct (0-100%) vs. apical edge of the frequency allocation (mm re 7.75 mm, −6 to +6) for male, female, and children's voices, with mean scores]
Fu and Shannon, 1999 JASA: NH - 16 noise bands
Electrodes in Cochlea
distance:   0mm    5mm    10mm   15mm    20mm   25mm   30mm   35mm
frequency:  20kHz  10kHz  5kHz   2.5kHz  1kHz   500Hz  200Hz  20Hz
Frequency-Place Mapping
Matched: acoustic analysis bands span the same cochlear range as the noise carrier bands, 9 mm (5.8 kHz) to 25 mm (510 Hz)
Compression: acoustic analysis bands spanning 4 mm (11.8 kHz) to 30 mm (180 Hz) are delivered to carrier bands spanning 9 mm (5.8 kHz) to 25 mm (510 Hz)
Expansion: acoustic analysis bands spanning 14 mm (2.9 kHz) to 20 mm (1.1 kHz) are delivered to carrier bands spanning 9 mm (5.8 kHz) to 25 mm (510 Hz)
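Expressed in millimeters of cochlear place, the matched, compressed, and expanded conditions are all one linear remapping of the acoustic analysis range onto the fixed carrier range. This is a sketch of the idea only; the actual processors remap band-edge frequencies, not a continuous place variable.

```python
def remap_place(x_mm, analysis=(9.0, 25.0), carrier=(9.0, 25.0)):
    # Linearly map a place in the acoustic analysis range onto the fixed
    # carrier (electrode) range. With analysis == carrier the map is the
    # identity (matched); a wider analysis range is compressed onto the
    # carriers; a narrower one is expanded.
    a0, a1 = analysis
    c0, c1 = carrier
    return c0 + (x_mm - a0) * (c1 - c0) / (a1 - a0)

matched    = remap_place(17.0)                         # identity: 17.0
compressed = remap_place(4.0, analysis=(4.0, 30.0))    # 4-30 mm onto 9-25 mm
expanded   = remap_place(14.0, analysis=(14.0, 20.0))  # 14-20 mm onto 9-25 mm
```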
[Figure: percent correct (0-100%) for TIMIT sentences (25 mm insertion depth, 16 channels) vs. change in frequency range (−5 to +5 mm), varying either the acoustic band or the cochlear region]
Baskent et al., submitted to JARO
Effect of frequency/place distortion on vowel recognition
[Figure: recognition of the original vowel /a/ is 100%; the expanded (×1.4), basally shifted (5 mm), compressed (×0.4), and 4-band noise versions each fall to roughly 50-59% (51%, 52%, 59%, 50%)]
Spectral Cues in Music
• While spectral and temporal fine structure are not necessary for speech recognition, they are critical for music, illustrating the different demands of speech and music on peripheral sensory processing
• Melody recognition requires many more spectral channels than speech
• “The cochlea isn’t designed for speech… the cochlea is designed for music” (Ed Burns)
4 8 16 32 Original
Popular Music with Male Vocal
Instrumental Music - No vocals
4 8 16 32 Original
Conclusions
• To improve the design and fitting of CIs and HAs, we need to understand more about auditory pattern recognition for different tasks
• For speech recognition, spectral resolution and distortion are more important than amplitude and temporal distortion
• For speech quality and music, spectral resolution is even more important
• Hearing doesn’t end in the cochlea – understanding the ear-brain system is key to future improvements
Acknowledgements
Qian-Jie Fu, Deniz Baskent, John Galvin III, Monita Chatterjee, Lendra Friesen, Monica Padilla, Mark Robert, Geri Nogaki, Xiaosong Wang, Rachel Cruz
Supported by NIH (NIDCD)