Page 1
Audio Workgroup
Neuro-inspired Speech Recognition
Group Members
Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor, David Anderson, Shihab Shamma, Hynek Hermanski, Shih-Chii Liu, Giacomo Indiveri, Malcolm Slaney
Page 2
Audio Projects
Localization
Speech Recognition
More ASR
Page 3
Shihab is Running
See http://www.hardrock100.com/index.asp
Shihab arriving in Telluride in 2004
(should happen around 4PM today)
Page 4
Localization Effort
Interaural Time Difference (ITD)
Estimated from time difference between spikes of two matching channels.
Interaural Intensity Difference (IID)
Difference of spike counts between two cochleae.
Azimuth: Combination of ITD and IID
ITD estimation from pure tones
Azimuth estimation from music
Speaker
Microphones
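The two cues on this slide can be sketched in a few lines. This is a minimal illustration, not the workgroup's code: the function names, the nearest-spike pairing, and the 1 ms pairing window are all assumptions, and the slide does not say how ITD and IID were combined into an azimuth estimate, so that step is left out.

```python
from statistics import median

def estimate_itd(left_spikes, right_spikes, max_lag=1e-3):
    """Median (right - left) spike-time difference, in seconds.

    Each left spike is paired with its nearest right spike in the matching
    channel; pairs further apart than max_lag (an assumed window) are ignored.
    A positive value means the left cochlea fired first.
    """
    if not right_spikes:
        return None
    diffs = []
    for t_left in left_spikes:
        nearest = min(right_spikes, key=lambda t: abs(t - t_left))
        d = nearest - t_left
        if abs(d) <= max_lag:
            diffs.append(d)
    return median(diffs) if diffs else None

def estimate_iid(left_count, right_count):
    """Normalized spike-count difference in [-1, 1]; positive means more left spikes."""
    total = left_count + right_count
    return (left_count - right_count) / total if total else 0.0

# Toy data: the right channel lags the left by 0.3 ms.
left = [0.010, 0.020, 0.030]
right = [0.0103, 0.0203, 0.0303]
itd = estimate_itd(left, right)
iid = estimate_iid(120, 80)
```

Using the median keeps a single mispaired spike from dominating the ITD estimate.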
Page 5
Localization Effort
Page 6
FPAA/Mote – Word Recognition
Page 7
FPAA/Mote – Word Recognition
Field Programmable Analog Array (FPAA): analog cochlea (non-spiking) with envelope detection.
MOTE: pattern matching using matched filtering with “receptive fields”.
Robosapien: listens to the spoken commands.
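The matched-filtering step can be sketched as sliding a stored template over the cochlear envelope and picking the word with the highest normalized correlation. This is an assumption-laden sketch: the template shapes, the word labels, and the cosine-style normalization are hypothetical, and the slides do not describe the code that ran on the mote.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def match_score(envelope, template):
    """Best normalized correlation of a template slid along the envelope.

    envelope and template are lists of frames; each frame is a list of
    non-negative per-channel envelope values.
    """
    t_len = len(template)
    flat_t = [v for frame in template for v in frame]
    norm_t = math.sqrt(_dot(flat_t, flat_t))
    best = 0.0
    for start in range(len(envelope) - t_len + 1):
        window = [v for frame in envelope[start:start + t_len] for v in frame]
        norm_w = math.sqrt(_dot(window, window))
        if norm_t > 0 and norm_w > 0:
            best = max(best, _dot(window, flat_t) / (norm_w * norm_t))
    return best

def recognize(envelope, templates):
    """Return the best-matching word and the score of every template."""
    scores = {word: match_score(envelope, tpl) for word, tpl in templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-channel "receptive fields" for two commands.
templates = {
    "go": [[1, 0], [0, 1]],
    "stop": [[0, 1], [1, 0]],
}
envelope = [[0, 0], [1, 0], [0, 1], [0, 0]]
word, scores = recognize(envelope, templates)
```

Normalizing by both norms makes the score insensitive to overall loudness, which matters when the envelope level varies between utterances.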
Page 8
FPAA/Mote – Word Recognition
Status:
FPAA – (we are using a new FPAA) 2nd-order sections synthesized, but a full auditory filter bank is not yet up.
MOTE – real-time communication with Matlab and sampling operational.
Page 9
Relational Network (Simple)
[Figure: relational network diagram with patches X, Y, and Z and maps labeled M]
Patches of neurons
Each measures one quantity
Bidirectional relations for feedback/feedforward
Thanks to Rodney Douglas
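One way to picture bidirectional relations between patches is a small constraint-propagation step in which each patch receives support from the other two. The sketch below is illustrative only: the relation X + Y = Z, the discrete value ranges, and the function names are hypothetical choices, not taken from the slides.

```python
def _normalize(v):
    total = sum(v)
    return [a / total for a in v] if total else list(v)

def relax(x, y, z, triples):
    """One update step: every patch is re-estimated from the other two.

    x, y, z are activity vectors (one entry per value a patch can encode);
    triples lists the (i, j, k) index combinations the relation allows.
    Callers that treat a patch as input should re-clamp it afterwards.
    """
    new_x = [0.0] * len(x)
    new_y = [0.0] * len(y)
    new_z = [0.0] * len(z)
    for i, j, k in triples:
        new_x[i] += y[j] * z[k]
        new_y[j] += x[i] * z[k]
        new_z[k] += x[i] * y[j]
    return _normalize(new_x), _normalize(new_y), _normalize(new_z)

# Hypothetical relation: X + Y = Z, with X and Y in 0..2 and Z in 0..4.
triples = [(i, j, i + j) for i in range(3) for j in range(3)]
x = [0.0, 1.0, 0.0]            # X is known to be 1
z = [0.0, 0.0, 0.0, 1.0, 0.0]  # Z is known to be 3
y = [1 / 3, 1 / 3, 1 / 3]      # Y is unknown
_, y_new, _ = relax(x, y, z, triples)
```

The same update infers Z from X and Y, or X from Y and Z, which is the sense in which the links carry both feedforward and feedback information.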
Page 10
Relational Network (example)
[Figure labels: Input here; Relational specification; Relational feedback]
Page 11
ASR Relational Network
Cochlea
Delay
Phone Recognizer
Word Recognizer
A patch of neurons (one of N output)
Note: We don’t know how to represent delays
Phone Recognizer
Bidirectional links enforce phoneme/word constraints
Page 12
Relational Advantages
Not an HMM
HMMs are great, but…
Incorporate other knowledge
Bottom-up perception
Top-down word hypothesis
Hallucinate
Based on experience
Hear “ba..” and know that bad, bat, bar, bass, band follow
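The “ba..” example amounts to a top-down lookup: a partial input narrows the set of word hypotheses, which in turn predicts what may come next. A minimal sketch, assuming a toy lexicon and using letters as stand-ins for phonemes (both are assumptions made for illustration):

```python
def word_hypotheses(heard_prefix, lexicon):
    """Words in the lexicon that are consistent with what has been heard so far."""
    return sorted(w for w in lexicon if w.startswith(heard_prefix))

def next_sound_expectations(heard_prefix, lexicon):
    """Count how many candidate words predict each possible next symbol."""
    counts = {}
    for w in word_hypotheses(heard_prefix, lexicon):
        if len(w) > len(heard_prefix):
            nxt = w[len(heard_prefix)]
            counts[nxt] = counts.get(nxt, 0) + 1
    return counts

lexicon = ["bad", "bat", "bar", "bass", "band", "dog"]
candidates = word_hypotheses("ba", lexicon)
expected = next_sound_expectations("ba", lexicon)
```

In a relational network these expectations would feed back to the phoneme patches as bias rather than being computed by an explicit lookup.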
Page 13
Silicon Cochlea
[Figure: silicon cochlea stages: basilar membrane (high frequency to low frequency), inner hair cells, ganglion cells]
(van Schaik, Liu, 2004)
Page 14
Silicon Frequency Response
Tone ramps into two cochleas
Page 15
Cochlear Rate Profiles
[Figure: spikes per utterance for the left cochlea and the right cochlea]
Page 16
Learning Algorithms
Statistical
SAS (Pick best channels for decision)
Least squares (for software demo)
Liquid State Machine
Take input to high dimensions with spiking net
Spike Timing Dependent Plasticity (STDP)
Giacomo/Srinjoy Chip
Brader/Fusi
[Figure: LSM spiking output for Vowel 1 (V1) and Vowel 2 (V2)]
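The “pick best channels for decision” idea can be illustrated by ranking cochlear channels by how well their firing rates separate two classes. The scoring rule below (difference of class means divided by the summed standard deviations) and the toy rates are assumptions; the slides do not state which statistic was actually used.

```python
from statistics import mean, pstdev

def rank_channels(rates_a, rates_b, eps=1e-9):
    """Return channel indices, most discriminative first.

    rates_a and rates_b are lists of trials for the two classes; each trial
    is a list of per-channel firing rates.
    """
    n_channels = len(rates_a[0])
    scored = []
    for ch in range(n_channels):
        a = [trial[ch] for trial in rates_a]
        b = [trial[ch] for trial in rates_b]
        # Separation score: gap between class means relative to their spread.
        score = abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + eps)
        scored.append((score, ch))
    return [ch for _, ch in sorted(scored, reverse=True)]

# Toy firing rates (spikes/second) for three channels, two trials per vowel.
vowel_1 = [[10, 50, 30], [12, 52, 30]]
vowel_2 = [[11, 20, 31], [11, 22, 29]]
ranking = rank_channels(vowel_1, vowel_2)
```

A decision rule would then use only the first few channels in the ranking.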
Page 17
Learning Chip Architecture
[Figure labels: Cochlea Chip (Immediate Cochlea, Delayed Cochlea); Learning Chip Neurons; Plastic synapses; Nonplastic synapses (Excit., Inhib.); Relational Network; Phoneme 1; Phoneme 2; Binary synaptic weights]
Page 18
Tone Results
Tone recognition
Spike input from silicon cochlea
Training
Two tones
Duplicated input
Positive and negative examples
Testing
Page 19
Phoneme Results
Phoneme recognition
Spike input from silicon cochlea
Training
Two phonemes
Duplicated inputs
Positive and negative examples
Testing
Page 20
Behind the Curtain
Page 21
Hardware Overview
[Figure labels: Cochlea; Learning; Phoneme; Word; PCI-AER (for remapping); Shih-Chii Liu; Giacomo Indiveri]
Implemented in MATLAB
Page 22
Infrastructure Difficulties
Remapper
Because of problems with the AER mapper boards, remapping the AER data from the silicon cochlea to the learning chip had to be done in Matlab (very slow).
Power
Unpredictable problems caused by supply-voltage variations of as much as 1 V.
Sharing chips
The learning chip had to be shared with two other workgroups.
PC replacement
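The remapping step translates each address-event from the cochlea's address space into the learning chip's. A minimal software sketch, assuming events are (timestamp, address) pairs and that the mapping is a lookup table with optional fan-out; the addresses and the table contents here are made up for illustration:

```python
def remap_events(events, table):
    """Translate source address-events into destination address-events.

    events: iterable of (timestamp_us, source_address) pairs.
    table:  dict mapping a source address to a list of destination addresses.
    Events whose source address is not in the table are dropped.
    """
    remapped = []
    for timestamp_us, src in events:
        for dst in table.get(src, ()):
            remapped.append((timestamp_us, dst))
    return remapped

# Hypothetical mapping: cochlea channel 3 drives two synapses, channel 7 one.
table = {3: [100, 101], 7: [205]}
cochlea_events = [(10, 3), (15, 9), (20, 7)]
chip_events = remap_events(cochlea_events, table)
```

Doing this per event in an interpreted loop shows why a software remapper is slow compared with a dedicated mapper board.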
Page 23
Impedance Difficulties
Cochlear firing rates
Cochlea: 6M spikes/second
30k channels, 200 spikes/second
Silicon Cochlea: 30k spikes/second
30 channels, 1k spikes/second
Learning Chip: 3k spikes/second
30 channels, 100 spikes/second
Dynamic range
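The totals on this slide follow from channels times per-channel rate, and the mismatch between the silicon cochlea and the learning chip is a factor of ten. The short check below reproduces that arithmetic; the thinning function is only one hypothetical way to close the gap and is not described in the slides.

```python
# (channels, spikes per second per channel), as listed on the slide.
stages = {
    "Cochlea": (30_000, 200),
    "Silicon Cochlea": (30, 1_000),
    "Learning Chip": (30, 100),
}

def total_rate(channels, per_channel_rate):
    """Aggregate spike rate in spikes/second."""
    return channels * per_channel_rate

totals = {name: total_rate(*spec) for name, spec in stages.items()}

# How much faster the silicon cochlea fires than the learning chip can accept.
mismatch = totals["Silicon Cochlea"] // totals["Learning Chip"]

def thin_spikes(spike_times, keep_every):
    """Hypothetical rate reduction: keep every keep_every-th spike of a channel."""
    return spike_times[::keep_every]

one_second_of_spikes = [i / 1000 for i in range(1000)]  # 1k spikes/second
thinned = thin_spikes(one_second_of_spikes, mismatch)
```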
Page 24
Desired Results
/A/ Phoneme Patch
/I/ Phoneme Patch
AI Word Patch
IA Word Patch
Phoneme Input: A A A I
Relational Feedback: Without, With
Page 25
Simulation
Page 26
Simulation 2
Page 27
Simulation 3
Page 28
Great Job!
Student Members
Ismail Uysal, Yoojin Chung, Ramin Pichevar, Rich Hammett, Tarek Massoud, Ross Gaylor
Page 30
Silicon Cochlea
[Figure: mean firing rate vs. channel number; panel title “Mean firing rates in response to two tones”; caption “Mean firing rates for two different vowel inputs”; legend /a/, /i/]
[Figure: raster plot for two different tone inputs (200 Hz, 1000 Hz); channel number vs. time in microseconds]
Page 31
Word Recognizer
Four example raster plots (silence, A_, A_ with relational feedback, AI)
Page 32
Software Simulation
Page 33
Software Simulation
Page 34
Behind the Curtain