Audio Information Extraction
Dan Ellis <[email protected]>
Laboratory for Recognition and Organization of Speech and Audio (LabROSA)
Electrical Engineering, Columbia University
http://labrosa.ee.columbia.edu/
AIE @ MERL - Dan Ellis 2002-01-07

Outline
1 Audio Information Extraction
2 Speech, music, and other
3 General sound organization
4 Future work & summary
Audio Information Extraction (AIE)
• Central operation:
- continuous sound mixture → distinct objects & events
• Perceptual impression is very strong
- but hard to ‘see’ in signal
[Figure: spectrogram of a sound mixture, frq/Hz (0-4000) against time/s (0-12); analysis labels the events Voice (evil), Stab, Rumble, Strings, Choir, Voice (pleasant)]
Perceptual organization: Bregman’s lake
“Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one. Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they?”
(after Bregman’90)
• Received waveform is a mixture
- two sensors, N signals ...
• Disentangling mixtures as primary goal
- perfect solution is not possible
- need knowledge-based constraints
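A minimal sketch of why two sensors cannot pin down N > 2 signals, assuming a simple instantaneous linear mixing model (sensor = weighted sum of sources). The mixing weights `A`, the toy source sequences, and the helper names `mix` and `cross` are all hypothetical, chosen only for illustration; real acoustic mixtures also involve delays and reverberation.

```python
def mix(A, sources):
    """Return one signal per sensor: x[m][t] = sum_n A[m][n] * sources[n][t]."""
    n_steps = len(sources[0])
    return [
        [sum(a * src[t] for a, src in zip(row, sources)) for t in range(n_steps)]
        for row in A
    ]


def cross(u, v):
    """Cross product of two 3-vectors; the result is orthogonal to both."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


# Hypothetical mixing weights: 2 sensors (rows), 3 sources (columns).
A = [[2, 1, 1],
     [1, 3, 2]]

# Hypothetical source signals, one list of samples per source.
sources_a = [[1, 0, -1, 2],
             [0, 2, 1, -1],
             [3, 1, 0, 1]]

# A direction in source space that both sensors are blind to (A @ null == 0).
null = cross(A[0], A[1])

# Add an arbitrary signal along that direction to get a different set of sources.
hidden = [1, -2, 0, 3]
sources_b = [
    [s + k * h for s, h in zip(src, hidden)]
    for src, k in zip(sources_a, null)
]

x_a = mix(A, sources_a)
x_b = mix(A, sources_b)
print("same sensor signals:", x_a == x_b)
print("same sources:", sources_a == sources_b)
```

Both source sets produce identical sensor signals, so the observations alone cannot choose between them; extra constraints on what sources look like are needed to pick one.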
The information in sound
• A sense of hearing is evolutionarily useful
- gives organisms ‘relevant’ information
• Auditory perception is ecologically grounded
- scene analysis is preconscious (→ illusions)
- special-purpose processing reflects ‘natural scene’ properties
- subjective not canonical (ambiguity)
[Figure: two spectrogram panels, ‘Steps 1’ and ‘Steps 2’; freq / Hz (0-4000) against time / s (0-4)]
Positioning AIE
• Domain
- text ... speech ... music ... general audio
• Operation
- recognize ... index/retrieve ... organize
[Figure: map positioning Audio Information Extraction (AIE) among related fields: Unsupervised Clustering; Computational Auditory Scene Analysis (CASA); Blind Source Separation (BSS); Independent Component Analysis (ICA); Audio Content-based Retrieval (Audio CBR); Multimedia Information Retrieval (MMIR); Music Information Retrieval (Music IR); Audio Fingerprinting; Spoken Document Retrieval (SDR); Automatic Speech Recognition (ASR); Text Information Retrieval (IR); Text Information Extraction (IE)]
AIE Applications
• Multimedia access
- sound as complementary dimension
- need all modalities for complete information
• Personal audio
- continuous sound capture quite practical
- different kind of indexing problem
• Machine perception
- intelligence requires awareness
- necessary for communication
• Music retrieval
- area of hot activity
- specific economic factors
Outline
1 Audio Information Extraction
2 Speech, music, and other
- Speech recognition
- Multi-speaker processing
- Music classification
- Other sounds
3 General sound organization
4 Future work & summary
Automatic Speech Recognition (ASR)
• Standard speech recognition structure:
• ‘State of the art’ word-error rates (WERs):
- 2% (dictation)
- 30% (telephone conversations)
• Can use multiple streams...
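For reference, word-error rate is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below assumes that conventional definition; the function name `word_error_rate` and the example sentences are hypothetical and are not taken from the talk.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # matching i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# Toy example: one substitution ("sat" -> "saw") and one deletion ("on").
wer = word_error_rate("the cat sat on the mat", "the cat saw the mat")
print(f"WER = {wer:.1%}")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.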
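[Figure: block diagram of the standard speech recognition structure: sound → Feature calculation → feature vectors → Acoustic classifier (using acoustic model parameters) → phone probabilities → HMM decoder (using word models and a language model, e.g. p("sat"|"the","cat"), p("saw"|"the","cat")) → phone / word sequence → Understanding/application...]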
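The language-model terms p("sat"|"the","cat") and p("saw"|"the","cat") are conditional probabilities of a word given the two preceding words. A minimal sketch of how such values could be estimated, assuming plain maximum-likelihood trigram counts with no smoothing; the toy corpus and the names `train_trigram` and `trigram_prob` are hypothetical, and practical recognizers use far larger corpora plus smoothing or back-off.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    trigram_counts = Counter()
    context_counts = Counter()
    for words in sentences:
        for i in range(len(words) - 2):
            context = (words[i], words[i + 1])
            context_counts[context] += 1
            trigram_counts[context + (words[i + 2],)] += 1
    return trigram_counts, context_counts


def trigram_prob(trigram_counts, context_counts, w1, w2, w3):
    """Maximum-likelihood estimate of p(w3 | w1, w2); 0.0 for unseen contexts."""
    seen = context_counts[(w1, w2)]
    if seen == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / seen


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat".split(),
    "the cat sat down".split(),
    "the cat saw the dog".split(),
]
tri, ctx = train_trigram(corpus)
p_sat = trigram_prob(tri, ctx, "the", "cat", "sat")
p_saw = trigram_prob(tri, ctx, "the", "cat", "saw")
print(f'p("sat"|"the","cat") = {p_sat:.3f}')
print(f'p("saw"|"the","cat") = {p_saw:.3f}')
```

In the decoder, scores of this kind are combined with the acoustic classifier's phone probabilities, so that acoustically similar candidates such as "sat" and "saw" can be ranked by how likely each is in context.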