SOUND SOURCE RECOGNITION AND MODELING
CASA seminar, summer 2000
Antti Eronen
Contents:
• Basics of human sound source recognition
• Timbre
• Voice recognition
• Recognition of environmental sounds and events
• Musical instrument recognition
Audio Research Group, TUT
• Different acoustic properties of sound-producing objects enable us to recognize sound sources by listening
• These properties are the result of the production process
• The produced sound waves are different […] event
• Acoustic properties change over time
• The acoustic world is linear: sound waves from different sources combine together and result in larger […]
• The combination and interaction of the sources' properties in the mix generate new, emergent properties of the larger sound-producing system
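The linearity point can be checked numerically: adding two waveforms gives a mixture whose complex spectrum is exactly the sum of the individual spectra. This is a minimal sketch, assuming two synthetic sinusoidal "sources" whose frequencies, amplitudes and sample rate are arbitrary illustrative values, not taken from the slides. Linearity holds for the waveforms and their complex spectra; the emergent perceptual properties of the mix are not captured by this simple sum.

```python
import numpy as np

fs = 8000                      # sample rate in Hz (illustrative choice)
t = np.arange(fs) / fs         # one second of samples

# Two hypothetical sources: steady tones at 440 Hz and 1000 Hz
source_a = 0.5 * np.sin(2 * np.pi * 440 * t)
source_b = 0.3 * np.sin(2 * np.pi * 1000 * t)

# Under the linear model, the waves reaching the listener simply add
mixture = source_a + source_b

# The Fourier transform is linear too, so the spectrum of the mix
# equals the sum of the individual (complex) spectra
spec_mix = np.fft.rfft(mixture)
spec_sum = np.fft.rfft(source_a) + np.fft.rfft(source_b)
print(np.allclose(spec_mix, spec_sum))  # True
```

With one second of audio the rfft bins are 1 Hz apart, so each tone lands exactly on a bin and its peak magnitude is amplitude × N / 2.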
Timbre (Finnish "äänen väri", i.e. tone colour)
• The perceptual qualities of objects and events; that is, “what it sounds like”
• ANSI 1973: “The quality of a sound by which a listener can tell that two sounds of the same loudness and pitch are dissimilar”
• There are many stable and time-varying acoustic properties affecting timbre
• It is unlikely that any one property or combination of properties uniquely determines timbre
• The sense of timbre comes from the emergent, interactive properties of the vibration pattern
• The identification is the result of
    • the apprehension of acoustical invariants (the bowing of a violin sounds like this)
    • inferences made according to learned experience (we learn how the violin sounds in different acoustic environments)
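The ANSI definition can be illustrated with two synthetic tones that share the same pitch (fundamental frequency) and the same RMS level but differ in their harmonic amplitudes. This is a sketch under stated assumptions: equal RMS is only a crude stand-in for equal loudness, the harmonic amplitudes are invented, and the spectral centroid is one commonly used correlate of perceived "brightness", swapped in here for illustration. As the slide says, no single property like this determines timbre on its own.

```python
import numpy as np

fs = 16000          # sample rate in Hz (illustrative)
f0 = 220.0          # common fundamental frequency, i.e. the same pitch
t = np.arange(fs) / fs

def harmonic_tone(amplitudes):
    """Sum of harmonics of f0 with the given relative amplitudes."""
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, a in enumerate(amplitudes))

def normalize_rms(x, target=0.1):
    """Scale x to a fixed RMS level (a crude stand-in for equal loudness)."""
    return x * target / np.sqrt(np.mean(x ** 2))

def spectral_centroid(x):
    """Amplitude-weighted mean frequency of the magnitude spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

# "Dull" tone: energy concentrated in the low harmonics
dull = normalize_rms(harmonic_tone([1.0, 0.5, 0.25, 0.125]))
# "Bright" tone: same f0, strong upper harmonics
bright = normalize_rms(harmonic_tone([0.2, 0.4, 0.6, 0.8, 1.0, 1.0]))

print(f"dull: {spectral_centroid(dull):.0f} Hz, bright: {spectral_centroid(bright):.0f} Hz")
```

Both tones have the same pitch and level, yet their spectral balance differs clearly, which is the kind of difference the definition attributes to timbre.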
Source-filter model of sound production
• The source is excited by energy to generate a vibration pattern
• The filter acts as a resonator, having different vibration modes
• Each mode can be characterized by its resonant frequency and by its damping or quality factor Q
• When the excitation is imposed on the filter, it modifies the relative amplitudes of the components of the source input
• This results in peaks in the frequency spectrum of the signal at the resonant frequencies
• The damping of the vibration modes is a measure of their sharpness of tuning and temporal response
• A lightly damped mode (high Q) results in a sharp peak in the spectrum and a longer time delay (slower decay) in the signal, and vice versa
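The Q trade-off in the last bullet can be sketched with a single vibration mode modeled as a two-pole digital resonator. This assumes the common narrow-band approximations that the -3 dB bandwidth is B = f0 / Q and the pole radius is r = exp(-π B / fs); the resonant frequency, sample rate and Q values are illustrative, and the helper names are hypothetical.

```python
import numpy as np

fs = 8000.0   # sample rate in Hz (illustrative)
f0 = 500.0    # resonant frequency of the mode in Hz (illustrative)

def resonator_impulse_response(f0, q, n_samples=8000):
    """Impulse response of a two-pole resonator with poles r*exp(+-j*theta).

    Assumes bandwidth B = f0 / Q and pole radius r = exp(-pi * B / fs).
    """
    bandwidth = f0 / q
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2 * np.pi * f0 / fs
    a1, a2 = 2 * r * np.cos(theta), -r * r
    h = np.zeros(n_samples)
    y1 = y2 = 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0        # unit impulse excitation
        y = x + a1 * y1 + a2 * y2         # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        h[n] = y
        y2, y1 = y1, y
    return h, r

def t60_seconds(r):
    """Time for the decay envelope r**n to fall by 60 dB."""
    return np.log(1e-3) / np.log(r) / fs

def tail_energy_fraction(h, after_seconds):
    """Share of the impulse-response energy arriving after `after_seconds`."""
    start = int(after_seconds * fs)
    return np.sum(h[start:] ** 2) / np.sum(h ** 2)

for q in (2.0, 50.0):
    h, r = resonator_impulse_response(f0, q)
    print(f"Q={q:g}: B={f0 / q:g} Hz, T60={1000 * t60_seconds(r):.1f} ms, "
          f"energy after 20 ms={tail_energy_fraction(h, 0.02):.3f}")
```

The high-Q mode has a narrow bandwidth (a sharp spectral peak) and keeps ringing long after the excitation, while the heavily damped mode has a broad peak and dies out within a few milliseconds. Under these assumptions T60 = 3 ln(10) / (π B).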
• We can hear both the change in the sound spectrum and the time differences (if they are more than a few milliseconds)
• The final sound is the result of the effects caused by the excitation, the resonators and the radiation characteristics
• In sound-producing mechanisms that can be modeled as linear systems, the overall transfer function is the product of the transfer functions of the partial systems (if they are in cascade); mathematically,

$$Y(z) = X(z)\prod_{i=1}^{N} H_i(z), \qquad (1)$$

where $Y(z)$ and $X(z)$ are the z-transforms of the output and excitation signals, respectively, and $H_i(z)$ are the transfer functions (z-transforms of the impulse responses) of the $N$ subsystems (for instance, the vocal tract and the reflections at the lips)
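Eq. (1) can be checked numerically for the simplest case of FIR subsystems, where each H_i(z) is a polynomial in z^-1, so multiplying transfer functions amounts to convolving coefficient sequences. This is a minimal sketch: the two filters and the random excitation are arbitrary stand-ins (hypothetical, not models of a real vocal tract or lip radiation), and IIR subsystems would need a recursive filter instead of plain convolution.

```python
import numpy as np

# Hypothetical FIR subsystems as coefficients of polynomials in z^-1:
# H_i(z) = h_i[0] + h_i[1] z^-1 + ...
h1 = np.array([1.0, 0.5])       # illustrative subsystem 1
h2 = np.array([1.0, -0.9])      # illustrative subsystem 2

rng = np.random.default_rng(0)
x = rng.standard_normal(64)     # arbitrary excitation signal x[n]

# Pass x through the subsystems one after another (cascade)
y_cascade = np.convolve(np.convolve(x, h1), h2)

# Multiplying the transfer functions = convolving their coefficients
h_total = np.convolve(h1, h2)   # coefficients of H_1(z) H_2(z)
y_product = np.convolve(x, h_total)
print(np.allclose(y_cascade, y_product))  # True

# Same result on the unit circle: Y = X * H_1 * H_2, with enough
# zero-padding that circular convolution equals linear convolution
n = len(y_cascade)
Y = np.fft.fft(x, n) * np.fft.fft(h1, n) * np.fft.fft(h2, n)
print(np.allclose(np.fft.ifft(Y).real, y_cascade))  # True
```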
Machine sound source recognition
A good sound source recognition system should:
• Exhibit generalization. Different instances of the same kind of sound should be recognized as similar (for instance, musical instruments played in different environments or by different players).
• Handle real-world complexity. It should be able to work with realistic recording conditions, with noise, reverberation and even competing sound sources.
• Be scalable. It should be able to learn to recognize additional sounds, and how this affects performance matters.
• Exhibit graceful degradation. The system's performance should worsen gradually as noise, the degree of reverberation and the number of competing sound sources increase.
• Employ a flexible learning strategy. It should be able to introduce new categories as necessary and refine its classification criteria.
• Simplicity, computational efficiency. Of two systems performing equally well, the simpler one is better (in terms of memory or processing requirements, and how easy it is to understand how the system works).
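One common way to probe graceful degradation is to evaluate the recognizer on the same test material with noise added at decreasing signal-to-noise ratios and check that accuracy falls gradually rather than abruptly. The sketch below shows only the mixing step under stated assumptions: `mix_at_snr` is a hypothetical helper, the sine tone and white noise are placeholders for real recordings, and the recognizer itself is not shown.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Hypothetical helper: scale `noise` so the signal-to-noise power
    ratio equals snr_db (in dB), then add it to `signal`."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise

fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)                  # placeholder test recording
noise = np.random.default_rng(1).standard_normal(fs)  # placeholder white noise

for snr_db in (30, 20, 10, 0):
    noisy = mix_at_snr(clean, noise, snr_db)
    residual = noisy - clean
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))
    print(f"target {snr_db:>2} dB -> measured {measured:.2f} dB")
    # A real evaluation would run the recognizer on `noisy` here
    # and record its accuracy at each SNR.
```

The same loop could vary the amount of reverberation or the number of competing sources instead of the noise level.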