Michael Mandel (E6820 SAPR) Spatial sound March 27, 2008 7 / 33
Outline
1 Spatial acoustics
2 Binaural perception
3 Synthesizing spatial audio
4 Extracting spatial sounds
Binaural perception
path length difference
path length difference
head shadow (high freq)
source
LR
What is the information in the two ear signals?
- the sound of the source(s) (L+R)
- the position of the source(s) (L−R)
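As a minimal sketch of this sum/difference view (the helper name `mid_side` is made up, not from the slides):

```python
import numpy as np

def mid_side(left, right):
    """Decompose a binaural pair into a sum channel (the sound of the
    sources) and a difference channel (which carries interaural cues)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    mid = 0.5 * (left + right)    # L+R: source content
    side = 0.5 * (left - right)   # L-R: position-bearing differences
    return mid, side

# Identical ear signals (source dead ahead) leave nothing in the side channel:
m, s = mid_side([1.0, 0.5], [1.0, 0.5])
```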
[Figure: example left- and right-ear waveforms from the ShATR database (shatr78m3), t = 2.2–2.235 s]
Main cues to spatial hearing
Interaural time difference (ITD)
- from different path lengths around the head
- dominates at low frequencies (< 1.5 kHz)
- max ∼750 µs → ambiguous for frequencies > 600 Hz
Interaural intensity difference (IID)
- from head shadowing of the far ear
- negligible at low frequencies; increases with frequency
Spectral detail (from pinna reflections) useful for elevation & range
Direct-to-reverberant ratio useful for range
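The two main cues can be estimated from a stereo pair: ITD from the cross-correlation peak, IID from the level ratio. A broadband sketch only (real systems estimate per frequency band, and use ITD mainly below ~1.5 kHz); `itd_iid` is a hypothetical helper:

```python
import numpy as np

def itd_iid(left, right, fs):
    """Estimate ITD (s) from the cross-correlation peak and IID (dB)
    from the RMS level ratio between the ear signals."""
    xc = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    itd = lags[np.argmax(xc)] / fs          # positive: right ear lags
    iid = 20 * np.log10(np.sqrt(np.mean(np.square(left)))
                        / np.sqrt(np.mean(np.square(right))))
    return itd, iid

# Source off to the left: right ear delayed 5 samples, attenuated 6 dB
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
itd, iid = itd_iid(x, 0.5 * np.roll(x, 5), fs)
```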
[Figure: spectrogram of claps 33 and 34 from 627M:nf90; frequency 0–20 kHz vs. time 0–1.8 s]
Head-Related Transfer Functions (HRTFs)
Capture source coupling as impulse responses
{ℓθ,φ,R(t), rθ,φ,R(t)}
Collection: (http://interface.cipic.ucdavis.edu/)
[Figure: CIPIC HRIR_021 left- and right-ear impulse responses at 0° elevation, plotted vs. time (0–1.5 ms) and azimuth (−45° to 45°), with the single pair at 0° elevation, 0° azimuth shown separately]
Highly individual!
Synthetic binaural audio
Source convolved with {L, R} HRTFs gives precise positioning... for headphone presentation
- can combine multiple sources (by adding)
Where to get HRTFs?
- measured set, but: specific to individual, discrete
- interpolate by linear crossfade, PCA basis set
- or: parametric model: delay, shadow, pinna (Brown and Duda, 1998)
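The convolution approach can be sketched as follows (`render_binaural` and `mix` are made-up helpers, and the HRIRs here are toy delay-and-attenuate stand-ins, not measured responses such as CIPIC's):

```python
import numpy as np

def render_binaural(source, hrir_l, hrir_r):
    """Convolve a mono source with a left/right HRIR pair to place it
    at that pair's measured direction (for headphone presentation)."""
    return np.convolve(source, hrir_l), np.convolve(source, hrir_r)

def mix(sources_and_hrirs):
    """Multiple sources combine by simple addition of rendered pairs."""
    rendered = [render_binaural(s, hl, hr) for s, hl, hr in sources_and_hrirs]
    n = max(len(l) for l, _ in rendered)
    left = sum(np.pad(l, (0, n - len(l))) for l, _ in rendered)
    right = sum(np.pad(r, (0, n - len(r))) for _, r in rendered)
    return left, right

# Toy stand-in HRIRs: source to the left, so the right ear gets the
# signal delayed and attenuated (a real HRIR also encodes pinna detail).
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.0, 0.5])
left, right = render_binaural(np.array([1.0, -1.0]), hrir_l, hrir_r)
lmix, rmix = mix([(np.array([1.0, -1.0]), hrir_l, hrir_r)] * 2)
```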
[Block diagram, after Brown and Duda (1998): the source feeds, per ear, a delay z^(-t_D(θ)), a head-shadow filter (1 − a z^(-1)) / (1 − b(θ) z^(-1)), and a pinna-echo stage Σ_k p_k(θ,φ) · z^(-t_Pk(θ,φ)); a room echo K_E · z^(-t_E) is added to both outputs]
Head motion cues?
- head tracking + fast updates
Beamforming: drive interference to zero
- cancel energy during nontarget intervals
ICA: maximize mutual independence of outputs
- from higher-order moments during overlap
[Diagram: sources s1, s2 mixed through gains a11, a12, a21, a22 into microphones m1, m2; unmixing parameters adapted by the gradient −∂MutInfo/∂a]
Limited by separation model parameter space
- only N × N?
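The "drive interference to zero" idea can be sketched with two microphones as a plain delay-and-subtract null steerer, assuming the interferer's inter-mic delay is known (helper names are made up):

```python
import numpy as np

def delay(x, d):
    """Shift x right by d samples, zero-padding the front."""
    return np.concatenate([np.zeros(d), x[:len(x) - d]]) if d else x.copy()

def null_steer(m1, m2, d):
    """Delay mic 1 by the interferer's inter-mic delay d and subtract:
    the interferer cancels exactly; other directions pass (filtered)."""
    return delay(m1, d) - m2

rng = np.random.default_rng(1)
target = rng.standard_normal(256)   # reaches both mics simultaneously
interf = rng.standard_normal(256)   # reaches mic 2 three samples late
m1, m2 = target + interf, target + delay(interf, 3)
out = null_steer(m1, m2, 3)                          # target survives
residual = null_steer(interf, delay(interf, 3), 3)   # interferer alone: zero
```

Note the limitation the slide mentions: each such null spends one degree of freedom, so N microphones can cancel only about N − 1 point interferers.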
Binaural models
Human listeners do better?
- certainly given only 2 channels
Extract ITD and IID cues?
- cross-correlation finds timing differences
- 'consume' counter-moving pulses
- how to achieve IID, trading
- vertical cues...
Time-frequency masking
How to separate sounds based on direction?
- assume one source dominates each time-frequency point
- assign regions of spectrogram to sources based on probabilistic models
- re-estimate model parameters based on regions selected
Model-based EM Source Separation and Localization
- Mandel and Ellis (2007)
- models include IID as |Lω / Rω| and IPD as arg(Lω / Rω)
- independent of source, but can model it separately
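The masking step can be sketched under an IPD-only model (function names and the fixed candidate delays are illustrative; MESSL itself also models IID and re-estimates its parameters with EM rather than using fixed delays):

```python
import numpy as np

def stft(x, nfft=256, hop=128):
    """Windowed short-time Fourier transform, frequency x time."""
    win = np.hanning(nfft)
    frames = [win * x[i:i + nfft] for i in range(0, len(x) - nfft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

def ipd_masks(L, R, taus, fs, nfft=256):
    """One binary mask per candidate delay: each T-F point goes to the
    delay whose predicted IPD (omega * tau) is circularly closest to
    the observed IPD, arg(L / R)."""
    omega = 2 * np.pi * np.fft.rfftfreq(nfft, 1.0 / fs)[:, None]  # rad/s
    ipd = np.angle(L * np.conj(R))                                # observed IPD
    err = [np.abs(np.angle(np.exp(1j * (ipd - omega * tau)))) for tau in taus]
    best = np.argmin(err, axis=0)
    return [(best == i).astype(float) for i in range(len(taus))]

# Single source delayed 5 samples between the ears: the tau = 5/fs mask
# should claim almost all of the T-F points.
fs, d = 16000, 5
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
L = stft(x)
R = stft(np.concatenate([np.zeros(d), x[:-d]]))
m0, m1 = ipd_masks(L, R, [0.0, d / fs], fs)
```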
Summary
Spatial sound
- sampling at more than one point gives information on origin direction
Binaural perception
- time & intensity cues used between/within ears
Sound rendering
- conventional stereo
- HRTF-based
Spatial analysis
- optimal linear techniques
- elusive auditory models
References
Elizabeth M. Wenzel, Marianne Arruda, Doris J. Kistler, and Frederic L. Wightman. Localization using nonindividualized head-related transfer functions. The Journal of the Acoustical Society of America, 94(1):111–123, 1993.
William G. Gardner. A real-time multichannel room simulator. The Journal of the Acoustical Society of America, 92(4):2395–2395, 1992.
C. P. Brown and R. O. Duda. A structural model for binaural sound synthesis. IEEE Transactions on Speech and Audio Processing, 6(5):476–488, 1998.
Michael I. Mandel and Daniel P. Ellis. EM localization and separation using interaural level and phase cues. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 275–278, 2007.
J. C. Middlebrooks and D. M. Green. Sound localization by human listeners. Annual Review of Psychology, 42:135–159, 1991.
Brian C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press, fifth edition, April 2003. ISBN 0125056281.
Jens Blauert. Spatial Hearing - Revised Edition: The Psychophysics of Human Sound Localization. The MIT Press, October 1996.
V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 99–102, 2001.