Source Localization in Complex Listening Situations:
Selection of Binaural Cues Based on Interaural Coherence
Christof Faller, Mobile Terminals Division, Agere Systems
Juha Merimaa, Institut für Kommunikationsakustik, Ruhr-Universität Bochum
Complex listening situations
[Figure: speech ("blaah, blaah, blaah") at -15º, jazz music at 50º, and noise (hum) through an open door at -125º azimuth]
This work
• A model that extracts binaural cues consistent with human localization performance in several complex listening situations
Outline
1. Model description
2. Simulation results
A) Independent sources in free-field
B) Precedence effect
C) Independent sources and reverberation
3. Comparison with earlier models
4. Summary
[Figure: model block diagram. Stimuli 1…N are filtered with HRTFs/BRIRs 1…N and summed to form the left and right ear inputs; each ear input passes through a gammatone filterbank and a model of neural transduction (Bernstein et al. 1999), with internal noise added; normalized cross-correlation and level differences are then calculated using a 10 ms exponential time window]
Extraction of binaural cues
• Estimated at each time instant:
  – Interaural Time Difference (ITD)
    • Time lag of the maximum of the normalized cross-correlation
  – Interaural Level Difference (ILD)
    • Ratio of signal energies within the time window
  – Interaural Coherence (IC)
    • Maximum of the normalized cross-correlation
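The cue definitions above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: it assumes a single critical-band signal per ear (no gammatone filterbank, neural transduction, or internal noise), a first-order recursive average with a 10 ms time constant as the exponential time window, a lag range of ±1 ms, and the hypothetical helper names `exp_smooth`, `shift`, and `binaural_cues`.

```python
import numpy as np

def exp_smooth(x, alpha):
    """First-order recursive average y[n] = alpha*x[n] + (1-alpha)*y[n-1] along the last axis."""
    y = np.empty_like(x, dtype=float)
    acc = np.zeros(x.shape[:-1])
    for n in range(x.shape[-1]):
        acc = alpha * x[..., n] + (1.0 - alpha) * acc
        y[..., n] = acc
    return y

def shift(x, k):
    """Delay x by k >= 0 samples, zero-padding at the start (same length)."""
    if k == 0:
        return x.copy()
    return np.concatenate([np.zeros(k), x[:-k]])

def binaural_cues(left, right, fs, tau=0.010, max_itd=0.001, eps=1e-12):
    """Per-sample ITD (s), ILD (dB) and IC for one critical-band left/right signal pair."""
    alpha = 1.0 / (tau * fs)                  # exponential window, time constant tau
    max_lag = int(round(max_itd * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    # For lag m, the left signal is delayed by max(m, 0) and the right by max(-m, 0)
    x1 = np.stack([shift(left, max(m, 0)) for m in lags])
    x2 = np.stack([shift(right, max(-m, 0)) for m in lags])
    a12 = exp_smooth(x1 * x2, alpha)
    a11 = exp_smooth(x1 * x1, alpha)
    a22 = exp_smooth(x2 * x2, alpha)
    gamma = a12 / np.sqrt(a11 * a22 + eps)    # normalized cross-correlation, (lags, time)
    best = np.argmax(gamma, axis=0)
    t = np.arange(left.size)
    ic = gamma[best, t]                       # IC: maximum over lags
    itd = lags[best] / fs                     # ITD: lag of that maximum
    p_l = exp_smooth(left ** 2, alpha)
    p_r = exp_smooth(right ** 2, alpha)
    ild = 10.0 * np.log10((p_l + eps) / (p_r + eps))  # ILD: energy ratio in dB
    return itd, ild, ic
```

In this sketch a positive ITD means the right-ear signal lags the left-ear signal; that sign convention is an assumption, as is the use of a plain ±1 ms lag search.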
Assumption for correct localization
• The auditory system needs to acquire ITD and ILD cues similar to those evoked by each source separately in an anechoic environment
Example: Two active sound sources
• The source signals superimpose with different level and phase relations at the left and right ears
• For independent or non-stationary source signals:
  – Time-varying binaural cues
  – Reduced IC
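The loss of coherence can be illustrated with a toy example. The sketch below assumes broadband white noise instead of speech, pure interaural delays of ±4 samples (no level differences, no HRTFs), and a single normalized cross-correlation over the whole signal rather than one per time instant; the function names are hypothetical.

```python
import numpy as np

def delay(x, k):
    """Delay x by k >= 0 samples (zero-padded at the start, same length)."""
    return np.concatenate([np.zeros(k), x[:x.size - k]])

def max_normalized_xcorr(left, right, max_lag):
    """IC-like measure: maximum normalized cross-correlation over lags |m| <= max_lag."""
    best = -1.0
    for m in range(-max_lag, max_lag + 1):
        a = delay(left, max(m, 0))
        b = delay(right, max(-m, 0))
        best = max(best, np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))
    return best

rng = np.random.default_rng(1)
s1 = rng.standard_normal(20000)   # source 1: reaches the left ear 4 samples earlier
s2 = rng.standard_normal(20000)   # source 2: reaches the right ear 4 samples earlier

ic_one = max_normalized_xcorr(s1, delay(s1, 4), max_lag=8)
ic_two = max_normalized_xcorr(s1 + delay(s2, 4), delay(s1, 4) + s2, max_lag=8)
print(f"one source: IC = {ic_one:.3f}, two sources: IC = {ic_two:.3f}")
```

With one source the maximum is essentially 1. With two independent sources of equal power it drops to roughly 0.5, because at either source's lag only that source's half of the total power is correlated between the ears.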
How to obtain correct localization cues?
• Simply select ITDs and ILDs only when IC is above a set threshold
  – An adaptive threshold is assumed
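Given per-instant cue arrays, the selection rule itself is a mask on IC. The sketch below uses a fixed threshold `c0` (the adaptation of the threshold is not modeled). The results report a proportion p0 and, for two talkers, a "selected signal power", so the sketch computes both the fraction of selected time instants and the fraction of selected signal power, without asserting which definition p0 uses. The function name and toy values are hypothetical.

```python
import numpy as np

def select_cues(itd, ild, ic, power, c0):
    """Keep ITD and ILD only at time instants where IC exceeds the threshold c0."""
    itd, ild, ic, power = map(np.asarray, (itd, ild, ic, power))
    mask = ic > c0
    frac_instants = mask.mean()                     # share of time instants selected
    frac_power = power[mask].sum() / power.sum()    # share of signal power selected
    return itd[mask], ild[mask], frac_instants, frac_power

# Toy per-instant cues (made-up values for illustration)
itd = [250e-6, 240e-6, -300e-6, 10e-6]
ild = [3.0, 2.5, -4.0, 0.5]
ic = [0.995, 0.999, 0.999, 0.60]
power = [1.0, 1.0, 3.0, 5.0]
sel_itd, sel_ild, p_inst, p_pow = select_cues(itd, ild, ic, power, c0=0.99)
```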
Simulation results
1. Effect of number of sources
• Speech sources at the same overall level (Hawley et al. 1999; Drullman & Bronkhorst 2000)
  – One or two distracters have little effect on localization performance
  – Performance is still good with 5 competing sources
• Simulations with different phonetically balanced sentences recorded by the same male speaker
Two talkers, ±40º azimuth
• 65 and 58 % selected signal power
3 and 5 talkers
• Simulated in the 500 Hz critical band
• 3 talkers: 0º and ±40º azimuth
• 5 talkers: 0º, ±40º, and ±80º azimuth
[Figure panels: all cues and selected cues]
3 talkers: c0 = 0.99, p0 = 54 %
5 talkers: c0 = 0.99, p0 = 22 %
2. Effect of target-to-distracter ratio
• Click-train target in the presence of a white noise distracter
  – Target is localizable down to a few dB above detection threshold (Good & Gilkey 1996; Good et al. 1997)
  – High frequencies are more important for localization (Lorenzi et al. 1999)
Simulation
• 2 kHz critical band
• White noise at 0º azimuth
• 100 Hz click train at 30º azimuth
• Absolute target-to-distracter ratios (T/D) of -3, -9, and -21 dB
  – Corresponding to 8, 2, and -10 dB T/D relative to the detection threshold, as defined by Good & Gilkey (1996)
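The numbers above imply that the detection threshold lies at an absolute T/D of -11 dB (for example, -3 dB - (-11 dB) = 8 dB). The sketch below builds a 100 Hz click train and a white-noise distracter and scales the target to a chosen absolute T/D. It assumes T/D is a broadband power ratio computed over the whole stimulus; the helper names and the 48 kHz sampling rate are hypothetical, and HRTF filtering and the 2 kHz critical band are omitted.

```python
import numpy as np

DETECTION_THRESHOLD_TD_DB = -11.0   # implied by -3 -> 8, -9 -> 2, -21 -> -10 dB

def click_train(fs, rate_hz, duration_s):
    """Unit impulses at rate_hz (assumes fs is an integer multiple of rate_hz)."""
    x = np.zeros(int(fs * duration_s))
    x[:: int(fs // rate_hz)] = 1.0
    return x

def scale_to_td(target, distracter, td_db):
    """Return target scaled so that 10*log10(P_target / P_distracter) == td_db."""
    p_t = np.mean(target ** 2)
    p_d = np.mean(distracter ** 2)
    return target * np.sqrt(p_d / p_t * 10.0 ** (td_db / 10.0))

def relative_td(td_db):
    """T/D relative to the (implied) detection threshold, in dB."""
    return td_db - DETECTION_THRESHOLD_TD_DB

fs = 48000
target = click_train(fs, 100, 0.5)
distracter = np.random.default_rng(2).standard_normal(target.size)
for td in (-3.0, -9.0, -21.0):
    scaled = scale_to_td(target, distracter, td)
    achieved = 10 * np.log10(np.mean(scaled ** 2) / np.mean(distracter ** 2))
    print(f"absolute {achieved:.1f} dB -> relative {relative_td(td):.0f} dB")
```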
[Figure panels: all cues and selected cues]
-3 dB T/D: c0 = 0.990, p0 = 3 %
-9 dB T/D: c0 = 0.992, p0 = 9 %
-21 dB T/D: c0 = 0.992, p0 = 99 %
Precedence effect
• Perception of successive sound events
  – Fusion
  – Localization dominance by the first event
  – Suppression of directional discrimination of later events
• Depends on the interstimulus delay
  – Summing localization (approx. 0-1 ms)
  – Localization dominance by the first event (stimulus dependent, up to 2-50 ms)
  – Independent localization at longer delays
1. Click pairs
• Classical precedence effect experiment: two consecutive clicks with the same level from different directions
Lead: 40º, lag: -40º, ICI: 5 ms
Click pairs as a function of inter-click interval (ICI)
• Simulations for ICIs of 0-20 ms
• Same click sources: ±40º azimuth
• 500 Hz critical band
• A single threshold did not predict all cases correctly
  – The threshold was determined separately for each ICI such that the standard deviation of the selected ITDs is 15 μs
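One possible way to determine such a threshold is a search over candidate values. The sketch below is a hypothetical procedure, not necessarily the one used in the study: it returns the lowest candidate c0 for which the ITDs selected by IC > c0 have a standard deviation of at most 15 μs, so that as many cues as possible are kept. The candidate grid, the minimum cue count, and the toy data are assumptions.

```python
import numpy as np

def threshold_for_itd_spread(itd, ic, target_std=15e-6, candidates=None, min_count=10):
    """Lowest candidate IC threshold whose selected ITDs have std <= target_std."""
    itd, ic = np.asarray(itd), np.asarray(ic)
    if candidates is None:
        candidates = np.linspace(0.90, 0.9999, 500)   # assumed search grid
    for c0 in np.sort(candidates):                    # ascending: keep as many cues as possible
        selected = itd[ic > c0]
        if selected.size >= min_count and np.std(selected) <= target_std:
            return float(c0)
    return None                                       # no candidate meets the criterion

# Toy data: highly coherent cues near +300 us and less coherent cues near -300 us
k = np.arange(100)
itd = np.concatenate([300e-6 + 5e-6 * np.sin(k), np.full(100, -300e-6)])
ic = np.concatenate([np.full(100, 0.999), np.full(100, 0.95)])
c0 = threshold_for_itd_spread(itd, ic)
```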
Click pairs as a function of ICI
Note on across-frequency processing
• At certain small ICIs the required IC threshold becomes very high
  – Anomalies of the precedence effect have been reported for bandpass-filtered clicks (Blauert & Cobben 1978)
• Characteristic power peaks occur at different ICIs in different critical bands
• Across-frequency processing would allow extraction of the correct cues
2. Sinusoidal tones and a reflection
• Steady-state cues result from the coherent summation of the direct sound and the reflection at the listener's ears
• Localization depends on the onset rate (Rakerd & Hartmann 1986)
  – Correct localization with a fast onset
  – Localization based on misleading steady-state cues for tones with a slow onset
Sinusoidal tones: Simulation
• 500 Hz sinusoidal tone
• Direct sound from 0º azimuth
• Reflection from 30º azimuth, delayed by 1.4 ms
• Linear onset ramp
• Steady-state level of 65 dB SPL
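The steady-state part of this situation can be worked out with phasors: at a single frequency, each ear receives the sum of the direct sound and the delayed reflection, and the resulting interaural phase and level differences generally correspond to neither source. The sketch assumes a reflection gain of 0.8, no head shadowing (only a time delay between the ears), the Woodworth spherical-head formula with an 8.75 cm head radius for the reflection's ITD, and that the reflection reaches the left ear first. These values and the function names are illustrative assumptions.

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Spherical-head ITD approximation (assumed model), in seconds."""
    theta = np.deg2rad(azimuth_deg)
    return head_radius / c * (theta + np.sin(theta))

def steady_state_cues(f=500.0, refl_delay=1.4e-3, refl_azimuth=30.0, refl_gain=0.8):
    """Steady-state ITD (s) and ILD (dB) of a direct sound from 0 deg plus one reflection."""
    w = 2 * np.pi * f
    itd_r = woodworth_itd(refl_azimuth)
    # Direct sound from 0 deg: identical unit phasor at both ears.
    # Reflection: delayed by refl_delay, arriving itd_r later at the right ear (assumed side).
    left = 1 + refl_gain * np.exp(-1j * w * refl_delay)
    right = 1 + refl_gain * np.exp(-1j * w * (refl_delay + itd_r))
    ipd = np.angle(left / right)                  # wrapped to (-pi, pi]
    itd = ipd / w                                 # positive: right ear lags
    ild = 20 * np.log10(abs(left) / abs(right))
    return itd, ild

print(steady_state_cues(refl_gain=0.0))   # direct sound alone
print(steady_state_cues())                # direct sound plus reflection
```

Setting `refl_gain=0` gives zero ITD and ILD, the cues of the direct sound alone; with a nonzero reflection gain the summed phasors generally shift both cues away from those of either source.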
Sinusoidal tones: Results
• The model as such cannot explain the discounting of the steady-state cues
• The dependence on onset rate can be explained by considering the cues at the moment the signal level rises sufficiently above the internal noise
Independent sources and reverberation
• Final test for the model
• Simulation in the 2 kHz critical band
  – One speech source at 30º azimuth
  – Two speech sources at ±30º azimuth
• BRIRs measured in a hall with RT = 1.4 s in the 2 kHz octave band
[Figure panels: all cues and selected cues]
1 talker: c0 = 0.99, p0 = 1 %
2 talkers: c0 = 0.99, p0 = 1 %
Comparison with earlier models
Weighting of localization cues with signal power
• Cues are not weighted with signal power outside the 10 ms analysis window
• The contribution of each time instant to localization is determined by IC
• The model can neglect information corresponding to high power when it is due to concurrent activity of several sources
• Power still affects how often the ITDs and ILDs of individual sources are sampled
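The difference between counting selected cues and weighting them with power can be made concrete with a histogram of selected ITDs. In this sketch, `counts` gives every selected time instant the same weight, in line with the description above, while `weighted` scales each selected cue by its signal power for comparison. The function name, data, threshold, and bin edges are made-up illustrations.

```python
import numpy as np

def itd_histograms(itd, ic, power, c0, bins):
    """Histograms of selected ITDs: unweighted counts and power-weighted sums."""
    itd, ic, power = map(np.asarray, (itd, ic, power))
    mask = ic > c0
    counts, _ = np.histogram(itd[mask], bins=bins)
    weighted, _ = np.histogram(itd[mask], bins=bins, weights=power[mask])
    return counts, weighted

itd = [-1e-4, -1e-4, 2e-4, 2e-4]
ic = [1.0, 1.0, 1.0, 0.5]          # the last instant is rejected by the IC threshold
power = [1.0, 1.0, 10.0, 10.0]
counts, weighted = itd_histograms(itd, ic, power, c0=0.9, bins=[-3e-4, 0.0, 3e-4])
```

In the count-based histogram a louder source can still contribute more entries, but only by having its cues selected at more time instants.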
Lindemann (1986)
• Based on contralateral inhibition using a fixed (10 ms) time constant
• Tends to hold cross-correlation peaks with high IC
• Differences
  – Operation of the cue selection method is not limited to the 10 ms time window
  – When necessary (complex situations), the “memory” of past cues can last longer
Zurek (1987)
• Localization inhibition controlled by onset detection
• In precedence effect conditions, the cue selection naturally derives most localization cues from onsets
• Differences
  – Cue selection is not limited to getting information from signal onsets
Summary
• A method was proposed for modeling auditory localization in the presence of concurrent sound
• ITD and ILD cues are selected only when they coincide with a large IC
• The operation of the model was verified against results of several psychoacoustical studies from the literature
Thank you!