Source Localization in Complex Listening Situations: Selection of Binaural Cues Based on Interaural Coherence Christof Faller Mobile Terminals Division,

Source Localization in Complex Listening Situations:

Selection of Binaural Cues Based on Interaural Coherence

Christof Faller, Mobile Terminals Division, Agere Systems

Juha Merimaa, Institut für Kommunikationsakustik, Ruhr-Universität Bochum

Complex listening situations

[Figure: three concurrent sound sources labeled “Blaah, blaah, blaah” (speech), “Jazz” (music), and “Hum” (noise)]

Speech source at -15º, good music at 50º, and noise through an open door at -125º azimuth

This work

• A model to extract binaural cues corresponding to human localization performance in several complex listening situations

Outline

1. Model description

2. Simulation results

A) Independent sources in free-field

B) Precedence effect

C) Independent sources and reverberation

3. Comparison with earlier models

4. Summary

[Block diagram of the model; components recoverable from the figure:]

• Stimulus 1 … Stimulus N, each filtered with a pair of HRTFs/BRIRs (HRTF/BRIR 1 … HRTF/BRIR N) to form the left ear input and the right ear input

• Gammatone filterbank

• Internal noise

• Model of neural transduction (Bernstein et al. 1999)

• Normalized cross-correlation & level difference calculation
– Exponential time window, 10 ms

Extraction of binaural cues

• Estimated at each time instant:
– Interaural Time Difference (ITD)
    • Time lag of the maximum of the normalized cross-correlation
– Interaural Level Difference (ILD)
    • Ratio of signal energies within time window
– Interaural coherence (IC)
    • Maximum of the normalized cross-correlation
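The three cue definitions above can be sketched for a single critical-band signal pair as follows. This is a minimal sketch, not the authors' implementation: it assumes a recursive (exponentially decaying) estimate with a 10 ms time constant, a lag search range of ±1 ms, and the sign conventions noted in the docstring. The function name `binaural_cues` and all parameter names are hypothetical, and the filterbank, internal noise, and neural transduction stages are left out.

```python
import math

def binaural_cues(left, right, fs, max_lag_s=1e-3, window_s=10e-3):
    """Running ITD (s), ILD (dB), and IC for one critical-band signal pair.

    Assumed conventions (not from the slides): positive ITD means the left
    signal leads, positive ILD means the left signal is stronger.
    """
    max_lag = int(round(max_lag_s * fs))
    lags = list(range(-max_lag, max_lag + 1))
    alpha = 1.0 / (window_s * fs)   # forgetting factor of the exponential window
    eps = 1e-20                     # guards against division by zero
    a12 = [0.0] * len(lags)         # running cross products, one per lag
    a11 = [0.0] * len(lags)         # running left energy, one per lag
    a22 = [0.0] * len(lags)         # running right energy, one per lag
    p_left = p_right = 0.0          # running energies used for the ILD
    itd, ild, ic = [], [], []
    for n in range(len(left)):
        p_left = alpha * left[n] ** 2 + (1.0 - alpha) * p_left
        p_right = alpha * right[n] ** 2 + (1.0 - alpha) * p_right
        best_i, best_g = 0, -2.0
        for i, m in enumerate(lags):
            i1 = n - max(m, 0)      # left sample index for lag m
            i2 = n - max(-m, 0)     # right sample index for lag m
            x1 = left[i1] if i1 >= 0 else 0.0
            x2 = right[i2] if i2 >= 0 else 0.0
            a12[i] = alpha * x1 * x2 + (1.0 - alpha) * a12[i]
            a11[i] = alpha * x1 * x1 + (1.0 - alpha) * a11[i]
            a22[i] = alpha * x2 * x2 + (1.0 - alpha) * a22[i]
            g = a12[i] / math.sqrt(a11[i] * a22[i] + eps)  # normalized cross-correlation
            if g > best_g:
                best_i, best_g = i, g
        itd.append(lags[best_i] / fs)                       # lag of the maximum
        ild.append(10.0 * math.log10((p_left + eps) / (p_right + eps)))
        ic.append(best_g)                                   # value of the maximum
    return itd, ild, ic
```

For a right-ear signal that is a delayed, attenuated copy of the left-ear signal, the sketch should return an ITD equal to the delay, an ILD near the attenuation in dB, and an IC close to 1 once the window has filled.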

Assumption for correct localization

• The auditory system needs to acquire ITD and ILD cues similar to those evoked by each source separately in an anechoic environment

Example: Two active sound sources

• Superposition with different level and phase relations at left and right ears

• For independent or non-stationary source signals:
– Time-varying binaural cues
– Reduced IC

How to obtain correct localization cues?

• Simply select ITDs and ILDs only when IC is above a set threshold
– An adaptive threshold is assumed
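The selection rule can be sketched in a few lines. This is an illustrative sketch with a fixed threshold `c0` passed in by the caller; the slides assume an adaptive threshold, which is not modeled here. The helper names `select_cues` and `selected_power_fraction` are hypothetical, and reading the reported percentages as the share of total signal power at the selected instants is an assumption.

```python
def select_cues(itd, ild, ic, c0):
    """Return (ITD, ILD) pairs from time instants whose IC exceeds the threshold c0."""
    return [(t, l) for t, l, c in zip(itd, ild, ic) if c > c0]

def selected_power_fraction(power, ic, c0):
    """Fraction of total signal power at the selected instants (assumed definition)."""
    total = sum(power)
    if total == 0.0:
        return 0.0
    return sum(p for p, c in zip(power, ic) if c > c0) / total
```

With `c0 = 0.99`, for instance, only instants where the two ear signals are almost perfectly coherent contribute cues; all other ITD and ILD samples are discarded rather than down-weighted.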

Simulation results

1. Effect of number of sources

• Speech sources at same overall level (Hawley et al. 1999; Drullman & Bronkhorst 2000)
– One or two distracters have little effect on localization performance
– Performance is still good for 5 competing sources

• Simulations with different phonetically balanced sentences recorded by the same male speaker

Two talkers, ±40º azimuth

• 65 and 58 % selected signal power

3 and 5 talkers

• Simulated at 500 Hz critical band

• 3 talkers: 0º and ±40º azimuth

• 5 talkers: 0º, ±40º, and ±80º azimuth

[Figure panels: All cues; Selected cues]

• 3 talkers: c0 = 0.99, p0 = 54 %

• 5 talkers: c0 = 0.99, p0 = 22 %

2. Effect of target-to-distracter ratio

• Click-train target in presence of a white noise distracter
– Target is localizable down to a few dB above detection threshold (Good & Gilkey 1996; Good et al. 1997)
– High frequencies are more important for localization (Lorenzi et al. 1999)

Simulation

• 2 kHz critical band

• White noise at 0º azimuth

• 100 Hz click train at 30º azimuth

• -3, -9, and -21 dB absolute target-to-distracter ratios (T/D)
– These correspond to 8, 2, and -10 dB T/D relative to detection threshold, as defined by Good & Gilkey (1996)
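The stimulus setup above can be sketched before any spatialization as follows. This is a rough sketch under stated assumptions: T/D is taken as the ratio of average signal powers, clicks are single-sample unit impulses, and the 11 dB offset between absolute and threshold-relative T/D is inferred from the three value pairs listed above rather than stated in the source. All function names are hypothetical.

```python
import math

def mean_power(x):
    """Average power of a sampled signal."""
    return sum(v * v for v in x) / len(x)

def click_train(fs, dur_s, rate_hz=100.0):
    """Unit impulses repeating at rate_hz (single-sample clicks are an assumption)."""
    n = int(round(dur_s * fs))
    period = int(round(fs / rate_hz))
    return [1.0 if k % period == 0 else 0.0 for k in range(n)]

def scale_to_td(target, distracter, td_db):
    """Scale target so that 10*log10(P_target / P_distracter) equals td_db."""
    ratio = 10.0 ** (td_db / 10.0)
    gain = math.sqrt(mean_power(distracter) / mean_power(target) * ratio)
    return [gain * v for v in target]

def td_relative_to_threshold(td_abs_db, offset_db=11.0):
    """Offset inferred from the listed pairs (-3 -> 8, -9 -> 2, -21 -> -10)."""
    return td_abs_db + offset_db
```

The scaled click train and the noise would then each be filtered with the HRTFs for their respective directions (30º and 0º) and summed at each ear; that step is omitted here.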

[Figure panels: All cues; Selected cues]

• -3 dB T/D: c0 = 0.990, p0 = 3 %

• -9 dB T/D: c0 = 0.992, p0 = 9 %

• -21 dB T/D: c0 = 0.992, p0 = 99 %

Precedence effect

• Perception of subsequent sound events
– Fusion
– Localization dominance by the first event
– Suppression of directional discrimination of latter events

• Depends on interstimulus delay
– Summing localization (approx. 0-1 ms)
– Localization dominance by first event (stimulus dependent, until 2-50 ms)
– Independent localization

1. Click pairs

• Classical precedence effect experiment: Two consecutive clicks with same level from different directions

Lead: 40º, lag: -40º, ICI: 5 ms

Click pairs as a function of inter-click interval (ICI)

• Simulations for ICI between 0 and 20 ms

• Same click sources: ±40º azimuth

• 500 Hz critical band

• A single threshold did not predict all cases correctly
– Threshold was determined for each ICI such that the standard deviation of ITD is 15 μs
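One way to determine such a per-ICI threshold is to scan candidate thresholds upward and stop at the first one for which the selected ITDs are tight enough. The sketch below is an assumption about the procedure, not the authors' method: the candidate grid, the population standard deviation, and the requirement of at least two selected samples are illustrative choices, and `threshold_for_itd_std` is a hypothetical name.

```python
import statistics

def threshold_for_itd_std(itd, ic, target_std_s=15e-6, candidates=None):
    """Lowest candidate IC threshold whose selected ITDs have std <= target_std_s.

    Returns None when no candidate meets the target.
    """
    if candidates is None:
        candidates = [k / 1000.0 for k in range(900, 1000)]  # 0.900 ... 0.999
    for c0 in candidates:
        selected = [t for t, c in zip(itd, ic) if c > c0]
        if len(selected) >= 2 and statistics.pstdev(selected) <= target_std_s:
            return c0
    return None
```

Choosing the lowest threshold that meets the target keeps as many cues as possible while still bounding the ITD spread.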

Click pairs as a function of ICI


Note on cross-frequency processing

• At certain small ICIs the required IC threshold gets very high
– Anomalies of precedence effect have been reported for bandpass filtered clicks (Blauert & Cobben 1978)

• Some characteristic power peaks occur at different ICIs at different critical bands

• Processing across frequency bands would allow extraction of correct cues

2. Sinusoidal tones and a reflection

• Steady state cues are a result of coherent summation of sound at the ears of a listener

• Localization depends on onset rate (Rakerd & Hartmann 1986)
– Correct localization with a fast onset
– Localization based on misleading steady state cues for tones with a slow onset

Sinusoidal tones: Simulation

• 500 Hz sinusoidal tone

• Direct sound from 0º azimuth

• Reflection after 1.4 ms from 30º

• Linear onset ramp

• Steady state level of 65 dB SPL
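The stimulus listed above can be sketched before spatialization as a ramped tone plus a delayed copy. This is a sketch under assumptions the slides do not state: the reflection gain (here 1.0), the onset ramp duration, the sampling rate, and the arbitrary amplitude scale (the 65 dB SPL calibration is not modeled). The function name `tone_with_reflection` is hypothetical.

```python
import math

def tone_with_reflection(fs, dur_s, onset_s, f0=500.0, delay_s=1.4e-3, refl_gain=1.0):
    """Return (direct, reflection) signals for a tone with a linear onset ramp.

    onset_s must be positive; the reflection is the direct sound delayed by
    delay_s (rounded to whole samples) and scaled by refl_gain.
    """
    n = int(round(dur_s * fs))
    d = int(round(delay_s * fs))
    direct = []
    for k in range(n):
        t = k / fs
        ramp = min(1.0, t / onset_s)            # linear onset, then steady state
        direct.append(ramp * math.sin(2.0 * math.pi * f0 * t))
    reflection = [0.0] * d + [refl_gain * x for x in direct[:n - d]]
    return direct, reflection
```

The two signals would then be filtered with the HRTFs for 0º and 30º and summed at each ear. Varying `onset_s` changes how much signal energy arrives before the reflection starts to overlap the direct sound.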

Sinusoidal tones: Results

• The model cannot as such explain discounting of the steady state cues

• Dependence on onset rate can be explained by considering cues at the time when signal level gets high enough above internal noise

Independent sources and reverberation

• Final test for the model

• Simulation at 2 kHz critical band
– One speech source at 30º azimuth
– Two speech sources at ±30º azimuth

• BRIRs measured in a hall with RT = 1.4 s at 2 kHz octave band

[Figure panels: All cues; Selected cues]

• 1 talker: c0 = 0.99, p0 = 1 %

• 2 talkers: c0 = 0.99, p0 = 1 %

Comparison with earlier models

Weighting of localization cues with signal power

• Not done outside 10 ms analysis window

• Contribution of each time instant to localization is defined by IC

• The model can neglect high-power time instants when the power is due to concurrent activity of several sources

• Power still affects how often ITDs and ILDs of individual sources are sampled

Lindemann (1986)

• Based on contralateral inhibition using a fixed (10 ms) time constant

• Tends to hold cross-correlation peaks with high IC

• Differences
– Operation of the cue selection method is not limited to the 10 ms time window
– When necessary (complex situations), the “memory” of past cues can last longer

Zurek (1987)

• Localization inhibition controlled by onset detection

• In precedence effect conditions, the cue selection naturally derives most localization cues from onsets

• Differences
– Cue selection is not limited to getting information from signal onsets

Summary

• A method was proposed for modeling auditory localization in presence of concurrent sound

• ITD and ILD cues are selected only when they coincide with a large IC

• Operation of the model was verified with results of several psychoacoustical studies from the literature

Thank you!
