
Modelling compensation for reverberation: work done and planned

Amy Beeston and Guy J. Brown

Department of Computer Science

University of Sheffield

EPSRC 12-month meeting, Sheffield: 23rd Oct 2009


Overview

1. Work done

- sir-stir framework
- 3 across-channel model configurations

2. Work planned

- Across-channel model
- Within-channel model
- Further questions

Part 1: Work Done

- sir-stir framework
- 3 across-channel model configurations

Watkins’ sir/stir paradigm

[Figure: Human listeners. Category boundary (step; axis ticks 0, 5, 10) against context distance (0.32 m and 10 m). Annotations: effect of reverberation; effect of compensation; more ‘sir’ responses; more ‘stir’ responses.]

Efferent auditory processing

- Reverberating a speech signal reduces its dynamic range
- Reflections fill gaps in the temporal envelope

Efferent system helps control dynamic range (Guinan & Gifford, 1988). Could compensation be characterised as restoration of dynamic range?

[Figure: dry and reverberated signals. Dry: mean = small value, mean/peak = 0.1216. Reverberated: mean = larger value, mean/peak = 0.2142.]

mean-to-peak ratio (MPR)

- measured over some time window
- peak does not vary greatly with source-receiver distance
- mean increases with source-receiver distance
- MPR = mean/peak

Therefore MPR increases with distance.
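A minimal sketch of the MPR definition above, in Python. The two envelopes are invented for illustration (a burst followed by a gap, and the same burst with the gap partly filled by decaying reflections); they are not the signals shown on the slides, and the function name is an assumption.

```python
def mean_to_peak_ratio(window):
    """Return mean/peak of a non-negative response measured over one time window."""
    peak = max(window)
    if peak <= 0:
        raise ValueError("MPR is undefined when the peak is not positive")
    return (sum(window) / len(window)) / peak

# Hypothetical envelopes (not data from the slides).
dry = [1.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
reverberated = [1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05, 0.05]

mpr_dry = mean_to_peak_ratio(dry)
mpr_reverberated = mean_to_peak_ratio(reverberated)
print(mpr_dry, mpr_reverberated)
```

In this toy case the peak is identical in both envelopes while the mean rises once the gap is filled, so the ratio rises too.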

Modelling framework


Stimuli

Watkins, JASA 2005, experiment 5:
- forward/reversed speech carrier
- forward/reversed reverberation

[Figure: conditions arranged by speech carrier (fwd, rev) and reverberation (fwd, rev).]

Watkins, JASA 2005, experiment 4:
- reverberate, then flip polarities: noise after
- flip polarities, then reverberate: noise before

[Figure: noise contexts, ‘after’ and ‘before’.]

Auditory Periphery

Outer/middle ear
- Simulates human data from Huber et al. (2001)

Basilar membrane
- DRNL: dual resonance nonlinear filterbank
- Originally proposed by Meddis, O’Mard and Lopez-Poveda (2001)
- Human parameters from Meddis (2006)
- Efferent attenuation introduced by Ferry and Meddis (2007)

Hair cell
- Linear output between threshold and saturated firing rate (Messing 2007)
- Does not model adaptation in the auditory nerve

[Figure: Auditory Nerve STEP. Three panels of best frequency (Hz, log-spaced, 100 to 8000) against time.]
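Two small pieces of the periphery description can be sketched in Python: log-spaced best frequencies between 100 and 8000 Hz, and a hair cell stage whose output is linear between a threshold and a saturation point. The function names, the number of channels, the zero output below threshold and all numeric parameters are assumptions for illustration; the slides do not give them.

```python
def log_spaced_bfs(low_hz, high_hz, n_channels):
    """Best frequencies spaced evenly on a logarithmic axis."""
    ratio = high_hz / low_hz
    return [low_hz * ratio ** (i / (n_channels - 1)) for i in range(n_channels)]

def hair_cell_rate(drive, threshold, saturation, max_rate):
    """Firing rate that is linear between threshold and saturation.

    Assumption: the output is 0 below threshold and max_rate above saturation.
    """
    if drive <= threshold:
        return 0.0
    if drive >= saturation:
        return max_rate
    return max_rate * (drive - threshold) / (saturation - threshold)

bfs = log_spaced_bfs(100.0, 8000.0, 5)  # 5 channels is a hypothetical choice
rates = [hair_cell_rate(d, 1.0, 3.0, 200.0) for d in (0.0, 2.0, 5.0)]
print(bfs, rates)
```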

Spectro-temporal excitation patterns

… ok, next you’ll get to click on … {sir, stir}

Efferent attenuation based on dynamic range

Idea: to control the amount of efferent attenuation applied in the model according to the dynamic range of the context.

Dynamic range is measured according to the mean-to-peak ratio in the AN response.

[Figure labels: kurtosis, negative differentials, offsets, mean-to-peak ratio.]

Auditory Nerve response

Across-channel model: the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component.

Within-channel model: the auditory nerve response is NOT summed across all frequency channels.
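The difference between the two models can be sketched as follows, assuming the auditory nerve response is held as a list of channels, each a list of values through time. The data layout, function names and numbers are hypothetical.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence with a positive peak."""
    return (sum(values) / len(values)) / max(values)

def across_channel_mpr(an_response):
    """Sum over frequency channels at each time step, then take one MPR."""
    summed = [sum(frame) for frame in zip(*an_response)]
    return mpr(summed)

def within_channel_mpr(an_response):
    """Take one MPR per frequency channel, with no summation across channels."""
    return [mpr(channel) for channel in an_response]

# Hypothetical response: rows are frequency channels, columns are time steps.
an_response = [
    [4.0, 0.0, 0.0, 0.0],  # sparse channel, large dynamic range
    [2.0, 2.0, 2.0, 2.0],  # steady channel, small dynamic range
]

pooled = across_channel_mpr(an_response)
per_channel = within_channel_mpr(an_response)
print(pooled, per_channel)
```

The pooled value (0.5) sits between the two per-channel values (0.25 and 1.0), which is the information a within-channel scheme would keep and an across-channel scheme would average away.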

Across Channel

the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component

[Figure: auditory nerve response (frequency against time) summed across channels (Σ), followed by MPR, then ATT.]

Within Channel

Auditory nerve response in each frequency channel influences the efferent system component

[Figure: auditory nerve response (frequency against time) with a separate MPR and ATT stage for each frequency channel.]

Efferent attenuation

[Figure: efferent attenuation, ATT, plotted against MPR.]

- Linear map from MPR (of summed AN) to efferent attenuation, ATT
- ATT turns down the gain on the non-linear pathway of DRNL
- The rate-intensity curve shifts to the right
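A sketch of the linear map and of what an attenuation in dB means as a gain factor. The slope and intercept are the semi-closed loop values quoted later in the deck; the MPR inputs are hypothetical, and the function names are assumptions. The conversion 10^(-ATT/20) is the standard dB-to-amplitude relation; where exactly the gain is applied inside the DRNL code is not shown here.

```python
def mpr_to_attenuation_db(mpr, slope, intercept):
    """Linear map from mean-to-peak ratio to efferent attenuation in dB."""
    return slope * mpr + intercept

def attenuation_to_gain(att_db):
    """Amplitude gain factor corresponding to an attenuation in dB."""
    return 10.0 ** (-att_db / 20.0)

# Hypothetical MPR values with the mapping ATTENUATION = (38.36 * MPR) + 13.77.
att_low_mpr = mpr_to_attenuation_db(0.1, 38.36, 13.77)
att_high_mpr = mpr_to_attenuation_db(0.2, 38.36, 13.77)
gain_at_20_db = attenuation_to_gain(20.0)
print(att_low_mpr, att_high_mpr, gain_at_20_db)
```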

Recognition

Efferent attenuation helps to recover the dip in the temporal envelope corresponding to the ‘t’ closure in ‘stir’.

Templates: ‘sir’, ‘stir’

3 Model configurations

Open loop

Semi-closed loop: amount of attenuation is estimated during one second preceding the test-word, and held constant thereafter

Closed loop: amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)

Efferent system: Open loop

Open loop

Many simulations were run: the amount of attenuation applied was varied across a range of values (0-30 dB), and the resulting category boundaries were recorded in calibration charts.

The ‘best-match’ to human results was found (manually) for each condition.

Near contexts match best with low attenuation values, while far contexts match best with higher attenuation values.
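The open-loop procedure can be sketched as a sweep over fixed attenuation values followed by a nearest-match search. The slides say the best match was found manually; the automated search, the function names, the toy model and the target boundary below are all assumptions standing in for the real simulation.

```python
def attenuation_grid(start=0.0, stop=30.0, step=0.5):
    """Attenuation values 0, 0.5, ..., 29.5, 30 dB."""
    n_steps = int(round((stop - start) / step))
    return [start + i * step for i in range(n_steps + 1)]

def best_match(calibration, human_boundary):
    """Attenuation whose model category boundary is closest to the human one."""
    return min(calibration, key=lambda att: abs(calibration[att] - human_boundary))

def toy_model_boundary(att_db):
    """Hypothetical stand-in for a full model run at a fixed attenuation."""
    return 10.0 - 0.25 * att_db

grid = attenuation_grid()
calibration = {att: toy_model_boundary(att) for att in grid}
best = best_match(calibration, human_boundary=5.0)  # hypothetical target
print(len(grid), best)
```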


Results: Open loop

[Figure: calibration curves for tuning. Category boundary results against attenuation applied (dB), swept over 0, 0.5, … 29.5, 30, for near and far contexts. Values marked on the panels: 12.5, 9.0, 22.0, 21.5.]

Efferent system: Semi-closed loop

Semi-closed loop

amount of attenuation is estimated during one second preceding the test-word, and held constant thereafter


Semi-closed loop

[Figure: context ‘… ok, next you’ll get to click on …’ followed by test word {sir, stir}, with a single ATT value.]

- Examine context within time window to derive a metric value
- Use metric value to determine the efferent attenuation

Metric: Semi-closed loop MPR

Results: Semi-closed loop

- Tuned to match near-near and far-far (fwd fwd) conditions
- Experiment 5 achieves a qualitative (not quantitative) match to human data…
- …but experiment 4 conditions do not match well

ATTENUATION = (38.36 * MPR) + 13.77

[Figure: results panels. Experiment 5: speech carrier (fwd, rev) by reverberation (forward, reverse). Experiment 4: noise before, noise after.]
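A sketch of the semi-closed loop: the MPR is measured once over the second preceding the test word, mapped to an attenuation with the tuned coefficients quoted above, and that single value is then held. The signal, the 10 Hz toy sample rate, and the function and parameter names are hypothetical.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence with a positive peak."""
    return (sum(values) / len(values)) / max(values)

def semi_closed_loop_attenuation(an_summed, test_word_start, sample_rate,
                                 slope=38.36, intercept=13.77, context_s=1.0):
    """One attenuation value from the context window just before the test word."""
    n_context = int(round(context_s * sample_rate))
    start = max(0, test_word_start - n_context)
    context = an_summed[start:test_word_start]
    return slope * mpr(context) + intercept

# Hypothetical summed response: older context, 1 s of context, then the test word.
older = [9.0] * 5
context = [5.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
test_word = [3.0] * 4
an_summed = older + context + test_word

att_db = semi_closed_loop_attenuation(an_summed, test_word_start=15, sample_rate=10)
print(att_db)
```

Samples older than the window (the run of 9.0 values) do not influence the result, and the value is not updated once the test word begins.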

Efferent system: Closed loop

Closed loop

Amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)

Closed loop

[Figure: context ‘… ok, next you’ll get to click on …’ followed by test word {sir, stir}, with a sequence of ATT values updated through time.]

- Examine context within time window to derive a metric value
- Use metric value to determine the efferent attenuation applied
- Window slides forward, process repeats…
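The sliding-window schedule can be sketched as below. This toy covers only the windowing and update timing: it reads a fixed response and does not feed the attenuation back into the periphery, which the model itself would do. The default slope and intercept are one of the closed-loop mappings quoted elsewhere in the deck; the signal, rates, initial attenuation of 0 dB and all names are assumptions.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence with a positive peak."""
    return (sum(values) / len(values)) / max(values)

def closed_loop_attenuation(an_summed, sample_rate, control_rate, window_s,
                            slope=45.0, intercept=18.0, initial_att=0.0):
    """Attenuation track, re-estimated at the control rate from a sliding window."""
    hop = max(1, int(round(sample_rate / control_rate)))
    win = int(round(window_s * sample_rate))
    current = initial_att
    track = []
    for i in range(len(an_summed)):
        if i % hop == 0:
            context = an_summed[max(0, i - win):i]  # samples before time i
            if context and max(context) > 0:
                current = slope * mpr(context) + intercept
        track.append(current)  # held between control updates
    return track

# Hypothetical response at 8 samples/s, updated twice per second, 1 s window.
an_summed = [10.0, 0.0, 0.0, 0.0, 10.0, 1.0, 1.0, 0.0, 2.0, 2.0, 2.0, 2.0]
track = closed_loop_attenuation(an_summed, sample_rate=8, control_rate=2, window_s=1.0)
print(track)
```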

Metric: Closed loop MPR

[Figure: Expt. 5, MPR through time. Panels arranged by speech carrier (fwd, rev) and reverberation (forward, reverse).]

Closed loop (expt 5)

- Tuned to ‘best’ match near-near and far-far (fwd fwd) conditions
- Variation possible due to granularity of model (± 0.5)

ATTENUATION = (45 * MPR) + 18
ATTENUATION = (45 * MPR) + 19

[Figure: two sets of results, one per mapping, each arranged by speech carrier (fwd, rev) and reverberation (fwd, rev).]

Closed loop (expt 4)

MPR mapping does not generalise for experiment 4 noise contexts

ATTENUATION = (45 * MPR) + 18

[Figure: results arranged by speech carrier (fwd, rev) and reverberation (fwd, rev), with noise before and noise after.]

Part 2: Work Planned

Across-channel model

Within-channel model

Further questions



Practical considerations

- Control rate specified to speed up the simulation (usually 1 kHz, i.e., the attenuation parameter is updated every 1 ms)
- Time window over which to determine the metric (usually the previous 1 second; different values under investigation at present)
- Shape of window (rectangular at present; should have a ‘forgetting function’)

Question to Tony et al.

What data can we use to determine the shape and duration of the window?
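One possible reading of a ‘forgetting function’ is an exponentially decaying weight on older samples, contrasted here with the rectangular window. This is only an illustration: the exponential form, the time constant and the function names are assumptions, and how the peak should be treated under such a window is left open.

```python
import math

def rectangular_weights(n):
    """Equal weight on every sample in the window."""
    return [1.0 / n] * n

def exponential_weights(n, tau_samples):
    """Normalised weights for samples ordered oldest to newest (newest has age 0)."""
    raw = [math.exp(-(n - 1 - i) / tau_samples) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical context with one old burst followed by silence.
context = [8.0, 0.0, 0.0, 0.0]
rect = rectangular_weights(len(context))
forgetting = exponential_weights(len(context), tau_samples=1.0)

mean_rect = weighted_mean(context, rect)
mean_forgetting = weighted_mean(context, forgetting)
print(mean_rect, mean_forgetting)
```

With the forgetting weights the old burst contributes far less to the mean than it does under the rectangular window.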


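[Figure: auditory nerve response (frequency against time) summed across channels (Σ), followed by MPR, then ATT. Inset: window weight against time, labelled ‘window shape/duration?’.]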


Previously we asked what duration and shape the metric window should have in time.

Now we ask what duration and shape the metric window should have in frequency.

- /t/ is defined by a sharp onset burst from 2 to 8 kHz (Régnier & Allen, 2008)
- template matching over restricted areas of the frequency domain
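Template matching over a restricted frequency region could look like the sketch below, which compares a pattern with the ‘sir’ and ‘stir’ templates using only channels whose best frequency lies between 2 and 8 kHz. The squared-difference measure, the data layout, the function names and all values are assumptions; the slides do not state the matching metric.

```python
def band_limited_distance(pattern, template, bfs, low_hz=2000.0, high_hz=8000.0):
    """Sum of squared differences over channels with low_hz <= BF <= high_hz."""
    total = 0.0
    for bf, p_row, t_row in zip(bfs, pattern, template):
        if low_hz <= bf <= high_hz:
            total += sum((p - t) ** 2 for p, t in zip(p_row, t_row))
    return total

def classify(pattern, templates, bfs):
    """Name of the template closest to the pattern within the frequency band."""
    return min(templates,
               key=lambda name: band_limited_distance(pattern, templates[name], bfs))

# Hypothetical two-channel patterns: rows are channels, columns are time steps.
bfs = [500.0, 3000.0]
templates = {
    "sir": [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]],
    "stir": [[1.0, 1.0, 1.0], [0.0, 2.0, 0.0]],  # burst in the high channel
}
pattern = [[5.0, 5.0, 5.0], [0.0, 1.8, 0.0]]

label = classify(pattern, templates, bfs)
print(label)
```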


Frequency-dependent suppression:
- Feedback from efferent system appears to be fairly narrowly tuned
- Fall-off in the effect of efferent-induced threshold shift at low BFs [data from cat, Guinan & Gifford (1988)]
- Improves representation of low-frequency speech structure when efferent attenuation is high

Modelling implications:
- Need no longer be a pooled auditory nerve (STEP) response for metric/map to attenuation
- Each channel can react quasi-independently to the audio context it hears

Within-channel model

[Figure: auditory nerve response (frequency against time) with a separate MPR and ATT stage for each frequency channel. Inset: window weight against time, labelled ‘window shapes/durations?’.]

Questions

Is there a time-analogy to the frequency gaps in 8-band stimuli?

- imposing gaps so that bits are missing from the freq/time pattern in the context window.

- might allow an importance weighting for time-bands like for the frequency bands.


Implication?

What happens with a silent context?

Physiology predicts that efferent system is not activated

Model predicts small dynamic range:
- maximum mean/peak ratio
- high efferent attenuation
- low category boundary (more stirs)

Specifically, if (when) context is shorter than metric window:
- should we shorten the metric window?
- zero pad the utterances?
- count previous trial as context?
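Both points can be illustrated with the MPR itself. The sketch assumes a silent context yields a constant, non-zero response (an assumption; an all-zero response would leave mean/peak undefined), and it shows how zero padding a short context changes the ratio. All values and names are hypothetical.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence with a positive peak."""
    return (sum(values) / len(values)) / max(values)

# Assumed constant response during a silent context: mean equals peak.
silent_context = [0.5] * 10
mpr_silent = mpr(silent_context)

# A context shorter than the metric window, with and without zero padding.
short_context = [4.0, 2.0, 2.0]
window_length = 6
padded_context = [0.0] * (window_length - len(short_context)) + short_context

mpr_short = mpr(short_context)
mpr_padded = mpr(padded_context)
print(mpr_silent, mpr_short, mpr_padded)
```

A constant response gives the largest possible ratio (1.0), which a linear map would turn into the highest attenuation. Zero padding lowers the mean without changing the peak, so it lowers the ratio relative to shortening the window.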

Thanks

Tony Watkins, Simon Makin and Andrew Raimond of Reading University for all the data.

Ray Meddis and Robert Ferry of Essex University for the DRNL program code.

Kalle Palomäki, Hynek Hermansky and Roger Moore for discussion.

The end
