Modelling compensation for reverberation: work done and planned
Amy Beeston and Guy J. Brown
Department of Computer Science
University of Sheffield
EPSRC 12-month meeting, Sheffield: 23rd Oct 2009
Overview
1. Work done
- sir-stir framework
- 3 Across-channel model configurations
2. Work planned
- Across-channel model
- Within-channel model
- Further questions
Part 1: Work Done
- sir-stir framework
- 3 Across-channel model configurations
Watkins’ sir/stir paradigm
[Figure: Human listeners. Category boundary (step; axis ticks 0, 5, 10) against context distance (m; 0.32 and 10). Annotations: effect of reverberation; effect of compensation; more ‘sir’ responses; more ‘stir’ responses.]
Efferent auditory processing
- Reverberating a speech signal reduces its dynamic range
- Reflections fill gaps in the temporal envelope
- Efferent system helps control dynamic range (Guinan & Gifford, 1988)
- Could compensation be characterised as restoration of dynamic range?
[Figure: dry and reverberated signals. Dry: mean = small value, mean/peak = 0.1216. Reverberated: mean = larger value, mean/peak = 0.2142.]
mean-to-peak ratio (MPR)
- measured over some time window
- peak does not vary greatly with source-receiver distance
- mean increases with source-receiver distance
- MPR = mean/peak, therefore MPR increases with distance
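The MPR definition above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `mean_to_peak_ratio` and the two toy envelopes are assumptions, chosen only to show that filling the gaps between bursts raises the mean while leaving the peak unchanged.

```python
import numpy as np

def mean_to_peak_ratio(x):
    """Return mean/peak for a non-negative envelope or response (one time window)."""
    x = np.asarray(x, dtype=float)
    peak = x.max()
    if peak <= 0.0:
        raise ValueError("MPR is undefined when the peak is zero")
    return x.mean() / peak

# Toy envelopes (hypothetical): isolated bursts, and the same bursts
# with decaying tails filling the gaps between them.
dry = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
reverberated = np.array([1.0, 0.5, 0.25, 0.125, 1.0, 0.5, 0.25, 0.125])

mpr_dry = mean_to_peak_ratio(dry)
mpr_reverberated = mean_to_peak_ratio(reverberated)
print(mpr_dry, mpr_reverberated)
```

The peak is 1.0 in both toy signals, so the rise in MPR comes entirely from the larger mean.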
Modelling framework
Stimuli
Watkins, JASA 2005, experiment 5
- forward/reversed speech carrier
- forward/reversed reverberation
[Figure: conditions by speech carrier (fwd, rev) and reverberation (fwd, rev).]
Watkins, JASA 2005, experiment 4
- reverberate, then flip polarities: noise after
- flip polarities, then reverberate: noise before
[Figure: noise conditions (after, before).]
Auditory Periphery
Outer/middle ear
- Simulates human data from Huber et al. (2001)
Basilar membrane
- DRNL: dual resonance nonlinear filterbank
- Originally proposed by Meddis, O’Mard and Lopez-Poveda (2001)
- Human parameters from Meddis (2006)
- Efferent attenuation introduced by Ferry and Meddis (2007)
Hair cell
- Linear output between threshold and saturated firing rate (Messing 2007)
- Does not model adaptation in the auditory nerve
[Figure: three panels, each showing best frequency (Hz, log-spaced, 100 to 8000) against time. Output: Auditory Nerve STEP.]
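The hair-cell stage is described only as linear between threshold and saturated firing rate. A minimal sketch of such a piecewise-linear rate function follows. The function name, the choice to hold the rate constant outside the linear region, and every numeric value are assumptions for illustration, not parameters from Messing (2007) or from the model itself.

```python
import numpy as np

def hair_cell_rate(drive, threshold, saturation, min_rate=0.0, max_rate=1.0):
    """Piecewise-linear rate: min_rate at or below threshold, max_rate at or
    above saturation, and linear in between (units are arbitrary here)."""
    drive = np.asarray(drive, dtype=float)
    frac = np.clip((drive - threshold) / (saturation - threshold), 0.0, 1.0)
    return min_rate + frac * (max_rate - min_rate)

# Hypothetical input values and operating range.
rates = hair_cell_rate([0.0, 10.0, 20.0, 30.0, 40.0],
                       threshold=10.0, saturation=30.0, max_rate=200.0)
print(rates)
```

Because the function has no memory of earlier input, it does not model adaptation, as the slide notes.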
Spectro-temporal excitation patterns
[Figure: STEP of the utterance “… ok, next you’ll get to click on … {sir | stir}”.]
Efferent attenuation based on dynamic range
Idea: control the amount of efferent attenuation applied in the model according to the dynamic range of the context.
Dynamic range is measured by the mean-to-peak ratio of the AN response.
[Figure: panels labelled kurtosis, negative differentials, offsets, mean-to-peak ratio.]
Auditory Nerve response
Across-channel model: the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component.
Within-channel model: the auditory nerve response is NOT summed across all frequency channels.
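The difference between the two models can be shown on a small array. This is a sketch under assumptions: the 3-channel by 4-frame response `an` is made up, and `mpr` is a hypothetical helper, not a function from the model code.

```python
import numpy as np

def mpr(x, axis=None):
    """Mean-to-peak ratio along the given axis (None means over all values)."""
    x = np.asarray(x, dtype=float)
    return x.mean(axis=axis) / x.max(axis=axis)

# Hypothetical AN response: rows are frequency channels, columns are time frames.
an = np.array([[4.0, 0.0, 0.0, 0.0],
               [0.0, 4.0, 0.0, 0.0],
               [2.0, 2.0, 2.0, 2.0]])

# Across-channel: sum over frequency first, then take one MPR for the pooled response.
pooled = an.sum(axis=0)
across_mpr = mpr(pooled)

# Within-channel: one MPR per frequency channel, no pooling.
within_mpr = mpr(an, axis=1)
print(across_mpr, within_mpr)
```

The pooled value is a single number, while the within-channel version keeps one value per channel, so each channel could in principle drive its own attenuation.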
Across Channel
The auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component.
[Figure: STEP (frequency against time) -> Σ -> MPR -> ATT.]
Within Channel
Auditory nerve response in each frequency channel influences the efferent system component
[Figure: STEP (frequency against time), with a separate MPR -> ATT path for each frequency channel.]
Efferent attenuation
[Figure: ATT against MPR.]
- Linear map from MPR (of summed AN) to efferent attenuation, ATT
- ATT turns down the gain on the non-linear pathway of DRNL
- The rate-intensity curve shifts to the right
Recognition
Efferent attenuation helps to recover the dip in the temporal envelope corresponding to the ‘t’ closure in ‘stir’.
Templates: sir, stir
3 Model configurations
- Open loop
- Semi-closed loop: amount of attenuation is estimated during the one second preceding the test word, and held constant thereafter
- Closed loop: amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)
Efferent system: Open loop
Many simulations were run: the amount of attenuation applied was varied across a range of values (0-30 dB), and the resulting category boundaries were recorded in calibration charts.
The ‘best match’ to human results was found (manually) for each condition.
Near contexts match best with low attenuation values, while far contexts match best with higher attenuation values.
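The open-loop calibration can be pictured as a parameter sweep. The sketch below automates the best-match step, which the slide says was done manually. `toy_boundary` is a stand-in for a full model run and is entirely hypothetical, as is the human boundary value of 5.0. Only the 0 to 30 dB grid in 0.5 dB steps comes from the slides.

```python
import numpy as np

def best_match_attenuation(model_boundary, human_boundary, att_values):
    """Sweep attenuation values and return the one whose modelled category
    boundary lies closest to the human boundary, plus the full curve."""
    boundaries = np.array([model_boundary(a) for a in att_values])
    idx = int(np.argmin(np.abs(boundaries - human_boundary)))
    return float(att_values[idx]), boundaries

# Attenuation grid: 0, 0.5, ..., 29.5, 30 dB.
att_values = np.arange(0.0, 30.5, 0.5)

def toy_boundary(att_db):
    """Hypothetical calibration curve, NOT the real model: boundary falls with attenuation."""
    return 10.0 - 0.25 * att_db

best_att, curve = best_match_attenuation(toy_boundary, human_boundary=5.0,
                                         att_values=att_values)
print(best_att)
```

In practice each call to the model function would be one complete simulation for one condition, giving one calibration curve per condition.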
Results: Open loop
[Figure: category boundary results for far and near contexts, with marked values 12.5, 9.0, 22.0 and 21.5. Calibration curves for tuning: category boundary against attenuation applied (dB), swept over 0, 0.5, … 29.5, 30.]
Efferent system: Semi-closed loop
The amount of attenuation is estimated during the one second preceding the test word, and held constant thereafter.
Semi-closed loop
[Figure: “… ok, next you’ll get to click on … {sir | stir}”, with a single ATT value.]
- Examine context within time window to derive a metric value
- Use metric value to determine the efferent attenuation
Metric: Semi-closed loop MPR
Results: Semi-closed loop
- Tuned to match near-near and far-far (fwd fwd) conditions
- Experiment 5 achieves a qualitative (not quantitative) match to human data…
- …but experiment 4 conditions do not match well
ATTENUATION = (38.36 * MPR) + 13.77
[Figure: results panels. Speech carrier (fwd, rev) by reverberation (forward, reverse); noise conditions (before, after).]
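The semi-closed loop scheme can be sketched as follows: measure MPR once over the second before the test word, map it linearly to an attenuation, and hold that value. The slope 38.36 and intercept 13.77 are taken from the slide. The function name, the sampling rate, the toy response and the reading of the result as dB are assumptions.

```python
import numpy as np

def semi_closed_attenuation(an_summed, fs, test_word_onset_s, window_s=1.0,
                            slope=38.36, intercept=13.77):
    """Return one attenuation value from the window preceding the test word."""
    end = int(round(test_word_onset_s * fs))
    start = max(0, end - int(round(window_s * fs)))
    context = np.asarray(an_summed[start:end], dtype=float)
    mpr = context.mean() / context.max()
    return slope * mpr + intercept

# Hypothetical pooled AN response at 1 kHz: 2 s of context, then 0.5 s of test word.
fs = 1000
context = np.tile([1.0, 0.0, 0.0, 0.0], 500)
test_word = np.ones(500)
an_summed = np.concatenate([context, test_word])

# Estimated once before the test word, then held constant for its duration.
att = semi_closed_attenuation(an_summed, fs, test_word_onset_s=2.0)
print(att)
```

Here the context has an MPR of 0.25, so the mapping gives 38.36 * 0.25 + 13.77.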
Efferent system: Closed loop
The amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate).
Closed loop
[Figure: “… ok, next you’ll get to click on … {sir | stir}”, with a series of ATT values updated through time.]
- Examine context within time window to derive a metric value
- Use metric value to determine the efferent attenuation applied
- Window slides forward, process repeats…
Metric: Closed loop MPR
[Figure: Expt. 5, MPR through time, by speech carrier (fwd, rev) and reverberation (forward, reverse).]
Closed loop (expt 5)
- Tuned to ‘best’ match near-near and far-far (fwd fwd) conditions
- Variation possible due to granularity of model (± 0.5)
ATTENUATION = (45 * MPR) + 18
ATTENUATION = (45 * MPR) + 19
[Figure: two panels, one per mapping, by speech carrier (fwd, rev) and reverberation (fwd, rev).]
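The sliding-window part of the closed-loop scheme can be sketched as below, using the ATTENUATION = (45 * MPR) + 18 mapping from the slide. This covers only the metric tracking: in the model the attenuation feeds back into the DRNL stage and changes the AN response, and that feedback is omitted here. The function name, the zero starting value and the toy signal are assumptions.

```python
import numpy as np

def sliding_window_attenuation(an_summed, fs, window_s=1.0, control_rate_hz=1000.0,
                               slope=45.0, intercept=18.0):
    """Track attenuation through time from the MPR of the preceding window,
    re-estimating at the control rate and holding the value in between."""
    x = np.asarray(an_summed, dtype=float)
    win = int(round(window_s * fs))
    hop = max(1, int(round(fs / control_rate_hz)))
    att = np.zeros(len(x))
    current = 0.0  # assumption: no attenuation before the first estimate
    for n in range(len(x)):
        if n > 0 and n % hop == 0:
            seg = x[max(0, n - win):n]  # rectangular window of past samples only
            peak = seg.max()
            if peak > 0.0:
                current = slope * seg.mean() / peak + intercept
        att[n] = current
    return att

# Hypothetical pooled AN response: 2 s at 100 Hz, attenuation updated at 10 Hz.
an_summed = np.tile([1.0, 0.0, 0.0, 0.0], 50)
att = sliding_window_attenuation(an_summed, fs=100, window_s=1.0, control_rate_hz=10.0)
print(att[0], att[10], att[-1])
```

Early estimates use a partly filled window, which is why the first value differs from the settled one in this toy run.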
Closed loop (expt 4)
MPR mapping does not generalise for experiment 4 noise contexts
ATTENUATION = (45 * MPR) + 18
[Figure: results panels. Speech carrier (fwd, rev) by reverberation (fwd, rev); noise conditions (before, after).]
Part 2: Work Planned
Across-channel model
Within-channel model
Further questions
Across-channel model
Practical considerations
- Control rate specified to speed up the simulation (usually 1 kHz, i.e. the attenuation parameter is updated every 1 ms)
- Time window over which to determine the metric (usually the previous 1 second; different values under investigation at present)
- Shape of window (rectangular at present; should have a ‘forgetting function’)
Question to Tony et al.
What data can we use to determine the shape and duration of the window?
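One way to picture a ‘forgetting function’ is an exponentially decaying weight on older samples. The sketch below is only one candidate: the exponential shape, the time constant, and the decision to weight the mean but leave the peak unweighted are all assumptions, and the slides leave the window shape open.

```python
import numpy as np

def exponential_forgetting(n, tau):
    """Weights for n samples, oldest first; the weight decays with a sample's
    age, with time constant tau (in samples)."""
    age = np.arange(n - 1, -1, -1)
    return np.exp(-age / tau)

def weighted_mpr(x, weights):
    """MPR with a weighted mean; the peak is taken unweighted (an assumption)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (np.sum(w * x) / np.sum(w)) / x.max()

# Hypothetical response: activity in the older half of the window, none recently.
x = np.concatenate([np.ones(50), np.zeros(50)])

mpr_rectangular = weighted_mpr(x, np.ones(len(x)))
mpr_forgetting = weighted_mpr(x, exponential_forgetting(len(x), tau=10.0))
print(mpr_rectangular, mpr_forgetting)
```

With the rectangular window the old activity counts fully, while the forgetting window is dominated by the recent quiet portion.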
Across-channel model
[Figure: STEP (frequency against time) -> Σ -> MPR -> ATT, with a weighting window (weight against time) labelled “window shape/duration?”.]
Within-channel model
Previously we asked what duration and shape the metric window should have in time.
Now we ask what extent and shape the metric window should have in frequency.
/t/ is defined by a sharp onset burst at 2-8 kHz (Régnier & Allen, 2008).
This suggests template matching over restricted areas of the frequency domain.
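Template matching over a restricted frequency region might look like the sketch below. The slides do not say how templates are compared, so the squared Euclidean distance is an assumption, as are the function name, the channel frequencies and the toy templates. Only the 2 to 8 kHz band comes from the slide.

```python
import numpy as np

def classify_in_band(step, templates, best_freqs_hz, lo_hz=2000.0, hi_hz=8000.0):
    """Pick the template closest to the input, comparing only channels whose
    best frequency lies within [lo_hz, hi_hz]."""
    band = (best_freqs_hz >= lo_hz) & (best_freqs_hz <= hi_hz)
    distances = {label: float(np.sum((step[band] - template[band]) ** 2))
                 for label, template in templates.items()}
    return min(distances, key=distances.get), distances

# Hypothetical channels and 7-channel by 5-frame templates.
best_freqs_hz = np.array([100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0, 6400.0])
sir = np.ones((7, 5))
stir = sir.copy()
stir[best_freqs_hz >= 2000.0, 2] = 0.0  # toy gap in the high channels

# Test input: matches 'stir' in the band but is disturbed in the lowest channel.
test_input = stir.copy()
test_input[0, :] = 5.0

label, distances = classify_in_band(test_input, {"sir": sir, "stir": stir}, best_freqs_hz)
print(label, distances)
```

The low-frequency disturbance falls outside the band, so it has no effect on the decision.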
Within-channel model
Frequency-dependent suppression:
- Feedback from efferent system appears to be fairly narrowly tuned
- Fall-off in the effect of efferent-induced threshold shift at low BFs [data from cat, Guinan & Gifford (1988)]
- Improves representation of low-frequency speech structure when efferent attenuation is high
Modelling implications:
- Need no longer be a pooled auditory nerve (STEP) response for metric/map to attenuation
- Each channel can react quasi-independently to the audio context it hears
Within-channel model
[Figure: STEP (frequency against time) with a separate MPR -> ATT path for each frequency channel, and weighting windows (weight against time) labelled “window shapes/durations?”.]
Questions
Is there a time-analogy to the frequency gaps in 8-band stimuli?
- imposing gaps so that bits are missing from the freq/time pattern in the context window.
- might allow an importance weighting for time-bands like for the frequency bands.
Implication?
What happens with a silent context?
- Physiology predicts that the efferent system is not activated
- Model predicts small dynamic range:
  - maximum mean/peak ratio
  - high efferent attenuation
  - low category boundary (more stirs)
Specifically, if (when) the context is shorter than the metric window:
- should we shorten the metric window?
- zero pad the utterances?
- count the previous trial as context?
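The silent-context case and two of the short-context options can be illustrated numerically. This is a sketch under assumptions: a silent context is represented as a flat response at a made-up spontaneous rate, and zero padding is applied directly to the response array rather than to the audio, which is a simplification.

```python
import numpy as np

def mpr(x):
    """Mean-to-peak ratio of a non-negative response."""
    x = np.asarray(x, dtype=float)
    return x.mean() / x.max()

# Silent context: flat response (hypothetical spontaneous rate), so mean equals peak.
silent = np.full(1000, 5.0)
mpr_silent = mpr(silent)

# Context only half as long as the metric window (500 of 1000 samples).
short_context = np.tile([10.0, 0.0], 250)

# Option 1: shorten the metric window to fit the context.
mpr_short_window = mpr(short_context)

# Option 2: zero pad up to the full window length, which lowers the mean.
mpr_zero_padded = mpr(np.concatenate([np.zeros(500), short_context]))
print(mpr_silent, mpr_short_window, mpr_zero_padded)
```

A flat response gives the maximum possible ratio of 1, and through the linear map that means the highest attenuation. The two short-context options give different MPR values for the same context, so the choice between them would change the attenuation applied.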
Thanks
Tony Watkins, Simon Makin and Andrew Raimond of Reading University for all the data.
Ray Meddis and Robert Ferry of Essex University for the DRNL program code.
Kalle Palomäki, Hynek Hermansky and Roger Moore for discussion.
The end