Digital Audio Signal Processing (DASP)
Lecture-3: Noise Reduction-I
Single-Channel Noise Reduction
Marc Moonen
Dept. E.E./ESAT-STADIUS, KU Leuven
[email protected]
homes.esat.kuleuven.be/~moonen/

Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 2
Single-Channel Noise Reduction
• Microphone signal is
  $y[k] = s[k] + n[k]$ (desired signal contribution + noise contribution)
• Goal: Estimate s[k] based on y[k]
  [Figure: desired signal s[k] plus noise signal(s) gives y[k]; from y[k], a desired signal estimate $\hat{s}[k]$ is computed]
• Applications: Speech enhancement in conferencing, handsfree telephony, hearing aids, …
  Digital audio restoration
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 5
Spectral Subtraction Methods: Basics
• Signal chopped into `frames’ (e.g. 10..20 msec), for each frame a frequency domain representation (=spectrum) is computed:
  time domain: $y[k] = s[k] + n[k]$
  frequency domain: $Y_i(\omega) = S_i(\omega) + N_i(\omega)$ (i-th frame)
• However, as speech signal is an on/off signal, some frames have speech + noise, i.e.
  $Y_i(\omega) = S_i(\omega) + N_i(\omega), \quad \text{frame}_i \in \{\text{`speech+noise' frames}\}$
  some frames have noise only, i.e.
  $Y_i(\omega) = 0 + N_i(\omega), \quad \text{frame}_i \in \{\text{`noise-only' frames}\}$
• A speech detection algorithm (a.k.a. ‘voice activity detection’, VAD) is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties,…)
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 6
Spectral Subtraction Methods: Basics
• Definition: µ(ω) = average magnitude of noise spectrum
  $\mu(\omega) = E\{ |N_i(\omega)| \}$
• Assumption: noise characteristics change slowly, hence estimate µ(ω) by (long-time) averaging over noise-only frames
  $\hat{\mu}(\omega) = \dfrac{1}{\#\,\text{noise-only frames}} \sum_{\text{noise-only frames}} |Y_i(\omega)|$
• Estimate clean speech spectrum Si(ω) (for each frame), using corrupted speech spectrum Yi(ω) (for each frame, i.e. short-time estimate) + estimated µ(ω):
  $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$
  based on `gain function’
  $G_i(\omega) = f(Y_i(\omega), \hat{\mu}(\omega))$
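The long-time average over noise-only frames can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name `estimate_noise_magnitude` and the boolean `noise_only` mask are assumptions, and the mask is taken as given (the VAD itself is not implemented).

```python
import numpy as np

def estimate_noise_magnitude(Y, noise_only):
    """Average |Y_i(w)| over the frames flagged as noise-only.

    Y          : complex array, shape (num_frames, num_bins), one spectrum per frame
    noise_only : boolean array, shape (num_frames,), True for noise-only frames
                 (assumed to be delivered by a VAD, which is not part of this sketch)
    """
    Y = np.asarray(Y)
    noise_only = np.asarray(noise_only, dtype=bool)
    if not noise_only.any():
        raise ValueError("need at least one noise-only frame")
    return np.abs(Y[noise_only]).mean(axis=0)

# Toy data: 4 frames, 3 frequency bins; frames 0 and 2 are flagged as noise-only.
Y = np.array([[1 + 0j, 2j, -3],
              [10, 10, 10],
              [3, -2, 1j],
              [8, 8, 8]])
mu_hat = estimate_noise_magnitude(Y, [True, False, True, False])
```

Only the magnitudes of the flagged frames enter the average, so the phase of the noise spectrum plays no role in the estimate.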
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 7
Spectral Subtraction Methods: Basics
• PS: Applying a gain function like
  $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$
  …can improve signal-to-noise ratio (SNR) of the signal as a whole (i.e. over all radial frequencies)
  …but does not improve the SNR for a particular radial frequency (i.e. speech and noise are equally scaled)
  …hence impact on speech intelligibility is found to be minimal (or non-existent)
  …but ‘listening comfort’ is said to be improved
  For true SNR & speech intelligibility improvement, see multi-channel noise reduction (Lecture 4-5)
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 8
Spectral Subtraction: Gain Functions
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 9
Spectral Subtraction: Gain Functions
• Example 1: Ephraim-Malah Suppression Rule (EMSR) with:
• This corresponds to an MMSE (*) estimation of the ‘speech spectral amplitude’ |Si(ω)| based on observation Yi(ω) (estimate equal to E{ |Si(ω)| | Yi(ω) }), assuming Gaussian a priori distributions for Si(ω) and Ni(ω) [Ephraim & Malah 1984]
• Similar formula for MMSE ‘log-spectral amplitude’ estimation [Ephraim & Malah 1985]
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 10
Spectral Subtraction: Gain Functions
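• Example 2: Magnitude Subtraction
  – Signal model:
    $Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$
  – Estimation of clean speech spectrum:
    $\hat{S}_i(\omega) = \left[\, |Y_i(\omega)| - \hat{\mu}(\omega) \,\right] e^{j\theta_{y,i}(\omega)} = \underbrace{\left[ 1 - \dfrac{\hat{\mu}(\omega)}{|Y_i(\omega)|} \right]}_{G_i(\omega)} Y_i(\omega)$
  – PS: half-wave rectification
    $G_i(\omega) \Leftarrow \max(0, G_i(\omega))$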
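A minimal sketch of the magnitude subtraction gain with half-wave rectification. The function name and the small `eps` guard against division by zero are assumptions added for illustration; they are not part of the slide.

```python
import numpy as np

def magnitude_subtraction_gain(Y, mu_hat, eps=1e-12):
    """G_i(w) = max(0, 1 - mu_hat(w) / |Y_i(w)|) for one frame (or a stack of frames)."""
    mag = np.maximum(np.abs(Y), eps)            # eps avoids division by zero (added safeguard)
    return np.maximum(0.0, 1.0 - mu_hat / mag)  # half-wave rectification

# Two bins: one well above the noise level, one below it.
Y = np.array([4.0 * np.exp(1j * 0.3), 0.5 * np.exp(-1j * 1.0)])
mu_hat = np.array([1.0, 1.0])
G = magnitude_subtraction_gain(Y, mu_hat)
S_hat = G * Y   # the real-valued gain leaves the noisy phase theta_{y,i} untouched
```

In the first bin the magnitude is reduced from 4 to 3 while the phase 0.3 is kept; in the second bin the unrectified gain would be negative, so it is clipped to zero.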
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 11
Spectral Subtraction: Gain Functions
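• Example 3: Wiener Estimation
  – Linear MMSE estimation: find linear filter Gi(ω) to minimize MSE
    $E\{\, | S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)} |^2 \,\}$
  – Solution:
    $G_i(\omega) = \dfrac{E\{ S_i(\omega)\, Y_i^*(\omega) \}}{E\{ Y_i(\omega)\, Y_i^*(\omega) \}} = \dfrac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)}$
    (numerator = cross-correlation in i-th frame, denominator = auto-correlation in i-th frame)
    Assume speech s[k] and noise n[k] are uncorrelated, then...
    $G_i(\omega) = \dfrac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \dfrac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} = \dfrac{|Y_i(\omega)|^2 - \hat{\mu}(\omega)^2}{|Y_i(\omega)|^2} = 1 - \dfrac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}$
  – PS: half-wave rectification
    $G_i(\omega) \Leftarrow \max(0, G_i(\omega))$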
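The Wiener gain differs from magnitude subtraction only in that powers are subtracted instead of magnitudes. A minimal sketch, where the function name and the `eps` guard are assumptions for illustration:

```python
import numpy as np

def wiener_gain(Y, mu_hat, eps=1e-12):
    """G_i(w) = max(0, 1 - mu_hat(w)^2 / |Y_i(w)|^2)."""
    power = np.maximum(np.abs(Y) ** 2, eps)            # short-time estimate of P_yy,i
    return np.maximum(0.0, 1.0 - mu_hat ** 2 / power)  # half-wave rectification

Y = np.array([2.0 + 0j, 0.5j])
mu_hat = np.array([1.0, 1.0])
G_wiener = wiener_gain(Y, mu_hat)
```

For |Y| = 2 and µ̂ = 1 this gives 1 - 1/4, whereas magnitude subtraction would give 1 - 1/2, i.e. the Wiener gain attenuates less for the same bin.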
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 12
Spectral Subtraction: Implementation
→ Short-time Fourier Transform/WOLA (Chapter 2)
  yn[i] = estimate for Y(ωn) at time i (i-th frame)
  N = number of frequency bins (channels), n = 0..N-1
  D = downsampling factor
  w[k] and v[k] = length-N analysis and synthesis window (=prototype filter)
→ frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2D..3D)
→ subband processing: sn[i] = G[n,i].yn[i], where G[n,i] is gain for ωn at time i (i-th frame)
[Figure: WOLA filter bank. Input y[k] passes through the analysis filter bank (window w, downsampling) to give yn[i]; each subband is multiplied by its gain to give sn[i]; the synthesis filter bank (window v, upsampling, summation) produces the output $\hat{s}[k]$]
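A minimal WOLA sketch of this analysis/gain/synthesis chain, under stated assumptions: square-root periodic Hann windows for both w and v, 50% overlap (N = 2D) so that the overlapping window products sum to one, and `numpy.fft.rfft`, which keeps only the N/2+1 non-redundant bins of a real signal. The function name `wola_process` and the `gain_fn` callback are hypothetical.

```python
import numpy as np

def wola_process(y, N=256, gain_fn=None):
    """Weighted overlap-add processing with 50% overlap (D = N/2).

    gain_fn(Y_i, i) returns the per-bin gains G[n, i] for frame i;
    if it is None, all gains are 1 and the chain should reconstruct y.
    """
    D = N // 2
    k = np.arange(N)
    w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * k / N))  # analysis window (assumed sqrt-Hann)
    v = w                                               # synthesis window (same prototype)
    num_frames = (len(y) - N) // D + 1
    s_hat = np.zeros(len(y))
    for i in range(num_frames):
        frame = y[i * D : i * D + N]
        Y_i = np.fft.rfft(w * frame)                    # y_n[i], n = 0..N/2
        G_i = np.ones(len(Y_i)) if gain_fn is None else gain_fn(Y_i, i)
        S_i = G_i * Y_i                                 # s_n[i] = G[n, i] . y_n[i]
        s_hat[i * D : i * D + N] += v * np.fft.irfft(S_i, n=N)  # overlap-add
    return s_hat

rng = np.random.default_rng(0)
y = rng.standard_normal(2048)
s_hat = wola_process(y)   # unit gains: interior samples are reconstructed
```

The first and last half frame are covered by a single window only, so reconstruction with unit gains holds for the interior samples. For 66% overlap (N = 3D) the window products sum to a constant other than one and an extra scaling would be needed.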
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 13
Spectral Subtraction: Musical Noise
• Audio demo: y[k] → magnitude subtraction → $\hat{s}[k]$
• Artifact: musical noise
  What? Short-time estimates of |Yi(ω)| fluctuate randomly in noise-only frames, resulting in random gains Gi(ω)
  → statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (=musical noise)
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 14
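Spectral Subtraction: Musical Noise
Solutions?
- Magnitude averaging: replace Yi(ω) in calculation of …
  (labels of the lost equation: `instantaneous’ and `average’)
- $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$
  (label of the lost equation: probability that speech is present, given observation)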
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 20
Kalman Filter Recap 1/4
State space model of a time-varying discrete-time system
  $x[k+1] = A[k].x[k] + B[k].u[k] + v[k]$   (v[k] = process noise)
  $y[k] = C[k].x[k] + D[k].u[k] + w[k]$   (w[k] = measurement noise)
with v[k] and w[k]: mutually uncorrelated, zero mean, white noises
  $E\left\{ \begin{bmatrix} v[k] \\ w[k] \end{bmatrix} . \begin{bmatrix} v[k]^H & w[k]^H \end{bmatrix} \right\} = \begin{bmatrix} V[k] & 0 \\ 0 & W[k] \end{bmatrix}$
Then: given A[k], B[k], C[k], D[k], V[k], W[k] and input/output-observations u[k], y[k], k=0,1,2,..., the Kalman filter produces MMSE estimates of internal states x[k], k=0,1,...
PS: will also use shorthand notation here, i.e. xk, yk,.. instead of x[k], y[k],..
PS: $V[k] = V[k]^{1/2} . V[k]^{T/2}$ = Cholesky/square-root factorization
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 21
Kalman Filter Recap 2/4
Definition: $\hat{x}_{k|l}$ = MMSE-estimate of xk using all available data up until time l
• `FILTERING’ = estimate $\hat{x}_{k|k}$
• `PREDICTION’ = estimate $\hat{x}_{k|k-n},\ n > 0$
• `SMOOTHING’ = estimate $\hat{x}_{k|k+n},\ n > 0$
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 22
Kalman Filter Recap 3/4
Initialization: $\hat{x}_{0|-1}$, $P_{0|-1}$ (= error covariance matrix)
→ ‘Conventional’ Kalman Filter: For k=0,1,2,..
  Given $\hat{x}_{k|k-1}$ and corresponding error covariance matrix $P_{k|k-1}$:
  Step 1: Measurement Update (produces ‘filtered’ estimate) (compare to standard RLS!)
  Step 2: Time Update (produces ‘1-step prediction’)
→ Better: ‘Square Root’ Algorithm
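The update equations of this slide did not survive extraction. As a hedged illustration, the sketch below uses the standard textbook form of the conventional Kalman filter for the state space model of Recap 1/4; it is not necessarily written the way the slide writes it, and the function name `kalman_step` is an assumption.

```python
import numpy as np

def kalman_step(x_pred, P_pred, u, y, A, B, C, D, V, W):
    """One iteration of the conventional Kalman filter (textbook form).

    x_pred, P_pred : x_{k|k-1} and its error covariance P_{k|k-1}
    Returns (x_{k|k}, P_{k|k}, x_{k+1|k}, P_{k+1|k}).
    """
    # Step 1: measurement update (produces the 'filtered' estimate)
    S = C @ P_pred @ C.conj().T + W              # innovation covariance
    K = P_pred @ C.conj().T @ np.linalg.inv(S)   # Kalman gain
    x_filt = x_pred + K @ (y - C @ x_pred - D @ u)
    P_filt = P_pred - K @ C @ P_pred
    # Step 2: time update (produces the '1-step prediction')
    x_next = A @ x_filt + B @ u
    P_next = A @ P_filt @ A.conj().T + V
    return x_filt, P_filt, x_next, P_next

# Scalar toy case: constant state, unit prior variance, unit measurement noise.
A = np.array([[1.0]]); B = np.array([[0.0]])
C = np.array([[1.0]]); D = np.array([[0.0]])
V = np.array([[0.0]]); W = np.array([[1.0]])
x_filt, P_filt, x_next, P_next = kalman_step(
    np.array([0.0]), np.array([[1.0]]), np.array([0.0]), np.array([2.0]),
    A, B, C, D, V, W)
```

With equal prior and measurement variance the gain is 1/2, so the filtered estimate lies halfway between the prediction 0 and the observation 2, and the error variance is halved.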
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 23
Kalman Filter Recap 4/4
Kalman Smoother
Estimate states x[1], x[2],…, x[T] based on data u[k], y[k], k = 1,…,T,T+1,…,N
→ With ‘Conventional’ Kalman Filter:
  1. forward run: apply previous equations for k = 1, 2, …, N
  2. backward run: apply following equations for k = N, N-1, …, 1
     $\hat{x}[k-1|N] = \hat{x}[k-1|k-1] + F[k-1] \left( \hat{x}[k|N] - \hat{x}[k|k-1] \right)$
     $P[k-1|N] = P[k-1|k-1] - F[k-1] \left( P[k|k-1] - P[k|N] \right) F[k-1]^T$
     $F[k-1] = P[k-1|k-1]\, A[k-1]^T\, P[k|k-1]^{-1}$
  Result: (better) estimates $\hat{x}[1|N], \ldots, \hat{x}[T|N]$
→ With ‘Square Root’ Kalman Filter: Full backsubstitution!
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 24
Kalman Filter Based Noise Reduction
• Assume AR model of speech and noise
  $s[k] = \sum_{n=1}^{N_s} \alpha_n\, s[k-n] + g_s\, u[k]$
  $n[k] = \sum_{n=1}^{N_n} \beta_n\, n[k-n] + g_n\, w[k]$
  u[k], w[k] = zero mean, unit variance, white noise
• Equivalent state-space model is…
  $\mathbf{x}[k+1] = \mathbf{A}\,\mathbf{x}[k] + \mathbf{v}[k]$
  $y[k] = \mathbf{c}^T \mathbf{x}[k]$
  y[k] = microphone signal
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 25
Kalman Filter Based Noise Reduction
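with:
  $\mathbf{x}^T[k] = \begin{bmatrix} s[k-N_s+1] & \cdots & s[k] & n[k-N_n+1] & \cdots & n[k] \end{bmatrix}$
  $\mathbf{A} = \begin{bmatrix} \mathbf{A}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_n \end{bmatrix}; \quad \mathbf{c}^T = \begin{bmatrix} \underbrace{0 \cdots 0 \; 1}_{N_s} & \underbrace{0 \cdots 0 \; 1}_{N_n} \end{bmatrix}; \quad \mathbf{G} = \begin{bmatrix} \mathbf{g}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_n \end{bmatrix}$
  $\mathbf{A}_s = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & 0 & \ddots & 0 \\ 0 & \cdots & 0 & 1 \\ \alpha_{N_s} & \alpha_{N_s-1} & \cdots & \alpha_1 \end{bmatrix}; \quad \mathbf{A}_n = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & 0 & \ddots & 0 \\ 0 & \cdots & 0 & 1 \\ \beta_{N_n} & \beta_{N_n-1} & \cdots & \beta_1 \end{bmatrix}$
  $\mathbf{v}[k] = \mathbf{G} . \begin{bmatrix} u[k] & w[k] \end{bmatrix}^T$
  $\mathbf{g}_s^T = \begin{bmatrix} 0 & \cdots & 0 & g_s \end{bmatrix}; \quad \mathbf{g}_n^T = \begin{bmatrix} 0 & \cdots & 0 & g_n \end{bmatrix}$
  $\mathbf{Q} = \mathbf{G} . \mathbf{G}^T$
s[k] and n[k] are included in state vector, hence can be estimated by Kalman Filter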
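The block structure of A, G, c and Q can be sketched as follows. The helper names `companion` and `ar_state_space` and the toy AR coefficients are assumptions chosen for illustration only.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix: ones on the superdiagonal, last row [a_N ... a_1]."""
    n = len(coeffs)
    M = np.zeros((n, n))
    M[:-1, 1:] = np.eye(n - 1)           # shifts the delay line by one sample
    M[-1, :] = np.asarray(coeffs)[::-1]  # AR recursion for the newest sample
    return M

def ar_state_space(alpha, beta, g_s, g_n):
    """Build A, G, c, Q for the stacked speech/noise AR state-space model."""
    Ns, Nn = len(alpha), len(beta)
    A = np.zeros((Ns + Nn, Ns + Nn))
    A[:Ns, :Ns] = companion(alpha)       # A_s
    A[Ns:, Ns:] = companion(beta)        # A_n
    G = np.zeros((Ns + Nn, 2))
    G[Ns - 1, 0] = g_s                   # u[k] drives the newest speech sample
    G[-1, 1] = g_n                       # w[k] drives the newest noise sample
    c = np.zeros(Ns + Nn)
    c[Ns - 1] = 1.0                      # picks s[k]
    c[-1] = 1.0                          # picks n[k], so y[k] = s[k] + n[k]
    Q = G @ G.T                          # covariance of v[k]
    return A, G, c, Q

# Toy model: 2nd-order speech AR, 1st-order noise AR (made-up coefficients).
A, G, c, Q = ar_state_space(alpha=[0.5, -0.2], beta=[0.9], g_s=1.0, g_n=0.1)
```

The resulting A, c and Q (as process noise covariance, with zero measurement noise) are what a Kalman filter for this model would take as input.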
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 26
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 29
Signal Subspace Methods
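• Signal model:
  $y[k] = s[k] + n[k], \quad k = 0, \ldots, N-1$
• Construct (LxM) Hankel or Toeplitz matrix (L≥M)
  Hankel matrix:
  $\mathbf{Y} = \begin{bmatrix} y[0] & y[1] & \cdots & y[M-1] \\ y[1] & y[2] & \cdots & y[M] \\ \vdots & \vdots & & \vdots \\ y[L-1] & y[L] & \cdots & y[N-1] \end{bmatrix}$
  or Toeplitz matrix:
  $\mathbf{Y} = \begin{bmatrix} y[M-1] & y[M-2] & \cdots & y[0] \\ y[M] & y[M-1] & \cdots & y[1] \\ \vdots & \vdots & & \vdots \\ y[N-1] & y[N-2] & \cdots & y[L-1] \end{bmatrix}$
  with: $\mathbf{Y} = \mathbf{S} + \mathbf{N}$
• Assumptions:
  – Clean signal is `rank deficient’: $\text{rank}(\mathbf{S}) = K < M$
  – Clean signal is orthogonal to noise: $\mathbf{S}^T \mathbf{N} = 0$
  – Assume white noise (p.34 for coloured noise): $\mathbf{N}^T \mathbf{N} = \sigma^2_{noise} . \mathbf{I}_M$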
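Both data matrices can be built with plain index arithmetic. A minimal sketch (function names are assumptions), with L = N - M + 1 rows:

```python
import numpy as np

def hankel_matrix(y, M):
    """L x M Hankel matrix with entry (r, c) equal to y[r + c], L = len(y) - M + 1."""
    y = np.asarray(y)
    L = len(y) - M + 1
    idx = np.arange(L)[:, None] + np.arange(M)[None, :]
    return y[idx]

def toeplitz_matrix(y, M):
    """L x M Toeplitz matrix: the Hankel matrix with its columns reversed."""
    return hankel_matrix(y, M)[:, ::-1]

y = np.arange(6)              # stands in for y[0..5], so N = 6
Y_hankel = hankel_matrix(y, M=3)
Y_toeplitz = toeplitz_matrix(y, M=3)
```

For N = 6 and M = 3 this gives L = 4 rows; the first Toeplitz row is y[2], y[1], y[0] and the last one is y[5], y[4], y[3], matching the layout on the slide.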
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 30
Signal Subspace Methods
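• Tool: Singular value decomposition (SVD) of Y
  $\mathbf{Y} = \mathbf{U} \Sigma \mathbf{V}^T = \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} \mathbf{V}_1^T \\ \mathbf{V}_2^T \end{bmatrix}$
  Σ1: K largest σi corresponding to ‘signal+noise subspace’
  Σ2: M-K smallest σi ≈ σnoise corresponding to ‘noise subspace’
• Step-1 : From SVD of Y, find estimate $\hat{\mathbf{S}}$ of S
  – Least squares estimate
  – Minimum variance estimate
• Step-2 : Construct $\hat{s}[k],\ k = 0, \ldots, N-1$ from $\hat{\mathbf{S}}$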
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 31
Signal Subspace Methods
• Step-1 (v1.0) : Least squares estimator YLS
  – Goal: approximate Y by a matrix YLS of rank K
    $\min_{\text{rank}(\mathbf{Y}_{LS}) = K} \| \mathbf{Y} - \mathbf{Y}_{LS} \|_F^2$
  – Solution is obtained by setting M-K smallest singular values to zero (`truncation’)
    $\hat{\mathbf{S}} = \mathbf{Y}_{LS} = \mathbf{U}_1 \Sigma_1 \mathbf{V}_1^T$
  – Problem: proper determination of order K
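The truncation step can be sketched with `numpy.linalg.svd`. The function name `ls_estimate` and the toy signal (one real sinusoid, whose Hankel matrix has rank 2) are assumptions for illustration; the order K is taken as known here.

```python
import numpy as np

def ls_estimate(Y, K):
    """Rank-K least squares approximation: keep the K largest singular values."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :K] * sigma[:K]) @ Vt[:K, :]   # U_1 Sigma_1 V_1^T

# Toy data: a real sinusoid gives a rank-2 Hankel matrix (K = 2).
k = np.arange(40)
s = np.cos(0.3 * k)
rng = np.random.default_rng(0)
y = s + 0.05 * rng.standard_normal(len(s))

M = 8
L = len(y) - M + 1
idx = np.arange(L)[:, None] + np.arange(M)[None, :]
S_clean, Y_noisy = s[idx], y[idx]               # L x M Hankel matrices

S_ls = ls_estimate(Y_noisy, K=2)
```

Applied to the noise-free matrix, the same truncation with K = 2 returns the matrix unchanged, which is a quick check that the rank assumption holds for this toy signal.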
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 32
Signal Subspace Methods
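• Step-1 (v2.0): Minimum variance estimator YMV
  – Find a matrix T that minimizes
    $\min_{\mathbf{T}} \| \mathbf{Y}\mathbf{T} - \mathbf{S} \|_F^2$
    (assume S is given, then i-th column of T is optimal linear filter for i-th column of S)
  – Then:
    $\hat{\mathbf{S}}_{MV} = \mathbf{Y}_{MV} = \mathbf{Y} . \underbrace{(\mathbf{Y}^T\mathbf{Y})^{-1}\mathbf{Y}^T\mathbf{S}}_{\mathbf{T}_{MV}}$
    $= \mathbf{Y} \cdot (\mathbf{V}\Sigma^2\mathbf{V}^T)^{-1} \cdot (\underbrace{\mathbf{Y}^T\mathbf{Y}}_{\mathbf{V}\Sigma^2\mathbf{V}^T} - \underbrace{\mathbf{N}^T\mathbf{N}}_{\sigma^2_{noise}\mathbf{I}})$
    $= \underbrace{\mathbf{Y}}_{\mathbf{U}\Sigma\mathbf{V}^T} \cdot \mathbf{V} \cdot \text{diag}\left\{ \dfrac{\sigma_i^2 - \sigma^2_{noise}}{\sigma_i^2} \right\} \cdot \mathbf{V}^T$
    $= \mathbf{U} \cdot \text{diag}\left\{ \dfrac{\sigma_i^2 - \sigma^2_{noise}}{\sigma_i} \right\} \cdot \mathbf{V}^T$   (*)
  PS : rank-deficiency condition not needed here
  PS : σnoise has to be known (e.g. estimated during noise-only)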
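Formula (*) amounts to reweighting the singular values of Y. A minimal sketch, in which the function name and the `eps` guard against zero singular values are assumptions; `sigma_noise` is the quantity from the white noise assumption N^T N = σ²noise.I, so it scales with the number of rows L of the data matrix.

```python
import numpy as np

def mv_estimate(Y, sigma_noise, eps=1e-12):
    """Minimum variance estimate U . diag{(sigma_i^2 - sigma_noise^2) / sigma_i} . V^T."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    gains = (sigma ** 2 - sigma_noise ** 2) / np.maximum(sigma, eps)
    return (U * gains) @ Vt

# Toy matrix with singular values 3 and 2, and sigma_noise = 1.
Y = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
S_mv = mv_estimate(Y, sigma_noise=1.0)
```

The singular values 3 and 2 are mapped to (9-1)/3 and (4-1)/2. Unlike the least squares estimator, no singular value is set to zero; note that a singular value below σnoise would receive a negative weight in this unclipped form.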
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 33
Signal Subspace Methods
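• Step-2 : Signal Reconstruction
  Goal: Given $\hat{\mathbf{S}}$, construct $\hat{s}[k],\ k = 0, \ldots, N-1$
  Problem: $\hat{\mathbf{S}}$ does not have Hankel/Toeplitz structure
  Solution: Restore Hankel/Toeplitz structure by arithmetically averaging every anti-diagonal/diagonal of the matrix, e.g. Hankel
    $\hat{s}[i] = \dfrac{1}{\beta - \alpha + 1} \sum_{k=\alpha}^{\beta} \hat{\mathbf{S}}[i-k+2,\, k]$
    $\alpha = \max(1,\, i-L+2)$
    $\beta = \min(M,\, i+1)$
  (rows and columns of $\hat{\mathbf{S}}$ are indexed from 1, samples $\hat{s}[i]$ from 0)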
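The anti-diagonal averaging for the Hankel case can be sketched by accumulating each matrix row into the output signal and dividing by the number of contributions. The function name is an assumption; indices are 0-based here, so entry (r, c) contributes to sample r + c.

```python
import numpy as np

def average_antidiagonals(S_hat):
    """Restore Hankel structure: s_hat[i] = mean of all entries (r, c) with r + c = i."""
    L, M = S_hat.shape
    N = L + M - 1
    total = np.zeros(N)
    count = np.zeros(N)
    for r in range(L):
        total[r : r + M] += S_hat[r, :]   # row r covers samples r .. r+M-1
        count[r : r + M] += 1
    return total / count

# A matrix without Hankel structure: the middle anti-diagonal holds 2 and 4.
S_hat = np.array([[1.0, 2.0],
                  [4.0, 3.0]])
s_hat = average_antidiagonals(S_hat)
```

Here the middle sample becomes (2 + 4) / 2 = 3. Applied to a matrix that already is Hankel, the averaging returns the underlying signal unchanged.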
Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 34
Remark: Coloured Noise Case
• PS: What if coloured noise $\mathbf{N}^T \mathbf{N} \neq \sigma^2_{noise} \mathbf{I}_M$ ?
  – Estimate N during noise-only periods
  – Similar procedure, now including ‘pre-whitening’ of Hankel/Toeplitz matrix prior to SVD and ‘de-whitening’ step after estimation of $\hat{\mathbf{S}}$ (details omitted)
• PS: Link with spectral subtraction
  – Compare (*) p.32 with gain function on p.11 (Wiener)
  – Interpretation: ‘Spectral subtraction’ in signal dependent filter bank (analysis=V^T, synthesis=V) instead of WOLA