Digital Audio Signal Processing DASP - KU Leuvendspuser/dasp... · Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 4 Overview • Spectral Subtraction


Digital Audio Signal Processing

DASP

Lecture-3: Noise Reduction-I Single-Channel Noise Reduction

Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven

homes.esat.kuleuven.be/~moonen/

Digital Audio Signal Processing Version 2016-2017 Lecture-3: Noise Reduction-I p. 2

Single-Channel Noise Reduction

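•  Microphone signal is
   $y[k] = s[k] + n[k]$
   with $s[k]$ = desired signal contribution, $n[k]$ = noise contribution

•  Goal: estimate $s[k]$ based on $y[k]$

•  Applications: speech enhancement in conferencing, hands-free telephony, hearing aids, …; digital audio restoration

[Figure: the desired signal $s[k]$ and the noise signal(s) add up to the microphone signal $y[k] = s[k] + n[k]$; a noise reduction block ("?") produces the desired signal estimate $\hat s[k]$]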


•  We will consider methods that do not rely on a priori information (e.g. the expected noise spectrum)

•  We will consider speech applications: $s[k]$ = speech signal. Speech is an on/off signal, where speech pauses can be used to estimate noise spectra



Overview

•  Spectral Subtraction Methods
   –  Spectral subtraction basics (= spectral filtering)
   –  Features: gain functions, implementation, musical noise, …

•  Iterative Wiener Filter Based Noise Reduction

•  Kalman Filter Based Noise Reduction

•  Signal Subspace Methods



Spectral Subtraction Methods: Basics

•  Signal chopped into 'frames' (e.g. 10..20 msec); for each frame a frequency-domain representation (= spectrum) is computed:
   $y[k] = s[k] + n[k] \;\Rightarrow\; Y_i(\omega) = S_i(\omega) + N_i(\omega)$   (i-th frame)

•  However, as the speech signal is an on/off signal, some frames have speech + noise, i.e.
   $Y_i(\omega) = S_i(\omega) + N_i(\omega), \quad \text{frame } i \in \{\text{'speech+noise' frames}\}$
   and some frames have noise only, i.e.
   $Y_i(\omega) = 0 + N_i(\omega), \quad \text{frame } i \in \{\text{'noise-only' frames}\}$

•  A speech detection algorithm (a.k.a. 'voice activity detection', VAD) is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties, …)



•  Definition: $\mu(\omega)$ = average magnitude of the noise spectrum:
   $\mu(\omega) = E\{|N_i(\omega)|\}$

•  Assumption: noise characteristics change slowly, hence estimate $\mu(\omega)$ by (long-time) averaging over noise-only frames:
   $\hat\mu(\omega) = \frac{1}{\#\,\text{noise-only frames}} \sum_{\text{noise-only frames}} |Y_i(\omega)|$

•  Estimate the clean speech spectrum $S_i(\omega)$ (for each frame), using the corrupted speech spectrum $Y_i(\omega)$ (for each frame, i.e. a short-time estimate) + the estimated $\hat\mu(\omega)$:
   $\hat S_i(\omega) = G_i(\omega)\,Y_i(\omega)$
   based on a 'gain function'
   $G_i(\omega) = f\big(Y_i(\omega), \hat\mu(\omega)\big)$
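A minimal NumPy sketch of the two steps above, assuming the STFT frames $Y_i(\omega)$ are already available as rows of a complex array and that the VAD decision is supplied as a boolean mask. The function names `estimate_noise_magnitude` and `apply_gain` and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

def estimate_noise_magnitude(Y, noise_only):
    """mu_hat(w): average of |Y_i(w)| over the frames flagged as noise-only.

    Y          : complex array (frames x bins), row i holds Y_i(w)
    noise_only : boolean mask over frames, e.g. a VAD output (assumed given)
    """
    return np.abs(Y[noise_only]).mean(axis=0)

def apply_gain(Y, G):
    """Spectral filtering S_hat_i(w) = G_i(w) * Y_i(w); G broadcasts over frames."""
    return G * Y

# Synthetic data: unit-power complex white noise in every frame,
# plus a strong 'speech' component in bin 10 of the odd frames
rng = np.random.default_rng(0)
frames, bins = 400, 65
Y = (rng.standard_normal((frames, bins))
     + 1j * rng.standard_normal((frames, bins))) / np.sqrt(2)
noise_only = np.arange(frames) % 2 == 0     # hypothetical VAD decision
Y[~noise_only, 10] += 20.0
mu_hat = estimate_noise_magnitude(Y, noise_only)
# Speech frames are excluded, so mu_hat[10] stays near the noise-only
# value sqrt(pi)/2 ~ 0.886 (mean of a Rayleigh-distributed magnitude)
```

Because the averaging only uses frames the VAD labels as noise-only, a VAD that misses speech frames would bias $\hat\mu(\omega)$ upward in the speech-dominated bins.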




•  PS: Applying a gain function like $\hat S_i(\omega) = G_i(\omega)\,Y_i(\omega)$

   … can improve the signal-to-noise ratio (SNR) of the signal as a whole (i.e. over all radial frequencies)

   … but does not improve the SNR for a particular radial frequency (i.e. speech and noise are equally scaled)

   … hence the impact on speech intelligibility is found to be minimal (or non-existent)

   … but 'listening comfort' is said to be improved

   For true SNR & speech intelligibility improvement, see multi-channel noise reduction (Lecture 4-5)
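A small numeric illustration of the first two claims, using made-up per-bin powers for two radial frequencies (hypothetical values): one bin speech-dominated, one noise-dominated. The gain scales speech and noise within a bin by the same factor, so every per-bin SNR is unchanged, while the broadband SNR rises because the noise-dominated bin is attenuated.

```python
import numpy as np

# Hypothetical per-bin powers: bin 0 speech-dominated, bin 1 noise-dominated
P_s = np.array([10.0, 0.1])   # speech power per bin
P_n = np.array([1.0, 1.0])    # noise power per bin
G = np.array([1.0, 0.1])      # gain: keep bin 0, attenuate bin 1

# Within a bin, speech and noise are both scaled by |G|^2
P_s_out = G**2 * P_s
P_n_out = G**2 * P_n

snr_bin_in = 10 * np.log10(P_s / P_n)
snr_bin_out = 10 * np.log10(P_s_out / P_n_out)
snr_all_in = 10 * np.log10(P_s.sum() / P_n.sum())
snr_all_out = 10 * np.log10(P_s_out.sum() / P_n_out.sum())

print(snr_bin_in, snr_bin_out)   # same per-bin SNRs (10 dB and -10 dB)
print(snr_all_in, snr_all_out)   # broadband SNR: about 7.0 dB -> about 10.0 dB
```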


Spectral Subtraction: Gain Functions




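•  Example 1: Ephraim-Malah Suppression Rule (EMSR) with:

   $G_i(\omega) = \frac{\sqrt{\pi}}{2}\sqrt{\frac{1}{\mathrm{SNR}_{post}}\cdot\frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}}}\cdot M\!\left[\mathrm{SNR}_{post}\cdot\frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}}\right]$

   $M[\theta] = e^{-\theta/2}\left[(1+\theta)\,I_0\!\left(\frac{\theta}{2}\right) + \theta\,I_1\!\left(\frac{\theta}{2}\right)\right]$   ($I_0$, $I_1$: modified Bessel functions)

   $\mathrm{SNR}_{post}(\omega) = \frac{|Y_i(\omega)|^2}{\hat\mu(\omega)^2}$

   $\mathrm{SNR}_{prio}(\omega) = (1-\alpha)\max\big(\mathrm{SNR}_{post}(\omega)-1,\,0\big) + \alpha\,\frac{|G_{i-1}(\omega)\,Y_{i-1}(\omega)|^2}{\hat\mu(\omega)^2}$

   (skip formulas)

•  This corresponds to an MMSE (*) estimation of the 'speech spectral amplitude' $|S_i(\omega)|$ based on the observation $Y_i(\omega)$ (estimate equal to $E\{|S_i(\omega)| \mid Y_i(\omega)\}$), assuming Gaussian a priori distributions for $S_i(\omega)$ and $N_i(\omega)$ [Ephraim & Malah 1984]

•  Similar formula for MMSE 'log-spectral amplitude' estimation [Ephraim & Malah 1985]

(*) minimum mean squared error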
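A sketch of the EMSR gain with the decision-directed $\mathrm{SNR}_{prio}$ recursion, written directly from the formulas above. It uses SciPy's exponentially scaled Bessel functions `scipy.special.i0e`/`i1e` ($e^{-x}I_0(x)$, $e^{-x}I_1(x)$), which absorb the factor $e^{-\theta/2}$ and avoid overflow for large $\theta$. The smoothing constant `alpha = 0.98`, the $\mathrm{SNR}_{prio}$ floor and the function names are assumptions, not values from the slides.

```python
import numpy as np
from scipy.special import i0e, i1e  # exp(-x)*I0(x) and exp(-x)*I1(x)

def emsr_M(theta):
    # M[theta] = exp(-theta/2) * [(1+theta) I0(theta/2) + theta I1(theta/2)]
    half = theta / 2.0
    return (1.0 + theta) * i0e(half) + theta * i1e(half)

def emsr_gain(snr_post, snr_prio):
    ratio = snr_prio / (1.0 + snr_prio)
    return (np.sqrt(np.pi) / 2.0) * np.sqrt(ratio / snr_post) * emsr_M(snr_post * ratio)

def emsr_frames(Y, mu_hat, alpha=0.98, floor=1e-3):
    """EMSR gains for STFT frames Y (frames x bins), decision-directed SNR_prio.

    alpha and floor are assumed values, not taken from the slides.
    """
    G = np.zeros(Y.shape)
    prev_clean_pow = np.zeros(Y.shape[1])   # |G_{i-1} Y_{i-1}|^2, zero before frame 0
    for i in range(Y.shape[0]):
        snr_post = np.maximum(np.abs(Y[i]) ** 2 / mu_hat ** 2, 1e-10)
        snr_prio = ((1 - alpha) * np.maximum(snr_post - 1.0, 0.0)
                    + alpha * prev_clean_pow / mu_hat ** 2)
        G[i] = emsr_gain(snr_post, np.maximum(snr_prio, floor))
        prev_clean_pow = np.abs(G[i] * Y[i]) ** 2
    return G

# Noise-only input: complex white Gaussian noise, mu_hat from the same frames
rng = np.random.default_rng(0)
N = (rng.standard_normal((300, 129)) + 1j * rng.standard_normal((300, 129))) / np.sqrt(2)
mu_hat = np.abs(N).mean(axis=0)
G = emsr_frames(N, mu_hat)
ratio_out_in = np.mean(np.abs(G * N) ** 2) / np.mean(np.abs(N) ** 2)
```

At high SNR the gain approaches $\mathrm{SNR}_{prio}/(1+\mathrm{SNR}_{prio})$, i.e. a Wiener-like gain; in noise-only frames the recursion keeps $\mathrm{SNR}_{prio}$ small, so the residual noise power `ratio_out_in` is strongly reduced.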



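•  Example 2: Magnitude Subtraction

   –  Signal model:
      $Y_i(\omega) = S_i(\omega) + N_i(\omega), \qquad Y_i(\omega) = |Y_i(\omega)|\,e^{j\theta_{y,i}(\omega)}$

   –  Estimation of clean speech spectrum:
      $\hat S_i(\omega) = \big[\,|Y_i(\omega)| - \hat\mu(\omega)\,\big]\,e^{j\theta_{y,i}(\omega)} = \underbrace{\left[1 - \frac{\hat\mu(\omega)}{|Y_i(\omega)|}\right]}_{G_i(\omega)} Y_i(\omega)$

   –  PS: half-wave rectification: $G_i(\omega) \Leftarrow \max\big(0, G_i(\omega)\big)$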
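A minimal sketch of magnitude subtraction with half-wave rectification. It shows that subtracting $\hat\mu(\omega)$ from the magnitude while keeping the noisy phase is the same as multiplying $Y_i(\omega)$ by the real gain $G_i(\omega)$. The function name and the small guard against $|Y_i(\omega)| = 0$ are illustrative assumptions.

```python
import numpy as np

def magnitude_subtraction(Y, mu_hat):
    """Return S_hat = G * Y and G = max(0, 1 - mu_hat/|Y|) (half-wave rectified).

    G is real, so S_hat keeps the phase of Y and has magnitude max(|Y| - mu_hat, 0).
    """
    mag = np.maximum(np.abs(Y), 1e-12)          # guard against |Y| = 0
    G = np.maximum(0.0, 1.0 - mu_hat / mag)
    return G * Y, G

Y = np.array([3 + 4j, 0.5 + 0j])                # |Y| = 5 and 0.5
S_hat, G = magnitude_subtraction(Y, mu_hat=1.0)
# gains: 0.8 for |Y| = 5 (so |S_hat| = 5 - 1 = 4), and 0 after rectification for |Y| = 0.5
```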




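•  Example 3: Wiener Estimation

   –  Linear MMSE estimation: find linear filter $G_i(\omega)$ to minimize the MSE
      $\min_{G_i(\omega)} E\left\{\Big|\,S_i(\omega) - \underbrace{G_i(\omega)\,Y_i(\omega)}_{\hat S_i(\omega)}\,\Big|^2\right\}$

   –  Solution:
      $G_i(\omega) = \frac{E\{S_i(\omega)\,Y_i^*(\omega)\}}{E\{Y_i(\omega)\,Y_i^*(\omega)\}} = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)}$   ← cross-correlation (numerator) and auto-correlation (denominator) in the i-th frame

      Assume speech s[k] and noise n[k] are uncorrelated, then...
      $G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \frac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} \approx \frac{|Y_i(\omega)|^2 - \hat\mu(\omega)^2}{|Y_i(\omega)|^2} = 1 - \frac{\hat\mu(\omega)^2}{|Y_i(\omega)|^2}$

   –  PS: half-wave rectification: $G_i(\omega) \Leftarrow \max\big(0, G_i(\omega)\big)$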
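A short sketch comparing the (rectified) Wiener gain with the magnitude-subtraction gain of Example 2 as a function of $|Y_i(\omega)|/\hat\mu(\omega)$; the function names are hypothetical. With $x = \hat\mu/|Y_i| \in (0,1)$, $1 - x^2 \ge 1 - x$, so the Wiener gain is never smaller, and both gains are zero when $|Y_i(\omega)| \le \hat\mu(\omega)$.

```python
import numpy as np

def wiener_gain(Y, mu_hat):
    # G_i(w) = max(0, 1 - mu_hat(w)^2 / |Y_i(w)|^2)
    pow_y = np.maximum(np.abs(Y) ** 2, 1e-24)
    return np.maximum(0.0, 1.0 - mu_hat ** 2 / pow_y)

def magsub_gain(Y, mu_hat):
    # G_i(w) = max(0, 1 - mu_hat(w) / |Y_i(w)|)   (Example 2)
    mag = np.maximum(np.abs(Y), 1e-12)
    return np.maximum(0.0, 1.0 - mu_hat / mag)

ratios = np.linspace(0.1, 10.0, 100)     # |Y_i(w)| / mu_hat(w)
Y = ratios.astype(complex)
g_w = wiener_gain(Y, 1.0)
g_m = magsub_gain(Y, 1.0)
# e.g. |Y| = 2 mu_hat: Wiener gain 0.75, magnitude subtraction 0.5
```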


Spectral Subtraction: Implementation

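→  Short-time Fourier transform/WOLA (Chapter 2):
   $y_n[i]$ = estimate for $Y(\omega_n)$ at time i (i-th frame)
   N = number of frequency bins (channels), n = 0..N−1
   D = downsampling factor
   w[k] and v[k] = length-N analysis and synthesis window (= prototype filter)

→  Frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N = 2D..3D)

→  Subband processing: G[n,i] is the gain for $\omega_n$ at time i (i-th frame):
   $s_n[i] = G[n,i]\cdot y_n[i]$

[Figure: WOLA filter bank, example with N = 4, D = 2. $y[k]$ → delay line and 2-fold downsampling → analysis windowing (polyphase components $w_0$, $w_1$, $w_2 z^{-1}$, $w_3 z^{-1}$) → $F^{-1}$ → subband signals $y_n[i]$, each multiplied by its gain (×A, ×B, ×C, ×D) to give $s_n[i]$ → $F$ → synthesis windowing (polyphase components $v_0 z^{-1}$, $v_1 z^{-1}$, $v_2$, $v_3$) → 2-fold upsampling, delay line and summation → $\hat s[k]$]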
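A minimal frame-based sketch of this processing chain (analysis window, FFT, per-bin gain $G[n,i]$, inverse FFT, synthesis window, overlap-add) with N = 2D, i.e. 50% overlap. It is a direct STFT loop rather than the polyphase WOLA structure of the figure. The square-root periodic Hann window pair is an assumed choice: its products $w[k]v[k]$ sum to one over frames shifted by N/2, so a unit gain reconstructs the input exactly. Since the signal is real, `np.fft.rfft` returns only the N/2+1 non-negative-frequency bins.

```python
import numpy as np

def wola_process(y, gain_fn, N=512, D=256):
    """Frame-based spectral processing with N-point frames and hop D (N = 2D).

    gain_fn(Y_i) returns the gains G[n, i] for all bins n of frame i.
    Returns the output signal and the number of processed frames.
    """
    k = np.arange(N)
    w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * k / N))   # analysis window w[k]
    v = w                                                # synthesis window v[k]
    n_frames = 1 + (len(y) - N) // D
    s_hat = np.zeros(len(y))
    for i in range(n_frames):
        seg = y[i * D: i * D + N]
        Y_i = np.fft.rfft(w * seg)            # y_n[i], n = 0..N/2 (real input)
        S_i = gain_fn(Y_i) * Y_i              # s_n[i] = G[n, i] * y_n[i]
        s_hat[i * D: i * D + N] += v * np.fft.irfft(S_i, n=N)
    return s_hat, n_frames

rng = np.random.default_rng(0)
y = rng.standard_normal(4096)
s_hat, n_frames = wola_process(y, lambda Y_i: np.ones(Y_i.shape))
# With G = 1, samples covered by two frames (256 .. n_frames*256 - 1) equal y
```

Any of the gain functions above (magnitude subtraction, Wiener, EMSR) can be plugged in as `gain_fn`, given a noise estimate $\hat\mu(\omega_n)$ on the same bin grid.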



Spectral Subtraction: Musical Noise

•  Audio demo: $y[k]$ → magnitude subtraction → $\hat s[k]$

•  Artifact: musical noise
   What? Short-time estimates of $|Y_i(\omega)|$ fluctuate randomly in noise-only frames, resulting in random gains $G_i(\omega)$
   → statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (= musical noise)


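Solutions?

-  Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames; the gain computed from the average is still applied to the instantaneous spectrum, $\hat S_i(\omega) = G_i(\omega)\,Y_i(\omega)$

-  EMSR (see Example 1)

-  Augment $G_i(\omega)$ with a soft-decision VAD: $G_i(\omega) \rightarrow P(H_1 \mid Y_i(\omega))\cdot G_i(\omega)$, where $P(H_1 \mid Y_i(\omega))$ = probability that speech is present, given the observation …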
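A sketch of the first remedy on noise-only frames: the magnitude-subtraction gain is computed once from the instantaneous $|Y_i(\omega)|$ and once from a local average over frames $i-K..i+K$ (K = 2 is an assumed value; the helper names are hypothetical). Averaging reduces the spread of the random gains and, in particular, the number of isolated large gains that are heard as short-lived tones.

```python
import numpy as np

def magsub_gain(mag, mu_hat):
    # magnitude-subtraction gain max(0, 1 - mu_hat/|Y|), as in Example 2
    return np.maximum(0.0, 1.0 - mu_hat / np.maximum(mag, 1e-12))

def average_over_frames(mag, K=2):
    """Average |Y_i(w)| over frames i-K..i+K (fewer frames at the edges)."""
    kernel = np.ones(2 * K + 1)
    total = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, mag)
    count = np.convolve(np.ones(mag.shape[0]), kernel, mode="same")[:, None]
    return total / count

# Noise-only frames: unit-power complex white Gaussian noise
rng = np.random.default_rng(1)
frames, bins = 200, 64
N = (rng.standard_normal((frames, bins))
     + 1j * rng.standard_normal((frames, bins))) / np.sqrt(2)
mag = np.abs(N)
mu_hat = mag.mean(axis=0)

G_inst = magsub_gain(mag, mu_hat)                       # instantaneous |Y_i|
G_avg = magsub_gain(average_over_frames(mag), mu_hat)   # locally averaged |Y_i|
S_hat = G_avg * N      # the gain is still applied to the instantaneous spectrum

# Fraction of isolated large gains, the source of 'musical' tones
spikes_inst = np.mean(G_inst > 0.3)
spikes_avg = np.mean(G_avg > 0.3)
```

The trade-off is temporal smearing: during speech onsets and offsets, the averaged magnitude lags the instantaneous one.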


Iterative Wiener Filter Based Noise Reduction

Example of signal-model-based spectral subtraction…

•  Basic: Wiener-filtering-based spectral subtraction (Example 3), with (improved) spectra estimation based on parametric models

•  Procedure:
   1.  Estimate parameters of a speech model from the noisy signal $y[k]$
   2.  Using the estimated speech parameters, perform noise reduction (e.g. Wiener estimation, Example 3)
   3.  Re-estimate parameters of the speech model from the speech signal estimate
   4.  Iterate 2 & 3




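[Figure: source-filter speech model. Voiced: pulse train (with pitch period); unvoiced: white noise generator. The selected excitation $u[k]$ is scaled by $g_s$ and fed into an all-pole filter $1/\big(1 - \sum_{m=1}^{M}\alpha_m e^{-j\omega m}\big)$, giving the speech signal]

Frequency domain:
   $S(\omega) = \frac{g_s}{1 - \sum_{m=1}^{M}\alpha_m\,e^{-j\omega m}}\;U(\omega)$

Time domain:
   $s[k] = \sum_{m=1}^{M}\alpha_m\,s[k-m] + g_s\,u[k]$

$\boldsymbol{\alpha} = [\alpha_1 \cdots \alpha_M]^T$ = linear prediction parameters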
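A sketch of how $\boldsymbol{\alpha}$ and $g_s$ can be estimated from a signal with the autocorrelation (Yule-Walker) method of linear prediction, and how the model spectrum $g_s^2/|1-\sum_m \alpha_m e^{-j\omega m}|^2$ follows from them. The estimator choice, function names and the synthetic AR(2) test signal (poles at radius 0.95, angle $0.3\pi$) are assumptions for illustration.

```python
import numpy as np

def lpc(x, M):
    """Autocorrelation-method estimate of alpha = [alpha_1..alpha_M] and g_s
    for s[k] = sum_m alpha_m s[k-m] + g_s u[k] with unit-variance u[k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[:n - l], x[l:]) / n for l in range(M + 1)])
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])  # Toeplitz
    alpha = np.linalg.solve(R, r[1:])                 # normal equations R alpha = r
    g2 = r[0] - alpha @ r[1:]                         # prediction-error power
    return alpha, np.sqrt(max(g2, 0.0))

def ar_psd(alpha, g, n_freq=257):
    """g^2 / |1 - sum_m alpha_m e^{-jwm}|^2 on n_freq points of [0, pi]."""
    w = np.linspace(0, np.pi, n_freq)
    m = np.arange(1, len(alpha) + 1)
    A = 1 - np.exp(-1j * np.outer(w, m)) @ alpha
    return g ** 2 / np.abs(A) ** 2

# Synthetic AR(2) signal with known parameters, then re-estimate them
rng = np.random.default_rng(0)
alpha_true, g_true = np.array([1.1168, -0.9025]), 1.0
u = rng.standard_normal(20000)
s = np.zeros(len(u))
for k in range(len(u)):
    s[k] = g_true * u[k]
    for m in (1, 2):
        if k - m >= 0:
            s[k] += alpha_true[m - 1] * s[k - m]
alpha_hat, g_hat = lpc(s, 2)
```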



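For each frame (vector) $y[m]$ (i = iteration nr.):

   1.  Estimate $g_{s,i}$ and $\boldsymbol{\alpha}_i = [\alpha_{1,i} \cdots \alpha_{M,i}]^T$ (from the previous speech estimate $\hat s_{i-1}[m]$; from $y[m]$ in the first iteration)

   2.  Construct the Wiener filter (Example 3):
       $G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{ss,i}(\omega) + P_{nn}(\omega)}$
       with:
       •  $P_{nn}(\omega)$ estimated during noise-only periods
       •  $P_{ss,i}(\omega) \approx \frac{g_{s,i}^2}{\left|1 - \sum_{m=1}^{M}\alpha_{m,i}\,e^{-j\omega m}\right|^2}$

   3.  Filter the speech frame $y[m]$ → $\hat s_i[m]$

Repeat until some error criterion is satisfied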
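A minimal sketch of the iteration for one frame, under stated assumptions: white noise with known variance (so $P_{nn}(\omega)$ is flat), filtering done in the DFT domain over the whole frame (circular, no overlap handling), model order M = 10, and a fixed number of iterations in place of an error criterion. The function names and the synthetic AR(2) frame at 0 dB SNR are hypothetical.

```python
import numpy as np

def lpc(x, M):
    """Autocorrelation-method LPC: returns alpha (length M) and g_s^2."""
    n = len(x)
    r = np.array([np.dot(x[:n - l], x[l:]) / n for l in range(M + 1)])
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    alpha = np.linalg.solve(R, r[1:])
    return alpha, max(r[0] - alpha @ r[1:], 1e-12)

def iterative_wiener(y, noise_var, M=10, n_iter=3):
    """Iterative Wiener filtering of one frame y[m] (white noise, known variance)."""
    L = len(y)
    Y = np.fft.rfft(y)
    w = 2 * np.pi * np.arange(len(Y)) / L                 # DFT bin frequencies
    E = np.exp(-1j * np.outer(w, np.arange(1, M + 1)))    # e^{-j w m}
    s_hat = np.asarray(y, dtype=float).copy()             # first iteration uses y[m]
    for _ in range(n_iter):
        alpha, g2 = lpc(s_hat, M)                         # step 1
        P_ss = g2 / np.abs(1 - E @ alpha) ** 2            # AR model spectrum
        G = P_ss / (P_ss + noise_var)                     # step 2 (P_nn = noise_var)
        s_hat = np.fft.irfft(G * Y, n=L)                  # step 3: filter y[m]
    return s_hat

# Synthetic AR(2) 'speech' frame in white noise at about 0 dB SNR
rng = np.random.default_rng(0)
a1, a2 = 1.1168, -0.9025
u = rng.standard_normal(4096)
s = np.zeros(len(u))
for k in range(len(s)):
    s[k] = u[k] + (a1 * s[k - 1] if k >= 1 else 0.0) + (a2 * s[k - 2] if k >= 2 else 0.0)
noise_var = s.var()
y = s + np.sqrt(noise_var) * rng.standard_normal(len(s))
s_hat = iterative_wiener(y, noise_var)

def snr_db(ref, est):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

print(snr_db(s, y), snr_db(s, s_hat))
```

In the first pass the AR fit describes the noisy spectrum, so the gain floor is about 0.5; later passes fit the already filtered estimate and suppress noise-dominated bins more strongly, which is why the number of iterations needs a stopping rule.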


Kalman Filter Recap 1/4

State space model of a time-varying discrete-time system:
   $x[k+1] = A[k]\,x[k] + B[k]\,u[k] + v[k]$   (v[k] = process noise)
   $y[k] = C[k]\,x[k] + D[k]\,u[k] + w[k]$   (w[k] = measurement noise)

with v[k] and w[k] mutually uncorrelated, zero-mean, white noises:
   $E\left\{\begin{bmatrix} v[k] \\ w[k] \end{bmatrix}\begin{bmatrix} v[k]^H & w[k]^H \end{bmatrix}\right\} = \begin{bmatrix} V[k] & 0 \\ 0 & W[k] \end{bmatrix}$
   $V[k] = V[k]^{1/2}\cdot V[k]^{T/2}$ = Cholesky/square-root factorization

Then: given A[k], B[k], C[k], D[k], V[k], W[k] and input/output observations u[k], y[k], k = 0,1,2,..., the Kalman filter produces MMSE estimates of the internal states x[k], k = 0,1,...

PS: will also use shorthand notation here, i.e. $x_k, y_k, ..$ instead of $x[k], y[k], ..$




Definition: $\hat x_{k|l}$ = MMSE estimate of $x_k$ using all available data up until time l

•  'FILTERING' = estimate $\hat x_{k|k}$

•  'PREDICTION' = estimate $\hat x_{k|k-n}$, n > 0

•  'SMOOTHING' = estimate $\hat x_{k|k+n}$, n > 0

Initialization: $\hat x_{0|-1}$, $P_{0|-1}$

→ 'Conventional' Kalman filter: for k = 0,1,2,..
   Given $\hat x_{k|k-1}$ and the corresponding error covariance matrix $P_{k|k-1}$:
   Step 1: Measurement update (produces the 'filtered' estimate $\hat x_{k|k}$) (compare to standard RLS!)
   Step 2: Time update (produces the '1-step prediction' $\hat x_{k+1|k}$)

→ Better: 'Square root' algorithm
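A sketch of the two steps using the standard covariance-form Kalman equations, assuming a time-invariant model without input terms (B = D = 0); `x0`, `P0` play the role of $\hat x_{0|-1}$, $P_{0|-1}$. The function name and the scalar example are illustrative. For a constant state (A = C = 1, V = 0) and an almost uninformative prior, the filtered estimate reduces to the running mean of the observations.

```python
import numpy as np

def kalman_filter(y, A, C, V, W, x0, P0):
    """Covariance-form Kalman filter for x[k+1] = A x[k] + v[k], y[k] = C x[k] + w[k].

    Returns the filtered estimates x_{k|k} and covariances P_{k|k}.
    """
    x_pred, P_pred = np.asarray(x0, float), np.asarray(P0, float)
    xs, Ps = [], []
    for yk in y:
        # Step 1: measurement update -> filtered estimate x_{k|k}
        S = C @ P_pred @ C.T + W                  # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)       # Kalman gain
        x_filt = x_pred + K @ (np.atleast_1d(yk) - C @ x_pred)
        P_filt = P_pred - K @ C @ P_pred
        xs.append(x_filt)
        Ps.append(P_filt)
        # Step 2: time update -> 1-step prediction x_{k+1|k}
        x_pred = A @ x_filt
        P_pred = A @ P_filt @ A.T + V
    return np.array(xs), np.array(Ps)

rng = np.random.default_rng(0)
y = 3.0 + rng.standard_normal(100)            # constant state observed in unit-variance noise
I = np.eye(1)
xf, Pf = kalman_filter(y, A=I, C=I, V=np.zeros((1, 1)), W=I, x0=np.zeros(1), P0=1e6 * I)
# x_{k|k} is (almost exactly) the mean of y[0..k]; P_{k|k} is about 1/(k+1)
```

The explicit matrix inverse is fine for a sketch; numerically, the square-root algorithm mentioned above propagates a Cholesky factor of $P$ instead of $P$ itself.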





Kalman Smoother

Estimate states x[1], x[2], …, x[T] based on data u[k], y[k], k = 1, …, T, T+1, …, N

→ With 'conventional' Kalman filter:
   1.  Forward run: apply the previous equations for k = 1, 2, …, N
   2.  Backward run: apply the following equations for k = N, N−1, …, 1:
       $F[k-1] = P[k-1|k-1]\,A[k-1]^T\,P[k|k-1]^{-1}$
       $\hat x[k-1|N] = \hat x[k-1|k-1] + F[k-1]\big(\hat x[k|N] - \hat x[k|k-1]\big)$
       $P[k-1|N] = P[k-1|k-1] + F[k-1]\big(P[k|N] - P[k|k-1]\big)F[k-1]^T$
   Result: (better) estimates $\hat x[1|N], …, \hat x[T|N]$

→ With 'square root' Kalman filter: full backsubstitution!
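A sketch of the forward/backward procedure: a covariance-form forward run that stores both the filtered ($k|k$) and predicted ($k|k-1$) quantities, followed by the backward recursion above with 0-based time indices and a time-invariant A (both assumptions). The names and the random-walk example are illustrative. The smoothed covariances never exceed the filtered ones, and the last smoothed state equals the last filtered state.

```python
import numpy as np

def kalman_forward(y, A, C, V, W, x0, P0):
    """Forward run; returns x_{k|k}, P_{k|k}, x_{k|k-1}, P_{k|k-1} for all k."""
    x_pred, P_pred = np.asarray(x0, float), np.asarray(P0, float)
    xf, Pf, xp, Pp = [], [], [], []
    for yk in y:
        xp.append(x_pred)
        Pp.append(P_pred)
        K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + W)
        x_filt = x_pred + K @ (np.atleast_1d(yk) - C @ x_pred)
        P_filt = P_pred - K @ C @ P_pred
        xf.append(x_filt)
        Pf.append(P_filt)
        x_pred, P_pred = A @ x_filt, A @ P_filt @ A.T + V
    return tuple(np.array(a) for a in (xf, Pf, xp, Pp))

def rts_smoother(xf, Pf, xp, Pp, A):
    """Backward run: F[k-1] = P[k-1|k-1] A^T P[k|k-1]^{-1}, then update x and P."""
    xs, Ps = xf.copy(), Pf.copy()          # the last filtered state is already smoothed
    for k in range(len(xf) - 1, 0, -1):
        F = Pf[k - 1] @ A.T @ np.linalg.inv(Pp[k])
        xs[k - 1] = xf[k - 1] + F @ (xs[k] - xp[k])
        Ps[k - 1] = Pf[k - 1] + F @ (Ps[k] - Pp[k]) @ F.T
    return xs, Ps

# Random walk observed in unit-variance noise
rng = np.random.default_rng(0)
T = 500
x_true = np.cumsum(np.sqrt(0.1) * rng.standard_normal(T))
y = x_true + rng.standard_normal(T)
I = np.eye(1)
xf, Pf, xp, Pp = kalman_forward(y, I, I, 0.1 * I, I, np.zeros(1), 10.0 * I)
xs, Ps = rts_smoother(xf, Pf, xp, Pp, I)
mse_filt = np.mean((xf[:, 0] - x_true) ** 2)
mse_smooth = np.mean((xs[:, 0] - x_true) ** 2)
print(mse_filt, mse_smooth)
```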


Kalman Filter Based Noise Reduction

•  Assume AR model of speech and noise:
   $s[k] = \sum_{n=1}^{N_s}\alpha_n\,s[k-n] + g_s\,u[k]$
   $n[k] = \sum_{n=1}^{N_n}\beta_n\,n[k-n] + g_n\,w[k]$
   u[k], w[k] = zero-mean, unit-variance, white noise
   $y[k] = s[k] + n[k]$ = microphone signal

•  Equivalent state-space model is…
   $x[k+1] = A\,x[k] + v[k]$
   $y[k] = c^T x[k]$




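with:

   $x^T[k] = \big[\,s[k-N_s+1] \cdots s[k] \;\; n[k-N_n+1] \cdots n[k]\,\big]$

   $A = \begin{bmatrix} A_s & 0 \\ 0 & A_n \end{bmatrix}; \quad c^T = \big[\,\underbrace{0 \cdots 0\;1}_{N_s}\;\underbrace{0 \cdots 0\;1}_{N_n}\,\big]; \quad G = \begin{bmatrix} \mathbf{g}_s & 0 \\ 0 & \mathbf{g}_n \end{bmatrix}$

   $A_s = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \\ \alpha_{N_s} & \alpha_{N_s-1} & \cdots & \alpha_1 \end{bmatrix}; \quad A_n = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \\ \beta_{N_n} & \beta_{N_n-1} & \cdots & \beta_1 \end{bmatrix}$

   $v[k] = G\,\big[\,u[k] \;\; w[k]\,\big]^T; \quad \mathbf{g}_s^T = [\,0 \cdots 0 \;\; g_s\,]; \quad \mathbf{g}_n^T = [\,0 \cdots 0 \;\; g_n\,]; \quad Q = G\,G^T$

$s[k]$ and $n[k]$ are included in the state vector, hence can be estimated by the Kalman filter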
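A sketch of the construction above with known AR parameters (in the slides they are estimated iteratively or sequentially; here they are simply assumed known). It builds $A_s$, $A_n$, $c$ and $Q = GG^T$, runs a covariance-form Kalman filter and reads $\hat s[k|k]$ from the state. The tiny measurement-noise variance `w_var` is an added numerical safeguard (the model itself has no measurement noise), and all names and test parameters are hypothetical.

```python
import numpy as np

def companion(coeffs):
    """A_s (or A_n): shifted identity, last row [a_N ... a_1] as on the slide."""
    p = len(coeffs)
    A = np.eye(p, k=1)
    A[-1, :] = coeffs[::-1]
    return A

def kalman_noise_reduction(y, alpha, g_s, beta, g_n, w_var=1e-9):
    """Filtered estimate s_hat[k|k] for y[k] = s[k] + n[k] with known AR models."""
    Ns, Nn = len(alpha), len(beta)
    A = np.block([[companion(alpha), np.zeros((Ns, Nn))],
                  [np.zeros((Nn, Ns)), companion(beta)]])
    c = np.zeros(Ns + Nn)
    c[Ns - 1] = c[Ns + Nn - 1] = 1.0               # y[k] = s[k] + n[k]
    G = np.zeros((Ns + Nn, 2))
    G[Ns - 1, 0], G[Ns + Nn - 1, 1] = g_s, g_n     # columns g_s and g_n
    Q = G @ G.T                                    # covariance of v[k]
    x, P = np.zeros(Ns + Nn), np.eye(Ns + Nn)      # x_{0|-1}, P_{0|-1}
    s_hat = np.empty(len(y))
    for k, yk in enumerate(y):
        S = c @ P @ c + w_var                      # measurement update
        K = P @ c / S
        x = x + K * (yk - c @ x)
        P = P - np.outer(K, c @ P)
        s_hat[k] = x[Ns - 1]                       # s_hat[k|k]
        x, P = A @ x, A @ P @ A.T + Q              # time update
    return s_hat

# Synthetic AR(2) speech in AR(1) lowpass noise of similar power
rng = np.random.default_rng(0)
alpha, g_s = np.array([1.1168, -0.9025]), 1.0
beta, g_n = np.array([0.9]), 1.25
T = 2000
u, w = rng.standard_normal(T), rng.standard_normal(T)
s, n = np.zeros(T), np.zeros(T)
for k in range(T):
    s[k] = g_s * u[k] + sum(alpha[m] * s[k - 1 - m] for m in range(2) if k - 1 - m >= 0)
    n[k] = g_n * w[k] + (beta[0] * n[k - 1] if k >= 1 else 0.0)
y = s + n
s_hat = kalman_noise_reduction(y, alpha, g_s, beta, g_n)
mse_noisy = np.mean((y - s) ** 2)
mse_kf = np.mean((s_hat - s) ** 2)
```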



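Disadvantages of the iterative approach:
•  complexity
•  delay

[Figure: iterative scheme. $y[k]$ → split signal in frames → frame $s_i[m]$ → estimate parameters ($g_{s,i}$, $\alpha_{n,i}$; $g_{n,i}$, $\beta_{n,i}$) → Kalman filter → frame estimate $\hat s_i[m]$, fed back over several iterations → reconstruct signal → $\hat s[k]$]

Iterative algorithm (details omitted)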




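[Figure: sequential scheme, running on the time index k (no iterations, in contrast to the iteration index of the iterative scheme). A state estimator (Kalman filter) and a parameter estimator (Kalman filter) both process $y[k]$. The state estimator produces $\hat s[k|k]$, $\hat n[k|k]$ using the delayed (D) parameter estimates $\alpha_n[k-1|k-1]$, $\beta_n[k-1|k-1]$, $\hat g_s[k-1|k-1]$, $\hat g_n[k-1|k-1]$; the parameter estimator produces $\alpha_n[k|k]$, $\beta_n[k|k]$, $\hat g_s[k|k]$, $\hat g_n[k|k]$ using the delayed (D) state estimates $\hat s[k-1|k-1]$, $\hat n[k-1|k-1]$]

Sequential algorithm (details omitted)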




Signal Subspace Methods

•  Signal model: $y[k] = s[k] + n[k]$, k = 0, …, N−1

•  Construct (L×M) Hankel or Toeplitz matrix (L ≥ M):

   Hankel matrix: $Y = \begin{bmatrix} y[0] & y[1] & \cdots & y[M-1] \\ y[1] & y[2] & \cdots & y[M] \\ \vdots & \vdots & & \vdots \\ y[L-1] & y[L] & \cdots & y[N-1] \end{bmatrix}$

   or Toeplitz matrix: $Y = \begin{bmatrix} y[M-1] & y[M-2] & \cdots & y[0] \\ y[M] & y[M-1] & \cdots & y[1] \\ \vdots & \vdots & & \vdots \\ y[N-1] & y[N-2] & \cdots & y[L-1] \end{bmatrix}$

   with Y = S + N

•  Assumptions:
   –  Clean signal is 'rank deficient': rank(S) = K < M
   –  Clean signal is orthogonal to noise: $S^T N = 0$
   –  Assume white noise (see the coloured noise remark below): $N^T N = \sigma_{noise}^2\,I_M$
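A small sketch of both matrix constructions (0-based indices, L = N − M + 1 so that the last row ends at y[N−1]); the function names are hypothetical. It also illustrates the rank-deficiency assumption: a single real sinusoid yields a Hankel matrix of rank K = 2, independent of M.

```python
import numpy as np

def hankel_matrix(y, M):
    """(L x M) Hankel matrix, Y[i, j] = y[i + j], with L = N - M + 1."""
    y = np.asarray(y)
    L = len(y) - M + 1
    return np.array([y[i:i + M] for i in range(L)])

def toeplitz_matrix(y, M):
    """(L x M) Toeplitz matrix, Y[i, j] = y[i + M - 1 - j] (Hankel with columns reversed)."""
    return hankel_matrix(y, M)[:, ::-1]

print(hankel_matrix(np.arange(6), 3))     # rows [0 1 2], [1 2 3], [2 3 4], [3 4 5]
print(toeplitz_matrix(np.arange(6), 3))   # rows [2 1 0], [3 2 1], [4 3 2], [5 4 3]

# A single real sinusoid gives rank(S) = K = 2 < M
k = np.arange(50)
print(np.linalg.matrix_rank(hankel_matrix(np.cos(0.3 * k), 10)))   # 2
```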



•  Tool: singular value decomposition (SVD) of Y:
   $Y = U\Sigma V^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}$
   $\Sigma_1$: K largest $\sigma_i$, corresponding to the 'signal+noise subspace'
   $\Sigma_2$: M−K smallest $\sigma_i \approx \sigma_{noise}$, corresponding to the 'noise subspace'

•  Step-1: From the SVD of Y, find an estimate $\hat S$ of S
   –  Least squares estimate
   –  Minimum variance estimate

•  Step-2: Construct $\hat s[k]$, k = 0, …, N−1 from $\hat S$




•  Step-1 (v1.0): Least squares estimator $Y_{LS}$

   –  Goal: approximate Y by a matrix $Y_{LS}$ of rank K:
      $\min_{\mathrm{rank}(Y_{LS}) = K} \|Y - Y_{LS}\|_F^2$

   –  Solution is obtained by setting the M−K smallest singular values to zero ('truncation'):
      $\hat S = Y_{LS} = U_1\,\Sigma_1\,V_1^T$

   –  Problem: proper determination of the order K
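A sketch of the truncation step with `numpy.linalg.svd`; the function name is hypothetical. It checks the two properties stated above: the result has rank K, and (Eckart-Young) the squared Frobenius error equals the sum of the discarded squared singular values.

```python
import numpy as np

def ls_estimate(Y, K):
    """Rank-K least-squares approximation Y_LS = U1 Sigma1 V1^T via a truncated SVD."""
    U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_ls = (U[:, :K] * sig[:K]) @ Vt[:K, :]
    return Y_ls, sig

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 8))
Y_ls, sig = ls_estimate(Y, K=3)
err2 = np.linalg.norm(Y - Y_ls, "fro") ** 2     # equals sum of the M-K smallest sigma_i^2
```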



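•  Step-1 (v2.0): Minimum variance estimator $Y_{MV}$

   –  Find a matrix T that minimizes
      $\min_T \|Y T - S\|_F^2$
      (assume S is given; then the i-th column of T is the optimal linear filter for the i-th column of S)

   –  Then (using $Y^T S = Y^T Y - N^T N$, which follows from $S^T N = 0$):
      $\hat S = Y_{MV} = Y\cdot\underbrace{(Y^TY)^{-1}Y^TS}_{T_{MV}}$
      $\quad = Y\cdot\big(\underbrace{V\Sigma^2V^T}_{Y^TY}\big)^{-1}\cdot\big(\underbrace{Y^TY}_{V\Sigma^2V^T} - \underbrace{N^TN}_{\sigma_{noise}^2 I}\big)$
      $\quad = \underbrace{Y}_{U\Sigma V^T}\cdot V\cdot\mathrm{diag}\left\{\frac{\sigma_i^2 - \sigma_{noise}^2}{\sigma_i^2}\right\}\cdot V^T$
      $\quad = U\cdot\mathrm{diag}\left\{\frac{\sigma_i^2 - \sigma_{noise}^2}{\sigma_i}\right\}\cdot V^T$   (*)

   PS: the rank-deficiency condition is not needed here
   PS: $\sigma_{noise}$ has to be known (e.g. estimated during noise-only periods)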
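A sketch of the minimum variance estimate (*) computed from the SVD, checked numerically against the direct expression $Y(Y^TY)^{-1}(Y^TY - \sigma_{noise}^2 I)$ from the derivation. The function name and test matrix are hypothetical, and the weights are used exactly as in (*): a singular value below $\sigma_{noise}$ would get a negative weight, which this sketch does not clip.

```python
import numpy as np

def mv_estimate(Y, sigma_noise):
    """Minimum variance estimate U diag{(sigma_i^2 - sigma_noise^2)/sigma_i} V^T."""
    U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * ((sig ** 2 - sigma_noise ** 2) / sig)) @ Vt

rng = np.random.default_rng(0)
Y = rng.standard_normal((40, 6))
sigma_noise = 0.3
S_mv = mv_estimate(Y, sigma_noise)
# Same result from Y (Y^T Y)^{-1} (Y^T Y - sigma_noise^2 I)
direct = Y @ np.linalg.solve(Y.T @ Y, Y.T @ Y - sigma_noise ** 2 * np.eye(6))
```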




•  Step-2: Signal reconstruction

   Goal: given $\hat S$, construct $\hat s[k]$, k = 0, …, N−1

   Problem: $\hat S$ does not have Hankel/Toeplitz structure

   Solution: restore the Hankel/Toeplitz structure by arithmetically averaging every anti-diagonal/diagonal of the matrix, e.g. for Hankel (1-based matrix indices):
   $\hat s[i] = \frac{1}{\beta-\alpha+1}\sum_{k=\alpha}^{\beta} \hat S(i-k+2,\,k), \qquad \alpha = \max(1,\, i-L+2), \quad \beta = \min(M,\, i+1)$
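A sketch of Step-2 together with a complete LS-based pipeline (Hankel matrix, rank-K truncation, anti-diagonal averaging). The averaging uses 0-based indices and is equivalent to the 1-based formula above: $\hat s[i]$ is the mean of all entries $\hat S[r,c]$ with $r + c = i$. Function names and the test signal (one real sinusoid, K = 2, in white noise) are assumptions for illustration.

```python
import numpy as np

def hankel_matrix(y, M):
    L = len(y) - M + 1
    return np.array([y[i:i + M] for i in range(L)])

def average_antidiagonals(S_hat):
    """s_hat[i] = mean of all entries S_hat[r, c] with r + c = i (0-based indices)."""
    L, M = S_hat.shape
    total = np.zeros(L + M - 1)
    count = np.zeros(L + M - 1)
    for r in range(L):
        total[r:r + M] += S_hat[r]     # row r covers signal indices r .. r+M-1
        count[r:r + M] += 1
    return total / count

def subspace_denoise(y, M, K):
    """Step-1 (LS truncation to rank K) followed by Step-2 (anti-diagonal averaging)."""
    U, sig, Vt = np.linalg.svd(hankel_matrix(y, M), full_matrices=False)
    S_hat = (U[:, :K] * sig[:K]) @ Vt[:K, :]
    return average_antidiagonals(S_hat)

rng = np.random.default_rng(0)
k = np.arange(400)
s = np.cos(0.2 * np.pi * k)                    # clean signal of rank K = 2
y = s + 0.5 * rng.standard_normal(len(k))      # white noise
s_hat = subspace_denoise(y, M=20, K=2)
print(np.mean((y - s) ** 2), np.mean((s_hat - s) ** 2))
```

For an exactly Hankel matrix the averaging returns the original signal unchanged, so all of the noise reduction comes from Step-1.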


Remark: Coloured Noise Case

•  PS: What if the noise is coloured, i.e. $N^T N \neq \sigma_{noise}^2 I_M$?
   –  Estimate N during noise-only periods
   –  Similar procedure, now including 'pre-whitening' of the Hankel/Toeplitz matrix prior to the SVD and a 'de-whitening' step after estimation of $\hat S$ (details omitted)

•  PS: Link with spectral subtraction
   –  Compare (*) (minimum variance estimator) with the Wiener gain function (Example 3)
   –  Interpretation: 'spectral subtraction' in a signal-dependent filter bank (analysis = $V^T$, synthesis = V) instead of WOLA

•  Audio demo: subspace algorithm
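Writing out the comparison, using only the quantities of (*) and $YV = U\Sigma$ (which follows from the SVD):

```latex
\hat S = Y_{MV}
  = U\,\mathrm{diag}\left\{\frac{\sigma_i^2-\sigma_{noise}^2}{\sigma_i}\right\}V^T
  = \underbrace{U\Sigma}_{YV}\,\mathrm{diag}\left\{1-\frac{\sigma_{noise}^2}{\sigma_i^2}\right\}V^T
  = Y\,V\,\mathrm{diag}\left\{1-\frac{\sigma_{noise}^2}{\sigma_i^2}\right\}V^T
\qquad\text{vs.}\qquad
G_i(\omega) = 1-\frac{\hat\mu(\omega)^2}{|Y_i(\omega)|^2}
```

Read row by row: each length-M data vector $y$ (a row of Y, taken as a column) is analysed into $V^T y$, every component is scaled by the gain $1-\sigma_{noise}^2/\sigma_i^2$, and the result is synthesized with $V$. The squared singular value $\sigma_i^2$ plays the role of $|Y_i(\omega)|^2$, and $\sigma_{noise}^2$ that of $\hat\mu(\omega)^2$.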
