E85.2607: Lecture 8 -- Source-Filter Processing · 2010-04-01

E85.2607: Lecture 8 – Source-Filter Processing


Source-filter analysis/synthesis

[Figure: block diagram with Analysis, Transformation and Synthesis stages; the signal is split into a source signal and a spectral envelope, which are recombined at synthesis]

Separate

Source/excitation: fine time/frequency structure (e.g. pitch)

Filter: broad spectral shape (resonances)

Similar to subtractive synthesis

Satisfying physical interpretation for real-world signals

Easier to make sense of than e.g. phase


Human speech production

Reasonable approximation to speech signals:

Source is oscillation of the vocal cords

e.g. normal speech (varying pitches) vs whispering

Filtered by the vocal tract (throat + tongue + lips)

e.g. “oooh” vs “aaah”

resonances = formants

Both are time-varying


Source filter model

[Figure: excitation source (time signal of the prediction error e(n)) passed through a resonance filter; magnitude spectra |X(f)| and |G·H(f)| in dB versus f/kHz]


Formants in speech

[Figure: spectrogram of a spoken phrase (word labels include "watch thin as a dime") annotated with phone labels]


How to separate the source and filter?

[Figure: x(n) → spectral envelope estimation (channel vocoder, LPC, or cepstrum) → source signal e1(n) via 1/H1(z) → source signal processing and spectral envelope transformation → H2(z) → y(n)]

Short-time analysis

For each frame, estimate spectral envelope (filter response):

1. Channel vocoder (frequency-domain)

2. Linear Predictive Coding (LPC) (time-domain)

3. Cepstral analysis

Source signal is what's left over (residual) after “whitening”


Channel vocoder

Wideband STFT filterbank

but using relatively few filters

Linearly spaced with equal bandwidth (STFT)

Logarithmically spaced (constant-Q filter bank)

Take RMS energy in each frequency band

[Figure: channel vocoder. x(n) → band-pass filters BP1 … BPk → squaring ( )² → low-pass LP → x_RMS1(n) … x_RMSk(n). Panels: octave-spaced channel stacking; equally-spaced channel stacking. Plot: short-time spectrum and spectral envelope, X(f)/dB versus f/Hz.]
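The band-energy idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the lecture's implementation: it groups the bins of a naive DFT into equally wide bands instead of running band-pass filters, and the names `dft_magnitude` and `band_rms`, the 64-sample frame, and the four bands are all hypothetical choices.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2) transform)."""
    n_samples = len(frame)
    mags = []
    for k in range(n_samples // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        mags.append(abs(acc))
    return mags

def band_rms(mags, num_bands):
    """RMS of the magnitude spectrum in equally wide, adjacent bands.

    Trailing bins that do not fill a whole band are ignored.
    """
    width = len(mags) // num_bands
    out = []
    for b in range(num_bands):
        band = mags[b * width:(b + 1) * width]
        out.append(math.sqrt(sum(m * m for m in band) / len(band)))
    return out

# Assumed test frame: a sinusoid centred on bin 20 of a 64-sample frame.
frame = [math.sin(2 * math.pi * 20 * n / 64) for n in range(64)]
envelope = band_rms(dft_magnitude(frame), num_bands=4)
```

With four bands of eight bins each, the sinusoid falls in the third band (bins 16 to 23), so that band carries almost all of the energy. A logarithmic band layout would only change how the bin ranges are chosen.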


Channel vocoder using FFT

Lowpass filter magnitude of each STFT frame

i.e. filter columns of the spectrogram
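A minimal sketch of that smoothing step, assuming a simple moving average as the low-pass filter (the slides do not say which filter is used). The function names, the `half_width` parameter, and the toy frame are hypothetical; the spectrogram is represented here as a list of per-frame magnitude lists.

```python
def smooth_spectrum(mags, half_width):
    """Moving-average low-pass filter applied along the frequency axis."""
    n_bins = len(mags)
    smoothed = []
    for k in range(n_bins):
        lo = max(0, k - half_width)
        hi = min(n_bins, k + half_width + 1)
        smoothed.append(sum(mags[lo:hi]) / (hi - lo))
    return smoothed

def spectral_envelopes(spectrogram, half_width):
    """Smooth every frame of a spectrogram independently."""
    return [smooth_spectrum(frame, half_width) for frame in spectrogram]

# Assumed toy frame: harmonic peaks every 5 bins.
harmonic_frame = [1.0 if k % 5 == 0 else 0.0 for k in range(20)]
envelope = smooth_spectrum(harmonic_frame, half_width=2)
```

Because the averaging window (5 bins) spans exactly one harmonic spacing, the harmonic comb is flattened to a constant away from the edges, which is the "envelope" of this toy spectrum.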


Linear predictive coding

Predict next input sample as linear combination of previous samples

[Figure: excitation source → synthesis filter (spectral envelope model) → sound; the prediction error e(n) is x(n) minus the output of a tapped delay line (z⁻¹ elements weighted by a1, a2, …, ap)]

Filter is described by a few filter coefficients for each frame

$$x[n] \approx \hat{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k]$$

Excitation is what's left after filtering (residual, aka prediction error)

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k \, x[n-k]$$
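The residual formula can be checked on a toy signal. The sketch below assumes samples before the start of the frame are zero; `lpc_residual` and the first-order example are hypothetical illustrations, with the coefficient list `a` holding a_1 … a_p.

```python
def lpc_residual(x, a):
    """e[n] = x[n] - sum_k a_k * x[n-k], taking x[n] = 0 for n < 0."""
    p = len(a)
    e = []
    for n in range(len(x)):
        prediction = sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(x[n] - prediction)
    return e

# Assumed toy signal: x[n] = 0.9 * x[n-1], started by a unit impulse.
x = [0.9 ** n for n in range(50)]
e = lpc_residual(x, a=[0.9])
```

Here the predictor matches the signal's recursion exactly, so the residual is (up to rounding) the unit impulse that started the signal: all of the decaying "resonance" has been removed.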


LPC analysis/synthesis

[Figure: (a) LPC analysis: the prediction P(z) applied to x(n) is subtracted from x(n) to give e(n); (b) LPC synthesis: ẽ(n) plus the fed-back output of P(z) gives y(n)]

P(z) is just an FIR filter: $P(z) = \sum_{k=1}^{p} a_k z^{-k}$

Excitation is still a filtered version of the input:

$$E(z) = X(z)\,\bigl(1 - P(z)\bigr)$$

For synthesis, pass (approximate) excitation through the inverse filter:

$$Y(z) = E(z)\,H(z), \qquad H(z) = \frac{1}{1 - P(z)}$$

all-pole “autoregressive” (AR) modeling
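The analysis and synthesis filters are exact inverses of each other, which a short round trip makes concrete. This is a sketch with hypothetical function names and arbitrary example values; `a` holds a_1 … a_p and both filters start from zero state.

```python
def analysis_filter(x, a):
    """FIR whitening filter 1 - P(z): returns the residual e[n]."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole filter H(z) = 1 / (1 - P(z)): y[n] = e[n] + sum_k a_k * y[n-k]."""
    y = []
    for n in range(len(e)):
        feedback = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + feedback)
    return y

# Assumed example values; the coefficients need not come from an actual LPC fit.
x = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.3, 0.0]
a = [1.3, -0.6]
e = analysis_filter(x, a)
y = synthesis_filter(e, a)   # reproduces x up to rounding
```

Perfect reconstruction only holds when the exact residual is fed back in. Any approximation of e[n] (quantization, a pulse train, noise) is shaped by the same H(z) but no longer returns x[n] sample for sample.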


LPC - varying filter order

LPC filter H(z) models the spectrum of x[n]

Minimizing the energy of the residual e[n] gives optimal coefficients

$$\{a_k\} = \arg\min_{a_k} \sum_n \Bigl( x[n] - \sum_{k=1}^{p} a_k \, x[n-k] \Bigr)^2$$

The approximation improves with increasing filter order p

[Figure: spectra of original and LPC filters, |X(f)|/dB versus f/kHz, for p = 10, 20, 40, 60, 80, 120]


Estimating LPC parameters

Set the derivative of $\sum_n e^2[n]$ w.r.t. $a_k$ to zero and solve for $a_k$:

$$\frac{\partial}{\partial a_k} \sum_n e^2[n] = 0$$

End up with p linear equations (one for each k = 1, …, p) involving autocorrelations of x:

$$\sum_m x[m] \, x[m-k] = \sum_{i=1}^{p} a_i \sum_m x[m-i] \, x[m-k]$$

Solve using Levinson-Durbin recursion
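As a sketch of that step, the following implements the autocorrelation method with a textbook Levinson-Durbin recursion. It assumes the frame is zero outside its bounds and has non-zero energy; the function names and the second-order test signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_m x[m] * x[m-k] for k = 0..max_lag (zero outside the frame)."""
    return [sum(x[m] * x[m - k] for m in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve the p normal equations; returns ([a_1..a_p], residual energy)."""
    a = [0.0] * (p + 1)          # a[0] is unused so that a[k] matches a_k
    err = r[0]                   # assumed non-zero (no silent frames)
    for i in range(1, p + 1):
        # Reflection coefficient for order i
        k_i = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k_i
        for j in range(1, i):
            updated[j] = a[j] - k_i * a[i - j]
        a = updated
        err *= 1.0 - k_i * k_i
    return a[1:], err

# Assumed test signal: impulse response of 1 / (1 - 1.3 z^-1 + 0.6 z^-2).
x = [1.0, 1.3]
for n in range(2, 200):
    x.append(1.3 * x[-1] - 0.6 * x[-2])
coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
```

For this signal the recursion recovers coefficients very close to 1.3 and -0.6, and the remaining error energy is close to 1, the energy of the unit impulse that excited the filter. The recursion exploits the Toeplitz structure of the autocorrelation matrix, so it needs on the order of p² operations rather than the p³ of a general linear solve.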


LPC example

[Figure: LPC example. Time domain (time / samp): windowed original and LPC residual. Frequency domain (freq / Hz, dB): original spectrum, LPC spectrum and residual spectrum. z-plane: filter poles.]


Short-time LPC analysis

Solve LPC for each ~20 ms frame

[Figure: spectrograms (freq / kHz versus time / s) and a z-plane pole plot (real part versus imaginary part). Source: E4896 Music Signal Processing (Dan Ellis), 2010-02-22.]
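Short-time analysis starts by cutting the signal into frames. A minimal sketch, assuming an 8 kHz sample rate (suggested by the "-8k" example files), a Hann window, and 50% overlap; none of these choices is stated on the slides, and `frame_signal` is a hypothetical helper.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split x into overlapping, Hann-windowed frames (incomplete tail is dropped)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + n] * window[n] for n in range(frame_len)])
    return frames

sample_rate = 8000                        # assumed
frame_len = round(0.020 * sample_rate)    # ~20 ms -> 160 samples
hop = frame_len // 2                      # assumed 50% overlap
x = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
frames = frame_signal(x, frame_len, hop)
```

Each entry of `frames` would then be passed to the LPC estimation step, giving one set of filter coefficients per frame.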


Cepstral analysis

cepstrum = String.reverse(“spec”) + “trum”

Entire lexicon of funny anagrams

Insight: source and filter add in the log spectral domain

$$X(z) = E(z)\,H(z)$$

$$\log X(z) = \log E(z) + \log H(z)$$

Makes them easy to separate

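[Figure: cepstral analysis block diagram. y(n) = x(n) * h(n) → FFT → log|Y(k)| → IFFT → real cepstrum c(n). Low-pass lifter window w_LP(n) → c_h(n) → FFT → spectral envelope C_h(k) = log|H(k)|. High-pass lifter window w_HP(n) → c_x(n) → FFT → source envelope C_x(k) = log|X(k)|.]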


Liftering example

By low-pass “liftering” the cepstrum we obtain the spectral envelope of the signal
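Low-pass liftering can be sketched with a naive DFT. The code below is an illustration under assumptions: a rectangular, symmetric lifter, a small floor inside the logarithm to avoid log(0), and a synthetic 64-sample frame made of a pulse train through a two-pole resonance. All names and values are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) forward DFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def idft(spectrum):
    """Naive O(N^2) inverse DFT."""
    n_pts = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]

def real_cepstrum(x, floor=1e-12):
    """c(n) = IDFT(log|X(k)|); the floor guards against log(0)."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_mag)]

def lifter_envelope(cepstrum, cutoff):
    """Keep quefrencies below `cutoff` (and their mirror images), then go back
    to the log-magnitude domain to obtain the spectral envelope."""
    n_pts = len(cepstrum)
    kept = [c if (n < cutoff or n > n_pts - cutoff) else 0.0
            for n, c in enumerate(cepstrum)]
    return [v.real for v in dft(kept)]

# Assumed frame: pulses every 16 samples through y[n] = e[n] + 1.3 y[n-1] - 0.6 y[n-2].
x = []
for n in range(64):
    pulse = 1.0 if n % 16 == 0 else 0.0
    x.append(pulse + (1.3 * x[-1] if n >= 1 else 0.0) - (0.6 * x[-2] if n >= 2 else 0.0))
cepstrum = real_cepstrum(x)
envelope = lifter_envelope(cepstrum, cutoff=8)   # smooth log-magnitude envelope
```

The cutoff is chosen below the pitch period (16 samples here), so the pitch peak in the cepstrum is discarded and only the slowly varying part of the log spectrum remains. Keeping every quefrency instead returns the original log-magnitude spectrum.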


Liftering example 2

Original waveform has excitation fine structure convolved with resonances

DFT shows harmonics modulated by resonances

Log DFT is sum of harmonic ‘comb’ and resonant bumps

IDFT separates out resonant bumps (low quefrency) and regular, fine structure (‘pitch pulse’)

Selecting low-n cepstrum separates resonance information (deconvolution / ‘liftering’)

[Figure: four panels. Waveform and min. phase IR (samps); abs(dft) and liftered (freq / Hz); log(abs(dft)) and liftered (freq / Hz, dB); real cepstrum and lifter (quefrency), with the pitch pulse marked.]


Applications - Speech coding


LP analysis on ~20 ms frames gives prediction filter and residual

Recombining them should yield a perfect reconstruction

Coding applications further compress the residual

e.g. simple pitch tracker → “buzz-hiss” encoding

[Figure: LPC codec. Analysis filter A(z) maps s[n] to e[n]; spectral envelope |1/A(e^{jω})|. Encoder: input s[n] → LPC analysis → filter coefficients {a_i} and residual e[n], each represented & encoded. Decoder: excitation generator → ê[n] → all-pole filter H(z) = 1 / (1 − Σ a_i z^{-i}) → output ŝ[n]. Plot: 16 ms frame boundaries and pitch period values versus time / s.]

Low bitrate speech codec used in cell phones is based on LPC

Quantize LPC filter parameters, use crude approximation to residual

Many different ways to represent filter params: prediction coefficients {a_k}, roots of 1 − P(z), line spectral frequencies

Switch between noise and pulse train for excitation
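A decoder that switches between a pulse train and noise could look roughly like the sketch below. It is a simplified illustration, not an actual codec: the function names, the fixed seed, the unit gain, and the 8 kHz rate behind the 128-sample (16 ms) frame are assumptions, and the filter state is reset for every frame, which a real decoder would not do.

```python
import random

def pulse_train(num_samples, pitch_period):
    """Voiced ("buzz") excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def white_noise(num_samples, seed=0):
    """Unvoiced ("hiss") excitation: Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

def decode_frame(a, voiced, pitch_period, num_samples, gain=1.0):
    """Pick the excitation, then run it through the all-pole filter 1 / (1 - P(z))."""
    if voiced:
        excitation = pulse_train(num_samples, pitch_period)
    else:
        excitation = white_noise(num_samples)
    y = []
    for n in range(num_samples):
        feedback = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(gain * excitation[n] + feedback)
    return y

# Assumed parameters for one 16 ms frame at 8 kHz (128 samples).
voiced_frame = decode_frame([1.3, -0.6], voiced=True, pitch_period=64, num_samples=128)
unvoiced_frame = decode_frame([1.3, -0.6], voiced=False, pitch_period=64, num_samples=128)
```

Per frame, the encoder then only has to transmit the filter parameters, a voiced/unvoiced flag, a pitch period, and a gain, which is where the low bitrate comes from.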


Use codebook of excitations (CELP: Code Excited Linear Prediction)


Applications - Cross-synthesis/Vocoding

[Figure: spectrograms (freq / Hz versus time / s): original (mpgr1_sx419) and noise-excited LPC resynthesis with pole freqs]

Reconstruct using excitation from one sound and filter from another

Whisperization: replace excitation with white noise
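Cross-synthesis on a single frame can be sketched by combining an LPC residual from one signal with the all-pole filter of another. This is a hedged illustration: the helper names, the order p = 4, and the two synthetic frames are hypothetical, and a practical system would work frame by frame with overlap-add.

```python
import math

def lpc(x, p):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [a_1..a_p]."""
    r = [sum(x[m] * x[m - k] for m in range(k, len(x))) for k in range(p + 1)]
    a, err = [0.0] * (p + 1), r[0]
    for i in range(1, p + 1):
        k_i = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] - k_i * a[i - j] if 0 < j < i else a[j] for j in range(p + 1)]
        a[i] = k_i
        err *= 1.0 - k_i * k_i
    return a[1:]

def residual(x, a):
    """Whitening filter 1 - P(z)."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]

def all_pole(e, a):
    """Synthesis filter 1 / (1 - P(z))."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y

def cross_synthesize(source_frame, filter_frame, p):
    """Excitation from source_frame, spectral envelope from filter_frame."""
    excitation = residual(source_frame, lpc(source_frame, p))
    return all_pole(excitation, lpc(filter_frame, p))

# Assumed toy frames: a sawtooth as the source, a damped sinusoid as the filter donor.
source = [((n % 20) / 10.0) - 1.0 for n in range(160)]
donor = [math.sin(0.3 * n) * 0.98 ** n for n in range(160)]
hybrid = cross_synthesize(source, donor, p=4)
```

Whisperization follows the same pattern with the excitation replaced by white noise instead of another signal's residual. Cross-synthesizing a frame with itself returns the original frame, which is a convenient sanity check.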


http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/mpgr1_sx419-8k.wav

http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/mpgr1_sx419-8k-whisper.wav

Still more applications


LPC warping: replacing delays z^-1 with allpass elements warps frequencies but not magnitudes

Allpass element: (z + α) / (αz + 1)

[Figure: frequency warping curves for α = 0.6 and α = −0.6; spectrograms (frequency versus time) of the original and of the warped LPC resynthesis, α = −0.2]

http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/

Process formants independent of pitch

Pitch-shifting while preserving formants

Shift formants while preserving pitch

Voice transformation

Pitch-analysis


http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/sm1_cln.wav

http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/sm1_cln_anp2.wav

http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/
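The frequency mapping behind LPC warping can be explored numerically. The sketch below evaluates the allpass expression (z + α) / (αz + 1) exactly as printed on the slide, at points on the unit circle, and reads off its phase as the warped frequency. Whether a positive α compresses or stretches low frequencies depends on whether the substitution is written for z or z⁻¹, so the direction shown here is an assumption; `warp_frequency` is a hypothetical name and is not taken from the polewarp code.

```python
import cmath
import math

def warp_frequency(omega, alpha):
    """Phase of (z + alpha) / (alpha*z + 1) at z = exp(j*omega), in radians."""
    z = cmath.exp(1j * omega)
    return cmath.phase((z + alpha) / (alpha * z + 1))

omegas = [math.pi * k / 10 for k in range(1, 10)]    # 0.1*pi .. 0.9*pi
warped_pos = [warp_frequency(w, 0.6) for w in omegas]
warped_neg = [warp_frequency(w, -0.6) for w in omegas]
```

Since the element is allpass, its magnitude is 1 at every frequency and only the frequency axis is bent. With α = 0 the mapping is the identity, and the two signs of α bend the axis in opposite directions.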

Reading

DAFX 9.1 – 9.3 - Source-Filter Processing

