
E85.2607: Lecture 8 – Source-Filter Processing

E85.2607: Lecture 8 – Source-Filter Processing 2010-04-01 1 / 21


Source-filter analysis/synthesis

[Figure: source-filter analysis/synthesis block diagram. Stages: Analysis, Transformation, Synthesis. Signals: source signal (vs n) and spectral envelope (vs f).]

Separate:

Source/excitation: fine time/frequency structure (e.g. pitch)

Filter: broad spectral shape (resonances)

Similar to subtractive synthesis

Satisfying physical interpretation for real-world signals

Easier to make sense of than e.g. phase


Human speech production

Reasonable approximation to speech signals:

Source is oscillation of vocal cords

e.g. normal speech (varying pitches) vs whispering

Filtered by vocal tract (throat + tongue + lips)

e.g. “oooh” vs “aaah”; resonances = formants

Both are time-varying


Source filter model

[Figure: excitation source (vs t) passed through a resonance filter (vs f). Panels: time signal of prediction error e(n) vs n; magnitude spectra |X(f)| and |G·H(f)| in dB vs f/kHz.]


Formants in speech

[Figure: speech spectrogram with phonetic labels; the legible part of the utterance text reads “watch thin as a dime”.]


How to separate the source and filter?

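[Figure: source-filter processing block diagram. Spectral envelope estimation on x(n) (channel vocoder, LPC, or cepstrum) yields the source signal e1(n) and filter H1(z); source signal processing and spectral envelope transformation lead to the filter H2(z) and the output y(n).]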

Short-time analysis

For each frame, estimate spectral envelope (filter response):

1. Channel vocoder (frequency-domain)

2. Linear Predictive Coding (LPC) (time-domain)

3. Cepstral analysis

Source signal is what's left over (residual) after “whitening”


Channel vocoder

Wideband STFT filterbank, but using relatively few filters

Linearly spaced with equal bandwidth (STFT)

Logarithmically spaced (constant-Q filter bank)

Take RMS energy in each frequency band
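The band-energy step can be sketched for a single frame as follows. This is a minimal illustration assuming NumPy, with FFT bins grouped into a few equal-width bands in place of explicit bandpass filters; the name `band_rms`, the Hann window, and the band count are illustrative choices, not taken from the lecture.

```python
import numpy as np

def band_rms(frame, n_bands=8):
    """RMS magnitude in n_bands equal-width frequency bands of one frame."""
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    # Split the FFT bins into contiguous, (nearly) equal-width bands.
    bands = np.array_split(mag, n_bands)
    return np.array([np.sqrt(np.mean(b ** 2)) for b in bands])

# Toy input (assumed): a 500 Hz sine sampled at 16 kHz.
fs = 16000
n = np.arange(512)
frame = np.sin(2 * np.pi * 500 * n / fs)
env = band_rms(frame, n_bands=8)   # one coarse envelope value per band
```

For this toy input the energy falls in the lowest of the eight bands. A logarithmic (constant-Q) layout would only change how the bins are grouped.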

[Figure: channel vocoder block diagram, panels (a) and (b). x(n) feeds bandpass filters BP 1 ... BP k; each band signal is squared, ( )^2, and lowpass filtered (LP) to give x_RMS1(n) ... x_RMSk(n). Band layouts: octave-spaced channel stacking and equally-spaced channel stacking. Plot: short-time spectrum and spectral envelope, X(f)/dB vs f/Hz, 0 to 8000 Hz.]


Channel vocoder using FFT

[Figure: short-time spectrum and spectral envelope, X(f)/dB vs f/Hz, 0 to 8000 Hz; the channel vocoder block diagram from the previous slide is repeated alongside.]

Lowpass filter magnitude of each STFT frame

i.e. filter columns of the spectrogram
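One way to read "lowpass filter the magnitude of each STFT frame" is to smooth the magnitude spectrum along the frequency axis. The sketch below assumes NumPy and uses a moving average as the lowpass filter; the function name, the window, and the (odd) kernel length `smooth_bins` are assumptions for illustration.

```python
import numpy as np

def spectral_envelope_fft(frame, smooth_bins=9):
    """Envelope estimate: moving average over the magnitude spectrum.

    smooth_bins is assumed to be odd so the output keeps the input length.
    """
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    kernel = np.ones(smooth_bins) / smooth_bins
    # Edge-pad so the smoothed curve has the same length as the spectrum.
    padded = np.pad(mag, smooth_bins // 2, mode="edge")
    return mag, np.convolve(padded, kernel, mode="valid")

# Toy harmonic frame (assumed): 200 Hz fundamental, 20 harmonics, fs = 16 kHz.
fs = 16000
n = np.arange(1024)
frame = sum(np.sin(2 * np.pi * 200 * h * n / fs) / h for h in range(1, 21))
mag, env = spectral_envelope_fft(frame)
```

Applying the same function to every column of a spectrogram gives a time-varying envelope. A wider kernel removes more of the harmonic fine structure.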


Linear predictive coding

Predict next input sample as linear combination of previous samples

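[Figure: LPC as a source-filter model. Excitation source → synthesis filter (spectral envelope model) → sound. Prediction structure: x(n) passes through delays z^{-1} weighted by a_1, a_2, ..., a_p, and the weighted sum is subtracted from x(n) to give e(n).]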

Filter is described by a few filter coefficients for each frame

$$x[n] \approx \hat{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k]$$

Excitation is what's left after filtering (residual, aka prediction error)

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k \, x[n-k]$$
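The prediction and residual can be written out directly from the two sums. This is a minimal sketch assuming NumPy, with x[n] taken as zero for n < 0; the name `lpc_predict` and the toy signal are illustrative.

```python
import numpy as np

def lpc_predict(x, a):
    """x_hat[n] = sum_{k=1..p} a[k-1] * x[n-k], treating x[n] as 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros_like(x)
    for n in range(len(x)):
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                x_hat[n] += a[k - 1] * x[n - k]
    return x_hat

# Toy signal (assumed): x[n] = 0.9^n, predicted exactly by p = 1, a_1 = 0.9.
x = 0.9 ** np.arange(8)
x_hat = lpc_predict(x, [0.9])
e = x - x_hat   # residual: only the first sample is unpredictable
```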


LPC analysis/synthesis

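[Figure: (a) LPC analysis: the output of P(z) driven by x(n) is subtracted from x(n) to give e(n). (b) LPC synthesis: the excitation ẽ(n) is added to the output of P(z), which is fed from y(n).]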

P(z) is just an FIR filter: $P(z) = \sum_{k=1}^{p} a_k z^{-k}$

Excitation is still a filtered version of the input:

$$E(z) = X(z)\,(1 - P(z))$$

For synthesis, pass (approximate) excitation through the inverse filter:

$$Y(z) = E(z)\,H(z), \qquad H(z) = \frac{1}{1 - P(z)}$$

all-pole “autoregressive” (AR) modeling
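The analysis/synthesis pair can be checked with a round trip: filtering by 1 − P(z) and then by 1/(1 − P(z)) should return the input. The sketch below assumes NumPy; the function names and the coefficient values are hypothetical and chosen only so that the all-pole filter is stable.

```python
import numpy as np

def lpc_analysis(x, a):
    """E(z) = X(z) (1 - P(z)): FIR filtering with taps [1, -a_1, ..., -a_p]."""
    taps = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(np.asarray(x, dtype=float), taps)[: len(x)]

def lpc_synthesis(e, a):
    """Y(z) = E(z) / (1 - P(z)): y[n] = e[n] + sum_k a_k y[n-k]."""
    a = np.asarray(a, dtype=float)
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * y[n - k]
        y[n] = acc
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
a = [1.3, -0.6]   # hypothetical coefficients; pole radius is sqrt(0.6) < 1
y = lpc_synthesis(lpc_analysis(x, a), a)   # reconstructs x up to rounding
```

With an approximate excitation in place of the exact residual, the same synthesis loop produces an approximation of x.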


LPC - varying filter order

LPC filter H(z) models the spectrum of x[n]

Minimizing the energy of the residual e[n] gives optimal coefficients

$$\{a_k\} = \arg\min_{a_k} \sum_{n} \left( x[n] - \sum_{k} a_k \, x[n-k] \right)^2$$

The approximation improves with increasing filter order p
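The minimization can be carried out literally as a linear least-squares problem, which also shows the residual energy shrinking as p grows. This sketch assumes NumPy and solves the problem directly with `np.linalg.lstsq` over a fixed range of n; it is not the autocorrelation method, and the function name and toy signal are illustrative.

```python
import numpy as np

def lpc_least_squares(x, p, start):
    """Minimize sum_{n>=start} (x[n] - sum_k a_k x[n-k])^2 by least squares.

    start must be at least p so that every x[n-k] exists.
    """
    x = np.asarray(x, dtype=float)
    rows = np.arange(start, len(x))
    # Column k-1 holds x[n-k] for every row n.
    A = np.column_stack([x[rows - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(A, x[rows], rcond=None)
    residual = x[rows] - A @ a
    return a, float(np.sum(residual ** 2))

# Toy coloured noise (assumed): white noise through a short FIR filter.
rng = np.random.default_rng(1)
x = np.convolve(rng.standard_normal(400), [1.0, 0.8, 0.5, 0.2])[:400]
energies = [lpc_least_squares(x, p, start=8)[1] for p in (1, 2, 4, 8)]
```

Because every order is fitted over the same samples, each larger model contains the smaller ones, so the residual energies cannot increase with p.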

[Figure: spectra of original and LPC filters, |X(f)|/dB vs f/kHz (0 to 8 kHz), for p = 10, 20, 40, 60, 80, 120.]


Estimating LPC parameters

Set derivative of $\sum_n e^2[n]$ w.r.t. $a_k$ to zero and solve for $a_k$:

$$\frac{\partial}{\partial a_k} \sum_{n} e^2[n] = 0$$

End up with p linear equations involving autocorrelations of x:

$$\sum_{m} x[m]\,x[m-k] = \sum_{i} a_i \sum_{m} x[m-i]\,x[m-k], \qquad k = 1, \dots, p$$

Solve using Levinson-Durbin recursion
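A compact sketch of the recursion, assuming NumPy and the autocorrelation method (the sums over m are replaced by the autocorrelation r[k] of a finite frame, which makes the system Toeplitz). The function names and the toy signal are illustrative.

```python
import numpy as np

def autocorr(x, p):
    """r[k] = sum_m x[m] x[m-k] for k = 0..p over a finite frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(p + 1)])

def levinson_durbin(r, p):
    """Solve sum_i a_i r[|k-i|] = r[k], k = 1..p. Returns (a, residual energy)."""
    a = np.zeros(p)
    err = r[0]
    for m in range(1, p + 1):
        # Reflection coefficient for order m, from the order m-1 solution.
        acc = r[m] - np.dot(a[: m - 1], r[m - 1 : 0 : -1])
        k = acc / err
        a_prev = a[: m - 1].copy()
        a[: m - 1] = a_prev - k * a_prev[::-1]
        a[m - 1] = k
        err *= 1.0 - k * k
    return a, err

# Toy coloured noise (assumed).
rng = np.random.default_rng(3)
x = np.convolve(rng.standard_normal(512), [1.0, 0.6, 0.3])[:512]
p = 10
r = autocorr(x, p)
a, err = levinson_durbin(r, p)
```

The recursion takes on the order of p² operations instead of the p³ of a general linear solve, and yields the solutions for all lower orders along the way.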


LPC example

[Figure: LPC example. Time-domain panel (time / samp, 0 to 400): windowed original and LPC residual. Frequency-domain panel (dB vs freq / Hz, 0 to 7000): original spectrum, residual spectrum, LPC spectrum. z-plane panel: filter poles.]


Short-time LPC analysis

E4896 Music Signal Processing (Dan Ellis) 2010-02-22

Solve LPC for each ~20 ms frame

[Figure: short-time LP analysis. Spectrogram-style panels (freq / kHz, 0 to 8, vs time / s, 0.5 to 3) and a z-plane pole plot (real part vs imaginary part).]
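Frame-by-frame analysis can be sketched as a loop that solves the LPC normal equations once per frame. The code assumes NumPy, non-overlapping Hann-windowed frames, and a direct linear solve of the autocorrelation system; the order p = 12, the function names, and the noise input are assumptions for illustration.

```python
import numpy as np

def frame_lpc(frame, p):
    """Autocorrelation-method LPC for one frame, via a direct linear solve."""
    w = frame * np.hanning(len(frame))
    r = np.array([np.dot(w[: len(w) - k], w[k:]) for k in range(p + 1)])
    idx = np.arange(p)
    R = r[np.abs(idx[:, None] - idx[None, :])]   # Toeplitz autocorrelation matrix
    return np.linalg.solve(R, r[1:])

def short_time_lpc(x, fs, p=12, frame_ms=20.0):
    """One row of p coefficients per non-overlapping frame."""
    n = int(round(fs * frame_ms / 1000.0))
    n_frames = len(x) // n
    return np.array([frame_lpc(x[i * n : (i + 1) * n], p) for i in range(n_frames)])

# Toy input (assumed): one second of white noise at 8 kHz.
rng = np.random.default_rng(2)
x = rng.standard_normal(8000)
coeffs = short_time_lpc(x, fs=8000)   # 50 frames of 160 samples each
```

Overlapping frames and a faster solver are natural refinements of the same loop.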


Cepstral analysis

cepstrum = String.reverse(“spec”) + “trum”

Entire lexicon of funny anagrams

Insight: source and filter add in the log spectral domain

X (z) = E (z)H(z)

log X (z) = log E (z) + log H(z)

Makes them easy to separate

[Figure: cepstral analysis block diagram. y(n) = x(n) * h(n) → FFT → Y(k) → log|Y(k)| → IFFT → real cepstrum c(n). Low-pass lifter w_LP(n) → c_h(n) → FFT → spectral envelope C_h(k) = log|H(k)|. High-pass lifter w_HP(n) → c_x(n) → FFT → source envelope C_x(k) = log|X(k)|.]


Liftering example

By low-pass “liftering” the cepstrum we obtain the spectral envelope of the signal
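Low-pass liftering can be sketched in a few lines: take the log magnitude spectrum, transform to the cepstrum, keep only the low quefrencies, and transform back. The code assumes NumPy; the function name, the Hann window, the cutoff `n_keep`, and the toy harmonic frame are illustrative choices.

```python
import numpy as np

def cepstral_envelope(frame, n_keep=30):
    """Log-magnitude envelope from the low-quefrency part of the real cepstrum."""
    n = len(frame)
    spectrum = np.fft.fft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real         # real cepstrum c(n)
    # Symmetric low-pass lifter: quefrencies 0..n_keep-1 and their mirror images.
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0
    lifter[n - n_keep + 1 :] = 1.0
    envelope = np.fft.fft(cepstrum * lifter).real
    return log_mag, envelope

# Toy frame (assumed): 100 Hz fundamental at fs = 8 kHz, so the pitch
# quefrency is 80 samples, above the lifter cutoff of 30.
fs = 8000
n = np.arange(1024)
frame = sum(np.sin(2 * np.pi * 100 * h * n / fs) / h for h in range(1, 31))
log_mag, envelope = cepstral_envelope(frame, n_keep=30)
```

Keeping the complementary high-quefrency part instead would give the source (fine structure) term of the log spectrum.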


Liftering example 2

Original waveform has excitation fine structure convolved with resonances

DFT shows harmonics modulated by resonances

Log DFT is sum of harmonic ‘comb’ and resonant bumps

IDFT separates out resonant bumps (low quefrency) and regular, fine structure (‘pitch pulse’)

Selecting low-n cepstrum separates resonance information (deconvolution / ‘liftering’)

[Figure, four panels: waveform and min. phase IR (vs samps); abs(dft) and liftered (vs freq / Hz); log(abs(dft)) and liftered (dB vs freq / Hz); real cepstrum and lifter (vs quefrency), with the pitch pulse marked.]


Applications - Speech coding

LPC Synthesis

LP analysis on ~20 ms frames gives prediction filter A(z) and residual e[n]

Recombining them should yield perfect s[n]

Coding applications further compress e[n], e.g. simple pitch tracker → “buzz-hiss” encoding

[Figure: LPC codec block diagram. Encoder: input s[n] → LPC analysis → filter coefficients {a_i} and residual e[n], each passed to a "represent & encode" stage. Decoder: excitation generator ê[n] → all-pole filter H(z) = 1 / (1 − Σ a_i z^{-i}) → output ŝ[n]. Insets: |1/A(e^{jω})| vs f; waveform vs time / s (1.3 to 1.75) with 16 ms frame boundaries and pitch period values.]

Low bitrate speech codec used in cell phones is based on LPC

Quantize LPC filter parameters, use crude approximation to residual

Many different ways to represent filter params: prediction coefficients {a_k}, roots of 1 − P(z), line spectral frequencies

Switch between noise and pulse train for excitation


Use codebook of excitations (CELP: Code Excited Linear Prediction)
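The noise/pulse-train switch ("buzz-hiss") can be sketched as a per-frame excitation generator. This is a minimal illustration assuming NumPy; the function name, the voiced flag, and the unit-power scaling are assumptions, and a real codec would also track gain and pulse phase across frames.

```python
import numpy as np

def buzz_hiss_excitation(n_samples, voiced, pitch_period=None, rng=None):
    """Crude residual substitute: pulse train if voiced, white noise otherwise."""
    if voiced:
        e = np.zeros(n_samples)
        e[::pitch_period] = 1.0   # one pulse per pitch period ("buzz")
        # Scale to roughly unit power (assumed) to match the noise branch.
        return e * np.sqrt(pitch_period)
    if rng is None:
        rng = np.random.default_rng()
    return rng.standard_normal(n_samples)   # "hiss"

# 160 samples is 20 ms at an assumed 8 kHz sampling rate.
buzz = buzz_hiss_excitation(160, voiced=True, pitch_period=40)
hiss = buzz_hiss_excitation(160, voiced=False, rng=np.random.default_rng(0))
```

Passing either excitation through the decoded all-pole filter gives the frame's output, so only the voiced flag, the pitch period, and the filter parameters need to be transmitted.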


Applications - Cross-synthesis/Vocoding

[Figure: two spectrograms (freq / Hz, 0 to 4000, vs time / s, 0 to 1.4). Panels: Original (mpgr1_sx419); Noise-excited LPC resynthesis with pole freqs.]

Reconstruct using excitation from one sound and filter from another

Whisperization: replace excitation with white noise
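Both ideas can be sketched for a single frame with three small helpers. The code assumes NumPy and autocorrelation-method LPC solved directly; the function names, the order p = 10, and the noise level matching are illustrative assumptions, and a practical version would process short overlapping frames.

```python
import numpy as np

def lpc(x, p):
    """Autocorrelation-method LPC coefficients a_1..a_p (direct solve)."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(p + 1)])
    idx = np.arange(p)
    return np.linalg.solve(r[np.abs(idx[:, None] - idx[None, :])], r[1:])

def residual(x, a):
    """e[n] = x[n] - sum_k a_k x[n-k]."""
    return np.convolve(x, np.concatenate(([1.0], -a)))[: len(x)]

def all_pole(e, a):
    """y[n] = e[n] + sum_k a_k y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = y[max(0, n - len(a)) : n][::-1]   # y[n-1], y[n-2], ...
        y[n] = e[n] + np.dot(a[: len(past)], past)
    return y

def cross_synthesize(source_sound, filter_sound, p=10):
    """Excitation from source_sound, spectral envelope from filter_sound."""
    e = residual(source_sound, lpc(source_sound, p))
    return all_pole(e, lpc(filter_sound, p))

def whisperize(x, p=10, seed=0):
    """Replace the excitation with white noise at the residual's RMS level."""
    a = lpc(x, p)
    e = residual(x, a)
    noise = np.random.default_rng(seed).standard_normal(len(x))
    return all_pole(noise * np.sqrt(np.mean(e ** 2)), a)

# Toy coloured noise (assumed) standing in for a sound.
rng = np.random.default_rng(4)
x = np.convolve(rng.standard_normal(400), [1.0, 0.7, 0.3])[:400]
same = cross_synthesize(x, x)   # source and filter from one sound: returns x
whisper = whisperize(x)
```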


Still more applications

LPC Warping

Replacing delays z^{-1} with allpass elements warps frequencies but not magnitudes

http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/

[Figure: allpass frequency-warping curves for α = 0.6 and α = -0.6, allpass element (z + α)/(αz + 1); spectrograms (frequency, 0 to 8000, vs time, 0.5 to 3): Original; Warped LPC resynth, α = -0.2.]
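The claim that an allpass element warps frequencies but not magnitudes can be checked numerically. The sketch below assumes NumPy and evaluates the expression (z + α)/(αz + 1) on the unit circle; the function name is illustrative, and the sign convention for α in the lecture's figure may differ from the one used here.

```python
import numpy as np

def warp_frequency(omega, alpha):
    """Phase and magnitude of (z + alpha)/(alpha z + 1) at z = exp(j omega)."""
    z = np.exp(1j * np.asarray(omega, dtype=float))
    w = (z + alpha) / (alpha * z + 1.0)
    return np.angle(w), np.abs(w)

omega = np.linspace(0.05, np.pi - 0.05, 50)
warped, gain = warp_frequency(omega, alpha=0.6)
# gain stays at 1 (magnitudes untouched); warped is a monotonic remapping of omega.
```

With this convention a positive α maps each frequency to a lower one, and α = 0 leaves the axis unchanged.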

Process formants independent of pitch

Pitch-shifting while preserving formants

Shift formants while preserving pitch

Voice transformation

Pitch analysis


Reading

DAFX 9.1 – 9.3 - Source-Filter Processing
