
E85.2607: Lecture 9 – Sinusoidal Modeling

E85.2607: Lecture 9 – Sinusoidal Modeling 2010-04-08 1 / 23

Sinusoidal Modeling/Spectral Modeling Synthesis (SMS)

[Figure: spectral processing block diagram. Original sound → spectral analysis → original spectrum → feature extraction → original feature → transformation → transformed feature → feature addition → transformed spectrum → spectral synthesis → transformed sound. Waveforms and features are plotted against time, spectra against frequency.]

Similar to the phase vocoder, but assumes that the signal is a sum of sinusoids with smoothly varying parameters:

$x[n] \approx \tilde{x}[n] = \sum_k a_k[n] \cos(\omega_k[n])$

Freq. domain representation analogous to Fourier series

Flexible representation for transformations...

E4896 Music Signal Processing (Dan Ellis) 2010-02-15 - /15

1. Sinusoidal Modeling

• Periodic sounds → ridges in spectrogram

  each ridge is a sinusoidal harmonic

  .. with smoothly-varying parameters

  .. an efficient & flexible description?

[Figure: spectrogram of Violin.arco.ff.A4]


SMS overview

[Figure: SMS analysis block diagram. Blocks: window generation (smoothing window), FFT, peak detection, pitch estimation, peak continuation, additive synthesis, amplitude correction, a second windowed FFT, residual modeling. Signals: sound, magnitude spectrum, phase spectrum, peak data, pitch frequency, sine frequencies, sine magnitudes, sine phases, sinusoidal component, residual component, residual spectral data.]

Analysis

Extract sinusoidal tracks from STFT

Residual contains the portion of the signal that is not well modeled by sinusoids

Synthesis

Each track controls an oscillator


Analysis overview


Sinusoidal Peak Picking

• Local maxima in DFT frames

• Quadratic fit for sub-bin resolution

[Figure: spectrogram (freq / Hz vs. time / s), the magnitude spectrum of one frame (level / dB vs. freq / Hz), and a close-up of level / dB and phase / rad around the peaks between 400 and 800 Hz.]

1 Break the signal into frames, as in the STFT

Be aware of the time-frequency tradeoff

2 Pick peaks in each frame

Often expect to find peaks in a harmonic series due to a common pitch
Search for a common factor

3 Organize spectral peaks into time-varying sinusoidal tracks

Expect sinusoids to drift in amplitude and frequency


Review: Time-frequency tradeoff

$X[k, m] = \sum_{n=0}^{N-1} x[n + mL] \, w[n] \, e^{-j 2 \pi k n / N}$

DFT length N

Window determines freq. resolution

Must be long enough to resolve harmonics: ∼ 2× longest pitch period (∼ 50-100 ms)
Too long → blurred $a_k[n]$

Hop size L

typically N/2 or N/4

small hop → simpler interpolation over time

[Figure: long window: good freq. resolution, bad time resolution. Short window: bad freq. resolution, good time resolution. Example: FFT of a 2,000 Hz sine wave with a Hamming window (sine wave spectrum).]
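A minimal sketch of the STFT definition above, assuming NumPy; the function name `stft` and the default values of `N` and `L` are illustrative choices, not taken from the lecture. It uses only full frames (no padding at the signal edges).

```python
import numpy as np

def stft(x, N=1024, L=256, window=None):
    """Return X[k, m] = sum_n x[n + m*L] * w[n] * exp(-j*2*pi*k*n/N)."""
    if window is None:
        window = np.hamming(N)          # assumption: Hamming analysis window
    num_frames = 1 + (len(x) - N) // L  # full frames only
    X = np.empty((N, num_frames), dtype=complex)
    for m in range(num_frames):
        frame = x[m * L : m * L + N] * window
        X[:, m] = np.fft.fft(frame)
    return X

# A 2,000 Hz sine sampled at 8 kHz falls exactly on bin 2000 * N / fs = 256.
fs = 8000
x = np.cos(2 * np.pi * 2000 * np.arange(fs) / fs)
X = stft(x, N=1024, L=256)
peak_bin = int(np.argmax(np.abs(X[:512, 0])))
print(X.shape, peak_bin, peak_bin * fs / 1024)
```

Increasing `N` narrows the main lobe around the peak bin but averages the signal over a longer stretch of time, which is the tradeoff described above.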


Peak picking

Find local maxima in each DFT frame

Accuracy of peak detection is limited to half a frequency sample (half a DFT bin)

Can zero pad, but relying on zero-padding alone becomes very expensive

Alternative: combine zero-padding with quadratic interpolation to get frequency resolution finer than one bin
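One common way to do the quadratic interpolation is to fit a parabola through the dB magnitudes of a local maximum and its two neighbours and take the vertex. The sketch below assumes NumPy; `pick_peaks` and its arguments are hypothetical names, and no zero-padding is applied.

```python
import numpy as np

def pick_peaks(mag_db, fs, n_fft):
    """Local maxima of a dB magnitude spectrum, refined by a parabola fit.

    Returns peak frequencies in Hz and interpolated peak levels in dB.
    """
    k = np.arange(1, len(mag_db) - 1)
    is_peak = (mag_db[k] > mag_db[k - 1]) & (mag_db[k] >= mag_db[k + 1])
    k = k[is_peak]
    alpha, beta, gamma = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    # Vertex of the parabola through the three points, in bins relative to k.
    offset = 0.5 * (alpha - gamma) / (alpha - 2 * beta + gamma)
    freqs = (k + offset) * fs / n_fft
    levels = beta - 0.25 * (alpha - gamma) * offset
    return freqs, levels

# Example: a 1003 Hz sine lies between bins (bin spacing is 8000/1024 Hz).
fs, n_fft, f0 = 8000, 1024, 1003.0
x = np.cos(2 * np.pi * f0 * np.arange(n_fft) / fs)
spectrum = np.fft.rfft(x * np.hamming(n_fft))
mag_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
freqs, levels = pick_peaks(mag_db, fs, n_fft)
print(freqs[np.argmax(levels)])  # close to 1003 Hz, not the bin centre at 1000 Hz
```

The fit is done on the log magnitude because the main lobe of common windows is closer to a parabola in dB than on a linear scale.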


Peak picking 2


Peak Selection

• Don’t want every peak

  just “true” sinusoids

  threshold?

  local shape - fits $\delta(\omega - \omega_0) * W(e^{j\omega})$

• Look for stability

  of frequency & amplitude in successive time frames

  phase derivative in time/freq

[Figure: magnitude spectrum of one frame, level / dB vs. freq / Hz (0-7000 Hz).]

Not all peaks correspond to a stable sinusoid

Only retain peaks larger than some threshold

Only retain peaks consistent with harmonic series of underlying pitch

need pitch tracker, input must be monophonic
pitch-synchronous analysis?

Look for stable parameters in adjacent frames

e.g. stable phase derivative in time and frequency
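The first two selection rules (a level threshold and consistency with a harmonic series) can be sketched as follows. This assumes NumPy and a pitch estimate `f0` obtained elsewhere; `select_peaks`, the 40 dB relative threshold and the 3% harmonic tolerance are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def select_peaks(freqs, levels, f0=None, rel_threshold_db=40.0, harm_tol=0.03):
    """Keep peaks that are loud enough and, if f0 is given, near a harmonic."""
    freqs = np.asarray(freqs, dtype=float)
    levels = np.asarray(levels, dtype=float)
    if freqs.size == 0:
        return freqs, levels
    # Rule 1: discard peaks far below the strongest peak in the frame.
    keep = levels >= levels.max() - rel_threshold_db
    if f0 is not None:
        # Rule 2: discard peaks that are not close to an integer multiple of f0.
        harmonic = np.maximum(np.round(freqs / f0), 1)
        keep &= np.abs(freqs - harmonic * f0) <= harm_tol * harmonic * f0
    return freqs[keep], levels[keep]

freqs = [440.0, 612.0, 880.0, 1330.0, 1760.0]
levels = [0.0, -10.0, -6.0, -12.0, -60.0]
print(select_peaks(freqs, levels, f0=440.0)[0])  # 612 Hz and 1760 Hz are dropped
```

The stability test across adjacent frames is not covered here; it needs the tracks formed in the next step.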


Peak continuation

Connect peaks in adjacent frames to track sinusoid trajectory

Lots of different approaches

e.g. McAulay-Quatieri approach:
Greedily attach peak in current frame to closest peak in next frame

Ambiguous if large frequency changes

Peak continuation

• Once peaks have been detected, and possibly a fundamental frequency identified, we want to organise peaks into time-varying trajectories.

• The sinusoidal model assumes each peak is part of a frequency trajectory.

• Peak continuation assigns peaks to a given track.


Track formation heuristics


Track Formation

• Connect peaks in adjacent frames to form sinusoids

  can be ambiguous if large frequency changes

• Unclaimed peak → create new track

• No continuation of track → termination

  hysteresis

[Figure: freq vs. time sketch of existing tracks and new peaks, with track birth and death marked.]

Unclaimed peak → create new track
No continuation of track → termination
Lots of other potential rules:

  min track length to avoid spurious peaks
  max allowable frequency deviation between adjacent frames
  max silent gap length
  max number of tracks per frame

Tricky to implement . . .
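A simplified sketch of greedy peak continuation with birth and death, assuming NumPy. It implements only two of the rules above (closest peak, max frequency deviation); `continue_tracks` and `max_jump_hz` are hypothetical names, and the conflict resolution of the full McAulay-Quatieri procedure, minimum track length, and gap handling are left out.

```python
import numpy as np

def continue_tracks(frames, max_jump_hz=50.0):
    """frames: list of sequences of peak frequencies (Hz), one per frame.

    Returns a list of tracks; each track is a list of (frame_index, freq).
    """
    finished, active = [], []
    for m, peaks in enumerate(frames):
        unclaimed = [float(p) for p in peaks]
        still_active = []
        for track in active:
            last_freq = track[-1][1]
            if unclaimed:
                dists = [abs(p - last_freq) for p in unclaimed]
                j = int(np.argmin(dists))
                if dists[j] <= max_jump_hz:
                    # Continuation: claim the closest peak for this track.
                    track.append((m, unclaimed.pop(j)))
                    still_active.append(track)
                    continue
            finished.append(track)  # death: no continuation found
        # Birth: every unclaimed peak starts a new track.
        still_active.extend([[(m, p)] for p in unclaimed])
        active = still_active
    return finished + active

frames = [[440.0, 880.0], [445.0, 884.0], [450.0, 1500.0]]
for track in continue_tracks(frames):
    print(track)
```

Because tracks claim peaks in list order, an earlier track can take a peak that would fit a later track better; that is one of the ambiguities that makes a robust implementation tricky.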


Sinusoidal synthesis


3. Sinusoidal Synthesis

• Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator: $a_k[n] \cdot \cos(\omega_k[n] \cdot t)$

  can interpolate amplitude, frequency samples

• Faster method synthesizes DFT frames, then overlap-add

  trickier to achieve frequency modulation

[Figure: frequency (freq / Hz) and level of one track vs. time / s (0-0.2 s), and the resulting oscillator output.]

Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator
Interpolate parameters to avoid clicks at frame boundaries

Faster method: synthesize DFT frames, then overlap-add

[Figure: bank of N oscillators, each driven by Amp_k(t) and Freq_k(t) for k = 1 ... N, with outputs summed.]
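The oscillator-bank method can be sketched as below, assuming NumPy. Amplitude and frequency are linearly interpolated between frames and the phase is obtained by accumulating the instantaneous frequency; the analysed phases are ignored, and all tracks are assumed to span the same frames. The names `synth_track` and `synth` are illustrative.

```python
import numpy as np

def synth_track(amps, freqs_hz, hop, fs):
    """Synthesize one track from per-frame amplitude and frequency samples."""
    amps = np.asarray(amps, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    frame_times = np.arange(len(amps)) * hop      # frame positions in samples
    n = np.arange((len(amps) - 1) * hop + 1)
    a = np.interp(n, frame_times, amps)           # linear amplitude ramps
    f = np.interp(n, frame_times, freqs_hz)       # linear frequency ramps
    phase = 2 * np.pi * np.cumsum(f) / fs         # running sum of frequency
    return a * np.cos(phase)

def synth(tracks, hop, fs):
    """Sum the oscillators; tracks is a list of (amps, freqs_hz) pairs."""
    return sum(synth_track(a, f, hop, fs) for a, f in tracks)

tracks = [([0.0, 1.0, 1.0, 0.0], [440.0] * 4),
          ([0.0, 0.5, 0.5, 0.0], [880.0] * 4)]
y = synth(tracks, hop=256, fs=8000)
print(len(y), np.max(np.abs(y)))
```

Starting and ending each track at zero amplitude, as in the example, gives the fade-in at birth and fade-out at death that avoids clicks.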


Sinusoid + noise model

Sinusoids are not a good fit for all types of signals (e.g. noise)

Sometimes want to retain residual in addition to sinusoids


4. Noise Residual

• Some energy is not well fit with sinusoids

  e.g. noisy energy

• Can just keep it as residual

  or model it some other way

• Leads to “sinusoidal + noise” model:

$x[n] = \sum_k a_k[n] \cos(\omega_k[n] n) + e[n]$

[Figure: magnitude spectra (mag / dB vs. freq / Hz, 0-7000 Hz): original, sinusoids, residual, LPC.]

Model residual as white noise passed through time-varying filter


Sinusoidal subtraction

[Figure: waveforms (amplitude vs. time in sec, 0.105-0.111 s): original sound x(n); synthesized sound with phase matching s(n); residual sound e(n).]

$e(n) = w(n) \times (x(n) - s(n)), \quad n = 0, 1, \ldots, N - 1$
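A small sketch of the windowed subtraction e(n) = w(n) × (x(n) - s(n)), assuming NumPy. The test signal (one sinusoid plus weak noise) and the name `residual_frame` are made up for illustration; the point is that the subtraction only isolates the noise when s(n) matches the phase of the original.

```python
import numpy as np

def residual_frame(x_frame, s_frame, window):
    """Windowed time-domain residual for one frame."""
    return window * (x_frame - s_frame)

rng = np.random.default_rng(0)
N, fs = 1024, 8000
n = np.arange(N)
sine = 0.8 * np.cos(2 * np.pi * 440 * n / fs + 0.3)
noise = 0.01 * rng.standard_normal(N)
x = sine + noise
w = np.hamming(N)

# Phase-matched resynthesis: the residual is just the windowed noise.
e_matched = residual_frame(x, sine, w)
# Same amplitude and frequency but a 0.5 rad phase error: the sinusoid leaks.
s_shifted = 0.8 * np.cos(2 * np.pi * 440 * n / fs + 0.8)
e_shifted = residual_frame(x, s_shifted, w)
print(np.sum(e_matched ** 2), np.sum(e_shifted ** 2))
```

This is why the synthesized sound used for subtraction is generated with phase matching, while synthesis for listening can often ignore the measured phases.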


Modeling the residual

[Figure: spectra (magnitude in dB vs. frequency in kHz, 0-20 kHz): a) original spectrum; b) residual spectrum and its approximation.]

Many options for modeling residual filter parameters

Can use LPC to approximate shape of residual

or simply smooth the magnitude spectrum, as in a channel vocoder
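The smoothing option can be sketched as a coarse band-wise envelope, assuming NumPy. Equal-width bands, RMS averaging and linear interpolation between band centres are assumptions made for this sketch; `smooth_envelope` and `num_bands` are hypothetical names, and an LPC fit would be an alternative.

```python
import numpy as np

def smooth_envelope(residual_frame, num_bands=32):
    """Approximate the residual magnitude spectrum by per-band RMS values.

    Returns the envelope on the rfft bin grid and the band values themselves.
    """
    mag = np.abs(np.fft.rfft(residual_frame))
    edges = np.linspace(0, len(mag), num_bands + 1).astype(int)
    centers = 0.5 * (edges[:-1] + edges[1:] - 1)   # centre bin of each band
    band_rms = np.array([np.sqrt(np.mean(mag[lo:hi] ** 2))
                         for lo, hi in zip(edges[:-1], edges[1:])])
    envelope = np.interp(np.arange(len(mag)), centers, band_rms)
    return envelope, band_rms

rng = np.random.default_rng(0)
envelope, band_rms = smooth_envelope(rng.standard_normal(1024))
print(envelope.shape, band_rms.shape)
```

Only the `num_bands` values per frame need to be stored, which is what makes the residual representation compact.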


Residual synthesis

[Figure: residual synthesis. Panels: spectral magnitude approximation of residual; random spectral phase; synthesized sound; synthesized sound with window.]
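The four panels correspond to the following steps, sketched here with NumPy under the assumption that the magnitude approximation is given on the rfft bin grid. `synth_residual_frame` is a hypothetical name.

```python
import numpy as np

def synth_residual_frame(envelope, rng, window=None):
    """One frame of noise whose magnitude spectrum follows `envelope`."""
    # Random spectral phase, uniform in [-pi, pi).
    phase = rng.uniform(-np.pi, np.pi, size=len(envelope))
    spectrum = envelope * np.exp(1j * phase)
    # Inverse real FFT; the frame length is 2 * (len(envelope) - 1).
    frame = np.fft.irfft(spectrum)
    if window is not None:
        frame = frame * window   # synthesis window, ready for overlap-add
    return frame

rng = np.random.default_rng(0)
envelope = np.linspace(1.0, 0.1, 513)          # made-up low-pass shaped envelope
frame = synth_residual_frame(envelope, rng, window=np.hanning(1024))
print(frame.shape)
```

Successive windowed frames, each with fresh random phases, would then be overlap-added at the hop size to form the residual signal.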


Synthesis: Putting it all together

[Figure: SMS synthesis block diagram. Inputs: sine frequencies, sine magnitudes, sine phases, residual spectral data. Blocks: spectral sine generation, spectral residual generation, polar to rectangular conversion (one per branch, from magnitude and phase spectrum), IFFT of the complex spectrum, window generation (synthesis window). Output: output sound.]


Sin + noise - example


Sinusoids + Noise Decomposition

• Removing sines reveals noise & transients

• Different representation approaches...

[Figure: spectrograms (Frequency 0-4000 Hz vs. Time 0-2 s): Guitar - original; Guitar - sinusoid reconstruction; Guitar - residual (original - sines).]


Examples

Orig, Sin, Residual, SMS and Transformed versions for flute, clarinet, guitar, drums and water:

flute
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.det.sms.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.stoc.sms.
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.sms.au

clarinet
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar.wav
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dr2.wav
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dre.wav
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dx2.wav

guitar
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_orig.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_det.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_stoch.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_synth.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_transf3.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_transf10.au

drums
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_synt.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_det.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_synt.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drum_transf2.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drum_transf10.au

water
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/water.au
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/water.sms.au

Applications

Loads of applications

Similar to other TF analysis-synthesis techniques
but parameters more convenient for some transformations

can treat spectral shape independent from harmonics
can treat different tracks independently

Filtering with arbitrary time resolution

[Figure: plot with frequency axis f in kHz.]

Timescale modification


Sinusoidal Modification

• Sinusoidal description very easy to modify

  e.g. changing time base of sample points

• Frequency stretch

  preserve formant envelope?

[Figure: two spectra (level / dB vs. freq / Hz, 0-4000 Hz) and a track plot (freq / Hz, 0-5000 Hz, vs. time / s, 0-0.5 s).]
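Changing the time base of the sample points amounts to resampling each track's per-frame parameters before synthesis. A minimal sketch, assuming NumPy and linear interpolation; `stretch_track` and `rate` are illustrative names.

```python
import numpy as np

def stretch_track(amps, freqs_hz, rate):
    """Resample per-frame parameters; rate=2.0 doubles the duration."""
    amps = np.asarray(amps, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    old_t = np.arange(len(amps))
    num_new = int(round((len(amps) - 1) * rate)) + 1
    new_t = np.linspace(0, len(amps) - 1, num_new)
    # Frequencies are interpolated, not scaled, so the pitch is unchanged.
    return np.interp(new_t, old_t, amps), np.interp(new_t, old_t, freqs_hz)

amps, freqs = stretch_track([0.0, 1.0, 0.0], [440.0, 450.0, 460.0], rate=2.0)
print(amps)   # [0.  0.5 1.  0.5 0. ]
print(freqs)  # [440. 445. 450. 455. 460.]
```

Synthesizing the resampled track with the original hop size gives a sound that lasts `rate` times longer at the same pitch.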


Applications: Frequency transformations

Partial-dependent frequency scaling

e.g. pseudo inharmonicities of higher partials in piano sound

Frequency stretching: $\hat{\omega}_k[n] = p \, \omega_k[n]$

e.g. pitch-shift without preserving timbre

Spectral shape shift

Quantize pitch (autotune)
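Three of these transformations act directly on the track frequencies and can be sketched with NumPy. All function names are hypothetical. The stiff-string formula f_k = k f0 sqrt(1 + B k^2) is one standard model of piano inharmonicity and is an assumption here, as are the equal-tempered grid and the 440 Hz reference used for quantization.

```python
import numpy as np

def stretch_frequencies(freqs_hz, p):
    """Frequency stretching: scale every track frequency by the factor p."""
    return p * np.asarray(freqs_hz, dtype=float)

def inharmonic_partials(f0, num_partials, B=1e-3):
    """Partial-dependent scaling: higher partials are raised progressively."""
    k = np.arange(1, num_partials + 1)
    return k * f0 * np.sqrt(1.0 + B * k ** 2)

def quantize_pitch(f0_hz, ref_hz=440.0):
    """Snap a pitch to the nearest equal-tempered semitone."""
    semitones = 12 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)
    return ref_hz * 2.0 ** (np.round(semitones) / 12)

print(stretch_frequencies([440.0, 880.0], 1.5))   # [ 660. 1320.]
print(inharmonic_partials(100.0, 5))
print(quantize_pitch(450.0))                      # 440.0
```

For a harmonic sound, the quantized pitch would be applied by scaling all partials of a frame by `quantize_pitch(f0) / f0`. Scaling frequencies while leaving the amplitudes untouched moves the spectral envelope along with the partials, which is why plain stretching does not preserve timbre.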

Still more applications

Effects we’ve seen before:

Vibrato: modulate $\omega_k[n]$ with LFO
Tremolo: modulate $a_k[n]$ with LFO

Hoarseness: boost residual relative to sinusoidal components

Morphing between sounds

Gender change: shift pitch and formants separately

Singing voice synthesis/conversion (Vocaloid)

Separate transients from steady-state harmonics

. . .
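Vibrato and tremolo from the list above can be sketched as LFO modulation of the per-frame track parameters, assuming NumPy. The sinusoidal LFO shape, the depth conventions (cents for vibrato, fractional gain reduction for tremolo) and all names are assumptions made for this example.

```python
import numpy as np

def lfo(num_frames, frame_rate, rate_hz):
    """Sinusoidal low-frequency oscillator sampled once per frame."""
    return np.sin(2 * np.pi * rate_hz * np.arange(num_frames) / frame_rate)

def vibrato(freqs_hz, frame_rate, rate_hz=5.0, depth_cents=50.0):
    """Modulate track frequencies by +/- depth_cents around their values."""
    m = lfo(len(freqs_hz), frame_rate, rate_hz)
    return np.asarray(freqs_hz, dtype=float) * 2.0 ** (depth_cents * m / 1200)

def tremolo(amps, frame_rate, rate_hz=5.0, depth=0.3):
    """Modulate track amplitudes between (1 - depth) and 1 times their values."""
    m = lfo(len(amps), frame_rate, rate_hz)
    return np.asarray(amps, dtype=float) * (1.0 - depth * 0.5 * (1.0 + m))

frame_rate = 100.0                      # frames per second, i.e. fs / hop
f_vib = vibrato(np.full(100, 440.0), frame_rate)
a_trem = tremolo(np.ones(100), frame_rate)
print(f_vib.min(), f_vib.max(), a_trem.min(), a_trem.max())
```

Applying the same LFO to every track of a note keeps the partials moving together; hoarseness, by contrast, needs no track edits at all, only a gain on the residual before it is added back.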


http://www.vocaloid.com/en/index.html
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/LEON_Demo_Check_It_Out.mp3
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/kimi_no_uwasa_128.mp3
http://www.ee.columbia.edu/~ronw/e6820/ylt.wav
http://www.ee.columbia.edu/~ronw/e6820/yltsin.wav
http://www.ee.columbia.edu/~ronw/e6820/ylt-transients-512.wav

Tools: SPEAR

From Michael Klingbeil: http://www.klingbeil.com/spear/



Bringing it all together

PVOC: $X[k, n] = A[k, n] \, e^{j\omega[k, n]}$

Magnitude and phase of fixed oscillators

Source-filter: $X[k, n] = E[k, n] \, H[k, n]$

Filter captures spectral shape (smoothed $A[k, n]$)
Source is whatever is left, typically a pulse train corresponding to the harmonic series

Sinusoids: $X[k, n] = A_k[n] \cos(\omega_k[n]) + E[k, n]$

Organize input into oscillators with time-varying frequency (harmonic tracks)
Sinusoid magnitudes $A_k[n]$ approximate spectral shape
Frequencies $\omega_k[n]$ encode source information


Reading

DAFX Chapter 10 - Spectral Processing

