Jan 27, 2021

E85.2607: Lecture 9 – Sinusoidal Modeling

E85.2607: Lecture 9 – Sinusoidal Modeling 2010-04-08 1 / 23

Sinusoidal Modeling/Spectral Modeling Synthesis (SMS)

[Figure: SMS transformation pipeline. Stages: Spectral Analysis, Feature Extraction, Transform., Feature Addition, Spectral Synthesis. Panels: Original Sound and Transformed Sound (vs. Time), Original Spectrum and Transformed Spectrum (vs. Frequency), Original Feature and Transformed Feature (Feature vs. Time).]

Similar to phase vocoder, but assumes that signal is sum of sinusoids with smoothly varying parameters:

$$x[n] \approx \tilde{x}[n] = \sum_k a_k[n] \cos(\omega_k[n])$$

Freq. domain representation analogous to Fourier series

Flexible representation for transformations...

E4896 Music Signal Processing (Dan Ellis) 2010-02-15 - /15

1. Sinusoidal Modeling

• Periodic sounds → ridges in spectrogram
    each ridge is a sinusoidal harmonic
    .. with smoothly-varying parameters
    .. an efficient & flexible description?

[Figure: spectrogram of Violin.arco.ff.A4]


SMS overview

[Figure: SMS analysis block diagram. Blocks: window generation, smoothing window, FFT (magnitude spectrum, phase spectrum), peak detection (peak data), pitch estimation (pitch frequency), peak continuation (sine frequency, sine magnitudes, sine phases), additive synthesis (sinusoidal component), amplitude correction, second FFT of the residual component, residual modeling (residual spectral data).]

Analysis

Extract sinusoidal tracks from STFT

Residual contains portion of signal that is not well modeled by sinusoids

Synthesis

Each track controls an oscillator


Analysis overview


Sinusoidal Peak Picking

• Local maxima in DFT frames
• Quadratic fit for sub-bin resolution

[Figure: spectrogram (freq / Hz vs. time / s); magnitude spectrum of one frame (level / dB vs. freq / Hz); close-ups around a peak between 400 and 800 Hz (level / dB and phase / rad vs. freq / Hz).]

1 Break signal into frames à la STFT
    Be aware of time-frequency tradeoff

2 Pick peaks in each frame
    Often expect to find peaks in a harmonic series due to a common pitch
    Search for common factor

3 Organize spectral peaks into time-varying sinusoidal tracks
    Expect sinusoids to drift in amplitude and frequency


Review: Time-frequency tradeoff

$$X[k, m] = \sum_{n=0}^{N-1} x[n + mL]\, w[n]\, e^{-j 2\pi k n / N}$$

DFT length N

Window determines freq. resolution

Must be long enough to resolve harmonics
    ∼ 2× longest pitch period (∼ 50–100 ms)
    Too long → blurred a_k[n]

Hop size L

typically N/2 or N/4

small hop → simpler interpolation over time
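As a minimal sketch of the framing step above, the STFT can be computed frame by frame with NumPy. The function name `stft`, the default sizes, and the 8 kHz test signal are assumptions made for illustration, not values from the slides; only the non-negative frequency bins are kept because the input is real.

```python
import numpy as np

def stft(x, N=1024, L=256, window=None):
    """X[k, m] = sum_n x[n + m*L] * w[n] * exp(-j*2*pi*k*n/N), for k = 0..N/2."""
    x = np.asarray(x, dtype=float)
    w = np.hamming(N) if window is None else np.asarray(window, dtype=float)
    if len(x) < N:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(x) - N) // L          # only complete frames are used
    X = np.empty((N // 2 + 1, num_frames), dtype=complex)
    for m in range(num_frames):
        frame = x[m * L : m * L + N] * w        # window the m-th frame
        X[:, m] = np.fft.rfft(frame)            # DFT of length N, real input
    return X

# Assumed test signal: one second of a 2,000 Hz sine sampled at 8 kHz.
fs = 8000
n = np.arange(fs)
x = np.cos(2 * np.pi * 2000 * n / fs)
X = stft(x, N=1024, L=256)                      # hop L = N/4
peak_bin = int(np.argmax(np.abs(X[:, 0])))
peak_hz = peak_bin * fs / 1024
```

With these assumed sizes the bin spacing is fs/N ≈ 7.8 Hz and the frame advance is L/fs = 32 ms; increasing N narrows the bins but averages a_k[n] over a longer stretch of time.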

[Figure: time-frequency tilings contrasting good freq. resolution / bad time resolution with bad freq. resolution / good time resolution; FFT example: sine wave, 2,000 Hz, Hamming window, and the resulting sine wave spectrum.]


Peak picking

Peak detection
• Accuracy on the detection of peak is limited to half a sample.
• We cannot rely solely on zero-padding since it becomes very expensive.
• A solution is to combine zero-padding with quadratic interpolation.

Find local maxima in each DFT frame

Accuracy of peak detection is limited to half a frequency sample (half a DFT bin)

Can zero pad, but can be expensive

Alternative: Frequency resolution smaller than one bin using quadratic interpolation
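The quadratic interpolation mentioned above can be sketched as follows: fit a parabola through the dB magnitudes of each local maximum and its two neighbours, and take the vertex as the refined peak. The function name `pick_peaks` and the test signal (440 Hz, Hann window, 8 kHz sampling rate) are assumptions chosen for illustration.

```python
import numpy as np

def pick_peaks(mag_db):
    """Return (fractional bin, interpolated level in dB) for each local maximum."""
    mag_db = np.asarray(mag_db, dtype=float)
    mid = mag_db[1:-1]
    # Interior bins that exceed the left neighbour and are not below the right one.
    k = np.flatnonzero((mid > mag_db[:-2]) & (mid >= mag_db[2:])) + 1
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    # Vertex of the parabola through (-1, a), (0, b), (1, c); |offset| <= 0.5.
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    level = b - 0.25 * (a - c) * offset
    return k + offset, level

# Assumed example: a 440 Hz sinusoid that falls between two DFT bins.
fs, N, f0 = 8000, 1024, 440.0
n = np.arange(N)
spectrum = np.fft.rfft(np.cos(2 * np.pi * f0 * n / fs) * np.hanning(N))
bins, levels = pick_peaks(20 * np.log10(np.abs(spectrum) + 1e-12))
est_hz = bins[np.argmax(levels)] * fs / N   # frequency of the strongest peak
```

Here the bin spacing is about 7.8 Hz, so the nearest bin alone would be off by roughly 2.5 Hz; the parabolic fit recovers a much closer estimate without any zero-padding.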


Peak picking 2


Peak Selection

• Don't want every peak, just "true" sinusoids
    threshold?
    local shape - fits δ(ω − ω0) ∗ W(e^{jω})
• Look for stability of frequency & amplitude in successive time frames
    phase derivative in time/freq

[Figure: magnitude spectrum, level / dB vs. freq / Hz]

Not all peaks correspond to a stable sinusoid

Only retain peaks larger than some threshold

Only retain peaks consistent with harmonic series of underlying pitch

    need pitch tracker, input must be monophonic
    pitch-synchronous analysis?

Look for stable parameters in adjacent frames

e.g. stable phase derivative in time and frequency
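Two of the selection rules above (a level threshold and consistency with a harmonic series) can be sketched in a few lines. The function `select_peaks`, the relative threshold `floor_db`, and the relative tolerance `tol` are hypothetical names and values for illustration; the pitch `f0_hz` is assumed to come from a separate pitch tracker.

```python
import numpy as np

def select_peaks(freqs_hz, levels_db, floor_db=-40.0, f0_hz=None, tol=0.03):
    """Keep peaks above a relative level floor and, optionally, near a harmonic of f0."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    levels_db = np.asarray(levels_db, dtype=float)
    if freqs_hz.size == 0:
        return freqs_hz, levels_db
    # Threshold relative to the strongest peak in the frame.
    keep = levels_db >= levels_db.max() + floor_db
    if f0_hz is not None:
        # Nearest harmonic number (at least 1) and its ideal frequency.
        harmonic = np.maximum(np.round(freqs_hz / f0_hz), 1.0)
        target = harmonic * f0_hz
        keep &= np.abs(freqs_hz - target) <= tol * target
    return freqs_hz[keep], levels_db[keep]

# Assumed peak list for one frame of a 220 Hz tone.
freqs, levels = select_peaks(
    [220, 441, 500, 660, 3000], [0, -6, -10, -12, -70], f0_hz=220.0
)
```

In this example the 3000 Hz peak is rejected by the threshold and the 500 Hz peak by the harmonic test; stability across adjacent frames is not checked here.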


Peak continuation

Connect peaks in adjacent frames to track sinusoid trajectory

Lots of different approaches

e.g. McAulay-Quatieri approach: Greedily attach each peak in current frame to closest peak in next frame

Ambiguous if large frequency changes
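A simplified sketch of greedy nearest-peak matching between two frames is given below. It is not the full McAulay-Quatieri procedure: it simply visits candidate pairs from the smallest to the largest frequency difference and links each peak at most once. The function name `greedy_match` and the limit `max_dev_hz` are assumptions for illustration.

```python
import numpy as np

def greedy_match(prev_freqs, next_freqs, max_dev_hz=50.0):
    """Return sorted (i, j) pairs linking peak i of one frame to peak j of the next."""
    prev_freqs = np.asarray(prev_freqs, dtype=float)
    next_freqs = np.asarray(next_freqs, dtype=float)
    if prev_freqs.size == 0 or next_freqs.size == 0:
        return []
    # dist[i, j] = |frequency of previous peak i - frequency of next peak j|
    dist = np.abs(prev_freqs[:, None] - next_freqs[None, :])
    pairs, used_prev, used_next = [], set(), set()
    for flat in np.argsort(dist, axis=None):        # closest pairs first
        i, j = (int(v) for v in np.unravel_index(flat, dist.shape))
        if dist[i, j] > max_dev_hz:
            break                                   # all remaining pairs are too far apart
        if i in used_prev or j in used_next:
            continue                                # each peak is linked at most once
        pairs.append((i, j))
        used_prev.add(i)
        used_next.add(j)
    return sorted(pairs)

# Assumed peak frequencies (Hz) in two adjacent frames.
links = greedy_match([440.0, 880.0, 1320.0], [445.0, 900.0, 2000.0])
```

In the example the 1320 Hz peak finds no partner within 50 Hz and the 2000 Hz peak is left unclaimed. When frequencies change by more than the spacing between neighbouring partials, the closest peak may belong to a different partial, which is the ambiguity noted above.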

Peak continuation
• Once peaks have been detected, and possibly a fundamental frequency identified, we want to organise peaks into time-varying trajectories.
• The sinusoidal model assumes peaks as part of a frequency trajectory.
• Peak continuation assigns peaks to a given track.


Track formation heuristics


Track Formation

• Connect peaks in adjacent frames to form sinusoids
    can be ambiguous if large frequency changes
• Unclaimed peak → create new track
• No continuation of track → termination
    hysteresis

[Figure: track formation, freq vs. time, showing existing tracks, new peaks, birth, and death.]

Unclaimed peak → create new track
No continuation of track → termination
Lots of other potential rules:
    min track length to avoid spurious peaks
    max allowable frequency deviation between adjacent frames
    max silent gap length
    max number of tracks per frame

Tricky to implement . . .
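The birth and death rules above, together with two of the extra heuristics (maximum frequency deviation and minimum track length), can be sketched as a small tracker. The function `form_tracks`, the dictionary layout of a track, and the default limits are assumptions for illustration; silent gaps and a cap on the number of tracks are not handled.

```python
def form_tracks(frames, max_dev_hz=50.0, min_length=3):
    """frames: list of per-frame peak frequencies (Hz).
    Returns tracks as dicts {"start": first frame index, "freqs": [f, ...]}."""
    active, finished = [], []
    for m, peaks in enumerate(frames):
        peaks = list(peaks)
        # Candidate continuations (distance, track index, peak index), nearest first.
        candidates = sorted(
            (abs(track["freqs"][-1] - f), ti, pi)
            for ti, track in enumerate(active)
            for pi, f in enumerate(peaks)
        )
        continued, claimed = set(), set()
        for dist, ti, pi in candidates:
            if dist > max_dev_hz:
                break                               # max allowable frequency deviation
            if ti in continued or pi in claimed:
                continue
            active[ti]["freqs"].append(peaks[pi])
            continued.add(ti)
            claimed.add(pi)
        next_active = []
        for ti, track in enumerate(active):
            # No continuation of track -> termination ("death").
            (next_active if ti in continued else finished).append(track)
        for pi, f in enumerate(peaks):
            if pi not in claimed:
                # Unclaimed peak -> create new track ("birth").
                next_active.append({"start": m, "freqs": [f]})
        active = next_active
    finished.extend(active)
    # Min track length: drop short tracks, which are likely spurious peaks.
    return [t for t in finished if len(t["freqs"]) >= min_length]

# Assumed peak frequencies for five frames.
tracks = form_tracks([[440, 1000], [442, 1010], [445, 3000], [447], [450]])
```

In this example the 1000 Hz partial dies after two frames and the 3000 Hz peak after one, so only the slowly drifting 440 Hz track survives the minimum-length rule.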


Sinusoidal synthesis


3. Sinusoidal Synthesis

• Each sinusoid track {a_k[n], ω_k[n]} drives an oscillator
    can interpolate amplitude, frequency samples
• Faster method synthesizes DFT frames then overlap-add
    trickier to achieve frequency modulation

[Figure: track frequency ω_k[n] (freq / Hz) and amplitude a_k[n] (level) vs. time / s, and the oscillator output a_k[n]·cos(ω_k[n]·t).]

Each sinusoid track {a_k[n], ω_k[n]} drives an oscillator
    Interpolate parameters to avoid clicks at frame boundaries

Faster method: synthesize DFT frames, then overlap-add

[Figure: additive synthesis oscillator bank. Oscillators 1 to N, each with Amp and Freq inputs driven by Amp_k(t) and Freq_k(t), with the oscillator outputs summed.]
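An oscillator bank driven by sinusoid tracks can be sketched as below. Two choices here are assumptions rather than something the slides specify: amplitude and frequency are interpolated linearly between frames, and the instantaneous frequency is accumulated into a running phase (starting at zero) so that frequency changes do not cause phase jumps. The names `synth_track` and `synth` are hypothetical, and all tracks are assumed to span the same frames.

```python
import numpy as np

def synth_track(amps, freqs_hz, hop, fs):
    """Synthesize one track from frame-rate amplitude and frequency samples."""
    amps = np.asarray(amps, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    frame_pos = np.arange(len(amps)) * hop          # sample index of each frame
    n = np.arange(frame_pos[-1] + 1)
    a = np.interp(n, frame_pos, amps)               # linear amplitude interpolation
    f = np.interp(n, frame_pos, freqs_hz)           # linear frequency interpolation
    phase = 2 * np.pi * np.cumsum(f) / fs           # accumulate frequency into phase
    phase -= phase[0]                               # assumed initial phase of zero
    return a * np.cos(phase)

def synth(tracks, hop, fs):
    """Sum the oscillator outputs; tracks is a list of (amps, freqs_hz) pairs."""
    return sum(synth_track(a, f, hop, fs) for a, f in tracks)

# Assumed example: a steady 440 Hz partial described by five frames.
y = synth([([1.0] * 5, [440.0] * 5)], hop=256, fs=8000)
```

Because the phase starts at zero instead of matching the measured sine phases, the output is not sample-aligned with the original sound; phase matching is needed if the result is to be subtracted from the original.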


Sinusoid + noise model

Sinusoids are not a good fit for all types of signals (e.g. noise)

Sometimes want to retain residual in addition to sinusoids


4. Noise Residual

• Some energy is not well fit with sinusoids
    e.g. noisy energy
• Can just keep it as residual
    or model it some other way
• Leads to "sinusoidal + noise" model

$$x[n] = \sum_k a_k[n] \cos(\omega_k[n]\, n) + e[n]$$

[Figure: magnitude spectra (mag / dB vs. freq / Hz) with curves labelled original, sinusoids, residual, and LPC.]

Model residual as white noise passed through time-varying filter
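One way to realize white noise passed through a time-varying filter is to filter a fresh noise frame with a per-frame magnitude response and overlap-add the results. This is a sketch under assumptions: the envelopes are given directly as magnitude responses on the DFT bins (the slides' figure uses an LPC fit instead), and the function name `synth_noise`, the Hann window, and the low-pass example are chosen for illustration.

```python
import numpy as np

def synth_noise(envelopes, hop, rng=None):
    """envelopes: array (num_frames, N//2 + 1) of per-frame magnitude responses."""
    rng = np.random.default_rng() if rng is None else rng
    envelopes = np.asarray(envelopes, dtype=float)
    num_frames, num_bins = envelopes.shape
    N = 2 * (num_bins - 1)                          # frame length implied by the bin count
    window = np.hanning(N)
    out = np.zeros((num_frames - 1) * hop + N)
    for m in range(num_frames):
        white = rng.standard_normal(N)              # white noise frame
        # Filtering = multiplying the noise spectrum by this frame's response.
        shaped = np.fft.irfft(np.fft.rfft(white) * envelopes[m], n=N)
        out[m * hop : m * hop + N] += window * shaped   # overlap-add
    return out

# Assumed example: 20 frames of low-pass noise (cutoff at bin 64 of 257).
env = np.zeros((20, 257))
env[:, :64] = 1.0
noise = synth_noise(env, hop=128, rng=np.random.default_rng(0))
```

Letting the rows of `envelopes` change from frame to frame is what makes the filter time-varying; only the spectral shape of the residual is kept, not its waveform.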


Sinusoidal subtraction

original sound x(n)

synthesized sound with phase matching s(n)

residual sound $e(n) = w(n)\,(x(n) - s(n)), \quad n = 0, 1, \ldots, N - 1$
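The subtraction above is a one-line operation per frame; the sketch below also illustrates why the synthesized sound needs matching phase. The function name `residual_frame`, the Hann window, and the test signal (a 440 Hz sinusoid plus low-level noise at an assumed 8 kHz sampling rate) are assumptions for illustration.

```python
import numpy as np

def residual_frame(x, s, window=None):
    """e(n) = w(n) * (x(n) - s(n)) for one frame, n = 0, ..., N - 1."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    if x.shape != s.shape:
        raise ValueError("x and s must have the same length")
    w = np.hanning(len(x)) if window is None else np.asarray(window, dtype=float)
    return w * (x - s)

# Assumed frame: a sinusoid with phase 0.3 rad plus low-level noise.
fs, N = 8000, 512
n = np.arange(N)
hiss = 0.01 * np.random.default_rng(0).standard_normal(N)
s_matched = np.cos(2 * np.pi * 440 * n / fs + 0.3)
x = s_matched + hiss
e = residual_frame(x, s_matched)                     # phase-matched synthesis
e_bad = residual_frame(x, np.cos(2 * np.pi * 440 * n / fs))   # phase ignored
```

With matched phase the residual is just the windowed noise; with the phase ignored, most of the sinusoid leaks into the residual.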
