Jan 27, 2021

  • E85.2607: Lecture 9 – Sinusoidal Modeling

    E85.2607: Lecture 9 – Sinusoidal Modeling 2010-04-08 1 / 23

  • Sinusoidal Modeling/Spectral Modeling Synthesis (SMS)

    [Figure: processing chain from Original Sound to Transformed Sound. Blocks: Spectral Analysis, Feature Extraction, Transformation, Feature Addition, Spectral Synthesis. Panels: Original Spectrum and Transformed Spectrum (vs. Frequency), Original Feature and Transformed Feature (vs. Time).]

    Similar to phase vocoder, but assumes that signal is sum of sinusoids with smoothly varying parameters:

    $x[n] \approx \tilde{x}[n] = \sum_k a_k[n] \cos(\omega_k[n]\, n)$

    Freq. domain representation analogous to Fourier series

    Flexible representation for transformations...

    E4896 Music Signal Processing (Dan Ellis) 2010-02-15 - /15

    1. Sinusoidal Modeling

    • Periodic sounds → ridges in spectrogram

      each ridge is a sinusoidal harmonic

      .. with smoothly-varying parameters

      .. an efficient & flexible description?

    [Figure: spectrogram of Violin.arco.ff.A4]

  • SMS overview

    [Figure: SMS analysis block diagram. Input: sound. Blocks: window generation, smoothing window, FFT (magnitude and phase spectrum), peak detection, pitch estimation, peak continuation, additive synthesis, amplitude correction, residual modeling. Outputs: sine frequencies, sine magnitudes, sine phases (sinusoidal component) and residual spectral data (residual component).]

    Analysis

    Extract sinusoidal tracks from STFT

    Residual contains portion of signal that is not well modeled by sinusoids

    Synthesis

    Each track controls an oscillator

  • Analysis overview

    Sinusoidal Peak Picking

    • Local maxima in DFT frames

    • Quadratic fit for sub-bin resolution

    [Figure: spectrogram (freq / Hz vs. time / s), magnitude spectrum of one frame (level / dB vs. freq / Hz), and close-ups of level / dB and phase / rad around a peak between 400 and 800 Hz]

    1. Break signal into frames, à la STFT

       Be aware of time-frequency tradeoff

    2. Pick peaks in each frame

       Often expect to find peaks in harmonic series due to common pitch

       Search for common factor

    3. Organize spectral peaks into time-varying sinusoidal tracks

       Expect sinusoids to drift in amplitude and frequency

  • Review: Time-frequency tradeoff

    $X[k, m] = \sum_{n=0}^{N-1} x[n + mL]\, w[n]\, e^{-j 2\pi k n / N}$

    DFT length N

    Window determines freq. resolution

    Must be long enough to resolve harmonics: ∼ 2× longest pitch period (∼ 50–100 ms)

    Too long → blurred $a_k[n]$

    Hop size L

    typically N/2 or N/4

    small hop → simpler interpolation over time

    [Figure: FFT of a 2,000 Hz sine wave with a Hamming window, showing the sine wave spectrum for two window lengths. Long window: good freq. resolution, bad time resolution. Short window: bad freq. resolution, good time resolution.]
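    The STFT definition above can be sketched directly in code. This is a minimal illustration in plain Python, not the lecture's implementation: the function names `hamming` and `stft` are hypothetical, the Hamming window is just one possible choice of w[n], and a naive O(N^2) DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math

def hamming(N):
    """Hamming window of length N (symmetric form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def stft(x, N, L):
    """Return X[m][k] = sum_n x[n + m*L] * w[n] * exp(-j*2*pi*k*n/N).

    Only frames that fit entirely inside x are computed. A naive DFT is
    used per frame for clarity, not speed.
    """
    w = hamming(N)
    frames = []
    for m in range((len(x) - N) // L + 1):
        seg = [x[n + m * L] * w[n] for n in range(N)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)
        ])
    return frames

# Usage: a sinusoid centred on bin 8 of a 64-point DFT, hop size N/4.
N, L = 64, 16
x = [math.cos(2 * math.pi * 8 * n / N) for n in range(256)]
X = stft(x, N, L)
# Strongest bin among the non-negative frequencies of the first frame.
peak_bin = max(range(N // 2), key=lambda k: abs(X[0][k]))
```

    Increasing N narrows the main lobe around the peak bin (better frequency resolution) at the cost of averaging over a longer stretch of signal, which is the tradeoff this slide describes.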

  • Peak picking

    Peak detection

    • Accuracy of peak detection is limited to half a sample (i.e. half a DFT bin).

    • We cannot rely solely on zero-padding, since it becomes very expensive.

    • A solution is to combine zero-padding with quadratic interpolation.

    Find local maxima in each DFT frame

    Accuracy of peak detection is limited to half a sample

    Can zero pad, but can be expensive

    Alternative: Frequency resolution smaller than one bin using quadratic interpolation
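    As a sketch of the quadratic interpolation step, the snippet below fits a parabola through a local maximum of the dB magnitude spectrum and its two neighbours. The function name `parabolic_peak` is hypothetical, and the input is assumed to be a list of levels in dB with `k` already known to be a local maximum that is not at either end of the list.

```python
def parabolic_peak(mag_db, k):
    """Refine a local maximum at bin k of a dB magnitude spectrum.

    Fits a parabola through bins k-1, k, k+1 and returns
    (fractional_bin, interpolated_level_db).
    """
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    denom = a - 2 * b + c
    if denom == 0:             # three equal values: no refinement possible
        return float(k), b
    p = 0.5 * (a - c) / denom  # offset in bins; within [-0.5, 0.5] at a local max
    return k + p, b - 0.25 * (a - c) * p

# Usage: samples of a parabola whose true peak lies at bin 5.3, level 10 dB.
spectrum_db = [10.0 - (k - 5.3) ** 2 for k in range(10)]
peak_pos, peak_level = parabolic_peak(spectrum_db, 5)
```

    The fractional bin converts to Hz as `peak_pos * fs / N` for sample rate `fs` and DFT length `N`. A real window's main lobe is only approximately parabolic in dB, so in practice the estimate is approximate.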

  • Peak picking 2

    Peak Selection

    • Don’t want every peak, just “true” sinusoids

      threshold?

      local shape: fits $\delta(\omega - \omega_0) * W(e^{j\omega})$

    • Look for stability of frequency & amplitude in successive time frames

      phase derivative in time/freq

    [Figure: magnitude spectrum, level / dB vs. freq / Hz]

    Not all peaks correspond to a stable sinusoid

    Only retain peaks larger than some threshold

    Only retain peaks consistent with harmonic series of underlying pitch

    need pitch tracker, input must be monophonic

    pitch-synchronous analysis?

    Look for stable parameters in adjacent frames

    e.g. stable phase derivative in time and frequency
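    The first two selection rules (a level threshold and consistency with a harmonic series) might look like the sketch below. The function name `select_peaks`, the `(freq_hz, level_db)` peak format, and the relative tolerance `tol` are assumptions made for illustration. The pitch estimate `f0` is taken as given, since pitch tracking itself is outside this sketch.

```python
def select_peaks(peaks, min_level_db, f0=None, tol=0.03):
    """Keep (freq_hz, level_db) peaks that are above a threshold and, if a
    pitch estimate f0 is given, within a relative tolerance of a harmonic."""
    kept = []
    for freq, level in peaks:
        if level < min_level_db:
            continue                               # too quiet: likely spurious
        if f0 is not None:
            h = max(1, round(freq / f0))           # nearest harmonic number
            if abs(freq - h * f0) > tol * h * f0:
                continue                           # not close to any harmonic
        kept.append((freq, level))
    return kept

# Usage: 1000 Hz is not a harmonic of 440 Hz, and 1320 Hz is too quiet.
peaks = [(440.0, -10.0), (882.0, -20.0), (1000.0, -15.0), (1320.0, -70.0)]
harmonic_peaks = select_peaks(peaks, min_level_db=-60.0, f0=440.0)
loud_peaks = select_peaks(peaks, min_level_db=-60.0)
```

    The third rule, stability across adjacent frames, needs more than one frame of data and is better handled during peak continuation.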

  • Peak continuation

    Connect peaks in adjacent frames to track sinusoid trajectory

    Lots of different approaches

    e.g. McAulay-Quatieri approach: Greedily attach peak in current frame to closest peak in next frame

    Ambiguous if large frequency changes

    Peak continuation

    • Once peaks have been detected, and possibly a fundamental frequency identified, we want to organise peaks into time-varying trajectories.

    • The sinusoidal model assumes peaks as part of a frequency trajectory.

    • Peak continuation assigns peaks to a given track.
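    A simplified sketch of greedy frame-to-frame matching is shown below. It is in the spirit of the McAulay-Quatieri approach but is not a faithful reproduction of it: candidate pairs are simply taken in order of increasing frequency distance. The function name `match_peaks` and the `max_dev_hz` limit are assumptions for illustration.

```python
def match_peaks(prev_freqs, next_freqs, max_dev_hz):
    """Greedily pair each frequency in prev_freqs with the closest unclaimed
    frequency in next_freqs. Returns a sorted list of (i_prev, i_next)."""
    # All candidate pairs within the allowed deviation, closest first.
    candidates = sorted(
        (abs(fp - fn), i, j)
        for i, fp in enumerate(prev_freqs)
        for j, fn in enumerate(next_freqs)
        if abs(fp - fn) <= max_dev_hz
    )
    used_prev, used_next, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_prev and j not in used_next:
            used_prev.add(i)
            used_next.add(j)
            pairs.append((i, j))
    return sorted(pairs)

# Usage: 880 Hz has no continuation, and 2000 Hz is a new, unclaimed peak.
pairs = match_peaks([440.0, 880.0, 1320.0], [445.0, 1300.0, 2000.0], max_dev_hz=50.0)
```

    When frequencies change by more than the typical spacing between partials from one frame to the next, the closest peak is no longer reliably the right one, which is the ambiguity noted above.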

  • Track formation heuristics

    Track Formation

    • Connect peaks in adjacent frames to form sinusoids

      can be ambiguous if large frequency changes

    • Unclaimed peak → create new track

    • No continuation of track → termination

      hysteresis

    [Figure: existing tracks and new peaks in the time-frequency plane, with track birth and death marked]

    Unclaimed peak → create new track

    No continuation of track → termination

    Lots of other potential rules:

      min track length to avoid spurious peaks

      max allowable frequency deviation between adjacent frames

      max silent gap length

      max number of tracks per frame

    Tricky to implement . . .
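    To make the birth and death rules concrete, here is a minimal sketch that applies three of the heuristics: birth from unclaimed peaks, death when no continuation is found, and a minimum track length. The name `form_tracks` and its default thresholds are assumptions, and it does not handle silent gaps or a cap on the number of tracks. Each track claims its closest peak in track order, so the result is not globally optimal.

```python
def form_tracks(frames, max_dev_hz=30.0, min_length=3):
    """frames: one list of peak frequencies (Hz) per analysis frame.
    Returns tracks as lists of (frame_index, freq_hz)."""
    active, finished = [], []
    for m, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active = []
        for track in active:
            last = track[-1][1]
            # Closest unclaimed peak, if any, to the track's last frequency.
            best = min(unclaimed, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) <= max_dev_hz:
                unclaimed.remove(best)
                track.append((m, best))
                still_active.append(track)
            else:
                finished.append(track)        # no continuation: death
        # Every peak left unclaimed starts a new track: birth.
        still_active.extend([(m, f)] for f in unclaimed)
        active = still_active
    finished.extend(active)
    # Drop short tracks, which are likely spurious peaks.
    return [t for t in finished if len(t) >= min_length]

# Usage: the lone 1000 Hz peak in frame 0 is discarded as too short.
frames = [[440.0, 1000.0], [442.0], [445.0, 900.0], [447.0, 905.0], [910.0]]
tracks = form_tracks(frames)
```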

  • Sinusoidal synthesis

    3. Sinusoidal Synthesis

    • Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator

      can interpolate amplitude, frequency samples

    • Faster method synthesizes DFT frames, then overlap-add

      trickier to achieve frequency modulation

    [Figure: one track's frequency (freq / Hz) and level vs. time / s, driving an oscillator $a_k[n] \cdot \cos(\omega_k[n] \cdot t)$]

    Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator

    Interpolate parameters to avoid clicks at frame boundaries

    Faster method: synthesize DFT frames, then overlap-add
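    The oscillator-bank method can be sketched as follows. This is an illustration under stated assumptions rather than the lecture's code: the names `synth_track` and `synth` are hypothetical, each track is given as per-frame amplitude and frequency lists, interpolation is linear, and measured phases are ignored in favour of a running phase accumulator that starts at zero.

```python
import math

def synth_track(amps, freqs_hz, hop, fs):
    """Synthesize one track from per-frame amplitude and frequency samples.

    Amplitude and frequency are linearly interpolated between frames, and
    phase is accumulated sample by sample so it stays continuous.
    """
    out, phase = [], 0.0
    for m in range(len(amps) - 1):
        for i in range(hop):
            t = i / hop
            a = (1 - t) * amps[m] + t * amps[m + 1]
            f = (1 - t) * freqs_hz[m] + t * freqs_hz[m + 1]
            out.append(a * math.cos(phase))
            phase += 2 * math.pi * f / fs
    return out

def synth(tracks, hop, fs):
    """Sum the oscillators of all tracks. Each track is (amps, freqs_hz),
    and all tracks are assumed to span the same frames."""
    outs = [synth_track(a, f, hop, fs) for a, f in tracks]
    return [sum(samples) for samples in zip(*outs)]

# Usage: a steady 1 kHz partial at fs = 8 kHz, three frames, hop of 4 samples.
y = synth_track([1.0, 1.0, 1.0], [1000.0, 1000.0, 1000.0], hop=4, fs=8000)
```

    Accumulating phase instead of evaluating cos(ω·n) with a changing ω is what prevents discontinuities at frame boundaries when the frequency drifts.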

    [Figure: additive synthesis. A bank of N oscillators, each controlled by its own Amp_i(t) and Freq_i(t), i = 1, ..., N, with outputs summed.]

  • Sinusoid + noise model

    Sinusoids are not a good fit for all types of signals (e.g. noise)

    Sometimes want to retain residual in addition to sinusoids

    4. Noise Residual

    • Some energy is not well fit with sinusoids

      e.g. noisy energy

    • Can just keep it as residual, or model it some other way

    • Leads to “sinusoidal + noise” model

    $x[n] = \sum_k a_k[n] \cos(\omega_k[n]\, n) + e[n]$

    [Figure: magnitude spectra, mag / dB vs. freq / Hz; legend: original, sinusoids, residual, LPC]

    Model residual as white noise passed through time-varying filter
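    One way to picture "white noise passed through a time-varying filter" is the sketch below, which drives an all-pole filter with Gaussian noise and switches filter coefficients every hop. The function name `noise_residual` and its arguments are hypothetical. The coefficients are assumed to come from some per-frame analysis of the residual (for example LPC), which is not shown here.

```python
import random

def noise_residual(frame_coeffs, frame_gains, hop, seed=0):
    """White noise through an all-pole filter whose coefficients change
    every `hop` samples.

    frame_coeffs[m] = [a1, ..., ap] defines, for frame m,
        y[n] = g * u[n] - sum_i a_i * y[n - i]
    where u[n] is unit-variance white noise and g = frame_gains[m].
    """
    rng = random.Random(seed)
    y = []
    for coeffs, g in zip(frame_coeffs, frame_gains):
        for _ in range(hop):
            acc = g * rng.gauss(0.0, 1.0)        # scaled white excitation
            for i, a in enumerate(coeffs, start=1):
                if len(y) >= i:                  # skip taps before the start
                    acc -= a * y[-i]
            y.append(acc)
    return y

# Usage: two frames of 4 samples; the second frame uses a one-pole lowpass.
e = noise_residual([[], [-0.9]], [1.0, 0.5], hop=4)
```

    Switching coefficients abruptly at frame boundaries can cause audible artifacts; interpolating the filter parameters between frames is a common refinement that this sketch omits.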

  • Sinusoidal subtraction

    original sound x(n)

    synthesized sound with phase matching s(n)

    residual sound $e(n) = w(n) \times (x(n) - s(n)), \quad n = 0, 1, \ldots, N - 1$
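    The windowed subtraction for a single frame can be written out as below. The function name `windowed_residual` and the `start` argument are hypothetical, and the Hann window is an assumption, since the slide does not say which w(n) is used. The synthesized signal s is assumed to be phase-matched to x, as stated above; otherwise the difference does not cancel the sinusoids.

```python
import math

def windowed_residual(x, s, start, N):
    """e(n) = w(n) * (x(n) - s(n)) for n = 0..N-1, using a Hann window w and
    a frame that begins at sample `start` of both signals."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    return [w[n] * (x[start + n] - s[start + n]) for n in range(N)]

# Usage: a constant offset between x and s leaves just the window shape.
x = [1.0] * 8
s = [0.0] * 8
e = windowed_residual(x, s, start=0, N=8)
```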
