Transcript
  • E85.2607: Lecture 9 – Sinusoidal Modeling

    E85.2607: Lecture 9 – Sinusoidal Modeling 2010-04-08 1 / 23

  • Sinusoidal Modeling/Spectral Modeling Synthesis (SMS)

    [Figure: block diagram of spectral analysis, transformation, and synthesis. Blocks: spectral analysis, feature extraction, transformation, feature addition, spectral synthesis. Signals: original sound, original spectrum, original feature, transformed feature, transformed spectrum, transformed sound. Plot axes: time, frequency, feature.]

    Similar to phase vocoder, but assumes that the signal is a sum of sinusoids with smoothly varying parameters:

    $x[n] \approx \tilde{x}[n] = \sum_k a_k[n] \cos(\omega_k[n])$

    Freq. domain representation analogous to Fourier series

    Flexible representation for transformations...

    E4896 Music Signal Processing (Dan Ellis) 2010-02-15 - /15

    1. Sinusoidal Modeling

    Periodic sounds → ridges in spectrogram

    each ridge is a sinusoidal harmonic

    ... with smoothly-varying parameters

    ... an efficient & flexible description?

    [Figure: Violin.arco.ff.A4]


  • SMS overview

    [Figure: SMS analysis block diagram. Blocks: window generation, smoothing window, FFT, peak detection, pitch estimation, peak continuation, additive synthesis, amplitude correction, residual modeling. Signals: sound, magnitude spectrum, phase spectrum, peak data, pitch frequency, sine frequencies, sine magnitudes, sine phases, sinusoidal component, residual component, residual spectral data.]

    Analysis

    Extract sinusoidal tracks from STFT

    Residual contains the portion of the signal that is not well modeled by sinusoids

    Synthesis

    Each track controls an oscillator


  • Analysis overview

    Sinusoidal Peak Picking

    Local maxima in DFT frames

    Quadratic fit for sub-bin resolution

    [Figure: spectrogram (time / s vs. freq / Hz), spectrum of one frame (freq / Hz vs. level / dB), and close-ups of level / dB and phase / rad around a peak between 400 and 800 Hz]

    1 Break signal into frames à la STFT

    Be aware of time-frequency tradeoff

    2 Pick peaks in each frame

    Often expect to find peaks in harmonic series due to common pitch

    Search for common factor

    3 Organize spectral peaks into time-varying sinusoidal tracks

    Expect sinusoids to drift in amplitude and frequency


  • Review: Time-frequency tradeoff

    $X[k, m] = \sum_{n=0}^{N-1} x[n + mL] \, w[n] \, e^{-j 2 \pi k n / N}$

    DFT length N

    Window determines freq. resolution

    Must be long enough to resolve harmonics

    ∼ 2× longest pitch period (∼ 50-100 ms)

    Too long → blurred $a_k[n]$

    Hop size L

    typically N/2 or N/4

    small hop → simpler interpolation over time

    [Figure: a 2,000 Hz sine wave, its Hamming-windowed frame, and the FFT spectrum of the sine wave, shown for two window lengths: good freq. resolution with bad time resolution, and bad freq. resolution with good time resolution]
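The STFT definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture: the function name `stft`, the default sizes, and the choice of a Hamming window for w[n] are assumptions, and the input is assumed to be at least N samples long.

```python
import numpy as np

def stft(x, N=1024, L=256):
    """X[k, m] = sum_n x[n + m*L] * w[n] * exp(-j*2*pi*k*n/N), for k = 0..N/2."""
    x = np.asarray(x, dtype=float)
    w = np.hamming(N)                      # analysis window w[n] (assumed choice)
    n_frames = 1 + (len(x) - N) // L       # only frames that fit entirely in x
    frames = np.stack([x[m * L:m * L + N] * w for m in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)     # shape (N // 2 + 1, n_frames)

# Usage: a 2,000 Hz sine sampled at 8 kHz lands in bin 2000 / 8000 * 512 = 128.
sr = 8000
x = np.cos(2 * np.pi * 2000 * np.arange(2048) / sr)
X = stft(x, N=512, L=128)
```

Increasing N narrows the main lobe around bin 128 but averages over a longer stretch of signal, which is the tradeoff this slide describes.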


  • Peak picking

    Peak detection

    Accuracy of peak detection is limited to half a sample.

    We cannot rely solely on zero-padding since it becomes very expensive.

    A solution is to combine zero-padding with quadratic interpolation.

    Find local maxima in each DFT frame

    Accuracy of peak detection is limited to half a sample (half a DFT bin)

    Can zero pad, but can be expensive

    Alternative: frequency resolution finer than one bin using quadratic interpolation
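As a rough sketch of quadratic interpolation, the code below fits a parabola through each local maximum and its two neighbours in a dB magnitude spectrum and returns the fractional bin and level of the parabola's vertex. The function name `pick_peaks` and the `threshold_db` parameter are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def pick_peaks(mag_db, threshold_db=-60.0):
    """Return (fractional_bin, interpolated_level_db) for each local maximum."""
    mag_db = np.asarray(mag_db, dtype=float)
    peaks = []
    for k in range(1, len(mag_db) - 1):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > a and b >= c and b > threshold_db:
            denom = a - 2.0 * b + c          # strictly negative at a local maximum
            p = 0.5 * (a - c) / denom        # vertex offset in bins, within [-0.5, 0.5]
            peaks.append((k + p, b - 0.25 * (a - c) * p))
    return peaks

# Usage: samples of an exact parabola peaking at bin 10.3 with level 0 dB.
bins = np.arange(21)
peaks = pick_peaks(-(bins - 10.3) ** 2)
```

A fractional bin converts to Hz by multiplying by the sample rate and dividing by the DFT length.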


  • Peak picking 2

    Peak Selection

    Don't want every peak

    just "true" sinusoids

    threshold?

    local shape: fits $\delta(\omega - \omega_0) * W(e^{j\omega})$

    Look for stability

    of frequency & amplitude in successive time frames

    phase derivative in time/freq

    [Figure: spectrum of one frame, freq / Hz (0-7000) vs. level / dB]

    Not all peaks correspond to a stable sinusoid

    Only retain peaks larger than some threshold

    Only retain peaks consistent with harmonic series of underlying pitch

    need pitch tracker, input must be monophonic

    pitch-synchronous analysis?

    Look for stable parameters in adjacent frames

    e.g. stable phase derivative in time and frequency
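Two of the selection rules above, the level threshold and consistency with a harmonic series, might look like the sketch below. It assumes the pitch `f0` is already known from a pitch tracker; the function name, the default threshold, and the tolerance `tol` (expressed as a fraction of `f0`) are hypothetical choices.

```python
import numpy as np

def select_harmonic_peaks(peak_freqs, peak_levels_db, f0, threshold_db=-40.0, tol=0.03):
    """Keep peaks above threshold_db that lie within tol * f0 of a harmonic of f0."""
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    peak_levels_db = np.asarray(peak_levels_db, dtype=float)
    harmonic = np.round(peak_freqs / f0)                     # nearest harmonic number
    deviation = np.abs(peak_freqs - harmonic * f0) / f0      # distance from it, in units of f0
    keep = (peak_levels_db > threshold_db) & (harmonic >= 1) & (deviation <= tol)
    return peak_freqs[keep], harmonic[keep].astype(int)

# Usage: 560 Hz is not near a harmonic of 220 Hz; 880 Hz is too quiet.
f, h = select_harmonic_peaks([220, 441, 560, 660, 880], [-10, -12, -15, -20, -70], f0=220.0)
```

The stability test across adjacent frames is not covered here because it needs more than one frame of peaks.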


  • Peak continuation

    Connect peaks in adjacent frames to track sinusoid trajectory

    Lots of different approaches

    e.g. McAulay-Quatieri approach: greedily attach each peak in the current frame to the closest peak in the next frame

    Ambiguous if large frequency changes

    Peak continuation

    Once peaks have been detected, and possibly a fundamental frequency identified, we want to organise peaks into time-varying trajectories.

    The sinusoidal model assumes peaks as part of a frequency trajectory.

    Peak continuation assigns peaks to a given track.
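The greedy matching idea can be sketched for a single pair of frames as follows. This is a simplified illustration rather than the full McAulay-Quatieri procedure: the function name `match_peaks` and the `max_jump_hz` limit are assumptions, and conflicts are resolved simply by processing peaks in order.

```python
def match_peaks(curr_freqs, next_freqs, max_jump_hz=50.0):
    """Map each current-frame peak index to the closest unclaimed next-frame peak index."""
    claimed = set()
    links = {}
    for i, f in enumerate(curr_freqs):
        best, best_dist = None, max_jump_hz
        for j, g in enumerate(next_freqs):
            dist = abs(g - f)
            if j not in claimed and dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:                 # peaks with no match are left unlinked
            claimed.add(best)
            links[i] = best
    return links

# Usage: 880 Hz has no neighbour within 50 Hz, so it is left without a continuation.
links = match_peaks([440.0, 880.0, 1320.0], [445.0, 1310.0, 2000.0])
```

Because earlier peaks claim their match first, a large frequency change can leave a later peak with the wrong partner, which is the ambiguity the slide warns about.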


  • Track formation heuristics

    Track Formation

    Connect peaks in adjacent frames to form sinusoids

    can be ambiguous if large frequency changes

    hysteresis

    [Figure: existing tracks and new peaks plotted as freq vs. time, with track birth and death marked]

    Unclaimed peak → create new track

    No continuation of track → termination

    Lots of other potential rules:

    min track length to avoid spurious peaks

    max allowable frequency deviation between adjacent frames

    max silent gap length

    max number of tracks per frame

    Tricky to implement . . .
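A small tracker that applies the birth and death rules, plus two of the extra heuristics (maximum frequency deviation and minimum track length), could be sketched like this. All names and default values are illustrative assumptions; the silent-gap and track-count limits are deliberately left out to keep the sketch short.

```python
def form_tracks(frames, max_jump_hz=50.0, min_length=3):
    """frames: one list of peak frequencies (Hz) per frame.
    Returns tracks as lists of (frame_index, frequency) pairs."""
    active, finished = [], []
    for m, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active = []
        for track in active:
            last_freq = track[-1][1]
            candidates = [c for c in unclaimed if abs(c - last_freq) <= max_jump_hz]
            if candidates:
                best = min(candidates, key=lambda c: abs(c - last_freq))
                unclaimed.remove(best)
                track.append((m, best))
                still_active.append(track)
            else:
                finished.append(track)             # no continuation: track dies
        for f in unclaimed:
            still_active.append([(m, f)])          # unclaimed peak: track is born
        active = still_active
    finished.extend(active)
    return [t for t in finished if len(t) >= min_length]   # drop spurious short tracks

# Usage: only the slowly drifting partial near 440 Hz survives the length rule.
tracks = form_tracks([[440.0, 1000.0], [442.0, 3000.0], [444.0], [446.0]])
```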


  • Sinusoidal synthesis

    3. Sinusoidal Synthesis

    Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator

    can interpolate amplitude, frequency samples

    Faster method synthesizes DFT frames then overlap-add

    trickier to achieve frequency modulation

    [Figure: frequency track $\omega_k[n]$ (freq / Hz vs. time / s), amplitude track $a_k[n]$ (level vs. time / s), and the oscillator output $a_k[n] \cdot \cos(\omega_k[n] \cdot t)$]

    Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator

    Interpolate parameters to avoid clicks at frame boundaries

    Faster method: synthesize DFT frames, then overlap-add

    [Figure: bank of N oscillators, each driven by Amp_i(t) and Freq_i(t) for i = 1, ..., N, with outputs summed]
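One way to drive an oscillator from a track is sketched below: the per-frame amplitude and frequency are linearly interpolated to the sample rate, and the phase is obtained by accumulating the instantaneous frequency so that it stays continuous across frame boundaries. The function name and argument layout are assumptions, and linear interpolation is only one of several options (measured phases are ignored here).

```python
import numpy as np

def synthesize_track(amps, freqs_hz, hop, sr):
    """Render one track; amps and freqs_hz hold one value per frame, hop samples apart."""
    amps = np.asarray(amps, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    frame_pos = np.arange(len(amps)) * hop          # sample index of each frame
    n = np.arange(frame_pos[-1])                    # output sample indices
    a = np.interp(n, frame_pos, amps)               # per-sample amplitude
    f = np.interp(n, frame_pos, freqs_hz)           # per-sample frequency in Hz
    phase = 2 * np.pi * np.cumsum(f) / sr           # running integral of frequency
    return a * np.cos(phase)

# Usage: a steady 440 Hz partial at amplitude 0.5, five frames with hop 256.
y = synthesize_track([0.5] * 5, [440.0] * 5, hop=256, sr=8000)
```

The full output would be the sum of such signals over all tracks, padded to a common length.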


  • Sinusoid + noise model

    Sinusoids are not a good fit for all types of signals (e.g. noise)

    Sometimes want to retain residual in addition to sinusoids

    4. Noise Residual

    Some energy is not well fit with sinusoids

    e.g. noisy energy

    Can just keep it as residual or model it some other way

    Leads to "sinusoidal + noise" model

    $x[n] = \sum_k a_k[n] \cos(\omega_k[n] n) + e[n]$

    [Figure: spectra of original, sinusoids, residual, and LPC; freq / Hz (0-7000) vs. mag / dB]

    Model residual as white noise passed through time-varying filter


  • Sinusoidal subtraction

    [Figure: time-domain panels (time (sec) vs. amplitude): original sound x(n); synthesized sound with phase matching s(n); residual sound e(n). Frequency-domain panels (frequency (kHz) vs. magnitude (dB)): a) original spectrum; b) residual spectrum and its approximation]

    Residual sound: $e(n) = w(n) \times (x(n) - s(n)), \quad n = 0, 1, \ldots, N - 1$
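The subtraction e(n) = w(n) × (x(n) − s(n)) is a one-liner once a phase-matched sinusoidal resynthesis s(n) is available. The sketch below uses a synthetic test signal; the function name, the Hann window, and the signal parameters are assumptions made for illustration.

```python
import numpy as np

def residual_frame(x, s, window):
    """e(n) = w(n) * (x(n) - s(n)) for one frame of N samples."""
    x, s, window = (np.asarray(v, dtype=float) for v in (x, s, window))
    return window * (x - s)

# Usage: a sinusoid plus weak noise, with a perfectly phase-matched model of the sinusoid.
rng = np.random.default_rng(0)
N, sr = 512, 8000
n = np.arange(N)
sinusoid = 0.8 * np.cos(2 * np.pi * 440 * n / sr + 0.3)
noise = 0.01 * rng.standard_normal(N)
e = residual_frame(sinusoid + noise, sinusoid, np.hanning(N))
```

If the phase of s(n) does not match x(n), the sinusoid leaks into the residual, which is why the figure specifies synthesis with phase matching.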


  • Modeling the residual

    [Figure: original sound x(n), synthesized sound with phase matching s(n), and residual sound e(n) = w(n) × (x(n) − s(n)), n = 0, 1, ..., N − 1; a) original spectrum; b) residual spectrum and its approximation]

    Many options for modeling residual filter parameters

    Can use LPC to approximate shape of residual

    or simply smooth the magnitude spectrum à la channel vocoder
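As a sketch of the smoothing option, the residual magnitude spectrum can be replaced by its mean within a handful of frequency bands, loosely in the spirit of a channel vocoder. Equal-width bands and the name `band_envelope` are assumptions; LPC would be the alternative named above and is not shown.

```python
import numpy as np

def band_envelope(mag, n_bands=16):
    """Approximate a magnitude spectrum by its mean within n_bands equal-width bands."""
    mag = np.asarray(mag, dtype=float)
    edges = np.linspace(0, len(mag), n_bands + 1).astype(int)   # band boundaries in bins
    env = np.empty_like(mag)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env[lo:hi] = mag[lo:hi].mean()
    return env

# Usage: eight bins collapsed into four bands of two bins each.
env = band_envelope(np.arange(8.0), n_bands=4)
```

Only the per-band values need to be stored, so the residual of each frame reduces to `n_bands` numbers.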


  • Residual synthesis

    [Figure: residual synthesis. Panels: spectral magnitude approximation of residual; random spectral phase; synthesized sound; synthesized sound with window]
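The four panels correspond to a short procedure: take the magnitude approximation, attach random phase, invert the spectrum, and apply a window. A minimal sketch follows, assuming the envelope covers bins 0 to N/2 of a real signal; the function name and the `rng` argument are illustrative.

```python
import numpy as np

def synthesize_residual_frame(env_mag, window, rng=None):
    """Build one noise frame from a magnitude envelope (bins 0..N/2) and random phase."""
    rng = np.random.default_rng() if rng is None else rng
    env_mag = np.asarray(env_mag, dtype=float)
    phase = rng.uniform(-np.pi, np.pi, size=len(env_mag))   # random spectral phase
    spectrum = env_mag * np.exp(1j * phase)                 # polar to rectangular
    frame = np.fft.irfft(spectrum, n=len(window))           # synthesized sound
    return frame * window                                   # synthesized sound with window

# Usage: a flat envelope and a rectangular window give a frame with a flat magnitude spectrum.
y = synthesize_residual_frame(np.ones(257), np.ones(512), np.random.default_rng(0))
```

In practice the window would taper (for example a Hann window) so that successive frames can be overlap-added.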


  • Synthesis: Putting it all together

    [Figure: SMS synthesis block diagram. Blocks: spectral sine generation, spectral residual generation, polar to rectangular conversion (twice), IFFT, window generation. Signals: sine frequencies, sine magnitudes, sine phases, residual spectral data, magnitude spectrum, phase spectrum, complex spectrum, synthesis window, output sound.]


  • Sin + noise - example

    Sinusoids + Noise Decomposition

    Removing sines reveals noise & transients

    Different representation approaches...

    [Figure: three spectrograms (Time, 0-2, vs. Frequency, 0-4000): Guitar - original; Guitar - sinusoid reconstruction; Guitar - residual (original - sines)]


  • Examples

    Audio examples (columns: Orig, Sin, Residual, SMS, Transformed)

    flute

    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.det.sms.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.stoc.sms.
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.sms.au

    clarinet

    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar.wav
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dr2.wav
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dre.wav
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dx2.wav

    guitar

    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_orig.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_det.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_stoch.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_synth.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_transf3.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_transf10.au

    drums

    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_synt.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_det.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_synt.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drum_transf2.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drum_transf10.au

    water

    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/water.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/water.sms.au

  • Applications

    Loads of applications

    Similar to other TF analysis-synthesis techniques

    but parameters more convenient for some transformations

    can treat spectral shape independent from harmonics

    can treat different tracks independently

    Filtering with arbitrary time resolution

    [Figure: plot with horizontal axis f in kHz, 0-20]

    Timescale modification

    Sinusoidal Modification

    Sinusoidal description very easy to modify

    e.g. changing time base of sample points

    Frequency stretch

    preserve formant envelope?

    [Figure: two spectra (freq / Hz vs. level / dB) and sinusoid tracks (time / s vs. freq / Hz)]
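Changing the time base of the sample points amounts to resampling each track's per-frame parameters before resynthesis. A minimal sketch, assuming linear interpolation and a hypothetical `time_stretch_track` helper:

```python
import numpy as np

def time_stretch_track(amps, freqs_hz, factor):
    """Resample per-frame parameters so the track lasts `factor` times as long."""
    amps = np.asarray(amps, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    n_in = len(amps)
    n_out = int(round((n_in - 1) * factor)) + 1
    src = np.linspace(0, n_in - 1, n_out)        # positions on the original time base
    idx = np.arange(n_in)
    return np.interp(src, idx, amps), np.interp(src, idx, freqs_hz)

# Usage: doubling the duration of a three-frame track leaves its pitch untouched.
a2, f2 = time_stretch_track([0.0, 1.0, 0.0], [440.0, 440.0, 440.0], factor=2.0)
```

Because frequencies are carried over unchanged, the sound is slowed down without the pitch drop that plain resampling of the waveform would cause.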


  • Applications: Frequency transformations

    Partial-dependent frequency scaling

    e.g. pseudo inharmonicities of higher partials in piano sound

    Frequency stretching: $\hat{\omega}_k[n] = p \, \omega_k[n]$

    e.g. pitch-shift without preserving timbre

    Spectral shape shift

    Quantize pitch (autotune)
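Two of these transformations are simple operations on the track frequencies. The sketch below scales every frequency by p and snaps a pitch estimate to the nearest equal-tempered semitone; the function names and the A = 440 Hz reference are assumptions for illustration.

```python
import numpy as np

def stretch_frequencies(freqs_hz, p):
    """Scale every track frequency by the same factor p."""
    return p * np.asarray(freqs_hz, dtype=float)

def quantize_pitch(f0_hz, ref_hz=440.0):
    """Snap a pitch to the nearest equal-tempered semitone relative to ref_hz."""
    semitones = np.round(12 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz))
    return ref_hz * 2.0 ** (semitones / 12)

# Usage: 450 Hz is closest to A4 (440 Hz); 455 Hz is closer to the semitone above.
shifted = stretch_frequencies([100.0, 200.0], p=1.5)
snapped = quantize_pitch([450.0, 455.0])
```

To quantize a harmonic sound, each partial would be scaled by the ratio of the snapped pitch to the original pitch.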

  • Still more applications

    Effects we’ve seen before:

    Vibrato: modulate $\omega_k[n]$ with LFO

    Tremolo: modulate $a_k[n]$ with LFO

    Hoarseness: boost residual relative to sinusoidal components

    Morphing between sounds

    Gender change: shift pitch and formants separately

    Singing voice synthesis/conversion (Vocaloid)

    Separate transients from steady-state harmonics

    . . .
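Vibrato and tremolo from the list above reduce to multiplying a track's per-frame values by a low-frequency oscillator. A minimal sketch, with an assumed sinusoidal LFO and illustrative rate and depth:

```python
import numpy as np

def apply_lfo(values, frame_rate, lfo_hz=5.0, depth=0.02):
    """Multiply per-frame track values by 1 + depth * sin(2*pi*lfo_hz*t)."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values)) / frame_rate          # frame times in seconds
    return values * (1.0 + depth * np.sin(2 * np.pi * lfo_hz * t))

# Usage: vibrato of +/- 2 % around 440 Hz at a rate of 5 Hz, with 100 frames per second.
vibrato = apply_lfo(np.full(100, 440.0), frame_rate=100.0)
```

Passing the amplitude values instead of the frequencies gives tremolo with the same function.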

    http://www.vocaloid.com/en/index.html
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/LEON_Demo_Check_It_Out.mp3
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/kimi_no_uwasa_128.mp3
    http://www.ee.columbia.edu/~ronw/e6820/ylt.wav
    http://www.ee.columbia.edu/~ronw/e6820/yltsin.wav
    http://www.ee.columbia.edu/~ronw/e6820/ylt-transients-512.wav

  • Tools: SPEAR

    From Michael Klingbeil: http://www.klingbeil.com/spear/


  • Bringing it all together

    PVOC: $X[k, n] = A[k, n] \, e^{j \omega[k, n]}$

    Magnitude and phase of fixed oscillators

    Source-filter: $X[k, n] = E[k, n] \, H[k, n]$

    Filter captures spectral shape (smoothed $A[k, n]$)

    Source is whatever is left - typically pulse train corresponding to harmonic series

    Sinusoids: $X[k, n] = A_k[n] \cos(\omega_k[n]) + E[k, n]$

    Organize input into oscillators with time-varying frequency (harmonic tracks)

    Sinusoid magnitudes $A_k[n]$ approximate spectral shape

    Frequencies $\omega_k[n]$ encode source information


  • Reading

    DAFX Chapter 10 - Spectral Processing
