Robust Temporal and Spectral Modeling for Query By Melody

Shai Shalev, Hebrew University

Yoram Singer, Hebrew University

Nir Friedman, Hebrew University

Shlomo Dubnov, Ben-Gurion University

Prelude

Problem Setting

Database of real recordings

Query: a melody

Find: performances of the queried melody

Challenge

• Find performances of the queried melody independent of:

– Tempo

– Performing instrument

– Dynamics

– Expression

– Accompaniment

Related Work

• A. Ghias, et al. “Query by humming”

• A. S. Durey and M. A. Clements. “Melody spotting using hidden Markov models”

• C. Raphael. “Automatic segmentation of acoustic musical signals using HMMs”

• B. Doval and X. Rodet. “Fundamental frequency estimation using a new harmonic matching method”

Overview of Solution

• Employ a statistical framework

• Align a melody to a performance using explicit tempo modeling

• Employ a maximum likelihood model for the spectrum of a note given the note’s pitch value

• Find the best alignment of a melody to a performance using dynamic programming

Statistical Framework

• A database of real recordings: $S_1, \ldots, S_L$

• A melody query: $M = (d_1, p_1), \ldots, (d_k, p_k)$

• Query engine: for each recording, find $P(S_i \mid M)$

• Result: ranked list of $S_1, \ldots, S_L$ according to $P(S_i \mid M)$
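The ranking step of the query engine can be sketched in a few lines. This is only an illustration: `rank_recordings`, the dictionary layout of the database, and the toy scoring function are assumptions made here, and the real score would be the model's estimate of log P(S_i | M).

```python
def rank_recordings(recordings, melody, log_likelihood):
    """Return recording ids sorted by log P(S_i | M), best match first.

    recordings:     dict mapping a recording id to its signal (hypothetical layout)
    melody:         the query, a list of (duration, pitch) pairs
    log_likelihood: callable (signal, melody) -> float, supplied by the caller
    """
    scored = [(log_likelihood(signal, melody), rec_id)
              for rec_id, signal in recordings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec_id for _, rec_id in scored]


# Toy stand-in for the model: prefers signals whose length is close to the
# number of notes. It has no musical meaning and is used only to run the code.
def toy_log_likelihood(signal, melody):
    return -abs(len(signal) - len(melody))


ranked = rank_recordings(
    {"a": [1, 2, 3], "b": [1, 2], "c": [1, 2, 3, 4, 5]},
    [(1, 60), (1, 62)],
    toy_log_likelihood,
)
```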

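Melody Modeling

$P(S \mid M) = \sum_T P(S, T \mid M) = \sum_T P(T)\, P(S \mid A(T, M))$

$P(S \mid M) \approx \max_T P(T)\, P(S \mid A(T, M))$

[Figure: graphical model linking melody, tempo, aligned melody and sound; the legend distinguishes hidden variables from observed variables.]

• Melody: $(d_1, p_1), \ldots, (d_k, p_k)$

• Tempo: $(t_1, \ldots, t_k)$

• Aligned melody: $(d_1 t_1, p_1), \ldots, (d_k t_k, p_k)$

• Sound: $s_1, \ldots, s_n$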
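As a rough illustration of how a tempo sequence turns a melody into an aligned melody, the sketch below stretches each note's duration by its own scaling factor and accumulates the results into note boundaries. It assumes that d is a duration, p is a pitch, and that the scaling is multiplicative; the name `align_melody` and the example values are hypothetical.

```python
from itertools import accumulate


def align_melody(melody, tempo):
    """Scale each duration d_i by its tempo factor t_i (assumed multiplicative).

    melody: list of (duration, pitch) pairs, (d_1, p_1), ..., (d_k, p_k)
    tempo:  list of scaling factors t_1, ..., t_k, one per note
    Returns a list of (start, end, pitch) triples on a common time axis.
    """
    if len(melody) != len(tempo):
        raise ValueError("need exactly one tempo factor per note")
    scaled = [d * t for (d, _), t in zip(melody, tempo)]
    ends = list(accumulate(scaled))   # running sum gives each note's end time
    starts = [0.0] + ends[:-1]        # each note starts where the last one ended
    return [(s, e, p) for s, e, (_, p) in zip(starts, ends, melody)]


aligned = align_melody([(0.5, 440.0), (0.25, 494.0), (1.0, 523.0)],
                       [1.0, 2.0, 0.5])
```

With these values every scaled note lasts 0.5 time units, so the boundaries fall at 0.5, 1.0 and 1.5.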

Tempo Modeling

• Sequence of scaling factors (one per note)

• Model tempo as a first-order Markov model:

$P(T_1, \ldots, T_k) = P(T_1) \prod_{i=2}^{k} P(T_i \mid T_{i-1})$

• Use a log-normal distribution to model the conditional probability of the tempo:

$\log(T_i) \mid T_{i-1} \sim \mathcal{N}(\log(T_{i-1}), \rho)$
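A minimal sketch of the tempo prior follows, assuming that ρ is the variance of the Gaussian on log-tempo and that the density is evaluated in log-tempo space. The slide does not say how P(T_1) is chosen, so its log-probability is passed in as a parameter; the function names are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a normal distribution N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def tempo_log_prior(tempo, rho, first_log_prob=0.0):
    """log P(T_1, ..., T_k) under a first-order Markov chain on tempo.

    Each transition uses log(T_i) | T_{i-1} ~ N(log(T_{i-1}), rho), with rho
    treated as a variance (assumption). first_log_prob stands in for
    log P(T_1), which is not specified here (assumption: 0.0).
    """
    total = first_log_prob
    for prev, cur in zip(tempo, tempo[1:]):
        total += gaussian_log_pdf(math.log(cur), math.log(prev), rho)
    return total


steady = tempo_log_prior([1.0, 1.0, 1.0], rho=0.1)
jumpy = tempo_log_prior([1.0, 2.0, 0.5], rho=0.1)
```

A steady tempo sequence scores higher than one that doubles and then quarters, which is the smoothness preference the Markov model is meant to encode.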

Spectral Modeling

[Figure: the 1st, 2nd, 3rd and 4th harmonics.]

$S_{\omega_0}(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$

Spectral Modeling

$F(\omega) = S_{\omega_0}(\omega) + N_{\omega_0}(\omega)$

[Figure: $F(\omega)$ (vertical axis, 0 to 1) against frequency (Hz, 0 to 4500), with the signal and noise components labeled and $\omega_0$ marked.]

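Spectral Modeling (cont.)

[Figure: $F(\omega)$ (vertical axis, 0 to 1) against frequency (Hz, 0 to 4500), with the signal and noise components labeled and $\omega_0$ marked.]

$F(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0) + N_{\omega_0}(\omega)$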

Spectral Modeling (cont.)

• Estimate the amplitude at each harmonic and the global variance of the noise using the maximum likelihood principle

• Resulting signal-to-noise likelihood function:

$\log\big(P(F \mid \omega_0)\big) \propto \log \frac{\|S_{\omega_0}\|^2}{\|N_{\omega_0}\|^2}$
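The signal-to-noise score can be illustrated on a sampled magnitude spectrum. The sketch below is a simplification rather than the estimator used in this work: it takes the value at the bin nearest each harmonic as the signal and treats everything left over as noise, a crude stand-in for the maximum likelihood estimates. The function name, the default of four harmonics, and the toy spectrum are assumptions.

```python
import math


def harmonic_snr_score(freqs, spectrum, f0, num_harmonics=4):
    """Return log(||S||^2 / ||N||^2) for a candidate fundamental f0.

    freqs:    bin frequencies in Hz, in increasing order
    spectrum: magnitudes F at those frequencies
    S keeps the spectrum at the bin nearest each harmonic h * f0;
    N = F - S is the remainder (simplifying assumption).
    """
    signal = [0.0] * len(spectrum)
    for h in range(1, num_harmonics + 1):
        target = h * f0
        if target > freqs[-1]:
            break  # harmonic lies beyond the analysed band
        idx = min(range(len(freqs)), key=lambda i: abs(freqs[i] - target))
        signal[idx] = spectrum[idx]
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum((f - s) ** 2 for f, s in zip(spectrum, signal))
    eps = 1e-12  # guards against log(0) and division by zero
    return math.log((signal_energy + eps) / (noise_energy + eps))


# Toy spectrum: a flat noise floor with peaks at 440, 880, 1320 and 1760 Hz.
freqs = [10.0 * i for i in range(450)]
spectrum = [0.01] * len(freqs)
for h in range(1, 5):
    spectrum[44 * h] = 1.0 / h

score_true = harmonic_snr_score(freqs, spectrum, 440.0)
score_wrong = harmonic_snr_score(freqs, spectrum, 300.0)
```

On this toy spectrum the correct fundamental gets a positive score and a mismatched one a negative score, so comparing scores across candidate pitches separates them.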

Finding the best melody-performance alignment

• Recurse over tempo and end-time of the previous note

Dynamic Programming procedure

• Complexity: $O(k\,T\,M^2)$, where $k$ is the number of notes, $T$ is the length of the signal, and $M$ is the number of possible tempo values
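A dynamic program of this shape can be sketched as follows. It is one possible reading of the recursion, not the authors' implementation: the state is the end frame and tempo of the current note, the previous note must end where the current one starts, and the maximization runs over the previous note's tempo. The names `best_alignment_score`, `note_score` and `tempo_log_trans`, the rounding of note lengths to whole frames, the free choice of starting frame, and the toy data are all assumptions.

```python
import math


def best_alignment_score(durations, pitches, tempos, num_frames,
                         note_score, tempo_log_trans):
    """Maximum total log-score over tempo sequences and note end times.

    durations, pitches: the melody (durations in frames at tempo 1.0)
    tempos:             candidate tempo scaling factors
    num_frames:         length of the signal in frames
    note_score(start, end, pitch): log-likelihood of frames [start, end)
    tempo_log_trans(prev, cur):    log P(cur tempo | prev tempo)
    """
    if not durations:
        raise ValueError("melody must contain at least one note")
    neg_inf = -math.inf
    num_tempos = len(tempos)
    prev = None  # prev[end][m]: best score with the previous note ending at `end`
    for i, duration in enumerate(durations):
        cur = [[neg_inf] * num_tempos for _ in range(num_frames + 1)]
        for m, tempo in enumerate(tempos):
            length = max(1, round(duration * tempo))
            for end in range(length, num_frames + 1):
                start = end - length
                if i == 0:
                    best_prev = 0.0  # assumption: the melody may start anywhere
                else:
                    best_prev = max(
                        prev[start][pm] + tempo_log_trans(tempos[pm], tempo)
                        for pm in range(num_tempos)
                    )
                if best_prev > neg_inf:
                    cur[end][m] = best_prev + note_score(start, end, pitches[i])
        prev = cur
    return max(max(row) for row in prev)


# Toy signal: one pitch label per frame; a frame scores +1 on a match, -1 otherwise.
frames = [60] * 4 + [62] * 2 + [64] * 4


def toy_note_score(start, end, pitch):
    return float(sum(1 if f == pitch else -1 for f in frames[start:end]))


def toy_tempo_log_trans(prev, cur):
    return -abs(math.log(cur / prev))  # penalizes tempo changes


score = best_alignment_score([2, 1, 2], [60, 62, 64], [1.0, 2.0],
                             len(frames), toy_note_score, toy_tempo_log_trans)
```

The four nested loops (notes, tempo, end frame, previous tempo) make the running time grow with the number of notes, the signal length, and the square of the number of tempo values. In the toy run, playing every note at tempo 2.0 covers all ten frames correctly, so the best score is 10.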

Experimental Results

• Queries: 50 melodies from opera arias (from MIDI files)

• Database: over 800 performances of opera arias performed by over 50 tenors with full orchestral accompaniment

• Compared our variable-tempo (VT) model vs. fixed-tempo (FT) and locally-fixed-tempo (LFT) models

• Compared our Harmonic with Scaled Noise (HSN) spectral model vs. Harmonic with Independent Noise (HIN) model

Evaluation Measures

[Figure: likelihood value against the index of each performance in the ranked list (1 to 5), with performances marked + or -.]

• Oerr = 0

• Cov = 3 - 2

• AvgP = $\frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right)$
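The three measures can be computed from a ranked list of relevance flags. The definitions below are assumptions chosen to reproduce the worked values on this slide: one-error is 0 when the top-ranked performance is correct, coverage is the rank of the last correct performance minus the number of correct performances, and average precision averages the precision at each correct performance. The example ranking (correct performances at ranks 1 and 3 out of 5) is likewise an assumption consistent with those values.

```python
def one_error(relevant):
    """0 if the top-ranked item is relevant, otherwise 1."""
    return 0 if relevant[0] else 1


def coverage(relevant):
    """Rank of the last relevant item minus the number of relevant items."""
    ranks = [r for r, rel in enumerate(relevant, start=1) if rel]
    return ranks[-1] - len(ranks)


def average_precision(relevant):
    """Mean, over relevant items, of the precision at that item's rank."""
    ranks = [r for r, rel in enumerate(relevant, start=1) if rel]
    return sum(i / r for i, r in enumerate(ranks, start=1)) / len(ranks)


# Assumed ranking: relevant performances at ranks 1 and 3 of 5.
ranking = [True, False, True, False, False]
oerr = one_error(ranking)
cov = coverage(ranking)
avgp = average_precision(ranking)
```

For this ranking the functions give Oerr = 0, Cov = 3 - 2 = 1 and AvgP = (1/1 + 2/3) / 2. Both `coverage` and `average_precision` assume the list contains at least one relevant item.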


Summary of Results

• One Error of VT+HSN: 8%

• Average Precision of VT+HSN: 95%

• Coverage of VT+HSN: 0.21

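Results

Results by query length, tempo model (FT, LFT, VT) and spectral distribution model (HSN, HIN):

Query    Tempo  HSN AvgP  HSN Cov  HSN Oerr  HIN AvgP  HIN Cov  HIN Oerr
5 Sec.   FT     0.38      22.96    0.69      0.35      21.67    0.75
5 Sec.   LFT    0.43      17.33    0.69      0.37      17.94    0.75
5 Sec.   VT     0.51      10.67    0.65      0.46      11.83    0.69
15 Sec.  FT     0.38      19.83    0.71      0.36      19.08    0.73
15 Sec.  LFT    0.66      8.10     0.44      0.66      8.15     0.42
15 Sec.  VT     0.86      1.75     0.19      0.83      3.02     0.19
25 Sec.  FT     0.34      20.69    0.77      0.33      22.46    0.79
25 Sec.  LFT    0.66      5.90     0.46      0.63      5.98     0.48
25 Sec.  VT     0.95      0.21     0.08      0.92      0.40     0.10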

Precision-Recall

[Figure: precision against recall, both from 0 to 1, with curves labeled FT/25, LFT/25 and VT/25.]


Illustration of Segmentation

Future Work

• More data

• Other genres of music

• Alternative spectral distribution models using supervised learning methods

• Use alignment results for separating a soloist from the accompaniment
