E6820 SAPR - Dan Ellis L10 - Music Analysis 2006-04-06 - 1
EE E6820: Speech & Audio Processing & Recognition
Lecture 10: Music Analysis
1 Music Transcription
2 Music Summarization
3 Music Information Retrieval
4 Music Similarity Browsing
Dan Ellis <[email protected]> http://www.ee.columbia.edu/~dpwe/e6820/
Columbia University Dept. of Electrical Engineering, Spring 2006
Music Transcription
• Basic idea: Recover the score
• Is it possible? Why is it hard?
- music students do it... but they are highly trained; know the rules
• Motivations
- for study: what was played?
- highly compressed representation (e.g. MIDI)
- the ultimate restoration system...
[Figure: spectrogram; axes: Time, Frequency]
Transcription framework
• Recover discrete events to explain signal
- analysis-by-synthesis?
• Exhaustive search?
- would be possible given exact note waveforms
- .. or just a 2-dimensional ‘note’ template? but superposition is not linear in |STFT| space
• Inference depends on all detected notes
- is this evidence ‘available’ or ‘used’?
- full solution is exponentially complex
[Figure: note events {tk, pk, ik} → synthesis → observations X[k,n], with the reverse direction marked "?"; inset: note template, 2-D convolution]
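The remark that superposition is not linear in |STFT| space can be checked numerically. The following is a minimal sketch, assuming NumPy and a single Hann-windowed frame standing in for one STFT column; the two "notes" are hypothetical sinusoids sharing one partial in opposite phase, chosen as an extreme case.

```python
import numpy as np

n = 1024
t = np.arange(n)
window = np.hanning(n)

# Two hypothetical "notes" sharing one partial, in opposite phase.
note_a = np.sin(2 * np.pi * 50 * t / n)
note_b = np.sin(2 * np.pi * 50 * t / n + np.pi)

def frame_spectrum(x):
    """Complex spectrum of one windowed frame (one STFT column)."""
    return np.fft.rfft(window * x)

# The complex STFT is linear: the spectrum of the mix is the sum of spectra.
complex_is_linear = np.allclose(
    frame_spectrum(note_a + note_b),
    frame_spectrum(note_a) + frame_spectrum(note_b),
)

# The magnitude is not: here the mix cancels while the magnitudes add.
mag_of_mix = np.abs(frame_spectrum(note_a + note_b))
sum_of_mags = np.abs(frame_spectrum(note_a)) + np.abs(frame_spectrum(note_b))

print(complex_is_linear, mag_of_mix.max(), sum_of_mags.max())
```

In real music the phases are arbitrary, so the magnitude of a mixture falls somewhere between full cancellation and full addition, which is why a magnitude-domain note template cannot simply be subtracted.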
Problems for transcription
• Music is practically worst case!
- note events are often synchronized → defeats common onset
- notes have harmonic relations (2:3 etc.) → collision/interference between harmonics
- variety of instruments, techniques, ...
• Listeners are very sensitive to certain errors
- .. and impervious to others
• Apply further constraints
- like our ‘music student’
- maybe even the whole score (Scheirer)!
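To make the harmonic-collision point concrete, here is a small sketch that lists the partials shared by two notes whose fundamentals are in a 2:3 ratio. The fundamentals (200 Hz and 300 Hz), the limit of ten harmonics, and the function name are illustrative assumptions.

```python
def shared_harmonics(f0_a, f0_b, n_harm):
    """Frequencies that are harmonics of both fundamentals (integer Hz)."""
    harm_a = {h * f0_a for h in range(1, n_harm + 1)}
    harm_b = {h * f0_b for h in range(1, n_harm + 1)}
    return sorted(harm_a & harm_b)

# A 2:3 frequency ratio: every 3rd harmonic of the lower note coincides
# with every 2nd harmonic of the upper note.
shared = shared_harmonics(200, 300, 10)
print(shared)
```

Each shared frequency is a place where a transcription system sees one spectral peak but has to account for two notes.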
Spectrogram Modeling
• Sinusoid model
- as with synthesis, but signal is more complex
• Break tracks
- need to detect new ‘onset’ at single frequencies
• Group by onset & common harmonicity
- find sets of tracks that start around the same time
- + stable harmonic pattern
• Pass on to constraint-based filtering...
[Figure: spectrogram with sinusoid tracks (freq / Hz vs. time / s), plus a detail panel vs. time / s]
Searching for multiple pitches
(Klapuri 2001)
• At each frame:
- estimate dominant f0 by checking for harmonics
- cancel it from spectrum
- repeat until no f0 is prominent
[Figure: processing blocks: audio frame, harmonics enhancement, predominant f0 estimation, f0 spectral smoothing, subtract & iterate, stop when no more prominent f0s; spectrum panels in level / dB vs. frq / Hz and a panel vs. f0 / Hz]
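The estimate-cancel-repeat loop can be sketched as follows. This is a simplified illustration, not the published algorithm: salience is assumed to be a plain sum of magnitudes at harmonic bins, cancellation simply zeroes those bins (no spectral smoothing), and the 1 Hz bin grid, candidate range, and thresholds are all hypothetical.

```python
import numpy as np

def harmonic_salience(spec, f0, n_harm):
    """Sum of spectral magnitude at the first n_harm harmonics of f0."""
    bins = [h * f0 for h in range(1, n_harm + 1) if h * f0 < len(spec)]
    return float(np.sum(spec[bins]))

def iterative_f0s(spec, candidates, n_harm=8, min_salience=0.5, max_voices=6):
    """Pick the most salient f0, cancel its harmonics, and repeat."""
    residual = spec.astype(float).copy()
    found = []
    for _ in range(max_voices):
        sal = [harmonic_salience(residual, f0, n_harm) for f0 in candidates]
        best = int(np.argmax(sal))
        if sal[best] < min_salience:
            break  # no remaining f0 is prominent
        f0 = candidates[best]
        found.append(f0)
        for h in range(1, n_harm + 1):
            if h * f0 < len(residual):
                residual[h * f0] = 0.0  # crude cancellation of this voice
    return found

# Synthetic magnitude spectrum on a 1 Hz grid: two harmonic tones.
spec = np.zeros(4001)
for f0, gain in [(200, 1.0), (310, 0.8)]:
    for h in range(1, 9):
        spec[h * f0] += gain / h

found = iterative_f0s(spec, list(range(80, 501)))
print(found)
```

Zeroing whole bins would also remove energy from any other note sharing those harmonics, which is one reason a smoothed subtraction is preferable on real signals.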
Multi-Pitch Extraction Results
(Rob Turetsky)
• After continuity cleanup:
• Captures main notes, plus a lot else
- hand-tuned termination thresholds?
• (Evaluation?)
[Figure: Beatles - Lucy in the Sky with Diamonds - seg 1: spectrogram (freq / kHz) and MPE pianoroll (f0 / Hz) vs. time / sec]
Probabilistic Pitch Estimates
(Goto 2001)
• Generative probabilistic model of spectrum as weighted combination of tone models at different fundamental frequencies:
$$p(x(f)) = \int \sum_m w(F,m)\, p(x(f) \mid F,m)\, dF$$
• ‘Knowledge’ in terms of tone models + prior distributions for f0:
[Figure: tone models (m=1, m=2), p(x | F,m, µ(t)(F,m)), drawn over x [cent] with harmonic peaks at the fundamental frequency F, F+1200, F+1902 and F+2400, weighted by c(t)(h|F,m) for h = 1..4]
• EM (RT) results:
Generative Model Fitting
(Walmsley et al. 1999)
• Generative model of harmonic complexes in the time domain:
$$d_i = \sum_{q=1}^{Q} \gamma_i^q G_i^q b_i^q + e_i$$
(labelled terms: samples, voices, harmonic switch, harmonic bases, weights, noise)
[Figure: graphical model relating bq, νq, ωq, Γq, σ²ωq, σ²e, Hq, γq and di]
• Too many parameters to solve by EM! → Use Markov chain Monte Carlo (MCMC) to find good solution
• Results?
Transcription as Pattern Recognition
(Graham Poliner)
• Existing methods use prior knowledge about the structure of pitched notes
- i.e. we know they have regular harmonics
• What if we didn’t know that, but just had examples and features?
- the classic pattern recognition problem
• Could use music signal as evidence for pitch class in a black-box classifier:
- nb: more than one class at once!
• But where can we get labeled training data?
[Figure: Audio → trained classifier → p("C0"|Audio), p("C#0"|Audio), p("D0"|Audio), p("D#0"|Audio), p("E0"|Audio), p("F0"|Audio)]
Ground Truth Data
(Turetsky & Ellis 2003)
• Pattern classifiers need training data
- i.e. need {signal, note-label} sets
- i.e. MIDI transcripts of real music... already exist?
• Idea: force-align MIDI and original
- can estimate time-warp relationships
- recover accurate note events in real music!
[Figure: "Don't you want me" (Human League), verse 1: spectrograms of Original and MIDI Replica (freq / kHz) and MIDI notes (MIDI #) vs. time / sec]
Features for MIDI alignments
• Features that will match between MIDI replicas and original audio...
• Pitch is key attribute to match
- narrowband spectral features (but: timing...)
- emphasize 100 Hz - 2 kHz
• Local spectral variation, not absolute levels
- remove local average & normalize local range
[Figure: DYWMB: corresponding frames @ ts = 20.3 for Original and MIDI replica; level / dB and normalized level vs. freq / Hz]
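One possible reading of "remove local average & normalize local range" is sketched below. The choices here are assumptions rather than the slides' actual recipe: a moving mean and moving standard deviation along the frequency axis, a 31-bin window, and reflection padding at the edges.

```python
import numpy as np

def normalize_spectrum(level_db, width=31):
    """Subtract a moving mean and divide by a moving std along frequency."""
    pad = width // 2  # width is assumed odd
    kernel = np.ones(width) / width
    padded = np.pad(level_db, pad, mode="reflect")
    local_mean = np.convolve(padded, kernel, mode="valid")
    centered = level_db - local_mean
    padded_sq = np.pad(centered ** 2, pad, mode="reflect")
    local_std = np.sqrt(np.convolve(padded_sq, kernel, mode="valid"))
    return centered / np.maximum(local_std, 1e-9)

rng = np.random.default_rng(0)
frame_db = rng.normal(size=200)          # stand-in for one spectral slice in dB
louder = 2.0 * frame_db + 10.0           # same shape, different level and range

same_after_norm = np.allclose(
    normalize_spectrum(frame_db), normalize_spectrum(louder)
)
print(same_after_norm)
```

The point of the normalization is visible in the last check: an overall level offset or a change of dynamic range between the original and the MIDI replica no longer affects the feature.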
Alignment example
• Inner-product distance on normalized spectral slices (8192 pt @ 22050 Hz):
[Figure: DYWMB: Original - MIDI alignment; MIDI replica / 186 ms frames vs. Original / 186 ms frames]
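Given a matrix of distances between original and replica frames, a time-warp path can be recovered by dynamic programming. The sketch below is one standard way to do it (plain dynamic time warping with unit steps), not necessarily the procedure used for the figure; the cosine distance, the function names, and the random test features are assumptions.

```python
import numpy as np

def cosine_distance_matrix(A, B):
    """1 - normalized inner product between every row of A and every row of B."""
    A = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    B = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
    return 1.0 - A @ B.T

def dtw_path(D):
    """Lowest-cost monotonic path through D from (0, 0) to the last cell."""
    n, m = D.shape
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j - 1], C[i - 1, j], C[i, j - 1])
    # Trace back from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(C[i - 1, j - 1], i - 1, j - 1),
                 (C[i - 1, j], i - 1, j),
                 (C[i, j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return path[::-1]

# Hypothetical features: the "replica" is the original with two frames held longer.
rng = np.random.default_rng(0)
original = rng.normal(size=(6, 8))
stretch = [0, 0, 1, 2, 2, 3, 4, 5]
replica = original[stretch]

path = dtw_path(cosine_distance_matrix(original, replica))
print(path)
```

Each pair in the path is (original frame, replica frame), so note times in the MIDI replica can be mapped back onto the original recording.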
Extracted training data
• Want labeled examples of notes (in every context) to train pattern recognizer
- still perfecting alignment, but an example:
[Figure: DYWMB: alignments to MIDI note 57 mapped to original audio; spectrogram snippets, freq / kHz vs. time / sec]
Polyphonic Piano Transcription
(Poliner & Ellis 2006)
• Training data from player piano
• Independent classifiers for each note
- plus a little HMM smoothing
• Nice results
- .. when test data resembles training
[Figure: freq / pitch (A1 to A6) against a horizontal axis from 0 to 9, with a level / dB scale]
Outline
1 Music Transcription
2 Music Summarization
- Segmentation
- Identifying repetition
- Evaluation
3 Music Information Retrieval
4 Music Similarity Browsing
Music Summarization
• What does it mean to ‘summarize’?
- compact representation of larger entity
- maximize ‘information content’
- sufficient to recognize known item
• So summarizing music?
- short version e.g. <10% duration (< 20s for pop)
- sufficient to identify style, artist
- e.g. chorus or ‘hook’?
• Why?
- browsing existing collection
- discovery among unknown works
- commerce...
Summarization Approach
(with thanks to Beth Logan)
[Figure: processing chain: song → convert to features → label features & segment to discover song structure (segment labels 0 1 1 2 2 3 3 2 2) → use heuristics to choose key phrase → key phrase]
Segmentation
• Find contiguous regions that are internally similar and different from neighbors
• E.g. “self-similarity” matrix (Foote 1997)
- 2D convolution of checkerboard down diagonal = compare fixed windows at every point
[Figure: DYWMB - self similarity; both axes in time / 183 ms frames, with a zoomed panel]
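The checkerboard idea can be sketched in a few lines: slide a kernel that is +1 on the two within-segment quadrants and -1 on the two cross quadrants along the diagonal of the self-similarity matrix, and look for peaks. The cosine similarity, the kernel half-width, and the synthetic two-section "song" are assumptions for illustration.

```python
import numpy as np

def self_similarity(F):
    """Cosine similarity between every pair of feature frames (rows of F)."""
    Fn = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
    return Fn @ Fn.T

def checkerboard_novelty(S, half_width):
    """Correlate a 2w x 2w checkerboard kernel along the diagonal of S."""
    w = half_width
    sign = np.r_[-np.ones(w), np.ones(w)]
    kernel = np.outer(sign, sign)  # +1 within past/future blocks, -1 across
    n = S.shape[0]
    novelty = np.zeros(n)
    for t in range(w, n - w + 1):
        novelty[t] = np.sum(kernel * S[t - w:t + w, t - w:t + w])
    return novelty

# Synthetic song: 20 frames of one texture, then 20 frames of another.
section_a = np.tile([1.0, 0.0], (20, 1))
section_b = np.tile([0.0, 1.0], (20, 1))
features = np.vstack([section_a, section_b])

novelty = checkerboard_novelty(self_similarity(features), half_width=5)
boundary = int(np.argmax(novelty))
print(boundary, novelty[boundary])
```

The kernel width sets the time scale of the comparison, which is the "fixed windows" limitation that motivates using evidence from whole segments instead.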
BIC segmentation
• Want to use evidence from whole segment, not just local window
• Do ‘significance test’ on every possible division of every possible context
• Eventually, a boundary is found:
[Figure: timeline from 0 (last segmentation point) to N (current context limit) with a candidate boundary; the whole context X has likelihood L(X;M0), and the two sides have L(X1;M1) and L(X2;M2)]
BIC:
$$\log \frac{L(X_1;M_1)\,L(X_2;M_2)}{L(X;M_0)} \gtrless \frac{\lambda}{2} \log(N)\, \Delta\#(M)$$
[Figure: BIC score vs. time / min, annotated with: last segpoint, current context limit, boundary passes BIC, no boundary found with shorter context]
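The BIC test above can be tried on a toy signal. This is a minimal sketch under stated assumptions: features are one-dimensional, each model M is a single Gaussian fit by maximum likelihood (so splitting adds two parameters), λ = 1, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_loglik(x):
    """Maximized log-likelihood of 1-D samples under a single Gaussian."""
    n = len(x)
    var = max(float(np.var(x)), 1e-12)
    return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

def bic_gain(x, t, lam=1.0):
    """Log-likelihood ratio for a boundary at t, minus the BIC penalty."""
    ratio = gaussian_loglik(x[:t]) + gaussian_loglik(x[t:]) - gaussian_loglik(x)
    extra_params = 2  # one extra mean and one extra variance
    penalty = 0.5 * lam * np.log(len(x)) * extra_params
    return ratio - penalty

def best_boundary(x, margin=5, lam=1.0):
    """Best candidate boundary, or None if no candidate passes the BIC test."""
    scores = [(bic_gain(x, t, lam), t) for t in range(margin, len(x) - margin)]
    gain, t = max(scores)
    return (t, gain) if gain > 0 else (None, gain)

# Toy context: the mean jumps after 100 frames.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(8.0, 1.0, 100)])
boundary, gain = best_boundary(x)
print(boundary, gain)
```

In the scheme on the slide the same test would be rerun as the context limit grows, restarting from each accepted boundary.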
HMM segmentation
• Recall, HMM Viterbi path is joint classification and segmentation
- e.g. for singing/accompaniment segmentation
• But: HMM states need to be defined in advance
- define a ‘generic set’? (MPEG7)
- learn them from the piece to be segmented? (Chu & Logan 2000, Peeters et al. 2002)
• Result is ‘anonymous’ state sequence characteristic of particular piece
# U2-The_Joshua_Tree-01-Where_The_Streets_Have_No_Name 33677
17 17 17 ... 11 11 11 ... 15 15 15 ... 3 3 3 ... 15 15 15 ...
Finding Repeats
• Music frequently repeats main phrases
• Repeats give off-diagonal ridges in Similarity matrix (Bartsch ’01)
• Or: clustering at phrase-level ...
[Figure: DYWMB - self similarity; both axes in time / 183 ms frames]
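Off-diagonal ridges can be found by averaging the similarity matrix along each diagonal: a repeat at lag k shows up as a high mean on the k-th diagonal. The sketch below assumes a toy "song" made of an 8-frame phrase played four times, with one-hot frames so that similarity is exactly 0 or 1.

```python
import numpy as np

def lag_profile(S):
    """Mean similarity along each diagonal of S (index = lag in frames)."""
    n = S.shape[0]
    return np.array([np.diagonal(S, offset=lag).mean() for lag in range(n)])

# Toy song: an 8-frame phrase repeated 4 times, one-hot features per frame.
frames = np.eye(8)[np.arange(32) % 8]
S = frames @ frames.T

profile = lag_profile(S)
# Skip lag 0 (the main diagonal) and search lags 1..16 for the strongest ridge.
best_lag = 1 + int(np.argmax(profile[1:17]))
print(best_lag, profile[best_lag])
```

On real features the ridges are partial and noisy, so the diagonal means would need smoothing and a threshold rather than a plain argmax.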
Clustering-based summarization
(Logan & Chu 2000)
• Find segments in song by greedy clustering:
[Figure: flowchart: song features → divide into fixed length segments (labels 0 1 2 3 4 5 6) → calculate distortion between every segment pair → find pair with lowest distortion → if distortion > threshold, stop; otherwise combine the pair (e.g. labels 0 1 2 3 1 5 6)]
• Biggest cluster chosen as “key phrase”
- large contiguous block taken as example
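The greedy loop in the flowchart can be sketched as follows. The distortion used here (Euclidean distance between cluster means) and the toy two-dimensional segment features are stand-in assumptions; the cited paper's actual distortion measure is not reproduced.

```python
from collections import Counter
import numpy as np

def greedy_cluster(segments, threshold):
    """Repeatedly merge the closest pair of clusters until none is close enough."""
    labels = list(range(len(segments)))
    while True:
        ids = sorted(set(labels))
        means = {
            c: np.mean([s for s, l in zip(segments, labels) if l == c], axis=0)
            for c in ids
        }
        best = None
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:
                d = float(np.linalg.norm(means[a] - means[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None or best[0] > threshold:
            return labels  # lowest distortion exceeds threshold: stop
        _, a, b = best
        labels = [a if l == b else l for l in labels]  # combine the pair

# Hypothetical per-segment feature vectors for a 7-segment song.
segments = [np.array(v) for v in
            [[0, 0], [5, 5], [0.1, 0], [5, 5.1], [0, 0.1], [10, 0], [0.05, 0.05]]]

labels = greedy_cluster(segments, threshold=1.0)
key_label = Counter(labels).most_common(1)[0][0]
print(labels, key_label)
```

The most frequent label marks the biggest cluster; picking its longest contiguous run of segments would then give the key phrase.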
Evaluating Summaries
• Hard to evaluate: What is the ‘right answer’?
- difficult to construct or judge a summary until you know the song...
• Bartsch & Wakefield: 93 songs, ‘chorus’ hand-marked, 70% frame-level precision-recall
- aiming to find chorus/refrain
• Chu & Logan: 18 Beatles #1 hits rated by 10 subjects as Good/Average/Poor
- “significantly better than random”
• Without a good metric, how to make choices to improve the algorithm?
Outline
1 Music Transcription
2 Music Summarization
3 Music Information Retrieval
- What it could mean
- Unsupervised clustering
- Learned classification
4 Music Similarity Browsing
Music Information Retrieval
• Text-based searching concepts for music?
- “musical Google”
- finding a specific item
- finding something vague
- finding something new
• Significant commercial interest
• Basic idea: Project music into a space where neighbors are “similar”
• (Competition from human labeling)
Music IR: Queries & Evaluation
• What is the form of the query?
• Query by humming
- considerable attention, recent demonstrations
- need/user base?
• Query by noisy example
- “Name that tune” in a noisy bar
- Shazam Ltd.: commercial deployment
- database access is the hard part?
• Query by multiple examples
- “Find me more stuff like this”
• Text queries? (Whitman & Smaragdis 2002)
• Evaluation problems
- requires large, shareable music corpus!
- requires a well-defined task
Unsupervised Clustering
(Rauber, Pampalk, Merkl 2002)
• Map music into an auditory-based space
• Build ‘clusters’ of nearby → similar music
- “Self-Organizing Maps” (Kohonen)
• Look at the results:
- quantitative evaluation?
[Figure: “Islands of Music” map with song labels placed on map cells, e.g. d3-kryptonite, limp-rearranged, rem-endoftheworld, ga-heaven, br-punkrock, vm-toccata, verve-bittersweet, backforgood]
Genre Classification
(Tzanetakis et al. 2001)
• Classifying music into genres would get you some way towards finding “more like this”
• Genre labels are problematic, but they exist
• Real-time visualization of “GenreGram”:
- 9 spectral and 8 rhythm features every 200ms
- 15 genres trained on 50 examples each, single Gaussian model → ~ 60% correct
Artist Classification
(Berenzweig et al. 2001)
• Artist label as available stand-in for genre
• Train MLP to classify frames among 21 artists
• Using only “voice” segments: Song-level accuracy improves 56.7% → 64.9%
[Figure: per-artist classifier outputs vs. time / sec, with true voice segments marked, for Track 4 - Arto Lindsay (dynvox=Arto, unseg=Oval) and Track 117 - Aimee Mann (dynvox=Aimee, unseg=Aimee). Artists: Boards of Canada, Sugarplastic, Belle & Sebastian, Mercury Rev, Cornelius, Richard Davies, Dj Shadow, Mouse on Mars, The Flaming Lips, Aimee Mann, Wilco, XTC, Beck, Built to Spill, Jason Falkner, Oval, Arto Lindsay, Eric Matthews, The Moles, The Roots, Michael Penn]
Artist Similarity
• Artist classes as a basis for overall similarity: Less corrupt than ‘record store genres’?
• But: what is similarity between artists?
- pattern recognition systems give a number...
• Need subjective ground truth: Collected via web site www.musicseer.com
• Results:
- 1800 users, 22,500 judgments collected over 6 months
[Figure: map of artist names, including backstreet_boys, all_saints, annie_lennox, aqua, celine_dion, christina_aguilera, eiffel_65, erasure, janet_jackson, jessica_simpson, lara_fabian, lauryn_hill, madonna, mariah_carey, nelly_furtado, pet_shop_boys, roxette, sade, spice_girls; other labels are truncated]
Outline
1 Music Transcription
2 Music Summarization
3 Music Information Retrieval
4 Music Similarity Browsing
- Anchor space
- Playola browser
Music Similarity Browsing
• Most interesting problem in music IR is finding new music
- is there anything on mp3.com that I would like?
• Need a space where music/artists are arranged according to perceived similarity
• Particularly interested in little-known bands
- little or no ‘community data’ (e.g. collab. filtering)
- audio-based measures are critical
• Also need models of personal preference
- where in the space is stuff I like
- relative sensitivity to different dimensions
Anchor space
• A classifier trained for one artist (or genre) will respond partially to a similar artist
• A new artist will evoke a particular pattern of responses over a set of classifiers
• We can treat these classifier outputs as a new feature space in which to estimate similarity
• “Anchor space” reflects subjective qualities?
[Figure: block diagram: audio input (class i) and audio input (class j) each pass through a bank of anchor classifiers giving p(a1|x), p(a2|x), ..., p(an|x), an n-dimensional vector in "Anchor Space" (conversion to anchor space), followed by GMM modeling; the two models feed a similarity computation (KL-d, EMD, etc.)]
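The final similarity step can be illustrated with a much-simplified stand-in: instead of a GMM per artist, fit a single diagonal-covariance Gaussian to each artist's anchor-space points and compare them with a symmetrized KL divergence, which has a closed form in that case. The synthetic point clouds and all function names are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(points):
    """Mean and per-dimension variance of a cloud of anchor-space points."""
    return points.mean(axis=0), points.var(axis=0) + 1e-9

def kl_diag(p, q):
    """KL divergence KL(p || q) between two diagonal Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    terms = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return 0.5 * float(np.sum(terms))

def symmetric_kl(p, q):
    """Symmetrized KL, used here as a dissimilarity between artists."""
    return kl_diag(p, q) + kl_diag(q, p)

# Synthetic 3-anchor clouds: artist_b is near artist_a, artist_c is far away.
rng = np.random.default_rng(0)
artist_a = fit_diag_gaussian(rng.normal(0.0, 1.0, size=(400, 3)))
artist_b = fit_diag_gaussian(rng.normal(0.2, 1.0, size=(400, 3)))
artist_c = fit_diag_gaussian(rng.normal(3.0, 1.0, size=(400, 3)))

d_same = symmetric_kl(artist_a, artist_a)
d_near = symmetric_kl(artist_a, artist_b)
d_far = symmetric_kl(artist_a, artist_c)
print(d_same, d_near, d_far)
```

With full GMMs the KL divergence has no closed form, which is one reason sampling-based estimates or measures such as EMD come into play.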
Anchor space visualization
• Comparing 2D projections of per-frame feature points in cepstral and anchor spaces:
- each artist represented by 5GMM
- greater separation under MFCCs!
- but: relevant information?
[Figure: per-frame points for madonna and bowie in two panels. Cepstral Features: third cepstral coef vs. fifth cepstral coef. Anchor Space Features: Country vs. Electronica]
Playola interface (www.playola.org)
• Browser finds closest matches to single tracks or entire artists in anchor space
• Direct manipulation of anchor space axes
Evaluation
• Are recommendations good or bad?
• Subjective evaluation is the ground truth
- .. but subjects don’t know the bands being recommended
- can take a long time to decide if a recommendation is good
• Measure match to other similarity judgments
- e.g. musicseer data:
[Figure: "Top rank agreement" (%, 0 to 80) for measures cei, cmb, erd, e3d, opn, kn2, rnd, ANK on four data sets: SrvKnw 4789x3.58, SrvAll 6178x8.93, GamKnw 7410x3.96, GamAll 7421x8.92]
Summary
• Music transcription: Hard, but some progress
• Music summarization: New, interesting problem
• Music IR: Alternative paradigms, lots of interest
Data-driven machine learning techniques are valuable in each case
References
Mark A. Bartsch and Gregory H. Wakefield (2001), “To Catch a Chorus: Using Chroma-based Representations for Audio Thumbnailing”, Proc. WASPAA, Mohonk, Oct 2001. http://musen.engin.umich.edu/papers/bartsch_wakefield_waspaa01_final.pdf
A. Berenzweig, D. Ellis, S. Lawrence (2002), “Using Voice Segments to Improve Artist Classification of Music”, Proc. AES-22 Intl. Conf. on Virt., Synth., and Ent. Audio, Espoo, Finland, June 2002. http://www.ee.columbia.edu/~dpwe/pubs/aes02-aclass.pdf
A. Berenzweig, D. Ellis, S. Lawrence (2003), “Anchor Space for Classification and Similarity Measurement of Music”, Proc. ICME-03, Baltimore, July 2003. http://www.ee.columbia.edu/~dpwe/pubs/icme03-anchor.pdf
J. Foote (1997), “A similarity measure for automatic audio classification”, Proc. AAAI Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora, March 1997. http://citeseer.nj.nec.com/foote97similarity.html
Masataka Goto (2001), “A Predominant-F0 Estimation Method for CD Recordings: MAP Estimation using EM Algorithm for Adaptive Tone Models”, Proc. ICASSP 2001, Salt Lake City, May 2001. http://staff.aist.go.jp/m.goto/PAPER/ICASSP2001goto.pdf
A. Klapuri, T. Virtanen, A. Eronen, J. Seppänen (2001), “Automatic transcription of musical recordings”, Proc. CRAC workshop, Eurospeech, Denmark, Sep 2001. http://www.cs.tut.fi/sgn/arg/klap/crac2001/crac2001.pdf
Beth Logan and Stephen Chu (2000), “Music summarization using key phrases”, Proc. IEEE ICASSP, Istanbul, June 2000. http://crl.research.compaq.com/publications/techreports/reports/2000-1.pdf
G. Peeters, A. La Burthe, X. Rodet (2002), “Toward automatic music audio summary generation from signal analysis”, Proc. ISMIR-02, Paris, October 2002. http://ismir2002.ircam.fr/proceedings%5C02-FP03-3.pdf
Andreas Rauber, Elias Pampalk and Dieter Merkl (2002), “Using Psychoacoustic models and Self-Organizing Maps to create a hierarchical structuring of music by musical styles”, Proc. ISMIR-02, Paris, October 2002. http://ismir2002.ircam.fr/proceedings%5C02-FP02-4.pdf
G. Tzanetakis, G. Essl, P. Cook (2001), “Automatic Musical Genre Classification of Audio Signals”, Proc. ISMIR-01, Bloomington, October 2001. http://ismir2001.indiana.edu/pdf/tzanetakis.pdf
P.J. Walmsley, S.J. Godsill, and P.J.W. Rayner (1999), “Polyphonic pitch tracking using joint Bayesian estimation of multiple frame parameters”, Proc. WASPAA, Mohonk, Oct 1999. http://www.ee.columbia.edu/~dpwe/papers/WalmGR99-polypitch.pdf
Brian Whitman and Paris Smaragdis (2002), “Combining Musical and Cultural Features for Intelligent Style Detection”, Proc. ISMIR-02, Paris, October 2002. http://ismir2002.ircam.fr/proceedings%5C02-FP02-1.pdf