Music Info Extraction - Ellis 2005-10-26 p. 1/30
1. Learning Music
2. Melody Extraction
3. Drum Pattern Modeling
4. Music Similarity
Extracting Information from Music Audio
Dan EllisLaboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/
LabROSA Overview
[Diagram: Information Extraction at the intersection of Machine Learning and Signal Processing, applied to Speech, Music, and Environment; example projects: FDLP, Meeting turns, Personal audio, Eigenrhythms]
1. Learning from Music
• A lot of music data available
  e.g. 60 GB of MP3 ≈ 1000 hr of audio, 15k tracks
• What can we do with it?
  implicit definition of 'music'
• Quality vs. quantity
  speech recognition lesson: 10x data, 1/10th annotation, twice as useful
• Motivating applications
  music similarity / classification
  computer (assisted) music generation
  insight into music
Ground Truth Data
• A lot of unlabeled music data available
  manual annotation is much rarer
• Unsupervised structure discovery possible
  .. but labels help to indicate what you want
• Weak annotation sources
  artist-level descriptions
  symbol sequences without timing (MIDI)
  errorful transcripts
• Evaluation requires ground truth
  limiting factor in Music IR evaluations?
  first-place agreement percentage - simple significance test
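The "simple significance test" for comparing systems can be sketched as a two-sided sign (binomial) test on the test items where two systems disagree. This is an illustrative sketch under that assumption, not the specific test used in the MIREX evaluations:

```python
from math import comb

def sign_test_p(wins_a, wins_b):
    """Two-sided sign test on items where systems A and B disagree.

    wins_a: items A got right and B got wrong; wins_b: the reverse.
    Under the null hypothesis (no real difference), each disagreement
    is a fair coin flip, so the count follows Binomial(n, 0.5).
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # probability of an outcome at least this extreme, doubled for two tails
    tail = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
    return min(1.0, 2.0 * tail)
```

For example, 9-vs-1 disagreements give p ≈ 0.02 (significant at 5%), while 6-vs-4 gives p ≈ 0.75 (not significant), which is why small accuracy gaps on modest test sets are hard to trust.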
Using SVMs for Artist ID
• Support Vector Machines (SVMs) find hyperplanes in a high-dimensional space
  relies only on matrix of distances between points
  much 'smarter' than nearest-neighbor/overlap
  want diversity of reference vectors...
[Figure: maximum-margin separator in the (x1, x2) plane — decision boundary (w·x) + b = 0, margin planes (w·x) + b = +1 and (w·x) + b = −1, weight vector w normal to the boundary, classes yi = +1 and yi = −1]
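As a concrete sketch of the margin picture, here is a minimal linear SVM trained by subgradient descent on the hinge loss. This is a stand-in for the real solvers used in the evaluation; the toy data and hyperparameters are invented for illustration:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C * sum of hinge losses by subgradient descent.

    Decision boundary: (w.x) + b = 0; the margin planes (w.x) + b = +/-1
    pass through the support vectors.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points inside the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# invented, linearly separable toy data in the (x1, x2) plane
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```

After training, sign((w·x) + b) recovers the class of each toy point. Note that only dot products of the data enter the computation, which is what lets SVMs run on a precomputed matrix of pairwise distances or kernel values.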
Song-Level SVM Artist ID
• Instead of one model per artist/genre, use every training song as an 'anchor'
  then SVM finds best support for each artist
[Figure: song-level training pipeline — MFCCs are computed for each song by Artist 1 and Artist 2 and summarized as song features; pairwise distances D between songs form the kernel, and a DAG SVM maps a test song to an artist]
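One hedged sketch of the song-level features: each song's frame-level features (random vectors here, standing in for MFCCs) are summarized by a diagonal-covariance Gaussian, and a symmetrized KL divergence gives the song-to-song distance matrix D. The Gaussian summary and symmetric KL are one common choice for this step (cf. Moreno et al., Penny in the references), not necessarily the exact variant used in the system:

```python
import numpy as np

def song_model(frames):
    """Summarize per-frame features (n_frames x n_dims, e.g. MFCCs)
    as a diagonal-covariance Gaussian: (mean, variance)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def sym_kl(m1, v1, m2, v2):
    """Symmetrized KL divergence between two diagonal Gaussians --
    a common song-to-song distance for building the kernel matrix."""
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1)
    return kl12 + kl21

# three synthetic "songs": 200 frames of 13-dim features each,
# drawn around increasingly distant means
rng = np.random.default_rng(0)
songs = [rng.normal(loc=i, scale=1.0, size=(200, 13)) for i in range(3)]
models = [song_model(s) for s in songs]
D = np.array([[sym_kl(*models[i], *models[j]) for j in range(3)]
              for i in range(3)])
```

D is symmetric with a zero diagonal, and songs with more distant feature distributions get larger distances; a matrix like this is all the SVM needs.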
Artist ID Results
• ISMIR/MIREX 2005 also evaluated Artist ID
• 148 artists, 1800 files (split train/test) from 'uspop2002'
• Song-level SVM clearly dominates, using only MFCCs!
Table 4: Results of the formal MIREX 2005 Audio Artist ID evaluation (USPOP2002), from http://www.music-ir.org/evaluation/mirex-results/audio-artist/.

Rank  Participant  Raw Accuracy  Normalized  Runtime / s
1     Mandel       68.3%         68.0%       10240
2     Bergstra     59.9%         60.9%       86400
3     Pampalk      56.2%         56.0%       4321
4     West         41.0%         41.0%       26871
5     Tzanetakis   28.6%         28.5%       2443
6     Logan        14.8%         14.8%       ?
7     Lidy         Did not complete
References

Jean-Julien Aucouturier and Francois Pachet. Improving timbre similarity: How high's the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.

Adam Berenzweig, Beth Logan, Dan Ellis, and Brian Whitman. A large-scale evaluation of acoustic and subjective music similarity measures. In International Symposium on Music Information Retrieval, October 2003.

Dan Ellis, Adam Berenzweig, and Brian Whitman. The "uspop2002" pop music data set, 2005. http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html.

Jonathan T. Foote. Content-based retrieval of music and audio. In C.-C. J. Kuo, Shih-Fu Chang, and Venkat N. Gudivada, editors, Proc. SPIE Vol. 3229, Multimedia Storage and Archiving Systems II, pages 138–147, October 1997.

Alex Ihler. Kernel density estimation toolbox for Matlab, 2005. http://ssg.mit.edu/~ihler/code/.

Beth Logan. Mel frequency cepstral coefficients for music modelling. In International Symposium on Music Information Retrieval, 2000.

Beth Logan and Ariel Salomon. A music similarity function based on signal analysis. In ICME 2001, Tokyo, Japan, 2001.

Michael I. Mandel, Graham E. Poliner, and Daniel P. W. Ellis. Support vector machine active learning for music retrieval. ACM Multimedia Systems Journal, 2005. Submitted for review.

Pedro J. Moreno, Purdy P. Ho, and Nuno Vasconcelos. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In Sebastian Thrun, Lawrence Saul, and Bernhard Scholkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.

Alan V. Oppenheim. A speech analysis-synthesis system based on homomorphic filtering. Journal of the Acoustical Society of America, 45:458–465, February 1969.

William D. Penny. Kullback-Leibler divergences of normal, gamma, Dirichlet and Wishart densities. Technical report, Wellcome Department of Cognitive Neurology, 2001.

John C. Platt, Nello Cristianini, and John Shawe-Taylor. Large margin DAGs for multiclass classification. In S. A. Solla, T. K. Leen, and K.-R. Mueller, editors, Advances in Neural Information Processing Systems 12, pages 547–553, 2000.

George Tzanetakis and Perry Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, July 2002.

Kristopher West and Stephen Cox. Features and classifiers for the automatic classification of musical audio signals. In International Symposium on Music Information Retrieval, 2004.

Brian Whitman, Gary Flake, and Steve Lawrence. Artist detection in music with Minnowmatch. In IEEE Workshop on Neural Networks for Signal Processing, pages 559–568, Falmouth, Massachusetts, September 10–12, 2001.

Changsheng Xu, Namunu C. Maddage, Xi Shao, Fang Cao, and Qi Tian. Musical genre classification using support vector machines. In International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2003.
Playlist Generation
• SVMs are well suited to "active learning"
  solicit labels on items closest to current boundary
• Automatic player with "skip"
  = ground truth data collection
  active-SVM automatic playlist generation
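The "items closest to the current boundary" step can be sketched as picking the unplayed track whose SVM decision value f(x) = (w·x) + b has the smallest magnitude, since its keep/skip label is the most informative. The scores below are invented for illustration:

```python
import numpy as np

def query_index(scores):
    """Active-learning pick: index of the unlabeled item closest to the
    SVM decision boundary, i.e. the smallest |f(x)|."""
    return int(np.argmin(np.abs(scores)))

# hypothetical decision values f(x) for five unplayed tracks
scores = np.array([2.3, -0.1, 1.7, -1.9, 0.4])
next_track = query_index(scores)   # the track nearest the boundary
```

Each accepted or skipped track then becomes a new labeled example, the SVM is retrained, and the next query is chosen the same way.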
Conclusions
• Lots of data + noisy transcription + weak clustering ⇒ musical insights?