Machine Learning for Music Faculty of Mathematics and Informatics, SU Petko Nikolov April 8, 2015
About Me
Machine Learning
Music Information Retrieval
Machine Learning / Automated Data Science
What’s Music Information Retrieval?
Musicology
Computer Science
Signal Processing
Machine Learning
MIR
Music Recommendations
Recommending tags
Spotify’s Shuffle Mode
● Not really random
● Certainly some processing
● Probably some MIR behind it
Pandora’s Music Genome Project
● started in 2000
● 800,000 tracks manually annotated by music experts
● 450 attributes to describe music
● 25 minutes per track to label
MIREX
Music Information Retrieval Evaluation eXchange
● annual competition featuring more than 20 tasks
● state-of-the-art algorithms compete against each other
Structured Information
Retrieval
Synthesis
fingerprinting, cover song detection, genre recognition, instrument recognition, mood detection, transcription, playlist generation
beat tracking, key detection, pitch tracking, vocal detection, recommendation, audio similarity, source separation
genre recognition, instrument recognition, mood detection, vocal detection, audio similarity
MIR Architecture
Audio
Segmentation and Preprocessing
Feature Extraction
Machine Learning
classical
piano
romantic
Beethoven
by Daniel Barenboim
2 4
MIR Architecture
Audio
Segmentation and Preprocessing
Deep Learning
classical
piano
romantic
Beethoven
by Daniel Barenboim
2 4
Audio signal
Human hearing: 20 Hz to 20 kHz
Segmentation
Frame: 52 ms
f1 f2 f3 f4 … fn
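The segmentation step can be sketched in a few lines of NumPy. This is only an illustration: the 44.1 kHz sample rate, the non-overlapping frames, and the helper name `split_into_frames` are assumptions, not details given on the slide.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=52.0):
    """Cut a 1-D signal into consecutive, non-overlapping frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len  # the incomplete tail is dropped
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

# One second of a 440 Hz sine at an assumed 44.1 kHz sample rate.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

frames = split_into_frames(signal, sample_rate)
print(frames.shape)  # (19, 2293)
```

Real pipelines often use overlapping frames; that would change only how the start index of each frame is chosen.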
Spectrum - on frame level
Discrete Fourier Transform (DFT)
time → frequency
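A possible way to compute the per-frame spectrum with NumPy's FFT is shown here. The Hann window, the 44.1 kHz sample rate, and the helper name `magnitude_spectrum` are assumptions made for the sketch.

```python
import numpy as np

def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for one frame."""
    window = np.hanning(len(frame))          # reduces spectral leakage
    magnitudes = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, magnitudes

# A 52 ms frame of a 440 Hz sine (2293 samples at an assumed 44.1 kHz).
sample_rate = 44100
frame_len = 2293
t = np.arange(frame_len) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)

freqs, magnitudes = magnitude_spectrum(frame, sample_rate)
peak_hz = freqs[np.argmax(magnitudes)]
print(peak_hz)  # close to 440 Hz, limited by the ~19 Hz bin width
```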
Feature extraction
f → x
Spectral Centroid
Where the ‘center of mass’ of the spectrum lies
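As a rough sketch, the centroid is the magnitude-weighted mean of the bin frequencies. The function name and the toy three-bin spectrum are illustrative assumptions.

```python
import numpy as np

def spectral_centroid(freqs, magnitudes):
    """Magnitude-weighted mean frequency of one frame's spectrum."""
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return float((freqs * magnitudes).sum() / total)

# Toy spectrum, symmetric around 200 Hz.
freqs = np.array([100.0, 200.0, 300.0])
magnitudes = np.array([1.0, 2.0, 1.0])

centroid = spectral_centroid(freqs, magnitudes)
print(centroid)  # 200.0
```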
Spectral Slope
Fit a linear regression to the spectrum and take the slope coefficient.
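One way to get the slope is a degree-1 least-squares fit of magnitude against frequency. `np.polyfit` is used here as a convenient stand-in; the slide does not say which regression routine was used, and the toy spectrum is made up.

```python
import numpy as np

def spectral_slope(freqs, magnitudes):
    """Slope of the least-squares line fitted to magnitude vs. frequency."""
    slope, _intercept = np.polyfit(freqs, magnitudes, deg=1)
    return float(slope)

# Toy spectrum that falls by 1 unit every 100 Hz.
freqs = np.array([0.0, 100.0, 200.0, 300.0])
magnitudes = np.array([4.0, 3.0, 2.0, 1.0])

slope = spectral_slope(freqs, magnitudes)
print(slope)  # approximately -0.01
```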
Spectral Correlation / Variation
Spectral correlation is the cosine similarity between the frequency vectors of two consecutive frames.
Variation is 1.0 - correlation.
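A minimal sketch of both quantities, assuming each frame is represented by its magnitude spectrum as a NumPy vector. The function names are illustrative.

```python
import numpy as np

def spectral_correlation(prev_spectrum, curr_spectrum):
    """Cosine of the angle between two consecutive frames' spectra."""
    norm = np.linalg.norm(prev_spectrum) * np.linalg.norm(curr_spectrum)
    if norm == 0:
        return 0.0  # at least one frame is silent
    return float(np.dot(prev_spectrum, curr_spectrum) / norm)

def spectral_variation(prev_spectrum, curr_spectrum):
    """1.0 - correlation: 0 for identical shapes, 1 for no overlap."""
    return 1.0 - spectral_correlation(prev_spectrum, curr_spectrum)

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

same = spectral_variation(a, 2.0 * a)   # same shape, different loudness
different = spectral_variation(a, b)    # energy in disjoint bins
```

Because the measure is normalized, scaling a frame's loudness leaves the variation unchanged; only the shape of the spectrum matters.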
Feature extraction - Result
Rows: frames. Columns: features (centroid, correlation, …)
f11 f12 f13 f14 f15 ……… f1m
f21 f22 f23 f24 f25 ……… f2m
The number of frames varies across audio recordings
Universal Background Model
Gaussian Mixture Model
frame feature vector
Multivariate Gaussian Distribution
Gaussian Mixture Model - per track
[𝛍1,𝛍2,𝛍3,𝛍4]
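The idea of turning a variable number of frames into one fixed-length vector of component means can be sketched with scikit-learn. This is a simplification under stated assumptions: the data is random, the four diagonal-covariance components are chosen only to match the four means on the slide, and each per-track model is simply refit starting from the background model's means, which stands in for whatever adaptation scheme the talk actually used.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical data: 3 tracks with different frame counts, 2 features per frame.
tracks = [rng.normal(size=(n, 2)) for n in (120, 200, 80)]

# Background model fitted on the frames of all tracks pooled together.
ubm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(tracks))

def track_vector(frames, ubm):
    """Fit a per-track GMM initialized at the UBM means; return stacked means."""
    gmm = GaussianMixture(
        n_components=ubm.n_components,
        covariance_type="diag",
        means_init=ubm.means_,   # keeps components aligned across tracks
        random_state=0,
    )
    gmm.fit(frames)
    return gmm.means_.ravel()    # [mu1, mu2, mu3, mu4] concatenated

X = np.array([track_vector(frames, ubm) for frames in tracks])
print(X.shape)  # (3, 8)
```

Each track now has the same number of features regardless of its length, so the rows of `X` can be fed to any standard classifier.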
Classification - Example Neural Net
Feature vector
Layers: Input, Hidden, Output
Activations a_ik, weights w_k
Likelihood of Rock?
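A forward pass through such a network might look like the following. The layer sizes, the sigmoid activations, and the random untrained weights are all assumptions for illustration; a trained network would have learned weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Input -> hidden layer -> single output in (0, 1)."""
    hidden = sigmoid(W_hidden @ x + b_hidden)        # hidden activations
    return float(sigmoid(w_out @ hidden + b_out))    # "likelihood of Rock"

rng = np.random.default_rng(0)
x = rng.normal(size=8)                         # hypothetical track feature vector
W_hidden = rng.normal(scale=0.1, size=(5, 8))  # 8 inputs -> 5 hidden units
b_hidden = np.zeros(5)
w_out = rng.normal(scale=0.1, size=5)          # 5 hidden units -> 1 output
b_out = 0.0

p_rock = forward(x, W_hidden, b_hidden, w_out, b_out)
```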
What’s Deep Learning?
(defn deep-learning? [neural-net] (hidden-layer? neural-net))
We try to learn a new high-level representation by using many more hidden layers
The input is as raw as possible
Mel-spectrum
Deep Neural Network
Backpropagation: the gradient fades quickly
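The fading gradient can be illustrated with a toy chain of scalar sigmoid units. The unit weights and the input value are arbitrary assumptions; the point is only that each sigmoid layer multiplies the gradient by at most 0.25.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_through_layers(n_layers, weight=1.0, x=0.5):
    """|d output / d input| for a chain of n scalar sigmoid units."""
    grad = 1.0
    a = x
    for _ in range(n_layers):
        a = sigmoid(weight * a)
        grad *= weight * a * (1.0 - a)   # chain rule: sigmoid'(z) * weight
    return grad

depths = (1, 2, 4, 8)
grads = [gradient_through_layers(n) for n in depths]
for n, g in zip(depths, grads):
    print(n, g)  # shrinks roughly geometrically with depth
```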
Deep Belief Network
Input (Mel spectrum)
Hidden Layer 1
Hidden Layer 2
Hidden Layer 3
Output: Rock, Jazz, Punk, Electronic
Each pair of adjacent layers, from the input up to Hidden Layer 3, forms a Restricted Boltzmann Machine (RBM)
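Greedy layer-wise pretraining of the stacked RBMs can be sketched with scikit-learn's `BernoulliRBM`. The random input, the layer sizes, and the training settings are placeholders, and the supervised output layer and fine-tuning step are left out.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)

# Hypothetical input: 200 mel-spectrum frames with 40 bands, scaled to [0, 1].
X = rng.random((200, 40))

layer_sizes = [32, 16, 8]   # assumed sizes for Hidden Layers 1, 2 and 3
rbms = []
layer_input = X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=5, random_state=0)
    # Train this RBM on the previous layer's output, then pass its
    # hidden activation probabilities up as input to the next RBM.
    layer_input = rbm.fit_transform(layer_input)
    rbms.append(rbm)

print(layer_input.shape)  # (200, 8)
```

The top-level representation in `layer_input` is what a final classifier over the genre labels would consume.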
Deep Auto Encoders
Input: Mel spectrum
Output: Mel spectrum
Used for denoising
Tools
essentia - audio retrieval algorithms
theano - CPU/GPU symbolic optimization
scikit-learn - machine learning in Python