Machine Learning for Music Faculty of Mathematics and Informatics, SU Petko Nikolov April 8, 2015
Page 1: Machine learning for Music

Machine Learning for Music

Faculty of Mathematics and Informatics, SU

Petko Nikolov

April 8, 2015

Page 2: Machine learning for Music

About Me

Machine Learning

Music Information Retrieval

Machine Learning / Automated Data Science

Page 3: Machine learning for Music

What’s Music Information Retrieval?

Musicology

Computer Science

Signal Processing

Machine Learning

MIR

Page 4: Machine learning for Music
Page 5: Machine learning for Music
Page 6: Machine learning for Music

Music Recommendations

Page 7: Machine learning for Music

Recommending tags

Page 8: Machine learning for Music

Spotify’s Shuffle Mode

● Not really random

● Certainly some processing

● Probably some MIR behind it

Page 9: Machine learning for Music

Pandora’s Music Genome Project

● started in 2000

● 800 000 tracks manually annotated by music experts

● 450 attributes to describe music

● 25 minutes per track to label

Page 10: Machine learning for Music

MIREX

Music Information Retrieval Evaluation eXchange

annual competition featuring more than 20 tasks

state-of-the-art algorithms compete against each other

Page 11: Machine learning for Music

Structured Information

Retrieval

Synthesis

Page 12: Machine learning for Music

fingerprinting

cover song detection

genre recognition

instrument recognition

mood detection

transcription

playlist generation

beat tracking

key detection

pitch tracking

vocal detection

recommendation

audio similarity

source separation

Page 13: Machine learning for Music

genre recognition

instrument recognition

mood detection

vocal detection

audio similarity

Page 18: Machine learning for Music

MIR Architecture

Audio → Segmentation and Preprocessing → Feature Extraction → Machine Learning

Output: classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4 time

Page 19: Machine learning for Music

MIR Architecture

Audio → Segmentation and Preprocessing → Deep Learning

Output: classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4 time

Page 20: Machine learning for Music

Audio signal

Page 21: Machine learning for Music

Audio signal

human hearing: 20 Hz to 20 kHz

Page 29: Machine learning for Music

Segmentation

Frame: 52 ms

f1 f2 f3 f4 … fn
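The framing step can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's implementation: the 16 kHz sample rate, the noise signal, and the choice of non-overlapping frames are assumptions made for the example (overlapping windows are also common), and `split_into_frames` is a hypothetical helper name.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=52.0):
    """Cut a 1-D signal into consecutive, non-overlapping frames (assumption)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len          # the incomplete tail is dropped
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

sample_rate = 16000                               # assumed sample rate
signal = np.random.default_rng(0).standard_normal(sample_rate)  # 1 s of noise
frames = split_into_frames(signal, sample_rate)   # rows are f1, f2, ..., fn
print(frames.shape)
```

Each row of `frames` is one 52 ms frame; longer recordings simply produce more rows.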

Page 30: Machine learning for Music

Spectrum - on frame level

Discrete Fourier Transform (DFT)

time → frequency
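A frame-level spectrum can be sketched with NumPy's real FFT. The Hann window, the 16 kHz sample rate, and the 1 kHz test tone are assumptions chosen for illustration; `magnitude_spectrum` is a hypothetical helper name.

```python
import numpy as np

def magnitude_spectrum(frame):
    """Magnitude of the DFT of one real-valued frame (non-negative frequencies)."""
    window = np.hanning(len(frame))     # taper to reduce spectral leakage (assumption)
    return np.abs(np.fft.rfft(frame * window))

sample_rate = 16000                     # assumed sample rate
n = 832                                 # 52 ms at 16 kHz
t = np.arange(n) / sample_rate
frame = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone

spectrum = magnitude_spectrum(frame)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)   # frequency of each bin in Hz
peak_hz = freqs[np.argmax(spectrum)]              # strongest frequency in the frame
```

The strongest bin should sit at or next to the 1 kHz tone, which is a quick sanity check that the time-to-frequency mapping is set up correctly.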

Page 31: Machine learning for Music

Feature extraction

frame f → feature vector x

Page 32: Machine learning for Music

Spectral Centroid

The frequency at which the ‘center of mass’ of the spectrum lies
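One common way to compute the centroid is the magnitude-weighted mean of the bin frequencies. The sketch below assumes a magnitude spectrum and its bin frequencies are already available; `spectral_centroid` is a hypothetical helper name and the input values are made up for the example.

```python
import numpy as np

def spectral_centroid(magnitudes, freqs):
    """Magnitude-weighted mean frequency of one frame's spectrum."""
    total = magnitudes.sum()
    if total == 0:
        return 0.0                      # silent frame: no meaningful centroid
    return float((freqs * magnitudes).sum() / total)

freqs = np.array([100.0, 200.0, 300.0, 400.0])    # bin frequencies in Hz
flat = np.array([1.0, 1.0, 1.0, 1.0])             # flat spectrum
bright = np.array([0.0, 0.0, 0.0, 2.0])           # all energy in the top bin

centroid_flat = spectral_centroid(flat, freqs)
centroid_bright = spectral_centroid(bright, freqs)
```

A flat spectrum puts the centroid in the middle of the frequency range, while energy concentrated in high bins pulls it upward.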

Page 33: Machine learning for Music

Spectral Slope

Fit a linear regression to the spectrum and take the slope coefficient.
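The regression can be sketched with a degree-1 least-squares fit of magnitude against frequency. `spectral_slope` is a hypothetical helper name and the toy spectrum is invented for the example.

```python
import numpy as np

def spectral_slope(magnitudes, freqs):
    """Slope of the least-squares line fitted to magnitude as a function of frequency."""
    slope, _intercept = np.polyfit(freqs, magnitudes, deg=1)
    return float(slope)

freqs = np.array([0.0, 100.0, 200.0, 300.0])      # bin frequencies in Hz
mags = np.array([4.0, 3.0, 2.0, 1.0])             # energy falls off with frequency

slope = spectral_slope(mags, freqs)               # negative: spectrum tilts downward
```

A negative slope means energy decreases toward higher frequencies, which is typical for many natural sounds.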

Page 37: Machine learning for Music

Spectral Correlation / Variation

Spectral Correlation is the cosine similarity between the frequency vectors of two consecutive frames.

Variation is 1.0 - correlation.
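A minimal sketch of both measures, assuming correlation is the normalized dot product of two magnitude spectra, so that identical spectra give correlation 1 and variation 0. The function names and input vectors are hypothetical.

```python
import numpy as np

def spectral_correlation(prev_mags, cur_mags):
    """Normalized dot product of two consecutive magnitude spectra."""
    denom = np.linalg.norm(prev_mags) * np.linalg.norm(cur_mags)
    if denom == 0:
        return 0.0                      # at least one silent frame
    return float(np.dot(prev_mags, cur_mags) / denom)

def spectral_variation(prev_mags, cur_mags):
    """1.0 - correlation: large when the spectrum changes between frames."""
    return 1.0 - spectral_correlation(prev_mags, cur_mags)

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 0.0, 5.0])
c = np.array([4.0, 0.0, 0.0])

same = spectral_variation(a, a)         # unchanged spectrum
disjoint = spectral_variation(b, c)     # no shared frequency content
```

Sustained notes give low variation from frame to frame; onsets and transitions give high variation.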

Page 39: Machine learning for Music

Feature extraction - Result

One row per frame, one column per feature (centroid, correlation, ...):

f11 f12 f13 f14 f15 ……… f1m

f21 f22 f23 f24 f25 ……… f2m

The number of frames varies across audio recordings.

Page 40: Machine learning for Music

Universal Background Model

Page 41: Machine learning for Music

Gaussian Mixture Model

frame feature vector

Page 42: Machine learning for Music

Gaussian Mixture Model

Multivariate Gaussian Distribution

Page 48: Machine learning for Music

Gaussian Mixture Model - per track

[𝛍1,𝛍2,𝛍3,𝛍4]
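The idea of turning a variable number of frames into one fixed-length vector per track can be sketched with scikit-learn's `GaussianMixture`. This is an illustration under several assumptions: the data is random, four diagonal-covariance components are used to match the four means shown, and each track's model is simply refitted for a few EM iterations starting from the background model. MAP adaptation, a common choice in the UBM literature, is not implemented here, and `track_vector` is a hypothetical helper name.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_features, n_components = 3, 4

# Hypothetical frame-level feature matrices for three tracks of different length.
tracks = [rng.standard_normal((n, n_features)) for n in (200, 350, 120)]

# Background model fitted on the frames of all tracks pooled together.
ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                      random_state=0)
ubm.fit(np.vstack(tracks))

def track_vector(frames):
    """Refit a GMM for one track, starting from the background model,
    and concatenate its component means into one fixed-length vector."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          weights_init=ubm.weights_, means_init=ubm.means_,
                          precisions_init=ubm.precisions_,
                          max_iter=5,   # few iterations keep components near the UBM
                          random_state=0)
    gmm.fit(frames)
    return gmm.means_.ravel()           # [mu1, mu2, mu3, mu4] flattened

X = np.array([track_vector(t) for t in tracks])   # one row per track
```

Every track now has a vector of the same length regardless of how many frames it contains, so it can be fed to a standard classifier.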

Page 49: Machine learning for Music

Classification - Example Neural Net

Layers: Input, Hidden, Output

Input: feature vector

Output: likelihood of Rock?

Diagram labels: aik, wk
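A forward pass through such a network can be sketched in NumPy. The layer sizes, random untrained weights, and sigmoid activations are assumptions for illustration; a real classifier would learn the weights from labeled tracks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input -> hidden -> output; returns a value in (0, 1)."""
    hidden = sigmoid(w_hidden @ x + b_hidden)     # hidden-layer activations
    return float(sigmoid(w_out @ hidden + b_out)) # single output unit

rng = np.random.default_rng(0)
x = rng.standard_normal(12)                       # hypothetical track feature vector
w_hidden = rng.standard_normal((5, 12)) * 0.1     # untrained weights (assumption)
b_hidden = np.zeros(5)
w_out = rng.standard_normal(5) * 0.1
b_out = 0.0

p_rock = forward(x, w_hidden, b_hidden, w_out, b_out)
```

With trained weights, `p_rock` would be read as the network's estimate of how likely the track is to be Rock.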

Page 52: Machine learning for Music

What’s Deep Learning?

(defn deep-learning? [neural-net] (hidden-layer? neural-net))

We try to learn a new high-level representation by using many more hidden layers.

The input is as raw as possible.

Page 53: Machine learning for Music

Mel-spectrum
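A mel spectrum can be sketched as a bank of triangular filters, spaced evenly on the mel scale, applied to a frame's magnitude spectrum. The 40 bands, 417 FFT bins, and 16 kHz sample rate are assumptions for the example, the Hz-to-mel formula is one widely used variant, and the helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft_bins, sample_rate):
    """Triangular filters with centers evenly spaced on the mel scale."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft_bins)   # bin frequencies
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0),
                                  hz_to_mel(sample_rate / 2.0),
                                  n_mels + 2))                # filter edge frequencies
    bank = np.zeros((n_mels, n_fft_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)        # ramp up from lo to mid
        falling = (hi - freqs) / (hi - mid)       # ramp down from mid to hi
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

bank = mel_filterbank(n_mels=40, n_fft_bins=417, sample_rate=16000)
spectrum = np.ones(417)                 # placeholder magnitude spectrum
mel_spectrum = bank @ spectrum          # one value per mel band
```

The filters are narrow at low frequencies and wide at high frequencies, which roughly mirrors how pitch resolution in human hearing coarsens as frequency rises.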

Page 57: Machine learning for Music

Deep Neural Network

Backpropagation: the gradient fades quickly as it is propagated back through many layers
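A rough way to see why the gradient fades: the derivative of the sigmoid is at most 0.25, and backpropagation multiplies one such factor per layer. The sketch below assumes sigmoid units and ignores the weights entirely, so it is only an upper-bound illustration; `gradient_bound` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def gradient_bound(depth):
    """Largest possible product of `depth` sigmoid derivatives (weights ignored)."""
    max_slope = sigmoid_derivative(0.0)   # the derivative peaks at z = 0
    return max_slope ** depth

for depth in (1, 3, 5, 10):
    print(depth, gradient_bound(depth))
```

Even in this best case the factor shrinks geometrically with depth, so layers close to the input receive very small updates.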

Page 58: Machine learning for Music

Deep Belief Network

Input (Mel spectrum) → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output

RBM (Restricted Boltzmann Machine) between each pair of consecutive layers up to Hidden Layer 3

Output: Rock, Jazz, Punk, Electronic

Page 59: Machine learning for Music

Deep Belief Network

Input (Mel spectrum)

Hidden Layer 1

Restricted Boltzmann Machine
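Greedy layer-wise pretraining, where each RBM is trained on the output of the one below it, can be sketched with scikit-learn's `BernoulliRBM`. The random input data, layer sizes, and training settings are assumptions for illustration, and the supervised output layer and fine-tuning are left out.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((100, 40))               # hypothetical mel-spectrum frames in [0, 1]

layer_sizes = [32, 16, 8]               # hidden layers 1, 2, 3 (assumed sizes)
rbms = []
layer_input = X
for size in layer_sizes:
    rbm = BernoulliRBM(n_components=size, learning_rate=0.05,
                       n_iter=5, random_state=0)
    # Train this RBM on the layer below, then pass its hidden activations upward.
    layer_input = rbm.fit_transform(layer_input)
    rbms.append(rbm)

features = layer_input                  # activations of hidden layer 3
```

A classifier over genres such as Rock, Jazz, Punk, and Electronic would then be trained on `features`, typically followed by fine-tuning of the whole stack.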

Page 63: Machine learning for Music

Deep Auto Encoders

Input: Mel spectrum

Output: Mel spectrum

Used for denoising

Page 64: Machine learning for Music

Tools

essentia - audio retrieval algorithms

theano - CPU/GPU symbolic optimization

scikit-learn - machine learning in Python