MUSIC MATRIX – AUTOMATED GENRE CLASSIFICATION SAMPURN RATTAN 10503864
Jun 20, 2015
INTRODUCTION
AUTOMATED GENRE CLASSIFICATION
*Automated genre classification is the process of assigning a musical piece to a genre, using machine learning and related algorithms, so that users can search, browse, and organize their music catalogues.
*In simple terms, your songs are sorted by genre without any intervention or effort on your part.
AUTOMATED GENRE CLASSIFICATION
*FEATURE EXTRACTION
*CLASSIFICATION
FEATURE EXTRACTION
*Each digital audio file has measurable features, which are extracted for the purpose of genre identification.
*These features fall into three categories: timbre, pitch and rhythm.
FEATURE EXTRACTION
*Timbre – the quality that distinguishes different types of sound production, such as voices and musical instruments (string, wind, and percussion).
*Pitch – the perception-based quality that allows ordering of sound on a frequency-related scale.
*Rhythm – the timing of musical sounds and silences on a human scale.
*A list of features of an audio file
FEATURE EXTRACTION
*Some formulae and procedures used to calculate features
*Zero Crossings Rate
int count = 0; // number of sign changes in the sample window
for (int samp = 0; samp < samples.length - 1; samp++)
{
    if (samples[samp] > 0.0 && samples[samp + 1] < 0.0)
        count++; // positive to negative
    else if (samples[samp] < 0.0 && samples[samp + 1] > 0.0)
        count++; // negative to positive
    else if (samples[samp] == 0.0 && samples[samp + 1] != 0.0)
        count++; // leaving an exact zero
}
FEATURE EXTRACTION
*Beat Sum
double sum = 0.0;
for (int i = 0; i < beat_histogram.length; i++)
    sum += beat_histogram[i]; // add up every bin of the beat histogram
double[] result = new double[1];
result[0] = sum;
return result;
FEATURE EXTRACTION
*Strongest Frequency Via Zero Crossings
result = (zero_crossings / 2.0) * (sampling_rate / (double) samples.length)
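As a rough illustration of the formula above, the sketch below counts zero crossings in a synthetic tone and converts the count to a frequency estimate. The class name, method names, and the sine test signal are assumptions made for this example and are not taken from the project.

```java
// Illustrative sketch; class and method names are assumptions, not the project's code.
final class StrongestFrequencySketch {

    // Counts sign changes between consecutive samples (same three cases as the slide).
    static int countZeroCrossings(double[] samples) {
        int count = 0;
        for (int i = 0; i < samples.length - 1; i++) {
            double a = samples[i];
            double b = samples[i + 1];
            if ((a > 0.0 && b < 0.0) || (a < 0.0 && b > 0.0) || (a == 0.0 && b != 0.0)) {
                count++;
            }
        }
        return count;
    }

    // One cycle of a simple tone crosses zero twice, so crossings / 2 estimates
    // the cycles in the window; sampling_rate / N converts that count to Hz.
    static double strongestFrequency(double[] samples, double samplingRate) {
        int zeroCrossings = countZeroCrossings(samples);
        return (zeroCrossings / 2.0) * (samplingRate / (double) samples.length);
    }

    // Synthetic test signal: a pure sine with a small phase offset,
    // chosen to avoid starting on an exact zero.
    static double[] sine(double frequencyHz, double samplingRate, int length) {
        double[] out = new double[length];
        for (int i = 0; i < length; i++) {
            out[i] = Math.sin(2.0 * Math.PI * frequencyHz * i / samplingRate + 0.1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] oneSecond = sine(440.0, 44100.0, 44100);
        System.out.println(strongestFrequency(oneSecond, 44100.0));
    }
}
```

For a one-second 440 Hz tone sampled at 44.1 kHz the estimate should land close to 440 Hz. The estimate is only meaningful when a single tone dominates the signal.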
CLASSIFICATION
*The extracted features are then used to identify the genre with one or more classification or clustering algorithms.
*Approaches fall into two broad groups: unsupervised and supervised.
CLASSIFICATION
*Unsupervised approaches have no prior knowledge of genres. The classifier can observe where the data lies in the feature space, but does not know which genre each cluster of data corresponds to.
*Unsupervised classifiers: K-means, Agglomerative hierarchical clustering, Self-organizing Map (SOM), Growing hierarchical Self-organizing Map (GHSOM).
CLASSIFICATION
*In supervised approaches, the system is first trained on manually labeled data; when new, unlabeled data arrives, the trained system classifies it into a known genre.
*Supervised classifiers: K-nearest neighbor (KNN), Gaussian Mixture Model (GMM), Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), Artificial Neural Networks (ANNs).
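To make the supervised idea concrete, here is a minimal sketch of one of the listed classifiers, K-nearest neighbor, over small feature vectors. The class name, the two-dimensional feature vectors, and the genre labels are hypothetical and chosen only for illustration; this is not the classifier used in the project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; names and data layout are assumptions, not the project's code.
final class KnnSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the majority genre among the k labeled vectors closest to the query.
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        double[] dist = new double[train.length];
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < train.length; i++) {
            dist[i] = squaredDistance(train[i], query);
            order[i] = i;
        }
        // Sort training indices from nearest to farthest.
        Arrays.sort(order, (x, y) -> Double.compare(dist[x], dist[y]));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) { // ties go to the label that reached the count first
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{0.10, 0.20}, {0.15, 0.25}, {0.90, 0.80}, {0.85, 0.90}};
        String[] labels = {"classic", "classic", "pop", "pop"};
        System.out.println(classify(train, labels, new double[] {0.12, 0.22}, 3));
    }
}
```

The training arrays play the role of the manually labeled data; the query vector stands in for a new, unlabeled song.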
CLASSIFICATION
*A fuzzy inference system is implemented.
*It is a supervised classifier.
*Rules are manually created.
*The rules are then applied to two feature sets, and the output is evaluated.
*Feature set 1 = (Zero Crossings, Beat Sum, Strongest Frequency)
*Feature set 2 = (MFCC)
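The slides do not list the actual rules, so the following is only a sketch of how hand-written fuzzy rules might be evaluated. The two rules, the linear membership functions, the use of minimum as fuzzy AND, and the assumption that features are normalized to [0, 1] are all hypothetical.

```java
// Illustrative sketch; the rules and membership functions below are hypothetical.
final class FuzzyGenreSketch {

    static double clamp01(double x) {
        return Math.max(0.0, Math.min(1.0, x));
    }

    // Linear membership functions over features assumed normalized to [0, 1].
    static double high(double x) {
        return clamp01(x);
    }

    static double low(double x) {
        return 1.0 - clamp01(x);
    }

    // Hypothetical rule 1: IF zero crossings is HIGH AND beat sum is HIGH THEN pop.
    static double popDegree(double zeroCrossings, double beatSum) {
        return Math.min(high(zeroCrossings), high(beatSum)); // fuzzy AND as minimum
    }

    // Hypothetical rule 2: IF zero crossings is LOW AND beat sum is LOW THEN classic.
    static double classicDegree(double zeroCrossings, double beatSum) {
        return Math.min(low(zeroCrossings), low(beatSum));
    }

    public static void main(String[] args) {
        double zcr = 0.8;
        double beat = 0.6;
        System.out.println("pop: " + popDegree(zcr, beat));
        System.out.println("classic: " + classicDegree(zcr, beat));
    }
}
```

Each rule returns a degree between 0 and 1 rather than a yes/no answer, so one song can belong partly to several genres at once.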
CLASSIFICATION
*Classification results
                                Accuracy    Hits Ratio
Feature Set 1 (ZCR + BS + SF)   85.0%       65.38%
Feature Set 2 (MFCC)            72.5%       65.9%
MUSIC MATRIX
*The “front-end” of my project.
*The Music Matrix is an N×N matrix in which each cell represents a list of songs that belong, in a fuzzy manner, to one or more genres.
*This system clearly demonstrates multi-label songs.
MUSIC MATRIX
*For example, choosing a cell in the following matrix may play a list of songs that are 60%-70% classic and 10%-15% pop.
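One possible reading of a matrix cell is a pair of membership ranges, one per genre, with the cell listing every song whose fuzzy memberships fall inside both ranges. The sketch below follows that reading; the `Song` class, the genre indices, and the range parameters are assumptions for illustration, not the project's data model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the data model is an assumption, not the project's code.
final class MusicMatrixSketch {

    static final class Song {
        final String title;
        final double[] memberships; // memberships[g] = degree in [0, 1] for genre index g

        Song(String title, double[] memberships) {
            this.title = title;
            this.memberships = memberships;
        }
    }

    // Returns titles whose membership in the row genre and in the column genre
    // both fall inside the ranges attached to the chosen cell.
    static List<String> songsInCell(List<Song> songs,
                                    int rowGenre, double rowMin, double rowMax,
                                    int colGenre, double colMin, double colMax) {
        List<String> titles = new ArrayList<>();
        for (Song s : songs) {
            double r = s.memberships[rowGenre];
            double c = s.memberships[colGenre];
            if (r >= rowMin && r <= rowMax && c >= colMin && c <= colMax) {
                titles.add(s.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Song> songs = new ArrayList<>();
        songs.add(new Song("Song A", new double[] {0.65, 0.12})); // index 0 = classic, 1 = pop
        songs.add(new Song("Song B", new double[] {0.90, 0.05}));
        System.out.println(songsInCell(songs, 0, 0.60, 0.70, 1, 0.10, 0.15));
    }
}
```

With genre index 0 standing for classic and index 1 for pop, the call in `main` corresponds to the "60%-70% classic, 10%-15% pop" cell from the example.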
PROBLEMS
*Huge size of genre (and sub-genre) list.
*Non-agreement on taxonomies: well-known websites use genre lists of very different sizes, e.g. Allmusic (http://www.allmusic.com, 531 genres), Amazon (http://www.amazon.com, 719 genres), and Mp3 (http://www.mp3.com, 430 genres).
*Trivialization of music art.
*Classification Basis
PROBLEMS
*Fuzzy definition of genres
*Differences in human perception
*Scalability of any AMC system
CONCLUSIONS & FINDINGS
*Automated Genre Classification is a non-trivial task.
*Emotion and music-matching is subjective.
*The problems of genre taxonomy carry over into Automated Genre Classification.
*Extraction of all features of an audio file is not only unnecessary, but also counterproductive.
*Different combinations of extracted features and classification algorithms yield results of differing accuracy.
*A combination of low-level signal properties such as zero-crossing rate, spectral centroid and skewness, mean energy, etc. and perception-based features such as MFCCs, beat histograms, etc. may be the most appropriate set.
*Multi-label classification is the most appropriate for real-world use.
*A fuzzy classification algorithm must be used to allow for multi-label songs.
*Many novelty functions have been created, but they tend to return less accurate results.
*The techniques used for Automated Genre Classification can also be used to find similar songs, which may help in copyright and IPR protection.
Ref: http://www.thatsongsoundslike.com/
Q & A