THE KIKI-BOUBA CHALLENGE: ALGORITHMIC COMPOSITION FOR CONTENT-BASED MIR RESEARCH & DEVELOPMENT BOB L. STURM AUDIO ANALYSIS LAB AALBORG UNIVERSITY DENMARK.


The Kiki-Bouba Challenge: Algorithmic composition for content-based MIR Research & Development
Bob L. Sturm, Audio Analysis Lab, Aalborg University, Denmark
Nick Collins, Department of Music, Durham University, U.K.

Problems in content-based MIR research
There are at least six problems inhibiting research in content-based MIR:

1) formal and explicit problem definitions
2) quantity of data
3) ground truth
4) conflicts with intellectual property law
5) evaluation and validity
6) research reproducibility

Formal problem definitions are rare (if not non-existent)
A problem must be formally defined if it is to be tackled successfully with formal algorithms. What is music? What is content? What is information? How is information related to recorded music?
See: Sturm, Bardeli, Langlois, Emiya, Formalizing the problem of music description, this conference.
(Hint: The complete specification of the use case is essential.)
Datasets are often used as a proxy to define a problem. Whatever the problem is, it remains implicit and undefined. Consider a system that reproduces the labels of a dataset: what problem is it actually solving?

The need for a lot of data
How much data is enough, and how should it be collected? The correct answers depend critically on the formal definition of the problem (see previous slide).

Ground truth can be problematic
Its definition and collection depend critically on the formal definition of the problem (see previous slide). Humans are expensive, and can be unreliable.

Intellectual property law conflicts with the needs of the data scientist, machine learning algorithms, etc.

Validity in evaluation
When a problem is not explicitly defined, how can one design a valid experiment to evaluate a solution? Convenience and prevalence trump validity and relevance.
B. L. Sturm, The state of the art ten years after a state of the art: Future research in music information retrieval, J. New Music Research, 43(2): 147-172, 2014.

Music listening can be an illusion!
B. L. Sturm, Classification accuracy is not enough: On the evaluation of music genre recognition systems, J. Intell. Info. Systems, 41(3): 371-406, 2013.
B. L. Sturm, A simple method to determine if a music information retrieval system is a horse, IEEE Trans. Multimedia 16(6): 1636-1644, 2014.

So, what if there were a content-based MIR problem that can be defined explicitly and formally, has limitless data with perfect ground truth unencumbered by intellectual property law, requires valid evaluation to solve, and facilitates reproducibility?

If one can't solve such a simplified problem, then why hope to solve one that is not explicitly defined, using limited data of questionable relevance?

This simplification is what we attempt with the Kiki-Bouba Challenge (KBC).

KBC: The abstract problem
Develop a system that can discriminate (task 1), identify (task 2), recognize (task 3), and imitate (task 4) Aristotelian categories of music.

Domain of KBC
The music universe (sample space) is populated by music belonging to one of two categories, Kiki and Bouba.

We compose music of each category algorithmically. This provides a limitless amount of data free of copyright, unambiguous rules for categorisation (Aristotelian: membership is decided by necessary and sufficient conditions), and perfect ground truth.

1) Discrimination task
Given an unlabeled set of music recordings from the music universe, build a system that determines that there exist two categories in this music universe, and what high-level criteria (content) discriminate them.
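The composition rules of the actual realisation are not given in this text (its code is at http://composerprogrammer.com/kikibouba.html). The following is only a minimal sketch of why algorithmic composition yields limitless, copyright-free data with perfect ground truth. The rules (high, short notes for Kiki; low, long notes for Bouba), the note representation, and all function names are hypothetical assumptions made for illustration.

```python
import random

# HYPOTHETICAL rules for illustration only; the authors' realisation uses its own rules.
RULES = {
    "Kiki":  {"pitches": range(72, 97), "durations": (0.0625, 0.125, 0.25)},  # high, short notes
    "Bouba": {"pitches": range(36, 61), "durations": (1.0, 1.5, 2.0)},        # low, long notes
}

def compose(category, rng, n_notes=16):
    """Compose a piece as a list of (MIDI pitch, duration in seconds) obeying the category's rules."""
    rules = RULES[category]
    return [(rng.choice(rules["pitches"]), rng.choice(rules["durations"]))
            for _ in range(n_notes)]

def category_of(piece):
    """Aristotelian ground truth: a piece belongs to a category iff every note satisfies its rules."""
    for name, rules in RULES.items():
        if all(p in rules["pitches"] and d in rules["durations"] for p, d in piece):
            return name
    return "neither"

def make_dataset(n_per_class, seed=0):
    """Any number of labelled pieces, free of copyright, with labels correct by construction."""
    rng = random.Random(seed)
    return [(compose(label, rng), label) for label in RULES for _ in range(n_per_class)]
```

Because the pitch ranges are disjoint, no piece can satisfy both rule sets, and a piece mixing the two, such as `category_of([(80, 0.125), (40, 1.0)])`, is labelled "neither", which is the third outcome of the identification task.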

Unsupervised learning, BUT ensuring discrimination is caused by content (and not by criteria irrelevant to the task).

2) Identification task
Build a system that identifies, using high-level criteria (content), recordings of music (either from this music universe or from another) as being Kiki, Bouba, or neither.

Supervised learning, BUT ensuring identification is caused by content (and not by criteria irrelevant to the task).

3) Recognition task
Build a system that recognizes high-level content in real-world music recordings as being similar to content in music from Kiki, Bouba, both, or neither.

Relevance ranking, BUT ensuring recognition and ranking are caused by content (and not by criteria irrelevant to the task).

4) Imitation task
Build a system that composes music having high-level content similar to that in music from Kiki and/or Bouba.

Reverse engineering the compositional rules of the music universe by listening.

Of over 500 papers proposing music genre recognition systems, only 3 attempt this.
See: B. L. Sturm, A survey of evaluation in music genre recognition, in Post-proc. 2012 Adaptive Multimedia Retrieval, 2014.



An example realisation
http://composerprogrammer.com/kikibouba.html

An unacceptable solution (for our realisation)
For the identification task, one proposes the following system:
- bags of frames of features (BFFs): the number of zero crossings in a frame of duration 46.3 ms, summarised by the mean and variance over 129 consecutive frames (50% overlap)
- normalised dimensions
- a nearest neighbor classifier
- the majority of the classifications of the first 10 BFFs
- a training dataset of 250 observations of each class (though many more can be produced)
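The system above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the blocks of 129 frames are assumed to be consecutive and non-overlapping, "normalised dimensions" is taken to mean z-scoring with training-set statistics, the distance is Euclidean, and the class name `ZeroCrossingNN` is hypothetical. Plain Python lists stand in for audio signals.

```python
import math
from collections import Counter

FRAME_S = 0.0463      # frame duration from the slide: 46.3 ms
FRAMES_PER_BFF = 129  # consecutive frames summarised by one BFF
N_VOTES = 10          # only the first 10 BFFs of a recording vote

def zero_crossings(frame):
    """Count sign changes between adjacent samples (zero is treated as non-negative)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def bffs(signal, sr):
    """Mean and variance of per-frame zero-crossing counts over blocks of 129 frames.

    Assumption: blocks are consecutive and non-overlapping.
    """
    n = max(2, round(FRAME_S * sr))
    hop = n // 2  # 50% overlap between frames
    counts = [zero_crossings(signal[i:i + n])
              for i in range(0, len(signal) - n + 1, hop)]
    feats = []
    for start in range(0, len(counts) - FRAMES_PER_BFF + 1, FRAMES_PER_BFF):
        block = counts[start:start + FRAMES_PER_BFF]
        mu = sum(block) / len(block)
        var = sum((c - mu) ** 2 for c in block) / len(block)
        feats.append((mu, var))
    return feats

class ZeroCrossingNN:
    """Hypothetical name: 1-nearest-neighbour over z-scored BFFs, majority vote per recording."""

    def fit(self, signals, labels, sr):
        self.sr = sr
        pairs = [(f, y) for s, y in zip(signals, labels) for f in bffs(s, sr)]
        dims = list(zip(*(f for f, _ in pairs)))
        self.mean = [sum(d) / len(d) for d in dims]
        # Guard against a constant dimension (zero standard deviation).
        self.std = [math.sqrt(sum((x - m) ** 2 for x in d) / len(d)) or 1.0
                    for d, m in zip(dims, self.mean)]
        self.train = [(self._normalise(f), y) for f, y in pairs]
        return self

    def _normalise(self, f):
        return [(x - m) / s for x, m, s in zip(f, self.mean, self.std)]

    def predict(self, signal):
        feats = bffs(signal, self.sr)[:N_VOTES]
        if not feats:
            raise ValueError("recording too short for a single BFF")
        votes = []
        for f in feats:
            z = self._normalise(f)
            _, label = min(self.train, key=lambda item: math.dist(z, item[0]))
            votes.append(label)
        return Counter(votes).most_common(1)[0][0]
```

Taking the slide's numbers at face value, the hop is 23.15 ms, so one BFF spans 128 × 23.15 + 46.3 ≈ 3009.5 ms, about 3 s of audio; under the non-overlapping assumption, the first 10 BFFs cover roughly the first 29.9 s of a recording. Nothing in this pipeline refers to pitch, rhythm, or any other musical content: it sees only zero-crossing statistics.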

An unacceptable solution (for our realisation)
For the identification task, we evaluate the system in the convenient (but, with respect to the task, invalid) way: have the system process the test observations, compare its outputs to the ground truth (Classify), and measure accuracy.
- stratified test set, 500 observations
- Accuracy: 100%!
Publish immediately?
What high-level criteria (content) is this system using to identify the classes of the recordings?
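A minimal sketch of this Classify procedure, assuming observations arrive as (observation, label) pairs and that the system is any callable returning a label; the function names are hypothetical. It makes the limitation concrete: the returned number only counts agreement with the labels and carries no information about which criteria produced the outputs.

```python
import random
from collections import defaultdict

def stratified_sample(data, n_per_class, seed=0):
    """Draw the same number of (observation, label) pairs from each class."""
    by_label = defaultdict(list)
    for x, y in data:
        by_label[y].append((x, y))
    rng = random.Random(seed)
    return [pair for y in sorted(by_label) for pair in rng.sample(by_label[y], n_per_class)]

def classify_accuracy(predict, test_set):
    """Classify: run the system on each test observation and compare outputs to ground truth.

    This measures agreement with labels only; it says nothing about which
    criteria the system used to produce them.
    """
    if not test_set:
        raise ValueError("empty test set")
    hits = sum(predict(x) == y for x, y in test_set)
    return hits / len(test_set)
```

With two balanced classes, a system that always answers "Kiki" scores exactly 0.5, which is the chance baseline against which any accuracy on such a stratified set should be read.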

An unacceptable solution (for our realisation)
Let's measure the responses of the same system to irrelevantly transformed observations:

Accuracy now: 50%, chance level for two balanced classes.
Why is the system no longer listening to the music? It never was listening. Its earlier perfect performance resulted from exploiting the confounding of zero-crossing statistics with the class labels.


B. L. Sturm, A simple method to determine if a music information retrieval system is a horse, IEEE Trans. Multimedia 16(6): 1636-1644, 2014.

A horse is not necessarily useless, but a horse does not solve KBC. If you can elicit any response from a system by changing irrelevant factors, then you have a horse.
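The criterion above can be turned into a small check. This is a minimal sketch of the idea only, not the method of the cited paper: the toy zero-crossing classifier, its threshold, the 44.1 kHz sample rate, and the two transformations are all assumptions. In particular, treating a constant (DC) offset and a tone at the 22.05 kHz Nyquist frequency as irrelevant to Kiki/Bouba content is an assumption made for illustration.

```python
import math

SR = 44100  # assumed sample rate of the recordings

def zcr(x):
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign."""
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0)) / (len(x) - 1)

def toy_system(x, threshold=0.02):
    """Stand-in for a zero-crossing-based classifier (threshold is hypothetical)."""
    return "Kiki" if zcr(x) > threshold else "Bouba"

# Transformations ASSUMED irrelevant to Kiki/Bouba content, for signals with |x| <= 0.4:
# a constant (DC) offset is not heard as sound, and a tone at 22.05 kHz (the Nyquist
# frequency at 44.1 kHz) lies above the usual range of human hearing.
IRRELEVANT = {
    "identity": lambda x: list(x),
    "dc_offset": lambda x: [s + 0.5 for s in x],
    "nyquist_tone": lambda x: [s + 0.5 * (-1) ** k for k, s in enumerate(x)],
}

def elicit(predict, x, target, transforms=IRRELEVANT):
    """Names of irrelevant transformations under which the system outputs `target` for x."""
    return [name for name, t in transforms.items() if predict(t(x)) == target]

def is_horse(predict, observations, labels=("Kiki", "Bouba"), transforms=IRRELEVANT):
    """True if every label can be elicited from every observation by irrelevant changes alone."""
    return all(elicit(predict, x, y, transforms) for x in observations for y in labels)
```

For a 3 kHz sine of amplitude 0.4, the DC offset removes every zero crossing and elicits "Bouba"; for a 100 Hz sine, the Nyquist tone makes every adjacent pair change sign and elicits "Kiki". The toy system is therefore a horse, whereas a system whose answer never changes cannot be driven to every label this way.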

Overview
KBC is an abstract challenge, from which one generates a realisation and solves it. KBC is a simplification of the problem of music genre recognition (or autotagging, or music description).
KBC uses algorithmic composition to neutralize:
- the lack of definition of a problem involving complex, culturally negotiated concepts that are incompatible with Aristotelian categorization, let alone with the formal nature of algorithms;
- limited data that cannot be shared legally.
The incentive of KBC is to solve problems in content-based MIR, rather than to appear to solve them. Solving KBC does not mean reproducing the most ground truth.

Ecological validity?
One might say KBC is too much of a simplification: it doesn't involve real music, so a solution to it will not be useful in the real world. The aim of KBC, however, is not the solution itself, but solving the problem. To solve KBC requires developing machine listening at the content level, AND implementing an evaluation that is valid with respect to the relevant hypotheses. Having solved the simple problem, one can then include more Aristotelian classes, overlap or blending between classes, etc.

Conclusions
There is more to be done by way of definition:
- the specification of success criteria for each task, which hinges upon the definition of content in a realisation;
- the design of valid evaluation approaches for each task.

We hope KBC encourages new and progressive approaches to solving problems in content-based MIR. Successfully addressing some problems in MIR requires listening, not reproducing ground truth by any means.

You can find the code of our KBC realisation here: http://composerprogrammer.com/kikibouba.html

Thank you
The work of BLS was supported in part by:
- Independent Postdoc Grant 11-105218 from Det Frie Forskningsråd (2012-2013), Danish Research Council
- AD:MT departmental fellowship at AAU

