
xkcd/655

Dec 30, 2015

noel-galloway

Page 1: xkcd/655/

http://www.xkcd.com/655/

Page 2

Audio Retrieval

David Kauchak, CS160

Fall 2009

Thanks to Doug Turnbull for some of the slides.

Page 3

Administrative

CS Colloquium vs. Wed. before Thanksgiving

Page 4

Diagram: producers, consumers, and music technology

8M artists
150M songs
9B songs
93M Americans
250M iPods
1M downloads/month

Page 5

Audio index construction

Audio files to be indexed (wav, midi, mp3) go through audio preprocessing and an indexer to produce the index.

The index may be keyed off of text (e.g., slow, jazzy, punk) or off of audio features.
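A minimal sketch of an index keyed off of text, assuming each song already carries a list of tags (the song IDs and tags below are hypothetical). An inverted index maps each tag to the set of songs annotated with it, and a multi-tag query intersects those sets.

```python
from collections import defaultdict


def build_tag_index(song_tags):
    """Map each (lower-cased) tag to the set of songs annotated with it.

    song_tags: dict of song id -> iterable of tag strings (hypothetical input format).
    """
    index = defaultdict(set)
    for song, tags in song_tags.items():
        for tag in tags:
            index[tag.strip().lower()].add(song)
    return index


def lookup(index, query_tags):
    """Return the songs annotated with every query tag (boolean AND)."""
    postings = [index.get(tag.strip().lower(), set()) for tag in query_tags]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical songs and tags, for illustration only
songs = {
    "song_a": ["slow", "jazzy"],
    "song_b": ["punk"],
    "song_c": ["slow", "jazzy", "punk"],
}
index = build_tag_index(songs)
print(sorted(lookup(index, ["slow", "jazzy"])))  # ['song_a', 'song_c']
```

Keying the index off of audio features instead would replace this exact tag lookup with a similarity search over feature vectors.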

Page 6

Audio retrieval

Query → Index

Systems differ in what the query input is and in how they determine the result.

Page 7

Song identification

Given an audio signal, tell me what the “song” is.

Examples: Query by Humming, 1-866-411-SONG, Shazam, bird song identification, …

Page 8

Song identification

How might you do this?

Query by humming → “song” name

Page 9

Song identification

(Diagram: identifying the “song” name)

Page 10

Song similarity

Find the songs that are most similar to the input song.

Examples: Genius, Pandora, Last.fm

Page 11

Song similarity

How might you do this?

IR approach: represent the input song and every indexed song as a feature vector (f1 f2 f3 … fn), then rank the indexed songs by their cosine similarity to the input song.
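The IR approach can be sketched as follows, assuming each song has already been reduced to a fixed-length feature vector. The three-dimensional vectors here are made up; real audio features would come from the preprocessing step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, song_vecs):
    """Return (song, score) pairs sorted from most to least similar to the query."""
    scores = [(song, cosine(query_vec, vec)) for song, vec in song_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors (f1, f2, f3)
songs = {
    "song1": [1.0, 0.0, 2.0],
    "song2": [0.0, 3.0, 0.0],
    "song3": [2.0, 0.1, 4.0],
}
for song, score in rank_by_cosine([1.0, 0.0, 2.0], songs):
    print(song, round(score, 3))
```

Cosine similarity ignores vector length, so a song whose features are a scaled copy of the query's still counts as maximally similar.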

Page 12

Song similarity: collaborative filtering

Diagram: a matrix relating users (user 1, user 2, user 3, user 4, …, user n) to songs (song1, song2, song3, …, songm).

What might you conclude from this information?
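One way to use such a matrix, assumed here rather than taken from the slides, is item-based collaborative filtering: songs whose columns look alike (liked by the same users) are treated as similar. The 0/1 matrix below is hypothetical, with 1 meaning the user listened to or liked the song.

```python
import math


def song_similarity(matrix, i, j):
    """Cosine similarity between song columns i and j of a user-by-song matrix."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm = math.sqrt(sum(a * a for a in col_i)) * math.sqrt(sum(b * b for b in col_j))
    return dot / norm if norm else 0.0


def most_similar_songs(matrix, song, k=2):
    """Return the k (song index, similarity) pairs closest to `song`."""
    n_songs = len(matrix[0])
    scores = [(other, song_similarity(matrix, song, other))
              for other in range(n_songs) if other != song]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical data: rows are users, columns are songs; 1 = listened to / liked.
ratings = [
    [1, 1, 0, 0],  # user 1
    [1, 1, 0, 1],  # user 2
    [0, 0, 1, 1],  # user 3
    [1, 1, 1, 0],  # user 4
]
print(most_similar_songs(ratings, 0))
```

Here every user who liked song 0 also liked song 1, so song 1 comes out as its nearest neighbour, without looking at the audio at all.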

Page 13

Songs using descriptive text search

Query: jazzy, smooth, easy listening

Examples: very few commercial systems work like this …

Page 19

Music annotation

The key to a keyword-based system is annotating the music with tags.

dance, instrumental, rock

blues, saxophone, cool vibe

pop, Ray Charles, deep

Ideas?

Page 20

Annotating music

The human approach

“expert musicologists” from Pandora

Pros/Cons?

Page 21

Annotating music

Another human approach: games

Page 22

Annotating music

The web: music reviews

Challenges?

Page 23

Automatically annotating music

Learning a music tagger

Diagram: a song's signal and its review feed into the tagger, which outputs tags such as: blues, saxophone, cool vibe

Page 24

Automatically annotating music

Learning a music tagger

Diagram: song signal, review → tagger

What are the tasks we need to accomplish?

Page 25

System Overview

Diagram components: Training Data, Vocabulary, Annotation, Document Vectors (y), Audio-Feature Extraction (X), Features, Parameter Estimation, Model

Page 26

Automatically annotating music

Frank Sinatra - Fly Me to the Moon

This is a jazzy, singer / songwriter song that is calming and sad. It features acoustic guitar, piano, saxophone, a nice male vocal solo, and emotional, high-pitched vocals. It is a song with a light beat and a slow tempo.

Dr. Dre (feat. Snoop Dogg) - Nuthin' but a 'G' thang

This is a dance poppy, hip-hop song that is arousing and exciting. It features drum machine, backing vocals, male vocal, a nice acoustic guitar solo, and rapping, strong vocals. It is a song that is very danceable and with a heavy beat.

First step: extract “tags” from the reviews.
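A minimal sketch of this first step, assuming a fixed vocabulary and plain phrase matching (negations such as "not sad" are ignored). The vocabulary below is a hypothetical subset; the output is a binary document vector of the kind labelled y in the system overview.

```python
import re


def normalize(text):
    """Lower-case and turn punctuation into spaces, so 'hip-hop' matches 'hip hop'."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def document_vector(review, vocabulary):
    """Binary vector y: 1 if the vocabulary term appears as a whole phrase in the review.

    Padding with spaces keeps 'male vocal' from matching inside 'female vocal'.
    """
    padded = f" {normalize(review)} "
    return [1 if f" {normalize(term)} " in padded else 0 for term in vocabulary]


# Hypothetical subset of a tag vocabulary
vocabulary = ["jazzy", "calming", "sad", "saxophone", "male vocal",
              "hip-hop", "drum machine", "slow tempo"]
sinatra = ("This is a jazzy, singer / songwriter song that is calming and sad. "
           "It features acoustic guitar, piano, saxophone, a nice male vocal solo, "
           "and emotional, high-pitched vocals. It is a song with a light beat and a slow tempo.")
print(document_vector(sinatra, vocabulary))  # [1, 1, 1, 1, 1, 0, 0, 1]
```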


Page 28

Content-Based Autotagging

Learn a probabilistic model that captures a relationship between audio content and tags.

Example: Frank Sinatra, ‘Fly Me to the Moon’ → autotagging → ‘Jazz’, ‘Male Vocals’, ‘Sad’, ‘Slow Tempo’

The model gives p(tag | song).

Page 29

Modeling a Song

(Figure: the song is represented as a bag of MFCC vectors, shown as a cloud of points, and these feature vectors are then clustered.)
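The slide only says to "cluster feature vectors". As a stand-in, here is plain k-means (Lloyd's algorithm) over one song's bag of vectors; a probabilistic system would more likely fit a Gaussian mixture, so treat this as a simplified sketch. The vectors are assumed to be precomputed, and MFCC extraction itself is not shown.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def assign(vectors, centroids):
    """Group each vector with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        clusters[nearest].append(v)
    return clusters


def kmeans(vectors, k, iterations=20):
    """Cluster one song's bag of feature vectors; returns (centroids, weights)."""
    # Deterministic farthest-point initialisation: start at the first vector,
    # then repeatedly add the vector farthest from every chosen centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        clusters = assign(vectors, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    clusters = assign(vectors, centroids)
    weights = [len(c) / len(vectors) for c in clusters]  # fraction of frames per cluster
    return centroids, weights


# Hypothetical 2-D "frames"; real MFCC vectors typically have a dozen or more dimensions.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
centroids, weights = kmeans(frames, k=2)
print(centroids, weights)
```

The (centroid, weight) pairs are a compact summary of the song: a handful of clusters instead of thousands of frames.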

Page 30

Modeling a Tag

1. Take all songs associated with tag t.
2. Estimate ‘feature clusters’ for each song.
3. Combine these clusters into a single representative model for that tag.

(Figure: for the tag ‘romantic’, the clusters from each song are combined into a tag model, a mixture of song clusters, p(x|t).)
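A simplified sketch of step 3, assuming each song's clusters are stored as diagonal Gaussians (weight, mean, variance). It pools all songs' components into one mixture, giving every song equal total weight, and evaluates p(x|t) for a single feature vector. The slides do not say how the clusters are combined, so the pooling rule is an assumption; a practical system might also merge similar components to keep the tag model small.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mean, var))
    return math.exp(log_p)


def combine_song_models(song_models):
    """Pool every song's clusters into one tag model.

    song_models: list of per-song mixtures, each a list of (weight, mean, var)
    whose weights sum to 1. Each song gets equal total weight 1 / len(song_models).
    """
    n = len(song_models)
    return [(w / n, mean, var) for model in song_models for (w, mean, var) in model]


def tag_density(x, tag_model):
    """p(x | t): density of one feature vector under the pooled tag model."""
    return sum(w * gaussian_diag(x, mean, var) for w, mean, var in tag_model)


# Two hypothetical 'romantic' songs, each summarised by one 1-D cluster.
song_models = [
    [(1.0, [0.0], [1.0])],
    [(1.0, [2.0], [1.0])],
]
romantic = combine_song_models(song_models)
print(romantic, tag_density([1.0], romantic))
```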

Page 31

Determining Tags

1. Calculate the likelihood of the features for a tag model.

(Figure: inference with the ‘romantic’ tag model. The feature vectors of songs S1 and S2 are each scored against the model to answer “romantic?”.)
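Step 1 can be sketched by scoring each song's bag of feature vectors under the tag model. The sketch assumes the vectors are independent and averages the per-vector log-likelihoods so that songs of different lengths are comparable; the tag model and the songs S1 and S2 are hypothetical one-dimensional examples.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_mixture(x, model):
    """log p(x | t) for a mixture of diagonal Gaussians, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in model]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def song_log_likelihood(vectors, model):
    """Average per-vector log-likelihood of a song's bag of feature vectors."""
    return sum(log_mixture(x, model) for x in vectors) / len(vectors)


romantic = [(0.5, [0.0], [1.0]), (0.5, [2.0], [1.0])]  # hypothetical tag model
s1 = [[0.1], [1.9], [1.0]]    # hypothetical song S1, close to the model
s2 = [[8.0], [9.5], [10.0]]   # hypothetical song S2, far from it
print(song_log_likelihood(s1, romantic) > song_log_likelihood(s2, romantic))  # True
```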

Page 32

Annotation

Semantic Multinomial for “Give It Away” by the Red Hot Chili Peppers
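A sketch of how per-tag scores could be turned into a semantic multinomial. It assumes a uniform prior over tags, so Bayes' rule reduces p(tag | song) to normalizing the per-tag likelihoods across the vocabulary. The tag names and log-likelihood values are invented for illustration. Annotating a song with a fixed number of words then means taking the most probable entries.

```python
import math


def semantic_multinomial(tag_log_likelihoods):
    """Normalise per-tag log-likelihoods log p(X | t) into p(t | X),
    assuming a uniform prior over tags."""
    top = max(tag_log_likelihoods.values())  # subtract the max to avoid underflow
    unnorm = {t: math.exp(ll - top) for t, ll in tag_log_likelihoods.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}


def annotate(smn, n_words):
    """Annotate a song with its n most probable tags."""
    return sorted(smn, key=smn.get, reverse=True)[:n_words]


# Hypothetical average log-likelihoods of one song under three tag models
smn = semantic_multinomial({"rock": -10.0, "funk": -9.0, "romantic": -14.0})
print(smn, annotate(smn, 2))
```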

Page 33

The CAL500 data set

The Computer Audition Lab 500-song (CAL500) data set:
- 500 ‘Western Popular’ songs
- 174-word vocabulary covering genre, emotion, usage, instrumentation, rhythm, pitch, and vocal characteristics
- 3 or more annotations per song
- 55 paid undergrads annotate music for 120 hours

Other techniques:
1. Text-mining of web documents
2. ‘Human computation’ games (e.g., Listen Game)

Page 34

Retrieval

The top 3 results for “pop, female vocals, tender”:

(Figure values: 0.33, 0.02, 0.02, 0.02)

1. Shakira - The One
2. Alicia Keys - Fallin’
3. Evanescence - My Immortal

Page 35

Retrieval

Query: ‘Tender’
Retrieved songs: Crosby, Stills and Nash - Guinnevere; Jewel - Enter from the East; Art Tatum - Willow Weep for Me; John Lennon - Imagine; Tom Waits - Time

Query: ‘Female Vocals’
Retrieved songs: Alicia Keys - Fallin’; Shakira - The One; Christina Aguilera - Genie in a Bottle; Junior Murvin - Police and Thieves; Britney Spears - I'm a Slave 4 U

Query: ‘Tender’ AND ‘Female Vocals’
Retrieved songs: Jewel - Enter from the East; Evanescence - My Immortal; Cowboy Junkies - Postcard Blues; Everly Brothers - Take a Message to Mary; Sheryl Crow - I Shall Believe
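A sketch of multi-word retrieval, assuming each song already has a semantic multinomial. The scoring rule (sum of log p(tag | song) over the query tags, i.e. the product of the per-tag probabilities) is an assumption for illustration; other scoring rules are possible. The songs and probabilities are hypothetical.

```python
import math


def rank_songs(query_tags, song_smns, floor=1e-12):
    """Rank songs by the sum over query tags of log p(tag | song).

    Tags missing from a song's multinomial get probability `floor`.
    """
    def score(smn):
        return sum(math.log(smn.get(tag, floor)) for tag in query_tags)
    return sorted(song_smns, key=lambda song: score(song_smns[song]), reverse=True)


smns = {  # hypothetical semantic multinomials over a 3-tag vocabulary
    "song_x": {"tender": 0.6, "female vocals": 0.3, "rock": 0.1},
    "song_y": {"tender": 0.1, "female vocals": 0.7, "rock": 0.2},
    "song_z": {"tender": 0.4, "female vocals": 0.4, "rock": 0.2},
}
print(rank_songs(["tender", "female vocals"], smns))  # ['song_x', 'song_z', 'song_y']
```

In this toy example song_z, balanced on both tags, outranks song_y, which is strong on only one.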

Page 36

Annotation results

Annotation of the CAL500 songs with 10 words from a vocabulary of 174 words.

Model        Precision   Recall
Random       0.14        0.06
Our System   0.27        0.16
Human        0.30        0.15
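A sketch of one common way to compute such numbers: precision and recall per tag, averaged over the vocabulary. Whether the table was computed exactly this way is an assumption, conventions for tags that are never predicted differ, and the songs and tags below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values) if values else 0.0


def per_tag_precision_recall(predicted, truth, vocabulary):
    """Average per-tag precision and recall of an annotation system.

    predicted, truth: dict of song -> set of tags.
    A tag that is never predicted has undefined precision and is skipped in the
    precision average; likewise a tag with no true songs is skipped for recall.
    """
    precisions, recalls = [], []
    for tag in vocabulary:
        predicted_songs = {s for s, tags in predicted.items() if tag in tags}
        true_songs = {s for s, tags in truth.items() if tag in tags}
        hits = len(predicted_songs & true_songs)
        if predicted_songs:
            precisions.append(hits / len(predicted_songs))
        if true_songs:
            recalls.append(hits / len(true_songs))
    return mean(precisions), mean(recalls)


truth = {"s1": {"rock"}, "s2": {"rock", "sad"}, "s3": {"sad"}}
predicted = {"s1": {"rock"}, "s2": {"sad"}, "s3": {"rock"}}
print(per_tag_precision_recall(predicted, truth, ["rock", "sad"]))  # (0.75, 0.5)
```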

Page 37

Retrieval results

Model                  AROC
Random                 0.50
Our System - 1 Word    0.71
Our System - 2 Words   0.72
Our System - 3 Words   0.73

AROC is the area under the ROC curve; 0.50 corresponds to a random ranking.
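The area under the ROC curve for one query can be computed without drawing the curve: it equals the probability that a randomly chosen relevant song is scored above a randomly chosen irrelevant one, with ties counting half. This is why a random ranking gives about 0.50. The scores below are hypothetical.

```python
def area_under_roc(scores, relevant):
    """AUC for one query.

    scores: dict of song -> retrieval score; relevant: set of relevant songs.
    """
    pos = [scores[s] for s in scores if s in relevant]
    neg = [scores[s] for s in scores if s not in relevant]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant song")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
print(area_under_roc(scores, {"a", "c"}))  # 0.75
```

The pairwise loop is quadratic in the number of songs, which is fine for a 500-song collection; sorting by score gives the same value in O(n log n).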