Audio Meets Image Retrieval Techniques Dave Kauchak Department of Computer Science University of California, San Diego dkauchak@cs.ucsd.edu

Dec 16, 2015

Page 1: Audio Meets Image Retrieval Techniques

Dave Kauchak
Department of Computer Science
University of California, San Diego
dkauchak@cs.ucsd.edu

Page 2: Image vs. Audio

[Figure: unlabeled items, each marked "?"; labels: Classical, Country, Rock]

Page 3: Image techniques to audio

Idea: Apply image retrieval (and classification) techniques to audio
Image is 2-D
Audio is 1-D

Page 4: Benefits

Don’t have to reinvent the wheel
Image techniques have had fairly good success
More literature in image processing
Audio retrieval is a relatively new field

Page 5: Key Concepts and Goals

Image techniques to audio processing
  Apply a number of different image techniques (and show they work)
  Relate various parts of audio to counterparts in image
Novel data set with known ground truth
Multiple inputs for audio
  Raw audio

Page 6: A first step…

Audio retrieval
  Input: A number of songs
  Output: “Similar” songs from an audio database
Histogramming methods (Puzicha et al.)
Wavelets instead of Gabor filters

Page 7: Basic Technique

[Diagram: pipeline with stages labeled DWT, histogram, Database, and Most “similar” songs]

Page 8: Normal vs. Proportional Histogramming

Remember DWT: each level has a different number of samples
Normal: Histogram each level with the same number of bins
Proportional: Histogram each level so that the number of samples per bin stays equal
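
The two binning schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the talk: it uses a Haar wavelet (the slides do not name the wavelet), histograms only the detail coefficients, uses a fixed value range `[lo, hi]` for every level, and reads "proportional" as "the bin count scales with the number of samples at a level, starting from a chosen count at the coarsest level". All function names are hypothetical.

```python
import math


def haar_dwt(signal, levels):
    """Haar DWT: return detail coefficients per level, finest level first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram over [lo, hi]; outliers go to the end bins."""
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and hi > lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * bins


def normal_bins(details, bins):
    """Normal: the same number of bins at every level."""
    return [bins] * len(details)


def proportional_bins(details, bins_at_coarsest):
    """Proportional (assumed reading): bins scale with level size,
    so the average number of samples per bin is the same at every level."""
    coarsest = len(details[-1])
    return [max(1, bins_at_coarsest * len(d) // coarsest) for d in details]


def level_histograms(details, bin_counts, lo, hi):
    """One histogram per DWT level, using the given bin count for each level."""
    return [histogram(d, b, lo, hi) for d, b in zip(details, bin_counts)]


signal = [math.sin(0.3 * n) for n in range(16)]  # toy stand-in for raw audio
details = haar_dwt(signal, levels=3)             # level sizes: 8, 4, 2
normal = level_histograms(details, normal_bins(details, 4), -2.0, 2.0)
proportional = level_histograms(details, proportional_bins(details, 2), -2.0, 2.0)
```

With a 16-sample toy signal and 3 levels, the normal scheme uses 4 bins at every level, while the proportional scheme uses 8, 4, and 2 bins, which keeps one sample per bin on average at each level.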

Page 9: Compare Histograms

Chi-square on each level
Sum the chi-square values and use the sum as the dissimilarity measure (the lower the better)
Sum dissimilarity over all input songs

\[ D(I, J) = \sum_i \frac{\bigl(f(I) - f(J)\bigr)^2}{f(I) + f(J)} \]
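
The comparison step can be sketched as follows. It assumes the symmetric chi-square form, in which each bin contributes the squared difference of the two histogram values divided by their sum, and it skips bins that are empty in both histograms to avoid dividing by zero. The function names are hypothetical, and a song is represented as a list of per-level histograms.

```python
def chi_square(h1, h2):
    """Chi-square dissimilarity between two histograms with identical binning."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return total


def song_dissimilarity(levels_a, levels_b):
    """Sum the chi-square values over all DWT levels of two songs."""
    return sum(chi_square(a, b) for a, b in zip(levels_a, levels_b))


def query_dissimilarity(query_songs, candidate):
    """Sum the song-to-song dissimilarity over all input (query) songs."""
    return sum(song_dissimilarity(q, candidate) for q in query_songs)
```

Identical histograms give 0, and the value grows as the histograms diverge, so candidates are ranked in ascending order of `query_dissimilarity`.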

Page 10: Ground Truth Data Set

Songs by 4 different bands (10 songs each)
  Dave Matthews Band
  U2
  Blink 182
  Green Day
Mono, sampled at 22 kHz from a number of sources

Page 11: Experiment

Input = 5 songs by a single band
Goal = Pull out 5 other songs by that band
10 random experiments per band (40 total)
Normal bins: 8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512
Proportional bins: 4, 8, 16, 32, 64
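
One way to sketch a single experiment: pick 5 query songs from one band at random, rank every other song in the database by its dissimilarity summed over the queries, and record which of the top 5 come from the same band. The data layout, the `dissimilarity` callable, and the choice to exclude the query songs from the candidate pool are assumptions made for illustration.

```python
import random


def run_experiment(songs_by_band, band, dissimilarity, rng, n_query=5, top_k=5):
    """Return one boolean per place in the top_k: True if that song is by `band`.

    songs_by_band: dict mapping band name -> list of song ids (hypothetical layout)
    dissimilarity: callable (query_id, candidate_id) -> float, lower is more similar
    rng:           random.Random instance, so experiments are reproducible
    """
    query = rng.sample(songs_by_band[band], n_query)
    candidates = [
        (song, b)
        for b, songs in songs_by_band.items()
        for song in songs
        if song not in query
    ]
    # Rank by dissimilarity summed over all query songs (ascending).
    ranked = sorted(
        candidates,
        key=lambda item: sum(dissimilarity(q, item[0]) for q in query),
    )
    return [b == band for _, b in ranked[:top_k]]


def run_all(songs_by_band, dissimilarity, runs_per_band=10, seed=0):
    """10 random experiments per band by default (40 in total for 4 bands)."""
    rng = random.Random(seed)
    return {
        band: [run_experiment(songs_by_band, band, dissimilarity, rng)
               for _ in range(runs_per_band)]
        for band in songs_by_band
    }
```

Each experiment yields a list such as `[True, False, True, False, False]`, which is the input the scoring measures operate on.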

Page 12: Scoring

By points:
  5 pts. Correct answer in first place
  4 pts. Correct answer in second place, etc.
  Perfect = 5+4+3+2+1 = 15
Percentage correct at each place
Percentage that have correct answer less than or equal to place
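
The three measures can be sketched as below, taking each experiment as a list of booleans (first place first). The third measure is interpreted here as the fraction of experiments with at least one correct answer at or before a given place; that reading is an assumption, and the function names are hypothetical.

```python
def points(hits):
    """5 points for a correct first place, 4 for second, ..., 1 for fifth."""
    n = len(hits)
    return sum(n - place for place, hit in enumerate(hits) if hit)


def fraction_correct_at_place(experiments):
    """For each place, the fraction of experiments with a correct answer there."""
    n = len(experiments)
    places = len(experiments[0])
    return [sum(e[p] for e in experiments) / n for p in range(places)]


def fraction_hit_by_place(experiments):
    """For each place, the fraction of experiments with at least one
    correct answer at that place or earlier (assumed interpretation)."""
    n = len(experiments)
    places = len(experiments[0])
    return [sum(any(e[: p + 1]) for e in experiments) / n for p in range(places)]
```

A perfect top 5 scores `points([True] * 5) == 15`, matching the perfect score of 5+4+3+2+1.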

Page 13: Results: Points

[Chart: “Normal Histogramming: Increasing Bin Size”. x-axis: Number of Bins (8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512); y-axis: Score (0 to 4); series: Normal]

Page 14: Results: Points Proportional

[Chart: “Proportional Histogramming: Increasing Bin Size”. x-axis: Number of Bins (4, 8, 16, 32, 64); y-axis: Score (0 to 3.5); series: Proportional]

Page 15: Best Score Results: 16 bins

Band            1st    2nd    3rd    4th    5th    Score
Dave Matthews   .6     .8     .4     .3     .2     8.2
Blink 182       .3     .1     .1     0      .1     2.3
U2              0      0      0      .1     0      .2
Green Day       .2     .3     .2     0      .5     3.3
Average         .275   .3     .175   .1     .2     3.5
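
The Score column appears to be the place-weighted sum of the per-place fractions (5 points for first place down to 1 for fifth). The short check below recomputes it from the table values; the weights follow the scoring slide, while the helper name and the table layout are illustrative.

```python
WEIGHTS = (5, 4, 3, 2, 1)  # points for 1st through 5th place


def expected_score(fractions):
    """Average points per experiment, given the fraction correct at each place."""
    return sum(w * f for w, f in zip(WEIGHTS, fractions))


# (fraction correct at 1st..5th place, score reported on the slide)
table = {
    "Dave Matthews": ((0.6, 0.8, 0.4, 0.3, 0.2), 8.2),
    "Blink 182": ((0.3, 0.1, 0.1, 0.0, 0.1), 2.3),
    "U2": ((0.0, 0.0, 0.0, 0.1, 0.0), 0.2),
    "Green Day": ((0.2, 0.3, 0.2, 0.0, 0.5), 3.3),
    "Average": ((0.275, 0.3, 0.175, 0.1, 0.2), 3.5),
}

for band, (fractions, reported) in table.items():
    print(band, round(expected_score(fractions), 3), reported)
```

For every row the recomputed value agrees with the reported score; for example, Dave Matthews gives 5(.6) + 4(.8) + 3(.4) + 2(.3) + 1(.2) = 8.2.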

Page 16: Different Bands

Band            Normal   Proportional
Dave Matthews   6.9      5.8
Blink 182       1.3      2
U2              .9       1.5
Green Day       2.1      2
Average         2.8      2.8

Page 17: Percentage correct

               1st   2nd   3rd   4th   5th
Normal         .23   .17   .17   .17   .18
Proportional   .16   .21   .24   .15   .15

Page 18: One last result

Normal: Percentage that have good answers less than or equal to entry:

bins   1       2       3       4       5
8      0.3     0.375   0.425   0.5     0.625
16     0.275   0.4     0.475   0.5     0.55
32     0.25    0.325   0.4     0.45    0.525
64     0.25    0.3     0.4     0.475   0.55
128    0.225   0.325   0.425   0.5     0.6
192    0.2     0.3     0.4     0.5     0.675
256    0.225   0.35    0.475   0.525   0.65
320    0.2     0.35    0.475   0.55    0.625
384    0.225   0.325   0.5     0.55    0.625
448    0.2     0.325   0.425   0.55    0.625
512    0.2     0.35    0.45    0.55    0.625

Proportional: Percentage that have good answers less than or equal to entry:

bins   1       2       3       4       5
4      0.2     0.375   0.625   0.65    0.75
8      0.225   0.375   0.525   0.55    0.625
16     0.175   0.4     0.55    0.55    0.575
32     0.125   0.375   0.45    0.5     0.575
64     0.075   0.3     0.45    0.5     0.575

Page 19: Summary of Results

Overall, results are not amazing
Band choice has a large influence
Normal and Proportional perform somewhat similarly
Proportional is more even over all bands
Bin size doesn’t appear to be crucial
At best (Proportional, 4 bins), a 75% chance that a song by the same band will end up in the top 5

Page 20: Next Step…

Adaptive binning
Vary parameters
  Levels
  Song length
  Histogram comparison methods
Another image retrieval algorithm
  Boosting for feature selection using large feature set?
  Other?
Larger and more diverse database

Page 21: Conclusion

Even though results are not fabulous, image processing techniques CAN be used for audio processing
Using bands for testing allows for ground truth
Audio files are BIG!