Audio Meets Image Retrieval Techniques
Dave Kauchak
Department of Computer Science
University of California, San Diego
[email protected]
Dec 16, 2015
Image techniques to audio
- Idea: Apply image retrieval (and classification) techniques to audio
- Image is 2-D
- Audio is 1-D
Benefits
- Don’t have to reinvent the wheel
- Image techniques have had fairly good success
- More literature in image processing
- Audio retrieval is a relatively new field
Key Concepts and Goals
- Image techniques to audio processing
  - Apply a number of different image techniques (and show they work)
  - Relate various parts of audio to counterparts in image
- Novel data set with known ground truth
- Multiple input for audio
  - Raw audio
A first step…
- Audio retrieval
  - Input: A number of songs
  - Output: “Similar” songs from an audio database
- Histogramming methods (Puzicha et al.)
- Wavelets instead of Gabor filters
Normal vs. Proportional Histogramming
- Remember DWT: different number of samples per level
- Normal: Histogram each level with same number of bins
- Proportional: Histogram each level keeping samples/bin equal
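The two binning schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name the wavelet, so a Haar transform is used here; the fixed histogram range `lo`/`hi` and all function names are hypothetical; and for the proportional scheme the stated bin count is assumed to apply to the coarsest level, with finer levels scaled up so that samples per bin stay equal.

```python
import math

def haar_detail_levels(signal, levels):
    """Detail coefficients for each DWT level, finest level first.
    Haar is an assumption; the slides do not say which wavelet was used."""
    approx = [float(x) for x in signal]
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        starts = range(0, len(approx) - 1, 2)
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in starts])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in starts]
    return details

def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram over the fixed range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
    total = max(len(values), 1)
    return [c / total for c in counts]

def normal_histograms(details, bins, lo, hi):
    """Normal: every level gets the same number of bins."""
    return [histogram(d, bins, lo, hi) for d in details]

def proportional_histograms(details, coarsest_bins, lo, hi):
    """Proportional: bin count scales with level size (assumed anchoring:
    the coarsest level gets `coarsest_bins`)."""
    smallest = len(details[-1])
    result = []
    for d in details:
        bins = max(1, round(coarsest_bins * len(d) / smallest))
        result.append(histogram(d, bins, lo, hi))
    return result

# Toy signal standing in for raw audio samples.
signal = [math.sin(0.3 * n) for n in range(16)]
details = haar_detail_levels(signal, 3)
normal = normal_histograms(details, 4, -2.0, 2.0)
proportional = proportional_histograms(details, 2, -2.0, 2.0)
```

With three levels of 8, 4, and 2 coefficients, the normal scheme uses 4 bins at every level, while the proportional scheme uses 8, 4, and 2 bins.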
Compare Histograms
- Chi-square on each level
- Sum the chi-square values over levels and use the sum as the dissimilarity measure (lower is better)
- Sum dissimilarity over all input songs
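\[
D(I, J) = \sum_i \frac{\bigl(f_i(I) - f_i(J)\bigr)^2}{f_i(I) + f_i(J)}
\]
where f_i(I) is the value of bin i in the histogram of song I.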
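A minimal sketch of the comparison step follows, assuming the symmetric chi-square form, the sum over bins of (a - b)^2 / (a + b), and histograms stored as one list per DWT level. The function names and the dictionary layout of the database are hypothetical.

```python
def chi_square(h1, h2):
    """Chi-square dissimilarity between two histograms with the same bins."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total

def song_dissimilarity(levels_a, levels_b):
    """Sum the chi-square values over all levels (lower is more similar)."""
    return sum(chi_square(a, b) for a, b in zip(levels_a, levels_b))

def rank_database(input_songs, database):
    """Rank database songs by dissimilarity summed over all input songs.
    `database` maps a song name to its list of per-level histograms."""
    totals = {
        name: sum(song_dissimilarity(query, hists) for query in input_songs)
        for name, hists in database.items()
    }
    return sorted(totals, key=totals.get)

# Toy example with a single level and two bins.
ranking = rank_database(
    [[[1.0, 0.0]]],
    {"near": [[0.9, 0.1]], "far": [[0.0, 1.0]]},
)
```

Here `ranking` places "near" before "far", since its summed dissimilarity to the input song is lower.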
Ground Truth Data Set
- Songs by 4 different bands (10 songs each)
  - Dave Matthews Band
  - U2
  - Blink 182
  - Green Day
- Mono, sampled at 22 kHz, from a number of sources
Experiment
- Input = 5 songs by a single band
- Goal = Pull out 5 other songs by that band
- 10 random experiments per band (40 total)
- Normal bins: 8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512
- Proportional bins: 4, 8, 16, 32, 64
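One way the random experiments could be generated is sketched below. The slides do not say how the input songs were drawn, so uniform sampling without replacement is an assumption, and the function name, seed, and placeholder song names are hypothetical.

```python
import random

def make_trials(songs_by_band, n_input=5, trials_per_band=10, seed=0):
    """Build (band, input_songs, target_songs) triples.
    For each trial, n_input songs of one band are drawn at random as input;
    the band's remaining songs are the ones retrieval should pull out."""
    rng = random.Random(seed)
    trials = []
    for band, songs in songs_by_band.items():
        for _ in range(trials_per_band):
            inputs = rng.sample(songs, n_input)
            targets = [s for s in songs if s not in inputs]
            trials.append((band, inputs, targets))
    return trials

# Placeholder names; the real data set has 4 bands with 10 songs each.
songs_by_band = {
    band: [f"{band}-song{i}" for i in range(10)]
    for band in ["band_a", "band_b", "band_c", "band_d"]
}
trials = make_trials(songs_by_band)
```

With 4 bands and 10 trials per band this yields the 40 experiments mentioned on the slide, each with 5 input songs and 5 target songs.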
Scoring
- By points:
  - 5 pts. Correct answer in first place
  - 4 pts. Correct answer in second place, etc.
  - Perfect = 5+4+3+2+1 = 15
- Percentage correct at each place
- Percentage that have correct answer less than or equal to place
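The scoring rules can be written out as a short sketch. The function names are hypothetical, and the reading of the last measure (an experiment counts if at least one correct song appears at or before the given place) is an assumption about what the slide means.

```python
def points(ranked, correct, places=5):
    """5 points for a correct answer in first place, 4 in second, and so on."""
    return sum(
        places - i
        for i, song in enumerate(ranked[:places])
        if song in correct
    )

def hit_at_or_before(ranked, correct, place):
    """True if some correct song appears at or before `place` (1-based)."""
    return any(song in correct for song in ranked[:place])

# Toy ranking: correct songs land in second and fifth place.
ranked = ["x", "a", "y", "z", "b"]
correct = {"a", "b"}
score = points(ranked, correct)  # 4 + 1
```

Averaging `points` over all experiments gives the score reported per configuration, and averaging `hit_at_or_before` gives the cumulative measure.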
Results: Points (Normal)
[Figure: "Normal Histogramming: Increasing Bin Size". x-axis: Number of Bins (8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512); y-axis: Score.]
Results: Points (Proportional)
[Figure: "Proportional Histogramming: Increasing Bin Size". x-axis: Number of Bins (4, 8, 16, 32, 64); y-axis: Score.]
Best Score Results: 16 bins

Band             1st     2nd     3rd     4th     5th     Score
Dave Matthews    .6      .8      .4      .3      .2      8.2
Blink 182        .3      .1      .1      0       .1      2.3
U2               0       0       0       .1      0       .2
Green Day        .2      .3      .2      0       .5      3.3
Average          .275    .3      .175    .1      .2      3.5
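The Score column follows from the per-place fractions under the point scheme (5 points for first place down to 1 for fifth). A small check, with a hypothetical function name:

```python
def score_from_place_rates(rates):
    """Expected points given the fraction of correct answers at each place."""
    n = len(rates)
    return sum((n - i) * rate for i, rate in enumerate(rates))

# Rows taken from the table above.
dave_matthews = score_from_place_rates([0.6, 0.8, 0.4, 0.3, 0.2])
average = score_from_place_rates([0.275, 0.3, 0.175, 0.1, 0.2])
```

For example, the first row gives 5(.6) + 4(.8) + 3(.4) + 2(.3) + 1(.2) = 8.2, and the Average row gives 3.5, matching the table.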
Different Bands

Band             Normal    Proportional
Dave Matthews    6.9       5.8
Blink 182        1.3       2
U2               .9        1.5
Green Day        2.1       2
Average          2.8       2.8
One last result
Normal: fraction of experiments with a correct answer at or before each place

Bins    1        2        3        4        5
8       0.3      0.375    0.425    0.5      0.625
16      0.275    0.4      0.475    0.5      0.55
32      0.25     0.325    0.4      0.45     0.525
64      0.25     0.3      0.4      0.475    0.55
128     0.225    0.325    0.425    0.5      0.6
192     0.2      0.3      0.4      0.5      0.675
256     0.225    0.35     0.475    0.525    0.65
320     0.2      0.35     0.475    0.55     0.625
384     0.225    0.325    0.5      0.55     0.625
448     0.2      0.325    0.425    0.55     0.625
512     0.2      0.35     0.45     0.55     0.625
Proportional: fraction of experiments with a correct answer at or before each place

Bins    1        2        3        4        5
4       0.2      0.375    0.625    0.65     0.75
8       0.225    0.375    0.525    0.55     0.625
16      0.175    0.4      0.55     0.55     0.575
32      0.125    0.375    0.45     0.5      0.575
64      0.075    0.3      0.45     0.5      0.575
Summary of Results
- Overall, results are not amazing
- Band choice has a large influence
- Normal and Proportional perform similarly
  - Proportional is more even over all bands
- Number of bins doesn’t appear to be crucial
- At best (Proportional, 4 bins), there is a 75% chance that a song by the same band ends up in the top 5
Next Step…
- Adaptive binning
- Vary parameters
  - Levels
  - Song length
  - Histogram comparison methods
- Another image retrieval algorithm
  - Boosting for feature selection using a large feature set?
  - Other?
- Larger and more diverse database