Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873, 44874)
Scale Space Response via Laplacian of Gaussian
The scale is controlled by σ
Characteristic Scale:
Image Analysis & Retrieval, 2016 p.3
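$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$, with $g = e^{-\frac{x^2 + y^2}{2\sigma^2}}$
[Figure: an image and its LoG responses at increasing scales (panels labeled 0.8, 1.2, 2, …); the characteristic scale is the σ at which the response peaks.]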
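The idea of a characteristic scale can be checked numerically. The sketch below is not from the slides: it assumes a binary disc as the test blob and a scale-normalized LoG (multiplied by σ²), and the helper names are made up for illustration. For such a disc of radius R, the response at the center is expected to peak near σ = R/√2.

```python
import math

def log_kernel(x, y, sigma):
    """Scale-normalized Laplacian of Gaussian: sigma^2 * (d2g/dx2 + d2g/dy2)."""
    r2 = x * x + y * y
    gauss = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return sigma ** 2 * (r2 - 2 * sigma ** 2) / sigma ** 4 * gauss

def center_response(radius, sigma):
    """LoG response at the center of a binary disc (hypothetical test blob)."""
    n = int(radius)
    return sum(log_kernel(x, y, sigma)
               for y in range(-n, n + 1)
               for x in range(-n, n + 1)
               if x * x + y * y <= radius * radius)

def characteristic_scale(radius, sigmas):
    """Pick the candidate sigma with the largest response magnitude."""
    return max(sigmas, key=lambda s: abs(center_response(radius, s)))

radius = 8
base = radius / math.sqrt(2)
sigmas = [base * f for f in (0.5, 0.7, 1.0, 1.4, 2.0)]
best = characteristic_scale(radius, sigmas)
print("characteristic scale:", best)
```

Without the σ² normalization the responses at different scales are not comparable, which is why the kernel is scaled before taking the maximum.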
SIFT
Use DoG to approximate LoG
Separable Gaussian filter
Difference of filtered images instead of difference of Gaussian kernels
Scale space construction by Gaussian filtering and image differencing
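Why DoG can stand in for LoG: to first order in (k - 1), G(kσ) - G(σ) ≈ (k - 1) σ² ∇²G. The sketch below compares the two point by point; the function names and the values σ = 1.6, k = 1.01 are illustrative assumptions, not taken from the slides.

```python
import math

def gaussian(x, y, sigma):
    """Normalized 2D Gaussian."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def dog(x, y, sigma, k):
    """Difference of Gaussians at scales k*sigma and sigma."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """First-order approximation of the DoG: (k - 1) * sigma^2 * LoG."""
    r2 = x * x + y * y
    laplacian = (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)
    return (k - 1) * sigma ** 2 * laplacian

sigma, k = 1.6, 1.01
for point in [(0, 0), (1, 0), (3, 0)]:
    print(point, dog(*point, sigma, k), scaled_log(*point, sigma, k))
```

Because convolution is linear, filtering the image with G(kσ) and G(σ) and subtracting the two results gives the same output as filtering once with the DoG kernel, and each Gaussian pass can be done separably.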
Peak Strength & Edge Removal
Peak Strength: interpolate the true DoG response and pixel location by Taylor expansion
Edge Removal: re-run a Harris-type detection on the much-reduced pixel set to remove edge responses
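A minimal sketch of both steps, under stated assumptions: the interpolation is shown in 1D only (SIFT applies the same second-order Taylor expansion jointly in x, y and scale), and the edge check is assumed to be the Hessian trace/determinant ratio test from Lowe's SIFT paper with r = 10. Function names are hypothetical.

```python
def refine_peak_1d(d_minus, d_center, d_plus):
    """Second-order Taylor fit through three samples at -1, 0, +1.

    Returns (sub-sample offset of the extremum, interpolated response)."""
    grad = (d_plus - d_minus) / 2.0
    curv = d_plus - 2.0 * d_center + d_minus
    if curv == 0.0:          # flat neighbourhood: nothing to refine
        return 0.0, d_center
    offset = -grad / curv
    value = d_center + 0.5 * grad * offset
    return offset, value

def is_edge_like(dxx, dyy, dxy, r=10.0):
    """Reject a keypoint whose 2x2 DoG Hessian has one dominant curvature."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:           # curvatures of opposite sign, not a blob
        return True
    return trace * trace / det >= (r + 1.0) ** 2 / r

# Samples of -(x - 0.3)^2 + 2 at x = -1, 0, 1
print(refine_peak_1d(0.31, 1.91, 1.51))
print(is_edge_like(-2.0, -2.0, 0.0), is_edge_like(-10.0, -0.1, 0.0))
```

The edge test only needs to run on pixels that already survived the extremum and peak-strength checks, which is what keeps it cheap.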
Rotation Invariance through Dominant Orientation Coding
Voting for the dominant orientation
Weighted by a Gaussian window to give more emphasis to the gradients closer to the center
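The voting step can be sketched as follows. This is an illustration only: the 36-bin histogram, central-difference gradients, and the function name are assumptions, and the angle is measured in image coordinates (y pointing down).

```python
import math

def dominant_orientation(patch, sigma, num_bins=36):
    """Gaussian-weighted gradient orientation histogram over a square patch.

    Returns (center angle of the winning bin in degrees, histogram)."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Gaussian window: gradients near the patch center count more
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            hist[int(angle / bin_width) % num_bins] += mag * weight
    best = max(range(num_bins), key=hist.__getitem__)
    return (best + 0.5) * bin_width, hist

# Diagonal intensity ramp: every gradient points along +x, +y
ramp = [[x + y for x in range(9)] for y in range(9)]
angle, hist = dominant_orientation(ramp, sigma=2.0)
print(angle)
```

Rotating the descriptor's sampling grid by the winning angle is what makes the final descriptor insensitive to in-plane rotation.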
SIFT Matching and Repeatability Prediction
SIFT Distance
Not all SIFT features are created equal…
Peak strength (DoG response at interpolated position)
Combined scale/peak strength pmf
$\frac{d(s_1^1, s_{k^*}^2)}{d(s_1^1, s_k^2)} \le \theta$
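One common way to use a distance ratio like the one above is the nearest/second-nearest test. The sketch below assumes that reading of the inequality, Euclidean distance between descriptors, and a threshold θ = 0.8; none of these is confirmed by the slide, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def ratio_test_matches(query_descs, target_descs, theta=0.8):
    """Keep a match only if the best distance is clearly below the second best."""
    matches = []
    for qi, q in enumerate(query_descs):
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(target_descs))
        if len(dists) < 2:
            continue
        (best, best_idx), (second, _) = dists[0], dists[1]
        if second > 0 and best / second <= theta:
            matches.append((qi, best_idx))
    return matches

query = [[0, 0], [5, 5], [7, 7]]
target = [[0.1, 0], [4, 4], [10, 10]]
print(ratio_test_matches(query, target))
```

The third query point lies equally far from two targets, so its ratio is close to 1 and the ambiguous match is dropped.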
Box Filter: CABOX work
Basic Idea: Approximate DoG with a linear combination of box filters
$\min_h \| g - B h \|_{L_2}^2 + \lambda \| h \|_{L_1}$
Solution by LASSO
[Figure: DoG kernel = h1 * (box filter 1) + h2 * (box filter 2) + …]
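A small sketch of the LASSO fit, under loud assumptions: it works in 1D rather than on 2D kernels, uses centered boxes of a few hand-picked widths as the dictionary B, and solves the problem with plain coordinate descent. It is not the CABOX implementation; all names and parameter values are illustrative.

```python
import math

def dog_1d(radius, sigma1, sigma2):
    """Sampled 1D difference of Gaussians, the target g."""
    def g(x, s):
        return math.exp(-x * x / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
    return [g(x, sigma1) - g(x, sigma2) for x in range(-radius, radius + 1)]

def box_1d(radius, half_width):
    """Centered unit-height box filter, one column of B."""
    return [1.0 if abs(x) <= half_width else 0.0 for x in range(-radius, radius + 1)]

def soft_threshold(v, t):
    return v - t if v > t else v + t if v < -t else 0.0

def lasso_cd(target, atoms, lam, sweeps=200):
    """Coordinate descent for min_h ||g - B h||_2^2 + lam * ||h||_1."""
    h = [0.0] * len(atoms)
    residual = list(target)
    for _ in range(sweeps):
        for j, atom in enumerate(atoms):
            norm2 = sum(a * a for a in atom)
            # correlation of atom j with the residual, its own term added back
            rho = sum(a * (r + a * h[j]) for a, r in zip(atom, residual))
            new = soft_threshold(rho, lam / 2.0) / norm2
            delta = new - h[j]
            if delta != 0.0:
                residual = [r - a * delta for a, r in zip(atom, residual)]
                h[j] = new
    return h

def objective(target, atoms, h, lam):
    recon = [sum(hj * atom[i] for hj, atom in zip(h, atoms)) for i in range(len(target))]
    return sum((t - r) ** 2 for t, r in zip(target, recon)) + lam * sum(abs(v) for v in h)

g = dog_1d(10, 1.0, 1.6)
B = [box_1d(10, w) for w in (0, 1, 2, 3, 4, 6)]
h = lasso_cd(g, B, lam=1e-3)
print([round(v, 4) for v in h])
```

The L1 penalty is what keeps the number of active boxes small; raising λ trades approximation error for fewer box filters to evaluate.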
Outline
ReCap of Lecture 06 SIFT
Box Filter
Image Retrieval System
Why Aggregation?
Aggregation Schemes
Summary
Image Matching/Retrieval System
SIFT is a sub-image-level feature; what we actually care about is how SIFT matches translate into image-level matching/retrieval accuracy
Suppose we can compute a single distance from a collection of features:
$d(I_1, I_2) = \sum_k \alpha_k \, d(F_k^1, F_k^2)$
Then for a database of n images, we can compute an n x n distance matrix:
$D_{j,k} = d(I_j, I_k)$
This gives us full information on the performance of this feature/distance system
How do we characterize the performance of such an image matching and retrieval system?
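The image-level distance and the n x n matrix can be sketched as below. The slide does not say how features of two images are paired, so this assumes the k-th feature of one image is already matched to the k-th feature of the other, with Euclidean feature distance; names and data are made up for illustration.

```python
import math

def feature_distance(f1, f2):
    """Assumed per-feature distance: Euclidean."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def image_distance(feats1, feats2, alphas):
    """d(I1, I2) = sum_k alpha_k * d(F_k^1, F_k^2), features paired by index."""
    return sum(a * feature_distance(f1, f2)
               for a, f1, f2 in zip(alphas, feats1, feats2))

def distance_matrix(images, alphas):
    """Symmetric n x n matrix D[j][k] = d(I_j, I_k) with a zero diagonal."""
    n = len(images)
    D = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            D[j][k] = D[k][j] = image_distance(images[j], images[k], alphas)
    return D

images = [
    [[0, 0], [1, 1]],   # image 0: two 2-D features
    [[3, 4], [1, 1]],   # image 1
    [[0, 0], [1, 2]],   # image 2
]
for row in distance_matrix(images, alphas=[1.0, 0.5]):
    print(row)
```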
Thresholding for Matching
Basically, for any pair of images (documents, in IR jargon), we declare a match when the distance falls below a threshold t, i.e. $d(I_j, I_k) < t$
Then for each possible image pair (or each pair we care about) and a given threshold t, there are 4 possible outcomes:
TP pair: {I_j, I_k} declared a matching pair, $d(I_j, I_k) < t$;
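Counting the four outcomes (TP, FP, TN, FN) for a given threshold could look like the sketch below. It assumes a ground-truth table saying which pairs truly match; that table and the function name are hypothetical inputs, not something the slides define.

```python
def confusion_counts(D, is_true_match, t):
    """Count TP/FP/TN/FN over all unordered image pairs for threshold t."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    n = len(D)
    for j in range(n):
        for k in range(j + 1, n):
            declared = D[j][k] < t          # declared a matching pair
            actual = is_true_match[j][k]    # ground truth (assumed available)
            if declared and actual:
                counts["TP"] += 1
            elif declared and not actual:
                counts["FP"] += 1
            elif not declared and actual:
                counts["FN"] += 1
            else:
                counts["TN"] += 1
    return counts

D = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
truth = [[True, True, False],
         [True, True, True],
         [False, True, True]]
for t in (1.5, 3.0, 5.0):
    print(t, confusion_counts(D, truth, t))
```

Sweeping t from small to large moves pairs from the "not declared" side to the "declared" side, which is how a precision/recall or ROC curve for the system is traced out.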
Simple case: Soft Assignment
Called “Kernel codebook encoding” by Chatfield et al. 2011. Cast a weighted vote into the most similar clusters.
This is fast and easy to implement (try it for Project 3!) but it does have some downsides for image retrieval: the inverted file index becomes less sparse.
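A minimal sketch of soft assignment, assuming a Gaussian kernel on squared Euclidean distance with a bandwidth parameter beta and an optional cap on how many nearest clusters receive a vote. The kernel choice, beta, and the names are illustrative assumptions rather than the exact formulation in Chatfield et al.

```python
import math

def soft_assign(x, codebook, beta, top=None):
    """Weights over clusters, proportional to exp(-beta * ||x - c_k||^2)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    order = sorted(range(len(codebook)), key=d2.__getitem__)
    keep = order if top is None else order[:top]
    d2_min = d2[order[0]]                 # subtract the minimum to avoid underflow
    weights = [0.0] * len(codebook)
    for k in keep:
        weights[k] = math.exp(-beta * (d2[k] - d2_min))
    total = sum(weights)
    return [w / total for w in weights]

def soft_histogram(descriptors, codebook, beta, top=None):
    """Bag-of-words histogram built from weighted votes."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        for k, w in enumerate(soft_assign(x, codebook, beta, top)):
            hist[k] += w
    return hist

codebook = [[0, 0], [2, 0], [10, 10]]
print(soft_assign([1, 0], codebook, beta=1.0))
print(soft_histogram([[1, 0], [0.1, 0]], codebook, beta=1.0, top=2))
```

Limiting the vote to the `top` nearest clusters is one way to keep the histogram, and hence the inverted file index, from filling in completely.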
A first example: the VLAD
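Given a codebook $\{\mu_i, i = 1 \ldots N\}$, e.g. learned with K-means, and a set of local descriptors $X = \{x_t, t = 1 \ldots T\}$:
• assign: $NN(x_t) = \arg\min_{\mu_i} \| x_t - \mu_i \|$
• compute: $v_i = \sum_{x_t : NN(x_t) = \mu_i} (x_t - \mu_i)$
• concatenate the $v_i$'s + normalize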
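The three steps above can be sketched in a few lines. This version assumes hard nearest-neighbor assignment and L2 normalization of the concatenated vector; it is a plain-Python illustration with made-up data, not the VLFeat routine.

```python
import math

def vlad(descriptors, codebook):
    """Hard-assignment VLAD: per-cell residual sums, concatenated, L2-normalized."""
    dim = len(codebook[0])
    v = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        # assign: index of the nearest centroid
        i = min(range(len(codebook)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))
        # compute: accumulate the residual x - mu_i in cell i
        for d in range(dim):
            v[i][d] += x[d] - codebook[i][d]
    # concatenate the v_i's and normalize
    flat = [val for row in v for val in row]
    norm = math.sqrt(sum(val * val for val in flat))
    return [val / norm for val in flat] if norm > 0 else flat

codebook = [[0, 0], [10, 10]]
descriptors = [[1, 0], [0, 2], [11, 10]]
print(vlad(descriptors, codebook))
```

The output length is (number of centroids) x (descriptor dimension), so a 16-word codebook over 128-D SIFT gives a 2048-D image code.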
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.
[Figure: descriptors x assigned to five codebook cells (1 to 5), with one residual vector v1 … v5 per cell. ① assign descriptors ② compute x - μ_i ③ v_i = sum of x - μ_i for cell i]
A first example: the VLAD
A graphical representation of
VL_FEAT Implementation
Matlab:
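function vc = vladSiftEncoding(sift, codebook)
% VLADSIFTENCODING  Soft-assignment VLAD code from SIFT descriptors.
%   sift:     n x kd descriptors, one per row
%   codebook: m x kd cluster centers, one per row
dbg = 1;   % debug mode: the inputs are ignored and demo data is built from an image
if dbg
    if (0) % init VL_FEAT, only need to do once
        run('../../tools/vlfeat-0.9.20/toolbox/vl_setup.m');
    end
    im = imread('../pics/flarsheim-2.jpg');
    [~, sift] = vl_sift(single(rgb2gray(im)));
    sift = single(sift');                 % n x 128, one descriptor per row
    [~, codebook] = kmeans(sift, 16);     % 16 x 128 cluster centers
    % make sift # smaller
    sift = sift(1:min(800, size(sift, 1)), :);
end
[n, kd] = size(sift);
[m, kd] = size(codebook);
% compute assignment
dist = pdist2(codebook, sift);            % m x n distances
mdist = mean(mean(dist));
% normalize the heat kernel s.t. mean dist is mapped to 0.5
a = -log(0.5)/mdist;
indx = exp(-a*dist);                      % m x n soft assignment weights
% cast all inputs to one class (single) before calling vl_vlad
vc = vl_vlad(single(sift'), single(codebook'), single(indx));
if dbg
    figure(41); colormap(gray);
    subplot(2,2,1); imshow(im); title('image');
    subplot(2,2,2); imagesc(dist); title('m x n distance');
    subplot(2,2,3); imagesc(indx); title('m x n assignment');
    % vc is assumed to stack one kd-dim block per cluster, so reshape to
    % kd x m and transpose to show one cluster per row
    subplot(2,2,4); imagesc(reshape(vc, [kd, m])'); title('vlad code');
end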
VLAD Code
What are the tweaks?
Codebook design
Soft assignment options
References
Vocabulary Tree: David Nistér, Henrik Stewénius: Scalable Recognition with a Vocabulary