Lecture 6: Multimedia Information Retrieval
Dr. Jian Zhang
NICTA & CSE UNSW
COMP9314 Advanced Database Systems, S1 2007
jzhang@cse.unsw.edu.au
COMP9314 Advanced Database Systems – Lecture 6 – Slide 2 – J Zhang
Reference Papers and Resources
Papers:
Colour spaces-perceptual, historical and applicational background: An overview of colour spaces used in image processing.
Colour indexing: using Histogram Intersection for object identification and Histogram Back-projection for object location.
Comparing Images Using Color Coherence Vectors: The original paper for CCV.
Using Perceptually Weighted Histograms for Colour-based Image Retrieval: The original paper for PWH.
The QBIC Project-Querying Images By Content Using Color, Texture, and Shape: The original paper for IBM QBIC project.
Useful resources
MPEG-7 homepage: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
IBM QBIC system homepage: http://wwwqbic.almaden.ibm.com/
UIUC CBIR system homepage: http://www.ifp.uiuc.edu/~qitian/MARS.html
6.1 Image Retrieval based on Texture
Introduction to texture feature
The concept of texture is intuitively obvious but has no precise definition
Texture can be described by its tone and structure
Tone – based on pixel intensity properties
Structure – describes spatial relationships of primitives
MPEG-7 standard: the homogeneous texture descriptor (HTD)
Two components of the HTD are computed for each frequency channel during extraction:
Mean energy
Energy deviation
The 2-D frequency plane is partitioned into 30 frequency channels
The syntax of the HTD is $\mathrm{HTD} = [f_{DC}, f_{SD}, e_1, e_2, \ldots, e_{30}, d_1, d_2, \ldots, d_{30}]$, where $f_{DC}$ and $f_{SD}$ are the mean and standard deviation of the image, and $e_i$ and $d_i$ are the nonlinearly scaled and quantized mean energy and energy deviation of the $i$-th channel
The frequency plane partitioning is uniform along the angular direction but not uniform along the radial direction.
Each channel is modeled using a Gabor function.
For a channel indexed by (s,r), where s is the radial index and r is the angular index, the (s,r)-th channel in the frequency domain is
$$G_{s,r}(\omega,\theta) = \exp\left[\frac{-(\omega-\omega_s)^2}{2\sigma_s^2}\right] \cdot \exp\left[\frac{-(\theta-\theta_r)^2}{2\tau_r^2}\right]$$
where $\sigma_s$ and $\tau_r$ are the standard deviations of the Gaussian in the radial direction and the angular direction, respectively
6.2 Image Retrieval based on Texture
The energy of each channel is defined as the log-scaled sum of the squares of the Gabor-filtered Fourier transform coefficients of an image
$$e_i = \log_{10}[1 + p_i]$$
where
$$p_i = \sum_{\omega=0^+}^{1} \sum_{\theta=(0^\circ)^+}^{360^\circ} \left[G_{s,r}(\omega,\theta)\,\omega\,P(\omega,\theta)\right]^2$$
$P(\omega,\theta)$ is the Fourier transform of the image represented in the polar frequency domain, $P(\omega,\theta) = F(\omega\cos\theta, \omega\sin\theta)$, where $F(u,v)$ is the Fourier transform in the Cartesian coordinate system:
$$F(u,v) = \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi(ux/M + vy/N)}$$
The energy deviation of each feature channel is defined as the log-scaled standard deviation of the squares of the Gabor-filtered Fourier transform coefficients of an image
$$d_i = \log_{10}[1 + q_i]$$
where
$$q_i = \sqrt{\sum_{\omega=0^+}^{1} \sum_{\theta=(0^\circ)^+}^{360^\circ} \left\{\left[G_{s,r}(\omega,\theta)\,\omega\,P(\omega,\theta)\right]^2 - p_i\right\}^2}$$
The HTD consists of the mean and standard deviation of the image intensity, and the energy $e_i$ and energy deviation $d_i$ of each feature channel
Texture [4] can also be defined as a function of the spatial variation in pixel intensities.
One example is to use statistical properties of the spatial distribution of gray-levels of an image. Two types of statistical properties can be used, i.e. (1) first-order statistics and (2) second-order statistics.
First-order statistics depend only on the individual pixel grey levels.
Define $L$: the number of distinct grey levels
Define $z$: the random variable denoting the grey level
Define $p(z_i)$: the probability of grey level $z_i$ occurring in the image
Overall mean: $m = \sum_{i=0}^{L-1} z_i\,p(z_i)$
Overall standard deviation: $\sigma = \sqrt{\sum_{i=0}^{L-1} (z_i - m)^2\,p(z_i)}$
Skewness: $\mu_3(z) = \sum_{i=0}^{L-1} (z_i - m)^3\,p(z_i)$
R-inverse variance: $R = 1 - \frac{1}{1 + \sigma^2(z)}$
Overall uniformity: $U = \sum_{i=0}^{L-1} p^2(z_i)$
Overall entropy: $e = -\sum_{i=0}^{L-1} p(z_i)\,\log_{10} p(z_i)$
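The first-order measures can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `first_order_stats`, the flat-list input and the base-10 logarithm in the entropy are assumptions made for the example.

```python
import math
from collections import Counter

def first_order_stats(pixels, levels=256):
    """First-order texture statistics of a flat list of integer grey levels."""
    n = len(pixels)
    counts = Counter(pixels)
    # p[z] estimates p(z_i), the probability of grey level z in the image
    p = [counts.get(z, 0) / n for z in range(levels)]
    mean = sum(z * p[z] for z in range(levels))
    var = sum((z - mean) ** 2 * p[z] for z in range(levels))
    skew = sum((z - mean) ** 3 * p[z] for z in range(levels))
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "skewness": skew,
        "r_inverse_variance": 1 - 1 / (1 + var),
        "uniformity": sum(q * q for q in p),
        # grey levels that never occur contribute nothing to the entropy
        "entropy": -sum(q * math.log10(q) for q in p if q > 0),
    }

# Hypothetical 8-pixel image with 4 equally likely grey levels
stats = first_order_stats([0, 0, 1, 1, 2, 2, 3, 3], levels=4)
```

Because every measure is derived from the grey-level histogram alone, two images with the same histogram but different spatial layouts give identical results.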
Second-order statistics take into account the relationship between a pixel and its neighbours
The Grey-level Co-occurrence Matrix (GLCM) is used to calculate the second-order statistics. Consider the following 4x4 pixel image with 3 distinct grey levels:
$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 \end{bmatrix}$$
The position operator d = (dx, dy) = (1,0) means that co-occurrences are counted between each pixel and the pixel to its left.
The 3x3 co-occurrence matrix has one row and one column per grey level. Element [0,0] of the GLCM is 4: this is the number of pixels with grey level 0 that have a pixel with grey level 0 to their left
A symmetric GLCM can be computed by adding the GLCM to its transpose, i.e. to the GLCM for the opposite position operator (-1,0).
The GLCM is then normalized by dividing each element by the total count in the matrix, giving the co-occurrence probabilities.
Computing the GLCM over all 256 grey levels is very expensive, and it also gives a poor statistical approximation because many cells have zero counts
16 linearly scaled grey levels are commonly used in CBIR applications. The position operators in a CBIR system can be (1,0), (0,1), (1,1) and (-1,0).
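The counting, symmetrizing and normalizing steps can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the image is a list of rows, `counts[i][j]` is indexed by (grey level of the current pixel, grey level of the neighbour), and the neighbour is taken at (x + dx, y + dy). Whether (1,0) is read as the left or the right neighbour is a convention; for the 4x4 example image the [0][0] count is 4 either way.

```python
def glcm(image, dx, dy, levels):
    """Count grey-level co-occurrences for the position operator (dx, dy)."""
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            nx, ny = x + dx, y + dy  # neighbour selected by the operator (assumed sign convention)
            if 0 <= nx < cols and 0 <= ny < rows:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts

def symmetric(counts):
    """Add the matrix to its transpose."""
    n = len(counts)
    return [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]

def normalize(counts):
    """Divide each element by the total count to get co-occurrence probabilities."""
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

# The 4x4 example image with 3 grey levels
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 2, 2],
         [0, 0, 2, 2]]
c = glcm(image, 1, 0, levels=3)
p = normalize(symmetric(c))
```

For a 256-level image, the grey levels would first be rescaled to 16 levels (for example `value * 16 // 256`) so that the matrix stays small and densely populated.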
Based on the GLCM, the second-order statistics are computed as follows, where $c_{ij}$ is element [i,j] of the normalized GLCM:
Angular Second Moment (Energy) A measures the homogeneity of the image: $A = \sum_i \sum_j c_{ij}^2$
Entropy $\delta$ has the same meaning as the first-order entropy but uses the GLCM instead: $\delta = -\sum_i \sum_j c_{ij} \log_2 c_{ij}$
Inverse Difference Moment (Homogeneity) I is another measure of homogeneity, sometimes called local homogeneity: $I = \sum_i \sum_j \frac{c_{ij}}{1 + (i-j)^2}$
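Contrast (Inertia) C measures how inhomogeneous the image is: $C = \sum_i \sum_j (i-j)^2 c_{ij}$
Correlation cor measures the linear dependency between the pairs of pixels: $cor = \frac{\sum_i \sum_j (i-\mu_x)(j-\mu_y)\,c_{ij}}{\sigma_x \sigma_y}$
where
$\mu_x = \sum_i i \left[\sum_j c_{ij}\right]$ and $\mu_y = \sum_j j \left[\sum_i c_{ij}\right]$
$\sigma_x = \sqrt{\sum_i (i-\mu_x)^2 \left[\sum_j c_{ij}\right]}$ and $\sigma_y = \sqrt{\sum_j (j-\mu_y)^2 \left[\sum_i c_{ij}\right]}$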
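A minimal sketch of the five GLCM measures, assuming the input is an already normalized GLCM given as a list of rows. The function name and dictionary keys are hypothetical, and the standard deviations are taken as square roots of the marginal variances.

```python
import math

def glcm_features(c):
    """Second-order statistics of a normalized GLCM c (entries sum to 1)."""
    n = len(c)
    cells = [(i, j, c[i][j]) for i in range(n) for j in range(n)]
    mu_x = sum(i * v for i, _, v in cells)
    mu_y = sum(j * v for _, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, _, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells)
    return {
        "energy": sum(v * v for _, _, v in cells),
        # empty cells are skipped because 0 * log(0) is taken as 0
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # undefined (division by zero) when the image has a single grey level
        "correlation": cov / (sd_x * sd_y),
    }

# Hypothetical 2-level GLCM in which neighbouring pixels always agree
features = glcm_features([[0.5, 0.0], [0.0, 0.5]])
```

In this toy matrix all probability mass lies on the diagonal, so contrast is 0 and homogeneity and correlation are both 1.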
Local Edge Histograms
The edge histogram descriptor (EHD) defined in MPEG-7 represents the local edge distribution in the image
Specifically, the image is first divided into 4x4 sub-images.
The local edge distribution of each sub-image is represented by a histogram.
To generate the histogram, the edges in each sub-image are categorized into five types: vertical, horizontal, 45-degree diagonal, 135-degree diagonal and non-directional edges.
Since there are 16 sub-images, a total of 5x16 = 80 histogram bins are required
[Figure: An example of dividing an image into sub-images and 8x8 image blocks. A 384x256 image is divided into 4x4 sub-images of 96x64 pixels, indexed (0,0) to (3,3); each sub-image is divided into 8x8 image blocks; each image block is divided into four 4x4 sub-blocks labelled a0, a1 (top row) and a2, a3 (bottom row).]
EHD extraction:
Each sub-image is first converted to grey scale. The EHD calculation is based on image blocks, e.g. of 8x8 pixels.
For a 384x256 image, the image is divided into 16 sub-images and each sub-image is further divided into 8x8 blocks. Each block is split into four sub-blocks, whose average intensities are denoted a0, a1, a2 and a3 respectively.
The edge direction of a block is determined by calculating the edge magnitudes.
The direction with the largest edge magnitude is chosen as the edge direction of the block, provided that this magnitude is larger than the threshold
If the largest magnitude is smaller than the threshold, the block is classified as containing no edge and is not counted in any histogram bin.
[Figure: The direction of the edge, with m0 (horizontal), m45 (45°), m90 (vertical) and 135° marked.]
The edge magnitudes can be calculated by digital filtering as follows
$m_{90} = |a_0 - a_1 + a_2 - a_3|$ and $m_0 = |a_0 + a_1 - a_2 - a_3|$
$m_{45} = |\sqrt{2}\,a_0 - \sqrt{2}\,a_3|$ and $m_{135} = |\sqrt{2}\,a_1 - \sqrt{2}\,a_2|$
$m_{non\text{-}directional} = |2a_0 - 2a_1 - 2a_2 + 2a_3|$
After the edge magnitudes have been calculated for each image block, the 5 histogram bins of the sub-image are computed
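The block classification and the per-sub-image histogram can be sketched as below. The names `FILTERS`, `classify_block` and `sub_image_histogram` are hypothetical, the threshold value is an arbitrary placeholder, and taking the absolute value of each filter response is an assumption made so that the result is a magnitude.

```python
import math

SQRT2 = math.sqrt(2)
# Filter coefficients applied to the sub-block averages (a0, a1, a2, a3)
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (SQRT2, 0, 0, -SQRT2),
    "diagonal_135": (0, SQRT2, -SQRT2, 0),
    "non_directional": (2, -2, -2, 2),
}

def classify_block(a, threshold):
    """Return the edge type of a block, or None if it contains no edge."""
    magnitudes = {
        name: abs(sum(ak * fk for ak, fk in zip(a, coeffs)))
        for name, coeffs in FILTERS.items()
    }
    best = max(magnitudes, key=magnitudes.get)
    return best if magnitudes[best] > threshold else None

def sub_image_histogram(blocks, threshold=11):
    """Count the edge types over all blocks of one sub-image (5 bins)."""
    hist = dict.fromkeys(FILTERS, 0)
    for a in blocks:
        edge = classify_block(a, threshold)
        if edge is not None:  # blocks without an edge are not counted
            hist[edge] += 1
    return hist

# Hypothetical sub-image with three blocks: vertical edge, horizontal edge, flat
blocks = [(100, 0, 100, 0), (100, 100, 0, 0), (50, 50, 50, 50)]
hist = sub_image_histogram(blocks)
```

Concatenating the 5-bin histograms of the 16 sub-images gives the 80-bin descriptor.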
6.3 Image Indexing and Retrieval based on Shape
Basic concept of shape
The shape of an object or region reflects its profile and physical structure.
Shape is a low-level feature of the objects within an image
For retrieval based on shape, an image must first be segmented into individual objects
Because robust and accurate image segmentation is difficult, the use of shape features for image retrieval has been limited to special applications where objects or regions are readily available
A good shape representation and similarity measurement for recognition and retrieval purposes should have the following two important properties:
Each shape should have a unique representation, invariant to translation, rotation and scale;
Similar shapes should have similar representations, so that retrieval can be based on the distance between shape representations
Shape Representation
Boundary-based methods: chain codes, fitting line segments, Fourier descriptors, …
Region-based methods: moments, orientation, …
Geometry-based methods: perimeter measurement, area attribute, …
Structure-based methods: medial axis transform (MAT), skeleton and thinning algorithms
Boundary-based methods -- Chain Code
Chain codes represent a boundary by a connected sequence of straight-line segments of specified length and direction
Typically, this representation is based on 4- or 8-connectivity of the segments. The direction of each segment is coded using a numbering scheme
[Figure: Direction numbers for the 4-directional chain code and for the 8-directional chain code]
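A small sketch of 8-directional chain coding. The numbering used here (0 for east, then counter-clockwise in 45-degree steps, with y growing upward) is an assumption, as are the names `DIRECTIONS_8` and `chain_code`; the boundary is given as successive neighbouring (x, y) points.

```python
# Assumed numbering: 0 = east, increasing counter-clockwise in 45-degree steps
DIRECTIONS_8 = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chain_code(points):
    """Encode a boundary given as successive 8-connected (x, y) points."""
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # each unit step between neighbouring points maps to one direction number
        code.append(DIRECTIONS_8[(x1 - x0, y1 - y0)])
    return code

# A unit square traversed counter-clockwise from the origin
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```

A 4-directional code would use only the steps (1,0), (0,1), (-1,0) and (0,-1), numbered 0 to 3 under the same assumption.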
Boundary-based methods -- Fourier Descriptors (FDs)
A shape is first represented by a feature function called a shape signature. A discrete Fourier transform is then applied to the signature to obtain the FDs of the shape in the frequency domain:
$$F_u = \frac{1}{N} \sum_{i=0}^{N-1} f(i) \cdot \exp\left[\frac{-j 2\pi u i}{N}\right]$$
for u = 0 to N-1, where N is the number of samples of f(i).
Three commonly used signatures are curvature-based, radius-based and boundary-coordinate-based
The radius-based signature consists of a number of ordered distances from the shape centroid to boundary points (called radii). The radii are defined as
$$r_i = \sqrt{(x_c - x_i)^2 + (y_c - y_i)^2}$$
where $(x_c, y_c)$ are the coordinates of the centroid and $(x_i, y_i)$, for i = 0 to 63, are the coordinates of the 64 sample points along the shape boundary. The number of pixels between each pair of neighbouring points is the same
A feature vector that is invariant to start point (p), rotation (r) and scale (s) is then calculated:
$$x = \left[\frac{|F_1|}{|F_0|}, \ldots, \frac{|F_{63}|}{|F_0|}\right]$$
The distance between shapes is calculated as the Euclidean distance between their feature vectors.
FDs convert the sensitive radius lengths into the frequency domain, where the data is more robust to small changes and noise.
The FDs capture the general features and form of the shape instead of each individual detail
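The radius-based FD pipeline can be sketched with the standard library alone. Several simplifications are assumptions of this example: the centroid is approximated by the mean of the sampled boundary points, the test ellipse is sampled at equal parameter steps rather than equal pixel spacing, and all function names are hypothetical.

```python
import cmath
import math

def radius_signature(boundary):
    """Distances from the centroid to each sampled boundary point (x, y)."""
    n = len(boundary)
    xc = sum(x for x, _ in boundary) / n  # centroid approximated from the samples
    yc = sum(y for _, y in boundary) / n
    return [math.hypot(x - xc, y - yc) for x, y in boundary]

def fourier_descriptors(signature):
    """Discrete Fourier transform F_u of the signature f(i), for u = 0..N-1."""
    n = len(signature)
    return [
        sum(f * cmath.exp(-2j * math.pi * u * i / n) for i, f in enumerate(signature)) / n
        for u in range(n)
    ]

def fd_feature_vector(boundary):
    """Magnitudes |F_u| / |F_0| for u = 1..N-1."""
    fd = fourier_descriptors(radius_signature(boundary))
    return [abs(f) / abs(fd[0]) for f in fd[1:]]

def fd_distance(shape_a, shape_b):
    """Euclidean distance between the FD feature vectors of two shapes."""
    return math.dist(fd_feature_vector(shape_a), fd_feature_vector(shape_b))

# An ellipse sampled at 64 points
ellipse = [(3 * math.cos(2 * math.pi * k / 64), 2 * math.sin(2 * math.pi * k / 64))
           for k in range(64)]
# The same shape rotated by 90 degrees, doubled in size and started at sample 5
moved = [(-2 * y, 2 * x) for x, y in ellipse[5:] + ellipse[:5]]
print(fd_distance(ellipse, moved))  # close to 0
```

Taking magnitudes discards the phase, which is what changes with the start point, and dividing by |F_0| cancels the scale.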
Region-based shape representation and similarity measure
Shape similarity measurements based on shape representations do not, in general, conform to human perception.
The following similarity measurements do not match human similarity judgment well:
Algebraic
Spline curve distance
Cumulative turning angle
Sign of curvature
Hausdorff distance
Basic idea of region-based shape representation
A grid is overlaid on the shape. A 1 is assigned to each cell in which at least 15% of the pixels are covered by the shape, and a 0 to each of the other cells. The more grid cells, the more accurate the shape representation.
A binary sequence is created by scanning from left to right and top to bottom, e.g. 11100000, 11111000, 01111110, 01111111.
[Figure: Generation of the binary sequence for a shape]
Rotation normalization
Rotate the shape so that its major axis is parallel with the x-axis. There are two possible orientations.
Only one of the binary sequences is stored; the two orientations are accounted for at retrieval time by representing the query shape with two binary sequences
[Figure: Two possible orientations with the major axis along the x direction]
Scale normalization
All shapes are scaled so that their major axes have the same fixed length.
Unique shape representation – shape index
After rotation and scale normalization and selection of a grid cell size, a unique binary sequence is obtained for each shape, based on its unique major axis.
This binary sequence is used as an index of the shape
Once the cell size is decided, the number of grid cells in the x direction is fixed (e.g. 8). The number of cells in the y direction depends on the eccentricity of the shape and can range from 1 to 8.
Similarity measure between two shapes based on their indexes
Based on the shape eccentricities, there are three cases for similarity measurement
Two normalized shapes with the same basic rectangle: compare the indexes bitwise; the distance is the number of positions at which the bits differ. For example:
A and B have the same eccentricity of 4
A = 11111111 11100000 and B = 11111111 11111100, so the distance between A and B is 3
If two normalized shapes have very different basic rectangles, we can assume that the two shapes are quite different (i.e. they differ on the minor axis)
If two normalized shapes have slightly different basic rectangles, they may still be perceptually similar.
Add 0s at the end of the index of the shape with the shorter minor axis to extend it to the same length as the index of the other shape
Example: A = (2, 11111111 11110000) and B = (3, 11111111 11111000 11100000). The binary number of shape A is extended to the same length as that of B, so A = (3, 11111111 11110000 00000000). The distance between A and B is 4
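The three cases can be sketched as one function. The name `shape_index_distance` and the parameters `cells_per_row` and `max_row_gap` are hypothetical; in particular, how many rows of difference count as "very different" is an assumption of this example.

```python
def shape_index_distance(index_a, index_b, cells_per_row=8, max_row_gap=1):
    """Number of differing bits between two grid-based shape indexes.

    Each index is a string of '0'/'1' characters (spaces are ignored).
    Returns None when the basic rectangles differ by more than max_row_gap rows,
    i.e. when the shapes are assumed to be quite different.
    """
    a = index_a.replace(" ", "")
    b = index_b.replace(" ", "")
    rows_a, rows_b = len(a) // cells_per_row, len(b) // cells_per_row
    if abs(rows_a - rows_b) > max_row_gap:
        return None
    # pad the shorter index with 0s so both have the same length
    length = max(len(a), len(b))
    a, b = a.ljust(length, "0"), b.ljust(length, "0")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

a = "11111111 11110000"
b = "11111111 11111000 11100000"
print(shape_index_distance(a, b))  # 4
```

Since the stored index covers only one of the two possible orientations, a query would typically be evaluated with both of its binary sequences and the smaller distance kept.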
6.4 Data Structure for Efficient Multimedia Similarity Search
Introduction
Retrieval is based on the similarity between the query vector and the stored feature vectors
If the feature dimension is high and the number of stored objects is huge, a linear search over all feature vectors is too slow
Techniques and data structures are required to organize the feature vectors and to support fast search methods that locate the relevant feature vectors quickly
The main idea is to divide the high-dimensional feature space into many sub-spaces and to search only one or a few of them
Three common queries:
Point query: the user's query is represented as a vector; the objects whose feature vectors exactly match it are retrieved
Range query: the user's query is represented as a feature vector and a distance range; the distance metric is e.g. L1 or L2 (Euclidean distance)
k nearest neighbours query: the user's query is specified by a vector and an integer k; the k objects whose distances from the query are the smallest are retrieved.
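The three query types can be illustrated with a plain linear scan, which is the slow baseline that the indexing structures in this section aim to improve on. The database layout (a dictionary from object name to feature tuple) and all function names are assumptions of this sketch.

```python
import heapq
import math

def l1(a, b):
    """L1 (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """L2 (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def point_query(db, query):
    """Objects whose feature vector exactly matches the query vector."""
    return [name for name, vec in db.items() if vec == query]

def range_query(db, query, radius, dist=l2):
    """Objects within the given distance range of the query vector."""
    return [name for name, vec in db.items() if dist(vec, query) <= radius]

def knn_query(db, query, k, dist=l2):
    """The k objects with the smallest distance to the query vector."""
    return heapq.nsmallest(k, db, key=lambda name: dist(db[name], query))

# Hypothetical database of 2-D feature vectors
db = {"a": (0, 0), "b": (1, 0), "c": (3, 4)}
print(point_query(db, (1, 0)))        # ['b']
print(range_query(db, (0, 0), 1.0))   # ['a', 'b']
print(knn_query(db, (0, 0), 2))       # ['a', 'b']
```

Every query here touches all stored vectors, so the cost grows linearly with the database size.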
Filtering Process
Query methods based on color histograms
Use histograms with very few bins to select potential retrieval candidates
Then use the full histograms to calculate the distance
As a special case, calculate the average RGB value of each image, $x = (R_{avg}, G_{avg}, B_{avg})^T$, such as
$$A_{avg} = \frac{1}{P} \sum_{p=1}^{P} A(p)$$
where $A \in \{R, G, B\}$ and $P$ is the number of pixels in the image
Given the average color vectors x and y of two images, the Euclidean distance is
$$d_{avg}(x, y) = \sqrt{\sum_{i=1}^{3} (x_i - y_i)^2}$$
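A minimal sketch of the two-stage filtering idea, using the average color as the cheap first stage. The function names, the `keep` parameter and the `full_distance` callable (standing in for a full-histogram distance) are hypothetical; a real system would precompute and store the average colors rather than recompute them per query.

```python
import math

def average_color(pixels):
    """x = (R_avg, G_avg, B_avg) for a list of (R, G, B) pixels."""
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))

def d_avg(x, y):
    """Euclidean distance between two average color vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def filtered_search(query_pixels, database, full_distance, keep=10):
    """Two-stage search: cheap average-color filter, then the full distance."""
    q_avg = average_color(query_pixels)
    # Stage 1: rank all images by the 3-D average-color distance and keep a few
    candidates = sorted(
        database, key=lambda name: d_avg(q_avg, average_color(database[name]))
    )[:keep]
    # Stage 2: re-rank only the surviving candidates with the expensive distance
    return sorted(candidates, key=lambda name: full_distance(query_pixels, database[name]))
```

The expensive distance is evaluated on at most `keep` images, at the risk of discarding a relevant image in the first stage if `keep` is chosen too small.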
6.5 Data Structure for Efficient Multimedia Similarity Search – B+ Tree
The goal is an efficient query process
The weakness of the traditional similarity calculation is that the feature vectors in the search space are scanned sequentially
A B+ tree is a hierarchical structure with a number of nodes that store the feature vectors
[Figure: a B+ tree whose leaf pointers lead to record 10, record 20 and record 60]
Multidimensional B+ Tree
Each feature vector has two dimensions. The entire feature space forms a large rectangle identified by its lower-left and top-right corners.
Each key value is replaced with a rectangular region
The pointers of the leaf nodes point to lists of the feature vectors within the corresponding rectangular regions.
[Figure: a multidimensional B+ tree with root key D1,2, internal keys D1,0 and D2,1, and leaf regions D0,0, D0,1, D1,0, D1,1, D1,2, D2,0, D2,1 and D3,0 pointing to the lists L0,0, L0,1, L1,0, L1,1, L1,2, L2,0, L2,1 and L3,0]
6.6 Similarity Comparison
Given two feature vectors I and J, the distance is defined as D(I,J) = f(I,J)
Typical similarity metrics:
Lp (Minkowski distance)
χ² metric
KL (Kullback-Leibler Divergence)
JD (Jeffrey Divergence)
QF (Quadratic Form)
EMD (Earth Mover’s Distance)
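Four of these metrics are simple enough to sketch directly. The formulas below are common textbook forms and are assumptions of this example, since the slides do not give them; the χ² measure in particular has several variants. QF and EMD are omitted because they need a bin-similarity matrix and an optimization step respectively.

```python
import math

def minkowski(i, j, p=2):
    """Lp distance: p=1 gives L1, p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(i, j)) ** (1 / p)

def chi_square(i, j):
    """One common chi-square form: sum of (a - b)^2 / (a + b) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(i, j) if a + b > 0)

def kl_divergence(i, j):
    """Kullback-Leibler divergence of j from i; j must be non-zero wherever i is."""
    return sum(a * math.log(a / b) for a, b in zip(i, j) if a > 0)

def jeffrey_divergence(i, j):
    """Symmetric variant of KL that compares both histograms with their mean."""
    total = 0.0
    for a, b in zip(i, j):
        m = (a + b) / 2
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total

print(minkowski((0, 0), (3, 4)))        # 5.0
print(minkowski((0, 0), (3, 4), p=1))   # 7.0
```

KL is not symmetric and is undefined when a bin is empty in J but not in I; the Jeffrey divergence avoids both issues, which is one reason it is often preferred for comparing histograms.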