TAMPERE UNIVERSITY OF TECHNOLOGY Department of Electrical Engineering Institute of Signal Processing
Mari Partio
Content-based Image Retrieval using Shape and Texture Attributes
Master of Science Thesis
Subject approved in the Department Council meeting on 10 April 2002
Examiners: Professor Moncef Gabbouj Researcher Bogdan Cramariuc
Preface This Thesis has been carried out in the Institute of Signal Processing, Tampere University
of Technology, Finland. The work is part of the MuVi-project, whose emphasis is on
content-based image and video retrieval. First, I would like to thank my examiners,
Professor Moncef Gabbouj and Researcher Bogdan Cramariuc for their corrections and
guidance towards finishing this thesis. I would also like to give special thanks to the
whole Image and Video Analysis group, as well as to SPAG and the Academy of Finland
for their financial support. Last but not least, I would like to thank my family and
friends for their continuous support during this work.
Tampere 19.11.2002, Mari Partio Vaajakatu 5 H 176 33720 Tampere, Finland Mobile: 040-7323079 Email: [email protected]
2.1 The Problem of Content-based Retrieval .................... 14
2.2 Feature Extraction ........................................ 15
2.6.1 Overview of MPEG-7 ...................................... 21
2.6.2 Shape features .......................................... 22
2.6.3 Texture features ........................................ 24
3 Human Visual Perception ..................................... 26
3.1 Anatomy of the human eye .................................. 26
3.2 Lateral Inhibition ........................................ 27
3.3 Perception of Shape ....................................... 28
3.3.1 Classical theories of shape perception .................. 28
3.3.2 Modern theories of shape perception ..................... 29
3.4 Perception of Texture ..................................... 29
3.4.1 Feature approach ........................................ 29
3.4.2 Frequency approach ...................................... 30
4.3 Shape Correspondence using Ordinal Measures ............... 40
4.3.1 Object alignment based on universal axes ................ 40
4.3.2 Boundary to multilevel image transformation ............. 42
4.3.3 Similarity evaluation ................................... 44
4.4 Experiments using Ordinal Correlation ..................... 46
4.4.1 Introducing dataset and selected parameters ............. 46
4.4.2 Results and their evaluation ............................ 47
Abstract

TAMPERE UNIVERSITY OF TECHNOLOGY
Degree Program in Electrical Engineering
Institute of Signal Processing
Partio, Mari: Content-based Image Retrieval using Shape and Texture Attributes
Master of Science Thesis, 70 pages
Examiners: Professor Moncef Gabbouj, Researcher Bogdan Cramariuc
Funding: Center of Excellence, SPAG, Academy of Finland
Department of Electrical Engineering
November 2002

Due to the rapid increase in the volume of image and video collections, traditional methods of indexing and retrieval using only keywords have become outdated. Therefore, alternative methods that describe images by their visual content have been developed. To produce and test algorithms for content-based image and video retrieval, MUVIS (Multimedia Video Indexing and Retrieval System) was developed at TUT. The goal of MUVIS is to provide a fast, real-time and reliable audio/video (AV) browsing and indexing application that is also capable of extracting key features (such as color, texture and shape) from the AV media.

Most existing image retrieval systems perform reasonably well when using color features. However, retrieval using shape or texture features does not produce equally good results. Therefore, this thesis investigates different methods of representing shape and texture in content-based image retrieval. Later, when appropriate segmentation algorithms become available, some of these methods could also be applied to video object retrieval.

The thesis presents two contributions: a shape-based and a texture-based retrieval method. The former concerns shape analysis and retrieval. Shape attributes can be roughly divided into two main categories: boundary-based and region-based. Since the human visual system itself focuses on edges and ignores uniform regions, this thesis concentrates on boundary-based representations. A novel boundary-based method using distance transformation and ordinal correlation is developed in this thesis. Simulation results show that the proposed technique produces encouraging results on the MPEG-7 shape test database.

The second contribution of the thesis is a constrained application in which the database contains a set of rock images. In this application, we applied a technique based on gray-level co-occurrence matrices (GLCM) and compared the results with a well-known method from the literature. It was found that GLCM outperforms Gabor wavelet features when considering both retrieval time and the visual quality of the results.
Tiivistelmä

TAMPEREEN TEKNILLINEN KORKEAKOULU
Degree Program in Electrical Engineering
Institute of Signal Processing
Partio, Mari: Image retrieval using shape and texture attributes
Master of Science Thesis, 70 pages
Examiners: Prof. Moncef Gabbouj, Researcher Bogdan Cramariuc
Funding: SPAG, Academy of Finland
Department of Electrical Engineering
November 2002

The rapid growth of digital image and video collections has led to the development of efficient browsing and retrieval software. Traditional image retrieval methods are based on manually added keywords describing the image; their problems are subjectivity and the labor caused by the growth of the databases. For this reason, the development of alternative methods that represent image content through visual properties, such as color, texture and shape, is increasingly important.

In content-based image retrieval (CBIR), the goal is to find from a database the images whose visual properties correspond as closely as possible to those of the query image. First, the most important features are extracted from the images and stored in feature vectors. The feature vectors, together with their indices, are stored in the database. The feature vector of the query image is compared to the feature vectors of the other images using a chosen similarity metric, and the images are returned in order of increasing distance.

As CBIR systems become more common, the creation of common standards has become increasingly important. MPEG-7 is a standard developed by MPEG (Moving Picture Experts Group) whose purpose is to describe the content of multimedia data. MPEG-7 concentrates on the interpretation of the meaning and content of information, and it is therefore in a key position in the development of content-based retrieval systems.

This thesis is part of the MUVIS system developed at the Institute of Signal Processing of Tampere University of Technology. The goal of MUVIS is to produce a fast, real-time and reliable audio/video (AV) multimedia retrieval application that is also capable of extracting key features from AV media. This work concentrates mainly on image retrieval, but as algorithms for video segmentation develop, the presented methods can also be used for video retrieval. Many of the existing image retrieval systems perform reasonably when the search is carried out using color properties. When shape or texture is used as the search criterion, the results are usually less accurate. Therefore, this work concentrates on examining different attributes describing shape and texture, and on their suitability for automatic image retrieval.

The work consists of two parts: a shape-based and a texture-based image retrieval method. The first part deals with shape analysis and retrieval. Shape attributes can be roughly divided into two main categories: boundary-based and region-based. Since human vision concentrates on edges and pays less attention to uniform regions, this work focuses on boundary-based representations. An additional advantage of boundary-based methods is that they describe shape more accurately and allow a shape to be described at several resolutions (multiresolution approach). The work also develops a new boundary-based shape comparison method, based on the distance transformation and the ordinal correlation method. The proposed method gave encouraging results when tested on the MPEG-7 test data.

The second part presents a texture-based application for evaluating the similarity of rock images. There is no exact definition of texture, but texture is generally understood as an image region consisting of similar repeating elements. Depending on the arrangement of these basic elements, a texture can be roughly regarded as either regular or stochastic. Since the rock images used in this work are stochastic in nature, statistical methods are best suited for describing them. The technique used (GLCM) is well known and is based on the relative occurrence of gray levels in an image. When the obtained results are compared with those obtained using a Gabor filter bank, GLCM proves better in terms of retrieval time and the visual quality of the retrieval results.
Symbols and Abbreviations

Symbols:
A area of a region, p. 32
Anl orthogonal Zernike moment of order n and repetition l, p. 53
Ck position of the contour pixel k in the image G, p. 43
dx distance in x-direction, p. 55
dy distance in y-direction, p. 55
Dj metadifference obtained by calculating the distance between all pairs of metaslices, p. 45
E[N(L)] expected number of boxes, p. 51
fDC mean intensity of texture, p. 24
fSD standard deviation of texture, p. 24
f(x,y) continuous 2D function, p. 33
F(n) Discrete Fourier Transform of u(k), p. 36
G gray-scale image, p. 42
Gi pixel values of G, p. 42
g(u,σ) 1D Gaussian kernel of width σ, p. 38
g(x,y) 2D Gabor function, p. 57
G(u,v) Fourier transform of the 2D Gabor function, p. 57
I(x,y) image, p. 50
K number of boundary samples, p. 36
l1 number of universal axes, p. 41
L1 histogram distance, p. 15
L2 histogram intersection, p. 15
mpq moment of order (p+q), p. 33
M(n) magnitude of the Fourier Descriptors, p. 36
Mk moment image, p. 53
Mpq central moment, p. 34
MjX metaslice obtained by combining slices SkX, p. 45
MjY metaslice obtained by combining slices SkY, p. 45
N number of contour points, p. 33
P perimeter of a contour, p. 32
Pd co-occurrence matrix using distance d, p. 55
Qp(σ) radial polynomial of the OFM, p. 54
RjX contains the pixels from image X which belong to area Rj, p. 44
RjY contains the pixels from image Y which belong to area Rj, p. 44
Rnl radial polynomial for Zernike moments, p. 54
s1, s2, s3 generic stimuli, p. 17
SkX slice constructed for every pixel Xk in image X, p. 44
SkY slice constructed for every pixel Yk in image Y, p. 45
TD feature vector of the HTD, p. 24
u(k) sequence of coordinates (x(k), y(k)) of a K-point digital contour in the xy-plane, k = 0, 1, 2, …, K-1, p. 36
û(k) u(k) with only the first M coefficients used, p. 36
Uh higher central frequency, p. 57
Ul lower central frequency, p. 57
Upq basis function of the OFM, p. 54
V0 value on the contour, p. 43
Vnl(x,y) Zernike basis function of order n and repetition l, p. 54
W window width for calculating moments, p. 52
Wmn Gabor wavelet transform, p. 58
xc x-coordinate of the centroid, p. 34
xm normalized x-coordinate for the moment calculation, p. 52
x^p y^q basis function in the geometric moments definition, p. 54
yc y-coordinate of the centroid, p. 34
ym normalized y-coordinate for the moment calculation, p. 52
ηnq normalized definition of moments, p. 34
κ(u) curvature function for any parametrized contour, p. 38
κ(u,σ) evolved curve contour, p. 38
λ the sum of all metadifferences, p. 45
µmn mean value on a certain frequency band, p. 58
µr mean radius, p. 33
µ centroid of a region, p. 33
φ1-φ7 a set of seven invariant moments, p. 34-35
σ parameter that controls the shape of the logistic function, p. 53
σmn standard deviation on a certain frequency band, p. 58
ρ Spearman's ρ (ordinal correlation coefficient), p. 45
ρ(x,y) autocorrelation function, p. 50
τ Kendall's τ (ordinal correlation coefficient), p. 45
Γ(u) parametrized contour, p. 38
Θµ polar angle, p. 41
θj directional angle, p. 41

Abbreviations:
1D One-dimensional
2D Two-dimensional
3D Three-dimensional
AV Audio-visual
CBIR Content-based image retrieval
CSS Curvature Scale-Space
D Descriptor (MPEG-7 standard)
DFT Discrete Fourier Transform
DDL Description Definition Language (MPEG-7 standard)
DS Description Scheme (MPEG-7 standard)
FD Fourier Descriptor
GLCM Gray-level co-occurrence matrix
H.263+ ITU-T video coding standard
HCP High Curvature Point
HSV Color model (hue, saturation, value)
HTD Homogeneous Texture Descriptor (MPEG-7 standard)
ISO The International Organization for Standardization
MPEG-1 ISO standard
MPEG-2 ISO standard
MPEG-4 ISO standard
MPEG-7 ISO standard
MuVi Multimedia Video Indexing and Retrieval
MUVIS Multimedia Video Indexing and Retrieval System
NeTra CBIR system, University of California at Santa Barbara
OFM Orthogonal Fourier-Mellin moment
OR Ordinal correlation
PC Personal computer
Photobook CBIR system, Massachusetts Institute of Technology (MIT)
QBIC CBIR system designed by IBM®
RGB Color model (red, green, blue)
SQUID Shape Queries Using Image Databases (CBIR system, University of Surrey, UK)
TUT Tampere University of Technology
UA Universal Axes
VisualSEEK CBIR system, Columbia University
WT Wavelet Transform
WTMM Wavelet Transform Modulus Maxima
1 Introduction
During the last decade there has been a rapid increase in volume of image and video
collections. A huge amount of information is available, and gigabytes of new visual
information are generated, stored, and transmitted daily. However, it is difficult to access this
visual information unless it is organized in a way that allows efficient browsing,
searching, and retrieval. Traditional methods of indexing images in databases rely on a
number of descriptive keywords, associated with each image. However, this manual
annotation approach is subjective and, due to the rapidly growing database sizes, it has
become impractical. To overcome these difficulties, Content-Based Image Retrieval
(CBIR) emerged in the early 1990s as a promising means for describing and retrieving
images. Instead of being manually annotated with text-based keywords, images are
indexed by their visual content, such as color, texture, shape, and spatial layout.
The importance of content-based retrieval for many applications, ranging from art
galleries and museum archives to picture collections, criminal investigation, and medical
and geographic databases, makes visual information retrieval one of the fastest growing
research fields in information technology. Consequently, many content-based retrieval
applications have been created for both research and commercial purposes. In the late
1990s, with the broad introduction of digital images and video to the market, the
need for interoperability among different applications that deal with AV content
description arose. For this purpose, in 1997 the ISO MPEG group initiated the MPEG-7
"Multimedia Content Description Interface" work item. As a result of this activity, the international
MPEG-7 standard was issued in July 2001, defining standardized descriptors and
description schemes that allow users or agents to search, identify, filter, and browse
audiovisual content.
The MUVIS system [32], developed at TUT, extracts low-level features, such as color,
texture, and shape. It evaluates the similarity between a query image and all the other images
in the database specified by the user, and returns the most similar images as the best matches.
Several systems and applications similar to MUVIS exist, such as QBIC [12], NeTra
[45], VisualSEEK (color & texture) [23], SQUID [19] and Photobook [6].
The aim of TUT MuVi is to develop a fast, real-time and reliable audio/video (AV)
browsing and indexing framework that is also capable of extracting well-defined key
features of the AV media. This thesis concentrates on describing shape and texture
attributes, which are currently applied to still images. Later, once efficient segmentation is
available, they should also be applied to video sequences.
Shape is one of the most important visual attributes in an image. In fact, the human visual
system is able to extract and abstract shapes from very complex scenes. The concept of
shape is invariant to translations, rotations, and scaling; the shape of an object can be
represented as a binary image indicating the extent of the object. Due to these considerations,
shape representation is one of the most challenging aspects of computer vision. Shape
representations can be roughly classified into two major categories: boundary-based and
region-based. The former represents a shape by its outline, while the latter considers a shape
to be formed of a set of two-dimensional regions. The human visual system itself focuses on
edges and ignores uniform regions; therefore, this thesis concentrates mostly on
boundary-based representations. Moreover, feature vectors extracted from boundary-based
representations provide a richer description of a shape, allowing the development of
multi-resolution shape descriptions. In this thesis, we give a short introduction to the most
well-known region-based, boundary-based and multi-resolution techniques. In addition, a
recently developed boundary-based approach to shape similarity estimation based on ordinal
correlation is presented.
Texture is another important visual attribute, since it is present almost everywhere in
nature. Textures may be described according to their spatial, frequency or perceptual
properties. This thesis briefly describes several approaches to texture representation and
gives a more detailed view of one spatial representation method (co-occurrence matrices)
and one frequency-based method (Gabor filters). A multi-resolution extension is available
for the former, while the latter is multi-resolution by nature.
As an experiment using texture attributes, we present a constrained retrieval application in
which the database contains a set of rock images. In this application, we apply a technique
based on gray-level co-occurrence matrices (GLCM) and compare the results with a
well-known method from the literature. An evaluation of the results is also provided.
This thesis is organized as follows. Chapter 2 reviews the idea behind content-based
retrieval. Chapter 3 gives an overview of the human visual system characteristics, which
play an important role in similarity assessment. Chapters 4 and 5 represent the core of this
thesis. Chapter 4 begins by introducing different shape attributes. It continues by
presenting a novel boundary-based method using distance transformation and ordinal
correlation. The chapter ends by providing some experimental results. Chapter 5 presents
different texture attributes used in content-based image retrieval. The chapter is concluded
by a constrained application in which gray-level co-occurrence matrices are applied to
retrieval of rock images. Chapter 6 provides the conclusions of this thesis.
2 Content-based Image Retrieval (CBIR)
2.1 The Problem of Content-based Retrieval
The idea behind content-based retrieval is to retrieve, from a database, media items (such
as images, video and audio) that are relevant to a given query. Relevancy is judged based
on the content of media items. Several steps are needed for this. First, the features from
the media items are extracted, and their values and indices are saved in the database. Then
the index structure is used, ideally, to filter out all irrelevant items by checking their
attributes against the user's query. Finally, the attributes of the relevant items are compared
to the attributes of the query according to some similarity measure, and the retrieved items
are ranked in order of similarity. This chapter provides a short introduction to each of the steps
order of similarity. This chapter provides a short introduction to each of the steps
mentioned above, which are also shown in Figure 2-1.
Figure 2-1 Block diagram of the content-based retrieval system
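The final ranking step above can be sketched in a few lines of code. The toy feature vectors, item identifiers and the use of a plain Euclidean distance below are illustrative assumptions, not the exact measures used in MUVIS.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_features, database, distance=euclidean):
    """Return item ids sorted from most to least similar to the query.

    `database` maps an item id to its precomputed feature vector,
    mirroring the index built during feature extraction.
    """
    return sorted(database, key=lambda item: distance(query_features, database[item]))

# Hypothetical three-item database with 2-D feature vectors.
db = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.1, 0.0]}
ranking = rank_by_similarity([0.0, 0.1], db)  # most similar first
```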
2.2 Feature Extraction
Feature extraction is one of the most important components in a content-based retrieval
system. Since a human usually judges the results of a query, the extracted features
should mimic human visual perception as closely as possible. In a broad sense, features
may be divided into low-level features (such as color, texture, shape, and spatial layout)
and high-level semantics (such as concepts and keywords). Using only low-level features
might not always give satisfactory results, and therefore, high-level semantics should be
added to improve the query whenever possible. High-level semantics can be either
annotated manually or constructed automatically from low-level features. In this chapter
the general low-level visual features are described.
2.2.1 Color
Color is one of the most widely used visual attributes in image retrieval. In fact, most
existing image retrieval systems, such as QBIC [12], NeTra [45], and VisualSEEK [23], are
most efficient in color retrieval. Retrieval by color similarity requires color models in which
distances in the color space correspond to human perceptual distances
between colors. Studies by psychologists and artists have demonstrated that the presence
and distribution of colors induce sensations and convey meaning to the observer,
according to specific rules, which are explained in more detail in [3].
The color histogram is the most commonly used representation. The histogram reflects the
statistical distribution, or the joint probability, of the intensities of the three color channels.
The color histogram is computed by discretizing the colors within the image and counting
the number of pixels of each color. Chromatic similarity between the query histogram and
the histograms of the database images can be evaluated by computing the L1 and L2
distances [3]. The L1 distance is a measure related to histogram intersection, whereas an
L2-related metric takes into account the similarities between similar, but not necessarily
identical, colors [46].
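A minimal sketch of the histogram construction and L1 comparison described above; the 4-bins-per-channel quantization and the toy pixel lists are arbitrary choices for illustration.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Discretize RGB colors (8-bit channels) and count pixels per bin,
    normalizing so that images of different sizes can be compared."""
    hist = [0] * bins_per_channel ** 3
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

def l1_distance(h1, h2):
    """L1 (city-block) distance between two histograms; 0 when identical."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

img1 = [(255, 0, 0)] * 3 + [(0, 255, 0)]  # three red pixels, one green
img2 = [(255, 0, 0)] * 4                  # four red pixels
h1, h2 = color_histogram(img1), color_histogram(img2)
```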
Color stimuli are commonly represented as points in three-dimensional color spaces.
Before building the histogram the hardware-oriented (RGB) color space is usually
converted into some perceptually uniform color space, such as the HSV (hue, saturation,
value) space. Hue describes the dominant wavelength of the color percept, saturation
indicates the amount of white light mixed into the color, and brightness (value) represents
the intensity of the color.
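Such a conversion is sketched below using Python's standard colorsys module; the scaling of channels to [0, 1] is a requirement of that module, and the resulting hue lies on a [0, 1) color circle rather than in degrees.

```python
import colorsys

def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB values to HSV, each component in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

hue, sat, val = rgb_to_hsv(255, 0, 0)  # pure red: hue 0, full saturation and value
```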
2.2.2 Shape
The shape of an object is a binary image representing the extent of the object. Since the
human perception and understanding of objects and visual forms relies heavily on their
shape properties, shape features play a very important role in CBIR. In general, useful
shape features can be divided into two categories: boundary-based and region-based.
These representations will be introduced in Chapter 4, where two additional extensions of
boundary-based representations, a multi-resolution approach and similarity evaluation based
on ordinal correlation, are presented.
2.2.3 Texture
Although no single formal definition for texture exists [33], we refer to texture as an area
containing variations of intensities, which form repeated patterns. Those patterns can be
caused by physical surface properties, such as roughness, or they could result from
reflectance differences, such as the color of a surface. Differences observed by visual
inspection are difficult to define in a quantitative manner, which makes it necessary to
describe texture using computable features. In this thesis, textural attributes are divided into three
categories: spatial, frequency and moment-based attributes [3]. Those properties will be
discussed in more detail in Chapter 5.
2.2.4 Spatial layout
Spatial relationships between entities often capture the most relevant information in an
image. However, defining similarity according to spatial relationships is generally
complex because relations are not represented by a single crisp statement but rather by a
set of contrasting conditions, which are concurrently satisfied with different degrees. In [3]
spatial relationships are divided into two categories: object-based and relational structures.
In object-based structures spatial relationships are not explicitly stored but visual
information is included in the representation. In this case images are retrieved using object
coordinates. Object-based structures are based on a space partitioning technique that
allows a spatial entity to be located in the space it occupies. Therefore, in image retrieval
systems, they can be employed in spatial queries concerned with finding a spatial entity
within the image space.
Relation-based structures do not include visual information and preserve only a set of
relevant spatial relationships, discarding all uninteresting ones. Objects are represented
symbolically and spatial relationships explicitly. In image retrieval systems, relation-based
structures are suited for finding all images with objects in similar relationships as in a query
image.
2.3 Similarity models
2.3.1 The metric model
In the metric model, it is assumed that a set of features models the properties of the input
object (stimulus) so that it can be represented as a point in a suitable feature space. If d is
distance function and s1, s2, s3 are generic stimuli the following metric axioms must be
verified:
• constancy of self-similarity: d(si, si) = 0, for i = 1, 2, 3;
where * indicates the complex conjugate. Assuming that the local texture regions are
spatially homogeneous, the mean µmn and the standard deviation σmn of the magnitude
of the transform coefficients are used to represent the region for retrieval purposes:

µmn = ∫∫ |Wmn(x, y)| dx dy,  and  σmn = sqrt( ∫∫ ( |Wmn(x, y)| − µmn )² dx dy ).  (5.21)

A feature vector can be constructed using µmn and σmn as feature components.
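A discrete counterpart of Eq. (5.21) simply averages the coefficient magnitudes over the region. The 2×2 block of complex responses below is a made-up example, and normalizing by the number of samples (instead of the raw sum) is an implementation convention assumed here.

```python
import math

def gabor_region_features(coefficients):
    """Mean and standard deviation of the magnitudes |Wmn(x, y)| over one
    filtered region: discrete counterparts of mu_mn and sigma_mn."""
    mags = [abs(c) for row in coefficients for c in row]
    mu = sum(mags) / len(mags)
    sigma = math.sqrt(sum((m - mu) ** 2 for m in mags) / len(mags))
    return mu, sigma

# Toy 2x2 block of complex Gabor filter responses (illustrative values only).
W = [[1 + 0j, 0 + 1j], [3 + 4j, 0j]]
mu, sigma = gabor_region_features(W)
```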
5.5 Experiments
This section presents an application of the gray-level co-occurrence matrix (GLCM) to
texture-based similarity evaluation of rock images [30]. Retrieval results were evaluated
for two databases, one consisting of the whole images and the other of blocks obtained
by splitting the original images. The retrieval results for both databases were obtained by
calculating the distance between the feature vector of the query image and the other feature
vectors in the database. The performance of the co-occurrence matrices was also
compared to that of Gabor wavelet features. This similarity evaluation application could
reduce the cost of geological investigations by allowing improved accuracy in automatic
rock sample selection.
5.5.1 Testing Database
Description of the Testing Database
The original testing material consists of 168 rock texture images from the collection of
Insinööritoimisto Saanio & Riekkola Oy. The images in the database are divided into 7
categories, each containing 24 distinct images. One image from each class can be
seen in Figure 5-1. Evidently, some of the images contain more than one texture
(this is especially true for class 4). Therefore, each of the original images is divided into 9
sub-images, resulting in a database of 1512 images (216 images from each class).
Retrieval results were produced for both the original and the split database.
Figure 5-1 Rock Texture Classes
Normalization of the Testing Database
Before computing the co-occurrence matrices, the images in the database are normalized.
The idea is to overcome the effects of monotonic transformations of the true image gray
levels caused by variations in lighting, lenses, film and digitizers. Normalization is done
for all the images in the database by setting their mean and standard deviation to common
values.
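This normalization can be sketched as a linear rescaling of the gray levels; the target mean and standard deviation below are placeholder values, since the thesis does not state the common values used.

```python
import math

def normalize_image(pixels, target_mean=128.0, target_std=32.0):
    """Rescale gray levels so the image attains a common mean and
    standard deviation (a constant image is mapped to the target mean)."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0:
        return [target_mean] * n
    return [target_mean + (p - mean) * target_std / std for p in pixels]

normalized = normalize_image([10, 20, 30, 40])
```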
5.5.2 Retrieval Procedure
The CBIR system used here consists of two major parts. The first one is feature
extraction, where a set of features is generated to represent the content of each image in
the database. The second task is similarity measurement, where a distance between the
query image and each image in the database is computed using their feature vectors so that
the N most similar images can be retrieved. The block diagram of the system is presented
in Figure 5-2.
Co-occurrence matrices are calculated for all the images in the normalized database. The
GLCM is built by incrementing the entry (i, j) whenever gray levels i and j occur at
distance d from each other. To normalize the GLCM, its values are divided by the total
number of increments.
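The construction just described can be sketched in pure Python; the 2×2 toy image, two gray levels and horizontal displacement are illustrative choices, not the settings used in the experiments.

```python
def glcm(image, dx, dy, levels):
    """Normalized gray-level co-occurrence matrix for displacement (dx, dy).

    P[i][j] counts the pixel pairs where gray level j occurs at
    displacement (dx, dy) from gray level i; the matrix is then divided
    by the total number of increments, as described above.
    """
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    count = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                P[image[y][x]][image[ny][nx]] += 1
                count += 1
    return [[v / count for v in row] for row in P]

# 2x2 toy image with 2 gray levels, horizontal displacement (dx, dy) = (1, 0).
P = glcm([[0, 0], [0, 1]], dx=1, dy=0, levels=2)
```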
The features energy, entropy, contrast, and inverse difference moment are calculated from
each co-occurrence matrix, and their values are saved in the feature vector of the
corresponding image. In order to prevent features with larger values from having more
influence on the cost function (Euclidean distance) than features with smaller values, the
feature values inside each feature class are normalized to the range [0, 1].
The similarity between images is estimated by summing the Euclidean distances between
corresponding features in their feature vectors. The images whose feature vectors are
closest to the feature vector of the query image are returned as the best matches.
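The four features and the distance computation can be sketched as follows, using standard Haralick-style definitions; whether these match the exact formulas of the thesis implementation is an assumption.

```python
import math

def glcm_features(P):
    """Energy, entropy, contrast and inverse difference moment of a
    normalized co-occurrence matrix P."""
    energy = sum(p * p for row in P for p in row)
    entropy = -sum(p * math.log2(p) for row in P for p in row if p > 0)
    contrast = sum((i - j) ** 2 * p
                   for i, row in enumerate(P) for j, p in enumerate(row))
    idm = sum(p / (1 + (i - j) ** 2)
              for i, row in enumerate(P) for j, p in enumerate(row))
    return [energy, entropy, contrast, idm]

def feature_distance(f1, f2):
    """Euclidean distance between two (already normalized) feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

features = glcm_features([[0.5, 0.5], [0.0, 0.0]])  # toy normalized GLCM
```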
Figure 5-2 Block Diagram of the Retrieval Procedure
5.5.3 Retrieval Results
The distance parameter can be optimized for each texture type based on the size of its
textural elements [6]. However, our testing database consists of images having textural
elements of different sizes, and therefore a common distance parameter is rather difficult
to find. The retrieval results shown in this section were produced using the distance vector
d = [1, 1]. This distance vector was selected to better take into account textures containing
small textural elements.
The experiments described in this section were conducted as follows: each image in the
database is in turn considered as a query image, and for each query image the 20 best
matches are retrieved from the database. Ideally, all the retrieved images would belong to
the same class as the query. The overall percentages of best matches coming from each
class are shown in Table 5-2 and Table 5-3. Table 5-2 shows the results for normalized
whole rock images, while Table 5-3 presents the results for normalized split rock images.
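The evaluation protocol above can be sketched as a leave-one-out loop over a precomputed distance matrix; the tiny label list and distance matrix used in testing it are fabricated for illustration.

```python
from collections import Counter

def class_retrieval_percentages(labels, distances, n_best=20):
    """For each query image, retrieve its n_best nearest other images and
    tally, per query class, the classes the retrieved images come from.

    `labels[i]` is the class of image i and `distances[q][i]` a
    precomputed distance between images q and i.
    """
    per_class = {}
    for q, q_label in enumerate(labels):
        others = [i for i in range(len(labels)) if i != q]
        best = sorted(others, key=lambda i: distances[q][i])[:n_best]
        per_class.setdefault(q_label, Counter()).update(labels[i] for i in best)
    return {c: {k: 100.0 * v / sum(t.values()) for k, v in t.items()}
            for c, t in per_class.items()}
```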
C1 C2 C3 C4 C5 C6 C7
C1: 100.0%
C2: 41.7% 5.4% 21.9% 16.2% 10.2% 4.6%
C3: 10.0% 64.6% 20.0% 5.4%
C4: 19.8% 16.3% 29.4% 3.9% 30.6%
C5: 16.3% 73.9% 9.8%
C6: 9.8% 1.0% 3.8% 15.4% 70.0%
C7: 0.2% 12.7% 87.1%
Table 5-2 Retrieval results for whole rock images (rows labeled by query class)
C1 C2 C3 C4 C5 C6 C7
C1: 100.0%
C2: 48.3% 4.1% 19.6% 8.5% 12.6% 6.9%
C3: 3.3% 82.7% 13.9% 0.1%
C4: 19.8% 15.4% 47.1% 2.6% 15.1%
C5: 9.9% 78.1% 12.0%
C6: 11.8% 0.4% 2.4% 13.4% 71.9% 0.1%
C7: 6.1% 6.2% 87.7%
Table 5-3 Retrieval results for split rock images (rows labeled by query class)
5.5.4 Evaluation of the Results
As can be seen from Table 5-2, the retrieval results for class 1 in particular, but also for
classes 3, 5, 6 and 7, are quite reasonable. However, problems occur with class 2 and
especially with class 4. The main reason for this might be that some images in these
classes contain large areas of some other texture. To reduce this problem, the original
images were divided into 9 blocks and the retrieval procedure was applied to them. Since
these blocks usually consist of a single texture, the visual appearance of the retrieval
results is better. Retrieval results for block 03_15_05 can be seen in Figure 5-5. Table 5-3
suggests that the overall percentages of retrieval results from the correct class are slightly
better for the blocks than for the whole images.
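The block subdivision used here can be sketched as follows; a regular 3-by-3 grid of equally sized blocks is an assumption about how the nine blocks were formed.

```python
import numpy as np

def split_blocks(image, n=3):
    """Split an image into an n-by-n grid of equally sized blocks
    (any remainder rows/columns at the edges are dropped)."""
    image = np.asarray(image)
    h, w = image.shape[0] // n, image.shape[1] // n
    return [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
            for i in range(n) for j in range(n)]
```

Each block is then treated as an independent database image in the retrieval procedure.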
5.5.5 Comparison with Gabor Results
The performance of the co-occurrence matrices was compared to that of Gabor wavelet
features [11], which enable simultaneous localization of energy in both the spatial and
frequency domains. Although the upper and lower center frequency parameters Uh and Ul,
as well as the number of scales and orientations of the Gabor filter, were chosen to be
suitable for the rock image database, the retrieval percentages (Table 5-4) were considerably
lower than those for the co-occurrence matrices (Table 5-3).
C1  48.4%  14.4%  11.0%  13.3%  12.7%  0.2%
C2  13.1%  29.8%  17.7%  11.7%  9.8%  6.2%  11.7%
C3  9.8%  22.5%  30.9%  8.7%  8.5%  9.6%  10.0%
C4  0.4%  13.9%  8.8%  65.4%  1.7%  0.4%  9.4%
C5  12.9%  10.3%  6.0%  1.0%  40.2%  29.6%
C6  12.7%  6.0%  6.9%  0.2%  33.1%  41.1%
C7  0.4%  11.9%  9.8%  4.6%  0.2%  73.1%
Table 5-4 Retrieval results for split rock using Gabor features (rows: query class; entries: non-zero percentages of the 20 best matches over classes C1-C7; each row sums to 100%)
The poor suitability of the Gabor filters for this particular application might result from the
fact that the frequency characteristics of the different rock classes are quite similar and no
significant directionality exists in the rock samples. Figure 5-3 shows the FFT of the Gabor
filter using parameters chosen to be suitable for the rock images. Figure 5-4 shows that the
FFTs of the images from class 1 and class 3 are not clearly distinguishable.
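The kind of filter whose spectrum Figure 5-3 visualizes can be sketched as follows; the Gaussian width `sigma` and kernel `size` below are illustrative values, not the thesis parameters, and the full filter-bank design spanning Ul to Uh over several scales and orientations is specified in [11].

```python
import numpy as np

def gabor_kernel(U, theta, sigma=4.0, size=31):
    """Complex 2D Gabor kernel: a Gaussian envelope modulated by a
    plane wave with centre frequency U (cycles/pixel) at angle theta."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    carrier = x * np.cos(theta) + y * np.sin(theta)   # direction of modulation
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * U * carrier)

# The magnitude of the 2D FFT peaks near the centre frequency U,
# which is what a plot like Figure 5-3 shows for the whole filter bank.
H = np.abs(np.fft.fft2(gabor_kernel(0.2, 0.0)))
```

If two texture classes concentrate their spectral energy in the same region, such filters respond similarly to both, which is consistent with the results above.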
Figure 5-3 FFT of the Gabor filter, using Ul=0.02, Uh=0.2, 3 scales and 2 orientations
Figure 5-4 FFTs of the images 01_01_01.tif and 03_01_01.tif
Query 03_15_05.tif
Figure 5-5 Retrieval results for image 03_15_05.tif
6 Conclusions
The goal of this thesis has been to give an overview of the content-based retrieval process
while focusing mainly on shape and texture attributes. In the thesis two contributions have
been presented: the first one is related to shape attributes and the second one to texture
attributes used in content-based retrieval.
The first contribution of the thesis concerns shape analysis and retrieval. Shape attributes
can be roughly divided into two main categories: boundary-based and region-based. Since
the human visual system itself focuses on edges and ignores uniform regions, this thesis
concentrates on boundary-based representations. A novel boundary-based method using
distance transformation and ordinal correlation has been developed in this thesis.
Simulation results on the MPEG-7 shape test database showed that the proposed technique is
promising. When applying this technique to the evaluation of segmentation algorithms, it
was noticed that the obtained results correspond well to visual perception.
The second contribution is a constrained application in which the database contains a set
of rock images. In this application, a technique using Gray-Level Co-occurrence matrices
(GLCM) was applied and the results were compared with a well-known method from the
literature. It was found that GLCM outperforms Gabor wavelet features in terms of both
retrieval time and the visual quality of the results. The poor suitability of the Gabor filters
for this particular application might result from the fact that the frequency characteristics
of the different rock classes are quite similar and no significant directionality exists in the
rock samples.
In this thesis, shape and texture features have been used separately for content-based image
retrieval. However, many applications, for example searching for a certain disease in a
medical image database, would benefit from using both features simultaneously: in such
cases both the shape of the tumor area and the texture inside it play an important role.
Therefore, an efficient combination of these features into multimodal descriptors of the
AV content should be considered in more detail in future work. In particular, the
development of a flexible and dynamically adjustable similarity measure based on
relevance feedback obtained from the end-user should be taken into account.
References
[1] A. Alatan, L. Onural, M. Wollborn, and R. Mech, “Image Sequence Analysis for
Emerging Interactive Multimedia Services – the European COST 211 framework”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 7, pp. 802-813, 1998.
[2] A. Baraldi and F. Parmiggiani, “An Investigation of the Textural Characteristics Associated with Gray Level Cooccurrence Matrix Statistical Parameters”, IEEE Trans. on Geoscience and Remote Sensing, vol. 33, no. 2, pp. 293-304, 1995.
[3] A. Del Bimbo, Visual Information Retrieval, 270 pages, Morgan Kaufmann Publishers, San Francisco, California, 1999.
[4] A.K. Jain and F. Farrokhnia, “Unsupervised Texture Segmentation using Gabor Filters”, Pattern Recognition, vol. 24, no. 12, pp. 1167-1186, 1991.
[5] A. Kumar, and K.H. Pang, “Defect Detection in Textured Materials using Gabor Filters”, IEEE Transactions on Industry Applications, vol. 38, no. 2, pp. 425-440, March/April 2002.
[6] A. Pentland, R.W. Picard, and S. Sclaroff, “Photobook: Content-Based Manipulation of Image Databases”, International Journal of Computer Vision, 18(3), pp.233-254, 1996.
[7] A. Visa, “Texture Classification and Segmentation based on Neural Network Methods”, PhD. Thesis, p. 48, Helsinki University of Technology, Laboratory of Computer and Information Science, Otaniemi, Finland, 1990.
[8] B. Cramariuc, I. Shmulevich, M. Gabbouj, and A. Makela, “A New Image Similarity Measure Based on Ordinal Correlation”, Proc. International Conference on Image Processing, vol. 3, pp. 718-721, Vancouver, BC, Canada, September 2000.
[9] B.S. Manjunath, C. Shekhar, and R. Chellappa, “A New Approach to Image Feature Detection with Applications”, Pattern Recognition, vol. 29, no. 4, pp. 627-640, 1996.
[10] B. S. Manjunath, J-R. Ohm, V.V. Vasudevan, and A. Yamada, “Color and Texture Descriptors”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 6, pp.703-715, June 2001.
[11] B.S. Manjunath, and W.Y. Ma, “Texture Features for Browsing and Retrieval of Image Data”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, August 1996.
[12] C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, and R. Barber, “Efficient and Effective Querying by Image Content”, Journal of Intelligent Information Systems, vol.3, pp. 231-262, 1994.
[13] F. Alaya Cheikh, A. Quddus, and M. Gabbouj, “Multi-level Shape Recognition based on Wavelet-Transform Modulus Maxima”, Proc. of Southwest Symposium on Image Analysis and Interpretation, SSIAI 2000, pp. 8-12, Austin, Texas, USA, April 2-4, 2000.
[14] F. Alaya Cheikh, B. Cramariuc, M. Partio, P. Reijonen, and M. Gabbouj, “Ordinal-Measure Based Shape Correspondence”, Eurasip Journal on Applied Signal Processing, vol. 2002, no. 4, April 2002, pp. 362-371.
[15] F. Mokhtarian, and R. Suomela, “Robust Image Corner Detection Through Curvature Scale Space”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1376-1381, December 1998.
[16] H. Tamura, S. Mori, and T. Yamawaki, “Textural Features Corresponding to Visual Perception”, IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-8, no. 6, pp. 460-473, June, 1978.
[21] I. Shmulevich, B. Cramariuc, and M. Gabbouj, “A Framework for Ordinal-based Image Correspondence”, in Proc. X European Signal Processing Conference, Tampere, Finland, September 2000.
[22] J-C. Lin, “Family of the Universal Axes”, Pattern Recognition, vol. 29, no. 3, pp. 477-485, 1996.
[23] J.R. Smith, and S.-F. Chang, “Local color and texture extraction and spatial query”, IEEE Int. Conf. On Image Processing, ICIP’96, Lausanne, Switzerland, September 1996.
[24] J.S. Lee, Y.N. Sun, and C.H. Chen, “Multiscale Corner Detection by Wavelet Transform”, IEEE Trans. on Image Processing, vol. 4, pp. 100-104, 1995.
[25] J.S. Weszka, C. R. Dyer, and A. Rosenfeld, “A Comparative Study of Texture Measures for Terrain Classification”, IEEE Trans. Syst. Man Cybern, vol. SMC-6, no. 4, 1976.
[26] M. Bober, “MPEG-7 Visual Shape Descriptors”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 716-719, June 2001.
[27] M. Gabbouj, G. Morrison, F. Alaya Cheikh, and R. Mech, “Redundancy Reduction Techniques and Content Analysis for Multimedia Services – the European COST 211quat Action”, in Proc. Workshop on Image Analysis for Multimedia Services, Berlin, Germany, 31 May – 1 June 1999.
[28] M. Gabbouj, S. Kiranyaz, K. Caglar, B. Cramariuc, F. Alaya Cheikh, O.Guldogan, and E. Karaoglu, “MUVIS: A Multimedia Browsing, Indexing and Retrieval System”, Proceedings of the IWDC 2002 Conference on Advanced Methods for Multimedia Signal Processing, Capri, Italy, September 2002.
[29] M. Qinghong, “Texture Analysis with Applications to Content-based Image Indexing and Retrieval”, Master of Science Thesis, p.53, Tampere University of Technology, May 1999.
[30] M. Partio, B. Cramariuc, M. Gabbouj, and A. Visa, “Rock Texture Retrieval using Gray Level Co-occurrence Matrix”, NORSIG-2002, 5th Nordic Signal Processing Symposium, On Board Hurtigruten M/S Trollfjord, October 4-7, 2002, Norway.
[31] M. Trimeche, “Shape Representations for Image Indexing and Retrieval”, Master of Science Thesis, Tampere University of Technology, May 2000.
[32] M. Trimeche, F. Alaya Cheikh, B. Cramariuc, and M. Gabbouj, “Content-based Description of Images for Retrieval in Large Databases: MUVIS”, X European Signal Processing Conference, Eusipco-2000, vol. 1, September 5-8, 2000, Tampere, Finland.
[33] M. Tuceryan, and A.K. Jain, The Handbook of Pattern Recognition and Computer Vision (2nd Edition), by C.H. Chen, L.F. Pau, and P.S.P. Wang (eds.), pp. 207-248, World Scientific Publishing Co., 1998.
[34] P. Reijonen, “Shape Analysis for Content-Based Image Retrieval”, Master of Science Thesis, Tampere University of Technology, p. 71, October 2001.
[35] P. J. Toivanen, “New Geodesic Distance Transforms for Gray-scale Images”, Pattern Recognition Letters, vol. 17, no. 5, pp. 437-450, 1996.
[36] R.C. Gonzalez, and R.E. Woods, Digital Image Processing, Second Edition, Prentice-Hall, 2002, p. 793.
[37] R.M. Haralick, “Statistical and Structural Approaches to Texture”, Proceedings of the IEEE, vol. 67, no. 5, May 1979, pp. 786-804.
[38] R.M. Haralick, K.Shanmugam, and I. Dinstein, “Textural Features for Image Classification”, IEEE Trans. on Systems, Man and Cybernetics, vol. SMC-3, no. 6, November 1973, pp. 610-621.
[39] R. Mech, “Objective Evaluation Criteria for 2D-shape Estimation Results of Moving Objects”, in Proc. Workshop on Image Analysis for Multimedia Services, pp.23-28, Tampere, Finland, May 2001.
[40] R. Mech and F. Marqués, “Objective Evaluation Criteria for 2D-shape Estimation Results of Moving Objects”, Eurasip Journal on Applied Signal Processing, vol. 2002, no. 4, April 2002, pp.401-409.
[41] S. Berchtold, C. Böhm, and H.-P. Kriegel, “The Pyramid-Technique: Towards Breaking the Curse of Dimensionality”, Proc. of Int. Conf. on Management of Data, ACM SIGMOD, Seattle, Washington, 1998.
[42] S. Loncaric, “A Survey of Shape Analysis Techniques”, Pattern Recognition, vol. 31, no. 8, pp. 983-1001, 1998.
[43] S.G. Mallat, and S. Zhong, “Characterization of Signals from Multi-scale Edges”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, pp. 710-732, 1992.
[44] T.S. Lee, “Image Representation Using 2D Gabor Wavelets”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, October 1996. pp. 959-971.
[45] W.Y. Ma, and B.S. Manjunath, “NeTra: A Toolbox for Navigating Large Image Databases”, IEEE Int. Conf. On Image Processing, ICIP’97, Santa Barbara, October 1997.
[46] Y. Rui, T.S. Huang, and S.-F. Chang, “Image Retrieval: Current Techniques, Promising Directions and Open Issues”, Journal of Visual Communication and Image Representation, vol. 10, no. 1, pp. 39-62, March 1999.