Pattern Recognition Letters 25 (2004) 1633–1645
www.elsevier.com/locate/patrec
doi:10.1016/j.patrec.2004.06.010

Retrieval by spatial similarity: an algorithm and a comparative evaluation

E. Di Sciascio a,*, M. Mongiello a, F.M. Donini b, L. Allegretti a

a Politecnico di Bari, Dipartimento di Elettrotecnica ed Elettronica, Via Re David 200, 70125 Bari, Italy
b University of Tuscia, Viterbo, Italy

Received 1 April 2003; received in revised form 24 May 2004
Available online 15 July 2004

This work was carried out in the framework of projects CNOSSO and MS3DI.
* Corresponding author. Tel.: +39-0805-460641; fax: +39-0805-460410.
E-mail addresses: [email protected] (E. Di Sciascio), [email protected] (M. Mongiello), [email protected] (F.M. Donini), [email protected] (L. Allegretti).

Abstract

We present an algorithm for retrieval by spatial similarity of symbolic images. The proposed algorithm is based on graph matching; it is invariant to scaling, rotation and translation, recognizes multiple rotation variants, and can deal with multiple instances of a symbolic object in an image. It is particularly suitable for a query-by-sketch approach, in that it requires that at least the objects in the query be present in the database image. Our approach is compared with two other well-known algorithms. Peculiarities are analyzed and motivated. Results of a comparative evaluation on a test dataset of symbolic images are presented and discussed.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Spatial arrangement; Similarity retrieval; Query by sketch; Symbolic images

1. Introduction

In recent years, methods for content-based image retrieval based on the extraction of primitive features such as color, texture and shape have been widely explored and implemented in research prototypes and commercial systems. Methods for primitive feature extraction are now mature enough to provide efficient retrieval. Yet, to carry out content-based image retrieval at a higher level of abstraction, the input to the problem cannot be just an array of pixels. An image has to be considered as an arrangement of parts that can be obtained either as the output of a segmentation process, or using a graphical language to describe the components.

Given a sketch or a description with a few parts arranged according to the user's specification, the problem is to establish whether the sketch can be recognized in the image, and if so to what extent. Recently, new languages have been proposed for the description of two- and three-dimensional
objects, based on structured building blocks and on syntactical objects that can be grouped and modified together. Structured descriptions of three-dimensional images are present, e.g., in languages for virtual reality like VRML, or in hierarchical object modeling. They also appear in languages adopted in graphical environments for describing three-dimensional models for architectural design, such as AutoCAD, Illustrator, CorelDraw, Freehand, etc. Interesting applications also involve the novel Scalable Vector Graphics (SVG) language, a W3C-approved standard for describing two-dimensional graphics at the level of objects rather than individual points, with descriptions based on XML.
With reference to previous work on the subject, methods for retrieval by spatial similarity can be classified into symbolic projection, graph-matching, geometric and spatial reasoning methods.

The first class includes approaches in which retrieval of images basically reduces to string matching (Chang et al., 1983). The modeling of iconic images was presented in terms of 2D strings, each string accounting for the position of icons along one of the two planar dimensions. Variants of the 2D string, such as the 2D-G-string (Chang and Jungert, 1991) and the 2D-C-string (Lee and Hsu, 1990), have been proposed to deal with situations of overlapping objects with complex shapes.
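To make the projection idea behind 2D strings concrete, it can be sketched as follows. This is a deliberately simplified illustration: the icon names, coordinates and the helper two_d_string are hypothetical, and the actual 2D-string formalism also encodes relational operators between symbols.

```python
# Simplified sketch of the 2D-string idea (after Chang et al., 1983):
# each icon is reduced to its centroid, and two symbol strings are built
# by sorting icon names along the x- and y-axis respectively.

def two_d_string(icons):
    """icons: dict mapping icon name -> (x, y) centroid."""
    by_x = [name for name, _ in sorted(icons.items(), key=lambda kv: kv[1][0])]
    by_y = [name for name, _ in sorted(icons.items(), key=lambda kv: kv[1][1])]
    return by_x, by_y

# Hypothetical symbolic image: three icons and their centroids.
image = {"house": (10, 40), "tree": (30, 20), "car": (50, 10)}
u, v = two_d_string(image)
# u orders icons left-to-right, v orders them bottom-to-top; matching a
# query then reduces to (sub)string matching on the two strings.
```

Retrieval over such representations then amounts to subsequence matching of the query's two strings against those of each database image.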
Methods of the second class describe the domain objects included in an image and their spatial relations using a spatial orientation graph (SOG) (Gudivada and Raghavan, 1995). Objects in a symbolic image are associated with vertexes in a weighted graph. The spatial relations among the objects are represented through the list of the edges connecting pairs of centroids. A similarity function computes the degree of closeness between the two edge lists representing the query and the database picture, as a measure of the matching between the two spatial graphs. Several variants of methods based on graph matching have been proposed. A recent paper on the topic is (El-Kwae and Kabuka, 1999), where the SIMDTC algorithm is proposed as an extension of the spatial-graph approach including both topological and directional constraints. The topological extension of the objects can obviously be useful in determining further differences between images. The similarity algorithm extends the graph matching proposed in (Gudivada and Raghavan, 1995) and retains the properties of the original approach, including its invariance to scaling, rotation and translation; it is also able to recognize multiple rotation variants.
Gudivada proposed a logical representation of an image based on the so-called HR-string (Gudivada, 1998). Such a representation also provides a geometry-based approach to iconic indexing based on the spatial relationships between the iconic objects in an image, individuated by their centroid coordinates. Translation, rotation and scale variant images, as well as the variants generated by an arbitrary composition of these three geometric transformations, are considered. The similarity between a database and a query image is obtained through a spatial similarity algorithm, SIMG, that measures the degree of similarity between a query and a database image by comparing their HR-strings. The algorithm recognizes rotation, scale and translation variants of the arrangement of objects, and also subsets of the arrangements. A constraint limiting the practical use of this approach is the assumption that an image can contain at most one instance of each icon or object. An extension of this approach has been proposed in (Zhou et al., 2001), where objects are not identified by a single point, and their orientation is taken into account.
In this paper we propose an algorithm for retrieval by spatial similarity that measures the degree of similarity between the spatial layout of objects in a sketched query and the layout of objects in a symbolic image. The proposed algorithm is invariant to scaling, rotation and translation, and can deal with multiple instances of an object in an image. The algorithm is particularly suitable for a query-by-sketch approach, in that it requires that at least the objects in the query be present in the database image. The rationale is that a user will first sketch the objects that he/she believes most relevant to his/her needs, and eventually add more details if the retrieved set is too large. In our setting, the user in the query/retrieval loop refines his/her search by adding new details, but he/she can be sure that at least the elements explicitly included in the
query will be present in the retrieved set. The algorithm is sound with respect to a formalism that some of us proposed in (Di Sciascio et al., 2002b). There, a theory was proposed for representing images, together with an algorithm for reasoning about images that is sound and complete with respect to the semantics of the proposed formalism. The method was also extended to support structured, content-based indexing and retrieval of SVG documents (Di Sciascio et al., 2002c).
The remainder of the paper is organized as follows: in Section 2 we describe the three algorithms we consider, namely our proposal and the algorithms we adopt for comparison, i.e., SIMG and SIMDTC. Then, in Section 3, we compare the behavior of the algorithms in a controlled setting and discuss relevant results. We draw conclusions in the last section.
2. Algorithms for spatial similarity
We start by describing the three analyzed algorithms: SIMG (Gudivada, 1998), SIMDTC (El-Kwae and Kabuka, 1999), and SIML, the one we propose. All three algorithms measure the degree of similarity between the spatial layout of objects in a graphical query and the layout of objects in a database image.

In their current definition, SIMG and SIMDTC cannot be directly compared with SIML, since they have more restrictive requirements. We properly modified their definitions in order to cope with the requirements of SIML, without affecting their structure.
Fig. 1. SIMG algorithm.
In the following subsections we describe the three algorithms and explain their characteristics with reference to a single pair of images: a query image Iq and a database image Id obtained as a rotation variant. Image variants can be perfect or multiple. A perfect variant is generated by applying to all the objects in the image the same transformation with the same magnitude; otherwise the variant is multiple.
2.1. SIMG algorithm
The SIMG algorithm is based on the definition of the HR-string by Gudivada (1998). An image I = {O0, O1, ..., On−1} composed of n objects is represented as a symbolic image by associating a name with each object. The objects are ordered according to increasing values of the angle θ that the line joining the centroids of the objects to the centroid of the image subtends with the positive x-axis. The notation assumes that there exist neither multiple instances of an object type, nor objects having the same centroid coordinates.

HR-strings are obtained by associating each object with information concerning those objects that precede and follow it in the considered order.
Fig. 1 pictures a query image Iq having four objects, a joypad, a headset, a camera and a monitor, and a database image Id that is a rotation variant of Iq. Objects are ordered considering the angle θ between the positive x-axis and the segment joining the centroid of the image with the centroids of each object. The order is given by the joypad, the headset, the camera and the monitor; the objects are given indexes Oq0, Oq1, Oq2, Oq3. In the
database image there is a different order, since the image is a rotation variant of the query: the order is given by the monitor, the joypad, the headset and the camera, and the objects are given indexes Od0, Od1, Od2, Od3.

The HR-string is obtained by associating each object with an index representing the object name, the indexes of the two objects respectively on the left and on the right side of it (from the viewpoint of an observer posed in the center of the group), and the two distances between the centroid of the object and the centroids of the left and right neighbors.
The algorithm compares the two HR-strings representing the query image and the database image and returns a similarity value. Similarity is computed by summing three contributions given by an object factor, a scale factor and a spatial factor.

The object factor sums a contribution due to the existence of corresponding objects in the two HR-strings. The spatial factor indicates the degree of importance of the relationship among the objects. The scale factor, which includes a function of the distances of the right and left neighbors, adds a contribution that measures the scale variations in the image.
In the following we explain the SIMG algorithm in more detail, in order to compare it with SIML. Given the HR-string of the query image, the algorithm searches the HR-string of the database image for the index of the corresponding object. If found, it adds a contribution due to the object factor to the similarity measure. It also compares the corresponding left and right neighbors, and adds a spatial and a scale factor to the similarity value if they represent the same objects. In the example, the algorithm compares the two HR-strings starting at the corresponding object, Oq1 in Iq and Od2 in Id. It adds an object factor for the two corresponding objects, plus a spatial factor and a scale factor for the left neighbor, since both Oq0 and Od1 represent a joypad; finally, it adds a spatial factor and a scale factor for the right neighbor, since both Oq2 and Od3 represent a camera.

The steps are repeated for all the objects in the HR-string of Iq and the final value of similarity is returned.
2.2. SIMDTC algorithm
SIMDTC (El-Kwae and Kabuka, 1999) computes the similarity between two images as a function of three factors: the number of common objects, the closeness of directional spatial relationships, and the closeness of topological spatial relationships.

Directional spatial relationships are represented using a spatial orientation graph (SOG). Each edge in a SOG connects the centroids of two objects in the image. An edge list is the set of all the edges in the SOG; it has n(n−1)/2 edges for an image having n objects. Given a query image Iq and a database image Id, the algorithm extracts the edge lists Eq and Ed for Iq and Id. A similarity degree between the two images is computed as a similarity function between Eq and Ed that returns a real number in the range [0, 1].

Given a pair of objects in Iq and Id, the algorithm computes the difference angle between corresponding edges eq and ed, i.e., edge eq connects the same objects connected by ed. The angle is computed by translating eq and ed so that their starting points coincide with the origin, and by considering the smaller angle.
Fig. 2 pictures an example with the query and database images used in Fig. 1. Objects are named with the same indexes used in Fig. 1. The algorithm extracts the edge list for both Iq and Id; the objects in Iq are O0, O1, O2, O3 and, considering all the pairs of vertexes in the graph, the obtained edge list is Eq = {O0O1, O0O2, O0O3, O1O2, O1O3, O2O3}; Ed is computed similarly.

Fig. 2. SIMDTC algorithm.

The bottom of Fig. 2 shows the computation of the difference angle for two example edges of Iq and Id, respectively. We consider the joypad and the camera with the spatial layout they have in Iq and Id, and the two corresponding edges. The directional spatial relationship is represented by the angle θ, i.e., the smaller angle between the two edges. Considering all six pairs of edges in the edge lists Eq and Ed, the algorithm extracts the six angles θ that are used for computing the similarity measure, summing a function of cos θ over all edge pairs. If the two images are identical, this leads to a maximum value of 1. If the edges in Ed and Eq do not have the same slope or orientation, the contributing factor from each edge depends on the degree to which the corresponding edge orientations differ.

The angle θ is also used to compute the rotation correction angle θRCA, which aligns the image and the query as closely as possible in order to obtain a more accurate similarity. Since the algorithm also recognizes rotational variants of an image, once the angle of rotation is computed the query is rotated in the direction opposite to the original rotation, according to θRCA. In the case of perfect variants, the correction angle perfectly aligns the query with the original image.
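The directional component can be sketched as follows. This is a simplified illustration: the normalization is an assumption, and the rotation correction by θRCA described above is omitted.

```python
import math
from itertools import combinations

# Sketch of the directional component of SIMDTC (after El-Kwae and
# Kabuka, 1999): build the edge list over all centroid pairs, take the
# smaller angle between corresponding edges, and sum a function of
# cos(theta). The normalisation is a simplification of the published
# formula, and the rotation correction angle is not applied.

def edge_list(objs):
    """objs: dict name -> (x, y). Returns {frozenset({a, b}): edge vector}."""
    return {frozenset((a, b)): (objs[b][0] - objs[a][0], objs[b][1] - objs[a][1])
            for a, b in combinations(sorted(objs), 2)}

def directional_similarity(query, db):
    eq, ed = edge_list(query), edge_list(db)
    total = 0.0
    for pair in eq:
        (x1, y1), (x2, y2) = eq[pair], ed[pair]
        # smaller angle between the two edges, both translated to the origin
        cos_t = (x1 * x2 + y1 * y2) / (math.hypot(x1, y1) * math.hypot(x2, y2))
        theta = math.acos(max(-1.0, min(1.0, cos_t)))
        theta = min(theta, math.pi - theta)   # edges are undirected
        total += math.cos(theta)
    return total / len(eq)   # 1.0 when every pair of edges is parallel
```

A translation variant leaves every edge vector unchanged, so it scores 1.0 even without any correction angle; handling rotation variants is what θRCA adds in the full algorithm.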
Topological properties are considered by representing each object with a minimum bounding rectangle, i.e., the minimum-size rectangle that completely encloses the given object; the topological relations between a pair of objects are disjoint, meets, contains, inside, overlap, covers, covered by, equals. Topological relationships are compared by means of a binary function of the type of topological relationship between corresponding pairs of objects in the query and in the database image: for each pair of objects in Iq and Id, the function returns 1 if the two pairs have the same topological relationship, and 0 otherwise.

The overall similarity is computed as a weighted sum of the number of common objects and the directional and topological components, summing the contributions of the directional and topological components over all the pairs of corresponding objects in Iq and Id.
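A minimal sketch of the topological component, assuming axis-aligned minimum bounding rectangles and distinguishing only three of the eight relations listed above (the function names are hypothetical):

```python
# Sketch of the topological factor: each object is reduced to a minimum
# bounding rectangle (MBR), a coarse relation is derived for each pair,
# and a binary function scores 1 when the query pair and the database
# pair stand in the same relation. Only three relations are
# distinguished here, as a simplification.

def mbr_relation(r1, r2):
    """r1, r2: rectangles (xmin, ymin, xmax, ymax). Returns a coarse
    topological relation: 'disjoint', 'inside' (r1 inside r2), or 'overlap'."""
    ax0, ay0, ax1, ay1 = r1
    bx0, by0, bx1, by1 = r2
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        return "inside"
    return "overlap"

def topological_agreement(query_pair, db_pair):
    """Binary factor: 1 if the two object pairs have the same relation."""
    return 1 if mbr_relation(*query_pair) == mbr_relation(*db_pair) else 0
```

The overall SIMDTC score then weights the count of common objects, the directional sum, and the sum of these binary agreements.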
2.3. SIML algorithm
SIML provides a scale-, rotation- and translation-invariant measure of similarity.
Fig. 3 pictures the same query and database images adopted for describing SIMG and SIMDTC. For each object, SIML extracts all the oriented angles obtained by joining the centroid of the object with all the other objects in the image. The algorithm performs the following steps: it considers the object Oq0 in the query image, i.e., the joypad, and the pivot vector with origin in the joypad and vertex in the next object Oq1, the headset; it computes the angle α0 between the pivot vector and the vector with origin in the next object Oq1; in the same manner it computes the angle α1, considering the pivot vector and the vector with vertex in object Oq3. In the next step, the algorithm considers a different pivot vector, with origin in Oq0 and vertex in the next object Oq2, and extracts the remaining angle α2. The corresponding angles are extracted for the database image; in Fig. 3 they are denoted as βi.

Fig. 3. SIML algorithm.
For a given object, the algorithm computes the maximum error between corresponding angles. The similarity measure is obtained as a function of the maximum error over all the groups of objects. We use a function U(x; fx, fy) to change a distance x (in which 0 corresponds to a perfect matching) into a similarity measure (in which the value 1 corresponds to a perfect matching), and to "smooth" the changes of the quantity x, depending on the two parameters fx and fy.

More formally, the algorithm can be described as follows:
Algorithm SimL(Iq, Id)
input: a query image Iq = {Oq0, Oq1, ..., Oqn−1}
       a subset Id = {Od0, Od1, ..., Odn−1} of shapes of a database image
output: SimL
begin
  for i in {0, ..., n − 1} do
    Dspatial[i] = 0;
    j = i + 1;
    if j = n then j = 0;
    while j ≠ i do
      compute pivot vector r between Odj and Odi;
      compute pivot vector u between Oqj and Oqi;
      k = j + 1;
      if k = n then k = 0;
      while k ≠ i do
        compute pivot vector s between Odk and Odi;
        β = ComputeAngle(r, s);
        compute pivot vector v between Oq