Invariant Generalised Ridgelet-Fourier for Shape-based Image Retrieval
Mas Rina Mustaffa1, Fatimah Ahmad, Ramlan Mahmod, and Shyamala Doraisamy
Department of Multimedia
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
43400 UPM Serdang, Selangor, Malaysia
{MasRina1, Fatimah, Ramlan, shyamala}@fsktm.upm.edu.my
Abstract— A new shape descriptor called the Invariant Generalised Ridgelet-Fourier is defined for Content-based Image Retrieval (CBIR). The proposed spectral-based method is invariant to rotation, scaling, and translation (RST) and can handle images of arbitrary size. The Ridgelet transform is applied to the ellipse containing the shape, and a normalisation of the Radon transform is introduced. A 1D Wavelet transform is then applied to the Radon slices. To extract rotation-invariant features, the Fourier transform is applied in the Ridgelet domain. The performance of the proposed method is assessed on the standard MPEG-7 CE-1 B dataset using several objective evaluation criteria. The experiments show that the proposed method provides promising results compared with several previous methods.
Keywords – CBIR; Shape descriptor; Ridgelet transform;
Invariant to RST
I. INTRODUCTION
Content-based Image Retrieval (CBIR) technologies are developing rapidly, driven by the need for systems that can retrieve images effectively and efficiently. This capability makes CBIR useful in many areas such as crime prevention, medicine, law, science, fashion, and interior design. Unlike conventional image retrieval techniques, which use indexing keywords to retrieve image files, CBIR retrieves images on the basis of automatically derived low-level, middle-level, or high-level features. Among these, low-level features are the most popular because they are simpler than features at the other levels. Among the low-level features, shape is considered one of the most important visual features and one of the basic features used to describe image content, because humans tend to perceive a scene as composed of individual objects, and these objects are usually identified by their shapes.
It has been found that complicated shapes can be effectively characterised by a description with multiple resolutions [1-2]. This multi-resolution property is important because it provides a simple hierarchical framework for interpreting image information [3]. At different resolutions, the details of an image generally characterise different physical structures of the scene. At a coarse resolution, these details correspond to the larger structures, which provide the image “context”. It is therefore natural to analyse the image details first at a coarse resolution and then gradually increase the resolution.

One of the earliest multi-resolution descriptors used to describe shapes is the Wavelet [4-5]. However, Wavelet-based descriptors lack a notion of direction, which is an important and distinctive feature of multi-dimensional signals. This motivates other multi-resolution representations that do capture direction, such as the Ridgelet [6], Curvelet [7], Contourlet [8], and Beamlet [9].
Of all these transforms, direction is most explicit in the Ridgelet. The Ridgelet transform was introduced by Candès and Donoho [6] to deal effectively with line singularities, rather than the point singularities handled by the Wavelet. The Ridgelet transform can be described as the application of a Wavelet to the Radon transform of an image. Since the Radon space corresponds to the parameters of lines in the image, and a Wavelet detects singularities, the Ridgelet transform detects singularities in the Radon space, which correspond to the parameters of salient lines in the image. The Ridgelet transform therefore combines the advantages of both transforms: the ability of the Radon transform to detect lines, and the multi-resolution property of the Wavelet, which allows analysis at several levels of detail.
The bivariate Ridgelet $\psi_{a,b,\theta}$ in $\mathbb{R}^2$ can be defined as:

$$\psi_{a,b,\theta}(x,y) = a^{-1/2}\,\psi\!\left(\frac{x\cos\theta + y\sin\theta - b}{a}\right), \tag{1}$$

where $a > 0$ is a scaling parameter, $\theta$ is an orientation parameter, $b$ is a location parameter, and $\psi(\cdot)$ is a Wavelet function. This function is constant along the lines $x\cos\theta + y\sin\theta = \text{constant}$; transverse to these ridges it is a Wavelet. Given an integrable bivariate image $f(x,y)$, its Ridgelet coefficients are defined as:

$$\mathcal{R}_f(a,b,\theta) = \iint \psi_{a,b,\theta}(x,y)\,f(x,y)\,dx\,dy. \tag{2}$$

978-1-4244-5651-2/10/$26.00 ©2010 IEEE
Eq. (2) can be rewritten as the application of a 1D Wavelet transform to the projections of the Radon transform, where the Radon transform is defined as:

$$\mathrm{Ra}_f(\theta,\rho) = \iint f(x,y)\,\delta(x\cos\theta + y\sin\theta - \rho)\,dx\,dy, \tag{3}$$

where $\delta$ is the Dirac distribution. The Ridgelet transform is thus precisely the application of a 1D Wavelet transform to the slices of the Radon transform, in which the angular variable $\theta$ is held constant and $\rho$ varies.
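The slice statement above can be made explicit by substituting (3) into (2). Writing $\mathcal{R}_f(a,b,\theta)$ for the Ridgelet coefficients of (2) and $\mathrm{Ra}_f(\theta,\rho)$ for the Radon transform of (3), inserting $1 = \int \delta(x\cos\theta + y\sin\theta - \rho)\,d\rho$ into (2) and exchanging the order of integration gives the standard form below; the 1D wavelet $\psi_{a,b}$ is notation introduced here for illustration only.

$$\mathcal{R}_f(a,b,\theta) = \int_{\mathbb{R}} \psi_{a,b}(\rho)\,\mathrm{Ra}_f(\theta,\rho)\,d\rho, \qquad \psi_{a,b}(\rho) = a^{-1/2}\,\psi\!\left(\frac{\rho - b}{a}\right)$$

For each fixed $\theta$, this is a 1D continuous wavelet transform of the Radon slice $\rho \mapsto \mathrm{Ra}_f(\theta,\rho)$.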
The Ridgelet transform has been applied successfully to astronomical image representation, image denoising, image deconvolution, and contrast enhancement of grey and colour images, among others. Some applications of the Ridgelet transform in these fields can be found in [10-13]. However, the transform has received very little attention for describing shapes in image retrieval. Moreover, many existing Ridgelet transforms in the literature are applied only to M×M images, or require M×N images to be pre-segmented into several congruent blocks with a fixed side length (M×M sub-images) before they can be processed (M and N denote the width and height of an image, respectively). The analysis of images of arbitrary size requires a general descriptor, so these approaches are severely limited for shape description because they cannot flexibly handle images of various sizes.

Another weakness of existing Ridgelet transforms is that they are usually defined on square images. According to [14], a Ridgelet transform defined on square images is not suitable for extracting rotation-invariant features, and the authors therefore propose a rotation-invariant Ridgelet transform defined on a circular disc. However, a circular disc is not suitable either for a rotation-invariant Ridgelet transform that must accommodate M×N images.
This work therefore addresses the above issues by introducing a rotation, scaling, and translation (RST) invariant shape descriptor based on the Ridgelet transform that can handle images of arbitrary size. The proposed method is evaluated with several objective performance measures to demonstrate its stability. The rest of this paper is organised as follows. Section II explains the proposed Invariant Generalised Ridgelet-Fourier descriptor. Section III describes the evaluation framework and analyses the results. Finally, Section IV presents the conclusion.
II. INVARIANT GENERALISED RIDGELET-FOURIER DESCRIPTOR
This research builds on the work of Chen et al. [14], enhancing it so that the descriptor yields an RST-invariant Ridgelet transform for images of arbitrary size; hence the name Invariant Generalised Ridgelet-Fourier descriptor.

First, each M×N image undergoes translation and scale normalisation. Translation invariance is achieved by shifting the centroid of the pattern to the image centre, using regular moments.
The regular moments $m_{pq}$ are defined as:

$$m_{pq} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} x^{p}\,y^{q}\,f(x,y), \tag{4}$$

where M and N represent the number of columns and rows, respectively. The moments $m_{10}$, $m_{01}$, and $m_{00}$ are computed using (4) and then substituted into (5) to obtain the centroid location $(\bar{x}, \bar{y})$:

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}. \tag{5}$$
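Equations (4) and (5) and the centring step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the image is assumed to be stored as `f[y, x]` (N rows by M columns), and `shift_to_centre` is a hypothetical helper that uses an integer circular shift, which is only safe when the pattern sits on a zero background.

```python
import numpy as np

def regular_moment(f: np.ndarray, p: int, q: int) -> float:
    """Regular moment m_pq = sum_x sum_y x^p * y^q * f(x, y), as in (4).

    Assumption: f is stored as f[y, x], i.e. N rows (y) by M columns (x).
    """
    n_rows, n_cols = f.shape
    y, x = np.mgrid[0:n_rows, 0:n_cols]
    return float(np.sum((x ** p) * (y ** q) * f))

def centroid(f: np.ndarray) -> tuple:
    """Centroid (x_bar, y_bar) = (m10 / m00, m01 / m00), as in (5)."""
    m00 = regular_moment(f, 0, 0)
    return regular_moment(f, 1, 0) / m00, regular_moment(f, 0, 1) / m00

def shift_to_centre(f: np.ndarray) -> np.ndarray:
    """Hypothetical helper: move the centroid to within half a pixel of (M/2, N/2).

    Uses an integer circular shift (np.roll), so content pushed past one
    border wraps to the other; this assumes a zero background margin.
    """
    x_bar, y_bar = centroid(f)
    n_rows, n_cols = f.shape
    dx = int(round(n_cols / 2 - x_bar))
    dy = int(round(n_rows / 2 - y_bar))
    return np.roll(f, shift=(dy, dx), axis=(0, 1))
```

A sub-pixel interpolated shift would place the centroid exactly at the centre; the integer shift is used here only to keep the sketch short.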
Scale invariance, on the other hand, is obtained using the following quantity:

$$a = \max_{f(x,y) \neq 0} \sqrt{(x-\bar{x})^2 + (y-\bar{y})^2}, \tag{6}$$

where $a$ is the longest distance from $(\bar{x}, \bar{y})$ to a point $(x, y)$ on the pattern.
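Equation (6) amounts to a maximum over the non-zero pixels. The sketch below computes $a$ and, as a hypothetical illustration of how it can serve scale invariance, divides the centred pixel coordinates by $a$ so that the farthest pattern pixel lies at unit distance; the text does not specify exactly how $a$ is applied, so `scale_normalised_coords` is an assumption.

```python
import numpy as np

def max_radius(f: np.ndarray, x_bar: float, y_bar: float) -> float:
    """a = max over pixels with f(x, y) != 0 of the distance to (x_bar, y_bar), as in (6)."""
    ys, xs = np.nonzero(f)   # row index = y, column index = x
    return float(np.max(np.hypot(xs - x_bar, ys - y_bar)))

def scale_normalised_coords(f: np.ndarray, x_bar: float, y_bar: float):
    """Hypothetical use of a: centred coordinates of the pattern pixels divided
    by a, so the farthest pattern pixel lies at distance exactly 1."""
    a = max_radius(f, x_bar, y_bar)
    ys, xs = np.nonzero(f)
    return (xs - x_bar) / a, (ys - y_bar) / a
```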
Next, the pixels of the translation- and scale-normalised image that fall outside the ellipse template centred at (M/2, N/2) are ignored, and the Radon transform is performed on the elliptical pattern of the image. Two parameters must be determined for this process: the angle theta ($\theta$) and the distance rho ($\rho$). To make the framework suitable for M×N images, the number of $\theta$ samples must not depend on the image size, and the number of $\rho$ samples must be of the form $2^n$ to suit the 1D Wavelet transform. The Radon transform is therefore normalised to 128 points for both $\theta$ and $\rho$; to simplify the calculation, the same number of samples is used for $\theta$ and $\rho$. Further details of the ellipse template and the Radon transform normalisation can be found in [15].
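The masking and fixed-size sampling can be illustrated with a simplified pixel-driven Radon transform. This is a sketch under assumptions, not the normalisation procedure of [15]: the ellipse semi-axes are assumed to be M/2 and N/2, $\theta$ is sampled uniformly in $[0, \pi)$, and $\rho$ is binned uniformly over $[-\max(M,N)/2, \max(M,N)/2]$, which bounds every projection of the inscribed ellipse. Whatever the image size, the output is always 128 × 128.

```python
import numpy as np

def ellipse_mask(n_rows: int, n_cols: int) -> np.ndarray:
    """Boolean mask of the ellipse inscribed in the image, centred at (M/2, N/2).

    Assumption: semi-axes M/2 (columns) and N/2 (rows)."""
    y, x = np.mgrid[0:n_rows, 0:n_cols]
    cx, cy = n_cols / 2, n_rows / 2
    return ((x - cx) / (n_cols / 2)) ** 2 + ((y - cy) / (n_rows / 2)) ** 2 <= 1.0

def normalised_radon(f: np.ndarray, n_theta: int = 128, n_rho: int = 128):
    """Pixel-driven Radon transform on a fixed n_theta x n_rho grid.

    For each theta, every pixel inside the ellipse adds its intensity to the
    rho bin containing x*cos(theta) + y*sin(theta), measured from the centre.
    Pixels outside the ellipse are ignored."""
    n_rows, n_cols = f.shape
    ys, xs = np.nonzero(ellipse_mask(n_rows, n_cols))
    vals = f[ys, xs].astype(float)
    xs = xs - n_cols / 2
    ys = ys - n_rows / 2
    rho_max = max(n_rows, n_cols) / 2          # |rho| <= rho_max inside the ellipse
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    sinogram = np.zeros((n_theta, n_rho))
    for i, t in enumerate(thetas):
        rho = xs * np.cos(t) + ys * np.sin(t)
        idx = ((rho + rho_max) / (2 * rho_max) * n_rho).astype(int)
        np.add.at(sinogram[i], np.clip(idx, 0, n_rho - 1), vals)
    return thetas, sinogram
```

For real use, an interpolating Radon implementation (for example one from an image-processing library) would be more accurate; the binning here only shows how the grid size is decoupled from M and N.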
After normalising the Radon transform, the 1D Daubechies-4 Wavelet transform (Db4) is applied to each Radon slice to obtain the Ridgelet coefficients. Db4 has been found to be one of the best Wavelet families for shape representation [24, 26]. The Db4 low-pass (scaling) coefficients are G = [0.483, 0.837, 0.224, -0.129], while the Db4 high-pass (wavelet) coefficients are H = [-0.129, -0.224, 0.837, -0.483].
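The filters G and H quoted above are the 4-tap Daubechies pair, whose closed form is $G = [1+\sqrt{3},\,3+\sqrt{3},\,3-\sqrt{3},\,1-\sqrt{3}]/(4\sqrt{2})$ with $H_k = (-1)^k G_{3-k}$ (PyWavelets names this filter `db2`, counting vanishing moments rather than taps). The sketch below applies them to one Radon slice with periodic extension; the boundary handling and decimation phase are assumptions, as the text does not state them. A 128-sample slice yields detail levels d1 to d4 of lengths 64, 32, 16, and 8.

```python
import numpy as np

# 4-tap Daubechies filters in closed form; rounded to 3 decimals they match G and H above.
S3 = np.sqrt(3.0)
G = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))   # low-pass
H = np.array([G[3], -G[2], G[1], -G[0]])                              # high-pass

def dwt_level(x: np.ndarray):
    """One level of an orthogonal Db4 DWT with periodic extension.

    Returns (approximation, detail), each half the length of x (len(x) even)."""
    n = len(x)
    idx = (np.arange(0, n, 2)[:, None] + np.arange(4)[None, :]) % n   # wrap-around windows
    windows = x[idx]
    return windows @ G, windows @ H

def wavedec(slice_, levels: int = 4):
    """Decompose one Radon slice; returns ([d1, ..., d_levels], final approximation)."""
    details, a = [], np.asarray(slice_, dtype=float)
    for _ in range(levels):
        a, d = dwt_level(a)
        details.append(d)
    return details, a
```

Because the filters are orthonormal, the transform preserves energy, which is a convenient sanity check for any implementation.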
To make the descriptor rotation invariant, the 1D Fourier transform is applied along the $\theta$ direction on the d3 and d4 Wavelet decomposition levels. Intermediate-scale Wavelet coefficients are usually preferable, because the high-frequency decomposition levels are very sensitive to noise and accumulated errors, whereas the low-frequency levels have lost important information of the original image. For each of these decomposition levels, only 15 Fourier magnitude coefficients are retained to represent the shape. In total, the proposed Invariant Generalised Ridgelet-Fourier produces 360 coefficients per image, which is still a reasonable size for shape representation.
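The rotation-invariance step relies on the shift property of the DFT: a circular shift along $\theta$ changes only the phase of the DFT coefficients, so their magnitudes are unchanged. The count of 360 is consistent with keeping 15 magnitudes per $\rho$ position: with dyadic decimation of a 128-sample slice, d3 has 16 and d4 has 8 coefficients, and (16 + 8) × 15 = 360. This reading is an inference from the stated numbers, not something the text states explicitly. The sketch also assumes that the 15 retained magnitudes are the lowest frequencies and that a rotation acts as a circular shift along $\theta$.

```python
import numpy as np

def fourier_magnitude_features(level: np.ndarray, n_keep: int = 15) -> np.ndarray:
    """level: coefficients of one wavelet level, shape (n_theta, n_pos),
    one row per theta and one column per rho position.

    Takes the 1D DFT along theta (axis 0) and keeps the magnitudes of the
    first n_keep frequencies for every rho position (assumed choice)."""
    spectrum = np.abs(np.fft.fft(level, axis=0))
    return spectrum[:n_keep, :].ravel()

def descriptor(d3: np.ndarray, d4: np.ndarray) -> np.ndarray:
    """Concatenate the Fourier-magnitude features of the d3 and d4 levels."""
    return np.concatenate([fourier_magnitude_features(d3), fourier_magnitude_features(d4)])
```

For example, with d3 of shape (128, 16) and d4 of shape (128, 8), the descriptor has 360 entries, and rolling both arrays along axis 0 leaves it unchanged.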
III. EVALUATION AND ANALYSIS OF RESULTS
In this section, the retrieval performance of the proposed Invariant Generalised Ridgelet-Fourier descriptor is evaluated and compared with other methods. A series of experiments is conducted on an Intel Pentium Dual-Core 2.5 GHz desktop. The experiments use the standard MPEG-7 CE-1 B dataset, which is commonly used to test the robustness of shape representations to rotation, scaling, and translation, as well as similarity retrieval. It consists of 1400 shapes in 70 groups, with 20 similar shapes in each group, which provide the ground truth. The dataset can be downloaded at http://www.imageprocessingplace.com/root_files_V3/image_databases.htm. For the experiments, 50 classes from this dataset are used, giving a total of 1000 images. Some examples of the images used in the evaluation are shown in Fig. 1.
Figure 1. Some samples of the images used for the evaluation
To show the robustness and stability of the proposed descriptor, the comparison is based on several evaluation criteria, namely average precision-recall, Average Retrieval Rate (ARR), Average Normalized Modified Retrieval Rank (ANMRR), average r-measure, and average p1-measure.
Precision and recall have been widely used to evaluate the performance of CBIR systems, because they are simple to calculate and their results are easy to interpret. In addition, their results are usually visualised as graphs, which makes them easier to analyse. The retrieval precision, Precision(q), of a system with respect to a query q is defined as the ratio of the number of retrieved relevant images, N(q), to the total number of retrieved images, M(q) [16]. Given a set of Q queries, the average retrieval precision of the system is given by (7):

$$\text{Average Precision} = \frac{1}{Q}\sum_{q=1}^{Q}\frac{N(q)}{M(q)}. \tag{7}$$
Similarly, the retrieval recall, Recall(q), of a system with respect to a query q is the ratio of the number of retrieved relevant images, N(q), to the total number of relevant images in the database for that query, G(q) [16]. Given a set of Q queries, the average retrieval recall of the system is given by (8):

$$\text{Average Recall} = \frac{1}{Q}\sum_{q=1}^{Q}\frac{N(q)}{G(q)}. \tag{8}$$
Typically, as the number of images returned to the user increases, precision decreases while recall increases. For this reason, rather than reporting average precision and average recall as separate performance measures, CBIR systems are usually evaluated with a precision-recall curve.
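Equations (7) and (8) can be computed per query and then averaged, as in this minimal sketch. It assumes each query's result is given as a list of retrieved image identifiers together with the set of its relevant (ground truth) identifiers; the function names are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision(q) = N(q)/M(q) and Recall(q) = N(q)/G(q) for one query."""
    n_hits = sum(1 for item in retrieved if item in relevant)   # N(q)
    # M(q) = number retrieved, G(q) = number of relevant images for the query
    return n_hits / len(retrieved), n_hits / len(relevant)

def mean_precision_recall(runs):
    """Averages over Q queries, as in (7) and (8).

    runs: list of (retrieved_list, relevant_set) pairs, one per query."""
    pairs = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    q = len(pairs)
    return sum(p for p, _ in pairs) / q, sum(r for _, r in pairs) / q
```

Evaluating this at increasing cut-offs of the ranked list traces out the precision-recall trade-off described above.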
Another popular CBIR performance measure is the Retrieval Rate (RR). The retrieval rate, RR(q), of a system with respect to a query q is defined as the ratio of the number of ground truth images found within the first α retrievals, F(α, q), to the total number of relevant images in the database for that query, G(q). Given a set of Q queries, the ARR is obtained as:

$$\text{Average RR} = \frac{1}{Q}\sum_{q=1}^{Q}\frac{F(\alpha,q)}{G(q)}. \tag{9}$$
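Equation (9) can be sketched as below, with α defaulting to 40, the value used in the experiments (2 × 20 ground truth items). The input format, a ranked list of image identifiers plus a set of ground truth identifiers per query, and the function names are assumptions for illustration.

```python
def retrieval_rate(ranked, relevant, alpha):
    """RR(q) = F(alpha, q) / G(q): fraction of the ground truth found in the first alpha results."""
    found = sum(1 for item in ranked[:alpha] if item in relevant)   # F(alpha, q)
    return found / len(relevant)                                     # G(q) = len(relevant)

def average_retrieval_rate(runs, alpha=40):
    """ARR over Q queries, as in (9); runs is a list of (ranked_list, relevant_set) pairs."""
    return sum(retrieval_rate(ranked, relevant, alpha) for ranked, relevant in runs) / len(runs)
```

For example, if 8 of a query's 20 ground truth images appear in its top 40 results, RR(q) = 8/20 = 0.4.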
The factor α should satisfy α ≥ 1, where a larger α is more tolerant. However, α should not be too large, as the system would then be less able to discriminate between very good and mediocre retrieval results. It has been suggested in [17] that for a relatively large ground truth set (approximately 20 to 25 items), the system can still be judged useful if the ground truth items are found within the first 2 × G(q) retrievals. Therefore, α = 2 × 20 is used in this experiment, which corresponds to the first 40 retrievals.

ANMRR, on the other hand, is a new accuracy evaluation measure proposed in MPEG-7 [18]. Unlike precision and recall, it reflects both how many correct images are retrieved and how highly they are ranked among the retrieval results. The ANMRR is computed as follows:

$$\text{ANMRR} = \frac{1}{Q}\sum_{q=1}^{Q}\text{NMRR}(q), \tag{10}$$
where Q is the number of queries and q indexes the queries. Both ARR and ANMRR always lie in the range 0 to 1. A high ARR value indicates good performance in terms of retrieval rate, and a low ANMRR value indicates good performance in terms of retrieval rank.
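Equation (10) averages a per-query NMRR that is not spelled out here. The sketch below follows the definition commonly quoted from the MPEG-7 work [17, 18]; the constants 4, 2, and 1.25 come from that definition, not from this paper, so treat them as an assumption. With NG(q) = 20 ground truth items, the window K(q) = min(4·NG(q), 2·GTM) is 40, the same cut-off used for ARR. Ranks start at 1, and ground truth items ranked beyond K(q) receive the penalty rank 1.25·K(q).

```python
def nmrr(ranked, relevant, gtm):
    """NMRR for one query; gtm = largest ground-truth set size over all queries."""
    ng = len(relevant)                         # NG(q)
    k = min(4 * ng, 2 * gtm)                   # K(q)
    position = {item: pos for pos, item in enumerate(ranked, start=1)}
    penalty = 1.25 * k
    total = 0.0
    for item in relevant:
        r = position.get(item)
        total += r if r is not None and r <= k else penalty
    avr = total / ng                           # average rank AVR(q)
    mrr = avr - 0.5 - ng / 2                   # modified retrieval rank MRR(q)
    return mrr / (1.25 * k - 0.5 - ng / 2)     # normalised to [0, 1]

def anmrr(runs):
    """ANMRR over Q queries, as in (10); runs is a list of (ranked_list, relevant_set) pairs."""
    gtm = max(len(relevant) for _, relevant in runs)
    return sum(nmrr(ranked, relevant, gtm) for ranked, relevant in runs) / len(runs)
```

A query whose 20 ground truth images occupy ranks 1 to 20 scores 0, and one whose ground truth never appears in the top 40 scores 1.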
Another pair of criteria commonly used to evaluate retrieval effectiveness is the r-measure and the p1-measure, from which the average r-measure and the average p1-measure are obtained. These measures have been used in [19-20]. Let $\{q_1, \dots, q_Q\}$ be the set of query images. For a query $I_q$, let $I_i$ be the unique correct answer (in this experiment, $I_i$ is the second of the 20 ground truth items retrieved for query $I_q$). The r-measure is the sum of the ranks of the correct answers over all queries, as shown in (11). The average r-measure is then obtained by dividing it by the number of queries Q, as shown in (12).
$$r\text{-measure} = \sum_{q=1}^{Q}\operatorname{rank}(I_i), \tag{11}$$

$$\text{Average } r\text{-measure} = \frac{r\text{-measure}}{Q}. \tag{12}$$
In contrast, the p1-measure is the sum of the precisions at recall equal to one, and the average p1-measure is obtained by dividing the p1-measure by the number of queries Q, as given in (13) and (14), respectively:

$$p_1\text{-measure} = \sum_{q=1}^{Q}\frac{1}{\operatorname{rank}(I_i)}, \tag{13}$$

$$\text{Average } p_1\text{-measure} = \frac{p_1\text{-measure}}{Q}. \tag{14}$$
Note that a method is good if it has a low r-measure and a
high p1-measure.
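Given the rank of the designated correct answer $I_i$ for each query, (11) to (14) reduce to a sum and a sum of reciprocals, as in this short sketch (the function name and input format are hypothetical).

```python
def r_and_p1(correct_ranks):
    """correct_ranks: 1-based rank of the correct answer I_i for each of the Q queries.

    Returns (r-measure, average r-measure, p1-measure, average p1-measure)."""
    q = len(correct_ranks)
    r_measure = sum(correct_ranks)                      # (11)
    p1_measure = sum(1.0 / r for r in correct_ranks)    # (13)
    return r_measure, r_measure / q, p1_measure, p1_measure / q   # (12), (14)
```

As a check against the reported results, an r-measure of 1041 over 150 queries gives an average r-measure of 1041 / 150 = 6.94.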
To benchmark retrieval performance, the results of the proposed Invariant Generalised Ridgelet-Fourier are compared with those of the Ridgelet-Fourier (RF) method of [14] and the basic Ridgelet descriptor. The parameter settings of each method are shown in Table 1.
TABLE 1. SUMMARY OF THE SETTINGS USED BY THE THREE METHODS INVOLVED IN THE EVALUATION

| Setting | Ridgelet Descriptor | Ridgelet-Fourier (RF) [14] | Proposed Invariant Generalised Ridgelet-Fourier |
|---|---|---|---|
| Radon transform | θ: 128; ρ: 128 | θ: π/M; ρ: 64 (M is the shortest image dimension) | θ: 128; ρ: 128 |
| Template | Square | Circle | Ellipse |
| Rotation invariance | Tensor theory [28] | 1D Fourier transform | 1D Fourier transform |
| Total coefficients | 3072 | 180 | 360 |
Three images from each class are randomly selected as queries, giving 150 query images for the whole retrieval experiment. The Query-by-Example (QBE) paradigm is employed: the descriptor values are extracted from a query image and matched against the corresponding descriptor values of the images in the database. The distance (dissimilarity) $d_{ij}$ between two shapes $x_i$ and $x_j$, where $x_{ik}$ denotes the k-th coefficient of $x_i$, is calculated using the L1 distance metric:

$$d_{ij} = \sum_{k=1}^{n}\left|x_{ik} - x_{jk}\right|. \tag{15}$$
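The QBE matching step with the L1 distance of (15) can be sketched as follows: compute the distance from the query's descriptor to every database descriptor and sort in increasing order. The array layout (one descriptor per database row) and the function names are assumptions for illustration.

```python
import numpy as np

def l1_distance(xi, xj) -> float:
    """d_ij = sum_k |x_ik - x_jk|, as in (15)."""
    return float(np.sum(np.abs(np.asarray(xi, float) - np.asarray(xj, float))))

def rank_database(query, database) -> np.ndarray:
    """Database row indices sorted by increasing L1 distance to the query.

    A stable sort keeps the original order among equally distant images."""
    d = np.abs(np.asarray(database, float) - np.asarray(query, float)).sum(axis=1)
    return np.argsort(d, kind="stable")
```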
Fig. 2 shows the average precision-recall of the 150 queries on the MPEG-7 CE-1 B dataset for all three methods. The x-axis represents the 11 standard recall levels, and the y-axis represents the average precision at each of these levels. Fig. 2 shows that the proposed Invariant Generalised Ridgelet-Fourier performs better at all recall levels than the other two methods. The Ridgelet descriptor comes second, followed by RF [14].
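The text does not state how precision is obtained at the 11 standard recall levels. The sketch below assumes the usual interpolation rule, in which the precision at recall level r is the maximum precision at any cut-off whose recall is at least r, and averages the per-query curves to produce a plot like Fig. 2. The function names and input format are hypothetical.

```python
import numpy as np

RECALL_LEVELS = np.arange(11) / 10   # 0.0, 0.1, ..., 1.0

def eleven_point_precision(ranked, relevant) -> np.ndarray:
    """Interpolated precision at the 11 standard recall levels for one query.

    Assumption: P(r) = max precision over cut-offs with recall >= r (0 if r is never reached)."""
    hits, precisions, recalls = 0, [], []
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
        precisions.append(hits / i)
        recalls.append(hits / len(relevant))
    precisions, recalls = np.array(precisions), np.array(recalls)
    curve = []
    for level in RECALL_LEVELS:
        reached = recalls >= level - 1e-12     # small tolerance for float rounding
        curve.append(precisions[reached].max() if reached.any() else 0.0)
    return np.array(curve)

def average_curve(runs) -> np.ndarray:
    """Mean of the per-query curves, one value per recall level."""
    return np.mean([eleven_point_precision(r, g) for r, g in runs], axis=0)
```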
Figure 2. Average precision-recall of 150 queries based on MPEG-7 CE-1 B
dataset
Table 2 lists the retrieval results of the Invariant Generalised Ridgelet-Fourier, RF [14], and the basic Ridgelet descriptor under the performance measures described earlier. For each measure, the value of the best-performing method is shown in bold. Table 2 shows that the proposed Invariant Generalised Ridgelet-Fourier outperforms the other two methods on all the performance measures used. Different evaluation criteria have different priorities: for example, precision-recall considers only whether relevant images are retrieved, not their rank, while ARR considers rank only through the cut-off at the top α retrievals. It is therefore important to evaluate a method with several performance measures to demonstrate its stability. The results indicate that the proposed Invariant Generalised Ridgelet-Fourier is a stable method: it not only retrieves more of the correct images but also ranks them higher than the method of [14] and the basic Ridgelet descriptor.
TABLE 2. RETRIEVAL RESULTS OF THREE DIFFERENT METHODS BASED ON VARIOUS PERFORMANCE MEASUREMENTS

| Evaluation | Ridgelet Descriptor | RF [14] | Proposed Invariant Generalised RF |
|---|---|---|---|
| Average precision-recall (higher is better) | 0.4203 | 0.3329 | **0.5724** |
| ARR (higher is better) | 0.3877 | 0.3317 | **0.5027** |
| ANMRR (lower is better) | 0.1274 | 0.1341 | **0.1191** |
| r-measure (lower is better) | 1812 | 2119 | **1041** |
| Average r-measure (lower is better) | 12.0800 | 14.1267 | **6.9400** |
| p1-measure (higher is better) | 28.1047 | 21.94 | **42.1714** |
| Average p1-measure (higher is better) | 0.1874 | 0.1463 | **0.2811** |
IV. CONCLUSION
In this paper, a new invariant shape descriptor based on the Ridgelet transform is proposed for images of arbitrary size. The proposed Invariant Generalised Ridgelet-Fourier improves on common Ridgelet transform methods, which usually require images to be pre-segmented into several congruent blocks of fixed size M×M. The proposed approach overcomes this significant limitation of most existing methods, which matters because images nowadays come in many sizes. Implementing the Ridgelet transform on an ellipse template also yields a more accurate rotation representation for images of arbitrary size than a square or circular disc template.

The proposed method is compared with that of [14] and with the basic Ridgelet transform using several performance measures, namely average precision-recall, ARR, ANMRR, average r-measure, and average p1-measure. The results indicate that the proposed method achieves better results than the other two methods on all of these measures, which demonstrates its advantage on the MPEG-7 CE-1 B dataset.

Future work will include combining the method with other features such as colour or texture, and considering other approaches to rotation invariance that avoid the information loss associated with the Fourier magnitude spectrum.
REFERENCES
[1] L.F. Costa and R.M. Cesar, Shape analysis and classification:
theory and practice, CRC Press: Boca Raton, Florida, 2001.
[2] F. Mokhtarian, “Silhouette-based isolated object recognition through curvature scale space,” IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 17, no. 5, pp. 539-544, 1995.
[3] J.J. Koenderink, “The structure of images,” Biological
Cybernetics, vol. 50, no. 5, pp. 363-370, 1984.
[4] M.G. Albanesi and L. Lombardi, “Wavelets for multi-resolution shape recognition,” in LNCS, vol. 1311, A. Del Bimbo, Ed. London: Springer-Verlag, 1997, pp. 276-283.
[5] I.E. Rube', M. Ahmed, and M. Kamel, “Wavelet approximation-based affine invariant shape representation functions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 323-327, 2006.
[6] E.J. Candès and D.L. Donoho, “Ridgelets: a key to higher dimensional intermittency?,” Philosophical Trans. Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 357, no. 1760, pp. 2495-2509, 1999.
[7] E.J. Candès and D.L. Donoho, “Curvelets – a surprisingly
effective non-adaptive representation for objects with edges, curves, and surfaces,” in Curve and Surface Fitting, A. Cohen,
C. Rabut, and L.L. Schumaker Eds. Nashville, TN: Vanderbilt
University Press, 1999.
[8] M.N. Do and M. Vetterli, “The contourlet transform: an
efficient directional multi-resolution image representation,”
IEEE Trans. Image Processing, vol. 14, no. 12, pp. 2091-2106, 2005.
[9] D. Donoho and X. Huo, “Beamlet pyramids: a new form of multi-resolution analysis, suited for extracting lines, curves, and objects from very noisy image data,” Proc. SPIE, vol. 4119, 2000.
[10] W. Pan, T.D. Bui, and C.Y. Suen, “Rotation invariant texture classification by ridgelet transform and frequency-orientation space decomposition,” Signal Processing, vol. 88, no. 1, pp. 189-199, 2008.
[11] S. Arivazhagan, L. Ganesan, and T.G. Kumar, “Texture
classification using ridgelet transform,” Pattern Recognition
Letters, vol. 27, no. 16, pp. 1875-1883, 2006.
[12] Z. Yao and N. Rajpoot, “Radon/ridgelet signature for image
authentication,” Proc. International Conference on Image
Processing (ICIP '04), vol. 1, pp. 43-46, 2004.
[13] J. Xiaa, L. Ni, and Y. Miao, “A new digital implementation of
ridgelet transform for images of dyadic length,” Journal of Network and Computer Applications, vol. 30, no. 4, pp. 1346-
1355, 2007.
[14] Y. Chen, D. Bui, and A. Krzyzak, “Rotation invariant feature
extraction using ridgelet and fourier transforms,” Pattern
Analysis and Applications, vol. 9, no. 1, pp. 83-93, 2006.
[15] Mas Rina Mustaffa, Fatimah Ahmad, Ramlan Mahmod, and Shyamala Doraisamy, “Generalised ridgelet-fourier for M× N
images: determining the normalisation criteria,” Proc. IEEE
International Conference on Signal & Image Processing
Applications (ICSIPA’09), 18-19 November 2009. in press.
[16] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill: New York, 1982.
[17] B.S. Manjunath, J.R. Ohm, V.V. Vasudevan, and A. Yamada,
“Colour and texture descriptors,” IEEE Trans. Circuits and
Systems for Video Technology, vol. 11, no. 6, pp. 703-715,
2001.
[18] MPEG-7 Visual Part of eXperimentation Model Version 2.0.
ISO/MPEG MPEG-7 Output Document, 1999.
[19] J. Huang, S. Kumar, M. Mitra, W. Zhu, and R. Zabih, “Spatial
colour indexing and applications,” Intl. J. of Computer Vision,
vol. 35, no. 3, pp. 245–268, 1999.
[20] Jongan Park, Youngeun An, Ilhoe Jeong, Gwangwon Kang,
and K. Pankoo, “Image indexing using spatial multi-resolution colour correlogram,” Proc. IEEE International Workshop on
Imaging Systems and Techniques (IST '07), pp. 1-4, 2007.