Multiview Image Matching Based on the SURF (Speeded-Up Robust Features) Method
Fawaidul Badri
Department of Computer Science, Institute of Creative Technology, Binus @Malang,
Araya Mansion No. 8 -22 Pandanwangi, Blimbing Malang, Indonesia
Abstract
Image matching is a study that combines the concepts of object recognition, image
detection, and pattern recognition. The purpose of image matching is to find the features
of an object. In this research, the method used is SURF (Speeded-Up Robust Features).
In the SURF matching stages, the computer captures an image of an object, samples
certain image intensity elements of the captured image, and then extracts features from
the captured digital image. Image matching generally includes image detection, in which
the computer finds and counts the image intensity elements that are important in that
image. The purpose of this research is feature extraction for image matching. Because
feature extraction should not be affected by the viewpoint, rotation, or scale of the
image, it is very important to obtain distinctive features that differ from those of other
images. The keypoint orientations produced by the SURF method are not affected by
changes in viewing angle, rotation, or scale. Across the three test scenarios, the SURF
method produced an optimal average match: in the test data, the four rotational change
scenarios had no effect on matching, whereas changes in size or scale and in the angle of
image capture still performed poorly.
Keywords: Image Matching, Features, SURF (Speeded-Up Robust Features).
1. Introduction
Image matching and face recognition are common research topics at present; image
matching belongs to the fields of image processing and computer vision. Image matching
looks for the features of an object. In the image matching workflow, the computer
captures an image of an object, samples certain image intensity elements from the
captured image, and then extracts features from the captured digital image. Image
matching generally includes image detection, in which the computer finds and counts the
image intensity elements that are important in that image. This detection stage is very
important in an image matching system, since it reveals the detailed characteristics of the
images to be matched. For this reason, image matching is widely used in image detection
and panoramic imaging applications [1].
Image matching requires a technique to find corresponding points between two
multiview images taken from different points of view [2][3][4]. The relationship between
the coordinates of one image and another refers to the relationship between pixels that
represent the same scene from the viewpoints captured by the camera. The technique of
computing the coordinates of corresponding points between images is called point
transformation [5].
The effectiveness of image matching is influenced by several factors, such as the
viewpoint of the multiview images, changes in image scale or size, changes in rotation,
and changes in the light intensity of the image. In the image matching process, the image
must first be detected, and then distinctive features that distinguish it from other images
are sought. In the detection process the system sometimes fails to detect the image; this
is caused by changes in lighting, size, rotation, or viewpoint when the image is
captured [6].
Image matching and detection is an interesting study; image matching is the basis of
further research in panoramic imaging and object recognition. A lot of information can
be extracted from a digital image, such as color depth, brightness, and the density of
image points (pixels). By utilizing this information, the computer can recognize the
features extracted from the image [7].
Research conducted by Jacob Pedersen (2011) reports results on object detection using
the SURF (Speeded-Up Robust Features) method. That research produces features that
are not affected by changes in scale, illumination, or rotation of the object: SURF finds
keypoints on an object that have similarities to keypoints on other objects [5].
Because several parameters can change, such as size, illumination, rotation, scale, and
point of view, a study is needed that computes image correspondence using the SURF
(Speeded-Up Robust Features) method.
2. LITERATURE REVIEW
2.1 Multiview Image
A multiview image is obtained by capturing data from different points of view. This
multiview technique aims to obtain information points from the image. The information
contained in a multiview image consists of the points and the viewpoint of the captured
image [8].
2.2 Digital Image
A digital image is a representation of a two-dimensional image as a collection of
digital values commonly called pixels. Pixels are the smallest elements that make up the
image; each contains a value representing the brightness of a color at a certain
coordinate. In general, digital images are rectangular or square with a certain width and
height. This size is usually expressed as a number of pixels, so the image dimensions are
always whole numbers. Each pixel has a coordinate according to its position in the
image; coordinates are usually expressed as non-negative integers starting from 0 or 1,
depending on the system used. Each pixel also holds a numeric value representing the
information it encodes. There are two types of digital images, namely still images and
moving images. In principle, a moving image is a collection of still images in the form
of frames [9].
2.3 Grayscale Image
A grayscale image is a gray image with gradation values ranging from white to black.
The value of each pixel can be represented with 8 bits (1 byte). The Red, Green, and
Blue values of each pixel are summed and divided by 3, which changes the value of
every pixel in the image [10].
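As an illustration, the channel-averaging conversion described above can be sketched in a few lines of Python with NumPy; this is a minimal sketch, not the implementation used in this research, and the function name is illustrative.

```python
import numpy as np

def rgb_to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Average the R, G, B channels of an (H, W, 3) uint8 image."""
    # Accumulate in a wider type so the per-pixel sum (at most 765)
    # does not overflow uint8, then divide by 3 as described above.
    gray = rgb.astype(np.uint16).sum(axis=2) // 3
    return gray.astype(np.uint8)
```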
2.4 SURF (Speeded-Up Robust Features)
SURF (Speeded-Up Robust Features) is a computer vision method for detecting and
describing local features in an image. An image is converted into feature vectors, which
are then used to detect the points of interest contained in the image. The SURF method
consists of several stages: the first detects interest points using Gaussian filters, and the
second describes the features.
2.4.1 Gaussian filter
The Gaussian filter stage uses the integral image to compute the
convolution of the image, producing a new matrix called the Hessian
matrix. The integral image of an image is an array in which the value at
each coordinate (x, y) is the sum of the pixel values from point (0, 0) to
point (x, y) of the source image. The second-order Gaussian derivative
filters are approximated by box filters whose regions carry small integer
weights.
Figure 2.1: Gaussian second-order filters; left: Gyy, right: Gxy.
The Hessian matrix is obtained by convolving the image with these
Gaussian filters. The convolution can be computed quickly by using the
integral image constructed beforehand: the image is convolved with the
Gyy and Gxy filters shown in Figure 2.1, and the filter for the xx
direction is obtained by rotating the Gyy filter by 90°.
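The integral image and the constant-time box sums it enables can be sketched as follows in Python/NumPy; this is a sketch under the definition above, with illustrative function names.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """ii[y, x] = sum of img[0..y, 0..x], as defined above."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii: np.ndarray, top: int, left: int,
            bottom: int, right: int) -> int:
    """Sum of the inclusive rectangle in O(1) via four lookups,
    which is what makes the box-filter convolutions fast."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)
```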
2.4.2 Finding the extrema of the Hessian matrix determinant
The determinant of the Hessian matrix is calculated, and its
extreme values are sought (maxima and minima compared with
neighboring values). The Hessian matrix at point x = (x, y) of image I
at scale σ is defined by the following equation:

$$\mathcal{H}(\mathbf{x},\sigma)=\begin{bmatrix}L_{xx}(\mathbf{x},\sigma) & L_{xy}(\mathbf{x},\sigma)\\ L_{xy}(\mathbf{x},\sigma) & L_{yy}(\mathbf{x},\sigma)\end{bmatrix} \qquad (1)$$

where $L_{xx}(\mathbf{x},\sigma)$ is the convolution of the second-order
Gaussian derivative $\frac{\partial^{2}}{\partial x^{2}}g(\sigma)$ with the
image I at point x, and similarly for $L_{xy}$ and $L_{yy}$ [2].
2.4.3 Determining candidate features
Candidate features are determined by non-maximum suppression
over the neighboring image scales. The extrema of the Hessian
matrix determinant are then interpolated in the 3x3x3 scale space
using the method proposed by Brown. This method is applied to
each candidate feature to find the location of the extremum after
interpolation. The 3D quadratic fit uses a Taylor expansion of the
scale-space function D(x, y, σ), shifted so that the origin lies at the
test point, as in equation 2:

$$D(\mathbf{x}) = D + \frac{\partial D^{T}}{\partial \mathbf{x}}\,\mathbf{x} + \frac{1}{2}\,\mathbf{x}^{T}\,\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\,\mathbf{x} \qquad (2)$$

where D and its derivatives are evaluated at the test point and
$\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the test point. The
extremum location $\hat{\mathbf{x}}$ can then be calculated with
equation 3:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}} \qquad (3)$$

If the offset $\hat{\mathbf{x}}$ is greater than 0.5 in any dimension,
the extremum lies closer to another sample point. In that case the
test point is moved to the point indicated by the offset, the offset is
recalculated there, and the final offset is added to the location of the
test point.
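A minimal sketch of this refinement step follows, assuming the Hessian-determinant responses are stored in a 3D array D indexed as D[scale, row, col] and approximating the derivatives in equations 2 and 3 with central finite differences:

```python
import numpy as np

def refine_extremum(D: np.ndarray, s: int, y: int, x: int) -> np.ndarray:
    """Return the sub-sample offset (dx, dy, ds) of equation 3."""
    # First derivatives of D at the test point (central differences).
    g = 0.5 * np.array([
        D[s, y, x + 1] - D[s, y, x - 1],
        D[s, y + 1, x] - D[s, y - 1, x],
        D[s + 1, y, x] - D[s - 1, y, x],
    ])
    # Second derivatives (the Hessian of D in scale space).
    v = D[s, y, x]
    dxx = D[s, y, x + 1] + D[s, y, x - 1] - 2 * v
    dyy = D[s, y + 1, x] + D[s, y - 1, x] - 2 * v
    dss = D[s + 1, y, x] + D[s - 1, y, x] - 2 * v
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    dxs = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                  - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dys = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                  - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    # Equation 3: offset of the true extremum from the test point.
    # If any |offset| > 0.5, move to the neighboring point and repeat.
    return -np.linalg.solve(H, g)
```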
2.4.4 Feature description
Feature points are described using Haar wavelet responses
assembled into unit vectors.
2.4.5 Assigning orientation
Feature points are described with vector descriptors so that they
are robust to rotation, contrast, and changes of viewpoint. To be
robust to rotation, each detected feature is assigned an orientation.
First, the Haar wavelet responses along the x-axis and y-axis are
calculated at the points within a circular neighborhood of radius 6s
around the feature point, where s is the scale at which the feature
point was detected. The sampling step and the wavelet size are
chosen according to the scale, so at large scales the wavelets are
large as well. For this reason the integral image is used again,
keeping the point detection process fast: only six operations are
needed to compute the response in the x and y directions at any
scale. The wavelet side length is 4s. The orientation assignment is
illustrated in Figure 2.2.
Figure 2.2: Assigning orientation.
After the wavelet responses are calculated and weighted with a
Gaussian (σ = 2.5s) centered at the interest point, the responses are
represented as vectors in a plane, with the horizontal responses along
the abscissa and the vertical responses along the ordinate. The
dominant orientation is estimated by summing all responses within a
sliding orientation window covering an angle of π/3. The horizontal
and vertical responses inside the window are summed, and the two
sums yield a new vector.
The longest such vector defines the orientation of the interest
point. The size of the sliding window is a parameter that has to be
chosen experimentally: a small window fires on single dominating
wavelet responses, while a large window yields maxima in vector
length that are not representative. Both result in an unstable
orientation of the interest region.
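The sliding-window search for the dominant orientation can be sketched as follows, assuming the Gaussian-weighted Haar responses dx and dy of the samples inside the 6s circle have already been computed; the number of window positions is an illustrative choice.

```python
import numpy as np

def dominant_orientation(dx: np.ndarray, dy: np.ndarray,
                         window: float = np.pi / 3,
                         steps: int = 36) -> float:
    """Sweep a pi/3 window around the circle; the window whose summed
    responses give the longest vector defines the orientation."""
    angles = np.arctan2(dy, dx)
    best_len, best_theta = -1.0, 0.0
    for start in np.linspace(-np.pi, np.pi, steps, endpoint=False):
        # Responses whose angle falls inside [start, start + window).
        inside = ((angles - start) % (2 * np.pi)) < window
        vx, vy = dx[inside].sum(), dy[inside].sum()
        length = np.hypot(vx, vy)
        if length > best_len:
            best_len, best_theta = length, np.arctan2(vy, vx)
    return best_theta
```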
2.5 Extraction of descriptor components
To extract the descriptor, the first step is to construct a square region centered on the
feature point and oriented along the orientation assigned above. The size of this window
is 20s. The region is then split into 4x4 sub-regions, which preserves important spatial
information. For each sub-region, a few simple features are computed at 5x5 regularly
spaced sample points. For simplicity, the Haar wavelet response in the horizontal
direction is called dx and the response in the vertical direction is called dy, where
"horizontal" and "vertical" are defined relative to the orientation of the feature point in
question. To increase robustness to geometric deformations and localization errors, the
responses are weighted with a Gaussian (σ = 3.3s) centered at the feature point.
Then the wavelet responses dx and dy are summed over each sub-region and form a
first set of entries in the feature vector. To capture information about changes in
intensity, the absolute values of the responses, |dx| and |dy|, are also summed. Each
sub-region therefore contributes a four-dimensional descriptor vector
$v = (\sum d_x, \sum d_y, \sum|d_x|, \sum|d_y|)$ for its underlying intensity structure.
Concatenating this over all 4x4 sub-regions produces a descriptor vector of length 64.
The wavelet responses are invariant to a bias in illumination (offset); invariance to
contrast (a scale factor) is achieved by turning the descriptor into a unit vector. Sub-
regions with different image intensity patterns produce clearly different descriptors, and
combinations of such local intensity patterns yield distinctive descriptor values; the four
components dx, dy, |dx|, |dy| can be seen in Figure 2.3.
Figure 2.3: Descriptors of sub-regions represent the nature of the intensity pattern. Left:
for a homogeneous region, all values are relatively low. Middle: in the presence of
frequencies in the x-direction, Σ|dx| is high while the others remain low. Right: if the
intensity increases gradually in the x-direction, both Σdx and Σ|dx| are high.
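Assembling the 64-dimensional descriptor from the sub-region sums, followed by normalization to a unit vector, can be sketched as follows; this assumes the rotated, Gaussian-weighted responses have already been sampled on a 20x20 grid (5x5 samples for each of the 4x4 sub-regions).

```python
import numpy as np

def surf_descriptor(dx: np.ndarray, dy: np.ndarray) -> np.ndarray:
    """Build v = (sum dx, sum dy, sum |dx|, sum |dy|) per sub-region."""
    v = []
    for i in range(4):
        for j in range(4):
            sx = dx[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            sy = dy[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            v += [sx.sum(), sy.sum(),
                  np.abs(sx).sum(), np.abs(sy).sum()]
    v = np.asarray(v)                         # 16 sub-regions x 4 = 64
    return v / (np.linalg.norm(v) + 1e-12)    # unit vector: contrast invariance
```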
3. PROPOSED WORK
3.1 Research stages
The research is carried out in several stages, shown in the flowchart of Figure 3.1. The
first step is acquiring the training and testing data; the RGB images are then converted to
grayscale, and features are extracted with SURF (Speeded-Up Robust Features). The last
stage displays the images that match one another.
Figure 3.1 Block Diagram of Proposed System
3.1.1 Training Data and Testing data
Training data and testing data here are images that will be processed by
the system. Training data and testing data are taken using a Kinect camera,
each object is taken with a different point of view, the point of view in data
retrieval starts from 1.5-1.30 turning angle in object retrieval. with a range
ranging from 5-30 cm. The data from the image here has undergone a
process in such a way as to test the capability of the system, starting from
taking the viewpoint of the image, changing the scale of the image, and also
changing the rotation with a certain angular rotation, as well as the distance
of the object taking with the camera.
3.1.2 Change the RGB Image to Grayscale
At this stage training data and testing data will be changed from RGB
(Red, Green, Blue) to grayscale image. the value of the intensity of the
RBG (Red, Green, Blue) will be divided by 3. So that the image will be
grayscale.
3.1.3 Feature extraction SURF Method
Feature extraction with SURF (Speeded-Up Robust Feature) is carried out
for the process of finding and determining the keypoint of an image to look
for similarities between one image with another image. descriptors on the
image that will be the basis of this research. the descriptor also to obtain
information about changes in intensity, also extracts the absolute number of
values from the response in each image.
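As a hedged illustration of this stage, detection and matching can be prototyped with OpenCV, whose SURF implementation lives in the contrib module (it requires opencv-contrib-python built with the nonfree option); the file names and the Hessian threshold below are placeholders, not the settings used in this research.

```python
import cv2

img0 = cv2.imread("view0.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute SURF descriptors on both images.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp0, des0 = surf.detectAndCompute(img0, None)
kp1, des1 = surf.detectAndCompute(img1, None)

# Brute-force matching on L2 descriptor distance with cross-checking.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des0, des1), key=lambda m: m.distance)

# Visualize the 50 best matches between the two views.
result = cv2.drawMatches(img0, kp0, img1, kp1, matches[:50], None)
cv2.imwrite("matches.jpg", result)
```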
3.1.4 Result Image Matching
The end result of this process is a set of matching descriptors between
one image and another, showing the similarities and correspondence
between the training images and the testing images.
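One way to obtain a transformation matrix of the kind reported in Section 4 is to fit an affine transform to the matched keypoints with RANSAC, which rejects mismatched keypoints (cf. [4]); the sketch below uses OpenCV and is an assumption about the procedure, not necessarily the exact one used in this research.

```python
import cv2
import numpy as np

def estimate_transform(kp0, kp1, matches):
    """Fit a 2x3 affine transform mapping view0 points to view1 points."""
    src = np.float32([kp0[m.queryIdx].pt for m in matches])
    dst = np.float32([kp1[m.trainIdx].pt for m in matches])
    # RANSAC keeps only geometrically consistent matches (inliers).
    M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    return M, inliers
```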
4. RESULTS AND DISCUSSION
4.1 Data acquisition
The data used in this study are training data and testing data. Retrieval of
training data and testing data using the kinect xbox 360 camera. With a
distance of taking 5-30 cm, as well as differences in angle of view from 1.5-
1.30. The image is RGB (Red, Green, Blue) and a JPG file.
4.2 Data Prepossing
At this preposinging stage it is the stage of changing the RBG image to
Graysale. Where the purpose of this process is to convert and reduce images
from 24 bits to 8 bits to make it easier to process in the system.
4.3 Image matching results
A. First scenario: training and testing data at a distance of 1.5 with an
angle of 5°.
In the first test, the training data and testing data are named view0
and view1, captured at a distance of 1.5 cm with an angle of 5°. The
View0 and View1 images follow.
Figure 4.1: Image View0. Figure 4.2: Image View1.
In the first test phase, image correspondence is calculated using
the SURF (Speeded-Up Robust Features) method. The SURF feature
extraction yields keypoint locations and keypoint descriptors. The
keypoint location explains where the keypoint lies and has four
variables: row, column, scale (the size or extent of the keypoint), and
orientation (the direction of the keypoint in radians). The descriptor
contains the distinctive features that distinguish one keypoint from
another and consists of 128 features taken from the 4x4 box around
the keypoint. The descriptors then undergo a transformation process
to find the locations where keypoints match one another. The
following table gives the resulting correspondence from the image
transformation.
Table 1. Estimated 3x3 transformation matrix between View0 and View1.
 1.0063   0.0009   0
 0.0624   0.9742   0.9742
54.0183  -0.4748   1.0000
The next stage is the computation that finds the index of each view
that has a match. The following are the computed indexes that have a
match.
Table 2. Matching indexes of each view
Index Image View0 Index Image View1
2 2
3 6
106 105
209 208
Table 2 shows that the smaller the index value of an image, the
higher its match. The matches between the images can be seen in
Figure 4.3.
Figure 4.3. The results of the matches of the two images view0 and view1
B. Second scenario: training and testing data at a distance of 1.5 with
an angle of 10°.
In the second test, the training data and testing data are named
view2 and view3, captured at a distance of 1.5 cm with an angle of
10°. The View2 and View3 images follow.
Figure 4.4: Image View2. Figure 4.5: Image View3.
As in the first scenario, image correspondence is calculated with
the SURF (Speeded-Up Robust Features) method, and the keypoint
locations and descriptors are extracted and transformed. The
following table gives the resulting correspondence from the image
transformation.
Table 3. Estimated 3x3 transformation matrix between View2 and View3.
  0.9492   0.0113   0
  0.0409   0.9787   0
-30.7314   0.0935   1.0000
The next stage is the computation that finds the index of each view
that has a match. The following are the computed indexes that have a
match.
Table 4. Matching indexes of each view
Index Image View2 Index Image View3
22 23
257 299
278 298
355 377
Table 4 shows that the smaller the index value of an image, the
higher its match. The matches between the images can be seen in
Figure 4.6.
Figure 4.6. The results of the matches of the two images view2 and
view3
C. Third scenario: training and testing data at a distance of 1.5 with an
angle of 20°.
In the third test, the training data and testing data are named
view4 and view5, captured at a distance of 1.5 cm with an angle of
20°. The View4 and View5 images follow.
Figure 4.7: Image View4. Figure 4.8: Image View5.
As in the first scenario, image correspondence is calculated with
the SURF (Speeded-Up Robust Features) method, and the keypoint
locations and descriptors are extracted and transformed. The
following table gives the resulting correspondence from the image
transformation.
Table 5. Estimated 3x3 transformation matrix between View4 and View5.
  0.9924   0.0075   0
  0.2356   0.8823   0
-79.3008  33.1025   1.0000
The next stage is the computation that finds the index of each view
that has a match. The following are the computed indexes that have a
match.
Table 6. Matching indexes of each view
Index Image View4 Index Image View5
15 18
88 88
256 262
400 241
Table 6 shows that the smaller the index value of an image, the
higher its match. The matches between the images can be seen in
Figure 4.9.
Figure 4.9. The results of the matches of the two images view4 and
view5
5. CONCLUSIONS AND SUGGESTIONS
A. CONCLUSIONS
The results of image matching in the three scenarios using the SURF
(Speeded-Up Robust Features) method show an optimal average match. In the
test data, the four rotational change scenarios had no effect on matching,
whereas changes in size or scale and in the angle of image capture still
performed poorly.
There are several factors that influence the success of the system in
matching images, such as the background of the image, the intensity of each
image, the viewpoint and the distance of image capture, and the quality of each
input image. The accuracy of the system in detecting and matching the
keypoint orientation of each image also has an influence: a slight change in
keypoint orientation can be deemed a mismatch by the system.
B. SUGGESTIONS
To achieve better matching accuracy, the input images should be captured
at a high point density and in a place with stable lighting, and overly large
viewing angles should be avoided, since angles that are too large, like scale
changes that are too large, reduce the number of matches. It is also necessary
to try other image matching methods to further improve the matching
accuracy.
Acknowledgments
I would like to thank Binus University @Malang, Indonesia, which has supported
us in this research, and also Pak Co and his colleagues in Informatics Engineering at
Binus who helped complete this journal.
References
[1] S. Ayushi, T. Kalp Taru, and R. Kumar, “Automated Image Mosaicing System
With Analysis Over Various Image Noise,” Int. J. Comput. Sci. Appl., vol. 6, no.
3, pp. 13–24, 2016.
[2] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,”
Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics), vol. 3951 LNCS, pp. 404–417, 2006.
[3] A. Setiyawan, “Pencocokan Citra Berbasis Scale Invariant Feature Transform
(SIFT) menggunakan Arc Cosinus,” J. Tek. Inform., pp. 1–4, 2014.
[4] F. Wu and X. Fang, “An improved RANSAC homography algorithm for feature
based image mosaic,” Proc. 7th WSEAS Int. Conf. Signal Process. Comput. Geom.
Artif. Vis., pp. 202–207, 2007.
[5] J. T. Pedersen, “Study group SURF: Feature detection & description,” Dep.
Comput. Sci. Aarhus Univ., pp. 1–12, 2011.
[6] B. B. Swapnali and K. S. Vijay, “Feature Extraction Using Surf Algorithm for
Object Recognition,” Int. J. Tech. Res. Appl., vol. 2, no. 4, pp. 2320–8163, 2014.
[7] F. Badri, E. M. Yuniarno, and S. N. Supeno Mardi, “3D point cloud data
registration based on multiview image using SIFT method for Djago temple relief
reconstruction,” Proc. - 2015 4th Int. Conf. Instrumentation, Commun. Inf.
Technol. Biomed. Eng. ICICI-BME 2015, pp. 191–195, 2016.
[8] P. Qiu, Y. Liang, and H. Rong, “Image Mosaics Algorithm Based on SIFT Feature
Point Matching and Transformation Parameters Automatically Recognizing,” Proc.
ICCSEE, pp. 1560–1563, 2013.
[9] A. Muntasa and M. H. Purnomo, Konsep Pengolahan Citra Digital dan Ektraksi
Fitur. Yogyakarta: Graha Ilmu, 2010.
[10] R. D. Kusumanto and A. N. Tompunu, “Pengolahan Citra Digital Untuk
Mendeteksi Obyek Menggunakan Pengolahan Warna Model Normalisasi Rgb,”
Semin. Nas. Teknol. Inf. Komun. Terap., vol. 2011, no. Semantik, 2011.