Multiview Image Matching Based on the SURF (Speeded-Up Robust Features) Method
Fawaidul Badri
Department of Computer Science, Institute of Creative Technology, Binus @Malang,
Araya Mansion No. 8 -22 Pandanwangi, Blimbing Malang, Indonesia
Abstract
Image matching is a study that combines the concepts of object recognition, image
detection, and pattern recognition. The purpose of image matching is to find the features
of an object. In this research, the method used is SURF (Speeded-Up Robust Features).
In the SURF matching stages, the computer captures an image of an object, samples
certain image intensity elements of the captured image, and then extracts features from
the captured digital image. Image matching generally includes image detection, in which
the computer finds and counts the image intensity elements that are important in that
image. The purpose of this research is feature extraction for image matching. Because
feature extraction should not be affected by the viewpoint, rotation, or scale of the
image, it is very important to obtain distinctive features that differ from those of other
images. The keypoint orientations produced by the SURF method are not affected by
changes in viewing angle, rotation, or scale. Across the three test scenarios, the SURF
method produced an optimal average match: in the test data, the four rotational change
scenarios had no effect on matching, whereas changes in size or scale and in the angle of
image capture still performed poorly.
Keywords: Image Matching, Features, SURF (Speeded-Up Robust Features).
1. Introduction
Image matching and face recognition are common research topics at present; image
matching belongs to the fields of image processing and computer vision. Image matching
looks for the features of an object. In the image matching workflow, the computer
captures an image of an object, samples certain image intensity elements from the
captured image, and then extracts features from the captured digital image. Image
matching generally includes image detection, in which the computer finds and counts the
image intensity elements that are important in that image. This detection stage is very
important in an image matching system, since it reveals the detailed characteristics of the
images to be matched. For this reason, image matching is widely used in image detection
and panoramic imaging applications [1].
Image matching requires a technique to find corresponding points between two
multiview images taken from different points of view [2][3][4]. The relationship between
the coordinates of one image and another refers to the relationship between pixels that
represent the same scene from the viewpoints captured by the camera. The technique of
computing the coordinates of corresponding points between images is called point
transformation [5].
The effectiveness of image matching is influenced by several factors, such as the
viewpoint of the multiview images, changes in image scale or size, changes in rotation,
and changes in the light intensity of the image. In the image matching process, the image
must first be detected, and then distinctive features that distinguish it from other images
are sought. In the detection process the system sometimes fails to detect the image; this
is caused by changes in lighting, size, rotation, or viewpoint when the image is
captured [6].
Image matching and detection is an interesting study; image matching is the basis of
further research in panoramic imaging and object recognition. A lot of information can
be extracted from a digital image, such as color depth, brightness, and the density of
image points (pixels). By utilizing this information, the computer can recognize the
features extracted from the image [7].
Research conducted by Jacob Pedersen (2011) reports results on object detection using
the SURF (Speeded-Up Robust Features) method. That research produces features that
are not affected by changes in scale, illumination, or rotation of the object: SURF finds
keypoints on an object that have similarities to keypoints on other objects [5].
Because several parameters can change, such as size, illumination, rotation, scale, and
point of view, a study is needed that computes image correspondence using the SURF
(Speeded-Up Robust Features) method.
2. LITERATURE REVIEW
2.1 Multiview Image
A multiview image is obtained by capturing data from different points of view. This
multiview technique aims to obtain information points from the image. The information
contained in a multiview image consists of the points and the viewpoint of the captured
image [8].
2.2 Digital Image
A digital image is a representation of a two-dimensional image as a collection of
digital values commonly called pixels. Pixels are the smallest elements that make up the
image; each contains a value representing the brightness of a color at a certain
coordinate. In general, digital images are rectangular or square with a certain width and
height. This size is usually expressed as a number of pixels, so the image dimensions are
always whole numbers. Each pixel has a coordinate according to its position in the
image; coordinates are usually expressed as non-negative integers starting from 0 or 1,
depending on the system used. Each pixel also holds a numeric value representing the
information it encodes. There are two types of digital images, namely still images and
moving images. In principle, a moving image is a collection of still images in the form
of frames [9].
2.3 Grayscale Image
A grayscale image is a gray image with gradation values ranging from white to black.
The value of each pixel can be represented with 8 bits (1 byte). The Red, Green, and
Blue values of each pixel are summed and divided by 3, which changes the value of
every pixel in the image [10].
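As an illustration, the channel-averaging conversion described above can be sketched in a few lines of Python with NumPy; this is a minimal sketch, not the implementation used in this research, and the function name is illustrative.

```python
import numpy as np

def rgb_to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Average the R, G, B channels of an (H, W, 3) uint8 image."""
    # Accumulate in a wider type so the per-pixel sum (at most 765)
    # does not overflow uint8, then divide by 3 as described above.
    gray = rgb.astype(np.uint16).sum(axis=2) // 3
    return gray.astype(np.uint8)
```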
2.4 SURF (Speeded-Up Robust Features)
SURF (Speeded-Up Robust Features) is a computer vision method for detecting and
describing local features in an image. An image is converted into feature vectors, which
are then used to detect the points of interest contained in the image. The SURF method
consists of several stages: the first detects interest points using Gaussian filters, and the
second describes the features.
2.4.1 Gaussian filter
The Gaussian filter stage uses the integral image to compute the
convolution of the image, producing a new matrix called the Hessian
matrix. The integral image of an image is an array in which the value at
each coordinate (x, y) is the sum of the pixel values from point (0, 0) to
point (x, y) of the source image. The second-order Gaussian derivative
filters are approximated by box filters whose regions carry small integer
weights.
Figure 2.1: Gaussian second-order filters; left: Gyy, right: Gxy.
The Hessian matrix is obtained by convolving the image with these
Gaussian filters. The convolution can be computed quickly by using the
integral image constructed beforehand: the image is convolved with the
Gyy and Gxy filters shown in Figure 2.1, and the filter for the xx
direction is obtained by rotating the Gyy filter by 90°.
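The integral image and the constant-time box sums it enables can be sketched as follows in Python/NumPy; this is a sketch under the definition above, with illustrative function names.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """ii[y, x] = sum of img[0..y, 0..x], as defined above."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii: np.ndarray, top: int, left: int,
            bottom: int, right: int) -> int:
    """Sum of the inclusive rectangle in O(1) via four lookups,
    which is what makes the box-filter convolutions fast."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)
```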
2.4.2 Finding the extrema of the Hessian matrix determinant
The determinant of the Hessian matrix is calculated, and its
extreme values are sought (maxima and minima compared with
neighboring values). The Hessian matrix at point x = (x, y) of image I
at scale σ is defined by the following equation:

$$\mathcal{H}(\mathbf{x},\sigma)=\begin{bmatrix}L_{xx}(\mathbf{x},\sigma) & L_{xy}(\mathbf{x},\sigma)\\ L_{xy}(\mathbf{x},\sigma) & L_{yy}(\mathbf{x},\sigma)\end{bmatrix} \qquad (1)$$

where $L_{xx}(\mathbf{x},\sigma)$ is the convolution of the second-order
Gaussian derivative $\frac{\partial^{2}}{\partial x^{2}}g(\sigma)$ with the
image I at point x, and similarly for $L_{xy}$ and $L_{yy}$ [2].
2.4.3 Determining candidate features
Candidate features are determined by non-maximum suppression
over the neighboring image scales. The extrema of the Hessian
matrix determinant are then interpolated in the 3x3x3 scale space
using the method proposed by Brown. This method is applied to
each candidate feature to find the location of the extremum after
interpolation. The 3D quadratic fit uses a Taylor expansion of the
scale-space function D(x, y, σ), shifted so that the origin lies at the
test point, as in equation 2:

$$D(\mathbf{x}) = D + \frac{\partial D^{T}}{\partial \mathbf{x}}\,\mathbf{x} + \frac{1}{2}\,\mathbf{x}^{T}\,\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\,\mathbf{x} \qquad (2)$$

where D and its derivatives are evaluated at the test point and
$\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the test point. The
extremum location $\hat{\mathbf{x}}$ can then be calculated with
equation 3:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}} \qquad (3)$$

If the offset $\hat{\mathbf{x}}$ is greater than 0.5 in any dimension,
the extremum lies closer to another sample point. In that case the
test point is moved to the point indicated by the offset, the offset is
recalculated there, and the final offset is added to the location of the
test point.
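A minimal sketch of this refinement step follows, assuming the Hessian-determinant responses are stored in a 3D array D indexed as D[scale, row, col] and approximating the derivatives in equations 2 and 3 with central finite differences:

```python
import numpy as np

def refine_extremum(D: np.ndarray, s: int, y: int, x: int) -> np.ndarray:
    """Return the sub-sample offset (dx, dy, ds) of equation 3."""
    # First derivatives of D at the test point (central differences).
    g = 0.5 * np.array([
        D[s, y, x + 1] - D[s, y, x - 1],
        D[s, y + 1, x] - D[s, y - 1, x],
        D[s + 1, y, x] - D[s - 1, y, x],
    ])
    # Second derivatives (the Hessian of D in scale space).
    v = D[s, y, x]
    dxx = D[s, y, x + 1] + D[s, y, x - 1] - 2 * v
    dyy = D[s, y + 1, x] + D[s, y - 1, x] - 2 * v
    dss = D[s + 1, y, x] + D[s - 1, y, x] - 2 * v
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    dxs = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                  - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dys = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                  - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    # Equation 3: offset of the true extremum from the test point.
    # If any |offset| > 0.5, move to the neighboring point and repeat.
    return -np.linalg.solve(H, g)
```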
2.4.4 Feature description
Feature points are described using Haar wavelet responses
assembled into unit vectors.
2.4.5 Assigning orientation
Feature points are described with vector descriptors so that they
are robust to rotation, contrast, and changes of viewpoint. To be
robust to rotation, each detected feature is assigned an orientation.
First, the Haar wavelet responses along the x-axis and y-axis are
calculated at the points within a circular neighborhood of radius 6s
around the feature point, where s is the scale at which the feature
point was detected. The sampling step and the wavelet size are
chosen according to the scale, so at large scales the wavelets are
large as well. For this reason the integral image is used again,
keeping the point detection process fast: only six operations are
needed to compute the response in the x and y directions at any
scale. The wavelet side length is 4s. The orientation assignment is
illustrated in Figure 2.2.
Figure 2.2: Assigning orientation.
After the wavelet responses are calculated and weighted with a
Gaussian (σ = 2.5s) centered at the interest point, the responses are
represented as vectors in a plane, with the horizontal responses along
the abscissa and the vertical responses along the ordinate. The
dominant orientation is estimated by summing all responses within a
sliding orientation window covering an angle of π/3. The horizontal
and vertical responses inside the window are summed, and the two
sums yield a new vector.
The longest such vector defines the orientation of the interest
point. The size of the sliding window is a parameter that has to be
chosen experimentally: a small window fires on single dominating
wavelet responses, while a large window yields maxima in vector
length that are not representative. Both result in an unstable
orientation of the interest region.
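The sliding-window search for the dominant orientation can be sketched as follows, assuming the Gaussian-weighted Haar responses dx and dy of the samples inside the 6s circle have already been computed; the number of window positions is an illustrative choice.

```python
import numpy as np

def dominant_orientation(dx: np.ndarray, dy: np.ndarray,
                         window: float = np.pi / 3,
                         steps: int = 36) -> float:
    """Sweep a pi/3 window around the circle; the window whose summed
    responses give the longest vector defines the orientation."""
    angles = np.arctan2(dy, dx)
    best_len, best_theta = -1.0, 0.0
    for start in np.linspace(-np.pi, np.pi, steps, endpoint=False):
        # Responses whose angle falls inside [start, start + window).
        inside = ((angles - start) % (2 * np.pi)) < window
        vx, vy = dx[inside].sum(), dy[inside].sum()
        length = np.hypot(vx, vy)
        if length > best_len:
            best_len, best_theta = length, np.arctan2(vy, vx)
    return best_theta
```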
2.5 Extraction of descriptor components
To extract the descriptor, the first step is to construct a square region centered on the
feature point and oriented along the orientation assigned above. The size of this window
is 20s. The region is then split into 4x4 sub-regions, which preserves important spatial
information. For each sub-region, a few simple features are computed at 5x5 regularly
spaced sample points. For simplicity, the Haar wavelet response in the horizontal
direction is called dx and the response in the vertical direction is called dy, where
"horizontal" and "vertical" are defined relative to the orientation of the feature point in
question. To increase robustness to geometric deformations and localization errors, the
responses are weighted with a Gaussian (σ = 3.3s) centered at the feature point.
Then the wavelet responses dx and dy are summed over each sub-region and form a
first set of entries in the feature vector. To capture information about changes in
intensity, the absolute values of the responses, |dx| and |dy|, are also summed. Each
sub-region therefore contributes a four-dimensional descriptor vector
$v = (\sum d_x, \sum d_y, \sum|d_x|, \sum|d_y|)$ for its underlying intensity structure.
Concatenating this over all 4x4 sub-regions produces a descriptor vector of length 64.
The wavelet responses are invariant to a bias in illumination (offset); invariance to
contrast (a scale factor) is achieved by turning the descriptor into a unit vector. Sub-
regions with different image intensity patterns produce clearly different descriptors, and
combinations of such local intensity patterns yield distinctive descriptor values; the four
components dx, dy, |dx|, |dy| can be seen in Figure 2.3.
Figure 2.3: Descriptors of sub-regions represent the nature of the intensity pattern. Left:
for a homogeneous region, all values are relatively low. Middle: in the presence of
frequencies in the x-direction, Σ|dx| is high while the others remain low. Right: if the
intensity increases gradually in the x-direction, both Σdx and Σ|dx| are high.
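Assembling the 64-dimensional descriptor from the sub-region sums, followed by normalization to a unit vector, can be sketched as follows; this assumes the rotated, Gaussian-weighted responses have already been sampled on a 20x20 grid (5x5 samples for each of the 4x4 sub-regions).

```python
import numpy as np

def surf_descriptor(dx: np.ndarray, dy: np.ndarray) -> np.ndarray:
    """Build v = (sum dx, sum dy, sum |dx|, sum |dy|) per sub-region."""
    v = []
    for i in range(4):
        for j in range(4):
            sx = dx[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            sy = dy[5 * i:5 * i + 5, 5 * j:5 * j + 5]
            v += [sx.sum(), sy.sum(),
                  np.abs(sx).sum(), np.abs(sy).sum()]
    v = np.asarray(v)                         # 16 sub-regions x 4 = 64
    return v / (np.linalg.norm(v) + 1e-12)    # unit vector: contrast invariance
```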
3. PROPOSED WORK
3.1 Research stages
The research is carried out in several stages, shown in the flowchart of Figure 3.1. The
first step is acquiring the training and testing data; the RGB images are then converted to
grayscale, and features are extracted with SURF (Speeded-Up Robust Features). The last
stage displays the images that match one another.
Figure 3.1 Block Diagram of Proposed System
3.1.1 Training Data and Testing data
Training data and testing data here are images that will be processed by
the system. Training data and testing data are taken using a Kinect camera,
each object is taken with a different point of view, the point of view in data
retrieval starts from 1.5-1.30 turning angle in object retrieval. with a range
ranging from 5-30 cm. The data from the image here has undergone a
process in such a way as to test the capability of the system, starting from
taking the viewpoint of the image, changing the scale of the image, and also
changing the rotation with a certain angular rotation, as well as the distance
of the object taking with the camera.
3.1.2 Change the RGB Image to Grayscale
At this stage training data and testing data will be changed from RGB
(Red, Green, Blue) to grayscale image. the value of the intensity of the
RBG (Red, Green, Blue) will be divided by 3. So that the image will be
grayscale.
3.1.3 Feature extraction SURF Method
Feature extraction with SURF (Speeded-Up Robust Feature) is carried out
for the process of finding and determining the keypoint of an image to look
for similarities between one image with another image. descriptors on the
image that will be the basis of this research. the descriptor also to obtain
information about changes in intensity, also extracts the absolute number of
values from the response in each image.
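As a hedged illustration of this stage, detection and matching can be prototyped with OpenCV, whose SURF implementation lives in the contrib module (it requires opencv-contrib-python built with the nonfree option); the file names and the Hessian threshold below are placeholders, not the settings used in this research.

```python
import cv2

img0 = cv2.imread("view0.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute SURF descriptors on both images.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp0, des0 = surf.detectAndCompute(img0, None)
kp1, des1 = surf.detectAndCompute(img1, None)

# Brute-force matching on L2 descriptor distance with cross-checking.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des0, des1), key=lambda m: m.distance)

# Visualize the 50 best matches between the two views.
result = cv2.drawMatches(img0, kp0, img1, kp1, matches[:50], None)
cv2.imwrite("matches.jpg", result)
```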
3.1.4 Result Image Matching
The end result of this process is a set of matching descriptors between
one image and another, showing the similarities and correspondence
between the training images and the testing images.
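One way to obtain a transformation matrix of the kind reported in Section 4 is to fit an affine transform to the matched keypoints with RANSAC, which rejects mismatched keypoints (cf. [4]); the sketch below uses OpenCV and is an assumption about the procedure, not necessarily the exact one used in this research.

```python
import cv2
import numpy as np

def estimate_transform(kp0, kp1, matches):
    """Fit a 2x3 affine transform mapping view0 points to view1 points."""
    src = np.float32([kp0[m.queryIdx].pt for m in matches])
    dst = np.float32([kp1[m.trainIdx].pt for m in matches])
    # RANSAC keeps only geometrically consistent matches (inliers).
    M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    return M, inliers
```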
4. RESULTS AND DISCUSSION
4.1 Data acquisition
The data used in this study are training data and testing data. Retrieval of
training data and testing data using the kinect xbox 360 camera. With a
distance of taking 5-30 cm, as well as differences in angle of view from 1.5-
1.30. The image is RGB (Red, Green, Blue) and a JPG file.
4.2 Data Prepossing
At this preposinging stage it is the stage of changing the RBG image to
Graysale. Where the purpose of this process is to convert and reduce images
from 24 bits to 8 bits to make it easier to process in the system.
4.3 Image matching results
A. First scenario: training and testing data at a distance of 1.5 with an
angle of 5°.
In the first test, the training data and testing data are named view0
and view1, captured at a distance of 1.5 cm with an angle of 5°. The
View0 and View1 images follow.
Figure 4.1: Image View0. Figure 4.2: Image View1.
In the first test phase, image correspondence is calculated using
the SURF (Speeded-Up Robust Features) method. The SURF feature
extraction yields keypoint locations and keypoint descriptors. The
keypoint location explains where the keypoint lies and has four
variables: row, column, scale (the size or extent of the keypoint), and
orientation (the direction of the keypoint in radians). The descriptor
contains the distinctive features that distinguish one keypoint from
another and consists of 128 features taken from the 4x4 box around
the keypoint. The descriptors then undergo a transformation process
to find the locations where keypoints match one another. The
following table gives the resulting correspondence from the image
transformation.
Table 1. Estimated 3x3 transformation matrix between View0 and View1.
 1.0063   0.0009   0
 0.0624   0.9742   0.9742
54.0183  -0.4748   1.0000
The next stage is the computation that finds the index of each view
that has a match. The following are the computed indexes that have a
match.
Table 2. Matching indexes of each view
Index Image View0 Index Image View1
2 2
3 6
106 105
209 208
Table 2 shows that the smaller the index value of an image, the
higher its match. The matches between the images can be seen in
Figure 4.3.
Figure 4.3. The results of the matches of the two images view0 and view1
B. Second scenario: training and testing data at a distance of 1.5 with
an angle of 10°.
In the second test, the training data and testing data are named
view2 and view3, captured at a distance of 1.5 cm with an angle of
10°. The View2 and View3 images follow.
Figure 4.4: Image View2. Figure 4.5: Image View3.
As in the first scenario, image correspondence is calculated with
the SURF (Speeded-Up Robust Features) method, and the keypoint
locations and descriptors are extracted and transformed. The
following table gives the resulting correspondence from the image
transformation.
Table 3. Estimated 3x3 transformation matrix between View2 and View3.
  0.9492   0.0113   0
  0.0409   0.9787   0
-30.7314   0.0935   1.0000
The next stage is the computation that finds the index of each view
that has a match. The following are the computed indexes that have a
match.
Table 4. Matching indexes of each view
Index Image View2 Index Image View3
22 23
257 299
278 298
355 377
Table 4 shows that the smaller the index value of an image, the
higher its match. The matches between the images can be seen in
Figure 4.6.
Figure 4.6. The results of the matches of the two images view2 and
view3
C. Third scenario: training and testing data at a distance of 1.5 with an
angle of 20°.
In the third test, the training data and testing data are named
view4 and view5, captured at a distance of 1.5 cm with an angle of
20°. The View4 and View5 images follow.
Figure 4.7: Image View4. Figure 4.8: Image View5.
As in the first scenario, image correspondence is calculated with
the SURF (Speeded-Up Robust Features) method, and the keypoint
locations and descriptors are extracted and transformed. The
following table gives the resulting correspondence from the image
transformation.
Table 5. Estimated 3x3 transformation matrix between View4 and View5.
  0.9924   0.0075   0
  0.2356   0.8823   0
-79.3008  33.1025   1.0000
The next stage is the computation that finds the index of each view
that has a match. The following are the computed indexes that have a
match.
Table 6. Matching indexes of each view
Index Image View4 Index Image View5
15 18
88 88
256 262
400 241
Table 6 shows that the smaller the index value of an image, the
higher its match. The matches between the images can be seen in
Figure 4.9.
Figure 4.9. The results of the matches of the two images view4 and
view5
5. CONCLUSIONS AND SUGGESTIONS
A. CONCLUSIONS
The results of image matching in the three scenarios using the SURF
(Speeded-Up Robust Features) method show an optimal average match. In the
test data, the four rotational change scenarios had no effect on matching,
whereas changes in size or scale and in the angle of image capture still
performed poorly.
There are several factors that influence the success of the system in
matching images, such as the background of the image, the intensity of each
image, the viewpoint and the distance of image capture, and the quality of each
input image. The accuracy of the system in detecting and matching the
keypoint orientation of each image also has an influence: a slight change in
keypoint orientation can be deemed a mismatch by the system.
B. SUGGESTIONS
To achieve better matching accuracy, the input images should be captured
at a high point density and in a place with stable lighting, and overly large
viewing angles should be avoided, since angles that are too large, like scale
changes that are too large, reduce the number of matches. It is also necessary
to try other image matching methods to further improve the matching
accuracy.
Acknowledgments
I would like to thank Binus University @Malang, Indonesia, which has supported
us in this research, and also Pak Co and his colleagues in Informatics Engineering at
Binus who helped complete this journal.
References
[1] S. Ayushi, T. Kalp Taru, and R. Kumar, “Automated Image Mosaicing System
With Analysis Over Various Image Noise,” Int. J. Comput. Sci. Appl., vol. 6, no.
3, pp. 13–24, 2016.
[2] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,”
Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics), vol. 3951 LNCS, pp. 404–417, 2006.
[3] A. Setiyawan, “Pencocokan Citra Berbasis Scale Invariant Feature Transform
(SIFT) menggunakan Arc Cosinus,” J. Tek. Inform., pp. 1–4, 2014.
[4] F. Wu and X. Fang, “An improved RANSAC homography algorithm for feature
based image mosaic,” Proc. 7th WSEAS Int. Conf. Signal Process. Comput. Geom.
Artif. Vis., pp. 202–207, 2007.
[5] J. T. Pedersen, “Study group SURF: Feature detection & description,” Dep.
Comput. Sci. Aarhus Univ., pp. 1–12, 2011.
[6] B. B. Swapnali and K. S. Vijay, “Feature Extraction Using Surf Algorithm for
Object Recognition,” Int. J. Tech. Res. Appl., vol. 2, no. 4, pp. 2320–8163, 2014.
[7] F. Badri, E. M. Yuniarno, and S. N. Supeno Mardi, “3D point cloud data
registration based on multiview image using SIFT method for Djago temple relief
reconstruction,” Proc. - 2015 4th Int. Conf. Instrumentation, Commun. Inf.
Technol. Biomed. Eng. ICICI-BME 2015, pp. 191–195, 2016.
[8] P. Qiu, Y. Liang, and H. Rong, “Image Mosaics Algorithm Based on SIFT Feature
Point Matching and Transformation Parameters Automatically Recognizing,” Proc.
ICCSEE, pp. 1560–1563, 2013.
[9] A. Muntasa and M. H. Purnomo, Konsep Pengolahan Citra Digital dan Ektraksi
Fitur. Yogyakarta: Graha Ilmu, 2010.
[10] R. D. Kusumanto and A. N. Tompunu, “Pengolahan Citra Digital Untuk
Mendeteksi Obyek Menggunakan Pengolahan Warna Model Normalisasi Rgb,”
Semin. Nas. Teknol. Inf. Komun. Terap., vol. 2011, no. Semantik, 2011.