
ISSN 1975-8359(Print) / ISSN 2287-4364(Online)

The Transactions of the Korean Institute of Electrical Engineers Vol. 66, No. 7, pp. 1117-1122, 2017

http://doi.org/10.5370/KIEE.2017.66.7.1117

Copyright ⓒ The Korean Institute of Electrical Engineers

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

A Neuro-Fuzzy Pedestrian Detection Method Using Convolutional Multiblock HOG

명 근 우* ․ 곡 락 도* ․ 임 준 식

(Kun-Woo Myung ․ Le-Tao Qu ․ Joon-Shik Lim)

Abstract - Pedestrian detection is an important part of artificial intelligence and computer vision, with applications in areas such as automated driving and video analysis. Much work has been done on pedestrian detection, and the accuracy of detection on images containing multiple pedestrians has reached a high level, so further progress is difficult. This paper proposes a new structure, based on the idea of HOG and convolutional filters, for pedestrian detection in single-pedestrian images. Higher accuracy in single-pedestrian detection can in turn be used to increase the accuracy on multiple-pedestrian images. We use Multiblock HOG and the magnitude of the pixel as features and use convolutional filters to extract them. NEWFM is then used as the classifier for training and testing. The single-pedestrian images of the INRIA data set are used as the data set. The results show that the proposed Convolutional Multiblock HOG achieves a miss rate of 0.015 at a false positive value of $10^{-4}$, which is better than other detection methods such as HOGLBP (0.03 miss rate) and ChnFtrs (0.075 miss rate).

Key Words : Multiblock HOG, INRIA data set, NEWFM, pedestrian detection

Corresponding Author : Dept. of Computer Engineering,

Seongnam, South Korea.

E-mail:[email protected]

* IT College, Gachon University, Seongnam, South Korea

Received : June 7, 2017; Accepted : June 9, 2017

1. Introduction

Pedestrian detection from images or videos is challenging but very important in the field of computer vision. It has a significant impact on automated driving and artificial intelligence. Much effort has been devoted to pedestrian detection [1,2], and its performance has reached a high level. Current work mostly focuses on the accuracy of full-image detection (multiple pedestrians in one image), but that accuracy is not easy to increase further. We propose a method that increases the detection accuracy on single-pedestrian images, so that the full-image accuracy can be increased more easily. To enhance detection performance, robust features are needed to discriminate the human form clearly. Based on many years of study, the HOG (histogram of oriented gradients) descriptor [3], the magnitude of the pixel [4], and several other descriptors [5,6,7] show excellent performance in discriminating the human form. Besides robust features, efficient and accurate classifiers are also required. Deep learning and the convolutional concept have shown good performance in pedestrian detection. We therefore change the extraction method of HOG and apply convolutional filters: the feature extraction part uses the Multiblock HOG feature together with convolutional filters. The results show that this performs better than the original HOG and other methods such as LBP and ChnFtrs.

2. Related Works

2.1. HOG and Magnitude

The HOG feature is very useful and robust and is used in

pedestrian detection. It provides a dense overlapping

description of image regions. This feature is robust because

local object appearance and shape can often be characterized

well by the distribution of local intensity gradient or edge

directions, even without precise knowledge of the

corresponding gradient or edge positions.

HOG features are obtained by computing and accumulating the gradient values of local regions of an image. As Fig. 1 shows, we first normalize the gamma and color of the image. We then compute the gradient values of every pixel from the pixel values, as shown in Equations (1) and (2).


Fig. 1 The Structure of Extracting HOG Feature

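$$G_x(x,y) = H(x+1,y) - H(x-1,y) \qquad (1)$$

$$G_y(x,y) = H(x,y+1) - H(x,y-1) \qquad (2)$$

Here, Gx(x,y), Gy(x,y), and H(x,y) are the horizontal gradient, the vertical gradient, and the pixel value at point (x,y).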

After calculating the gradient value of every pixel, we

use the value to get the magnitude of gradient and gradient

direction, as shown in Equations (3) and (4).

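$$G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2} \qquad (3)$$

$$\alpha(x,y) = \tan^{-1}\left(\frac{G_y(x,y)}{G_x(x,y)}\right) \qquad (4)$$

Here, Gx(x,y) and Gy(x,y) are the horizontal and vertical gradient values, G(x,y) is the magnitude of the gradient, and α(x,y) is the gradient direction.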
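The per-pixel gradient computation of Equations (1) to (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the central differences used in the Table 1 algorithm, clamps indices at the image border, and folds the direction into an unsigned angle in [0, π). The function name `gradients` and the border handling are assumptions introduced here.

```python
import math

def gradients(img):
    """Return (magnitude, direction) grids for a grayscale image.

    img is a list of rows of pixel values; x indexes columns, y indexes rows.
    Border pixels are handled by clamping indices (an assumption).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences in the horizontal and vertical directions.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            # Gradient magnitude.
            mag[y][x] = math.hypot(gx, gy)
            # atan2 avoids division by zero when gx == 0; the modulo folds
            # the signed angle into an unsigned direction.
            ang[y][x] = math.atan2(gy, gx) % math.pi
    return mag, ang
```

Using `atan2` instead of a plain arctangent of the ratio is a design choice of this sketch: it keeps the result defined for vertical edges, where the horizontal gradient is zero.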

After calculating the gradient magnitude and direction, we divide the image into cells and accumulate weighted votes for gradient orientation over each spatial cell, so that every cell obtains a small set of values as its features. We then group every four cells into one block, collect the features of these four cells into one feature set, and normalize the values of this set. The feature sets of all blocks over a detection window together form the HOG features.
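The cell and block steps described above can be sketched as follows. The cell size of 8 pixels, the 9 orientation bins, the one-cell block stride, and the L2 normalization are assumptions chosen for illustration; the text only states that votes are accumulated per cell and that four cells form one normalized block. The names `cell_histograms` and `block_features` are hypothetical.

```python
import math

def cell_histograms(mag, ang, cell=8, bins=9):
    """Accumulate magnitude-weighted orientation votes per cell.

    mag and ang are equally sized grids; ang holds unsigned angles in [0, pi).
    """
    rows, cols = len(mag) // cell, len(mag[0]) // cell
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * cell):
        for x in range(cols * cell):
            # Map the angle to a bin index, clamping the upper edge.
            b = min(int(ang[y][x] / math.pi * bins), bins - 1)
            hists[y // cell][x // cell][b] += mag[y][x]
    return hists

def block_features(hists, eps=1e-6):
    """Group 2x2 neighbouring cells into blocks and L2-normalize each block."""
    feats = []
    for r in range(len(hists) - 1):
        for c in range(len(hists[0]) - 1):
            # Concatenate the four cell histograms of this block.
            v = hists[r][c] + hists[r][c + 1] + hists[r + 1][c] + hists[r + 1][c + 1]
            norm = math.sqrt(sum(t * t for t in v) + eps * eps)
            feats.extend(t / norm for t in v)
    return feats
```

For a 16x16 window with 8-pixel cells this yields one block of 4 x 9 = 36 values; larger windows yield one such set per block position.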

Magnitude is the value of the pixel gradient. We use HOG to capture the distribution of gradient directions over the pixels, and magnitude to capture the value at each pixel and the distribution of these values.

2.2. NEWFM

NEWFM (neural network with weighted fuzzy membership functions) is a supervised neuro-fuzzy classification system that uses the bounded sum of weighted fuzzy membership functions (BSWFMs) [11,12]. Fig. 2 shows the structure of NEWFM.

Fig. 2 The Structure of NEWFM

3. Convolutional Multiblock HOG

In this paper we propose the Convolutional Multiblock HOG feature for pedestrian detection. We use convolutional filters to extract the Multiblock HOG feature and the magnitude feature. We also extract these two features from the original image, so that both global features and edge-enhanced features are available for pedestrian detection.

Convolutional Multiblock HOG is Multiblock HOG extracted from images processed by convolutional filters. Multiblock HOG is a feature based on the idea of HOG, and convolutional filtering is applied before its extraction. Unlike HOG, the Convolutional Multiblock HOG uses three sizes of block with no cells inside the blocks, as Fig. 3 shows. The three block sizes are the full image size, 0.25 of the image size, and 0.0625 of the image size. Blocks of the same size do not overlap.

Fig. 3 The Block Size of Multiblock HOG

Because there are no cells in a block, the histogram of orientation is calculated over all the pixels in each block.
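A minimal sketch of the three block sizes follows. It assumes that 0.25 and 0.0625 of the image size are area fractions, so the image is split into 1, 2x2, and 4x4 non-overlapping blocks; this interpretation, the 9 bins, and the name `multiblock_histograms` are assumptions of this sketch. Because the source describes both a pixel-count histogram and, in the Table 1 algorithm, a magnitude accumulation, the sketch supports both through an optional argument.

```python
import math

def multiblock_histograms(ang, mag=None, bins=9):
    """Return one orientation histogram per block for three block sizes.

    ang: grid of unsigned angles in [0, pi). If mag is None, each pixel
    contributes a count of 1; otherwise it contributes its magnitude.
    """
    h, w = len(ang), len(ang[0])
    hists = []
    for splits in (1, 2, 4):  # 1, 4 and 16 non-overlapping blocks
        bh, bw = h // splits, w // splits
        for by in range(splits):
            for bx in range(splits):
                hist = [0.0] * bins
                # No cells: every pixel of the block votes directly.
                for y in range(by * bh, (by + 1) * bh):
                    for x in range(bx * bw, (bx + 1) * bw):
                        b = min(int(ang[y][x] / math.pi * bins), bins - 1)
                        hist[b] += 1.0 if mag is None else mag[y][x]
                hists.append(hist)
    return hists
```

Under these assumptions every image yields 1 + 4 + 16 = 21 histograms, regardless of its resolution.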


Table 1 Algorithm of Multiblock HOG

    Procedure MB_HOG()
    { // extract the Multiblock HOG feature
      Input: INRIA person image training data set

      // transform the original images to gray images
      for i = 1 to n               // n is the number of images in the training set
        transform_gray(p[i]);      // p[i] is the i-th image of the training set

      // calculate the gradient value and direction of every pixel
      for i = 1 to n
        for j = 1 to p
          for k = 1 to q           // p, q are the length and width of the image
          {
            // h[i][j][k] is the pixel value at point (j,k) of image[i]
            // gradx[i][j][k] is the horizontal gradient value at point (j,k) of image[i]
            // grady[i][j][k] is the vertical gradient value at point (j,k) of image[i]
            gradx[i][j][k] = h[i][j+1][k] - h[i][j-1][k];
            grady[i][j][k] = h[i][j][k+1] - h[i][j][k-1];

            // grad[i][j][k] is the gradient magnitude at point (j,k) of image[i]
            // angle[i][j][k] is the gradient direction at point (j,k) of image[i]
            grad[i][j][k] = sqrt(sqr(gradx[i][j][k]) + sqr(grady[i][j][k]));
            angle[i][j][k] = arctan(grady[i][j][k] / gradx[i][j][k]);
          }

      // set the multiblock sizes
      width[3], length[3];         // widths and lengths of the three kinds of blocks

      // calculate the histogram of orientation of the blocks
      for i = 1 to 3               // every kind of block
        for j = 1 to p
          for k = 1 to q
          {
            // get the histogram of orientation of one block
            for m = j to j + width[i]        // m, n span the block range
              for n = k to k + length[i]
              {
                switch angle[m][n] get ori[m][n];   // orientation of point (m,n)
                // add the value to the corresponding orientation
                his[hcount][ori[m][n]] = his[hcount][ori[m][n]] + grad[m][n];
              }
            // add weights with the three direction filters
            hcount = hcount + 1;   // hcount counts the histograms; every block builds one
          }

      Output: all histograms of all blocks (Multiblock HOG)
    }

The histogram of each block is built from the orientations of its pixels. Unlike HOG, the value of each histogram bin is the number of pixels with that orientation, not the sum of the magnitudes of those pixels. After extracting the Multiblock HOG, we select the three highest orientations in every histogram and use three filters (horizontal, vertical, and diagonal filtering) to obtain the distribution of these orientations. Fig. 4 shows the structure for extracting Multiblock HOG and calculating the weight.

Fig. 4 The Structure of Extracting Multiblock HOG

The algorithm of Multiblock HOG is shown in Table 1.

Having described how to extract Multiblock HOG, we next process the image with convolutional filters. We use Sobel, sharpening, and Scharr filters to extract more edge information from the image and obtain convolutional images, and we use max pooling to extract more representative information. We then extract Multiblock HOG from the convolutional images, which gives the Convolutional Multiblock HOG. Fig. 5 shows the structure of the convolutional and pooling part of the Convolutional Multiblock HOG feature.
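The filtering and pooling stage can be sketched as follows. The kernel coefficients are the commonly used 3x3 horizontal Sobel, horizontal Scharr, and sharpening kernels; the paper does not list its coefficients, so these values, the absence of padding, the 2x2 pooling window, and all function names are assumptions of this sketch.

```python
# Common 3x3 kernels (assumed; the source does not give coefficients).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SCHARR_X = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

def filter2d(img, kernel):
    """Slide a 3x3 kernel over img (no padding, no kernel flip)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            row.append(sum(kernel[j][i] * img[y - 1 + j][x - 1 + i]
                           for j in range(3) for i in range(3)))
        out.append(row)
    return out

def max_pool(img, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(img[y + j][x + i] for j in range(size) for i in range(size))
             for x in range(0, len(img[0]) - size + 1, size)]
            for y in range(0, len(img) - size + 1, size)]

def convolutional_images(img):
    """Filter with each kernel, then pool; one output image per kernel."""
    return [max_pool(filter2d(img, k)) for k in (SOBEL_X, SCHARR_X, SHARPEN)]
```

Each of the three pooled images would then be passed to the Multiblock HOG extraction in place of the original image.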

Fig. 5 Convolutional and Pooling Part of Convolutional Feature

4. Experimental Results

4.1. Dataset

The experimental materials are selected from the INRIA pedestrian dataset, which was collected as part of research on the detection of upright people in images and video. It is one of the most popular datasets used in pedestrian detection. It includes 1671 negative samples (non-person samples) and 902 positive samples (person samples).



Fig. 6 Examples from the INRIA dataset

4.2. Experiment

In this experiment, we use Multiblock HOG and Convolutional Multiblock HOG, extracted as described above, as the features for classification. After extracting the features, we use the Bhattacharyya distance for feature selection. We then use NEWFM for classification. Fig. 7 shows the structure of the experiment.
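Feature selection by Bhattacharyya distance can be sketched as follows. The sketch assumes that each feature is modelled per class as a one-dimensional Gaussian and that features are ranked by the distance between the person and non-person classes; the paper does not state these details, so the Gaussian model and the names `bhattacharyya_1d` and `rank_features` are assumptions.

```python
import math

def bhattacharyya_1d(a, b, eps=1e-12):
    """Bhattacharyya distance between two samples modelled as 1-D Gaussians."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    # Population variances; eps guards against zero variance.
    va = sum((t - ma) ** 2 for t in a) / len(a) + eps
    vb = sum((t - mb) ** 2 for t in b) / len(b) + eps
    return (0.25 * (ma - mb) ** 2 / (va + vb)
            + 0.5 * math.log((va + vb) / (2.0 * math.sqrt(va * vb))))

def rank_features(pos, neg):
    """Return feature indices sorted from most to least discriminative.

    pos and neg are lists of feature vectors for the two classes.
    """
    n = len(pos[0])
    dist = [bhattacharyya_1d([v[i] for v in pos], [v[i] for v in neg])
            for i in range(n)]
    return sorted(range(n), key=lambda i: dist[i], reverse=True)
```

A larger distance means the two classes overlap less on that feature, so the leading indices of the ranking would be the ones kept for the classifier.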

Fig. 7 The Structure of Experiment

4.3. Results

To evaluate the performance of pedestrian detection, we use the miss rate and the false positive rate. The miss rate is also known as the false negative rate; it is the number of false negative samples divided by the number of all positive samples. The false positive rate is the number of false positive samples divided by the number of all negative samples.
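The two evaluation measures can be computed as in the following sketch, which uses the standard definitions: miss rate is false negatives over all positive samples, and false positive rate is false positives over all negative samples. The function name and the 0/1 label encoding are assumptions of this sketch.

```python
def miss_rate_and_fpr(labels, predictions):
    """labels and predictions are sequences of 1 (person) or 0 (non-person)."""
    # A false negative is a person sample classified as non-person.
    fn = sum(1 for t, p in zip(labels, predictions) if t == 1 and p == 0)
    # A false positive is a non-person sample classified as person.
    fp = sum(1 for t, p in zip(labels, predictions) if t == 0 and p == 1)
    positives = sum(1 for t in labels if t == 1)
    negatives = len(labels) - positives
    return fn / positives, fp / negatives
```

For example, with four person samples of which one is missed and five non-person samples of which one is accepted, the function returns a miss rate of 0.25 and a false positive rate of 0.2.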

In the experiment, we first compared the performance of Multiblock HOG with HOG, LBP, and magnitude features, which are widely used in single-image pedestrian detection. Table 1 shows the performance of these features.

Feature                          Miss rate (%)
HOG                              0.45
MultiblockHOG                    0.35
HOG+Magnitude                    0.4
MultiblockHOG+magnitude          0.33
HOG+LBP                          0.41
MultiblockHOG+LBP                0.39
HOG+Magnitude+LBP                0.42
MultiblockHOG+magnitude+LBP      0.37

Table 1 The performance of different features

Table 1 shows that Multiblock HOG combined with magnitude performs better than the other features. We then evaluate the Convolutional Multiblock HOG feature and compare it with Multiblock HOG. Magnitude is used in every test of this comparison. Table 2 shows the results: using Convolutional Multiblock HOG and Multiblock HOG together gives the best result.

Feature                                          Miss rate (%)
Multiblock HOG                                   0.33
Convolutional Multiblock HOG                     0.3
Multiblock HOG + Convolutional Multiblock HOG    0.2

Table 2 The performance of Convolutional Multiblock HOG

Fig. 8 The Comparison Results of Convolutional Multiblock HOG and Other Methods

After feature selection by Bhattacharyya distance, we use NEWFM for classification and obtain the final result for Convolutional Multiblock HOG and magnitude. Fig. 8 compares the proposed method with other advanced pedestrian detection methods for single-pedestrian detection on the INRIA pedestrian dataset. Performance is reported as the miss rate (lower is better) at a false positive value of $10^{-4}$. The results show that the proposed method and feature perform better than the other methods.

5. Conclusion

Comparing the performance of the methods, the Convolutional Multiblock HOG shows better results than the other pedestrian detection methods, with a lower miss rate.

Table 1 shows that using Multiblock HOG together with magnitude gives the best performance among the features compared. The Multiblock HOG feature captures the distribution of pixel gradients, and the magnitude feature captures the distribution of pixel values. Because of its blocks of multiple sizes, Multiblock HOG captures not only local features but also global features of the image, and the color noise is reduced. For these reasons the Multiblock HOG and magnitude features give better results than the other features.

Table 2 shows that using Multiblock HOG and Convolutional Multiblock HOG together gives a better result. Convolutional Multiblock HOG captures more local and edge information than Multiblock HOG, while Multiblock HOG captures more global information than Convolutional Multiblock HOG, so combining the two features improves the result.

With these changes, the proposed Convolutional Multiblock HOG gives better results than other single-image detection methods, as Fig. 8 shows. The HOG feature focuses only on local features, and the LBP feature focuses mostly on differences between neighboring pixels, so our method outperforms them. We also use a convolutional layer to obtain more representative features than ChnFtrs, and therefore obtain better results than ChnFtrs.

As pedestrian detection develops further, it can become more useful in computer vision and artificial intelligence.

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2015R1D1A1A09057409).

References

[1] P. Dollár, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: An evaluation of the state of the art," TPAMI (2011).

[2] P. Dollár, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: A benchmark," in IEEE Computer Vision and Pattern Recognition (2009).

[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Computer Vision and Pattern Recognition (2005).

[4] P. Dollár, Z. Tu, P. Perona, and S. Belongie, "Integral channel features," in BMVC (2009).

[5] P. Sabzmeydani and G. Mori, "Detecting pedestrians by learning shapelet features," in IEEE Computer Vision and Pattern Recognition (2007).

[6] P. Dollár, Z. Tu, H. Tao, and S. Belongie, "Feature mining for image classification," in IEEE Conf. Computer Vision and Pattern Recognition (2007).

[7] Z. Lin and L. S. Davis, "A pose-invariant descriptor for human det. and seg.," in European Conf. Computer Vision (2008).

[8] C. Papageorgiou and T. Poggio, "A trainable system for object detection," Intl. Journal of Computer Vision, vol. 38, no. 1, pp. 15-33 (2000).

[9] J. S. Lim, "Finding Features for Real-Time Premature Ventricular Contraction Detection Using a Fuzzy Neural Network System," IEEE Transactions on Neural Networks, pp. 522-527 (2009).

[10] Z. X. Zhang, S. H. Lee, and J. S. Lim, "Detecting ventricular arrhythmias by NEWFM," Granular Computing, 2008. GrC 2008. IEEE International Conference on (2008).

[11] J. S. Lim, D. Wang, Y.-S. Kim, and S. Gupta, "A neuro-fuzzy approach for diagnosis of antibody deficiency syndrome," Neurocomputing 69, Issues 7-9, pp. 969-974 (2006).

[12] J. S. Lim, T-W Ryu, H-J Kim, and S. Gupta, "Feature Selection for Specific Antibody Deficiency Syndrome by Neural Network with Weighted Fuzzy Membership Functions," LNCS 3614, pp. 811-820, Springer-Verlag, Aug (2005).

[13] J. S. Lim and S. Gupta, "Feature Selection Using Weighted Neuro-Fuzzy Membership Functions," The 2004 International Conference on Artificial Intelligence (IC-AI'04), June 21-24, vol. 1, pp. 261-266, Las Vegas, Nevada, USA (2004).

[14] Z. X. Zhang, S. H. Lee, and J. S. Lim, "Discrimination of Ventricular Arrhythmias Using NEWFM," pp. 176-183 (2008).

[15] Guorong Xuan, Xiuming Zhu, and Peiqi Chai, "Feature Selection based on the Bhattacharyya Distance," International Conference on Pattern Recognition (2006).

[16] H. Jun and M. Claudio, "The influence of the sigmoid function parameters on the speed of backpropagation learning," From Natural to Artificial Neural Computation, pp. 195-201 (1995).

[17] Y. Wang, F. S. Makedon, J. C. Ford, and J. Pearlman, "HykGene: A Hybrid Approach for Selecting Marker Genes for Phenotype Classification Using Microarray Gene Expression Data," Bioinformatics vol. 21, pp. 1530-1537 (2005).

[18] X. Wang, T. X. Han, and S. Yan, "An hog-lbp human detector with partial occlusion handling," in IEEE Intl. Conf. Computer Vision (2009).

About the Authors

명 근 우 (Kunwoo Myung)

Kunwoo Myung is studying Computer Engineering at Gachon University, South Korea. She is interested in Artificial Intelligence. Her research focuses on fuzzy systems.

곡 락 도 (Letao Qu)

Letao Qu received the B.S. degree in Computer Science from Ludong University, China, in 2013. He is pursuing a master's degree in Computer Science in the Department of Computer Software at Gachon University, South Korea. His research focuses on neuro-fuzzy systems, biomedical prediction systems, and signal processing.

임 준 식 (Joon S. Lim)

Joon S. Lim received his B.S., M.S., and Ph.D. degrees in Computer Science from Inha University, South Korea, The University of Alabama at Birmingham, and Louisiana State University, Baton Rouge, Louisiana, in 1986, 1989, and 1994, respectively. He is currently a professor in the Department of Computer Software at Gachon University, South Korea. His research focuses on neuro-fuzzy systems, biomedical prediction systems, and human-centered systems. He has authored three textbooks: Artificial Intelligence Programming (Green Press, 2000), Javaquest (Green Press, 2003), and C# Quest (Green Press, 2006).
