Technical report, IDE0942, June 2009
Automatic Imaging for Face Biometrics and Eye Localization
Master’s Thesis in Computer Science and Engineering
Tao Wang, Weifeng Lin
School of Information Science, Computer and Electrical Engineering
Halmstad University
Halmstad University Box 823, S-301 18 Halmstad, Sweden
June 2009
Preface
This master’s thesis is part of the research project called “Automatic Imaging for Face Biometrics and Eye Localization”, which was defined by Bigsafe Technology AB and carried out at the School of Information Science, Computer and Electrical Engineering at Halmstad University. As the authors, we would like to thank our supervisor Professor Josef Bigun for always being ready to offer suggestions and ideas at every step, and detailed answers to every question. His valuable support has given us great insights, and the flexibility to work in the best possible way to achieve our goals in this project.
Halmstad, Sweden, June 2009
Abstract
A proposal for a person authentication system, which localizes facial landmarks and extracts biometrical features for face authentication, is presented in this thesis. Efficient algorithms for eye localization, biometrical feature extraction and person identification are developed using Gabor filters. In the eye localization part, we build artificial average eye models for eye localization. In the person identification part, we construct databases of biometrical features around the eye area of clients and, for authentication, the Schwartz inequality and the Sum of Square Error (SSE) are used. This project is implemented in the ‘Matlab’ programming language, on a personal computer system, and experimental results on the proposed system are presented.
Chapter 1
Introduction
1.1 Project background
Each person has a variety of unique physiological and behavioral characteristics. Uniqueness is how well those characteristics separate individuals from each other. In today’s networked society, instead of relying on old-fashioned methods such as ID cards, passwords and PINs to prove one’s identity, biometric methods have been developed that let these unique characteristics of individuals represent them directly. The following definition of biometrics can be found in [1].
Biometrics refers to methods for uniquely recognizing humans based
upon one or more intrinsic physical or behavioral traits.
In information technology, in particular, biometrics is used as a form
of identity access management and access control. It is also used to
identify individuals in groups that are under surveillance.
One example application of this project would be a door entrance system. When an unidentified person approaches, the camera of the entrance system could track the facial characteristics of the person, and our system would open the door or deny access, based on the facial information extracted from this person and the data in the system database.
1.2 Aim of the study
Based on a reasonably good face tracking algorithm, this thesis focuses on two related aspects of implementing this person authentication system.
1.2.1 Eye localization
The most interesting facial landmarks are eyes, nose and mouth. Eye
localization is a well-researched topic in biometrics. The aim of this part is
to locate eye centers in face images which are generated by a given face
tracking mechanism. In this report, we construct two artificial average eye models with the aid of Gabor filters, and use them to detect eyes and locate the centers of both the left and right eyes.
1.2.2 Client identification
The aim of this part is to identify a client. After locating the eye centers in the face image of a client, we extract biometrical features around the eye areas and store the feature information that best represents this specific client in our database, along with feature information from other clients. When information from an unidentified person enters the system, we compare it with the data stored in our client database and either establish the identity of this person or deny his or her access.
1.3 Environment
The hardware and software environments used for this research are listed
below.
Standard desktop systems based on Intel Pentium Dual-Core.
HP Pavilion dv2000 built-in web camera.
The XM2VTS database, a large multi-modal database captured onto high-quality digital video, is used in this project. It contains 4 recordings of 295 individuals and, in this project, we choose several groups of subjects as our data sets.
The algorithms are programmed in Matlab R14.
The operating systems are Windows XP 2002 SP3 and Windows Vista Home Basic.
1.4 Outline of Thesis
This thesis is organized as follows: Chapter 2 describes the theoretical
background and the algorithms used for eye location. Chapter 3 explains
algorithms and ideas of client identification. Chapter 4 presents experiments
for testing the performance of proposed methods. Then the results are
discussed, and we conclude in Chapter 5. In the next two sections of this chapter, we introduce the basic theoretical background of the retinotopic sampling grid and the Gabor decomposition, which are used throughout this project.
1.5 The retinotopic sampling grid
Figure 1-1: An example of retinotopic sampling grid
When it comes to extracting the features of images, it is not necessary to
take every pixel into consideration. A simple mathematical abstraction
algorithm, based on a sparse retinotopic sampling grid obtained by log-polar mapping, is introduced in [2]. The term ‘retinotopic’ is used because this method mimics the human visual system, which implements a “focus of attention” mechanism. Figure 1-1 shows a grid consisting of 50 points arranged in 5 concentric circles; the radius of the innermost circle is 3 pixels and that of the outermost circle is 30 pixels. As the radius of the concentric circles rises, the density of the sampling points decreases exponentially. This means we automatically concentrate the computational effort on the central area of the sampling grid. In our project we focus on analyzing the biometric features of the eye area; other features around a subject’s eye area, such as ears, hair and moles on the forehead, could otherwise affect the result of eye detection. The retinotopic sampling grid thus reduces the processing demand on unnecessary parts of the image, helping to achieve real-time performance. A further discussion of this technique can be found in [2, 5].
We construct a retinotopic sampling grid placed on a subject’s eye. The sampling grid consists of 69 points on 4 concentric circles, with radii ranging from 4 pixels at the innermost circle to 32 pixels at the outermost circle. The innermost and outermost radii are empirically determined mainly by two factors: the proportion and size of the eye area in an image, and the biometric features we want to cover. On the retinotopic sampling grid, we have 1 point at the eye center, 4 points on the first ring, 8 points on the second ring, 24 points on the third ring and 32 on the fourth ring, as displayed in figure 1-2. In the figure, there is a properly centered face with a retinotopic sampling grid placed on this person’s right eye. We proceeded as follows. In the training session we placed the grid on the right eye of every person. The positions of those points are stored in a 1-D data structure. Then, the biometric features around those points are extracted. The same strategy is also used on the left eye.
Figure 1-2: A retinotopic sampling grid placed on an eye
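The grid construction above can be sketched in code. The thesis implements everything in Matlab; the following Python/NumPy sketch is ours, and the geometric spacing of the ring radii between 4 and 32 pixels is an assumption — the text only fixes the inner and outer radii and the point counts per ring.

```python
import numpy as np

def retina_grid(cx, cy, points_per_ring=(4, 8, 24, 32), r_min=4.0, r_max=32.0):
    """Sketch of the 69-point retinotopic grid: one point at the fovea
    plus four concentric rings.  Log-spaced radii are an assumption;
    the thesis only gives the innermost and outermost radii."""
    radii = np.geomspace(r_min, r_max, num=len(points_per_ring))
    pts = [(cx, cy)]                       # fovea point at the eye centre
    for r, n in zip(radii, points_per_ring):
        ang = 2 * np.pi * np.arange(n) / n # n points evenly spread on the ring
        pts.extend(zip(cx + r * np.cos(ang), cy + r * np.sin(ang)))
    return np.array(pts)                   # 1-D list of (x, y) positions

grid = retina_grid(0.0, 0.0)
print(grid.shape)  # (69, 2)
```

The returned array corresponds to the 1-D data structure of grid-point positions described above.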
1.6 Gabor decomposition
In terms of representation, an image can be expressed as a matrix of brightness values in a Cartesian coordinate system, and can also be represented as a superposition of sinusoids with different frequencies, phases and amplitudes, determined by the Fourier transform of the image [4], as shown in Figure 1-3.
Gabor filters can serve as excellent band-pass filters. Such a filter is defined as the product of a Gaussian kernel and a complex sinusoid, i.e.

g(t) = k e^{jθ} w(t) s(t)    (1)

where

w(t) = e^{−π t²}    (2)

s(t) = e^{j 2π f₀ t}    (3)

e^{jθ} s(t) = e^{j(2π f₀ t + θ)} = cos(2π f₀ t + θ) + j sin(2π f₀ t + θ)    (4)

Here k, θ, f₀ are filter parameters. A Gabor filter can be thought of as two out-of-phase filters, conveniently allocated in the real and imaginary parts of a complex function, with the real part

g_r(t) = w(t) cos(2π f₀ t + θ)    (5)

and the imaginary part (see figure 1-5)

g_i(t) = w(t) sin(2π f₀ t + θ)    (6)

Figure 1-3: An example image (left) and the logarithmically scaled absolute amplitudes of its spectral decomposition (right)
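As a concrete illustration of the 1-D definition, a small numerical sketch (ours, in Python/NumPy rather than the thesis’s Matlab):

```python
import numpy as np

def gabor_1d(t, f0, theta=0.0, k=1.0):
    """1-D Gabor filter g(t) = k * e^{j theta} * w(t) * s(t), with
    Gaussian window w(t) = exp(-pi t^2) and complex sinusoid
    s(t) = exp(j 2 pi f0 t)."""
    w = np.exp(-np.pi * t ** 2)
    s = np.exp(1j * 2 * np.pi * f0 * t)
    return k * np.exp(1j * theta) * w * s

t = np.linspace(-3.0, 3.0, 601)
g = gabor_1d(t, f0=1.0)
# g.real is a Gaussian-modulated cosine, g.imag a Gaussian-modulated sine.
```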
Gabor filters are very powerful tools for processing images. Different Gabor filters respond to different local orientations and wave numbers around a certain point, a unique attribute that can be seen as an analogy with the human visual system; a further discussion can be found in [2].
In our case of feature extraction, we use a log-polar separable Gabor decomposition to extract the local features around a certain point in an image [4]. Since the orientations and wave numbers vary in an image, several Gabor filters are needed. This set is also called a Gabor filter bank.
Our Gabor filters in the filter bank are designed in the log-polar domain, which is a logarithmically scaled polar space:

f(ξ, η) = A exp(−(ξ − ξ₀)² / (2δ_ξ²)) exp(−(η − η₀)² / (2δ_η²))    (7)

The variables of the filter f(ξ, η) are defined in the log-polar frequency domain [2], shown in equation (7), where A is a normalization constant. The filter f(ξ, η) is tuned to the orientation η₀ and the absolute spatial frequency ξ₀, which represents the absolute angular frequency w₀ = exp(ξ₀). The log-polar frequency coordinates are defined in equation (8):

(ξ, η) = (log(|w|), tan⁻¹(w_y / w_x))    (8)
Visually, the Gabor filters are two-dimensional Gaussian bell-shaped surfaces. In the log-polar domain the Gabor filters are symmetric 2D Gaussian bells, but in Cartesian frequency coordinates they are egg-shaped bells (see figure 1-4).
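Equations (7) and (8) can be sketched directly in code. The following Python/NumPy fragment is ours (the thesis uses Matlab), and the bandwidths δ_ξ and δ_η are illustrative assumptions not given in the text:

```python
import numpy as np

def logpolar_gabor(shape, xi0, eta0, d_xi=0.4, d_eta=0.3):
    """Frequency-domain Gabor filter defined on log-polar coordinates
    (eq. 7), tuned to log-frequency xi0 and orientation eta0.
    Bandwidths d_xi, d_eta are illustrative assumptions."""
    h, w = shape
    wy, wx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    mag = np.hypot(wx, wy)
    mag[0, 0] = np.finfo(float).tiny          # avoid log(0) at the DC bin
    xi = np.log(mag)                          # eq. 8: xi = log|w|
    eta = np.arctan2(wy, wx)                  # eq. 8: eta = arg(w)
    d = np.angle(np.exp(1j * (eta - eta0)))   # wrap orientation difference
    return (np.exp(-(xi - xi0) ** 2 / (2 * d_xi ** 2))
            * np.exp(-d ** 2 / (2 * d_eta ** 2)))
```

The filter is a symmetric Gaussian in (ξ, η), which is exactly the egg-shaped contour when plotted in Cartesian frequency coordinates.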
The “daisy” structure of figure 1-4 appears in many published studies. The figure shows a top sectional drawing of Gabor filters in the frequency domain, with orientations from 0 rad to π rad. 5 frequency channels and 6 orientation channels, a total of 30 filters, are displayed. Each egg-shaped contour represents one filter, whose response to the input image is called a channel. A cross marks the apex of each Gaussian filter. Figure 1-6, which is based on the cutting plane from figure 1-4, shows a front sectional drawing of all frequencies.
Figure 1-4: Top sectional drawing of Gabor filters in the frequency domain
A 3D view of a Gabor filter, with the highest frequency and the lowest orientation channel, is displayed in figure 1-5. In the first row, the magnitude of the frequency spectrum of the Gabor filter (upper left) is displayed; we then transform the filter back to the image domain, where the modulus of the filter is shown (upper right). The real part of this filter is a cosine function whose amplitude is modulated by a Gaussian bell-shaped curve. The imaginary part of the filter is similarly a Gaussian-modulated sine function. As frequency increases, the modulus of the filter becomes smaller in the spatial domain.
Figure 1-5: 3D view of a Gabor filter: the magnitude of the frequency spectrum (upper left), the modulus of the filter in the spatial domain (upper right), the real part of the Gabor filter (left) and the imaginary part of the Gabor filter (right)
Figure 1-6: Front sectional drawing of Gabor filters in the frequency domain
After implementing the above filter bank, we can then calculate the Gabor filter response at any of the grid points. The Gabor feature vector is arranged according to wave number and orientation. An element of the feature vector (the Gabor filter response magnitude) is calculated by the following equation:

k(ξ₀, η₀) = | Σ_{m=0..M−1} Σ_{n=0..N−1} IM(m, n) · f(m, n, ξ₀, η₀) |    (9)
For a local image IM around a certain point p, the magnitude k is computed for the responses of all Gabor filters f. The local image IM is cut from the original image such that the indices m, n visit the image points inside a rectangle of size M×N centered at p. A single Gabor filter f(m, n, ξ₀, η₀) is a 2-D complex-valued filter corresponding to a certain frequency ξ₀ and orientation η₀. The element of the feature vector is formed by the absolute value of the scalar product of the local image (a cut-out of the input image) and the complex Gabor filter f. The index ξ₀ in the equation determines the absolute frequency to which each filter f is tuned; the higher the frequency, the smaller the filter size. Likewise, η₀ determines the tune-in orientation of the filter. The dimensionality of the feature vector around a grid point p is the product of the number of frequencies and the number of orientations. Note that, in equation (9), the scalar product between IM and f is calculated in the spatial domain, and ξ₀ and η₀ do not denote actual frequency or orientation values, but index numbers of the applied channel (the response of a particular filter).
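Equation (9) can be written compactly in code. This Python/NumPy sketch is ours (the thesis uses Matlab) and treats each filter in the bank as an M×N complex array:

```python
import numpy as np

def feature_element(patch, gabor):
    """Eq. (9): magnitude of the scalar product between a local image
    patch IM and one complex Gabor filter f of the same M x N size."""
    return float(np.abs(np.sum(patch * gabor)))

def feature_vector(patch, bank):
    # One element per (frequency, orientation) channel in the filter bank;
    # the vector length is n_frequencies * n_orientations.
    return np.array([feature_element(patch, f) for f in bank])
```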
Chapter 2
Eye Localization
The eyes and eye regions are the most important facial landmarks on the human face in many respects, including for the recognition of human identities. Eye localization, therefore, is an important step in human face recognition. In this chapter, a novel approach for determining the location of the human eye centre using Gabor filters is devised.
2.1 System introduction
The flowchart in figure 2-1 presents the proposed approach and algorithms for eye centre localization.
The accuracy of face normalization is critical to the performance of the subsequent face analysis steps, so we first preprocess the human face images and determine three parameters: the retina radius, the starting and ending frequencies, and the picture size.
After face normalization, the proposed system is trained on the images of the training set. We studied two models: one is a model based on a specific frequency and orientation filter response for each point of the artificial retina grid; the other is an averaged (over 50 people) feature vector, where each vector consists of all Gabor filter responses at a single eye centre of a single individual, also called the ‘average eye’. In both cases, the resulting model can be represented by a vector.
For testing the system, or when the system is operational, we first extract the feature vector for any image point that is a candidate for being an eye location. The elements of this feature vector are obtained by taking the scalar product between the region determined by the candidate point at hand and the specific Gabor filter model; the region and the specific Gabor filter are determined by the model studied. We then compare this feature vector to the feature vector of the eye model obtained from the training set, and determine the location of the eye centre by either the Sum of Square Error (SSE) method or the Schwartz inequality method.
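The two matching criteria can be sketched as follows (Python/NumPy, ours; reading the “Schwartz inequality method” as a normalized scalar product is our interpretation — the inequality guarantees the score is at most 1, with equality only for proportional vectors):

```python
import numpy as np

def sse(v, model):
    # Sum of Square Error between candidate and model feature vectors;
    # smaller values mean a better match.
    return float(np.sum((v - model) ** 2))

def schwartz_similarity(v, model):
    """Normalized scalar product; by the Schwartz inequality it is
    bounded by 1, with equality only when v is proportional to model,
    so values near 1 indicate a good match."""
    return float(np.dot(v, model) / (np.linalg.norm(v) * np.linalg.norm(model)))
```

In operation, the candidate point maximizing the similarity (or minimizing the SSE) over the search area would be reported as the eye centre.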
Figure 2-1: Flow chart of the eye localization system
2.2 Preprocessing of face images
The same parameters are used both in the training part and the testing part.
2.2.1 Retina radius
Retina sampling grids contain important information around the pixels they
are placed on. However, the radius of the grid needs to be determined. Our
retinotopic grid consists of 68+1 points distributed onto 4 circles [5], which
are displayed in figure 2-2.
Figure 2-2: The retinotopic grid
From the figure above, we can see that the artificial retina is denser at the centre (fovea) than at the periphery. The grid size is empirically determined by letting it cover the pupil and the eyebrow area [2].
On the other hand, the smaller the radius, the faster an identification can be performed. Specifically, we chose the area of the pupil as a circle with a radius of 2 pixels, and the average distance from the eye center to the eyebrow, about 15 pixels, was also fixed empirically. As a consequence of these considerations, the radius between foveal and peripheral vision in our topology was allowed to vary between 2 pixels and 20 pixels.
2.2.2 Start and end frequency of Gabor filter
Gabor features are widely used for feature extraction to recognize visual
information. The transform coefficients have good discrimination
characteristics, and it is easy to adjust the direction, baseband bandwidth and
center frequency of Gabor filters [23]. Thus, Gabor filters have been widely utilized to extract components that normally carry relatively high energy in the high-frequency band, e.g. shapes defined by lines and edges. They are also used to represent and analyze textures. The fundamental frequencies can represent the silhouettes of an object and be used to classify objects.
In a face image, eyes have special properties – two gray valleys and rich edge segments [12]. A Gabor filter whose center frequency lies in the high-frequency band has a smaller window size and describes abruptly changing local characteristics of the local image. By contrast, low-frequency Gabor filters are better suited to slowly varying intensity changes. Hence, high-frequency Gabor filters must be present for locating facial features that are rich in detail, such as the eye area. Low-frequency Gabor filters are more important at the periphery of the eye, where the image intensity changes relatively slowly.
Besides dynamically choosing among filters of different sizes, we must remember to keep the total number of candidate points for eye centers small. The fewer the candidates, i.e. the smaller the picture size, the fewer tests will be performed, reducing the search time. Through empirical experiments on eye center localization, we found that frequencies from 0.4π to 0.9π yield better results, with filter sizes from 25×25 down to 11×11. Table 2-1 shows the different sizes of the Gabor filters.
Table 2-1: Start and end frequency of Gabor filter
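As an illustration of the chosen band, a small sketch (ours, in Python/NumPy) of centre frequencies spaced between the empirically chosen start (0.4π) and end (0.9π); the number of channels (five) follows the filter-bank description of figure 1-4, and the geometric spacing is an assumption:

```python
import numpy as np

# Five centre frequencies spaced geometrically between 0.4*pi and 0.9*pi;
# the channel count and spacing are illustrative assumptions.
n_channels = 5
freqs = 0.4 * np.pi * (0.9 / 0.4) ** (np.arange(n_channels) / (n_channels - 1))
```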