Gender recognition

7/28/2019 Gender recognition


ABSTRACT

As more and more systems are developed to identify a person, there is a need for a system that can perform personal identification tasks, such as determining gender, without human intervention. In this thesis, we consider the problem of gender classification from frontal facial images using genetic feature subset selection. We argue that feature selection is an important issue in gender classification and demonstrate that Genetic Algorithms (GA) can select good subsets of features (i.e., features that encode mostly gender information), reducing the classification error. Gender is the most prominent feature of a facial image, and improving the existing gender classification methods will improve the performance of face retrieval and classification methods for large repositories.

The gender recognition system consists of three steps. First, pre-processing is applied to the input image. Second, face features are extracted and taken as the input of the genetic algorithm (GA). Third, classification is carried out by the Genetic Algorithm to identify the gender.

Key Words: Gender, Recognition, Genetic, Algorithms, male, female.



INDEX

CHAPTER 1: INTRODUCTION
1.1 Genetic Algorithm
1.1.1 Methodology of Genetic Algorithm
1.1.2 Terminologies of Genetic Algorithm
1.1.3 Operators of Genetic Algorithm
1.2.5 Parameters of Genetic Algorithm
1.2 Motivation
1.3 Problem Statement
1.4 Organization of the Thesis

CHAPTER 2: BACKGROUND
2.1 Search Space
2.2 Basic Recommendations
2.3 Selection techniques
2.3.1 Roulette Wheel Selection
2.3.2 Rank Selection
2.3.3 Steady-State Selection
2.3.4 Elitism
2.4 Encoding Schemes
2.4.1 Binary Encoding
2.4.2 Permutation Encoding
2.4.3 Value Encoding
2.4.4 Tree Encoding



2.5 Eigen space Representation
2.5.1 Computation of the Eigen Faces
2.6 Independent Component Analysis
2.6.1 Face image representation based on ICA

CHAPTER 3: LITERATURE SURVEY
3.1 Overall Review of Previous Work
3.2 Methodology
3.2.1 Face Detection
3.2.2 Feature Extraction
3.2.3 Classifier

CHAPTER 4: THE PROPOSED APPROACH
4.1 Introduction
4.2 Feature Extraction
4.3 Genetic Feature Selection
4.3.1 Initial Population
4.3.2 Crossover
4.3.3 Mutation
4.4 Gender Classification
4.5 Development of the Gender Recognition System

CHAPTER 5: RESULT AND DISCUSSION
5.1 Introduction
5.2 Databases Used
5.3 Experimental Results
5.3.1 Correct Classification Result
5.3.2 False Classification Result



5.4 Result Analysis

CHAPTER 6: CONCLUSION AND SCOPE OF THE FUTURE WORK
6.1 Conclusion
6.2 Limitations
6.3 Future Research Work

REFERENCES



List of Figures

Figure 1: Working of Genetic Algorithm
Figure 2: Example of chromosomes with tree encoding
Figure 3: Zigzag scan of DCT coefficients
Figure 4: Example of k-NN Classification
Figure 5: Gender Recognition Methodology Example
Figure 6: Proposed Gender Recognition system flow
Figure 7: Neural network, interconnected group of nodes
Figure 8: GUI of the Gender recognition System
Figure 9: Indian Facial database sample
Figure 10: Stanford student database sample Flow
Figure 11: Indian Female Face
Figure 12: Female Gender Recognized
Figure 13: Indian Male Face
Figure 14: Male gender recognized
Figure 15: Male Face
Figure 16: Male gender recognized
Figure 17: Female Face
Figure 18: Female gender recognized
Figure 19: Experiment on Indian Female Face
Figure 20: False gender recognized



List of Tables

Table 1: Examples of chromosomes
Table 2: Crossover operator on chromosomes
Table 3: Mutation operator on chromosomes



CHAPTER 1: INTRODUCTION

1.1 Genetic Algorithm

In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search

heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate

useful solutions to optimization and search problems. Genetic algorithms belong to the larger

class of evolutionary algorithms, which generate solutions to optimization problems using

techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. [7]

A typical genetic algorithm requires:

a) A genetic representation of the solution domain.

b) A fitness function to evaluate the solution domain.

1.1.1 Methodology of Genetic Algorithm:

In a genetic algorithm, a population of strings (called chromosomes or the genotype of the

genome), which encode candidate solutions (called individuals or phenotypes) to an optimization

problem, evolves toward better solutions. Traditionally, solutions are represented in binary as

strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a

population of randomly generated individuals and happens in generations.

In each generation, the fitness of every individual in the population is evaluated; multiple

individuals are stochastically selected from the current population, and modified to form a new

population.



The new population is then used in the next iteration of the algorithm. Commonly, the algorithm

terminates when either a maximum number of generations has been produced, or a satisfactory

fitness level has been reached for the population. If the algorithm has terminated due to a

maximum number of generations, a satisfactory solution may or may not have been reached.[7]

Figure 1 : Working of Genetic Algorithm



1.1.2 Terminologies of Genetic Algorithm:

Some essential terms are needed to understand the Genetic Algorithm:

Chromosome: Encodes, in some way, information about the solution it represents.

Population of n chromosomes: The set of n candidate solutions maintained in the current generation.

Children: New solutions generated from two existing solutions.

Selection: Select two parent chromosomes from a population according to their fitness (the better the fitness, the greater the chance of being selected).

Accepting: Place the new offspring in the new population.

Elitism: At least one best solution is copied without changes to the new population, so the best solution found can survive to the end of the run.

Outline of the Basic Genetic Algorithm:

Step 1: [Start] Generate a random population of n chromosomes (suitable solutions).

Step 2: [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.

Step 3: [New population] Create a new population by repeating the following steps until the new population is complete:

i. [Selection] Select two parent chromosomes from the population according to their fitness.

ii. [Crossover] Cross over the parents to form new offspring.

iii. [Mutation] Mutate the new offspring.

iv. [Accepting] Place the new offspring in the new population.

Step 4: [Replace] Use the newly generated population for the next run of the algorithm.

Step 5: [Test] If the end condition is satisfied, stop and return the best solution.

Step 6: [Loop] Go to step 2.
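The outline above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this thesis: the fitness function `one_max` (count of 1 bits), the function name `run_ga`, and all default parameter values are assumptions chosen to keep the example small.

```python
import random

def one_max(chromosome):
    """Toy fitness (an assumption for illustration): the number of 1 bits."""
    return sum(chromosome)

def run_ga(n=20, length=16, generations=50, p_cross=0.9, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1 [Start]: random population of n chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    best = max(population, key=one_max)
    for _ in range(generations):
        # Step 2 [Fitness]: evaluate each chromosome (+1 keeps every weight positive).
        weights = [one_max(c) + 1 for c in population]
        # Step 3 [New population]: repeat until the new population is complete.
        new_population = []
        while len(new_population) < n:
            # [Selection]: two parents, chosen with probability proportional to weight.
            parent1, parent2 = rng.choices(population, weights=weights, k=2)
            # [Crossover]: single point, applied with probability p_cross.
            if rng.random() < p_cross:
                point = rng.randint(1, length - 1)
                child = parent1[:point] + parent2[point:]
            else:
                child = parent1[:]
            # [Mutation]: flip each bit with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            # [Accepting]: place the offspring in the new population.
            new_population.append(child)
        # Step 4 [Replace]: the new population becomes the current one.
        population = new_population
        # Remember the best chromosome seen so far.
        best = max(population + [best], key=one_max)
        # Step 5 [Test]: stop once the end condition is satisfied.
        if one_max(best) == length:
            break
        # Step 6 [Loop]: otherwise continue with the next generation.
    return best
```

Calling `run_ga()` returns the best bit list found; a fixed `seed` makes the run repeatable.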



1.1.3 Operators of Genetic Algorithm:

A chromosome should in some way contain information about the solution it represents. The most common encoding is a binary string. A chromosome could then look like this:

Chromosome 1 1101100100110110

Chromosome 2 1101111000011110

Table 1: Examples of chromosomes

Each chromosome has one binary string. Each bit in this string can represent some characteristic

of the solution.

Crossover:

Once we have decided what encoding to use, we can proceed to crossover. Crossover selects genes from the parent chromosomes and creates new offspring. The simplest way to do this is to choose a crossover point at random, copy everything before this point from the first parent, and copy everything after it from the second parent.

Crossover can then look like this (| is the crossover point):

Chromosome 1 11011 | 00100110110

Chromosome 2 11011 | 11000011110

Offspring 1 11011 | 11000011110

Offspring 2 11011 | 00100110110

Table 2: Crossover operator on chromosomes
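The single-point crossover of Table 2 can be reproduced with a short Python sketch. The function name `single_point_crossover` is an assumption for illustration; chromosomes are represented as bit strings, and the crossover point is passed in explicitly so that the result can be compared with the table.

```python
def single_point_crossover(parent1, parent2, point):
    """Swap the tails of two equal-length chromosomes after the given point."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    # Offspring 1: head of parent 1, tail of parent 2.
    offspring1 = parent1[:point] + parent2[point:]
    # Offspring 2: head of parent 2, tail of parent 1.
    offspring2 = parent2[:point] + parent1[point:]
    return offspring1, offspring2

# The chromosomes of Table 2, crossed after the fifth bit.
o1, o2 = single_point_crossover("1101100100110110", "1101111000011110", 5)
```

In a real run the point would be drawn at random, for example with `random.randint(1, len(parent1) - 1)`.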



There are other ways to perform crossover; for example, we can choose more crossover points. A crossover designed for a specific problem can improve the performance of the genetic algorithm.

Mutation:

After crossover is performed, mutation takes place. Its purpose is to prevent all solutions in the population from falling into a local optimum of the problem being solved. Mutation randomly changes the new offspring. For binary encoding, we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1. Mutation can then look like this:

Original offspring 1 1101111000011110

Original offspring 2 1101100100110110

Mutated offspring 1 1100111000011110

Mutated offspring 2 1101101100110110

Table 3: Mutation operator on chromosomes
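As a minimal sketch of bit-flip mutation on binary strings, the two helper functions below (hypothetical names `flip_bits` and `mutate`) separate the deterministic flipping from the random choice of positions. Flipping position 3 of the first offspring and position 6 of the second (counting from 0) reproduces Table 3.

```python
import random

def flip_bits(chromosome, positions):
    """Return a copy of the bit string with the given positions inverted."""
    bits = list(chromosome)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

def mutate(chromosome, p_mut, rng=random):
    """Flip each bit independently with probability p_mut."""
    positions = [i for i in range(len(chromosome)) if rng.random() < p_mut]
    return flip_bits(chromosome, positions)

# The two mutations shown in Table 3.
mutated1 = flip_bits("1101111000011110", [3])
mutated2 = flip_bits("1101100100110110", [6])
```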

1.2.5 Parameters of Genetic Algorithm:

There are two basic parameters of GA: crossover probability and mutation probability.

Crossover probability says how often crossover will be performed. If there is no crossover, the offspring are exact copies of the parents. If there is crossover, the offspring are made from parts of the parents' chromosomes. If the crossover probability is 100%, all offspring are made by crossover. If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population (but this does not mean that the new generation is the same, because selection and mutation still apply).



Crossover is performed in the hope that new chromosomes will combine the good parts of old chromosomes and so perhaps be better. However, it is good to let some part of the population survive to the next generation.

Mutation probability says how often parts of a chromosome will be mutated. If there is no mutation, offspring are taken after crossover (or copying) without any change. If mutation is performed, part of the chromosome is changed. If the mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed. Mutation is performed to prevent the GA from falling into a local extreme, but it should not occur very often, because then the GA will in effect turn into a random search. [7]

1.2 Motivation:

Automatic gender identification plays an important role in identification of a person. Gender

identification can help effectively to reduce the search time by limiting the subsequent searching

stage to either a male database or a female database. Automatic gender identification can also

provide an important clue in various security and surveillance based applications.

Recognizing human gender plays an important role in many human computer interaction (HCI)

areas. For example, search engines need an image filter to determine the gender of people in

images from the Internet; demographic research can use gender information extracted from

images to count the number of men and women entering a shopping mall or movie theater; a

smart building might use gender for surveillance and control of access to certain areas.

In psychology studies for HCI, the main focus is on how humans discriminate between males and females and which kinds of features are more discriminative. A successful gender classification approach can boost the performance of many other applications, including face recognition and smart human-computer interfaces.

Most gender recognition systems are template-based, because feature-based systems require automatic facial feature extraction, which is itself a complex and time-consuming task. A genetic algorithm is therefore used to eliminate gender-irrelevant features and thus decrease the error rate by 17.7%. [4]



1.3 Problem Statement:

The aim of this project is to apply the efficient working methodology of the Genetic Algorithm to gender recognition, that is, the process of determining the gender of a subject from face images, and thus to replace the existing, limited-range conventional method of face recognition. Using a Genetic Algorithm, we can eliminate gender-irrelevant features and hence remove the static nature of gender classification in existing systems.

1.4 Organization of the Thesis:

Besides this chapter, the thesis consists of five more chapters. A brief overview of each chapter is given below:

Chapter 1: This chapter gives a brief introduction to the gender identification problem. The motivation for the present work and the applications of the system are presented.

Chapter 2: This chapter discusses the background study required for the development of the system and all related methods used to achieve the objective.

Chapter 3: This chapter presents a brief review of previous work on gender recognition systems and their performance.

Chapter 4: This chapter presents the proposed approach to the gender recognition problem. It is divided into three sections: feature selection, feature extraction, and gender classification.

Chapter 5: Experimental results and the performance of the proposed gender recognition model are presented in this chapter.

Chapter 6: Conclusions are given in this chapter. It also discusses the scope of future work.



CHAPTER 2: BACKGROUND

Genetic algorithms are a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory of evolution. Simply put, the solution to a problem solved by genetic algorithms is evolved.

2.1 Search Space:

If we are solving some problem, we are usually looking for some solution that will be the best among others. The space of all feasible solutions (that is, the objects among which the desired solution lies) is called the search space (also state space). Each point in the search space represents one feasible solution. Each feasible solution can be "marked" by its value or fitness for the problem.

We are looking for our solution, which is one point (or more) among the feasible solutions, that is, one point in the search space. Looking for a solution is then equivalent to looking for some extreme (minimum or maximum) in the search space. The whole search space may be known by the time a problem is solved, but usually we know only a few points from it and generate other points as the process of finding a solution continues.

The problem is that the search can be very complicated. One does not know where to look for the solution or where to start. There are many methods for finding some suitable solution (i.e., not necessarily the best solution), for example hill climbing, tabu search, simulated annealing, and genetic algorithms. The solution found by these methods is often considered a good solution, because it is often not possible to prove what the real optimum is. [7]

As we have seen in Chapter 1, the outline of the basic GA is very general. There are many things that can be implemented differently in various problems. The first question is how to create chromosomes and what type of encoding to choose. Connected with this are crossover and mutation, the two basic operators of GA. The next question is how to select parents for crossover. This can be done in many ways, but the main idea is to select the better parents (in the hope that better parents will produce better offspring). [7]

You may also think that forming the new population only from new offspring can cause the loss of the best chromosome from the last population. This is true, so elitism is often used. This means that at least one best solution is copied without changes to the new population, so the best solution found can survive to the end of the run.

2.2 Basic Recommendations:

The following are the standard recommendations for the respective parameters:

Crossover rate: The crossover rate should generally be high, about 80%-95%. (However, some results show that for some problems a crossover rate of about 60% is the best.)

Mutation rate: On the other hand, the mutation rate should be very low. The best rates reported are about 0.5%-1%.

Population size: A good population size is about 20-30; however, sizes of 50-100 are sometimes reported as best. Some research also shows that the best population size depends on the encoding and on the size of the encoded string. For example, a chromosome with 32 bits calls for a population of about 32, roughly twice the best population size for a chromosome with 16 bits.

Selection: Basic roulette wheel selection can be used, but sometimes rank selection can be better.

Encoding: The encoding depends on the problem and also on the size of the problem instance.



2.3 Selection techniques:

2.3.1 Roulette Wheel Selection: Parents are selected according to their fitness. The better the chromosomes are, the more chances they have of being selected. Imagine a roulette wheel on which all chromosomes in the population are placed, each occupying a slice proportional to its fitness.

This can be simulated by the following algorithm:

Step 1 [Sum] Calculate the sum of all chromosome fitnesses in the population: sum S.

Step 2 [Select] Generate a random number r from the interval (0, S).

Step 3 [Loop] Go through the population and accumulate the fitnesses from 0 into a running sum s. When s is greater than r, stop and return the chromosome where you are.

Of course, step 1 is performed only once for each population.
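The three steps above translate almost directly into Python. The sketch below is illustrative only: the function name `roulette_wheel_select` is hypothetical, and it assumes that all fitness values are non-negative and that their sum is positive.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    # Step 1 [Sum]: total fitness S of the population.
    total = sum(fitnesses)
    # Step 2 [Select]: random number r between 0 and S.
    r = rng.uniform(0, total)
    # Step 3 [Loop]: accumulate fitnesses until the running sum exceeds r.
    running = 0.0
    for chromosome, fitness in zip(population, fitnesses):
        running += fitness
        if running > r:
            return chromosome
    # Reached only if r equals S (or through round-off): return the last one.
    return population[-1]
```

For brevity the sum is recomputed on every call; when many parents are drawn from the same population, it can be computed once and reused, as noted above.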

2.3.2 Rank Selection: Roulette wheel selection has problems when the fitness values differ very much. For example, if the fitness of the best chromosome takes up 90% of the entire roulette wheel, then the other chromosomes will have very few chances to be selected. Rank selection first ranks the population, and then every chromosome receives its fitness from this ranking. The worst will have fitness 1, the second worst 2, and so on, and the best will have fitness N (the number of chromosomes in the population). After this, all the chromosomes have a chance to be selected. However, this method can lead to slower convergence, because the best chromosomes do not differ so much from the other ones. [7]
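A small sketch shows how ranking flattens the selection pressure. The function name `rank_fitnesses` and the sample fitness values are assumptions for illustration; ties are broken by position in the list, which is one possible choice among several.

```python
def rank_fitnesses(fitnesses):
    """Replace raw fitnesses by ranks: worst gets 1, best gets N."""
    # Indices of the chromosomes, ordered from worst to best raw fitness.
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

raw = [90.0, 5.0, 3.0, 2.0]           # the best chromosome owns 90% of the wheel
ranks = rank_fitnesses(raw)           # [4, 3, 2, 1]
share_raw = raw[0] / sum(raw)         # 0.9
share_ranked = ranks[0] / sum(ranks)  # 0.4
```

The resulting ranks can be passed to roulette wheel selection in place of the raw fitness values, so the best chromosome's share of the wheel drops from 90% to 40% in this example.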

2.3.3 Steady-State Selection: The main idea of this selection is that a big part of the chromosomes should survive to the next generation. The GA then works in the following way. In every generation, a few chromosomes (good ones, with high fitness) are selected for creating new offspring. Then some chromosomes (bad ones, with low fitness) are removed and the new offspring are put in their place. The rest of the population survives to the new generation.



2.3.4 Elitism: When creating a new population by crossover and mutation, there is a big chance that we will lose the best chromosome. Elitism is the name of the method that first copies the best chromosome (or a few best chromosomes) to the new population. The rest is done in the classical way. Elitism can very rapidly increase the performance of GA, because it prevents losing the best solution found. [7]

2.4 Encoding Schemes:

2.4.1 Binary Encoding: Binary encoding is the most common, mainly because the first works on GA used this type of encoding. In binary encoding every chromosome is a string of bits, 0 or 1. Binary encoding gives many possible chromosomes even with a small number of alleles. On the other hand, this encoding is often not natural for many problems, and sometimes corrections must be made after crossover and/or mutation. [7]

Example of Problem: Knapsack problem

The problem: There are things with given values and sizes. The knapsack has a given capacity. Select things to maximize the value of the things in the knapsack, but do not exceed the knapsack capacity.

Encoding: Each bit says whether the corresponding thing is in the knapsack.
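To make the encoding concrete, the following sketch evaluates a binary chromosome for the knapsack problem. The function name `knapsack_fitness`, the sample values and sizes, and the choice to give overweight selections a fitness of 0 are all assumptions for illustration; other penalty or repair schemes are possible.

```python
def knapsack_fitness(bits, values, sizes, capacity):
    """Fitness of a binary chromosome: total value, or 0 if over capacity."""
    # Bit i set to 1 means thing i is placed in the knapsack.
    total_size = sum(size for bit, size in zip(bits, sizes) if bit)
    if total_size > capacity:
        return 0  # assumed penalty for infeasible chromosomes
    return sum(value for bit, value in zip(bits, values) if bit)

values = [60, 100, 120]   # hypothetical values of three things
sizes = [10, 20, 30]      # hypothetical sizes of the same things
fitness = knapsack_fitness([0, 1, 1], values, sizes, capacity=50)
```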

2.4.2 Permutation Encoding: In permutation encoding, every chromosome is a string of numbers that represent positions in a sequence. Permutation encoding is only useful for ordering problems. Even for these problems, some types of crossover and mutation require corrections to leave the chromosome consistent (i.e., to keep a real sequence in it). [7]

Example of Problem: Travelling salesman problem (TSP)

The problem: There are cities and given distances between them. The travelling salesman has to visit all of them, but he does not want to travel very much. Find a sequence of cities that minimizes the travelled distance.

Encoding: The chromosome gives the order in which the salesman will visit the cities.
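A brief sketch of permutation encoding for the TSP follows. It assumes cities given as planar coordinates and a closed tour that returns to the starting city; both are assumptions for illustration, as are the function names. Swap mutation is shown because exchanging two positions always leaves a valid permutation, so no correction step is needed.

```python
import math
import random

def tour_length(order, cities):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def swap_mutation(order, rng=random):
    """Exchange two randomly chosen positions; the result is still a permutation."""
    i, j = rng.sample(range(len(order)), 2)
    mutated = list(order)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]   # corners of a unit square
length = tour_length([0, 1, 2, 3], cities)  # walks around the square
```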



2.4.3 Value Encoding: Direct value encoding can be used in problems where some complicated values, such as real numbers, are used. Using binary encoding for this type of problem would be very difficult. In value encoding, every chromosome is a string of some values. The values can be anything connected to the problem, from numbers, real numbers, or characters to some complicated objects. Value encoding is very good for some special problems. On the other hand, for this encoding it is often necessary to develop new crossover and mutation operators specific to the problem.

Example of Problem: Finding weights for a neural network

The problem: There is a neural network with a given architecture. Find the weights for the inputs of the neurons to train the network for the wanted output.

Encoding: Real values in chromosomes represent the corresponding weights for the inputs. [7]

2.4.4 Tree Encoding: Tree encoding is used mainly for evolving programs or expressions, for genetic programming. In tree encoding every chromosome is a tree of some objects, such as functions or commands in a programming language.

( + x ( / 5 y ) )

( do_until step wall )

Figure 2: Example of chromosomes with tree encoding



Tree encoding is good for evolving programs. The programming language LISP is often used for this, because programs in it are represented in this form and can be easily parsed as a tree, so crossover and mutation can be done relatively easily.

Example of Problem: Finding a function from given values

The problem: Some input and output values are given. The task is to find a function that gives the best (closest to the wanted) output for all inputs.

Encoding: Chromosomes are functions represented as trees.
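As a small illustration of a tree-encoded chromosome, the expression ( + x ( / 5 y ) ) can be written as nested tuples and evaluated recursively. The tuple layout, the operator table `OPS`, and the function name `evaluate` are assumptions made for this sketch.

```python
import operator

# Binary operators allowed at the inner nodes of the tree.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree, env):
    """Evaluate an expression tree given variable bindings in env."""
    if isinstance(tree, tuple):
        # Inner node: (operator, left subtree, right subtree).
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        # Leaf holding a variable name.
        return env[tree]
    # Leaf holding a numeric constant.
    return tree

expression = ("+", "x", ("/", 5, "y"))             # ( + x ( / 5 y ) )
result = evaluate(expression, {"x": 2, "y": 10})   # 2 + 5 / 10
```

With this representation, crossover can exchange whole subtrees between two parents, and mutation can replace an operator or a leaf.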

2.5 Eigen space Representation:

Eigenspace representations of images use PCA [9] to linearly project an image onto a low-dimensional space. This space is spanned by the principal components (i.e., the eigenvectors corresponding to the largest eigenvalues) of the distribution of the training images. After an image has been projected onto the eigenspace, a feature vector containing the coefficients of the projection is used to represent the image. We refer to these features as eigenfeatures.

The projection coefficients allow us to represent images as linear combinations of the eigenvectors. It is well known that the projection coefficients define a compact image representation and that a given image can be reconstructed from its projection coefficients and the eigenvectors (i.e., the basis). The eigenspace representation of images is very powerful and has been used in various applications such as image compression and face recognition.



2.5.1 Computation of the Eigen Faces:

Eigenfaces are a set of eigenvectors. They can be considered a set of "standardized face ingredients", derived from statistical analysis (PCA) of many pictures of faces.

Step 1: Obtain face images $I_1, I_2, \ldots, I_M$ (training faces).

Step 2: Represent every image $I_i$ as a vector $\Gamma_i$.

Step 3: Compute the average face vector: $\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$

Step 4: Subtract the mean face: $\Phi_i = \Gamma_i - \Psi$

Step 5: Compute the covariance matrix $C$: $C = \sum_{n=1}^{M} \Phi_n \Phi_n^T = A A^T$ ($N^2 \times N^2$ matrix), where $A = [\Phi_1 \; \Phi_2 \; \ldots \; \Phi_M]$ ($N^2 \times M$ matrix).

Step 6: Compute the eigenvectors $u_i$ of $A A^T$.

Step 7: Keep only $K$ eigenvectors (corresponding to the $K$ largest eigenvalues). [5]
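The steps above can be sketched in plain Python for very small images. This is an illustrative sketch, not the implementation used in this thesis: the function name `eigenfaces` is hypothetical, images are assumed to be already flattened into equal-length pixel lists, and power iteration with deflation is swapped in for a library eigen-solver to keep the example dependency-free. For realistic image sizes a linear-algebra library would be used instead.

```python
import math
import random

def eigenfaces(images, k, iterations=200, seed=0):
    """Return the mean face and up to k leading eigenvectors of C = A A^T."""
    m, d = len(images), len(images[0])
    # Step 3: average face vector.
    mean = [sum(img[j] for img in images) / m for j in range(d)]
    # Step 4: subtract the mean face from every image.
    centered = [[img[j] - mean[j] for j in range(d)] for img in images]
    # Step 5: C = A A^T, a d x d matrix (d is the number of pixels).
    cov = [[sum(phi[a] * phi[b] for phi in centered) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    vectors = []
    # Steps 6-7: find the leading eigenvectors one at a time.
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            # Deflation: remove the directions already found.
            for u in vectors:
                dot = sum(wi * ui for wi, ui in zip(w, u))
                w = [wi - dot * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:
                return mean, vectors  # no further non-zero eigenvalue
            v = [wi / norm for wi in w]
        vectors.append(v)
    return mean, vectors

# Three tiny two-pixel "images" that vary only along the diagonal.
mean_face, basis = eigenfaces([[0, 0], [1, 1], [2, 2]], k=1)
```

Projecting a mean-subtracted image onto the returned eigenvectors gives the eigenfeatures described in Section 2.5.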



2.6 Independent Component Analysis :

Independent component analysis (ICA) [6] is a statistical model in which the observed data are expressed as a linear combination of underlying latent variables. The latent variables are assumed to be non-Gaussian and mutually independent. The task is to find both the latent variables and the mixing process. The ICA model is given by formula (1):

$x = As$   (1)

where $x = (x_1, \ldots, x_m)^T$ is the observed random vector, $s = (s_1, \ldots, s_m)^T$ is the vector of components, in which the $s_i$ are as independent as possible in the sense of higher-order statistics, and $A$ is a constant $m \times m$ mixing matrix. Both $A$ and $s$ are unknown. The above model is identifiable under the following fundamental restrictions: at most one of the independent components $s_i$ may be Gaussian, and the matrix $A$ must be of full column rank.
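To make formula (1) concrete, the following toy sketch mixes two components with a known matrix and then undoes the mixing. It only illustrates the model: the values of `A` and `s` are made up, and in real ICA both are unknown and the separating matrix has to be estimated from the data rather than computed as an exact inverse.

```python
# Toy illustration of the ICA model x = A s for m = 2.
# A and s are hypothetical values chosen for the example.

def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def inverse_2x2(matrix):
    """Invert a 2x2 matrix; a zero determinant means A is not of full rank."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix must be of full rank")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical mixing matrix
s = [0.5, -1.0]                # hypothetical independent components
x = mat_vec(A, s)              # observed vector: x = A s

W = inverse_2x2(A)             # ideal separating matrix, W = A^-1
s_recovered = mat_vec(W, x)    # W x gives back s (up to rounding)
```

The full-rank restriction shows up directly here: if the determinant of `A` were zero, no separating matrix would exist.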

2.6.1 Face image representation based on ICA :

Suppose the normalized face images have size $w \times h$, and the $i$th face image is expressed as a row vector $x_i = [x_i(1,1), \ldots, x_i(1,h), x_i(2,1), \ldots, x_i(w,h)]$. The $m$ face images in the training set form the observed vector $X = (x_1, x_2, \ldots, x_m)^T$.

According to the ICA principle, these face images can be expressed as linear combinations of $n$ basis images $S$ with statistically independent coefficients $A$. FastICA is performed to obtain the separating matrix $W$ and to calculate the estimate of the basis images $S$ through $Y = WX$. Then, each face image is projected into a low-dimensional space spanned by the estimated basis images $Y$. After an image has been projected into the subspace, a feature vector containing the coefficients of the projection is used to represent the image.



CHAPTER 3: LITERATURE SURVEY

3.1 Overall Review of Previous Work:

Based on the type of features used, existing gender classification approaches fall into one of two categories: geometry-based and appearance-based.

Geometry-based methods use metric features, e.g., face width, face length, mouth size, eye size, and the distances, angles and areas among salient feature points (eyes, nose, etc.). In Burton et al. [1], 73 points were extracted from a database containing 179 frontal facial images. Discriminant analysis was then used to classify gender using point-to-point distances. The accuracy reported on the training data was 85%. Fellous et al. [2] computed 22 normalized distances using a database with 109 images. The accuracy reported in that work was 90%. Brunelli et al. [3] used 16 geometrical features as the input to two competing hyper-basis function networks. A database with 168 images was used for training. The reported accuracy was 79% on novel faces.

Appearance-based methods learn the decision boundary between the male and female classes from training imagery without extracting any geometrical features. A representative method belonging to this category is the eigenface approach [4]. Cottrell et al. [5] proposed a face categorization method using a two-stage neural network, one stage for face compression and one for face classification. The output of the hidden layer of the compression network performs dimensionality reduction similar to the eigenface method. The accuracy reported was 63% on a database containing 64 images. Golomb et al. [6] used a similar method and referred to their gender classification network as SEXNET. Using a database containing 90 images, they reported 91.9% accuracy. Yen et al. [7] followed the same scheme using a larger database (i.e., 1400 face images). They reported 90% accuracy. Abdi et al. [8] compared raw-image with PCA-based image representations using Radial Basis Function (RBF) and perceptron networks. Using 160 facial images, the best performance of 91.8% was achieved by a perceptron classifier trained with PCA-based features. O'Toole et al. [9], [10] have also reported good performance using PCA and neural networks. Using raw images, Moghaddam et al. [11] investigated gender classification using SVMs on a database with 1755 face images. They reported 96.6% accuracy using RBF kernels.

Following the paper [5], the gender recognition approach is explained below step by step, and the system flow is shown in Figure 5: Gender Recognition Methodology Example, given on page 26.

3.2 Methodology:

The gender classification method consists of three main modules: face detection, feature extraction/selection, and classification. An input facial image is passed to a face detector to extract the face from the image; the Viola and Jones face detection method is used for this purpose. Histogram equalization is then performed to stretch the contrast of the image, which helps overcome illumination variation across images. Earlier work has shown that low-resolution images give an equal level of classification accuracy, so the computational cost can be decreased by reducing the size of the image.

After face detection, the image is resized to 32x32. The resized image is divided into 16 blocks of size 8x8. The DCT coefficients of each 8x8 block are then ordered according to the zigzag scan order. These ordered coefficients are arranged in a vector and passed to the KNN classifier. [2]

3.2.1 Face Detection:

Viola and Jones (2001) presented a new cascade face detection technique. It is a well-known and robust frontal face detection method, and its computation is very fast. The detector extracts faces from the image by scanning from the top-left corner to the bottom-right corner of the image.

The technique has three main modules. First, images are represented in the form of integral images, which makes feature computation very fast. The second module uses the AdaBoost learning algorithm for feature selection. The third module uses a cascade of AdaBoost classifiers to quickly eliminate background regions of the image while spending more computation on promising object-like regions, which speeds up the detection process significantly. [2]
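The reason integral images make feature computation fast can be seen in a small sketch: once the table is built, the sum of any rectangle of pixels needs only four lookups, regardless of the rectangle's size. The function names and the 3x3 sample image below are illustrative assumptions, not part of the Viola and Jones implementation used in the system.

```python
def integral_image(image):
    """Return the summed-area table with an extra zero row and column.

    ii[r][c] holds the sum of all pixels in image[0..r-1][0..c-1].
    """
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Hypothetical 3x3 grey-level image used only for illustration.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
lower_right_2x2 = rect_sum(ii, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

Rectangle features of the kind used by the detector are differences of such rectangle sums, so each one costs a handful of lookups.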

3.2.2 Feature Extraction:

The DCT can be used for dimension reduction. Like other transforms, the Discrete Cosine Transform (DCT) attempts to decorrelate the image data. After decorrelation, each transformed coefficient can be encoded independently without losing compression efficiency. [1]

The DCT coefficients with high variance are mainly located in the upper-left corner of the DCT matrix. Accordingly, we scan the DCT coefficient matrix in a zigzag manner starting from the upper-left corner and convert it to a one-dimensional (1-D) vector. This is similar to sorting the coefficients by decreasing importance, i.e., high-variance coefficients are picked first. When a total of 16 coefficients is selected from an image, only the first coefficient of each of the 16 DCT blocks is selected. As the number of selected coefficients increases, so does the size of the feature vector. For a feature vector of size 32, the first 2 coefficients from each DCT block are selected, and feature vectors of size 48, 64, 128 and 256 were created in the same manner.

Figure 3: Zigzag scan of DCT coefficients
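The zigzag scan and the per-block coefficient selection described above can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, the scan follows the common JPEG-style zigzag convention (assumed here to match Figure 3), and the block contents are dummy numbers standing in for real DCT coefficients.

```python
def zigzag_order(n=8):
    """Return the (row, col) index pairs of an n x n block in zigzag order."""
    order = []
    for d in range(2 * n - 1):                 # d = row + col picks one anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            cells.reverse()                    # even diagonals run from bottom-left up
        order.extend(cells)
    return order


def build_feature_vector(dct_blocks, per_block):
    """Take the first `per_block` zigzag coefficients from every DCT block."""
    order = zigzag_order(len(dct_blocks[0]))[:per_block]
    return [block[r][c] for block in dct_blocks for r, c in order]


# 16 dummy 8x8 blocks; block b holds the values b*100 + r*8 + c.
blocks = [[[b * 100 + r * 8 + c for c in range(8)] for r in range(8)]
          for b in range(16)]

features_16 = build_feature_vector(blocks, 1)   # one coefficient per block
features_32 = build_feature_vector(blocks, 2)   # two coefficients per block
```

With 16 blocks, taking 1, 2, 3, 4, 8 or 16 coefficients per block gives the feature vector sizes 16, 32, 48, 64, 128 and 256 mentioned in the text.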



3.2.3 Classifier:

KNN is a supervised learning classifier. For 1-NN, we assign a test sample to the class of its closest neighbor; for KNN, we assign it the majority class of its K closest neighbors, where the parameter K is the number of neighbors. It is usual to use the Euclidean distance to find the closest neighbors, though other distance measures such as the Manhattan distance could in principle be used instead. [2]

Figure 4: Example of k-NN Classification
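A minimal sketch of the k-NN rule with Euclidean distance is given below. The two-dimensional training vectors and their labels are invented for the example; in the system described here the vectors would be the DCT-based feature vectors.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Assign `query` the majority label among its k nearest training vectors."""
    distances = sorted(
        (math.dist(vec, query), label)          # Euclidean distance to each sample
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_vectors = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                 (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["female", "female", "female", "male", "male", "male"]

prediction_a = knn_classify(train_vectors, train_labels, (1.1, 1.0), k=3)
prediction_b = knn_classify(train_vectors, train_labels, (4.0, 4.0), k=1)
```

For a two-class problem, an odd K avoids tied votes. Swapping `math.dist` for a sum of absolute differences would give the Manhattan-distance variant.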



Figure 5: Gender Recognition Methodology Example



CHAPTER 4: THE PROPOSED APPROACH

4.1 Introduction:

Although several gender classification methods have been reported in the literature, gender

classification has attracted less attention compared to other research topics in computer vision.

Almost every gender classification method reported in the literature uses the complete set of

features extracted from frontal images or uses the raw image directly. Obviously, frontal images contain lots of information, such as age, race, and gender. If the objective is to perform gender classification, then information from unrelated sources might confuse the classifier. Automatic feature subset selection distinguishes our proposed gender classification method from other reported approaches. A GA is used to select gender-related features automatically and improve the performance of the gender classifier.

Figure 6: Proposed Gender Recognition System Flow (flowchart: Frontal Facial Image -> Represent each image as a feature vector -> Select a subset of gender-relevant features (using Genetic Algorithm) -> Classifier trained on the available data set -> Gender Classification)



4.2 Feature Extraction:

Gender detection can be stated as feature extraction followed by classification based on those features. After the image is acquired, the feature extraction process starts: the image is first converted into a grey-level image, and the centroid of the face image is calculated from it. Using the centroid, only the face is cropped, and the features are collected from the cropped grey-level face. Gender-relevant features are:

1. Geometrical (nose width, lips width, vertical distance between eyes and nose, vertical distance between eyes and lips, etc.)

2. External (eyebrow thickness, hair information)

3. Textural (pixel details).

Some distinguishing feature properties of the two genders are summarized below:

Female: The eye region (including eyebrows) and the lip region are a better clue than the mouth and chin region.

Male: The lower portion of the face is a better clue than the eye region.

In total, nine features can be selected:

1. Geometrical features:

I. Vertical distance between eyes and nose tip

II. Nose width

III. Vertical distance between eyes and lips centre

IV. Lips width

2. External features:

I. Eyebrow thickness

II. Moustache detection

III. Long hair information

3. Textural features:

I. Eyebrow pixels

II. Lips pixels
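As an illustration of how the four geometrical features in the list above could be measured, the sketch below computes them from facial landmark coordinates. The landmark names, the coordinate convention ((x, y) in pixels with y increasing downwards) and the sample values are all assumptions made for the example; the thesis does not specify how the landmarks are located.

```python
def geometric_features(landmarks):
    """Compute the four geometrical features from (x, y) pixel coordinates."""
    # Use the mean height of the two eyes as the reference eye line.
    eye_y = (landmarks["left_eye"][1] + landmarks["right_eye"][1]) / 2
    return {
        "eyes_to_nose_tip": landmarks["nose_tip"][1] - eye_y,
        "nose_width": landmarks["nose_right"][0] - landmarks["nose_left"][0],
        "eyes_to_lips_centre": landmarks["lips_centre"][1] - eye_y,
        "lips_width": landmarks["lips_right"][0] - landmarks["lips_left"][0],
    }


# Hypothetical landmark positions on a cropped face image.
sample_landmarks = {
    "left_eye": (30, 40), "right_eye": (70, 40),
    "nose_tip": (50, 65), "nose_left": (42, 66), "nose_right": (58, 66),
    "lips_centre": (50, 82), "lips_left": (36, 82), "lips_right": (64, 82),
}
sample_features = geometric_features(sample_landmarks)
```

In practice such distances are usually normalized (for example by the distance between the eyes) so that they do not depend on image scale; that step is omitted here.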



4.3 Genetic Feature Selection:

Most gender classification methods in the literature follow the same basic strategy: (a) feature

extraction is applied on the raw images; (b) a classifier is trained using all the features extracted

from the images. The problem with this strategy is that it uses all the extracted features for

gender classification. As a result, gender-irrelevant information might be fed to the gender

classifier. This might not allow the classifier to generalize nicely, especially when the training set

is small.

GAs are a class of optimization procedures inspired by the mechanisms of natural selection. GAs operate iteratively on a population of structures, each of which represents a candidate solution to the problem, encoded as a string of symbols (chromosome). A randomly generated set of such strings forms the initial population from which the GA starts its search. Three basic genetic operators guide this search: selection, crossover and mutation.

The goal of feature subset selection is to use fewer features to achieve the same or better

performance. Therefore, the fitness evaluation contains two terms: (i) accuracy and (ii) number

of features used. Only the features in the subset encoded by an individual are used to train the NN classifier. The performance of the NN is estimated using a validation data set and used to guide the GA. Each feature subset contains a certain number of features. If two subsets achieve the same performance while containing different numbers of features, the subset with fewer features is preferred. Between accuracy and feature subset size, accuracy is our major concern. Combining these two terms, the fitness function is given as:

$Fitness = 10^4 \times Accuracy + 0.4 \times Zeros$

where $Accuracy$ is the accuracy rate that an individual achieves, and $Zeros$ is the number of zeros in the chromosome. The accuracy ranges roughly from 0.5 to 1 (i.e., the first term assumes values in the interval 5000 to 10000). The number of zeros ranges from 0 to $l$, where $l$ is the length of the chromosome (i.e., the second term assumes values in the interval 0 to 100, since $l = 250$ here).



Overall, the higher the accuracy, the higher the fitness. Also, the fewer the features used, the higher the number of zeros and, as a result, the higher the fitness. Because the second term never exceeds 100, an individual whose accuracy is higher by more than 0.01 outweighs an individual with lower accuracy, no matter how many features they contain.
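The fitness function can be written directly from the formula. The sketch below assumes a chromosome is a list of 0/1 bits of length 250 and that the accuracy has already been measured on the validation set; the example chromosomes and accuracy values are made up.

```python
def fitness(accuracy, chromosome):
    """Fitness = 10^4 * accuracy + 0.4 * (number of zeros in the chromosome)."""
    zeros = chromosome.count(0)
    return 1e4 * accuracy + 0.4 * zeros


# Hypothetical individuals of length l = 250.
few_features = [1] * 50 + [0] * 200    # 50 selected features, 200 zeros
many_features = [1] * 200 + [0] * 50   # 200 selected features, 50 zeros

same_accuracy_small = fitness(0.90, few_features)
same_accuracy_large = fitness(0.90, many_features)
higher_accuracy_large = fitness(0.92, many_features)
```

At equal accuracy the smaller subset scores higher, while a 0.02 gain in accuracy (200 fitness points) outweighs any possible difference in the zeros term (at most 100 points).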

4.3.1 Initial Population:

In general, the initial population is generated randomly (e.g., each bit in an individual is set by flipping a coin). In this way, however, we will end up with a population where each individual contains the same number of 1s and 0s on average. To explore subsets of different numbers

of features, the number of 1s for each individual is generated randomly. Then, the 1s are

randomly scattered in the chromosome.
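A minimal sketch of this initialization is shown below: the number of 1s is drawn first, and the 1s are then scattered at random positions. Requiring at least one selected feature, and the population size of 20, are assumptions made for the example.

```python
import random


def random_individual(length, rng):
    """Create a chromosome whose number of 1s is itself drawn at random."""
    ones = rng.randint(1, length)                      # assumed: at least one feature
    positions = set(rng.sample(range(length), ones))   # where the 1s are scattered
    return [1 if i in positions else 0 for i in range(length)]


def initial_population(size, length, seed=None):
    """Build `size` chromosomes of `length` bits with varied numbers of 1s."""
    rng = random.Random(seed)
    return [random_individual(length, rng) for _ in range(size)]


population = initial_population(size=20, length=250, seed=0)
feature_counts = [sum(individual) for individual in population]
```

Compared with coin flipping, where nearly every individual has close to 125 ones, `feature_counts` here is spread over the whole range from 1 to 250.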

4.3.2 Crossover:

In general, we do not know how the features depend on each other. If dependent features are far apart in the chromosome, it is more probable that traditional 1-point crossover will destroy the schemata. To avoid this problem, uniform crossover is used here.
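Uniform crossover treats every bit position independently, so the distance between two dependent features in the chromosome no longer matters. A minimal sketch follows; the swap probability of 0.5 is the conventional choice and is an assumption here, as the thesis does not state the value used.

```python
import random


def uniform_crossover(parent_a, parent_b, rng, swap_prob=0.5):
    """Exchange each bit between the two parents with probability `swap_prob`."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


rng = random.Random(1)
parent_a = [0] * 10
parent_b = [1] * 10
child_a, child_b = uniform_crossover(parent_a, parent_b, rng)
```

Every position of each child comes from one of the two parents, and the two children are complementary in where they inherit from.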

4.3.3 Mutation:

Mutation is a very low-probability operator that just flips a specific bit. It plays the role of restoring lost genetic material.

Our selection strategy is cross-generational. Assuming a population of size N, the offspring double the size of the population, and we select the best N individuals from the combined parent-offspring population.

The features extracted from the training face images are fed into the genetic algorithm for gender classification.
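Bit-flip mutation and cross-generational selection can be sketched as below. The mutation rate of 0.01 and the use of `sum` as a stand-in fitness function in the example are assumptions; in the actual system the fitness would combine validation accuracy and the number of zeros.

```python
import random


def mutate(chromosome, rng, rate=0.01):
    """Flip each bit independently with a small probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def cross_generational_selection(parents, offspring, fitness_fn):
    """Keep the best N individuals from the combined parent-offspring pool."""
    combined = sorted(parents + offspring, key=fitness_fn, reverse=True)
    return combined[:len(parents)]


# Toy example with N = 2 and a placeholder fitness (number of 1s).
parents = [[0, 0], [1, 0]]
offspring = [[1, 1], [0, 1]]
survivors = cross_generational_selection(parents, offspring, fitness_fn=sum)

mutated = mutate([0, 1, 0, 1], random.Random(0), rate=0.01)
```

Because parents compete with their offspring, the best individual found so far is never lost from one generation to the next.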

4.4 Gender Classification:

The first step in any classification technique is the representation of faces in terms of an input vector. Once all the features are extracted, a neural network classifier is trained, which can classify an input vector as male or female.



A neural network consists of units (neurons), arranged in layers, which convert an input vector

into some output. Each unit takes an input, applies an (often nonlinear) function to it and then

passes the output on to the next layer. Generally the networks are defined to be feed-forward: a

unit feeds its output to all the units on the next layer, but there is no feedback to the previous

layer. Weightings are applied to the signals passing from one unit to another, and it is these

weightings which are tuned in the training phase to adapt a neural network to the particular

problem at hand. This is the learning phase.

Once a neural network is configured, it forms appropriate internal feature extractors and classifiers based on training examples. In the training phase, the network uses the training set to update the weights of its neurons in order to reduce the network error. After the training phase, the trained network is used for classification. The representation is internally distributed across the network as a series of independent weights, which has many advantages: noise immunity, pattern generalization, and interpolation capability.
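The feed-forward computation described above can be sketched in a few lines. The sigmoid activation, the bias terms, the layer sizes and all weight values below are assumptions chosen for illustration; the thesis does not give the architecture of its network, and training (the weight updates) is not shown.

```python
import math


def sigmoid(z):
    """Nonlinear activation applied by each unit."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each unit takes a weighted sum of all inputs and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass the input vector through the layers in order (no feedback)."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
output = network_forward([0.2, 0.5, 0.1], layers)
```

With a single sigmoid output, one possible convention is to read values above 0.5 as one class and values below as the other.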



4.5 Development of the Gender Recognition System:

Matlab R2012b is used for the development of the system, including the steps explained in the above sections. An intuitive graphical user interface (GUI) is developed as shown below:

Figure 8: GUI of Gender Recognition System

The code has been trained, and the trained parameters are saved in the file gabestopt.mat; these parameters are loaded when you click on the "Gender Recognition" button. If you want to re-train the system from "zero", just click on the "GA Optimization" button, on which the gender_db_file.mat file will be loaded (all facial images with sex). The code will then be trained from a random state (the recognition rate in this random state should be about 50%). You can also train the system on only a subset of the original images: in this case you have to load the gender_db_file.mat file.

The "Program info" and "Source code" buttons help you to learn about the respective topics from the documentation.



CHAPTER 5: RESULT AND DISCUSSION

5.1 Introduction:

We have developed a system for facial gender recognition that is capable of extracting the most informative features from an image using an approach based on genetic algorithms. The code uses some features in the spatial domain and uses a genetic algorithm to optimize feature vector extraction.

5.2 Databases Used:

In order to train the classifier, we need a large database of facial images of both genders, on which we subsequently test the classifier to measure its performance. To achieve this, I have used the Stanford Medical College facial database (200 male and 200 female faces) and the IIT Kanpur facial database (40 male and 40 female faces). Samples of the facial images from each database are given below.

Figure 9: Indian facial database sample



Figure 10: Stanford student database sample

5.3 Experimental Results:

We have performed a number of experiments and comparisons in order to demonstrate the performance of the proposed gender classification approach. The code has been tested with the Stanford medical student database and the IIT Kanpur face database (240 female images and 240 male images; 75% used for training and 25% used for testing, giving 360 training images and 120 test images in total, randomly selected, with no overlap between the training and test images). The results were correct in most cases, but some misclassifications also occurred, as explained below with examples:

5.3.1 Correct Classification Result:

Given below are examples of correct gender classification, one for a female face and one for a male face. Out of the 120 tested faces, 113 were classified correctly.



Figure 11: Indian Female Face

Figure 12: Female gender recognized



Figure 13: Indian Male Face

Figure 14: Male gender recognized



The code has also been tested with the Stanford medical student database alone, which includes 200 female images and 200 male images; 90% were used for training and 10% for testing, giving 360 training images and 40 test images in total, randomly selected, with no overlap between the training and test images. Since these faces differ to some extent in the pattern of their facial features from the Indian faces, in case of ethnic deviation we can train our system with a new data set using the GA Optimization button in our GUI. Below are two randomly selected faces from the database and their respective outputs as recognized gender.

Figure 15: Male Face



Figure 16: Male gender recognized

Figure 17: Female Face



Figure 18: Female gender recognized

5.3.2 False Classification Result:

Given below is an example of false gender identification. In total, we observed seven cases of false classification.

Figure 19: Experiment on Indian Female Face



Figure 20: False Gender Recognized

5.4 Result Analysis:

The code has been tested with the Stanford student database and the Indian facial database (240 female images and 240 male images; 75% used for training and 25% used for testing, giving 360 training images and 120 test images in total, randomly selected, with no overlap between the training and test images). The results showed a recognition rate of 94.17% (113 of the 120 test images classified correctly).
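The split sizes and the recognition rate follow from simple arithmetic on the figures reported in this chapter (480 images, a 75%/25% split, 113 correct classifications). The variable names in the sketch below are illustrative.

```python
# Figures reported in Chapter 5.
total_images = 240 + 240            # female + male images
train_fraction = 0.75

train_count = round(total_images * train_fraction)
test_count = total_images - train_count

correct = 113                       # correctly classified test faces
misclassified = test_count - correct
recognition_rate = 100.0 * correct / test_count
```

The rate evaluates to 94.1666...%, which is 94.17% when rounded to two decimals (94.16% if truncated), and the seven misclassified faces match the count of false classifications reported above.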



CHAPTER 6: CONCLUSION AND SCOPE OF THE FUTURE WORK

6.1 Conclusion:

Automatic gender identification plays an important role in identification of a person. Gender

identification can help effectively to reduce the search time by limiting the subsequent searching

stage to either a male database or a female database. Automatic gender identification can also

provide an important clue in various security and surveillance based applications.

A successful gender classification approach can boost the performance of many other applications, including face recognition and smart human-computer interfaces. Despite its importance, it has received relatively little attention in the literature. Thus, an automatic feature-selection-based gender classification scheme is explained in this paper. We argue that feature selection is important for gender classification, and demonstrate that removing features that do not encode important gender information from the representation of the faces, using a genetic algorithm, reduces the error rate significantly.

6.2 Limitations:

Despite the good performance of our gender recognition method, as explained in the previous section, some limitations do exist, as listed below.

Gender recognition is based on 2-D images, which are sensitive to the viewpoint of the camera and to lighting conditions.

Intrinsic factors of genetics and ethnic deviations, such as national or cultural heritage, are not taken into account.

Extrinsic factors of the environment and behavior choices (e.g., sun exposure, drugs, cigarettes) are not included in our system.



6.3 Future Research Work:

3-D human body shapes obtained by laser scanning can be used for gender recognition to avoid the limitations caused by 2-D images. Different machine-learning algorithms and feature-extraction methods can be investigated and analyzed on this issue.

The idea of multi-ethnic face feature estimation can also be incorporated to reduce the fault rate in face feature extraction, since patterns become confounded by intrinsic factors of genetics, gender differences, and ethnic deviations and, equally important, by extrinsic factors of the environment and behavior choices (e.g., sun exposure, drugs, cigarettes).



REFERENCES

1. Karl Ricanek, Jr. (Senior Member, IEEE), Yishi Wang and Susan J. Simmons: Generalized Multi-Ethnic Face Age Estimation. 2011 IEEE Computer Society Conference, Volume 1, 2011.

2. S. Ravi, S. Wilson: Face detection with facial features and gender classification based on Support vector machine. International Journal of Imaging Science and Engineering, India, 2010.

3. Zhen-Hua Wang, Zhi-Chun Mu: Gender classification using selected independent-features based on genetic algorithm. Proceedings of the Eighth International Conference on Machine Learning, Baoding, 2009.

4. C. R. Vimal Chand: Face and Gender Recognition Using Genetic Algorithm & Hopfield Neural Network. Global Journal of Computer Science and Technology; 2003 IEEE Computer Society Conference, Volume 1, p. 511, 2003.

5. M. Nazir, Muhammad Ishtiaq, Anab Batool, M. Arfan Jaffar, Anwar M. Mirza: Feature Selection for Efficient Gender Classification. National University of Computer and Emerging Science, FAST, Islamabad, Pakistan.

6. M. Turk and A. Pentland: Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, 2005.

7. Lindsay I. Smith: A tutorial on Principal Component Analysis.

8. Moghaddam, B., Yang et al.: Learning gender with support faces. IEEE Transactions on Pattern Analysis, 24(5):707-711, 2002.

9. Melanie Mitchell: An Introduction to Genetic Algorithms.

10. B. Edelman et al.: Sex Classification of Face Areas: How Well can a Linear Neural Network Predict Human Performance? Journal of Biological Systems, vol. 6, no. 3, 1998.

11. L. Eshelman: The CHC adaptive search algorithm: how to have safe search when engaging in non-traditional genetic recombination. Proceedings of the Foundation of Genetic Algorithms Workshop, pp. 265-283, 2007.
