Final Year Project - Enhancing Virtual Learning through Emotional Agents
CHAPTER 1
INTRODUCTION
1.1 A Brief Description
Virtual learning is increasing day by day, and Human Computer
Interaction is a necessity to make virtual learning a better experience.
The emotions of a person play a major role in the learning process.
Hence the proposed work detects a person's emotions from facial
expressions.
For a facial expression to be detected, the face location and area must
be known; therefore, most emotion detection algorithms start with face
detection, since facial emotions are expressed largely through the mouth
region. Algorithms for eye and mouth detection and tracking are then
necessary in order to provide the features for subsequent emotion
recognition. In this project we propose a detection system for natural
emotion recognition.
1.2 Need For Face Detection
Human activity is a major concern in a wide variety of
applications such as video surveillance, human computer interface, face
recognition and face database management. Most face recognition
algorithms assume that face location is known. Similarly, face-tracking
algorithms often assume that initial face location is known. In order to
improve the efficiency of the face recognition systems, an efficient face
detection algorithm is needed.
1.3 Need For Emotion Detection
Human beings communicate through facial emotions in day-to-day
interactions with others. Humans perceive the emotions of fellow humans
naturally and with inherent accuracy. A person can express his or her
inner state of mind through emotions, and an emotion often indicates
that a person needs help. Recognising emotions by computer is an
important research area in Human Computer Interfacing (HCI). Such an
interface can be a welcome aid for the physically disabled, for those
who are unable to express their needs by voice or other means, and
especially for those who are confined to bed. Human emotion can be
detected through facial actions or through biosensors. Facial actions
are imaged through still or video cameras. From still images taken at
discrete times, changes in the eye and mouth areas can be exposed.
Measuring and analysing such changes leads to the determination of human
emotions.
1.4 Existing Face Detection Approaches
1.4.1 Feature Invariant Methods
These methods aim to find structural features that exist even when
the pose, viewpoint, or lighting conditions vary, and then use these to
locate faces. These methods are designed mainly for face localization.
Texture
Human faces have a distinct texture that can be used to
separate them from other objects. The textures are computed using
second-order statistical features on sub-images of 16×16 pixels. Three
types of features are considered: skin, hair, and others. To infer the
presence of a face from the texture labels, the votes of occurrence of hair
and skin textures are used. Colour information can also be incorporated
into the face-texture model: using this model, a scanning scheme for face
detection in colour scenes enhances the orange-like parts, which include
the face areas. One advantage of this approach is that it can detect faces
which are not upright or which have features such as beards and glasses.
Skin Colour
Human skin colour has been used and proven to be an effective
feature in many applications from face detection to hand tracking.
Although different people have different skin colours, several studies have
shown that the major difference lies largely in intensity rather
than chrominance. Several colour spaces have been utilized to label
pixels as skin, including RGB, normalized RGB, HSV, YCbCr, YIQ,
YES, CIE XYZ and CIE LUV.
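As a sketch of how such a colour-space rule works, the fragment below converts RGB to YCbCr (BT.601) and gates on the chrominance channels only; the numeric bounds are common rule-of-thumb values for skin, not thresholds taken from this work:

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Label pixels as skin using fixed Cb/Cr bounds (illustrative thresholds).

    rgb: H x W x 3 uint8 array. Returns an H x W boolean mask.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 RGB -> YCbCr conversion
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Gate on chrominance only: skin clusters tightly in (Cb, Cr)
    # largely independent of intensity Y, as the text notes.
    return (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)
```

Because the luma channel Y is ignored, the same bounds cover lighter and darker skin under comparable illumination, which is exactly the intensity/chrominance observation made above.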
1.4.2 Template Matching Methods
In template matching, a standard face pattern is manually
predefined or parameterized by a function. Given an input image, the
correlation values with the standard patterns are computed for the face
contour, eyes, nose, and mouth independently. The existence of a face is
determined based on the correlation values. This approach has the
advantage of being simple to implement. However, it has proven to be
inadequate for face detection since it cannot effectively deal with
variation in scale, pose, and shape. Multiresolution, multiscale, sub
templates, and deformable templates have subsequently been proposed
to achieve scale and shape invariance.
Predefined Face Template
In this approach several sub templates for nose, eyes, mouth and
face contour are used to model a face. Each sub template is defined in
terms of line segments. Lines in the input image are extracted based on
greatest gradient change and then matched against the sub templates. The
correlations between sub images and contour templates are computed
first to detect candidate locations of faces. Then, matching with the other
sub templates is performed at the candidate positions. In other words, the
first phase determines the focus of attention or region of interest and the
second phase examines the details to determine the existence of a face.
1.4.3 Appearance Based Methods
In the appearance based methods the templates are learned from
examples in images. In general, appearance based methods rely on
techniques from statistical analysis and machine learning to find the
relevant characteristics of face and non-face images. The learned
characteristics are in the form of distribution models that are
consequently used for face detection.
1.5 Existing Emotion Detection Approaches
1.5.1 Genetic Algorithm
The eye features play a vital role in classifying facial emotion
using a Genetic Algorithm. The acquired images must go through a few
pre-processing steps such as grayscale conversion, histogram equalization
and filtering. A Genetic Algorithm methodology estimates the emotion from
the eye features alone. Observation of various emotions leads to a unique
characteristic of the eye: the eye exhibits ellipses of different
parameters in each emotion. The Genetic Algorithm is adopted to optimize
the ellipse characteristics of the eye features. The processing time of
the Genetic Algorithm varies for each emotion.
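A minimal sketch of the idea follows, assuming a simple real-coded GA with averaging crossover and Gaussian mutation that fits the semi-axes (a, b) of an eye ellipse to contour points; the actual chromosome encoding and operators in the literature differ:

```python
import random

def fitness(params, points):
    # Negative sum of squared algebraic errors against x^2/a^2 + y^2/b^2 = 1
    a, b = params
    return -sum((x * x / (a * a) + y * y / (b * b) - 1.0) ** 2
                for x, y in points)

def ga_fit_ellipse(points, pop_size=40, gens=60, seed=0):
    rng = random.Random(seed)
    # Random initial population of (a, b) semi-axis pairs
    pop = [(rng.uniform(1, 30), rng.uniform(1, 30)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda p: fitness(p, points), reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            (a1, b1), (a2, b2) = rng.sample(survivors, 2)
            a = (a1 + a2) / 2 + rng.gauss(0, 0.5)  # crossover + mutation
            b = (b1 + b2) / 2 + rng.gauss(0, 0.5)
            children.append((max(a, 0.1), max(b, 0.1)))
        pop = survivors + children
    return max(pop, key=lambda p: fitness(p, points))
```

Because the top half of each generation survives unchanged, the best fitness never degrades; mutation keeps exploring parameter values near the current best.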
1.5.2 Neural Network
Neural networks have found profound success in the area of
pattern recognition. By repeatedly showing a neural network inputs
classified into groups, the network can be trained to discern the criteria
used to classify, and it can do so in a generalized manner, allowing
successful classification of new inputs not used during training. With the
explosion of research on emotions in recent years, the application of
pattern recognition technology to emotion detection has become
increasingly interesting. Emotion has become an important interface for
communication between human and machine, and it plays a basic role in
rational decision-making, learning, perception, and various cognitive
tasks.
Human emotion can be detected from physiological measurements
or from facial expression. Since humans engage the same facial
muscles when expressing a particular emotion, the emotion can
be quantified. Primary emotions such as anger, disgust, fear, happiness,
sadness and surprise can be classified using a Neural Network.
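As an illustrative stand-in for such a network, the sketch below trains a single-layer softmax classifier on toy feature vectors; real systems use deeper networks trained on facial-action features rather than this minimal model:

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.5, epochs=200, seed=0):
    """Train a single-layer softmax classifier by gradient descent."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.01, (X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)           # softmax probabilities
        grad = (p - Y) / len(X)                     # cross-entropy gradient
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(W, b, X):
    return np.argmax(X @ W + b, axis=1)
```

After training on labelled examples, `predict` generalizes to unseen feature vectors, which is the training behaviour the paragraph describes.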
1.5.3 Feature Point Extraction
Template Matching
An interesting approach in the problem of automatic facial feature
extraction is a technique based on the use of template prototypes, which
are portrayed on the 2-d space in gray scale format. This is a technique
that is, to some extent, easy to use, but also effective. It uses correlation
as a basic tool for comparing the template with the part of the image that
we wish to recognize. An interesting question that arises is the
behaviour of recognition with template matching at different resolutions.
This involves multi-resolution representations through the use of
Gaussian pyramids. Experiments showed that very high resolutions are not
needed for template matching: for example, templates of 36×36 pixels
proved sufficient. This shows that template matching is not as
computationally complex as originally imagined.
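One level of such a Gaussian pyramid can be sketched as follows (a lightweight stand-in for library routines such as OpenCV's `pyrDown`): blur to suppress aliasing, then drop every other row and column.

```python
import numpy as np

def pyramid_down(img):
    """One Gaussian-pyramid level: blur with a small binomial kernel
    (a cheap Gaussian approximation), then subsample by two."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    # Separable blur: convolve rows, then columns
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred[::2, ::2]
```

Repeated calls halve the resolution each time, so a template can be matched at a coarse level first and the match refined at finer levels.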
This class implements the face detection algorithm, which starts by
scanning the given image with the SSR filter and locating the face
candidates. It then assembles candidates that are close to each other
using connected components, so that fewer candidates need to be treated,
which means less processing time in this real-time application. The
centre of each cluster is taken and a template is extracted around it;
the template is passed to a Support Vector Machine, which tells us
whether it is a face or not. If it is, the eyes and then the nose are
located.
Face detection techniques fall into two categories:
1. Feature-based approach
2. Image-based approach
Template matching forms the basis of the human face detection system.
1. Feature Based Technique:
The techniques in the first category make use of apparent
properties of the face such as face geometry, skin colour, and motion.
Although feature-based techniques can achieve high speed in face
detection, they suffer from poor reliability under varying lighting
conditions.
2. Image Based Technique:
The image-based approach takes advantage of current advances in
pattern recognition theory. Most image-based approaches apply a
window scanning technique for detecting faces, which requires heavy
computation.
To achieve a high-speed and reliable face detection system, we
propose a method which combines both the feature-based and image-based
approaches using the SSR filter.
1.5.4 Template Matching
Template matching is a technique in digital image processing for
finding small parts of an image which match a template image or as a
way to detect edges in images.
The basic method of template matching uses a convolution mask
(template), tailored to a specific feature of the search image, which we
want to detect.
This technique can be easily performed on grey images or edge
images. The convolution output will be highest at places where the
image structure matches the mask structure, where large image values
get multiplied by large mask values.
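The idea can be sketched with a brute-force normalized cross-correlation; practical systems use FFT-based or library implementations, but the score map and its peak behave the same way:

```python
import numpy as np

def ncc_match(image, template):
    """Slide the template over the image and return the normalized
    cross-correlation score map; the peak marks the best match."""
    th, tw = template.shape
    t = template - template.mean()
    out = np.full((image.shape[0] - th + 1, image.shape[1] - tw + 1), -1.0)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            w = image[i:i+th, j:j+tw]
            w = w - w.mean()
            denom = np.sqrt((t * t).sum() * (w * w).sum())
            if denom > 0:                 # skip flat windows
                out[i, j] = (t * w).sum() / denom
    return out
```

Normalizing by the window and template energies keeps the score in [-1, 1], so a bright region does not outscore a genuinely matching structure, which is the caveat behind "large image values get multiplied by large mask values".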
Eyes and Nose detection using SSR Filter.
A real-time face detection algorithm uses a Six-Segmented
Rectangular (SSR) filter for eye and nose detection. The SSR filter is a
six-segment rectangle, as illustrated in Figure 1.1.
Figure 1.1 SSR Filter
At the beginning, a rectangle is scanned throughout the input
image. This rectangle is segmented into six segments as shown below.
The SSR filter is used to detect the Between-the-Eyes based on
two characteristics of face geometry.
BTE - Between The Eyes
The detection of BTE is based on the property of the image
characteristics of the area on face. The intensity of the BTE image
closely resembles a hyperbolic surface as shown in Figure 1.2. The BTE
is the saddle point on the hyperbolic surface. A rotationally invariant
filter could thus be devised for detecting the BTE area.
Figure 1.2 Determination of BTE
The nose search area is usually taken to be 2/3 of L, as shown in
Figure 1.3, where L is the approximate distance between the two eyes and
from eye to nose.
Figure 1.3 Nose Tip Search Area Relative to Eyes
The common BTE area on human face resembles a hyperbolic surface.
The proposed work uses this hyperbolic model to describe the BTE
region; the centre of the BTE is thus the saddle point on the surface.
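The SSR test itself can be sketched with an integral image, which makes each segment sum an O(1) lookup and is what makes the real-time scanning described above feasible. The particular inequalities below (bright bridge S2 versus dark eye segments S1/S3, bright cheeks S4/S6 below the eyes) are one common formulation and may differ in detail from this report's:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: any rectangle sum becomes an O(1) lookup."""
    return img.cumsum(0).cumsum(1)

def rect_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] (exclusive upper bounds) from the table.
    s = ii[r1 - 1, c1 - 1]
    if r0 > 0: s -= ii[r0 - 1, c1 - 1]
    if c0 > 0: s -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: s += ii[r0 - 1, c0 - 1]
    return s

def ssr_is_bte(ii, r, c, h, w):
    """Apply the 2x3 SSR test at window (r, c) of size (h, w).

    Segments: S1 S2 S3 on the top row, S4 S5 S6 on the bottom row.
    A BTE candidate has a bright bridge S2 between dark eyes S1/S3,
    and bright cheeks S4/S6 below the eyes (one common formulation).
    """
    hh, ww = h // 2, w // 3
    seg = lambda i, j: rect_sum(ii, r + i * hh, c + j * ww,
                                r + (i + 1) * hh, c + (j + 1) * ww)
    s1, s2, s3 = seg(0, 0), seg(0, 1), seg(0, 2)
    s4, s6 = seg(1, 0), seg(1, 2)
    return s2 > s1 and s2 > s3 and s4 > s1 and s6 > s3
```

Since the six segment sums cost the same regardless of window size, the filter can be scanned at several scales over the whole frame in real time.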
Blobs
Blobs provide a complementary description of image structures in
terms of regions, as opposed to corners that are more point-like.
Nevertheless, blob descriptors often contain a preferred point (a local
maximum of an operator response or a centre of gravity) which means
that many blob detectors may also be regarded as interest point
operators. Blob detectors can detect areas in an image which are too
smooth to be detected by a corner detector.
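As a sketch of one such operator, the difference-of-Gaussians (a standard approximation to the Laplacian of Gaussian) responds most strongly at the centre of a smooth blob, exactly where a corner detector finds nothing:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur via 1-D convolutions along each axis."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def dog_blob_center(img, s1=1.0, s2=2.0):
    """Difference of Gaussians: the extremum of the response marks
    a blob centre (the 'preferred point' mentioned in the text)."""
    dog = gaussian_blur(img, s1) - gaussian_blur(img, s2)
    return np.unravel_index(np.abs(dog).argmax(), dog.shape)
```

The returned extremum is the interest point that lets a blob detector double as an interest-point operator.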
Gabor Filtering
Gabor filtering can be used in a facial recognition system. The
neighbourhood of a pixel may be described by the responses of a group of
Gabor filters at different frequencies and orientations, all centred on
that pixel. In this way a feature vector containing the responses of
those filters may be formed.
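A minimal construction of such a filter-bank response follows (real part only; the scales, wavelength and orientations here are illustrative choices, not the report's parameters):

```python
import numpy as np

def gabor_kernel(sigma, theta, lam, size=None):
    """Real part of a Gabor filter: a Gaussian envelope times a
    sinusoid oriented at angle theta with wavelength lam."""
    if size is None:
        size = int(6 * sigma) | 1                 # odd width covering the envelope
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_feature_vector(patch, sigmas=(2.0,),
                         thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """Feature vector for a pixel's neighbourhood: one response
    (inner product with the patch) per scale and orientation."""
    return np.array([(gabor_kernel(s, t, lam=4.0, size=patch.shape[0]) * patch).sum()
                     for s in sigmas for t in thetas])
```

The orientation whose filter matches the local structure dominates the vector, which is what makes these responses discriminative descriptors of the region around the pixel.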
Automated Facial Feature Extraction
In this approach, as far as the frontal images are concerned, the
fundamental concept upon which the automated localization of the
predetermined points is based consists of two steps: the hierarchic and
reliable selection of specific blocks of the image and subsequently the
use of a standardized procedure for the detection of the required
benchmark points. For the former of the two processes to be
successful, a robust method of approach is needed: the detection of a
block describing a facial feature relies on a previously and reliably
detected feature. Following this reasoning, the most significant
characteristic, the foundation of the cascade routine, has to be chosen.
The importance of each of the commonly used facial features for face
recognition has already been studied by other researchers. Surveys
proved the eyes to be the most dependable and most easily located of all
facial features, and as such they were used. The techniques that were
developed and tried separately utilize a combination of template
matching and Gabor filtering.
The Hybrid Method
The search for the desired feature blocks is performed by a
simple template matching procedure. Each feature prototype is selected
from one of the frontal images of the face base. The comparison
criterion in practice is the maximum correlation coefficient between the
prototype and the repeatedly examined blocks of a carefully restricted
area of the face.
In order for the search area to be sharply and efficiently
limited, knowledge of human face physiology has been applied,
without hindering the satisfactory performance of the algorithm in cases
of small violations of the initial limitations. However, the final block
selection by this method alone has not always been successful;
therefore, a measure of reliability was needed. For that reason, Gabor
filtering was deemed a suitable tool: as can be mathematically deduced
from the filter's form, it ensures simultaneously optimal localization
in the spatial domain as well as in the frequency domain.
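A simplified sketch of the correlation stage with a reliability threshold follows; the Gabor-saliency validation itself is omitted, and the search-region restriction stands in for the physiology-based limits described above:

```python
import numpy as np

def best_block(image, template, search, corr_threshold=0.5):
    """Scan the (r0, c0, r1, c1) search region for the block with the
    highest correlation coefficient against the template; matches below
    the threshold are rejected as unreliable."""
    th, tw = template.shape
    r0, c0, r1, c1 = search
    best, best_pos = -2.0, None
    for i in range(r0, r1 - th + 1):
        for j in range(c0, c1 - tw + 1):
            block = image[i:i+th, j:j+tw]
            if block.std() == 0:          # correlation undefined for flat blocks
                continue
            r = np.corrcoef(block.ravel(), template.ravel())[0, 1]
            if r > best:
                best, best_pos = r, (i, j)
    return best_pos if best >= corr_threshold else None
```

Returning `None` for weak maxima is the role the text assigns to the reliability measure: a best-correlation block is only accepted when the match is strong enough.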
The filter is applied both on the localized area and the template in
four different spatial frequencies. Its response is regarded as valid only
in the case that its amplitude exceeds a saliency threshold. The area with
minimum phase distance from its template is considered to be the most