
M. Kamel and A. Campilho (Eds.): ICIAR 2005, LNCS 3656, pp. 1149 – 1157, 2005. © Springer-Verlag Berlin Heidelberg 2005

Efficient Face and Facial Feature Tracking Using Search Region Estimation

C. Direkoğlu, H. Demirel, H. Özkaramanlı, and M. Uyguroğlu

Department of Electrical and Electronic Engineering, Eastern Mediterranean University, Gazimagusa, North Cyprus

[email protected]

Abstract. In this paper an intelligent and efficient combination of several methods is employed for face and facial feature tracking, with the motivation of real-time applications. The face tracking algorithm is based on color and connected component analysis. It is scale, pose and orientation invariant, and can be implemented in real time in controlled environments. The more challenging problem of facial feature tracking uses intensity-based adaptive clustering on facial feature sub-images. A new search region estimation for each sub-image is proposed. The technique employs facial-expression-aware eye sub-image prediction. The simulation results indicate that facial feature tracking is efficient, with an average tracking rate of 99% within a three-pixel range under different head movements such as translation, rotation, tilt, and scale changes. Furthermore, it is robust under varying facial expressions and non-uniform illumination.

1 Introduction

Face tracking has become an increasingly important research topic. Many possible applications have been studied, including face or gesture recognition, teleconferencing, and robotics, as well as human-computer interaction. Template matching, using stored templates that represent the whole faces of different people in diverse poses or expressions, is one approach to face tracking. These templates can also be used for training a classifier, such as a neural network, that can help in the face-detection process. Paul Viola and Michael Jones [1] proposed a machine learning approach for visual object detection which is capable of processing images rapidly while achieving high detection rates. However, their framework has the disadvantage of not using temporal coherence: any face detected in a frame provides information, such as position and color, that can be used in the following frames to speed up the process. The simplest approach to face tracking is based on skin color. Prem Kuchi et al. [2] proposed a face tracking algorithm using the YCbCr color space. They present a CbCr Gaussian skin-color model for skin and non-skin classification. The search region is estimated from the detected position and the min-max boxes for the consecutive frames. In this approach, RGB to YCbCr conversion is required, which is more complex and slower than RGB to normalized rgb conversion. Moreover, the major axis is not a reliable reference for estimating the search region, because of its variability due to the clothing of persons in skin-color applications.


In this paper, we propose a face tracking algorithm that uses the skin-color feature of the face. Skin color is computationally cheap and at the same time orientation, shape, scale and translation invariant. These properties make it a good candidate for real-time applications. We choose the normalized r-g color space for skin-color modeling because of its fast and simple conversion from the RGB color space. The other important property of normalized r-g is that it is more tolerant of non-uniform lighting environments, so one does not need to dynamically adapt its distribution. The face region in the first frame is located using the algorithm explained in [3]. Once the center, minor axis and major axis of the elliptic face are known, the search region in the next frame is determined using the last center position and the minor axis of the face. Then stochastic skin-color modeling is applied to the search region, yielding a set of connected components in binary format. Morphological operations are applied for smoothing, and then the holes inside the face candidate regions are filled. Finally, the region which includes the previous face center position is chosen as the face. The success rate of the proposed face tracking method is 93% on average.

Facial feature tracking is a more challenging task than face tracking. Karin Sobottka and Ioannis Pitas [4] perform facial feature tracking by block (template) matching. Once the facial features are detected, they initialize feature blocks. Tracking is then done over time by searching for the corresponding blocks in consecutive frames. Template matching and other appearance-based techniques are computationally expensive. Jong-Gook Ko et al. [5] proposed a computationally cheap method for tracking eyes, nostrils and lip corners. In their paper, they convert the grey-scale frame into a binary image by intensity computation, then apply graph matching to find the most similar regions in the image. The most similar regions are assigned as eyes. From the eye positions, they estimate the mouth region and locate the lip corners by searching the leftmost and rightmost columns. Finally, they determine the nostrils by using the eye and lip corner positions. However, this approach converts the whole image to a binary image for graph matching, and estimating the similarity of the eyes is critical when the background is complex. Furthermore, if the lighting around the face is not uniform, this approach suffers significant feature losses.

In this paper, facial feature tracking is performed using intensity-based adaptive clustering and binary image processing. Intensity-based techniques are computationally cheaper and therefore more attractive for real-time applications. Each facial feature is investigated in a separate search window, which compensates for non-uniform lighting conditions on the face. For each facial feature, the optimum search window is adapted dynamically in consecutive frames. After intensity-based adaptive clustering in each search window, a set of connected components is obtained to determine the feature positions in that frame. The success rate of the feature tracking approach ranges from 82% to 99% for a one-pixel range and from 99% to 100% for a three-pixel range.

2 Face Tracking

The face tracking proposed in this paper comprises three main steps: search region estimation, stochastic skin-color modeling, and locating the face region.


2.1 Search Region Estimation

Once the face region in the first frame is determined, one can determine the face boundary box by estimating its corner coordinates. This can be done from the knowledge of the major axis (majax), minor axis (minax) and the center (Xc, Yc) of the elliptic face model, which can easily be calculated from the detected face region. The corner coordinates of the face region are then given by (see Fig. 1(a))

(X1, Y1) = (Xc - majax/2, Yc - minax/2)    (X2, Y2) = (Xc - majax/2, Yc + minax/2)    (1)

(X3, Y3) = (Xc + majax/2, Yc - minax/2)    (X4, Y4) = (Xc + majax/2, Yc + minax/2)    (2)

The search region for the next frame is estimated using the corner positions of the last frame and minax, as depicted in Fig. 1(b). The corner coordinates of the new search region are defined by

(X5, Y5) = (X1 - minax/2, Y1 - minax/2)    (X6, Y6) = (X2 - minax/2, Y2 + minax/2)    (3)

(X7, Y7) = (X3 + minax/2, Y3 - minax/2)    (X8, Y8) = (X4 + minax/2, Y4 + minax/2)    (4)

The minor axis is used in (3) and (4) because it is the more reliable measure. Since skin-color modeling is applied, the major axis can vary with clothing, for example between open-collar and closed-collar cases.

(a) i-th frame (b) (i+1)-th frame

Fig. 1. Search region estimation for two consecutive frames
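As a sketch, the corner computations in Eqs. (1)-(4) can be written directly. The function names are illustrative, and the pairing of X with the major axis follows the paper's notation; this is a reconstruction, not the authors' code:

```python
def face_box(Xc, Yc, majax, minax):
    """Corner coordinates of the face bounding box, Eqs. (1)-(2).

    Pairs X with the major axis and Y with the minor axis of the
    elliptic face model, as in the paper's notation.
    """
    return [(Xc - majax / 2, Yc - minax / 2),   # (X1, Y1)
            (Xc - majax / 2, Yc + minax / 2),   # (X2, Y2)
            (Xc + majax / 2, Yc - minax / 2),   # (X3, Y3)
            (Xc + majax / 2, Yc + minax / 2)]   # (X4, Y4)


def search_region(corners, minax):
    """Expand the last frame's box by minax/2 on every side, Eqs. (3)-(4)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = corners
    return [(x1 - minax / 2, y1 - minax / 2),   # (X5, Y5)
            (x2 - minax / 2, y2 + minax / 2),   # (X6, Y6)
            (x3 + minax / 2, y3 - minax / 2),   # (X7, Y7)
            (x4 + minax / 2, y4 + minax / 2)]   # (X8, Y8)
```

Only the previous center and axes are needed, which is what makes the per-frame update cheap.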

2.2 Stochastic Skin Color Modeling

The skin-color model is generated by supervised training on skin-color regions, and the influence of illumination brightness is reduced through the normalization in Eq. (5).

r = R / (R + G + B),    g = G / (R + G + B)    (5)


The colors (r, g) are known as chromatic colors. According to [6], the skin-color distribution in chromatic color space can be approximated by a Gaussian N(m, Σ), where m = (r̄, ḡ) is the mean vector and Σ is the covariance matrix, as shown below.

r̄ = (1/N) Σ_{i=1}^{N} r_i,    ḡ = (1/N) Σ_{i=1}^{N} g_i,    Σ = [ σ_rr  σ_rg ; σ_gr  σ_gg ]    (6)

With this Gaussian model, one can obtain the likelihood of skin color for any pixel x = (r, g) of an image with Eq. (7):

P(r, g) = exp(-0.5 (x - m)^T Σ^{-1} (x - m))    (7)

With an appropriate threshold, the image can then be further transformed to a binary image showing skin regions and non-skin regions as shown in Fig. 2.


Fig. 2. Stochastic skin color modeling to search region
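A minimal NumPy sketch of Eqs. (5)-(7), assuming the mean vector and covariance matrix have been estimated offline from training skin patches; the threshold value in the comment is illustrative, since the paper leaves it unspecified:

```python
import numpy as np

def skin_likelihood(rgb, mean, cov):
    """Gaussian skin-color likelihood in normalized r-g space, Eqs. (5)-(7).

    rgb:  H x W x 3 float array of R, G, B values.
    mean: length-2 vector (r, g) from supervised training.
    cov:  2 x 2 covariance matrix from supervised training.
    """
    s = rgb.sum(axis=2) + 1e-12                  # guard against division by zero
    x = np.stack([rgb[..., 0] / s, rgb[..., 1] / s], axis=-1)  # chromatic (r, g)
    d = x - mean
    md = np.einsum('...i,ij,...j->...', d, np.linalg.inv(cov), d)
    return np.exp(-0.5 * md)                     # Eq. (7), per pixel

# skin_mask = skin_likelihood(frame, m, S) > 0.5   # threshold is illustrative
```

Because the likelihood is evaluated only inside the estimated search region, the per-frame cost stays small.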

2.3 Locating Face Region

After the skin-color modeling of the search region, we get a set of connected components in binary form. We apply median filtering and the erosion morphological operation for smoothing, and then fill the holes inside the face candidate regions, as depicted in Fig. 3(a). Finally, the region which includes the previous center position (Xc, Yc) of the face is chosen to be the new face region. This is shown in Fig. 3(b) by placing a boundary box around the region declared as the new face.


Fig. 3. Locating the face region
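The smoothing, hole filling, and component selection of Sec. 2.3 can be sketched with SciPy's morphology routines. The 3x3 median filter and the single erosion step are assumptions, as the paper does not specify structuring-element sizes:

```python
import numpy as np
from scipy import ndimage

def locate_face(skin_mask, prev_center):
    """Keep the skin component that contains the previous face center.

    skin_mask:   boolean H x W array from skin-color thresholding.
    prev_center: (row, col) of the face center in the previous frame.
    """
    m = ndimage.median_filter(skin_mask.astype(np.uint8), size=3)  # smoothing
    m = ndimage.binary_erosion(m)            # remove thin skin-colored noise
    m = ndimage.binary_fill_holes(m)         # fill eyes/mouth holes
    labels, _ = ndimage.label(m)
    lab = labels[prev_center]                # component under (Xc, Yc)
    return labels == lab if lab > 0 else m
```

Selecting the component by the previous center is what gives the tracker its temporal coherence.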


Some face tracking frames, at different scales and orientations, are shown in Fig. 4.


Fig. 4. Face tracking

The success rate of the proposed face tracking method is 93% on average. The algorithm fails only under rapidly changing illumination or when a skin-colored object in the background overlaps the face.

3 Facial Feature Tracking

The left eye pupil, right eye pupil, nostrils, left lip corner and right lip corner are the facial features that are tracked. This is achieved by using a separate search region for each of them, which compensates for non-uniform lighting conditions. Once the facial feature positions are known, the Euclidean distance (ED) between the left and right eye pupils is taken as a reference to determine the search regions for the pupils and lip corners in the next frame, as shown in Fig. 5.

(a) i-th frame (b) (i+1)-th frame (c) (i+1)-th frame

Fig. 5. Eye pupils, lip corners and nostrils search windows estimation

The Euclidean distance between the left and right eye pupils in the i-th frame is defined by

ED_i = sqrt( (xre_i - xle_i)^2 + (yre_i - yle_i)^2 )    (8)

where (xle_i, yle_i) is the left eye position and (xre_i, yre_i) is the right eye position in the i-th frame. After determining the location of the eye pupils and lip corners in the (i+1)-th frame, this information together with ED_i is used to determine the search region for nostril extraction, as shown in Fig. 6.
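Eq. (8) is a plain Euclidean distance; a one-line helper makes the convention explicit. The window-sizing factor mentioned in the comment is an assumption, since the paper only states that the windows are derived from ED:

```python
import math

def eye_distance(left_eye, right_eye):
    """Eq. (8): Euclidean distance between the two pupil positions,
    each given as an (x, y) pair."""
    return math.hypot(right_eye[0] - left_eye[0],
                      right_eye[1] - left_eye[1])

# Search windows in frame i+1 are then sized proportionally to ED_i,
# e.g. a pupil window of roughly ED_i / 2 on a side (illustrative factor).
```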


Fig. 6. Search window estimation for nostrils

The respective equations determining the coordinates of this region are given by

(X9, Y9)   = ( (xle_{i+1} + xre_{i+1})/2 , yle_{i+1} + ED_i/4 )
(X10, Y10) = ( (xle_{i+1} + xre_{i+1})/2 , yre_{i+1} + ED_i/4 )    (9)

(X11, Y11) = ( (xll_{i+1} + xrl_{i+1})/2 , yle_{i+1} - ED_i/5 )
(X12, Y12) = ( (xll_{i+1} + xrl_{i+1})/2 , yre_{i+1} - ED_i/5 )    (10)

where (xle_{i+1}, yle_{i+1}) is the location of the left eye pupil, (xre_{i+1}, yre_{i+1}) the right eye pupil, (xll_{i+1}, yll_{i+1}) the left lip corner, and (xrl_{i+1}, yrl_{i+1}) the right lip corner in the (i+1)-th frame. These processes continue dynamically.

After the search region estimation, intensity-based adaptive clustering is applied to each grey-scale region, followed by binary image processing to determine the positions of the features. The intensity-based adaptive clustering method is a thresholding process: the mean intensity value of the region forms the first threshold, and the mean of the values below the first threshold gives the second threshold, and so on. After three or more iterations we obtain the darkest regions in the grey-scale image. The clustered image is a binary image in which the darkest regions appear as white connected components. Because of this adaptive process, lighting is not a problem.
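The iterative thresholding just described can be sketched as follows. The three-iteration default mirrors the text; keeping the final comparison inclusive is an implementation detail chosen so that a uniform dark region is retained:

```python
import numpy as np

def adaptive_cluster(gray, iterations=3):
    """Intensity-based adaptive clustering (Sec. 3).

    The first threshold is the window mean; each subsequent threshold
    is the mean of the pixels below the previous one. Returns a binary
    image in which the darkest regions are white (True).
    """
    t = gray.mean()
    for _ in range(iterations - 1):
        below = gray[gray < t]
        if below.size == 0:              # nothing darker: stop early
            break
        t = below.mean()
    return gray <= t
```

Because each threshold is recomputed from the window itself, the same code works under bright and dim lighting without tuning.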

3.1 Eye Tracking

The search window of the left or right eye does not include any dark region other than the pupil. After intensity-based adaptive clustering, pupil candidate regions appear. The possible holes inside the candidate regions are then filled (see Fig. 7(b)). Finally, the biggest candidate region is assigned as the pupil and the center of that region is marked (see Fig. 7(c)).


Fig. 7. Eye pupil detection

There are also situations where the eyebrow falls inside the eye search region, for example when a person smiles or is annoyed. To compensate for this effect, the algorithm ignores regions touching the upper border of the search window.


3.2 Lip Corners Tracking

To find the left lip corner, intensity-based adaptive clustering is performed, since the left lip corner is the darkest region in its search window, including in smiling situations. After the binary form is obtained, the leftmost pixel of the biggest connected component is chosen. The procedure is shown in Fig. 8. A similar procedure is applied for locating the right lip corner.


Fig. 8. Left lip corner detection
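A sketch of the leftmost-pixel rule of Sec. 3.2; the right corner is symmetric (replace argmin with argmax on the columns):

```python
import numpy as np
from scipy import ndimage

def left_lip_corner(cluster_mask):
    """Leftmost pixel of the biggest dark blob in the lip search window."""
    labels, n = ndimage.label(cluster_mask)
    if n == 0:
        return None
    sizes = ndimage.sum(cluster_mask, labels, index=range(1, n + 1))
    biggest = labels == (int(np.argmax(sizes)) + 1)
    rows, cols = np.nonzero(biggest)
    j = int(cols.argmin())               # leftmost column of the blob
    return int(rows[j]), int(cols[j])
```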

3.3 Nostril Tracking

In order to find the nostrils, intensity-based adaptive clustering yields a set of connected components, of which the two biggest regions are assigned to be the nostrils, as depicted in Fig. 9.


Fig. 9. Nostril detection
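The two-largest-components rule of Sec. 3.3 is a few lines on top of the same labeling machinery (again an illustrative sketch, not the authors' code):

```python
import numpy as np
from scipy import ndimage

def find_nostrils(cluster_mask):
    """Centers of the two biggest dark blobs in the nose search window."""
    labels, n = ndimage.label(cluster_mask)
    if n < 2:
        return None
    sizes = ndimage.sum(cluster_mask, labels, index=range(1, n + 1))
    two = np.argsort(sizes)[-2:] + 1     # labels of the two biggest blobs
    return [ndimage.center_of_mass(labels == lab) for lab in two]
```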

Some facial feature tracking frames are shown in Fig. 10.


Fig. 10. Facial feature tracking

4 Results and Discussions

The proposed techniques for face and facial feature tracking were tested on a typical head-and-shoulders video sequence taken with a webcam in a non-uniformly illuminated environment. Each frame is of size 288x352 pixels. The subject in this test sequence makes translational and rotational movements; approaches and recedes from the camera; tilts his head up and down; and is allowed to express facial emotion. The performance of the face tracking method is 93%. The algorithm fails only under rapidly changing illumination or when a skin-colored object in the background overlaps the face. Table 1 shows facial feature tracking simulation results, averaged over two different subjects in different environments. The results in Table 1 are based on 1-, 2- and 3-pixel-range accuracy. The performance is obtained by comparing the simulated results with manually extracted feature locations for all frames. From this perspective, the 1-pixel range is not reliable because of the human error involved in the manual extraction process. It should be noted that the performance tends to be almost perfect, at an average tracking rate of 99%, for the 3-pixel range. The differences in performance for the 1-pixel range can be attributed mainly to the non-uniform illumination and partly to the dynamic nature of facial expressions.

Table 1. Facial Feature Tracking Rate Performance (%)

Facial features       1-pixel range   2-pixel range   3-pixel range
Left eye                  92.68           100             100
Right eye                 98.78           100             100
Left nostril              98.78            98.78           98.78
Right nostril             92.68            98.78           98.78
Left lip corner           81.71           100             100
Right lip corner          85.37            96.34           98.78

5 Conclusions

This paper described an intelligent combination of efficient techniques for face and facial feature tracking, motivated by real-time applications. The methods employed are computationally efficient. The proposed search region estimation technique, backed by facial-expression-aware approaches, resulted in almost perfect facial feature tracking performance: at 3-pixel-range accuracy, an average tracking rate of 99% was obtained. The performance of the face tracking technique was 93%.

References

1. Paul Viola, Michael Jones: Rapid Object Detection Using a Boosted Cascade of Simple Features, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), ISSN: 1063-6919, vol. 1, pp. 511-518, December 2001.

2. Prem Kuchi, Prasad Gabbur, P. Subbana Bhat and Sumam David: Human Face Detection and Tracking Using Skin Color Modeling and Connected Component Operators, IETE Journal of Research, Special issue on Visual Media Processing, May 2002.

3. C. Direkoglu, H. Demirel, H. Ozkaramanli, M. Uyguroglu and A. M. Kondoz: Scale and Translation Invariant Face Detection and Efficient Facial Feature Extraction, in Proc. IASTED Int. Conf. on Signal and Image Processing (SIP 2004), Honolulu, Hawaii, USA, August 23-25, 2004, pp. 122-128.


4. Karin Sobottka and Ioannis Pitas: A Fully Automatic Approach to Facial Feature Detection and Tracking, International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA 1997), Crans-Montana, Switzerland, March 12-14, 1997.

5. Jong-Gook Ko, Kyung-Nam Kim and R.S. Ramakrishna: Facial Feature Tracking for Eye-Head Controlled Human Computer Interface, IEEE, TENCON'99, Cheju, Korea, Sept. 1999.

6. J. Yang, W. Lu, and A. Waibel: Skin-color Modeling and Adaptation, Proceedings of ACCV'98, vol. II, pp. 687-694, Hong Kong, January 1998.