Pattern Recognition, July 2012
Robust Ear Identification using Sparse Representation of
Local Texture Descriptors
Ajay Kumar, Tak-Shing T. Chan
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Abstract: Automated personal identification using localized ear images has a wide range of
civilian and law-enforcement applications. This paper investigates a new approach to the more
accurate ear recognition and verification problem using the sparse representation of local gray-
level orientations. We exploit the computational simplicity of the localized Radon transform for
robust ear shape representation and also investigate the effectiveness of local curvature encoding
using a Hessian based feature representation. The ear representation problem is modeled as a
sparse coding solution based on a multi-orientation Radon transform dictionary, whose solution is
computed using a convex optimization approach. We also study the nonnegative formulation of
this problem, to address the limitations of the l1-regularized optimization problem, in the sparse
representation of localized ear features. The log-Gabor filter based approach and the localized
Radon transform based feature representation have been used as baseline algorithms to ascertain
the effectiveness of the proposed approach. We present experimental results from the publicly
available UND and IITD ear databases which achieve significant improvement in performance,
both for the recognition and the authentication problem, and confirm the usefulness of the
proposed approach for more accurate ear identification.
1. Introduction
The identification of humans by fellow humans has been key to the fabric of our
society and has matured with the evolution of mankind. We have been identifying humans
from their voice, appearance or their gait for thousands of years. However, the systematic
approach to scientifically identify humans is believed to have begun in the 19th century when
Alphonse Bertillon introduced the idea of using a number of anthropometric measurements
to identify criminals. Personal identification using unique physiological and behavioral
characteristics is now increasingly employed in a variety of commercial and forensic
applications. The face, fingerprint and iris have emerged as the three most popular biometric
modalities employed for the automated management of human identities. It is generally
believed that there is no single universal or superior biometric modality; each modality
has its unique applications, deployment advantages and imaging requirements. Iris recognition
has been shown to offer higher accuracy but requires a constrained (near-infrared) imaging
environment and suffers from a high failure-to-enroll rate [35]. Visually impaired persons
cannot use iris and retina based biometric systems. Fingerprint recognition is widely
employed in law-enforcement, border/immigration crossing and commercial applications.
The NIST report [30] to the US Congress has stated that ~2% of the population (elderly, manual
laborers, etc.) does not have usable fingerprints. Similar conclusions have also been reported in
a recently released large-scale proof-of-concept study conducted by UIDAI [65], which
estimates that ~1.9% of subjects cannot be reliably authenticated using their fingerprints.
Fingerprint identification can also be rendered useless for the physically challenged
population who lack fingers. There are new challenges to fingerprint
systems, as asylum seekers and criminals have been shown to successfully evade deployed
fingerprint identification technologies using surgically altered fingerprints [44]. Face
recognition technologies also have several limitations and suffer from
performance degradation due to varying makeup, pose, expression and aging. The enhanced
research efforts on face recognition technologies in the last decade have significantly
improved the accuracy of currently available face recognition systems. These technologies
are now grappling with new challenges emerging from cosmetic/plastic surgery and face
spoofing [45]-[46]. Therefore further research efforts are required to exploit the potential of
other emerging biometric modalities which can also be conveniently/simultaneously
acquired using advanced imaging sensors that are now widely available at lower cost.
Human ear images carry rich information embedded on a curved 3D
surface, which has attracted a lot of attention from forensic scientists. The morphology of the
external ear is believed to be relatively stable over an acceptable period of time for
biometrics and forensic applications. Several studies on the stability of the external ear, i.e.,
the auricle, have suggested that ear shape matures quite early while its expansion* continues,
but at a very slow rate. In this context, a study of 883 white Italian subjects (4-73 years) on age-
related trends in [62] suggests that ear length increases more than ear width. Meijerman et al.
[61] have presented a detailed anthropometric study of the external ear from 1353 subjects which
suggests that cartilage expansion, i.e., the difference between auricle expansion and ear lobe
expansion, is greatest during early adulthood. These studies are quite useful for
biometric identification and suggest that automated ear identification can benefit from
template update, whenever possible, both for relatively younger and for older individuals.
Ear images can also be simultaneously acquired during face imaging and
employed to significantly improve the accuracy of face recognition. It is possible to use the
ear and face as complementary pieces of information, especially in applications like
surveillance, tracking, or continuous personal authentication, as the best head position for
accurate ear recognition is not good for accurate face recognition. The key advantages
associated with the use of 2-D ear images as a biometric modality include the relative stability
of ear images under varying expressions, their relative immunity to privacy concerns, and the
convenience of covertly acquiring ear images for surveillance applications. There has been
steady growth in the research interest to develop automated ear recognition technologies in the
last decade. However, significant efforts are further required to improve the ear detection,
* A study of 400 Japanese subjects aged 21-94 years has suggested [63] an annual increase in auricle length of 0.13
mm, while another study [54] of 206 subjects of various descent suggests an increase of 0.22 mm in the age
group of 30-93 years.
segmentation and recognition capabilities to make a convincing case for its deployment in
surveillance and other commercial applications.
Figure 1: Anatomy of external ear using a sample 2D sketch [56].
2. Related Prior Work
Automated personal identification using 2-D ear images has attracted a lot of research effort in
the biometrics literature. The gray-level ear images typically capture the anatomy of the external
human ear (figure 1) [2], [56]. Iannarelli [1] manually attempted to classify human ear
photographs into four categories, i.e., triangle, round, oval and rectangular, largely on the
basis of the closed contour resulting from the shape of the helix ring and lobule of the ear. He is
credited with developing a 12-point measurement scale, also referred to as the Iannarelli system,
which was used to align and match ~7000 right ear images of different individuals.
Commercial solutions for automated ear identification are not yet available (to the best of
our knowledge) but they can be highly useful in a variety of forensic and civilian applications.
In this context, the US patent office has issued several patents on ear recognition methods and
systems. The Sandia Corporation was issued the first US patent [59], which exploited acoustic
properties of the ear for personal recognition. US patent no. 7826643 [60] describes a 3D ear
recognition approach that generates an eigen-ear space from the enrollment images. Another US
patent [58] describes ear identification by incorporating imaging capabilities into the
telephone, while a recent US patent [57] describes another feature extraction algorithm for 3D
ear biometrics. The 3D imaging technologies currently employed in the literature use a 3D
digitizer which can be bulky and also quite expensive. Therefore the focus of this work has
been to exploit 2D ear images that can be conveniently acquired from a low-cost digital
camera.
A variety of approaches have been explored in the literature to extract discriminant
features from 2D ear images that can characterize the gray-level distribution in these
images for accurate automated personal identification. These approaches can be broadly
grouped into four categories based on the nature of the features extracted from the
normalized ear images for the matching: (i) structural approaches, (ii) subspace learning-
based approaches, (iii) model-based approaches, and (iv) spectral approaches. Salient features
of these approaches are outlined in Table 1, as a detailed description of these approaches is
beyond the scope of this paper. A structural feature scheme generally uses well defined
geometrical features, such as the distance between the crus of helix and the ear lobe, to extract
features that can describe the shape of the ear. Such approaches are quite simple to implement
but often achieve limited performance, as it is quite challenging to robustly extract shape
features from limited-resolution 2-D ear images and ensure sufficient discrimination in the
characterization of the shape features. In this context, it may be noted that the approach
employed in [2], [8] uses a manual procedure to extract the structural features and ascertain the
uniqueness of the ear shape. The subspace learning-based approaches use normalized ear
images to construct subspaces which are built from the training data. Each unknown
normalized ear image is then projected into such subspaces. The similarity of the resulting
coefficients with those from the training images is used to ascertain the identity of the unknown
ear image. The subspace learning-based approaches using global appearance-based
representations (e.g. PCA [6]-[7], LDA [36]) are not robust in identifying and accommodating
local image variations and are therefore not expected to match the superior performance that is
often possible using nonlinear (e.g. MCPCA [4]) and local subspace feature (coefficient)
matching approaches. The model-based approaches have attracted the least attention in the ear
biometrics literature. Reference [3] described such an approach, which develops a component-
based model learned from the clustering of key points during the training phase.
Several applications of frequency and spatial-frequency domain features for the identification
of normalized ear images have been reported in the literature. Such approaches can be
Table 1: Classification of 2D ear identification methods into four categories

S. No.  Approach            Examples                            Reference
1       Structural          Voronoi diagram                     [2], [8]
                            Geometrical features                [1], [21], [26]
                            Active Shape Model                  [36]
2       Subspace learning   PCA                                 [6]-[7]
                            LDA                                 [15]
                            ICA                                 [39]
                            MCPCA                               [4]
                            SIFT                                [18], [34]
3       Model-based         Clustering of learned components    [3]
                            Multiple matcher model              [43]
                            Fractal-based encoding              [64]
4       Spectral            Fourier Descriptors                 [14]
                            1D Log-Gabor filter                 [17], [37]
                            Monogenic Log-Gabor                 [41]
                            LBP                                 [40], [42]
                            2D Gabor filter                     [37]
                            QuaternionicCode                    [41]
categorized as spectral approaches and have emerged as the most popular in the ear
biometrics literature. The spectral approaches typically characterize the normalized ear
images from a spectral-domain representation and then acquire the local phase or
orientation information to generate the ear templates for the matching.
The localization of the region of interest from 2-D ear images, prior to the feature
extraction process, can follow a manual or an automated approach. While the majority of work in
the literature [4]-[6], [12], [14], [26] uses manually segmented ear images, there have been
some promising efforts to employ automated segmentation [10], [16], [18], [37] and evaluate
the ear recognition performance. The color distribution in 2-D ear images can also be
exploited to improve the identification accuracy, and reference [5] has exploited such an
approach using sequential forward selection of color spaces. The nearest neighbor (k-NN)
classifier has the least complexity and has therefore been widely employed for feature
classification, while there have been some interesting efforts to use neural-network classifiers
in [12], [39]. The curved surface images of the human ear profile can provide more
discriminant information for ear identification. Therefore several promising efforts that
also exploit 3D range images for ear recognition have been reported [19], [23]-[24],
[32] in the literature. The slow acquisition speed of 3D ear imaging devices (such as the Vivid
910 3D digitizer), their bulk, and their high cost limit their possible application for any
commercial exploitation of ear biometrics technologies. This is possibly the reason that 2D ear
images acquired from a conventional digital camera have attracted more attention in biometrics.
The use of different evaluation protocols, databases and numbers of subjects in the ear
biometrics literature makes it very difficult to comparatively ascertain the performance.
Reference [37] has recently attempted such a qualitative comparison, which suggests
that spectral feature extraction approaches (e.g. log-Gabor filtering) that can exploit local
phase characteristics are likely to achieve superior performance on some publicly available
databases. A summary of prior work in the ear recognition literature suggests the need to look
beyond the conventional phase information and identify new features which can be more
discriminative in the normalized ear images. In this context, the sparse representation of local
ear shape orientations can be a promising alternative for robust feature representation and has
not yet attracted the attention of researchers in the literature.
2.1 Our Work
This paper investigates a new approach for more accurate ear identification using visible-
illumination 2-D ear images. Our feature extraction scheme exploits the sparse representation
of finite Radon transform based local orientation information. The neighborhood relationship
of gray-levels in the normalized ear images is encoded as the dominant gray-level feature
orientations in a local region using the local Radon transform. Our efforts detailed in this paper
to exploit the nonnegative formulation of the regularized optimization problem, to more
effectively encode the sparse orientation representation, further illustrate promising
improvement in the accuracy for the ear identification problem. The local Radon transform
(LRT) is computationally economical and highly effective in identifying continuous line
structures in 2D images, and has also been employed in finger knuckle identification [29]
and palm biometrics [38]. The experimental results presented in this paper, from two
publicly available databases, illustrate the superiority of the LRT based feature extraction and
matching strategy over several other approaches presented in the literature for the ear
verification problem. The Hessian based approach introduced in this paper can more
effectively characterize the ear shape information by encoding local curvature (second order
derivative) information, i.e., the Hessian matrix, as can also be observed from our analysis in
figure 3. Our experimental results presented in this paper suggest that the sparse
representation of local gray-level orientations using LRT consistently outperforms the LRT
based (without sparse encoding) and log-Gabor filter based approaches presented in the literature. It
may be noted that the log-Gabor filter based matching approach is used as the baseline in this
work as it has been shown to outperform [37] several other competing approaches on publicly
available ear databases.
The development of any viable personal identification system for civilian applications,
using the ear biometric, will also require automated segmentation of region-of-interest images
from face profile images which are acquired in a contactless but relatively cooperative
environment. This paper focuses on this problem and employs gray-level ear images
acquired using contactless ear imaging. The automated segmentation of the ear (region of
interest) has been achieved using the approach based on morphological operators and Fourier
descriptors as detailed in [37]. The segmented ear images are first enhanced to normalize
the influence of uneven illumination, noise and shadow. The resulting image is subjected to a
new feature extraction approach using the sparse representation of local grey-level orientation
information, using computationally attractive LRT operations, to generate the ear template for
the matching. We present extensive experimental results to ascertain the superiority of such a
sparse coding solution, using a convex optimization approach, both for the ear verification and
ear recognition problems.
The rest of this paper is organized as follows. We first detail the theoretical
formulation of the feature extraction and matching strategies in section 3 using the sparse
representation of local orientation features. This section also introduces the Hessian LRT based
local orientation encoding and the nonnegative formulation of the optimization problem to
address the loss of information. Section 4 details the experiments and results, both for the
recognition and verification problems. The discussion on the key observations and analysis is
presented in section 5, while section 6 summarizes the key conclusions from this paper.
3. Sparse Representation based Localized Feature Extraction
The local grey-level information from the normalized ear images often describes the ear shape
and the appearance of the surface texture. One possible approach to effectively represent the
distribution of such local texture information is to compute its spatial orientation across
multiple scales. Such spatial orientation information can be acquired using convolution
with the popular multiscale and multiorientation filters. However, convolution with such
filters, e.g. Gabor filters, Ordinal filters, or second Derivative of Gaussian filters, is
computationally expensive. Therefore our approach is to construct an overcomplete dictionary
using a set of binarized masks which are designed to recover the localized orientation
information from the normalized ear images. The elements of this dictionary are defined by
D = [d_1, d_2, ..., d_n] with d_i in R^k and k < n. In this work, the elements of the dictionary D, i.e., d_i, are
introduced to estimate the spatial orientation of dominant local grey-level appearances in one
of the six possible directions as illustrated in figure 2. These elements are constructed from a
set of points on a finite grid Z_q x Z_q, where Z_q = {0, 1, ..., q - 1} with q as a positive integer,
Figure 2: Estimating the dominant orientation of local gray-levels in normalized ear images
using binarized LRT masks in a 10 x 10 pixel region, in the directions of 0, 1π/6,
2π/6, 3π/6, 4π/6, 5π/6; each line is 2 pixels wide.
and defined as in the following:

    d_θ = {(i, j) in Z_q x Z_q : |(j - c_y) cos θ - (i - c_x) sin θ| ≤ w/2}        (1)

where θ in [0, π) denotes the angle between the line L_θ and the positive x-axis, w is the line
width, and L_θ is the line passing through the center (c_x, c_y) of Z_q x Z_q [29].
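The binarized mask construction described above can be sketched as follows; the window size l = 10, line width w = 2, and the six equispaced orientations follow figure 2, while the function name and the distance-to-line test are our own illustration:

```python
import numpy as np

def lrt_masks(l=10, w=2, n_orient=6):
    """Binarized localized Radon transform (LRT) masks.

    Each mask selects the pixels of an l x l window lying within a
    line of width w through the window centre, at one of n_orient
    equispaced orientations (0, pi/6, ..., 5*pi/6 for n_orient=6).
    """
    c = (l - 1) / 2.0                      # window centre
    ys, xs = np.mgrid[0:l, 0:l]
    masks = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        # signed distance of each pixel from the centre line at angle theta
        d = (ys - c) * np.cos(theta) - (xs - c) * np.sin(theta)
        masks.append((np.abs(d) <= w / 2.0).astype(float))
    return np.stack(masks)                 # shape (n_orient, l, l)

# The multi-orientation dictionary D is formed by vectorizing the masks,
# one column per orientation.
D = lrt_masks().reshape(6, -1).T           # shape (l*l, n_orient)
```
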
The feature extraction approach investigated in this work is inspired by recent advances
and exciting results in the sparse representation of biometric features [48]-[50]. We use the
sparse representation to model a computationally efficient localized Radon transform (LRT)
based dictionary which encodes the spatial orientation of the local gray-level relationships that
constitute the localized ear shape and texture features. The sparse representation uses image
patches which are uniformly sampled from the normalized (and enhanced) ear image, with
adjacent patch centers separated by the size of the l x l LRT mask. The sparse representation,
i.e. the coefficients x, of each such vectorized patch y requires the solution of the following
l1-regularized optimization problem:

    x̂ = arg min_x ||y - D x||_2^2 + λ ||x||_1        (2)
The solution to the above optimization problem can be computed by well-studied convex
optimization approaches which have been detailed in the literature [55]. We can use l1-magic
[22], FOCUSS [13], or the fast iterative shrinkage-thresholding algorithm (FISTA) [25] to solve
such an unconstrained optimization problem. In this work we employed FISTA as it is
significantly faster. The Lipschitz constant, defined in [25] and equal to twice the maximum
eigenvalue of D^T D, can be regarded as a fixed value. The magnitude of λ is empirically selected
and fixed for all the experiments in this paper. All the negative coefficients x_i are then
clipped to zero, and the resulting (clipped) coefficients from the same orientation of the LRT
dictionary are added to compute the representative orientation response f(θ). The index
of the dominant orientation is binary encoded to extract the feature template. This process is
repeated for every patch in the normalized ear images to extract the sparse representation of the
LRT based localized orientation information. This algorithm can be summarized as follows:
Algorithm: Sparse Orientation Representation using the Localized Radon Transform
Input: I (normalized ear image)
Output: T (feature template)
for each image patch y
  1: Sparse representation from the LRT dictionary:
       x̂ = arg min_x ||y - D x||_2^2 + λ ||x||_1
  2: Clipping and summation:
       f(θ) = Σ_{i : d_i has orientation θ} max(x̂_i, 0)
  3: Feature template using dominant orientation estimation:
       T(patch) = arg max_θ f(θ)
end
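The per-patch computation above can be sketched in code. This is a minimal illustration rather than the authors' implementation: the FISTA step follows [25] with the Lipschitz constant 2·λmax(DᵀD) noted earlier, while `groups` (the dictionary-column indices belonging to each orientation) is our own bookkeeping:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_l1(D, y, lam, n_iter=200):
    """FISTA for min_x ||y - D x||_2^2 + lam * ||x||_1."""
    L = 2.0 * np.linalg.eigvalsh(D.T @ D).max()   # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ z - y)
        x_new = soft(z - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

def dominant_orientation(D, y, groups, lam=0.1):
    """Steps 1-3 of the algorithm: sparse-code the patch, clip negative
    coefficients, pool per orientation, return the arg-max index."""
    x = np.maximum(fista_l1(D, y, lam), 0.0)      # clip negative coefficients
    f = np.array([x[g].sum() for g in groups])    # per-orientation pooling
    return int(np.argmax(f))
```
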
The matching score between a registered feature template P and an unknown feature
template Q can be computed as follows:

    S(P, Q) = min_{u in U, v in V} ( (1 / (m n)) Σ_{x=1}^{m} Σ_{y=1}^{n} ( P̄(x + u, y + v) ⊕ Q(x, y) ) )        (3)

where m represents the width and n the height of the binary encoded feature
templates. The registered feature template image is represented as P̄, with width and height
expanded to m + 2s and n + 2s, and ⊕ is the conventional exclusive-OR operator that
generates unity when its two operands are different and zero otherwise, while

    U = V = {-s, ..., 0, ..., s}        (4)

    P̄(x, y) = { P(x - s, y - s)   if x in [s + 1, m + s] and y in [s + 1, n + s]
              { 0                 otherwise        (5)

    a ⊕ b = { 1   if a ≠ b
            { 0   otherwise        (6)
We divide the encoded template ear images into disjoint sub-regions which are
matched using equation (3). Thus our matching strategy [47] attempts to accommodate the
influence of local image variations in the normalized ear images by matching the
corresponding template sub-regions with a small amount of shifting. The employed
approach is therefore expected to be more robust against image variations in the normalized ear
images.
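A minimal sketch of the shifted-XOR matching of equation (3), assuming the templates are 2-D arrays of dominant-orientation indices; the zero-padding and shift handling follow equations (4)-(5), while the function name and normalization details are ours:

```python
import numpy as np

def match_score(P, Q, s=2):
    """Shifted XOR distance between two encoded templates (cf. eq. 3).

    P, Q : 2-D integer arrays of dominant-orientation indices.
    The registered template P is zero-padded by s pixels on each side
    and Q is compared against every shift (u, v) in [-s, s]^2; the
    minimum normalized mismatch is returned (0 = identical).
    """
    m, n = Q.shape
    Pbar = np.zeros((m + 2 * s, n + 2 * s), dtype=P.dtype)
    Pbar[s:s + m, s:s + n] = P
    best = 1.0
    for u in range(2 * s + 1):
        for v in range(2 * s + 1):
            window = Pbar[u:u + m, v:v + n]
            best = min(best, np.mean(window != Q))
    return best
```
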
3.1 Hessian LRT
We also attempted a more accurate estimation of the local texture orientations using the
Hessian LRT. The proposed Hessian LRT is motivated firstly by the principle of least action in
classical mechanics and secondly by the success of the Hessian phase algorithm explored in [47].
As for the LRT, given a window of size l we create D rotationally equispaced paths
intersecting at the center of the window, with line width w, and choose the path with the least
squared curvature (also known as the curve of least energy) as the dominant orientation. Let
I(x, y) represent an ear image. We rotate its Hessian by θ degrees and project it along the
rotated line. It may be noted that we are not using the Hessian eigenvalues here, because the
Radon transform uses Dirac's delta to remove all points except the rotated line; in other
words, only the curvature along the line has meaning. Such rotated Hessians are computed as
follows:

    H_θ = [ cos θ   sin θ ] [ I_xx  I_xy ] [ cos θ  -sin θ ]        (7)
          [ -sin θ  cos θ ] [ I_xy  I_yy ] [ sin θ   cos θ ]

We compute the energy of each equispaced path by separately summing the squared values of
the curvature entry of H_θ along each path/orientation and choose the orientation with the
least energy. Such dominant orientations are encoded and matched in a similar manner as for
the LRT discussed in the previous section. Figure 3 illustrates that the estimated gray-level
neighborhood relationship, using the localized dominant orientation, can significantly differ
among the four approaches considered in this work.
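The least-energy orientation selection can be sketched as follows; the finite-difference Hessian, the 2-pixel line width, and the six orientations are our assumptions for illustration:

```python
import numpy as np

def hessian_lrt_orientation(patch, n_orient=6):
    """Pick the path of least squared curvature (curve of least energy).

    For each of n_orient equispaced directions, the second directional
    derivative I_tt = c^2*Ixx + 2*c*s*Ixy + s^2*Iyy is squared and summed
    along the line through the patch centre; the orientation with the
    least energy is returned.
    """
    Iy, Ix = np.gradient(patch.astype(float))   # first derivatives (d/dy, d/dx)
    Ixx = np.gradient(Ix, axis=1)
    Ixy = np.gradient(Ix, axis=0)
    Iyy = np.gradient(Iy, axis=0)
    l = patch.shape[0]
    c0 = (l - 1) / 2.0
    ys, xs = np.mgrid[0:l, 0:l]
    energies = []
    for k in range(n_orient):
        th = k * np.pi / n_orient
        cth, sth = np.cos(th), np.sin(th)
        # pixels on a 2-pixel-wide line through the centre at angle th
        on_line = np.abs((ys - c0) * cth - (xs - c0) * sth) <= 1.0
        # second directional derivative along the line direction
        Itt = cth**2 * Ixx + 2 * cth * sth * Ixy + sth**2 * Iyy
        energies.append(np.sum(Itt[on_line] ** 2))
    return int(np.argmin(energies))
```
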
Figure 3: Normalized ear image samples from the (a) IITD v1 and (b) UND datasets, with the
typical dominant grey-level orientation estimated in the local region (blue block) using the
four approaches considered in this work. The representative dominant orientation (in red)
can differ among these four methods.
3.2 NNG
The sparse representation of the local orientation features for each normalized ear
image patch is achieved by solving the regularized optimization problem in equation (2)
using FISTA [25] and then clipping all negative coefficients x_i to zero. Therefore all the
negative coefficients are effectively discarded in such a formulation. Given this loss of
information, we conjectured that it might be better to directly solve the nonnegative version of
the problem:

    min_{x ≥ 0} ||y - D x||_2^2 + λ Σ_i x_i

As it turns out, this nonnegative version can be solved by an existing implementation called
NNG [28], [31]. Henceforth, we simply call such an approach NNG and also ascertain its
effectiveness in the ear recognition experiments.
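One standard way to solve this nonnegative problem is the same FISTA iteration with the soft-threshold step replaced by a shifted nonnegative projection; the sketch below is our own illustration and not necessarily the NNG implementation of [28], [31]:

```python
import numpy as np

def fista_nonneg(D, y, lam, n_iter=300):
    """Projected FISTA for min_{x >= 0} ||y - D x||_2^2 + lam * sum(x).

    For x >= 0 the penalty lam*sum(x) is linear, so the proximal step
    reduces to max(. - lam/L, 0) instead of two-sided soft-thresholding.
    """
    L = 2.0 * np.linalg.eigvalsh(D.T @ D).max()   # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ z - y)
        x_new = np.maximum(z - (grad + lam) / L, 0.0)   # shifted projection
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```
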
4. Experiments and Results
The experimental results using the sparse representation of local orientation features, detailed
in the previous section, are reported in this section. We performed extensive experiments with
one and with two training image samples on two publicly available ear image datasets. The
evaluation approach and the results are detailed in the following subsections.
4.1 Databases
We evaluated the proposed ear identification approach on two publicly available ear image
databases. The first database is the UND dataset [6], from which images of 110 subjects are
used in our experiments. The second dataset is the IITD ear image database [33], [37], which
contains ear images from 125 subjects. The IITD database also provides a larger dataset of ear
images from 221 subjects, formed by combining the 125-subject dataset with
another dataset. The automated ear segmentation and enhancement approach is the same as
detailed in [37] for both the UND and IITD datasets. The segmented ear images are further
subjected to illumination normalization, and the resulting grayscale images of size pixels
are employed for the feature extraction. Each of the two databases was also enlarged to
include {0, 3, -3, 6, -6, 9, -9} degrees of rotated image samples (using bicubic interpolation)
to accommodate rotational variations in the ear images. The images in figure 4 illustrate
typical image samples from the two databases employed in our work.
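The rotation-based enlargement can be sketched with `scipy.ndimage.rotate`; note that `order=3` is cubic-spline interpolation, which approximates but is not identical to the bicubic interpolation used here:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_rotations(img, angles=(0, 3, -3, 6, -6, 9, -9)):
    """Enlarge the gallery with slightly rotated copies of an ear image.

    reshape=False keeps the original image size; mode='nearest' fills
    the corners exposed by the rotation with edge values.
    """
    return [rotate(img, a, reshape=False, order=3, mode='nearest')
            for a in angles]
```
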
Figure 4: Ear image samples from publicly available (a) IITD v1 and (b) UND dataset.
4.2 Verification Experiments
The comparative performance for the ear verification problem was first investigated using
the feature extraction approaches discussed in section 3. In all the verification experiments we
compare each subject to every other subject on a one-to-one basis, using the corresponding
matching criteria to ascertain the similarity distance. In order to ascertain the performance
of the proposed approach using a minimum of one training image and also using two training
images, we report experimental results from two test protocols. Test protocol A
reports the average of three tests, in each of which one of the first three images of every subject
is used as the test image while the remaining images of the respective subject are used
as training images. Test protocol B (all-to-all) reports the average performance when every
image of every subject in the dataset is in turn used as the respective training image/sample. In
each of the verification experiments, we report the average experimental results using the equal
error rate (EER) and receiver operating characteristics (ROC). For the 125 subject IITD dataset,
using protocol A with three tests, we generate 125×3 genuine comparisons and 124×125×3
imposter comparisons, and we generate 493 genuine comparisons and 124×493 imposter
comparisons using protocol B. For the 221 subject IITD dataset, using protocol A with three
tests, we generate 221×3 genuine comparisons and 220×221×3 imposter comparisons, while 793
genuine comparisons and 220×793 imposter comparisons are generated using protocol B.
Finally, for the UND dataset, we generate 110×3 genuine comparisons and 109×110×3
imposter comparisons using protocol A, and 433 genuine comparisons and 109×433 imposter
comparisons using protocol B.
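Given the genuine and imposter score sets from such comparisons, the EER can be estimated by scanning the decision threshold; this sketch assumes distance scores (smaller = more similar) and is our own illustration:

```python
import numpy as np

def eer(genuine, impostor):
    """Equal error rate for distance scores (genuine should be smaller).

    FAR(t) = fraction of impostor scores <= t (falsely accepted);
    FRR(t) = fraction of genuine scores  > t (falsely rejected).
    The EER is taken at the threshold where |FAR - FRR| is smallest.
    """
    genuine = np.asarray(genuine); impostor = np.asarray(impostor)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor <= t).mean() for t in thresholds])
    frr = np.array([(genuine > t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```
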
Figure 5: The ROC curves from the verification experiments using protocol A on the (a) 125
subject, (b) 221 subject, and (c) UND database.
The experimental results from the verification experiments are presented for three sets
of ear data (125, 221 and UND), with the log-Gabor filter approach as the baseline. Specifically,
log-Gabor filters are used with bandwidths of 2.0310 octaves and wavelengths of 18 and
54 (these filter parameters are the same as used in [37] for the log-Gabor filters and generate the
best performance). The verification results are summarized in Table 2 (best results in
blue), while the corresponding receiver operating characteristics are shown in figures 5-6.
Table 2: Average equal error rate (%) from the verification experiments.

                    Protocol A                      Protocol B
            125       221       UND       125       221       UND
NNG       1.8688    2.1106    5.1473    1.6227    2.0185    5.3065
s-LRT     2.3817    2.1428    5.4518    2.0300    2.0168    5.0808
LRT       4.2677    3.0481    7.0684    3.8679    2.9070    7.0132
Hessian   5.2806    3.6285    9.8471    4.6416    3.6553    9.9731
LG        4.3183    3.1335    8.1818    4.0552    3.1274    7.3956
Figure 6: The ROC curves from the verification experiments using protocol B on the (a) 125
subject, (b) 221 subject, and (c) UND database.
The experimental results from the verification experiments using the 125 subject ear dataset
with protocol A suggest that the sparse representation approach using the nonnegative
formulation (NNG) achieves the best performance (EER = 1.87%). For the 221 subject
database, the best performance using protocol A is again achieved by the sparse representation
using the nonnegative formulation (EER = 2.11%). It is worth noting that the performance from
NNG and s-LRT is very close (ROCs in figures 5-6). However, this performance is significantly
improved compared to that achievable using the log-Gabor filter based approach. The
best performance for the 125 subject dataset using protocol B is again achieved using NNG
(EER = 1.62%). The corresponding best performing approaches for the 221 subject database are
NNG (EER = 2.02%) and s-LRT (EER = 2.02%). For the UND dataset, NNG achieves the
best results (EER = 5.15%) using protocol A while s-LRT achieves the best results (EER =
5.08%) using protocol B. Although the experimental results for the verification experiments
using the Hessian based approach illustrate competing performance in some cases (table 2), it
has overall performed the poorest among all the approaches considered in this work. It is worth
noting that there has not been any prior effort in the literature to explore ear verification or
recognition performance using LRT, Hessian LRT, or the sparse representation of orientation
features using LRT or NNG (section 3.2). Our experimental results (tables 2-3) suggest that
LRT, NNG and s-LRT consistently outperform the conventional log-Gabor filter based approach
explored in the literature.
4.3 Recognition Experiments
In all the recognition experiments, we follow the same two protocols as discussed in section
4.2 on the same three sets of data. Test protocol A reports the average recognition accuracy
from three tests, in each of which one of the first three images of every subject is used as the
probe image while the remaining respective images are used as gallery images. Test protocol B
reports the average recognition performance when every image of every subject is in turn
used as the gallery while the other respective images of that subject are used as the probe images.
The average rank-one recognition rate (R1RR) from all the experiments is reported in
Table 3 (best results in blue). The cumulative match characteristic (CMC) curves from each of
the recognition experiments are illustrated in figures 7-8.
Figure 7: The CMC curves from the recognition experiments using protocol A on the (a) 125-
subject, (b) 221-subject, and (c) UND databases.
Figure 8: The CMC curves from the recognition experiments using protocol B on the (a) 125-
subject, (b) 221-subject, and (c) UND databases.
The recognition experiments using protocol A and protocol B on all the datasets
considered in this work illustrate that the sparse representation using LRT (s-LRT) achieves
the best performance, i.e., rank-one recognition accuracy, among all the approaches considered.
The experimental results shown in table 3 suggest that the sparse representation using NNG
achieves performance competitive with s-LRT, and both are significantly better than the LRT,
Hessian, and log-Gabor filter based approaches. It is interesting to note that LRT achieves the
worst performance on the UND database under both protocol A and protocol B. However, the
performance from LRT on the 221-subject database is better than that from the Hessian
approach under both protocols. The best performing approach for the recognition experiments
(s-LRT) is also robust against changes in the test protocol. Our experimental results for both
the recognition and verification experiments suggest that the sparse representation of local
features using LRT achieves significantly improved performance over the baseline (log-Gabor
and LRT) approaches considered in this work.
Table 3: Average rank-one recognition rate from the experiments.

                     Protocol A                         Protocol B
            125       221       UND          125       221       UND
NNG       96.8000   96.6817   91.5152      97.1602   96.9735   92.3788
s-LRT     97.0667   97.7376   91.5152      97.5659   97.8562   92.6097
LRT       94.4000   94.2685   84.8485      94.9290   94.4515   84.7575
Hessian   95.4667   93.8160   86.0606      95.9432   94.0731   85.9122
LG        95.7333   94.7210   87.8788      95.7404   94.8298   88.9145
5. Discussion
There has been no prior effort to exploit the sparse representation of local gray-level
information, or even the localized Radon transform, for the ear identification problem, while
the log-Gabor filter based approach has shown its superiority [37] over the Gabor phase, force
field transform, Gabor orientation and geometrical feature based approaches on the publicly
available ear database. We therefore employed the log-Gabor filter based phase encoding
approach and the localized Radon transform based orientation encoding approach as baseline
approaches to evaluate their performance against the sparse representation based approach
developed in this paper. The experimental results from the verification and recognition
experiments have consistently suggested the superiority of the sparse representation of
orientation features using LRT.
The goals for the verification and the recognition problems are different [51]. Therefore,
a feature extraction or matching approach that performs better for the verification problem
may not necessarily be the best performing approach for the recognition problem. Our
experimental results from the verification experiments (table 2) also suggest that LRT itself
can achieve better performance than the log-Gabor filter based approach while achieving
competing or similar performance in the recognition experiments (table 3). In this context, it
may be noted that the LRT is particularly attractive because of its computational simplicity
[29]. Therefore, in the case of competing or similar performance from the log-Gabor and LRT
based approaches, the LRT based approach is preferable primarily due to its computational
simplicity.
The use of such an LRT based dictionary brings two key advantages for sparse
representation based ear identification. Firstly, it significantly reduces the computational
complexity, which can be explained as follows. The computational complexity of spatial
filters, e.g., Gabor filters or derivative of Gaussian filters, can be significant as they require
expensive pixel-based convolution operations. The LRT based feature extractor, on the other
hand, requires only simple summation operations, which significantly reduces the
computational complexity during the feature extraction process. If we reasonably assume†
that the complexity of addition and multiplication is equivalent, then the LRT (with line width
w) based sparse representation employed in this paper requires 2Rw times fewer operations
(additions/multiplications) than approaches using spatial filters (with size ) in the sparse
representation literature. In our experiments we also observed that this reduction in
complexity comes with superior performance.
† Exact computational complexity of multiplication is higher than that of addition.
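The summation-only nature of the LRT can be illustrated with a minimal sketch of a localized Radon transform over an image patch. The mask geometry, orientation count and line width below are illustrative assumptions, not the exact parameters used in this paper:

```python
import numpy as np

def line_mask(size, theta, width):
    # Binary mask of a line of the given width through the patch centre.
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    # Perpendicular distance of each pixel from the centred line at angle theta.
    d = np.abs(-(x - c) * np.sin(theta) + (y - c) * np.cos(theta))
    return d <= width / 2.0

def localized_radon(patch, num_orientations=6, width=3):
    # One summation per orientation: only additions are needed per response,
    # unlike pixel-wise convolution with spatial filters.
    size = patch.shape[0]
    thetas = np.arange(num_orientations) * np.pi / num_orientations
    return np.array([patch[line_mask(size, t, width)].sum() for t in thetas])

patch = np.zeros((11, 11))
patch[5, :] = 1.0                       # a bright horizontal line feature
responses = localized_radon(patch)
print(np.argmax(responses))             # index of the dominant orientation
```

The orientation index with the largest summed response encodes the locally dominant line direction, which is the quantity the LRT feature extractor records per block.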
Figure 9 reproduces sample results from our experiments comparing performance
when a second derivative of Gaussian (s-DoG) dictionary is employed. These results were
obtained using two versions of s-DoG: one of initial size 33×33 with (σx, σy) pairs of (2.4,
6.0), (2.7, 6.75), (3.0, 7.5), and a scaled-down version of size 22×22 with (σx, σy) pairs of
(1.4, 4.0), (1.8, 4.5), (2.0, 5.0). The choice of 22 is informed by our experience in selecting
the best among various window sizes. The experimental procedures and protocols are the
same as in section 4. The experimental results suggested significantly improved performance
with the sparse representation using LRT as compared to s-DoG.
Figure 9: Comparative ROC‡ from using s-DoG filters using (a) protocol A and (b) protocol B.
‡ All the experiments in this figure use three enlargements simply to conserve computations.
The second key advantage of the LRT based feature extraction strategy pursued in this
paper is that it significantly reduces the template size. This reduction results from the nature
of the LRT operations, which have an associated down-sampling effect that reduces the
dimension of the extracted features relative to the size of the normalized ear images. For
example, if the size of the normalized ear images is M × N pixels, the template size will be
(M/w) × (N/w). The template size therefore reduces by a factor of w2 relative to the size of the
normalized/original ear image. It may be noted that in most cases the line-like features
(representing piecewise linear approximations of curved ear lines) in the normalized ear
images are more than one pixel wide, which guarantees significant computational savings in
most cases. The reduction in template size not only reduces the storage requirements but also
improves the matching time, i.e., the time required for authentication or identification, as
smaller templates can more efficiently generate the matching scores.
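The template-size reduction can be checked with simple arithmetic; the image dimensions below are hypothetical:

```python
# Hypothetical normalized ear image of M x N pixels with LRT line width w.
M, N, w = 180, 60, 3
image_size = M * N                      # pixels in the normalized ear image
template_size = (M // w) * (N // w)     # (M/w) x (N/w) orientation entries
print(image_size // template_size)      # reduction factor of w**2 = 9
```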
The sparse representation using NNG is expected to be better than s-LRT, but in our
experiments it did not always outperform s-LRT (their ROC curves are quite similar). It is
worth noting that for real variables the l1-minimization problem min ‖x‖1 subject to Ax = b
is equivalent to the nonnegative version of the same problem min Σi zi subject to
[A −A]z = b, z ≥ 0; here the optimal x can be obtained as x = z1 − z2 (with z = [z1; z2]) and
vice versa [11]. This means that FISTA is equivalent to NNG with a negated version of the
dictionary appended to the original dictionary (modulo sign changes). This may explain why
the performance of the s-LRT and NNG based approaches can be similar.
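This equivalence can be verified numerically: the l1 problem becomes a linear program over the appended dictionary [A, −A] with nonnegativity constraints. The matrix A, vector b and dimensions below are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 20))        # toy over-complete dictionary
x_true = np.zeros(20)
x_true[[3, 11]] = [1.5, -2.0]           # a 2-sparse signed coefficient vector
b = A @ x_true

# min ||x||_1 s.t. Ax = b  <=>  min 1'z s.t. [A, -A]z = b, z >= 0,
# with the signed solution recovered as x = z[:n] - z[n:].
n = A.shape[1]
res = linprog(c=np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * n), method="highs")
x = res.x[:n] - res.x[n:]
print(np.allclose(A @ x, b, atol=1e-6))  # recovered x satisfies the constraint
```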
Despite its elegant differential geometric formulation to exploit the least-squared curvature
from ear images, the Hessian LRT based approach could not achieve competing performance
in our experiments. There are several plausible explanations for this: first, the least-energy
curve might not be a good model for ear shape lines; second, Hessians might not be robust to
noise; and third, our proposal is equivalent to the trace transform with a curvature functional,
and it has often been conjectured that the trace transform fails to work [52] for anything other
than the Radon functional. The sparse coefficients (equation 2) were computed with Matlab
implementations while the template matching scores (equation 3) were generated from the
C++ implementation. We employed Farid's [53] 7-tap second-order differentiator (as
implemented by Kovesi [20]) to compute the Hessians. Farid's differentiators are specially
optimized for rotations, so we do not need additional smoothing (e.g., the force field
transform [10]) for good results.
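A minimal sketch of computing the per-pixel Hessian entries, using numpy's simple central differences as a stand-in for Farid's optimized 7-tap differentiator (the ridge image is illustrative):

```python
import numpy as np

def hessian_entries(img):
    # Second-order partial derivatives of the image via finite differences;
    # np.gradient returns derivatives along axis 0 (y) then axis 1 (x).
    gy, gx = np.gradient(img.astype(float))
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    return gxx, gxy, gyy

img = np.zeros((9, 9))
img[4, :] = 1.0                         # a bright horizontal ridge
gxx, gxy, gyy = hessian_entries(img)
# Curvature across the ridge appears in gyy; along the ridge gxx stays flat.
print(abs(gyy[4, 4]) > abs(gxx[4, 4]))
```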
Figure 10: Comparative ROC from quaternionic quadrature filters using protocol A.
We have recently developed the QuaternionicCode representation [41] for ear recognition
using quadrature filters and achieved promising results. We compared the approach proposed
in this paper with that using such quadrature filters under the same protocols (all parameters
and settings are the same as in [41]) and consistently achieved superior results, both for the
verification and the recognition problems. Some sample results from the comparative
experiments are reproduced in figure 10. It can also be observed from figure 9 that the sparse
representation based approach developed in this paper can achieve significantly improved
performance over that in [37].
The experimental results illustrated in section 4 consistently suggest superior
recognition results using the feature encoding based on the sparse representation of local pixel
orientations (NNG and s-LRT) as compared to those using the conventional 1-D log-Gabor
filter based approach; the sparse representation approach is therefore recommended for both
ear recognition and verification.
6. Conclusions
This paper has developed a new approach for the personal identification using 2D ear images.
This approach has been motivated by the recent advances in the sparse representation of
biometric features using the convex optimization approach. The ear-shape in sparse image
representations can be represented by large coefficients that locally present some oriented ear
shape structures. In order to efficiently compute these coefficients on prior structures, a Radon
transform based dictionary of oriented directions is constructed. Our experimental results
presented of two publicly available ear biometric image databases illustrate significant
improvement in the performance over the best or the baseline approaches (using log Gabor,
monogenic quadrature filter, QuaternionicCode or 2-D quadrature filter), both for the ear
verification and recognition problem. One of the key advantages of selecting the Radon
transform based dictionary for sparse representation is related to it’s the computational
simplicity as it just requires simple summation operations. Another favorable factor for using
the LRT based dictionary lies in the significantly reduced template size as the LRT operations
have associated down sampling effect which reduces the template size by a factor of w2 (w as
the line width of LRT mask) as compared to those using Gabor filters, derivative of Gaussian
(DoG) filters or other spatial filters.
The efforts detailed in this paper to exploit the sparse representation of local ear shape
descriptors have illustrated superior performance for the automated ear identification problem.
However, further work is required to ascertain the effectiveness of sparse representation for
the identification of partial ear images or of ear images occluded by hair, reflections, etc.
Such occlusion typically corrupts only a fraction of the segmented ear region pixels and is
therefore sparse in the standard basis given by individual pixels. When the error has such a
sparse representation, it can be handled uniformly, similarly to [49]. Our further efforts will
therefore extend the developed sparse representation framework to help compensate for the
commonly occurring occlusions in segmented ear images.
7. References
[1] A. Iannarelli, Ear Identification, Forensic Identification Series, Paramount Publishing Company,
Fremont, California, 1989.
[2] M. Burge and W. Burger, “Ear Biometrics,” Biometrics: Personal Identification in networked society,
A. K. Jain, R.Bolle, S. Pankanti (Eds), pp. 273-286, 1998.
[3] B. Arbab-Zavar and M. S. Nixon, “On guided model-based analysis of ear biometrics,” Computer Vis. &
Image Understanding, vol. 115, pp. 487-502, 2011.
[4] Y. Xu, D. Zhang, J.-Y. Yang, “A feature extraction method for use with bimodal biometrics,” Pattern
Recognition, pp. 1106-1115, vol. 43, 2010.
[5] L. Nanni and A. Lumini, “Fusion of color spaces for ear authentication,” Pattern Recognition, vol. 42,
pp. 1906-1913, 2009.
[6] K. Chang, B. Victor, K. W. Bowyer, and S. Sarkar, “Comparison and combination of ear and face
images in appearance-based biometrics,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 8, pp.
1160-1165, 2003.
[7] D. J. Hurley, M. S. Nixon, J. N. Carter, “Force field energy functionals for ear biometrics,” Computer
Vision and Image Understanding, vol. 98, no. 3, pp. 491- 512, 2005.
[8] M. Burge, W. Burger, “Ear biometrics in machine vision,” Proc. 21st Workshop of the Australian
Association for Pattern Recognition, 1997.
[9] B. Victor, K.W. Bowyer, S. Sarkar, “An Evaluation of face and ear Biometrics,” Proc. Intl. Conf. on
Pattern Recognition, pp. 429-432, 2002.
[10] D. J. Hurley, M. S. Nixon, and J. N. Carter, “Force field energy functionals for image feature
extraction,” Image and Vision Computing, vol.20, no. 5-6, pp. 311-318, 2002.
[11] M. Fornasier, Numerical methods for sparse recovery, De GRUYTER, 2010
http://www.ricam.oeaw.ac.at/people/page/fornasier/FornasierLinz.pdf
[12] B. Moreno, A. Sanchez, J.F. Velez, “On the use of outer ear images for personal identification in
security applications,” Proc. 33rd Annual Intl. Carnahan Conf., Madrid, pp. 469-476, 1999.
[13] B. I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using focus: a
reweighted minimum norm algorithm,” IEEE Trans. Signal Processing, vol. 45, pp. 600-616, Mar.
1997.
[14] A. F. Abate, M. Nappi, D. Riccio, and S. Ricciardi, “Ear recognition by means of a rotation invariant
descriptor,” Proc. ICPR 2006, Hong Kong, 2006.
[15] Z. Xiaoxun and J. Yunde, “Symmetrical null space for face and ear recognition,” Neurocomputing,
pp. 842-848, vol. 70, 2007.
[16] M. Adbel-Mottaleb and J. Zhou, “Human ear recognition from face profile images,” Proc. ICB 2006,
LNCS 3832, pp. 786-792, 2006.
[17] A. Kumar and D. Zhang, “Ear authentication using log-Gabor wavelets,” Proc. SPIE, vol. 6539, pp.
65390A, doi:10.1117/12.720244, 2007.
[18] J. D. Bustard and M. S. Nixon, “Towards unconstrained ear recognition from two dimensional
images,” IEEE Trans Sys. Man Cybern., Part C, vol. 40. no. 3, pp. 486-494, May 2010.
[19] P. Yan and K. W. Bowyer, “Biometric recognition using 3D ear shape,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 29, no. 8, pp. 1297-1308, 2007.
[20] P. Kovesi, DERIVATIVE7 - 7-tap 1st and 2nd discrete derivatives, 2010.
http://www.csse.uwa.edu.au/~pk/research/matlabfns/Spatial/derivative7.m
[21] M. Choras, “Ear biometrics based on geometrical feature extraction,” Electronics Lett,. Comput. Vis.
& Image Analy., vol. 5, no. 3, pp. 84-95, 2005.
[22] E. Candès and J. Romberg, “l1-magic: Recovery of sparse signals via convex programming,”
Technical Report, California Institute of Technology, 2005.
http://users.ece.gatech.edu/~justin/l1magic/downloads/l1magic.pdf
[23] B. Bhanu, H. Chen, “Human ear recognition in 3D,” Proc. Multimodal User Authentication
workshop (MMUA), Santa Barbara, CA, pp. 91-98, 2003.
[24] H. Chen and B. Bhanu, “Human ear recognition in 3D”, IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 29, no. 4, pp. 718-737, Apr. 2007.
[25] A. Beck and M. Teboulle, “A fast iterative shrinkage thresholding algorithm for linear inverse
problem,” SIAM Journal on Imaging Sciences, vol. 2, pp. 183–202, 2009.
[26] Z. Mu, L. Yuan, Z. Xu, D. Xi and S. Qi, “Shape and structural feature based ear recognition,”
Sinobiometrics 2004, LNCS 3338, pp. 663-670, 2004.
[27] L. Yuan, Z. Mu, and Z. Xu, “Using ear biometrics for personal recognition,” Proc. IWBRS 2005,
LNCS 3781, pp. 221-228, 2005.
[28] L. Breiman, “Better subset regression using nonnegative garrote,” Technometrics, vol. 37, pp. 373-
384, 1995.
[29] A. Kumar, and Y. Zhou, “Human identification using knucklecodes,” Proc. IEEE 3rd International
Conference on Biometrics: Theory, Applications, and Systems, BTAS '09, Washington D.C., pp. 1-6,
2009.
[30] NIST Report, Summary of NIST standards for biometric accuracy, tamper resistance and
interoperability, 2002. http://biometrics.nist.gov/cs_links/pact/NISTAPP_Nov02.pdf
[31] X. Liu, S. Yan, and H. Jin, “Projective nonnegative graph embedding,” IEEE Trans. Image
Processing, vol. 9, no. 5, pp. 1126-1137, May 2010.
[32] T. Theoharis, G. Passalis, G. Toderici, and I. Kakadiaris, “Unified 3d face and ear recognition using
wavelet on geometry images,” Pattern Recognition, vol. 41, pp. 796-804, 2008.
[33] IIT Delhi Ear Database Version 1, http://web.iitd.ac.in/~biometrics/Database_Ear.htm
[34] D. R. Kisku, H. Mehrotra, P. Gupta, and J. K. Sing, “SIFT-based ear recognition by fusion of
detected keypoints from color similarity slice regions,” CoRR abs/1003.5861, 2010.
[35] E. Mordini and D. Tzovaras (Eds), Second Generation Biometrics: The Ethical Legal and Social
Context, The International Library of Ethics and Technology 11, Springer, 2012, DOI 10.1007/978-
94-007-3892-8_3.
[36] L. Lu, X. Zhang, Y. Zhao, Y. Jia, “Ear recognition based on statistical shape model,” Proc. First Intl.
Conf. Innovative Computing, Information and control, ICICIC’06, pp. 1-4, 2006.
[37] A. Kumar and Chenye Wu, “Automated human identification using ear imaging,” Pattern
Recognition, vol. 45, pp. 956-968, Mar. 2012.
[38] W. Jia, D.-S. Huang, and D. Zhang, “Palmprint verification based on robust line orientation code,”
Pattern Recognition, vol. 41, no. 5, pp. 1504-1513, May, 2008.
[39] H. Zhang, Z. Mu, W. Qu, L. Liu, and C. Zhang, “A novel approach for ear recognition based on ICA
and RBF network,” Proc. 4th Intl. Conf. Machine Learning & Cybern., pp. 4511-4515, 2005.
[40] Y. Wang, Z. Mu, and H. Zheng, “Block-based and multiresolution methods for ear recognition using
wavelet transform and uniform local binary pattern,” Proc. ICPR’08, ICPR, 2008.
[41] T.-S. Chan and A. Kumar, “Reliable ear identification using 2D quadrature filters,” Pattern Recognition
Letters, 2012.
[42] M. I. S. Ibrahim, M. S. Nixon, and S. Mahmoodi, “The effect of time on ear biometrics,” Proc. IJCB
2011, Washington DC, Oct. 2011.
[43] L. Nanni and A. Lumini, “A multi-matcher model for ear recognition,” Pattern Recognition, vol. 28,
pp. 2219-2216, Dec. 2007.
[44] K. Singh, “Altered fingerprints,” Interpol Report, INTERPOL. France, 2008,
http://www.interpol.int/Public/Forensic/fingerprints/research/alteredfingerprints.pdf.
[45] Hong Kong airport security fooled by these hyper-real silicon masks. http://www.cnngo.com/hong-
kong/visit/hong-kong-airport-security-fooled-these-hyper-real-silicon-masks-743923 Accessed 27
Jan 2012.
[46] Fake biometric eye stamps: Three arrested at Dubai international airport,
http://innovya.com/2011/09/fake-biometric-eye-stamps-three-arrested-at-dia Accessed 28 January
2012
[47] Y. Zhou and A. Kumar, “Human identification using palm-vein images,” IEEE Trans. Info.
Forensics & Security, Dec. 2011.
[48] D. Xu, Y. Huang, Z. Zeng, and X. Xu, “Human gait recognition using patch distribution feature and
locality-constrained group sparse representation,” IEEE Trans. Image Processing, vol. 21, pp.316-
326, Jan. 2012.
[49] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse
representation,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 210-227, Feb. 2009.
[50] W. Zuo, Z. Lin, Z. Guo, and D. Zhang, “The multiscale competitive code via sparse representation
for palmprint verification,” Proc. CVPR 2010, pp. 2265-2272, Jun. 2010.
[51] R. M. Bolle, J. H. Connell, S. Pankanti, N. K. Ratha, and A. Senior, “The relation between the ROC
curve and the CMC,” Proc. 4th IEEE Workshop on Automatic Identification Advanced Technologies,
pp. 15-20, 2005.
[52] A. Kumar, Ch. Srikanth, “Online personal identification in night using multiple representation,”
Proc. ICPR 2008, pp. 1-4, Tampa, Florida, Dec. 2008
[53] H. Farid and E. Simoncelli, “Differentiation of discrete multi-dimensional signals,” IEEE Trans.
Image Processing, vol. 13, pp 496-508, Apr. 2004.
[54] J. A. Heathcote, “Why do old men have big ears?” BMJ, vol. 311, p. 1668, Dec. 1995.
[55] E. van den Berg and M. P. Friedlander, SPGL1: A solver for large-scale sparse reconstruction,
http://www.cs.ubc.ca/labs/scl/spgl1/, 2010
[56] Michael P. D'Alessandro, A digital library of anatomy information
http://www.anatomyatlases.org/firstaid/Otoscopy.shtml, 2012
[57] A. A. C. M. Kalkar and A. H. M. Akkermans, “Feature extraction algorithm for automatic ear
recognition,” US Patent No. 20080013794, Jan. 2008.
[58] M. W. J. Coughlan, A. Q. Forbes, C. Gannon, P. R. Michaelis, P. D. Runcle, and R. Warta, “Method
and apparatus for controlling access and presence information using ear biometrics,” US Patent No.
20090061819, Mar. 2009.
[59] A. M. Bouchard and G. C. Osbourn, “Systems and methods for biometric identification using the
acoustic properties of the ear canal,” US Patent No. 5,787,187, 1998.
[60] Z. J. Geng, “Three dimensional ear biometrics system and method,” U. S Patent No. 7826643, Nov.
2010.
[61] L. Meijerman, C. v. d. Lugt, and G. J. R. Maat, “Cross-sectional anthropometric study of the external
ear,” J. Forensic Sci., vol. 52, no. 2, pp. 286-293, Mar. 2008.
[62] C. Sforza, G. Grandi, M. Binelli, D. G. Tommasi, R. Rosati, and V. F. Ferrario, “Age- and Sex-
related changes in the normal human ear,” Forensic Science Intl., 187, 110e1-110e7, 2009.
[63] Y. Asai, M. Yoshimura, N. Nago, and T. Yamada, “Correlation of ear length with age in Japan,”
BMJ, vol. 312, p. 582, Mar. 1996.
[64] M. D. Marsico, M. Nappi, and R. Daniel, “HERO: Human Ear Recognition against Occlusions,”
Proc. CVPR 2010, pp. 178-183, CVPR'W 2010, San Francisco, June 2010.
[65] Role of biometric technology in Aadhaar authentication, Authentication Accuracy Report, UIDAI,
27th March 2012.
http://uidai.gov.in/images/role_of_biometric_technology_in_aadhaar_authentication_020412.pdf