Changeable and Privacy Preserving Face Recognition, by Yongjin Wang. A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto. © Copyright by Yongjin Wang, 2010
This dissertation systematically studies random transformation based approaches for
changeable and privacy preserving face recognition. The contributions are summarized
as follows:
• We introduce a random projection based method for changeable and privacy
preserving face verification. A detailed mathematical analysis of the distance preserving
properties of random projection with i.i.d. Gaussian entries is presented. Our
analysis provides a method for computing the probability of preserving the distance
when projecting onto an arbitrary lower dimension, and demonstrates a lower bound on
the projection dimension than those presented in the literature. A geometry based approach is
presented to analyze the changeability of the proposed method, and a vector
translation method is presented for achieving strong changeability of the generated
biometric template. Two application scenarios, user-independent and user-dependent
random projections, are discussed. In both cases, the biometric template can be
regenerated by simply varying the random projection matrix generation key. We
analyze the privacy preserving property by studying how efficiently an attacker
can reconstruct the original signal, even in the worst case that the template and
the projection matrix are both known. Our analysis shows that there is a tradeoff
between accuracy and privacy, and it is possible to provide privacy protection with
slightly degraded performance. Furthermore, unlike other dimensionality reduction
tools, which usually require the collection of a large number of images for training,
the proposed method uses random projection for both dimensionality reduction and
privacy preservation. It is data-independent and easy to implement.
• We propose a novel approach for face recognition based on biometric features in
the continuous domain. Instead of using the original features as templates and
for biometric matching, the proposed method stores only a set of sorted index
numbers (SIN), which are obtained by sorting the original features and storing the
corresponding indices. A new algorithm is introduced to measure the similarity
between SIN vectors. We analyze the privacy preserving property of the proposed
method, and introduce two privacy measures to evaluate the level of privacy pro-
tection for both the individual attributes and global characteristics of the feature
vectors. Our experimental results suggest that the proposed method may improve
the recognition accuracy.
• We further present the application of the SIN method in conjunction with three
types of random transformations, namely random additive transform, random mul-
tiplicative transform, and random projection, for achieving both changeability and
privacy protection. The statistical properties of each random transformation in
both same key and different key scenarios are analyzed to provide insight into how
strong changeability can be obtained. The element and vector level privacy pro-
tecting characteristics of each random transformation in combination with SIN are
analyzed and demonstrated. Our analysis and experimental results demonstrate
that the proposed methods are capable of producing privacy preserving biomet-
ric templates with strong changeability, while maintaining and even improving the
recognition accuracy of the original features.
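
The contributions listed above can be illustrated with a minimal sketch, assuming a key-seeded Gaussian projection followed by sorted index numbers. The function names, the 1/sqrt(k) scaling, the descending sort order, and the toy feature vector are illustrative assumptions rather than the construction analyzed in later chapters; the similarity measure between SIN vectors is not shown because it is not specified here.

```python
import random

def projection_matrix(key, k, n):
    """Return a k x n matrix of i.i.d. Gaussian entries regenerated from `key`."""
    rng = random.Random(key)
    scale = 1.0 / k ** 0.5  # assumed scaling: keeps the expected squared length unchanged
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]

def project(matrix, x):
    """Project the feature vector x onto the rows of the matrix."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]

def sorted_index_numbers(v):
    """Indices of v listed in descending order of value (direction is an assumption)."""
    return sorted(range(len(v)), key=lambda i: v[i], reverse=True)

def make_template(x, key, k):
    """Changeable template: project with a key-dependent matrix, then keep only the SIN."""
    return sorted_index_numbers(project(projection_matrix(key, k, len(x)), x))

# Hypothetical 8-dimensional feature vector.
features = [0.9, -1.2, 0.3, 2.4, -0.5, 1.1, 0.0, -2.0]
old_template = make_template(features, key=1234, k=4)
same_template = make_template(features, key=1234, k=4)   # same key: identical template
new_template = make_template(features, key=9876, k=4)    # new key: reissued template
```

Only the index vector is stored in this sketch, so neither the original feature values nor the projected values appear in the template, and reissuing a template amounts to choosing a different key.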
1.4 Organization
The organization of this dissertation is as follows:
Chapter 1 presents the general background of biometrics, changeability and privacy
issues, the contribution of this research, and the organization of this dissertation.
Chapter 2 provides a detailed review of the related works. This includes both the
biometrics based cryptographic systems and the repeatable and non-invertible transfor-
mation based cancelable biometrics methods.
Chapter 3 presents the proposed random projection based method for changeable
and privacy preserving face verification. The distance preserving property of random pro-
jection, and the changeability and privacy analysis are presented in detail. Experimental
evaluation is performed on a complex generic data set and the results are presented.
Chapter 4 introduces the proposed sorted index number (SIN) approach for face
recognition. The rationale of the proposed SIN method as well as the privacy protecting
property are discussed. The experimental results on both identification and verification
scenarios are reported.
Chapter 5 introduces a framework for the integration of random transformations
with the SIN approach, to achieve reissuable and privacy protecting biometric template
generation. The changeability and privacy preserving properties of three types of random
transformations are analyzed and discussed. The effectiveness of the proposed methods
is demonstrated through extensive experimentation.
Chapter 6 summarizes the work presented in this dissertation and outlines the di-
rections for future research.
The technical contents of Chapters 3, 4, and 5 have been submitted to or have appeared in
the following refereed journal and conference publications:
• Y. Wang, D. Hatzinakos, On random transformations for changeable face verifica-
tion, submitted to IEEE Transactions on Systems, Man and Cybernetics, Part B,
January 2010.
• Y. Wang, D. Hatzinakos, Sorted index numbers for privacy preserving face recog-
nition, EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID
260148, 16 pages, 2009. doi: 10.1155/2009/260148.
• Y. Wang, K. N. Plataniotis, An analysis of random projection for changeable and
privacy preserving biometric verification, IEEE Transactions on Systems, Man and
Cybernetics, Part B, In Press. Reprint permission granted by IEEE.
• Y. Wang, D. Hatzinakos, Cancelable face recognition using random multiplicative
transform, accepted by the 20th International Conference on Pattern Recognition
(ICPR), Istanbul, Turkey, August 23-26, 2010.
• Y. Wang, D. Hatzinakos, Random translational transformation for changeable face
verification, Proceedings of IEEE 16th International Conference on Digital Signal
Processing (DSP), pp. 1-6, Santorini, Greece, July 5-7, 2009.
• Y. Wang, D. Hatzinakos, Face verification with changeable templates, Proceed-
ings of IEEE 22nd Canadian Conference on Electrical and Computer Engineering
(CCECE), pp. 31-36, St. John’s, Newfoundland and Labrador, Canada, May 3-6,
2009.
• Y. Wang, D. Hatzinakos, Face recognition with enhanced privacy protection, Pro-
ceedings of IEEE International Conference on Acoustics, Speech, and Signal Pro-
cessing (ICASSP), pp. 885-888, Taipei, Taiwan, April 19-24, 2009.
• Y. Wang, K. N. Plataniotis, Face based biometric authentication with changeable
and privacy preservable templates, Proceedings of IEEE Biometrics Symposium
(BSYM), pp. 1-6, Baltimore, USA, September 11-13, 2007.
Chapter 2
Literature Review
The advances in biometric technology have significantly improved the recognition perfor-
mance of various biometric modalities over the past two decades. To support the potential
deployment of biometrics in a broad spectrum of government and civilian applications,
it is important to offer changeability and privacy protection to the biometric templates.
Secure biometric recognition has drawn extensive attention in the research community
in the past few years, with a large number of solutions being proposed for various
biometric traits. The design of a privacy preserving biometric system is critically dependent
on the characteristics of the biometric data and features. Existing works on privacy
preserving biometric recognition can be roughly grouped into two categories, namely
biometric crypto-systems which combine biometrics with cryptographic technology, and
cancelable biometrics that employ repeatable and non-invertible transformations. This
chapter provides a review of the related works.
2.1 Biometric Crypto-system
Cryptography is an important technique in information security and related applications,
particularly in encryption, authentication, and access control. Different cryptographic
algorithms have been introduced in the past. In general, data are secured using symmetric
cipher systems, while public-key systems are used for digital signatures and key exchange
among users. The level of security relies on the secrecy of the secret key for symmetric
systems, and of the private key for public-key systems. Due to the large size of the cryptographic
key, a short password is usually used to encrypt the cryptographic key. The users only
need to remember the short password to retrieve their cryptographic key. When following
this strategy, the cryptographic key has the same level of security as that of a password,
which can be forgotten or stolen.
Biometrics and cryptography are two complementary security technologies. Biometrics
utilize the unique characteristics of an individual and hence provide true user
authentication. By combining them, a high level of security can be expected. The major
difficulty in combining biometrics with cryptographic systems stems from the drastic
variation of biometric representation and the imperfect nature of biometric feature
extraction and matching algorithms. Unlike password-based cryptographic systems, where exact
key generation and matching can be obtained, the biometric information of the same
person presented at different times and locations may suffer significant variation. Since it
is difficult to produce exactly the same representation, the matching of a biometric
system is usually fuzzy. Among the pioneering works of biometrics based cryptographic key
generation, Bodo [21] first proposed to use the data derived from the biometric template
as the cryptographic key in a German patent. Obviously, the noisy nature of biometrics
makes it a questionable choice when used as a cryptographic key directly. Also, in this
case, if the key is ever compromised, then this biometric is irrevocably lost.
A biometric crypto-system is essentially a process of secure binding of a cryptographic
key with biometric data, and retrieval of the key based on a new biometric presentation
and the stored template. Figure 2.1 shows a general block diagram of a biometric
crypto-system. Due to the binary nature of cryptographic keys, the biometric information is
usually required to be represented in the discrete domain. During enrollment, a discrete
feature vector is extracted from the original biometric data, and a randomly generated
key is combined with the biometric features through a binding algorithm. The resulting
representation is stored as a template, and the biometric features and the key are both
discarded. Binding should be performed in a secure way such that neither the key nor the
biometric information can be retrieved, even when the stored template is compromised.
During authentication, the original cryptographic key can be retrieved if the presented
biometric data are sufficiently similar to the enrolled data. The system outputs either
a key that can be used for decryption in a given application, or a yes/no decision: if
the hash value of the original key is also stored, the decision is positive when the hash of
the retrieved key matches it exactly. The major challenge in the design of the
binding and retrieval algorithms is to bridge the gap between the fuzziness of biometrics
and the exactitude of cryptography. The security level of a biometric crypto-system can
be evaluated by the entropy of the generated key, which measures the cryptographic
strength of the key and is usually expressed in bits. For example, for a cryptographic key of
length N, if all the bits in the key are uniformly random and independent of each other, then the
key has an entropy of N bits. The larger the entropy, the greater the uncertainty of the
key, and the higher the security level. For symmetric cipher systems, a key length of 90
bits is generally considered the minimum for strong security, and the most commonly
used are 128-bit keys [22].
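
As a small worked example of the entropy statement above, the following sketch computes the entropy of a key made of independent bits. The function names and the bias parameter `p_one` are hypothetical; the sketch only covers the independent-bit case, and correlated bits would lower the entropy further.

```python
import math

def bit_entropy(p_one):
    """Shannon entropy, in bits, of a single bit that equals 1 with probability p_one."""
    if p_one in (0.0, 1.0):
        return 0.0  # a constant bit carries no uncertainty
    return -(p_one * math.log2(p_one) + (1.0 - p_one) * math.log2(1.0 - p_one))

def key_entropy_bits(n_bits, p_one=0.5):
    """Entropy of n_bits independent bits, each equal to 1 with probability p_one."""
    return n_bits * bit_entropy(p_one)

uniform_key = key_entropy_bits(128)       # unbiased, independent bits
biased_key = key_entropy_bits(128, 0.9)   # same length, but each bit is biased
```

With unbiased bits the entropy equals the key length, while biased bits of the same length give strictly less, which is why key length alone does not determine the security level.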
Among the earliest efforts, Soutar et al. [23] introduced a fingerprint based system,
the Bioscrypt, which is also the first that has been commercialized into a product. The
proposed biometric encryption technique extracts phase information from the fingerprint
images using Fourier Transform. The extracted features are combined with a randomly
generated phase array to create two output arrays, a filter and a correlation function.
The filter function is stored in the Bioscrypt, while the correlation function is used to
link with a predefined randomly generated key, and then create an identification code.
During verification, a new image is combined with the filter function in the Bioscrypt
to produce a new output pattern, which is used to retrieve the key and compute the
identification code. The newly generated identification code is compared with the one
stored in the Bioscrypt. This paper assumes a constrained image acquisition system in which
all the fingerprint images are perfectly aligned. No performance results are reported
in this work. Moreover, Adler [16] shows that the biometric encryption technique is
vulnerable to a hill-climbing attack, where an estimate of the enrolled image can be
obtained to decrypt the code.
Figure 2.1: General block diagram of a biometric crypto-system.
Monrose et al. [24] proposed to enhance the security of passwords by combining them with
keystroke biometrics. Keystroke duration and latency features are extracted and each
feature is discretized into a single bit. A short string is formed by concatenating the
bits. The short bit string is used in conjunction with a randomized number r to gener-
ate a lookup table via Shamir secret sharing [25]. The lookup table essentially contains
instructions on how to generate a hardened password using the keystroke biometric, the
password, and the random number r. The produced hardened password is then used
to encrypt a history file. During authentication, the system uses the random number r, the lookup
table, the newly presented keystroke biometric, and the authentication password to
compute a hardened password. This newly generated hardened password is then used to
decrypt the history file. If the decryption is successful, then the authentication is
successful. However, their system can only provide around 12 bits of entropy to the password
and approximately 51.6% success rate for legitimate logins. Later, they applied a similar
scheme to cryptographic key generation based on voice [26], which is more distinctive than
the keystroke biometric. The biometric key entropy can be increased to 46 bits, and the FRR
can be decreased to below 20%.
Davida et al. [27] proposed an iris based cryptographic signature verification system
that relies on a storage device holding user-specific error correction parameters. These
parameters are used to decode and rectify the offset of biometric data, and a one-way
hash is used for verification. Furthermore, they proposed a scheme to hash the biometric
with the user’s password if the desired entropy cannot be provided. Their method
provides rigorous resolution of biometric uncertainty through Hamming error correction
codes, and the recovery of iris data is protected by complexity theory. However, the stored
error correction parameters, if compromised, can be used to reveal information about a
user’s biometric. Also, no experimental results are reported.
Juels and Wattenberg [28] introduced a fuzzy commitment scheme, which generalized
and improved Davida’s method using error correction algorithms. During enrolment, a
secret message c is selected randomly from the codewords of an error correction code C,
which can correct up to t errors, and combined with biometric features x by computing
a difference vector e = x − c. For example, if the biometric features are binary, then
an XOR operation can be applied to bind the key and the features. A hashed version
of c, denoted as h(c), and the vector e are stored as templates. During authentication,
a newly presented biometric sample with features y can be used to retrieve the original
key c by computing c’ = y − e (again an XOR in the binary case). If c’ and c are within a
Hamming distance t of each other, then the error correction capability of C makes it
possible to decode c’ to c, so that the hash of the decoded codeword equals h(c).
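
The binding and retrieval steps of the fuzzy commitment scheme can be sketched for binary features as follows. This is a minimal sketch under stated assumptions: a 5-fold repetition code stands in for the error correction code C (it corrects up to 2 bit flips per block), SHA-256 stands in for the hash h, and all function names and the toy bit strings are hypothetical.

```python
import hashlib

REPEAT = 5  # assumed code: each key bit is repeated 5 times, majority vote decodes

def encode(key_bits):
    """Map key bits to a codeword c of the repetition code."""
    return [b for b in key_bits for _ in range(REPEAT)]

def decode(code_bits):
    """Majority vote inside each block of REPEAT bits."""
    blocks = [code_bits[i:i + REPEAT] for i in range(0, len(code_bits), REPEAT)]
    return [1 if sum(block) > REPEAT // 2 else 0 for block in blocks]

def digest(bits):
    """h(c): SHA-256 over the bit list (one byte per bit, for simplicity)."""
    return hashlib.sha256(bytes(bits)).hexdigest()

def xor(a, b):
    return [u ^ v for u, v in zip(a, b)]

def enroll(x, key_bits):
    """Store e = x XOR c and h(c); x and c themselves are discarded."""
    c = encode(key_bits)
    return xor(x, c), digest(c)

def authenticate(y, e, h_c):
    """Compute c' = y XOR e, decode it, and release the key only if the hash matches."""
    key_bits = decode(xor(y, e))
    return key_bits if digest(encode(key_bits)) == h_c else None

key = [1, 0, 1, 1]
x = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]  # enrolled bits
e, h_c = enroll(x, key)

y_genuine = list(x)
y_genuine[0] ^= 1          # one flipped bit in block 0
y_genuine[7] ^= 1          # one flipped bit in block 1
y_far = list(x)
for i in (0, 1, 2):        # three flips in block 0 exceed the correction capability
    y_far[i] ^= 1

released = authenticate(y_genuine, e, h_c)
rejected = authenticate(y_far, e, h_c)
```

In this sketch the genuine probe releases the original key, while the probe with too many errors in one block decodes to a different codeword and is rejected by the hash comparison.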
Feng et al. [29] presented an iris based system which stores a string of error correction
data. A two-layer error correction technique that involves a combination of Hadamard
and Reed-Solomon codes is devised to cope with the error patterns in iris codes. The key
is generated from a user’s iris image with the aid of auxiliary error-correction data, and
can be stored in a token. The reproduction of the key will be dependent on the biometric
and token. They further show that their system can be easily extended to incorporate
other factors such as passwords. Their paper claims that up to 140 bits of biometric key
can be generated and the FRR can be improved to below 0.5%.
Draper et al. [30] introduced a fingerprint based system using distributed source
coding. A statistical model is developed to model the movement, deletions and insertions
of minutiae points. The extracted enrolment biometric information x is then compressed
and scrambled to produce a "syndrome" s using a graphical code. During authentication,
a newly presented probe biometric representation y is used to estimate x from s. The
proposed method stores s, a cryptographic hash of x, and the joint distribution of x, s, and
y. The authentication will be successful if the hash of the estimated x matches exactly
with the stored hash value of the original enrolment biometric. The security property of
this method is analyzed using information theory and random codes [31]. Experimental
results on a fingerprint data set show that the employed low-density parity-check (LDPC)
code is not yet strong enough to achieve information-theoretic security.
Juels and Sudan [32] proposed a fuzzy vault scheme that enables unordered biometric
representation. The hardness of their scheme is based on the polynomial reconstruction
problem. During enrollment, a user selects a polynomial p(x) and encodes his
cryptographic key c into the polynomial’s coefficients, where the encoding can be achieved by
dividing c into non-overlapping chunks and mapping them to the coefficients. The polynomial
p(x) is then evaluated at each element x of the user’s biometric feature set, and all the
pairs {x, p(x)} are stored as the genuine set G. The user then generates a random set of
pairs Q, which is merged with the set G to generate the final vault. Within the final vault,
it is not known whether a given point belongs to G or to Q. At verification, only when the
biometric representation of the authenticator has substantial overlap with that of the
enrolled user will the
pairs lying on the polynomial be identified and the key be reconstructed. This scheme is
expected to tolerate more variation in the biometric representation.
Clancy et al. [33] first applied the fuzzy vault scheme in a fingerprint application.
They used a bounded nearest neighbor algorithm to find canonical minutiae positions
from a set of five fingerprints of a user. The fingerprint images in the training set are used
to estimate the variance of features. To build the vault, the maximal number of chaff
points is added, subject to a preset minimal distance from the features. Their implementation
assumes pre-aligned fingerprints, and a 69-bit biometric key is derived with 30% FRR.
To address the alignment problem, Yang et al. proposed an automatic feature extraction
method [34, 35]. Their system achieves an 83% successful unlocking rate. They also show
that the chaff points must have a minimal distance to the lock set points that is at least
twice as large as the acceptable distance of a minutia position between different scans.
Uludag et al. [36] reported another implementation of fuzzy vault using fingerprints.
A 128-bit secret c together with 16 cyclic redundancy check bits is divided
into non-overlapping 16-bit segments, each of which is mapped to a real number used as a
coefficient of the constructed polynomial. Manually marked coordinates of minutiae points are
used as the biometric features. To tolerate slight variation, a simple quantization of
the minutiae data is performed. The fuzzy vault is formed by evaluating the polynomial
at the minutiae data, and adding chaff point pairs that do not lie on the polynomial.
During decoding, the probing minutiae data are compared with the vault, and the k exact
matches are identified. All possible combinations of m+1 points from the k points are
tried in order to reconstruct the degree-m polynomial. Therefore, if the query features
overlap with the template features in at least m+1 points, the
secret can be decoded for some combination. Their system achieves 21% FRR and zero FAR on a database of
100 fingerprint image pairs. One disadvantage of their approach is the high computational
complexity due to the evaluation of multiple point combinations during decoding. In a
later paper [37], they extended the fingerprint fuzzy vault scheme with
"helper" data that provide information for minutiae alignment.
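
The locking and unlocking of a fuzzy vault can be sketched as follows. This is a simplified illustration, not the implementation of [32] or [36]: it assumes arithmetic in the prime field GF(65537) instead of real-valued coefficients, integer feature values standing in for quantized minutiae, the low 16 bits of `zlib.crc32` standing in for the CRC, and brute-force Lagrange interpolation over combinations of candidate points. All names and values are hypothetical.

```python
import itertools
import random
import zlib

P = 65537  # prime just above 2**16, so every 16-bit chunk is a field element

def checksum(chunks):
    """16-bit check value over the key chunks (stand-in for the CRC in the text)."""
    data = b"".join(c.to_bytes(2, "big") for c in chunks)
    return zlib.crc32(data) & 0xFFFF

def evaluate(coeffs, x):
    """Evaluate the polynomial (coefficients low to high) at x, mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result

def interpolate(points):
    """Coefficients (low to high) of the unique polynomial through the points, mod P."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            shifted = [0] + basis                          # basis * x
            scaled = [(-xj * b) % P for b in basis] + [0]  # basis * (-xj)
            basis = [(s + t) % P for s, t in zip(shifted, scaled)]
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, P - 2, P) % P              # divide via Fermat inverse
        coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
    return coeffs

def lock(key_chunks, features, n_chaff, rng):
    """Genuine points lie on the polynomial; chaff points are chosen off it."""
    coeffs = key_chunks + [checksum(key_chunks)]
    vault = [(x, evaluate(coeffs, x)) for x in features]
    used = set(features)
    while len(vault) < len(features) + n_chaff:
        x, y = rng.randrange(1, P), rng.randrange(P)
        if x not in used and y != evaluate(coeffs, x):
            used.add(x)
            vault.append((x, y))
    return sorted(vault)  # sorting hides the order in which points were added

def unlock(vault, query, degree):
    """Try every combination of degree+1 matching points until the check value fits."""
    wanted = set(query)
    candidates = [pt for pt in vault if pt[0] in wanted]
    for combo in itertools.combinations(candidates, degree + 1):
        coeffs = interpolate(combo)
        chunks, check = coeffs[:-1], coeffs[-1]
        if max(chunks) <= 0xFFFF and checksum(chunks) == check:
            return chunks
    return None

key = [0x1234, 0xABCD, 0x0F0F]                 # three 16-bit key chunks
features = [101, 2050, 377, 4242, 915, 3001]   # hypothetical quantized features
vault = lock(key, features, n_chaff=30, rng=random.Random(7))
recovered = unlock(vault, [101, 377, 915, 3001, 4242], degree=len(key))
too_few = unlock(vault, [101, 377], degree=len(key))
```

The number of combinations grows quickly with the number of matching points, which reflects the computational cost of decoding noted in the text.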
Following a scheme similar to that in [36], a handwritten signature based cryptographic
key generation method is presented in [38], and a face based method is introduced in [39]. Lee et
al. [40] presented a fuzzy vault based private key generation system using iris features.
To produce an unordered set of features for vault encoding and decoding, multiple iris
features are extracted from several local iris patches, and the exact values of the set are
generated through the k-means clustering method. Nandakumar and Jain introduced
a fuzzy vault based multi-biometric scheme using fingerprint and iris [41]. The fuzzy
vault scheme has also been applied to a multi-party secret sharing problem [42].
However, although the fuzzy vault scheme is shown to be secure in an information-theoretic
sense, it is generally computationally complex, and also vulnerable to
cross-matching attacks when multiple templates of the same person are compromised [17].
Dodis et al. [43] presented a theoretical work for generating keys from noisy data,
where error correction codes are applied to the input followed by a hash function. In
their paper, they proposed two primitives, termed secure sketch and fuzzy extractor re-
spectively. The secure sketch only addresses the problem of error tolerance, while the
fuzzy extractor addresses both error tolerance and the nonuniformity of the input. They
showed that the fuzzy extractor can be constructed from the secure sketch using a ran-
domness extractor. Different constructions for three distance metrics, Hamming distance,
set difference, and edit distance are introduced. They also suggested a modification of the
fuzzy vault scheme by using a higher order polynomial to replace the chaff points.
However, Boyen [44] showed that the proposed fuzzy extractor is not secure under multiple
uses of the same secret.
The secure sketch scheme assumes discrete representation of biometric information.
Li et al. [45] extended the sketch to the continuous domain by first quantizing the contin-
uous features, followed by a known sketch in the discrete domain. They also introduced
the usage of relative entropy loss to measure the quality of a given quantization strat-
egy. Sutcu et al. [46] presented an implementation of [45] on a face verification problem.
Singular Value Decomposition (SVD) is first performed on the face images to extract
features. The resulting features are then randomized through a user-specific random
mapping, and a scalar quantizer is used to map the coefficients to discrete values.
During training, the midpoint and range of each dimension are recorded. The sketch is
constructed by simply computing the difference between the midpoint and the closest
codeword. At authentication, feature extraction and random mapping are performed on
the probe image. The decoder then computes the difference between the closest codeword
of the probe and the stored sketch. The original biometric can be reconstructed if the
difference is smaller than a predefined user-specific range value. Their experimental results
show that it is possible to produce better recognition accuracy compared to the original
features. However, it is not clear why user-specific random mapping is employed, and
what the performance would be if the same random mapping is applied to all the users.
A similar scheme has also been applied in a multi-modality scenario using fingerprint and
face [47], where feature fusion is performed by a simple concatenation.
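
The idea of a quantization based sketch in the continuous domain can be illustrated with a generic example. This is a sketch under stated assumptions rather than the exact construction of [45] or [46]: features are taken to be integers (for example, fixed-point values), the codewords are the multiples of a single step size, and the user-specific midpoints and ranges described above are omitted. All names are hypothetical.

```python
STEP = 10  # assumed quantization step; codewords are the multiples of STEP

def nearest_codeword(v):
    """Snap an integer value to the nearest multiple of STEP."""
    return ((v + STEP // 2) // STEP) * STEP

def make_sketch(x):
    """Per-dimension offset that moves each enrolled feature onto its codeword."""
    return [nearest_codeword(v) - v for v in x]

def recover(y, sketch):
    """Shift the probe by the sketch, snap to a codeword, then shift back."""
    return [nearest_codeword(v + s) - s for v, s in zip(y, sketch)]

enrolled = [37, -12, 58]
sketch = make_sketch(enrolled)             # stored instead of the features
close_probe = [40, -14, 55]                # each dimension is within half a step
far_probe = [46, -12, 58]                  # first dimension is off by 9

reconstructed = recover(close_probe, sketch)
wrong = recover(far_probe, sketch)
```

The enrolled features are reconstructed exactly when every dimension of the probe deviates by less than half a step, which mirrors the range condition described above; a larger deviation snaps to a neighbouring codeword and yields a different vector.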
Hao and Chan [48] proposed a handwritten signature based system for key generation.
Dynamic information such as velocity, pressure, altitude, and azimuth is extracted,
quantized, and encoded into bits. A binary string is generated by concatenating the
feature derived bits. The private key is generated by a standard digital signature algorithm
proposed by NIST. Their system achieves around 40 bits of entropy with 28% FRR and
1.2% FAR.
A helper data system was introduced in [49,50], where the enrollment biometric data
x and a selected matrix V are used to generate helper data W together with a key
c through an encoding function. In each dimension, x is quantized at a step size of
q, and the helper data W is obtained by adding a small value to c depending on the
corresponding c value. The helper data and a hashed version of c, h(c), are stored in the
database. During authentication, the input biometric data y is combined with the helper
data W to reconstruct a key c’, and h(c’) is then compared with h(c). It is assumed that
the variation in each dimension is relatively small compared to the quantization step size q.
In [51], the above mentioned scheme is applied to an acoustic ear identification problem,
and an FRR of 3.89% can be achieved with a 100-bit key length.
Kevenaar et al. [52] proposed a method to produce a binary feature vector from face
images. A set of six key objects is first identified from the human face as fiducial points,
and Gabor filters are applied to extract texture features x from a small patch centered
around every fiducial point. The mean vector of a user is compared against the mean
vector u of all the users to determine a binary feature vector q. A reliability measure is
then defined based on the normal distribution assumption of the features and only those
with high reliability are selected, denoted as b, and the indices are arranged in a vector
W1. Their experiments demonstrate that the performance of the binary feature vectors
only degrades slightly compared to the original features. To produce a renewable and
privacy preserving template, a binary string c is generated randomly, and encoded using
error correction codes. The binary vector b and the error correction encoded
random string e are combined through an XOR operation to produce helper data W2.
The mean vector u, index vector W1, helper data W2, and a one-way cryptographic
hash of c, h(c), are stored. During verification, a binary vector b’ is generated using the
authenticator’s features, u, and W1, and b’ is then combined with W2 through an XOR
operation to produce e’. Finally, c’ is obtained by decoding e’ and h(c’) is compared
with h(c). However, their approach results in an unacceptable FRR of 35% and can only
tolerate small within-class variation. Furthermore, the performance of their system is
critically dependent on the accurate localization of the fiducial facial points.
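
The binarization and reliable bit selection steps described above can be sketched as follows. The reliability measure used here (distance between the user mean and the population mean, divided by the user's own standard deviation) is an assumed form chosen for illustration, since the exact measure of [52] is not given in this review; the function name and the toy data are also hypothetical. The subsequent XOR binding with the encoded string follows the same pattern as fuzzy commitment.

```python
from statistics import mean, pstdev

def binarize(user_samples, population_mean, n_keep):
    """Return the reliable bits b and their indices W1 for one user.

    user_samples: list of feature vectors extracted from the user's enrollment images.
    population_mean: mean feature vector u over all users.
    """
    dims = range(len(population_mean))
    user_mean = [mean(s[d] for s in user_samples) for d in dims]
    # One bit per dimension: is the user's mean above the population mean?
    bits = [1 if user_mean[d] > population_mean[d] else 0 for d in dims]
    # Assumed reliability: separation from u in units of the user's own spread.
    spread = [pstdev([s[d] for s in user_samples]) or 1e-9 for d in dims]
    reliability = [abs(user_mean[d] - population_mean[d]) / spread[d] for d in dims]
    ranked = sorted(dims, key=lambda d: reliability[d], reverse=True)
    w1 = sorted(ranked[:n_keep])
    return [bits[d] for d in w1], w1

samples = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.1],
    [0.8, 6.0, 0.3, 2.9],
]
u = [0.0, 5.1, 0.5, 2.0]
b, w1 = binarize(samples, u, n_keep=2)
```

In this toy example the second dimension is discarded because the user's mean lies very close to the population mean relative to its spread, so the corresponding bit would flip easily between presentations.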
2.2 Cancelable Biometrics
To deal with the non-revocable and privacy problems in biometric systems, Ratha et
al. [15] introduced the concept of Cancelable Biometrics, which is defined as an intentional
and repeatable distortion of a biometric signal, through a chosen transform. Figure 2.2
depicts the general block diagram of a cancelable biometric system. With this approach,
every instance of enrollment can use a different transform. If one variant of the biometric
template is compromised, then a new variant can be created by simply changing the
transform control, e.g., a seed or key associated with a random number generator. In
general, the repeatable transform should be selected to be non-invertible such that even if
the exact transform and the resulting transformed biometrics are both known, the original
biometrics cannot be recovered. The distortion can be implemented either in the signal
domain, where a morphed version of the biometric signal is enrolled, or in the feature
domain, where the distortion is performed on the processed biometric signal. They also
proposed to transform the face images using a morphing method [15,20]. However, their
method requires an alignment before the transformation. Moreover, the face image may
be revealed by an adversary if the morphing function is invertible.
Following the line of cancelable biometrics, Ratha et al. [53] introduced a framework of
generating cancelable fingerprint templates. A few different methods including Cartesian,
polar and surface folding transformations of the minutiae positions are discussed analyt-
ically and empirically. Their paper demonstrates the revocability and non-invertibility
of the proposed transformations, and anticipates that the feature level cancelable bio-
metric construction can be applied in large biometric deployments. However, this work
focuses on fingerprints whose features are usually a set of unordered minutia positions,
and the number of which varies. It is not clear how such methods can be applied to other
biometrics such as face and iris, whose features are usually of fixed length and order.
Figure 2.2: General block diagram of a cancelable biometric system.
Jeong et al. [54] proposed a method that scrambles the normalized principal
component analysis (PCA) and independent component analysis (ICA) coefficient
vectors extracted from face images and adds them together to produce a new feature vector as a
template. Since the transformed template is generated by the addition of two vectors, the
original coefficients cannot be recovered. When a template is compromised, a new scramble
rule may be applied to generate a new template. Although their experiments show
that the performance does not degrade significantly compared to the original features,
no further analysis is given in the case where the scramble key is stolen.
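
The scramble-and-add idea can be sketched as follows, assuming that the scrambling rule is a key-seeded permutation of each coefficient vector. That assumption, the way the two sub-keys are derived, and the toy coefficient values are all hypothetical and are not taken from [54].

```python
import random

def scramble(vector, key):
    """Permute the vector with an order derived from the key (assumed scramble rule)."""
    order = list(range(len(vector)))
    random.Random(key).shuffle(order)
    return [vector[i] for i in order]

def make_template(pca, ica, key):
    """Scramble both coefficient vectors with different sub-keys, then add them."""
    a = scramble(pca, f"{key}-pca")   # hypothetical sub-key convention
    b = scramble(ica, f"{key}-ica")
    return [u + v for u, v in zip(a, b)]

pca = [0.40, -0.10, 0.25, 0.05, -0.30]   # hypothetical normalized PCA coefficients
ica = [0.15, 0.35, -0.20, 0.10, -0.05]   # hypothetical normalized ICA coefficients

template = make_template(pca, ica, key="user42-v1")
repeated = make_template(pca, ica, key="user42-v1")
reissued = make_template(pca, ica, key="user42-v2")
```

Each template entry is the sum of one PCA and one ICA coefficient, so the individual terms are not determined by the template alone; what an adversary can infer once the scramble key is known is the open question raised above.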
Savvides et al. [55] proposed an approach for cancelable biometric authentication in
the encrypted domain. The training face images are first convolved with a random kernel,
and the transformed images are used to synthesize a single minimum average correlation
energy filter. At the point of verification, a query face image is convolved with the
same random kernel, and then correlated with the stored filter to check similarity. If
the storage card is ever attacked, a new random kernel may be applied. They show that
the performance is not affected by the random kernel. However, it is not clear how the
system preserves privacy if the random kernel is obtained by an adversary. The original
biometrics may be retrieved through de-convolution if the random kernel is known.
Boult [56] introduced a method for face based revocable biometrics based on robust
distance measures. In this scheme, the face features are first transformed through scal-
ing and translation, and the resulting values are partitioned into the integer and the
fractional part. The integer part is encrypted using Public-Key algorithms, while the
fractional part is retained for local approximation. A user-specific pass-code is included
to address the revocation problem. In a subsequent paper [57], a similar scheme is applied
on a fingerprint problem, and a detailed security analysis is provided. Their methods
demonstrate improvement in both accuracy and privacy. However, it is assumed that the
private key can not be obtained by an imposter. If the private key and the
transformation parameters are known, the biometric features can be successfully recovered.
Lee et al. [58] introduced a two-factor method for generating cancelable fingerprint
templates using local minutiae information. A feature vector that contains the orientation
information of the neighboring area of each minutia point is first extracted. A rotation
and translation invariant value is then estimated by computing the inner product between
a user-specific random vector and the feature vector of each minutia point. The resulting
invariant value for each minutia point is used as input to two user-specific changing
functions, and the transformed features are stored as templates for authentication. The
biometric templates can be changed by simply replacing the changing functions, which
are associated with a PIN number for random number generators. However, it is shown
that the proposed method has a tradeoff between performance and changeability, i.e.,
the weaker the changeability, the higher the accuracy, and vice versa. The reported ex-
perimental results are also based on the ideal case, i.e., the user-specific PIN number is
always legitimate. No detailed analysis about the stolen PIN scenario is provided.
Teoh et al. [59] introduced a two factor authenticator, BioHashing, for cancelable
human identity recognition based on biometrics and a tokenized random number. The
base BioHashing method is composed of two steps. In the first step, a feature vector
$x \in \mathbb{R}^N$ is extracted from the biometric characteristic. The second step involves
discretization, where the extracted feature vector is reduced down to a bit vector
$b \in \{0, 1\}^M$, with $M$ the length of the bit string, $M \le N$, by using the pseudo-random
numbers generated by the given Hash key. The procedure of creating the BioHash code $b$ is as follows:
1. Use Hash key $k$ to generate a set of pseudo random vectors $r_i \in \mathbb{R}^N$, $i = 1, \ldots, M$.
2. Apply the Gram-Schmidt orthonormalization method to transform the basis $r_i$ into
an orthonormal set of vectors $\mathit{or}_i$, $i = 1, \ldots, M$.
3. Compute the inner product between the biometric feature vector $x$ and $\mathit{or}_i$,
$i = 1, \ldots, M$, denoted as $\langle x | \mathit{or}_i \rangle$.
4. Compute the $M$ bits BioHash code $b_i$, $i = 1, \ldots, M$, according to:
\[
b_i = \begin{cases} 0, & \text{if } \langle x | \mathit{or}_i \rangle \le \tau \\ 1, & \text{if } \langle x | \mathit{or}_i \rangle > \tau \end{cases} \tag{2.1}
\]
where $\tau$ is a preset threshold.
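The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in [59]: the names `biohash` and `hamming` are hypothetical, the pseudo random vectors are assumed to be drawn from a standard Gaussian (the distribution is not specified here), the Hash key is assumed to seed Python's `random.Random`, and the orthonormalization is carried out with the modified Gram-Schmidt variant.

```python
import random


def biohash(x, key, m, tau=0.0):
    """Return an m-bit BioHash code (list of 0/1) for feature vector x."""
    n = len(x)
    if m > n:
        raise ValueError("m must not exceed the feature dimension")
    rng = random.Random(key)  # step 1: the Hash key seeds the generator
    basis = []
    for _ in range(m):
        # Assumption: Gaussian pseudo random vectors r_i.
        r = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Step 2: (modified) Gram-Schmidt against the vectors kept so far.
        for q in basis:
            d = sum(a * b for a, b in zip(r, q))
            r = [a - d * b for a, b in zip(r, q)]
        norm = sum(a * a for a in r) ** 0.5
        basis.append([a / norm for a in r])
    # Steps 3 and 4: inner products, then thresholding as in Eqn. (2.1).
    return [1 if sum(a * b for a, b in zip(x, q)) > tau else 0 for q in basis]


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(u != v for u, v in zip(a, b))


features = [0.3, -1.2, 0.8, 2.1, -0.5, 0.9, -1.7, 0.4]
code = biohash(features, key=42, m=4)
```

The same key and the same features always reproduce the same code, while a different key gives a different projection basis, which is what makes the code reissuable.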
By using this method, a unique compact code is generated for each individual. The
Hash key is given to the user during the enrollment, and is different among different
users and different applications. The generated BioHash codes are compared for sim-
ilarity matching using the Hamming distance. An individual needs to have both the
correct biometrics and Hash key to pass the authentication. The BioHashing technique
is applied to a fingerprint problem, and it is claimed that zero equal error rate (EER)
can be achieved, i.e., reducing the FAR without increasing the FRR. Later, the same
method is applied to other biometric systems, such as palmprint [60], and face [61,62], in
combination with different feature extraction and matching techniques. In [63], a similar
procedure with a different thresholding method was introduced for better error tolerance.
In this work, instead of a one-step thresholding for bit extraction, the bit is determined
immediately when the inner product $\langle x | \mathit{or}_i \rangle$ is greater than $u + \sigma$ or smaller than $u - \sigma$,
where $u$ and $\sigma$ are experimental parameters. When $\langle x | \mathit{or}_i \rangle \in [u - \sigma, u + \sigma]$, the inner
product is recomputed as $\langle x | \mathit{or}_j \rangle$, where $\mathit{or}_j$ is an unused orthonormal random vector.
The base BioHashing method is based on the assumption that the Hash key will
not be stolen. The main drawback of the method is the low performance when an
imposter B steals the Hash key of A and attempts to be authenticated as A. When this
problem occurs, the performance of BioHashing can be lower than using the biometric
data only [64]. Kong et al. [65] investigate the applicability of the BioHashing method on a
face recognition problem and conclude that the claim of having achieved near zero EER is
based on an impractical assumption that the Hash key can not be stolen. They also point
out that if the assumption holds, there would be no need for biometrics to be combined
with the user-specific random numbers, since the latter can itself serve as a password.
Their experiments show that if an imposter has the Hash key, the BioHashing method is
even worse than the biometric method alone. However, no solutions are suggested.
To improve the performance of BioHashing in the case of stolen key, a multi-matcher
fusion method was proposed in [66]. The biometric signature is divided into a number
of training sets, half of which are used to train the classifiers, while the other half are
combined with pseudo random numbers. Fusion is performed using a sum rule of the
similarity scores of the two ensembles, and a max rule is used to select the final score for
each of them. In [64], a multi-modal method is proposed to combine the scores of selected
fingerprint matchers with the scores obtained by a face authenticator where the facial
features are combined with pseudo-random numbers. Fusion is performed by treating
the similarity matching scores of each system as new features, and using a linear support
vector machine for the final classification. In [67], a random subspace based method is
further proposed to combine the similarity matching scores and enhance the performance
of the system.
In addition, Lumini et al. [68] proposed another approach to enhance the BioHashing
technique. Their experimental results show that the performance of the base BioHashing
method relies on the selection of the parameters M (number of bits) and τ (discretization
threshold value). To deal with these problems, an improved version of BioHashing
includes: 1) Normalization of the biometric vectors before BioHashing. 2) Instead of using
a fixed value for τ , several values of τ may be used and combined according to the sum
rule. 3) Use more projection spaces to generate more BioHash codes per user. This
can be achieved by performing the BioHashing method iteratively k times on the same
biometric vector to obtain k projection spaces. Verification is carried out by combining
the classification scores of each BioHash code. 4) Another way to generate more Bio-
Hash codes is to use several permutations of the biometric features during the projection
calculation. Experiments were performed on face, fingerprint, and signature biometric
data. The experimental results demonstrate performance improvement in the stolen key
case, and the fusion of biometric data and space augmented BioHash codes is expected
to achieve a good compromise between the best case of non-stealing and the worst case
of being always stolen.
The BioHashing technique has also been used as the first step for cryptographic key
generation. In [69], eigenprojections are extracted from face images as features, each of
which is then hashed with pseudorandom numbers to extract a single bit. A bit string
is formed by concatenating the bits, and this bit string is further securely reduced to a
single cryptographic key via Shamir secret sharing. This paper reports 80-bit entropy
with 0.93% FRR. In [70], the same bit string and key generation technique as in [69]
is used, while the face images are represented with a wavelet Fourier-Mellin transform.
In [71], a Reed-Solomon error code is incorporated as an error correction step to correct
the bit disparity between the gallery and probe sample of BioHash. Their methods are
tested on three different fingerprint databases and on average, an FRR of below 1% can
be achieved.
Theoretical analysis of the BioHashing technique is presented in [72] using random
projection theory. However, random projection theory addresses the distance preserving
property in the domain of real numbers, and it is not clear how this is preserved in the
quantized domain. Moreover, it should be noted that for a certain system threshold value,
the FRR is not affected by the employment of a user-specific key. Therefore, the system
threshold value that is selected for near zero EER will produce a large FAR in the stolen-key
scenario. Furthermore, for an $M$-bit BioHash code $b$, assume that each bit in $b$ is independent,
and let $t$ be the system threshold value in terms of Hamming distance. Then, when different
keys are applied on the biometric features of the same user, the probability of false
acceptance is
\[
\frac{\sum_{i=0}^{t} \binom{M}{i}}{2^M}.
\]
This probability depends on two factors, the system threshold $t$
and the dimensionality $M$, which reflect the separability and characteristics of the data
and the feature extractors. Therefore, the changeability (as well as the performance in
the user-specific key scenario) of BioHashing is highly dependent on the characteristics
of the extracted features.
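To make the dependence on $t$ and $M$ concrete, the false acceptance probability above can be evaluated directly. The sketch below is illustrative only; the function name `false_accept_probability` is hypothetical, and it assumes, as the formula implicitly does, that each bit is independent and equally likely to be 0 or 1.

```python
from math import comb


def false_accept_probability(m, t):
    """P(Hamming distance <= t) between two independent, uniform m-bit codes."""
    if not 0 <= t <= m:
        raise ValueError("t must lie in [0, m]")
    # Count the bit patterns that differ in at most t positions, out of 2^m.
    return sum(comb(m, i) for i in range(t + 1)) / 2 ** m


p_small = false_accept_probability(4, 1)     # (1 + 4) / 16
p_loose = false_accept_probability(100, 40)  # larger threshold, longer code
```

For a fixed code length, raising the threshold $t$ to tolerate more intra-class variation directly raises this probability, which is the tradeoff described in the text.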
More recently, Teoh et al. [73] proposed a Multispace Random Projection (MRP)
method, which applies user-specific random projection on reduced low-dimensional fea-
ture vectors without the quantization procedure of BioHashing. The distance preserving
property of MRP is analyzed based on normalized inner product, and near zero EER is
achieved in the user-specific MRP scenario. However, their papers lack rigorous privacy
and changeability analysis. As shown in this thesis, the privacy protecting property
of their method is subject to certain attacks. Similar to the BioHashing technique, the
near zero EER in the user-specific key scenario will produce a high FAR in the stolen-key
scenario. Even in the both-legitimate scenario, the performance of the BioHashing and MRP
techniques is highly dependent on the characteristics of the extracted features; therefore,
their methods do not provide strong changeability.
2.3 Summary
In summary, existing works either can not provide robust privacy protection, or sacrifice
verification accuracy for privacy preservation. The biometric crypto-system integrates
biometrics with cryptographic techniques to provide strong security of the biometric
templates. Due to the binary nature of traditional cryptographic keys, a discrete rep-
resentation of the biometric signal is generally required. However, due to the enormous
variation of biometric signal, it is difficult to extract a discriminatory discrete represen-
tation which will allow the error correction algorithms to efficiently correct the errors.
This will lead to significant performance degradation. This is particularly challenging for
biometric traits such as face, whose features are usually represented in the continuous
domain. Most of the existing biometric crypto-systems are computationally complex,
and usually suffer performance degradation. Furthermore, the majority of these works
only produce a repeatable cryptographic key, while the biometric itself is not changeable.
Biometrics are not secret. Humans leave their fingerprints everywhere, and human faces
can be easily captured by a camera. As such, if the biometric information is obtained
by an adversary, then all the biometric systems that use the same biometric trait are
compromised.
Motivated by the cancelable biometrics framework in [15], this thesis focuses on the
development of computationally efficient, repeatable, and non-invertible transforms for
addressing both the changeability and privacy protection problems. Independently of [73],
we present a random projection based technique for secure biometric template generation.
To improve the recognition performance, a distance and privacy preserving mechanism
in a low-dimensional space, termed sorted index numbers (SIN) approach, is introduced.
The SIN method is then combined with different random transformations to obtain strong
changeability and enhanced privacy protection. This thesis presents detailed analysis on
the changeability and privacy preserving properties of the proposed methods. The Bio-
Hashing [61, 62] and MRP [73] methods are adopted for comparison. The effectiveness
of the introduced solutions is supported by extensive experimentation.
Chapter 3
Random Projection Based Face
Verification
3.1 Introduction
This chapter presents a systematic analysis of random projection (RP) as an intentional,
repeatable, and non-invertible transformation for changeable and privacy preserving bio-
metric template generation. RP is a technique to project a set of high-dimensional
data points to a randomly selected low-dimensional subspace, with the pairwise distance
between the data points approximately preserved. It is fundamentally based on the
Johnson-Lindenstrauss (J-L) lemma [74]. RP has been used as a dimensionality reduc-
tion or a privacy preserving tool in many different application contexts. Applications
of RP for dimensionality reduction include nearest neighbor search [75], face recognition [76],
image and text data processing [77] and clustering [78], and learning of mixtures
of Gaussians [79]. For privacy protection, RP has been applied to data mining [80], data
clustering [81], and biometric applications [62,73].
In this chapter, we elaborate its application in biometric verification as both di-
mensionality reduction and privacy preserving tools. The proposed method transforms
biometric data using a random matrix with each entry an independently and identically
distributed (i.i.d.) Gaussian random variable. This chapter contributes comprehensive
and detailed mathematical analysis on the similarity preserving and privacy protecting
properties of the generated biometric template. Our analysis introduces a precise method
of computing the probability of preserving the distance at an arbitrary projected
dimensionality, and achieves a better lower bound on the projected dimensionality than the
best known in existing works. Detailed privacy protection analysis is presented by studying
the statistical properties of the reconstructed signal. The changeability of the biometric information in the
transformed domain is analyzed in detail using a geometric based approach and a vector
translation method is introduced to generate biometric templates with strong changeabil-
ity. Specifically, RP on both high-dimensional image vectors and dimensionality reduced
feature vectors are discussed and compared. Two different application scenarios, user-
independent (UI) and user-dependent (UD) RP are presented. The UI scenario utilizes
the same projection matrix for all the users, while the UD scenario is a two factor scheme
that applies user-specific RP. In both scenarios, the biometric template can be regenerated
by simply varying the projection matrices. The proposed method is capable of
producing zero EER in the UD scenario when both the biometric data and the projection matrix
are legitimate. This also indicates strong changeability of the generated biometric template.
This is supported by both the probabilistic analysis and extensive experimentation
on a face verification problem.
The remainder of this chapter is organized as follows: Section 3.2 provides an overview
of the proposed solution. Section 3.3 analyzes the similarity preserving property of RP.
Section 3.4 presents the changeability analysis, and introduces the vector translation
technique for obtaining strong changeability. Section 3.5 presents an analysis of the
privacy protecting property of the proposed method. Section 3.6 reports the detailed
experimental results on a face verification problem, and Section 3.7 summarizes this
chapter.
3.2 Method Overview
The proposed method is based on random projection of face image vectors. An input
image is first preprocessed by detecting the face region. The preprocessed face image is
converted to a vector of size N × 1 by concatenating each row. The resulting vector, z,
is regarded as the input vector for feature extraction. The procedure of producing the
changeable and privacy preserving biometric template is as follows:
1. Preprocess and obtain an image vector $z \in \mathbb{R}^N$ from the input face image.
2. Use a key $k$ to generate an $N \times M$ ($M < N$) random matrix $R$. Each entry of
$R$ is i.i.d. according to a Gaussian distribution with mean zero and variance $\frac{1}{N}$,
$r_{ij} \sim N(0, \frac{1}{N})$, $i = 1, \ldots, N$, $j = 1, \ldots, M$.
3. Compute $x = \sqrt{\frac{N}{M}} R^T z$, where the superscript $T$ denotes the transpose.
The extracted feature vector $x \in \mathbb{R}^M$ is stored as the template for verification.
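As an illustration of steps 2 and 3, the following Python sketch generates the key-dependent Gaussian matrix column by column and computes the scaled projection. It is a minimal sketch under stated assumptions: the function name `rp_template` is hypothetical, and seeding Python's `random.Random` with the key is only one possible way to derive $R$ from $k$ (a deployed system would need a generator suited to its security requirements).

```python
import math
import random


def rp_template(z, key, m):
    """Return x = sqrt(N/M) * R^T z for a key-dependent N-by-m Gaussian R."""
    n = len(z)
    if not 0 < m < n:
        raise ValueError("need 0 < m < len(z)")
    rng = random.Random(key)        # same key -> same projection matrix R
    sigma = 1.0 / math.sqrt(n)      # entries r_ij ~ N(0, 1/N)
    scale = math.sqrt(n / m)
    x = []
    for _ in range(m):              # one column of R per template element
        column = [rng.gauss(0.0, sigma) for _ in range(n)]
        x.append(scale * sum(r * v for r, v in zip(column, z)))
    return x


z = [float(i % 7) for i in range(30)]      # stand-in for a face image vector
template = rp_template(z, key=123, m=8)
reissued = rp_template(z, key=124, m=8)    # new key -> new template
```

Changing only the key yields a different template from the same image vector, which is the regeneration mechanism referred to in the text.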
3.3 Accuracy Analysis
This section provides a detailed mathematical analysis of the similarity preserving prop-
erty of RP. RP is motivated by the J-L lemma [74]:
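Lemma 3.1 (J-L lemma): For any $0 < \varepsilon < 1$, and an integer $n$, let $M$ be a positive
integer such that $M \ge M_0 = O(\varepsilon^{-2} \log n)$. For any set $B$ of $n$ points in $\mathbb{R}^N$, there exists
a map $f: \mathbb{R}^N \to \mathbb{R}^M$ such that for all $u, v \in B$,
\[
(1 - \varepsilon) \|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \varepsilon) \|u - v\|^2. \tag{3.1}
\]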
This lemma states that the pairwise distance between any two vectors in the Euclidean
space can be preserved up to a factor of $(1 \pm \varepsilon)$, when projected onto a random
$M$-dimensional subspace. The original paper used heavy mathematical machinery to
prove that such a mapping can be achieved by using a random matrix with orthonormal
columns. Frankl and Maehara [82] simplified the proof and introduced a bound of
$M_0 = \lceil 9(\varepsilon^2 - 2\varepsilon^3/3)^{-1} \log n \rceil + 1$. Independently, simplified versions of this proof were
provided by Indyk and Motwani [83] and Dasgupta and Gupta [84]. In addition, Arriaga
and Vempala [85], Achlioptas [86], and Li et al. [87] showed that it is possible
to achieve such an embedding through much simpler random matrices for fast operation.
Achlioptas [86] provided a sharper lower bound of $M_0 = \lceil (4 + 2\gamma)(\varepsilon^2/2 - \varepsilon^3/3)^{-1} \log n \rceil$,
such that with probability of at least $1 - n^{-\gamma}$, where $\gamma$ controls the probability of success,
the pairwise distance between all $n$ points can be preserved. Vempala [88] also introduced
a random projection method for mapping high-dimensional binary vectors into
low-dimensional ones, with the Hamming distance between the binary vectors approximately
preserved.
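To give a feeling for the size of these bounds, the two expressions for $M_0$ can be evaluated numerically. The sketch below is illustrative; the function names are hypothetical, and it assumes that $\log$ denotes the natural logarithm, which is not stated explicitly here.

```python
import math


def m0_frankl_maehara(n, eps):
    """M0 = ceil(9 (eps^2 - 2 eps^3 / 3)^-1 log n) + 1, natural log assumed."""
    return math.ceil(9.0 / (eps ** 2 - 2.0 * eps ** 3 / 3.0) * math.log(n)) + 1


def m0_achlioptas(n, eps, gamma):
    """M0 = ceil((4 + 2 gamma) (eps^2 / 2 - eps^3 / 3)^-1 log n), natural log assumed."""
    return math.ceil((4.0 + 2.0 * gamma) / (eps ** 2 / 2.0 - eps ** 3 / 3.0) * math.log(n))


bound = m0_achlioptas(n=1000, eps=0.3, gamma=1.0)
```

Under these assumptions, $n = 1000$, $\varepsilon = 0.3$ and $\gamma = 1$ give $M_0 = 1152$, which illustrates why bounds of this type can be conservative for practical dimensionality reduction.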
As illustrated in [84] and [86], the key issue in producing such a distance preserving mapping
is to show that the squared length (norm) of a random vector is sharply concentrated
around its mean when projected onto a random $M$-dimensional subspace, i.e., the Restricted
Isometry Property (RIP) [89]. Then, the assertion of the J-L lemma can be
proved by applying a union bound on all $\binom{n}{2}$ pairs such that none of the pairwise distances
is distorted by more than a factor of $(1 \pm \varepsilon)$. Most of the existing works utilize inequality
properties to provide a bound for the probability of preserving distance between two
points, and then extend to n points to compute the lower bound M0. However, exper-
imental results in [76] and [77] suggest that the lower bound M0 is not tight, and it is
possible to produce good results in a lower dimensionality. Therefore, we are interested
in finding the extent to which the distance between two vectors can be approximately
preserved, when projected onto a lower dimensional subspace. This is particularly impor-
tant for applications that have a high demand in storage or computational complexity.
In [85] and [86], it is suggested that RP can be achieved by using a random matrix
with i.i.d. Gaussian entries. Such methods do not need to conduct the computationally
expensive Gram-Schmidt procedure for orthonormalization, and therefore are more ap-
propriate for practical applications. Following this line, this chapter introduces a precise
method for computing the probability of preserving the Euclidean distance between two
vectors when projected onto an arbitrary M-dimensional subspace. The probability lower
bound of preserving the pairwise distances for all n points, with respect to an arbitrary
M is further analyzed. As demonstrated later in this chapter, for the same probability of
preserving distance for all n points, we can get a better lower bound M0 than that shown
in [86]. To begin with, we first look into the properties of a random matrix with i.i.d.
Gaussian entries. Throughout this thesis, we use E[·] and Var[·] to denote expectation
and variance respectively.
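Lemma 3.2: Let $R$ be an $N \times M$ ($M < N$) matrix. Each entry of $R$ is an i.i.d.
Gaussian random variable with mean zero and variance $\frac{1}{N}$, $r_{ij} \sim N(0, \frac{1}{N})$, $i = 1, \ldots, N$,
$j = 1, \ldots, M$. Let $W = R^T R$ and $W' = R R^T$, then:
\[
E[w_{i,j}] = \begin{cases} 1, & i = j; \\ 0, & i \ne j; \end{cases} \tag{3.2}
\]
\[
\mathrm{Var}[w_{i,j}] = \begin{cases} \frac{2}{N}, & i = j; \\ \frac{1}{N}, & i \ne j; \end{cases} \tag{3.3}
\]
\[
E[w'_{i,j}] = \begin{cases} \frac{M}{N}, & i = j; \\ 0, & i \ne j; \end{cases} \tag{3.4}
\]
\[
\mathrm{Var}[w'_{i,j}] = \begin{cases} \frac{2M}{N^2}, & i = j; \\ \frac{M}{N^2}, & i \ne j; \end{cases} \tag{3.5}
\]
where $w_{i,j}$ and $w'_{i,j}$ are elements of $W$ and $W'$ respectively.
Please see Appendix 3-I for the proofs.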
The results in Lemma 3.2 show that $E[R^T R] = I$, where $I$ denotes the identity matrix.
When $N$ is large, the elements of $R^T R$ are sharply concentrated around their mean with
a very small variance, i.e., $R^T R \approx I$. This suggests that in a high-dimensional space,
when the entries of a random matrix $R$ are i.i.d. Gaussian random variables, the columns
in $R$ are almost orthogonal. The higher the dimensionality, the better the approximation
of orthogonality. Intuitively, the results show that in a high-dimensional space, vectors
with random directions are very likely to be close to orthogonal [90]. In particular, it
is straightforward to verify that when $r_{ij} \sim N(0, \frac{1}{N})$, $E[\|r_j\|^2] = E[\sum_{i=1}^{N} r_{ij}^2] = 1$, and
$\mathrm{Var}[\|r_j\|^2] = \mathrm{Var}[\sum_{i=1}^{N} r_{ij}^2] = \frac{2}{N}$, where $r_j$ denotes the $j$th column of $R$. This suggests that
the length of each column vector in $R$ is strongly concentrated around 1, and subsequently
the vectors in $R$ are close to orthonormal. These properties of a random matrix with
i.i.d. Gaussian entries imply that it is possible to relax the enforced orthogonality and
normality as in the original J-L lemma. Similarly, it can be shown that $E[R R^T] = \frac{M}{N} I$.
When $R$ is scaled by $\sqrt{\frac{N}{M}}$, and with large $M$, we have $\sqrt{\frac{N}{M}} R \sqrt{\frac{N}{M}} R^T \approx I$.
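The near-orthonormality of the columns of $R$ is easy to observe numerically. The following sketch is illustrative only; the function name `gram_matrix`, the seed, and the sizes $N = 2000$, $M = 5$ are arbitrary choices, not values taken from the experiments in this thesis. It forms $W = R^T R$ and measures how far it is from the identity.

```python
import random


def gram_matrix(n, m, seed=0):
    """Return W = R^T R for an n-by-m matrix R with i.i.d. N(0, 1/n) entries."""
    rng = random.Random(seed)
    sigma = (1.0 / n) ** 0.5
    cols = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


w = gram_matrix(n=2000, m=5)
# Lemma 3.2 predicts a standard deviation of sqrt(2/N) on the diagonal
# and sqrt(1/N) off the diagonal, i.e. a few percent for N = 2000.
max_diag_dev = max(abs(w[i][i] - 1.0) for i in range(5))
max_off_diag = max(abs(w[i][j]) for i in range(5) for j in range(5) if i != j)
```

Repeating the experiment with a larger `n` should shrink both deviations, in line with the variances in Eqn. (3.3).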
Lemma 3.3: Let $u$ be an arbitrary vector in $N$-dimensional Euclidean space, $u \in \mathbb{R}^N$.
Let $R$ be an $N \times M$ ($M < N$) matrix. Each entry of $R$ is an i.i.d. Gaussian random
variable with mean zero and variance $\frac{1}{N}$, $r_{ij} \sim N(0, \frac{1}{N})$, $i = 1, \ldots, N$, $j = 1, \ldots, M$. Let
$x = \sqrt{\frac{N}{M}} R^T u$, then:
\[
E[\|x\|^2] = \|u\|^2, \tag{3.6}
\]
\[
\mathrm{Var}[\|x\|^2] = \frac{2}{M} \|u\|^4. \tag{3.7}
\]
Please see Appendix 3-II for the proofs.
Lemma 3.3 shows that, up to a scaling factor $\sqrt{\frac{N}{M}}$, the squared length of an arbitrary
vector is concentrated about its original one when the vector is projected onto a random
M-dimensional subspace. This explains the key issue in producing a distance preserving
mapping as illustrated in [84] and [86]. The variation of the squared length is inversely
proportional to the dimensionality of the projected subspace. As the dimensionality M
increases, the degree of concentration becomes sharper. Lemma 3.3 can be naturally
extended to the following lemma:
Lemma 3.4: Let $u$ and $v$ be two arbitrary vectors in an $N$-dimensional Euclidean
space, $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$. Let $R$ be an $N \times M$ ($M < N$) matrix. Each entry of $R$
is an i.i.d. Gaussian random variable with mean zero and variance $\frac{1}{N}$, $r_{ij} \sim N(0, \frac{1}{N})$,
$i = 1, \ldots, N$, $j = 1, \ldots, M$. Let $x = \sqrt{\frac{N}{M}} R^T u$, $y = \sqrt{\frac{N}{M}} R^T v$, then:
\[
E[\|x - y\|^2] = \|u - v\|^2, \tag{3.8}
\]
\[
\mathrm{Var}[\|x - y\|^2] = \frac{2}{M} \|u - v\|^4. \tag{3.9}
\]
Proof: Replace $x$ by $x - y$, and $u$ by $u - v$ in Lemma 3.3.
Lemma 3.4 shows that the expectation of the squared Euclidean distance (SED)
between two randomly projected vectors is the SED between the two original vectors,
and accordingly the variance is inversely proportional to the projected dimensionality.
The higher the projected dimensionality, the smaller the variance, and hence the better
the SED between two vectors in the transformed domain being preserved. It should
be noted that, since the entries of the projection matrix $R$ are i.i.d. Gaussian random
variables, for a fixed vector $u$, all elements in the projected vector $x = R^T u$ are also
independent Gaussian random variables. This is due to the 2-stability of the Gaussian
distribution [86]: for any real numbers $a_1, a_2, \ldots, a_k$, if $\{q_i\}_{i=1}^{k}$ is a family of independent
Gaussian random variables with zero mean and unit variance, let $X = \sum_{i=1}^{k} a_i q_i$, then
$X \sim cN(0, 1)$, where $c = (a_1^2 + \cdots + a_k^2)^{1/2}$. Similarly, for a vector $u - v$, the elements of
$R^T u - R^T v = R^T (u - v)$ are independent Gaussian random variables.
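The mean and variance stated in Lemma 3.4 can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: the function name `projected_sq_distance`, the seed, and the sizes ($N = 20$, $M = 10$, 2000 trials) are arbitrary assumptions chosen to keep the run short. It uses the identity $R^T u - R^T v = R^T (u - v)$ noted above, so only the difference vector is projected.

```python
import math
import random


def projected_sq_distance(u, v, m, rng):
    """Draw a fresh N-by-m Gaussian R and return ||x - y||^2 as in Lemma 3.4."""
    n = len(u)
    sigma = 1.0 / math.sqrt(n)                 # entries r_ij ~ N(0, 1/N)
    diff = [a - b for a, b in zip(u, v)]
    total = 0.0
    for _ in range(m):                         # one column of R at a time
        column = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += sum(r * d for r, d in zip(column, diff)) ** 2
    return (n / m) * total                     # scaling sqrt(N/M), squared


rng = random.Random(1)
u = [rng.uniform(0.0, 1.0) for _ in range(20)]
v = [rng.uniform(0.0, 1.0) for _ in range(20)]
d2 = sum((a - b) ** 2 for a, b in zip(u, v))   # original SED
samples = [projected_sq_distance(u, v, 10, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
ratio_mean = mean / d2                         # Eqn. (3.8) predicts about 1
ratio_var = var / (2.0 / 10 * d2 ** 2)         # Eqn. (3.9) predicts about 1
```

Both ratios should be close to one, and increasing `m` reduces the spread of the samples as the lemma predicts.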
Lemma 3.5: For any $\varepsilon > 0$, and an integer $M$, let $u$ and $v$ be two arbitrary vectors
in $N$-dimensional Euclidean space, $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$. Let $R$ be an $N \times M$ ($M < N$)
matrix. Each entry of $R$ is an i.i.d. Gaussian random variable with mean zero and
variance $\frac{1}{N}$, $r_{ij} \sim N(0, \frac{1}{N})$, $i = 1, \ldots, N$, $j = 1, \ldots, M$. Let $x = \sqrt{\frac{N}{M}} R^T u$, $y = \sqrt{\frac{N}{M}} R^T v$,
then we have the probability of:
\[
P\left((1 - \varepsilon) \|u - v\|^2 \le \|x - y\|^2 \le (1 + \varepsilon) \|u - v\|^2\right)
= G\left(\frac{M}{2}, \frac{(1 + \varepsilon) M}{2}\right) - G\left(\frac{M}{2}, \frac{(1 - \varepsilon) M}{2}\right), \tag{3.10}
\]
where $G(a, x)$ is the regularized Gamma function, $G(a, x) = \frac{1}{\Gamma(a)} \int_0^x e^{-t} t^{a-1} \, dt$, and $\Gamma$
denotes the Gamma function [91].
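Proof: Let $x_j$ and $u_i$ denote the elements of vectors $x$ and $u$ respectively, we have:
\begin{align*}
E[x_j] &= E\left[\sum_{i=1}^{N} \sqrt{\frac{N}{M}} r_{ij} u_i\right]
= \sqrt{\frac{N}{M}} \sum_{i=1}^{N} E[r_{ij}] u_i
= 0, \\
\mathrm{Var}[x_j] &= \mathrm{Var}\left[\sum_{i=1}^{N} \sqrt{\frac{N}{M}} r_{ij} u_i\right] \\
&= \frac{N}{M} \sum_{i=1}^{N} \mathrm{Var}[r_{ij} u_i] \\
&= \frac{N}{M} \sum_{i=1}^{N} \left(E[r_{ij}^2 u_i^2] - E[r_{ij} u_i]^2\right) \\
&= \frac{N}{M} \sum_{i=1}^{N} E[r_{ij}^2 u_i^2] \\
&= \frac{N}{M} \sum_{i=1}^{N} \frac{1}{N} u_i^2 \\
&= \frac{1}{M} \|u\|^2.
\end{align*}
Therefore $\frac{\sqrt{M}}{\|u\|} x_j \sim N(0, 1)$. Since the elements of $x$ are independent, let $Z = \frac{M \|x\|^2}{\|u\|^2}$,
then the random variable $Z$ is distributed according to a Chi-square distribution. Replace
$x$ and $u$ by $x - y$ and $u - v$ respectively, then $Z = \frac{M \|x - y\|^2}{\|u - v\|^2}$ also follows a Chi-square
distribution with $M$ degrees of freedom. We have:
\begin{align*}
P\left(\|x - y\|^2 \le (1 + \varepsilon) \|u - v\|^2\right) &= P(Z \le (1 + \varepsilon) M)
= G\left(\frac{M}{2}, \frac{(1 + \varepsilon) M}{2}\right), \\
P\left(\|x - y\|^2 \le (1 - \varepsilon) \|u - v\|^2\right) &= P(Z \le (1 - \varepsilon) M)
= G\left(\frac{M}{2}, \frac{(1 - \varepsilon) M}{2}\right).
\end{align*}
Hence:
\[
P\left((1 - \varepsilon) \|u - v\|^2 \le \|x - y\|^2 \le (1 + \varepsilon) \|u - v\|^2\right)
= G\left(\frac{M}{2}, \frac{(1 + \varepsilon) M}{2}\right) - G\left(\frac{M}{2}, \frac{(1 - \varepsilon) M}{2}\right).
\]
Figure 3.1: Probability of preserving distance between two vectors as a function of M and ε.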
Eqn. (3.10) provides a precise method for computing the probability of preserving the
SED between two vectors in the projected subspace. Figure 3.1 plots the probability as
a function of dimensionality M and error ε. It can be observed that for any fixed error ε,
the probability of preserving the distance between two vectors increases as the projected
dimensionality increases. On the other hand, for any fixed projected dimensionality, the
larger the error factor, the higher the probability of preserving the distance. For example,
even when projected to a low dimensionality of M = 200, with probability of 99.68%,
the SED between two vectors can be preserved up to an error factor of ε = 0.3.
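The probability in Eqn. (3.10) can be evaluated with a few lines of Python. The sketch below is illustrative and is not the code used to produce Figure 3.1: the function names are hypothetical, and the regularized Gamma function is computed with its standard power series, which is adequate for moderate arguments such as those used here but is not a general-purpose implementation.

```python
import math


def reg_lower_gamma(a, x, tol=1e-15, max_terms=100000):
    """Regularized Gamma function G(a, x) via the series
    G(a, x) = x^a e^(-x) / Gamma(a) * sum_k x^k / (a (a+1) ... (a+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > tol * total and k < max_terms:
        k += 1
        term *= x / (a + k)
        total += term
    # Work in the log domain for the prefactor to avoid overflow.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def preserve_probability(m, eps):
    """Right-hand side of Eqn. (3.10) for projected dimensionality m and error eps."""
    upper = reg_lower_gamma(m / 2.0, (1.0 + eps) * m / 2.0)
    lower = reg_lower_gamma(m / 2.0, (1.0 - eps) * m / 2.0)
    return upper - lower


p_200 = preserve_probability(200, 0.3)
```

With `m = 200` and `eps = 0.3` the value is approximately 0.997, consistent with the example quoted above, and it increases with either `m` or `eps`.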
Having obtained the probability of preserving the distance between two fixed points,
now we can apply the union bound to analyze the probability of preserving the pairwise
distance for all the $n$ points. Let $\lambda$ denote the probability in Eqn. (3.10), then for each
of the $\binom{n}{2}$ pairs, the probability of the distortion falling outside $(1 \pm \varepsilon)$ is $1 - \lambda$. For
all the $\binom{n}{2}$ pairs, the chance that some pairs do not preserve the distance is at most
$\binom{n}{2} \times (1 - \lambda)$. Hence the probability of preserving the pairwise distance for all the pairs
simultaneously is at least $1 - \binom{n}{2} \times (1 - \lambda)$. This proves the following lemma:
Lemma 3.6: For any $\varepsilon > 0$, and an integer $M$, let any set $B$ of $n$ points in $\mathbb{R}^N$
be represented as a matrix $D$ of size $N \times n$. Let $R$ be an $N \times M$ ($M < N$) matrix.
Each entry of $R$ is an i.i.d. Gaussian random variable with mean zero and variance $\frac{1}{N}$,
$r_{ij} \sim N(0, \frac{1}{N})$, $i = 1, \ldots, N$, $j = 1, \ldots, M$. Let $A = \sqrt{\frac{N}{M}} R^T D$, and $f$ denote the map
$\mathbb{R}^N \to \mathbb{R}^M$ from the $i$th column of $D$ to the $i$th column of $A$. Then with probability of at
From Eqn. (3.15), it is clear that the probability of error depends on the characteristics
of the features, and the dimensionality $M$. In general, zero $P_f$ can not be achieved
by applying RP on the biometric data directly. However, since
$P(l_{xp} \le l_{xg} + t \mid l_{xg} \le t) P_1 \le 1$, and $P(l_{xg} > t) P(l_{xg} - t \le l_{xp} \le l_{xg} + t \mid l_{xg} > t) \le 1$,
Eqn. (3.15) can be simplified as:
\[
P_f \le P(l_{xg} \le t) + \frac{t^M}{(l_{xg} + t)^M - (l_{xg} - t)^M}. \tag{3.16}
\]
This probability can be minimized by adding an extra vector $d \in \mathbb{R}^N$, $d_i \gg t$, to the
biometric data, $z' = z + d$, such that after RP, $P(l_{xg} < t) = 0$. We have:
\[
P_f \le \frac{t^M}{(l_{xg} + t)^M - (l_{xg} - t)^M}, \tag{3.17}
\]
and
\[
\lim_{\frac{t}{l_{xg}} \to 0, \forall M} P_c = \lim_{\frac{t}{l_{xg}} \to 0, \forall M} (1 - P_f) = 1. \tag{3.18}
\]
It should be noted that the addition of vector $d$ does not change the similarity between
two vectors since $\|R^T (u + d) - R^T (v + d)\|^2 = \|R^T u - R^T v\|^2$. The preceding analysis
shows that with appropriate vector translation, the proposed method can produce biometric
templates with changeability 1, by applying different RPs on the biometric data
of the same user. The system threshold $t$ determines the choice of vector $d$. The elements
of vector $d$ should satisfy $d_i \gg t$ such that $l_{xg} \gg t$ and $P_c = 1$. This indicates the
strong changeability of the proposed method.
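The behaviour of the bound in Eqn. (3.17) can be illustrated numerically. Dividing the numerator and denominator by $l_{xg}^M$ expresses the bound in terms of the ratio $t / l_{xg}$ alone, which is the quantity driven towards zero by the vector translation. The sketch below is illustrative; the function name `changeability_bound` is hypothetical, and the restriction of the ratio to $(0, 1)$ reflects the assumption $l_{xg} > t$.

```python
def changeability_bound(ratio, m):
    """Right-hand side of Eqn. (3.17), written with ratio = t / l_xg."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie in (0, 1)")
    # t^M / ((l + t)^M - (l - t)^M) with numerator and denominator divided by l^M.
    return ratio ** m / ((1.0 + ratio) ** m - (1.0 - ratio) ** m)


loose = changeability_bound(0.5, 2)     # 0.25 / (2.25 - 0.25)
tight = changeability_bound(0.1, 50)    # small ratio: bound is negligible
```

Even a modest ratio such as $t / l_{xg} = 0.1$ makes the bound negligible at $M = 50$, in line with the limit in Eqn. (3.18). For very large $M$ the floating-point powers may overflow, in which case a log-domain evaluation would be needed.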
3.5 Privacy Analysis
To preserve the privacy of the users, it is expected that no information should be disclosed
if the stored biometric template is compromised. The proposed method utilizes RP for
biometric template generation. Due to the randomness of projection matrix, the user’s
privacy information can not be compromised if only the template is obtained by an
adversary. However, it is possible that an attacker can acquire more knowledge and
estimate the original signal.
Assuming the worst case that both the template and the projection matrix are com-
promised, then an adversary can estimate the original biometric data. For a robust pri-
vacy preserving mechanism, the estimated individual elements in the data vector should
not be exactly the same as the original ones. Furthermore, the global characteristics of
the estimated data vector should be far apart from the genuine data vector up to some
similarity functions.
Consider a projection function $x = R^T z$, $R \in \mathbb{R}^{N \times M}$, where the entries of $R$ are
i.i.d. Gaussian random variables, and an adversary who tries to estimate the values of $z$.
Since $M < N$, this is an under-determined system, where there are more unknowns
than linear equations. There are infinitely many solutions that satisfy $x = R^T z$. To
solve this problem, one classical approach is to find the minimum norm solution, using
$\hat{z} = R (R^T R)^{-1} x$, where $R (R^T R)^{-1}$ is essentially the pseudo-inverse of $R^T$. Since $R^T R \approx I$,
the above estimation function can be simplified as $\hat{z} = R x$.
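The statement that infinitely many vectors satisfy $x = R^T z$ can be demonstrated by constructing a second data vector that yields exactly the same template. The sketch below is illustrative only: the helper names, the seed, and the sizes $N = 12$, $M = 4$ are arbitrary assumptions. It removes from a random perturbation its component in the column space of $R$, so that the remainder is annihilated by $R^T$.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def project(cols, z):
    """x = R^T z, with R stored as a list of its M columns."""
    return [dot(c, z) for c in cols]


def null_component(cols, w):
    """Return the part of w that is orthogonal to every column of R."""
    basis = []
    for c in cols:                       # orthonormalize the columns of R
        for q in basis:
            d = dot(c, q)
            c = [a - d * b for a, b in zip(c, q)]
        norm = dot(c, c) ** 0.5
        basis.append([a / norm for a in c])
    for _ in range(2):                   # second pass tightens orthogonality
        for q in basis:
            d = dot(w, q)
            w = [a - d * b for a, b in zip(w, q)]
    return w


rng = random.Random(7)
n, m = 12, 4
cols = [[rng.gauss(0.0, (1.0 / n) ** 0.5) for _ in range(n)] for _ in range(m)]
z = [rng.uniform(0.0, 255.0) for _ in range(n)]
delta = null_component(cols, [rng.gauss(0.0, 50.0) for _ in range(n)])
z_alt = [a + b for a, b in zip(z, delta)]     # a different data vector
x, x_alt = project(cols, z), project(cols, z_alt)
```

Here `z_alt` differs substantially from `z`, yet both map to the same `x` up to rounding error, so the template and the matrix alone do not single out the original data vector.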
However, although the estimation involves an under-determined system, and hence
there are infinitely many solutions, it is possible that an adversary can estimate some
of the real values, and therefore reveal part of the user’s information. If as many linearly
independent equations as the unknown elements can be found, then some elements may
be completely identified. To address this problem, Du et al. [93] introduced the concept of
$k$-secure. For a matrix $Q = R^T$ of size $M \times N$ ($M < N$), if the remaining sub-matrix
after removing any $k$ columns of $Q$ is still of full row rank, the matrix $Q$ is called $k$-secure,
which guarantees that it is impossible to generate an equation (except the trivial zero
combination) that contains fewer than $k+1$ variables [93]. It is further shown in [80] and [93]
that for a matrix $\Phi$ of size $(k+1) \times N$, where each row of $\Phi$ is a nonzero linear combination
of row vectors in $Q$, if $Q$ is $k$-secure, the linear system of equations $y = \Phi x$ involves at
least $2k+1$ unknown variables. This property illustrates that if $Q$ is $k$-secure, any linear
combination of the equations contains at least $k+1$ variables. Therefore, to solve the
problem of identifying a few of the elements, the projected dimensionality should satisfy
$M \le \frac{N}{2}$, such that each unknown variable is disguised by at least $M$ other variables [94].
Since it is impossible to find $M$ linearly independent equations that involve these $M$
variables, the solutions to each of the unknown variables are infinite, and therefore it is
impossible to find the exact value of any element in the original data vector.
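Recall that the projection model in this chapter is $x = \sqrt{N/M}\, R^T z$, so $z$ can be estimated
using $\hat{z} = \sqrt{N/M}\, R x$ [80]. Substituting the projection model gives
$\hat{z} = \sqrt{N/M}\, R \sqrt{N/M}\, R^T z = \frac{N}{M} R R^T z$.
To analyze the statistical properties of the estimated individual elements, let $\hat{z}_i$ be the $i$th
element of the estimated data vector, and let $w'_{i,j}$ denote the elements of $W' = R R^T$.
Using the results in Lemma 3.2, it is straightforward to derive that:

\[
E[\hat{z}_i] = E\left[\sum_{j=1}^{N} \frac{N}{M} w'_{i,j} z_j\right] = z_i, \tag{3.19}
\]

\[
\begin{aligned}
\mathrm{Var}[\hat{z}_i]
&= \mathrm{Var}\left[\sum_{j=1}^{N} \frac{N}{M} w'_{i,j} z_j\right] \\
&= E\left[\left(\sum_{j=1}^{N} \frac{N}{M} w'_{i,j} z_j\right)^{2}\right]
   - E\left[\sum_{j=1}^{N} \frac{N}{M} w'_{i,j} z_j\right]^{2} \\
&= \frac{N^2}{M^2} E\left[\sum_{j=1}^{N} (w'_{i,j})^2 z_j^2
   + 2\sum_{j<k} w'_{i,j} z_j w'_{i,k} z_k\right] - z_i^2 \\
&= \frac{N^2}{M^2} E\left[\sum_{j=1}^{N} (w'_{i,j})^2 z_j^2\right] - z_i^2 \\
&= \left(\frac{2}{M} + 1\right) z_i^2 + \frac{1}{M}\sum_{j \neq i} z_j^2 - z_i^2 \\
&= \frac{1}{M}\sum_{j \neq i} z_j^2 + \frac{2}{M} z_i^2 \\
&= \frac{1}{M}\left(\|z\|^2 + z_i^2\right).
\end{aligned} \tag{3.20}
\]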
It can be seen that the expected value of each estimated element is equal to the true
value. Since no single element can be exactly recovered when $M \le N/2$, the variance of
$\hat{z}_i$ can be considered as a measure of privacy.
Although the individual elements of the original data vector cannot be correctly
estimated, the characteristics of the whole estimated data vector may
still be close to those of the original data vector under some similarity function. In this case, the
privacy of the user is still not protected. To address this problem, we should make sure
that the estimated data vector has a large distance to the original one, i.e. $\|\hat{z} - z\|^2 > \varphi$,
where $\varphi$ is a privacy threshold. For a biometric verification problem, the privacy threshold
value $\varphi$ represents the natural variance of face images, and should be set to a value that is
larger than the largest possible distance between data vectors of the same human subject.
To quantify the probability of preserving privacy, we first note that the estimation
error of an individual element, $\hat{z}_i - z_i$, approximately follows a Gaussian distribution with
zero mean and variance $(\|z\|^2 + z_i^2)/M$. This is because the elements of $R$ are i.i.d. Gaussian
random variables, and according to the Central Limit Theorem (CLT) [95], the elements
of $W' = RR^T$ are also approximately Gaussian. To validate this, we generate a random vector
of size $10000 \times 1$ and normalize it to unit length. This vector is considered as the data
vector. A matrix of size $10000 \times 500$ is then generated randomly with each entry an i.i.d.
Gaussian random variable. The data vector is then projected onto a low-dimensional
space using the generated random matrix, followed by the reconstruction procedure
described above. This process is repeated 1000 times on the same data vector using
different random matrices. Figure 3.6 plots the estimation error of the first element of
the data vector. It can be seen that the experimental error distribution fits well with the
statistics shown in Eqn. (3.19) and Eqn. (3.20).
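The validation experiment can be reproduced at a much smaller scale. The following is a minimal sketch rather than the code used for Figure 3.6: the function names, the seeds, and the reduced sizes (N = 50, M = 10, 2000 trials) are assumptions chosen so that it runs quickly with only the Python standard library. It draws a fresh matrix with entries distributed as N(0, 1/N) in every trial, computes only the first element of (N/M) R R^T z, and compares the sample mean and variance of the error with Eqn. (3.19) and Eqn. (3.20).

```python
import math
import random


def unit_vector(n, seed):
    """Random data vector normalized to unit length."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def first_element_errors(z, M, trials, seed):
    """Errors z_hat[0] - z[0] of z_hat = (N/M) R R^T z over fresh random R."""
    rng = random.Random(seed)
    N = len(z)
    sigma = 1.0 / math.sqrt(N)  # entries r_ij ~ N(0, 1/N)
    errors = []
    for _ in range(trials):
        acc = 0.0
        for _ in range(M):
            # One column of R contributes r_0k * (sum_j r_jk z_j) to (R R^T z)[0].
            col = [rng.gauss(0.0, sigma) for _ in range(N)]
            acc += col[0] * sum(r * zj for r, zj in zip(col, z))
        errors.append((N / M) * acc - z[0])
    return errors


def mean_and_variance(samples):
    """Sample mean and unbiased sample variance."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / (len(samples) - 1)
    return m, v


if __name__ == "__main__":
    z = unit_vector(50, seed=1)
    errs = first_element_errors(z, M=10, trials=2000, seed=2)
    mean, var = mean_and_variance(errs)
    predicted = (1.0 + z[0] ** 2) / 10  # (||z||^2 + z_0^2) / M with ||z|| = 1
    print(f"mean {mean:.4f}, sample variance {var:.4f}, predicted {predicted:.4f}")
```

With these small sizes the Gaussian shape is only approximate, but the first two moments should already agree with the derivation, since Eqn. (3.20) does not rely on the CLT.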
Figure 3.6: Gaussian approximation of estimation error.

For real applications, such as face recognition, the dimensionality $N$ of a face image
vector is usually large, and $|z_i|^2 \ll \|z\|^2$. The expected value and variance of $\hat{z}_i - z_i$
are $E[\hat{z}_i - z_i] = 0$ and $\mathrm{Var}[\hat{z}_i - z_i] \approx \|z\|^2 / M$. Since
$\hat{z}_i - z_i \sim \mathcal{N}(0, \|z\|^2 / M)$, we have
$\sqrt{M / \|z\|^2}\,(\hat{z}_i - z_i) \sim \mathcal{N}(0, 1)$, and therefore
$\frac{M}{\|z\|^2} \|\hat{z} - z\|^2 = \frac{M}{\|z\|^2} \sum_{i=1}^{N} (\hat{z}_i - z_i)^2$ follows a
Chi-square distribution with $N$ degrees of freedom. Then the probability of $\|\hat{z} - z\|^2 > \varphi$
can be computed as:

\[
P\left(\|\hat{z} - z\|^2 > \varphi\right)
= P\left(\frac{M}{\|z\|^2} \|\hat{z} - z\|^2 > \frac{M\varphi}{\|z\|^2}\right)
= 1 - G\left(\frac{N}{2}, \frac{M\varphi}{2\|z\|^2}\right), \tag{3.21}
\]

where $G$ denotes the regularized Gamma function.
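Eqn. (3.21) is straightforward to evaluate numerically. The sketch below is an illustration under a stated assumption, not part of the original analysis: it only handles even N, for which the regularized upper incomplete Gamma function with integer first argument a = N/2 reduces to the finite sum exp(-x) * sum_{k<a} x^k / k!. The function name `privacy_probability` is hypothetical, and the terms are accumulated in log space so that large N does not overflow.

```python
import math


def privacy_probability(N, M, z_norm_sq, phi):
    """Evaluate 1 - G(N/2, M*phi / (2*||z||^2)) from Eqn. (3.21) for even N."""
    if N % 2 != 0:
        raise ValueError("this sketch only handles even N")
    a = N // 2
    x = M * phi / (2.0 * z_norm_sq)
    if x <= 0.0:
        return 1.0
    log_x = math.log(x)
    total = 0.0
    for k in range(a):
        # k-th term exp(-x) * x^k / k!, computed through its logarithm.
        total += math.exp(k * log_x - x - math.lgamma(k + 1))
    return min(1.0, total)


if __name__ == "__main__":
    # Fixed N, ||z||^2 and phi; the probability grows as M shrinks.
    for M in (5, 10, 20):
        print(M, privacy_probability(N=100, M=M, z_norm_sq=1.0, phi=10.0))
```

The loop over M illustrates the dependence on the projected dimensionality: for fixed N, ‖z‖² and ϕ, lowering M increases the probability that the reconstruction error exceeds the privacy threshold.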
It can be seen that the probability of preserving privacy with respect to $\varphi$ depends
on the dimensionality $N$, the squared length of the data vector $\|z\|^2$, and the
projected dimensionality $M$. When $N$ and $\|z\|^2$ are fixed, the probability increases
monotonically as $M$ decreases. The above analysis is based on the minimum $l_2$ norm
reconstruction model. Recent advances in the theory of Compressive Sensing (CS) [96],
an area closely related to random projection, demonstrate that for a $K$-sparse signal
of dimensionality $N$, the minimum $l_0$ norm reconstruction model can recover the original
signal through exhaustive enumeration of all $\binom{N}{K}$ possible combinations, and the
minimum $l_1$ norm reconstruction model can exactly reconstruct $K$-sparse vectors and
stably approximate compressible vectors with high probability in polynomial time when
$M \ge M_0 = cK\log(N/K)$. Therefore, to ensure privacy, the projected dimensionality $M$
should be set to a value smaller than $M_0$. However, as shown in the previous section, the value
of $M$ is also associated with the similarity preserving property. This demonstrates that
the RP based method has a tradeoff between the privacy level and verification accuracy:
a higher projected dimensionality gives better accuracy but possibly a lower privacy level,
and vice versa.
Recall that the variance of the estimated individual element (Eqn. (3.20)) and the
probability of privacy preserving (Eqn. (3.21)) both increase as the squared length
of the data vector $\|z\|^2$ increases. Therefore, the translation vector $d$, which is used to
enhance the changeability, can enlarge the vector length and be used as a complementary
approach to enhance privacy. It should be noted that when the vector $d$ is also obtained
by the adversary, the privacy level is not improved but remains the same as without
translation. In real applications, the vector $d$ is not associated with the user’s key, and
can be kept secret by a central controller.
3.6 Experimental Results
To evaluate and compare the performance of the introduced method for face based bio-
metric verification, we conduct experiments on a generic database that consists of face
images from several well-known face databases [97]. In this section, we first give a descrip-
tion of the employed database, followed by the experimental results along with detailed
discussion.
3.6.1 Database Description
In real life face recognition application scenarios, it is common that the user’s face images
are not available for training. As such, the intrinsic properties of the human subjects
are usually trained from subjects that are not those to be recognized. Moreover, other
conditions such as illumination, resolution, lighting, facial expression, and pose, may
vary from time to time, such that the image conditions for training and testing are differ-
ent. To simulate such situations, a generic database was organized in [97]. The generic
database originally contains 5676 images of 1020 subjects from 5 well-known databases,
FERET [98, 99], PIE [100], AR [101], Aging [102], and BioID [103]. In the FERET
database, 3817 images of 1194 subjects are officially provided with eye coordinates. In
addition, 1016 more images have had the eye coordinates manually determined by the
author of [97]. Therefore, altogether 3881 images of 750 subjects with at least 3 images
per subject are collected from the FERET database to form the data set. For the PIE
database [100], 816 images of 68 subjects are selected. In detail, 7 different poses and 5
different lighting conditions are included. Following the PIE’s naming rule, pose group
[27,37,05,29,11,07,09] is selected, which contains both horizontal and vertical rotations.
Images with pose variations are under normal lighting conditions and with neutral ex-
pressions. For illumination variations, 5 frontal face images with neutral expressions are
randomly selected from all 21 different illumination conditions with room lighting on. For
the AR database [101], all ground-truthed images (480 images of 120 subjects) are in-
cluded. Although the images in the FG-NET Aging database [102] are all ground-truthed
images, some of the low-quality or extremely difficult to recognize images are discarded
(e.g., baby images and old adult images are not selected at the same time for a specific
subject). Finally 276 images of 63 subjects are included. For the BioID database [103],
227 images of 20 subjects are selected to form the data set. A detailed configuration of
the whole data set is illustrated in the Table 3.1, with some example images shown in
Figure 3.7.
For face verification, we exclude image samples with large pose variation (> 15°), and
select 4666 images for our experiments. The detailed configuration of the verification
data set is given in Table 3.2.
Figure 3.7: Image examples from the generic data set.
All the selected color images are first transformed to gray scale images by taking the
luminance component in YCbCr color space. All images are preprocessed according to
the recommendation of the FERET protocol, which includes: (1) images are rotated and
scaled so that the centers of the eyes are placed on specific pixels and the image size is
150 × 130; (2) a standard mask is applied to remove non-face portions; (3) the histogram
is equalized and the image is normalized to have zero mean and unit standard deviation. The
three steps for image preprocessing are illustrated in Figure 3.8. Finally, each image is
represented as a vector of dimensionality 17154.
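Two of the preprocessing operations are simple enough to sketch directly. The example below is illustrative only: it assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for the luminance component of YCbCr, uses hypothetical function names, and represents an image as a flat list of pixels. Eye alignment, masking, and histogram equalization are omitted.

```python
import math


def luminance(rgb_pixels):
    """Luminance (Y of YCbCr, BT.601 weights assumed) for a list of (R, G, B) pixels."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]


def normalize(values):
    """Shift and scale pixel values to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        raise ValueError("constant image cannot be normalized")
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0)]
    print(normalize(luminance(pixels)))
```

The normalized values can then be stacked into the image vector that is passed to the random projection.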
In our experiments, we randomly select samples from 520 subjects as the training set,
and use the samples of the remaining 500 subjects as the testing set. The training set includes 2388
images, and the testing set contains 2278 images. There is no overlap between the training
and the testing subjects. To simulate a real application, we perform evaluation on an
exhaustive basis, where every single image is used as a template once, and the rest of the
images are used as the probe set. All the elements in the translation vector, $d_i$, $i = 1, 2, \ldots, N$, are
set to 100, and the same $d$ is applied to all users. To minimize the effect of randomness,
all the experiments were performed 5 times, and the average of the results is reported.
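The verification results in this section are reported as FAR, FRR, and EER computed from Euclidean distances. As a rough sketch of how such figures can be obtained from lists of genuine and impostor distances, the code below scans the observed distances as candidate thresholds. The acceptance rule (accept when the distance is at most t), the function names, and the choice to report the average of FAR and FRR at the closest crossing are assumptions made for illustration.

```python
def far_frr(genuine, impostor, t):
    """FAR and FRR at threshold t, accepting a probe when its distance is <= t."""
    far = sum(1 for d in impostor if d <= t) / len(impostor)
    frr = sum(1 for d in genuine if d > t) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Return (EER estimate, threshold) at the point where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    genuine = [0.1, 0.2, 0.3, 0.9]   # distances between samples of the same subject
    impostor = [0.4, 0.8, 1.0, 1.2]  # distances between samples of different subjects
    print(equal_error_rate(genuine, impostor))
```

In an exhaustive evaluation, the genuine list would hold all template-probe distances of the same subject and the impostor list all distances between different subjects.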
Database    No. of subjects    No. of images per subject    No. of images
FERET       750                ≥ 3                          3881
AR          119                4                            476
Aging       63                 ≥ 3                          276
BioID       20                 ≥ 6                          227
PIE         68                 12                           816
Total       1020               ≥ 3                          5676

Table 3.1: Identification data set configuration.
Figure 3.8: Procedures for image preprocessing.
3.6.2 RP vs PCA
For the purpose of comparative study, we first need to compare the performance of RP
with other dimensionality reduction tools. Principal Component Analysis (PCA) [104]
and Linear Discriminant Analysis (LDA) [105] are two of the most popular methods for
dimensionality reduction, and have been used extensively in the literature as powerful
tools for face recognition applications. LDA is a supervised learning technique that
provides a class specific solution. It produces the optimal feature subspace in such a way
that the ratio of the between- and within-class scatters is maximized. Although LDA
based algorithms are superior to PCA based methods in some cases, it is shown in [106]
that PCA outperforms LDA when the training sample size is small and the training
images are less representative of the testing subjects. It is confirmed in [97] that PCA
performs much better than LDA in a generic learning scenario, where the image samples
of the human subjects are not available for training. Since the small sample size (SSS)
problem and the unavailability of training images are common in real life applications, and
PCA provides more reliable performance, we adopt the PCA algorithms for comparison
in this chapter.

Database    No. of subjects    No. of images per subject    No. of images
FERET       750                ≥ 2                          3029
AR          119                4                            476
Aging       63                 ≥ 3                          276
BioID       20                 ≥ 6                          227
PIE         68                 ≥ 8                          658
Total       1020               ≥ 2                          4666

Table 3.2: Verification data set configuration.
PCA is an unsupervised learning technique which provides an optimal, in the least
mean square error sense, representation of the input in a lower dimensional space. In the
eigenfaces method [104], given a training set $Z = \{Z_i\}_{i=1}^{C}$, containing $C$ classes with each
class $Z_i = \{z_{ij}\}_{j=1}^{C_i}$ consisting of a number of face images $z_{ij}$, a total of
$K = \sum_{i=1}^{C} C_i$ images, the PCA is applied to the training set $Z$ to find the $K$ eigenvectors of the
covariance matrix,

\[
S_{cov} = \frac{1}{K} \sum_{i=1}^{C} \sum_{j=1}^{C_i} (z_{ij} - \bar{z})(z_{ij} - \bar{z})^T, \tag{3.22}
\]

where $\bar{z} = \frac{1}{K} \sum_{i=1}^{C} \sum_{j=1}^{C_i} z_{ij}$ is the average of the ensemble. The eigenfaces are the
first $M (\le K)$ eigenvectors corresponding to the largest eigenvalues, denoted as $\Psi$. The
original image is transformed to the $M$-dimensional face space by a linear mapping:
$x_{ij} = \Psi^T (z_{ij} - \bar{z})$.
The PCA transformation matrix Ψ and mean image z are obtained based on the
images in the training set, and the images in the testing set are used for evaluation.
There is no overlap between the training and testing human subjects. Since RP does not
need a training process, to produce comparable results, we perform evaluation on the
same set of testing images as PCA. In the report of the experimental results, RP denotes
the application of random projection on the high-dimensional image vectors directly.
Figure 3.9 compares the obtained EER at different dimensionalities when PCA and RP
are applied as feature extractors respectively. It can be seen that PCA provides better
EER than RP, and the verification accuracy of RP improves at higher dimensionality.
This is because PCA projects the image vectors to directions with highest variance, while
RP projects to random directions. As shown in Lemma 3.6, as the dimensionality M
increases, with higher probability the Euclidean distance can be preserved up to a smaller
error factor, hence the performance improves.
Another observation is that the verification accuracy of both methods levels off after
certain dimensions, 100 for PCA (EER=17.54%), and 200 for RP (EER=18.68%) in our
experiments. For PCA, the projected features after a certain dimension will have very
small variance, therefore contribute little to the classification. For RP, the verification
accuracy is associated with both the dimensionality of the projected features, and the
discriminant power of the image vectors. When M exceeds a certain dimension, with
probability 1, the Euclidean distance can be preserved up to a very small error factor,
and therefore the verification accuracy depends on the separability of the original image
vectors. To illustrate this, we performed experiments on the non-projected original image
vectors, where Euclidean distance is used as the dissimilarity measure. This produces an
EER of 18.19%. Figure 3.10 plots the Receiver Operating Characteristic (ROC) curves of
RP (M = 200), and the verification results of the original image vectors. The ROC curves
are plotted by Genuine Acceptance Rate (GAR, complement of FRR) against FAR. It
can be observed that RP and original images have almost overlapping ROC curves. This
demonstrates that the Euclidean distance of the original images can be approximately
preserved. Generally, in a face recognition problem, PCA provides more discriminatory
representation than the original noisy face images. This explains why PCA outperforms
RP in our experimentation.

Figure 3.9: EER obtained by using PCA and RP as feature extractors.
3.6.3 RP vs PCARP
Although the PCA algorithm performs better than RP in general, it provides neither
privacy protection nor revocability. To solve these problems, a possible solution is to
apply RP on dimensionality reduced PCA feature vectors, as in [73]. In this chapter, this
method is denoted as PCARP. Because the original image can be approximately
reconstructed from its PCA coefficients, the revealing of these PCA coefficients
can be considered as a privacy breach. To protect the PCA coefficients, the PCARP
projected features should satisfy $M \le J/2$, where $J$ is the dimensionality of the PCA feature
vectors.

Figure 3.10: ROC curve of RP and original image vectors.
Depending on the application context, the proposed changeable biometric system can
be implemented in two scenarios: user-independent (UI) and user-dependent (UD). In the
UI scenario, all the users use the same projection, i.e., same key to generate the same set
of random matrices. The randomness generation key can be controlled by the application
provider, and therefore the users do not need to carry the key for authentication. The
UD scenario is a two-factor scheme that requires user-specific projection, i.e., each user
uses a different key to generate a different random matrix. In both cases, the biometric
template can be regenerated by simply changing the key.
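Key-driven template generation can be sketched as follows. This is an illustrative sketch, not the implementation used in the experiments: the key is assumed to seed a pseudo-random generator directly, the translation is assumed to be a scalar added to every element before projection, and all function names are hypothetical. Using one shared key for everybody corresponds to the UI scenario, a per-user key to the UD scenario, and issuing a new key regenerates the template.

```python
import math
import random


def projection_matrix(key, N, M):
    """N x M matrix with i.i.d. N(0, 1/N) entries, reproducible from `key`."""
    rng = random.Random(key)
    sigma = 1.0 / math.sqrt(N)
    return [[rng.gauss(0.0, sigma) for _ in range(M)] for _ in range(N)]


def make_template(z, key, M, d=0.0):
    """x = sqrt(N/M) R^T (z + d); the scalar d is added to every element (assumption)."""
    N = len(z)
    R = projection_matrix(key, N, M)
    scale = math.sqrt(N / M)
    shifted = [zi + d for zi in z]
    return [scale * sum(R[i][j] * shifted[i] for i in range(N)) for j in range(M)]


def distance(a, b):
    """Euclidean distance between two templates."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    z = [0.1 * i for i in range(20)]
    old = make_template(z, "user-key-old", M=5, d=100.0)
    new = make_template(z, "user-key-new", M=5, d=100.0)
    print(distance(old, new))  # distance between templates of the same data under two keys
```

Because the projection is linear, applying the same key and the same translation to two different data vectors leaves their template distance unchanged, while a change of key produces an unrelated template.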
Since the UD scenario is a two-factor scheme, three situations need to
be considered: both-legitimate, stolen-key, and stolen-biometrics. In the both-legitimate
case, different users utilize distinct keys for RP, and it is assumed that the key and biometric
data are not stolen. As we discussed in Chapter 2, the UD projection does not
change the FRR, since the same projection is still applied on the biometric representation
of the same user. If the data points that are originally within a distance of t to a
vector u are rejected by using a different projection, then it provides changeability for u.
Therefore, the generated FAR in the both-legitimate case provides a measure of change-
ability, and the smaller the FAR, the better the changeability. The stolen-biometrics
case is essentially the changeability problem, and it has the same performance as the
both-legitimate case. In the stolen-key case, the same random matrix is applied to the
biometric features of both the genuine and imposter users. This is equivalent to the UI
scenario. Therefore, the performance of the changeable face verification system can be
evaluated through experiments on UI and UD (both-legitimate) scenarios.
User-independent Scenario
In the UI scenario, all the users utilize the same projection matrix. Since the same vector
d is used for all users, the translation procedure does not affect the similarity between
vectors. Figure 3.11 compares the obtained EER of PCARP and RP at different M,
with the dimensionality of PCA vectors J = 2 × M. Table 3.3 lists the experimental
results of PCA, RP, and PCARP. Overall, RP and PCARP achieve similar performance,
and both produce lower recognition accuracy compared with the original PCA method.
Because PCA features provide better discriminant power than the original
image vectors, the PCARP method requires lower dimensionality than the RP method
to achieve the same accuracy. Figure 3.12 shows the ROC curves of RP and PCARP at
M = 200. It can be observed that RP and PCARP have almost overlapping ROC curves.
User-dependent Scenario
The UD scenario is a two-factor scheme where each user utilizes a distinct projection
matrix. Figure 3.13 depicts the obtained EER as a function of dimensionality. It can be
seen that when RP is applied directly on image or PCA feature vectors, zero EER cannot
be obtained. The EER decreases as the dimensionality increases. This is consistent with
our analysis in Eqn. (3.15), that the probability of error depends on the characteristics of
the vectors and the dimensionality M, and the probability of error decreases at higher M.
However, as shown in Eqn. (3.18), by proper translation, the ratio of the system threshold
to the length of the vector approaches zero, and zero error rate can be obtained. This is
confirmed in our experiments, where zero EER is obtained at all dimensionalities.

Figure 3.11: EER obtained in the user-independent scenario.

Figure 3.12: ROC curve of RP and PCARP in the user-independent scenario.

Table 3.3: Experimental results (EER, in %) of PCA, RP, and PCARP at different
dimensionalities.

Figure 3.13: EER obtained in the user-dependent scenario.
The above experimental results demonstrate that it is possible to produce zero EER
when the biometric data and the projection matrix generation key are both legitimate.
Figures 3.14 and 3.15 depict the ROC curves of RP and PCARP at M = 200 respectively,
with the threshold values selected based on the stolen-key case. It can be observed that
in the both-legitimate case, without vector translation, the FAR is dependent on the
system threshold value, and hence strong changeability cannot be provided. On the other
hand, with proper vector translation, zero FAR is obtained for all selections of threshold
values. This demonstrates that the biometric template is strongly changeable, and the FAR is
zero even if the biometric data is stolen. In the stolen-key case, the performance is evaluated
by using the same projection matrix for all the users. For a fixed system threshold, the
FRR is the same for the both-legitimate case and the stolen-key case. A smaller FRR
will produce a higher FAR when the key is stolen, and vice versa. The selection of the
system threshold is dependent on the requirements of the application.

Figure 3.14: ROC curve of RP in the user-dependent scenario.

Figure 3.15: ROC curve of PCARP in the user-dependent scenario.
Changeability
As discussed before, the performance in the UD scenario actually implies the changeabil-
ity of the proposed method. The smaller the FAR, the stronger the changeability. Since a
zero FRR corresponds to the largest threshold value t, zero EER indicates strong change-
ability. To confirm this point, we also demonstrate the changeability of the proposed
method through experiments on RP and PCARP projected features. The image samples
from the same user are projected using different RP matrices and matched against each
other. Each individual image is also matched against itself by using different projection
matrices. The experiment consists of a total number of 13922 verification attempts. The
experimental results are shown in Figure 3.16, where the FAR is plotted as a function of
the system threshold t, and CH denotes changeability experiments. The t is normalized
such that 0 represents the lowest value, and 1 is the highest value. The obtained FAR
in the UD scenario is also depicted for comparison purposes. Since Euclidean distance
is applied as the dissimilarity measure, a smaller t means lower FAR and higher FRR,
and vice versa. It can be observed that without vector translation, the changeability is
dependent on t, and hence strong changeability cannot be achieved. On the other hand,
with proper translation, the method is capable of producing zero FAR for all selections of system
threshold values, i.e., for any system. The FAR plots of the changeability experiments and
of the UD scenario almost overlap; this confirms that the performance in the
UD scenario indicates the changeability of the system. In the remainder of the thesis, we
will use the performance in the UD scenario to demonstrate the changeability.

Figure 3.16: Experimental results for changeability: RP (left) and PCARP (right).
3.6.4 Discussion
The experimental results indicate that RP yields a slight degradation in verification
accuracy compared with the PCA based method. However, the RP method preserves the user’s
privacy if the stored template is compromised. The RP based privacy preserving solution
can be applied on either high-dimensional image vectors or dimensionality reduced
feature vectors. As shown in our experiments, the PCARP and RP methods produce similar
performance in the UI scenario, and both are capable of producing zero EER in the UD scenario
with proper vector translation. In the UD scheme, if the key is stolen, the performance
will be the same as in the UI scenario. If only the biometric data is stolen, the FAR will
be zero. This also explains the changeability, which means that two biometric vectors
that are generated from the same biometric using different projection matrices cannot
be used to authenticate each other successfully.
An advantage of the PCARP method is that it can produce similar performance at
a lower dimensionality. However, the PCA based method requires a training process,
which usually involves a large number of training images, and hence it has much higher
computational requirements. Also, the collection of these training images poses a privacy
problem. On the other hand, the RP method is data independent, does not require
training, and is much easier to implement. More importantly, the PCARP method is
vulnerable to cross-matching attacks. For example, given a PCA vector of dimensionality
J = 200, to produce a privacy preserving template with the highest possible accuracy, we
can project the PCA features to a vector of size M = 100 using RP. However, if the templates
of two applications that use the same set of PCA coefficients are revealed, and the
two different RP matrices of these applications are also obtained, then an adversary
can form a set of J linear equations with J unknowns, and the PCA coefficients can be
exactly reconstructed. By using RP directly on image vectors, since the dimensionality
of such vectors is usually very high (e.g. N = 17154 in the generic data set), and the
projected dimensionality is low (e.g. M = 200), an adversary will need to compromise
$\lceil N/M \rceil = 86$ templates from one user to recover the original image. Although it is possible
to produce better verification accuracy using advanced feature extraction methods, the
vulnerability to cross-matching attacks is essentially a weakness of applying RP to such
low-dimensional feature vectors. Considering all these aspects, RP on high-dimensional
image vectors is a more appropriate solution for privacy preserving biometric verification.
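The counting argument behind the cross-matching attack can be made explicit. In the small sketch below (the function name is hypothetical), each compromised template of dimensionality M, together with its projection matrix, contributes M linear equations in the unknown source vector, so the number of templates an adversary needs is the smallest integer whose multiple of M reaches the source dimensionality.

```python
import math


def templates_to_recover(source_dim, M):
    """Smallest number of M-dimensional templates (with known matrices) whose
    stacked linear system has at least `source_dim` equations."""
    return math.ceil(source_dim / M)


if __name__ == "__main__":
    print(templates_to_recover(200, 100))    # PCA features, J = 200, M = 100
    print(templates_to_recover(17154, 200))  # image vectors, N = 17154, M = 200
```

The gap between the two cases is what makes projection of the high-dimensional image vector considerably harder to attack by cross-matching.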
3.7 Summary
This chapter has presented a systematic analysis of a random projection based method for
addressing the challenging problem of template changeability and privacy protection in
biometrics enabled verification systems. Two different scenarios, user-independent and
user-dependent random projection have been discussed. Detailed mathematical analysis
shows that the similarity between two vectors can be approximately preserved when
projected onto a random subspace with appropriate dimensionality. We have introduced
a precise method for computing the probability of preserving distance between two points
with respect to the error factor and projected dimensionality, and provided a probability
lower bound of preserving the pairwise distance for all the points. Our method achieves
better dimensionality lower bound than existing works. The user-dependent scenario is a
two-factor scheme that utilizes user-specific matrix for random projection. We have used
a geometric-based approach to approximate the probability of error, and introduced an
effective method of vector translation to improve the changeability.
The proposed method produces changeable biometric templates, which can be achieved
by simply varying the RP matrix. To explore the privacy preserving characteristics of
such method, we have provided detailed analysis in both the estimation of individual
element and the whole vector. For the purpose of comparative study, we have performed
computer simulations by using RP on both image vectors and PCA reduced feature
vectors. Experimental results show that these two methods have similar verification
accuracy in user-independent scenario, and are both capable of producing zero EER in
user-dependent scenario with vector translation. It is pointed out that better privacy pro-
tection can be obtained by applying RP on high-dimensional image vectors directly, which
is also data-independent, computationally economical, and easy to implement. However,
due to the noisy representation of the original image vectors, and the requirement of
using lower projected dimensionality to ensure privacy, the recognition performance is
usually degraded when RP is applied on high-dimensional image vectors. To improve the
recognition performance while maintaining privacy protection, a sorted index number
approach is introduced to preserve privacy for discriminant low-dimensional feature
vectors, which will be presented in the following chapters.
3.8 Appendix
3.8.1 Appendix 3-I
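Proof of Lemma 3.2: Let $W = R^T R$, $R \in \mathbb{R}^{N \times M}$, where the entries of $R$ are i.i.d. Gaussian
random variables, $r_{ij} \sim \mathcal{N}(0, \frac{1}{N})$. Let $w_{ij}$ denote the elements of $W$.

If $i = j$, we have:

\[
E[w_{ij}] = E\left[\sum_{k=1}^{N} r_{kj}^2\right] = \sum_{k=1}^{N} E[r_{kj}^2] = N \times \frac{1}{N} = 1.
\]

Since the entries of $R$, $r_{ij}$, are i.i.d. Gaussian random variables with mean zero and
variance $\frac{1}{N}$, the random variable $Z = N \sum_{k=1}^{N} r_{kj}^2$ follows a Chi-square distribution
with $N$ degrees of freedom:

\[
\mathrm{Var}\left[N \sum_{k=1}^{N} r_{kj}^2\right] = N^2\, \mathrm{Var}\left[\sum_{k=1}^{N} r_{kj}^2\right] = 2N.
\]

Hence:

\[
\mathrm{Var}[w_{ij}] = \mathrm{Var}\left[\sum_{k=1}^{N} r_{kj}^2\right] = \frac{2}{N}.
\]

If $i \neq j$, we have:

\[
E[w_{ij}] = E\left[\sum_{k=1}^{N} r_{ki} r_{kj}\right] = \sum_{k=1}^{N} E[r_{ki} r_{kj}]
= \sum_{k=1}^{N} E[r_{ki}]\, E[r_{kj}] = 0,
\]

\[
\begin{aligned}
\mathrm{Var}[w_{ij}] &= E[w_{ij}^2] - E[w_{ij}]^2 = E\left[\left(\sum_{k=1}^{N} r_{ki} r_{kj}\right)^{2}\right] \\
&= E\left[\sum_{k=1}^{N} r_{ki}^2 r_{kj}^2 + \sum_{l \neq k} r_{li} r_{ki} r_{lj} r_{kj}\right] \\
&= E\left[\sum_{k=1}^{N} r_{ki}^2 r_{kj}^2\right] = \sum_{k=1}^{N} E[r_{ki}^2 r_{kj}^2] = \frac{1}{N}.
\end{aligned}
\]

Similarly, Eqn. (3.4) and Eqn. (3.5) can be proved.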
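The moments derived above can be checked empirically. The following Monte Carlo sketch is only a sanity check under assumed small sizes (N = 50, 4000 trials) and hypothetical function names: it samples two columns of R, forms one diagonal and one off-diagonal element of W = R^T R, and estimates their means and variances, which should be close to 1 and 2/N on the diagonal and to 0 and 1/N off the diagonal.

```python
import math
import random


def sample_w_elements(N, trials, seed):
    """Samples of a diagonal and an off-diagonal element of W = R^T R."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(N)  # r_ij ~ N(0, 1/N)
    diag, off = [], []
    for _ in range(trials):
        col_i = [rng.gauss(0.0, sigma) for _ in range(N)]
        col_j = [rng.gauss(0.0, sigma) for _ in range(N)]
        diag.append(sum(r * r for r in col_i))                  # w_ii
        off.append(sum(a * b for a, b in zip(col_i, col_j)))    # w_ij, i != j
    return diag, off


def mean_and_variance(samples):
    """Sample mean and unbiased sample variance."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / (len(samples) - 1)
    return m, v


if __name__ == "__main__":
    N = 50
    diag, off = sample_w_elements(N, trials=4000, seed=3)
    print("diagonal:", mean_and_variance(diag), "expected", (1.0, 2.0 / N))
    print("off-diagonal:", mean_and_variance(off), "expected", (0.0, 1.0 / N))
```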
3.8.2 Appendix 3-II
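Proof of Lemma 3.3: Let $x = \sqrt{N/M}\, R^T u$, where $u \in \mathbb{R}^N$, $R \in \mathbb{R}^{N \times M}$, and the entries of
$R$ are i.i.d. Gaussian random variables, $r_{ij} \sim \mathcal{N}(0, \frac{1}{N})$. Let $u_i$ denote the elements of $u$;
we have:

\[
\begin{aligned}
E[\|x\|^2] &= E\left[\sum_{j=1}^{M} \left(\sum_{i=1}^{N} \sqrt{\frac{N}{M}}\, r_{ij} u_i\right)^{2}\right] \\
&= \frac{N}{M} \sum_{j=1}^{M} E\left[\left(\sum_{i=1}^{N} r_{ij} u_i\right)^{2}\right] \\
&= \frac{N}{M} \sum_{j=1}^{M} E\left[\sum_{i=1}^{N} r_{ij}^2 u_i^2 + 2\sum_{l<k} r_{lj} u_l r_{kj} u_k\right] \\
&= \frac{N}{M} \sum_{j=1}^{M} E\left[\sum_{i=1}^{N} r_{ij}^2 u_i^2\right] \\
&= \frac{N}{M} \sum_{j=1}^{M} \frac{1}{N} \|u\|^2 \\
&= \|u\|^2.
\end{aligned}
\]

To compute $\mathrm{Var}[\|x\|^2]$, we first define $\alpha_j = \left(\sum_{i=1}^{N} r_{ij} u_i\right)^2$. We have:

\[
\begin{aligned}
E[\alpha_j] &= E\left[\left(\sum_{i=1}^{N} r_{ij} u_i\right)^{2}\right] \\
&= E\left[\sum_{i=1}^{N} r_{ij}^2 u_i^2 + 2\sum_{l<k} r_{lj} u_l r_{kj} u_k\right] \\
&= E\left[\sum_{i=1}^{N} r_{ij}^2 u_i^2\right] \\
&= \sum_{i=1}^{N} \frac{1}{N} u_i^2 \\
&= \frac{1}{N} \|u\|^2.
\end{aligned}
\]

Since $r_{ij} \sim \mathcal{N}(0, \frac{1}{N})$, $E[r_{ij}^4] = \frac{3}{N^2}$, then:

\[
\begin{aligned}
E[\alpha_j^2] &= E\left[\left(\sum_{i=1}^{N} r_{ij} u_i\right)^{4}\right] \\
&= E\left[\sum_{i=1}^{N} r_{ij}^4 u_i^4 + 6\sum_{l<k} r_{lj}^2 u_l^2 r_{kj}^2 u_k^2\right] \\
&= \frac{3}{N^2} \sum_{i=1}^{N} u_i^4 + \frac{6}{N^2} \sum_{l<k} u_l^2 u_k^2 \\
&= \frac{3}{N^2} \left(\sum_{i=1}^{N} u_i^4 + 2\sum_{l<k} u_l^2 u_k^2\right) \\
&= \frac{3}{N^2} \left(\sum_{i=1}^{N} u_i^2\right)^{2} \\
&= \frac{3}{N^2} \|u\|^4.
\end{aligned}
\]

We have:

\[
\begin{aligned}
E[\|x\|^4] &= E\left[\left(\sum_{j=1}^{M} \left(\sum_{i=1}^{N} \sqrt{\frac{N}{M}}\, r_{ij} u_i\right)^{2}\right)^{2}\right] \\
&= \frac{N^2}{M^2} E\left[\left(\sum_{j=1}^{M} \alpha_j\right)^{2}\right] \\
&= \frac{N^2}{M^2} E\left[\sum_{j=1}^{M} \alpha_j^2 + 2\sum_{l<k} \alpha_l \alpha_k\right] \\
&= \frac{N^2}{M^2} \left(\sum_{j=1}^{M} E[\alpha_j^2] + 2\sum_{l<k} E[\alpha_l]\, E[\alpha_k]\right) \\
&= \frac{N^2}{M^2} \left(\frac{3M}{N^2} \|u\|^4 + 2\, \frac{M(M-1)}{2}\, \frac{\|u\|^2}{N}\, \frac{\|u\|^2}{N}\right) \\
&= \left(1 + \frac{2}{M}\right) \|u\|^4,
\end{aligned}
\]

and the variance of $\|x\|^2$ can be computed as:

\[
\mathrm{Var}[\|x\|^2] = E[\|x\|^4] - E[\|x\|^2]^2 = \frac{2}{M} \|u\|^4.
\]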
Chapter 4
Sorted Index Numbers for Face Recognition
4.1 Introduction
Recognition accuracy is of fundamental importance in biometrics based recognition
systems. Many face recognition (FR) techniques have been proposed in the literature,
and the state of the art in the area can be found in a series of surveys [107–109]. In
general, the geometrical local feature based approach and the holistic template matching
based approach are considered to be two of the major FR methodologies. In a geometrical
feature based FR system, local facial features such as the eyes, nose, and mouth are
identified, and their location or geometry characteristics are used for face representation.
Examples of geometrical approaches include the Hidden Markov Model (HMM) based
method [110,111] and the Elastic Bunch Graph Matching (EBGM) method [112]. However,
the performance of such methods usually relies heavily on the exact localization of
facial features, which is a difficult task in many application scenarios [113]. Appearance
based approaches, which treat the human face as a holistic pattern, are among the most
successful methods [108,114]. In an appearance based FR system, the face image is
converted to a high-dimensional vector that consists of the pixel values in the image, and
dimensionality reduction techniques are applied to obtain a lower-dimensional
representation. The extracted features are usually a set of real numbers in the continuous
domain, and the similarity between images is evaluated by distance measures. Representative
techniques include PCA and LDA and their variants.
In Chapter 3, a random projection (RP) based method is introduced for privacy
preserving face recognition. Due to the noisy nature of the original images, RP on
high-dimensional image vectors produces slightly lower performance than PCA. Many other
advanced appearance based techniques may provide a more discriminant representation
than PCA [97], and it is likely that RP on these discriminant features will produce
better recognition performance. However, when RP is applied to dimensionality reduced
feature vectors, the biometric template is vulnerable to a cross matching attack.
This chapter presents a novel approach for privacy preserving face recognition using
appearance based continuous features. Unlike traditional appearance based FR systems,
where the original features are usually stored as templates for matching, the proposed
method stores the sorted index numbers (SIN) of the extracted features as template.
Since it is impossible to recover any of the exact values of the original features, the
transformation from original features to the SIN vectors is non-invertible. A matching
algorithm is introduced to measure the similarity between two SIN vectors. Extensive
experimentation demonstrates that the proposed solution may improve the recognition
accuracy in both identification and verification scenarios.
The remainder of this chapter is organized as follows: Section 4.2 provides an overview
of the proposed SIN method. Detailed analysis on the SIN method is given in Section
4.3. Section 4.4 introduces two privacy measures that evaluate the privacy protection at
individual attribute and global vector levels, and presents a privacy analysis of the SIN
method. Detailed experimental results in both identification and verification scenarios
are presented in Section 4.5, and a summary of this chapter is given in Section 4.6.
4.2 Method Overview
This section presents an overview of the proposed solution for privacy preserving face
recognition. The proposed method assumes that the extracted features of a biometric
signal can be represented by a vector of continuous numbers, and the similarity of the
vectors can be evaluated by some (e.g., Euclidean) distance measures. The procedure of
creating the proposed SIN feature vector is as follows:
1. Extract feature vector $\mathbf{w} \in \mathbb{R}^N$ from the input face image.
2. Compute $\mathbf{u} = \mathbf{w} - \bar{\mathbf{w}}$, where $\bar{\mathbf{w}}$ is the mean feature vector calculated from the
training data.
3. Sort the feature vector $\mathbf{u}$ in descending order, and store the corresponding index
numbers in a new vector $\mathbf{g}$.
4. The generated vector $\mathbf{g} \in \mathbb{Z}^N$ that contains the sorted index numbers is stored as
the template for recognition.
For example, given $\mathbf{u} = \{u_1, u_2, u_3, u_4, u_5, u_6\}$, if the elements sorted in descending order
are $\{u_4, u_6, u_2, u_1, u_3, u_5\}$, then the template is $\mathbf{g} = \{4, 6, 2, 1, 3, 5\}$.
The method for computing the similarity between two SIN vectors, denoted as the
SIN distance in this thesis, is as follows:
1. Given two SIN feature vectors $\mathbf{g} \in \mathbb{Z}^N$ and $\mathbf{p} \in \mathbb{Z}^N$, where $\mathbf{g}$ denotes the template
vector and $\mathbf{p}$ denotes the probe vector, start from the first element $g_1$ of $\mathbf{g}$.
2. Search for the corresponding element in $\mathbf{p}$, i.e., $p_j = g_1$. Record $\xi_1 = j - 1$, where
$j$ is the index number in $\mathbf{p}$.
3. Eliminate the obtained $p_j$ from $\mathbf{p}$, and obtain $\mathbf{p}^1 = \{p_1, p_2, ..., p_{j-1}, p_{j+1}, ..., p_N\}$.
4. Repeat steps 2 and 3 on the subsequent elements of $\mathbf{g}$ until $g_{N-1}$. Record $\xi_2, \xi_3, ..., \xi_{N-1}$.
5. The similarity measure of $\mathbf{g}$ and $\mathbf{p}$ is computed as $S(\mathbf{g}, \mathbf{p}) = \sum_{i=1}^{N-1} \xi_i$.
Illustration example:
1. For two SIN feature vectors $\mathbf{g} = \{4, 6, 2, 1, 3, 5\}$ and $\mathbf{p} = \{2, 5, 3, 6, 1, 4\}$, we first
search the 1st element $g_1 = 4$, and find that $p_6 = 4$. Therefore $\xi_1 = 6 - 1 = 5$.
Eliminate $p_6$ from $\mathbf{p}$ and form a new vector $\mathbf{p}^1 = \{2, 5, 3, 6, 1\}$.
2. Search the 2nd element $g_2 = 6$, and find that $p^1_4 = 6$. Therefore $\xi_2 = 4 - 1 = 3$.
Eliminate $p^1_4$ from $\mathbf{p}^1$ and form a new vector $\mathbf{p}^2 = \{2, 5, 3, 1\}$.
3. Search the 3rd element $g_3 = 2$, and find that $p^2_1 = 2$. Therefore $\xi_3 = 1 - 1 = 0$.
Eliminate $p^2_1$ from $\mathbf{p}^2$ and form a new vector $\mathbf{p}^3 = \{5, 3, 1\}$.
4. Search the 4th element $g_4 = 1$, and find that $p^3_3 = 1$. Therefore $\xi_4 = 3 - 1 = 2$.
Eliminate $p^3_3$ from $\mathbf{p}^3$ and form a new vector $\mathbf{p}^4 = \{5, 3\}$.
5. Search the 5th element $g_5 = 3$, and find that $p^4_2 = 3$. Therefore $\xi_5 = 2 - 1 = 1$.
6. Compute $S(\mathbf{g}, \mathbf{p}) = \sum_{i=1}^{5} \xi_i = 5 + 3 + 0 + 2 + 1 = 11$.
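The template generation and SIN distance procedures above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the thesis: the function names `sin_template` and `sin_distance` and the numeric feature values are hypothetical, chosen so that the template reproduces the example.

```python
def sin_template(u):
    """Return the 1-based index numbers of u sorted in descending order of value."""
    order = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    return [i + 1 for i in order]


def sin_distance(g, p):
    """SIN distance between template g and probe p (both permutations of 1..N)."""
    remaining = list(p)              # working copy of the probe vector
    total = 0
    for value in g[:-1]:             # elements g_1 .. g_{N-1}
        j = remaining.index(value)   # 0-based position, which equals xi = j - 1 in 1-based terms
        total += j
        del remaining[j]             # eliminate the matched element before the next step
    return total


# Hypothetical mean-subtracted features whose descending order is u4, u6, u2, u1, u3, u5.
u = [0.1, 0.3, -0.2, 0.9, -0.5, 0.5]
g = sin_template(u)
p = [2, 5, 3, 6, 1, 4]
print(g, sin_distance(g, p))
```

With the vectors of the illustration example, the sketch yields the template {4, 6, 2, 1, 3, 5} and a SIN distance of 11.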
4.3 SIN Method
The idea of SIN originates from the pairwise relation of any two elements in a vector.
Xiang et al. [115] utilized the relative relation of groups of two bins to represent the
shape of a histogram. In the proposed method, the pairwise relative relation of vector
elements is used for distance approximation. To understand the underlying rationale of
the proposed algorithm, we first look into an alternative presentation of the method,
named Pairwise Relational Discretization (PRD). The procedure of producing the PRD
feature vector is as follows:
1. Extract feature vector $\mathbf{w} \in \mathbb{R}^N$ from the input face image.
2. Compute $\mathbf{u} = \mathbf{w} - \bar{\mathbf{w}}$, where $\bar{\mathbf{w}}$ is the mean feature vector calculated from the
training data.
3. Compute the binary representation of $\mathbf{u}$ by comparing the pairwise relation of all the
elements in $\mathbf{u}$ according to:
\[
b_{ij} = \begin{cases} 1, & u_i \geq u_j; \\ 0, & u_i < u_j; \end{cases} \tag{4.1}
\]
4. Concatenate all the bits into one vector $\mathbf{b} = \{b_{12}, ..., b_{1N}, b_{23}, ..., b_{2N}, b_{34}, ..., b_{N-1,N}\}$.
Store the binary vector $\mathbf{b}$ as the template for recognition.
The similarity measure of the PRD method is based on Hamming distance.
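As a minimal sketch of the PRD procedure and its Hamming-distance matching, the following Python fragment builds the bit vector of Eqn. (4.1) in the stated concatenation order. The function names and the two toy feature vectors are assumptions made for illustration; the vectors are chosen so that their descending orders are {4, 6, 2, 1, 3, 5} and {2, 5, 3, 6, 1, 4}.

```python
def prd_vector(u):
    """Binary PRD vector: b_ij = 1 if u_i >= u_j else 0, for all pairs i < j."""
    n = len(u)
    return [1 if u[i] >= u[j] else 0 for i in range(n) for j in range(i + 1, n)]


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical mean-subtracted feature vectors (values are illustrative only).
u_template = [0.1, 0.3, -0.2, 0.9, -0.5, 0.5]
u_probe = [-0.3, 0.9, 0.2, -0.6, 0.5, 0.0]

b_template = prd_vector(u_template)
b_probe = prd_vector(u_probe)
print(len(b_template), hamming_distance(b_template, b_probe))
```

For these six-dimensional vectors the PRD vector has 6 × 5 / 2 = 15 bits, and the two toy vectors differ in 11 of them.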
Unlike the traditional discretization method, which quantizes individual elements
based on some predefined quantization levels, the proposed method takes the global
characteristics of the feature vectors into consideration. This is achieved by comparing
the pairwise relation of all groups of two elements in the vector. From a geometric point
of view, the PRD method is equivalent to partitioning an $N$-dimensional space into $N!$
cells, where $N!$ is the total number of possible outputs of the PRD vector. An original
vector is mapped onto the corresponding cell, and the Euclidean distance between two
vectors is approximated by the spatial distance of the cells, i.e., the Hamming distance of
the corresponding PRD vectors. More precisely, since the pairwise relation is invariant
to the norm of the vectors, the Hamming distance of the PRD vectors approximates the
Euclidean distance between vectors that are normalized to the same length. Figure 4.1
provides a graphic view of the partition of a 3-D sphere, assuming all the vectors have
unit length. It can be observed that the 3-D surface is partitioned into $3! = 6$ cells, and
the distance of the cells can be measured by the Hamming distance of the corresponding
binary PRD vectors.
Figure 4.1: 3-D demonstration of SIN method.
Alternatively, the PRD method interprets an $N$-dimensional space as combinations
of 2-D planes. In an $N$-dimensional subspace, when the similarity of two vectors is
evaluated by Euclidean distance, the vector elements are treated as coordinates in the
corresponding basis $\{\mathbf{h}_1, \mathbf{h}_2, ..., \mathbf{h}_N\}$, and the similarity is based on the spatial closeness.
The elements are essentially the projection coefficients of the vector onto each basis
(i.e., lines). Here, instead of projecting onto lines, we explore the projection onto 2-D
planes. Figure 4.2 offers a diagrammatic illustration of the PRD method. For two
points in an $N$-dimensional subspace, if they are spatially close to each other, then in
a large number of 2-D planes their projection locations should be close to each other,
i.e., the Hamming distance is small, and vice versa. Therefore, the Euclidean distance between
two vectors can be approximated by the Hamming distance between the corresponding
PRD vectors. The mean subtraction step ensures zero mean of each dimension. It
balances the significance of the elements such that no single dimension will overpower the
others. The discretization step partitions a 2-D plane into two regions by comparing the
pairwise relation. It reduces the sensitivity to the variation of individual elements, and
therefore can potentially provide better error tolerance. Figure 4.3 shows the intra-class
and inter-class distributions of the first 100 PCA coefficients based on 1000 randomly
selected images from the experimental data set. The PCA vectors are normalized to
unit length, and Euclidean distance and Hamming distance are used as dissimilarity
measures. Note that the size of the overlapping area of the intra-class and inter-class
distributions indicates the recognition error. It can be observed that the PRD method
produces a smaller error than the original features, and therefore may provide better
recognition performance.
Figure 4.2: Diagram of Pairwise Relational Discretization (PRD) method.
A major drawback of the PRD method is the high dimensionality of the generated
binary PRD vector. For an $N$-dimensional vector, the generated binary vector $\mathbf{b}$ has
a size of $\frac{N(N-1)}{2}$. For example, for a feature vector with $N = 100$, the PRD
vector has a size of 4950. This introduces high storage and computational
requirements, which is particularly important for applications with high processing speed
demands. To address this, we note that the PRD method is based on the pairwise relation
of all the vector elements, and the same information is exactly preserved by the
sorted index numbers, i.e., any single bit in $\mathbf{b}$ can be derived from the SIN vector.
Figure 4.3: Comparison of intra-class and inter-class distribution using Euclidean and
Hamming distances.
Let $\mathbf{g}$ and $\mathbf{p}$ denote the SIN vectors of the template and probe images respectively, and let
$\mathbf{b}_g$ and $\mathbf{b}_p$ represent the corresponding PRD vectors; then we have:
\[
H(\mathbf{b}_g, \mathbf{b}_p) = S(\mathbf{g}, \mathbf{p}) = \sum_{i=1}^{N-1} \xi_i, \tag{4.2}
\]
where $H(\mathbf{b}_g, \mathbf{b}_p)$ and $S(\mathbf{g}, \mathbf{p})$ denote the Hamming distance and SIN distance respectively,
and $\xi_i$, $i = 1, ..., N-1$, represents the Hamming distance associated with the $i$th
element of $\mathbf{g}$.
Proof of Eqn. (4.2): Since $\mathbf{g}$ and $\mathbf{b}_g$ are derived from the same feature vector, there are
$N - 1$ bits in $\mathbf{b}_g$ that are associated with the first element of $\mathbf{g}$, $g_1$. If $p_j = g_1$, where $j$
is the index number of the corresponding element in $\mathbf{p}$, then each of the $j - 1$ index numbers
to the left of $p_j$ precedes $g_1$ in $\mathbf{p}$ but follows it in $\mathbf{g}$, so the associated bits differ
between $\mathbf{b}_g$ and $\mathbf{b}_p$, i.e., $\xi_1 = j - 1$. It should be noted that since
the Hamming distance for all the bits associated with $p_j = g_1$ has been computed, the
$p_j$ element should be removed before the next iteration. After the Hamming
distances for all the elements in $\mathbf{g}$ and $\mathbf{p}$ are computed, their sum corresponds
to the Hamming distance of $\mathbf{b}_g$ and $\mathbf{b}_p$, i.e., $H(\mathbf{b}_g, \mathbf{b}_p) = S(\mathbf{g}, \mathbf{p}) = \sum_{i=1}^{N-1} \xi_i$.
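Eqn. (4.2) can also be checked numerically. The sketch below, which is not part of the thesis, draws random feature vectors (Gaussian entries are an assumption made here for convenience), computes both the PRD Hamming distance and the SIN distance, and reports whether they agree in every trial. All function names are hypothetical.

```python
import random


def sin_template(u):
    """1-based index numbers of u sorted in descending order of value."""
    order = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    return [i + 1 for i in order]


def sin_distance(g, p):
    """SIN distance: sum of positions of each g_i in the shrinking probe vector."""
    remaining = list(p)
    total = 0
    for value in g[:-1]:
        j = remaining.index(value)
        total += j
        del remaining[j]
    return total


def prd_vector(u):
    """Pairwise relational bits b_ij (i < j) as in Eqn. (4.1)."""
    n = len(u)
    return [1 if u[i] >= u[j] else 0 for i in range(n) for j in range(i + 1, n)]


def check_equivalence(n, trials, seed=0):
    """Return True if H(b_g, b_p) == S(g, p) for every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hamming = sum(a != b for a, b in zip(prd_vector(u), prd_vector(v)))
        if hamming != sin_distance(sin_template(u), sin_template(v)):
            return False
    return True


print(check_equivalence(n=8, trials=200))
```

Both quantities count the pairs of elements whose relative order differs between the two vectors, which is why the check is expected to succeed for any dimensionality.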
Eqn. (4.2) shows that the proposed SIN and PRD methods produce exactly the
same results. The equivalence of the PRD and SIN methods also indicates that in an
$N$-dimensional space, the total number of possible outputs is $N!$. Because the events
associated with each permutation are mutually exclusive and equally probable,
the $N!$ cells on the surface of an $N$-dimensional sphere have the same volume. To compare
the computational cost of SIN and PRD, we performed experiments
on a computer with an Intel Core™2 CPU at 2.66 GHz. With an original feature vector of
dimensionality 100, the average time for PRD feature extraction and matching is 26.2
ms, while the SIN method takes less than 0.9 ms.
The approximation of the Euclidean distance between original vectors by the Hamming
distance between the corresponding SIN/PRD vectors is demonstrated in Figure 4.4 at
different dimensionalities. A total of 1000 vectors are generated randomly and normalized
to unit length. Taking the first vector as a reference vector, the Euclidean distances to
all the other vectors are computed, sorted, and plotted as the red curve. The Hamming
distances between the corresponding SIN/PRD vectors are then computed and plotted
in blue. It can be seen that the Euclidean distance is approximately preserved by
the Hamming distance of the PRD vectors, and the higher the dimensionality, the better
the distance approximation.
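A rough sketch of this kind of experiment is given below. It is not the thesis code: the text does not state how the random vectors are drawn, so Gaussian entries are assumed, a smaller sample of 200 vectors is used, and the agreement between the two distances is summarized by a Pearson correlation coefficient instead of a plot. All names are hypothetical.

```python
import math
import random


def unit_vector(n, rng):
    """Random vector with Gaussian entries (an assumption), normalized to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def prd_hamming(u, v):
    """Number of index pairs (i, j), i < j, whose ordering differs between u and v."""
    n = len(u)
    return sum((u[i] >= u[j]) != (v[i] >= v[j])
               for i in range(n) for j in range(i + 1, n))


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def distance_correlation(n, count=200, seed=0):
    """Correlation between Euclidean and PRD/SIN Hamming distances to a reference vector."""
    rng = random.Random(seed)
    reference = unit_vector(n, rng)
    others = [unit_vector(n, rng) for _ in range(count)]
    euc = [euclidean(reference, v) for v in others]
    ham = [prd_hamming(reference, v) for v in others]
    return pearson(euc, ham)


for n in (10, 50):
    print(n, round(distance_correlation(n), 3))
```

A correlation close to 1 indicates that ranking vectors by Hamming distance largely agrees with ranking them by Euclidean distance.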
4.4 Privacy Analysis
Since the SIN method only stores the index numbers of the sorted feature vector $\mathbf{u}$, the
transformation from $\mathbf{u}$ to the corresponding SIN vector $\mathbf{g}$ is non-invertible: none of
the exact values of $\mathbf{u}$ can be recovered from $\mathbf{g}$.
However, an adversary may be able to estimate the distribution of the original features,
generate a set of random numbers according to the estimated distribution, and rearrange
the random numbers based on the SIN vector. As such, it is possible to obtain an
approximate estimate of the original features. For simplicity, we assume the features
are i.i.d. in this chapter.
Figure 4.4: SIN approximation of Euclidean distance.
Let $\rho_1, \rho_2, ..., \rho_N$ denote $N$ i.i.d. random variables, and $\rho_{1:N}, \rho_{2:N}, ..., \rho_{N:N}$ denote the
ordered variates; then the mean and variance of the $j$th order statistic are [116]:
\[
m_{j:N} = \int_{-\infty}^{+\infty} s f_{j:N}(s)\, ds \tag{4.3}
\]
\[
\sigma^2_{j:N} = \int_{-\infty}^{+\infty} (s - m_{j:N})^2 f_{j:N}(s)\, ds \tag{4.4}
\]
where $f_{j:N}(s)$ is the probability density function (pdf) of $\rho_{j:N}$, and
\[
f_{j:N}(s) = \frac{N!}{(j-1)!(N-j)!} F^{j-1}(s) [1 - F(s)]^{N-j} f(s) \tag{4.5}
\]
where $F(s)$ is the cumulative distribution function (cdf) of $\rho$ and $f(s)$ is its pdf.
Let $\hat{\rho}_{j:N}$ denote the estimate of $\rho_{j:N}$, with mean $\hat{m}_{j:N}$ and variance $\hat{\sigma}^2_{j:N}$; then
$E[\rho_{j:N} - \hat{\rho}_{j:N}] = m_{j:N} - \hat{m}_{j:N}$, and
$\mathrm{Var}[\rho_{j:N} - \hat{\rho}_{j:N}] = \sigma^2_{j:N} + \hat{\sigma}^2_{j:N}$. When the distribution of $\rho$ is unknown, the expected
value of the estimation error is not zero since $m_{j:N} \neq \hat{m}_{j:N}$. In this case, the estimation will
be less accurate and the user's privacy can be protected. However, it is possible that the
attacker may estimate the distribution of the original features. Considering the worst
case in which the exact distribution is known, we have:
\[
E[\rho_{j:N} - \hat{\rho}_{j:N}] = m_{j:N} - \hat{m}_{j:N} = 0, \tag{4.6}
\]
\[
\mathrm{Var}[\rho_{j:N} - \hat{\rho}_{j:N}] = 2\sigma^2_{j:N}. \tag{4.7}
\]
Therefore, the expected value of $\rho_{j:N} - \hat{\rho}_{j:N}$ will be zero. Since the exact value of any
element in the original feature vector cannot be recovered, the variance of $\rho_{j:N} - \hat{\rho}_{j:N}$ can
be considered as a privacy measure. The larger the variance, the better the individual
elements are protected.
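The estimation attack and the worst-case variance in Eqn. (4.7) can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the analysis used in the thesis: features are taken to be i.i.d. standard Gaussian, the adversary is assumed to know this distribution exactly, and only the largest element is tracked. All function names are hypothetical.

```python
import random
import statistics


def sin_template(u):
    """1-based index numbers of u sorted in descending order of value."""
    order = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    return [i + 1 for i in order]


def estimate_from_template(g, rng, sigma=1.0):
    """Adversary's estimate: draw len(g) samples from the known N(0, sigma^2)
    distribution, sort them in descending order, and place the k-th largest
    sample at the index stored in g[k]."""
    samples = sorted((rng.gauss(0.0, sigma) for _ in g), reverse=True)
    estimate = [0.0] * len(g)
    for rank, index in enumerate(g):
        estimate[index - 1] = samples[rank]
    return estimate


def error_variance_of_largest(n, trials=4000, seed=0):
    """Empirical variance of the estimation error of the largest element,
    together with the empirical variance of the largest element itself."""
    rng = random.Random(seed)
    errors, originals = [], []
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g = sin_template(u)
        estimate = estimate_from_template(g, rng)
        top = g[0] - 1                      # position of the largest element of u
        errors.append(u[top] - estimate[top])
        originals.append(u[top])
    return statistics.variance(errors), statistics.variance(originals)


err_var, orig_var = error_variance_of_largest(n=10)
print(round(err_var, 3), round(2 * orig_var, 3))
```

Under these assumptions the two printed values should be close to each other, in line with Eqn. (4.7): the estimate has the same ordering as the original vector, but each element is only known up to the spread of its order statistic.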
Figure 4.5 plots the variances of the order statistics as functions of the vector
dimensionality $N$, where $\rho$ and $\hat{\rho}$ are assumed to be i.i.d. Gaussian random variables with zero mean
and unit variance. It can be seen that with higher dimensionality, the variances become
smaller. This suggests that the SIN method provides better privacy protection at lower
dimensionality.
Eqn. (4.7) provides a privacy measure of individual element. To evaluate the degree
of privacy protection for all the individual elements in a vector, as well as the privacy
preserving property of the global characteristics of the features, we define the following
privacy measures:
Definition 1: A feature vector $\mathbf{u} \in \mathbb{R}^N$ is called privacy protected at element-wise
level $\alpha$, where $\alpha$ is computed as:
\[
\alpha = \frac{1}{N} \sum_{i=1}^{N} \left[1 - (1 - \eta_i) h(1 - \eta_i)\right], \quad \eta_i = \frac{\mathrm{Var}[u_i - \hat{u}_i]}{\mathrm{Var}[u_i]}, \tag{4.8}
\]
where $\hat{u}_i$ denotes the estimated value of element $u_i$, and $h(x)$ is the unit step function,
i.e., $h(x) = 1$ if $x \geq 0$ and $h(x) = 0$ otherwise. The function $h(x)$ is utilized to regulate
the significance of all the elements, such that the variance ratio of any single dimension
contributes at most 1.
The variance ratio of the estimation difference to the original variate has been
used as a privacy measure for individual attributes in data mining [117]. Here we take
the average of the variance ratio as a measure of the privacy protection for the
individual elements. When the variance ratio of any attribute is greater than or equal to 1, i.e.,
$\mathrm{Var}[u_i - \hat{u}_i] \geq \mathrm{Var}[u_i]$, the estimation of that attribute essentially provides no
useful information, and the attribute is strongly protected. The element-wise privacy
level $\alpha$ measures the average privacy protection of individual elements. The greater the
$\alpha$ value, the better the privacy protection. For the SIN method, assuming the elements
in $\mathbf{u}$ follow a distribution of mean zero and variance $\sigma_u^2$, then for the estimation of the
$j$th order element, we have $\eta_j = \frac{2\sigma^2_{j:N}}{\sigma_u^2}$, and $\alpha = \frac{1}{N} \sum_{j=1}^{N} \left[1 - (1 - \eta_j) h(1 - \eta_j)\right]$.
Besides measuring the privacy protection of the individual elements, it is also
important to measure the global characteristics of the feature vector, such that the estimated
vector is not close to the original one under a given similarity function. In [118], it is
shown that arbitrary distance functions can be approximately mapped to the Euclidean
distance domain through certain algorithms. In this chapter, we consider the squared
Euclidean distance (SED) between the estimated and original feature vectors as a measure
of privacy:
Definition 2: A feature vector $\mathbf{u} \in \mathbb{R}^N$ is called privacy protected at vector-wise
level $\beta$, where $\beta$ is computed as:
\[
\beta = \frac{E[\|\mathbf{u} - \hat{\mathbf{u}}\|^2]}{E[\|\mathbf{r} - \mathbf{u}\|^2]}, \tag{4.9}
\]
where $\mathbf{r}$ denotes a random vector in the estimation feature space, with the same
distribution as $\mathbf{u}$. If the average distance between the estimated and original vector
approaches the average distance between any random vector and the original vector,
then the estimated vector essentially exhibits randomness, and therefore does not
disclose information about $\mathbf{u}$, i.e., the larger the $\beta$, the better the privacy. Considering the
worst case in which the distribution of the elements in $\mathbf{u}$ is known to have zero mean and a
variance of $\sigma_u^2$, we have:
\[
\beta = \frac{E\left[\sum_{i=1}^{N} (u_{i:N} - \hat{u}_{i:N})^2\right]}{E\left[\sum_{i=1}^{N} (r_i - u_i)^2\right]} = \frac{\sum_{i=1}^{N} 2\sigma^2_{i:N}}{2N\sigma_u^2} = \frac{\sum_{i=1}^{N} \sigma^2_{i:N}}{N\sigma_u^2}. \tag{4.10}
\]
Figure 4.6 depicts the privacy measures as functions of N using 1000 randomly selected
PCA feature vectors. The PCA vectors are normalized to have mean zero and variance
1/N . In both cases, the estimation is based on Gaussian distribution and it is assumed
that the mean and variance values are known by the adversary. It can be observed that
the SIN method provides better privacy level at lower dimensionality.
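The two privacy levels can be approximated numerically from Eqn. (4.8) and Eqn. (4.10). The sketch below is only illustrative and rests on assumptions not taken from the experiments: the features are simulated as i.i.d. Gaussian instead of real PCA vectors, the order-statistic variances are estimated by Monte Carlo instead of the integral in Eqn. (4.4), and the function names are hypothetical.

```python
import random
import statistics


def order_statistic_variances(n, trials=2000, seed=0, sigma=1.0):
    """Monte Carlo estimate of the variance of each order statistic of
    n i.i.d. N(0, sigma^2) samples."""
    rng = random.Random(seed)
    columns = [[] for _ in range(n)]
    for _ in range(trials):
        sample = sorted(rng.gauss(0.0, sigma) for _ in range(n))
        for j, value in enumerate(sample):
            columns[j].append(value)
    return [statistics.variance(column) for column in columns]


def privacy_levels(n, sigma=1.0, trials=2000, seed=0):
    """Element-wise level alpha (Eqn. 4.8) and vector-wise level beta (Eqn. 4.10)
    for the worst case in which the adversary knows the feature distribution."""
    variances = order_statistic_variances(n, trials=trials, seed=seed, sigma=sigma)
    var_u = sigma ** 2
    etas = [2.0 * v / var_u for v in variances]
    # 1 - (1 - eta) * h(1 - eta) equals min(eta, 1)
    alpha = sum(min(eta, 1.0) for eta in etas) / n
    beta = sum(variances) / (n * var_u)
    return alpha, beta


for n in (5, 20, 40):
    alpha, beta = privacy_levels(n)
    print(n, round(alpha, 3), round(beta, 3))
```

Both measures are ratios of variances, so they do not depend on the scale chosen for the simulated features.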
Figure 4.5: Variance $\sigma^2_{j:N}$ as function of dimensionality $N$.
Figure 4.6: Privacy measures of SIN as functions of dimensionality.
4.5 Experimental Results
The performance of the proposed method is evaluated on the same generic data set
as described in Chapter 3. To study the effects of different feature extractors on the
performance of proposed methods, we compare Principal Component Analysis (PCA) and
Kernel Direct Discriminant Analysis (KDDA). PCA has been introduced in Chapter 3.
PCA produces the most expressive subspace for face representation, but is not necessarily
the most discriminant one. This is due to the fact that the underlying class structure
of the data is not considered in the PCA technique. It was shown in [97] that KDDA
outperforms other techniques in most of the cases. Therefore we also adopt KDDA for
comparison in this chapter.
KDDA was proposed by Lu et al. [119] to address the nonlinearities in complex face
patterns. The kernel based solution finds a nonlinear transform from the original image
space $\mathbb{R}^J$ to a high-dimensional feature space $F$ using a nonlinear function $\phi(\cdot)$. In
the transformed high-dimensional feature space $F$, the convexity of the distribution is
expected to be retained so that traditional linear methodologies such as PCA and LDA
can be applied. The optimal nonlinear discriminant feature representation of $\mathbf{z}$ can be
obtained by:
\[
\mathbf{y} = \Theta \cdot \nu(\phi(\mathbf{z})) \tag{4.11}
\]
where $\Theta$ is a matrix representing the found kernel discriminant subspace, and $\nu(\phi(\mathbf{z}))$ is
the kernel vector of the input $\mathbf{z}$. The detailed implementation algorithm of KDDA can
be found in [119].
4.5.1 Face Identification
For face identification, we use all the 5676 images in the generic data set for experiments.
A set of 2836 images from 520 human subjects is randomly selected for training, and
the remaining 2840 images from 500 subjects are used for testing. There is no overlap
between the training and testing subjects and images. The test is performed on an
exhaustive basis, such that each time, one image is taken from the test set as a probe image,
while the rest of the images in the test set serve as gallery images. This is repeated until
every image in the test set has been used as the probe once. The classification is based on
the nearest neighbor classifier.
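The exhaustive identification protocol can be sketched as a leave-one-out nearest neighbor loop over SIN templates. The code below is a simplified illustration, not the evaluation code of the thesis; the function names, the four toy templates, and the subject labels are all hypothetical.

```python
def sin_distance(g, p):
    """SIN distance between two index-number vectors (permutations of 1..N)."""
    remaining = list(p)
    total = 0
    for value in g[:-1]:
        j = remaining.index(value)
        total += j
        del remaining[j]
    return total


def leave_one_out_crr(templates, labels, distance):
    """Each template is used once as the probe; all others form the gallery.
    Returns the fraction of probes whose nearest gallery template has the same label."""
    correct = 0
    for i, probe in enumerate(templates):
        best_label, best_dist = None, None
        for j, gallery in enumerate(templates):
            if j == i:
                continue                      # the probe is excluded from the gallery
            d = distance(gallery, probe)
            if best_dist is None or d < best_dist:
                best_label, best_dist = labels[j], d
        correct += best_label == labels[i]
    return correct / len(templates)


# Toy SIN templates for two hypothetical subjects, two images each.
templates = [[1, 2, 3, 4, 5], [2, 1, 3, 4, 5], [5, 4, 3, 2, 1], [5, 4, 3, 1, 2]]
labels = ["A", "A", "B", "B"]
print(leave_one_out_crr(templates, labels, sin_distance))
```

On this toy set every probe is closest to the other template of the same subject, so the printed rate is 1.0; on real data the same loop would be run over the templates of all test images.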
Table 4.1 shows the correct recognition rate (CRR) of the SIN method alongside the Euclidean
and Cosine distance measures at different dimensionalities, and a graphical comparison is
depicted in Figure 4.7. It can be observed that at higher dimensionalities, the SIN method
may improve the recognition accuracy of PCA significantly, while maintaining the good
performance of the stronger feature extractor KDDA. The PCA method projects images
onto the directions with the highest variance, which are not necessarily the most discriminant
ones. This becomes more severe under large image variations due to illumination, expression,
pose, and aging. When computing the similarity between two PCA vectors, the distance
measure is sensitive to the variation of individual elements, particularly those in directions
corresponding to noise. The SIN method, on the other hand, reduces this sensitivity by simply
comparing the relative relation of the projections, and therefore possibly provides better
error tolerance. In the case of strong extractors such as KDDA, the SIN method approximates
the distance between two vectors, and hence preserves the recognition accuracy.
        PCA                       KDDA
Dim.    Euc.    Cos.    SIN       Euc.    Cos.    SIN
20      56.30   56.31   52.32     40.04   41.09   34.86
40      60.09   61.09   61.94     61.44   65.28   61.94
60      63.52   62.96   66.06     71.73   74.86   74.68
80      64.37   64.44   68.84     81.76   83.27   81.76
100     65.14   65.18   71.27     79.05   80.42   80.07
Table 4.1: Face identification results (CRR in %).
Figure 4.7: CRR in face identification scenario.
4.5.2 Face Verification
For face verification, the experiments are performed on the generic verification data set,
where 2388 images from 520 subjects are randomly selected as the training set, and 2278
images of the remaining 500 subjects as the testing set. There is no overlap between the
training and the testing subjects and images. The evaluation is also performed on an exhaustive
basis, where every single image is used as a template once, and the rest of the images in
the test set serve as the probe images.
Table 4.2 details the obtained EER of SIN alongside the Euclidean and Cosine distance
measures at different dimensionalities when PCA and KDDA are used as feature extractors,
and a graphic comparison is presented in Figure 4.8. In general, the Cosine distance
measure outperforms the Euclidean distance, and the proposed SIN method improves
the verification accuracy of both PCA and KDDA at almost all dimensionalities. This
further suggests that the SIN approach offers better error tolerance and provides
a more discriminant representation.
        PCA                       KDDA
Dim.    Euc.    Cos.    SIN       Euc.    Cos.    SIN
20      20.05   19.23   13.78     25.22   20.42   20.97
40      19.09   17.81   11.46     21.49   16.22   14.54
60      18.52   17.42   10.28     18.80   13.41   10.97
80      18.50   17.15    9.72     10.96    9.90    7.19
100     18.20   16.94    9.46     10.41    8.84    6.52
Table 4.2: Face verification results (EER in %).
4.6 Summary
This chapter has introduced a novel approach for face recognition based on feature
vectors in the continuous domain. The proposed method stores the sorted index numbers of
dimensionality reduced feature vectors as biometric templates for recognition. The SIN
method originates from the pairwise relation of any two elements in a vector, and it is
shown that it is capable of approximating the Euclidean distance between two vectors. A
new distance measure has been presented for evaluating the similarity between SIN
vectors. Since it is impossible to recover the exact value of any of the original features, the
transformation from the original features to the SIN vector is non-invertible. To study
the privacy protecting property of the SIN method, two privacy measures that evaluate the
protection at both element and vector levels are introduced. It has been shown that the
SIN method may provide better privacy protection at lower dimensionality. Experimental
results on both face identification and verification demonstrate that the proposed method
may improve the recognition performance. Such characteristics of the SIN method make
it a candidate for being applied in conjunction with random transformations to obtain
changeability and enhanced privacy protection.
Figure 4.8: EER in face verification scenario.
Chapter 5
Random Transformations for Changeable Biometrics
5.1 Introduction
To support the deployment of biometrics in a wide range of applications, the same
biometric trait should be usable in different applications. For example, a user
should be able to register his or her face images for access to different bank accounts, or for
computer/network logins. For security purposes, the biometric templates that are generated
for different applications should not be able to authenticate each other. In Chapter 4, a
privacy preserving scheme that utilizes the sorted index numbers of the extracted features
is proposed. The SIN method is capable of providing a certain level of privacy protection,
in that the original features cannot be exactly recovered. However, the SIN method
itself does not address the changeability problem. In other words, two SIN vectors of
the same biometric can be used to authenticate each other. To solve this problem, a
repeatable transform must be applied prior to the SIN operation. This can be
achieved by introducing randomness into the biometric features.
In this chapter, we present methods for changeable face verification using random
transformations. The proposed method applies random transformations to the original
features first. The randomized vector is then sorted in descending order, and the
corresponding index numbers are recorded and stored as the template for future matching.
Random transformations have been used extensively as data perturbation techniques for
privacy preserving data mining, including additive data perturbation [120,121],
multiplicative data perturbation [122, 123], and the random projection based approach [80]. In
this chapter, we explore their capability of producing changeable biometric templates, as
well as their privacy preserving properties when applied in conjunction with the irreversible
SIN technique. Random projection has shown its capability of obtaining changeability in
Chapter 3. Therefore, it is possible to apply random projection before the sorting
operation. In addition, two other random transformations, namely the random additive transform
and the random multiplicative transform, are discussed and compared. It is shown that
since it is impossible to retrieve the original features from the sorted index numbers
of the randomized vector, the combination of random transformations and SIN
comprises repeatable and non-invertible transformations, hence the generated templates are
changeable and privacy preserving.
The remainder of this chapter is organized as follows: Section 5.2 presents an overview
of the introduced solution. Detailed changeability and privacy analyses are presented in
Sections 5.3 and 5.4 respectively. Section 5.5 presents the experimental results, and a
concluding summary is provided in Section 5.6.
5.2 Method Overview
The proposed methods assume that a biometric signal is represented by a vector in
the continuous domain, and the similarity of the vectors can be evaluated by distance
measures such as Euclidean distance. The procedure of generating a biometric template
is as follows:
1. Extract feature vector $\mathbf{w} \in \mathbb{R}^N$ from the input face image.
2. Compute $\mathbf{u} = \mathbf{w} - \bar{\mathbf{w}}$, where $\bar{\mathbf{w}}$ is the mean feature vector calculated from the
training data.
3. Use a key $k$ as a control factor for randomness generation. Transform the vector $\mathbf{u}$
by $\mathbf{x} = f_k(\mathbf{u})$, $\mathbf{x} \in \mathbb{R}^M$, where $f_k(\cdot)$ is a random transformation function associated
with the key $k$.
4. Sort vector $\mathbf{x}$ in descending order, and store the corresponding index numbers in a
new vector $\mathbf{g}$.
5. The generated vector $\mathbf{g} \in \mathbb{Z}^M$ that contains the sorted index numbers is stored as
the template.
For example, given $\mathbf{x} = \{x_1, x_2, x_3, x_4\}$, if the elements sorted in descending order are
$\{x_4, x_2, x_3, x_1\}$, then the template is $\mathbf{g} = \{4, 2, 3, 1\}$. The similarity matching of the
SIN vectors is based on the SIN distance, which has been introduced in Chapter 4.
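The template generation steps above can be sketched with random projection as the keyed transform $f_k$. This is a minimal sketch under stated assumptions: the projection follows the form $\mathbf{x} = \sqrt{N/M}\, R^T \mathbf{u}$ with $r_{ij} \sim N(0, 1/N)$ from Appendix 3-II, the key simply seeds Python's general-purpose `random.Random` generator (which is not a cryptographically secure choice), and the function names and feature values are hypothetical.

```python
import math
import random


def random_projection_matrix(key, n, m):
    """N x M matrix with i.i.d. N(0, 1/N) entries, reproducible from the key."""
    rng = random.Random(key)
    std = math.sqrt(1.0 / n)
    return [[rng.gauss(0.0, std) for _ in range(m)] for _ in range(n)]


def rp_sin_template(w, mean_w, key, m):
    """Keyed template: mean subtraction, random projection, then sorted index numbers."""
    n = len(w)
    u = [a - b for a, b in zip(w, mean_w)]
    r = random_projection_matrix(key, n, m)
    scale = math.sqrt(n / m)
    x = [scale * sum(r[i][j] * u[i] for i in range(n)) for j in range(m)]
    order = sorted(range(m), key=lambda j: x[j], reverse=True)
    return [j + 1 for j in order]


# Hypothetical 12-dimensional feature vector and training mean.
w = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5, 0.1, -0.2, 0.3, -0.4, 0.6, -0.5]
mean_w = [0.0] * len(w)

template_a = rp_sin_template(w, mean_w, key=1234, m=8)
template_b = rp_sin_template(w, mean_w, key=1234, m=8)   # same key: repeatable
template_c = rp_sin_template(w, mean_w, key=9999, m=8)   # new key: re-issued template
print(template_a == template_b, template_a, template_c)
```

Re-issuing a template then amounts to choosing a new key, while the stored vector remains a permutation of 1..M.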
5.3 Changeability Analysis
The proposed methods utilize randomness to address the changeability problem, in
combination with the SIN method for achieving privacy protection. A feature vector
$\mathbf{u} \in \mathbb{R}^N$ is first transformed by $\mathbf{x} = f_k(\mathbf{u})$, and the resulting SIN vector of $\mathbf{x}$ is stored as
a biometric template. In this section, we study three types of random transformations:
random additive transform (RAT), random multiplicative transform (RMT) and random
projection (RP). To illustrate the changeability of the proposed methods, the statistical
properties of the random transformations are analyzed in detail.
5.3.1 Random Additive Transform
The RAT transform performs element-wise addition by adding a random vector to the
original biometric feature vector. Let u and v be two biometric feature vectors in an
N-dimensional Euclidean space, $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$. Let $r \in \mathbb{R}^N$ and $s \in \mathbb{R}^N$ be
two N-dimensional random vectors. Each entry of r and s follows an i.i.d. Gaussian
distribution of mean zero and variance $\sigma^2$, $r_i \sim N(0, \sigma^2)$, $s_i \sim N(0, \sigma^2)$, $i = 1, \ldots, N$. Let
$x = u + r$, $y = v + s$. If the same key (SK) is applied, i.e., $r = s$, then we have:
\[ \|x - y\|^2 = \|u + r - v - s\|^2 = \|u - v\|^2. \tag{5.1} \]
Therefore, when the same RAT is applied, the squared Euclidean distance (SED)
between any two vectors is exactly preserved. If different keys (DK) are applied to u and
v, i.e., $r \neq s$, and r and s are independent of each other, we have:
\[ E[\|x - y\|^2] = \|u - v\|^2 + 2N\sigma^2, \tag{5.2} \]
\[ \mathrm{Var}[\|x - y\|^2] = 8\|u - v\|^2\sigma^2 + 8N\sigma^4. \tag{5.3} \]
Please see Appendix 5-I for the proofs.
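The moments in Eqn. (5.2) and Eqn. (5.3) can be checked numerically with a short Monte Carlo sketch. The code below is an illustration only: it uses synthetic unit-length vectors in place of the PCA features, and the function name `rat_sed`, the seed, and the trial count are assumptions of the example.

```python
import numpy as np

def rat_sed(u, v, sigma2, trials, rng, same_key):
    """Squared Euclidean distances between RAT-transformed u and v."""
    shape = (trials, u.size)
    r = rng.normal(0.0, np.sqrt(sigma2), size=shape)
    # Same key: both vectors receive the same additive vector.
    s = r if same_key else rng.normal(0.0, np.sqrt(sigma2), size=shape)
    return np.sum(((u + r) - (v + s)) ** 2, axis=1)

rng = np.random.default_rng(0)
N, sigma2, trials = 100, 0.005, 20000
u = rng.normal(size=N); u /= np.linalg.norm(u)   # synthetic stand-ins for
v = rng.normal(size=N); v /= np.linalg.norm(v)   # unit-length PCA vectors
d2 = np.sum((u - v) ** 2)

sed_sk = rat_sed(u, v, sigma2, trials, rng, same_key=True)
sed_dk = rat_sed(u, v, sigma2, trials, rng, same_key=False)

mean_theory = d2 + 2 * N * sigma2                    # Eqn. (5.2)
var_theory = 8 * d2 * sigma2 + 8 * N * sigma2 ** 2   # Eqn. (5.3)
print("SK max deviation from ||u-v||^2:", np.max(np.abs(sed_sk - d2)))
print("DK mean:", sed_dk.mean(), "theory:", mean_theory)
print("DK var: ", sed_dk.var(), "theory:", var_theory)
```

In the SK case the distance is preserved up to floating-point rounding, while the DK sample mean and variance should fall close to the closed-form values.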
Eqn. (5.2) shows that when DK are applied, for any two vectors with fixed
dimensionality, the mean of the SED increases as $\sigma^2$ increases. To facilitate
demonstration, we assume that the distribution of $\|x - y\|^2$ is Gaussian. This is validated in
Figure 5.1(a), where we randomly select two PCA feature vectors from our experimental
data set, perform the DK scenario 2000 times, and plot the SED. The PCA feature
vectors are normalized to unit length, and $\sigma^2$ is set to 0.005. It can be observed that
the experimental values of mean and variance fit well with our theoretical results in Eqn.
(5.2) and Eqn. (5.3), and the distribution of the obtained SED can be well approximated
as a Gaussian distribution. Assuming u and v are biometric feature vectors from the
same human subject, to obtain changeability, we want the transformed biometric
representations using different keys to be unable to authenticate each other, i.e., their
distance should be larger than the system threshold t. Figure 5.1(b) depicts the distribution
of $\|x - y\|^2$ at different $\sigma^2$ values. As $\sigma^2$ increases, the probability of getting
$\|x - y\|^2 < \|u - v\|^2$ decreases to zero. By setting a sufficiently large $\sigma^2$ value, we can
produce changeable biometric templates with probability approaching 1, i.e.,
$P(\|x - y\|^2 > t) \to 1$.
Figure 5.1: RAT: Distribution of SED (a) Gaussian approximation (σ2=0.005); (b) at
different σ2 values.
The above analysis demonstrates the changeability of RAT using the SED. Since the
SIN method also approximates the Euclidean distance between two vectors, it is expected
that a similar property is preserved when the SIN method is applied to RAT transformed
vectors, denoted as RAT-SIN in this chapter. Figure 5.2(a) plots the distribution of the
normalized SIN distance (NSD) in both SK and DK scenarios with $\sigma^2 = 0.005$. The SIN
distance is normalized by dividing by the largest possible value $N(N-1)/2$. It can be seen
that the distributions of both scenarios can be well approximated by Gaussian distributions.
Figure 5.2(b) plots the distributions as functions of the variance $\sigma^2$ of the additive vector.
It can be observed that by increasing $\sigma^2$, the SK and DK distributions become well
separated, and strong changeability can be obtained. The DK distribution moves to the
right toward 0.5, which implies stronger randomization. The SK distribution, on the
other hand, shifts to the left. This implies that with larger $\sigma^2$, the randomness of the
additive vector may overpower the distribution of the original features, and hence produce
larger deviation from the original characteristics of the features. Therefore, the larger
the $\sigma^2$, the better the changeability, but possibly the lower the recognition accuracy.
Figure 5.2: RAT: Distribution of NSD (a) Gaussian approximation (σ2=0.005); (b) at
different σ2 values.
5.3.2 Random Multiplicative Transform
The RMT transform performs element-wise multiplication between a randomly generated
vector and the original feature vector. Let u and v be two biometric feature vectors in
an N-dimensional Euclidean space, $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$. Let $r \in \mathbb{R}^N$ and $s \in \mathbb{R}^N$ be
two N-dimensional random vectors. Each entry of r and s follows an i.i.d. Gaussian
distribution of mean one and variance $\sigma^2$, $r_i \sim N(1, \sigma^2)$, $s_i \sim N(1, \sigma^2)$, $i = 1, \ldots, N$. Let
$x = u \mathbin{.*} r$, $y = v \mathbin{.*} s$, where $.*$ denotes element-wise multiplication. In the SK scenario,
i.e., $r = s$, we have:
\[ E[\|x - y\|^2] = (\sigma^2 + 1)\|u - v\|^2, \tag{5.4} \]
\[ \mathrm{Var}[\|x - y\|^2] = (2\sigma^4 + 4\sigma^2)\sum_{i=1}^{N}(u_i - v_i)^4. \tag{5.5} \]
Please see Appendix 5-II for the proofs.
Eqn. (5.4) and Eqn. (5.5) show that the RMT preserves the mean of the SED between
two vectors in the transformed domain up to a scaling factor $\sigma^2 + 1$, and that the variance
grows with $\sigma^2$: the larger the $\sigma^2$, the bigger the variance. In the DK case, where
$r \neq s$ and they are independent of each other, we can derive the statistical properties of
the SED in the transformed domain:
\[ E[\|x - y\|^2] = \sigma^2(\|u\|^2 + \|v\|^2) + \|u - v\|^2, \tag{5.6} \]
\[ \mathrm{Var}[\|x - y\|^2] = 2\sigma^4\sum_{i=1}^{N}(u_i^2 + v_i^2)^2 + 4\sigma^2\sum_{i=1}^{N}(u_i - v_i)^2(u_i^2 + v_i^2). \tag{5.7} \]
Please see Appendix 5-III for the proofs.
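A numerical check of Eqn. (5.4) to Eqn. (5.7) can be sketched as follows. This is an illustrative simulation under assumptions: synthetic unit-length vectors replace the PCA features, and the helper names `rmt_sed` and `rmt_theory` are introduced only for this example.

```python
import numpy as np

def rmt_sed(u, v, sigma2, trials, rng, same_key):
    """Squared Euclidean distances between RMT-transformed u and v."""
    shape = (trials, u.size)
    r = rng.normal(1.0, np.sqrt(sigma2), size=shape)
    s = r if same_key else rng.normal(1.0, np.sqrt(sigma2), size=shape)
    return np.sum((u * r - v * s) ** 2, axis=1)

def rmt_theory(u, v, sigma2, same_key):
    """Closed-form mean and variance of the SED from Eqns. (5.4)-(5.7)."""
    diff2 = (u - v) ** 2
    if same_key:
        mean = (sigma2 + 1.0) * diff2.sum()                        # (5.4)
        var = (2 * sigma2 ** 2 + 4 * sigma2) * np.sum(diff2 ** 2)  # (5.5)
    else:
        e = u ** 2 + v ** 2
        mean = sigma2 * e.sum() + diff2.sum()                      # (5.6)
        var = (2 * sigma2 ** 2 * np.sum(e ** 2)
               + 4 * sigma2 * np.sum(diff2 * e))                   # (5.7)
    return mean, var

rng = np.random.default_rng(0)
N, sigma2, trials = 100, 0.005, 20000
u = rng.normal(size=N); u /= np.linalg.norm(u)   # synthetic unit-length vectors
v = rng.normal(size=N); v /= np.linalg.norm(v)

results = {}
for same_key in (True, False):
    sed = rmt_sed(u, v, sigma2, trials, rng, same_key)
    results[same_key] = (sed.mean(), sed.var(), *rmt_theory(u, v, sigma2, same_key))
    print("SK" if same_key else "DK", results[same_key])
```

Each printed tuple lists the sample mean, sample variance, theoretical mean, and theoretical variance, which should agree to within sampling error.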
The statistical properties of RMT are validated using two randomly selected unit-length
PCA feature vectors of dimensionality 100 from our experimental data set, with
each experiment repeated for 2000 trials. Figure 5.3 demonstrates that the theoretical SED
distributions in Eqn. (5.4) - (5.7) fit well with the experimental results in both SK and
DK scenarios, and the distributions are approximately Gaussian.
Figure 5.3: RMT: Gaussian approximation of the distribution of SED (σ2 = 0.005).
To obtain changeability, we expect the distributions of the SK and DK cases to be well
separated. Eqn. (5.4) and Eqn. (5.6) can be rewritten as:
\[ E\left[\frac{\|x - y\|^2}{\sigma^2 + 1}\right] = \|u - v\|^2, \tag{5.8} \]
\[ E\left[\frac{\|x - y\|^2}{\sigma^2 + 1}\right] = \frac{\sigma^2(\|u - v\|^2 + 2u^Tv) + \|u - v\|^2}{\sigma^2 + 1} = \|u - v\|^2 + \frac{2\sigma^2 u^Tv}{\sigma^2 + 1}. \tag{5.9} \]
Eqn. (5.8) and Eqn. (5.9) show that the separation of the distributions depends
on $\sigma^2$ and the inner product of the vectors. Since there is no guarantee that $u^Tv > 0$,
it is possible that the SED in the DK transformed domain is even smaller than the
original SED, i.e., weak changeability. This is confirmed in Figure 5.4(a), where the
distribution of SED is plotted at different $\sigma^2$ values. It can be observed that the SK and
DK distributions overlap significantly and are not well separated.
Figure 5.4: RMT: Distribution of SED (a) at different σ2 values; (b) at different d values
(σ2 = 0.01).
To solve this problem, we note that $\frac{2\sigma^2}{\sigma^2+1} > 0$, and the SED in the DK case can be
enlarged by increasing $u^Tv$. This can be achieved by adding a translation vector d to
u and v, such that $u^Tv$ is augmented while the SED of the original vectors $\|u - v\|^2$
is unaltered. As such, the distributions of SK and DK cases can be well separated,
and strong changeability can be obtained. This is shown in Figure 5.4(b), where the
distributions of SED at different translation values are plotted. For simplicity, all the
elements in d are set to the same value d. Since the addition of d does not change the SED
when the same key is applied, the distribution of the SK case is the same for different d
values. On the other hand, it can be seen that by adding an appropriate translation value,
the mean of the DK distribution shifts to the right, away from the SK distribution. The
clear separation of SK and DK distributions indicates the possibility of producing strong
changeability.
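The effect of the translation on the expected SED can be illustrated directly from Eqn. (5.4) and Eqn. (5.6), applied to the translated vectors. The sketch below is a closed-form calculation, not a reproduction of Figure 5.4(b): the vectors are synthetic, and the function name `rmt_expected_sed` and the chosen d values are assumptions of the example.

```python
import numpy as np

def rmt_expected_sed(u, v, sigma2, d):
    """Expected SED for RMT applied to u + d and v + d (scalar translation d)."""
    ut, vt = u + d, v + d
    dist2 = np.sum((ut - vt) ** 2)               # equals ||u - v||^2 for any d
    sk = (sigma2 + 1.0) * dist2                  # Eqn. (5.4)
    dk = sigma2 * (ut @ ut + vt @ vt) + dist2    # Eqn. (5.6)
    return sk, dk

rng = np.random.default_rng(0)
N, sigma2 = 100, 0.01
u = rng.normal(size=N); u /= np.linalg.norm(u)   # synthetic unit-length vectors
v = rng.normal(size=N); v /= np.linalg.norm(v)

for d in (0.0, 0.5, 1.0):
    sk, dk = rmt_expected_sed(u, v, sigma2, d)
    print(f"d={d:.1f}  SK mean={sk:.4f}  DK mean={dk:.4f}  gap={dk - sk:.4f}")
```

The gap between the two means equals $2\sigma^2 (u + d)^T (v + d)$, so it grows with the translated inner product while the SK mean stays fixed.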
Figure 5.5: RMT: Gaussian approximation of the distribution of NSD (σ2 = 0.005).
Due to the distance approximation property of SIN, similar properties may be ob-
tained when the SIN method is applied in the RMT transformed domain. Figure 5.5
demonstrates that in both SK and DK cases, the distributions of NSD can also be ap-
proximated by Gaussian distributions. The NSD distributions in the RMT transformed
domain are depicted in Figure 5.6(a), at different σ2 values. Similar to the SED distri-
butions, the significant distribution overlap indicates weak changeability. As shown in
Figure 5.6(b), by adding a translation value, it is possible to produce clear separation of
SK and DK distributions. Note that, unlike in Figure 5.4(b), where the addition of
d does not change the distribution of SED in the SK case, the mean of NSD shifts to the left
as the translation value increases. As with the RAT-SIN method, the larger the
d value, the better the changeability, but possibly the lower the verification performance.
Figure 5.6: RMT: Distribution of NSD (a) at different σ2 values; (b) at different d values
(σ2 = 0.01).
5.3.3 Random Projection
The changeability of RP has been analyzed in Chapter 3 using a geometric based
approach. In this section, we provide an alternative analysis by studying the statistical
properties of the features in the projected domain. Let u and v denote two vectors in an
N-dimensional Euclidean space, $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$. Let R be an $N \times M$ ($M \leq N$)
matrix whose entries $r_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, M$, follow an i.i.d. Gaussian
distribution, $r_{ij} \sim N(0, \frac{1}{N})$. Let $x = \sqrt{\frac{N}{M}} R^T u$ and $y = \sqrt{\frac{N}{M}} R^T v$; then it is shown in
Chapter 3, Lemma 3.4 that:
\[ E[\|x - y\|^2] = \|u - v\|^2, \tag{5.10} \]
\[ \mathrm{Var}[\|x - y\|^2] = \frac{2}{M}\|u - v\|^4. \tag{5.11} \]
This shows that when SK is applied, the mean of the SED in the projected domain
equals the SED of the original vectors, and the variance is inversely proportional to the
projected dimensionality M. Therefore, the higher the projected dimensionality, the
better the distance is preserved in the transformed domain.
To provide changeability, the biometric templates that are generated using DK should
not be able to authenticate each other. For two vectors $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^N$, let R and
S be two independent $N \times M$ ($M \leq N$) matrices with each entry of R and S an i.i.d.
Gaussian random variable, i.e., $r_{ij} \sim N(0, \frac{1}{N})$, $s_{ij} \sim N(0, \frac{1}{N})$, $i = 1, \ldots, N$, $j = 1, \ldots, M$.
Let $x = \sqrt{\frac{N}{M}} R^T u$ and $y = \sqrt{\frac{N}{M}} S^T v$; then we have:
\[ E[\|x - y\|^2] = \|u\|^2 + \|v\|^2, \tag{5.12} \]
\[ \mathrm{Var}[\|x - y\|^2] = \frac{2}{M}(\|u\|^2 + \|v\|^2)^2. \tag{5.13} \]
Please see Appendix 5-IV for the proofs.
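Eqn. (5.10) to Eqn. (5.13) can be checked with a small simulation. The following sketch is illustrative only: it uses synthetic unit-length vectors with N = 100 and M = 80, and the function name `rp_sed`, the seed, and the trial count are assumptions of the example.

```python
import numpy as np

def rp_sed(u, v, M, trials, rng, same_key):
    """SED between sqrt(N/M) R^T u and sqrt(N/M) S^T v over random matrices."""
    N = u.size
    scale = np.sqrt(N / M)
    out = np.empty(trials)
    for t in range(trials):
        R = rng.normal(0.0, np.sqrt(1.0 / N), size=(N, M))
        # Same key: both vectors are projected with the same matrix.
        S = R if same_key else rng.normal(0.0, np.sqrt(1.0 / N), size=(N, M))
        out[t] = np.sum((scale * (R.T @ u) - scale * (S.T @ v)) ** 2)
    return out

rng = np.random.default_rng(0)
N, M, trials = 100, 80, 4000
u = rng.normal(size=N); u /= np.linalg.norm(u)   # synthetic unit-length vectors
v = rng.normal(size=N); v /= np.linalg.norm(v)

sed_sk = rp_sed(u, v, M, trials, rng, same_key=True)
sed_dk = rp_sed(u, v, M, trials, rng, same_key=False)

d2 = np.sum((u - v) ** 2)
e2 = u @ u + v @ v
sk_mean, sk_var = d2, 2.0 / M * d2 ** 2          # Eqns. (5.10), (5.11)
dk_mean, dk_var = e2, 2.0 / M * e2 ** 2          # Eqns. (5.12), (5.13)
print("SK:", sed_sk.mean(), sed_sk.var(), "theory:", sk_mean, sk_var)
print("DK:", sed_dk.mean(), sed_dk.var(), "theory:", dk_mean, dk_var)
```

Lowering M in this sketch increases both variances in proportion to 1/M, which is the dependence on projected dimensionality described in the text.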
Eqn. (5.12) and Eqn. (5.13) show that when different RP matrices are applied, the
mean of the SED equals the sum of the squared vector lengths, and the variance is inversely
proportional to the projected dimensionality M. Figure 5.7(a) shows the distribution of
the SED between two feature vectors in SK and DK scenarios. We randomly selected
two PCA feature vectors (N = 100) of the same human subject from the employed
data set, normalized them to unit length, and performed RP over 2000 trials. It can be seen
that the theoretical results in Eqn. (5.10) - (5.13) fit very well with the experimental results
(M = 80). The distributions of SED are approximately Gaussian in both SK and DK scenarios.
Note that although the DK distribution has a mean that is larger than that of the
SK distribution, the separation of the SK and DK distributions depends on the
characteristics of the features. For example, let u and v denote two vectors from the
same subject. If $\|u - v\|^2$ is large, i.e., large within-class variation, then the SK and DK
distributions may overlap, hence clear separation of the distributions cannot
be obtained, and strong changeability cannot be achieved. Figure 5.7(b) depicts the
SK and DK distributions at different projected dimensionalities. The relation between
the projected dimensionality and the variance of the distance distribution can be easily
observed: the lower the M, the higher the variance. The degree of distribution overlap
also increases as the projected dimensionality decreases.
The Euclidean distance approximation property of the SIN method suggests a
similar changeability characteristic when the SIN method is applied after RP (RP-SIN).
This is confirmed in Figure 5.8(a) and Figure 5.8(b): the NSD also approximates a
Gaussian distribution in both SK and DK scenarios, and the variance increases as M
decreases. Similarly, the SK and DK distributions overlap, and clear separation of the
distributions cannot be obtained.
Figure 5.7: RP: Distribution of SED (a) Gaussian approximation (M=80); (b) at different
projected dimensionalities.
Figure 5.8: RP: Distribution of NSD (a) Gaussian approximation (M=80); (b) at different
projected dimensionalities.
Note that in Eqn. (5.12), the expected SED is equal to the sum of the squared
lengths of the two vectors. By increasing the length of the vectors, the SED in the
DK case will be further enlarged. To achieve this, we can apply a vector translation,
i.e., $x = \sqrt{\frac{N}{M}} R^T(u + d)$. Figure 5.9 shows the impact of vector translation with
different translation values. All the elements in d are set to the same value d. Since
\[ \left\| \sqrt{\tfrac{N}{M}} R^T(u + d) - \sqrt{\tfrac{N}{M}} R^T(v + d) \right\|^2 = \left\| \sqrt{\tfrac{N}{M}} R^Tu - \sqrt{\tfrac{N}{M}} R^Tv \right\|^2, \]
the vector translation by d does not change the SED between two vectors using the same
key; therefore in Figure 5.9(a), the distribution of the SK case does not change with d.
However, as d increases, the distribution of the DK case shifts to the right, and clear
separation of SK and DK distributions can be obtained. For the SIN distance in Figure 5.9(b),
due to the randomness of the projection, the DK distribution is always centered around 0.5.
The vector translation operation shifts the SK distribution to the left, and the distributions
can be well separated. The clear separation of the distributions indicates strong changeability.
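The shift of the DK mean follows from Eqn. (5.12) when u and v are replaced by their translated versions. The short derivation below is a sketch under the assumption stated in the text that every element of d equals the scalar d; the all-ones vector $\mathbf{1}$ is notation introduced here for convenience.

```latex
\begin{align*}
E\big[\|x - y\|^2\big]_{\mathrm{DK}}
  &= \|u + d\mathbf{1}\|^2 + \|v + d\mathbf{1}\|^2 \\
  &= \|u\|^2 + \|v\|^2 + 2d\,\mathbf{1}^T(u + v) + 2Nd^2, \\
E\big[\|x - y\|^2\big]_{\mathrm{SK}}
  &= \|(u + d\mathbf{1}) - (v + d\mathbf{1})\|^2 = \|u - v\|^2 .
\end{align*}
```

The DK mean therefore grows on the order of $2Nd^2$, while the SK mean does not depend on d.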
Figure 5.9: RP: Distribution of (a) SED, and (b) NSD, at different vector translation
values.
5.4 Privacy Analysis
In this section, the privacy preserving properties of the random transformations in
combination with the SIN method are discussed in detail using the element-wise and
vector-wise privacy measures defined in Chapter 4.
5.4.1 RAT-SIN
The RAT method itself (without applying the SIN operation) is capable of producing
changeability for the generated templates. Without any knowledge of the RAT additive
vector, it is impossible for an attacker to recover the values of the original features.
However, such a method only offers limited privacy protection since the exact value of the
original biometric features will be computed by a simple element-by-element subtraction
if the RAT additive vector is known. To solve this problem, we introduce the combination
of RAT with SIN for achieving enhanced privacy protection.
In the RAT based SIN framework (RAT-SIN), a random vector $r \in \mathbb{R}^N$ with each
element $r_i$ an i.i.d. random variable of mean zero and variance $\sigma_r^2$ is added to the biometric
feature vector $u \in \mathbb{R}^N$, and the SIN vector of the resulting vector $x = u + r$ is stored
as template. Since the biometric features are mean centralized, u has zero mean. Let
$\sigma_u^2$ denote the variance of the elements in u, then the variance of the elements in x is
$\sigma_x^2 = \sigma_u^2 + \sigma_r^2$. Due to the randomness of r, it is impossible for an adversary to accurately
estimate x without knowing r. Assuming the worst case, in which an attacker knows the
distribution of u and also obtains r, he can estimate the variance of r, generate a
set of random numbers of mean zero and variance $\sigma_u^2 + \sigma_r^2$, obtain an estimate $\hat{x}$ of x
according to the SIN vector g, and then subtract r to get $\hat{u} = \hat{x} - r$.
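The worst-case attack described above can be sketched as a simulation. The code is a hedged illustration: Gaussian features are assumed, the helper names are hypothetical, and the mean squared error reported here is only a simple stand-in for illustration, not the privacy measures α and β defined in Chapter 4.

```python
import numpy as np

def sin_vector(x):
    """0-based indices of x sorted in descending order of value."""
    return np.argsort(-x, kind="stable")

def guess_from_sin(g, mean, var, rng):
    """Draw len(g) Gaussian values and place them so their ranking matches g."""
    draws = np.sort(rng.normal(mean, np.sqrt(var), size=g.size))[::-1]
    x_hat = np.empty(g.size)
    x_hat[g] = draws  # the largest draw goes to the top-ranked position
    return x_hat

def rat_sin_attack_mse(sigma2_u, sigma2_r, N=100, trials=200, seed=0):
    """Average squared error of the worst-case estimate u_hat = x_hat - r."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        u = rng.normal(0.0, np.sqrt(sigma2_u), size=N)  # assumed Gaussian features
        r = rng.normal(0.0, np.sqrt(sigma2_r), size=N)  # RAT vector, known to attacker
        g = sin_vector(u + r)                           # stored template
        x_hat = guess_from_sin(g, 0.0, sigma2_u + sigma2_r, rng)
        u_hat = x_hat - r
        errs.append(np.mean((u_hat - u) ** 2))
    return float(np.mean(errs))

for s2r in (0.01, 0.05, 0.2):
    print(f"sigma_r^2={s2r}: attack MSE={rat_sin_attack_mse(0.01, s2r):.5f}")
```

In this sketch the reconstruction error grows with the variance of the additive vector, which matches the qualitative trend described in the text.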
For fixed dimensionality N, Figure 5.10 plots the variance of order statistics $\sigma_{j:N}^2$
as functions of variance $\sigma_x^2$, assuming Gaussian distribution. It can be seen that $\sigma_{j:N}^2$
is proportional to the variance $\sigma_x^2$: the larger the $\sigma_x^2$, the greater the $\sigma_{j:N}^2$. Since the
element-wise privacy α and vector-wise privacy β are both proportional to $\sigma_{j:N}^2$,
the larger the $\sigma_r^2$, the greater the $\sigma_x^2$, and the better the privacy protection. This is
confirmed in Figure 5.11, where the α and β values are plotted as functions of $\sigma_r^2$, with
the dimensionality set to 100 and the PCA vectors normalized to zero mean and
a variance of 0.01. It can be seen that better privacy protection can be obtained by
increasing the variance of the additive vector r.
Figure 5.10: Variance $\sigma_{j:N}^2$ as a function of the variance $\sigma_x^2$ of x.
Figure 5.11: Privacy measures of RAT method as functions of variance $\sigma_r^2$.
5.4.2 RMT-SIN
In the RMT based framework RMT-SIN, the SIN vector g of the RMT transformed vector x
is stored as template, with each element of x obtained by $x_i = r_i(u_i + d)$, $i = 1, 2, \ldots, N$,
where $r_i \sim N(1, \sigma_r^2)$, $u_i$ is the ith element of feature vector u of mean zero and variance $\sigma_u^2$,
and d is a translation value. It is straightforward to derive that $E[x_i] = E[r_i(u_i + d)] = d$,
$E[x_i^2] = E[r_i^2(u_i + d)^2] = (\sigma_r^2 + 1)(\sigma_u^2 + d^2)$, and the variance of $x_i$ is
$\sigma_x^2 = E[x_i^2] - E[x_i]^2 = \sigma_u^2(\sigma_r^2 + 1) + d^2\sigma_r^2$.
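The mean and variance of $x_i = r_i(u_i + d)$ derived above can be verified by sampling. The sketch below assumes Gaussian $u_i$ (the derivation only needs its mean and variance), and the parameter values and the function name `rmt_element_moments` are illustrative choices.

```python
import numpy as np

def rmt_element_moments(sigma2_u, sigma2_r, d):
    """Closed-form mean and variance of x_i = r_i (u_i + d)."""
    mean = d
    var = sigma2_u * (sigma2_r + 1.0) + d ** 2 * sigma2_r
    return mean, var

rng = np.random.default_rng(0)
sigma2_u, sigma2_r, d, n = 0.01, 0.05, 0.3, 400_000
u = rng.normal(0.0, np.sqrt(sigma2_u), size=n)   # Gaussian u_i is an assumption
r = rng.normal(1.0, np.sqrt(sigma2_r), size=n)   # r_i ~ N(1, sigma_r^2)
x = r * (u + d)

mean_theory, var_theory = rmt_element_moments(sigma2_u, sigma2_r, d)
print("mean:", x.mean(), "theory:", mean_theory)
print("var: ", x.var(), "theory:", var_theory)
```

Because the term $d^2\sigma_r^2$ enters the variance, both a larger $\sigma_r^2$ and a larger |d| widen the distribution of the transformed elements.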
Assuming the worst case, in which an attacker knows the distribution of u and obtains the
values of d and r, he can generate a set of N random numbers of mean d and variance $\sigma_x^2$,
estimate x by mapping the numbers according to the SIN vector g, and perform element-wise
division followed by subtraction of d to obtain an estimate of $u_i$ as $\hat{u}_i = \hat{x}_i / r_i - d$. As
shown in Figure 5.10, the variance of the order statistics $\sigma_{j:N}^2$ increases as
$\sigma_x^2$ increases. In the RMT-SIN method, $\sigma_x^2$ increases with both $\sigma_r^2$ and d. Since both
privacy measures α and β are proportional to $\sigma_{j:N}^2$, the larger the $\sigma_r^2$ and d, the greater
the $\sigma_{j:N}^2$, and hence the better the privacy. Figure 5.12 shows the element-wise privacy
α and vector-wise privacy β as functions of the variance of multiplicative vector $\sigma_r^2$ and
translation value d respectively, using PCA feature vectors. It can be seen that the
privacy protection level improves at higher $\sigma_r^2$ and d values.
Figure 5.12: Privacy measures of RMT-SIN as functions of variance $\sigma_r^2$ and translation
value d: (a) α, (b) β.
5.4.3 RP-SIN
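In the RP based framework RP-SIN, the SIN vector g of $x = \sqrt{\frac{N}{M}} R^T(u + d)$ is stored
as template, where $R \in \mathbb{R}^{N \times M}$ with each entry $r_{ij} \sim N(0, \frac{1}{N})$, u is the feature vector of
mean zero and variance $\sigma_u^2$, and d is the translation vector. Assuming all the elements
in d have the same value d, for each element in x, we have:
\[ E[x_i] = E\left[\sqrt{\frac{N}{M}} \sum_{j=1}^{N} r_{ji}(u_j + d)\right] = 0, \tag{5.14} \]
\begin{align*}
\sigma_x^2 = \mathrm{Var}[x_i] &= E[x_i^2] - E[x_i]^2 \\
  &= E\left[\left(\sqrt{\frac{N}{M}} \sum_{j=1}^{N} r_{ji}(u_j + d)\right)^2\right] \\
  &= \frac{N}{M} E\left[\left(\sum_{j=1}^{N} r_{ji}(u_j + d)\right)^2\right] \\
  &= \frac{N}{M} E\left[\sum_{j=1}^{N} r_{ji}^2(u_j + d)^2\right] \\
  &= \frac{N}{M} \sum_{j=1}^{N} E[r_{ji}^2(u_j + d)^2] \\
  &= \frac{N}{M} \sum_{j=1}^{N} \frac{1}{N}(\sigma_u^2 + d^2) \\
  &= \frac{N}{M}(\sigma_u^2 + d^2). \tag{5.15}
\end{align*}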
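Eqn. (5.14) and Eqn. (5.15) can be checked by simulating a single element of x many times. The sketch below follows the final expression of Eqn. (5.15) as printed and draws fresh u and a fresh column of R in every trial; Gaussian features, the parameter values, and the function name `rp_element_variance` are assumptions of the example.

```python
import numpy as np

def rp_element_variance(N, M, sigma2_u, d):
    """Final expression of Eqn. (5.15) for one element of x."""
    return N / M * (sigma2_u + d ** 2)

rng = np.random.default_rng(0)
N, M, sigma2_u, d, trials = 100, 80, 0.01, 0.1, 20_000
u = rng.normal(0.0, np.sqrt(sigma2_u), size=(trials, N))  # assumed Gaussian features
r = rng.normal(0.0, np.sqrt(1.0 / N), size=(trials, N))   # one column of R per trial
x_i = np.sqrt(N / M) * np.sum(r * (u + d), axis=1)

var_theory = rp_element_variance(N, M, sigma2_u, d)
print("mean:", x_i.mean(), "theory: 0")
print("var: ", x_i.var(), "theory:", var_theory)
```

Reducing M or increasing d in this sketch raises the sample variance, in line with the dependence stated in Eqn. (5.15).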
Similar to our previous analysis, we assume the worst case where g, R, and d are
all compromised by an attacker. For a projection function $x = R^T(u + d)$, the most
an attacker can do is to generate a set of M random numbers with mean and variance
shown in Eqn. (5.14) and Eqn. (5.15), map them to an estimate $\hat{x}$ according to g, then
estimate u by $R(R^TR)^{-1}\hat{x} - d$, where $R(R^TR)^{-1}$ is essentially the pseudo-inverse of $R^T$.
In Eqn. (5.15), it is shown that $\sigma_x^2$ increases as M decreases and d increases. According to
Figure 5.10, the greater the $\sigma_x^2$, the larger the variance of the order statistics $\sigma_{j:N}^2$, and
the larger the privacy measures α and β. Therefore, in the RP-SIN method, better privacy
can be achieved at a lower projected dimensionality M and greater translation value d. This is
demonstrated in Figure 5.13 using PCA feature vectors, where α and β are plotted as
functions of M and d respectively.
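The pseudo-inverse step of this attack can be illustrated numerically. The sketch below is a best case for the attacker, since it feeds the true x rather than an estimate rebuilt from g; the sizes, seed, and feature distribution are assumptions of the example. It also confirms that the matrix $R(R^TR)^{-1}$ from the text coincides with `np.linalg.pinv(R.T)` when R has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 100, 60, 0.1
R = rng.normal(0.0, np.sqrt(1.0 / N), size=(N, M))
u = rng.normal(0.0, 0.1, size=N)   # illustrative zero-mean feature vector
x = R.T @ (u + d)                  # projection function from the text

# R (R^T R)^{-1}, computed with a linear solve instead of an explicit inverse.
P = np.linalg.solve(R.T @ R, R.T).T   # shape (N, M)

# Best-case estimate: the attacker is given the exact x.
u_hat = P @ x - d

print("matches pinv(R.T):", np.allclose(P, np.linalg.pinv(R.T)))
print("consistent with x:", np.allclose(R.T @ (u_hat + d), x))
print("reconstruction error ||u_hat - u||:", np.linalg.norm(u_hat - u))
```

Even with the exact x, the estimate only recovers the component of u + d that lies in the column space of R, so a residual error remains whenever M < N.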
Figure 5.13: Privacy measures of RP-SIN as functions of projected dimensionality M
and translation value d: (a) α, (b) β.
5.5 Experimental Results
To evaluate the effectiveness of the proposed method, we conduct experiments on the
generic verification data set with the same experimental setup as in Chapters 3 and
4. The training set contains 2388 images from 520 subjects, and 2278 images from 500
human subjects are used for testing. All the experiments are performed 5 times, and the
averages of the results are reported. PCA and KDDA are selected as feature extractors,
and the original dimensionality is set to 100 for both of them. The detailed experimental
results are presented in this section.
5.5.1 RAT-SIN
Table 5.1 shows the EER obtained by applying the RAT-SIN method to PCA and KDDA
feature vectors at different values of the additive vector variance $\sigma^2$, and a graphical
presentation is provided in Figure 5.14. Note that a variance of zero is equivalent to applying
SIN to the original vectors. It can be observed that as the variance increases, the EER
in the UD scenario decreases gradually to zero, while that of the UI scenario increases
slightly. This is consistent with our analysis in Section 5.3.1. As shown in Figure 5.2,
when the variance of the random additive vector increases, clearer separation of the
SK and DK distributions can be obtained, hence better changeability can be achieved.
However, it can also be observed that the distribution of the NSD in the SK scenario
shifts to the left as the variance increases. This indicates larger deviation from the
original characteristics of the features, hence possibly degrades the performance in the