A cascade fusion scheme for gait and cumulative foot pressure image recognition

Pattern Recognition 45 (2012) 3603–3610


Shuai Zheng a, Kaiqi Huang a,*, Tieniu Tan a, Dacheng Tao b

a National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
b Centre for Quantum Computation & Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology, Sydney, NSW 2007, Australia

Article info

Article history:
Received 19 November 2010
Received in revised form 28 February 2012
Accepted 10 March 2012
Available online 5 April 2012

Keywords:
Gait
Foot pressure image
Human recognition

* Corresponding author. Tel.: +86 10 82610278; fax: +86 10 62551993.

http://dx.doi.org/10.1016/j.patcog.2012.03.008

Abstract

Cumulative foot pressure images represent the 2D ground reaction force during one gait cycle. Biomedical and forensic studies show that humans can be distinguished by unique limb movement patterns and ground reaction force. Considering continuous gait pose images and corresponding cumulative foot pressure images, this paper presents a cascade fusion scheme to represent the potential connections between them and proposes a two-modality fusion based recognition system. The proposed scheme contains two stages: (1) given cumulative foot pressure images, canonical correlation analysis is employed to retrieve corresponding gait pose image candidates in the gallery dataset; (2) pedestrian recognition is achieved via small-sample matching between the retrieved gait pose images and unlabeled ones. The proposed fusion recognition system is not only insensitive to slight changes in the environment and in individual users, but can also be extended to multiple-biometrics retrieval. Experiments are conducted on the CASIA gait–footprint dataset, which contains cumulative foot pressure images and their corresponding gait pose image sequences from 88 subjects. Evaluation results suggest the effectiveness of the proposed scheme compared to other related approaches.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Recent biomedical and forensic studies [1] reveal that humans can be distinguished by unique walking patterns (e.g., limb movement pattern and ground reaction force). Unique limb movement patterns seem to help to recognize individuals at a distance (e.g., gait recognition [2]), while ground reaction force and its variants such as footprints are also separately utilized by human experts to identify criminals [3]. Recent walking-pattern-based recognition systems are impractical in many respects because they rely on a single camera sensor. For example, viewpoint problems in gait recognition under constraints could be avoided if we use Kinect [4]. The cumulative foot pressure image is a cumulative image-type record of the change in ground reaction force during one gait cycle [5]. It provides richer information for identifying different walking patterns than the existing 1D ground reaction force or simple 2D footprint pictures. The cumulative foot pressure image has been applied in biomedical assistance, forensic investigation, sports training assistance and custom shoes [5]. In order to address the problems in existing walking pattern recognition systems, we propose to develop a computational correlation model for cumulative foot pressure images and gait.

As far as we know, few previous works have attempted to develop a computational correlation model for cumulative foot pressure images and gait, although many works have studied gait recognition [6–8] or cumulative foot pressure images [5]. Multimodal biometric systems [1] have been proposed to combine evidence from different sources. These sources might simultaneously come from various sensors [9], different classification algorithms [10], multiple instances of the evidence, or directly from diverse biometric traits [11]. Naive feature combinations may not always improve the performance, since some components in different sources may not be complementary. Zhang et al. found that the performance of human recognition using multiple sources could be improved by reducing the redundant classes in the gallery dataset [12]. Specifically, in order to evaluate the computational correlation model we obtain, we also develop a human recognition system that uses gait pose images and corresponding cumulative foot pressure images, without these limitations, through a cascade fusion scheme.

The proposed study is necessary since it provides not only a computational correlation model but also a solution for entrance control applications. For example, in a jailhouse security system or in suspect identification, cumulative foot pressure images and gait pose images of the same individual can be captured at different times. Hence, cumulative foot pressure images and gait pose images can be used to identify escaped criminals or investigate suspects noninvasively. Furthermore, the proposed cascade two-modality fusion scheme can also be employed to develop a cross-multiple-biometrics information retrieval system [13], which could allow users to retrieve the gait pose images of the person of interest given the corresponding cumulative foot pressure images.

[Figure 1 diagram. Labels: Camera; Gait Pose Images (GPIs); Foot pressure measurement plate; Foot pressure measurement device; Cumulative Foot Pressure Images (CFPIs); Labeled GPIs Dataset; Corresponding GPIs retrieval using CCA; Retrieval Result; Higher Similarity Score; GPI matching; Identification Results.]

Fig. 1. Overview of pedestrian recognition using gait and cumulative foot pressure images.

Fig. 1 gives an overview of the proposed scheme. When a person walks over the foot pressure measurement plate, cumulative foot pressure images are captured. Simultaneously or not, corresponding gait pose image sequences are captured via an off-the-shelf camera. After preprocessing and feature extraction, canonical correlation analysis (CCA) is employed to find corresponding images in the gait pose gallery dataset, given the cumulative foot pressure images. Then, pedestrian recognition is achieved by matching the unlabeled gait pose images with the labeled retrieved ones. Briefly, the scheme consists of three parts:

• Feature representation for gait pose images and cumulative foot pressure images.
• Corresponding labeled gait pose image retrieval using CCA.
• Pedestrian recognition via gait pose image matching.

The remainder of this paper is organized as follows. Section 2 presents related work on gait, cumulative foot pressure and multiple-evidence fusion schemes for pedestrian recognition. Section 3 presents the feature representations for gait pose images and cumulative foot pressure images. Section 4 describes the proposed cascade fusion scheme. Section 5 describes the fusion-based recognition system. Section 6 introduces the dataset. Section 7 reports the experiments. We draw conclusions in Section 8.

2. Related works

Gait recognition is potentially useful for personal identification [14]. It is attractive for identification purposes because it is completely unobtrusive and does not require any subject cooperation or contact. The state-of-the-art approaches in gait recognition can be divided into model-based and model-free approaches.

Model-based approaches tend to recover the underlying mathematical construction of gait with a structure motion model. Wang et al. model the mean shapes of gait silhouettes using Procrustes analysis [15]. Bouchrika and Nixon extract crucial feature descriptions from human joints by developing a motion-based model using elliptic Fourier descriptors [16]. However, the performance of these approaches suffers from poor localization of the torso and from the difficulty of extracting the underlying models from gait sequences.

The other kind of approach is model-free. One kind of model-free approach preserves temporal information in the recognition and training stages. Hidden Markov models (HMMs) are utilized to achieve gait recognition [17]. Principal component analysis (PCA) [18,19] is employed to extract statistical spatial–temporal feature descriptors of gait [20]. In this kind of approach, large-scale training samples are required for the probabilistic temporal models to perform well. Hence, the disadvantages of the approach are the high computational complexity of sequence matching during recognition and the high storage requirement of the dataset. Another kind of model-free approach converts a sequence of images into a single template. Gait recognition by averaging all the silhouettes is presented by Liu et al. [21]. Han proposed a gait energy image (GEI) to construct real and synthetic gait templates [22]. The recognition performance may degrade since the temporal information in gait sequences is discarded. Wang et al. developed a spatial-temporal walking template called the chrono-gait image (CGI), which encodes the temporal information via color mapping to improve the recognition rates [23]. The main drawback of these approaches is that they are sensitive to slight changes in the environment, such as illumination variation between probe and gallery data collections, or crowded scenarios. Besides, traditional motion-based gait representation is not practical or stable in wider application scenarios such as internet videos or image sequences from IP cameras.

Recent works on action recognition started to introduce feature descriptors like histograms of oriented gradients (HOG) to represent several action key poses [24]. Such feature descriptors have also helped to achieve state-of-the-art performance in object detection and object recognition [25]. In these tasks, they have been shown to cope with environmental challenges and to represent objects without background priors or motion-based segmentation. Inspired by their success, we propose to use combined feature descriptors obtained from several continuous gait pose images to represent individual gait.

Footprints are important identification evidence for forensic investigation purposes. Although they have been used since ancient times, only recently did Kennedy first establish the uniqueness of bare footprints and their use as a possible means of identification [26]. Previous works can be divided into two types: one uses ground reaction force and the other uses a still footprint image. Ground reaction force is a 1D data signal record of walking. Moustakidis proposed a subject recognition system based on ground reaction force measurement [27]. Cattin developed a general PCA fusion based biometric system using both gait and ground reaction force [7]. However, this method is neither robust to slight noise nor able to distinguish the different walking manners of different pedestrians. Besides, the strictly controlled data collection environment limits its potential applications. The second type uses a still footprint image to achieve pedestrian recognition. Nakajima proposed person recognition using normalized pairs of raw barefoot prints [28]. Uhl developed a footprint-based biometric verification system using scanned barefoot images [29]. This method failed to recognize the same individual when users wore different shoes. Cumulative foot pressure images contain cumulative spatial and temporal force information during one gait cycle, which may help to handle the difficulties in recognizing individuals wearing different shoes. Previous work shows that feature descriptors based on hierarchical models are invariant to different shoes [30]. Inspired by the success of hierarchical models on translation-invariant object recognition datasets, we propose a Gaussian mixture model (GMM) based representation of cumulative foot pressure images.

Existing multimodal biometric systems combine evidence from different sources [7,31]. The combination approaches can be divided into three types: the feature level, the matching score level and the decision level. Cattin utilized Bayesian decision theory to fuse ground reaction force and gait [7]. This loses much correlation information. Zhang et al. proposed to fuse evidence at the feature level [31]. However, this is not practical since the multiple modalities may be incompatible and directly concatenating feature vectors may suffer from the "curse of dimensionality" problem. He et al. investigated the performance of various score level fusion approaches in multimodal biometric systems [32]. But these approaches do not address the issue of fusing evidence obtained at different times. Motivated by multimodal document cross retrieval systems [13], we propose a cascade fusion scheme to fuse gait and cumulative foot pressure images at both the feature level and the score matching level.

[Figure 2 diagram. Labels: Original Image; Gradients; HOG; Probability of boundary (pb); pbHOG; Original gait pose images; pbHOG for gait pose images; Gait pose image representations.]

Fig. 2. Comparison of different gait representations and overview of concatenated pbHOG gait pose image representation.

3. Feature representation

In this section, we describe a gait representation computed from still image sequences and a translation-invariant representation of cumulative foot pressure images.

3.1. Gait representation

A very basic assumption in all gait recognition research is that all walking sequences from the same person follow a similar walking pattern, where the walking pattern involves the moving range of the limbs. However, one walking image sequence contains many repeated walking cycles. To estimate the walking period and separate a single walking cycle from one walking image sequence, we compute the movement of the limbs as the frame changes. We first employ Felzenszwalb's detector [33] to extract the bounding box of the pedestrian in each frame. Then we compute the change in the width of the bounding box as the frame changes. This is similar to the method of Sarkar et al. [21].
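The period estimation step can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the period is read off the width signal, so the sketch uses the lag with the highest autocorrelation, and the function name `estimate_period`, the `min_lag` parameter and the synthetic width sequence are all hypothetical. Whether one period of the width signal corresponds to a single step or to a full gait cycle depends on the viewpoint and is not specified here.

```python
import math

def estimate_period(widths, min_lag=5):
    """Return the lag (in frames) with the highest autocorrelation of the width signal."""
    n = len(widths)
    mean = sum(widths) / n
    centered = [w - mean for w in widths]
    energy = sum(c * c for c in centered)
    if energy == 0:
        raise ValueError("constant width signal has no period")
    best_lag, best_score = None, float("-inf")
    # Only lags up to half the sequence are searched, so every lag overlaps
    # at least half of the frames.
    for lag in range(min_lag, n // 2 + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic example (assumption): box width oscillating with a 12-frame period.
widths = [40 + 10 * math.cos(2 * math.pi * t / 12) for t in range(96)]
period = estimate_period(widths)
```

In practice the `widths` list would hold the per-frame bounding-box widths returned by the pedestrian detector, and `min_lag` would be chosen from the frame rate so that implausibly short periods are skipped.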

We compute the gait representation as the concatenation of probability-of-boundary-based histogram of oriented gradients (pbHOG) descriptors computed on the normalized, cropped walking poses of one walking cycle. We extract the HOG descriptors from the responses of the probability of boundary (Pb) operator [34]. As Fig. 2 illustrates, the Pb operator suppresses small noise in the image. Hence, pbHOG captures more salient walking pose details.
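To make the concatenation concrete, here is a simplified sketch of a per-pose orientation histogram followed by concatenation over one cycle. It is not the pbHOG implementation: the Pb operator is not reproduced, so the input arrays are simply assumed to be precomputed boundary-response maps of equal size, and the cell size, bin count, missing block normalization and function names are all illustrative assumptions.

```python
import numpy as np

def orientation_histograms(response, cell=8, bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    gy, gx = np.gradient(np.asarray(response, dtype=float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bin_index = np.minimum((angle / np.pi * bins).astype(int), bins - 1)
    rows, cols = response.shape[0] // cell, response.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            block = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_index[block].ravel(),
                                     weights=magnitude[block].ravel(),
                                     minlength=bins)
    flat = hist.ravel()
    return flat / (np.linalg.norm(flat) + 1e-12)  # L2-normalize the pose descriptor

def gait_descriptor(pose_responses, cell=8, bins=9):
    """Concatenate the descriptors of all normalized poses in one walking cycle."""
    return np.concatenate([orientation_histograms(p, cell, bins) for p in pose_responses])
```

With 64 x 32 pose crops, 8-pixel cells and 9 bins, each pose contributes 8 x 4 x 9 = 288 values, so a cycle sampled at four poses yields a 1152-dimensional vector.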

3.2. Translation-invariant representation for cumulative foot pressure image

Our goal is to learn a feature representation model for cumulative foot pressure images that preserves translation invariance. Recent work [35] addresses shoe-invariant cumulative foot pressure image recognition. In contrast, we address the problem of slight changes in barefoot pressure images. Current low-level descriptors have been shown to be invariant to minor translation and effective in many object recognition applications. However, they are not practical for representing cumulative foot pressure images since they lose a lot of structure information (in terms of object parts), which is crucial for shoe-invariant cumulative foot pressure image recognition. Previous works [12] show that hierarchical models bring the advantage of translation invariance. Hence, we propose to model cumulative foot pressure images with hierarchical Gaussian mixture models, as Fig. 3 illustrates.

[Figure 3 diagram. Labels: Normalize CBPIs; Dense SIFT feature extraction; Center of Pressure (COP) curve; COP curve extraction; EM algorithm based GMM modeling (two branches); GMM parameters representation results (two branches); Concatenated CBPI descriptor vectors.]

Fig. 3. Diagram of GMM based representation for cumulative foot pressure images.

Suppose $z$ denotes a $p$-dimensional feature vector (SIFT descriptor) or COP (center of pressure curve) [27,5] from cumulative foot pressure image $I$. We model $z$ by a GMM:

$$p(z \mid \theta) = \sum_{k=1}^{M} w_k^I \, \mathcal{N}(z; \mu_k^I, \Sigma_k^I),$$

where $M$ denotes the total number of Gaussian components and $(w_k^I, \mu_k^I, \Sigma_k^I)$ are the weight, mean and covariance matrix of the $k$th Gaussian component, respectively. For computational efficiency, the covariance matrices $\Sigma_k^I$ are restricted to be diagonal. The number of model parameters $\theta = (w_k^I, \mu_k^I, \Sigma_k^I)$ increases with respect to $N$, where $N$ is the number of images. $p(\theta)$ is the distribution of the parameters. Following [36], the prior mean vector, prior weights, and covariance matrix are estimated by fitting a global GMM. The other parameters are learned by maximum a posteriori (MAP) loss:

$$\max \, [\ln p(z \mid \theta) + \ln p(\theta)].$$

Center of pressure histograms in the cumulative foot pressure image and dense SIFT feature descriptors are separately extracted and modeled with GMMs. We obtain a CFPI representation by concatenating these descriptor vectors. We represent the cumulative foot pressure image with the parameters of the GMM. Hence, following the suggestion in [36], we represent the cumulative foot pressure image as

$$x = \left[ \sqrt{w_1^I}\, \Sigma_1^{-1/2} \mu_1^I, \; \ldots, \; \sqrt{w_M^I}\, \Sigma_M^{-1/2} \mu_M^I \right].$$
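The super-vector above is straightforward to compute once the GMM parameters are available. The sketch below assumes diagonal covariances, as stated in the text, and takes already-fitted parameters as input; the fitting itself (global GMM plus MAP adaptation) is not shown, and the function name and the toy parameter values are hypothetical.

```python
import numpy as np

def gmm_supervector(weights, means, diag_covs):
    """Stack sqrt(w_k) * Sigma_k^(-1/2) * mu_k over all M components.

    weights:   shape (M,)   mixture weights
    means:     shape (M, p) component means
    diag_covs: shape (M, p) diagonal entries of the covariance matrices
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    diag_covs = np.asarray(diag_covs, dtype=float)
    # With a diagonal Sigma_k, Sigma_k^(-1/2) is an elementwise 1/sqrt.
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(diag_covs)
    return scaled.ravel()

# Toy parameters for M = 2 components in p = 2 dimensions (illustrative only).
sift_part = gmm_supervector([0.25, 0.75], [[2.0, 4.0], [1.0, 3.0]], [[4.0, 1.0], [1.0, 9.0]])
cop_part = gmm_supervector([1.0], [[0.5, 0.5]], [[0.25, 0.25]])
cfpi_descriptor = np.concatenate([sift_part, cop_part])
```

The last line mirrors the concatenation of the SIFT-based and COP-based descriptor vectors described in the text.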

4. Cascade fusion scheme

In this section, we present a cascade fusion scheme using canonical correlation analysis and its regression variant.

4.1. Feature selection using canonical correlation analysis

In order to identify shared correlated structure among variables from two evidence sources, canonical correlation analysis (CCA) [37] is first employed as a feature selection method. The algorithm estimates two basis vectors so that, after linear projection, the correlation between the two sets of variables is mutually maximized. Given a set of paired sample vectors $S = \{(r_1, u_1), \ldots, (r_n, u_n)\}$ and their projection matrices $w_r$ and $w_u$, CCA maximizes the objective function

$$\rho = \max_{w_r, w_u} \frac{w_r^{T} C_{ru} w_u}{\sqrt{w_r^{T} C_{rr} w_r \; w_u^{T} C_{uu} w_u}},$$

where $C_{rr}$ and $C_{uu}$ are the within-set covariance matrices and $C_{ru} = C_{ur}^{T}$ is the between-set covariance matrix. A closed-form solution can be computed by solving a generalized eigenvalue problem. Large problems can be solved efficiently using predictive low-rank decomposition with partial Gram–Schmidt orthogonalization [38].

Based on the learned projection matrices, we obtain the low-dimensional embedded feature vectors $X = (x_1, x_2, \ldots, x_n) = (w_r^{T} r_1, \ldots, w_r^{T} r_n)$ and $Y = (y_1, y_2, \ldots, y_n) = (w_u^{T} u_1, \ldots, w_u^{T} u_n)$.
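As an illustration of the closed-form solution, the sketch below solves linear CCA by whitening both sets and taking a singular value decomposition, which is one standard way to solve the generalized eigenvalue problem. It is not the low-rank method of [38]; the small ridge term `reg`, the function names, and the convention that rows are samples are assumptions made for this example.

```python
import numpy as np

def inv_sqrt(sym):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def fit_cca(R, U, dims, reg=1e-6):
    """R: (n, p) samples of one modality; U: (n, q) paired samples of the other.

    Returns projection matrices Wr (p, dims), Wu (q, dims) and the canonical
    correlations of the retained dimensions.
    """
    R0, U0 = R - R.mean(axis=0), U - U.mean(axis=0)
    n = R.shape[0]
    Crr = R0.T @ R0 / (n - 1) + reg * np.eye(R.shape[1])  # within-set covariance
    Cuu = U0.T @ U0 / (n - 1) + reg * np.eye(U.shape[1])  # within-set covariance
    Cru = R0.T @ U0 / (n - 1)                             # between-set covariance
    Krr, Kuu = inv_sqrt(Crr), inv_sqrt(Cuu)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    A, corr, Bt = np.linalg.svd(Krr @ Cru @ Kuu)
    Wr = Krr @ A[:, :dims]
    Wu = Kuu @ Bt.T[:, :dims]
    return Wr, Wu, corr[:dims]
```

With samples stored as rows, the embedded vectors of the text are obtained as `(R - R.mean(axis=0)) @ Wr` and `(U - U.mean(axis=0)) @ Wu`.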

4.2. Correlated scores computation via CCA regression

In order to evaluate the correlation scores between the data from the two evidence sources, CCA regression is adopted again. Given probe feature vectors $X : x_1, \ldots, x_n$ and gallery feature vectors $Y : y_1, \ldots, y_n$, a mapping pair $W_x, W_y$ is learned via CCA regression:

$$\arg\max_{W_x, W_y} \frac{E[W_x^{T} X Y^{T} W_y]}{\sqrt{E[W_x^{T} X X^{T} W_x] \, E[W_y^{T} Y Y^{T} W_y]}}.$$

Based on the mapping pair, we can compute projections as follows:

$$x' = W_x^{T} x, \qquad y' = W_y^{T} y.$$

Correlated scores $s(x', y'_i)$ are computed between the probe and each gallery sample via CCA regression:

$$s(x', y'_i) = \frac{{x'}^{T} y'_i}{\lVert x' \rVert \, \lVert y'_i \rVert}.$$

Based on the scores, gallery samples that have high correlated scores are retrieved as candidates for further pedestrian recognition.
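The scoring and retrieval step amounts to a cosine similarity followed by a top-k selection. A minimal sketch, assuming the probe and gallery vectors have already been projected into the shared space; the function names, the zero-vector convention and the `top_k` parameter are illustrative assumptions.

```python
import math

def cosine_score(a, b):
    """Cosine similarity s(a, b) = a.b / (|a| |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_candidates(probe, gallery, top_k):
    """Return the indices of the top_k gallery vectors, highest score first."""
    scored = [(cosine_score(probe, g), i) for i, g in enumerate(gallery)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]
```

For example, `retrieve_candidates(x_prime, gallery_y_primes, top_k=10)` would return the ten gallery entries kept for the matching stage, where both argument names are placeholders for the projected probe and gallery vectors.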

5. Cascade fusion scheme for two-modality based pedestrian recognition

Based on the proposed cascade fusion scheme, we achieve pedestrian recognition using gait and cumulative foot pressure images. As Fig. 4 illustrates, there are two inputs of the system: cumulative foot pressure images and corresponding continuous gait pose images. The two evidence sources can be simultaneous or not, but need to be collected from the same individual during labeled dataset construction. Following Section 3, the concatenated HOG descriptors are used to represent a given gait, while a super-vector of GMM parameters is employed to preserve the characteristics of the given cumulative foot pressure images. In the cascade fusion scheme, cumulative foot pressure images are first employed to prune irrelevant parts of the search space in the labeled gait dataset and to predict the correlation similarity scores between the given cumulative foot pressure images and the labeled gait image sequences. Then, pedestrian recognition is achieved by matching the given gait image sequences with a small number of labeled ones, using a trained linear SVM classifier.

[Figure 4 diagram. Labels: Unlabeled Cumulative foot pressure images (CFPIs); Labeled gait pose images (GPIs); CCA-based dimensionality reduction; Correlated Projection matrix; Correlated GPI embedding; Cosine distance Measurement Between embedding and each labeled GPIs; Retrieval Result; Higher Similarity Score.]

Fig. 4. Diagram of cascade fusion scheme for pedestrian recognition using gait and cumulative foot pressure images.
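The two stages can be sketched end to end as follows. This is a simplified stand-in rather than the system described here: the second stage uses a nearest-neighbor match in place of the trained linear SVM, and all argument names are hypothetical. The inputs are assumed to be the CFPI probe projected into the shared space, the raw gait feature of the probe, and the gallery's projected and raw gait features with their labels.

```python
import numpy as np

def unit(v):
    """Scale a vector (or each row of a matrix) to unit length."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

def cascade_identify(probe_cfpi_emb, probe_gpi_feat, gallery_gpi_emb,
                     gallery_gpi_feat, gallery_labels, top_k=10):
    # Stage 1: rank gallery GPIs by cosine similarity to the CFPI probe
    # in the shared space and keep only the top_k candidates.
    scores = unit(gallery_gpi_emb) @ unit(probe_cfpi_emb)
    candidates = np.argsort(-scores)[:top_k]
    # Stage 2: match the probe GPI against the pruned gallery only
    # (nearest neighbor here; the paper trains a linear SVM instead).
    pruned = np.asarray(gallery_gpi_feat, dtype=float)[candidates]
    dists = np.linalg.norm(pruned - np.asarray(probe_gpi_feat, dtype=float), axis=1)
    return gallery_labels[int(candidates[np.argmin(dists)])]
```

The pruning matters when a gallery entry from a different person happens to lie close to the probe in gait feature space: if its correlation score with the foot pressure probe is low, it never reaches the second stage.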

6. Dataset

As far as we know, the CASIA gait–footprint dataset is the first publicly accessible dataset containing both cumulative foot pressure images and the corresponding gait pose images. This dataset consists of 3496 gait pose images and 2658 cumulative foot pressure images. The distributions of subjects over some basic attributes are presented in Fig. 5. The data were collected from 88 pedestrians, 20 female and 68 male, in an indoor environment. Experimental factors like illumination, background and clothes are not under strict control, as Fig. 6 illustrates.

Fig. 5. The distribution of age and body mass index (BMI) in the GPFP dataset. Most subjects in the GPFP dataset are young, and most are of healthy weight or slightly overweight.

7. Experiments

The purpose of the proposed system is to develop a computational evaluation framework for studying the relations between gait and cumulative foot pressure images. Although the applications of the study presented in this paper are not limited to recognition, we evaluate the performance of the proposed approach following recognition evaluation criteria. To evaluate the effectiveness of the proposed cascade framework in recognition, we choose three other approaches for comparison: a single source approach (only the gait feature descriptor), a naive baseline fusion approach, and a CCA based fusion approach [10].

In the experiments, we randomly divided the dataset into two subsets: a training set of 440 groups of data containing gait pose images and correlated cumulative foot pressure images, and a testing set containing the other 434 groups of data.

Fig. 6. Samples from the dataset, containing gait pose and cumulative foot pressure images.

[Figure 7: two bar-chart panels titled "Best precision at recall level 10.67%." and "Best precision at recall level 16.67%."; y-axis: Precision % (0 to 50); labels: PCA+CCA, PLS, PCA; HOG, pbHOG.]

Fig. 7. Precision of GPIs pruning using CFPIs as queries, when the recall value is 10.67%. The experimental results show that the PCA+CCA feature selection method with the pbHOG GPI representation outperforms the others.


We evaluate the recognition performance using accuracy. To get the best results, we compare the different candidate features and feature selection approaches in terms of retrieval performance, which is commonly evaluated using precision and recall. Suppose the number of retrieved gait pose images is $R_g$, and the number of retrieved gait pose images that are correlated with the given cumulative foot pressure image is $R_{cg}$. The precision is defined as

$$\text{precision} = R_{cg} / R_g.$$

Recall is the fraction of the correlated gait pose images that are correctly retrieved. Assume that the number of correlated gait pose images is $R_c$. Then, the recall is defined as

$$\text{recall} = R_{cg} / R_c.$$
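The two definitions translate directly into code. A minimal sketch, assuming gait pose images are identified by hashable ids; the function name and the convention of returning 0.0 for an empty set are illustrative choices.

```python
def precision_recall(retrieved_ids, correlated_ids):
    """Return (precision, recall) with R_g = |retrieved|, R_c = |correlated|,
    and R_cg = |retrieved and correlated|."""
    retrieved = set(retrieved_ids)
    correlated = set(correlated_ids)
    r_cg = len(retrieved & correlated)
    precision = r_cg / len(retrieved) if retrieved else 0.0
    recall = r_cg / len(correlated) if correlated else 0.0
    return precision, recall
```

For instance, retrieving four images of which two are among five correlated ones gives a precision of 2/4 and a recall of 2/5.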

7.1. Feature and feature selection

To choose the features and the feature selection approach, we perform a retrieval evaluation. During the training stage, we learn the feature subspace projection matrix. During testing, we utilize the learned projection matrix to reduce the dimensionality of the pbHOG and GMM feature vectors. Then our proposed CCA-based cascade scheme is employed to assign correlation scores between the gait pose images in the dataset and the cumulative foot pressure image queries.

As Fig. 7 illustrates, the pbHOG feature descriptor and PCA–CCA feature selection are the best combination. Among the feature descriptors, pbHOG performs best since the probability of boundary largely reduces the noise response in still images, which helps to reduce the within-class divergence. Among the feature selection approaches, the combination of PCA and CCA performs best because of the complementary advantages of the two. PCA captures the principal components and ensures that the feature vectors from the two evidence sources are not ill-posed, while CCA maximizes the correlations between the cumulative foot pressure images and gait poses of the same person.

We do not plot the full precision–recall curve, because it is difficult to visualize it well with only 88 subjects. Instead, we compare precision values at two recall levels. Recall is the fraction of relevant instances that are retrieved. The precision drops significantly as the recall increases, and the corresponding precision value becomes too small when the recall is larger than 10.67%. In this experiment, we find that the system achieves its best performance when the recall is 10.67%.

7.2. Fusion scheme for pedestrian recognition

To evaluate the performance of the proposed fusion approach, we perform a recognition performance evaluation. During the training stage, the feature selection projection matrices and the linear SVM classifier are trained on the gallery dataset. During the testing stage, there are two inputs: cumulative foot pressure images and gait pose images. Given probe data in the form of cumulative foot pressure images and gait pose images, the proposed approach returns the corresponding label according to the matching process between the probe and gallery data. Specifically, the process in our proposed cascade scheme for pedestrian recognition is as follows. First, unlabeled cumulative foot pressure images are used as probe data to retrieve correlated labeled gait pose images in the gallery dataset; the correlation scores are used to prune many irrelevant labeled gait pose images, leaving a small labeled gallery subset. Second, the unlabeled gait pose images are sent to the classifier, which matches them against this small labeled gallery subset.

In our experiments, the penalty parameter C of every linear SVM is set to 10. In the cascade scheme, the threshold is set to 10, which means we keep only the labeled gait pose images with the top 10 correlation scores for the gait matching process. We choose this threshold because the recognition performance of the proposed system is best under this setting.

In Fig. 8, we use the result of gait recognition with PCA feature selection to compare the fusion schemes with human recognition using only a single source.

In the other approaches illustrated in Fig. 8, we compare different fusion schemes while fixing the feature representation and feature selection approach. For the feature representation, we use pbHOG for gait and the GMM representation for cumulative foot pressure images, and we use PCA–CCA for feature selection. Naive fusion means directly concatenating the dimension-reduced feature vectors from gait and cumulative foot pressure images. The benchmark CCA based fusion method is the same as illustrated in [10]. Cascade fusion is the method developed in this paper.

[Figure 8: bar chart titled "Comparison of Different Pedestrian Identification Methods"; y-axis: Accuracy % (0 to 100). Legend entries, in extracted order: Cascade fusion with PCA-CCA feature selection; Benchmark CCA based fusion with PCA-CCA as feature selection; Naïve (combination) fusion with PCA-CCA as feature selection; Single gait recognition with PCA feature selection. Bar labels, in extracted order: 99, 86.18, 67.97, 39.17.]

Fig. 8. Comparison of different recognition systems: single gait recognition and different fusion schemes. "Single gait recognition with PCA feature selection" denotes human recognition with a single evidence source rather than two fused sources; its gait feature representation is pbHOG. The fusion schemes presented here all use the same input feature vectors: the pbHOG gait representation and the GMM cumulative foot pressure image representation. The PCA–CCA combination is adopted as the supervised feature selection method. The naive fusion scheme concatenates the two feature vectors. The benchmark CCA based scheme uses the method in [10]. The cascade fusion scheme is the proposed one.

Fig. 8 shows the performance of the different pedestrian recognition schemes. The correlated-model-based cascade two-modality fusion scheme outperforms the simple concatenation-based ones. This is not a surprising result. The correlated model helps to reduce the number of labeled GPIs in the gallery dataset and prune the irrelevant GPIs. Hence, it casts the original difficult multi-class matching problem into a small-class or even binary-class matching problem. Further, the proposed cascade method outperforms the concatenation-based fusion scheme. This is because the proposed method utilizes the correlation between the data of the two modalities in the common correlation space, while the concatenation-based ones ignore this information.

8. Conclusions and future work

This paper raises a new problem, namely how to reveal walking behavior patterns, and demonstrates a possible way to exploit the correlation among foot pressure, footprint and walking motion patterns for recognition. Specifically, we presented a method that allows different walking patterns to be used to recognize pedestrians without a camera setting. Gait pose and cumulative foot pressure are two types of correlated walking behavior patterns. In order to study the potential connections between the two, we establish a standard database and develop a pedestrian recognition system with the cascade fusion scheme. According to the evaluation of pedestrian recognition performance, the cascade scheme is shown to be effective in extracting the correlated parts of the two modalities.

The proposed cascade fusion scheme could be used to design a cross-biometric search system that allows a pedestrian or any biometric pattern to be retrieved with other query biometric patterns. There are also some applications of interest without camera settings, especially for criminal investigation and other related specific applications. For example, it will be more convenient to narrow down criminal suspects or others by comparing footprints, cumulative foot pressure images and gait pose images.

The drawback of this paper is the assumption that no subject is wearing shoes, which is not very convenient for normal use. Wearing different shoes would generate different footprint shapes, which might cause recognition to fail. Future work should therefore address deformation-invariant cumulative foot pressure image recognition. Further, future work will also focus on studying the correlation between footprints and cumulative foot pressure images, so that the computational model can help forensic investigation, which often collects a lot of footprint data from a crime scene.

Acknowledgment

This work is supported by the National Basic Research Program of China (2012CB316302) and the National Science Foundation of China (Nos. 60875021 and 61175007). It is also supported in part by an ARC Discovery Project, Project No. DP-120103730. We also appreciate the help and suggestions of Dr. Tianyi Zhou, Prof. Liming Shi, Prof. Liang Wang, and Dr. Ran He.

References

[1] L. Hong, A.K. Jain, S. Pankanti, Can multiplebiometrics improve performance,in: IEEE Workshop on Automatic Identification Advanced Technologies, 1999,pp. 1–8.


[2] D. Tao, X. Li, X. Wu, S.J. Maybank, General tensor discriminant analysis andgabor features for gait recognition, IEEE Transactions on Pattern Analysis andMachine Intelligence 29 (10) (2007) 1700–1715.

[3] R.K. Lasrsen, E.B. Simonsen, N. Lynnerup, Gait analysis in forensicmedicine—art. no. 64910m, Videometrics IX (2007) 6491.

[4] S. Sivapalan, D. Chen, S. Denman, S. Sridharan, and C. Fookes, Gait energyvolumes and frontal gait recognition using depth images, in: IEEE Interna-tional Joint Conference on Biometrics, 2011, pp. 501–506.

[5] RS. Scan.Lab, Footscan: An Floor Pressure Sensing Devices Product Introduc-tion /http://www.rsscan.co.uk/systems.phpS.

[6] A. Bertani, A. Cappello, M.G. Benedetti, L. Simoncini, F. Catani, Flat footfunctional evaluation using pattern recognition of ground reaction data,Clinical Biomechanics 14 (7) (1999) 484–493.

[7] P.C. Cattin, D. Zlatnik, R. Borer, Biometric authentication system using humangait, in: Mechatronics and Machine Vision in Practice, 2001, pp. 1–4.

[8] P.J. Phillips, S. Sarkar, I. Robledo, P. Grother, K.W. Bowyer, The gait identifica-tion challenge problem: data sets and baseline algorithm, in: InternationalConference on Pattern Recognition, 2002, pp. 385–388.

[9] K.I. Chang, K.W. Bowyer, P.J. Flynn, An evaluation of multimodal 2dþ3d facebiometrics, IEEE Transactions on Pattern Analysis Machine Intelligence 27(2005) 619–624.

[10] C. Shan, S. Gong, P.W. McOwan, Fusing gait and face cues for human genderrecognition, Neurocomputing 71 (2008) 1931–1938.

[11] X. Geng, K. Smith-Miles, L. Wang, Q. Wu, Context-aware fusion: a case studyon fusion of gait and face for human identification in video, PatternRecognition 43 (2010) 3660–3673.

[12] X. Zhang, Z. Sun, T. Tan, Hierarchical fusion of face and iris for personalidentification, in: IEEE International Conference on Pattern Recognition,2010, pp. 217–220.

[13] NSTC, The National Biometrics Challenge, National Science and TechnologyCouncil Subcommittee on Biometrics, vol. 1, 2006, pp. 1–20.

[14] W.M. Hu, T.N. Tan, L. Wang, S. Maybank, A survey on visual surveillance ofobject motion and behaviors, IEEE Transactions on Systems, Man, andCybernetics, Part C: Applications and Reviews 34 (3) (2004) 334–352.

[15] L. Wang, H. Ning, W. Hu, T. Tan, Gait recognition based on procrustes shapeanalysis, in: International Conference on Image Process, 2002, pp. 433–436.

[16] I. Bouchrika, M. Nixon, Model-based feature extraction for gait analysis andrecognition, in: International Conference on Computer Vision/ComputerGraphics Collaboration Techniques, 2007, pp. 150–160.

[17] C. Chen, J. Liang, H. Zhao, H. Hu, Gait recognition using hidden Markov model,in: Advanced in Natural Computation, 2004, pp. 399–407.

[18] N. Guan, D. Tao, Z. Luo, B. Yuan, Nenmf: an optimal gradient method forsolving non-negative matrix factorization and its variants, IEEE Transactionson Signal Processing, http://dx.doi.org/10.1109/TSP.2012.2190406, in press.

[19] X. Tian, D. Tao, Y. Rui, Sparse transfer learning for interactive video searchreranking, ACM Transactions on Multimedia Computing, Communicationsand Applications, http://dx.doi.org/10.1145/0000000.0000000, in press.

[20] L. Wang, T. Tan, H. Ning, W. Hu, Silhouette analysis-based gait recognition forhuman identification, IEEE Transactions on Pattern Analysis and MachineIntelligence 25 (2003) 1505–1518.

[21] Z. Liu, S. Sarkar, Simplest representation yet for gait recognition: averagedsilhouette, in: International Conference Pattern Recognition, 2004, pp. 211–214.

[22] J. Han, B. Bhanu, Individual recognition using gait energy image, IEEETransactions on Pattern Analysis and Machine Learning 28 (2006) 316–322.

[23] C. Wang, J. Zhang, J. Pu, X. Yuan, L. Wang, Chrono-gait image: a noveltemporal template for gait recognition, in: European Conference on Compu-ter Vision, 2010, pp. 257–270.

[24] N. Ikizler, R. Cinbis, S. Sclaroff, Learning actions from the web, in: IEEEInternational Conference on Computer Vision, Kyoto Japan, 2009, pp. 995–1002.

[25] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in:IEEE Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 886–893.

[26] R.B. Kennedy, Uniqueness of bare feet and its use as a possible means ofidentification, Forensic Science International 82 (1) (1996) 81–87.

[27] S.P. Moustakidis, J.B. Theocharis, G. Giakas, Subject recognition based onground reaction force measurements of gait signals, IEEE Transactions onSystems, Man, and Cybernetics, Part B: Cybernetics 38 (6) (2008) 1476–1485.

[28] K. Nakajima, Y. Mizukami, K. Tanaka, T. Tamura, Footprint-based personalrecognition, IEEE Transactions on Biomedical Engineering 47 (11) (2000)1534–1537.

[29] A. Uhl, P. Wild, Footprint-based biometric verification, Journal of ElectronImaging 17 (1) (2008).

[30] H. Lee, R. Grosse, R. Ranganath, A.Y. Ng, Convolutional deep belief networksfor scalable unsupervised learning of hierarchical representation, in: Inter-national Conference on Machine Learning, 2009, pp. 77–85.

[31] T. Zhang, X. Li, D. Tao, J. Yang, Multimodal biometrics using geometrypreserving projections, Pattern Recognition 41 (2008) 805–813.

[32] M. He, S. Horng, P. Fan, Performance evaluation of score level fusion inmultimodal biometric systems, Pattern Recognition 43 (2010) 1789–1800.

[33] P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan, Object detectionwith discriminatively trained part-based models, IEEE Transactions onPattern Analysis and Machine Intelligence 32 (2010) 1627–1645.

[34] D.R. Martin, C.C. Fowlkes, J. Malik, Learning to detect natural imageboundaries using local brightness, color, and texture cues, IEEE Transactionson Pattern Analysis and Machine Intelligence 26 (2004) 530–549.

[35] S. Zheng, K. Huang, T. Tan, Evaluation framework on translation-invariantrepresentation for cumulative foot pressure image, in: the 18th IEEE Inter-national Conference on Image Processing (ICIP), 2011, pp. 201–204.

[36] X. Zhou, N. Cui, Z. Li, F. Liang, T.S. Huang, Hierarchical gaussianization forimage classification, in: IEEE International Conference on Computer Vision,Kyoto, Japan, 2009, pp. 1971–1979.

[37] D.R. Hardoon, S. Szedmak, J.S. Taylor, Canonical correlation analysis: anoverview with application to learning methods, Neural Computation 16(2004) 2639–2664.

[38] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cam-bridge University Press, 2004.

Shuai Zheng received his B.Eng. from Beijing Institute of Technology in 2008. He is now a master's student at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His current research interests include computer vision and pattern recognition, especially image representation, image classification, object detection and gait recognition. He is a student member of the IEEE and the IEEE Computer Society.

Kaiqi Huang received his B.Sc. and M.Sc. from Nanjing University of Science and Technology, China, and obtained his Ph.D. degree from Southeast University. He has worked at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), in China, and he has been an associate professor at NLPR since 2005. He is a Senior Member of the IEEE. He is an executive team member of the IEEE SMC Cognitive Computing Committee, an Associate Editor of the International Journal of Image and Graphics (IJIG) and Electronic Letters on Computer Vision and Image Analysis (ELCVIA), and a Guest Editor of a Signal Processing special issue on Security.

His current research interests include visual surveillance, digital image processing, pattern recognition and biologically based vision. He has published over 80 papers in important international journals and conferences such as IEEE T-PAMI, T-IP, T-SMC-B, TCSVT, Pattern Recognition (PR), Computer Vision and Image Understanding (CVIU), ECCV, CVPR, ICIP, ICPR. He received the Best Student Paper Awards from ACPR10 and was a prize winner in the detection task in both PASCAL VOC'10 and PASCAL VOC'11. He received the honorable mention prize for the classification task in PASCAL VOC'11.

Tieniu Tan received the B.Sc. degree in Electronic Engineering from Xi'an Jiaotong University, China, in 1984 and the M.Sc., DIC, and Ph.D. degrees in Electronic Engineering from Imperial College of Science, Technology and Medicine, London, UK, in 1986, 1986, and 1989, respectively. He joined the Computational Vision Group, Department of Computer Science, The University of Reading, England, in October 1989, where he worked as a research fellow, senior research fellow, and lecturer. In January 1998, he returned to China to join the National Laboratory of Pattern Recognition, Institute of Automation of the Chinese Academy of Sciences, Beijing. He is currently a professor and director of the National Laboratory of Pattern Recognition as well as president of the Institute of Automation. He has published widely on image processing, computer vision and pattern recognition. His current research interests include speech and image processing, machine and computer vision, pattern recognition, multimedia, and robotics. Dr. Tan serves as a referee for many major national and international journals and conferences. He is an associate editor of Pattern Recognition and of the IEEE Transactions on Pattern Analysis and Machine Intelligence, and the Asia editor of Image and Vision Computing. He was an elected member of the Executive Committee of the British Machine Vision Association and Society for Pattern Recognition (1996–1997) and is a founding co-chair of the IEEE International Workshop on Visual Surveillance.

Dacheng Tao is currently a professor with the Centre for Quantum Computation and Intelligent Systems and the Faculty of Engineering and Information Technology at the University of Technology, Sydney. He mainly applies statistics and mathematics to data analysis problems in data mining, computer vision, machine learning, multimedia, and video surveillance. He has authored and co-authored more than 100 scientific articles at top venues including IEEE T-PAMI, T-KDE, T-IP, and NIPS, with best paper awards.
