  • Pattern Recognition 41 (2008) 3481 -- 3492

    Contents lists available at ScienceDirect

    Pattern Recognition

    journal homepage: www.elsevier.com/locate/pr

    Recognising online spatial activities using a bioinformatics inspired sequence alignment approach

    Daniel E. Riedel, Svetha Venkatesh, Wanquan Liu

    Department of Computing, Curtin University of Technology, GPO Box U1987 Perth, Western Australia 6845, Australia

    ARTICLE INFO

    Article history: Received 25 May 2007. Received in revised form 16 April 2008. Accepted 17 April 2008.

    Keywords: Activity recognition; Bioinformatics; Sequence alignment; Dynamic time warping

    ABSTRACT

    In this paper we address the problem of recognising embedded activities within continuous spatial sequences obtained from an online video tracking system. Traditionally, continuous data streams such as video tracking data are buffered with a sliding window applied to the buffered data stream for activity detection. We introduce an algorithm based on Smith--Waterman (SW) local alignment from the field of bioinformatics that can locate and accurately quantify embedded activities within a windowed sequence. The modified SW approach utilises dynamic programming with two dimensional spatial data to quantify sequence similarity and is capable of recognising sequences containing gaps and significant amounts of noise. A more efficient SW formulation for online recognition, called Online SW (OSW), is also developed. Through experimentation we show that the OSW algorithm can accurately and robustly recognise manually segmented activity sequences as well as embedded sequences from an online tracking system. To benchmark the classification performance of OSW we compare the approach to dynamic time warping (DTW) and the discrete hidden Markov model (HMM). Results demonstrate that OSW produces higher precision and recall than both DTW and the HMM in an online recognition context. With accurately segmented sequences the SW approach produces results comparable to DTW and superior to the HMM. Finally, we confirm the robust property of the SW approach by evaluating it with sequences containing artificially incorporated noise.

    Crown Copyright 2008 Published by Elsevier Ltd. All rights reserved.

    1. Introduction

    Video surveillance and automatic recognition of human activities is becoming increasingly important in modern society. Such recognition systems can be used to detect suspicious activities in complex environments such as airports and railway stations, to control interfaces in human computer interaction (HCI) applications or to provide a means for monitoring elderly individuals in a caring capacity. In this paper we recognise human activities, represented by spatial sequences, in a simulated smart home environment and further restrict the problem domain to the elderly. In doing so, we assume that the elderly carry out activities in a habitual manner; an assumption consistent with the findings of Monk et al. [1] and Suzuki et al. [2]. Providing smart houses for the elderly is important in maintaining the quality of life and independence of the aging population and reducing the on-going costs of care associated with that maintenance.

    Corresponding author. Tel.: +61 8 92662746. E-mail addresses: [email protected] (D.E. Riedel), [email protected] (S. Venkatesh), [email protected] (W. Liu).

    0031-3203/$30.00 Crown Copyright 2008 Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.04.019

    Activity recognition is the task of identifying an action or series of actions taken in pursuit of an objective. This view of activity recognition differs from existing works where researchers view the detection of walking, running or similar primitive actions as activity recognition. In a smart home setting, objectives could include but not be limited to having breakfast, reading a newspaper, watching television, cooking, having a shower or going to sleep. The series of actions that one would need to recognise in order to determine whether an objective has been accomplished may involve walking to a table, opening a cupboard, sitting on a chair or even lying on a bed. We focus our attention on the spatial component of activities as they can be obtained non-invasively from video tracking systems and for the majority of activities spatial signatures are unique.

    Much work has been done in the area of human activity recognition over the last decade. Existing methodologies can be classified according to the manner by which activities are modelled, producing two distinct methodologies: state-space models and template matching techniques [3]. State-space models attempt to capture the variation in spatial sequences. These approaches include neural networks [4--6], hidden Markov models (HMMs) [7] and extensions to the HMM [8--12]. The HMM and its variants have been used successfully in dealing with uncertainty but suffer from high training complexity, in particular the multi-layer and hierarchical models. Template matching approaches such as [13--17] compare extracted features to pre-stored patterns or templates, but have issues with high runtime complexity, noise intolerance, spatial activity variation, and/or viewpoint specificity.

    The activity sequences used by these activity recognition approaches are typically captured using online video tracking systems. These systems produce continuous and uniform streams of tracking data that contain known activities and non-activity subsequences, corresponding to movement between activities and deviations from known activity paths. In order to isolate individual spatial activity sequences for quantification with models or templates one can either use a sliding window with width w on the buffered stream (Fig. 1(a)--(c)) or segmentation of the data stream. Sliding window approaches compare the fixed length window sequence to pre-stored templates for similarity quantification prior to classification, as shown in Fig. 1(d). Segmentation techniques recognise and extract embedded activities through location of activity boundaries, prior to similarity quantification and classification. Segmentation of continuous data is not a new problem and has been addressed previously in domains such as speech and gait recognition. Unfortunately, in the spatial activity domain segmentation is more difficult as activity boundaries are not so obvious.

    Few methods have been proposed to specifically deal with online activity segmentation. In the methods of Bobick and Ivanov [18] and Ivanov and Bobick [19] a sliding window is applied to an observed sequence to allow for inferencing with low level HMMs. The observed sequence is then labelled accordingly. Like other sliding window approaches, the performance of this technique is sensitive to the specified window size. Another segmentation approach is given by Peursum et al. [20], where observed sequences are segmented and classified using HMMs trained with manually labelled activity sequences. During classification, the probability of a sequence having a particular label is determined and through calculation of the probability at each time instance, the boundaries of the activities can also be found.

    To apply any of the above activity segmentation methodologies in isolation, for segmentation of a continuous data stream, is problematic. This is because the segmentation components of the approaches are intertwined with the recognition capabilities. Therefore, one must still adopt a sliding window approach to identify embedded activities, particularly if one wishes to use unrelated sequence matching techniques for similarity quantification. Given this constraint, two issues relating to sliding windows need to be addressed. The first of these is the window size w. If one assumes that activities are conducted over a similar duration, as are habitual activities with the elderly, and in ideal tracking conditions (such as in controlled indoor environments) then it is appropriate to use a window size corresponding to the length of the longest activity. Realistically, the window size must be set to some value larger than the longest activity length, taking into account a feasible increase in possible activity duration. With an appropriately sized sliding window (normally set to the length of the longest activity or the average length plus 2.5 standard deviations), the second issue relates to locating an activity within that window sequence as shown in Fig. 1(d). Quantifying window sequences poses a problem for classification as the corresponding sequences can contain additional subsequence elements, which are not part of an embedded activity. These superfluous elements can in turn reduce the probability of an activity occurring in relation to a learnt model or increase the aligned distance to a class template. Even if duration is constant across activities, variation in captured sequence length occurs, due to tracking systems failing to consistently track objects. Possible reasons for the failure include occlusions, lighting variation, deficiencies in background subtraction techniques and geometric modelling limitations.

    Techniques like the HMM [21] take the whole window sequence into account when calculating the probability of an observed sequence belonging to a given model. As a result, the superfluous elements decrease the resulting sequence probability, particularly if they have estimated symbol probabilities close to zero. Dynamic time warping (DTW) [22] and similar global sequence alignment approaches such as edit distance with real penalty (ERP) [23] and edit distance on real sequence (EDR) [24] are also susceptible to superfluous sequence elements. This occurs as the techniques attempt to minimise the distance across the entirety of both the known and observed sequences, taking into account the additional distance from the superfluous elements. Similarity algorithms based on the longest common subsequence (LCSS) [25,26] address this global limitation by ignoring superfluous elements in the observed sequences. Unfortunately, the techniques also allow significant deviations in a pattern, which can lead to incorrect classification. For instance, if one measures the LCSS between a known 1D sequence a = [13] and two observed sequences b = [123] and c = [1224443], both b and c have the same LCSS length (of three), indicating that both are equally similar to a. However, by visual inspection, one would say that b is more similar to a.
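The sensitivity of a global method to superfluous window elements can be illustrated with a minimal sketch. It assumes a textbook DTW recurrence with an absolute-difference local cost on 1D data; the function name `dtw_distance` and the toy sequences are hypothetical and are not taken from the paper.

```python
def dtw_distance(template, observed):
    """Textbook DTW on 1D sequences with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(template), len(observed)
    # D[i][j] holds the minimal accumulated cost of aligning the first i
    # template elements with the first j observed elements.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(template[i - 1] - observed[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


template = [1, 2, 3]                 # hypothetical class template
segmented = [1, 2, 3]                # accurately segmented observation
windowed = [7, 8, 1, 2, 3, 9, 9]     # same activity embedded in a window

print(dtw_distance(template, segmented))
print(dtw_distance(template, windowed))
```

Because every window element must be warped onto some template element, the elements surrounding the embedded activity add to the accumulated distance even though the activity itself is matched exactly.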

    In order to recognise known spatial activity patterns and to address the above-mentioned deficiencies, we apply the Smith--Waterman (SW) local alignment approach from bioinformatics and modify the algorithm for two dimensional online spatial activity recognition. In previous work [27], we proposed the use of SW for spatial activity recognition and successfully evaluated the approach using accurately and inaccurately segmented activity sequences. In this paper, we provide OSW (online SW), a more efficient SW formulation for online recognition. Unlike DTW and HMMs, OSW does not require accurate sequence segmentation to correctly recognise embedded sequences, such as those found in sliding windows of online recognition systems. To show the superiority of the OSW approach over DTW and the discrete HMM, we evaluate it in an online context with a sliding window and using a 12 activity data set. We also demonstrate the effectiveness of SW with accurately segmented activity sequences. The robustness claim of SW over DTW and the HMM is further validated by evaluating the approach with accurately segmented spatial sequences containing noise.

    The layout of the paper is as follows. In Sections 2--4 we introduce sequence alignment, discuss the SW algorithm and the modifications for spatial sequence recognition, and then the proposed OSW algorithm. In order to perform a benchmark comparison of SW, we also provide a discussion of DTW (Section 5) and the discrete HMM (Section 6) and factors involved with their application in spatial activity recognition. Following from this, we present our data collection and experimental methodology in Section 7. Results from evaluation with a 12 activity data set are shown in Section 8 and a conclusion is presented in Section 9.

    2. Sequence alignment

    Sequence alignment methods are concerned with finding the best matching alignments of two query sequences according to specified optimisation criteria; typically maximising similarity or minimising distance. In order to derive the optimal alignments, each symbol is compared sequentially with the symbols of the other sequence. During this stage the local similarity or distance is calculated between the opposing symbols and, using techniques such as dynamic programming (DP), optimal subalignments and finally an optimal alignment are produced. With the maximising similarity criterion a positive score is associated with matching symbols, while negative scores are given to non-matching symbols and insertions/deletions (referred to as indels). Indels, denoted by the symbol (-) in an alignment diagram, are used to represent insertions or deletions in either sequence and are incorporated into alignments to fill gaps caused by differences in either of the sequences. For non-matching symbols, the choice whether to include an indel in the alignment or not is dependent on which of the options is better, that is, has a higher total similarity or smaller overall distance.

    Fig. 1. Online spatial activity recognition using a sliding window. (a) Sliding window at t = 1, (b) sliding window at t = 2, (c) sliding window at t = 3, (d) classification process.

    Fig. 2. An example sequence alignment containing gaps of size one and two.

    An alignment assumes that two sequences a and b satisfy the following constraints [28]:

    1. All symbols in the sequences a and b must also be in the alignment. For example, if a = [123] and b = [124] the resulting alignment must be of the form:

       * 1 * 2 * 3 *
         |   |
       * 1 * 2 * 4 *

       where * represents zero or more indels.

    2. All symbols in the alignment must appear in the same order as defined in the sequences a and b, except that zero or more indels or gaps may be present between the symbols, as in the above example.

    3. Symbols of one sequence can be aligned with an indel in the other sequence. For example, if a = [123] and b = [13], the 2 in a can be aligned with an indel as shown below:

       1 2 3
       |   |
       1 - 3

    4. Indels of different sequences cannot be aligned together.

    An example pairwise alignment of one dimensional sequences a = [12223421144] and b = [111341121114] is shown in Fig. 2. The given alignment contains seven matching symbols, three symbol mismatches and three indels.
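The four constraints above can be checked mechanically. The following is a minimal sketch, assuming an alignment is stored as two equal-length rows with `"-"` marking an indel; the helper name `check_alignment` is hypothetical and is not part of the paper.

```python
GAP = "-"  # assumed marker for an indel


def check_alignment(row_a, row_b, a, b):
    """Validate an alignment against constraints 1-4 and count its columns.

    Returns (matches, mismatches, indels); raises ValueError on a violation.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    # Constraints 1 and 2: removing indels must give back a and b, in order.
    if [s for s in row_a if s != GAP] != list(a):
        raise ValueError("first row does not spell out sequence a")
    if [s for s in row_b if s != GAP] != list(b):
        raise ValueError("second row does not spell out sequence b")
    matches = mismatches = indels = 0
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            # Constraint 4: two indels may not share a column.
            raise ValueError("indels of different sequences are aligned")
        if x == GAP or y == GAP:
            indels += 1      # constraint 3: symbol aligned with an indel
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


# The example from constraint 3: a = [123], b = [13].
counts = check_alignment([1, 2, 3], [1, GAP, 3], [1, 2, 3], [1, 3])
print(counts)  # (2, 0, 1)
```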

    Fig. 3. Schematic global and local alignments between two sequences. (a) Global alignment, (b) local alignment.

    Sequence alignment was pioneered in bioinformatics with a DP approach in Ref. [29]. This technique aligned two amino acid sequences across their entirety, maximising the similarity score of the matching individual elements. A DP basis is used as it is possible for optimal alignments to be calculated from incrementally derived subalignments, as each subalignment is itself optimal. Local alignment approaches such as SW [30] differ from global techniques as they find and quantify related regions of similarity within sequences. The local alignments are typically found via an optimisation search process that proceeds from the beginning of the sequences to their ends. An illustrative example of global and local alignment using sequences a and b can be found in Fig. 3.

    Throughout the rest of the paper we use the following notation. A symbol $a_i$ or $b_j$ represents a trajectory tuple $(x, y)$, where $x$ and $y$ denote the position within a two dimensional tracking space. The sequences a and b with lengths $|a|$ and $|b|$ are composed of symbols organised in time sequential order, where $i$ and $j$ ($1 \le i \le |a|$ and $1 \le j \le |b|$) determine the position within the corresponding sequence. Thus, a sequence of symbols $a = [a_1, a_2, \ldots, a_{|a|}]$ can also be represented as a sequence of tuples $a = [(x_1, y_1), (x_2, y_2), \ldots, (x_{|a|}, y_{|a|})]$ and in combination as $a = [(a^x_1, a^y_1), (a^x_2, a^y_2), \ldots, (a^x_{|a|}, a^y_{|a|})]$. To simplify the notation we will predominantly be using the symbol form.

    3. SW local alignment

    SW local alignment was originally developed in Ref. [30] to locate biological sequence patterns within known sequence databases and was first applied to spatial recognition in Ref. [27]. SW is similar to the Needleman--Wunsch global alignment approach [29], except that SW includes an extra zero term in its recurrence relation. Inclusion of the zero allows termination of subsequence alignments that perform poorly, as non-matching subsequences produce negative similarity, which reduces the similarity between the subsequences. When the similarity decreases to less than zero, the zero of the SW relation terminates any further decrease in similarity and allows new optimal subsequences, referred to as local alignments, to be found.
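The role of the extra zero can be seen by running the same recurrence with and without it. The sketch below is illustrative only: it uses 1D symbols with constant scores (match +1, mismatch -1, gap -1), which are assumed values rather than the paper's spatial scoring, and the name `align_score` is hypothetical.

```python
def align_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Score of the best global (Needleman--Wunsch style) or local (SW style) alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    C = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for leading indels; a local one does not.
        for i in range(1, rows):
            C[i][0] = i * gap
        for j in range(1, cols):
            C[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            value = max(C[i - 1][j - 1] + s, C[i - 1][j] + gap, C[i][j - 1] + gap)
            if local:
                value = max(value, 0)  # the extra zero: restart instead of going negative
            C[i][j] = value
            best = max(best, value)
    # Global score is read from the final cell, local score is the matrix maximum.
    return best if local else C[-1][-1]


template = [1, 2, 3]
window = [7, 8, 1, 2, 3, 9, 9]
print(align_score(template, window, local=False))  # -1
print(align_score(template, window, local=True))   # 3
```

The global score is pulled down by the four window elements that must be aligned with indels, whereas the local score reflects only the embedded match.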

    Similarity-based sequence alignment techniques like Needleman--Wunsch global alignment typically have distance counterparts, as the negative penalties assigned to non-matching sequence elements can be replaced by a positive penalty in the distance form. SW is distinct in that it has no distance counterpart [31]. This is because the algorithm uses negative similarity in conjunction with the extra zero to terminate poorly matching subsequence alignments, which cannot be mimicked in a distance based approach.

    Applying SW to two dimensional spatial sequences required several modifications to be made to the original algorithm. Firstly, a Euclidean matching function $d(a_i, b_j)$ and a corresponding matching threshold $\epsilon$ were proposed. The matching threshold allows one to specify the maximum allowable Euclidean distance between points for matching, which adds flexibility to the matching process of the original specification. Formally, if the Euclidean distance between the symbols $a_i$ and $b_j$ is less than the matching threshold $\epsilon$ then a match occurs and a positive score $\alpha$ is attributed, that is $s(a_i, b_j) = \alpha$. If the Euclidean distance is equal to or larger than the matching threshold $\epsilon$, reflecting a non-matching state, a real penalty is assigned. The real penalty $-d(a_i, b_j)$ is based on the Euclidean distance (1) and was proposed for use with the stepwise function $s(a_i, b_j)$ in the modified SW algorithm.

    $$d(a_i, b_j) = \sqrt{(a^x_i - b^x_j)^2 + (a^y_i - b^y_j)^2} \qquad (1)$$

    The use of a real penalty function, rather than a constant, was shown to improve the discrimination capability of the SW algorithm, as demonstrated in Ref. [27].

    Some consideration is required when choosing an appropriate matching threshold $\epsilon$. If the specified $\epsilon$ of an x, y coordinate space is too large, SW over generalises and matches dissimilar sequences as depicted in Fig. 4(a), whilst if $\epsilon$ is too small, then matching becomes highly specific (Fig. 4(b)), preventing recognition of similar sequences and thus reducing the recall statistics.
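The effect of the matching threshold on a single pair of points can be sketched as follows. The parameter names `alpha` (match score) and `epsilon` (matching threshold) and the sample points are assumptions made for illustration; the stepwise rule itself follows the description above (a positive score below the threshold, minus the Euclidean distance otherwise).

```python
import math


def euclidean(p, q):
    """Euclidean distance between two (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def score(p, q, alpha=1.0, epsilon=1.0):
    """Stepwise score: alpha for a match, a real (distance based) penalty otherwise."""
    d = euclidean(p, q)
    return alpha if d < epsilon else -d


p, q = (1.0, 1.0), (1.5, 1.5)      # about 0.707 apart
print(score(p, q, epsilon=1.0))    # treated as a match
print(score(p, q, epsilon=0.5))    # treated as a mismatch, penalised by the distance
```

With the larger threshold the two points count as the same location, while the smaller threshold turns the same pair into a mismatch whose penalty grows with the separation.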

    For gap scores a linear gap model with a gap penalty $\delta$ associated with each indel is adopted in the modified SW algorithm. Gap scores are typically a function of the gap length $l$, denoted by $g(l)$. A linear gap model is where $g(l) = \delta l$, that is, each indel is equally weighted.

    Resulting alignments are dependent on the values of the gap penalty $\delta$, match cost $\alpha$ and the matching threshold $\epsilon$. If $\delta$ is larger in relation to the average mismatch penalty, which is dependent on $\epsilon$, mismatches are favoured over gaps, producing shorter, more compact alignments. The opposite occurs when $\delta$ is smaller than the average mismatch penalty. In relation to the match cost $\alpha$, one does not want $\alpha$ to be much larger than $\delta$ or the mismatch penalty, otherwise SW ignores mismatches and gaps and therefore behaves similarly to LCSS.

    To calculate the similarity of two spatial sequences a and b using the proposed SW based approach one simply applies (3)--(5) to the DP matrix C (2) for $i = 0, 1, \ldots, |a|$ and $j = 0, 1, \ldots, |b|$ and finds the maximum value in C.

    $$C = \begin{bmatrix} C(0, 0) & \cdots & C(0, |b|) \\ \vdots & \ddots & \vdots \\ C(|a|, 0) & \cdots & C(|a|, |b|) \end{bmatrix} \qquad (2)$$

    At each $C(i, j)$ where $i, j \neq 0$, four choices (match or mismatch, gap in a, gap in b or start a new subsequence) are evaluated, with the choice corresponding to the maximum similarity value being selected for each $C(i, j)$. The match or mismatch score at each $C(i, j)$ is derived using $s(a_i, b_j)$ as previously described, while the gap scores for the sequences are derived using the linear gap model with gap penalty $\delta$. If a negative similarity score results from $C(i-1, j-1) + s(a_i, b_j)$, $C(i-1, j) - \delta$ and $C(i, j-1) - \delta$, due to poor subsequence correspondence, then the fourth option of starting a new subsequence, represented by zero, is selected as the maximum.

    Fig. 4. The effect of the matching threshold on spatial activity recognition. (a) Threshold too large, activity B is recognised as A, (b) threshold too small, alternate activity A sequence not recognised.

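    $$C(i, 0) = 0, \quad 0 \le i \le |a| \qquad (3)$$

    $$C(0, j) = 0, \quad 0 \le j \le |b| \qquad (4)$$

    $$C(i, j) = \max\{C(i-1, j-1) + s(a_i, b_j),\; C(i-1, j) - \delta,\; C(i, j-1) - \delta,\; 0\} \qquad (5)$$

    where $\delta$ is the gap penalty and, with match score $\alpha$ and matching threshold $\epsilon$,

    $$s(a_i, b_j) = \begin{cases} \alpha, & d(a_i, b_j) < \epsilon \\ -d(a_i, b_j), & d(a_i, b_j) \ge \epsilon \end{cases}$$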


    Fig. 5. Example sequences a and b.

    Table 1. SW C matrix using example sequences a and b

                         (1.0,1.0)  (2.0,2.0)  (3.0,3.0)
                  0.00   0.00       0.00       0.00
    (0.0,0.0)     0.00   0.00       0.00       0.00
    (1.0,1.0)     0.00   1.00       0.00       0.00
    (2.0,2.0)     0.00   0.00       2.00       1.00
    (4.0,4.0)     0.00   0.00       1.00       0.59
    (5.0,5.0)     0.00   0.00       0.00       0.00
    (6.0,6.0)     0.00   0.00       0.00       0.00
    (1.0,1.0)     0.00   1.00       0.00       0.00
    (2.0,2.0)     0.00   0.00       2.00       1.00
    (3.0,3.0)     0.00   0.00       1.00       3.00

    Fig. 6. An optimal SW alignment using sequences a and b.
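The matrix in Table 1 and the alignment it implies can be reproduced with a short sketch. It assumes a match score, matching threshold and gap penalty of 1.0 each (named `alpha`, `epsilon` and `delta` here); these values are an assumption, chosen because they regenerate the tabulated entries. The names `observed` and `template` for the row and column sequences, and the helper names, are also hypothetical.

```python
import math


def score(p, q, alpha=1.0, epsilon=1.0):
    """alpha for a match, minus the Euclidean distance otherwise."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return alpha if d < epsilon else -d


def sw_matrix(observed, template, alpha=1.0, epsilon=1.0, delta=1.0):
    """DP matrix with rows indexed by observed elements, columns by template elements."""
    C = [[0.0] * (len(template) + 1) for _ in range(len(observed) + 1)]
    for i in range(1, len(observed) + 1):
        for j in range(1, len(template) + 1):
            C[i][j] = max(
                C[i - 1][j - 1] + score(observed[i - 1], template[j - 1], alpha, epsilon),
                C[i - 1][j] - delta,  # observed element aligned with an indel
                C[i][j - 1] - delta,  # template element aligned with an indel
                0.0,                  # start a new local alignment
            )
    return C


def traceback(C, observed, template, alpha=1.0, epsilon=1.0, delta=1.0):
    """Walk back from the maximum cell until a zero is reached."""
    best, i, j = max((v, i, j) for i, row in enumerate(C) for j, v in enumerate(row))
    pairs = []
    while C[i][j] > 0.0:
        diag = C[i - 1][j - 1] + score(observed[i - 1], template[j - 1], alpha, epsilon)
        if math.isclose(C[i][j], diag):
            pairs.append((i - 1, j - 1))  # 0-based (observed, template) indices
            i, j = i - 1, j - 1
        elif math.isclose(C[i][j], C[i - 1][j] - delta):
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


observed = [(0, 0), (1, 1), (2, 2), (4, 4), (5, 5), (6, 6), (1, 1), (2, 2), (3, 3)]
template = [(1, 1), (2, 2), (3, 3)]
C = sw_matrix(observed, template)
for row in C:
    print("  ".join(f"{v:.2f}" for v in row))
best, pairs = traceback(C, observed, template)
print(best, pairs)
```

Under these assumptions the maximum of 3.00 sits in the last row, and the traceback recovers the final three observed points as the locally aligned subsequence.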

    4. OSW local alignment

    The naive approach for determining SW similarity in an online recognition context is to calculate the full DP matrix for each of the sliding windows of width $w$, resulting in an $O((w + 1) \times (|C_i| + 1))$ complexity for each window, where $|C_i|$ is the length of the class template sequence $i$. The window size $w$ is typically set to the length of the longest training sequence. Instead, a more computationally efficient method is proposed. In this formulation, the online system retains a DP matrix of size $(w + 1) \times (|C_i| + 1)$ in memory for each of the $i = 1, 2, \ldots, n$ class templates. Each of the $n$ DP matrices is initialised by calculating the SW score using (3)--(5) with the first buffered window sequence at time $t_0$ to $t_{w-1}$ and the corresponding class template $C_i$, according to Fig. 7. The $n$ SW scores generated by the initialisation process are stored in memory for comparison to subsequent SW scores of sliding windows, where $t > 0$, and for comparison to the predetermined classification threshold of the specific class $C_i$. The row of the maximum SW score is also retained for each DP matrix to ensure the current maximum remains in the matrix as the window slides during incremental updating. The rationale for this is explained later.

    To quantify further window sequences, we take advantage of the fact that DP optimal alignments are derived incrementally from subalignments that are also optimal. In order to illustrate the incremental derivation of optimal alignments we use the sequences a = [(1, 1), (2, 2)] and b = [(1, 1), (2, 2), (3, 3)], with the match score, gap penalty and matching threshold all set to 1. The first row of the resulting DP matrix is initialised using Eq. (3), as shown in Table 2(a). The second row of the DP matrix (Table 2(b)), which corresponds to the optimal alignment between a and $b_1$ = [(1, 1)], is then calculated using Eq. (5), with the first column of the row initialised to zero according to Eq. (4). The third row of the DP matrix, seen in Table 2(c), is calculated by comparing a with $b_2$ = (2, 2) according to Eq. (5). This extends the optimal alignment to between a and $b_{1:2}$ = [(1, 1), (2, 2)], as the optimal alignment between a and $b_{1:2}$ comprises the optimal alignment between a and $b_1$ = [(1, 1)], obtained through calculation of the second row of values. To calculate the final row in the DP matrix (Table 2(d)) we apply a similar methodology.
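The row-by-row derivation described above can be sketched as a function that produces one new DP row from the previous row alone. The helper name `next_row` and the parameter names `alpha`, `epsilon` and `delta` (match score, matching threshold, gap penalty, all 1.0 here) are assumptions for illustration.

```python
import math


def score(p, q, alpha=1.0, epsilon=1.0):
    """alpha for a match, minus the Euclidean distance otherwise."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return alpha if d < epsilon else -d


def next_row(prev_row, element, template, alpha=1.0, epsilon=1.0, delta=1.0):
    """Compute the DP row for one new element using only the previous row."""
    row = [0.0]  # the first column is always zero
    for k, t in enumerate(template, start=1):
        row.append(max(
            prev_row[k - 1] + score(element, t, alpha, epsilon),  # match or mismatch
            prev_row[k] - delta,                                   # gap
            row[k - 1] - delta,                                    # gap
            0.0,                                                   # new subsequence
        ))
    return row


a = [(1, 1), (2, 2)]            # sequence along the horizontal axis of Table 2
b = [(1, 1), (2, 2), (3, 3)]    # elements arriving one at a time

rows = [[0.0] * (len(a) + 1)]   # Step 1: the initial row of zeros
for element in b:               # Steps 2-4: one new row per element
    rows.append(next_row(rows[-1], element, a))
for row in rows:
    print(row)
```

Each call touches only `len(a) + 1` cells, which is what allows an online system to extend the matrix as new tracking points arrive.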

    With incremental derivation we can add new rows to the end of each $C_i$ DP matrix and along the window sequence axis (the vertical axis in the example), and calculate only the values of the row according to the new online element, the class template $C_i$ and Eq. (5). This requires $O(|C_i| + 1)$ time for each window, in comparison to the $O((w + 1) \times (|C_i| + 1))$ time of the naive approach. To prevent the DP matrices from growing in size as new rows are added and to allow a traceback procedure to be carried out on the DP matrix (to find segmentation beginning and end points with a sufficiently large window size $w$), we remove the first rows of the DP matrices prior to new rows being added. This also means that the variable that stores the row of the current max value (Row Max) for each DP matrix must consequently be decremented. In the event that any of the variables designate rows that have been deleted from the DP matrix, a search of the matrix is made to find the new maximum score and Row Max is consequently updated.

    Optimal subsequences, corresponding to an embedded activity within a sliding window (assuming $w$ is large enough), can exist anywhere within the DP matrix, as the beginning and ends of the DP matrices are constantly changing. Therefore, to address this issue we store the current maximum similarity value and its row (Row Max) for each of the $i$ matrices. The row position corresponds to the end of the optimal local alignment and is used as the origin for a traceback procedure, while the maximum similarity values are used in thresholding and classification. For clarity we show the online DP matrix update procedure with 1D sequences in Fig. 8(a) and (b), with a gap penalty, mismatch penalty and match cost of one. In Fig. 8(a) the windowed sequence [9876] at $t_0$ to $t_3$ is compared to the class templates $C_1$ and $C_2$ using Eqs. (3)--(5), thereby initialising the DP matrices. As the sliding window is moved to the next window at $t_1$ to $t_4$ (Fig. 8(b)), the first underlined rows of the DP matrices in Fig. 8(a) are deleted and a new row is added to the end of the matrices. For each of the new rows at $j = 4$, we apply Eq. (5) using the previous row's values at $j = 3$, the class template $C_i$ and the new element in the new sliding window, that is five in the given example, in order to derive the new SW local alignment. The row-by-row update procedure is repeated for further sliding window sequences. If an observed spatial sequence does not fit within the specified window size $w$, possibly due to an activity taking longer than expected (e.g. watching TV), the DP matrices retain the similarity scores of the previous matches and thus a similarity score can still be determined. Unfortunately, full segmentation can no longer occur as the beginning point of the observed sequence would have been deleted to allow calculation of the new buffer elements.
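The update procedure for a single class template can be sketched as below. This is a simplified 1D illustration with a gap penalty, mismatch penalty and match cost of one, as in the description of Fig. 8; the class name `OnlineSW`, its attributes and the template [7, 6, 5] are hypothetical and do not reproduce the templates shown in the figure.

```python
from collections import deque


class OnlineSW:
    """Sliding DP matrix for one class template (1D symbols, constant scores)."""

    def __init__(self, template, window, match=1, mismatch=1, gap=1):
        self.template = list(template)
        self.window = window
        self.match, self.mismatch, self.gap = match, mismatch, gap
        # Retained rows: at most window + 1, starting with the zero row.
        self.rows = deque([[0] * (len(self.template) + 1)])
        self.best = 0       # current maximum similarity in the retained rows
        self.best_row = 0   # index of the row holding it ("Row Max")

    def _next_row(self, prev, element):
        row = [0]
        for k, t in enumerate(self.template, start=1):
            s = self.match if element == t else -self.mismatch
            row.append(max(prev[k - 1] + s, prev[k] - self.gap, row[k - 1] - self.gap, 0))
        return row

    def update(self, element):
        """Add one stream element and return the current maximum similarity."""
        new_row = self._next_row(self.rows[-1], element)
        if len(self.rows) == self.window + 1:
            self.rows.popleft()   # drop the oldest row so the matrix keeps its size
            self.best_row -= 1    # Row Max shifts up by one
        self.rows.append(new_row)
        if self.best_row < 0:
            # The row holding the maximum was deleted: search the retained rows.
            self.best, self.best_row = max((max(r), k) for k, r in enumerate(self.rows))
        if max(new_row) >= self.best:
            self.best, self.best_row = max(new_row), len(self.rows) - 1
        return self.best


osw = OnlineSW(template=[7, 6, 5], window=4)
scores = [osw.update(x) for x in [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]]
print(scores)  # [0, 0, 1, 2, 3, 3, 3, 3, 3, 2]
```

In this sketch the score peaks when the last template symbol arrives and only falls once the row holding that maximum has slid out of the retained matrix, at which point the remaining rows are searched for a new maximum.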

    In specific applications, OSW can be modified to decrease space requirements and improve segmentation performance. For example, if segmentation and recovery of an optimal alignment is not necessary, the space requirements of the OSW DP matrices can be significantly reduced by utilising only the last row of the matrices. This is possible as new elements only require the last row for the current SW calculation. Additionally, the Row Max variables are no longer required.
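    A single-row variant can be sketched as an in-place update of one vector per class template. As before, this assumes the standard Smith--Waterman recurrence for Eq. (5) and a Euclidean match threshold; sw_update_inplace and its parameters are hypothetical names. With a template of [(1,1), (2,2)] and unit gap, mismatch and match values, the successive rows agree with those listed in Table 2.

```python
import math


def sw_update_inplace(row, template, x, eps=1.0, match=1.0, mismatch=1.0, gap=1.0):
    """Overwrite `row` with the SW values for observation x; return its maximum."""
    diag = row[0]  # value of the previous row at column j - 1
    for j, c in enumerate(template, start=1):
        up = row[j]  # previous-row value, read before it is overwritten
        s = match if math.dist(x, c) < eps else -mismatch
        # row[j - 1] already holds the current-row value to the left.
        row[j] = max(0.0, diag + s, up - gap, row[j - 1] - gap)
        diag = up
    return max(row)


template = [(1, 1), (2, 2)]
row = [0.0] * (len(template) + 1)
history = []
for point in [(1, 1), (2, 2), (3, 3)]:
    sw_update_inplace(row, template, point)
    history.append(list(row))
print(history)
```

    Only len(template) + 1 values are stored per class, so memory no longer depends on the window size.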

    Some spatial activities like watching TV have inconsistent sequence lengths, thus making the selection of a suitable window size w difficult. If w is sufficiently large to deal with the highly variable sequence lengths, the efficiency of the OSW algorithm deteriorates rapidly. To maintain computational efficiency we propose using a window size of w = 1 and only a single row of previously calculated SW values with which to perform the current SW calculation.


    Fig. 7. Initialisation of the DP matrices using OSW.

    Table 2
    Incremental derivation of DP matrices

    (a) Step 1
                   (1,1)  (2,2)
              0      0      0

    (b) Step 2
                   (1,1)  (2,2)
              0      0      0
    (1,1)     0      1      0

    (c) Step 3
                   (1,1)  (2,2)
              0      0      0
    (1,1)     0      1      0
    (2,2)     0      0      2

    (d) Step 4
                   (1,1)  (2,2)
              0      0      0
    (1,1)     0      1      0
    (2,2)     0      0      2
    (3,3)     0      0      1

    Additionally, a traceback threshold is required for triggering the traceback procedure. To carry out the traceback, another data stream buffer Bufpast is necessary to store past window elements. As new buffer elements are retrieved, a new row of the DP vectors is calculated using the previous row and the designated template Ci. The maximum value in that row, SWmax, is then determined and compared to the traceback threshold. If SWmax


    Fig. 8. OSW recognition with a window size of w = 4: (a) DP initialisation with window sequence t0 to t3 and (b) online updating with window sequence t1 to t4.

    Fig. 9. Elastic matching using DTW.

    restricts the allowable steps in the warping path to adjacent cells in the DP matrix. It is stated formally as: given $w_{k+1} = (i, j)$, then $w_k = (i', j')$, where $i - i' \le 1$ and $j - j' \le 1$.

    To efficiently calculate the DTW distance of two time series sequences a and b, a DP approach is utilised. A DP matrix C of size $|a| \times |b|$ is initialised according to Eq. (6). To calculate the DTW distance we apply Eq. (7) for values of $i = 1, 2, \ldots, |a| - 1$ and $j = 1, 2, \ldots, |b| - 1$. The resulting DTW distance is obtained from the DP matrix at $C(|a| - 1, |b| - 1)$.

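    \[
    \begin{aligned}
    C(0, 0) &= d(a_1, b_1) \\
    C(i, 0) &= C(i - 1, 0) + d(a_i, b_1), \quad i = 1, 2, \ldots, |a| - 1 \\
    C(0, j) &= C(0, j - 1) + d(a_1, b_j), \quad j = 1, 2, \ldots, |b| - 1
    \end{aligned}
    \tag{6}
    \]

    \[
    C(i, j) = \min\{\, C(i - 1, j - 1),\ C(i - 1, j),\ C(i, j - 1) \,\} + d(a_i, b_j)
    \tag{7}
    \]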

    A warping path or alignment can be recovered from C using a traceback procedure, originating at $C(|a| - 1, |b| - 1)$ and terminating at $C(0, 0)$, or by using a pointers matrix and retaining pointers to the local minima selected at each i and j during the DP calculation.
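    The DP calculation of Eqs. (6) and (7) and the traceback can be sketched as follows. This is an illustrative Python version under stated assumptions: both sequences and the matrix are zero-indexed, d is taken to be the Euclidean distance between 2D points, and the function name dtw is hypothetical. No slope or global constraints are applied.

```python
import math


def dtw(a, b, d=math.dist):
    """Return the DTW distance between a and b and one optimal warping path."""
    n, m = len(a), len(b)
    C = [[0.0] * m for _ in range(n)]
    # Initialisation of the first cell, first column and first row (Eq. (6)).
    C[0][0] = d(a[0], b[0])
    for i in range(1, n):
        C[i][0] = C[i - 1][0] + d(a[i], b[0])
    for j in range(1, m):
        C[0][j] = C[0][j - 1] + d(a[0], b[j])
    # Recurrence over the remaining cells (Eq. (7)).
    for i in range(1, n):
        for j in range(1, m):
            C[i][j] = min(C[i - 1][j - 1], C[i - 1][j], C[i][j - 1]) + d(a[i], b[j])
    # Traceback from the last cell to C(0, 0), following the smallest neighbour.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((C[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((C[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((C[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return C[n - 1][m - 1], path


a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 0), (1, 0), (1, 0), (2, 0)]
distance, path = dtw(a, b)
print(distance, path)
```

    In this example b repeats one point of a, so the warping path contains one vertical-only step and the accumulated distance is zero.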

    The DTW algorithm presented previously has no local continuity constraints (it does not restrict the slope of the warping), unlike those specified in Ref. [22]. Normally, choosing an optimal local constraint is application and domain specific. In the case of spatial activity recognition, preliminary investigations have shown that no restriction on the slope of the warping is optimal in relation to recognition performance.

    6. Hidden Markov model

    The HMM is a stochastic state transition model, capable of dealing with time sequential data [21]. It was first applied in the activity recognition domain in Ref. [7], where mesh features were extracted from time sequential images of tennis strokes and used in training and evaluation of the discrete model. Since then the HMM has been utilised extensively in activity recognition research, particularly through the multi-layer and hierarchical forms [8--12]. Adoption of the approach has been motivated by the model's ability to deal with noisy observations and its high discrimination properties.

    In this study, we utilise the discrete HMM to recognise spatial activity sequences. A discrete HMM is characterised by a number of hidden states N, distinct observation symbols per state M, state transition probability matrix A ($A = \{a_{ij}\}$), observation symbol probability distribution matrix B ($B = \{b_j(k)\}$) and the initial state distribution vector $\pi$. A derived HMM is typically represented by the tri-tuple of parameters $\{\pi, A, B\}$, which represent the following:

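    \[
    \begin{aligned}
    \pi_i &= \Pr(q_1 = S_i), && 1 \le i \le N \\
    a_{ij} &= \Pr(q_{t+1} = S_j \mid q_t = S_i), && 1 \le i, j \le N \\
    b_j(k) &= \Pr(v_k \text{ at } t \mid q_t = S_j), && 1 \le j \le N,\ 1 \le k \le M
    \end{aligned}
    \]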


    where $q_t$ is the state at time t, S is the set of individual states such that $S = \{S_1, S_2, \ldots, S_N\}$ and V denotes the individual symbols $V = \{v_1, v_2, \ldots, v_M\}$. The HMM model parameters $\pi$, A and B are estimated using the Baum--Welch (Forward--Backward) algorithm; however, scaling [21] is used in both the model estimation and inferencing, due to the use of lengthy observation sequences.

    The discrete HMM can produce models capable of accurately representing a class of sequences, but some considerations must be taken into account when dealing with spatial sequences, due to their inherent noise and long length. The first is the issue of choosing an appropriate number of hidden states N for each model. The value of N affects both the runtime of the Baum--Welch algorithm and the generalisation capability of the model. If N is too small, the training and inferencing runtimes are smaller but the model's ability to adequately represent the data also decreases. In contrast, if N is large, training and inferencing complexity is increased along with generalisation capability; however, the model may overfit the data if N is too large, reducing the recall statistics. Also, if the spatial sequence length is long (>1000), the HMM fails to capture the nature of the training data, resulting in lower classification accuracy.

    Another significant issue with the discrete HMM is the amount of data required for training. HMMs require a large number of training sequences to produce good generative models. If only limited spatial sequences are available for training, models exhibit lower recognition accuracy when evaluated with observed sequences. In addition to the amount of training data required, the discrete HMM also encounters issues when calculating $\Pr(O \mid \lambda)$, where O is the observed spatial sequence and $\lambda = \{\pi, A, B\}$ is the derived model. If an observed sequence contains a series of symbols not present in the training sequences, arising from noise or superfluous sequence elements (that is, $b_j(k) = 0$), the derived probability of the overall sequence, calculated using a scaled forward procedure, will tend to zero, in turn causing a negative infinity result when calculating the log likelihood. This characteristic can be minimised by initialising the symbol probability matrix B to small positive values, but it is possible that long runs of low probability symbols not observed in the training data will dramatically reduce the total sequence probability.
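    The zero-probability problem can be illustrated with a small scaled forward procedure in the style of Ref. [21]. The sketch below is an assumption-laden toy: the two-state model, its parameter values and the function names are hypothetical, and floor_emissions applies a floor to an already estimated B, which is a post-hoc stand-in for (not the same as) initialising B to small positive values before training.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward procedure; returns log Pr(O | model) or -inf."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                     for j in range(N)]
        c = sum(alpha)  # scaling coefficient at time t
        if c == 0.0:
            return -math.inf  # an unseen symbol zeroes every state
        alpha = [v / c for v in alpha]
        loglik += math.log(c)
    return loglik


def floor_emissions(B, floor=1e-6):
    """Replace zero emission probabilities by a small value and renormalise."""
    floored = []
    for row in B:
        r = [max(p, floor) for p in row]
        total = sum(r)
        floored.append([p / total for p in r])
    return floored


# Hypothetical model: symbol 2 never occurred in training.
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(log_likelihood([0, 1], pi, A, B))
print(log_likelihood([0, 2, 1], pi, A, B))
print(log_likelihood([0, 2, 1], pi, A, floor_emissions(B)))
```

    With the floored matrix the sequence containing the unseen symbol receives a finite but strongly reduced log likelihood, which is the residual effect noted above for long runs of low probability symbols.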

    7. Data collection and methodology

    For this evaluation, spatial activity sequences are collected in a simulated smart house environment (Fig. 10(a)) using the multiple camera, QOS-based tracking system of Nguyen et al. [11]. The data set comprises 12 single person activities of 90 s in length with 20 sequences per activity. Activities include such things as making toast, having breakfast, washing the dishes and watching television. Video sequences are captured at 10 frames per second and then processed to obtain two dimensional trajectories for each frame. To obtain symbolised one dimensional sequences for the discrete HMM, the smart house environment is discretised into one square metre grids (Fig. 10(b)), with each activity sequence, composed of x, y trajectories, being mapped to a sequence of unique integers u, where $u \in U$ and $U = \{1, 2, 3, \ldots, 72\}$.
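    The symbol mapping can be sketched as a simple grid lookup. The layout used here (9 columns by 8 rows, origin at one corner, row-major numbering) is purely an assumption chosen so that 72 cells result; the actual grid is the one defined in Fig. 10(b). The names to_symbol and symbolise are hypothetical.

```python
def to_symbol(x, y, cols=9, rows=8, cell=1.0):
    """Map a position in metres to a grid symbol in 1..cols*rows."""
    # Clamp to the grid so that tracker noise at the walls stays in range.
    col = min(max(int(x // cell), 0), cols - 1)
    row = min(max(int(y // cell), 0), rows - 1)
    return row * cols + col + 1


def symbolise(trajectory, **grid):
    """Convert a list of (x, y) points into a 1D symbol sequence."""
    return [to_symbol(x, y, **grid) for x, y in trajectory]


print(symbolise([(0.2, 0.7), (1.5, 0.0), (0.0, 1.0), (8.9, 7.9)]))
```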

    In the following experiments we divide the 12 activity data set into training and testing sets. In our case the training set sequences are used to empirically determine optimal algorithm parameters and for use as class templates in testing. To quantify the recognition performance of the algorithms with the testing sets, a cross-validation methodology with threshold-based nearest neighbour (NN) classification is adopted. In this approach each experiment utilises 30 randomly generated training sets for evaluation, from which we take the mean of the 30 test results as the conclusive result. Thresholds are derived and used per activity to determine whether an activity occurs. This is necessary as it is unrealistic firstly to learn all activities that may occur in a given environment and secondly to assume that learnt activities are always occurring.

    8. Experimental results

    In this section we give our experimental results that demonstrate the ability of SW to accurately recognise windowed online spatial sequences, as well as its high accuracy and robust characteristics with accurately segmented sequences. The algorithms used in the experiments are developed in Matlab and C according to the specifications in Sections 3--7. For the DTW benchmark comparison we use the symmetric algorithm defined in Ref. [22] with no local or global constraints. Optimal SW and HMM algorithm parameters are empirically derived from the accurately segmented training data as shown in Section 8.1. Using the optimally derived parameters we further evaluate the proposed algorithms in relation to accuracy and robustness and contrast these results to the existing DTW and HMM spatial activity recognition approaches.

    8.1. Parameter selection

    With accurately segmented activity sequences we find the optimal SW sequence alignment parameters, in relation to accuracy, with the exception of the spatial match threshold, which is set according to the required recognition task. For instance, automatic activity recognition in a caring capacity requires observed patterns to be strictly matched to those in the training set, thus requiring small values of the threshold. On the other hand, in a retrieval type role where one wishes to recognise all similar activities, the threshold should be set to a larger value. Throughout the experimentation, we use a threshold of 1.0 m to coincide with the size of the symbolic states used in HMM mapping.

    Using a fixed match threshold, we first address the issue of selecting an appropriate value of the gap penalty for the proposed SW algorithm. As specified in Section 3, this is the linear gap penalty associated with insertion or deletion of one or more trajectories in either sequence. To minimise the effect of the match score during the evaluation we set it to a constant and use gap penalty values between 1 and 10 with cross-validation and NN classification in order to find an optimum value. The results with the given data set are shown in Fig. 11(a). From Fig. 11(a), a gap penalty of 2.0 provided the maximum classification accuracy, with larger values producing only marginally worse performance. Using a gap penalty of 2.0 we then determined an optimum value for the match score with values between 1 and 10. Results are shown in Fig. 11(b), with a match score of 5.0 producing maximum classification accuracy. Therefore, the following experiments used SW parameters of 1.0 m for the match threshold, 2.0 for the gap penalty and 5.0 for the match score. It is interesting to note that with the current data set recognition performance does not appear to be sensitive to the values of the different parameters (excluding the match threshold), as evidenced by the small changes in recognition performance with changing parameter values. The uniform recognition behaviour with different parameters has also been noted in evaluation with other data sets [27]. Given this trend, it is possible to choose a set of SW parameters without having to empirically determine the optimal parameter set, and still expect near optimal recognition performance.

    For the evaluation, the discrete HMMs used a fixed number of symbols M = 72 and an empirically determined number of hidden states N. To ensure adequate training of the HMMs, the number of iterations of the Baum--Welch estimation algorithm was limited by a threshold (


    Fig. 10. Layout and mapping of the mock smart house environment: (a) smart house layout and (b) spatial grid for 1D sequence mapping.

    Fig. 11. SW parameter optimisation: (a) gap penalty evaluation and (b) match score evaluation.

    8.2. Online activity recognition using SW, DTW and the HMM

    As previously mentioned, superfluous elements caused by the transition of one activity to the next can interfere with sequence quantification methods that utilise whole window sequences for evaluation. To address this issue we propose the OSW approach, which is able to quantify embedded activities within the window sequence, therefore ignoring superfluous elements. Furthermore, we evaluate the effectiveness of SW with a synthetically generated online data stream and contrast the results to DTW and the discrete HMM.

    Fig. 12. HMM number of hidden states N versus classification accuracy.

    In the following experiment ten training sequences are randomly selected per activity to use as class templates and additionally for determining class thresholds for threshold-based NN classification. The remaining 10 testing sequences per activity are used to generate a synthetic online sequence in conjunction with superfluous transition sequences between the activities. Class thresholds are derived by measuring the average intraclass distance between training sequences and/or models ± two standard deviations (minus for SW and HMM, and plus for DTW). With the intraclass thresholds and the derived optimum parameters in Section 8.1, a threshold-based NN classification experiment is carried out with the online data stream and using different window sizes. Initially, we set the window size w according to the length of the longest activity in the set and evaluate the different techniques. We then increase w by 5% and 10% of the length of the longest sequence to observe the effect of larger window sizes on the approaches. A true positive (TP) occurs when values of the correct activity exceed their specified threshold within the ground truthed online sequence, with no class template from other activities exceeding its corresponding threshold. A false positive (FP) occurs if any incorrect activity exceeds its threshold within the ground truthed online sequence. The precision and recall results of the online evaluation are shown in Table 3.
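    The threshold derivation and the accept/reject decision can be sketched as below. The exact decision rule and the form of the standard deviation are not spelled out in the text, so this sketch assumes the sample standard deviation and a rule that accepts the best-scoring activity among those that pass their own threshold. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def class_threshold(intraclass_values, higher_is_better=True, k=2.0):
    """Mean minus k std for similarities (SW, HMM), mean plus k std for distances (DTW)."""
    m, s = mean(intraclass_values), stdev(intraclass_values)
    return m - k * s if higher_is_better else m + k * s


def classify(scores, thresholds, higher_is_better=True):
    """Return the best activity that passes its threshold, or None.

    scores maps each activity to its best value over that activity's templates.
    """
    if higher_is_better:
        accepted = {a: v for a, v in scores.items() if v >= thresholds[a]}
        return max(accepted, key=accepted.get) if accepted else None
    accepted = {a: v for a, v in scores.items() if v <= thresholds[a]}
    return min(accepted, key=accepted.get) if accepted else None


sw_threshold = class_threshold([10, 12, 14])                    # similarity
dtw_threshold = class_threshold([10, 12, 14], higher_is_better=False)  # distance
print(sw_threshold, dtw_threshold)
print(classify({"toast": 9.0, "tv": 3.0}, {"toast": 8.0, "tv": 5.0}))
```

    Returning None corresponds to the case where no learnt activity is judged to be occurring in the current window.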

    With the window size equal to the length of the longest sequence in the class template set (w = 100%), SW was still able to achieve high precision and recall with the given synthetic online sequence, in contrast to the HMM and DTW. Furthermore, these high values were consistent with the larger evaluated window sizes of w = 105--110%, as seen in Table 3. The observed high precision and recall of the OSW algorithm across the different window sizes can be explained by the SW technique optimally locating embedded patterns (local alignments) within the online window sequence and terminating poorly matching local alignments arising from significant gaps or mismatches. These poorly matching subsequences generate regions of local negative similarity which are terminated by the zero condition in the relation specified in Eq. (5).

    The results in Table 3 demonstrate that DTW is sensitive to extraneous elements from window sequences and, furthermore, is sensitive to the specified window size. This can be seen in the reduction in recall with the increase in online window size. As DTW is global and accounts for the additional, non-activity elements of the window sequence, the calculated distance typically increases with increasing window size, preventing recognition with the derived thresholds. Overall, DTW does manage to maintain its precision across the evaluated window sizes and does achieve precision and recall values higher than the HMM.

    Table 3
    Threshold-based NN classification with online recognition

             w = 100%                     w = 105%                     w = 110%
             Precision (%)  Recall (%)    Precision (%)  Recall (%)    Precision (%)  Recall (%)
    HMM      70.00          5.83          0.00           0.00          0.00           0.00
    DTW      58.62          42.5          59.38          31.67         58.33          23.33
    SW       83.05          81.67         83.90          82.50         82.75          83.33

    Table 4
    Threshold-based NN classification with accurate activity segmentation

             Precision (%)  Recall (%)
    HMM      83.9           75.6
    DTW      97.3           96.9
    SW       98.1           97.6

    Fig. 13. Confusion matrix for SW. The legend represents the percentage of activity sequences classified.

    The discrete HMM was also unable to recognise the observed window sequences across the different window sizes. The HMM's high sensitivity to the window size, resulting in low precision and recall, has two causes. The first is that the log likelihood of $\Pr(O \mid \lambda)$, where O is the observed window sequence and $\lambda$ is the derived model for an activity, encounters symbols with zero probability, thus resulting in a log likelihood of negative infinity. These zero symbol probabilities occur due to the failure to observe such symbols in the training sequences during HMM parameter estimation. The second and less significant cause of the HMM's poor performance is that the forward inferencing algorithm encounters significant numbers of symbols with low probability in the online window sequence (due to the global matching nature of HMM inferencing). As a result, the derived log likelihoods are reduced such that they do not exceed the specified thresholds and the activities are not recognised.

    8.3. SW, DTW and HMM with accurate activity segmentation

    We have shown how OSW is capable of accurate recognition in an online activity recognition context, but we are also interested in whether the modified SW approach is capable of adequate recognition with accurately segmented spatial sequences. To evaluate this aspect we use 10 accurately segmented training sequences from each of the 12 activity classes, derive intraclass distances as previously mentioned, then specify thresholds at ±2 standard deviations. With the applied thresholds for each activity, we then use threshold-based NN classification with the accurately segmented observed sequences and compare the results to DTW and the discrete HMM. The experimental results are given in Table 4.

    Table 4 shows that the modified SW technique is capable of producing high precision (98%) and recall (97%) in classification of accurately segmented spatial activity sequences with the given data set. Furthermore, the recognition performance of SW is comparable to the global DTW approach. This finding therefore shows that it is possible to apply the local SW alignment technique successfully in a global matching role. In contrast, both the SW and DTW alignment techniques significantly outperformed the HMM, further providing evidence of the strong discriminatory capability of the proposed SW alignment approach.

    Fig. 14. Spatial patterns for confused activities 4 and 5.

    Elaborating further on the SW results, the SW confusion matrix, presented in Fig. 13, demonstrates near perfect classification with only minor errors occurring between activities four and five, which represent two variations of having breakfast. The misclassification seen here is understandable as the variants of having breakfast share 90% spatial similarity (see Fig. 14) and the remainder of the sequences have only a small spatial disparity. With future development of activity segmentation algorithms for online recognition, it is possible to apply the SW technique for accurate sequence quantification.

    8.4. Robustness evaluation of SW, DTW and the HMM with accurately segmented activities

    Robustness to noise is an important characteristic of any spatial activity recognition approach, as video tracking systems typically produce spatial sequences intertwined with noise and gaps. To evaluate the robustness of the modified SW algorithm we artificially incorporated random noise with varying magnitudes into each of the accurately segmented testing sequences during threshold-based NN classification. A training set size of 10 was used in the evaluation. Benchmarking of the robustness performance was made in relation to the global DTW technique and the HMM. The experimental results with noise magnitudes of between 0 and 3 m are presented in Fig. 15.
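    The noise model is not specified beyond its magnitude, so the following sketch is only one plausible reading: each trajectory point is displaced by a random vector whose length is drawn uniformly up to the stated magnitude in metres. The function add_noise, the uniform distribution and the seeded generator are all assumptions made for illustration.

```python
import math
import random


def add_noise(trajectory, magnitude, rng=None):
    """Displace each (x, y) point by a random vector of length <= magnitude."""
    rng = rng or random.Random()
    noisy = []
    for x, y in trajectory:
        r = rng.uniform(0.0, magnitude)           # displacement length
        theta = rng.uniform(0.0, 2.0 * math.pi)   # displacement direction
        noisy.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return noisy


clean = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
noisy = add_noise(clean, magnitude=3.0, rng=random.Random(0))
print(noisy)
```

    Passing a seeded random.Random instance keeps a robustness run repeatable across the compared algorithms.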

    The results from Fig. 15 demonstrate that the modified SW algorithm is more resilient to noise than the HMM, as indicated by the smaller decrease in recall with the increased magnitudes of noise: a 3% decrease for SW versus 14% for the HMM. Importantly, SW was able to maintain a precision and recall of >93% with the largest evaluated magnitude of noise, while the discrete HMM achieved only 60%. The observed maintenance of high accuracy across the different magnitudes of artificially introduced noise also reinforces the belief that the proposed SW approach is more robust to noise than the discrete HMM.

    DTW was also seen to perform similarly to the SW approach in relation to noise in Fig. 15. This was not expected, as DTW is sensitive to noise [24] and thus should have exhibited a decrease in recall. It is implicit that artificially introduced noise increases the average DTW distance to the class templates. As we use NN classification with thresholding, an increase in distance will prevent sequences from being recognised, as they are more likely to exceed the specified thresholds. Taking this into account, it is likely that the derived thresholds, taken from the average intraclass distances of the training sets ± 2 standard deviations, were sufficiently large that the increased distances did not exceed the thresholds and thus did not affect the recall statistics.

    Fig. 15. Noise magnitude versus classification accuracy: (a) DTW; (b) SW; and (c) HMM.

    9. Conclusion

    In this paper we proposed an online modification of SW, the OSW algorithm, for spatial activity recognition. The unique local alignment property of the OSW algorithm allows efficient recognition of embedded and partial spatial activities within sliding windows, removing the need for accurate sequence segmentation. To demonstrate the effectiveness of OSW, we evaluated the online classification performance with a 12 class data set and compared the results to DTW and the discrete HMM. The results showed that the OSW approach obtained a significantly higher precision and recall in comparison to both DTW and the HMM. Experimentation with a modified SW algorithm and accurately segmented spatial activity sequences also showed that the approach is capable of accurate discrimination, producing a classification performance similar to DTW and significantly outperforming the HMM. Further experimentation also confirmed that DTW and the modified SW approach are robust to the intrinsic noise commonly found in non-invasively captured spatial sequences. Our investigation of SW and OSW has shown some promising results in the spatial activity domain. Future work will focus on applying the approach with multi-sensor data, particularly combining spatial and event information to improve spatial activity discrimination.

    References

    [1] T.H. Monk, C.F. Reynolds, M.A. Machen, D.J. Kupfer, Daily social rhythms in the elderly and their relation to objectively recorded sleep, Sleep 15 (4) (1992) 322--329.

    [2] R. Suzuki, S. Otake, T. Izutsu, M. Yoshida, T. Iwaya, Monitoring daily living activities of elderly people in a nursing home using an infrared motion-detection system, Telemed. e-Health 12 (2) (2006) 146--155.

    [3] J. Aggarwal, Q. Cai, Human motion analysis: a review, Comput. Vis. Image Understand. 73 (3) (1999) 428--440.

    [4] N. Johnson, D. Hogg, Learning the distribution of object trajectories for event recognition, in: Sixth British Conference on Machine Vision, vol. 2, 1995, pp. 583--592.

    [5] J. Owens, A. Hunter, Application of the self-organising map to trajectory classification, in: IEEE International Workshop on Visual Surveillance, 2000, pp. 77--83.

    [6] N. Sumpter, A. Bulpitt, Learning spatio-temporal patterns for predicting object behaviour, Image Vis. Comput. 18 (9) (2000) 697--704.

    [7] J. Yamato, J. Ohya, K. Ishii, Recognizing human action in time-sequential images using hidden Markov model, in: Computer Vision and Pattern Recognition, 1992, pp. 379--385.

    [8] H. Bui, S. Venkatesh, G. West, Tracking and surveillance in wide-area spatial environments using the abstract hidden Markov model, Int. J. Pattern Recognition Artif. Intell. 15 (1) (2001) 177--195.

    [9] N. Oliver, E. Horvitz, A. Garg, Layered representations for human activity recognition, in: IEEE International Conference on Multimodal Interfaces, 2002, pp. 3--8.

    [10] S. Luhr, H. Bui, S. Venkatesh, G. West, Recognition of human activity through hierarchical stochastic learning, in: IEEE International Conference on Pervasive Computing and Communications (PerCom-03), 2003, pp. 416--422.

    [11] N. Nguyen, H. Bui, S. Venkatesh, G. West, Recognising and monitoring high-level behaviours in complex spatial environments, in: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR-03), 2003, pp. 620--625.

    [12] T. Duong, H. Bui, D. Phung, S. Venkatesh, Activity recognition and abnormality detection with the switching hidden semi-Markov model, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 838--845.

    [13] A. Bobick, J. Davis, Real-time recognition of activity using temporal templates, in: IEEE Workshop on Applications of Computer Vision, 1996, pp. 39--42.

    [14] A. Bobick, Y. Ivanov, The recognition of human movement using temporal templates, IEEE Trans. Pattern Anal. Mach. Intell. 23 (3) (2001) 257--267.

    [15] J. Ben-Arie, Z. Wang, P. Pandit, S. Rajaram, Human activity recognition using multidimensional indexing, IEEE Trans. Pattern Anal. Mach. Intell. 24 (8) (2002) 1091--1104.

    [16] F. Bashir, W. Qu, A. Khokhar, D. Schonfeld, HMM-based motion recognition system using segmented PCA, in: IEEE International Conference on Image Processing, 2005, pp. 1288--1291.

    [17] J. Han, B. Bhanu, Human activity recognition in thermal infrared imagery, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.

    [18] A. Bobick, Y. Ivanov, Action recognition using probabilistic parsing, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1998, pp. 196--202.

    [19] Y. Ivanov, A. Bobick, Recognition of visual activities and interactions by stochastic parsing, IEEE Trans. Pattern Anal. Mach. Intell. 22 (8) (2000) 852--872.

    [20] P. Peursum, H. Bui, S. Venkatesh, G. West, Human action segmentation via controlled use of missing data in HMMs, in: IAPR International Conference on Pattern Recognition, 2004, pp. 440--445.

    [21] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (2) (1989) 257--286.

    [22] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Trans. Acoustics Speech Signal Process. ASSP-26 (1) (1978) 43--49.

    [23] L. Chen, R. Ng, On the marriage of LP-norms and edit distance, in: 30th VLDB Conference, 2004, pp. 792--803.

    [24] L. Chen, M.T. Ozsu, V. Oria, Robust and fast similarity search for moving object trajectories, in: 24th ACM International Conference on Management of Data, 2005.

    [25] M. Vlachos, D. Gunopulos, G. Kollios, Robust similarity measures for mobile object trajectories, in: 13th International Workshop on Database and Expert Systems Applications, 2002, pp. 721--726.

    [26] M. Vlachos, G. Kollios, D. Gunopulos, Discovering similar multidimensional trajectories, in: 18th International Conference on Data Engineering, 2002, pp. 673--684.

    [27] D. Riedel, S. Venkatesh, W. Liu, A Smith--Waterman local sequence alignment approach to spatial activity recognition, in: IEEE International Conference on Advanced Video and Signal based Surveillance, 2006.

    [28] I. Eidhammer, I. Jonassen, W. Taylor, Protein Bioinformatics: An Algorithmic Approach to Sequence and Structure Analysis, Wiley, New York, 2004, pp. 3--23 (Chapter 1.2).

    [29] S.B. Needleman, C.D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol. 48 (1970) 443--453.

    [30] T. Smith, M. Waterman, Identification of common molecular subsequences, J. Mol. Biol. 147 (1981) 195--197.

    [31] M.S. Waterman, Introduction to Computational Biology, first ed., Chapman & Hall, London, UK, 1995.

    [32] L. Rabiner, A. Rosenberg, S. Levinson, Considerations in dynamic time warping algorithms for discrete word recognition, IEEE Trans. Acoustics Speech Signal Process. 26 (6) (1978) 575--582.

    [33] S. Das, Some experiments in discrete utterance recognition, IEEE Trans. Acoustics Speech Signal Process. ASSP-30 (5) (1982) 766--770.

    [34] M. Vlachos, D. Gunopulos, G. Das, Rotation invariant distance measures for trajectories, in: 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 707--712.

    [35] J. Aach, G. Church, Aligning gene expression time series with time warping algorithms, Bioinformatics 17 (6) (2001) 495--508.

    [36] T. Rath, R. Manmatha, Word image matching using dynamic time warping, in: Computer Vision and Pattern Recognition, 2003, pp. 521--527.

    About the Author: WANQUAN LIU received the BSc degree in Applied Mathematics from Qufu Normal University, PR China, in 1985, the MSc degree in Control Theory and Operation Research from the Chinese Academy of Science in 1988, and the PhD degree in Electrical Engineering from Shanghai Jiaotong University in 1993. He has held the ARC Fellowship and the JSPS Fellowship and has attracted research funds from different sources. He is currently a Senior Lecturer in the Department of Computing at Curtin University of Technology. His research interests include large scale pattern recognition, control systems, signal processing, machine learning, and intelligent systems.
