HMM-based Activity Recognition with a Ceiling RGB-D Camera

Daniele Liciotti, Emanuele Frontoni, Primo Zingaretti, Nicola Bellotto, and Tom Duckett
Università Politecnica delle Marche (Italy), University of Lincoln (UK)
{d.liciotti, e.frontoni, p.zingaretti}@univpm.it, {nbellotto, tduckett}@lincoln.ac.uk

Introduction

The main goal is to classify and predict the probability of an action performed by the analysed subject. We perform activity detection and recognition using an inexpensive RGB-D camera. Human activities, despite their unstructured nature, tend to have a natural hierarchical structure; for instance, making a coffee generally involves a three-step process of turning on the coffee machine, putting sugar in the cup and opening the fridge for milk. Action sequence recognition is then handled using a discriminative hidden Markov model (HMM).

The innovative aspects are the proposal of an adequate HMM structure and the use of 3D head and hand positions to estimate the probability that a certain action will be performed, which, to the best of our knowledge, has never been done before for ADL recognition in indoor environments.

ADLs Model

[Figure 1: Block diagram of the recognition process. 3D head and hand points feed a bank of HMMs (HMM 1, HMM 2, HMM 3, ...), one per activity; the classification stage selects the model with maximum likelihood.]

The information provided by the head and hand detection algorithms is used as input for a set of HMMs, each of which recognises a different action sequence. After training the models, we consider an action sequence s = {s_1, s_2, ..., s_n} and, for each model λ, calculate the likelihood P(s|λ) of the observation sequence. We then classify the action as the one whose model has the largest likelihood. Figure 1 depicts the general scheme of the recognition process.
In particular, we used three different HMMs, whose observations are the 3D points of: the head; the hands; both head and hands together. Finally, the classification module provides the action x_j that maximizes P_HMM_i, i.e. the probability under the i-th HMM that the trajectory follows the activity sequence s given the sequence of n observations:

    x_j = arg max_i P_HMM_i (X_{1:n} ∈ seq_n(s) | o_{1:n})

Setup and Acquisition

[Figure 2: Snapshots of a RADiAL session registration (four frames, (a)-(d)).]

To evaluate the usefulness of our approach for activity recognition, we built a new dataset (RADiAL) that contains common daily activities such as making coffee, making tea, opening the fridge and using the kettle. The RGB-D camera was installed on the ceiling of the L-CAS laboratory at approximately 4 m above the floor (Figure 3).

[Figure 3: Reconstructed layout of the kitchenette (fridge, tea, sugar, kettle, coffee machine) where the RGB-D camera is installed.]

The RADiAL dataset [1] was collected in an open-plan office of the Lincoln Centre for Autonomous Systems (L-CAS). The office consists of a kitchenette, resting area, lounge and 20 working places occupied by students and post-doctoral researchers. We installed a ceiling RGB-D camera (Figure 3) that took a snapshot of the kitchenette area (320 × 240 pixels, Figure 2) every second for 5 days, and we hand-annotated the activities of one of the researchers over time. Furthermore, the RADiAL dataset contains the 3D positions of the head and hands of each person, with a minute-by-minute timeline of 5 different activities performed at the kitchenette over the course of the days. RADiAL contains 100 trials, each including the actions related to one person.
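The maximum-likelihood selection above can be sketched in code. The following is a minimal illustration with discrete observation symbols and hypothetical single-state toy models (the actual system uses continuous 3D head/hand coordinates): each activity has its own HMM, each sequence is scored with the forward algorithm, and the activity whose model gives the highest likelihood wins.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | lambda) for a discrete HMM.

    pi[s]   : initial probability of state s
    A[s][t] : transition probability from state s to state t
    B[s][o] : probability of emitting symbol o in state s
    """
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    c = sum(alpha)
    log_lik = math.log(c)
    alpha = [a / c for a in alpha]          # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * A[sp][s] for sp in range(n)) * B[s][o]
                 for s in range(n)]
        c = sum(alpha)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik

def classify(obs, models):
    """Pick the activity whose HMM assigns the largest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Two hypothetical single-state toy models over a binary symbol alphabet:
models = {
    "coffee": ([1.0], [[1.0]], [[0.9, 0.1]]),  # mostly emits symbol 0
    "fridge": ([1.0], [[1.0]], [[0.1, 0.9]]),  # mostly emits symbol 1
}
print(classify([0, 0, 1, 0], models))  # -> coffee
```

The scaling step (dividing alpha by its sum and accumulating log c) is the standard trick for long sequences, where the raw forward probability would underflow to zero.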
Experimental Results

Five models were used to recognize the activities in the RADiAL dataset, corresponding respectively to the activities "other" (all the other activities performed in a kitchen environment), "coffee" (making a coffee), "kettle" (taking the kettle), "tea/sugar" (making tea or taking sugar), and "fridge" (opening the fridge). The results were obtained using two different validation techniques.

Figure 4: k-fold cross-validation confusion matrices (rows: true label; columns: predicted label; classes in order: other, coffee, kettle, tea/sugar, fridge).

(a) HMM 1
    other      0.574  0.236  0.029  0.105  0.056
    coffee     0.072  0.797  0.114  0.015  0.002
    kettle     0.053  0.070  0.701  0.161  0.015
    tea/sugar  0.069  0.048  0.121  0.704  0.058
    fridge     0.091  0.033  0.025  0.243  0.609

(b) HMM 2
    other      0.704  0.106  0.032  0.089  0.068
    coffee     0.045  0.831  0.052  0.070  0.003
    kettle     0.020  0.194  0.579  0.202  0.006
    tea/sugar  0.036  0.067  0.198  0.678  0.021
    fridge     0.056  0.084  0.022  0.184  0.654

(c) HMM 3
    other      0.759  0.073  0.024  0.087  0.058
    coffee     0.031  0.875  0.041  0.052  0.002
    kettle     0.014  0.139  0.696  0.143  0.007
    tea/sugar  0.027  0.047  0.156  0.748  0.021
    fridge     0.036  0.063  0.019  0.168  0.713

Table 1: Classification results (cross-validation).

               HMM 1             HMM 2             HMM 3
            PPV   TPR   F1     PPV   TPR   F1     PPV   TPR   F1
other       0.73  0.57  0.64   0.89  0.70  0.79   0.93  0.76  0.84
coffee      0.67  0.80  0.73   0.69  0.83  0.75   0.76  0.87  0.81
kettle      0.60  0.70  0.65   0.47  0.58  0.52   0.58  0.70  0.63
tea/sugar   0.66  0.70  0.68   0.64  0.68  0.66   0.70  0.75  0.72
fridge      0.74  0.61  0.67   0.74  0.65  0.69   0.78  0.71  0.74
avg         0.68  0.68  0.68   0.73  0.71  0.71   0.78  0.77  0.77

[1] http://vrai.dii.univpm.it/radial-dataset
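The per-class scores in Table 1 (PPV = precision, TPR = recall, and their harmonic mean F1) follow from a confusion matrix of raw counts. Note that the matrices in Figure 4 are row-normalized, so PPV cannot be recomputed from them without the per-class sample counts; the sketch below therefore assumes a hypothetical raw-count matrix.

```python
def per_class_metrics(cm):
    """Per-class PPV (precision), TPR (recall) and F1 from a raw-count
    confusion matrix, where cm[i][j] is the number of samples with true
    label i predicted as label j."""
    n = len(cm)
    ppv, tpr, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        col = sum(cm[i][k] for i in range(n))   # all predictions of class k
        row = sum(cm[k])                        # all true samples of class k
        p = tp / col if col else 0.0
        r = tp / row if row else 0.0
        ppv.append(p)
        tpr.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return ppv, tpr, f1

# Hypothetical raw-count confusion matrix for a two-class toy problem:
cm = [[8, 2],
      [1, 9]]
ppv, tpr, f1 = per_class_metrics(cm)
print([round(v, 2) for v in f1])  # -> [0.84, 0.86]
```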