HMM-based Activity Recognition with a Ceiling RGB-D Camera

Daniele Liciotti, Emanuele Frontoni, Primo Zingaretti, Nicola Bellotto, and Tom Duckett
Università Politecnica delle Marche (Italy), University of Lincoln (UK)
{d.liciotti, e.frontoni, p.zingaretti}@univpm.it, {nbellotto, tduckett}@lincoln.ac.uk

Introduction

The main goal is to classify and predict the probability of an action performed by the analysed subject. We perform activity detection and recognition using an inexpensive RGB-D camera. Human activities, despite their unstructured nature, tend to have a natural hierarchical structure; for instance, making a coffee generally involves a three-step process of turning on the coffee machine, putting sugar in the cup and opening the fridge for milk. Action sequence recognition is then handled using a discriminative hidden Markov model (HMM).

The innovative aspects are the proposal of an adequate HMM structure and the use of 3D head and hand positions to estimate the probability that a certain action will be performed, which, to the best of our knowledge, has never been done before for ADL recognition in indoor environments.

ADLs Model

[Figure 1: Block diagram of the recognition process. 3D head and hand points feed a bank of HMMs (HMM 1, HMM 2, HMM 3, ...), one per activity; the classification stage selects the model with maximum likelihood.]

The information provided by the head and hand detection algorithms is used as input for a set of HMMs, each of which recognises a different action sequence. After training the models, we consider an action sequence s = {s_1, s_2, ..., s_n} and, for each model λ, calculate the likelihood P(s|λ) of the observation sequence. We then classify the action as the one whose model has the largest likelihood. Figure 1 depicts the general scheme of the recognition process.
In particular, we used three different HMMs, whose observations are the 3D points of: the head; the hands; both head and hands together. Finally, the classification module provides the action x_j that maximizes P_HMM_i, i.e. the probability under the i-th HMM that the trajectory follows the activity sequence s given the sequence of n observations:

    x_j = arg max_i P_HMM_i (X_{1:n} ∈ seq_n(s) | o_{1:n})

Setup and Acquisition

[Figure 2: Snapshots of a RADiAL session registration (four frames, (a)-(d)).]

To evaluate the usefulness of our approach for activity recognition, we built a new dataset (RADiAL) that contains common daily activities such as making coffee, making tea, opening the fridge and using the kettle. The RGB-D camera was installed on the ceiling of the L-CAS laboratory at approximately 4 m above the floor (Figure 3).

[Figure 3: Reconstructed layout of the kitchenette (fridge, tea, sugar, kettle, coffee machine) where the RGB-D camera is installed.]

The RADiAL dataset [1] was collected in an open-plan office of the Lincoln Centre for Autonomous Systems (L-CAS). The office consists of a kitchenette, resting area, lounge and 20 working places occupied by students and post-doctoral researchers. We installed a ceiling RGB-D camera (Figure 3) that took a snapshot of the kitchenette area (320 × 240 pixels, Figure 2) every second for 5 days, and we hand-annotated the activities of one of the researchers over time. Furthermore, the RADiAL dataset contains the 3D positions of the head and hands of each person, with a minute-by-minute timeline of 5 different activities performed at the kitchenette over the course of the days. RADiAL contains 100 trials, each including the actions related to one person.
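The maximum-likelihood selection above can be sketched in code. The following is a minimal illustration with discrete observation symbols and hypothetical single-state toy models (the actual system uses continuous 3D head/hand coordinates): each activity has its own HMM, each sequence is scored with the forward algorithm, and the activity whose model gives the highest likelihood wins.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | lambda) for a discrete HMM.

    pi[s]   : initial probability of state s
    A[s][t] : transition probability from state s to state t
    B[s][o] : probability of emitting symbol o in state s
    """
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    c = sum(alpha)
    log_lik = math.log(c)
    alpha = [a / c for a in alpha]          # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * A[sp][s] for sp in range(n)) * B[s][o]
                 for s in range(n)]
        c = sum(alpha)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik

def classify(obs, models):
    """Pick the activity whose HMM assigns the largest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Two hypothetical single-state toy models over a binary symbol alphabet:
models = {
    "coffee": ([1.0], [[1.0]], [[0.9, 0.1]]),  # mostly emits symbol 0
    "fridge": ([1.0], [[1.0]], [[0.1, 0.9]]),  # mostly emits symbol 1
}
print(classify([0, 0, 1, 0], models))  # -> coffee
```

The scaling step (dividing alpha by its sum and accumulating log c) is the standard trick for long sequences, where the raw forward probability would underflow to zero.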
Experimental Results

Five models were used to recognize the activities in the RADiAL dataset, corresponding respectively to the activities "other" (all the other activities performed in a kitchen environment), "coffee" (making a coffee), "kettle" (taking the kettle), "tea/sugar" (making tea or taking sugar), and "fridge" (opening the fridge). The results were obtained using two different validation techniques.

Figure 4: k-fold cross-validation confusion matrices (rows: true label; columns: predicted label; classes in order: other, coffee, kettle, tea/sugar, fridge).

(a) HMM 1
    other      0.574  0.236  0.029  0.105  0.056
    coffee     0.072  0.797  0.114  0.015  0.002
    kettle     0.053  0.070  0.701  0.161  0.015
    tea/sugar  0.069  0.048  0.121  0.704  0.058
    fridge     0.091  0.033  0.025  0.243  0.609

(b) HMM 2
    other      0.704  0.106  0.032  0.089  0.068
    coffee     0.045  0.831  0.052  0.070  0.003
    kettle     0.020  0.194  0.579  0.202  0.006
    tea/sugar  0.036  0.067  0.198  0.678  0.021
    fridge     0.056  0.084  0.022  0.184  0.654

(c) HMM 3
    other      0.759  0.073  0.024  0.087  0.058
    coffee     0.031  0.875  0.041  0.052  0.002
    kettle     0.014  0.139  0.696  0.143  0.007
    tea/sugar  0.027  0.047  0.156  0.748  0.021
    fridge     0.036  0.063  0.019  0.168  0.713

Table 1: Classification results (cross-validation).

               HMM 1             HMM 2             HMM 3
            PPV   TPR   F1     PPV   TPR   F1     PPV   TPR   F1
other       0.73  0.57  0.64   0.89  0.70  0.79   0.93  0.76  0.84
coffee      0.67  0.80  0.73   0.69  0.83  0.75   0.76  0.87  0.81
kettle      0.60  0.70  0.65   0.47  0.58  0.52   0.58  0.70  0.63
tea/sugar   0.66  0.70  0.68   0.64  0.68  0.66   0.70  0.75  0.72
fridge      0.74  0.61  0.67   0.74  0.65  0.69   0.78  0.71  0.74
avg         0.68  0.68  0.68   0.73  0.71  0.71   0.78  0.77  0.77

[1] http://vrai.dii.univpm.it/radial-dataset
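The per-class scores in Table 1 (PPV = precision, TPR = recall, and their harmonic mean F1) follow from a confusion matrix of raw counts. Note that the matrices in Figure 4 are row-normalized, so PPV cannot be recomputed from them without the per-class sample counts; the sketch below therefore assumes a hypothetical raw-count matrix.

```python
def per_class_metrics(cm):
    """Per-class PPV (precision), TPR (recall) and F1 from a raw-count
    confusion matrix, where cm[i][j] is the number of samples with true
    label i predicted as label j."""
    n = len(cm)
    ppv, tpr, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        col = sum(cm[i][k] for i in range(n))   # all predictions of class k
        row = sum(cm[k])                        # all true samples of class k
        p = tp / col if col else 0.0
        r = tp / row if row else 0.0
        ppv.append(p)
        tpr.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return ppv, tpr, f1

# Hypothetical raw-count confusion matrix for a two-class toy problem:
cm = [[8, 2],
      [1, 9]]
ppv, tpr, f1 = per_class_metrics(cm)
print([round(v, 2) for v in f1])  # -> [0.84, 0.86]
```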