
Action and Gait Recognition From Recovered 3-D Human Joints
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 4, AUGUST 2010


Transcript
Page 1

Action and Gait Recognition From Recovered 3-D Human Joints

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 4, AUGUST 2010

Junxia Gu, Member, IEEE, Xiaoqing Ding, Senior Member, IEEE,

Shengjin Wang, Member, IEEE, and Youshou Wu

Adviser: Ming-Yuan Shieh

Student: Shun-Te Chuang

PPT preparation: 100%

Page 2

Outline

ABSTRACT
INTRODUCTION
PREVIOUS WORK
FULL-BODY TRACKING METHOD FOR POSE RECOVERY
CLASSIFICATION
EXPERIMENTAL RESULTS AND ANALYSIS
CONCLUSION

Page 3

Abstract

A common viewpoint-free framework that fuses pose recovery and classification for action and gait recognition is presented in this paper.

First, a markerless pose recovery method is adopted to automatically capture the 3-D human joint and pose parameter sequences from volume data.

Second, multiple configuration features (combination of joints) and movement features (position, orientation, and height of the body) are extracted from the recovered 3-D human joint and pose parameter sequences.

Page 4

Abstract

A hidden Markov model (HMM) and an exemplar-based HMM are then used to model the movement features and configuration features, respectively.

Finally, actions are classified by a hierarchical classifier that fuses the movement features and the configuration features, and persons are recognized from their gait sequences with the configuration features.

Page 5

INTRODUCTION

VIDEO-BASED study of human motion has been receiving increasing attention over the past decades.

This interest has been motivated by applications such as intelligent video surveillance and human–computer interaction.

With increased awareness in security issues, motion analysis is becoming increasingly important in surveillance systems.

Page 6

INTRODUCTION

Action recognition is a further requirement: understanding what a person is doing.

Current intelligent surveillance systems are in urgent need of noninvasive and viewpoint-free research on motion analysis.

This paper focuses on the movement of main body segments (arms, legs, and torso). A human gait is extracted from a “walk” action.

Page 7

INTRODUCTION

In this paper, a vision-based markerless pose recovery approach is proposed to extract 3-D human joints.

The human joint sequence is one of the most effective and discriminative representations of human motion.

It contains rich information, including the position and orientation of the body as well as the positions of the joints.

The information is categorized into two types: movement features and configuration features.

Page 8

INTRODUCTION

The changes of position, orientation, and height of the body, which describe the global movement of the subject, are defined as movement features.

The sequences of human joint positions, which describe the change of relative configuration of body segments, are defined as configuration features.

A hidden Markov model (HMM) and an exemplar-based HMM (EHMM) are employed to characterize the movement and configuration features, respectively.

Both the HMM and the EHMM have been used to recognize actions and gaits.
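To make this modeling step concrete, the following is a minimal sketch (not the authors' implementation): it trains one Gaussian HMM per action class on movement-feature sequences and classifies a test sequence by maximum log-likelihood, using the hmmlearn library as a stand-in. Feature extraction and the exemplar-based variant are left out, and all names here are illustrative.

```python
# Sketch: per-class Gaussian HMMs over movement-feature sequences,
# classification by maximum log-likelihood (hmmlearn used as a stand-in).
import numpy as np
from hmmlearn import hmm

def train_class_models(train_seqs, n_states=5):
    """train_seqs: dict mapping class label -> list of (T_i, D) feature arrays."""
    models = {}
    for label, seqs in train_seqs.items():
        X = np.vstack(seqs)                      # concatenate this class's sequences
        lengths = [len(s) for s in seqs]         # per-sequence lengths for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Return the class whose HMM assigns the highest log-likelihood to seq."""
    return max(models, key=lambda label: models[label].score(seq))
```

This per-class training and maximum-likelihood decision is the same pattern that the MAP classifiers in the CLASSIFICATION section formalize.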

Page 9

INTRODUCTION

Fig. 1. Flowchart of the video-based motion recognition.

Page 10

PREVIOUS WORK

1. Appearance-Based Methods: Appearance-based approaches are widely used in action and gait representation. They directly represent human motion using image information, such as silhouettes, edges, and optical flow.

2. Human Model-Based Methods: Human model-based approaches represent an action or a gait with body segments, joint positions, or pose parameters.

Page 11

PREVIOUS WORK

A previous approach combined stochastic search with gradient descent for local pose refinement to recover complex whole-body motion.

Model initialization was automatic, with the subject standing upright with arms and legs spread in the "Da Vinci" pose.

The tracking speed was below 1 s per frame. In this paper, an adaptive particle filter method is proposed for pose recovery.

Page 12

PREVIOUS WORK

First, the whole body of a subject in each frame is segmented into several body segments.

A particle filter with an adaptive particle number is then used to track each body segment.

This method decomposes the search space and reduces the computational complexity.
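A minimal sketch of this per-segment tracking idea is given below. The motion model, the likelihood function, and the rule for adapting the particle count (based here on the effective sample size) are placeholders chosen for illustration, not the paper's actual formulation.

```python
# Sketch: particle filter with an adaptive particle count for one body segment.
# The random-walk motion model and the ESS-based adaptation rule are assumptions.
import numpy as np

def resample(particles, weights, n_out):
    """Multinomial resampling down/up to n_out particles."""
    idx = np.random.choice(len(particles), size=n_out, p=weights)
    return particles[idx]

def track_segment(init_pose, frames, likelihood, n_min=50, n_max=500, noise=0.05):
    """init_pose: 1-D pose vector of the segment (e.g., joint angles).
    frames: iterable of per-frame 3-D volume data.
    likelihood: callable(pose, volume) -> non-negative match score."""
    n = n_max
    particles = init_pose + noise * np.random.randn(n, init_pose.size)
    for volume in frames:
        # Prediction: diffuse hypotheses with a simple random-walk motion model.
        particles = particles + noise * np.random.randn(*particles.shape)
        # Update: weight each hypothesis by how well it explains the volume data.
        w = np.array([likelihood(p, volume) for p in particles])
        w = w / w.sum()
        # Adapt the particle count from the effective sample size:
        # fewer particles when the weights are concentrated (confident tracking).
        n_eff = 1.0 / np.sum(w ** 2)
        n = int(np.clip(2 * n_eff, n_min, n_max))
        estimate = (w[:, None] * particles).sum(axis=0)   # weighted-mean pose estimate
        particles = resample(particles, w, n)
        yield estimate
```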

Page 13

FULL-BODY TRACKING METHOD FOR POSE RECOVERY

Human Model

Human Model-Based Full-Body Tracking

Page 14

Human Model

Page 15

Human Model-Based Full-Body Tracking

Page 16

CLASSIFICATION

A. HMM and EHMM Learning

B. Classifier for Gait Recognition

C. Classifier for Action Recognition

Page 17

A. HMM and EHMM Learning

The EHMM is different from the HMM in the definition of the observation densities. For the HMM, the general representation of the observation densities is a Gaussian mixture model (GMM) of the following form:
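The equation itself did not survive the transcript; in standard notation (the symbols here are generic and not necessarily the paper's), the GMM observation density of state $j$ is

$$b_j(\mathbf{y}_t) \;=\; \sum_{m=1}^{M} w_{jm}\,\mathcal{N}\big(\mathbf{y}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\big), \qquad \sum_{m=1}^{M} w_{jm} = 1,$$

where $w_{jm}$, $\boldsymbol{\mu}_{jm}$, and $\boldsymbol{\Sigma}_{jm}$ are the weight, mean, and covariance of the $m$-th mixture component.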

In the EHMM, the definition of the observation probability is as follows:
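The original equation is likewise missing; a common exemplar-based formulation (an assumption, not necessarily the paper's exact definition) ties the observation probability of state $j$ to its exemplar $\mathbf{e}_j$ through a distance $d(\cdot,\cdot)$ between joint configurations:

$$P(\mathbf{y}_t \mid q_t = j) \;\propto\; \exp\!\left(-\frac{d(\mathbf{y}_t, \mathbf{e}_j)^2}{2\sigma^2}\right).$$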

Page 18

B. Classifier for Gait Recognition

For each person $c \in \{1, \ldots, C_p\}$ in the database, we learn an EHMM gait model $\lambda^{(c)}_{S_{\text{gait}}}$ with features $S_{\text{gait}}$ and an EHMM gait model $\lambda^{(c)}_{L_{\text{gait}}}$ with features $L_{\text{gait}}$. A testing gait sequence $Y = \{y_0, \ldots, y_K\}$ is classified with the following MAP estimation:
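The equation is not reproduced in the transcript; one plausible form (equal weighting of the two models is an assumption) is

$$\hat{c} \;=\; \arg\max_{c \in \{1,\ldots,C_p\}} \Big[\log P\big(S_Y \mid \lambda^{(c)}_{S_{\text{gait}}}\big) + \log P\big(L_Y \mid \lambda^{(c)}_{L_{\text{gait}}}\big)\Big],$$

where $S_Y$ and $L_Y$ denote the $S_{\text{gait}}$ and $L_{\text{gait}}$ feature streams extracted from the test sequence $Y$.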

Page 19

C. Classifier for Action Recognition

A testing action sequence $Y = \{y_0, \ldots, y_K\}$, with features $S_Y$, $R_Y$, $P_Y$, $O_Y$, and $H_Y$, is classified with a two-layer classifier that fuses multiple features. The first layer is a weighted-MAP classifier that fuses the three movement features and the configuration feature of the whole body as follows:
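The formula is missing from the transcript; an illustrative form of such a weighted-MAP fusion (the weights and notation are assumptions) is

$$c_1 \;=\; \arg\max_{c}\Big[w_S \log P\big(S_Y \mid \lambda^{(c)}_S\big) + w_P \log P\big(P_Y \mid \lambda^{(c)}_P\big) + w_O \log P\big(O_Y \mid \lambda^{(c)}_O\big) + w_H \log P\big(H_Y \mid \lambda^{(c)}_H\big)\Big],$$

where $\lambda^{(c)}_{(\cdot)}$ are the models learned for action class $c$ on the corresponding features and $w_{(\cdot)}$ are the fusion weights.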

If the decision $c_1$ of the first layer belongs to the single-arm actions, sequence $Y$ is further recognized by a second MAP classifier with the arm features as follows:
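Again with assumed notation, such a second-layer decision could be written as

$$c_2 \;=\; \arg\max_{c \,\in\, \mathcal{C}_{\text{arm}}} \log P\big(R_Y \mid \lambda^{(c)}_R\big),$$

where $\mathcal{C}_{\text{arm}}$ is the set of single-arm actions and $\lambda^{(c)}_R$ the model learned on the arm configuration features.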

Page 20

EXPERIMENTAL RESULTS AND ANALYSIS

There are 11 actions, and each subject performs each action three times. All samples contain approximately 29,100 frames in total.

These actions include "check watch," "cross arm," "scratch head," "sit down," "get up," "turn around," "walk in a circle," "wave hand," "punch," "kick," and "pick up." To demonstrate view invariance, subjects freely change their orientations. The acquisition is achieved using five standard FireWire cameras.

The image resolution is 390 × 291 pixels, and the volume of interest is divided into 64 × 64 × 64 voxels.
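The slides do not describe how the volume data are reconstructed; as a hedged illustration only, a silhouette-based visual-hull carving over such a 64 × 64 × 64 grid could look like the sketch below (the silhouette masks, camera projection matrices, and volume bounds are assumed inputs, not taken from the paper).

```python
# Sketch: visual-hull carving of a voxel grid from multi-view silhouettes.
import numpy as np

def carve_volume(silhouettes, projections, bounds, res=64):
    """silhouettes: list of HxW binary masks; projections: list of 3x4 camera matrices;
    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)) of the volume of interest."""
    axes = [np.linspace(lo, hi, res) for lo, hi in bounds]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)  # homogeneous voxel centers
    occupied = np.ones(len(pts), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = pts @ P.T                                   # project voxel centers into this view
        z = uvw[:, 2]
        valid = z > 1e-6                                  # voxel must lie in front of the camera
        u = np.zeros(len(pts), dtype=int)
        v = np.zeros(len(pts), dtype=int)
        u[valid] = np.round(uvw[valid, 0] / z[valid]).astype(int)
        v[valid] = np.round(uvw[valid, 1] / z[valid]).astype(int)
        inside = valid & (u >= 0) & (u < mask.shape[1]) & (v >= 0) & (v < mask.shape[0])
        hit = np.zeros(len(pts), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]] > 0      # voxel projects onto the silhouette
        occupied &= hit                                   # keep voxels consistent with every view
    return occupied.reshape(res, res, res)
```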

Page 21

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 8. Samples of images and 3-D volume data.

Page 22

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 9. Annotation of the human joint. (a) Annotation of the knee joint. (b) Results of annotation.

Page 23

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 11. Average error of individual joint positions.

Fig. 10. Average error of the joint positions.

Page 24

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 12. Results of the pose recovery.

Page 25

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 13. Selected exemplars and recognition rate versus the number of exemplars.

Page 26

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 14. Comparison of convergence performance between the EHMM and the HMM.

Page 27

EXPERIMENTAL RESULTS AND ANALYSIS

Fig. 15. Average recognition rates of actions.

Page 28

CONCLUSION

The main contribution of this paper is the fusion of pose recovery and motion recognition.

Future work includes automatically segmenting temporal sequences, reducing the computational complexity, analyzing more complex actions, and recognizing 2-D actions based on the 3-D EHMM.

The free-viewpoint 3-D human joint sequence contains a significant amount of information for motion analysis.

In addition to representing the single actions used in this paper, it can be used for more applications, such as analysis of complex actions.

Page 29

CONCLUSION

The high number of DOFs and the huge number of 3-D points make the human model-based pose recovery method very time consuming.

To address this problem, parallel computing, code optimization, and GPUs can be used to reduce the time cost. At present, it is also difficult to obtain robust volume data of subjects in surveillance and content-analysis scenarios.

Actions and gaits are affected by various factors, including clothing, age, and gender. In the future, performance under these factors will be analyzed on larger databases.