
Real-Time IK Body Movement Recovery from Partial Vision Input

Ronan Boulic* Javier Varona† Luis Unzueta‡ Manuel Peinado** Angel Suescun‡ Francisco Perales†

(*)EPFL, Switzerland (†)UIB, Spain (‡)CEIT, Spain (**)University of Alcalá, Spain

E-mail: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Despite its central role in the constitution of a truly enactive interface, 3D interaction through full-body human movement has been hindered by a number of technological and algorithmic factors, among them cumbersome magnetic equipment and the underdetermined data sets provided by less invasive video-based approaches. In the present paper we explore the recovery of the full body posture of a standing subject in front of a stereo camera system. The 3D positions of the hands, the head and the center of the trunk segment are extracted in real-time and provided to the body posture recovery algorithmic layer. We extend our comparison of two Inverse Kinematics approaches to this more complex context. Algorithmic issues arise from the very partial and noisy input and from the singularity of the human standing posture. Despite stability concerns, the results confirm the pertinence of this approach in this demanding context.

1. Introduction

The sense of movement has been under-exploited until now in classical interfaces. Integrating the kinaesthetic sense at a larger scale than desktop manipulation is fundamental for building effective Enactive Interfaces where our dexterity and full-body postural knowledge can be exploited. Until now the exploitation of real-time motion capture of full-body human movements has been limited to niche applications such as the expressive animation of a virtual character in a live show [1]. Multiple factors hinder a wider adoption of full-body movement as a popular 3D user interface. Among others we can cite: the invasiveness of the sensor system, the limited acquisition space and sensor precision, the spatial distortions, the high dimension of the posture space, and the modeling approximations in the mechanical model of the human body. These sources of error accumulate and result in an approximate posture. This can be sufficient for performance animation, where expression counts the most. However, if precise spatial control is desired, this channel may not be suited for evaluating complex interaction with virtual objects. The factor we want to improve in the present study is the comfort of the user, through a non-invasive vision-based acquisition technology. A prior work has shown the feasibility of the vision-based recovery of the arm posture [2]. We extend the posture recovery to the full body while interacting in front of a workbench. In addition to the 3D positions of the hands and the head, we can also exploit an estimate of the center of the trunk segment. The performances of the analytic and the numeric Inverse Kinematics approaches are compared in this highly under-determined context. Additional issues also arise because the standing human posture is close to the fully extended singularity. We describe the solutions we have experimented with to overcome these challenges.

Fig. 1: The simplified body model includes one virtual leg, a simplified spine and two arms.

Proceedings of ENACTIVE05 2nd International Conference on Enactive Interfaces Genoa, Italy, November 17th-18th, 2005


Section 2 recalls the background in real-time full-body motion capture, especially for 3D interactions. In the remainder of the paper we first provide an overview of the system before detailing its vision and movement recovery components (sections 3 to 6). Test cases comparing the two IK approaches follow in section 7, and we conclude in section 8.

2. Background

Real-time full-body motion capture has a long history in performance animation [1]. The very limited user-friendliness of the exoskeleton technology used at the time prevented its wider adoption. Magnetic sensor technology began to be used in the 1990s by Badler, who investigated the use of four magnetic sensors (waist, head and both hands) for driving the posture of a human model with Inverse Kinematics [3]. The goal is to recreate human postures while minimally encumbering the end user with sensor attachments. However, the uncontrolled degrees of freedom, like the swivel angle of the arms, can lead over time to important differences between the end user and the virtual human model. Molet has described an approach suppressing this ambiguity by using more sensors [4]. Other approaches, which in addition identify the skeleton structure and segment lengths, were proposed by Bodenheimer [5] and O'Brien [6]. Recent works show a renewed interest in less invasive approaches exploiting a reduced number of sensors. For example, Peinado et al. compensate for the missing information with constraints deduced from the known class of performed movement [7]. The possibility of associating a strict priority with a constraint is a key aspect of the success of this approach. The enforcement of such arbitrary priority levels is described in [8]. An alternate analytic Inverse Kinematics approach is more efficient in terms of computing cost but does not allow priority levels to be assigned to the constraints [9]. In the framework of example-based techniques, Chai and Hodgins exploit a database of movements that are first classified before guiding the motion recovery from the known positions of the hands, the feet and the head (acquired with video cameras detecting retro-reflective markers) [10]. Presently this technique is not used for 3D interactions.

Reconstructing human motion from video image analysis has received great attention in Computer Vision [11]. An important trend in the computer vision field is to constrain the application to a subset of predefined movements. For example, [12,13] are two representative works on human walking motion capture. However, most current approaches are not real-time, which makes comparing our approach with them difficult. For our objective, the real-time constraint is very important, since our goal is to use the captured motions as input for a perceptual user interface in virtual environments. An interesting prior real-time work is that of Wren et al. at the M.I.T. Media Lab [14], in which the authors present a system for the 3D tracking of the upper human body in front of a virtual reality device. No performance evaluation of their system is given, however. Moreover, the possible gestures are restricted to a set of predefined movements learnt previously: this approximation reduces the searchable space of human motions by learning from examples. Our system is not restricted to a set of predefined movements, and the delay between the user's movement and the system response is not noticeable by the user.

Fig. 2: Vision system layout

Fig. 3: General architecture of the system


Fig. 4: Vision process

3. System Architecture

For recovering body movements, the user is located in an interactive space that consists of a workbench with two projection screens. This space is instrumented with a stereo camera pair, which is used to capture the motion of certain parts of the user's body. The choice of body parts depends on the particular body posture recovery algorithm. This configuration allows the user to view virtual environments while standing in front of the workbench. Gesture and manipulation occur in the workspace defined by the screens and the user (Fig. 2). The workspace requirements are:

• The background wall is covered with chroma-key material. The system may work without a chroma-key background; however, using it ensures a real-time response.

• There cannot be other persons in the space.

• The color of the user's clothes cannot look like skin color.

• The skin-colored body parts, other than the hands and face, cannot be visible. For example, the user cannot roll up his sleeves.

If these requirements are met, the images from the two synchronized color cameras are the inputs of the system. Usually, locating all the user's body joints in order to recover the posture is not possible with computer vision algorithms alone, mainly because most of the joints are occluded by clothes. Therefore, if we can locate clearly visible body parts (for example, the face and hands), Inverse Kinematics approaches can solve for the body posture from their positions. We propose a scheme where these visible body parts (henceforth named end-effectors) are automatically located in real-time and fed into an Inverse Kinematics algorithm, which in turn can provide 3D feedback to the vision system (Fig. 3). Two types of Inverse Kinematics algorithms, one analytic and one numeric, have been integrated in the system for the purpose of comparing their relative performance for real-time body posture recovery.

4. The Vision System

We apply chroma-keying, skin-color segmentation and 2D-tracking algorithms to each image of the stereo pair to locate the user's end-effectors in the scene. Then, we combine these results in a 3D-tracking algorithm to robustly estimate their 3D positions in the scene. Fig. 4 shows this process schematically.

Our skin-color detection algorithm finds in real-time the skin-color pixels present in the image. The results of this skin-color detection are skin-color blobs, which are the inputs of our 2D-tracking algorithm. This algorithm labels the blob pixels using a hypothesis set from previous frames [16]. The 2D-tracking results are fed to the 3D-tracking algorithm to robustly obtain an estimate of the end-effectors' 3D positions. Basically, the 3D-tracking algorithm uses a well-known estimation filter to find each end-effector position from all the measurements obtained by the 2D-tracking algorithm. In order to do this, we define an algorithm that first triangulates all the possible combinations of 2D measurements from the two images to obtain the 3D position candidates of each end-effector, and second, for each end-effector, selects the candidate nearest to the position predicted by the estimation filter [17]. Fig. 5 shows the results of this process by back-projecting the corrected end-effector 3D positions into the 2D images of the stereo pair.
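The triangulate-all-pairs, pick-nearest-to-prediction step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `triangulate` stands in for the calibrated stereo triangulation and `predicted` for the estimation filter's one-step prediction (both names are ours).

```python
import numpy as np

def select_3d_candidate(blobs_cam1, blobs_cam2, predicted, triangulate):
    """For one end-effector: triangulate every pairing of 2D blob
    measurements from the two cameras, then keep the 3D candidate
    closest to the position predicted by the estimation filter."""
    candidates = [triangulate(p1, p2)
                  for p1 in blobs_cam1 for p2 in blobs_cam2]
    dists = [np.linalg.norm(c - predicted) for c in candidates]
    return candidates[int(np.argmin(dists))]
```

The exhaustive pairing is cheap here because only a handful of skin-color blobs survive per image; the filter prediction resolves the left/right ambiguity between the two hands.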

Fig. 5: hands and face tracking

As explained before, the data required by the Inverse Kinematics algorithms are the end-effector goal positions. In our case, the end-effectors are the 3D positions of the wrist joints. To locate the wrists from the hand positions we use the 2D ellipses found by the 2D-tracking algorithm, the 3D hand positions from the 3D-tracking algorithm, and the previous 3D positions of the elbows estimated by the IK algorithm. This is done by searching the image for the intersection between a 2D line, defined by the back-projection of the corresponding elbow and 3D hand positions, and the 2D ellipse. Then, from the 2D wrist positions we compute their 3D coordinates by triangulation. Several examples of the wrist location process can be seen in Fig. 6.
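The geometric core of this wrist search, intersecting the back-projected elbow-to-hand line with the tracked hand ellipse, can be sketched as below. The parameterization of the ellipse by center, semi-axes and orientation is our assumption; the paper does not specify its ellipse representation.

```python
import numpy as np

def line_ellipse_intersection(p0, p1, center, a, b, theta):
    """Intersect the 2D line through p0 and p1 with an ellipse given by
    its center, semi-axes (a, b) and orientation theta. Returns the
    0, 1 or 2 intersection points in image coordinates."""
    # Rotate into the ellipse-aligned frame where it is x^2/a^2 + y^2/b^2 = 1.
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])
    q0 = R @ (np.asarray(p0, float) - center)
    q1 = R @ (np.asarray(p1, float) - center)
    d = q1 - q0
    # Quadratic in t for the point q0 + t*d to lie on the ellipse.
    A = (d[0] / a) ** 2 + (d[1] / b) ** 2
    B = 2 * (q0[0] * d[0] / a ** 2 + q0[1] * d[1] / b ** 2)
    C = (q0[0] / a) ** 2 + (q0[1] / b) ** 2 - 1
    disc = B * B - 4 * A * C
    if disc < 0:
        return []
    ts = [(-B - np.sqrt(disc)) / (2 * A), (-B + np.sqrt(disc)) / (2 * A)]
    return [R.T @ (q0 + t * d) + center for t in ts]
```

Of the two intersections, the one on the elbow side of the hand blob would be retained as the 2D wrist position.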

Fig. 6: 3D position of the hands end effectors

Finally, an optional algorithm estimates the location of the user's center of mass, as can be seen in Fig. 4. This algorithm computes the image moments up to order 1 of the user's binary silhouette:

m_ij = Σ_x Σ_y x^i y^j I(x, y)        (1)

where I(x, y) is the pixel value at location (x, y) (1 if the pixel belongs to the binary silhouette, 0 if not). These values are used to find the center of gravity of the human shape in the image:

x_c = m_10 / m_00 ,   y_c = m_01 / m_00        (2)

We triangulate both centers of gravity to obtain an approximation of the user's center of mass.
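Equations (1) and (2) can be computed directly from the binary silhouette; a small sketch (function names are ours):

```python
import numpy as np

def image_moment(I, i, j):
    """Moment m_ij = sum_x sum_y x^i * y^j * I(x, y) of a binary image,
    as in Eq. (1)."""
    ys, xs = np.nonzero(I)  # pixels where I(x, y) = 1
    return np.sum((xs ** i) * (ys ** j))

def silhouette_centroid(I):
    """Center of gravity (x_c, y_c) = (m_10/m_00, m_01/m_00), Eq. (2)."""
    m00 = image_moment(I, 0, 0)
    return image_moment(I, 1, 0) / m00, image_moment(I, 0, 1) / m00
```

Triangulating the two per-image centroids then yields the 3D center-of-mass estimate used by the trunk control of section 5.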

5. Analytic Inverse Kinematics

One of the two Inverse Kinematics techniques tested in the present comparative study relies on an efficient analytic solution for the arm subsystem [19]. In our prior measurement campaign [2] only the arm posture was controlled, according to the algorithm illustrated in Fig. 7. Briefly stated, the analytic IK control of the arm has to ensure that the wrist reaches the target position C' provided by the vision module. The algorithm consists in:

1) rotating the line AC onto AC' by using the shoulder joint (Fig. 7.1);

2) adjusting both the shoulder and the elbow joints so that C slides towards C' along AC' (Fig. 7.2);

3) optimizing the swivel angle, defined as the rotation around the axis AC' (Fig. 7.3).

Details for stages 1 and 2, together with an attraction toward the mid-range posture of the shoulder for optimizing the swivel angle, can be found in [19].


Fig. 7: the three stages of the arm posture control bringing the wrist C onto the goal position C' (stages 1 and 2), followed by the swivel angle optimization (stage 3).
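A planar sketch may help fix ideas about stages 1 and 2: once the swivel plane is chosen, they reduce to placing a two-bone chain on the target with the law of cosines. This simplified 2D version is ours; the paper's 3D formulation, including the swivel-angle search of stage 3, is given in [19].

```python
import numpy as np

def two_bone_ik(shoulder, target, l_upper, l_lower):
    """Planar two-bone analytic IK: returns (shoulder_angle, elbow_flexion)
    placing the wrist at `target`. Unreachable goals are clamped onto the
    boundary of the reachable disc (the fully extended configuration)."""
    v = np.asarray(target, float) - np.asarray(shoulder, float)
    d = min(np.linalg.norm(v), l_upper + l_lower - 1e-9)
    # Law of cosines: interior elbow angle, then flexion = pi - interior.
    cos_elbow = (l_upper**2 + l_lower**2 - d**2) / (2 * l_upper * l_lower)
    elbow_flexion = np.pi - np.arccos(np.clip(cos_elbow, -1.0, 1.0))
    # Shoulder angle = direction to target plus the upper-arm offset.
    cos_off = (l_upper**2 + d**2 - l_lower**2) / (2 * l_upper * d)
    offset = np.arccos(np.clip(cos_off, -1.0, 1.0))
    shoulder_angle = np.arctan2(v[1], v[0]) + offset
    return shoulder_angle, elbow_flexion
```

In 3D the one remaining degree of freedom, the rotation of this plane around the shoulder-wrist axis, is exactly the swivel angle optimized in stage 3.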

The present version allows an additional control of the trunk that proves necessary even if the user is interacting only through arm gestures. Indeed the trunk contributes both to reach actions and to preserving the balance of the whole body by bending and twisting. For this reason the algorithm first estimates the main axes of the trunk from information delivered by the vision system:

1) Longitudinal axis: normalized vector difference between the position of the head and that of the vision-based estimated center of mass.

2) Transversal axis: derived from the longitudinal axis and the line linking the hands.

Fig. 8: definition of the longitudinal and the transversal axes from the estimation of the silhouette center of gravity (cyan) and the head position (edited image).

The Analytic IK algorithm now proceeds by first re-orienting the trunk, and second by controlling the arm postures with the updated positions of the shoulder joints. A new search method for the optimal swivel angle (attracted toward the shoulder mid-range angle) has been exploited in the present study [18].
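The axis construction of section 5 can be sketched as follows. The cross-product orthogonalization is our reading of the text, not the paper's stated formula, and the function and variable names are ours.

```python
import numpy as np

def trunk_axes(head, com, left_hand, right_hand):
    """Estimate a trunk frame from vision data: the longitudinal axis is
    the normalized head-minus-center-of-mass vector; the line linking the
    hands gives a lateral direction; the transversal axis is then made
    orthogonal to both via cross products."""
    longitudinal = head - com
    longitudinal = longitudinal / np.linalg.norm(longitudinal)
    lateral = right_hand - left_hand
    transversal = np.cross(longitudinal, lateral)   # front-facing axis
    transversal = transversal / np.linalg.norm(transversal)
    lateral = np.cross(transversal, longitudinal)   # re-orthogonalized
    return longitudinal, transversal, lateral
```

This estimate degenerates when the hands lie on the longitudinal axis; a practical implementation would fall back to the previous frame in that case.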

6. Numeric Inverse Kinematics

Our previous comparative study demonstrated the relevance of exploiting a numeric Inverse Kinematics algorithm for recovering the arm posture in real-time [2]. The major difference with an analytic approach is the dependency on a first-order approximation, requiring a convergence loop that usually needs more than one iteration to reach the goal positions. However, the 0.8 ms computing cost of a single iteration (2.6 ms for a convergence loop of 5 iterations) was low enough to ensure a 24 fps refresh rate of the whole system (22 fps for 5 iterations). This is to be compared with the 1 ms computing cost of the analytic IK [2]. In addition, the 15 degrees of freedom exploited in that body model were not optimally distributed (3 degrees of freedom per elbow joint).

6.1. The Simplified Articulated Body Model

The present study also exploits a 15-degrees-of-freedom articulated body model, now distributed as follows (Fig. 9):

1) Virtual foot (2 dofs): roots the body to the floor, with frontal and lateral axes of rotation;

2) Back (2 dofs): corresponds to the beginning of the spine, with frontal and lateral axes of rotation;

3) Thorax (3 dofs): all rotation axes;

4) Shoulders (2 × 3 dofs): all rotation axes;

5) Elbows (2 × 1 dof): only the flexion-extension.

Fig. 9: simplified body model exploited by the numeric Inverse Kinematics. The kinematic chain involved in controlling the left wrist end effector is highlighted in black. The leg and trunk joints are also used by the right wrist.

The wrists are end effectors controlled by the IK algorithm. Each of them exploits the mobility of the three joints modeling the leg and the trunk. Besides, the joints are assigned joint limits to prevent unnatural postures from appearing; in particular, the shoulder and elbow joint limits are critical to prevent self-collision or the fully flexed singular posture. The key motivation for modeling the simplified leg and trunk is to associate an approximate mass distribution so that the IK algorithm can simultaneously ensure the balance of the body by controlling the position of its center of mass [20]. More precisely, only the two dimensions of its projection on the floor are maintained fixed at their location from the calibration stage (Fig. 9). As a consequence the center of mass is free to move up and down, but not to move outside the supporting surface constituted by the feet. This constraint is given a high priority as it guides the convergence through balanced intermediate postures. For the present study the mass is distributed as follows: 50% is attached to the Virtual Foot joint, roughly at the height of the mid thigh; 25% is attached to the Back joint, at mid distance to the Thorax joint; the final 25% is attached to the Thorax joint, at mid-shoulders distance.
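The 50/25/25 mass distribution and the floor-projection constraint can be sketched as below. For simplicity the sketch attaches each mass directly at a joint position, whereas the paper offsets them (mid-thigh, mid-distance to the Thorax, mid-shoulders); the names are illustrative.

```python
import numpy as np

def body_center_of_mass(p_virtual_foot, p_back, p_thorax):
    """Weighted center of mass under the paper's simplified distribution:
    50% at the Virtual Foot attachment, 25% at the Back, 25% at the
    Thorax, given the 3D attachment points from forward kinematics."""
    masses = np.array([0.50, 0.25, 0.25])
    points = np.stack([p_virtual_foot, p_back, p_thorax])
    return masses @ points  # convex combination of the attachment points

def balance_error(com, calibrated_floor_point):
    """Horizontal (x, z) deviation of the CoM projection from its
    calibrated floor location -- the quantity the balance constraint
    drives to zero. The vertical component is left free."""
    return np.array([com[0] - calibrated_floor_point[0],
                     com[2] - calibrated_floor_point[2]])
```

Keeping only the two horizontal components in the constraint is what lets the model crouch or stretch while staying over the feet.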

6.2. Overview of the Prioritized IK

The multiple-priority IK (also called Prioritized IK, or PIK) is exploited for reconstructing a believable posture of the user (i.e., the joint state θ) from the 3D locations of selected end-effectors, measured with the vision system and used to constrain the posture (noted x). We give here a general overview of the method, while section 6.4 describes the specific set of constraints and priorities exploited in this study.

Our general architecture is based on the linearization of the set of equations expressing the Cartesian constraints x as functions of the joint degrees of freedom θ. We denote by J the Jacobian matrix gathering the partial derivatives ∂x/∂θ. We use its pseudo-inverse, noted J+, to build the projection operators on the kernel of J, noted N(J). Our approach relies on an efficient computation of projection operators allowing the constraint set to be split into multiple constraint subsets, each associated with an individual strict priority level [8]. The provided solution guarantees that a constraint associated with a high priority is achieved as much as possible, while a low-priority constraint is optimized only in the reduced solution space that does not disturb any higher-priority constraint. For example, such an architecture is particularly suited for the off-line evaluation of the reachable space of a virtual worker; in such a context the balance constraint is given the highest priority while gaze and reach constraints have lower priority levels [21].
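The priority mechanism can be illustrated with the classical null-space recursion: each task is solved in the kernel of all higher-priority tasks, so lower priorities cannot disturb them. This compact textbook form is only a sketch; [8] describes a more efficient formulation of the projection operators.

```python
import numpy as np

def prioritized_step(J_list, dx_list, n_dof, damping=1e-6):
    """One first-order step of prioritized IK. J_list and dx_list hold the
    task Jacobians and desired constraint variations, highest priority
    first. Returns the joint variation dtheta."""
    dtheta = np.zeros(n_dof)
    N = np.eye(n_dof)  # projector on the null space of all tasks so far
    for J, dx in zip(J_list, dx_list):
        JN = J @ N
        # Damped pseudo-inverse of the projected Jacobian, for robustness
        # near singular postures.
        JN_pinv = JN.T @ np.linalg.inv(JN @ JN.T + damping * np.eye(J.shape[0]))
        dtheta = dtheta + JN_pinv @ (dx - J @ dtheta)
        N = N @ (np.eye(n_dof) - JN_pinv @ JN)
    return dtheta
```

When a low-priority task conflicts with a higher one, its projected Jacobian JN loses rank and the task is simply achieved as well as the remaining null space allows.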

Fig. 10 provides an overview of our Prioritized Inverse Kinematics control. The outer convergence loop is necessary because the linearization is valid only within the neighborhood of the current state; this requires limiting the norm of any desired constraint variation ∆x toward its respective goal to a maximum value, and iterating the computation of the prioritized solution ∆θ until the constraints are met or until the sum of the errors reaches a constant value. The figure also highlights the clamping loop handling the inequality constraints associated with the mechanical joint limits. Basically, we check whether the computed prioritized solution ∆θ would violate one or more joint limits. If so, equality constraints are inserted to clamp the flagged joints at their limits, and a new prioritized solution is searched for in the reduced joint space. Full algorithmic details can be found in [8], while a more conceptual explanation is given in [15].

Fig. 10: The outer loop iterates the construction of the first-order solution with priorities (inner loop) and joint limit enforcement (clamping loop).
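The clamping loop can be sketched independently of the priority machinery. Here `solve` is a stand-in for one prioritized-IK solution restricted to the unclamped joints; the structure (clamp offenders, re-solve in the reduced joint space) follows the description above, while the names and loop bound are ours.

```python
import numpy as np

def solve_with_clamping(solve, theta, lo, hi, max_rounds=10):
    """Joint-limit enforcement by clamping: compute a step over the free
    joints, pin any joint the step drives past its limit, and re-solve in
    the reduced joint space until no new limit is violated."""
    free = np.ones(theta.size, dtype=bool)
    for _ in range(max_rounds):
        dtheta = solve(free)                 # step over unclamped joints only
        new = theta + dtheta
        viol = free & ((new < lo) | (new > hi))
        if not viol.any():
            return np.clip(new, lo, hi)
        theta = np.where(viol, np.clip(new, lo, hi), theta)  # pin offenders
        free &= ~viol
    return np.clip(theta, lo, hi)
```

Each round can only remove joints from the free set, so the loop terminates after at most one round per joint.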

6.3. Specific Issues in the Vision-Driven Real-Time Context

The classical stability-robustness tradeoff in inverse problems finds a clear illustration in our vision-driven Inverse Kinematics case study. We have to deal with multiple sources of instability while trying to ensure that the end effectors reach their assigned goals sufficiently quickly. Let us review the key sources of potential instability or slow-down and how we handle them in our IK framework:

• Noisy input: the low image resolution and the combination of various uncertainties result in jitter in the 3D location of the wrists' goals. To avoid this instability being transmitted to the articulated structure, the vision module can filter the goal locations, or the IK module can increase the damping factor exploited to handle the singularities of the posture [22]. In both scenarios, there is a risk of introducing a lag when the wrists move fast over large distances. We have successfully experimented with a combination of these two possibilities.

• Approximate skeleton: the skeleton is built in the initial calibration posture (Fig. 9), where the approximate locations of the Back, Shoulders, Elbows and Wrists are provided by manually pointing on the stereo pair images. These 3D vectors are then provided to the skeleton initialization function, which infers the locations of the Virtual Foot joint and the Thorax joint from the distance between the Back and the mid-shoulder location. This phase may introduce some errors in the arm segment lengths, which prevent the exact matching of the real skeleton with the model in other postures. Special care is required in the calibration phase.

• Singular postures: a posture is singular for the IK algorithm when no solution can be computed for the desired goals; for example, when a wrist goal is too far away and unreachable. More surprisingly, a fully extended arm is singular when the wrist goal is on the line linking the wrist to the shoulder [15][22]. Similarly, a fully flexed arm, where the upper and lower arms superimpose, is also a singular posture when the wrist goal is on the line linking the shoulder to the elbow. This is due to the first-order approximation of the IK algorithm: in such a context the norm of the pseudo-inverse solution grows to infinity. The solution is to adopt a damped least-squares inverse [22][8] and to tune the damping factor to a value that removes the instability without introducing a lag. The fully flexed singular posture is removed owing to the elbow flexion joint limit.

• The fully extended arm posture deserves a special mention as a source of trouble. The previous point described how to remove the instability due to this singularity; this works fine when the wrist goal is too far away. However, damping tends to prevent the arm from flexing when the goal is on the line linking the wrist to the shoulder. As a consequence it may slow down the convergence or, worse, result in other joints of the skeleton taking over the task of moving the wrist to its goal location. Needless to say, the resulting posture, while possible, may not be very plausible. The problem is that the fully extended arm is a very naturally adopted posture; it even occurs in the calibration posture. We have analyzed this problem and proposed a solution through the concept of observers [15]. Basically, we observe the conjunction of the fully extended arm posture with the occurrence of a wrist goal in the shoulder direction (within a tolerance). When this condition is met we activate an elbow flexion increment within the low-level optimization task (cf. section 6.4). This temporarily forces the arm to flex and speeds up the convergence.

• First-order approximation and conflicting constraints: the prioritized IK architecture theoretically ensures that conflicting constraints can exploit the same joints. However, this is valid only in the neighborhood of the current state, hence requiring the definition of this validity domain through thresholds enforced on the desired ∆x. Improper threshold values are a common source of instability and require some tuning.

• Performance: our prior work demonstrated the feasibility of integrating a numeric IK algorithm within a real-time motion capture loop [7]. This is also the case with the present system; it has been implemented in Visual C++ using the OpenCV libraries [23] and has been tested in a real-time interaction context on an AMD Athlon 2800+ under Windows XP. The images have been captured using two DFW-500 Sony cameras with an IEEE 1394 connection. The cameras provide 320x240 images at a capture rate of 30 frames per second (more details are given in Table 2).
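The damped least-squares inverse mentioned for singular postures above can be written compactly. A minimal sketch showing how the damping factor bounds the step near a singular configuration:

```python
import numpy as np

def dls_step(J, dx, lam):
    """Damped least-squares step: dtheta = J^T (J J^T + lam^2 I)^-1 dx.
    With lam = 0 this reduces to the plain pseudo-inverse, whose norm
    blows up near a singular (e.g. fully extended) posture; lam > 0
    bounds the step at the price of some tracking lag."""
    m = J.shape[0]
    return J.T @ np.linalg.solve(J @ J.T + (lam ** 2) * np.eye(m), dx)
```

Tuning lam is exactly the stability-lag tradeoff discussed in this section: large enough to tame the jitter near singularities, small enough not to slow fast wrist motions.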
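The observer for the fully extended arm can be sketched as a simple geometric test; the tolerance values below are illustrative, not the paper's, and the function name is ours.

```python
import numpy as np

def extended_arm_observer(shoulder, elbow, wrist, goal,
                          straight_tol=0.05, align_tol=0.1):
    """Detect the troublesome configuration of section 6.3: the arm is
    (nearly) fully extended AND the wrist goal lies along the
    shoulder-wrist line. When both hold, the caller should inject an
    elbow-flexion increment into the lowest-priority optimization task."""
    u = wrist - shoulder
    reach = np.linalg.norm(elbow - shoulder) + np.linalg.norm(wrist - elbow)
    nearly_straight = np.linalg.norm(u) > (1 - straight_tol) * reach
    dir_goal = (goal - shoulder) / np.linalg.norm(goal - shoulder)
    dir_arm = u / np.linalg.norm(u)
    aligned = np.linalg.norm(np.cross(dir_goal, dir_arm)) < align_tol
    return bool(nearly_straight and aligned)
```

While the observer fires, the flexion increment temporarily replaces the joint-amplitude minimization term for the elbow (section 6.4), pushing the arm off the singularity.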

6.4. Constraints Hierarchy

The approach presented in this paper relies on four levels of priorities, ensuring not only the user-defined goals (wrist positions) but also general properties of the posture space such as balance. Due to its great importance in the overall quality of the posture, the constraint enforcing the body balance is given the highest priority. This means that all other constraints, of lower priority, are solved in the sub-space of the balanced postures. It therefore ensures that the intermediate postures show at least this quality before all constraints are achieved. Such a channeling of the convergence has two positive consequences: first, it removes some classes of local minima that would otherwise have occurred in an approach without priorities, and second, the intermediate postures, being balanced, are better accepted by the viewer even if all constraints are not yet met. This is important in a real-time context, as our time budget may only allow for one or a few IK convergence iterations per frame.

The second-rank constraint attracts the wrist end effectors toward their vision-driven goal positions. The articulated chain from each wrist to the Virtual Foot root contributes to the achievement of the wrist constraint, as experienced for far reaches (e.g. in Fig. 1). The third-rank constraint attracts the shoulders' centers of rotation toward their initial locations in the calibration standing posture. Finally, at the lowest priority level we add an optimization vector expressed in the joint variation space (see [8] for details). This vector is composed of two distinct components:

1) Minimization of the joint amplitude: this corresponds to an attraction toward the initial standing posture. It resolves the ambiguity of the swivel angle by attracting the elbow along the trunk.

2) "Full extension" avoidance term for the elbow: when the condition described in section 6.3 is met, a flexion term replaces the value proposed by the joint minimization for the elbow.

Table 1 below summarizes the hierarchy of four priority levels.

Constraint                                              Priority Rank
Projection of the center of mass over the
virtual foot to maintain the balance                          1
Wrists position control                                       2
Shoulders position control                                    3
Joint-space optimization vector, minimizing
the joint amplitude and conditionally avoiding
the extended-arm singularity                                  4

Table 1: Hierarchy of prioritized constraints

7. Results

We first evaluated the IK architecture on two simple test cases: first, flexing the arm to maximum flexion and extending back to full extension, and second, raising one arm laterally up to the horizontal level and then trying to reach the furthest possible point. Despite its simplicity, the first test case highlighted the singularity issues described in section 6.3. The observer concept proves to be well-adapted to the present real-time context. The second test case is illustrated in more detail in the following subsections. Performances are discussed in section 8.

7.1. Far Lateral Reach with Prioritized IK

This test case underlines the interest of the center of mass control to maintain the balance of the body. Indeed, when the arm moves sideways to reach a distant point in space, the lower body moves in the opposite direction so that the center of mass still projects over the virtual foot (Fig. 11). Note that the (optionally computed) image center of mass is not necessary here.

Fig. 11: The constraint associated with the center of mass ensures the balance of the whole model (numeric IK solution)
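The balance behaviour above amounts to a task-space error driving the ground projection of the whole-body center of mass toward the virtual foot. A minimal sketch follows; the segment mass model, the vertical-axis convention (y up), and all names are illustrative assumptions.

```python
import numpy as np

def com_ground_projection(positions, masses):
    """Project the whole-body center of mass onto the ground plane (y up).

    positions : (n, 3) array of segment center-of-mass positions
    masses    : (n,) array of segment masses
    """
    com = masses @ positions / masses.sum()
    return np.array([com[0], 0.0, com[2]])  # drop the vertical component

def balance_error(positions, masses, foot_xz):
    """Task-space error pulling the COM projection over the virtual foot."""
    proj = com_ground_projection(positions, masses)
    return foot_xz - proj[[0, 2]]
```

Feeding this error to the rank-1 constraint makes the lower body drift opposite to the reaching arm, producing the counter-balanced posture of Fig. 11.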


7.2. Importance of the Trunk Postural Control

This comparison highlights the interest of controlling the trunk posture even if the user is standing still in front of the workbench. The left column of Fig. 12 displays both the numeric and the analytic IK solutions. The latter is computed without trunk control (as in [2]), which prevents the left wrist from reaching its goal in the bottom image. The numeric solution better reflects the user posture owing to the center of mass control, which induces the natural counter-balanced posture. The right column illustrates only the analytic solution with the trunk orientation control described in section 5. The recovered movement is much more realistic with that improvement.

Fig. 12: Analytic IK solution (in red). Left: only hand attraction. Right: exploiting the head and the center of mass information.

7.3. Handling an Imaginary Box

In this case study the user acts as if carrying an imaginary box from side to side. The movement involves a complex deformation of the trunk. Both approaches successfully recover the shoulder locations through their respective trunk postural control. On the other hand, the elbow location, determined by the swivel angle optimization, reflects the specific and somewhat arbitrary choice made by each approach.
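The swivel-angle parameterization at play here places the elbow on a circle around the shoulder-wrist axis, as in the analytic arm IK of [9]. The sketch below illustrates the geometry; the 'down' reference vector and the function signature are assumptions, and the degenerate case of an arm aligned with the reference is not handled.

```python
import numpy as np

def elbow_position(shoulder, wrist, upper_len, fore_len, swivel):
    """Place the elbow on its swivel circle (geometry as in [9], sketch only)."""
    axis = wrist - shoulder
    d = np.linalg.norm(axis)
    n = axis / d
    # Law of cosines: offset of the circle center along the shoulder-wrist axis.
    cos_a = (upper_len**2 + d**2 - fore_len**2) / (2 * upper_len * d)
    center = shoulder + cos_a * upper_len * n
    radius = upper_len * np.sqrt(max(0.0, 1.0 - cos_a**2))
    # Orthonormal frame (u, v) spanning the circle plane; the 'down'
    # reference vector is an illustrative convention, not from the paper.
    ref = np.array([0.0, -1.0, 0.0])
    u = ref - (ref @ n) * n
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return center + radius * (np.cos(swivel) * u + np.sin(swivel) * v)
```

Every swivel value yields arm and forearm segments of the correct lengths, which is precisely why partial input (hands and head only) leaves the elbow choice to each solver's own optimization.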

Fig. 13: Moving an imaginary box (real-time duration: 5.5 s). Red: analytic solution; white: numeric solution.


8. Discussion and Conclusion

It can be seen from Fig. 13 that both the analytic and the numeric IK can recover plausible postures from the partial information provided by the vision system. Since no information was available to locate the elbow, each approach had to make a somewhat arbitrary decision about the optimal swivel angle for the arm. In the present study, the analytic solution reflects an attraction toward the shoulder mid-range posture, while the numeric solution retains an attraction of the upper arm along the body. This can lead to visible differences between the user and the recovered posture. The difference depends on the nature of the movement; for the time being we believe there is no single shoulder posture that can serve as an optimal attractor for all movements. It remains to be evaluated whether the user could be annoyed by this postural difference during real-time interaction. Our guess is that it will depend on the intended application; for example, if the user wishes to pilot the posture of a virtual worker moving in a cluttered environment, it is probably important to offer a finer postural control including the elbow location. In contexts such as performance animation or other less demanding applications, the present solution can be sufficient.

Regarding other criteria for the comparison of the two IK techniques (Table 2), both are still sensitive to noisy input, and continuity degrades as the postures approach singular cases. The computing cost is similar to the ones measured in the prior, simpler IK architectures [2]. The performance of the analytic IK has since improved, from 1 ms to 0.65 ms on average, but its dependency on the computation of the image center of mass adds a supplementary cost of 7 ms. A single iteration of the numeric IK is more expensive, from 0.8 ms to 1.7 ms, due to the higher complexity of the kinematic chain, with seven joints instead of four, and the additional priority level. All in all, five iterations per frame now cost roughly the same as one analytic solution update, leading to a refresh rate around 20 fps. This is sufficient for real-time interactions.

We foresee the following directions for future work. First, a better user skeleton is needed, both in terms of correspondence with the real skeleton sizes and in terms of anatomic mobility, by including the clavicles, both legs and more vertebrae in the spine. It would also be desirable to have a parameterized mass distribution model to match a wider population of subjects. In the longer term we wish to investigate the problem of controlling the posture of a virtual human that has a different size and body proportions than oneself. This is clearly important for applications evaluating a product for a large population of potential users. Providing an answer to this question will truly empower the user through one's full body motion.

Table 2: Evaluation criteria for the comparison of the two IK approaches

9. Acknowledgements

This work was partly supported by the European Union through the Networks of Excellence ENACTIVE and INTUITION. The project TIN2004-07926 of Spanish Government and the European Project HUMODAN 2001-32202 from UE V Program-IST have subsidized part of this work. J.Varona acknowledges the support of a Ramon y Cajal fellowship from the Spanish MEC.

References

[1] Sturman, D.: Computer Puppetry. IEEE CGA 18(1) (1998) 38-45

[2] Boulic, R., Varona, J., Herbelin, B., Unzueta, L., Suescun, A., Jaume, A., Perales, F., Thalmann, D.: Vision-Based Comparative Study of Analytic and Numeric Inverse Kinematic Techniques for Recovering Arm Movements. On-Line Abstracts Proc. of the First Enactive Conference, Pisa, March 2005

[3] Badler, N.I., Hollick, M.J., Granieri, J.P., Real-Time Control of a Virtual Human Using Minimal Sensors, Presence: Teleoperators and Virtual Environments, 2(1), pp. 82-86, 1993.

[4] Molet T., Boulic R., Rezzonico S., Thalmann, D.: An architecture for immersive evaluation of complex human tasks, IEEE TRA 15(3) (1999)

[5] Bodenheimer R., Rose, C., Rosenthal, S., Pella, J., The Process of Motion Capture: Dealing with the Data, In proceedings of Computer Animation and Simulation, pp. 3-18, September 1997.

[6] O'Brien, J., Bodenheimer, R.E., Brostow, G.J., Hodgins, J.K., Automatic Joint Parameter Estimation from Magnetic Motion Capture Data, In Proceedings of Graphics Interface, pp. 53-60, May 2000.


[7] Peinado, M., Herbelin, B., Wanderley, M., Le Callennec, B., Boulic, R., Thalmann, D., Meziat, D.: Towards Configurable Motion Capture with Prioritized Inverse Kinematics. In SENSOR04, Proc. of the Third International Workshop on Virtual Rehabilitation (IVWR’04), (2004) 85-97, Lausanne

[8] Baerlocher, P., Boulic, R.: An Inverse Kinematic Architecture Enforcing an Arbitrary Number of Strict Priority Levels. The Visual Computer 20(6), (2004)

[9] Tolani, D., Goswami, A., Badler, N.I.: Real-Time Inverse Kinematics Techniques for Anthropomorphic Limbs. Graphical Models 62(5) (2000) 353-388

[10] Chai, J., Hodgins, J. K., Performance Animation from Low-dimensional Control Signals, ACM Transactions on Graphics (SIGGRAPH 2005).

[11] Moeslund, T.B., Granum, E., "A Survey of Computer Vision-Based Human Motion Capture", Computer Vision and Image Understanding, 81(3):231-268, 2001.

[12] Sidenbladh, H., Black, M.J., “Learning the statistics of people in images and video”, International Journal of Computer Vision, 54(1-3):183-209, 2003.

[13] Wachter, S., Nagel, H.H., "Tracking Persons in Monocular Image Sequences", Computer Vision and Image Understanding, 74(3):174-192, 1999.

[14] Wren, C.R., Clarkson, B.P., Pentland, A.P., "Understanding purposeful human motion", Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 378-383, 2000.

[15] Boulic, R., Peinado, M., Le Callennec, B., Challenges in Exploiting Prioritized Inverse Kinematics for Motion Capture and Postural Control, to appear in Lecture notes in Artificial Intelligence, Springer Verlag, 2005.

[16] J. Varona, J.M. Buades, F.J. Perales, “Hands and face tracking for VR applications”, Computers & Graphics, 29(2):179-187, 2005.

[17] J. Varona, R. Boulic, F.J. Perales, B. LeCallennec, M. Peinado, “Toward vision-based full-body user performance animation for human-computer interaction”, submitted to Computer Vision and Image Understanding, 2005.

[18] Unzueta, L., Berselli, G., Cazón, A., Lozano, A., Suescun, Á., "A Fast Inverse Kinematics Method for a Markerless Human Motion Capture System", submitted for publication.

[19] Unzueta, L., Berselli, G., Cazón, A., Lozano, A., Suescun, Á., "Genetic Algorithms Application to the Reconstruction of the Human Motion Using a Non-Invasive Motion Capture", Multibody Dynamics 2005, ECCOMAS Thematic Conference, Madrid, Spain, 21-24 June 2005.

[20] Boulic, R., Mas, R., Thalmann, D., "A Robust Approach for the Center of Mass Position Control with Inverse Kinetics", Journal of Computers and Graphics, 20(5), Elsevier, 1996.

[21] Boulic, R., Baerlocher, P., Rodríguez, I., Peinado, M., Meziat, D.“Virtual Worker Reachable Space Evaluation with Prioritized Inverse Kinematics”, 35th International Symposium on Robotics, March 2004 Paris

[22] Maciejewski, A.A., "Dealing with the Ill-Conditioned Equations of Motion for Articulated Figures", IEEE CGA, 10(3), pp. 63-71, 1990.

[23] Bradski, G.R., Pisarevsky V., "Intel's Computer Vision Library", Proc. of IEEE Conference on Computer Vision and Pattern Recognition, CVPR00, v. 2: 796-797, 2000
