University of Bern University of Neuchatel University of Fribourg
EVALUATION OF POINTING STRATEGIES FOR MICROSOFT KINECT SENSOR DEVICE
FINAL PROJECT REPORT
(Master of Science in Computer Science)
Student: Daria NITESCU
Referees: Denis Lalanne, Matthias Schwaller
14 February 2012
Abstract
In this final master project, I investigate different strategies for pointing and selecting
from a distance using only the human hand, tracked by the Microsoft Kinect
Sensor Device for Xbox 360. The implemented system provides hands-free
interaction with a gesture-based interface. The user interacts with the graphical
interface by performing a “point and click” action on a fixed number of targets.
This report describes the Kinect technology and its applications, as well as the
available software development tools. I present different selection strategies that
allow the user to perform freehand gestures to accomplish a task. The project
introduces two selection strategies, the temporal one and the dart one. The temporal
strategy takes into consideration the time parameter, while the dart strategy considers
the distance parameter. Then I introduce the visual feedback strategies developed in
my project, which are based on geometrical figures and colors, with the goal of showing
the user how the system recognizes the pointing and selections. In order to achieve the
desired user interaction, I developed the system on Linux, using the OpenNI
framework and NITE middleware, the Eclipse platform and the C++ programming language.
Five performance strategies were implemented considering the time and the distance
parameters. Only three strategies were tested with real users, with a defined selection of
targets stored in the configuration files.
At the end of the project, I provide an evaluation of the performance of the
implemented pointing strategies using Fitts’s Law. A detailed statistical evaluation
is presented for the three tested strategies, using an ANOVA test. The higher the
index of performance, the better the strategy. The calculated index of performance
shows that the Hand to Kinect Relative Distance (2) strategy has the best performance.
The ANOVA test confirms that the Hand to Kinect Relative Distance (2) is
statistically better than the other two strategies.
Contents
List of Figures
List of Graphs
List of Tables
Figure 4.9: Hand to Kinect Relative Distance Approach (2) – graphical
representation
As feedback for the user, for both Hand to Kinect Relative Distance algorithms,
the size s of the hand decreases when the user comes closer to the Kinect sensor and
increases when he moves farther away. The user can see that he has performed a click
when the color of the hand becomes darker and the radius of the hexagon becomes very
small (it is set to 0.1).
s = h->pastData.back() - h->pastData.front();
s = 20 * (DIST_CLICK - s) / DIST_CLICK;
For example, in the first Hand to Kinect Relative Distance approach, I have the
pastData array, whose dimension is equal to FRAMES_CLICK, set in the
configuration file. For the following example its value is 15. The DIST_CLICK
parameter is 10. The steps showing the user feedback for the Hand to Kinect Relative
Distance approach are as follows:
The user starts to move his hand in front of the Kinect. I register in
pastData all the distances between the user’s hand and the Kinect, as
follows:
pastData: 14
pastData: 13.9, 14
pastData: 13.8, 13.9, 14
….
pastData: 12.5, 13, 13.1, 13.5, ….. 13.8, 13.9, 14
pastData: 12, 12.5, 13, 13.1, 13.5, ….. 13.8, 13.9, 14
After the array pastData is full of elements, the size of the hand (the radius of the
hexagon) starts to change and shows the user whether he is approaching or moving
away from the Kinect.
Step 1: pastData: 11, 12, 12.5, 13, 13.1, 13.5, ….. 13.8, 13.9, 14
s = 20 * (10 - (14 - 11)) / 10 = 14    start a click
Step 2: pastData: 10.8, 11, 12, 12.5, 13, 13.1, 13.5, ….. 13.8, 13.9
s = 20 * (10 - (13.9 - 10.8)) / 10 = 13.8    process of clicking (the radius of the
hexagon is decreasing → user is approaching the Kinect)
….
Step 14: pastData: 3, …., 10.8, 11
s = 20 * (10 - (11 - 3)) / 10 = 4
Step 15
Even if the computed size of the user’s hand is negative, the radius needs to have a
positive value for the purpose of the feedback, so it is set to 1.0.
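The computation walked through in the steps above can be sketched in C++ as follows.
This is a minimal illustration under stated assumptions, not the project's actual code:
the function name feedbackSizeRelative1 and the use of std::deque are hypothetical, the
newest sample is assumed to sit at the front of pastData as in the listings, and the
fallback to 1.0 is applied to any non-positive result. Checking that pastData already
holds FRAMES_CLICK samples is left to the caller.

```cpp
#include <cassert>
#include <deque>

// Hypothetical helper: feedback radius for the first Hand to Kinect Relative
// Distance approach. pastData holds hand-to-Kinect distances, newest sample
// at the front and oldest sample at the back.
double feedbackSizeRelative1(const std::deque<double>& pastData, double distClick) {
    assert(!pastData.empty() && distClick > 0.0);
    // Distance covered towards the Kinect over the stored frames.
    const double moved = pastData.back() - pastData.front();
    const double s = 20.0 * (distClick - moved) / distClick;
    // Assumption: any non-positive size is drawn with the fallback radius 1.0.
    return s > 0.0 ? s : 1.0;
}
```

With the values of Step 1 (oldest sample 14, newest sample 11, DIST_CLICK equal to 10)
the function returns 14, matching the hand computation.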
For example, in the second Hand to Kinect Relative Distance approach, the only
difference is that I always compare with the minimum distance reached in the frame
interval (pd-minimum).
Step 1: pastData: 12, 11, 12.5, 12, 13, 13.1, 13.5, ….. 13.8, 13.9, 14
s = 20 * (10 - (14 - 11)) / 10 = 14    start a click
Step 2: pastData: 11.8, 12, 11, 12.5, 12, 13, 13.1, 13.5, ….. 13.8, 13.9
s = 20 * (10 - (13.9 - 11)) / 10 = 14.2    process of clicking (the radius of the
hexagon is increasing → user is going farther from the Kinect)
Step 3: pastData: 10.8, 11.8, 12, 11, 12.5, 12, 13, 13.1, 13.5, ….. 13.8
s = 20 * (10 - (13.8 - 10.8)) / 10 = 14    process of clicking (the radius of the
hexagon is decreasing → user is approaching the Kinect)
….
Step 14: pastData: 3, …., 12, 11
s = 20 * (10 - (11 - 3)) / 10 = 4
Step 15: pastData: 0.7, 3, …., 12
s = 20 * (10 - (12 - 0.7)) / 10 = -2.6    CLICK
As we can see, the feedback becomes more intuitive for the user because it follows
the user movements more closely.
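The difference between the two approaches is small enough to show in a few lines of
C++. The sketch below is an illustration only: feedbackSizeRelative2 is a hypothetical
name, std::deque is an assumed container, and the fallback to 1.0 for non-positive
values follows the rule stated for the first approach. The oldest stored distance is
compared with the minimum of the window instead of the newest sample.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical helper: feedback radius for the second Hand to Kinect Relative
// Distance approach. The reference point is the closest distance reached in
// the stored frame interval (the "pd-minimum"), not the newest sample.
double feedbackSizeRelative2(const std::deque<double>& pastData, double distClick) {
    assert(!pastData.empty() && distClick > 0.0);
    const double closest = *std::min_element(pastData.begin(), pastData.end());
    const double moved = pastData.back() - closest;  // oldest sample minus minimum
    const double s = 20.0 * (distClick - moved) / distClick;
    // Assumption: any non-positive size is drawn with the fallback radius 1.0.
    return s > 0.0 ? s : 1.0;
}
```

For Step 2 above (oldest sample 13.9, minimum 11) the result is 14.2, even though the
newest sample is 11.8.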
4.2.3. Hand to Kinect Absolute Approach
The goal of the Hand to Kinect Absolute method is to perform a click by passing a
hand through an imaginary boundary (a threshold) between the user and the Kinect
Sensor Device.
A click is initialized if the distance d between the Kinect device and the user’s
hand on the Z axis reaches the necessary threshold, which is the distance
DIST_CLICK, set in the configuration file. Another condition for a click is that the
user’s hand should not be still, so the flag idle is set to 1.
If the user did not pass the boundary, because his hand is outside the threshold area,
the click is not performed (Figure 4.10).
Figure 4.10: Hand to Kinect Absolute Approach – graphical representation
The user knows that he did not move the hand on the screen because he sees the
feedback of the hand size set at 1.0 unit. When the user starts to move the hand, the
size of the hand is directly proportional to the difference between the current
distance of the hand (distance) and the distance of the boundary (the invisible wall),
which is DIST_CLICK. The hand’s size decreases while the user’s hand is approaching
the Kinect device.
if (h->idle) // idle == 1
    return 1.0;
s = (h->distance - DIST_CLICK) / 50;
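A self-contained version of this feedback rule, together with the boundary test it
visualizes, might look as follows. This is a sketch under assumptions: the Hand struct,
the function names and the reading of "reaching the threshold" as distance <= DIST_CLICK
are hypothetical; only the idle check and the division by 50 come from the fragment
above.

```cpp
#include <cassert>

// Hypothetical stand-in for the tracked hand state used in the fragment above.
struct Hand {
    bool idle;        // true while the hand has not moved
    double distance;  // current hand-to-Kinect distance on the Z axis
};

// Feedback size: 1.0 while idle, otherwise proportional to the remaining
// distance between the hand and the invisible wall at distClick.
double feedbackSizeAbsolute(const Hand& h, double distClick) {
    assert(distClick > 0.0);
    if (h.idle) {
        return 1.0;
    }
    return (h.distance - distClick) / 50.0;
}

// Assumption: the boundary counts as reached once the hand is at or closer
// than distClick to the Kinect.
bool boundaryReached(const Hand& h, double distClick) {
    return h.distance <= distClick;
}
```

The size shrinks linearly to 0 as the hand approaches the wall, which is what lets the
user anticipate the click.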
4.3. Summary and Discussion
A system was implemented to evaluate the performance of different pointing
strategies using the Kinect. The system runs on Linux using the OpenNI
framework and NITE middleware. The project was developed in the Eclipse platform,
using the C++ programming language.
The project provides a GUI which allows the user to click on a fixed number of
targets and generates statistics about user interaction efficiency.
The application supports different ways of interacting with the Kinect device to
perform clicking. Using a configuration file, the user can choose which clicking
strategy to use, as well as various parameters for configuring it.
I introduce two selection strategies, the temporal one and the dart one. The
temporal strategy takes into consideration the time parameter, while the dart strategy
considers the distance parameter.
The Hand to Kinect Absolute approach is a Kinect oriented strategy, while all the
other methods are user oriented. It is Kinect oriented because it considers the
distance between the Kinect and the user’s hand: the user performs the click when
his hand reaches the necessary threshold (the invisible wall). All the other dart
methods are user oriented because the distance that the user has to cover to perform
a click is the distance between the user’s shoulder and his hand.
Chapter 5
Evaluation
In this chapter, I introduce Fitts’s Law, I present the design of the evaluation and
then I lay out the results of the three tested strategies (Temporal, Hand to Shoulder
Absolute Distance, and Hand to Kinect Relative Distance (2)).
5.1. Fitts’s Law
This project explores the use of Fitts’s Law, proposed by Paul Fitts in 1954, as a
performance model for HCI and ergonomics.¹⁹ The law is based on Shannon's
theorem, which gives, in information theory, the effective information capacity
(in bits/s or, simply written, bps) of a communication channel of a specific
bandwidth in the presence of noise.²⁰
Following the work of Shannon, in Fitts’s Law, the realization of movement is similar
to the transmission of "information" in electronic systems. Movements are assigned
an index of difficulty, in "bits", and in carrying out a movement task the human motor
system is said to transmit so many "bits of information". If the number of bits is
divided by the time to move, then a rate of transmission in "bits per second" can be
ascribed.²¹
Fitts’s Law is a psychological model of human movement used to measure user
performance in the design of user interfaces.²²
19 MacKenzie, I. S., Fitts' Law as a Performance Model in Human-Computer Interaction, Ph.D. thesis
20 C. E. Shannon, Communication in the presence of noise, Proc. Institute of Radio Engineers, January
1949, vol. 37 (1): 10–21.
21 MacKenzie, I. S., Fitts' Law as a Performance Model in Human-Computer Interaction, Ph.D. thesis
22 Fitts, P.M. (1954), The information capacity of the human motor system in controlling the amplitude
of movement. Journal of Experimental Psychology, 47, 381-391.
Fitts’s Law predicts that the time required to rapidly move to a target area is a
function of the distance to the target and the size of the target. In the Fitts’s Law
description of pointing, the parameters of interest are: the time to move to the target
(MT), the distance (amplitude) of movement from start to the target center (D) and the
target width (W). According to Fitts’s Law, the time to move and point to a target is a
logarithmic function:
$$ MT = a + b \log_2\left(\frac{D}{W} + 1\right) \qquad (1) $$
In equation (1), a is the start/stop time of the device (intercept) and b is the
inherent speed of the device (slope). The a and b parameters are empirically
determined constants that are device dependent.
The logarithmic term in Fitts's Law is called the index of difficulty ID of the target.
It is measured in bits and describes the difficulty of the motor task. We can
rewrite the law as:
$$ MT = a + b \, ID \qquad (2) $$
An index of performance IP (also called throughput TP), measured in bits/time,
can be defined to characterize how quickly pointing can be done. There are two ways
to define the IP, as mentioned in ISO 9241-9, the final draft international standard
(FDIS) of “Ergonomic requirements for office work with visual display terminals –
Part 9: Requirements for non-keyboard input devices”. One way of defining the IP is:
$$ IP = 1 / b \qquad (3) $$
In equation (3), b is measured in seconds/bit, thus the unit of IP is bps. This
definition has the disadvantage of ignoring the effect of a.
The other way of defining the IP, which incorporates also the a constant, is:
$$ IP = ID_{avg} / MT_{avg} \qquad (4) $$
Equation (4) has the disadvantage of depending on the mean ID and it also
depends on the task parameters, such as number, sizes and distances of targets used in
measuring the IP. As mentioned in ISO 9241-9, the IP in equation (4) is an ill-defined
concept.
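The difference between the two definitions can be made concrete with a short C++
sketch. It is an illustration under assumptions: the names FittsFit, fitFitts,
ipFromSlope and ipFromMeans are hypothetical, and a and b are obtained here by an
ordinary least-squares fit of MT against ID, which is one common way to determine them
empirically and is not necessarily the procedure used in this project.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct FittsFit {
    double a;  // intercept of equation (2), in seconds
    double b;  // slope of equation (2), in seconds per bit
};

double meanOf(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Least-squares fit of MT = a + b * ID over paired observations.
FittsFit fitFitts(const std::vector<double>& id, const std::vector<double>& mt) {
    assert(id.size() == mt.size() && id.size() >= 2);
    const double idAvg = meanOf(id);
    const double mtAvg = meanOf(mt);
    double sxy = 0.0;
    double sxx = 0.0;
    for (std::size_t i = 0; i < id.size(); ++i) {
        sxy += (id[i] - idAvg) * (mt[i] - mtAvg);
        sxx += (id[i] - idAvg) * (id[i] - idAvg);
    }
    assert(sxx > 0.0);  // the ID values must not all be identical
    const double b = sxy / sxx;
    return FittsFit{mtAvg - b * idAvg, b};
}

// Equation (3): ignores the intercept a.
double ipFromSlope(const FittsFit& fit) { return 1.0 / fit.b; }

// Equation (4): ratio of the mean ID to the mean MT, so a is included.
double ipFromMeans(const std::vector<double>& id, const std::vector<double>& mt) {
    return meanOf(id) / meanOf(mt);
}
```

For made-up observations that lie exactly on MT = 0.5 + 0.25 ID with ID values 1, 2 and
3, equation (3) gives 4 bps while equation (4) gives 2 bps, which shows how strongly the
intercept a can separate the two measures.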
In my project, I used Fitts’s Law to measure the IP of the Kinect input device, by
modeling the act of pointing at a distance. The user has to point and click on a fixed
number of targets on the designed graphical interface, while he is tracked at a distance
by the Kinect sensor device. Considering that the targets that I designed are circles,
the ID that I used is:
$$ ID = \log_2\left(\frac{d}{2R} + 1\right) \qquad (5) $$
In equation (5), R is the radius of the target, so the diameter 2R takes the place of
the target width W of equation (1), while d represents the distance of movement from
the starting point to the center of the circle, measured along the axis of motion.
The IP formula that I used in my project is:
$$ IP = \frac{ID_{avg}}{MT_{avg}} \qquad (6) $$
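Equations (5) and (6) translate directly into code. The following C++ sketch is an
illustration, not the project's implementation: the Trial struct and both function
names are hypothetical, and the movement time is assumed to be recorded in seconds so
that the result is in bps.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record of one pointing trial on a circular target.
struct Trial {
    double d;   // distance from the starting point to the circle center
    double R;   // radius of the circle, in the same units as d
    double mt;  // measured movement time, in seconds
};

// Equation (5): index of difficulty of a circular target, in bits.
double indexOfDifficulty(const Trial& t) {
    assert(t.R > 0.0 && t.d >= 0.0);
    return std::log2(t.d / (2.0 * t.R) + 1.0);
}

// Equation (6): mean ID divided by mean MT. The common factor 1/n cancels,
// so the ratio of the two sums gives the same value.
double indexOfPerformance(const std::vector<Trial>& trials) {
    assert(!trials.empty());
    double idSum = 0.0;
    double mtSum = 0.0;
    for (const Trial& t : trials) {
        idSum += indexOfDifficulty(t);
        mtSum += t.mt;
    }
    assert(mtSum > 0.0);
    return idSum / mtSum;
}
```

As a quick check, a target of radius 30 at distance 180 has ID = log2(180/60 + 1) = 2
bits.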
In conclusion, Fitts’s Law is used to assist in the design of user interfaces and in
interface evaluation, but it also helps to study and compare input devices with respect
to their pointing capability. Therefore, there are several consequences to be taken into
consideration while designing graphical interfaces. For example, one consequence is
that buttons and other graphical controls should have a reasonable size, given
that it takes longer to hit targets that are farther away and smaller.
Another consequence is that the edges and corners of the computer monitor are
easily reached by the cursor when using a mouse, touchpad or trackball, but not
with touchscreens or the Kinect device.
5.2. Design of the Evaluation
The goal of the evaluation is to measure the efficiency and accuracy of the designed
system, as well as the satisfaction of its users.
Therefore, I scheduled 6 users (5 men and 1 woman) to interact with the
graphical interface, while being tracked by the Kinect. Each user had to test each of
the three conditions that I implemented once (Temporal – T, Hand to Shoulder Absolute
Distance – HSA and Hand to Kinect Relative Distance 2 – HKR 2), for a fixed number
of targets, in the order given in the following table:
User    Condition
        C1       C2       C3
U1      T        HSA      HKR 2
U2      T        HKR 2    HSA
U3      HSA      T        HKR 2
U4      HSA      HKR 2    T
U5      HKR 2    T        HSA
U6      HKR 2    HSA      T
Table 1: Balanced order of the conditions, used to remove bias during the experiment
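The six rows of Table 1 are exactly the six possible orderings of the three conditions,
one per user. A short C++ sketch that enumerates them is given below. It is only an
illustration: the function name balancedOrders is hypothetical, and the table in this
report was not necessarily produced this way.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Enumerates every ordering of the three tested conditions. With indices
// 0 = T, 1 = HSA, 2 = HKR 2, std::next_permutation visits the orderings in
// lexicographic order, which is the order of the rows U1 to U6 in Table 1.
std::vector<std::array<std::string, 3>> balancedOrders() {
    const std::array<std::string, 3> names = {"T", "HSA", "HKR 2"};
    std::array<int, 3> idx = {0, 1, 2};
    std::vector<std::array<std::string, 3>> orders;
    do {
        const std::array<std::string, 3> order = {names[idx[0]], names[idx[1]],
                                                  names[idx[2]]};
        orders.push_back(order);
    } while (std::next_permutation(idx.begin(), idx.end()));
    assert(orders.size() == 6);  // 3! orderings, one per user
    return orders;
}
```

Because every condition appears twice in each position C1, C2 and C3, learning and
fatigue effects are spread evenly over the three strategies.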
All the users had to click on 30 round targets (circles) with a radius of 30 or 40
units. In the configuration files, I varied the positions on the screen and the
widths of the circles. The same set of 30 targets was used for each of the algorithms.
The order of the tasks was balanced across users (Table 1) in order to avoid bias,
so that the task order does not influence the measured performance. The first 10
targets represent the training set that each user had to click on, for each algorithm,
to get used to the system. Only the last 20 targets were taken into consideration to
calculate the IP.
At the end of the experiment each user received a questionnaire to rank the three
tested methods, respecting the order of testing. In the questionnaire I asked the
users to answer three questions. Fitts’s Law describes only untrained
movements, therefore to get a valid IP I needed users that had no
experience with the Kinect device. Thus, the first question checked the previous
experience of the users. The second question asked users to rank the efficiency
of the tested strategies on a scale of Very good, Good, Neither good nor bad, Bad,
Very bad. The last question asked for suggestions on how to improve the developed
strategies.
5.3. Results and Interpretation
The results of the evaluation with real users provide information regarding the
efficiency of the system by measuring the IP using Fitts’s Law, the accuracy of
the strategies by measuring the number of misses, and the satisfaction of the users
through the questionnaire results.
According to the IP results (Graph 1), the Hand to Kinect Relative
Distance (2) strategy has the best performance because we obtained an average IP of
1.42, while for the other two strategies we obtained 1.15 for the Temporal one and
0.93 for the Hand to Shoulder Absolute Distance strategy.
Graph 1: The average IP for the three tested strategies (bar chart; y-axis: IP avg,
from 0 to 1.6; bars: Temporal, Absolute Distance, Relative Distance)
Plotting the performance on each individual target (Graph 2), we can see that the
average performance gets better with time. This shows the effect of training with the
device, which is why we do not consider the first 10 targets in the calculation of the
IP. We can see that the temporal method is the fastest to learn, because it shows the
smallest difference in performance before and after the training period.
Graph 2: The performance per target for the three tested strategies (line chart;
x-axis: Target, labeled T1 to T29; y-axis: IP, from 0 to 2; series: Relative Distance,
Absolute Distance, Temporal, each with a logarithmic trend line)
A one-way within-subject ANOVA test, in which all users test all the selection
strategies, was conducted to measure the effect of the selection strategies on the IP
(Table 2). There is a highly statistically significant effect of the selection
strategies on the IP, as we obtained p=0.004, so p < 0.01. Three paired-samples
t-tests were used to make post-hoc comparisons between the following pairs of
conditions: (T vs. HSA), (T vs. HKR 2) and (HSA vs. HKR 2). No significant
difference (p=0.13991479) was found for the (T vs. HSA) pair, while a statistically
significant difference (p<0.05) was found for the (T vs. HKR 2) and (HSA vs. HKR 2)
pairs, for which we obtained p=0.01209377 and p=0.01123922 respectively. We can say
that HKR 2 is a better strategy than the T strategy and also than the HSA strategy.
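For readers unfamiliar with the post-hoc step, the statistic behind a paired-samples
t-test can be sketched in C++ as follows. This is an illustration under assumptions:
the function name pairedT is hypothetical and the per-user IP values are not reproduced
here. The sketch only computes the t statistic. Turning it into a p-value additionally
requires the Student t distribution with n - 1 degrees of freedom (5 for six users),
which the C++ standard library does not provide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Paired-samples t statistic for two conditions measured on the same users:
// t = mean(diff) / (sd(diff) / sqrt(n)), with n - 1 degrees of freedom.
double pairedT(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double meanDiff = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanDiff += x[i] - y[i];
    }
    meanDiff /= n;
    double squares = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dev = (x[i] - y[i]) - meanDiff;
        squares += dev * dev;
    }
    const double sd = std::sqrt(squares / (n - 1.0));  // sample standard deviation
    assert(sd > 0.0);  // the differences must not all be identical
    return meanDiff / (sd / std::sqrt(n));
}
```

Each user contributes one difference per pair of strategies, which is what makes the
test "paired" and suitable for a within-subject design.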
Performance Strategy                    Errors/task    IP (bps)
T – Temporal                            0.02           1.15
HSA – Hand to Shoulder Absolute         0.41           0.93
HKR 2 – Hand to Kinect Relative (2)     0.44           1.42
Table 2: Pointing efficiency with three different performance algorithms
Looking at the number of errors per task (Table 2), we can see that the Temporal
strategy is the most accurate because it has the lowest frequency of errors. With
0.02 errors per task against 0.41 and 0.44, the Temporal strategy makes roughly
20 times fewer errors than the other two strategies.
The average performance on screen for the three tested strategies was calculated by
dividing the screen into 9 equal areas. The last 20 targets of the test were assigned
to the 9 equal areas and then counted (Table 3).
2 2 2
1 3 3
3 3 1
Table 3: The total number of targets on each area tested for all the three strategies
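Assigning a target to one of the 9 areas amounts to a simple grid lookup. The C++
sketch below shows one way to do it. The function name screenArea, the row-major
numbering of the areas from 0 to 8 and the use of the target center as the deciding
point are all assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>

// Maps a point on the screen to one of 9 equal areas arranged as a 3 x 3
// grid, numbered row by row from 0 (top left) to 8 (bottom right).
int screenArea(double x, double y, double width, double height) {
    assert(width > 0.0 && height > 0.0);
    assert(x >= 0.0 && x <= width && y >= 0.0 && y <= height);
    // std::min keeps points lying exactly on the right or bottom edge
    // inside the last column or row.
    const int col = std::min(2, static_cast<int>(3.0 * x / width));
    const int row = std::min(2, static_cast<int>(3.0 * y / height));
    return row * 3 + col;
}
```

Counting how many of the last 20 target centers fall into each index yields a grid of
the kind shown in Table 3.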
The average performance on screen for the three tested strategies is represented in
Figure 5.1. The average IP is higher on the right side of the screen. A possible
explanation is that all the users tested the strategies using their right hand. On the
left side, especially in the corners, the performance is lower. A good
performance is obtained in the center area of the screen, where interacting with the
Kinect device is easiest. The Hand to Shoulder Absolute and the Hand to
Kinect Relative Distance (2) strategies got their best IP in the center of the screen,
while the Temporal strategy is almost constant in all the areas.
Figure 5.1: The average performance on screen for the three tested strategies
This evaluation, on the average performance on the screen, was not planned from
the beginning; therefore, better statistical information could be obtained with more
targets equally distributed on the screen.
Regarding the qualitative evaluation, I assigned numbers to the possible replies to
be able to calculate the average (Table 4). The Very good option got a 5, while the
Very bad answer got a 1. The following table shows the average ranking of each
strategy:
Performance Strategy                    Grade
T – Temporal                            4.6
HSA – Hand to Shoulder Absolute         2
HKR 2 – Hand to Kinect Relative (2)     3.33
Table 4: Average user ranking of the three tested strategies
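The grading scheme behind Table 4 can be written down in a few lines of C++. The
sketch is illustrative only: the Reply enumeration and the function averageGrade are
hypothetical names, and the values 4, 3 and 2 for the three middle options are assumed
from the evenly spaced scale, since the text only states 5 for Very good and 1 for Very
bad.

```cpp
#include <cassert>
#include <vector>

// Numeric value assigned to each questionnaire reply. Only the end points
// 5 and 1 are stated in the text; the values in between are assumed.
enum class Reply { VeryBad = 1, Bad = 2, Neither = 3, Good = 4, VeryGood = 5 };

// Average grade of one strategy over all collected replies.
double averageGrade(const std::vector<Reply>& replies) {
    assert(!replies.empty());
    int sum = 0;
    for (Reply r : replies) {
        sum += static_cast<int>(r);
    }
    return static_cast<double>(sum) / static_cast<double>(replies.size());
}
```

Six replies of Bad, for instance, average to exactly 2, which is the grade reported for
the HSA strategy.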
The users consider the Temporal method the best one, followed by the Hand to
Kinect Relative Distance (2). They all agreed that the Hand to Shoulder Absolute
Distance is Bad (grade 2 in Table 4). This ranking could be explained by fatigue or
by ease of use. Even though people ranked the temporal strategy as the best one, the
best IP was obtained for the Hand to Kinect Relative Distance (2) strategy. A possible
reason is that people judge what is easy to use (user perception), whereas
the Hand to Kinect Relative Distance (2) is the one that gives the best efficiency.
5.4. Summary and Discussion
In this chapter, I present the results of the three tested strategies (Temporal, Hand
to Shoulder Absolute Distance, and Hand to Kinect Relative Distance (2)) using 6
users and 30 targets. The first 10 targets represent the training period. The IP shows
that the Hand to Kinect Relative Distance (2) strategy has the best performance. The
ANOVA test confirms that the Hand to Kinect Relative Distance (2) is statistically
better than the other two strategies. On the other hand, the temporal strategy has
better accuracy than the other two strategies. The users graded the temporal strategy
as the best one, followed by the Hand to Kinect Relative Distance (2). Users think that
the most accurate strategy is the one that also performs best.
Chapter 6
Conclusions and Possible Research Directions
The project introduces different strategies for pointing and selecting from a distance
using only the human hand, tracked by the Microsoft Kinect sensor device.
Two selection strategies were designed, the temporal one and the dart one. The
temporal strategy requires the user to hold still over the target to trigger a click. The
dart strategy requires the user to move his hand towards the Kinect device to perform a
click. The dart strategy can be further divided into two approaches, depending on
which distance is measured: the hand to shoulder distance or the hand to Kinect
distance. Then I introduce the design of the visual feedback, which is based on
geometrical figures and colors, with the goal of showing the user how the system
recognizes the pointing and selections. A system was implemented to evaluate the
performance of the pointing strategies using the Kinect sensor device. The system
provides a graphical interface which allows the user to click on a fixed number of
targets. Three implemented performance strategies were tested with real users, with a
defined selection of targets stored in the configuration files. At the end of the
project, I provide an evaluation of the performance of the implemented pointing
strategies using Fitts’s Law. A detailed statistical evaluation is presented for the
three tested strategies, using an ANOVA test. The higher the index of performance, the
better the strategy. The calculated index of performance shows that the
Hand to Kinect Relative Distance (2) strategy has the best performance. The ANOVA
test confirms that the Hand to Kinect Relative Distance (2) is statistically better than
the other two strategies.
As possible research directions, the Relative Distance Hand to Shoulder approach
could be implemented to perform a bare hand click on a graphical interface. The
gesture accuracy and the system performance could be improved by adding audio
feedback. Another future development could be to explore the use of various forms of
visual feedback. Finally, another research direction could be the evaluation of
selection strategies with two hands.
References
[1] Fitts, P.M., The information capacity of the human motor system in controlling the
amplitude of movement. Journal of Experimental Psychology, 1954, 47, 381-391
[2] Philip Kortum, HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and
Other Nontraditional Interfaces, MK, 2008, ISBN: 0123740177
[3] MacKenzie, I. S., Fitts' Law as a Performance Model in Human-Computer
Interaction, Ph.D. thesis
[4] C. E. Shannon, Communication in the presence of noise, Proc. Institute of Radio
Engineers, January 1949, vol. 37 (1): 10–21
[5] D. Vogel, R. Balakrishnan, Distant Freehand Pointing and Clicking on Very
Large, High Resolution Displays, Department of Computer Science, University of
Toronto
[6] Shumin Zhai, On the Validity of Throughput as a Characteristic of Computer