Empirical Usability Testing in a Component-Based Environment:
Improving Test Efficiency with Component-Specific Usability Measures
Willem-Paul Brinkman Brunel University, London
Reinder Haakma Philips Research Laboratories Eindhoven
Don Bouwhuis Eindhoven University of Technology
Topics
Research Motivation
Testing Method
Experimental Evaluation of the Testing Method
Conclusions
Research Motivation
Studying the usability of a system
External comparison: relating differences in usability to differences in the systems
Internal comparison: trying to link usability problems with parts of the systems
Objective performance: the number of messages a component receives directly, or indirectly via lower-level components, as a measure of the effort users put into the interaction
Perceived ease-of-use
Perceived satisfaction
Component: control process
Control loop: each message is one cycle of the control loop
Architectural Element
Interaction component: elementary unit of an interactive system, on which behaviour-based evaluation is possible.
A unit within an application that can be represented as a finite state machine which directly, or indirectly via other components, receives signals from the user.
Users must be able to perceive or infer the state of the interaction component.
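The definition above can be illustrated with a minimal sketch: a component modelled as a finite state machine that counts every message it receives. The class name, the transition table, and the menu example are hypothetical and only serve the illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class InteractionComponent:
    """A finite state machine that counts the messages it receives."""

    state: str
    # Hypothetical transition table: (current state, message) -> next state.
    transitions: dict
    messages_received: int = 0

    def receive(self, message: str) -> str:
        # Every message counts as one cycle of the control loop,
        # whether or not it changes the state.
        self.messages_received += 1
        self.state = self.transitions.get((self.state, message), self.state)
        return self.state


# Hypothetical two-state menu component.
menu = InteractionComponent(
    state="closed",
    transitions={
        ("closed", "menu_key"): "open",
        ("open", "cancel"): "closed",
        ("open", "ok"): "closed",
    },
)
for message in ["menu_key", "cancel", "menu_key", "ok"]:
    menu.receive(message)

print(menu.state, menu.messages_received)
```

In this sketch the user opened the menu twice, so the component received four messages where two would have been enough; the message count is what exposes the detour.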
Examples of suitable agent models: MVC, PAC, Interactor (CNUCE model)
Interaction layers
[Figure: interaction layers between User and Calculator for the input "15 + 23 =". Editor: control equation (15, 15 +, 15 + 23). Processor: control results (Add 01111 and 10111, giving 100110, shown as 38).]
Control Loop
[Figure: control loop between User and System, with the labels Reference value, Evaluation, User message, Component, and Feedback.]
Lower Level Control Loop
[Figure: User and Calculator]
Higher Level Control Loop
[Figure: User and Calculator]
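The two control loops of the calculator example can be sketched as two layered components, each counting its own messages. This is only an illustration under assumptions: the class names `Editor` and `Processor` follow the labels in the figure, but their interfaces and the restriction to a single addition are hypothetical.

```python
class Processor:
    """Higher-level component: receives complete equations."""

    def __init__(self):
        self.messages_received = 0

    def receive(self, equation: str) -> int:
        self.messages_received += 1
        left, right = equation.split("+")
        return int(left) + int(right)


class Editor:
    """Lower-level component: receives key presses and builds the equation."""

    def __init__(self, processor: Processor):
        self.processor = processor
        self.messages_received = 0
        self.buffer = ""

    def receive(self, key: str):
        self.messages_received += 1
        if key == "=":
            # Pass the finished equation up to the higher-level loop.
            result = self.processor.receive(self.buffer)
            self.buffer = ""
            return result
        self.buffer += key
        return None


processor = Processor()
editor = Editor(processor)
result = None
for key in "15+23=":
    result = editor.receive(key)

print(result, editor.messages_received, processor.messages_received)
```

Six key presses reach the lower-level loop, but only one message reaches the higher-level loop, which is why the count is kept per component.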
Experimental Evaluation of the Testing Method
80 users
8 mobile telephones
3 components were manipulated according to Cognitive Complexity Theory (Kieras & Polson, 1985):
1. Function Selector
2. Keypad
3. Short Text Messages
Architecture Mobile Telephone
[Figure: component architecture of the mobile telephone. Functions: Voice Mail, Telephone Router, Send text message, Read text message, Read address list, Edit address list, Read Diary, Edit Diary, Stand-by, Call. Other components: Function selector, Keypad, Menu Screen, Mode Screen, Main Screen, Keyboard, Screen. Messages shown include key presses (function keys, left key, right key, menu key, ok key, cancel key, backspace key, 0..9 keys, * key, # key, mode key), function request, Ok, Cancel, letter, number, cursor move, mode restriction, flow redirection, and function results; feedback includes menu icons, mode symbol, characters, cursor position, and STM menu icons. Highlighted: Send Text Message, Function Selector, Keypad.]
Evaluation study – Function Selector
Versions:
Broad/shallow
Narrow/deep
Evaluation study – Keypad
Versions
Repeated-Key Method
“L”
Modified-Model-Position method
“J”
Evaluation study – Send Text Message
Versions
Simple
Complex
Statistical Tests
[Figure: number of keystrokes, task time]
$\bar{x}$ = sample mean (estimator of $\mu$)
$s$ = estimate of the standard deviation ($\sigma$)
$s_{\bar{x}}$ = estimate of the standard error of the mean, $s_{\bar{x}}^2 = s^2/n$
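As a small worked example of these definitions, the sketch below computes the sample mean, the standard deviation estimate, and the standard error of the mean with Python's standard library. The task times are made-up values for illustration, not data from the study.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)


# Hypothetical task times in seconds.
task_times = [12.0, 15.0, 11.0, 14.0]

mean = statistics.mean(task_times)
se = standard_error(task_times)

# The squared standard error equals the sample variance divided by n.
print(mean, se, se ** 2, statistics.variance(task_times) / len(task_times))
```

Because the standard error shrinks with the square root of n, halving it requires four times as many participants.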
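Statistical Tests
p-value: the probability, if the underlying distribution is the same, of observing a difference at least as large as the one found. Rejecting that hypothesis only when p < α limits the probability of a type I, or α, error (wrongly rejecting the hypothesis that the underlying distribution is the same) to α.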
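One way to make the p-value concrete is an exact permutation test on two small groups. This is a swapped-in technique chosen because it needs no statistical tables; it is not the analysis of variance reported in the results. The function name and the two samples are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Share of all regroupings whose mean difference is at least the observed one."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    count = total = 0
    for indices in combinations(range(len(pooled)), len(group_a)):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding cannot hide a tie.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical numbers of messages received for two versions of a component.
broad = [40, 45, 50, 55]
deep = [70, 75, 80, 85]

p = permutation_p_value(broad, deep)
print(p)
```

With these fully separated samples only 2 of the 70 possible regroupings are as extreme as the observed one, so p = 2/70, which is below α = 0.05.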
Results – Function Selector
Results of two multivariate analyses and related univariate analyses of variance with the version of the Function Selector as independent between-subjects variable.

                                  Mean            df
Measure                           Broad   Deep    Hyp.  Er.   F        p        η2
Normal
  Joint measure                   -       -       7     66    34.47    <0.001   0.80
  Time in seconds                 947     1394    1     72    29.56    <0.001   0.29
  Number of keystrokes            461     686     1     72    37.72    <0.001   0.34
  Number of messages received     67      265     1     72    155.34   <0.001   0.68
  Ease of use mobile phone        5.5     4.8     1     72    11.86    0.001    0.14
  Ease of use menu                5.6     4.5     1     72    22.33    <0.001   0.24
  Satisfaction of mobile phone    4.4     3.8     1     72    4.25     0.043    0.06
  Satisfaction of menu            4.6     3.5     1     72    15.96    <0.001   0.18
Corrected (a)
  Joint measure                   -       -       2     71    60.96    <0.001   0.63
  Number of keystrokes            437     602     1     72    20.27    <0.001   0.22
  Number of messages received     52      190     1     72    75.36    <0.001   0.51

(a) Corrected for all a-priori differences between versions of the components.
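The effect sizes in the table can be related to the F values with a short calculation. The sketch assumes that the η2 column is partial eta squared, which the slides do not state; the univariate rows are consistent with that reading. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_hypothesis, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    effect = f_value * df_hypothesis
    return effect / (effect + df_error)


# Values taken from the Function Selector table (normal measures).
eta_time = partial_eta_squared(29.56, 1, 72)
eta_messages = partial_eta_squared(155.34, 1, 72)

print(round(eta_time, 2), round(eta_messages, 2))
```

Read this way, the version of the Function Selector accounts for a much larger share of the variance in the number of messages received than in the overall task time.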
Results – Keypad
Results of multivariate and related univariate analyses of variance with the version of the Keypad as independent between-subjects variable.

                                  Mean            df
Measure                           RK      MMP     Hyp.  Er.   F        p        η2
Normal
  Joint measure                   -       -       7     66    4.05     0.001    0.30
  Time in seconds                 872     1083    1     72    9.44     0.003    0.12
  Number of keystrokes            438     537     1     72    10.34    0.002    0.13
  Number of messages received     233     271     1     72    13.92    <0.001   0.16
  Ease of use mobile phone        5.3     5.0     1     72    1.07     0.305    0.02
  Ease of use keyboard            5.6     4.9     1     72    11.13    0.001    0.13
  Satisfaction of mobile phone    4.3     3.9     1     72    1.76     0.188    0.02
  Satisfaction of keyboard        4.6     3.8     1     72    8.97     0.004    0.11
Results – Send Text Message
Results of two multivariate analyses and related univariate analyses of variance with the version of the STM component as independent between-subjects variable.

                                  Mean             df
Measure                           Simple  Complex  Hyp.  Er.   F        p        η2
Normal
  Joint measure                   -       -        7     66    18.16    <0.001   0.66
  Time in seconds                 523     672      1     72    8.15     0.006    0.10
  Number of keystrokes            269     320      1     72    4.56     0.036    0.06
  Number of messages received     12      49       1     72    74.18    <0.001   0.51
  Ease of use mobile phone        5.0     5.3      1     72    1.15     0.288    0.02
  Ease of use STM function        5.1     4.9      1     72    0.35     0.555    0.01
  Satisfaction of mobile phone    3.9     4.2      1     72    0.93     0.339    0.01
  Satisfaction of STM function    3.9     3.8      1     72    0.26     0.614    0.01
Corrected (a)
  Joint measure                   -       -        2     71    20.85    <0.001   0.37
  Number of keystrokes            249     289      1     72    2.30     0.134    0.03
  Number of messages received     12      34       1     72    26.23    <0.001   0.27

(a) Corrected for all a-priori differences between versions of the components.
Power of component-specific measures
Statistical Power: 1 - β
Type II, or β, error: failing to reject the hypothesis when it is false
Component-specific measures are less affected by usability problems users may or may not encounter with other parts of the system
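The link between noise from other parts of the system and statistical power can be sketched with a normal approximation for a two-sided comparison of two group means. This is an illustration under assumptions, not the power analysis used in the study: the function name, the mean difference, and both standard deviations are made-up values.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Normal-approximation power (1 - beta) for comparing two group means."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = mean_diff / se
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical: the same true difference, measured with more or less noise.
overall = approximate_power(mean_diff=50, sd=150, n_per_group=40)
specific = approximate_power(mean_diff=50, sd=80, n_per_group=40)

print(overall, specific)
```

With the same true difference and the same number of users, the measure with the smaller standard deviation has the higher power, which is the argument made for component-specific measures.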
Results – Power Analysis
Average probability that a measure finds a significant (α = 0.05) effect for the usability difference between the two versions of FS, STM, or the Keypad components
Conclusions
Component-specific measures can be used to test the difference in usability between different versions of an interaction component:
1. Objective Performance Measure: number of messages received directly or indirectly via lower-level components
2. Subjective Usability Measures: Ease-Of-Use and Satisfaction questionnaire
Component-specific measures are potentially more powerful than overall usability measures