Human-Computer Interaction Lecture 8: Usability evaluation methods
Page 1

Human-Computer Interaction

Lecture 8: Usability evaluation methods

Page 2

Different kinds of system evaluation/research

• Analytic/Empirical
  – ‘Analytic’ means reasoning and working by analysis
  – ‘Empirical’ means making observations or measurements

• Formative/Summative
  – Formative research (earlier in a project) evaluates & refines ideas
  – Summative research (later in a project) tests & evaluates systems

• Qualitative/Quantitative
  – Qualitative data involves words (or pictures), and can provide broad / detailed information about a small number of users and their context.
  – Quantitative data involves numbers, and can be used to compare data from larger numbers of users, or measure some specific aspect of their behaviour.

Page 3

Cognitive Walkthrough

• Formative: can be used earlier in a project
• Analytic: no measurement or observation
• Qualitative: descriptive, not numerical

Page 4

From cognitive theory of exploratory learning

• User sets a goal to be accomplished, in terms of the expected system capabilities.

• User searches interface for currently available actions.

• User selects the action that seems likely to make progress toward the goal.

• User performs the selected action and evaluates the feedback given by the system, looking for evidence that progress has been made.
  – The user learns what to do in future by observing what the system does.

Page 5

Evaluation procedure

• Manually simulate an (imaginary) user carrying out the stages of the model.
  – relies on knowing enough about this person to anticipate their prior knowledge / mental model.

• Evaluators move through task, telling a story about why user would choose each action.

• Evaluate the story according to:
  – user’s current goal.
  – accessibility of correct control.
  – quality of match between label and goal.
  – feedback after the action.
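A walkthrough produces one such story per action in the task. As a purely illustrative sketch (the structure and field names here are invented, not part of the published method), each step and its four checks could be recorded like this:

```python
from dataclasses import dataclass

@dataclass
class WalkthroughStep:
    """One action in a cognitive walkthrough, with the story's four checks."""
    action: str         # the correct action at this step
    goal: str           # the user's current goal
    accessibility: str  # is the correct control visible/available?
    match: str          # how well does the control's label match the goal?
    feedback: str       # what evidence of progress does the system give?

# First step of the "Play a CD" task evaluated later in the lecture:
step = WalkthroughStep(
    action="Insert CD, wait for drive to read it",
    goal="Start player",
    accessibility="Controls not accessible until CD read",
    match="OK - controls greyed, though no indication of need to wait",
    feedback="Good - after loading, controls are enabled and CD name appears",
)
```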

Page 6

Cognitive walkthrough example

Page 7

Prerequisites

• Type of user:
  – Wallace and Gromit fan

• Their knowledge:
  – Stereos & media players
  – Windows

• Representative tasks:
  – Play a CD
  – Play an MP3 file
  – Eject a CD

Page 8

Correct action sequences

• Play a CD:
  – insert CD, wait for drive to read it, press play.

• Play an MP3 file:
  – open menu for further functions, open music library browser, select album and track.

• Eject a CD:
  – open main window, press eject button.

Page 9

Playing a CD

• Insert CD, wait for drive to read it:
  – Story: user starts player, inserts CD in drive. Tries to press “Play”, but it isn’t available yet. Notices that the CD has not loaded yet. Waits, and CD then loads.

• Goal: Start player.
• Accessibility: Controls not accessible until CD read.
• Match: OK - controls greyed, status area blank, although no indication of need to wait.
• Feedback: Good - after loading, controls are enabled, name of CD and track appears.

Page 10

Playing a CD

• Press play:
  – Story: user inspects the available controls, notes similarity to familiar audio controls. Presses play.

• Goal: Start playing first track on CD.
• Accessibility: Play button is very prominent.
• Match: Good - looks like a play button in context.
• Feedback: Good - button gets recessed, track indicator moves, track counter starts, sound starts.
  – (but may be a problem if the volume is turned right down).

Page 11

Playing an MP3 file

• Open menu for further functions:
  – Story: user will try various buttons before finding the menu control.

• Goal: Find MP3 functions.
• Accessibility: Not directly accessible.
• Match: Bad - nonstandard menu button.
• Feedback: OK once menu discovered.
  – Tooltips do help, although only if you know you need a menu.

Page 12

Playing an MP3 file

• Open browser:
  – Goal: choose MP3 track to be played.
  – Match: Bad - should user choose menu item “Select from Master Library” or “Browse Master Library”?

Page 13

Playing an MP3 file

• Select album and track:
  – Accessibility: OK - tracks are clearly in a hierarchy.
  – Match: Good - conventional Windows tree browser.

Page 14

Ejecting a CD

• Open main window:
  – Story: the user has seen the main window, knows it has an eject button, but doesn’t know how to get there.

• Goal: open main window.
• Accessibility: the button is on the current display.
• Match: ?

• Press eject button …

Page 15

Ejecting a CD

Page 16

GOMS

• Formative: can be used with a partial implementation
• Analytic: no measurement or observation
• Quantitative: provides numerical data

Page 17

GOMS: Goals, Operators, Methods, Selection

• Goals: what is the user trying to do?
• Operators: what actions must they take?
  – Home hands on keyboard or mouse
  – Key press & release (tapping keyboard or mouse button)
  – Point using mouse/lightpen etc
  – Mental preparation
• Methods: what have they learned in the past?
• Selection: how will they choose what to do?

Page 18

Aim is to predict speed of interaction

• Which is faster? Make a word bold using:
  a) Keys only
  b) Font dialog

Page 19

Keys-only method

<shift> + → + → + → + → + → + → + →

<ctrl> + b

Page 20

Keys-only method

• Mental preparation: M
• Home on keyboard: H
• Mental preparation: M
• Hold down shift: K
• Press →: K
• Press →: K
• Press →: K
• Press →: K
• Press →: K
• Press →: K
• Press →: K
• Release shift: K
• Mental preparation: M
• Hold down control: K
• Press b: K
• Release control: K

Page 21

Keys-only method

• 1 occurrence of H: 0.40 s
• 3 occurrences of M: 1.35 s × 3 = 4.05 s
• 12 occurrences of K: 0.28 s × 12 = 3.36 s

Total: 0.40 + 4.05 + 3.36 = 7.81 seconds
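The total is simple arithmetic over the operator sequence, so it can be checked mechanically. A minimal Python sketch, assuming the operator times above (H = 0.40 s, M = 1.35 s, K = 0.28 s; the function and its name are invented for illustration):

```python
# Keystroke-level model (KLM) operator times in seconds, as on this slide:
# H = home hands on keyboard/mouse, M = mental preparation, K = key press/release.
OPERATOR_TIMES = {"H": 0.40, "M": 1.35, "K": 0.28}

def klm_total(sequence, pointing_times=()):
    """Sum the operator times for a sequence such as 'MHMKK...', adding any
    mouse motions P, whose durations come from Fitts' law."""
    return sum(OPERATOR_TIMES[op] for op in sequence) + sum(pointing_times)

# Keys-only method: M, H, M, 9 keystrokes (shift down, 7 arrows, shift up),
# M, then 3 keystrokes (ctrl down, 'b', ctrl up).
print(klm_total("MHM" + "K" * 9 + "M" + "K" * 3))  # 7.81 (up to float rounding)
```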

Page 22

Font dialog method

[Figure: annotated screenshots of the font dialog method, showing the mouse actions at each step: click, drag; release, move; click, move; release; move, click; move, click.]

Page 23

Motion times from Fitts’ law

• From start of “The” to end of “cat” (t ~ k log(A/W)):
  – distance 110 pixels, target width 26 pixels, t = 0.88 s
• From end of “cat” to the Format item on the menu bar:
  – distance 97 pixels, target width 25 pixels, t = 0.85 s
• Down to the Font item on the Format menu:
  – distance 23 pixels, target width 26 pixels, t = 0.34 s
• To the “bold” entry in the font dialog:
  – distance 268 pixels, target width 16 pixels, t = 1.53 s
• From “bold” to the OK button in the font dialog:
  – distance 305 pixels, target width 20 pixels, t = 1.49 s
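These times are consistent with the Shannon formulation of Fitts’ law, t = k · log2(A/W + 1); the constant k ≈ 0.37 s/bit is inferred here by fitting the slide’s numbers, not stated in the lecture. A sketch:

```python
import math

K_FITTS = 0.37  # s/bit - inferred from the slide's numbers (assumption)

def fitts_time(distance, width, k=K_FITTS):
    """Pointing time under the Shannon form of Fitts' law."""
    return k * math.log2(distance / width + 1)

for a, w in [(110, 26), (97, 25), (23, 26), (268, 16), (305, 20)]:
    print(f"A={a}px, W={w}px: t={fitts_time(a, w):.2f}s")
# prints 0.88, 0.85, 0.34, 1.54, 1.49 - within 0.01 s of the slide's values
```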

Page 24

Font dialog method

• Mental preparation: M
• Reach for mouse: H
• Point to “The”: P
• Click: K
• Drag past “cat”: P
• Release: K
• Mental preparation: M
• Point to menu bar: P
• Click: K
• Drag to “Font”: P
• Release: K
• Mental preparation: M
• Move to “bold”: P
• Click: K
• Release: K
• Mental preparation: M
• Move to “OK”: P
• Click: K

Page 25

Font dialog method

• 1 occurrence of H: 0.40 s
• 4 occurrences of M: 1.35 s × 4 = 5.40 s
• 7 occurrences of K: 0.28 s × 7 = 1.96 s
• 6 mouse motions P: 1.1 + 0.88 + 0.85 + 0.34 + 1.53 + 1.49 = 6.19 s

• Total for dialog method: 0.40 + 5.40 + 1.96 + 6.19 = 13.95 seconds
• Total for keyboard method: 7.81 seconds
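Using the two sketches above, the whole comparison can be reproduced:

```python
# Font dialog method, reading the operator list on page 24 left to right:
# M H (P K) (P K) M (P K) (P K) M (P K K) M (P K). The six P durations are
# the Fitts' law times; the first (1.1 s, pointing to "The") is taken from
# the slide's total rather than the page 23 list.
pointing = [1.1, 0.88, 0.85, 0.34, 1.53, 1.49]
dialog = klm_total("MHKKMKKMKKMK", pointing)
keys_only = klm_total("MHM" + "K" * 9 + "M" + "K" * 3)
print(f"dialog: {dialog:.2f} s  vs  keys-only: {keys_only:.2f} s")  # 13.95 vs 7.81
```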

Page 26

Interviews and Ethnographic Studies

• Formative: can be used from the start of a project
• Empirical: involves observation
• Qualitative: descriptive, not numerical

Page 27

Structured interviews

• Additional to requirements definition meetings.

• Encourage participation from a range of users.

• Structured in order to:
  – collect data into common framework
  – ensure all important aspects covered

• Newman & Lamming’s proposed structure:
  – activities, methods and connections
  – measures, exceptions and domain knowledge

• Semi-structured interviews:
  – Ask further questions to probe topics of interest

Page 28

Observational task analysis

• Less intrusive than interviews

• Potentially more objective

• Inspired huge debate between cognitive and sociological views of HCI: see Lucy Suchman

• Harder work:
  – transcription from video protocol
    • relative duration of sub-tasks
    • transitions between sub-tasks
    • interruptions of tasks
  – alternatively, transcription from audio recording

Page 29

Ethnographic field studies

• Field observation to understand users and context
• Division of labour and its coordination
• Plans and procedures
  – When do they succeed and fail?
• Where paperwork meets computer work
• Local knowledge and everyday skills
• Spatial and temporal organisation
• Organisational memory
  – How do people learn to do their work?
  – Do formal methods match reality?

• See Beyer & Holtzblatt, Contextual Design

Page 30

Controlled Experiments

• Summative: suitable for the end of a project
• Empirical: involves measurements
• Quantitative: provides numerical data

Page 31

Controlled experiments

• Based on a number of observations:
  – How long did Fred take to order a CD from Amazon?
  – How many errors did he make?

• But every observation is different.
• So we compare averages:
  – over a number of trials
  – over a range of people (experimental participants)

• Results often have a normal distribution

Page 32

(statistics: histograms & distributions)

[Figure: histogram of number of observations against time (5–45 s), with the mean, variance and an “outlier” marked; normalisation maps the data onto the standard normal scale (−4 to 4), with log normalisation for skewed data.]
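Task times are usually right-skewed with occasional outliers, which is why the figure mentions log normalisation. A sketch with invented sample data:

```python
import numpy as np

# Invented completion times in seconds; note the outlier at 41 s.
times = np.array([8.2, 9.1, 10.5, 11.0, 12.3, 14.8, 41.0])

log_t = np.log(times)                     # log normalisation of skewed data
z = (log_t - log_t.mean()) / log_t.std()  # normalisation: zero mean, unit variance
print(np.round(z, 2))
```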

Page 33

Experimental treatments

• A treatment is some modification that we expect to have an effect on usability:
  – How long does Fred take to order a CD using this great new interface, compared to the crummy old one?
  – Expected answer: usually faster, but not always

[Figure: histograms of number of observation trials against time taken to order a CD, with the “new” interface distribution shifted toward faster times than the “old”.]

Page 34

Hypothesis testing

• Null hypothesis:
  – What is the probability that this amount of difference in means could be random variation between samples?
  – Hopefully very low (p < 0.01, or 1%)
  – Use a statistical significance test, such as the t-test.

[Figure: scale of outcomes, from “only random variation observed”, through “observed effect probably does result from treatment”, to “very significant effect of treatment”.]
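As an illustration of the mechanics (the timing data here is invented), an independent-samples t-test with scipy:

```python
from scipy import stats

old = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]  # invented times (s), old interface
new = [12.1, 13.0, 11.8, 12.9, 13.4, 12.2]  # invented times (s), new interface

t, p = stats.ttest_ind(new, old)
print(f"t = {t:.2f}, p = {p:.4f}")  # reject the null hypothesis if p < 0.01
```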

Page 35

Sources of variation

• People differ, so quantitative approaches to HCI must be statistical.

• We must distinguish sources of variation:
  – The effect of the treatment - what we want to measure.
  – Individual differences between subjects (e.g. IQ).
  – Distractions during the trial (e.g. sneezing).
  – Motivation of the subject (e.g. Mondays).
  – Accidental intervention by experimenter (e.g. hints).
  – Other random factors.

• Good experimental design and analysis isolates these.

Page 36

Effect size – means and error bars

• Difference of two means may be statistically significant (if sample has low variance), without being very interesting.
  – But mean differences must always be reported with a confidence interval, or plotted with ‘error bars’

[Figure: two plots of (mean) time to order a CD for old vs new, with error bars. Experiment A: ‘significant’ but boring. Experiment B: interesting, but treat with caution.]
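A sketch of the corresponding confidence interval calculation (again with invented data), using the t distribution:

```python
import numpy as np
from scipy import stats

times = np.array([12.1, 13.0, 11.8, 12.9, 13.4, 12.2])  # invented sample (s)

mean = times.mean()
sem = stats.sem(times)  # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(times) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f} s, 95% CI = [{lo:.2f}, {hi:.2f}]")  # the error bars
```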

Page 37

Problems with controlled experiments

• Huge variation between people (~200%)
• Mistakes mean huge variation in accuracy (~1000%)
• Improvements are often small (~20%)
• … or even negative (because new & unfamiliar)
• Most people give up using a new product at learning time anyway, so quantitative measures of ‘expert’ speed and accuracy performance may not be of great commercial interest
  – We don’t care if it’s slow, so long as users like it
  – (and user’s perception of speed is inaccurate anyway)

Page 38

Surveys and Questionnaires

Self-report measures

Page 39

Surveys and questionnaires

• Standardised psychometric instruments can be used
  – To evaluate mental states such as fatigue, stress, confusion
  – To assess individual differences (IQ, introversion …)

• Alternatively, questionnaires can be used to collect subjective or self-report evaluation from users
  – as in market research / opinion polls
  – ‘I like this system’ (and my friend who made it)
  – ‘I found it intuitive’ (and I like my friend)

• This kind of data can be of limited value
  – Can be biased, and self-report is often inaccurate anyway
  – It’s hard to design questionnaires to avoid these problems

Page 40

Questionnaire design

• Open questions …
  – Capture richer qualitative information
  – But require a coding frame to structure & compare data

• Closed questions …
  – Yes/No or Likert scale (opinion from 1 to 5)
  – Quantitative data easier to compare, but limited insight

• Collecting survey data via interviews gives more insight, but questionnaires are faster
  – Can collect data from a larger sample
  – Remember to test questionnaires with a pilot study, as it’s easier to get them wrong than with interviews

Page 41

Product Field Testing

• Summative: suitable for the end of a project
• Empirical: involves observation
• Qualitative: descriptive, not numerical

Page 42

Product field testing

• Brings advantages of task analysis/ethnography to assessment and testing phases of product cycle.

• Case study: Intuit Inc.’s Quicken product
  – originally based on interviews and observation
  – follow-me-home programme after product release:
    • random selection of shrink-wrap buyers;
    • observation while reading manuals, installing, using.
  – Quicken success was attributed to the programme:
    • survived predatory competition from Microsoft Money
    • later valued at $15 billion.

Page 43

Non-Evaluation

Page 44

Bad evaluation techniques

• Purely affective reports: 20 subjects answered the question “Do you like this nice new user interface more than that ugly old one?”
  – Apparently empirical/quantitative
  – But probably biased – if friends or trying to please

• No testing at all: “It was deemed that more colours should be used in order to increase usability.”
  – Apparently formative/analytic
  – But subjective – since the author is the subject

• Introspective reports made by a single subject (often the programmer or project manager): “I find it far more intuitive to do it this way, and the users will too.”
  – Apparently analytic/qualitative
  – Both biased and subjective

Page 45

Evaluation in Part II projects

Page 46

Summary of analytic options (analysing your design)

• Cognitive Walkthrough
  – Normally used in formative contexts – if you do have a working system, then why aren’t you observing a real user (far more informative than simulating/imagining one)?
  – But Cognitive Walkthrough can be a valuable time-saving precaution before user studies start, to fix blatant usability bugs

• GOMS
  – unlikely you’ll have alternative detailed UI designs in advance
  – If you have a working system, a controlled observation is superior

• Cognitive Dimensions
  – better suited to less structured tasks than CW & GOMS, which rely on a predefined user goal & task structure

Page 47

Summary of empirical options (collecting data)

• Interviews/ethnography
  – could be useful in formative/preparation phase

• Think-aloud / Wizard of Oz
  – valuable for both paper prototypes and working systems
  – can uncover usability bugs if analysed rigorously

• Controlled experiments
  – appears more ‘scientific’, but only:
    • if you can measure the important attributes in a meaningful way
    • if you test significance and report confidence interval of observed means

• Questionnaires
  – be clear what you are measuring – is self-report accurate?

• Field Testing
  – controlled release (and data collection?) may be possible

• See human participants guidance for empirical methods

Page 48

Evaluation options for non-interactive systems

• Should your evaluation be analytic or empirical?
  – How consistent / well-structured is your analytic framework?
  – What are you measuring & why? Are the measurements compatible with your claims (validity)?

• Should your evaluation be formative or summative in nature?
  – If formative – couldn’t you finish your project?
  – If summative – are the criteria internal or external?

• Is your data quantitative or qualitative?
  – Descriptive aspects of the system, or engineering performance data?
  – If qualitative, how will you establish objectivity (i.e. that this is not simply your own opinion)?

Page 49

Evaluating students’ knowledge of HCI

Page 50

2013/14 votes on course objectives

• Learn interesting stuff about humans
• Prepare for professional life
• See cool toys
• Find an alternative perspective on CS
• Take an opportunity to be more creative
• Get easy marks in final exam

[Figure: bar chart of vote counts for each objective.]

Page 51

Options: the course contents

• Lecture 1: Scope of HCI
• Lecture 2: Visual representation
• Lecture 3: Text and gesture interaction
• Lecture 4: Inference-based approaches
• Lecture 5: Augmented and mixed reality
• Lecture 6: Usability of programming languages
• Lecture 7: User-centred design research
• Lecture 8: Usability evaluation methods