Lecture 12 & 13
UX Goals and Metrics
Human Computer Interaction / COG3103, 2014 Fall
Class hours : Tue 1-3 pm/Thurs 12-1 pm
4 & 6 November
PLANNING
Tullis Chapter 3.
Lecture #12 COG_Human Computer Interaction 2
INTRODUCTION
• What are the goals of your usability study?
– Are you trying to ensure optimal usability for a new piece of functionality?
– Are you benchmarking the user experience for an existing product?
• What are the goals of users?
– Do users complete a task and then stop using the product?
– Do users use the product numerous times on a daily basis
• What is the appropriate evaluation method?
– How many participants are needed to get reliable feedback?
– How will collecting metrics impact the timeline and budget?
– How will the data be collected and analyzed?
STUDY GOALS
• How will the data be used within the product development lifecycle?
• Two general ways to use data
– Formative
– Summative
STUDY GOALS
FORMATIVE SUMMATIVE
Formative: like a chef who periodically checks a dish while it’s being prepared and makes adjustments to positively impact the end result.
Summative: like a restaurant critic who evaluates the dish after it is completed and compares the meal with other restaurants.
STUDY GOALS
• Formative Usability
– Evaluates a product or design, identifies shortcomings, makes recommendations
– Repeats the process
• Attributes
– Iterative nature of testing, with the goal of improving the design
– Done before the design has been finalized
• Key Questions
– What are the most significant usability issues that are preventing users from completing their goals or that are resulting in inefficiencies?
– What aspects of the product work well for users? What do they find frustrating?
– What are the most common errors or mistakes users are making?
– Are improvements being made from one design iteration to the next?
– What usability issues can you expect to remain after the product is launched?
STUDY GOALS
• Summative Usability
– Goal is to evaluate how well a product or piece of functionality meets its objectives
– Comparing several products to each other
– Focus on evaluating against a certain set of criteria
• Key Questions
– Did we meet the usability goals of the project?
– How does our product compare against the competition?
– Have we made improvements from one product release to the next?
USER GOALS
• Need to know about users and what they are trying to accomplish
– Are they forced to use the product every day as part of their jobs?
– Are they likely to use the product only once or twice?
– Is the product a source of entertainment?
– Does the user care about design aesthetics?
• Simplifies to two main aspects of the user experience
– Performance
– Satisfaction
USER GOALS
• Performance
– What the user does in interacting with the product
• Metrics (more in Ch 4)
– Degree of success in accomplishing a task or set of tasks
– Time to perform each task
– Amount of effort to perform a task
• Number of mouse clicks
• Cognitive effort
• Important in products where users have no choice in how they are used
– If users can’t successfully complete key tasks, the product will fail
USER GOALS
• Satisfaction
– What users say or think about their interaction
• Metrics (more in Ch 6)
– Ease of use
– Exceeds expectations
– Visually appealing
– Trustworthy
• Important in products where users have a choice in usage
STUDY DETAILS
• Budgets and Timelines
– Difficult to provide cost or time estimates for any particular type of study
• General rules of thumb
– Formative study
• Small number of participants (≤10)
• Little impact on budget or timeline
– Lab setting with a larger number of participants (>12)
• Most significant cost: recruiting and compensating participants
• Time required to run tests
• Additional cost for usability specialists
• Time to clean up and analyze data
– Online study
• Half of the time is spent setting up the study
• Running an online study requires little if any time from a usability specialist
• The other half of the time is spent cleaning up and analyzing data
• 100–200 person-hours overall (with roughly 50% variation)
STUDY DETAILS
• Evaluation Methods
– Not restricted to a certain type of method (lab test vs. online test)
– Choose a method based on how many participants you have and what metrics you want to use
• Lab test with a small number of participants
– One-on-one session between moderator and participant
– Participant thinks aloud; moderator notes participant behavior and responses to questions
– Metrics to collect
• Issue-based metrics: issue frequency, type, severity
• Performance metrics: task success, errors, efficiency
• Self-reported metrics: answers to questions regarding each task at the end of the study
• Caution
– Easy to overgeneralize performance and self-reported metrics without an adequate sample size
STUDY DETAILS
• Evaluation Methods (continued)
• Lab test with a larger number of participants
– Able to collect a wider range of data because increased sample size means increased confidence in the data
• All performance, self-reported, and physiological metrics are fair game
– Caution
• Inferring website traffic patterns from usability lab data is not very reliable
• Neither is looking at how subtle design changes impact the user experience
• Online studies
– Testing with many participants at the same time
– Excellent way to collect a lot of data in a short time
– Able to collect many performance and self-reported metrics, and to test subtle design changes
– Caution
• Difficult to collect issue-based data; can’t directly observe participants
• Good for software or website testing; difficult to test consumer electronics
STUDY DETAILS
• Participants
– Have a major impact on findings
• Recruiting issues
– Identifying the recruiting criteria that determine whether a participant is eligible for the study
• How to segment users
– How many users are needed
• Diversity of the user population
• Complexity of the product
• Specific goals of the study
– Recruiting strategy
• Generate a list from customer data
• Send requests via email distribution lists
• Use a third party
• Post an announcement on a website
STUDY DETAILS
• Data Collection
– Plan how you will capture the data needed for the study
– Has a significant impact on how much work comes later when analysis begins
• Lab test with a small number of participants
– Excel works well
– Have a template in place for quickly capturing data during testing
– Enter data in numeric format as much as possible
• 1 = success
• 0 = failure
– Everyone should know the coding scheme extremely well
• If someone flips the scales or doesn’t understand what to enter, you must throw out or recode the data
• Larger studies
– Use a data capture tool
– Helpful to have the option to download raw data into Excel
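The coding-scheme advice above can be sketched in a few lines. This is only an illustration: the participant records and the `success` field name are invented, but the idea of validating codes against the agreed scheme before analysis is the one the text recommends.

```python
# Hypothetical capture rows exported from the Excel template:
# one row per participant/task, success coded "1" or "0".
rows = [
    {"participant": "P1", "task": "T1", "success": "1"},
    {"participant": "P2", "task": "T1", "success": "0"},
    {"participant": "P3", "task": "T1", "success": "yes"},  # coding mistake
]

VALID_CODES = {"0", "1"}

def invalid_rows(rows):
    """Return rows whose success code violates the agreed scheme."""
    return [r for r in rows if r["success"] not in VALID_CODES]

for r in invalid_rows(rows):
    print(f"{r['participant']}/{r['task']}: bad code {r['success']!r}")
```

Running a check like this immediately after each session catches a flipped or misunderstood code while the observer still remembers what happened, instead of forcing you to throw the data out later.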
STUDY DETAILS
• Data Cleanup
– Data are rarely in a format that is instantly ready to analyze
– Can take anywhere from one hour to a couple of weeks
• Cleanup tasks
– Filtering data
• Check for extreme values (e.g., task completion times)
• Some participants leave in the middle of the study, so their times are unusually large
• Impossibly short times may indicate a user who was not truly engaged in the study
• Remove results from users who are not in the target population
– Creating new variables
• Building on the raw data is useful
• May create a top-2-box variable for self-reported scales
• Aggregate an overall success average representing all tasks
• Create an overall usability score
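As a sketch of these cleanup steps: filter implausible completion times, then derive a top-2-box variable from a 1–5 self-reported scale. The times, ratings, and plausibility cutoffs below are invented for illustration and would be tuned per study.

```python
# Invented session data: task times in seconds and 1-5 ease-of-use ratings.
times = [42, 55, 3, 61, 900, 48]
ratings = [5, 4, 2, 5, 3, 4]

# Assumed plausibility cutoffs for this task (tune per study).
MIN_PLAUSIBLE, MAX_PLAUSIBLE = 10, 600

# Filtering: drop impossibly short times and abandoned-session outliers.
clean_times = [t for t in times if MIN_PLAUSIBLE <= t <= MAX_PLAUSIBLE]

# Top-2-box: share of ratings in the top two scale points (4 or 5).
top2 = sum(1 for r in ratings if r >= 4) / len(ratings)

print(clean_times, round(top2, 2))
```

Keeping the cutoffs as named constants documents the filtering decision, which matters when someone later asks why certain records were excluded.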
STUDY DETAILS
• Cleanup tasks (continued)
– Verifying responses
• Notice a large percentage of participants giving the same wrong answer
• Check why this happens
– Checking consistency
• Make sure data were captured properly
• Check task completion times and success against self-reported metrics (e.g., a task completed fast but given a low rating)
– Data captured incorrectly
– Participant confused the scales of the question
– Transferring data
• Capture and clean up data in Excel, then use another program to run statistics, then move back to Excel to create charts and graphs
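A consistency check like the “completed fast but low rating” case can be automated. The record layout and the thresholds here are hypothetical; flagged records are for manual review (flipped scale? capture error?), not automatic exclusion.

```python
# Invented records: time on task (seconds) and a 1-7 satisfaction rating.
records = [
    {"participant": "P1", "time_s": 35, "rating": 7},
    {"participant": "P2", "time_s": 30, "rating": 1},   # fast but unhappy?
    {"participant": "P3", "time_s": 240, "rating": 2},  # slow and unhappy: plausible
]

FAST_S = 60   # at or under this counts as "completed fast"
LOW = 3       # at or under this counts as a "low" rating

# Fast completion paired with a low rating may mean the participant
# confused the scale direction or the data were captured incorrectly.
suspicious = [r for r in records if r["time_s"] <= FAST_S and r["rating"] <= LOW]

for r in suspicious:
    print(f"{r['participant']}: check scale direction / data capture")
```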
SUMMARY
• Formative vs. summative approach
– Formative: collecting data to help improve the design before it is launched or released
– Summative: measuring the extent to which certain target goals were achieved
• When deciding on the most appropriate metrics, take into account two main aspects of the user experience: performance and satisfaction
– Performance metrics characterize what the user does
– Satisfaction metrics relate to what users think or feel about their experience
• Budgets and timelines need to be planned well in advance when running any usability study
• Three general types of evaluation methods are used to collect usability data
– Lab tests with a small number of participants
• Best for formative testing
– Lab tests with a larger number of participants (>12)
• Best for capturing a combination of qualitative and quantitative data
– Online studies with a very large number of participants (>100)
• Best for examining subtle design changes and preferences
SUMMARY
• Clearly identify criteria for recruiting participants
– Truly representative of the target group
– Formative
• 6 to 8 users for each iteration is enough
• If there are distinct groups, it is helpful to have four from each group
– Summative
• 50 to 100 representative users
• Plan how you are going to capture all the data needed
– Have a template for quickly capturing data during the test
– Make sure everyone is familiar with the coding conventions
• Data cleanup
– Manipulating the data in a way that makes them usable and reliable
– Filtering removes extreme values or records that are problematic
– Consistency checks and verifying responses make sure participants’ intentions map to their responses
UX GOALS, METRICS, AND TARGETS
Hartson Chapter 10.
INTRODUCTION
Figure 10-1 You are here; the chapter on UX goals, metrics, and targets in the context of the overall Wheel lifecycle template.
UX GOALS
• Example: User Experience Goals for Ticket Kiosk System
– We can define the primary high-level UX goals for the ticket buyer to include:
• Fast and easy walk-up-and-use user experience, with absolutely no user training
• Fast learning so new user performance (after limited experience) is on par with that of an experienced user [from AB-4-8]
• High customer satisfaction leading to high rate of repeat customers [from BC-6-16]
– Some other possibilities:
• High learnability for more advanced tasks [from BB-1-5]
• Draw, engagement, attraction
• Low error rate for completing transactions correctly, especially in the interaction for payment [from CG-13-17]
UX TARGET TABLES
Table 10-1 Our UX target table, as evolved from the Whiteside, Bennett, and Holtzblatt (1988) usability specification table
WORK ROLES, USER CLASSES, AND UX GOALS
Work Role: User Class | UX Goal
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user
Table 10-2 Choosing a work role, user class, and UX goal for a UX target
UX MEASURES
• Objective UX measures (directly measurable by evaluators)
– Initial performance
– Long-term performance (longitudinal, experienced, steady state)
– Learnability
– Retainability
– Advanced feature usage
• Subjective UX measures (based on user opinions)
– First impression (initial opinion, initial satisfaction)
– Long-term (longitudinal) user satisfaction
MEASURING INSTRUMENTS
Work Role: User Class | UX Goal | UX Measure
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression
Table 10-3 Choosing initial performance and first impression as UX measures
MEASURING INSTRUMENTS
• Benchmark Tasks
– Address designer questions with benchmark tasks and UX targets
– Selecting benchmark tasks
• Create benchmark tasks for a representative spectrum of user tasks.
• Start with short and easy tasks and then increase difficulty progressively.
• Include some navigation where appropriate.
• Avoid large amounts of typing (unless typing skill is being evaluated).
• Match the benchmark task to the UX measure.
• Adapt scenarios already developed for design.
• Use tasks in realistic combinations to evaluate task flow.
MEASURING INSTRUMENTS
• Do not forget to evaluate with your power users.
• To evaluate error recovery, a benchmark task can begin in an error state.
• Consider tasks to evaluate performance in “degraded modes” due to partial equipment failure.
• Do not try to make a benchmark task for everything.
– Constructing benchmark task content
• Remove any ambiguities with clear, precise, specific, and repeatable instructions.
• Tell the user what task to do, but not how to do it.
• Do not use words in benchmark tasks that appear specifically in the interaction design.
MEASURING INSTRUMENTS
• Use work context and usage-centered wording, not system-oriented wording.
• Have clear start and end points for timing.
• Keep some mystery in it for the user.
• Annotate situations where evaluators must ensure pre-conditions for running benchmark tasks.
• Use “rubrics” for special instructions to evaluators.
• Put each benchmark task on a separate sheet of paper.
• Write a “task script” for each benchmark task.
MEASURING INSTRUMENTS
Work Role: User Class | UX Goal | UX Measure | Measuring Instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression |
Table 10-4 Choosing “buy special event ticket” benchmark task as measuring instrument for “initial performance” UX measure in first UX target
MEASURING INSTRUMENTS
Work Role: User Class | UX Goal | UX Measure | Measuring Instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression |
Table 10-5 Choosing “buy movie ticket” benchmark task as measuring instrument for second initial performance UX measure
MEASURING INSTRUMENTS
– How many benchmark tasks and UX targets do you need?
– Ensure ecological validity: as you write your benchmark task descriptions, ask how the setting can be made more realistic
• What are the constraints in the user or work context?
• Does the task involve more than one person or role?
• Does the task require a telephone or other physical props?
• Does the task involve background noise?
• Does the task involve interference or interruption?
• Does the user have to deal with multiple simultaneous inputs, for example, multiple audio feeds through headsets?
MEASURING INSTRUMENTS
Work Role: User Class | UX Goal | UX Measure | Measuring Instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in the QUIS questionnaire
Table 10-6 Choosing questionnaire as measuring instrument for first-impression UX measure
MEASURING INSTRUMENTS
UX Goal | UX Measure | Measuring Instrument
Ease of first-time use | Initial performance | Time on task
Ease of learning | Learnability | Time on task or error rate, after a given amount of use and compared with initial performance
High performance for experienced users | Long-term performance | Time and error rates
Low error rates | Error-related performance | Error rates
Error avoidance in safety-critical tasks | Task-specific error performance | Error count, with strict target levels (much more important than time on task)
Error recovery performance | Task-specific time performance | Time on recovery portion of the task
Overall user satisfaction | User satisfaction | Average score on questionnaire
User attraction to product | User opinion of attractiveness | Average score on questionnaire, with questions focused on the effectiveness of the “draw” factor
Quality of user experience | User opinion of overall experience | Average score on questionnaire, with questions focused on the quality of the overall user experience, including specific points about your product that might be associated most closely with emotional impact factors
Overall user satisfaction | User satisfaction | Average score on questionnaire, with questions focusing on willingness to be a repeat customer and to recommend the product to others
Continuing ability of users to perform without relearning | Retainability | Time on task and error rates, re-evaluated after a period of time off (e.g., a week)
Avoid having the user walk away in dissatisfaction | User satisfaction, especially initial satisfaction | Average score on questionnaire, with questions focusing on initial impressions and satisfaction
Table 10-7 Close connections among UX goals, UX measures, and measuring instruments
UX METRICS
Work Role: User Class | UX Goal | UX Measure | Measuring Instrument | UX Metric
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in the QUIS questionnaire | Average rating across users and across questions
Table 10-8 Choosing UX metrics for UX measures
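The “average rating across users and across questions” metric is just a grand mean over the participant-by-question matrix. The ratings below are invented, and only three questions stand in for the questionnaire items:

```python
# Invented ratings: one row per participant, one column per question,
# on a 10-point scale (three questions shown for brevity).
ratings = [
    [8, 7, 9],   # participant 1
    [6, 7, 8],   # participant 2
    [9, 8, 7],   # participant 3
]

# Flatten across users and questions, then take the grand mean.
all_scores = [score for row in ratings for score in row]
grand_mean = sum(all_scores) / len(all_scores)
print(round(grand_mean, 2))
```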
SETTING LEVELS
Work Role: User Class | UX Goal | UX Measure | Measuring Instrument | UX Metric | Baseline Level
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task | 3 minutes
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors | <1
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in questionnaire XYZ | Average rating across users and across questions | 7.5/10
Table 10-9 Setting baseline levels for UX measures
SETTING LEVELS
Work Role: User Class | UX Goal | UX Measure | Measuring Instrument | UX Metric | Baseline Level | Target Level
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task | 3 min, as measured at the MUTTS ticket counter | 2.5 min
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors | <1 | <1
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in questionnaire XYZ | Average rating across users and across questions | 7.5/10 | 8/10
Ticket buyer: Frequent music patron | Accuracy | Experienced usage error rate | BT3: Buy concert ticket | Average number of errors | <1 | <1
Casual public ticket buyer | Walk-up ease of use for new user | Initial user performance | BT4: Buy Monster Truck Pull tickets | Average time on task | 5 min (online system) | 2.5 min
Casual public ticket buyer | Walk-up ease of use for new user | Initial user performance | BT4: Buy Monster Truck Pull tickets | Average number of errors | <1 | <1
Casual public ticket buyer | Initial customer satisfaction | First impression | QUIS questions 4–7, 10, 13 | Average rating across users and across questions | 6/10 | 8/10
Casual public ticket buyer | Walk-up ease of use for user with a little experience | Just post-initial performance | BT5: Buy Almost Famous movie tickets | Average time on task | 5 min (including review) | 2 min
Casual public ticket buyer | Walk-up ease of use for user with a little experience | Just post-initial performance | BT6: Buy Ben Harper concert tickets | Average number of errors | <1 | <1
Table 10-10 Setting target levels for UX metrics
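Once baseline and target levels are in the table, checking an observed UX metric against them is mechanical. The session times below are invented, the levels echo the first row of the table, and the statistic is simplified to a plain mean (a real analysis would also report a confidence interval):

```python
# Invented observed times (minutes) for a "buy special event ticket" task.
observed_times_min = [2.0, 2.5, 2.0, 3.0, 2.5]

BASELINE_MIN = 3.0  # baseline level, e.g. as measured at a ticket counter
TARGET_MIN = 2.5    # target level from the UX target table

mean_time = sum(observed_times_min) / len(observed_times_min)

# Lower is better for time on task, so "meets" means at or under the level.
meets_baseline = mean_time <= BASELINE_MIN
meets_target = mean_time <= TARGET_MIN
print(round(mean_time, 2), meets_baseline, meets_target)
```

The same comparison shape works for error counts and questionnaire scores; only the direction of “better” changes (ratings must meet or exceed the level).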
PRACTICAL TIPS AND CAUTIONS FOR CREATING UX TARGETS
• Are user classes for each work role specified clearly enough?
– Have you taken into account potential trade-offs among user groups?
– Are the values for the various levels reasonable?
– Be prepared to adjust your target level values, based on initial observed results
– Remember that the target level values are averages.
– How well do the UX measures capture the UX goals for the design?
– What if the design is in its early stages and you know the design will change significantly in the next version, anyway?
– What about UX goals, metrics, and targets for usefulness and emotional impact?
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Goal
– To gain experience in writing effective benchmark tasks and measurable UX targets.
• Activities
– We have shown you a rather complete set of examples of benchmark tasks and UX targets for the Ticket Kiosk System. Your job is to do something similar for the system of your choice.
– Begin by identifying which work roles and user classes you are targeting in evaluation (a brief description is enough).
– Write three or more UX target table entries (rows), including your choices for each column. Have at least two UX targets based on a benchmark task and at least one based on a questionnaire.
– Create and write up a set of about three benchmark tasks to go with the UX targets in the table.
• Do NOT make the tasks too easy.
• Make tasks increasingly complex.
• Include some navigation.
• Create tasks that you can later “implement” in your low-fidelity rapid prototype.
• The expected average performance time for each task should be no more than about 3 minutes, just to keep it short and simple for you during evaluation.
– Include the questionnaire question numbers in the measuring instrument column of the appropriate UX target.
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Cautions and hints:
– Do not spend any time on design in this exercise; there will be time for detailed design in the next exercise.
– Do not plan to give users any training.
• Deliverables:
– Two user benchmark tasks, each on a separate sheet of paper.
– Three or more UX targets entered into a blank UX target table on your laptop or on paper.
– If you are doing this exercise in a classroom environment, finish up by reading your benchmark tasks to the class for critique and discussion.
• Schedule
– Work efficiently and complete in about an hour and a half.