Towards Automated Web Design Advisors Melody Y. Ivory Marti A. Hearst School of Information Management & Systems UC Berkeley IBM Make IT Easy Conference.
Post on 22-Dec-2015
Towards Automated Web Design Advisors
Melody Y. Ivory, Marti A. Hearst
School of Information Management & Systems, UC Berkeley
IBM Make IT Easy Conference, June 4, 2002
2
The Problem: Poor Website Design by Non-Professionals
3
The Problem: Poor Website Design by Non-Professionals
4
A Solution
Automatic recommendations and context-specific guidelines.
“Grammar checkers” for web design
– Create good templates to incorporate into web design tools
– Compare current design to high-quality designs and show differences
5
The WebTango Goal
[Diagram labels: User’s Design, Profiles, High Quality Designs, Quality Checker]
• Predictions
• Similarities
• Differences
• Suggestions
• Modification
6
The Approach: Develop Statistical Profiles
1. Create a large set of measures to assess various design attributes
2. Obtain a large set of evaluated sites
3. Create models of good vs. avg. vs. poor sites
   Take into account the context and type of site
4. Use models to evaluate other sites
5. Use models to suggest improvements
Idea: Reverse engineer design patterns from high-quality sites and use them to assess the quality of other sites
7
Step 1: Measuring Web Design Aspects
Identified key aspects from the literature
– Extensive survey of Web design literature: texts from recognized experts; user studies
  • amount of text on a page, text alignment, fonts, colors, consistency of page layout in the site, use of frames, …
– Example guidelines
  • Use 2–4 words in text links [Nielsen00].
  • Use links with 7–12 useful words [Sawyer & Schroeder00].
  • Consistent layout of graphical interfaces results in a 10–25% speedup in performance [Mahajan & Shneiderman96].
– There are no theories about what to measure
8
157 Web Design Measures (Metrics Computation Tool)
Text Elements (31): # words, type of words
Link Elements (6): # graphic links, type of links
Graphic Elements (6): # images, type of images
Text Formatting (24): # font styles, colors, alignment, clustering
Link Formatting (3): # colors used for links, standard colors
Graphics Formatting (7): max width of images, page area
Page Formatting (27): quality of color combos, scrolling
Page Performance (37): download time, accessibility
Site Architecture (16): consistency, breadth, depth
[Diagram: measure categories TE, LE, GE, TF, LF, GF, PF, PP, SA; labels: information, navigation, & graphic design; experience design]
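To make the idea of a page-level measure concrete, here is a minimal sketch that computes a handful of counts from static HTML using only the Python standard library. It is not the Metrics Computation Tool: the class name `PageMeasures`, the function `measure`, and the definitions of "text link" and "graphic link" used here are assumptions made for illustration.

```python
from html.parser import HTMLParser


class PageMeasures(HTMLParser):
    """Collects a few illustrative page-level measures (hypothetical definitions)."""

    def __init__(self):
        super().__init__()
        self.word_count = 0          # words in visible text
        self.text_link_count = 0     # links that contain at least one word
        self.graphic_link_count = 0  # links that contain an <img>
        self.link_word_counts = []   # number of words in each text link
        self._in_link = False
        self._link_words = 0
        self._link_has_image = False
        self._skip_depth = 0         # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self._in_link = True
            self._link_words = 0
            self._link_has_image = False
        elif tag == "img" and self._in_link:
            self._link_has_image = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._in_link:
            if self._link_words:
                self.text_link_count += 1
                self.link_word_counts.append(self._link_words)
            if self._link_has_image:
                self.graphic_link_count += 1
            self._in_link = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style content
        n = len(data.split())
        self.word_count += n
        if self._in_link:
            self._link_words += n


def measure(html):
    """Parse an HTML string and return the populated PageMeasures object."""
    parser = PageMeasures()
    parser.feed(html)
    parser.close()
    return parser
```

With counts like `link_word_counts` in hand, a guideline such as "use 2–4 words in text links" becomes a simple range check per link.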
9
Word Count: 157
10
Content Word Count: 81
11
Body Word Count: 94
12
Step 2: Obtaining a Sample of Evaluated Sites
Webby Awards 2000
– Only large corpus of rated Web sites
3000 sites initially
– 27 topical categories
  • Studied sites from informational categories
    – Finance, education, community, living, health, services
100 judges
– International Academy of Digital Arts & Sciences
  • Internet professionals, familiarity with a category
– 3 rounds of judging (only first round used)
  • Scores are averaged from 3 or more judges
  • Converted scores into good (top 33%), average (middle 34%), and poor (bottom 33%)
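The score-to-class conversion can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the function names, the use of rank-based cutoffs, and the tie handling (ties keep their sorted order) are all assumptions.

```python
def average_score(judge_scores):
    """Average the scores given to one site by its judges."""
    return sum(judge_scores) / len(judge_scores)


def label_by_rank(site_scores):
    """Map each site to 'good', 'average', or 'poor' by rank of its averaged score.

    site_scores: dict of site name -> list of judge scores (hypothetical input shape).
    The top 33% of sites are 'good', the bottom 33% 'poor', the rest 'average'.
    """
    averaged = {site: average_score(s) for site, s in site_scores.items()}
    ranked = sorted(averaged, key=averaged.get, reverse=True)
    n = len(ranked)
    cutoff = round(n * 0.33)  # size of the top and bottom groups
    labels = {}
    for i, site in enumerate(ranked):
        if i < cutoff:
            labels[site] = "good"
        elif i >= n - cutoff:
            labels[site] = "poor"
        else:
            labels[site] = "average"
    return labels
```

For example, nine sites would split into three good, three average, and three poor.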
13
Webby Awards 2000
6 criteria
– Content
– Structure & navigation
– Visual design
– Functionality
– Interactivity
– Overall experience
Scale: 1–10 (highest)
Nearly normally distributed
14
Example Page from Good Site
15
Example Page from Avg. Site
16
Example Page from Poor Site
17
The Data Set
Downloaded pages from sites
– Downloads informational pages at multiple levels of the site
Computed measures for the sample
– Processes static HTML, English pages
  • Measures for 5346 pages
  • Measures for 333 sites
– Categorize by
  • Topic: education, health, finance, …
  • Page Type: content, homepage, link page, …
18
Step 3: Creating Prediction Models
Statistical analysis of quantitative measures
– Methods
  • Classification & regression tree, linear discriminant classification, & K-means clustering analysis
– Context sensitive models
  • Content category, page style, etc.
– Models identify a subset of measures relevant for each prediction
[Diagram: ?? → Good / Average / Poor]
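As a rough illustration of how a classification tree picks measures, the sketch below finds the single best binary split over a small table of page measures by minimizing weighted Gini impurity. It shows one split only, not a full tree, and it is not the authors' C&RT configuration; the function names and any data passed to them are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (measure_index, threshold, weighted_gini) for the best binary split.

    rows: list of equal-length tuples of numeric measures, one tuple per page.
    labels: quality class for each row, e.g. 'good' or 'poor'.
    Candidate thresholds are midpoints between adjacent distinct values.
    Returns None if no measure has two distinct values.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best
```

A full tree would apply `best_split` recursively to each side, which is also how such a model ends up using only a subset of the available measures.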
19
Page-Level Models (5346 Pages)
                                                          Accuracy
Model                                            Method   Good   Avg.   Poor
Overall page quality (~1782 pgs/class)           C&RT     96%    94%    93%
Content category quality (~297 pgs/class & cat)  LDC      92%    91%    94%
ANOVAs showed that all differences in measures were significant (good vs. avg, good vs. poor, etc.)
20
Page-Level Models (5346 Pages)
                                                          Accuracy
Model                                            Method   Good   Avg.   Poor
Page type quality (~356 pgs/class & type)        LDC      84%    78%    84%
Overall page quality                             C&RT     96%    94%    93%
Content category quality                         LDC      92%    91%    94%
ANOVAs showed that all differences in measures were significant (good vs. avg, good vs. poor, etc.)
Page Type Classifier (decision tree)
– Home page, content, form, link, other
– 1770 manually-classified pages, 84% accurate
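The approximate pages-per-class figures quoted for each model are consistent with dividing the 5346 pages evenly across the groups involved. The check below assumes an even split, three quality classes, the six informational categories listed for the Webby sample, and the five page types named above; the even split is an assumption, which is presumably why the slides mark the figures with "~".

```python
from math import prod


def pages_per_cell(total_pages, *group_sizes):
    """Pages per cell if total_pages were split evenly across all group combinations."""
    return total_pages / prod(group_sizes)


overall = pages_per_cell(5346, 3)         # 3 quality classes
by_category = pages_per_cell(5346, 3, 6)  # 3 classes x 6 content categories
by_page_type = pages_per_cell(5346, 3, 5) # 3 classes x 5 page types
print(overall, by_category, round(by_page_type, 1))
```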
21
Clustering Good Pages
K-means clustering to identify 3 subgroups
ANOVAs revealed key differences
– # words on page, HTML bytes, table count
Characterize clusters as:
– Small-page cluster (1008 pages)
– Large-page cluster (364 pages)
– Formatted-page cluster (450 pages)
Use for detailed analysis of pages
[Figure panels: Small page, Large page, Formatted page]
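For readers unfamiliar with K-means, a minimal sketch of the standard algorithm (Lloyd's iteration) follows. It assumes the measures have already been scaled to comparable ranges and that the caller supplies the starting centroids; both choices, and the function name `kmeans`, are assumptions for illustration rather than details from the talk.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Cluster points (tuples of numbers) around caller-supplied starting centroids.

    Returns (final_centroids, assignment), where assignment[i] is the index of
    the centroid that points[i] was assigned to in the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment
```

Called with three starting centroids, this yields three groups, which could then be compared on measures such as word count or table count to characterize each cluster.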
22
Step 4: Evaluate Other Sites
Make predictions for an existing design
– good, average, poor
– How do the scores on the metrics vary from good pages?
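One simple way to express how far a page's measures sit from those of good pages is a z-score per measure, ranked by magnitude. The talk does not say how deviation was computed, so the z-score, the function name `deviations`, and the measure names used with it are assumptions.

```python
from statistics import mean, stdev


def deviations(page, good_pages):
    """Rank a page's measures by distance from the good-page mean.

    page: dict of measure name -> value for the page being assessed.
    good_pages: list of dicts with the same keys, one per good page (at least two).
    Returns (name, z_score) pairs, largest absolute deviation first.
    """
    result = []
    for name, value in page.items():
        sample = [good[name] for good in good_pages]
        mu, sigma = mean(sample), stdev(sample)
        z = (value - mu) / sigma if sigma else 0.0
        result.append((name, z))
    return sorted(result, key=lambda item: abs(item[1]), reverse=True)
```

A page with no text links, compared against good pages that average a dozen, would show text link count at the top of this list.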
23
Example
Site drawn from Yahoo Education/Health
– Discusses training programs on numerous health issues
– Chose one that looked good at first glance, but on further inspection seemed to have problems.
– Only 9 pages were available, at level 0 and 1
– Not present in the original study
24
Sample Content Page (Before)
25
26
Page-Level Assessment
Decision tree predicts: all 9 pages consistent with poor pages
– Content page does not have accent color; has colored, bolded body text words
  • Avoid mixing text attributes (e.g., color, bolding, and size) [Flanders & Willis98]
  • Avoid italicizing and underlining text [Schriver97]
27
Page-Level Assessment
Cluster mapping
– All pages mapped into the small-page cluster
– Deviated on key measures, including
  • text link, link cluster, interactive object, content link word, ad
  • Most deviations can be attributed to using graphic links without corresponding text links
    – Use corresponding text links [Flanders & Willis98, Sano96]
Top deviant measures for content page: Link Count, Text Link Count, Good Link Word Count, Font Count, Sans Serif Word Count, Display Word Count
28
Page-Level Assessment
Compared to models for health and education categories
– All pages found to be poor for both models
Compared to models for the 5 page styles
– All 9 pages were considered poor pages by page style (after correcting predicted types)
29
Improving the Site
Eventually want to automate the translation from differences to recommendations
Revised the pages by hand as follows:
– To improve color count and link count:
  • Added a link text cluster that mirrors the content of the graphic links
– To improve text element and text formatting variation:
  • Added headings to break up paragraphs
  • Added font variations for body text and headings and made the copyright text smaller
– Several other changes based on small-page cluster characteristics
30
Sample Content Page (After)
31
32
After the Changes
All pages now classified correctly by style
All pages rated good overall
All pages rated good health pages
Most pages rated as average education pages
Most pages rated as average by style
33
Profile Evaluation
Small user study
– Page-level comparisons (15 page pairs)
  • Participants preferred modified pages (57.4% vs. 42.6% of the time, p = .038)
– Site-level ratings (original and modified versions of 2 sites)
  • Participants rated modified sites higher than original sites (3.5 vs. 3.0, p = .025)
  • Non-Web designers had difficulty gauging Web design quality
– Freeform comments
  • Subtle changes result in major improvements
34
Summary
Goal:
– Provide automated, context-sensitive suggestions for improving web design.
Approach:
– Compute statistics over large collection of rated web sites
– From these build models of good sites
– Use these to suggest changes.
[Diagram: Measures, Data, Models, Evaluate, Validate]
35
Advantages and Limitations
Advantages
– Derived from empirical data
– Context-sensitive
– More insight for improving designs
– Evolve over time
– Applicable to other types of UIs
Limitations
– Based on expert ratings
– Correlation, not causality
– Not a substitute for usability studies
36
Next Steps
Update the profiles (Webby 02 data)
Develop tool to facilitate interpretation of predictions
Examine the profiles in more detail
– Factor analysis to highlight design patterns
– See which guidelines are valid empirically (studies)
  • Moving from predictions to recommendations
Incorporate assessments of content quality (text analysis & studies)
Improve site-level measures and models
– Incorporate page-level predictions
New page-level measures (spatial properties)
Develop interactive Web design tool
37
Thank You
For more information
– http://webtango.berkeley.edu
Research supported by the following grants: Hellman Faculty Fund, Microsoft Research Grant, Gates Millennium Fund, GAANN Fellowship, Lucent Cooperative Research Fellowship Program
Thanks to: Webby Awards (Maya Draisin & Tiffany Shlain), Rashmi Sinha
38
Do Webby Ratings Reflect Usability?
Do the profiles assess usability or something else?
User study (30 participants)
– Usability ratings (WAMMI scale) for 57 sites
  • Two conditions – actual and perceived usability
– Contrast to judges’ ratings
Results
– Some correlation between users’ and judges’ ratings
– Not a strong finding
– Virtually no difference between actual and perceived usability ratings
  • Participants thought it would be easier to find info in the perceived usability condition