Separating the Swarm: Categorization Methods for User Sessions on the Web. Jeffrey Heer, Ed H. Chi, Palo Alto Research Center. 2002-04-24, CHI Web Behavior Patterns
2002-04-24 CHI Web Behavior Patterns 1
Separating the Swarm
Categorization Methods for User Sessions on the Web
Jeffrey Heer, Ed H. Chi
Palo Alto Research Center
2002.04.24 – CHI Web Behavior Patterns
2002-04-24 CHI Web Behavior Patterns 2
Web Analytics: What can you measure?
Marketing
- content
- page traffic
Infrastructure
- load testing
Site Design
- user intent
- usability
- user experience
Want to improve site design, content, and performance
2002-04-24 CHI Web Behavior Patterns 3
The Change in Web Sites: What should you measure?
Page-based websites
Activity-based websites
[Figure: Site Complexity plotted against Time]
Products
Management Team
I’d like information on used cars.
Search for a car dealer in my neighborhood.
TRAFFIC
USER EXPERIENCE
2002-04-24 CHI Web Behavior Patterns 4
Motivation
What are users’ information goals?
Understanding the composition of web user traffic.
Strategy: Use all available data to discover user goals.(Content, Usage, Topology)
Outline: System Description, Evaluation, Implications, Conclusion
2002-04-24 CHI Web Behavior Patterns 5
System Description
Generate a user profile for each user session.
– How: Use access logs and site content to build a multi-featured model of user activity (multi-modal clustering).
Group user profiles into common activities like “product browsing” and “job seeking”.
– How: Apply clustering algorithms to user profiles.
2002-04-24 CHI Web Behavior Patterns 6
System Description
[Diagram: Web Crawl and Access Logs → Document Model → User Sessions → User Profiles → Clustered Profiles]
Steps:
1. Process Access Logs
2. Crawl Web Site
3. Build Document Model
4. Extract User Sessions
5. Build User Profiles
6. Cluster Profiles
2002-04-24 CHI Web Behavior Patterns 7
Document Model
Site is crawled.
– Pay special attention to pages in logs.
Documents described by feature vectors:
– Content: TF.IDF weighted keyword vector
– URL: Tokenized and TF.IDF weighted
– Inlinks: Column vectors in topology matrix
– Outlinks: Row vectors in topology matrix
Vectors are concatenated to form a single multi-modal vector P_d for each document.
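The concatenation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-page site, its tokens, and the helper names `tfidf_vectors` and `document_vector` are hypothetical, and the TF.IDF variant (raw count times log(N / df)) is an assumption, since the slides do not say which weighting formula was used.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab):
    """Return one TF.IDF vector per document over a fixed vocabulary.

    Assumed variant: raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            counts[t] * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in vocab
        ])
    return vectors

# Hypothetical three-page site: A links to B and C, B links to C.
content = [["printer", "ink"], ["printer", "jobs"], ["jobs", "careers"]]
url_tokens = [["products", "printer"], ["products"], ["jobs"]]
topology = [  # topology[i][j] == 1 when page i links to page j
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

content_vocab = sorted({t for doc in content for t in doc})
url_vocab = sorted({t for doc in url_tokens for t in doc})
content_vecs = tfidf_vectors(content, content_vocab)
url_vecs = tfidf_vectors(url_tokens, url_vocab)

def document_vector(d):
    """Concatenate the four modalities into one multi-modal vector P_d."""
    inlinks = [row[d] for row in topology]  # column d of the topology matrix
    outlinks = list(topology[d])            # row d of the topology matrix
    return content_vecs[d] + url_vecs[d] + inlinks + outlinks

P = [document_vector(d) for d in range(len(content))]
```

Keeping the modalities in fixed positions within P_d makes it possible to slice them apart again later, for example to weight each modality separately when comparing profiles.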
2002-04-24 CHI Web Behavior Patterns 8
User Sessions
Sessions extracted and represented by a vector s:
– For path i = A→B→D, s_i = <1,1,0,1,0> (for a site with 5 documents <A,B,C,D,E>)
Different weightings can be employed in creating the session vector s:
– Frequency: number of times each page is accessed. A→B→D, s = <1,1,0,1,0>
– TF.IDF: hits / # paths including page
– Position: Use order of pages within surfing path. A→B→D, s = <1,2,0,3,0>
– View Time: Use time spent viewing pages. A(10s)→B(20s)→D(15s), s = <10,20,0,15,0>
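The four weightings can be reproduced for the path A, B, D on the five-page site with a short sketch. The function names are hypothetical, and two details are assumptions the slides leave open: a repeated page keeps the position of its last visit, and the TF.IDF variant is taken literally as hits divided by the number of paths that include the page.

```python
from collections import Counter

PAGES = ["A", "B", "C", "D", "E"]

def frequency_vector(path):
    """Number of times each page is accessed in the path."""
    counts = Counter(path)
    return [counts[p] for p in PAGES]

def position_vector(path):
    """Order of each page within the path (1-based, 0 if never visited)."""
    vec = [0] * len(PAGES)
    for order, page in enumerate(path, start=1):
        vec[PAGES.index(page)] = order  # assumption: last visit wins
    return vec

def view_time_vector(timed_path):
    """Total seconds spent on each page; timed_path holds (page, seconds)."""
    vec = [0] * len(PAGES)
    for page, seconds in timed_path:
        vec[PAGES.index(page)] += seconds
    return vec

def tfidf_vector(path, all_paths):
    """Hits in this path divided by the number of paths including the page."""
    counts = Counter(path)
    paths_with = Counter(p for other in all_paths for p in set(other))
    return [counts[p] / paths_with[p] if paths_with[p] else 0.0 for p in PAGES]

path = ["A", "B", "D"]
print(frequency_vector(path))                                # [1, 1, 0, 1, 0]
print(position_vector(path))                                 # [1, 2, 0, 3, 0]
print(view_time_vector([("A", 10), ("B", 20), ("D", 15)]))   # [10, 20, 0, 15, 0]
```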
2002-04-24 CHI Web Behavior Patterns 9
User Profiles
User profiles are linear combinations of the viewed pages.
– “You are what you see.”
$UP_i = \sum_{d=1}^{N} s_{id} P_d$
where $UP_i$ is the user profile for session $i$, $s_{id}$ are the session weights, and $P_d$ are the document vectors.
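As a worked example of the linear combination, a profile is the sum of the document vectors weighted by the session vector, UP_i = sum over d of s_id * P_d. The sketch below assumes plain Python lists; the two-dimensional document vectors are made up for illustration and `user_profile` is a hypothetical helper name.

```python
def user_profile(session, doc_vectors):
    """Compute UP_i = sum over d of s_id * P_d for one session."""
    dims = len(doc_vectors[0])
    profile = [0.0] * dims
    for weight, doc in zip(session, doc_vectors):
        for k in range(dims):
            profile[k] += weight * doc[k]
    return profile

# Made-up document vectors for pages A..E (two features each).
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
s = [1, 1, 0, 1, 0]  # frequency-weighted session vector for path A, B, D

print(user_profile(s, P))  # [3.0, 1.0]
```

Pages C and E have weight 0 in the session vector, so they contribute nothing to the profile, which is the sense of "you are what you see".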
2002-04-24 CHI Web Behavior Patterns 10
Clustering
Clustering is a form of statistical analysis which organizes data into individual clusters.
– Groupings are determined by a shared similarity.
– Similarity is defined by a computable similarity metric.
Clustering proceeds by recursive bisection, using K-Means to perform the bisections [Zhao01].
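Recursive bisection can be sketched as repeatedly splitting one cluster in two with 2-means until k clusters exist. This is a simplified illustration rather than the procedure of [Zhao01]: it uses squared Euclidean distance, always splits the largest cluster, and seeds each split deterministically, all of which are assumptions. The function names are hypothetical.

```python
def _mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iterations=20):
    """Split points into two groups with plain 2-means."""
    # Assumed seeding: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: _sq_dist(p, c0))
    for _ in range(iterations):
        left = [p for p in points if _sq_dist(p, c0) <= _sq_dist(p, c1)]
        right = [p for p in points if _sq_dist(p, c0) > _sq_dist(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        new_c0, new_c1 = _mean(left), _mean(right)
        if new_c0 == c0 and new_c1 == c1:
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    return left, right

def bisecting_kmeans(points, k):
    """Bisect the largest cluster until there are k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = two_means(largest)
        if not left or not right:
            clusters.append(largest)  # cannot split further
            break
        clusters.extend([left, right])
    return clusters

points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [20, 1]]
for cluster in bisecting_kmeans(points, 3):
    print(cluster)
```

On these six points the first bisection separates the pair near the origin from the other four, and the second bisection splits those four into their two pairs.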
$d(UP_i, UP_j) = \sum_{m \in \mathrm{Modalities}} w_m \cos(UP_i^m, UP_j^m)$
The weights $w_m$ specify the contribution of each modality.
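The weighted multi-modal comparison, d(UP_i, UP_j) = sum over modalities m of w_m * cos(UP_i^m, UP_j^m), can be illustrated with a small sketch. Storing each profile as a dictionary keyed by modality name is an assumption made for readability, and the example vectors and weights are made up.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def multimodal_similarity(up_i, up_j, weights):
    """Sum the per-modality cosines, each scaled by its weight w_m."""
    return sum(w * cosine(up_i[m], up_j[m]) for m, w in weights.items())

# Made-up profiles: identical content, orthogonal URL vectors.
up_a = {"content": [1.0, 0.0], "url": [1.0, 1.0]}
up_b = {"content": [1.0, 0.0], "url": [1.0, -1.0]}
weights = {"content": 0.75, "url": 0.25}

print(multimodal_similarity(up_a, up_b, weights))  # 0.75
```

Setting a modality's weight to 0 removes it from the comparison, so unimodal schemes are a special case of the same function.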
2002-04-24 CHI Web Behavior Patterns 11
User population breakdown
Detailed stats
Keywords describing user groups
Frequent documents accessed by group
2002-04-24 CHI Web Behavior Patterns 12
Clustering Results
Users reached end of tutorial, had nowhere to go.
http://www.diamondreview.com
2002-04-24 CHI Web Behavior Patterns 13
System Evaluation
Does the system correctly infer user intentions?
[Diagram: Logs → System → User Intent Groupings, compared against User Intent]
2002-04-24 CHI Web Behavior Patterns 14
User Study
Asked users to carry out specific tasks on www.xerox.com
– captured actions using the WebQuilt proxy logger [Hong01]
– done at their leisure.
15 unique tasks:
– Tasks developed after exploring xerox.com and reading user e-mail feedback
– 5 task groups with 3 tasks per group.
– Products, TechSupport, Supplies, Company Info, and Jobs
Participation:
– 21 users signed up, 18 went through, 104 usable sessions.
2002-04-24 CHI Web Behavior Patterns 15
Results: 340 combinations of clustering schemes
Outlink-based schemes performed poorly (omitted).
2002-04-24 CHI Web Behavior Patterns 16
Analysis: Modalities
[Figure: Analysis of Modalities in Unimodal Cases. x-axis: Path Weighting Schemes; y-axis: % correctly clustered (0% to 100%); series: RAW PATH, CONTENT, URL, INLINK, OUTLINK]
Linear contrast shows Content significantly different:
– (unimodal) F(1,105)=32.51, MSE=.005361, p<0.0001
– (multimodal) F(1,35)=33.36, MSE=.007332, p<0.0001
Content is King! Mean=0.96, StdDev=0.07
2002-04-24 CHI Web Behavior Patterns 17
Analysis: Path Weighting
Paired t-test between time-based and non-time-based weightings: n=60, t(59)=4.85, p=4.68e-6
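For readers unfamiliar with the test, a paired t statistic is the mean of the per-pair differences divided by its standard error, with n - 1 degrees of freedom (hence t(59) for n=60). The sketch below shows only that arithmetic. The input numbers are invented for illustration and are unrelated to the study's data, and `paired_t` is a hypothetical helper name.

```python
import math

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

# Invented scores, not taken from the study.
t, df = paired_t([3, 5, 7], [2, 3, 4])
print(round(t, 3), df)  # 3.464 2
```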