LinkSelector: Select Hyperlinks for Web Portals
Prof. Olivia Sheng, Xiao Fang
School of Accounting and Information Systems, University of Utah
Dec 20, 2015
Agenda
  Introduction
  Problem definition -- Hyperlink Selection
  Solution -- LinkSelector
  Evaluation
  Collaboration
Introduction
Size of the WWW
  More than 3 billion web pages (Google.com, 2001)
  1 million pages added daily (Lawrence and Giles, 1999)
How to find information on the Web
  Using search engines (best coverage 38.3%) (Lawrence and Giles, 1999)
  Clicking through hyperlinks
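Introduction
[Figure: finding a product by clicking through hyperlinks. Web Page 1 shows a Product Category List (A, B, C, D, E, F); clicking on A opens Web Page 2, Product Category A, with Product List A1, A2, A3, A4, A5; clicking on A2 opens Web Page 3, Product A2, with Price: 1000 and a detailed description. A further label, B2, also appears in the figure.]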
Introduction
Portal page: a specific web page that serves as the entrance to a website.
Portal page
  Important
  Mainly consists of hyperlinks
Introduction
A web portal is a personalized entrance to a website (e.g., My Yahoo!).
Default Web Portal/Portal Page
Most My Yahoo! users never customize their default web portals (Manber et al., 2000).
Introduction
Homepage of a Website/Portal Page
Introduction
Not all hyperlinks in a website can be placed in the portal page of the website.
Hyperlinks in a portal page are selected from a hyperlink pool, which is a set of hyperlinks pointing to top-level web pages, e.g., the hyperlinks in a site index page.
[Figure: portal page]
[Figure: hyperlink pool]
[Figure: portal page]
[Figure: hyperlink pool]
Introduction
Number of hyperlinks in a portal page: one to several dozen (e.g., 14 in My Yahoo!) (Neilson, 1999).
Number of hyperlinks in a hyperlink pool: one to several hundred (e.g., 102 in My Yahoo!).
Introduction
It is too computationally expensive to do an exhaustive search (e.g., $C_{102}^{14} \approx 5.95\mathrm{E}16$ ways to choose 14 hyperlinks from a pool of 102).
Current practice of hyperlink selection: expert selection
  Based on domain experts’ experiences
  Subjective and slower to adapt
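The size of the exhaustive search space can be checked directly. The short sketch below counts the candidate portal pages for the My Yahoo! figures quoted in the slides (14 hyperlinks chosen from a pool of 102); the variable names are illustrative only.

```python
import math

pool_size = 102    # hyperlinks in the My Yahoo! hyperlink pool (from the slides)
portal_size = 14   # hyperlinks placed in the My Yahoo! portal page (from the slides)

# Number of distinct portal pages an exhaustive search would have to evaluate:
# the binomial coefficient "102 choose 14".
candidates = math.comb(pool_size, portal_size)

print(f"{candidates:.3e} candidate portal pages")
```

Even at a million evaluations per second, scoring every candidate would take well over a thousand years, which is why an exhaustive search is ruled out.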
Introduction
Our approach is based on
  Web access patterns extracted from a web log: objective (web surfers’ actual visiting behaviors)
  Web structural patterns extracted from an existing website: objective and dynamically adaptive
Hyperlink Selection
Metrics to measure the quality of a portal page
  Effectiveness
  Efficiency
  Usage
The quality of a portal page is measured using a web log.
A web log can be divided into sessions.
Hyperlink Selection
Effectiveness: the percentage of user-sought top-level web pages that can be easily accessed from a portal page.
Efficiency: the usefulness of the hyperlinks placed in a portal page.
Usage: how often a portal page is visited.
Hyperlink Selection
Given
  the hyperlink pool of a website, HP,
  the number of hyperlinks to be placed in the portal page of the website, N, where N < |HP|;
Construct the portal page by selecting N hyperlinks from the hyperlink pool HP.
Objective: optimize the effectiveness, efficiency, and usage of the resulting portal page.
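To make the problem statement concrete, here is a minimal brute-force sketch on a toy hyperlink pool. It is not the LinkSelector method. The scoring function `sessions_served` is a hypothetical stand-in (the number of sessions that touch at least one selected hyperlink) rather than the effectiveness, efficiency, or usage metrics from the slides, and the pool and sessions are invented for illustration.

```python
from itertools import combinations

def sessions_served(selection, sessions):
    """Hypothetical score: sessions that access at least one selected hyperlink."""
    chosen = set(selection)
    return sum(1 for session in sessions if chosen & session)

def exhaustive_select(pool, n, sessions):
    """Try every N-subset of the pool and keep the best-scoring one."""
    if not 0 < n < len(pool):
        raise ValueError("N must satisfy 0 < N < |HP|")
    return max(combinations(pool, n), key=lambda sel: sessions_served(sel, sessions))

# Toy data (assumed): a pool of 4 hyperlinks and 6 sessions.
pool = ["A", "B", "C", "D"]
sessions = [{"A", "B"}, {"A"}, {"A"}, {"C"}, {"C", "D"}, {"B"}]

best = exhaustive_select(pool, 2, sessions)
best_score = sessions_served(best, sessions)
print(best, best_score)
```

This enumeration is only feasible for tiny pools; the number of subsets grows combinatorially with |HP| and N.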
LinkSelector
LinkSelector is based on relationships between hyperlinks in a hyperlink pool.
Structure Relationship
Access Relationship
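LinkSelector
Structure Relationship
[Figure: Web page 1 contains hyperlinks L1 and L3. L1 points to Web page 2, which contains L2, L4, L6, and L8. L3 points to Web page 3, which contains L5 and L7.]
Structure relationship:
  L1 -> L2
  L1: initial hyperlink
  L2: terminal hyperlink
Other structure relationships:
  L1 -> L4, L1 -> L6, L1 -> L8
  L3 -> L5, L3 -> L7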
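The structure relationships in this example can be derived mechanically once the site has been parsed. The sketch below assumes, as the listed relationships suggest, that L1 points to Web page 2 and L3 points to Web page 3, and that a structure relationship runs from an initial hyperlink to every hyperlink on the page it points to. The dictionaries and the function name are illustrative, not part of the original slides.

```python
# Hyperlinks contained in each page of the example (assumed reading of the figure).
links_on_page = {
    "Web page 1": ["L1", "L3"],
    "Web page 2": ["L2", "L4", "L6", "L8"],
    "Web page 3": ["L5", "L7"],
}

# Page each hyperlink points to; only L1 and L3 have known targets in the example.
target_of = {"L1": "Web page 2", "L3": "Web page 3"}

def structure_relationships(links_on_page, target_of):
    """Return (initial, terminal) pairs: the initial hyperlink leads to a page
    that contains the terminal hyperlink."""
    return [
        (initial, terminal)
        for initial, page in target_of.items()
        for terminal in links_on_page.get(page, [])
    ]

relationships = structure_relationships(links_on_page, target_of)
for initial, terminal in relationships:
    print(f"{initial} -> {terminal}")
```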
LinkSelector
Access Relationship
A k-HS denotes a hyperlink set with k hyperlinks, e.g., {L1, L2} is a 2-HS.
The support of a k-HS is the percentage of sessions in which all hyperlinks in the k-HS are accessed together.
Example: If L1 and L2 are accessed together in 20 out of 100 sessions, then the support of the 2-HS {L1, L2} is 20%.
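The support definition translates into a few lines of code. The sketch below assumes each session is represented as a set of accessed hyperlinks; the function name `support` and the toy sessions (built to mirror the 20-out-of-100 example) are illustrative.

```python
def support(hyperlink_set, sessions):
    """Fraction of sessions in which every hyperlink of the k-HS is accessed."""
    if not sessions:
        return 0.0
    wanted = set(hyperlink_set)
    hits = sum(1 for session in sessions if wanted <= set(session))
    return hits / len(sessions)

# Toy web log: L1 and L2 appear together in 20 of 100 sessions.
sessions = [{"L1", "L2"}] * 20 + [{"L1"}] * 30 + [{"L3"}] * 50

value = support({"L1", "L2"}, sessions)
print(f"support of {{L1, L2}} = {value:.0%}")
```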
LinkSelector
Access Relationship
Definition: For a k-HS, where $k \ge 2$, there exists an access relationship among the hyperlinks in the k-HS if and only if its support is greater than a pre-defined threshold.
Example: If the threshold is 0.15 and the support of the 2-HS {L1, L2} is 0.2, then there exists an access relationship between hyperlinks L1 and L2, and the support of the relationship is 0.2.
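The threshold test can be sketched as a plain counting pass over the sessions. This is a brute-force enumeration of k-HSs for a fixed k, not the association rule mining algorithm the slides cite; the function name, the default k = 2, and the toy sessions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def access_relationships(sessions, threshold, k=2):
    """Return {k-HS: support} for every k-HS whose support exceeds the threshold."""
    if k < 2:
        raise ValueError("an access relationship needs k >= 2")
    total = len(sessions)
    if total == 0:
        return {}
    counts = Counter()
    for session in sessions:
        # Count each k-HS at most once per session.
        for hs in combinations(sorted(set(session)), k):
            counts[hs] += 1
    return {hs: c / total for hs, c in counts.items() if c / total > threshold}

# Toy web log: {L1, L2} in 20% of sessions, {L1, L3} in 10%.
sessions = [{"L1", "L2"}] * 20 + [{"L1", "L3"}] * 10 + [{"L4"}] * 70

found = access_relationships(sessions, threshold=0.15)
print(found)
```

With the threshold of 0.15 from the example, only {L1, L2} qualifies; {L1, L3} falls below it.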
LinkSelector
Discover structure relationships
  Parse the existing website
Discover access relationships
  Data preprocessing
    Web log cleaning
    Session identification
  Association rule mining (Agrawal and Srikant, 1994)
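The two preprocessing steps can be sketched as follows. The slides do not say how cleaning or session identification was done, so everything specific here is an assumption: log records are taken to be (visitor, timestamp in seconds, URL) tuples, cleaning drops requests for embedded resources by file suffix, and a new session starts after 30 minutes of inactivity, a common heuristic.

```python
from collections import defaultdict

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise requests
SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap, in seconds

def clean(records):
    """Web log cleaning: drop requests for embedded resources."""
    return [r for r in records if not r[2].lower().endswith(IGNORED_SUFFIXES)]

def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Group each visitor's requests into sessions split by inactivity gaps."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_visitor[visitor].append((ts, url))
    sessions = []
    for visits in by_visitor.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions

# Toy log (invented): visitor "a" returns after a long gap; visitor "b" makes one request.
raw_log = [
    ("a", 0, "/index.html"),
    ("a", 10, "/logo.gif"),
    ("a", 60, "/news.html"),
    ("a", 4000, "/sports.html"),
    ("b", 5, "/index.html"),
]
sessions = identify_sessions(clean(raw_log))
print(sessions)
```

The resulting sessions are the input for counting how often hyperlinks are accessed together.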
LinkSelector
Evaluation
Summary of Data
Hyperlink pool: site-index page of the UA web site
  110 links
Evaluation
Summary of Data
Web log: collected from the UA web server in Sep. 2001
  10 M records (raw)
  4.2 M records (clean)
  344 K sessions in total
    262 K sessions: training data (23 days)
    82 K sessions: testing data (7 days)
Evaluation
Average improvement: 12.7%
Improvement decreases from 22.1% to 8.4%
Average number of sessions per day: 11.5k
[Figure: Effectiveness (y-axis, 0.3 to 0.66) versus Number of Selected Hyperlinks (N) (x-axis, 2 to 10) for LinkSelector, Expert Selection, and Top-Link Selection.]
Evaluation
Group II relationship: 0.2% of the training sessions
Group I relationship
/shared/sports-entertain.shtml /shared/athletics.shtml
Evaluation
Average improvement: 17.0%
Improvement decreases from 30.2% to 9.4%
About 605 more user-sought top-level web pages per day can be easily accessed from the portal page constructed using LinkSelector than from those constructed using the other two approaches.
[Figure: Usage (y-axis, 50000 to 80000) versus Number of Selected Hyperlinks (N) (x-axis, 2 to 10) for LinkSelector, Top-Link Selection, and Expert Selection.]
Evaluation
Average improvement: 16.9%
Improvement decreases from 30.2% to 9.3%
[Figure: Efficiency (y-axis, 0.075 to 0.4) versus Number of Selected Hyperlinks (N) (x-axis, 2 to 10) for LinkSelector, Top-Link Selection, and Expert Selection.]