LinkSelector: Select Hyperlinks for Web Portals
Prof. Olivia Sheng, Xiao Fang
School of Accounting and Information Systems, University of Utah

Post on 20-Dec-2015


Agenda

Introduction
Problem definition: Hyperlink Selection
Solution: LinkSelector
Evaluation
Collaboration

Introduction

Size of the WWW: more than 3 billion web pages (Google.com, 2001), with 1 million pages added daily (Lawrence and Giles, 1999).

How to find information on the Web: using search engines (best coverage: 38.3%; Lawrence and Giles, 1999) or clicking through hyperlinks.

Introduction

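[Figure: finding information by clicking through hyperlinks. Web Page 1 shows a product category list (A to F). Clicking on A opens Web Page 2, the product list for category A (A1 to A5). Clicking on A2 opens Web Page 3, which shows Product A2 with its price (1000) and a detailed description.]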

Introduction

Portal page: a specific web page that serves as the entrance to a website.

Portal pages are important and consist mainly of hyperlinks.

Introduction

Web portal: a personalized entrance to a website (e.g., My Yahoo!).

Default web portal: a portal page.

Most My Yahoo! users never customize their default web portals (Manber et al., 2000).

Introduction

The homepage of a website is its portal page.

Introduction

Not all hyperlinks in a website can be placed in the portal page of the website.

Hyperlinks in a portal page are selected from a hyperlink pool, which is a set of hyperlinks pointing to top-level web pages (e.g., the hyperlinks in a site index page).

[Figures: two examples, each showing a portal page followed by its hyperlink pool.]

Introduction

Number of hyperlinks in a portal page: one to several dozen (e.g., 14 in My Yahoo!) (Nielsen, 1999).

Number of hyperlinks in a hyperlink pool: one to several hundred (e.g., 102 in My Yahoo!).

Introduction

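An exhaustive search is too computationally expensive. For example, choosing 14 of the 102 hyperlinks in the My Yahoo! pool gives $\binom{102}{14} \approx 5.95 \times 10^{16}$ candidate portal pages.

Current practice of hyperlink selection is expert selection. It relies on domain experts' experience, so it is subjective and slow to adapt.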
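The size of that search space is a binomial coefficient. Python's standard `math.comb` reproduces it for the My Yahoo! figures quoted above (a pool of 102 hyperlinks and 14 portal slots). The helper name `candidate_portal_pages` is illustrative only.

```python
from math import comb


def candidate_portal_pages(pool_size: int, n_selected: int) -> int:
    """Count the distinct portal pages obtainable by picking n_selected of pool_size hyperlinks."""
    return comb(pool_size, n_selected)


# My Yahoo! example: 14 hyperlinks chosen from a pool of 102.
count = candidate_portal_pages(102, 14)
print(f"{count:.2e}")  # 5.95e+16
```

Even evaluating a million candidates per second, checking them all would take well over a thousand years. This is the motivation for a heuristic selection method.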

Introduction

Our approach is based on:
web access patterns extracted from a web log, which are objective because they reflect web surfers' actual visiting behavior;
web structural patterns extracted from an existing website, which are objective and dynamically adaptive.

Hyperlink Selection

Three metrics measure the quality of a portal page: effectiveness, efficiency, and usage.

The quality of a portal page is measured using a web log, which can be divided into sessions.

Hyperlink Selection

Effectiveness: the percentage of user-sought top-level web pages that can be easily accessed from a portal page.

Efficiency: the usefulness of the hyperlinks placed in a portal page.

Usage: how often a portal page is visited.
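The slides define these metrics only informally. Below is a minimal sketch of one way to compute them from sessions. It is not the authors' exact formulas, and it rests on several assumptions. Each session is reduced to the set of top-level (hyperlink-pool) pages it sought. A page counts as "easily accessed" only if the portal links to it directly. Efficiency is taken as the average fraction of portal hyperlinks used per session. Usage is proxied by the number of sessions that use at least one portal hyperlink. The function name `portal_metrics` is hypothetical.

```python
from typing import Dict, List, Set


def portal_metrics(sessions: List[Set[str]], portal: Set[str]) -> Dict[str, float]:
    """Hypothetical operationalization of the three portal-page quality metrics.

    sessions: per session, the set of top-level pages the user sought (assumed format).
    portal:   the hyperlinks placed in the portal page.
    Assumption: a sought page is "easily accessed" only if the portal links to it directly.
    """
    # Sought pages in each session that the portal links to directly.
    hits_per_session = [len(s & portal) for s in sessions]
    hits = sum(hits_per_session)
    sought = sum(len(s) for s in sessions)

    # Effectiveness: share of user-sought top-level pages reachable from the portal.
    effectiveness = hits / sought if sought else 0.0
    # Assumed efficiency: average fraction of the portal's hyperlinks used per session.
    slots = len(sessions) * len(portal)
    efficiency = hits / slots if slots else 0.0
    # Assumed usage proxy: number of sessions that use at least one portal hyperlink.
    usage = float(sum(1 for h in hits_per_session if h > 0))
    return {"effectiveness": effectiveness, "efficiency": efficiency, "usage": usage}


sessions = [{"A", "B"}, {"B"}, {"C", "D"}]
print(portal_metrics(sessions, {"A", "B"}))
```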

Hyperlink Selection

Given:
the hyperlink pool of a website, HP, and
the number of hyperlinks to be placed in the portal page of the website, N, where N < |HP|,

construct the portal page by selecting N hyperlinks from the hyperlink pool HP.

Objective: optimize the effectiveness, efficiency, and usage of the resulting portal page.
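To make the problem statement concrete, here is a brute-force solver that tries every N-subset of HP. It is only a sketch for tiny pools and is not LinkSelector. Scoring a candidate on a single objective (effectiveness, as loosely defined earlier) is an assumption made to keep it short; the actual problem optimizes three metrics. The names `exhaustive_select` and `effectiveness` are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple


def exhaustive_select(
    pool: Iterable[str], n: int, score: Callable[[FrozenSet[str]], float]
) -> Tuple[FrozenSet[str], float]:
    """Try every N-subset of the hyperlink pool and keep the highest-scoring one.

    This enumerates C(|HP|, N) candidates, so it is only feasible for tiny pools.
    """
    best: FrozenSet[str] = frozenset()
    best_score = float("-inf")
    for candidate in combinations(sorted(pool), n):
        s = score(frozenset(candidate))
        if s > best_score:  # ties keep the lexicographically first candidate
            best, best_score = frozenset(candidate), s
    return best, best_score


def effectiveness(sessions: List[Set[str]]) -> Callable[[FrozenSet[str]], float]:
    """Hypothetical single-objective score: share of sought pages the portal links to."""
    sought = sum(len(s) for s in sessions)
    return lambda portal: sum(len(s & portal) for s in sessions) / sought


sessions = [{"L1", "L2"}, {"L1", "L3"}, {"L2"}, {"L4"}]
pool = {"L1", "L2", "L3", "L4", "L5"}
print(exhaustive_select(pool, 2, effectiveness(sessions)))
```

With |HP| = 102 and N = 14, the loop above would have to run about 5.95 × 10^16 times.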

LinkSelector

LinkSelector is based on two kinds of relationships between hyperlinks in a hyperlink pool: structure relationships and access relationships.

LinkSelector

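Structure Relationship

[Figure: hyperlinks L1 to L8 spread across Web pages 1 to 3. L1 points to the page containing L2, L4, L6, and L8; L3 points to the page containing L5 and L7.]

Structure relationship: L1 → L2, where L1 is the initial hyperlink and L2 is the terminal hyperlink.

Other structure relationships: L1 → L4, L1 → L6, L1 → L8, L3 → L5, L3 → L7.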
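The example suggests a rule: a structure relationship L_i → L_j exists when L_j appears on the page that L_i points to. The sketch below derives such pairs from a parsed site under that assumption. The input format (two dictionaries mapping hyperlinks to target pages and pages to their hyperlinks) and the page names are hypothetical. Which page holds L1 and L3 is a toy choice.

```python
from typing import Dict, List, Set, Tuple


def structure_relationships(
    link_target: Dict[str, str], page_links: Dict[str, List[str]]
) -> Set[Tuple[str, str]]:
    """Return (initial, terminal) pairs where `initial` points to a page containing `terminal`.

    link_target: hyperlink -> page it points to (hypothetical parsed-site format)
    page_links:  page -> hyperlinks that appear on that page
    """
    relationships: Set[Tuple[str, str]] = set()
    for initial, target_page in link_target.items():
        for terminal in page_links.get(target_page, []):
            relationships.add((initial, terminal))
    return relationships


# Toy site mirroring the slide: L1 leads to a page with L2, L4, L6, L8;
# L3 leads to a page with L5 and L7.
page_links = {
    "page1": ["L1", "L3"],
    "page2": ["L2", "L4", "L6", "L8"],
    "page3": ["L5", "L7"],
}
link_target = {"L1": "page2", "L3": "page3"}
for initial, terminal in sorted(structure_relationships(link_target, page_links)):
    print(f"{initial} -> {terminal}")
```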

LinkSelector: Access Relationship

A k-HS denotes a hyperlink set containing k hyperlinks; e.g., {L1, L2} is a 2-HS.

The support of a k-HS is the percentage of sessions in which all hyperlinks in the k-HS are accessed together.

Example: if L1 and L2 are accessed together in 20 out of 100 sessions in total, the support of the 2-HS {L1, L2} is 20%.
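Support as defined above is a simple count over sessions. This sketch represents each session as the set of hyperlinks accessed in it, which is an assumed format. It treats "accessed together" as "every hyperlink of the k-HS appears in the session". The function name `support` is illustrative.

```python
from typing import Iterable, List, Set


def support(hyperlink_set: Iterable[str], sessions: List[Set[str]]) -> float:
    """Fraction of sessions in which every hyperlink of the k-HS is accessed."""
    hs = set(hyperlink_set)
    if not sessions:
        return 0.0
    together = sum(1 for session in sessions if hs <= session)
    return together / len(sessions)


# Slide example: L1 and L2 accessed together in 20 of 100 sessions.
sessions = [{"L1", "L2"}] * 20 + [{"L1"}] * 30 + [{"L3"}] * 50
print(support({"L1", "L2"}, sessions))  # 0.2
```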

LinkSelector: Access Relationship

Definition: for a k-HS, where $k \geq 2$, an access relationship exists among the hyperlinks in the k-HS if and only if its support is greater than a pre-defined threshold.

Example: if the threshold is 0.15 and the support of the 2-HS {L1, L2} is 0.2, then an access relationship exists between hyperlinks L1 and L2, and the support of this relationship is 0.2.
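Finding every k-HS (k ≥ 2) that meets this definition is a frequent-itemset problem. The deck cites Agrawal and Srikant (1994) for association rule mining. Below is a compact Apriori-style level-wise search written for illustration, not the authors' implementation. It relies on the anti-monotone property: a k-HS can exceed the threshold only if all of its (k-1)-subsets do. The session format and function name are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def access_relationships(
    sessions: List[Set[str]], threshold: float
) -> Dict[FrozenSet[str], float]:
    """Return every k-HS (k >= 2) whose support exceeds `threshold`, with its support."""
    n = len(sessions)
    if n == 0:
        return {}

    def sup(hs: FrozenSet[str]) -> float:
        return sum(1 for s in sessions if hs <= s) / n

    # Level 1: single hyperlinks whose support exceeds the threshold.
    links = {link for s in sessions for link in s}
    level = {frozenset([link]) for link in links if sup(frozenset([link])) > threshold}
    result: Dict[FrozenSet[str], float] = {}
    k = 2
    while level:
        # Join: merge two frequent (k-1)-sets whose union has exactly k hyperlinks.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a k-set can only be frequent if all of its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = set()
        for c in candidates:
            s = sup(c)
            if s > threshold:
                level.add(c)
                result[c] = s
        k += 1
    return result


sessions = [{"L1", "L2"}] * 20 + [{"L1", "L3"}] * 10 + [{"L4"}] * 70
print(access_relationships(sessions, threshold=0.15))
```

On this toy log, only {L1, L2} qualifies, with support 0.2, matching the slide example.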

LinkSelector

Discover structure relationships: parse the existing website.

Discover access relationships:
data preprocessing (web log cleaning and session identification), then
association rule mining (Agrawal and Srikant, 1994).
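The slides name the preprocessing steps but not their rules. The following sketch therefore fills them in with common heuristics, all of which are assumptions. Cleaning keeps only successful (status 200) requests and drops image, stylesheet, and script files. Requests from the same user (e.g., the same client IP) are grouped into one session until a gap of more than 30 minutes occurs. The `LogEntry` record and both function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class LogEntry(NamedTuple):
    user: str       # client identifier, e.g. IP address (assumed)
    timestamp: int  # seconds since epoch
    url: str
    status: int


IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = 30 * 60  # assumed 30-minute inactivity timeout, in seconds


def clean(log: List[LogEntry]) -> List[LogEntry]:
    """Keep successful page requests; drop errors and embedded resources."""
    return [e for e in log
            if e.status == 200 and not e.url.lower().endswith(IGNORED_SUFFIXES)]


def sessionize(log: List[LogEntry], timeout: int = SESSION_TIMEOUT) -> List[Set[str]]:
    """Split each user's requests into sessions separated by gaps longer than `timeout`."""
    sessions: List[Set[str]] = []
    current: Dict[str, Set[str]] = {}
    last_seen: Dict[str, int] = {}
    for e in sorted(log, key=lambda entry: (entry.user, entry.timestamp)):
        prev = last_seen.get(e.user)
        if prev is None or e.timestamp - prev > timeout:
            current[e.user] = set()          # start a new session for this user
            sessions.append(current[e.user])
        current[e.user].add(e.url)
        last_seen[e.user] = e.timestamp
    return sessions


log = [
    LogEntry("a", 0, "/index.html", 200),
    LogEntry("a", 60, "/logo.gif", 200),
    LogEntry("a", 120, "/shared/athletics.shtml", 200),
    LogEntry("a", 5000, "/news.html", 200),
    LogEntry("b", 10, "/x.html", 404),
]
print(sessionize(clean(log)))
```

The resulting list of URL sets has the same shape the support and access-relationship sketches expect as input.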

LinkSelector

Evaluation: Summary of Data

Hyperlink pool: the site index page of the UA website (110 hyperlinks).

Evaluation: Summary of Data

Web log: collected from the UA web server in Sep. 2001.
Records: 10 M raw, 4.2 M after cleaning.
Sessions: 344 K in total, split into 262 K training sessions (23 days) and 82 K testing sessions (7 days).
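The training/testing split above is chronological: 23 days for training and the remaining 7 for testing. The slides give only the day counts, so this sketch assumes that training covers Sep. 1 to 23 and testing covers Sep. 24 to 30. It also assumes each session is tagged with a date. The function name `chronological_split` is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Set, Tuple


def chronological_split(
    dated_sessions: List[Tuple[date, Set[str]]], first_day: date, train_days: int
) -> Tuple[List[Set[str]], List[Set[str]]]:
    """Sessions from the first `train_days` days go to training; later ones to testing."""
    cutoff = first_day + timedelta(days=train_days)
    train = [s for d, s in dated_sessions if d < cutoff]
    test = [s for d, s in dated_sessions if d >= cutoff]
    return train, test


# September 2001 with 23 training days: Sept. 1-23 train, Sept. 24-30 test.
dated = [
    (date(2001, 9, 23), {"/index.html"}),
    (date(2001, 9, 24), {"/shared/athletics.shtml"}),
]
train, test = chronological_split(dated, date(2001, 9, 1), train_days=23)
print(len(train), len(test))  # 1 1
```

Splitting by time rather than at random means the portal page is always evaluated on sessions that occurred after the ones it was built from.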

Evaluation: Effectiveness

Average improvement: 12.7%. The improvement decreases from 22.1% to 8.4%.

Average number of sessions per day: 11.5 K.

[Figure: effectiveness (y-axis, 0.30 to 0.66) versus number of selected hyperlinks N (x-axis, 2 to 10) for LinkSelector, Expert Selection, and Top-Link Selection.]

Evaluation

[Figure: Group I and Group II relationships, illustrated with the hyperlinks /shared/sports-entertain.shtml and /shared/athletics.shtml. Group II relationships: 0.2% of the training sessions.]

Evaluation: Usage

Average improvement: 17.0%. The improvement decreases from 30.2% to 9.4%.

Each day, 605 more user-sought top-level web pages can be easily accessed from the portal page constructed by LinkSelector than from the portal pages constructed by the other two approaches.

[Figure: usage (y-axis, 50,000 to 80,000) versus number of selected hyperlinks N (x-axis, 2 to 10) for LinkSelector, Top-Link Selection, and Expert Selection.]

Evaluation: Efficiency

Average improvement: 16.9%. The improvement decreases from 30.2% to 9.3%.

[Figure: efficiency (y-axis, 0.075 to 0.4) versus number of selected hyperlinks N (x-axis, 2 to 10) for LinkSelector, Top-Link Selection, and Expert Selection.]
