SEMANTIC WEB-PAGE RECOMMENDER SYSTEM P.Vinothini, T.vetriselvi
Input: An access sequence database, WASD; a support threshold
Output: Set of weighted access patterns
Method:
1. For each web access sequence s = p1, p2, …, pn
     Set weight(pi) = 0; let length = 0;
     Create linked list C, where each node contains an item name and its weight, with the weight set to 0;
     For each occurrence of item pi
       Increment freq(pi) and add Time(pi);
       Update the values in C;
     End for;
     Update the list of items in LIN with C;
     For each pi
       Take the harmonic mean of freq(pi) and Time(pi);
       Assign it to weight(pi);
     End for
2. For each item pi in LIN, check whether it passes the support threshold; if so, add the item to the frequent patterns
3. Call LL-Mine
4. Return
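The weighting step can be illustrated with a short Python sketch. It is not the authors' linked-list implementation: the session layout (lists of `(page, seconds)` pairs), the function names `page_weights` and `frequent_pages`, and the use of `>=` for "passes the support threshold" are all assumptions made for illustration.

```python
from collections import defaultdict


def page_weights(sessions):
    """Return {page: weight}, where weight is the harmonic mean of the
    page's visit count and its accumulated viewing time.

    sessions: list of sessions, each a list of (page, seconds) pairs
    (a hypothetical input layout; the paper does not fix one).
    """
    freq = defaultdict(int)
    total_time = defaultdict(float)
    for session in sessions:
        for page, seconds in session:
            freq[page] += 1              # Increment freq(pi)
            total_time[page] += seconds  # Add Time(pi)

    weights = {}
    for page in freq:
        f, t = freq[page], total_time[page]
        # Harmonic mean of two values: 2*f*t / (f + t).
        weights[page] = 2 * f * t / (f + t) if (f + t) > 0 else 0.0
    return weights


def frequent_pages(weights, threshold):
    """Keep the pages whose weight reaches the threshold (assumed >=)."""
    return {page for page, w in weights.items() if w >= threshold}


sessions = [[("a", 2.0), ("b", 6.0)], [("a", 2.0)]]
weights = page_weights(sessions)
print(weights)
print(frequent_pages(weights, threshold=2.0))
```

Note that the raw harmonic mean mixes a count with a duration, so in practice the two quantities would probably need to be normalised to a common scale first; the sketch leaves them as given.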
TABLE 2: Algorithm for LL-Mine
Algorithm: LL-Mine
Parameters:
  Current frequent pattern, p
  List of first occurrences, L
  Absolute support, η
Method:
1. For each weighted frequent item pi
   i. Generate the first-occurrences list L1:
        Initialize L1 with Weight_support = 0;
        Locate the first occurrences of the element p in the projected databases D-p using L;
        Generate L1 with nodes holding seq-id and pos;
        Add the weight of the item at each occurrence;
        Update the header of the list L1 with Weight_support(pi);
   ii. If Weight_support(pi) > η
        Add p.pi to F, the set of patterns;
        Add p.pi to the stack for suffix building;
        p = p.pi;
        Call LL-Mine(p, L1, η)
       End if
   iii. Delete the current L.
   End for
2. Return
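The recursion in LL-Mine can be sketched in Python as a simplified prefix-growth miner. This is an illustration under stated assumptions, not the authors' implementation: plain lists of `(seq_id, pos)` tuples stand in for the linked first-occurrence lists, the weighted support of an item is assumed to be its weight summed over the sequences in which it occurs after the current prefix, and the names `ll_mine`, `db` and `weights` are hypothetical.

```python
def ll_mine(db, weights, eta, prefix=(), occ=None, found=None):
    """Collect patterns whose weighted support exceeds eta.

    db:      list of access sequences (lists of items)
    weights: {item: weight}
    occ:     first-occurrence list of the current prefix, as
             (seq_id, pos) pairs; pos = -1 means "before the start"
    """
    if found is None:
        found = {}
    if occ is None:
        occ = [(sid, -1) for sid in range(len(db))]

    # Candidate items are those appearing after the prefix somewhere.
    candidates = sorted({x for sid, pos in occ for x in db[sid][pos + 1:]})
    for item in candidates:
        new_occ = []          # plays the role of the list L1
        support = 0.0         # Weight_support(pi)
        for sid, pos in occ:
            seq = db[sid]
            for j in range(pos + 1, len(seq)):
                if seq[j] == item:          # first occurrence after prefix
                    new_occ.append((sid, j))
                    support += weights[item]
                    break
        if support > eta:                   # strict comparison, as in step ii
            pattern = prefix + (item,)
            found[pattern] = support
            ll_mine(db, weights, eta, pattern, new_occ, found)
    return found


db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
weights = {"a": 1, "b": 1, "c": 1}
for pattern, support in sorted(ll_mine(db, weights, eta=1).items()):
    print(pattern, support)
```

With unit weights the weighted support reduces to an ordinary sequence count, which makes the toy run easy to check by hand.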
3.4 Semantic network construction:
This section presents the first model, i.e. the semantic network of a website and its schema, and explains the queries used to infer terms and Web-pages. A semantic network is a kind of knowledge map which represents concepts, here domain terms and Web-pages, and the relations between those concepts. To construct the semantic network, domain terms are collected from the Web-page titles, and the relations between these terms are then extracted from two aspects: (i) the collocations of terms, determined by the co-occurrence relations of terms in Web-page titles; and (ii) the associations between terms and Web-pages.
To capture how strongly these terms are semantically related, the domain terms and co-occurrence relations are weighted. Based on these relations, we can estimate how closely Web-pages are associated with each other semantically. To infer the semantics of Web-pages,
we can query about the relations including relevant pages
and key terms for a given page, and the pages for given
Process:
Let TSC = {PageID, X = t1t2 . . . tm, URL}
Initialize G;
Let R = the root or start node of G
Let E = the end node of G
For each PageID and each sequence X in TSC {
  Initialize a WPage object identified as PageID
  For each term ti ϵ X {
    If node ti is not found in G, then
      Initialize an Instance object I as a node of G
      Set I.Name = ti
    Else
      Set I = the Instance object named ti in G
    Increase I.iOccur by 1
    If (i==0) then
      Initialize an OutLink R-ti if not found
      Increase R-ti.iWeight by 1
      Set R-ti.fromInstance = R
      Set R-ti.toInstance = I
    If (i>0 & i<m) then
      Get preI = the Instance object with name ti-1
      Initialize an OutLink ti-1-ti if not found
      Increase ti-1-ti.iWeight by 1
      Set ti-1-ti.toInstance = I
      Set ti-1-ti.fromInstance = preI
    If (i==m) then
      Initialize an OutLink ti-E if not found
      Increase ti-E.iWeight by 1
      Set ti-E.toInstance = E
      Set ti-E.fromInstance = I
    Set I.hasWPage = PageID
    Add term ti into PageID.Keywords
  }
}
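The construction above can be approximated in a few lines of Python. This is a minimal sketch, assuming each page title has already been reduced to an ordered list of domain terms; dictionaries stand in for the Instance, OutLink and WPage objects, and the names `build_term_network`, `"<R>"` and `"<E>"` are placeholders chosen for illustration.

```python
from collections import defaultdict


def build_term_network(pages):
    """Build term counts, weighted links and term/page associations.

    pages: {page_id: [t1, t2, ..., tm]} (title terms in order).
    Returns (occur, link_weight, term_pages, page_keywords).
    """
    start, end = "<R>", "<E>"           # root and end nodes of the graph
    occur = defaultdict(int)            # term -> iOccur
    link_weight = defaultdict(int)      # (from, to) -> iWeight
    term_pages = defaultdict(set)       # term -> pages (hasWPage)
    page_keywords = {}                  # page -> keywords

    for page_id, terms in pages.items():
        page_keywords[page_id] = list(terms)
        prev = start
        for term in terms:
            occur[term] += 1
            link_weight[(prev, term)] += 1   # R-t1 or t(i-1)-t(i) link
            term_pages[term].add(page_id)
            prev = term
        if terms:
            link_weight[(prev, end)] += 1    # tm-E link

    return occur, link_weight, term_pages, page_keywords


pages = {"p1": ["web", "mining"], "p2": ["web", "usage", "mining"]}
occur, link_weight, term_pages, page_keywords = build_term_network(pages)
print(dict(occur))
print(dict(link_weight))
```

Tracking the previous term in `prev` replaces the three index-based cases of the pseudocode (first, middle and last term) with a single link update plus one closing link per title.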
5. TermNavNet ALGORITHM:
In Section 4, we presented TermNetWP, which represents the semantics of Web-pages within a website efficiently, but it is not sufficient on its own for making effective Web-page recommendations. To overcome this issue, we integrate TermNetWP with Web usage knowledge to obtain semantic Web usage knowledge.
The notations used to represent the TermNavNet are summarized as follows:
∂x: number of occurrences of tx in F;
∂x,y: number of times that tx is followed by ty in F with no term between them;
∂S,x: number of times domain term tx is the first item in a domain term pattern f;
∂x,E: number of times a domain term pattern f terminates at domain term tx;
∂x,y,z: number of times that (tx, ty) is followed by tz in F with no term between them.
The probability of a transition is estimated as the ratio of the number of times the corresponding sequence of states (i.e. visited Web-pages) was traversed to the number of times the anchor state occurred. In our system, we take into account first-order and second-order transition probabilities.
Given a CPM having states {S, t1 . . . tp , E}, and N is the
number of term patterns in F, the first-order transition
probabilities are estimated according to the following
expressions:
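Transition from the starting state S to state tx:
$$\rho_{S,x} = \frac{\partial_{S,x}}{\sum_{y=1}^{n} \partial_{S,y}} \qquad (1)$$
Transition from state tx to ty:
$$\rho_{x,y} = \frac{\partial_{x,y}}{\partial_{x}} \qquad (2)$$
Transition from state tx to the final state E:
$$\rho_{x,E} = \frac{\partial_{x,E}}{\partial_{x}} \qquad (3)$$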
The second-order transition probability, which is the probability of the transition (ty, tz) given that the previous transition that occurred was (tx, ty), is estimated as follows:
$$\rho_{x,y,z} = \frac{\partial_{x,y,z}}{\partial_{x,y}} \qquad (4)$$
The conceptual prediction model is represented as a triple Cpm := (N, Φ, M), where
N = {(tx, ∂x)}: set of terms along with the corresponding occurrence counts;
Φ = {(tx, ty, ∂x,y, ρx,y)}: set of transitions from tx to ty, along with their transition weights (∂x,y) and first-order transition probabilities (ρx,y);
M = {(tx, ty, tz, ∂x,y,z, ρx,y,z)}: set of transitions from (tx, ty) to tz, along with their transition weights (∂x,y,z) and second-order transition probabilities (ρx,y,z).
If M is non-empty, the CPM is considered a second-order conceptual prediction model; otherwise it is a first-order conceptual prediction model.
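As a rough illustration of Equations (1) to (4), the counts and probabilities can be derived from a list of term patterns as follows. The function name `build_cpm` and the dictionary-based return values are assumptions made for this sketch; the paper stores these quantities in the N, Φ and M components of the CPM.

```python
from collections import Counter


def build_cpm(patterns):
    """Estimate first- and second-order transition probabilities.

    patterns: list of term sequences (the set F).
    Returns (p_start, p_first, p_end, p_second).
    """
    occ, start, end = Counter(), Counter(), Counter()
    pair, triple = Counter(), Counter()
    for f in patterns:
        if not f:
            continue
        start[f[0]] += 1                     # counts of S -> tx
        end[f[-1]] += 1                      # counts of tx -> E
        occ.update(f)                        # occurrences of each term
        pair.update(zip(f, f[1:]))           # adjacent pairs (tx, ty)
        triple.update(zip(f, f[1:], f[2:]))  # adjacent triples (tx, ty, tz)

    total_starts = sum(start.values())
    p_start = {x: c / total_starts for x, c in start.items()}      # Eq. (1)
    p_first = {(x, y): c / occ[x] for (x, y), c in pair.items()}   # Eq. (2)
    p_end = {x: c / occ[x] for x, c in end.items()}                # Eq. (3)
    p_second = {(x, y, z): c / pair[(x, y)]
                for (x, y, z), c in triple.items()}                # Eq. (4)
    return p_start, p_first, p_end, p_second


patterns = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
p_start, p_first, p_end, p_second = build_cpm(patterns)
print(p_start)
print(p_first)
print(p_second)
```

In this toy input, term b occurs three times and is followed by c twice, so the first-order probability of the transition from b to c is 2/3; the pair (a, b) occurs twice and is followed once by c and once by d, so both second-order probabilities are 1/2.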
5.1 Schema of CPM
TermNavNet is implemented automatically in OWL. The schema consists of two classes: cNode, which defines the current state node, and cOutLink, which defines the association from the current state node to a next state node with a transition probability Prob (e.g. ρx,y). It also includes the relationship properties inLink, outLink and linkTo. The CPM schema is populated with FVTP by using the following algorithm. We can obtain a 1st- or 2nd-order TermNavNet by using the 1st- or 2nd-order CPM, respectively, to update the transition probability Prob based on the first-order or second-order probability formula.
TABLE 4: TermNavNet construction
Algorithm: Building TermNavNet
Input: F (FVTP)
Output: M (TermNavNet)
Process:
Initialize M
For each f = t1t2…tm ϵ F
  For each ti ϵ f
    Initialize cNode objects with NodeName = ti, ti-1, ti+1 and Occur = 1 if they are not found in M
    Initialize a cOutLink object with Name = ti_ti+1 and Occur = 1 if it is not found in M
    Increase ti.Occur and ti_ti+1.Occur if they are found in M
    ti_ti+1.linkTo = ti+1
    ti.outLink = ti_ti+1
    ti.inLink = ti-1
Update all objects into M
Update transition probabilities in the cOutLink objects
Return M
5.3 Queries
RecTerm (tx, ty) queries the next viewed terms for a given current viewed term curT and previous viewed term preT by applying the second-order transition probability. If the first-order transition probability is used, the next viewed terms for a given current viewed term curT are queried using RecTerm (tx).
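A RecTerm-style lookup can be sketched in Python over plain probability dictionaries. The paper runs this query against the OWL model, so the function `rec_term`, its dictionary inputs, and the `top_k` cut-off with ranking by descending probability are all assumptions made for illustration.

```python
def rec_term(first_order, second_order, cur_t, pre_t=None, top_k=3):
    """Return candidate next terms, most probable first.

    first_order:  {(tx, ty): probability}
    second_order: {(tx, ty, tz): probability}
    If pre_t is given, the second-order model is used; otherwise the
    first-order model is used.
    """
    if pre_t is not None:
        candidates = {z: p for (x, y, z), p in second_order.items()
                      if x == pre_t and y == cur_t}
    else:
        candidates = {y: p for (x, y), p in first_order.items()
                      if x == cur_t}
    # Sort by descending probability, breaking ties alphabetically.
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    return ranked[:top_k]


first_order = {("web", "mining"): 0.6, ("web", "usage"): 0.4,
               ("usage", "mining"): 1.0}
second_order = {("web", "usage", "mining"): 1.0}

print(rec_term(first_order, second_order, "web"))
print(rec_term(first_order, second_order, "usage", pre_t="web"))
```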
6. SEMANTIC-ENHANCED WEB-PAGE RECOMMENDATION STRATEGIES
Two Web-page recommendation strategies are
proposed depending on the order of CPM (i.e. for a given
current web-page or combination of current and previous
web-page, recommendations are made) as follows:
Recommendation strategy-1 uses TermNetWP and the first-
order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 1st-order TermNavNet given FVTP;
Step 5 identifies a set of currently viewed terms
{tk} using query Querytopic (dk) on TermNetWP;
Step 6 infers next viewed terms {tk+1} given each
term in {tk} using query RecTerm (tk) on the 1st-order
TermNavNet;
Step 7 recommends pages mapped to each term
in {tk+1} using query Querypage (tk+1) on TermNetWP.
Recommendation strategy-2 uses TermNetWP and the second-
order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 2nd-order TermNavNet given
FVTP;
Step 5 identifies a set of previously viewed terms
{tk-1}, and a set of currently viewed terms {tk} using query
Querytopic (d), d ∈ {dk-1, dk}, on TermNetWP;
Step 6 infers next viewed terms {tk+1} given each
pair {tk-1, tk} using query RecTerm (tk-1, tk) on the 2nd-order
TermNavNet;
Step 7 recommends pages mapped to each term
in {tk+1} using query Querypage (tk+1) on TermNetWP.
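Steps 5 to 7 of strategy-1 can be sketched as one small Python function. The dictionaries `page_terms` and `term_pages` stand in for the Querytopic and Querypage queries on TermNetWP, and `first_order` stands in for the 1st-order TermNavNet. These names, the `top_k` cut-off, and the choice to exclude the current page from the result are assumptions of this sketch rather than details given in the paper.

```python
def recommend_pages(current_page, page_terms, term_pages, first_order, top_k=2):
    """Recommend pages for the page currently being viewed.

    page_terms:  {page: [terms]}          (stand-in for Querytopic)
    term_pages:  {term: set of pages}     (stand-in for Querypage)
    first_order: {(tx, ty): probability}  (1st-order transitions)
    """
    # Step 5: terms of the currently viewed page.
    current_terms = page_terms.get(current_page, [])

    # Step 6: most probable next terms for each current term.
    next_terms = []
    for term in current_terms:
        candidates = [(p, y) for (x, y), p in first_order.items() if x == term]
        candidates.sort(key=lambda c: (-c[0], c[1]))
        for _, nxt in candidates[:top_k]:
            if nxt not in next_terms:
                next_terms.append(nxt)

    # Step 7: pages mapped to the predicted terms, without duplicates.
    recommendations = []
    for term in next_terms:
        for page in sorted(term_pages.get(term, ())):
            if page != current_page and page not in recommendations:
                recommendations.append(page)
    return recommendations


page_terms = {"d1": ["web"], "d2": ["mining"], "d3": ["usage", "mining"]}
term_pages = {"web": {"d1"}, "mining": {"d2", "d3"}, "usage": {"d3"}}
first_order = {("web", "mining"): 0.6, ("web", "usage"): 0.4}

print(recommend_pages("d1", page_terms, term_pages, first_order))
```

Strategy-2 would follow the same shape, with step 6 keyed on the pair of previous and current terms and a second-order probability table in place of `first_order`.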
A Web-page recommendation rule, denoted as Rec, is defined as a set of recommended Web-pages generated by a Web-page recommendation strategy. A Web-page recommendation rule can be categorised as follows:
1) A recommendation rule is correct if the next Web-page accessed by the current user is present in Rec.
2) A recommendation rule is satisfied if the user's target page is accessed through any of the Web-pages present in Rec.
3) A recommendation rule is empty if the next Web-page accessed by the user is not present in Rec.
In [16], Zhou stated that the performance of Web-page recommendation strategies is measured in terms of two performance metrics: Precision and Satisfaction.
Let Rc be the subset of Rec which consists of all correct recommendation rules. The Web-page recommendation precision is defined as:
$$\text{Precision} = \frac{|R_c|}{|Rec|} \qquad (5)$$
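Equation (5) amounts to a simple ratio, shown here as a minimal Python sketch. It assumes each evaluated rule is available as a pair of the recommended page set and the page the user actually visited next; this data layout and the function name `precision` are illustrative only.

```python
def precision(rules):
    """Fraction of recommendation rules that are correct.

    rules: list of (recommended_pages, next_page) pairs, where a rule
    counts as correct when next_page is among recommended_pages.
    """
    if not rules:
        return 0.0  # no rules evaluated; defined as 0 in this sketch
    correct = sum(1 for recommended, next_page in rules
                  if next_page in recommended)
    return correct / len(rules)


rules = [
    ({"a", "b"}, "a"),   # correct
    ({"c"}, "d"),        # not correct
    ({"e"}, "e"),        # correct
    (set(), "f"),        # nothing recommended
]
print(precision(rules))
```

Here two of the four rules contain the page that was actually visited next, giving a precision of 0.5.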
Let Rs be the sub-set of Rec, which consists of all satisfied
recommendation rules. The satisfaction for Web-page