Bamshad Mobasher - DePaul University, Chicago
Bettina Berendt - Humboldt University Berlin
Myra Spiliopoulou - Leipzig Graduate School of Management
KDD for Personalization
PKDD 2001 Tutorial
September 6, 2001
[2]PKDD 2001 Tutorial: “KDD for Personalization”
Web Personalization
• The Problem
– dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
• Personalization v. Customization
– In customization, user controls and customizes the site or the product based on his/her preferences
– usually manual, but sometimes semi-automatic based on a given user profile
– Personalization is done automatically based on the user’s actions, the user’s profile, and (possibly) the profiles of others with “similar” profiles
[I-2]
my.yahoo.com
Customization Example
[I-3]
amazon.com
Personalization Example
[I-4]
A simplified scheme for personalization
[Figure: the user selects information object(s) (what kind? document etc., query; how? request, specification, rating). The system recommends other information object(s) related to them (why? similarity (syntactic/semantic), co-occurrence in the user's other navigation histories, co-occurrence in other users' navigation histories).]
[I-5]
Know Thy Customer: Knowledge is Power
“Relationships based on customer insight propel an organization from simply treating customers efficiently to treating them relative to their needs, preferences, and value potential. ...
“Knowing the customer is paramount in today's marketplace where the customer has more options, greater flexibility and higher expectations. ...”
John C. Nash (Accenture) in [46]
[I-6]
Customer knowledge implies:
1.) Acquisition of customer data
2.) Analysis of customer data
3.) Action in accordance with the gained insights
[I-7]
Acquisition of customer data
Customer data are recordings of:
• preferences
• transactions
• pre-sales contacts
• after-sales support
• demographic information
Some of these data:
• may be purchased from third parties
• may be held in multiple disparate databases that serve completely different purposes
• are of varying quality with respect to error rates, reliability, coverage, representativeness
→ Data Preparation
[I-8]
Analysis of customer data
Data analysis should provide feedback on questions like
• Which users will become customers?
• Which customers will return?
• Who is more likely to respond to a promotion action?
• Who would be interested in cross-sale/up-sale suggestions?
closely related to questions like
• Is the Web site appropriately designed to serve the organisation's goals?
• Are the customers satisfied?
• Are the customers satisfied enough to come again?
• Are the customers satisfied enough to become promoters of the site?
→ Data Mining
[I-9]
Action in accordance with the gained insights
• Alignment of the marketing policy
• Alignment of the supply chain, including after-sales support
• Adjustment of the web site
– static site re-design
– Browsing/Navigation suggestions
– Recommendations on the page
– Intelligent assistance
– Personalized layout and content
Fact: The time lag between insight and action should be minimized.
[I-10]
The action should create value
• for the customer
• for the organisation
[I-11]
A short excursion on value creation
In B2C e-commerce, it is not sufficient to:
• offer an existing product through the Internet
• digitize part/all of the merchandizing chain
• introduce a brilliant new product in the market
The product must bring added value to
• win the customer: Customer Conversion
• retain the customer: Customer Retention
[I-12]
The model of Kuhlen considers the following types of value [32]:
(1) comparative
(2) improving efficiency
(3) improving effectiveness
(4) integrative
(5) organisational
(6) strategic
(7) innovative
[I-13]
From Acquisition to Action
• There is no lack of data.
– Clickstream data accumulate at a tremendous pace.
– Demographic data can be acquired.
– Customer profiles are available or can be acquired.
• There is no lack of methodologies for data analysis.
• The ability to exploit the data increases at a much slower pace [46], and the number of personalized Web sites remains small.
• The tolerable elapsed time between acquisition and action is low [16].
[I-14]
Personalization: An HCI perspective
= does personalization increase usability?
A Web site’s usability is high if users
- achieve their goals / perform their tasks in little time,
- do so with a low error rate,
- experience high subjective satisfaction.
Usability testing:
- experts and "normal" users
- questionnaires and experiments
- qualitative and quantitative methods
Usability is a special concern on the Web because unlike with other products / software, "users experience usability first and pay later". (Nielsen [B12])
[I-15]
[49]
[DP-1]
Data Preparation for Personalization
[DP-2]
Web Usage Mining
• Discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers
• Typical Sources of Data
– automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
– e-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)
– user profiles and/or user ratings
– meta-data, page attributes, page content, site structure
[DP-4]
The Web Usage Mining Process
[Figure: three stages (Preprocessing, Pattern Discovery, Pattern Analysis) turn Raw Usage Data, together with Content and Structure Data, into Preprocessed Clickstream Data, then Rules, Patterns, and Statistics, and finally "Interesting" Rules, Patterns, and Statistics.]
[DP-5]
Usage Data Preprocessing
[Figure: Raw Usage Data passes through Data Cleaning, User/Session Identification, Page View Identification, and Path Completion to yield the Server Session File; Episode Identification yields the Episode File. Site Structure and Content and Usage Statistics also appear in the diagram.]
[DP-6]
Data Preprocessing for Web Usage Mining
• Data cleaning
– remove irrelevant references and fields in server logs
– remove references due to spider navigation
– remove erroneous references
– add missing references due to caching (done after sessionization)
• Data integration
– synchronize data from multiple server logs
– integrate e-commerce and application server data
– integrate meta-data (e.g., content labels)
– integrate demographic / registration data
[DP-7]
Data Preparation for Web Usage Mining (Cooley, Mobasher, Srivastava, 1999 [15])
• Data Transformation
– user identification
– sessionization / episode identification
– pageview identification
• a pageview is a set of page files and associated objects that contribute to a single display in a Web browser
• Data Reduction
– sampling and dimensionality reduction (ignoring certain pageviews / items)
• Identifying User Transactions (i.e., sets or sequences of pageviews possibly with associated weights)
[DP-8]
User and Session Identification: Need for Reliable Usage Data
• Validity of results in Web usage mining is affected by the ability to:
– distinguish among different users of a site
– reconstruct the activities of the users within the site
• Difficulty of obtaining reliable usage data
– proxy servers and anonymizers
– rotating IP addresses for connections through ISPs
– missing references due to caching
– inability of servers to distinguish among different visits
[DP-9]
Identifying Users and Sessions
• Server log L is a list of log entries each containing timestamp, host identifier, URL request (including URL stem and query), referrer, agent, cookie, etc.
• User identification and sessionization
– user activity log is a sequence of log entries in L belonging to the same user
– user identification is the process of partitioning L into a set of user activity logs
– the goal of sessionization is to further partition each user activity log into sequences of entries corresponding to each user visit
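The partitioning of L into user activity logs can be sketched as follows. This is only an illustration: the dictionary layout of a log entry and the use of the pair (host, agent) as a stand-in for a user are assumptions, not something the tutorial prescribes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical entry layout: keys "timestamp", "host", "agent", "url".
LogEntry = Dict[str, object]


def identify_users(log: List[LogEntry]) -> Dict[Tuple[str, str], List[LogEntry]]:
    """Partition the log L into user activity logs.

    Assumption: (host, agent) approximates one user. Cookies or login
    data, where available, would be a more reliable key.
    """
    activity: Dict[Tuple[str, str], List[LogEntry]] = defaultdict(list)
    # Sort by time so that every activity log is an ordered sequence.
    for entry in sorted(log, key=lambda e: e["timestamp"]):
        activity[(entry["host"], entry["agent"])].append(entry)
    return dict(activity)
```

Each entry lands in exactly one activity log, so the result is a partition of L; sessionization then works on one activity log at a time.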
[DP-10]
Sessionization Heuristics
• Real v. Constructed Sessions
– Conceptually, the log L is partitioned into an ordered collection of “real” sessions R
– Each heuristic h partitions L into an ordered collection of “constructed sessions” Ch
– The ideal heuristic h*: Ch* = R
• Two Basic Types of Sessionization Heuristics
– Time-oriented heuristics
– Navigation-oriented heuristics
[DP-11]
Time-Oriented Heuristics
• Consider boundaries on time spent on individual pages or in the entire site during a single visit
– Boundaries can be based on a maximum session length or maximum time allowable for each pageview
– Additional granularity can be obtained by setting different boundaries on different (types of) pageviews
h1: Given t0, the timestamp of the first request in a constructed session S, and a threshold θ, the request with timestamp t is assigned to S iff t - t0 ≤ θ.
h2: Given t1, the timestamp of a request in constructed session S, and a threshold δ, the next request with timestamp t2 is assigned to S iff t2 - t1 ≤ δ.
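A minimal sketch of h1 and h2, assuming the input is the list of request timestamps of a single user (in any consistent time unit). The function names are illustrative, and a request that fails the test is assumed to open a new session.

```python
from typing import List


def sessionize_h1(timestamps: List[float], theta: float) -> List[List[float]]:
    """h1: a request joins the current session iff t - t0 <= theta,
    where t0 is the timestamp of the session's first request."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][0] <= theta:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # start a new constructed session
    return sessions


def sessionize_h2(timestamps: List[float], delta: float) -> List[List[float]]:
    """h2: a request joins the current session iff t2 - t1 <= delta,
    where t1 is the timestamp of the previous request in the session."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= delta:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions
```

With timestamps 0, 5, 12, 28, 31, 70 (minutes), h1 with θ = 30 yields the sessions [0, 5, 12, 28], [31], [70], while h2 with δ = 10 yields [0, 5, 12], [28, 31], [70].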
[DP-12]
Navigation-Oriented Heuristics
• Take the linkage between pages into account
– “linkage” can be based on site topology (e.g., split a session at a request that could not have been reached from previous requests in the session)
– or can be usage-based (using referrers in log entries)
• usually more restrictive than topology-based heuristics and more difficult to implement in frame-based sites
href: Given two consecutive requests p and q, with p belonging to constructed session S. Then q is assigned to S if the referrer for q was previously invoked in S, or if the referrer for q is “undefined” and tq - tp ≤ Δ (the time delay Δ allows for proper loading of frameset pages).
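The href rule can be sketched as below. The tuple layout (timestamp, url, referrer), the use of None for an undefined referrer, and the choice to open a new session whenever the rule fails are assumptions made for the illustration.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical request layout: (timestamp, url, referrer or None).
Request = Tuple[float, str, Optional[str]]


def sessionize_href(requests: List[Request], big_delta: float) -> List[List[Request]]:
    """Referrer-based sessionization for one user's time-ordered requests."""
    sessions: List[List[Request]] = []
    invoked: Set[str] = set()  # URLs already requested in the current session
    for req in requests:
        t, url, referrer = req
        if not sessions:
            same_session = False
        elif referrer is not None:
            # q stays in S if its referrer was previously invoked in S
            same_session = referrer in invoked
        else:
            # undefined referrer: allow a short delay (frameset loading)
            same_session = t - sessions[-1][-1][0] <= big_delta
        if not same_session:
            sessions.append([])
            invoked = set()
        sessions[-1].append(req)
        invoked.add(url)
    return sessions
```

No session-length threshold is involved here, which is why this heuristic is a candidate when timestamps are unreliable.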
[DP-13]
Measures for Sessionization Accuracy (Berendt, Mobasher, Spiliopoulou, 2001 [7])
• A heuristic h maps entries in the log L into elements of constructed sessions, such that:
– (a) each entry in L is mapped to exactly one element of a constructed session
– (b) the mapping is order-preserving
• Measures quantify the successful mappings of real sessions to constructed sessions
– a measure M evaluates a heuristic h based on the differences between Ch and R
– each measure assigns to h a value M(h) ∈ [0,1] so that M(h*) = 1
[DP-14]
Measures for Sessionization Accuracy
• Categorical and Gradual Measures
– categorical measures: based on the number of real sessions that are reconstructed by the heuristics
– gradual measures: based on the degree to which the real sessions are reconstructed by the heuristics
[DP-15]
Categorical Measures
• Based on the notion of “complete reconstruction”
– a real session is completely reconstructed if all its elements are contained in the same constructed session
– the measure Mcr(h) is the ratio of the number of completely reconstructed real sessions in Ch to the total number of real sessions |R|
[DP-16]
Categorical Measures
• Derived categorical measures:
– Mcrs considers only completely reconstructed real sessions whose first element is also the first element of a constructed session
– Mcre considers only completely reconstructed real sessions whose last element is also the last element of a constructed session
– Mcrse considers only completely reconstructed real sessions with correct starts and ends
• in the absence of overlapping real sessions for individual users, this gives the number of constructed sessions that are identical to corresponding real sessions
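The four categorical measures can be computed with one helper. This sketch assumes that each session is a list of log-entry identifiers and follows the slide's definition of complete reconstruction (all elements of the real session inside one constructed session); the function name and flags are illustrative.

```python
from typing import List, Sequence


def m_cr(real: List[Sequence[int]], constructed: List[Sequence[int]],
         need_start: bool = False, need_end: bool = False) -> float:
    """Share of real sessions that are completely reconstructed.

    need_start=True gives Mcrs, need_end=True gives Mcre,
    both together give Mcrse.
    """
    hits = 0
    for r in real:
        for c in constructed:
            if not set(r) <= set(c):
                continue  # r is not fully contained in c
            if need_start and r[0] != c[0]:
                continue
            if need_end and r[-1] != c[-1]:
                continue
            hits += 1
            break
    return hits / len(real)
```

For real sessions [1, 2, 3], [4, 5], [6, 7, 8] and constructed sessions [1, 2, 3, 4, 5], [6, 7], [8], two of three real sessions are completely reconstructed (Mcr = 2/3), one has the correct start, one has the correct end, and none has both.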
[DP-17]
Gradual Measures
• Allow for measuring partial overlaps between real and constructed sessions
– degree of overlap between a real session r and a constructed session c, dego(r,c), is the number of elements they have in common divided by the total number of elements in r.
– degree of overlap for a real session r is the maximum dego(r,c) over all constructed sessions c.
– the measure Mo(h) is the average degree of overlap over all real sessions
– if a real session is completely reconstructed, its overlap degree is 1
[DP-18]
Gradual Measures
• To take the size of the constructed session into account, we define the degree of similarity
– degs(r,c) = | r ∩ c | / | r ∪ c |
– Ms(h) is the average degree of similarity over all real sessions
– if a real session is identical to a constructed session, its similarity degree is 1; extra elements in the constructed session lower it
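Both gradual measures follow directly from the definitions of the degree of overlap and the degree of similarity. The sketch assumes sessions are lists of log-entry identifiers and that, as for the overlap, the degree for a real session is the maximum over all constructed sessions; the function names are illustrative.

```python
from typing import Callable, List, Sequence


def deg_overlap(r: Sequence[int], c: Sequence[int]) -> float:
    """|r ∩ c| / |r|"""
    return len(set(r) & set(c)) / len(set(r))


def deg_similarity(r: Sequence[int], c: Sequence[int]) -> float:
    """|r ∩ c| / |r ∪ c|"""
    return len(set(r) & set(c)) / len(set(r) | set(c))


def gradual_measure(real: List[Sequence[int]], constructed: List[Sequence[int]],
                    degree: Callable[[Sequence[int], Sequence[int]], float]) -> float:
    """Average, over real sessions, of the best degree against any constructed session."""
    return sum(max(degree(r, c) for c in constructed) for r in real) / len(real)
```

For real sessions [1, 2, 3], [4, 5], [6, 7, 8] and constructed sessions [1, 2, 3, 4, 5], [6, 7], [8], Mo is 8/9 while Ms is 5/9: the real session [1, 2, 3] has overlap 1 but similarity only 3/5, because the constructed session that contains it is larger.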
[DP-19]
Which Measures?
• The choice of the measures depends on the goals of usage analysis, for example:
– “complete reconstruction” may be appropriate for clustering and association-based analyses (it correctly shows the set of pages accessed together)
• it also preserves sequential order of accesses, so it can be used for the analysis of users’ navigational behavior
– Mcrs: useful for analyzing access to entry points
– Mcre: useful for analyzing access to exit points
– overlap-based measures can be useful for comparing overall effectiveness of sessionization heuristics in grouping pages or objects
[DP-20]
Which Sessionization Heuristics?
• The choice of sessionization heuristic depends on the characteristics of the data
– if individual users visit the site in short but temporally dense sessions, h2 may perform better than h1
– in cases when timestamps are not reliable (e.g., using integrated data across many log files), href may be a better choice for sessionization
1. Log processing: Establishment of sessions as sets of page requests
2. Cluster mining: Grouping of co-occurring non-linked pages with help of the site graph
3. Conceptual clustering:
• The representative concept of each cluster is identified.
• Cluster members not adhering to this concept are removed from the cluster.
• Pages adhering to this concept and not appearing in the cluster are attached to the cluster.
[PD-3]
For each cluster, the IndexFinder presents to the Web designer:
• An index page with links to all pages of a cluster
The Web designer decides:
• whether the new page should indeed be established
• what its label should be
• where it should be located in the site
According to our categorization:
Visibility: Static page/site adjustment
Service element: page containing single application object
Matching based on: user behaviour and page content
Off-line pattern discovery
[PD-4]
Pattern Discovery for Recommendations
The Collaborative Filtering Approach
Main idea: The objects suggested to a user are those preferred by users similar to her.
1. The user's transaction is matched against logged transactions.
2. The matches are ranked.
3. The best match (or set of matches) is selected.
4. The objects that were shown in the selected transactions are ranked, excluding objects already seen.
5. The objects with the highest rank are shown to the user.
All steps on-line
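The five steps can be sketched as follows. The slide does not fix a similarity function or a scoring rule, so the Jaccard coefficient, the summed-similarity score, and the parameters k and n are assumptions chosen for the illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_cf(active: Set[str], logged: List[Set[str]],
                 k: int = 2, n: int = 3) -> List[str]:
    # Steps 1-2: match the active transaction against the log and rank.
    ranked = sorted(logged, key=lambda t: jaccard(active, t), reverse=True)
    # Step 3: keep the k best matches that share anything with the user.
    best = [t for t in ranked[:k] if jaccard(active, t) > 0]
    # Step 4: score unseen objects by the similarity of their transactions.
    scores: Dict[str, float] = defaultdict(float)
    for t in best:
        for obj in t - active:
            scores[obj] += jaccard(active, t)
    # Step 5: return the n top-ranked objects (ties broken alphabetically).
    return [o for o, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))][:n]
```

Because every step runs against the full log at request time, the cost grows with the number of logged transactions.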
[PD-5]
Pattern Discovery for Recommendations
The Data Mining Approach
Main idea: User similarity can be defined in terms of behaviour, interests, preferences etc. that can be modelled off-line
1. Pattern discovery over the logged data
2. The contents of the user's transaction are matched against the discovered patterns.
3. The matches are ranked.
4. The objects associated with the best matches are ranked, excluding objects already seen.
5. The objects with the highest rank are shown to the user.
so that
⇒ The voluminous logged data are only processed off-line.
⇒ On-line matching is performed against derived patterns.
[PD-6]
Pattern Discovery: Recommendations on correlated items
The approach of Vucetic and Obradovic [50]
The recommendation problem is defined as:
Given the ratings of the active user on a set of items, what will her ratings on the remaining items be?
Main idea: The ratings of an item can be predicted from the ratings on correlated items.
Visibility: Personal recommendation
Service element: application object
Matching based on: Ratings of correlated items
Off-line discovery of predictors for the impact of item correlation on ratings
[PD-7]
Methodology:
• The rating of each item given another item is approximated using a linear function (named: expert).
• The average correlation among pairs of items is approximated using random sampling over the user ratings.
• A weighting scheme is proposed to deal with the fact that users with similar preferences may provide different ratings for the same set of items.
In this scheme:
• The linear experts for all pairs of items can be computed off-line.
• The ratings for an active user are predicted from the set of pairs of items rather than the set of user ratings.
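The idea of pairwise linear experts can be illustrated with ordinary least squares. This is a simplified sketch and not the method of [50]: a plain average of the expert predictions stands in for the weighting scheme, and all names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> item -> rating
Experts = Dict[Tuple[str, str], Tuple[float, float]]  # (src, dst) -> (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant predictor when x does not vary
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train_experts(ratings: Ratings) -> Experts:
    """Off-line: one linear expert per ordered item pair with enough co-ratings."""
    items = sorted({i for r in ratings.values() for i in r})
    experts: Experts = {}
    for src in items:
        for dst in items:
            if src == dst:
                continue
            pairs = [(r[src], r[dst]) for r in ratings.values() if src in r and dst in r]
            if len(pairs) >= 2:
                xs, ys = zip(*pairs)
                experts[(src, dst)] = fit_line(xs, ys)
    return experts


def predict(active: Dict[str, float], target: str, experts: Experts) -> Optional[float]:
    """On-line: average the experts that map a rated item to the target item."""
    preds = [experts[(src, target)][0] * r + experts[(src, target)][1]
             for src, r in active.items() if (src, target) in experts]
    return sum(preds) / len(preds) if preds else None
```

At prediction time only the stored (slope, intercept) pairs are consulted, never the full set of user ratings.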
[PD-8]
Pattern Discovery: Repeat-buying theory for personalization
The approach of Geyer-Schulz et al. [25]
Main idea:
⇒ Recommendations are based on correlated products.
⇒ Correlations can be identified with Ehrenberg's repeat-buying theory,
⇒ after adjusting it to the particularities of anonymous user sessions.
According to our categorization:
Visibility: Recommendation of information products
Service element: application object or URL
Matching based on: user preferences for application objects
Off-line discovery of correlated application objects
[PD-9]
Ehrenberg's repeat-buying theory
• predicts buyer behaviour from (a) penetration and (b) average purchase frequency of an item
• by providing a reference model that characterizes repeated co-occurring purchases of items as random or not random
where
penetration refers to the preference of a customer for a brand
average purchase frequency refers to repeated purchases of the item, ignoring characteristics of the item, amount of the item and size of the purchase as a whole.
[PD-10]
Assumptions of [25]:
• The probability of r co-occurrences of two products in subsequent purchases follows a logarithmic series distribution.
• Subsequent purchases of the same customer(s) can be observed as equivalent to a set of purchase sessions during the log period.
Methodology:
• Computation of the frequency distributions of all co-occurrences of product pairs, counting one co-occurrence per session only
• Elimination of distributions with a small number of observations
• Elimination of a given percentile of the high repeat-buy pairs
• Computation of the co-occurrence predictor for each pair
so that outliers for each predictor can be observed as correlated items.
[PD-11]
Pattern Discovery: Association mining for personalization
Basic Idea: match left-hand side of rules with the active user session and recommend items in the rule’s consequent
Essential to store patterns in efficient data structures
• searching all rules in real time is computationally inefficient
Ordering of accessed pages is not taken into account
Good recommendation accuracy, but the main problem is “coverage”
• high support thresholds lead to low coverage and may eliminate important, but infrequent items from consideration
• low support thresholds result in very large model sizes and a computationally expensive pattern discovery phase
[PD-12]
Association Mining - Basic Concepts
We start with a set I of items and a set D of transactions. A transaction T is a set of items (a subset of I):
$I = \{i_1, i_2, \ldots, i_m\}, \quad T \subseteq I$
An Association Rule is an implication on itemsets X and Y, denoted by X ==> Y, where
$X \subseteq I, \quad Y \subseteq I, \quad X \cap Y = \emptyset$
The rule meets a minimum confidence of c, meaning that c% of transactions in D which contain X also contain Y. In addition, for each itemset a minimum support of s must be satisfied:
$c \le |X \cup Y| \, / \, |X|, \qquad s \le |X \cup Y| \, / \, |D|$
where $|Z|$ for an itemset $Z$ counts the transactions in D that contain $Z$.
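Support and confidence can be computed directly from the transaction set. A minimal sketch, assuming transactions are Python sets of item names; the function names are illustrative.

```python
from typing import List, Set


def count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions in D that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions in D that contain X ∪ Y."""
    return count(x | y, transactions) / len(transactions)


def confidence(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    n_x = count(x, transactions)
    return count(x | y, transactions) / n_x if n_x else 0.0
```

For D = {a,b,c}, {a,b}, {a,c}, {b,c}, the rule {a} ==> {b} has support 2/4 and confidence 2/3.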
[PD-13]
Pattern Discovery: Associated/Dissociated items and users
The approach of Lin, Alvarez & Ruiz [37]
Main idea:
⇒ Users are associated to each other in terms of how they rate items.
⇒ Items are associated to each other with respect to user preferences.
Associations among items can be found off-line.
Associations to the active user can be found on-line.
According to our categorization:
Visibility: Personal recommendation
Service element: application object
Matching based on: associations among items and among users
On-line discovery of association rules with given RHS
[PD-14]
Methodology:
• Recommendations are subject to minimum confidence and minimum number of rules constraints.
• The miner discovers association rules iteratively, until the desired number of rules is extracted. The support cutoff is adjusted in each iteration.
• Rules concern both items and users:
[User1:like] AND [User2:dislike] ⇒ [TargetUser:like]
[Item1:like] AND [Item2:like] ⇒ [TargetItem:like]
• Candidate items are computed from associations involving users similar to the active user (on-line).
• Scores of items are computed from associations reflecting user preferences (off-line).
• The candidate items with highest scores are suggested to the active user (on-line).
[PD-15]
Pattern Discovery: Association mining for personalization
The approach of Mobasher, et al, 2001 [45]
Main Idea: avoid offline generation of all association rules; generate recommendations directly from itemsets
• discovered frequent itemsets are stored in an “itemset graph” (an extension of the lexicographic tree structure of Agrawal, et al 1999 [2])
• recommendation generation can be done in constant time by doing a directed search to a limited depth
According to our categorization
Visibility: Personal recommendations or silent dynamic adjustment
Service element: pageview
Matching based on: user behaviour
[PD-16]
Methodology:
• Construct Frequent Itemset Graph
– each node at depth d in the graph corresponds to an itemset I of size d, and is linked to the itemsets of size d+1 (at level d+1) that contain I
– the single root node at level 0 corresponds to the empty itemset
• frequent itemsets are matched against a user's active session S by performing a search of the graph to depth |S|
• a recommendation r is an item of a level-(|S|+1) itemset containing S; its recommendation score is the confidence of rule S ==> r
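A compact sketch of the frequent itemset graph and of recommendation from it. Several simplifications are assumptions of this illustration: itemsets are enumerated by brute force rather than by an Apriori-style miner, the graph is a dictionary keyed by frozenset, and a dictionary lookup replaces the directed search to depth |S|.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Node = FrozenSet[str]


def frequent_itemsets(transactions: List[Set[str]], min_count: int) -> Dict[Node, int]:
    """Support counts of all itemsets occurring in at least min_count transactions."""
    counts: Counter = Counter()
    for t in transactions:
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                counts[frozenset(combo)] += 1
    freq = {s: n for s, n in counts.items() if n >= min_count}
    freq[frozenset()] = len(transactions)  # root node: the empty itemset
    return freq


def build_graph(freq: Dict[Node, int]) -> Dict[Node, List[Node]]:
    """Link each itemset of size d to the frequent itemsets of size d+1 containing it."""
    graph: Dict[Node, List[Node]] = {node: [] for node in freq}
    for node in freq:
        for item in node:
            parent = node - {item}
            if parent in graph:  # holds for frequent itemsets (downward closure)
                graph[parent].append(node)
    return graph


def recommend(session: Iterable[str], freq: Dict[Node, int],
              graph: Dict[Node, List[Node]]) -> Dict[str, float]:
    """Score each candidate r by the confidence of S ==> r."""
    s = frozenset(session)
    if s not in graph:
        return {}
    return {next(iter(child - s)): freq[child] / freq[s] for child in graph[s]}
```

No rules are materialized: the confidence of S ==> r is read off the support counts stored at the node for S and at its child S ∪ {r}.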
[PD-17]
Pattern Discovery: Sequence mining for personalization
Main Idea: take the ordering of accessed items into account
Two basic approaches
• use contiguous sequences (e.g., Web navigational patterns)
• use general sequential patterns
Contiguous sequential patterns are often modeled as Markov chains and used for prefetching (i.e., predicting the next user access based on previously accessed pages)
In the context of recommendations, they can achieve higher accuracy than other methods, but it may be difficult to obtain reasonable coverage
[PD-18]
Pattern Discovery: Sequence mining for personalization
Markov chain representation often leads to high space complexity due to model sizes
• model pruning, similar to support pruning, can be used to focus only on significant navigational paths
• increased coverage can be achieved by using all-Kth-order models (i.e., using all possible sizes for user histories)
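An all-Kth-order Markov predictor can be sketched as a table from recent-history tuples to next-page counts. The fallback from the longest matching history to shorter ones is one common way to realize the all-Kth-order idea; the function names and the absence of pruning are choices of this illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Tuple

Model = DefaultDict[Tuple[str, ...], Counter]


def train(sessions: List[List[str]], max_order: int) -> Model:
    """Count next-page frequencies for every history of length 1..max_order."""
    model: Model = defaultdict(Counter)
    for s in sessions:
        for k in range(1, max_order + 1):
            for i in range(len(s) - k):
                model[tuple(s[i:i + k])][s[i + k]] += 1
    return model


def predict_next(history: List[str], model: Model, max_order: int) -> Optional[str]:
    """Use the longest history seen in training; fall back to shorter ones."""
    for k in range(min(max_order, len(history)), 0, -1):
        state = tuple(history[-k:])
        if state in model:
            return model[state].most_common(1)[0][0]
    return None  # no coverage for this history
```

The table holds one entry per observed history of every length up to max_order, which is the model-size problem noted above; dropping entries with low counts would play the role of pruning.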
[PD-19]
Pattern Discovery: Sequence mining for personalization
The approach of Gaul & Schmidt-Thieme [24]
Main idea:
⇒ Recommendations are based on frequent patterns of past behaviour.
⇒ A recommender is a predictor for a class of events.
⇒ The constellation of the recommenders for all classes returns the best recommendations for a given user history.
According to our categorization:
Visibility: Recommendation
Service element: URLs, site objects
Matching based on: navigation histories and URL proximity
Off-line training of classifiers := local recommender systems
[PD-20]
A generic framework:
• with measures for the quality of a recommendation, taking the distance between candidate URLs into account
• distinguishing between dynamic and static recommenders that do/do not take user histories into account
• combining local recommender systems, each of which predicts a class of events
where a class can be one user history, a group of histories or the whole dataset.
Thereby, a navigation history is
• a set of events
• a sequence of events
• a more complex structure of co-occurring events
[PD-21]
Pattern Discovery: Usage profiles for personalization
The approach of Mobasher et al. [43, 42]
Two types of usage profiles:
• Clusters of similar user transactions, enhanced by a weighting scheme that removes pages with support less than a mean value
• Clusters of pages accessed together
aggregating the members of each cluster into one representative profile
According to our categorization:
Visibility: Personal recommendation or silent dynamic adjustment
Service element: pageview
Matching based on: user behaviour. Also: page content in [44]
Off-line discovery of aggregate profiles
[PD-22]
Aims:
⇒ achieve similar performance to on-line collaborative filtering
⇒ using a minimal number of pageviews for the active user
Methodology:
• Preprocessing phase
– Assignment of weights to the pageviews
– Significance testing, based on page stay time
– Normalization of pageview weights
• PACT: Profile Aggregation based on Clustering Techniques
1. Clustering of usage data to establish the aggregate profiles
2. Materialization of the profiles as vectors of (page, weight) pairs
3. Scan of the user's history by means of a sliding window that allows only a set of page accesses to be considered in the profile
4. Matching the user session with each profile
5. Match ranking
[PD-23]
A Framework for Personalization Based on Aggregate Profiles
Offline Phase
[PD-24]
A Framework for Personalization Based on Aggregate Profiles
Online Phase
Input from the batch process: Usage Profiles, Content Profiles
• Match current user’s activity against the discovered profiles
• Each recommended item is assigned a score based on
– matching criteria and quality of aggregate profiles
– “information value” of the item based on domain knowledge
[PD-25]
Aggregate Profiles Based on Clustering Transactions (PACT) (Mobasher, et al, [42, 43])
• Input
– set of relevant pageviews in preprocessed log: $P = \{p_1, p_2, \ldots, p_n\}$
– set of user transactions: $T = \{t_1, t_2, \ldots, t_m\}$
– each transaction is a pageview vector: $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$
[PD-26]
Aggregate Profiles Based on Clustering Transactions (PACT)
• Transaction Clusters
– each cluster contains a set of transaction vectors
– for each cluster compute centroid as cluster representative: $\vec{c} = \langle u_1^c, u_2^c, \ldots, u_n^c \rangle$
• Aggregate Usage Profiles
– a set of pageview-weight pairs: for transaction cluster C, select each pageview $p_i$ such that $u_i^c$ (in the cluster centroid) is greater than a pre-specified threshold
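Deriving an aggregate usage profile from one transaction cluster can be sketched as follows. The clustering step itself is assumed to have been done already; the function name and the threshold parameter mu are illustrative.

```python
from typing import Dict, List, Sequence


def aggregate_profile(cluster: List[Sequence[float]], pageviews: Sequence[str],
                      mu: float) -> Dict[str, float]:
    """Centroid of the cluster's transaction vectors, keeping only the
    pageviews whose mean weight exceeds the threshold mu."""
    n = len(cluster)
    centroid = [sum(t[i] for t in cluster) / n for i in range(len(pageviews))]
    return {p: u for p, u in zip(pageviews, centroid) if u > mu}
```

For pageviews A, B, C and the cluster (1,0,1), (1,1,0), (1,0,0), (1,1,0), the centroid is (1.0, 0.5, 0.25); with a threshold of 0.4 the profile keeps A with weight 1.0 and B with weight 0.5.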
[PD-27]
Weight | Pageview ID
1.00 | Call for Papers
0.67 | ACR News Special Topics
0.67 | CFP: Journal of Psychology and Marketing I
0.67 | CFP: Journal of Psychology and Marketing II
0.67 | CFP: Journal of Consumer Psychology II
0.67 | CFP: Journal of Consumer Psychology I
Integration of Content Profiles (Mobasher, et al., 2000 [44])
• Cluster features over the n-dimensional space of pageviews
• For each feature cluster derive a content profile by collecting pageviews in which these features appear as significant (represented as overlapping collections of pageview-weight pairs)
Weight | Pageview ID | Significant Features (stems)
1.00 | CFP: One World One Market | world challeng busi co manag global
0.63 | CFP: Int'l Conf. on Marketing & Development | challeng co contact develop intern
0.35 | CFP: Journal of Global Marketing | busi global
0.32 | CFP: Journal of Consumer Psychology | busi manag global
Weight | Pageview ID | Significant Features (stems)
1.00 | CFP: Journal of Psych. & Marketing | psychologi consum special market
1.00 | CFP: Journal of Consumer Psychology I | psychologi journal consum special market
0.72 | CFP: Journal of Global Marketing | journal special market
0.61 | CFP: Journal of Consumer Psychology II | psychologi journal consum special
0.50 | CFP: Society for Consumer Psychology | psychologi consum special
0.50 | CFP: Conf. on Gender, Market., Consumer Behavior | journal consum market
[PD-36]
Integration of Content Profiles
• Integration with Recommendation Engine
– Usage and content profiles have similar representation, so they can be used by the recommendation engine in the same way
• Item weights in profiles must be normalized, so content and usage profiles can be compared on the same scale
– One approach: match active user session with all profiles (both content and usage); then use the maximal recommendation score for candidate recommendations
– Another approach: use content profiles for generating recommendations only if no matching usage profile (with sufficient confidence) is found
[PD-37]
Evaluating Personalization
[E-1]
Evaluating usability: goals / tasks?
Recall operational definition:
A Web site’s usability is high if users
- achieve their goals / perform their tasks in little time,
- do so with a low error rate,
- experience high subjective satisfaction.
Depending on the site, relevant goals / tasks may be to:
- locate content (search),
- learn,
- stay in the site, return to the site, buy... => E-metrics
- ...
[E-2]
Evaluating usability: methodological caveats
Comparisons of sites with/without personalization, or before/after personalization introduced, with respect to "normal user behavior" (server logs):
usually a quasi-experiment
- many uncontrolled variables (e.g., user intentions)
- possibly several differences between sites/site versions
=> causal attribution of success to personalization becomes difficult
Questionnaire data:
self-reports are often biased; observation of behavior in experiments advisable
[E-3]
Evaluating usability: results I
CyberBehavior Research Center 1999 survey
- 81% of 694 respondents have visited a personalized site
- 64% of those found it useful: helpful, time saving
- perceived usefulness changes with product (books > music > inf. technol. > news/articles > other)
- main problems: privacy, ineffectiveness when behavior did not reflect user "personally" (e.g., buying a gift)
- concern that possible choices may be limited
- little difference of opinion between personalization occurring in response to behavior or to solicited input
[E-4]
Evaluating usability: results II
Belkin [3], reviewing studies of recommendations in IR systems carried out at Rutgers Univ. since 1995:
- measures of performance and subj. satisfaction
- relevance feedback worked well, but better with both increased knowledge of how it worked, and with increased control by the user of its suggestions
- relevance feedback + term suggestion performed better than, and was preferred to, pure relevance feedback
- users preferred to save effort: were willing to hand over the subsidiary task of term selection to a system they trusted
[E-5]
Evaluating usability: results III
Nielsen Net Ratings 1999
registered visitors of portal sites, i.e., those who can customize,
- spend > 3 times longer at home portal than others
- view 3-4 times more pages
[E-6]
Why are results scarce? Possible reasons
"Web personalization is much over-rated and mainly used as a poor excuse for not designing a navigable website."
Nielsen [47]: Personalization is over-rated.
"Personalization costs. ... You’re more likely to get a good return on your efforts ... by fixing other problems, such as difficulty in locating content."
Lighthouse on the Web [36], quoting from Mainspring and User Interface Engineering
"In essence, web design is a problem in user interface design. However, ... few web designers can afford to subject their web sites to formal usability testing in special labs."
Perkowitz & Etzioni [52]: Adaptive web sites: an AI challenge.
[E-7]
Can other results be transferred?
Research on adaptive educational software since ~ 1970
- interfaces changing over time: difficult to learn
- usually, user control helpful for learning; adaptive interfaces particularly helpful for novices
- adaptive presentation (more info depending on user knowledge) improves comprehension and reduces reading time
- adaptive link annotation
  - can reduce no. of visited pages + learning time
  - encourages novices to navigate non-sequentially
  - enables users to rate the difficulty of a page better
[E-8]
Can other results be transferred? (contd.)
- adaptive link ordering improves user performance in information search tasks
- but unstable order of options is confusing for novices, so hiding is better for novices
- for novices, direct guidance is useful ("next" link is most popular choice)
- the more users agree with the system’s suggestions, the better their test results
(surveys in [11,12])
[E-9]
Further factors affecting subjective satisfaction
- users don’t like to be recognized too soon
- users want to be anonymous, at least at certain times
- users want openness / disclosure
- must match user’s interests at the moment
- users don’t want extra work: "paradox of the active user"
- people don’t want relationships with corporations, but with other people
- be specific without being exclusive (non-monetary rewards better than differential pricing)
- consider information structure on Web
- user control (general guideline for software development)
respect the user!
[E-25]
The methodology of [58] to measure and improve contact and conversion efficiency of pages:
I. Specification of the action and target pages as abstract concepts in a service-based concept hierarchy
II. Discovery of frequent navigation patterns involving action pages
III. Discovery of frequent (and less frequent) patterns leading to target pages
IV. Pattern visualization to identify the pages at which the confidence drops (the users abandon the path)
[E-26]
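Step IV can be illustrated without any visualization: compute, for each step of a path, the share of sessions that continue to the next page, and look for the lowest value. The sketch below is an assumption-laden simplification, not the miner used in [58]: sessions are plain page lists, a path matches when its pages occur in order with gaps allowed, and the function names are hypothetical.

```python
def contains_in_order(session, path):
    """True if the pages of `path` occur in `session` in this order (gaps allowed)."""
    it = iter(session)
    return all(page in it for page in path)


def step_confidences(sessions, path):
    """Confidence per step: sessions reaching page k+1 / sessions reaching page k."""
    support = [sum(contains_in_order(s, path[:k]) for s in sessions)
               for k in range(1, len(path) + 1)]
    return [(path[k], support[k] / support[k - 1] if support[k - 1] else 0.0)
            for k in range(1, len(path))]


sessions = [
    ["home", "catalog", "product", "basket", "order"],
    ["home", "catalog", "product"],
    ["home", "catalog", "product", "basket"],
    ["home", "search", "catalog"],
]
confidences = step_confidences(sessions, ["catalog", "product", "basket", "order"])
weakest = min(confidences, key=lambda step: step[1])
```

Here `weakest` names the page that the fewest users reach from its predecessor, i.e. the point where the path is abandoned most often.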
Business Success: From hits to customers
Micro-conversion rates by Lee et al [34]
Four steps until the purchase of a product:
1) Product impression: Seeing the hyperlink leading to a product
2) Click through: Following the link to the product
3) Basket placement: Selecting the product for purchase
4) Purchasing the product
and metrics for them:
Step | Metric
1) product impression | -
2) click through | look-to-click rate
3) basket placement | click-to-basket rate
4) purchase | basket-to-buy rate
overall (impression to purchase) | look-to-buy rate
[E-27]
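The four rates are simple ratios of the counts at consecutive steps. A minimal sketch, assuming the per-product counts have already been extracted from the logs (the function and key names are hypothetical):

```python
def micro_conversion_rates(impressions, clicks, baskets, purchases):
    """Return the four micro-conversion rates for one product."""
    def rate(numerator, denominator):
        # Guard against products that were never shown, clicked, or placed.
        return numerator / denominator if denominator else 0.0

    return {
        "look_to_click": rate(clicks, impressions),
        "click_to_basket": rate(baskets, clicks),
        "basket_to_buy": rate(purchases, baskets),
        "look_to_buy": rate(purchases, impressions),
    }


rates = micro_conversion_rates(impressions=1000, clicks=200, baskets=50, purchases=10)
```

With non-zero counts, the look-to-buy rate is the product of the three step rates, so a low overall rate can be traced to the step that loses the most users.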
The methodology of [34] to monitor site effectiveness:
I. Identification of three aspects of a site for web merchandizing:
Merchandizing cues: Techniques for presenting and grouping products to motivate purchases
Shopping metaphors: Means offered to the shoppers for finding products of interest
Web design features: Site layout
II. Problem decomposition:
1. Classifying hyperlinks by their merchandizing purposes
2. Measuring and analysing traffic across those hyperlinks
3. Attributing the effectiveness of each hyperlink to merchandizing cues, shopping metaphor or design features
using a visualization technique based on starfield displays
[E-28]
Business Success: Customer Loyalty
Loyalty is more than site re-visitation.
It relates to new purchases and their
• Recency
• Frequency
• Monetary value
Loyalty contributes to the customer's lifetime value.
[E-29]
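The three purchase attributes can be computed per customer from a purchase list. This is a minimal sketch under stated assumptions: purchases arrive as (customer, date, amount) tuples, recency is measured in days before a reference date, and the function name and field names are hypothetical.

```python
from datetime import date


def rfm(purchases, today):
    """purchases: iterable of (customer_id, purchase_date, amount)."""
    summary = {}
    for customer, day, amount in purchases:
        last, count, total = summary.get(customer, (day, 0, 0.0))
        summary[customer] = (max(last, day), count + 1, total + amount)
    return {
        customer: {
            "recency_days": (today - last).days,  # days since the latest purchase
            "frequency": count,                   # number of purchases
            "monetary": total,                    # summed purchase value
        }
        for customer, (last, count, total) in summary.items()
    }


purchases = [
    ("ann", date(2001, 8, 1), 40.0),
    ("ann", date(2001, 8, 30), 60.0),
    ("bob", date(2001, 5, 15), 20.0),
]
scores = rfm(purchases, today=date(2001, 9, 6))
```

How the three values are combined into a loyalty or lifetime-value figure is a business decision that the slide leaves open, so the sketch stops at the raw attributes.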
Business Success: Customer loyalty
Customer involvement by J. Lee et al [33]
Factors affecting customer loyalty:
• Trust
• Transaction costs
which in turn are affected by:
• Comprehensive information that suffices for a purchase decision
• Shared value in the form of common beliefs among customers
• Communication among customers and store
• Uncertainty on the product quality
• Specificity of the store
• Number of competitors
[E-30]
Hypotheses:
+ Comprehensive information, shared value and communication affect trust positively.
+ Trust has a positive impact on customer loyalty.
- Transaction costs have negative impact on customer loyalty.
- Trust reduces transaction costs.
+ Uncertainty and number of competitors increase transaction costs.
- Specificity affects transaction costs negatively.
and after the first set of experiments:
+ Specificity has a positive impact on trust.
being tested with questionnaires
[E-31]
The distinction between low and high involvement groups showed that:
HIGH group:
- Specificity has no impact on trust.
+ Specificity affects transaction costs negatively.
LOW group:
+ Uncertainty has a positive impact on trust.
- The number of competitors decreases transaction costs.
+ Shared value has a positive impact on trust.
indicating that the factors affecting customer loyalty (site re-visits) differ between the two groups.
[E-32]
Business Success: From hits to loyal customers
The e-metrics of NetGenesis [16]
Factors: What should be disseminated by the measures?
Framework: What is the basis of the analysis?
Formulae: What should be measured and how?
as the result of an interview-based study with 20 successful e-companies [16]
[E-33]
Business Success: From hits to loyal customers
e-Metrics Factors [16]
When measuring site (and business) success, marketeers consider:
• Awareness
• Acquisition vs Abandonment
• Conversion vs Attrition
• Retention vs Churn
[E-34]
Business Success: From hits to loyal customers
e-Metrics Framework [16]
There is no agreed-upon definition of most factors
• Is the base of the analysis a user, a session, a page request, a page impression or a hit?
• What is a session?
• When does a user become a customer?
• When is a customer assumed to have attrited?
• How is loyalty defined?
Thesis: A company-internal definition is necessary.
[E-35]
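One way to make a company-internal definition explicit is to fix it in code, so that every analysis uses the same one. The sketch below does this for "What is a session?", assuming a session ends after a configurable period of inactivity. The 30-minute default is a common heuristic used here as an assumption, not a value given in [16], and the function name and input layout are hypothetical.

```python
def sessionize(requests, timeout=1800):
    """Split requests into sessions.

    requests: iterable of (user_id, timestamp_in_seconds, page), in any order.
    A new session starts when a user was inactive for more than `timeout` seconds.
    """
    sessions = []
    last_seen = {}  # user_id -> (time of last request, index of the open session)
    for user, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(user)
        if previous is None or ts - previous[0] > timeout:
            sessions.append((user, [page]))
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(page)
        last_seen[user] = (ts, index)
    return sessions


requests = [
    ("u1", 0, "/home"),
    ("u1", 600, "/catalog"),
    ("u1", 4000, "/home"),
    ("u2", 100, "/home"),
]
default_sessions = sessionize(requests)
long_sessions = sessionize(requests, timeout=5000)
```

Changing `timeout` changes the number of sessions for the same log, which is exactly why the definition has to be agreed upon before session-based metrics are compared.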
Business Success: From hits to loyal customers
e-Metrics Framework [16]
Example: The behaviour of a loyal customer in terms of
• visit duration
• number of visits during a period of time
• pages visited each time
is fundamentally different for
• customers that make purchases in a retail store
• customers that plan a major purchase, e.g. of a configurable product (contract, car)
• cooperation partners in a B2B setting
[E-36]
Business Success: From hits to loyal customers
e-Metrics Formulae [16]
A large set of metrics is proposed, including
• stickiness
• slipperiness
• focus
of parts of a site.
The identification and monitoring of
• optimal paths
is further suggested.
[E-37]
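The slide names these metrics without giving formulae. The sketch below uses definitions that are assumptions for illustration and should be checked against [16] before use: stickiness as frequency times duration times reach, and focus as the average number of pages seen in a site part divided by the number of pages in that part. Slipperiness is treated simply as low stickiness. All function and parameter names are hypothetical.

```python
def stickiness(visits, total_view_seconds, visitors_in_period, all_known_users):
    """Assumed definition: frequency * duration * reach for one site part."""
    frequency = visits / visitors_in_period      # visits per active visitor
    duration = total_view_seconds / visits       # viewing time per visit
    reach = visitors_in_period / all_known_users # share of users who came
    return frequency * duration * reach


def focus(pages_seen_per_visit, pages_in_part):
    """Assumed definition: average pages seen per visit / pages in the site part."""
    average = sum(pages_seen_per_visit) / len(pages_seen_per_visit)
    return average / pages_in_part


sticky = stickiness(visits=400, total_view_seconds=120000,
                    visitors_in_period=200, all_known_users=1000)
narrow = focus(pages_seen_per_visit=[2, 4, 3], pages_in_part=10)
```

Under these assumed definitions stickiness reduces to viewing time per known user, and both metrics depend on how a "part of a site" is delimited, which has to be decided before they can be computed.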
Business Success: From hits to loyal customers
e-Metrics Formulae [16]
Implications:
For parts of a site:
• stickiness
• slipperiness
• focus
Monitoring of
• optimal paths
=>
- The notion of site-part must be properly defined and disseminated to the data analysis software.
- The monitoring of optimal paths must be implemented somehow.
- The impact of the site structure must be understood and made explicit.
[E-38]
Business success and the role of KDD
1. To what should success metrics be applied?
• The whole population: User populations are rarely uniform.
• Each user/customer: Scalability might be an issue, speed also.
• Each group of users/customers
It is essential to distinguish among user/customer groups, e.g. in terms of
• experience
• interests
• demographics
• behaviour
and lifecycle value
[E-39]
Business success and the role of KDD
2. How should the metrics be computed?
• Mapping of statistical measures (accuracy, intercluster distance, confidence, support) on business measures
• Incorporation of computation prerequisites into the mining core ->
• Impact of the site structure
[E-40]
-> Mine the gap!
• A user contacts a site in a sequence of sessions.
• The time inside a session plays a role.
• The elapsed time among sessions plays a role.
• The volatility of Web and population plays a role.
• eCRM observes both the individual sessions of a user and the whole lifecycle of the user.
• Both must be supported in a seamless way.
• The associated information must be integrated and exploited.
• The web-site structure affects everything.
• Ordering and repetition are important.
• If optimal paths are specified, suboptimal ones must be quantified and monitored.
• The behavioural patterns are affected by the site structure.
[E-41]
Personalization and Privacy
[P-1]
Personalization and privacy
What is privacy?
"The right to be let alone." Warren & Brandeis [65]
includes
- limits on the government’s power to interfere with personal decisions
- physical privacy: limits on others’ ability to learn things about a person by accessing their property
- information privacy: the "right to control information about ourselves"
[P-2]
Personalization and privacy
Why is privacy a central concern for personalization?
(1) Adapting to a person requires data on that person
(2) The legal side: not all data may be collected/used
(3) The commercial side:
"The Internet industry is built on trust between businesses and their customers - and privacy is the number one ingredient in trust."
TrustE: How does Online Privacy Impact Your Bottom Line? [62]
[P-3]
What are the dangers to privacy?
basic: data are correctly and legally transferred and stored, but embody "knowledge about a person"
-> exacerbated by user ignorance (cf. widespread confusion or ignorance about what a cookie is: Ackerman, Cranor, & Reagle [1])
unethical practices: data are used for novel purposes, sold to third parties, ...
technical: data are corrupted during entry, transfer, or storage
security: data are intercepted
[P-4]
What data are transmitted during Web usage?
transferred by the browser
- IP address
- domain name (-> organization)
- platform: browser type and version
- referrer address
- query strings, form fill-ins
other technologies
- cookies
- globally unique identifiers
- web bugs
[P-5]
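Most of the browser-transferred items end up in an ordinary server log line. The sketch below extracts them from one line, assuming the server writes the widely used Combined Log Format; the sample line, the pattern name, and the function name are illustrative, and cookies or form data sent by POST would not appear in this format.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumes Combined Log Format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def transmitted_data(line):
    """Return the fields a single request reveals, or None if the line does not parse."""
    found = LOG_PATTERN.match(line)
    if found is None:
        return None
    fields = found.groupdict()
    # Query strings carry search terms and GET form fill-ins.
    fields["query"] = parse_qs(urlsplit(fields["url"]).query)
    return fields


line = ('192.0.2.7 - - [06/Sep/2001:10:15:00 +0200] '
        '"GET /search?q=camera&zoom=3x HTTP/1.0" 200 5120 '
        '"http://www.example.org/start.html" '
        '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"')
record = transmitted_data(line)
```

Even this single line yields the IP address, the platform, the referrer, and the user's search terms, which is why such logs count as at least person-relatable data.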
User concerns about privacy
User concerns about privacy vary
- with respect to their severity: e.g., 27% marginally concerned, 56% pragmatic majority, 17% privacy fundamentalists (Ackerman et al. [1])
- with the kind of data, e.g., credit card no. -> ... -> name -> ... -> email address -> ... -> favorite TV show (Ackerman et al. [1])
- depending on whether personal identity or profiling information is disclosed (Spiekermann, Grossklags, & Berendt [57])
[P-6]
How to protect privacy I: general
- avoid the generation of data ("data parsimony")
- try to protect generated data
How to protect privacy II: agents and methods
- state / law (German/European main stance)
- parties to the transaction / market: self-governance (US main stance)
- users / technology
[P-7]
What to protect: Data in relation to persons (personally identifiable data)
person-related data: "Jane Doe plays football."
person-relatable data: "The person is a male American famous tennis player, and will soon marry a famous German tennis player."
Note: IP addresses at least person-relatable!
[P-8]
German / EU legal basics I
(German laws, EU directive 95/46/EC)
rights against the state -> rights against other private parties
person-related data may only be collected with the informed consent (opt-in!) about
- who: who collects the data
- what for: for what purpose
- how much: quality and amount necessary for purpose
usage that deviates from any of these 3 is illegal
informed consent: anything that is not explicitly allowed is forbidden (the greater the risk, the more detail must be explained)
[P-9]
German / EU legal basics II
analysis / research:
- person-related data must be anonymized s.t. it cannot be related back to the person
- aggregate into groups >= 10
- if necessary, original data can be stored by a trustee
[P-10]
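The aggregation rule can be enforced mechanically when reports are produced. The following is a minimal sketch, assuming records are dicts and that a group is reported only if it has at least ten members; the function and field names are hypothetical, and suppressing small groups alone does not guarantee that the result is anonymous in the legal sense.

```python
from collections import defaultdict


def aggregate(records, group_key, value_key, min_group_size=10):
    """Report per-group size and mean, dropping groups below the minimum size."""
    groups = defaultdict(list)
    for record in records:
        # Only the grouping attribute and the measured value are carried over;
        # identifiers such as user names never reach the report.
        groups[record[group_key]].append(record[value_key])
    return {
        group: {"n": len(values), "mean": sum(values) / len(values)}
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


records = (
    [{"user": f"u{i}", "region": "north", "pages": 5} for i in range(12)]
    + [{"user": f"v{i}", "region": "south", "pages": 9} for i in range(3)]
)
report = aggregate(records, group_key="region", value_key="pages")
```

The three "south" users are too few to report, so only the "north" group appears in the output.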
German / EU legal basics III: Implications for personalization
+ = legal, - = illegal, ? = controversial
Privacy statements must be opt-in (cf. software licence agreements: "I agree")
+ analyzing non-person-relatable web usage data
+ using results to personalize a web page based on the current user’s current session
- using results to personalize based on past sessions
- using results to send unsolicited snail/e-mail
? cookies: web site must also function without cookies; problematic if user unaware of cookie setting
? P3P: is the delegation of my privacy preferences to a computer program still an expression of my human will?
[P-11]
Further rights under EU directive 95/46/EC
- individuals can inspect and correct their data, and they can disallow usage
- independent institutions oversee data protection in each member country
- no data transfer to countries with inadequate data protection
EU - US: Safe Harbor Principles (July 2000)
- American enterprises that collect + process data from EU voluntarily subject themselves to principles that correspond to EU standard
- FTC control
[P-12]
US legal basics
Volokh [64]
- 4th Amendment: limits government’s power to search people, their homes, and their papers; trespass laws, ...
- government must not reveal certain information: Privacy Act, Driver’s Privacy Protection Act, ...
- bars on third parties: video stores, lawyers, doctors, ..., must not reveal medical histories etc.
- "disclosure of private facts" tort
-> apply only to a narrow range of revelations
Information privacy gets protection from law of contract
-> applies only to parties to a contract.
Third parties: conflict information privacy - freedom of speech?
[P-13]
Self-governance: privacy seals
US privacy seals: TRUSTe, BBBOnline, CPA Web Trust
www.truste.org, www.bbb-online.org, www.cpawebtrust.org
"Unlike self-regulation, self-governance requires that industry not act alone; rather, it must work in concert with existing laws and develop best practices. Self-governance relies on an informed marketplace that demands disclosure of privacy practices and the opportunity to exercise choice about how information is used. Government must fulfill its role by enforcing existing laws and assuring that industry continue to work toward ubiquitous adoption of best practices. Media and advocacy groups act as a collective conscience by scrutinizing the development of self-governance to assure it remains true to its underlying principles and goals and meets the challenges of evolving technologies and business models."
TRUSTe Online Privacy Resource Book [63]
[P-14]
Self-governance: How does TRUSTe work?
- contract signed between TRUSTe and the Web site
- allows TRUSTe to address users’ privacy concerns regardless of their citizenship or the TRUSTe licensee
- users can bring their complaints to TRUSTe Watchdog
- Web site is required to respond quickly, TRUSTe can begin to mediate a resolution
  - change in company practice, or in posted policy
  - third party audit
  - refer case to government authorities, usually FTC
"Companies acting outside the bounds of the TRUSTe license agreement may be in breach of contract and be subject to revocation of the TRUSTe seal. This may be the most powerful [TRUSTe] tool, because of ... public relations consequences ..."
- an initiative of the World Wide Web Consortium (W3C) in conjunction with many industry partners including Microsoft
- P3P allows the user agent to warn the user, or block communication altogether, if a selected Web site’s privacy policy does not comply with user preferences
- P3P enables Web sites to express their privacy practices in a standard format that can be retrieved automatically and interpreted easily by user agents
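The warn-or-block behaviour of a user agent can be sketched as a set comparison between what a site declares and what the user allows. The data layout below is a deliberately simplified, hypothetical stand-in and does not use the actual P3P vocabulary or file format; the function name and the keys `purposes`, `recipients`, and `block_on_mismatch` are assumptions.

```python
def check_policy(site_policy, user_prefs):
    """Return (action, violations), where action is 'ok', 'warn', or 'block'."""
    violations = {}
    for practice, allowed in user_prefs["allowed"].items():
        # Anything the site declares beyond what the user allows is a mismatch.
        extra = site_policy.get(practice, set()) - allowed
        if extra:
            violations[practice] = extra
    if not violations:
        return "ok", violations
    action = "block" if user_prefs.get("block_on_mismatch") else "warn"
    return action, violations


site_policy = {
    "purposes": {"order-processing", "marketing"},
    "recipients": {"site", "third-party"},
}
user_prefs = {
    "allowed": {"purposes": {"order-processing"}, "recipients": {"site"}},
    "block_on_mismatch": False,
}
action, violations = check_policy(site_policy, user_prefs)
```

The comparison itself is trivial; the open question is whether delegating this decision to a program still counts as the user's own informed choice.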
Problem "soft" interaction, communication flow
In an experimental online store, agent Luci posed 56 questions in a sales dialogue.
- 35-50% of questions were non-legitimate / irrelevant
- still, 54% of participants answered at least 98% of the questions, although they had previously agreed to the sale and further usage of their data
(Spiekermann et al. [57])
[P-19]
Communication flow and "obedient" answering
[Figure. Q categories: peip, pepr, u, pd. Other labels: top 10, product info, more product info, prod.inf./purch.opt., purchase]
Example questions:
- Do you consider yourself photogenic?
- How important are trend models to you?
- When do you usually take photos?
- What zoom do you want?
(Berendt [5])
[P-20]
Conclusions
[C-1]
Conclusions
powerful methods and software for personalization available, but many questions remain, including:
- what are the relevant criteria of evaluation? how can they be combined?
- recommendations welcomed by users
- but: user reveals information, may not get a good return
  ... if there are not enough other users
  ... if that user is judged as an "uninteresting case"
- privacy concerns:
  => often, more data are collected than put to good use
[C-2]
(Some) future directions
more explicit user modeling
- integration with other data easier (XML etc.)
- involve the user in diagnosis, provide for opt-out / opt-in
"opt-in with incentives": permission marketing
anonymity, pseudonymity, and personalization
changing roles of participants:
- computers: knowledge organization and representation (-> personalization + information architecture design)
- users interact more strongly with one another
- service providers offer "real" personal assistants (Web communities)
[C-3]
References
[R-1]
[1] Ackerman, M.S., Cranor, L.F., and J. Reagle. Privacy in E-Commerce: Examining user scenarios and privacy preferences. In Proceedings of the ACM Conference on Electronic Commerce. See also http://www.research.att.com/library/trs/TRs/99/99.4/
[2] R. Agarwal, C. Aggarwal, and V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Proceedings of the High Performance Data Mining Workshop, Puerto Rico, 1999.
[3] Belkin, N.J. (2000). Helping people find what they don't know. Communications of the ACM, 43(8), 58-61.
[4] Belkin, N.J., Cool, C., Head, J., Jeng, J., Kelly, D., Lin, S.J., Lobash, L., Park, S.Y., Savage-Knepshield, P., and Sikora, C. (2000). Relevance feedback versus local context analysis as term suggestion devices. In Proceedings of the Eighth Text Retrieval Conference TREC8. Washington, D.C.
[5] Berendt, B. (2001). Understanding web usage at different levels of abstraction: coarsening and visualising sequences. In Working Notes of the Workshop "WEBKDD 2001 - Mining Log Data Across All Customer Touchpoints", 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA, August.
[6] Berendt, B. (2000). Web usage mining, site semantics, and the support of navigation. In Working Notes of the Workshop "Web Mining for E-Commerce - Challenges and Opportunities." 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (pp. 83-93). Boston, MA, August.
[7] B. Berendt, B. Mobasher, M. Spiliopoulou, and J. Wiltshire. Measuring the accuracy of sessionizers for Web usage analysis. In Proceedings of the Web Mining Workshop at the First SIAM International Conference on Data Mining, Chicago, 2001.
[8] Berendt, B. and Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.
[9] P. Berthon, L. F. Pitt, and R. T. Watson. The world wide web as an advertising medium. Journal of Advertising Research, 36(1):43-54, 1996.
[10] Brusilovsky, P. (1996). Methods and techniques of adaptive hypermedia. User Modeling and User-Adapted Interaction, 6, 87-129.
[11] Brusilovsky, P. (1997). Efficient techniques for adaptive hypermedia. In C. Nicholas and J. Mayfield (Eds.), Intelligent hypertext: Advanced techniques for the World Wide Web, Berlin: Springer. 12-30.
[12] Brusilovsky, P. and Eklund, J. (1998). A study of user model based link annotation in educational hypermedia. Journal of Universal Computer Science, 4, 429-448.
[13] Carroll, J.M. and Rosson, M.B. (1987). The paradox of the active user. In J.M. Carroll (Ed.), Interfacing Thought: Cognitive Aspects of Human-Computer Interaction. Cambridge, MA: MIT Press.
[14] Cooley, R. (2000). Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data. University of Minnesota, Faculty of the Graduate School: Ph.D. dissertation. http://www.cs.umn.edu/research/websift/papers/rwc_thesis.ps
[15] Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava. Data preparation for mining world wide web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 1999.
[16] M. Cutler and J. Sterne. E-metrics - business metrics for the new economy. Technical report, NetGenesis Corp., http://www.netgen.com/emetrics, 2000. Access date: July 22, 2001.
[17] M. Deshpande and G. Karypis. Selective Markov models for predicting Web-page accesses. Technical Report #00-056, University of Minnesota, 2000.
[18] Dimitrova, V., Self, J., and Brna, P. (2000). Involving the learner in diagnosis - potentials and problems. In Web Information Technologies: Research, Education and Commerce. Montpellier, France, May.
[19] Directive 95/46/EC of the European Parliament and the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. http://europa.eu.int/comm/internal_market/en/media/dataprot/law/index.htm
[20] X. Drèze and F. Zufryden. Testing web site design and promotional content. Journal of Advertising Research, 37(2):77-91, 1997.
[21] J. Eighmey. Profiling user responses to commercial web sites. Journal of Advertising Research, 37(2):59-66, May-June 1997.
[22] X. Fu, J. Budzik, and K. J. Hammond. Mining navigation history for recommendation. In Proc. 2000 International Conference on Intelligent User Interfaces, New Orleans, 2000.
[23] Garfinkel, S. (2000). Database Nation. The Death of Privacy in the 21st Century. Sebastopol, CA: O'Reilly.
[24] W. Gaul and L. Schmidt-Thieme. Recommender systems based on navigation path features. In [29], San Francisco, CA, Aug. 2001. ACM.
[25] A. Geyer-Schulz, M. Hahsler, and M. Jahn. A customer purchase incidence model applied to recommender systems. In [29], San Francisco, CA, Aug. 2001. ACM.
[26] E-H. Han, G. Karypis, V. Kumar and B. Mobasher. Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results. IEEE Bulletin of the Technical Committee on Data Engineering, 21(1), 1998.
[27] T. Joachims, D. Freitag, and T. Mitchell. Webwatcher: A Tour Guide for the World Wide Web. In Proceedings of the 15th International Conference on Artificial Intelligence, Nagoya, Japan, 1997.
[28] Kobsa, A., J. Koenemann and W. Pohl (2001). Personalized hypermedia presentation techniques for improving online customer relationships. To appear in The Knowledge Engineering Review. http://www.ics.uci.edu/~kobsa/papers/2001-KER-kobsa.pdf
[29] R. Kohavi, B. Masand, M. Spiliopoulou, and J. Srivastava, editors. KDD'2001 Workshop WEBKDD'2001, San Francisco, CA, Aug. 2001. ACM.
[30] R. Kohavi, M. Spiliopoulou, and J. Srivastava, editors. KDD'2000 Workshop WEBKDD'2000 on Web Mining for E-Commerce - Challenges and Opportunities, Boston, MA, Aug. 2000. ACM.
[31] Kotwica, K. (1999). Survey: Website Personalization. Cyber Behavior Research Center. http://www.cio.com/forums/behavior/edit/survey7.html.
[32] R. Kuhlen. Informationsmarkt: Chancen und Risiken der Kommerzialisierung von Wissen. 2nd edition, 1996.
[33] J. Lee, J. Kim, and J. Y. Moon. What makes internet users visit cyber stores again? Key design factors for customer loyalty. In Proc. CHI'2000, pages 305-312, The Hague, NL, 2000. ACM.
[34] J. Lee, M. Podlaseck, E. Schonberg, R. Hoch, and S. Gomory. Analysis and visualization of metrics for online merchandizing. In [39], pages 123-138. 2000.
[35] H. Lieberman. Letizia: An Agent that Assists Web Browsing. In Proceedings of the 1995 International Joint Conference on Artificial Intelligence, Montreal, Canada, 1995.
[36] Lighthouse on the Web. (2000). Personalization goes one-on-one with reality. http://www.shorewalker.com/hype/hype60.html.
[37] C. R. W. Lin, S. A. Alvarez, and C. Ruiz. Collaborative recommendation via adaptive association rule mining. In [30], 2000.
[38] B. Liu, W. Hsu, and Y. Ma. Association rules with multiple minimum supports. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-99, poster), San Diego, CA, 1999.
[39] B. Masand and M. Spiliopoulou, editors. Advances in Web Usage Mining and User Profiling: Proceedings of the WEBKDD'99 Workshop, LNAI 1836. Springer Verlag, July 2000.
[40] Mobasher, B., Cooley, R., and Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142-151.
[41] B. Mobasher, R. Cooley, and J. Srivastava. Creating adaptive web sites through usage-based clustering of URLs. In IEEE Knowledge and Data Engineering Workshop (KDEX'99), 1999.
[42] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Improving the effectiveness of collaborative filtering on anonymous web usage data. 2001.
[43] B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Y. Sun, and J. Wiltshire. Discovery of aggregate usage profiles for web personalization. In [30], 2000.
[44] B. Mobasher, H. Dai, T. Luo, Y. Su, and J. Zhu. Integrating web usage and content mining for more effective personalization. In E-Commerce and Web Technologies, volume 1875 of LNCS. Springer Verlag, Sept. 2000.
[45] B. Mobasher, H. Dai, T. Luo, M. Nakagawa. Effective personalization based on association rule discovery from Web usage data. Technical Report 01-010, Department of Computer Science, DePaul University.
[46] J. C. Nash. Know thy customer - from customer knowledge to customer insight. White paper, Accenture, accenture CRM Portal http://www.crmproject.com, access date: July 22, 2001.
[47] Nielsen, J. (1998). Personalization is Over-Rated. Alertbox for October 4, 1998. http://www.useit.com/alertbox/981004.html
[48] Nielsen, J. (2001). Usability Metrics. Alertbox, January 21, 2001. http://www.useit.com/alertbox/20010121.html
[49] Nielsen, J. (2000). Designing Web Usability: The Practice of Simplicity. New Riders Publishing.
[50] Z. Obradovic and S. Vucetic. A regression-based approach for scaling-up personalized recommender systems in e-commerce. In [30], 2000.
[51] Parent, S., Mobasher, B., and Lytinen, S. (2001). An adaptive agent for web exploration based on concept hierarchies. In Proceedings of the 9th International Conference on Human Computer Interaction. New Orleans, LA, August.
[52] M. Perkowitz and O. Etzioni. Adaptive web sites: Automatically synthesizing web pages. In Proc. of AAAI/IAAI'98, pages 727-732, 1998.
[53] M. Perkowitz and O. Etzioni. Adaptive web sites. Special Section of the Communications of ACM on "Personalization Technologies with Data Mining", 43(8):152-158, Aug 2000.
[54] Pirolli, P., Pitkow, J., and Rao, R. Silk from a sow's ear: Extracting usable structures from the web. In CHI-96, Vancouver.
[55] Shneiderman, B. (1998). Designing the User Interface. Reading, MA: Addison-Wesley.
[56] M. Spendolini. Customer measurement systems - opportunities for improvement. White paper, MJS Associates, accenture CRM Portal http://www.crmproject.com, access date: July 22, 2001.
[57] Spiekermann, S., Grossklags, J., and Berendt, B. (2001). Stated privacy preferences versus actual behaviour in EC environments: a reality check. In Proceedings der 5. Internationalen Tagung Wirtschaftsinformatik 2001. Augsburg, Germany, September.
[58] M. Spiliopoulou and C. Pohle. Data mining for measuring and improving the success of web sites. In R. Kohavi and F. Provost, editors, Journal of Data Mining and Knowledge Discovery, Special Issue on E-commerce, volume 5, pages 85-114. Kluwer Academic Publishers, Jan.-Apr. 2001.
[59] Sterne, J. (1997). Do you know me? WebMaster Magazine, April, 1997. http://www.cio.com/archive/webbusiness/040197_customer.html
[60] T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. In Proc. of the Web Conference'97, 1997.
[61] Thompson, M. (1999). Registered Visitors are a portal's best friend. The Industry Standard, June 7, 1999. http://www.thestandard.com.au/metrics/display/0,1283,901,00.html
[62] TrustE. (no date). How does Privacy Impact Your Bottom Line? http://www.truste.org/bus/pub_bottom.html