04/18/13 Data Mining: Principles and Algorithms
Data Mining: Concepts and Techniques
— Chapter 11 —
Additional Theme: Collaborative Filtering & Data Mining
Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
Motivation
Systems in Action
A Conceptual Framework
User-User Methods
Item-Item Methods
Recent Advances and Open Problems
Motivation
User Perspective
Lots of online products: books, movies, etc.
Reduce my choices… please…
Manager Perspective
“If I have 3 million customers on the web, I should have 3 million stores on the web.”
CEO of Amazon.com [SCH01]
Example: Recommendation
Customers who bought this book also bought:
• Data Preparation for Data Mining, by Dorian Pyle
• The Elements of Statistical Learning, by T. Hastie et al.
• Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham
• Mining the Web: Analysis of Hypertext and Semi-Structured Data
Example: Personalization
Other Examples
Movielens: movies
Moviecritic: movies again
My launch: music
Gustos starrater: web pages
Jester: jokes
TV Recommender: TV shows
Suggest 1.0: different products
And much more…
• Input: product taxonomy
• Output: modified taxonomy with even distribution
Adjusted Product Taxonomy (2)
[Figure: number of transactions per category using the original taxonomy vs. the adjusted taxonomy]
Latent Semantic Indexing [SAR00b]
SVD: R = U · S · I', where R is m × n, U is m × r, S is r × r (diagonal matrix of singular values), and I' is r × n.
Keeping only the k largest singular values gives Uk (m × k), Sk (k × k), and Ik' (k × n).
The reconstructed matrix Rk = Uk · Sk · Ik' is the closest rank-k matrix to the original matrix R.
• Captures latent associations
• Reduced space is less noisy
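As a rough illustration (not from the slides; the toy rating matrix is made up), the rank-k reconstruction can be computed with NumPy's SVD:

```python
import numpy as np

# Toy user-item rating matrix R (m=4 users, n=3 items); 0 means "unrated".
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0],
              [0.0, 1.0, 4.0]])

# Full SVD: R = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2  # keep only the k largest singular values
Rk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # closest rank-k approximation to R
```

The reduced factors U[:, :k] and Vt[:k, :] place users and items in the same k-dimensional latent space, which is what LSI exploits.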
Are We Done? (2)
Q2: How to select neighbors?
We don't expect to use the same neighbors for all products
Neighbors should be product-category specific
Not adequately answered
Q2-B: How can we determine whether or not a user is relevant to a given product?
Selecting Relevant Instances [YU01]
Superman and Batman are correlated
Titanic and Batman are negatively correlated
“Dances with Wolves” has nothing to do with Batman's rating
Karen is not a good instance to consider
How can we formalize this? Mutual Information:
MI(X;Y) = H(X) - H(X|Y)
Selecting Relevant Instances (2)
Offline phase:
  Estimate mutual information between items
  For each item:
    Find users who rated it
    Compute their strength (how many relevant items they also rated)
    Retain a subset of them (10% works fine)
Online phase:
  To predict the target item's rating, run k-NN on its reduced instance space
Better results with less data… quality, not quantity, is what matters
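The mutual-information criterion MI(X;Y) = H(X) - H(X|Y) can be estimated directly from co-rating samples. A minimal sketch with made-up ratings (the item names echo the slide's movie example):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate MI(X;Y) = H(X) - H(X|Y) from paired rating samples."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts scaled by n
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical ratings of three items by the same six users:
batman   = [1, 1, 2, 2, 3, 3]
superman = [1, 1, 2, 2, 3, 3]   # moves in lockstep with batman
wolves   = [1, 2, 3, 1, 2, 3]   # unrelated to batman
```

High MI(batman, superman) and low MI(batman, wolves) is exactly the signal used to drop irrelevant items (and, in turn, irrelevant users) from the instance space.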
Are We Done? (3)
Q3: How to combine?
Weighted average
Discover association rules in neighbors' transactions [LEE01, WAN04]
  For every x in this group: like(x, Item1) ^ like(x, Item2) → like(x, Item3)
  Use confidence and support to judge the quality of the prediction
  Prediction is done on the binary level (like, dislike)
  Costly to run online
User-User Methods Evaluation
Achieve good quality in practice
The more processing we push offline, the better the method scales
However:
  User preference is dynamic
  Offline-calculated models require frequent updates

Item-Item Methods
Identify buying patterns:
  Correlation Analysis
  Linear Regression
  Belief Network
  Association Rule Mining
Item-Item Similarity: The Intuition
Search for similarities among items
All computations can be done offline
Item-item similarity is more stable than user-user similarity
No need for frequent updates
First Order Models Correlation Analysis Linear Regression
Higher Order Models Belief Network Association Rule Mining
Correlation-based Methods [SAR01]
Same as in user-user similarity, but on item vectors: the Pearson correlation coefficient
Look for users who rated both items:

s_{ij} = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2}}

where U_{ij} is the set of users who rated both items, and \bar{r}_i, \bar{r}_j are the mean ratings of items i and j.
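A minimal NumPy sketch of this item-item Pearson correlation. It assumes 0 encodes "not rated" (an encoding choice of this sketch, not stated on the slide):

```python
import numpy as np

def item_similarity(R, i, j):
    """Pearson correlation between item columns i and j, computed over
    the users who rated both items (0 = unrated in this sketch)."""
    both = (R[:, i] > 0) & (R[:, j] > 0)   # users who rated both items
    ri, rj = R[both, i], R[both, j]
    di, dj = ri - ri.mean(), rj - rj.mean()
    denom = np.sqrt((di ** 2).sum() * (dj ** 2).sum())
    return float((di * dj).sum() / denom) if denom else 0.0

# Users x items; items 0 and 1 get identical ratings, item 2 the opposite.
R = np.array([[5, 5, 1],
              [4, 4, 2],
              [1, 1, 5],
              [2, 2, 4]], dtype=float)
```

Here the item means are taken over the co-rating users; taking them over all raters of each item is an equally common variant.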
Correlation-based Methods (2)
Offline phase:
  Calculate n(n-1) similarity measures
  For each item, determine its k most similar items
Online phase:
  Predict the rating for a given user-item pair as a weighted sum over the similar items that the user rated
r_{aj} = \frac{\sum_{i \in \text{similar items}} s_{ij}\, r_{ai}}{\sum_{i \in \text{similar items}} |s_{ij}|}
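The online weighted-sum step can be sketched as follows; the similarity table is assumed to have been computed offline, and all numbers are made up:

```python
def predict(ratings_by_user, sims, j):
    """Predict a user's rating for item j as a similarity-weighted average
    over the similar items the user has rated. `sims[j]` maps a neighbor
    item id -> similarity (assumed precomputed offline)."""
    num = den = 0.0
    for i, s in sims[j].items():
        if i in ratings_by_user:          # only neighbors the user rated
            num += s * ratings_by_user[i]
            den += abs(s)
    return num / den if den else None     # None: no rated neighbors

# Hypothetical: the user rated items 0 and 1; predict item 2.
sims = {2: {0: 0.8, 1: 0.4}}
prediction = predict({0: 4, 1: 2}, sims, 2)
```

Because only the k-most-similar lists are consulted, the online cost is small and independent of the number of users.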
Regression Based Methods [VUC00]
Offline phase:
  Fit n(n-1) linear regressions
  f_ij(x) is a linear transformation of a user's rating on item i to his rating on item j
Online phase:
  Same as the previous method
  The weights are inversely proportional to the regression error rates
r_{aj} = \frac{\sum_{i \in \text{items rated by } a} w_{ij}\, f_{ij}(r_{ai})}{\sum_{i \in \text{items rated by } a} w_{ij}}
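A sketch of the offline regression fit, using a least-squares line per item pair (the rating matrix is hypothetical, and 0 again stands for "unrated"):

```python
import numpy as np

def fit_pairwise(R, i, j):
    """Fit f_ij: a least-squares line mapping a user's rating on item i
    to the same user's rating on item j (R is users x items; 0 = unrated)."""
    both = (R[:, i] > 0) & (R[:, j] > 0)   # users who rated both items
    a, b = np.polyfit(R[both, i], R[both, j], 1)  # slope, intercept
    return lambda x: a * x + b

# Hypothetical ratings where item 1 tracks item 0 as r_j = 2*r_i - 1:
R = np.array([[1.0, 1.0],
              [2.0, 3.0],
              [3.0, 5.0]])
f = fit_pairwise(R, 0, 1)
```

The residual error of each fit would supply the weight w_ij in the prediction formula above (smaller error, larger weight).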
Higher Order Models
Previous approaches used the Naïve Bayes assumption: the effects of other items on a given one are independent
Not always true
Higher order models can do better
Belief Network Association Rule Mining
Bayesian Belief Network: Introduction
Bayesian belief network allows a subset of the variables to be conditionally independent
A graphical model of causal relationships Represents dependency among the variables Gives a specification of joint probability distribution
Nodes: random variables
Links: dependency
X and Y are the parents of Z, and Y is the parent of P
No dependency between Z and P
Has no loops or cycles
[Figure: a four-node network X, Y, Z, P illustrating these dependencies]
Bayesian Belief Network: An Example

Nodes: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea

The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC       0.8       0.5        0.7        0.1
~LC      0.2       0.5        0.3        0.9
P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathrm{Parents}(Z_i))
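Using the factorization above with the slide's CPT for LungCancer. The priors for FamilyHistory and Smoker below are illustrative placeholders, not values from the slides:

```python
# Placeholder priors for the root nodes (made up for this sketch):
P_FH = {True: 0.1, False: 0.9}
P_S  = {True: 0.3, False: 0.7}

# CPT for LungCancer given (FamilyHistory, Smoker), from the slide's table:
P_LC = {(True, True): 0.8, (True, False): 0.5,
        (False, True): 0.7, (False, False): 0.1}

def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S),
    i.e. each variable conditioned only on its parents."""
    p_lc = P_LC[(fh, s)] if lc else 1 - P_LC[(fh, s)]
    return P_FH[fh] * P_S[s] * p_lc
```

Summing `joint` over all eight assignments gives 1, confirming the factorization defines a proper joint distribution over this sub-network.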
Belief Network for CF [BRE98]
Every item is a node
Binary rating (like, dislike)
Learn offline a belief network over the training data
The CPT at each node is represented as a decision tree
Use greedy algorithms to determine the best network structure
Use probabilistic inference for online prediction
Belief Network for CF: An Example
[Figure: decision tree representing the CPT for the random variable "Melrose Place" in the movie domain, with splits on "Friends" and "B.H." and probabilities at the leaves]
Association Rule Mining
Offline processing:
  Work on the binary level (like, dislike)
  View the user as a market basket containing the items liked by that user
  Discover association rules between items
Online processing:
  Match items that the active user likes against the rules' left-hand sides
  Recommend the rules' consequents based on support and confidence
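The online matching step can be sketched as follows; the rule set is assumed to have been mined offline, and the rules and item names below are made up:

```python
def recommend(liked, rules, min_conf=0.5):
    """Match the active user's liked items against rule antecedents and
    return consequents ranked by (confidence, support).
    `rules` = [(antecedent_set, consequent, support, confidence), ...]."""
    hits = [(conf, sup, cons) for ante, cons, sup, conf in rules
            if ante <= liked            # antecedent fully satisfied
            and cons not in liked       # don't recommend what's already liked
            and conf >= min_conf]
    return [cons for conf, sup, cons in sorted(hits, reverse=True)]

# Hypothetical offline-mined rules: (antecedent, consequent, support, confidence)
rules = [({"Book1", "Book2"}, "Book3", 0.10, 0.9),
         ({"Movie1"}, "Movie2", 0.20, 0.7),
         ({"Book9"}, "Book4", 0.05, 0.8)]
```

Because the rule set can be large, this linear scan is exactly the online cost the adaptive-support approach below tries to keep bounded.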
Association Rule Mining : Problems
A high support threshold leads to low coverage and may eliminate important but infrequent items from consideration
Low support thresholds result in very large model sizes, a computationally expensive offline pattern discovery phase, and a slower online matching phase
Solution: Adaptive Association Rule Mining
Adaptive Association Rule Mining [LIN01]
The knobs: minSupport and minConfidence versus the desired number of rules
Given:
  transaction dataset
  target item
  desired range for the number of rules
  specified minimum confidence
Find: a set S of association rules for the target item such that:
  the number of rules in S is in the given range
  rules in S satisfy the minimum confidence constraint
  rules in S have higher support than rules not in S that satisfy the above constraints
Adaptive Association Rule Mining (2)
Discover rules with one item in the head: Like(x, item1) ^ Like(x, item2) → Like(x, target)
The miner discovers association rules iteratively (for each target item) until the desired number of rules is extracted
Support is adjusted per item
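A simplified sketch of the adaptive loop. The miner itself is stubbed out; real adaptive-support mining as in [LIN01] tunes the threshold inside the mining algorithm rather than around it:

```python
def adaptive_mine(mine_rules, lo, hi, min_conf, support=0.5, decay=0.9, floor=1e-3):
    """Relax minSupport until the rule count for a target item reaches
    at least `lo`, then cap at `hi`. `mine_rules(support, min_conf)` is
    an assumed stand-in for an offline rule miner."""
    rules = mine_rules(support, min_conf)
    while len(rules) < lo and support > floor:
        support *= decay                    # too few rules: lower the threshold
        rules = mine_rules(support, min_conf)
    return rules[:hi]                       # never exceed the upper bound

# Toy miner: the number of discovered rules grows as the threshold drops.
fake_miner = lambda s, c: [("rule", k) for k in range(int(1 / s))]
result = adaptive_mine(fake_miner, 3, 5, 0.6)
```

The geometric decay is one simple schedule; any strategy that converges on a support value yielding a rule count inside [lo, hi] fits the problem statement above.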
Item-Item Methods: Why Do They Work?
Like(x, Book1) ^ like(x, Book2) → like(x, Book3): high support among the “book gang”
Like(x, Movie1) → like(x, Movie2): high support among the “movie gang”
We use the right neighbors for each item
Without discovering the groups themselves, thus eliminating costly online matching
In general better quality than user-user methods, and better response time [LIN03]
Recent Work and Open Problems
Order-based methods
  Ordering items is more informative than rating them
  [KAM03] developed k-o'means to work on orders
Preference-based methods
  Total ordering of items is not feasible
  Work on partial orders (preferences) [COH99]
Integrating background knowledge
  User demographic information, item features, etc.
Modeling time
  Sequential patterns
References (1)
Charu C. Aggarwal, Joel L. Wolf, Kun-Lung Wu, Philip S. Yu: Horting Hatches an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. KDD 1999: 201-212
J. Breese, D. Heckerman, C. Kadie: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proc. 14th Conf. on Uncertainty in Artificial Intelligence, Madison, July 1998
Yoon Ho Cho and Jae Kyeong Kim: Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce. Expert Systems with Applications, 26(2), 2003
William W. Cohen, Robert E. Schapire, and Yoram Singer: Learning to order things. In Advances in Neural Information Processing Systems 10, Denver, CO, 1997
Jiawei Han, Fall 2003 online course notes, available at: http://www-courses.cs.uiuc.edu/~cs397han/slides/05.ppt
Toshihiro Kamishima: Nantonac collaborative filtering: recommendation based on order responses. KDD 2003: 583-588
Lee, C.-H., Kim, Y.-H., Rhee, P.-K.: Web personalization expert with combining collaborative filtering and association rule mining technique. Expert Systems with Applications, 21(3), October 2001, pp. 131-137
References (2)
W. Lin, 2001P, online presentation available at: http://www.wiwi.hu-
Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz: Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery, 6:83-105, 2002
G. Linden, B. Smith, and J. York: Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1), pp. 76-80, Jan. 2003
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: Analysis of recommendation algorithms for e-commerce. ACM Conf. on Electronic Commerce 2000: 158-167
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl: Application of dimensionality reduction in recommender systems--a case study. In ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000
B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl: Item-based collaborative filtering recommendation algorithms. WWW'01
References (3)
B. Sarwar, 2000P, online presentation available at: http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/badrul.ppt
J. Ben Schafer, Joseph A. Konstan, John Riedl: E-Commerce Recommendation Applications. Data Mining and Knowledge Discovery, 5(1/2): 115-153, 2001
L. H. Ungar and D. P. Foster: Clustering Methods for Collaborative Filtering. AAAI Workshop on Recommendation Systems, 1998
Yi-Fan Wang, Yu-Liang Chuang, Mei-Hua Hsu and Huan-Chao Keh: A personalized recommender system for the cosmetic business. Expert Systems with Applications, 26(3), April 2004, pp. 427-434
S. Vucetic and Z. Obradovic: A regression-based approach for scaling-up personalized recommender systems in e-commerce. In ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000
Kai Yu, Xiaowei Xu, Martin Ester, and Hans-Peter Kriegel: Selecting relevant instances for efficient and accurate collaborative filtering. In Proc. 10th CIKM, pp. 239-246, ACM Press, 2001
Cheng Zhai, Spring 2003 online course notes, available at: http://sifaka.cs.uiuc.edu/course/2003-497CXZ/loc/cf.ppt