Filtering and Recommender Systems: Content-based and Collaborative

The best indicator that a passenger will show up to board the flight is that she called in for a special meal.

Dec 14, 2015


Will Herrell
Transcript
Page 1:

Filtering and Recommender Systems: Content-based and Collaborative

Page 2:

Filtering and Recommender Systems: Content-based and Collaborative

Some of the slides are based on Mooney’s slides.

Page 3:

Personalization

• Recommenders are instances of personalization software.
• Personalization concerns adapting to the individual needs, interests, and preferences of each user.
• Includes:
  – Recommending
  – Filtering
  – Predicting (e.g. form or calendar appt. completion)
• From a business perspective, it is viewed as part of Customer Relationship Management (CRM).

Page 4:

Feedback & Prediction/Recommendation

• Traditional IR has a single user, probably working in single-shot mode
  – Relevance feedback…
• Web search engines have:
  – Users working continually
    • User profiling (a profile is a “model” of the user)
    • (and also relevance feedback)
  – Many users
    • Collaborative filtering: propagate user preferences to other users…

You know this one

Page 5:

Recommender Systems in Use

• Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.

• Many on-line stores provide recommendations (e.g. Amazon, CDNow).

• Recommenders have been shown to substantially increase sales at on-line stores.

Page 6:

Feedback Detection

Ordered from non-intrusive to intrusive:

– Click certain pages in a certain order while ignoring most pages.
– Read some clicked pages longer than other clicked pages.
– Save/print certain clicked pages.
– Follow some links in clicked pages to reach more pages.
– Buy items / put them in wish lists or shopping carts.
– Explicitly ask users to rate items/pages.

Page 7:

Justifying Recommendation..

• Recommendation systems must justify their recommendations
  – Even if the justification is bogus..
  – For search engines, the “justifications” are the page synopses
• Some recommendation algorithms are better at providing human-understandable justifications than others
  – Content-based ones can justify in terms of classifier features..
  – Collaborative ones are harder-pressed to say anything other than “people like you seem to like this stuff”
  – In general, giving good justifications is important..

Page 8:

Content-based vs. Collaborative Recommendation

[Figure: two diagrams side by side. Content/Profile-based: book titles (Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010), a Machine Learning box, and a User Profile. Collaborative Filtering: a User Database of per-user ratings over items A to Z, an Active User, a Correlation Match step, and an Extract Recommendations step that outputs item C.]

• Content/profile-based: needs description of items…
• Collaborative filtering: needs only ratings from other users

Page 9:

Content-Based Recommending

• Recommendations are based on information on the content of items rather than on other users’ opinions.
• Uses machine learning algorithms to induce a profile of the user’s preferences from examples based on a featural description of content.
• Lots of systems

Page 10:

Adapting Naïve Bayes idea for Book Recommendation

• Vector of Bags model
  – E.g. books have several different fields that are all text
    • Authors, description, …
    • A word appearing in one field is different from the same word appearing in another
  – Want to keep each bag different: a vector of bags (indexed by m), with conditional probabilities for each word w.r.t. each class and bag
• Can give a profile of a user in terms of words that are most predictive of what they like
  – Strength of a keyword
    • Log[P(w|rel)/P(w|~rel)]
  – We can summarize a user’s profile in terms of the words that have strength above some threshold.
  – Related to mutual information

\[
P(c_j \mid Book) = \frac{P(c_j)}{P(Book)} \prod_{m=1}^{S} \prod_{i=1}^{|d_m|} P(a_{mi} \mid c_j, s_m)
\]

Here c_j is a class, the outer product runs over the S bags (slots) s_m, and a_{mi} is the i-th word in bag d_m.
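The keyword strength Log[P(w|rel)/P(w|~rel)] from this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the toy books, and the add-alpha smoothing constant are all hypothetical and not taken from the slides. Words are counted per bag (slot), so the same word in "author" and in "description" gets separate strengths.

```python
import math
from collections import Counter


def slot_counts(books):
    """Count words separately per slot (bag), e.g. 'author' vs 'description'."""
    counts = {}
    for book in books:
        for slot, words in book.items():
            counts.setdefault(slot, Counter()).update(words)
    return counts


def keyword_strengths(liked, disliked, alpha=1.0):
    """Return {(slot, word): log[P(w|rel) / P(w|~rel)]}.

    alpha is an add-alpha smoothing constant (an assumption, not from the slide).
    """
    rel, nonrel = slot_counts(liked), slot_counts(disliked)
    strengths = {}
    for slot in set(rel) | set(nonrel):
        r = rel.get(slot, Counter())
        n = nonrel.get(slot, Counter())
        vocab = set(r) | set(n)
        r_total = sum(r.values()) + alpha * len(vocab)
        n_total = sum(n.values()) + alpha * len(vocab)
        for w in vocab:
            p_rel = (r[w] + alpha) / r_total
            p_non = (n[w] + alpha) / n_total
            strengths[(slot, w)] = math.log(p_rel / p_non)
    return strengths


def profile(strengths, threshold):
    """Keywords whose strength exceeds the threshold, strongest first."""
    keep = [key for key, s in strengths.items() if s > threshold]
    return sorted(keep, key=lambda key: -strengths[key])


# Hypothetical toy data: each book is a dict of slot -> list of words.
liked = [
    {"author": ["asimov"], "description": ["robot", "empire", "space"]},
    {"author": ["robinson"], "description": ["mars", "colony", "space"]},
]
disliked = [
    {"author": ["austen"], "description": ["romance", "estate"]},
    {"author": ["asimov"], "description": ["mystery", "space"]},
]
strengths = keyword_strengths(liked, disliked)
user_profile = profile(strengths, threshold=0.0)
```

In this toy data "asimov" appears equally often in liked and disliked books, so its strength is zero and it drops out of the profile, while "robinson" and "space" stay in.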

Page 11:

Collaborative Filtering

[Figure: collaborative filtering diagram. A User Database of per-user ratings over items A to Z, an Active User, a Correlation Match step, and an Extract Recommendations step that outputs item C.]

Correlation analysis here is similar to the association clusters analysis!

Page 12:

Item-User Matrix

• The input to the collaborative filtering algorithm is an m×n matrix where rows are items and columns are users
  – Sort of like the term-document matrix (items are terms and users are documents)
• Can think of users as vectors in the space of items (or vice versa)
  – Can do vector similarity between users
    • Pearson correlation coefficient is a variation
    • And find who are the most similar users..
  – Can do scalar clusters over items etc..
    • And find what are the most correlated items

Think: users = docs, items = keywords

Page 13:

A Collaborative Filtering Method (think kNN)

• Weight all users with respect to similarity with the active user.
  – How to measure similarity?
    • Could use cosine similarity; normally the Pearson coefficient is used
• Select a subset of the users (neighbors) to use as predictors.
• Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings.
• Present items with highest predicted ratings as recommendations.

Page 14:

Finding User Similarity with Pearson Correlation Coefficient

• Typically use Pearson correlation coefficient between ratings for active user, a, and another user, u.

\[
c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}
\]

r_a and r_u are the ratings vectors for the m items rated by both a and u; r_{i,j} is user i’s rating for item j.

\[
\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}
\]

\[
\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}
\qquad
\bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}
\]
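As a rough sketch, the Pearson coefficient on this slide can be computed over the co-rated items of two users as follows. The dict-based representation, the function name, and the choice to return 0.0 when the correlation is undefined are assumptions made for illustration. The two sample users reuse ratings that appear in the collaborative filtering figure.

```python
import math


def pearson(ratings_a, ratings_u):
    """Pearson correlation over the m items rated by both users.

    ratings_a, ratings_u: dicts mapping item -> rating.
    Returns 0.0 when there is no overlap or a user has zero variance
    (an assumption; the slide does not cover these cases).
    """
    common = set(ratings_a) & set(ratings_u)
    m = len(common)
    if m == 0:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / m
    mean_u = sum(ratings_u[i] for i in common) / m
    covar = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common) / m
    sd_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common) / m)
    sd_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common) / m)
    if sd_a == 0 or sd_u == 0:
        return 0.0
    return covar / (sd_a * sd_u)


active = {"A": 9, "B": 3, "Z": 5}
other = {"A": 10, "B": 4, "C": 8, "Z": 1}
c_au = pearson(active, other)  # computed over the co-rated items A, B, Z
```

Only items A, B, and Z enter the computation; item C is ignored because the active user has not rated it.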

Page 15:

Pearson Correlation Coefficient is the same as vector similarity over centered ratings vectors

• It is easy to check for yourself that the Pearson correlation coefficient is the same as the cosine of the angle (cosine similarity) between centered ratings vectors
  – Covariance = dot product of the centered vectors (divided by m)
  – Sqrt(variance of each vector) = norm of each centered vector (divided by sqrt(m))
  – The factors of m cancel in the ratio
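The claim on this slide is easy to check numerically. The sketch below uses hypothetical helper names and two short rating vectors chosen for illustration. It computes the Pearson coefficient directly and compares it with the cosine similarity of the mean-centered vectors.

```python
import math


def center(x):
    """Subtract the mean from every component."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def cosine(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    covar = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / m
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / m)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / m)
    return covar / (sd_x * sd_y)


r_a = [9, 3, 5]
r_u = [10, 4, 1]
difference = abs(pearson(r_a, r_u) - cosine(center(r_a), center(r_u)))
```

The two values agree up to floating-point rounding, because the 1/m in the covariance cancels against the 1/sqrt(m) in each standard deviation.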

Page 16:

Neighbor Selection

• For a given active user, a, select correlated users to serve as source of predictions.

• Standard approach is to use the most similar k users, u, based on similarity weights, w_{a,u}

• Alternate approach is to include all users whose similarity weight is above a given threshold.
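Both selection strategies on this slide fit in a few lines. This is a minimal sketch; the function names and the sample weights are hypothetical, and the weights are assumed to have been computed already (for example with the Pearson coefficient).

```python
import heapq


def top_k_neighbors(weights, k):
    """The k users with the largest similarity weight w_{a,u}."""
    return heapq.nlargest(k, weights, key=weights.get)


def threshold_neighbors(weights, threshold):
    """All users whose similarity weight is above the threshold."""
    return [u for u, w in weights.items() if w > threshold]


# Hypothetical similarity weights between the active user and four others.
weights = {"u1": 0.9, "u2": 0.2, "u3": -0.4, "u4": 0.7}
by_rank = top_k_neighbors(weights, k=2)
by_threshold = threshold_neighbors(weights, threshold=0.5)
```

Top-k always returns the same number of neighbors, even if some are only weakly similar. The threshold variant returns a variable number and may return none. Whether strongly negative weights should also count as informative neighbors is a design choice this sketch does not make.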

Page 17:

Rating Prediction

• Predict a rating, p_{a,i}, for each item i, for active user, a, by using the k selected neighbor users, u ∈ {1,2,…k}.
• To account for users’ different ratings levels, base predictions on differences from a user’s average rating.
• Weight users’ ratings contribution by their similarity to the active user.

\[
p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{k} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{k} |w_{a,u}|}
\]

r_{i,j} is user i’s rating for item j, and

\[
c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}
\]
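The prediction rule on this slide can be sketched as below. The names and toy ratings are hypothetical. Two further assumptions are made: each user's average is taken over all of that user's ratings, and neighbors who have not rated the target item are skipped.

```python
def mean_rating(ratings):
    """Average of one user's ratings (dict item -> rating)."""
    return sum(ratings.values()) / len(ratings)


def predict_rating(active, neighbors, item):
    """Active user's mean plus the weighted average of neighbors' deviations.

    neighbors: list of (similarity_weight, ratings_dict) pairs.
    Falls back to the active user's mean if no neighbor rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for weight, ratings in neighbors:
        if item in ratings:
            numerator += weight * (ratings[item] - mean_rating(ratings))
            denominator += abs(weight)
    base = mean_rating(active)
    return base if denominator == 0 else base + numerator / denominator


active = {"A": 9, "B": 3, "Z": 5}
neighbors = [
    (0.8, {"A": 10, "B": 4, "C": 8, "Z": 1}),
    (0.5, {"A": 5, "B": 3, "C": 9, "Z": 7}),
]
p_c = predict_rating(active, neighbors, "C")
```

Both neighbors rate C above their own averages, so the prediction for C lands above the active user's average of about 5.67.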

Page 18:

Significance Weighting

• Important not to trust correlations based on very few co-rated items.
• Include significance weights, s_{a,u}, based on number of co-rated items, m.

\[
w_{a,u} = s_{a,u}\, c_{a,u}
\]

\[
s_{a,u} =
\begin{cases}
1 & \text{if } m > 50 \\
\dfrac{m}{50} & \text{if } m \le 50
\end{cases}
\]

\[
c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}
\]
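A small sketch of the significance weighting on this slide follows. The function names are hypothetical, and the cutoff of 50 co-rated items is exposed as a parameter only for illustration.

```python
def significance(m, cutoff=50):
    """s_{a,u}: 1 when more than `cutoff` items are co-rated, else m / cutoff."""
    return 1.0 if m > cutoff else m / cutoff


def similarity_weight(c_au, m, cutoff=50):
    """w_{a,u} = s_{a,u} * c_{a,u}."""
    return significance(m, cutoff) * c_au


# A high correlation based on only 5 co-rated items is heavily discounted.
w_few = similarity_weight(0.9, m=5)
w_many = similarity_weight(0.9, m=120)
```

With 5 co-rated items the weight shrinks to a tenth of the raw correlation, while with 120 co-rated items it is left unchanged.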

Page 19:

Item-centered Collaborative Filtering

• Starting with a “centered” user-item matrix, we found the k nearest users to the active user and used them to recommend unrated items
• We can also use the centered U-I matrix to compute item-item correlations by starting with U-I' × U-I (the transpose of U-I times U-I), and doing (a) association clusters and (b) scalar clusters
• This will give us, for each item, its k nearest items
  – Now, given a new item I_n to be rated for a user U, we first find the k items closest to I_n, and take their (weighted) average rating from the user U as predictive of U’s rating of I_n
  – An advantage of this method over the “user-centered” idea is that the justifications for the recommendations can be more meaningful (you can tell the user that we are recommending I_n because she rated the items in its association cluster high..)
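One way to sketch the item-centered idea is shown below. Several choices here are assumptions rather than anything stated on the slide: item similarity is the cosine between item columns of the centered matrix (restricted to users who rated both items), items with non-positive similarity are ignored, and the user's raw ratings are averaged. All names and ratings are hypothetical.

```python
import math


def center(ratings):
    """Subtract each user's mean rating; ratings: {user: {item: rating}}."""
    centered = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {i: r - mean for i, r in items.items()}
    return centered


def item_similarity(centered, i, j):
    """Cosine between item columns i and j over users who rated both."""
    pairs = [(r[i], r[j]) for r in centered.values() if i in r and j in r]
    dot = sum(x * y for x, y in pairs)
    norm_i = math.sqrt(sum(x * x for x, _ in pairs))
    norm_j = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_based(ratings, user, target, k=2):
    """Weighted average of the user's ratings on the k items nearest to target."""
    centered = center(ratings)
    sims = [(item_similarity(centered, target, j), j)
            for j in ratings[user] if j != target]
    nearest = sorted((s, j) for s, j in sims if s > 0)[-k:]
    total = sum(s for s, _ in nearest)
    if total == 0:
        return None  # no positively correlated item rated by this user
    return sum(s * ratings[user][j] for s, j in nearest) / total


ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},  # has not rated B yet
}
p_b = predict_item_based(ratings, "u4", "B")
```

Here item B correlates positively with A and negatively with C, so the prediction for u4 rests on A alone, which also gives a natural justification ("recommended because you rated A highly").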

Page 20:

LSI-style techniques for collaborative filtering

• The Netflix Prize was won by an approach that did “latent factor analysis” (aka LSI) on the U-I matrix, so that both users and items are seen as vectors in a k-dimensional factor space
• One technical difficulty in doing LSI on the U-I matrix is that it has many “null” values
  – D-t matrix is sparse and that is good. U-I matrix has null values and that is bad (because null != 0)
• Two approaches:
  – “Fill in” the missing ratings (“imputation” method) so we have no more null values
  – Compute distance between vectors only in terms of their common non-null dimensions
• Problem: overfitting. Solution: regularization (penalize “large factor” values).

q_i: item i in factor space; p_u: user u in factor space
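A minimal sketch of a regularized latent factor model is given below, trained only on the observed (non-null) ratings. It uses plain stochastic gradient descent, which is one common way to fit such models and is an assumption here, not necessarily what the slides or the prize-winning system used. The learning rate, regularization weight, initialization range, and toy ratings are all hypothetical.

```python
import math
import random


def predict(p_u, q_i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(p_u, q_i))


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.05,
              epochs=1000, seed=0):
    """Fit p_u and q_i by SGD on observed (user, item, rating) triples only.

    reg penalizes large factor values (L2 regularization).
    """
    rng = random.Random(seed)
    p = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(p[u], q[i])
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def rmse(ratings, p, q):
    """Root mean squared error over the observed ratings."""
    sq = sum((r - predict(p[u], q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Observed ratings only; every (user, item) pair not listed is null, not 0.
observed = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 1),
]
p, q = factorize(observed, n_users=4, n_items=4)
train_error = rmse(observed, p, q)
guess = predict(p[0], q[3])  # user 0 never rated item 3
```

Because the loop only visits observed triples, missing entries never pull the factors toward zero, which is the point of treating null as different from 0.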

Page 21:

Problems with Collaborative Filtering

• Cold Start: There needs to be enough other users already in the system to find a match.
• Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
• First Rater: Cannot recommend an item that has not been previously rated.
  – New items
  – Esoteric items
• Popularity Bias: Cannot recommend items to someone with unique tastes.
  – Tends to recommend popular items.
  – WHAT DO YOU MEAN YOU DON’T CARE FOR BRITNEY SPEARS YOU DUNDERHEAD? #$%$%$&^

Page 22:

Advantages of Content-Based Approach

• No need for data on other users.
  – No cold-start or sparsity problems.
• Able to recommend to users with unique tastes.
• Able to recommend new and unpopular items
  – No first-rater problem.
• Can provide explanations of recommended items by listing content-features that caused an item to be recommended.
• Well-known technology: the entire field of Classification Learning is at (y)our disposal!

Page 23:

Disadvantages of Content-Based Method

• Requires content that can be encoded as meaningful features.
• Users’ tastes must be represented as a learnable function of these content features.
• Unable to exploit quality judgments of other users.
  – Unless these are somehow included in the content features.

Page 24:

Content-Boosted CF - I

[Figure: the User-ratings Vector is split into User-rated Items and Unrated Items. The user-rated items are the Training Examples for a Content-Based Predictor, which yields Items with Predicted Ratings. Together these form the Pseudo User-ratings Vector.]

Page 25:

Content-Boosted CF - II

• Compute pseudo user ratings matrix
  – Full matrix: approximates actual full user ratings matrix
• Perform CF
  – Using Pearson corr. between pseudo user-rating vectors
• This works better than either content-based or collaborative filtering alone!

[Figure: User Ratings Matrix, Content-Based Predictor, Pseudo User Ratings Matrix]
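The pseudo user ratings idea can be sketched as follows. The content-based predictor here is a deliberately trivial stand-in (the user's average rating for items of the same genre), not the learned classifier the slides have in mind. All names, genres, and ratings are hypothetical.

```python
import math


def genre_mean_predictor(user_ratings, genres):
    """Toy stand-in for a content-based predictor (an assumption).

    Predicts the user's mean rating for items of the same genre,
    falling back to the user's overall mean.
    """
    overall = sum(user_ratings.values()) / len(user_ratings)

    def predict(item):
        same = [r for i, r in user_ratings.items() if genres[i] == genres[item]]
        return sum(same) / len(same) if same else overall

    return predict


def pseudo_ratings(user_ratings, all_items, content_predict):
    """Actual rating where one exists, content-based prediction elsewhere."""
    return [user_ratings[i] if i in user_ratings else content_predict(i)
            for i in all_items]


def pearson(x, y):
    """Pearson correlation between two equal-length dense vectors."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    covar = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / m
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / m)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / m)
    return covar / (sd_x * sd_y)


items = ["A", "B", "C", "D"]
genres = {"A": "scifi", "B": "scifi", "C": "romance", "D": "romance"}
user1 = {"A": 9, "C": 3}
user2 = {"B": 8, "D": 2}  # no item in common with user1

v1 = pseudo_ratings(user1, items, genre_mean_predictor(user1, genres))
v2 = pseudo_ratings(user2, items, genre_mean_predictor(user2, genres))
similarity = pearson(v1, v2)
```

The two users share no rated item, so a Pearson correlation on the raw ratings matrix is undefined. On the dense pseudo vectors it can be computed, which is how the content-based step helps with sparsity.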

Page 26:

Why can’t the pseudo ratings be used to help content-based filtering?

• How about using the pseudo ratings to improve a content-based filter itself? (or how access to unlabelled examples improves accuracy…)
  – Learn a naive Bayes classifier (NBC) C0 using the few items for which we have user ratings
  – Use C0 to predict the ratings for the rest of the items
  – Loop
    • Learn a new classifier C1 using all the ratings (real and predicted)
    • Use C1 to (re)-predict the ratings for all the unknown items
  – Until no change in ratings
• With a small change, this actually works in finding a better classifier!
  – Change: Keep the class posterior prediction (rather than just the max class)
    • This means that each (unlabelled) entity could belong to multiple classes, with fractional membership in each
    • We weight the counts by the membership fractions
      – E.g. P(A=v|c) = (sum of class weights of all examples in c that have A=v) / (sum of class weights of all examples in c)
• This is called expectation maximization
  – Very useful on the web, where you have tons of data but very little of it is labelled
  – Reminds you of K-means, doesn’t it?
    • (no coincidence: K-means is “hard-assignment” EM)

Unlabeled examples help only when they are drawn from the same distribution as the labeled ones..
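The loop on this slide, with soft (posterior) labels, can be sketched as a small semi-supervised naive Bayes. This is an illustration under assumptions: attributes are categorical, counts use add-alpha smoothing, and the loop runs a fixed number of iterations instead of testing for convergence. All names and examples are hypothetical.

```python
def train(examples, weights, classes, domain, alpha=1.0):
    """Weighted naive Bayes: counts are sums of class membership fractions."""
    class_mass = {c: sum(w[c] for w in weights) for c in classes}
    total = sum(class_mass.values())
    prior = {c: class_mass[c] / total for c in classes}
    cond = {}
    for c in classes:
        for attr, values in domain.items():
            for v in values:
                mass = sum(w[c] for x, w in zip(examples, weights)
                           if x.get(attr) == v)
                cond[(attr, v, c)] = (mass + alpha) / (class_mass[c] + alpha * len(values))
    return prior, cond


def posterior(x, prior, cond, classes):
    """Class posterior P(c | x) under the naive Bayes model."""
    score = {c: prior[c] for c in classes}
    for c in classes:
        for attr, v in x.items():
            score[c] *= cond[(attr, v, c)]
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}


def em(labeled, unlabeled, classes, iterations=20):
    """Start from C0 (labeled only), then alternate soft labeling and refitting."""
    xs = [x for x, _ in labeled] + unlabeled
    domain = {}
    for x in xs:
        for attr, v in x.items():
            domain.setdefault(attr, set()).add(v)
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, cond = train([x for x, _ in labeled], hard, classes, domain)
    for _ in range(iterations):
        soft = [posterior(x, prior, cond, classes) for x in unlabeled]  # E-step
        prior, cond = train(xs, hard + soft, classes, domain)           # M-step
    return prior, cond


classes = ["like", "dislike"]
labeled = [
    ({"genre": "scifi", "length": "long"}, "like"),
    ({"genre": "romance", "length": "short"}, "dislike"),
]
unlabeled = [
    {"genre": "scifi", "length": "short"},
    {"genre": "scifi", "length": "long"},
    {"genre": "romance", "length": "long"},
    {"genre": "romance", "length": "short"},
]
prior, cond = em(labeled, unlabeled, classes)
post = posterior({"genre": "scifi", "length": "long"}, prior, cond, classes)
```

Replacing the soft posteriors with a one-hot vector for the most likely class would give the "hard-assignment" variant that the slide compares to K-means.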

Page 27:
Page 28:

(boosted) content filtering

Page 29:

Discussion of the Google News Collaborative Filtering Paper