Page 1:

You may also like... An Introduction to Recommendation Systems

Raj Bandyopadhyay, Damballa

Page 2:

The long tail

• Brick-and-mortar stores: cater to the aggregate population, limited product diversity
• Online retailers: cater to eclectic/niche preferences, rare and less popular items
• However, users need to be able to discover those rare and niche items!
• Recommendation systems provide a way

Page 3:

The long tail

[Figure: the long-tail distribution of item popularity, with regions labeled "Traditional Retailers" (the popular head) and "Online Retailers" (the long tail)]

Popularized by Chris Anderson, editor of Wired.

Page 4:

Problem Statement

• For a system with users and items:
• Predict how a user U will rate an item I
  • e.g. a user ‘likes’ a story on Facebook
  • e.g. a user gives a movie 4/5 stars on Netflix
• Find items likely to be rated highly by a user
• Examples of items: movies (Netflix), products (Amazon), stories (Facebook)

Page 5:

Utility (ratings) matrix: rows are users, columns are movies
(HP: Harry Potter, TW: Twilight, SW: Star Wars)

      HP1  HP2  HP3  TW1  TW2  TW3  SW4  SW5  SW6
A      4    5    5    1
B      2              3    2         3
C      3              5    4    4
D                     3    2    2    4    5    4

Page 6:

A (bipartite) graph view

[Figure: bipartite graph with users A, B, C, D on one side and movies HP1–HP3, TW1–TW3, SW4–SW6 on the other]

The weights of the edges represent the ratings.
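One natural in-memory representation of this weighted bipartite graph (equivalently, of the sparse utility matrix) is a dictionary keyed by user that maps each rated movie to its rating. A minimal Python sketch follows; the slide does not label every cell, so the exact placement of a few ratings below is an assumption, chosen to be consistent with the similarity values computed later in the talk.

# Sparse utility matrix as a user -> {movie: rating} adjacency map.
# Unrated (user, movie) pairs are simply absent.
# NOTE: the placement of some ratings is an assumption consistent with
# the cosine similarities shown later in the deck.
ratings = {
    "A": {"HP1": 4, "HP2": 5, "HP3": 5, "TW1": 1},
    "B": {"HP1": 2, "TW1": 3, "TW2": 2, "SW4": 3},
    "C": {"HP1": 3, "TW1": 5, "TW2": 4, "TW3": 4},
    "D": {"TW1": 3, "TW2": 2, "TW3": 2, "SW4": 4, "SW5": 5, "SW6": 4},
}

# Edges of the bipartite graph: (user, movie, weight) triples.
edges = [(u, m, r) for u, items in ratings.items() for m, r in items.items()]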

Page 7:

Recommendation systems

• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items intrinsically similar to previous items highly rated by the user
• Latent Factor Models: use techniques from linear algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches

Page 8:

Collaborative Filtering

• For user U, find a neighborhood of users {N} who have preferences similar to U
• Predict U’s ratings based on ratings of other users in {N}
• How do we find users similar to U?
• We use a similarity metric

Page 9:

Intuition: Similarity metric

[Figure: bipartite graph restricted to users A, C, D and the movies they have rated]

C and D are more similar than A and D.

Page 10:

Intuition: Similarity metric

• Users C and D should be assigned a higher similarity score if and only if:
  • C and D watch many of the same movies
  • C and D rate the same movies similarly

• Several similarity metrics meet these conditions. Let’s look at one...

Page 11:

Cosine similarity

• Treat each user as a vector of ratings
• Cosine similarity of u and v: cos(θ) = (u · v) / (|u| |v|)
• Always in the range [0, 1] here, since ratings are non-negative
• Higher value => greater similarity
• It is the cosine of the angle θ between u and v

[Figure: two vectors u and v with angle θ between them]
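As a concrete sketch, the cosine similarity between two users’ rating vectors (unrated movies treated as 0) takes only a few lines of NumPy. The vectors below follow the utility matrix from the earlier slide; the exact column placement of a couple of ratings is an assumption, but it reproduces the similarity values shown on the next slide.

import numpy as np

# Columns: HP1 HP2 HP3 TW1 TW2 TW3 SW4 SW5 SW6 (unrated = 0).
# Placement of some unlabeled ratings is assumed, consistent with the
# similarity matrix on the next slide.
R = np.array([
    [4, 5, 5, 1, 0, 0, 0, 0, 0],   # A
    [2, 0, 0, 3, 2, 0, 3, 0, 0],   # B
    [3, 0, 0, 5, 4, 4, 0, 0, 0],   # C
    [0, 0, 0, 3, 2, 2, 4, 5, 4],   # D
], dtype=float)

def cosine(u, v):
    # cos(theta) = (u . v) / (|u| |v|)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Full user-user similarity matrix; e.g. cosine(R[1], R[2]) is ~0.70 for B and C.
sim = np.array([[cosine(R[i], R[j]) for j in range(4)] for i in range(4)])
print(np.round(sim, 2))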

Page 12:

Cosine similarity

      HP1  HP2  HP3  TW1  TW2  TW3  SW4  SW5  SW6
A      4    5    5    1
B      2              3    2         3
C      3              5    4    4
D                     3    2    2    4    5    4

Cosine similarity between users:

      A     B     C     D
A     1     0.26  0.26  0.04
B     0.26  1     0.7   0.57
C     0.26  0.7   1     0.44
D     0.04  0.57  0.44  1

Takes into account: (i) the number of items in common, (ii) the similarity of the ratings.

C is more similar to D than A is.

Page 13:

Predicting ratings

• How can we predict the rating given by a user u to an item i?
• Use the similarity metric to find the similarity of other users to u
• Find the top K most similar neighbors of u who have rated item i
• Take a similarity-weighted average of those neighbors’ ratings for i
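A minimal sketch of this neighborhood prediction, assuming we already have a ratings matrix and a precomputed user-user similarity matrix like the one above (the function and argument names here are illustrative, not from the talk):

import numpy as np

def predict_rating(R, sim, user, item, k=2):
    """Similarity-weighted average of the top-k neighbors who rated `item`.

    R   : ratings matrix, 0 = unrated
    sim : precomputed user-user similarity matrix
    """
    # Users other than `user` who have actually rated the item.
    raters = [u for u in range(R.shape[0]) if u != user and R[u, item] > 0]
    # Keep the k most similar of them.
    top = sorted(raters, key=lambda u: sim[user, u], reverse=True)[:k]
    if not top:
        return None  # nobody to borrow a rating from (cold-start case)
    num = sum(sim[user, u] * R[u, item] for u in top)
    den = sum(sim[user, u] for u in top)
    return num / den

With the matrices from the previous slides, predict_rating(R, sim, user=2, item=6) (user C, movie SW4) gives (0.7 x 3 + 0.44 x 4) / (0.7 + 0.44) ≈ 3.39, matching the worked example on the next slide.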

Page 14:

Predicting ratings

Example: how would user C rate movie SW4?

      HP1  HP2  HP3  TW1  TW2  TW3  SW4  SW5  SW6
A      4    5    5    1
B      2              3    2         3
C      3              5    4    4    ???
D                     3    2    2    4    5    4

      A     B     C     D
A     1     0.26  0.26  0.04
B     0.26  1     0.7   0.57
C     0.26  0.7   1     0.44
D     0.04  0.57  0.44  1

Choosing K = 2, the top K neighbors of C who have rated SW4 are B and D:

(0.7 x 3 + 0.44 x 4) / (0.7 + 0.44) = 3.39

Page 15:

Item-based similarity

• Current approach: user-based similarity
  • “People who like this also like...”
• Could we use the same approach for items?
  • “You may also like...”
• Yes! Item-based similarity is the dual of user-based
• Items rated similarly by the same users get a higher similarity score
• Predict a user’s rating for a new item from that user’s ratings for similar items

Page 16:

Item-based similarity

Treat the columns as vectors and calculate the similarity between them:

      HP1  TW1  SW4
A      4    1
B      2    3    3
C      3    5
D           3    4

      HP1   TW1   SW4
HP1   1     0.7   0.22
TW1   0.7   1     0.63
SW4   0.22  0.63  1

Example: how would user C rate movie SW4?

Similar to the user-based method, choosing K = 2: the top 2 movies similar to SW4 that C has rated are TW1 and HP1.

(0.63 x 5 + 0.22 x 3) / (0.63 + 0.22) = 4.48
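The same machinery works for the item-based view: transpose the matrix so that movies become the vectors, then take the similarity-weighted average over the items the user has already rated. A minimal, self-contained sketch using the sub-matrix from this slide:

import numpy as np

# Columns: HP1, TW1, SW4; rows: users A, B, C, D (0 = unrated).
R = np.array([
    [4, 1, 0],
    [2, 3, 3],
    [3, 5, 0],
    [0, 3, 4],
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Item-item similarities: treat the columns as vectors.
M = R.T
item_sim = np.array([[cosine(M[i], M[j]) for j in range(3)] for i in range(3)])

# Predict C's (row 2) rating for SW4 (column 2) from the K=2 most similar
# items C has rated: TW1 (sim ~0.63, rated 5) and HP1 (sim ~0.22, rated 3).
user, target, k = 2, 2, 2
rated = [i for i in range(3) if i != target and R[user, i] > 0]
top = sorted(rated, key=lambda i: item_sim[target, i], reverse=True)[:k]
pred = sum(item_sim[target, i] * R[user, i] for i in top) / sum(item_sim[target, i] for i in top)
print(round(pred, 2))  # ~4.48, matching the slide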

Page 17:

User- or item-based similarity?

• Theoretically, the two should perform similarly
• In practice, item-based works better
• Items are less ‘complicated’ than people, so it is easier to categorize items
  • Example: a CD similar to Mozart, Beethoven and Bach is almost certainly classical; users rarely fit categories that cleanly
• Better performance: fewer neighbors are needed for an accurate prediction
  • A user typically rates only a small fraction of items, so there are fewer items to iterate over

Page 18:

Collaborative Filtering: Problems

• Cold start and first rater: new items/users have no ratings yet
• Fraud/attacks: fake items and fake ratings
• Sparsity of the utility matrix
• Implicit features: what do users actually like?
• How can we address these problems?
  • Use recommendations based on intrinsic properties of users/items (content-based models)
  • Use linear algebra to extract useful features and reduce sparsity (latent factor models)

Page 19:

Recommendation systems

• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items intrinsically similar to previous items highly rated by the user
• Latent Factor Models: use techniques from linear algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches

Page 20:

Content-based systems

• Create a profile vector for each item/user
• Use machine learning (clustering and classification) to find similar items/users
• Profile vectors composed of features designed to reflect intrinsic properties
• What features?

Page 21:

Profile features

• Tags/categories for items
• Demographic features for users
• Examples:
  • Gender, race, income
  • Movie genres: Sci-Fi, Fantasy, Comedy
  • Does movie X cast actor Y?

Page 22:

Examples: features

      Fantasy  Sci-Fi  Magical powers  Supernatural creatures  Spaceships  Alan Rickman  Harrison Ford
HP1      1       0           1                    1                 0            1              0
TW1      1       0           0                    1                 0            0              0
SW4      0       1           1                    0                 1            0              1

• How do we use these feature vectors? Here’s one way:
  • Calculate the similarity between movie profile vectors
  • Similar to item-based CF, use a weighted average to predict ratings
• Many other algorithms can be applied
• A problem with content-based systems: features have to be decided on and added manually
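For instance, cosine similarity applied directly to these binary profile vectors already captures the intuition. A minimal sketch; the numbers are simply computed from the feature table above:

import numpy as np

# Feature order: Fantasy, Sci-Fi, Magical powers, Supernatural creatures,
# Spaceships, Alan Rickman, Harrison Ford
profiles = {
    "HP1": np.array([1, 0, 1, 1, 0, 1, 0], dtype=float),
    "TW1": np.array([1, 0, 0, 1, 0, 0, 0], dtype=float),
    "SW4": np.array([0, 1, 1, 0, 1, 0, 1], dtype=float),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# HP1 and TW1 share the fantasy features (~0.71); TW1 and SW4 share nothing (0.0).
for a in profiles:
    for b in profiles:
        if a < b:
            print(a, b, round(cosine(profiles[a], profiles[b]), 2))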

Page 23:

So far...

• General approach in recommenders:
  • Find similar users/items
  • Use a weighted average of neighbors’ ratings to predict an unknown rating
• Collaborative filtering: behavioral similarity
• Content-based systems: intrinsic similarity

Page 24:

Recommendation systems

• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar (intrinsically) to previous items highly rated by the user
• Latent Factor Models: use techniques from linear algebra to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches

Page 25:

Latent Factor Models

• Can we make the utility matrix denser?
• Can we gain some insight into what user preferences actually mean?
• Linear algebra techniques can uncover these latent (hidden) factors
• SVD: singular value decomposition

Page 26:

How does SVD help?

• Find a low-dimensional approximation of the ratings matrix
• Use the low-dimensional vectors to calculate similarity

Original utility matrix:

      HP1  HP2  HP3  TW1  TW2  TW3  SW4  SW5  SW6
A      4    5    5    1
B      2              3    2         3
C      3              5    4    4
D                     3    2    2    4    5    4

After SVD, each user is represented in two dimensions:

      Dim 1   Dim 2
A      0.9    -0.8
B      0.1     0.3
C      0.24    0.85
D     -0.89    0.4

We have reduced each user from a sparse 9-D vector to a 2-D vector.

How can we interpret the dimensions here? Perhaps as “user tastes” or “movie genres”.
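A minimal sketch of this idea with NumPy’s SVD, filling unrated cells with 0 and keeping the top two singular vectors. The resulting coordinates will not match the illustrative Dim 1/Dim 2 numbers on the slide exactly; the point is the 2-D user representation.

import numpy as np

# Utility matrix (unrated = 0), rows A-D, columns HP1..SW6 as before.
R = np.array([
    [4, 5, 5, 1, 0, 0, 0, 0, 0],
    [2, 0, 0, 3, 2, 0, 3, 0, 0],
    [3, 0, 0, 5, 4, 4, 0, 0, 0],
    [0, 0, 0, 3, 2, 2, 4, 5, 4],
], dtype=float)

# R ~= U @ diag(s) @ Vt; keep the 2 largest singular values.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_factors = U[:, :k] * s[:k]   # 2-D vector per user
item_factors = Vt[:k, :].T        # 2-D vector per movie

# Similarities (or rating estimates, via user_factors @ item_factors.T,
# the rank-2 reconstruction of R) can now be computed in this denser space.
print(np.round(user_factors, 2))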

Page 27:

Interpreting the SVD

      X = Dim 1   Y = Dim 2
A        0.9        -0.8
B        0.1         0.3
C        0.24        0.85
D       -0.89        0.4

      HP1  HP2  HP3  TW1  TW2  TW3  SW4  SW5  SW6
A      4    5    5    1
B      2              3    2         3
C      3              5    4    4
D                     3    2    2    4    5    4

[Scatter plot: users A, B, C, D plotted on the two latent dimensions, with the axes labeled Sci-Fi / Fantasy and Male / Female]

In this case we interpret the SVD approximation as showing:

Dim 1: Sci-Fi to Fantasy
Dim 2: Male to Female

Note: C is closer to D than to A.

Page 28:

Recommendation systems

• Collaborative filtering: recommend items based on ratings by users with similar preferences
• Content-based: recommend items similar (intrinsically) to previous items highly rated by the user
• Latent Factor Models: use mathematical techniques to extract hidden features, on which predictions are based
• Hybrid Systems: combine different approaches
• Let’s look at a case study

Page 29:

Case Study: Netflix contest

Winner: BellKor’s Pragmatic Chaos, a hybrid system with some important innovations.

[Diagram: predictions from CF, SVD, and other models (~500 features) feed into decision trees (DT), which blend them into the final ratings]

Page 30:

BellKor’s innovations

• Incorporate user biases in the model
  • Bias parameters for each user and movie
  • Ratings are modeled as deviations from user- and movie-specific biases
• Incorporate temporal shifts in the model
  • User ratings change over time
  • Algorithms are parametrized by time
• Focus on good ways to blend algorithms
  • GBDT: gradient boosted decision trees
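As a rough sketch of the bias idea (not BellKor’s actual model, just the standard baseline-predictor form it builds on): estimate a global mean, a per-user offset, and a per-movie offset, and treat everything else as the deviation the other algorithms must explain.

import numpy as np

def baseline_biases(R):
    """Global mean plus simple per-user and per-item offsets for a 0-padded ratings matrix.

    This is a naive estimate for illustration; BellKor fit these parameters
    more carefully (e.g. with regularized least squares).
    """
    mask = R > 0                              # observed ratings only
    mu = R[mask].mean()                       # global average rating
    user_bias = np.array([(row[m] - mu).mean() if m.any() else 0.0
                          for row, m in zip(R, mask)])
    item_bias = np.array([(col[m] - mu).mean() if m.any() else 0.0
                          for col, m in zip(R.T, mask.T)])
    return mu, user_bias, item_bias

# Baseline prediction for user u, item i: mu + user_bias[u] + item_bias[i];
# the CF / SVD components then only need to model the remaining deviation.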

Page 31:

Summary

• We have seen several algorithms:
  • CF, content-based, latent-factor (SVD)
  • Hybrid: combinations of the above
• So which algorithm should you use?
  • Depends on your use case and business

Page 32:

Summary

• Are recommendations an absolutely core, critical part of your business? (e.g. Netflix, Amazon)
  • You should be using a hybrid system
  • Spend a lot of effort tuning based on domain-specific knowledge
  • Find innovative ways to combine different kinds of features

Page 33:

Summary

• Is your data set extremely sparse?
  • Latent factor (SVD) based models may extract the “essence” of your data
  • YMMV on whether they provide usable insights into customer behavior
• Otherwise...
  • Collaborative filtering models: fast and easy to implement and test

Page 34:

The road ahead

• Users can be fickle about providing ratings
  • (How often do you rate stuff?)
• Collect better data without annoying users
  • Games to collect ratings information: “gamification”, the work of Luis von Ahn
  • NLP to parse reviews and other sources: sentiment analysis to infer ratings

Page 35:

The road ahead

• Incorporate social network information
  • Mine your Twitter feed for content
  • Use your social graph to identify similar users/friends
• Recommend across genres/categories

Page 36:

Further reading

• Machine Learning in Action: Harrington, Manning Publications
• Recommender Systems: Melville and Sindhwani, IBM Research
• Mining of Massive Datasets (Ch. 9): Rajaraman, Ullman and Leskovec, Stanford University
• The BellKor Solution to the Netflix Grand Prize: Bell and Koren, AT&T Labs
