Memory vs. Model-based Approaches: SVD & MF

Based on the Rajaraman and Ullman book and the RS Handbook. See the Adomavicius and Tuzhilin, TKDE 2005 paper for a quick & great overview of RS methodologies.

Apr 01, 2015


Basics

• So far we discussed user-based and item-based CF.
• In both, we predict an unknown rating by taking some kind of aggregate of: ratings on the distinguished item by the distinguished user’s most similar neighbors (user-based), or ratings of the distinguished user on the distinguished item’s most similar neighbors (item-based).
• Both are based on a literal memory of past ratings: thus, memory-based.
• Both look at closest neighbors: thus, neighborhood-based.


Model-based approaches

• Build a model of each user’s behavior: what s/he looks for in an item.
• Build a model of each item: what it has to offer.
• Problem: these “features” are not explicitly present.
• Turn to latent features.
• Matrix factorization: approximate the ratings matrix as a product of two low-rank matrices.
• Dimensionality reduction.
• Components of each user/item vector = latent factors/features.


[Figure: a 2-D latent factor space for movies and users. Horizontal axis: geared towards females ↔ geared towards males; vertical axis: serious ↔ escapist. Movies such as The Princess Diaries, Sense and Sensibility, The Lion King, Amadeus, The Color Purple, Braveheart, Lethal Weapon, Ocean’s 11, Independence Day, and Dumb and Dumber are placed in this space, along with two users, Gus and Dave.]

Stolen from: Bell, Koren, and Volinsky’s Netflix Prize talk.


[Figure: the USERS × ITEMS ratings matrix factored as a product of two low-rank matrices:]

M_{n×m} ≈ P_{n×k} × Q_{k×m}


[Figure: predicting an unknown rating ? in cell (u, i) of M from the corresponding user factor vector p_u (row u of P) and item factor vector q_i (column i of Q):]

r̂_ui = p_u^T · q_i
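As a concrete (entirely hypothetical) toy example of the prediction rule above, with 3 users, 4 items, and rank k = 2:

```python
import numpy as np

# Hypothetical factors: 3 users (rows of P), 4 items (columns of Q), k = 2.
P = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.9, 0.1]])               # row u is p_u
Q = np.array([[1.0, 0.0, 0.5, 1.0],
              [0.0, 1.0, 1.0, 0.2]])     # column i is q_i

u, i = 0, 2
r_hat = P[u] @ Q[:, i]  # r_hat_ui = p_u . q_i
print(r_hat)            # 1.0*0.5 + 0.5*1.0 = 1.0
```

The full matrix of predictions is simply the product P @ Q.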


MF (contd.)

• P and Q are computed to best fit the known ratings.
• Singular Value Decomposition (SVD) solves a related problem, but is undefined when the given matrix is incomplete (i.e., has unknown entries).
• If there are very few known ratings, there is a risk of overfitting.
• It’s instructive to recall how SVD works to better understand the foundations of MF.


SVD Recap

• Given a matrix M of rank r, we can express it as M = U Σ V^T.
• U: orthogonal eigenvectors of M M^T.
• V: orthogonal eigenvectors of M^T M.
• Σ = diag(σ_1, …, σ_r), with σ_i^2 being the eigenvalues of M M^T or, equivalently, the eigenvalues of M^T M [σ_i are the singular values].


SVD Recap (contd.)

• We can find a low-rank approximation using the SVD framework. For a small rank k, zero out all but the k largest singular values.
• Find M_k = U_k Σ_k V_k^T.
• Notice, now we have U_k of order n×k, Σ_k of order k×k, V_k^T of order k×m, and M_k of order n×m.
• This “reduced” form of SVD is what is used in many applications where data is large and computation is intensive.
• M_k is the best possible rank-k approximation under the Frobenius norm of the error of approximation by any such matrix.
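The truncated SVD described above can be sketched with NumPy on a random toy matrix (the matrix and k are assumptions for illustration); the last line checks the Eckart–Young property numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 5))          # toy matrix, full rank w.h.p.

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2                                    # keep only the k largest singular values
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: M_k is the best rank-k approximation in Frobenius norm, and
# the squared error equals the sum of the discarded squared singular values.
err = np.linalg.norm(M - M_k, "fro") ** 2
print(np.isclose(err, np.sum(s[k:] ** 2)))  # True
```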


Returning to MF

• Recall, in MF you factor a given ratings/utility matrix into M ≈ P Q, with P of order n×k and Q of order k×m.
• Caution: in the literature, you will sometimes encounter the “transpose notation” M ≈ P Q^T, where Q is m×k.
• Problem: find latent user and item vectors {p_u} and {q_i} such that the error Σ_{(u,i): r_ui known} (r_ui − p_u^T q_i)^2 is minimized.
• r̂_ui = p_u^T q_i.


MF Usage

• In practice, we minimize the error above over a training subset of known ratings and validate against the test subset.
• Challenges in minimizing the error: multiple local minima. Typically we use a gradient descent method.
• Basic idea:
  – Initialize P and Q some way.
  – Iteratively update, for each known rating r_ui, with e_ui = r_ui − p_u^T q_i:
      p_u ← p_u + γ e_ui q_i
      q_i ← q_i + γ e_ui p_u
    where γ is used to control the “step size” (a small constant).
  – Stop when the aggregate squared error (e.g., the sum) is below a set threshold.
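A minimal sketch of the gradient-descent idea; the toy ratings, step size, initialization scale, and iteration count are all assumptions for illustration, not values from the slides:

```python
import numpy as np

# Hypothetical toy data: (user, item, rating) triples; n users, m items, rank k.
known = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 3, 1.0), (3, 4, 2.0)]
n, m, k = 4, 5, 2

rng = np.random.default_rng(1)
P = rng.standard_normal((n, k)) * 0.1   # user factors, row u is p_u
Q = rng.standard_normal((k, m)) * 0.1   # item factors, column i is q_i
gamma = 0.05                            # assumed step size

for epoch in range(500):
    for u, i, r in known:
        e = r - P[u] @ Q[:, i]          # error on this known rating
        P[u] += gamma * e * Q[:, i]     # update rules from the slide
        Q[:, i] += gamma * e * P[u]

sse = sum((r - P[u] @ Q[:, i]) ** 2 for u, i, r in known)
print(sse)  # aggregate squared training error, driven towards 0
```

In practice one would iterate over a large training set, monitor error on a held-out test set, and stop when the improvement per round falls below a threshold.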


Some Enhancements

• Things are not quite perfect: risk of overfitting; solution – regularization.

• Redefine the error for one entry as e_ui^2 = (r_ui − p_u^T q_i)^2 + λ (‖p_u‖^2 + ‖q_i‖^2). That is, penalize the magnitudes of the factor vectors.
• This discourages vectors with large magnitude, since we minimize the error; thus it counteracts overfitting.
• We manage to get P and Q with not-too-large entries while approximating M.


Enhancements

• Modified update equations:
    p_u ← p_u + γ (e_ui q_i − λ p_u)
    q_i ← q_i + γ (e_ui p_u − λ q_i)
• Try different initializations: e.g., random; all 0s; all 1s; all √(r̄/k), where r̄ = (sum of non-blank entries) / (# non-blank entries of M); a random perturbation of this setting, from different distributions; etc. Try different initializations and update orders (see below) and pick the best local minimum.
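One regularized update step can be sketched as follows; the values of γ, λ, p_u, q_i, and r_ui are hypothetical:

```python
import numpy as np

# Hypothetical values for one update step.
gamma, lam = 0.05, 0.1
p = np.array([0.5, 0.2])   # current user factor p_u
q = np.array([0.3, 0.4])   # current item factor q_i
r = 4.0                    # known rating r_ui

e = r - p @ q                            # prediction error e_ui
p_new = p + gamma * (e * q - lam * p)    # gradient step, shrinking p_u towards 0
q_new = q + gamma * (e * p - lam * q)    # gradient step, shrinking q_i towards 0
print(p_new, q_new)
```

The −λp_u and −λq_i terms are exactly what distinguishes these updates from the unregularized ones.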


Enhancements

• Visit (i.e., update) the elements of P and Q in row-major or column-major order, alternating between P and Q in round robin.
• Or choose a random permutation of the entries and follow that order.
• Pick the best “local minimum”. Sometimes we take the average of the values returned by each local minimum, to avoid overfitting.
• Keep in mind, in practice we don’t quite manage to find a local minimum: we stop when the drop in error between successive rounds is below a threshold.
• Other bells & whistles are possible: e.g., adding the effects of a user’s rating bias and an item’s popularity bias; these parameters are then included in the regularization.


Extensions

• Location-aware: recommend items depending on where you are.

• Time-aware: e.g., recommend fast-food restaurants for lunch on weekdays and formal/classy ones for dinner; recommend shows that are currently in town; recommend games during hockey season; …

• Context-aware: e.g., recommend a movie depending on your current company.


What if feedback was implicit

• More common than explicit feedback:
  – Not every customer will bother to rate/review.
  – Purchase history of products.
  – Browsing history or search queries.
  – Playcount in last.fm is a kind of implicit feedback.
  – Simple thumbs up/down for TV shows.

Based on: Y.F. Hu, Y. Koren, and C. Volinsky, Collaborative Filtering for Implicit Feedback Datasets, Proc. IEEE Intl Conf. Data Mining (ICDM 08), IEEE CS Press, 2008, pp. 263-272.


What is different about implicit feedback?

• Just “adopt/no adopt” data inherently only gives positive feedback.
• For all missing (user, item) pairs, the user may or may not be aware of the item: we know nothing about those cases.
  – Contrast: in explicit f/b, ratings do include negative f/b. However, even there, missing data doesn’t necessarily mean negative info.
  – Data may not be missing at random: a question of what the user is aware of.
• It’s in the missing part where we expect our negative signals. We cannot “ignore” missing data, unlike with explicit f/b.


More Differences

• Even the so-called positive cases are noisy. E.g.:
  – Tune to a certain TV channel and talk to a friend the whole time.
  – Perhaps my experience after buying a smartphone was negative.
  – Perhaps I bought that watch as a gift.
  – No way of knowing!
• Explicit f/b (rating) measures preference; implicit f/b measures confidence. E.g.:
  – How often do I watch a series, and for how long?
  – Playcount of songs on last.fm.
• Evaluation metric – unclear; needs to account for availability and competition.


Basics

• Natural to consider all missing values. (Why?)
• Neighborhood-based methods face an issue: implicit f/b such as watching/listening frequency may vastly differ between users (unlike everyone rating on the same scale).
  – Rating biases exist too, but differences in watching/listening frequency can be much larger and can exacerbate the situation.
• This complicates similarity calculation.


A Model

• r_ui – observations in the form of adoptions or counts. Binarized to: p_ui = 1 if r_ui > 0 and p_ui = 0 otherwise. Think of p_ui as a belief of user u liking or not liking item i.
• We posit a confidence in our belief: c_ui = 1 + α r_ui.
  – (α = 40 found to work best by the authors.)
  – The minimum confidence is 1 (no observations!).
• Use MF as before, with some important differences.
• p̂_ui = x_u^T y_i, where x_u and y_i are f-dimensional user and item latent vectors. How do we learn them?
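The binarization and confidence definitions can be sketched directly (toy observation matrix assumed; α = 40 as reported by the authors):

```python
import numpy as np

# Hypothetical observation counts r_ui for 2 users x 3 items.
R = np.array([[3.0, 0.0, 7.0],
              [0.0, 1.0, 0.0]])

P_belief = (R > 0).astype(float)   # p_ui = 1 if r_ui > 0, else 0
alpha = 40.0                       # alpha = 40 reported to work best
C = 1.0 + alpha * R                # confidence c_ui = 1 + alpha * r_ui

print(P_belief)
print(C.min())  # minimum confidence is 1 (entries with no observations)
```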


A Model

• Minimize, over all user and item factors,
  error = Σ_{u,i} c_ui (p_ui − x_u^T y_i)^2 + λ (Σ_u ‖x_u‖^2 + Σ_i ‖y_i‖^2).
• Note, the error is computed over all entries, including those where no observations are available. The confidence can vary widely (see the confidence expression).
• λ is obtained by cross-validation.
• Mere error computation is prohibitive (unlike for explicit feedback) – a significant challenge. (SGD won’t scale!)
• Alternating Least Squares (which has been used in the ratings world as well) to the rescue!
• ALS idea: alternately treat the item factors or the user factors as fixed, and minimize the error w.r.t. the other, until “convergence”.


In Implicit Feedback MF so far …

• Observations (playcounts, adoption signals, etc.) lead to beliefs p_ui (which are binary) of user u liking item i, along with a confidence c_ui: more observations ⇒ more confidence.
• We want to predict the missing p_ui along with corresponding confidences: i.e., currently our belief in these cases is “not like” and we’d like to revise it.
• Unlike the explicit f/b case, we must treat all entries, including the missing ones (zero-observation cases) – computing even the error of a given model is prohibitive!
• Need to reassess the error and update user/item factors iteratively until convergence ⇒ a scalability challenge unique to the implicit f/b case.


ALS Details

• Equating the partial derivative of the error w.r.t. x_u to 0, doing some elementary matrix calculus, and solving for x_u, we get

  x_u = (Y^T C^u Y + λ I)^{-1} Y^T C^u p(u),

  where Y is the m×f matrix with row i being y_i^T (equivalently, Y^T is the matrix whose column i is y_i), C^u is the m×m diagonal matrix with C^u_ii = c_ui, and p(u) is the vector (p_u1, …, p_um)^T of user u’s beliefs.
• Similarly, y_i = (X^T C^i X + λ I)^{-1} X^T C^i p(i), with X the n×f matrix of user factors.
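A sketch of one ALS half-step for a single user, on assumed toy sizes (f = 2 factors, m = 4 items, hypothetical counts); the final check verifies that the closed-form x_u zeroes the gradient of the regularized weighted squared error:

```python
import numpy as np

rng = np.random.default_rng(2)
m, f, lam, alpha = 4, 2, 0.1, 40.0

Y = rng.standard_normal((m, f))          # item factors, row i is y_i^T
r_u = np.array([2.0, 0.0, 5.0, 0.0])     # user u's observation counts
p_u = (r_u > 0).astype(float)            # binarized beliefs p(u)
Cu = np.diag(1.0 + alpha * r_u)          # diagonal confidence matrix C^u

# x_u = (Y^T C^u Y + lam I)^{-1} Y^T C^u p(u)
A = Y.T @ Cu @ Y + lam * np.eye(f)
b = Y.T @ Cu @ p_u
x_u = np.linalg.solve(A, b)

# Check: x_u zeroes the gradient of sum_i c_ui (p_ui - y_i^T x)^2 + lam ||x||^2.
grad = -2 * Y.T @ Cu @ (p_u - Y @ x_u) + 2 * lam * x_u
print(np.allclose(grad, 0))  # True
```

A full ALS sweep solves this system for every user, then the symmetric system for every item, and alternates.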


A Scalability Trick

• Challenge: speaking about users, Y^T C^u Y must be computed for each of the n users. A direct approach will take O(f^2 m) time per user!
  – Note: all diagonal elements of C^u are ≥ 1.
  – Consider C^u − I: only n_u diagonal elements are non-zero, where n_u = # items user u has “clicked”.
  – Write Y^T C^u Y as Y^T Y + Y^T (C^u − I) Y.
    Precompute Y^T Y in O(f^2 m) time. Compute Y^T (C^u − I) Y in O(f^2 n_u) time.
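The identity behind the trick can be verified numerically (toy sizes and a hypothetical user with three clicked items assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
m, f, alpha = 1000, 10, 40.0
Y = rng.standard_normal((m, f))          # item factor matrix

r_u = np.zeros(m)
r_u[[5, 42, 77]] = [2.0, 1.0, 4.0]       # user "clicked" only 3 items (n_u = 3)

YtY = Y.T @ Y                            # precomputed once, shared by all users

# Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y, and C^u - I = alpha * diag(r_u) is
# non-zero only on the n_u clicked items, so the correction costs O(f^2 n_u).
nz = np.flatnonzero(r_u)
Ynz = Y[nz]
correction = Ynz.T @ ((alpha * r_u[nz])[:, None] * Ynz)

# Reference computation: the direct, per-user O(f^2 m) way.
direct = Y.T @ np.diag(1.0 + alpha * r_u) @ Y
print(np.allclose(YtY + correction, direct))  # True
```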


Scalability Story (contd.)

• Overall work for one user: invert an f×f matrix + take its product with Y^T C^u p(u).
  – Do right-associative multiplication: C^u p(u) has just n_u non-zero elements, leveraging which we can do the above product in O(f n_u) time.
• Even assuming a naïve matrix-inversion algorithm, the total work for one user can be done in O(f^3 + f^2 n_u) time, and for all users in O(f^3 n + f^2 N) time, where N = the number of non-blank entries in R.
  – f is usually quite small.
• Empirically, the authors find modest values of f acceptable. Essentially a linear running time: the number of alternations needed in practice is small (a few tens at most).


Recommendation Generation

• For each user u, recommend the top-K items with the highest p̂_ui.
• Worth experimenting with recommending the items with the highest p̂_ui · ĉ_ui,
  – where ĉ_ui is the confidence in the prediction – not handled in the paper.
  – Analogy: the score of a paper given by a reviewer & the confidence of the reviewer are both (to be) used in computing the overall score!
  – One drawback: the paper does not derive confidences for predictions! Scope for research.
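Recommending the top-K items by predicted belief is then a simple ranking (random toy factors assumed; in practice one would also filter out items the user has already adopted):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, f = 3, 6, 2
X = rng.standard_normal((n, f))   # user factors x_u (rows)
Y = rng.standard_normal((m, f))   # item factors y_i (rows)

u, K = 0, 3
scores = Y @ X[u]                     # p_hat_ui = x_u . y_i for every item i
top_k = np.argsort(scores)[::-1][:K]  # K items with the highest predicted belief
print(top_k)
```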


What freedom do we have?

• We could transform the observations into beliefs and confidences differently: e.g., use a threshold ε (p_ui = 1 iff r_ui > ε) and a smoother function for the confidence, such as c_ui = 1 + α log(1 + r_ui/ε).
• We can still guarantee linear running time.
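The alternative, logarithmic confidence can be sketched as follows (the α and ε values are assumptions for illustration):

```python
import numpy as np

# Assumed illustration values for alpha and epsilon.
alpha, eps = 40.0, 1e-8
r = np.array([0.0, 1.0, 100.0])          # raw observation counts
c = 1.0 + alpha * np.log1p(r / eps)      # c_ui = 1 + alpha * log(1 + r_ui / eps)
print(c)  # grows logarithmically, not linearly, in the counts
```

Compared with c_ui = 1 + α r_ui, this damps the influence of very large counts (e.g., heavy replay of one song).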

Read the paper for the experimental study/evaluation.


Interesting side benefits

Potential use for explaining recommendations. Potential for incremental computation?

How similar are items i and j in user u’s eyes?