
Modeling User Rating Profiles For Collaborative Filtering

Benjamin M. Marlin

University of Toronto, Department of Computer Science, Toronto, Ontario, Canada AP 08

1. Abstract

• We present a new latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). URP has complete generative semantics at the user and rating profile levels.

• URP is related to several models, including a multinomial mixture model, the aspect model, and latent Dirichlet allocation, but has advantages over each.

• A variational Expectation Maximization procedure is used to fit the URP model. Rating prediction makes use of a well-defined variational inference procedure.

• Empirical results on two rating prediction tasks using the EachMovie and MovieLens data sets show that URP attains lower error rates than the multinomial mixture model, the aspect model, and neighborhood-based techniques.

2. Introduction

Preference Indicators

Co-occurrence Pair $(u, y)$: $u$ is a user index and $y$ is an item index.

Count Vector $(n^u_1, n^u_2, \ldots, n^u_M)$: $n^u_y$ is the number of times $(u, y)$ is observed.

Rating Triplet $(u, y, r)$: $u$ is a user index, $y$ is an item index, and $r$ is a rating value.

Rating Vector $(r^u_1, r^u_2, \ldots, r^u_M)$: $r^u_y$ is the rating assigned to item $y$ by user $u$.
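The indicator types above differ mainly in how raw observations are grouped per user. As a minimal sketch, assuming items indexed 1..M and 0 used to mark an unobserved rating (both conventions are assumptions, as are the hypothetical names `triplets_to_vectors` and `pairs_to_counts`), rating triplets become rating vectors and co-occurrence pairs become count vectors like this:

```python
from typing import Dict, List, Tuple


def triplets_to_vectors(triplets: List[Tuple[int, int, int]], num_items: int) -> Dict[int, List[int]]:
    """Group rating triplets (u, y, r) into one rating vector per user.

    Items are assumed to be indexed 1..M, and 0 marks an unobserved rating.
    If the same (u, y) pair occurs twice, the later triplet overwrites the earlier one.
    """
    vectors: Dict[int, List[int]] = {}
    for u, y, r in triplets:
        vec = vectors.setdefault(u, [0] * num_items)
        vec[y - 1] = r
    return vectors


def pairs_to_counts(pairs: List[Tuple[int, int]], num_items: int) -> Dict[int, List[int]]:
    """Turn co-occurrence pairs (u, y) into count vectors; entry y-1 counts occurrences of (u, y)."""
    counts: Dict[int, List[int]] = {}
    for u, y in pairs:
        vec = counts.setdefault(u, [0] * num_items)
        vec[y - 1] += 1
    return counts
```

For example, with M = 4 the triplets (1, 1, 5) and (1, 2, 4) become the rating vector [5, 4, 0, 0] for user 1.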

Collaborative Filtering Formulations

Additional Features

In a pure formulation no additional features are used. A hybrid formulation incorporates additional content-based item and user features.

Preference Dynamics

In a sequential formulation the rating process is modeled as a time series. In a non-sequential formulation preferences are assumed to be static.

The Pure, Non-Sequential, Rating-Based Formulation

Tasks: The two main tasks under this formulation are recommendation and rating prediction.

Rating prediction is the task of estimating all unknown ratings for the active user.

The focus of research is developing highly accurate methods for rating prediction.

Rating Database
user  y1  y2  y3  y4
U1    5   4   ?   ?
U2    ?   5   2   5
U3    4   ?   4   3
U4    1   5   ?   5

Active User Ratings
user  y1  y2  y3  y4
a     1   ?   ?   5

Rating Prediction → Predicted Ratings
user  y1  y2  y3  y4
a     1   5   2   5

Recommendation (Sort) → Item List
1. Item y2
2. Item y3

Figure 1: Given a rating prediction method, a recommendation method is easily obtained: predict, then sort.
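The "predict, then sort" step of Figure 1 can be sketched in a few lines. This assumes predicted ratings are already available for every item (the hypothetical `recommend` function does not care which prediction method produced them) and that items the active user has already rated are excluded from the list:

```python
from typing import Dict, List


def recommend(predicted: Dict[str, float], observed: Dict[str, int], top_n: int) -> List[str]:
    """Rank the active user's unrated items by predicted rating, highest first."""
    candidates = [item for item in predicted if item not in observed]
    # list.sort is stable even with reverse=True, so ties keep their original order
    candidates.sort(key=lambda item: predicted[item], reverse=True)
    return candidates[:top_n]
```

With the values from Figure 1, `recommend({"y1": 1, "y2": 5, "y3": 2, "y4": 5}, {"y1": 1, "y4": 5}, 2)` returns `["y2", "y3"]`, matching the item list.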

Formal Description:

• Additional Features: None
• Preference Dynamics: Non-sequential
• Preference Indicators: Ordinal rating vectors
• Items: $y = 1, \ldots, M$; Users: $u = 1, \ldots, N$; Ratings: $r = 1, \ldots, V$

3. Related Work

Neighborhood Methods:

• Introduced by Resnick et al. (GroupLens) and by Shardanand and Maes (Ringo).

• All variants can be seen as modifications of the K-Nearest Neighbor classifier.

Rating Prediction:

1. Compute similarity measure between active user and all users in database.

2. Compute predicted rating for each item.
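The two steps above can be sketched with one common variant: Pearson correlation over co-rated items as the similarity, and a mean-offset weighted average in the style of GroupLens as the prediction. The exact similarity and weighting used on the poster did not survive extraction, so treat this as an illustrative stand-in; the names `pearson` and `predict` are hypothetical.

```python
import math
from typing import Dict, List

Ratings = Dict[str, float]  # item -> rating for one user


def mean(ratings: Ratings) -> float:
    return sum(ratings.values()) / len(ratings)


def pearson(a: Ratings, b: Ratings) -> float:
    """Step 1: Pearson correlation over items both users rated (0 if undefined)."""
    common = [y for y in a if y in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[y] for y in common) / len(common)
    mb = sum(b[y] for y in common) / len(common)
    num = sum((a[y] - ma) * (b[y] - mb) for y in common)
    den = math.sqrt(sum((a[y] - ma) ** 2 for y in common) * sum((b[y] - mb) ** 2 for y in common))
    return num / den if den > 0 else 0.0


def predict(active: Ratings, database: List[Ratings], item: str) -> float:
    """Step 2: active user's mean plus similarity-weighted deviations of other users."""
    num = 0.0
    norm = 0.0
    for other in database:
        if item not in other:
            continue
        w = pearson(active, other)
        num += w * (other[item] - mean(other))
        norm += abs(w)
    base = mean(active)
    return base + num / norm if norm > 0 else base
```

On the Figure 1 data this predicts about 4.33 for y2 and 2.67 for y3. Predictions are not clipped to the 1..V range here.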

Multinomial Mixture Model:

Learning:

• A simple mixture model with fast, reliable learning by EM, and low prediction time.

• Simple but correct generative semantics. Each profile is generated by 1 of K types.

Rating Prediction:

E-Step:

M-Step:
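The mixture model's E-step and M-step equations did not survive extraction. As a stand-in, here is a minimal sketch of standard EM for a multinomial mixture over rating vectors with missing entries, under these assumptions: ratings are coded 1..V with 0 for unobserved, unobserved entries are skipped in the likelihood, and β gets add-constant smoothing. All function names are hypothetical.

```python
import math
import random
from typing import List

Profile = List[int]  # R[u][y] in 1..V, or 0 if unobserved


def responsibilities(r: Profile, theta: List[float], beta: List[List[List[float]]]) -> List[float]:
    """E-step for one user: P(Z = z | observed ratings r), computed in log space."""
    logs = []
    for z in range(len(theta)):
        lp = math.log(max(theta[z], 1e-300))
        lp += sum(math.log(beta[y][z][v - 1]) for y, v in enumerate(r) if v)
        logs.append(lp)
    m = max(logs)
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [x / s for x in w]


def fit_mixture(R: List[Profile], K: int, V: int, iters: int = 50, smooth: float = 1.0, seed: int = 0):
    """EM for a mixture of K user types; returns theta[z] and beta[y][z][v-1] = P(R_y = v | Z = z).

    `smooth` (> 0) is an assumed pseudo-count that keeps every beta entry positive.
    """
    rng = random.Random(seed)
    N, M = len(R), len(R[0])
    theta = [1.0 / K] * K
    beta = []
    for _ in range(M):
        per_type = []
        for _ in range(K):
            w = [rng.random() + 0.5 for _ in range(V)]
            per_type.append([x / sum(w) for x in w])
        beta.append(per_type)
    for _ in range(iters):
        gamma = [responsibilities(r, theta, beta) for r in R]      # E-step
        theta = [sum(g[z] for g in gamma) / N for z in range(K)]   # M-step: mixing weights
        for y in range(M):                                          # M-step: rating distributions
            for z in range(K):
                counts = [smooth] * V
                for u in range(N):
                    if R[u][y]:
                        counts[R[u][y] - 1] += gamma[u][z]
                total = sum(counts)
                beta[y][z] = [c / total for c in counts]
    return theta, beta


def predict_distribution(r: Profile, y: int, theta, beta) -> List[float]:
    """Rating prediction: P(R_y = v | r) = sum_z P(Z = z | r) * beta[y][z][v-1] (y is 0-based)."""
    g = responsibilities(r, theta, beta)
    V = len(beta[y][0])
    return [sum(g[z] * beta[y][z][v] for z in range(len(theta))) for v in range(V)]
```

Working in log space keeps the E-step stable when a profile contains many ratings.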

Latent Dirichlet Allocation:

• Proposed by Blei et al. for text modeling.

• Can be used in a co-occurrence-based CF formulation, but cannot model ratings.

• A correct generative version of the dyadic aspect model. The user's distribution over types is a random variable with a Dirichlet prior.

Learning:

E-Step:

M-Step:

Learning (Vector):

Rating Prediction (Vector):

• Model learned using variational EM or Minka's Expectation Propagation.

• Exact inference is not possible.

Prediction:

• Needs approximate inference. Variational methods result in an iterative algorithm.

The Aspect Model:

• Many versions have been proposed by Hofmann. Of main interest are the dyadic and triadic versions, and a new vector version proposed by Marlin.

• All have incomplete generative semantics.

Graphical Models:

Figure 2: Dyadic Aspect Model
Variable U: user index; Variable Z: attitude index; Variable Y: item index; Parameter: P(Z|U=u); Parameter: P(Y|Z=z)

Figure 3: Triadic Aspect Model
Variable U: user index; Variable Z: attitude index; Variable Y: item index; Variable R: rating value; Parameter: P(Z|U=u); Parameter: P(R|Z=z,Y=y)

Figure 4: Vector Aspect Model
Variable U: user index; Variable $Z_y$: attitude index; Variable $R_y$: rating value; Variable Y: item index; Parameter: P(Z|U=u); Parameter: P(R|Z=z,Y=y)

Figure 5: LDA Model
Variable: P(Z|U=u); Variable Z: attitude index; Variable Y: item index; Parameter: Dirichlet prior; Parameter: P(Y|Z=z)

Figure 6: URP Model
Variable: P(Z|U=u); Variable $Z_y$: attitude index; Variable $R_y$: rating value; Variable Y: item index; Parameter: Dirichlet prior; Parameter: P($R_y$|Z=z)

Diagram transitions: Co-occurrence to Ratings (Figure 2 to Figure 3); Ratings to Rating Profiles (Figure 3 to Figure 4); Generative (Figure 2 to Figure 5, Figure 4 to Figure 6); Co-occurrence to Rating Profile (Figure 5 to Figure 6).

4. The URP Model

Model Specification:

Description:

• The latent space description of a user is a Dirichlet random variable $\theta$ that encodes a multinomial distribution over user types.

• Each setting of the multinomial variables $Z_y$ is an index into K user types or user attitudes.

• Each user attitude is represented by a multinomial distribution over ratings for each item, encoded by $\beta$.

• The multinomial variables $R_y$ give the ratings for each item y. Possible values are from 1 to V.

Generative Process:

1. For each user u = 1 to N:
2.   Sample $\theta \sim \mathrm{Dirichlet}(\alpha)$
3.   For each item y = 1 to M:
4.     Sample $z \sim \mathrm{Multinomial}(\theta)$
5.     Sample $r \sim \mathrm{Multinomial}(\beta_{yz})$

• Unlike a simple mixture model, each user has a unique distribution over user types.

• Unlike the aspect model family, there are proper generative semantics on $\theta$, the user's distribution over types.

• Unlike LDA, URP generates a set of complete user rating profiles.
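The generative process above can be simulated directly. This sketch assumes β is stored as `beta[y][z][v-1]` = P(R_y = v | Z_y = z) and draws θ from a Dirichlet by normalizing independent Gamma variates; the function name `sample_urp` is hypothetical.

```python
import random
from typing import List


def sample_urp(N: int, M: int, alpha: List[float], beta: List[List[List[float]]], seed: int = 0) -> List[List[int]]:
    """Draw N complete rating profiles from the URP generative process.

    alpha: Dirichlet parameters (length K, all > 0).
    beta[y][z][v-1] = P(R_y = v | Z_y = z) for ratings v = 1..V.
    Returns profiles[u][y], a rating in 1..V for every user and item.
    """
    rng = random.Random(seed)
    K = len(alpha)
    V = len(beta[0][0])
    profiles = []
    for _ in range(N):                                               # 1. for each user
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        theta = [d / total for d in draws]                           # 2. theta ~ Dirichlet(alpha)
        profile = []
        for y in range(M):                                           # 3. for each item
            z = rng.choices(range(K), weights=theta)[0]              # 4. z ~ Multinomial(theta)
            r = rng.choices(range(1, V + 1), weights=beta[y][z])[0]  # 5. r ~ Multinomial(beta_yz)
            profile.append(r)
        profiles.append(profile)
    return profiles
```

Because a fresh attitude z is drawn for every item, a single user can mix several attitudes across the profile, which is what distinguishes URP from the simple mixture model.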

Learning

Variational Inference

Variational Approximation

• Exact inference is intractable with URP. We define a fully factorized approximate q-distribution with variational multinomial parameters $\phi^u$ and variational Dirichlet parameters $\gamma^u$.
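The variational update equations were lost in extraction. Assuming they take the same mean-field form as in LDA (each $\phi^u_y$ proportional to $\beta$ at the observed rating times $\exp(\Psi(\gamma^u_z))$, and $\gamma^u_z = \alpha_z + \sum_y \phi^u_{yz}$ over observed items), one user's fit can be sketched as below. The digamma implementation and all names are assumptions added so the block runs with the standard library only.

```python
import math
from typing import Dict, List, Tuple


def digamma(x: float) -> float:
    """Digamma for x > 0: shift with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def variational_e_step(r: List[int], alpha: List[float], beta: List[List[List[float]]],
                       iters: int = 100, tol: float = 1e-8) -> Tuple[Dict[int, List[float]], List[float]]:
    """Fit q(theta | gamma) * prod_y q(Z_y | phi_y) for one user.

    r[y] is a rating in 1..V, or 0 if item y is unobserved; beta entries are assumed > 0.
    Returns phi (keyed by observed item index) and gamma.
    """
    K = len(alpha)
    observed = [y for y, v in enumerate(r) if v]
    gamma = [a + len(observed) / K for a in alpha]  # LDA-style initialization (assumption)
    phi: Dict[int, List[float]] = {}
    for _ in range(iters):
        # exp(Psi(gamma_z)) shifted by the max for stability; the shift and Psi(sum gamma)
        # are constant in z, so they cancel when each phi_y is normalized.
        psi = [digamma(g) for g in gamma]
        top = max(psi)
        w = [math.exp(p - top) for p in psi]
        for y in observed:
            unnorm = [beta[y][z][r[y] - 1] * w[z] for z in range(K)]
            s = sum(unnorm)
            phi[y] = [p / s for p in unnorm]
        new_gamma = [alpha[z] + sum(phi[y][z] for y in observed) for z in range(K)]
        delta = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if delta < tol:
            break
    return phi, gamma
```

Since every $\phi^u_y$ sums to one, the converged $\gamma^u$ always sums to $\sum_z \alpha_z$ plus the number of observed ratings.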

Parameter Estimation

Solve
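The parameter-estimation equations are also missing. A minimal sketch of the $\beta$ update, assuming it accumulates each user's expected counts $\phi^u_{yz}$ at the observed rating value and normalizes over ratings, is shown below. The Dirichlet parameters $\alpha$ are held fixed here for simplicity; in LDA-style models they are usually re-estimated with a Newton iteration, which this sketch does not attempt. The name `m_step_beta` and the `smooth` pseudo-count are hypothetical.

```python
from typing import Dict, List


def m_step_beta(R: List[List[int]], phis: List[Dict[int, List[float]]], K: int, V: int,
                smooth: float = 1e-3) -> List[List[List[float]]]:
    """Re-estimate beta[y][z][v-1] = P(R_y = v | Z_y = z) from every user's phi.

    R[u][y] is a rating in 1..V (0 = unobserved); phis[u][y][z] comes from the E-step.
    `smooth` is a small pseudo-count (an assumption) that keeps probabilities non-zero;
    with smooth=0, every (y, z) pair must receive some mass to avoid dividing by zero.
    """
    M = len(R[0])
    beta = [[[smooth] * V for _ in range(K)] for _ in range(M)]
    for r, phi in zip(R, phis):
        for y, p in phi.items():
            for z in range(K):
                beta[y][z][r[y] - 1] += p[z]  # expected count of (Z_y = z, R_y = r_y)
    for y in range(M):
        for z in range(K):
            total = sum(beta[y][z])
            beta[y][z] = [c / total for c in beta[y][z]]
    return beta
```

Alternating this update with the per-user variational fit gives a variational EM loop.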

Rating Prediction

• Once rating distributions are estimated, any number of prediction techniques can be used. The prediction technique should match the error measure used.
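To illustrate matching the prediction to the error measure: a median of the predicted rating distribution minimizes expected absolute error, while the mean minimizes expected squared error. The sketch assumes the URP rating distribution is approximated by plugging in the variational posterior mean $E_q[\theta_z] = \gamma_z / \sum_k \gamma_k$; the function names are hypothetical.

```python
from typing import List


def predictive_distribution(gamma: List[float], beta_y: List[List[float]]) -> List[float]:
    """Approximate P(R_y = v | observed) as sum_z (gamma_z / sum(gamma)) * beta[y][z][v-1]."""
    total = sum(gamma)
    V = len(beta_y[0])
    return [sum(g / total * beta_y[z][v] for z, g in enumerate(gamma)) for v in range(V)]


def median_rating(dist: List[float]) -> int:
    """Smallest rating v with P(R <= v) >= 1/2; a median minimizes expected absolute error."""
    cumulative = 0.0
    for v, p in enumerate(dist, start=1):
        cumulative += p
        if cumulative >= 0.5:
            return v
    return len(dist)


def mean_rating(dist: List[float]) -> float:
    """Expected rating; the mean minimizes expected squared error."""
    return sum(v * p for v, p in enumerate(dist, start=1))
```

For the distribution [0.1, 0.2, 0.3, 0.4] over ratings 1 to 4, both the median and the mean are 3.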

5. Experimentation

Weak Generalization Experiment:

• Available ratings for each user split into observed and unobserved sets. Trained on the observed ratings, tested on the unobserved ratings.

• Repeated on 3 random splits of data.

Strong Generalization Experiment:

• Users split into training set and testing set. Ratings for test users split into observed and unobserved sets. Trained on training users, tested on test users.

• Repeated on 3 random splits of data.
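The two protocols differ only in whether test users also appear in training. A minimal sketch follows; the number of ratings hidden per user is not given on the poster, so `n_heldout` (default 1) is an assumption, as are all function names. Repeating with three different seeds corresponds to the three random splits.

```python
import random
from typing import Dict, List, Tuple

Profile = List[int]  # rating vector, 0 = unobserved


def split_profile(profile: Profile, rng: random.Random, n_heldout: int) -> Tuple[Profile, Dict[int, int]]:
    """Hide up to n_heldout of a user's ratings, always keeping at least one observed."""
    rated = [y for y, v in enumerate(profile) if v]
    k = max(0, min(n_heldout, len(rated) - 1))
    held = set(rng.sample(rated, k))
    observed = [0 if y in held else v for y, v in enumerate(profile)]
    unobserved = {y: profile[y] for y in held}
    return observed, unobserved


def weak_generalization(R: List[Profile], n_heldout: int = 1, seed: int = 0):
    """Every user contributes observed ratings for training and hidden ratings for testing."""
    rng = random.Random(seed)
    splits = [split_profile(p, rng, n_heldout) for p in R]
    train = [obs for obs, _ in splits]
    test = [held for _, held in splits]
    return train, test


def strong_generalization(R: List[Profile], n_test_users: int, n_heldout: int = 1, seed: int = 0):
    """Train on one set of users; unseen test users are split into observed/hidden ratings."""
    rng = random.Random(seed)
    users = list(range(len(R)))
    rng.shuffle(users)
    train = [R[u] for u in users[n_test_users:]]
    test = [split_profile(R[u], rng, n_heldout) for u in users[:n_test_users]]
    return train, test
```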

Data Sets:

EachMovie: Compaq Systems Research Center
• Users: 72,916; Items: 1,628; Rating Values: 6
• Ratings: 2,811,983; Sparsity: 97.6%; Filtering: 20 ratings

MovieLens: GroupLens Research Center
• Users: 6,040; Items: 3,900; Rating Values: 5
• Ratings: 1,000,209; Sparsity: 95.7%; Filtering: 20 ratings

Figure 7: Distribution of ratings in weak and strong filtered data sets compared to base data sets.

Error Measure:

Normalized Mean Absolute Error:

• Average over all users of the absolute difference between predicted and actual ratings.

• Normalized by the expected absolute difference between predicted and actual ratings under the empirical rating distribution of the base data set.
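A sketch of the error measure follows. It assumes the normalizer treats predicted and actual ratings as independent draws from the base data set's empirical rating distribution, and that each user's MAE is computed first and then averaged over users; both readings and all names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def empirical_distribution(ratings: Sequence[int]) -> Dict[int, float]:
    """Fraction of base-data-set ratings taking each value."""
    counts = Counter(ratings)
    n = len(ratings)
    return {v: c / n for v, c in counts.items()}


def expected_abs_diff(dist: Dict[int, float]) -> float:
    """E|X - Y| with X and Y drawn independently from dist (the assumed normalizer)."""
    return sum(p * q * abs(a - b) for a, p in dist.items() for b, q in dist.items())


def nmae(per_user_pairs: List[List[Tuple[float, int]]], base_ratings: Sequence[int]) -> float:
    """Average each user's MAE over (predicted, actual) pairs, average over users, then normalize."""
    user_maes = [sum(abs(p - a) for p, a in pairs) / len(pairs) for pairs in per_user_pairs if pairs]
    mae = sum(user_maes) / len(user_maes)
    return mae / expected_abs_diff(empirical_distribution(base_ratings))
```

As a sanity check, a uniform distribution over 5 rating values gives a normalizer of 1.6, and a uniform distribution over 6 values gives about 1.944.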

6. Results

• On MovieLens, URP and the aspect model attain the same minimum weak generalization error rate, but URP does so using far fewer model parameters.

Figure 8: MovieLens Weak Generalization Results
Figure 9: MovieLens Strong Generalization Results

• On the more difficult EachMovie data set, URP clearly performs better than the other rating prediction methods considered.

Figure 10: EachMovie Weak Generalization Results
Figure 11: EachMovie Strong Generalization Results

7. Conclusions and Future Work

Conclusions:

• We have introduced URP, a new generative model designed specifically for pure, non-sequential, rating-based collaborative filtering. URP has consistent generative semantics at both the user level and the rating profile level.

• Empirical results show that URP matches or outperforms other popular rating prediction methods while using fewer model parameters.

Future Work:

• Models with more intuitive generative semantics. A promising family of product models is currently under study.

• Models that integrate additional features, sequential dynamics, or both.

8. References

1. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.

2. John S. Breese, David Heckerman, and Carl Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 43-52, July 1998.

3. Thomas Hofmann. Learning What People (Don't) Want. In Proceedings of the European Conference on Machine Learning (ECML), 2001.

5. Thomas Minka and John Lafferty. Expectation-Propagation for the Generative Aspect Model. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 2002.

6. R. M. Neal and G. E. Hinton. A new view of the EM algorithm that justifies incremental, sparse and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355-368. Kluwer Academic Publishers, 1998.

7. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pages 175-186, Chapel Hill, North Carolina, 1994. ACM.

8. Upendra Shardanand and Pattie Maes. Social information filtering: Algorithms for automating "word of mouth". In Proceedings of ACM CHI'95, volume 1, pages 210-217, 1995.
