Transcript

Recommender Systems

Chu-Yu Hsu, 2015-03-19

Who am I?

Chu-Yu Hsu, Data Scientist @ IBM Taiwan
Dedicated to Recommender Systems
rio512hsu@gmail.com
https://github.com/ChuyuHsu

Outline

• What is a Recommender System

• Related Algorithms

• Content Based Algorithms

• Collaborative Filtering (CF)

• Latent Factor Model

• Going Any Further

[Slide diagram: a user issues a query, the system searches the item database, and suggestions are returned]

• More choices necessitate better filters

• Example:

• Books, movies, music, news articles, products

• People

Types of Recommenders

• Editorial and hand curated

• Simple aggregates

• Tailored to individual users

Who Uses Recommenders

Netflix Prize

• An open competition to predict user ratings for films

• Algorithms were evaluated by Root Mean Squared Error (RMSE)

Approaches

• Content based

• Collaborative

• Latent factor model

Content Based Recommender

Main idea: Recommend to customer x items similar to previous items rated highly by x

Pros

1. No need for data on other users

2. Able to recommend to users with unique tastes

3. Able to recommend new & unpopular items

4. Explanations for recommendations

Cons

1. Finding appropriate features is hard

2. Overspecialisation

3. Cold start for new users

Collaborative Filtering

Main idea: Find a set N of other users whose ratings are “similar” to x’s ratings

Similarity

• Jaccard Similarity

• Cosine Similarity

• Centered Cosine Similarity: normalize ratings by subtracting each row’s mean; also known as Pearson Correlation
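The three measures above can be sketched in Python. This is a minimal illustration; the rating-vector layout, with 0 marking an unrated item, is an assumption, not something the slides specify:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of the sets of items two users have rated."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cosine(u, v):
    """Cosine similarity of two rating vectors (0 = unrated)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def centered_cosine(u, v):
    """Centered cosine (Pearson correlation): subtract each user's mean
    over their *rated* items only, leaving unrated (0) entries at zero."""
    u, v = u.astype(float).copy(), v.astype(float).copy()
    u[u > 0] -= u[u > 0].mean()
    v[v > 0] -= v[v > 0].mean()
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
```

Centering makes the measure robust to users who rate everything high or everything low, which plain cosine conflates with genuine agreement.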

Rating Prediction

• User Based CF

• Item Based CF

Item Based vs. User Based

• In theory, user-based CF and item-based CF are dual

• Item-based CF outperforms user-based CF in many practical use cases

• Items are "simpler" than users

• Items belong to a small set of "genres", users have varied tastes

• Item similarity is more meaningful than user similarity
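An item-based prediction can be sketched as follows: predict user u's rating of item i as a similarity-weighted average of u's ratings on the k items most similar to i. The users × items matrix layout with 0 for "unrated", and the use of cosine similarity, are assumptions for illustration:

```python
import numpy as np

def predict_item_based(R, u, i, k=2):
    """Predict R[u, i] from the k most similar items the user has rated.
    R: users x items rating matrix, 0 = unrated (assumed layout)."""
    cols = R.T.astype(float)                       # one row per item
    norms = np.linalg.norm(cols, axis=1)
    # cosine similarity between item i and every other item
    sims = cols @ cols[i] / (norms * norms[i] + 1e-12)
    sims[i] = -np.inf                              # exclude the item itself
    rated = np.where(R[u] > 0)[0]                  # items u actually rated
    rated = rated[rated != i]
    top = rated[np.argsort(sims[rated])[::-1][:k]] # k most similar of those
    w = sims[top]
    return float(w @ R[u, top] / w.sum())          # weighted average rating
```

The user-based variant is the transpose of this computation: similarities between rows (users) instead of columns (items).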

Latent Factor Model

• For now let’s assume we can approximate the rating matrix

• SVD would be an intuitive choice

• But R has missing entries

• SVD assumes all missing entries are zero

• Ignore the missing entries

• Drop the orthogonality/unit-length constraints

• Our goal is to find P and Q that minimize the Sum of Squared Errors (SSE) over the observed ratings:

  min_{P,Q} Σ_{(u,i) observed} (r_ui − q_i · p_u)²

• Root Mean Square Error (RMSE)
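The key difference from plain SVD can be shown in a few lines: the error is computed only over observed entries, so missing ratings are ignored rather than treated as zeros. A minimal sketch, assuming a users × items matrix with 0 meaning "missing":

```python
import numpy as np

def rmse_observed(R, P, Q):
    """RMSE of the approximation P @ Q.T, computed only over the
    observed (non-zero) entries of R. Plain SVD would instead treat
    the missing entries as literal zeros and try to fit them."""
    mask = R > 0
    err = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```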

Alternating Least Squares

• Because p and q are both unknown, the objective function is not convex

• If we fix one of the unknowns, the problem reduces to a least squares problem

Overfitting

• To combat overfitting we introduce regularization:

  min_{P,Q} Σ_{(u,i) observed} (r_ui − q_i · p_u)² + λ ( Σ_u ‖p_u‖² + Σ_i ‖q_i‖² )

• Allows a rich model where there is sufficient data

• Shrinks aggressively where data are scarce
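Putting the two ideas together, regularized ALS can be sketched as alternating ridge regressions: with Q fixed, each p_u has a closed-form solution over user u's observed ratings, and symmetrically for each q_i. The matrix layout (users × items, 0 = missing) and the hyperparameter values are illustrative assumptions:

```python
import numpy as np

def als(R, k=2, lam=0.1, iters=20, seed=0):
    """Regularized Alternating Least Squares sketch.
    Each inner solve is a ridge regression over observed entries only."""
    rng = np.random.default_rng(seed)
    n_u, n_i = R.shape
    P = rng.normal(scale=0.1, size=(n_u, k))
    Q = rng.normal(scale=0.1, size=(n_i, k))
    I = lam * np.eye(k)                            # regularization term
    for _ in range(iters):
        for u in range(n_u):                       # fix Q, solve each p_u
            idx = R[u] > 0
            if idx.any():
                A = Q[idx].T @ Q[idx] + I
                P[u] = np.linalg.solve(A, Q[idx].T @ R[u, idx])
        for i in range(n_i):                       # fix P, solve each q_i
            idx = R[:, i] > 0
            if idx.any():
                A = P[idx].T @ P[idx] + I
                Q[i] = np.linalg.solve(A, P[idx].T @ R[idx, i])
    return P, Q
```

Because each half-step minimizes the (regularized) objective exactly with the other factor held fixed, the objective is non-increasing across iterations even though the joint problem is non-convex.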

What’s More

• Prediction accuracy isn’t always the most important goal

• Recency

• Novelty

• Explanation-based diversity

• Temporal diversity

What’s More

• All kinds of user behaviors

Open Problems

• How to weight different behaviors

• How to improve different metrics

• How to evaluate and evolve

References

• Anand Rajaraman and Jeffrey David Ullman. 2011. Mining of Massive Datasets. Cambridge University Press, New York, NY, USA.

• 项亮 (Xiang Liang). 2012. 推荐系统实践 (Recommender System Practice). People’s Posts and Telecommunications Press, Beijing.

“The Age of Search has come to an end. Long live the recommendation!”

– Jeffrey M. O’Brien, CNN Money
