Performance of Recommender Algorithms on Top-N Recommendation Tasks
RecSys 2010
Intelligent Database Systems Lab.
School of Computer Science & Engineering
Seoul National University
Center for E-Business Technology
Seoul National University
Seoul, Korea
Presented by Sangkeun Lee, 1/14/2011
Paolo Cremonesi (Politecnico di Milano), Yehuda Koren (Yahoo! Research Haifa, Israel), Roberto Turrin (Neptuny, Milan, Italy)
Copyright 2010 by CEBT
Introduction
Competition among recommender systems
Usually judged by error metrics such as RMSE (root mean squared error)
The average error between estimated ratings and actual ratings
Why is the majority of the literature focused on error metrics?
They are logical & convenient
However, many commercial systems perform top-N recommendation tasks
The system suggests a few specific items to the user that are likely to be very appealing to them
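As a concrete illustration of the error metric discussed above, a minimal RMSE sketch (the rating values are made up for illustration):

```python
import math

def rmse(estimated, actual):
    """Root mean squared error between predicted and true ratings."""
    assert len(estimated) == len(actual) and actual
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))

# Hypothetical predicted vs. actual ratings for five user-item pairs
predicted = [3.5, 4.0, 2.0, 5.0, 3.0]
observed = [4.0, 4.0, 1.0, 5.0, 2.0]
print(round(rmse(predicted, observed), 4))  # 0.6708
```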
Introduction: Top-N Performance
Classical error measures (e.g. RMSE, MAE) do not really measure top-N performance
Measure for Top-N Performance
Accuracy metrics
– Recall and Precision
In this paper,
The authors present an extensive evaluation of several state-of-the-art recommender systems & naïve non-personalized algorithms
And they give us some insights from the experimental results
On the Netflix & MovieLens datasets
Testing Methodology: Dataset
For each dataset, known ratings are split into two subsets:
Training set M and test set T
Test set T contains only 5-star ratings
– So, we can reasonably state that T contains items relevant to the respective users
For the Netflix dataset,
Training set = training dataset 100M ratings for Netflix prize
Test set = 5-star ratings from probe dataset for Netflix prize (|T|=384,573)
For the MovieLens dataset,
Randomly sub-sampled 1.4% of the ratings from the dataset to create the test set
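The split described in this slide can be sketched as follows; the function and field names are illustrative, not the paper's actual preprocessing code:

```python
import random

def split_ratings(triples, test_fraction=0.014, seed=42):
    """Split (user, item, rating) triples into a training set M and a test
    set T, where only 5-star ratings are eligible for T (as in the paper)."""
    rng = random.Random(seed)
    M, T = [], []
    for user, item, rating in triples:
        if rating == 5 and rng.random() < test_fraction:
            T.append((user, item, rating))
        else:
            M.append((user, item, rating))
    return M, T

# Tiny synthetic example: 20 users, each rating four items
triples = [(u, i, r) for u in range(20) for i, r in enumerate([5, 4, 5, 3])]
M, T = split_ratings(triples, test_fraction=0.5, seed=0)
print(len(M), len(T), all(r == 5 for _, _, r in T))
```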
Testing Methodology: measuring precision and recall
1) Train the model over the ratings in M
2) For each item i rated 5 stars by user u in T
Randomly select 1000 additional items unrated by user u
Predict the ratings for the test item i and for the additional 1000 items
Form a ranked list by ordering the 1001 items according to the predicted ratings. Let p denote the rank of item i within this list. (The best result is p=1)
Form a top-N recommendation list by picking the N top-ranked items from the list. If p<=N we have a hit; otherwise we have a miss.
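The per-test-case procedure above can be sketched like this; `score_fn` is a stand-in for any trained model's rating predictor:

```python
import random

def rank_of_test_item(score_fn, user, test_item, unrated_items, n_extra=1000, seed=0):
    """Rank the 5-star test item against n_extra randomly chosen items
    unrated by the user; returns p, the 1-based rank among the 1001 items."""
    rng = random.Random(seed)
    candidates = [test_item] + rng.sample(unrated_items, n_extra)
    ranked = sorted(candidates, key=lambda item: score_fn(user, item), reverse=True)
    return ranked.index(test_item) + 1

def is_hit(p, N):
    """Top-N hit if the test item ranks within the first N positions."""
    return p <= N

# Toy model that scores the target item highest, so p == 1
score_fn = lambda u, item: 5.0 if item == "target" else 1.0
p = rank_of_test_item(score_fn, 0, "target", [f"movie{k}" for k in range(2000)])
print(p, is_hit(p, 10))
```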
Testing Methodology: measuring precision and recall
For any single test case,
Recall for a single test can assume either the value 0 (miss) or 1 (hit)
Precision for a single test can assume either the value 0 (miss) or 1/N (hit)
The overall recall and precision are defined by averaging over all test cases
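Given the ranks p collected over all test cases, the averaged metrics above reduce to a few lines (the ranks below are hypothetical):

```python
def recall_at_n(ranks, N):
    """Fraction of test cases whose test item made the top-N list."""
    return sum(1 for p in ranks if p <= N) / len(ranks)

def precision_at_n(ranks, N):
    """Each hit contributes 1/N, each miss 0; under this protocol
    precision(N) = recall(N) / N."""
    return recall_at_n(ranks, N) / N

hypothetical_ranks = [1, 4, 12, 3, 250, 7, 1001, 2]  # p for 8 test cases
print(recall_at_n(hypothetical_ranks, 10))      # 5 of 8 ranks are <= 10
print(precision_at_n(hypothetical_ranks, 10))
```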
Rating distribution: Popular items vs. Long-tail
About 33% of ratings collected by Netflix involve only the 1.7% of most popular items
To evaluate the accuracy of recommender algorithms in suggesting non-trivial items, T has been partitioned into T_head and T_long
Algorithms
Non-personalized models
Movie Rating Average (MovieAvg) – ranks items by their average rating
Top Popular (TopPop) – ranks items by their number of ratings – not applicable to error metrics
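Both non-personalized baselines follow directly from their definitions; a sketch with made-up ratings:

```python
from collections import defaultdict

def movie_avg(triples):
    """MovieAvg: score each item by its mean rating in the training set."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, rating in triples:
        sums[item] += rating
        counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

def top_pop(triples):
    """TopPop: score each item by its number of ratings. It predicts no
    rating values, which is why error metrics do not apply to it."""
    counts = defaultdict(int)
    for _, item, _ in triples:
        counts[item] += 1
    return dict(counts)

train_ratings = [(0, "a", 5), (1, "a", 3), (2, "b", 4), (0, "b", 4), (1, "b", 2)]
print(movie_avg(train_ratings))  # "a" averages 4.0, "b" about 3.33
print(top_pop(train_ratings))    # "a" has 2 ratings, "b" has 3
```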
Collaborative Filtering models
Neighborhood models
– The most common approaches
– Based on similarity among either users or items
Latent factor models
– Finding hidden factors
– Model users and items in the same latent factor space
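A minimal latent-factor sketch in the spirit of the models above; this is plain SGD matrix factorization with illustrative hyperparameters, not the paper's tuned models:

```python
import random

def train_mf(obs, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user vectors P and item vectors Q in a shared k-dimensional
    latent space so that the dot product P[u] . Q[i] approximates r_ui."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in obs:
            err = r - mf_predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def mf_predict(P, Q, u, i):
    """Predicted rating: dot product of the two latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

obs = [(0, 0, 5), (0, 1, 1), (1, 0, 1), (1, 1, 5)]
P, Q = train_mf(obs, n_users=2, n_items=2)
```

For top-N recommendation, items would then be ranked per user by `mf_predict`.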